Discussion:
PDOS/86
muta...@gmail.com
2021-07-09 13:44:38 UTC
I think PDOS/86 should probably only be compatible
with MSDOS at the PosWriteFile() etc level, just as
PDOS/386 is.

In particular, for allocating memory, I would like the
"allocate memory" call to return a seg:offs rather than
just a seg. In a flexible segment shift environment,
this would allow multiple requests to share a single
64k page, even if the segment shift is 16 bits.

But this introduces another issue. Callers have
traditionally cared about being properly aligned, but
segmentation introduces a requirement for the
caller to be able to request that the entire block
be within a single segment. Huge memory model
callers don't care about that, but everyone else
does, so some sort of indicator, or different
INT 21H function code would seem to be
appropriate.

BFN. Paul.
muta...@gmail.com
2021-07-11 05:23:48 UTC
Post by ***@gmail.com
But this introduces another issue. Callers have
traditionally cared about being properly aligned, but
segmentation introduces a requirement for the
caller to be able to request that the entire block
be within a single segment. Huge memory model
callers don't care about that, but everyone else
does, so some sort of indicator, or different
INT 21H function code would seem to be
appropriate.
I think we have the following use cases:

1. Less than 64k memory requested, caller expecting
an offset of 0.

2. More than 64k memory requested, caller expecting
an offset of 0.

3. Less than 64k memory requested, caller can handle
a non-zero offset, but the memory block must be fully
contained within the segment returned.

4. Less than 64k memory requested, caller can handle
a block spanning a segment, but the offset must be 0.

5. More than 64k memory requested, caller can handle
both a non-zero offset and the memory spanning more
than 1 segment.

I think implementing 4 should be delayed until there is
an actual use case though.

5 is what huge memory model programs should be requesting,
to be nice.

3 is useful for non-huge memory model programs when
segment shifts are increased to 16 bits or whatever.

1 and 2 are the only thing MSDOS provides, and are
distinguished by the number of pages requested.
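
As a rough illustration of that distinction: the MS-DOS allocate call (INT 21H, AH=48H) takes its size in BX as a count of 16-byte paragraphs, so a request falls under case 2 once it exceeds 0x1000 paragraphs. The helper names below are hypothetical and are not part of PDOS or MS-DOS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a byte count up to 16-byte paragraphs,
   the unit that INT 21H AH=48H expects in BX. */
uint16_t bytes_to_paragraphs(uint32_t bytes)
{
    assert(bytes <= 0xFFFF0UL);          /* BX holds at most 0xFFFF paragraphs */
    return (uint16_t)((bytes + 15) / 16);
}

/* Case 2 from the list above: the block is larger than one 64k segment. */
int exceeds_one_segment(uint16_t paragraphs)
{
    return paragraphs > 0x1000;          /* 0x1000 paragraphs == 64k */
}
```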

BFN. Paul.
muta...@gmail.com
2021-07-11 05:27:34 UTC
Post by ***@gmail.com
4. Less than 64k memory requested, caller can handle
a block spanning a segment, but the offset must be 0.
I think implementing 4 should be delayed until there is
an actual use case though.
Strike number 4. The memory won't span a segment if
the offset is 0, so it's the same as 1.

BFN. Paul.
muta...@gmail.com
2021-07-11 06:04:59 UTC
Post by ***@gmail.com
5. More than 64k memory requested, caller can handle
both a non-zero offset and the memory spanning more
than 1 segment.
More than 64k of memory spans a segment by definition,
so that can be reduced to:

5. More than 64k memory requested, caller can handle
a non-zero offset.

Therefore the only thing required is whether you can handle
a non-zero offset.

We just need another allocate-memory call that takes a
32-bit size and a flag to say whether it needs a 0 offset
(ie be on a segment boundary).
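
A minimal sketch of what such a call could decide internally, assuming a simple bump allocator over linear memory and a configurable segment shift. The names `arena`, `farptr` and `alloc_far` are invented for illustration and are not PDOS/86 code; the arena is assumed to lie within the range a 16-bit segment value can reach at the given shift.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t seg; uint16_t off; } farptr;

/* Hypothetical arena: hands out linear addresses in [next, limit). */
typedef struct { uint32_t next; uint32_t limit; unsigned shift; } arena;

/* Returns 0 on success, -1 if the arena is exhausted. */
int alloc_far(arena *a, uint32_t size, int need_zero_off, farptr *out)
{
    uint32_t gran, start, off;

    assert(a->shift <= 16);
    gran  = (uint32_t)1 << a->shift;     /* bytes between segment values */
    start = a->next;
    off   = start & (gran - 1);          /* offset within current segment */

    /* Move up to the next segment boundary if the caller insists on
       offset 0, or if a block of at most 64k would otherwise run past
       the end of the segment it starts in. */
    if (off != 0 &&
        (need_zero_off || (size <= 0x10000UL && off + size > 0x10000UL))) {
        start += gran - off;
        off = 0;
    }
    if (start > a->limit || size > a->limit - start)
        return -1;

    out->seg = (uint16_t)(start >> a->shift);
    out->off = (uint16_t)off;
    a->next  = start + size;
    return 0;
}
```

With a 16-bit shift, two small requests land in the same 64k page at different offsets, which is the sharing described at the start of the thread.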

BFN. Paul.
wolfgang kern
2021-07-11 06:57:39 UTC
On 11.07.2021 08:04, ***@gmail.com wrote:
...
Post by ***@gmail.com
We just need another allocate-memory call that takes a
32-bit size and a flag to say whether it needs a 0 offset
(ie be on a segment boundary).
Since you have your own OS (even if it is still a fictional one),
you can use just one single GETMEM variant, like the one I use in
my OS:

getmem:

called by RM16 or PM32 entry points or INT7F for both

input:   AL  = mode: 0 = seg:offset, 1 = 32bit flat
         ECX = requested size in bytes

returns:
  Carry+Z:  size not available; Cy+NZ: no more handles
  No Carry: means no error, then
            BX = handle
            mode0: EDI = flat start
            mode1: ES:DI = start

and yes, the flat mode works for RM16(unreal) as well.
__
wolfgang
muta...@gmail.com
2021-07-11 07:38:44 UTC
Post by wolfgang kern
...
Post by ***@gmail.com
We just need another allocate-memory call that takes a
32-bit size and a flag to say whether it needs a 0 offset
(ie be on a segment boundary).
while you have your own OS (even a fictive one yet) then
PDOS/86 is not fictional. It's been available for decades
and predates PDOS/386.
Post by wolfgang kern
you can use just one single GETMEM variant like I use in
called by RM16 or PM32 entry points or INT7F for both
input:AL=mode 0=seg:offset 1=32bit flat
ECX= requested size in bytes
Carry+Z size not available, Cy+NZ no more handles
No Carry: means no error, then
BX=handle mode0 EDI=flat start
mode1 ES:DI = start
and yes, the flat mode works for RM16(unreal) as well.
I'm using normal RM16, not unreal.

I want a solution that works on the 8086.

BFN. Paul.
wolfgang kern
2021-07-11 14:13:40 UTC
On 11.07.2021 09:38, ***@gmail.com wrote:
...
Post by ***@gmail.com
I'm using normal RM16, not unreal.
I want a solution that works on the 8086.
You posted an alloc option >64KB. Even if that is possible with
two merged 16-bit registers, you'll break your nails
on the overhead required to fit consecutive blocks.

OK I see, you want an 8086 with 32 bit addressing :)
__
wolfgang
Branimir Maksimovic
2021-07-11 14:28:46 UTC
Post by wolfgang kern
...
Post by ***@gmail.com
I'm using normal RM16, not unreal.
I want a solution that works on the 8086.
you posted an alloc option >64KB. even possible with
two merged 16 bit registers you'll break your nails
on the required overhead for consecutive blocks fit.
OK I see, you want an 8086 with 32 bit addressing :)
Heh, my T27 emulator used 64kb for code and 64kb for data.
It is a full-featured Unisys terminal, with features like
web browsers have nowadays :P
Post by wolfgang kern
__
wolfgang
--
bmaxa now listens Drowned in Fear by Graveworm from Engraved in Black
Scott Lurndal
2021-07-11 17:32:02 UTC
Post by Branimir Maksimovic
Post by wolfgang kern
...
Post by ***@gmail.com
I'm using normal RM16, not unreal.
I want a solution that works on the 8086.
you posted an alloc option >64KB. even possible with
two merged 16 bit registers you'll break your nails
on the required overhead for consecutive blocks fit.
OK I see, you want an 8086 with 32 bit addressing :)
Heh my T27 emulator used 64kb for code and 64kb for data.
it is full featured Unisys terminal, with features like
web browsers nowadays :P
I have written a T27 emulator in C++/X11 for my V-series
emulator. I don't know that including a Web browser
in the emulator rather than using a dedicated browser
on the same host running the emulator is the most optimal solution.

What do you connect yours to? MCP Express?
Branimir Maksimovic
2021-07-11 18:34:26 UTC
Post by Scott Lurndal
Post by Branimir Maksimovic
Post by wolfgang kern
...
Post by ***@gmail.com
I'm using normal RM16, not unreal.
I want a solution that works on the 8086.
you posted an alloc option >64KB. even possible with
two merged 16 bit registers you'll break your nails
on the required overhead for consecutive blocks fit.
OK I see, you want an 8086 with 32 bit addressing :)
Heh my T27 emulator used 64kb for code and 64kb for data.
it is full featured Unisys terminal, with features like
web browsers nowadays :P
I have written a T27 emulator in C++/X11 for my V-series
emulator. I don't know that including a Web browser
in the emulator rather than using a dedicated browser
on the same host running the emulator is the most optimal solution.
What do you connect yours to? MCP Express?
This was written for DOS to replace a real T27 connected
to a Unisys machine via serial port :P
The Unisys price for it was $800 :P
--
bmaxa now listens Satanic Propaganda (S.N.T.F.) by Diabolos Rising from Blood Vampirism And Sadism
muta...@gmail.com
2021-07-12 01:30:39 UTC
Post by wolfgang kern
...
Post by ***@gmail.com
I'm using normal RM16, not unreal.
I want a solution that works on the 8086.
you posted an alloc option >64KB. even possible with
Note that MSDOS already provides this, and things like
executables require that, since executables can be
greater than 64k.
Post by wolfgang kern
two merged 16 bit registers you'll break your nails
on the required overhead for consecutive blocks fit.
Can you elaborate on this? PDOS/86 huge memory
model is still only theoretical. I am determined to
make this code:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/__memmgr.c

which was a general-purpose memory-manager for
use on the original MVS/380 (where the OS wouldn't
manage memory above 16 MiB for you - the app had
to do that itself), work as the memory manager for
PDOS/86, no matter how many gerbils need to die.
Post by wolfgang kern
OK I see, you want an 8086 with 32 bit addressing :)
I want an 8086 with 20-bit addressing, an 80286 with
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.

BFN. Paul.
muta...@gmail.com
2021-07-12 02:05:17 UTC
Permalink
Post by ***@gmail.com
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.

And since I want the OS to return to real mode to do
BIOS calls, the 80386 is probably much better than the
80286 for this.

But I believe the 80286 can do it anyway, with sufficient
hardware support, and that would be a fun thing to do
as well.

BFN. Paul.
wolfgang kern
2021-07-12 08:52:59 UTC
Post by ***@gmail.com
Post by ***@gmail.com
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
So why don't you stick to the 386 and forget old crap
that no longer exists?
Post by ***@gmail.com
And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.
Now that's a really bad idea. Compiled code may look like:

66 b8 44 33 22 11          MOV eax,imm32
67 03 84 11 55 44 33 22    ADD ax,[ecx+edx+d32]

so guess what's left w/o these prefixes:

b8 44 33       MOV ax,imm16
22 11          AND dl,[bx+di]

03 84 11 55    ADD ax,[si+d16]
44             INC sp
33 22          XOR sp,[bp+si]

But IIRC 66 and 67 were ignored anyway on real 86/88.
Post by ***@gmail.com
And since I want the OS to return to real mode to do
BIOS calls, the 80386 is probably much better than the
80286 for this.
yes.
Post by ***@gmail.com
But I believe the 80286 can do it anyway, with sufficient
hardware support, and that would be a fun thing to do
as well.
No. I wouldn't recommend using a soldering iron on CMOS CPUs.
__
wolfgang
muta...@gmail.com
2021-07-12 11:15:49 UTC
Post by wolfgang kern
Post by ***@gmail.com
Post by ***@gmail.com
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
so why don't you stick to 386 and forget no more existing
old crap ?
I would like to have a solution to the 80286 too.
Post by wolfgang kern
Post by ***@gmail.com
And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.
66 b8 44 33 22 11 MOV eax,imm32
No, I won't use 32-bit instructions.
Post by wolfgang kern
67 03 84 11 55 44 33 22 ADD ax,[ecx+edx+d32]
b8 44 33 MOV ax,imm16
22 11 AND dl,[bx+di]
If I just have bb 44 33, will it work on both the 8086
and the 80386?
Post by wolfgang kern
But IIRC 66 and 67 were ignored anyway on real 86/88.
Well that's fantastic then.

Unless I'm missing something. Something that the
compiler can't work around.

BFN. Paul.
wolfgang kern
2021-07-13 03:05:21 UTC
Post by ***@gmail.com
Post by wolfgang kern
so why don't you stick to 386 and forget no more existing
old crap ?
I would like to have a solution to the 80286 too.
You won't find a working 286 anymore, not even in a museum.
And returning from PM needs hardware that reacts to a forced crash.
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.
66 b8 44 33 22 11 MOV eax,imm32
No, I won't use 32-bit instructions.
With the prefix it is still a valid (386) 16-bit instruction!
And many applications may use overrides; the BIOS does it too.
Post by ***@gmail.com
If I just have bb 44 33, will it work on both the 8086
and the 80386?
Yes, as long as the 386 is in RM or PM16, but it becomes a five-byte
opcode in PM32:
bb 44 33 22 11    MOV ebx,imm32
or a four-byte one:
66 bb 44 33       MOV bx,imm16
__
wolfgang
muta...@gmail.com
2021-07-13 03:28:17 UTC
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
so why don't you stick to 386 and forget no more existing
old crap ?
I would like to have a solution to the 80286 too.
you wont find a working 286 anymore, not even in the museum.
On a phone call at work about 2 weeks ago, someone
was bragging that they had an AT.

On discord about 3 weeks ago, someone said they had
brushed off an old XT and wanted to do OS programming
on it and thus wanted a 16-bit compiler, at least ideally.
Post by wolfgang kern
and return from PM needs hardware to react on forced crash.
Ok.
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
And if I can get the 8086 to trap and ignore db'66' and
db'67' I will be able to have 16-bit executables that work
on either shift value.
66 b8 44 33 22 11 MOV eax,imm32
No, I won't use 32-bit instructions.
with the prefix it is still a valid (386) 16 bit instruction!
Ok, I'm after instructions that work on both an 80386 in
protected mode (adding a x'66' is fine) and an 8086
(with or without the ignored x'66').

Basically the intention is to get the C compiler to generate
16-bit instructions that work in either environment, adding
an x'66' when it knows it needs to.

Are there enough instructions that will work this way?
Post by wolfgang kern
and many applications may use overrides, BIOS also do it.
Ok, the BIOS can be dedicated 80386 or dedicated 8086,
I don't care about that.

For now I only care about applications.
Post by wolfgang kern
Post by ***@gmail.com
If I just have bb 44 33, will it work on both the 8086
and the 80386?
Yes as long 386 is in RM or PM16,
No, I don't want that. I want PM32.
Post by wolfgang kern
but it become a five byte
bb 44 33 22 11 MOV ebx.imm32
66 bb 44 33 MOV bx.imm16
This 4 byte one looks like it will work on both PM32
and RM16.

That's exactly what I'm after.

It will all work, right?

Thanks. Paul.
wolfgang kern
2021-07-13 03:39:33 UTC
On 13.07.2021 05:28, ***@gmail.com wrote:
...
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
No, I won't use 32-bit instructions.
with the prefix it is still a valid (386) 16 bit instruction!
Ok, I'm after instructions that work on both an 80386 in
protected mode (adding a x'66' is fine) and an 8086
(with or without the ignored x'66').
Basically the intention is to get the C compiler to generate
16-bit instructions that work in either environment, adding
an x'66' when it knows it needs to.
Is there enough instructions that will work this way?
Not many.
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
If I just have bb 44 33, will it work on both the 8086
and the 80386?
Yes as long 386 is in RM or PM16,
No, I don't want that. I want PM32.
Post by wolfgang kern
but it become a five byte
bb 44 33 22 11 MOV ebx.imm32
66 bb 44 33 MOV bx.imm16
This 4 byte one looks like it will work on both PM32
and RM16.
That's exactly what I'm after.
It will all work, right?
No, the 66 override reverses the default operand size,
so it gives 16-bit operands within PM32 and 32-bit operands within PM16/RM.
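
That toggle can be sketched as follows, using the `MOV reg,imm` encodings quoted above (opcode B8+r followed by a 2- or 4-byte immediate). The function names are made up for this illustration, and the sketch applies to a 386 or later only; it says nothing about what an 8086 does with byte 66.

```c
#include <assert.h>

/* Operand size in bits after applying an optional 66 prefix.
   default_bits is 16 for RM/PM16 code and 32 for PM32 code. */
unsigned operand_bits(unsigned default_bits, int has_66)
{
    assert(default_bits == 16 || default_bits == 32);
    if (!has_66)
        return default_bits;
    return default_bits == 16 ? 32 : 16;  /* prefix selects the other size */
}

/* Total encoded length in bytes of MOV reg,imm (opcode B8+r). */
unsigned mov_imm_length(unsigned default_bits, int has_66)
{
    return (has_66 ? 1u : 0u) + 1u + operand_bits(default_bits, has_66) / 8u;
}
```

`mov_imm_length(32, 1)` corresponds to the four-byte `66 bb 44 33` form, while the same prefix in 16-bit code makes a 386 expect a four-byte immediate.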
__
wolfgang
Frank Kotler
2021-07-13 04:11:42 UTC
...
Post by wolfgang kern
Post by ***@gmail.com
It will all work, right?
No, the 66 override reverses the default operand size.
so it makes 16 bit within PM32 and 32 bit within PM16/RM.
__
wolfgang
Hi Wolfgang,

Hope you are well! I am good. Old and tired, but I feel pretty
good.

You may remember John Fine, who used to write for the assembly language
groups. He wrote up differences between real and protected mode and how
it affected segmentation... I think it might help Paul understand why
what he wants won't work.

Found it!

http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect

Best,
Frank
Frank Kotler
2021-07-13 04:24:38 UTC
Post by Frank Kotler
http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect
Shift! 404

http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect

I think that's the same. Search for "john fine segment"

Best,
Frank
wolfgang kern
2021-07-13 05:25:33 UTC
Post by Frank Kotler
Hi Wolfgang,
Hope you are well! I am good. Old and tired, but I feel pretty
good.
good to see you're in good shape, 77 isn't that old :)
I'm almost fine thanks to my oxygen generator...
and I will get a new set of eyes, already scheduled for next week.
hope I can avoid the white crane after scientist gadget around on me.
__
wolfgang
Post by Frank Kotler
You may remember John Fine, who used to write for the assembly language
groups. He wrote up differences between real and protected mode and how
it affected segmentation... I think it might help Paul understand why
what he wants won't work.
Found it!
http://www.oldlinux.org/Linux.old/study/sabre/os/files/Protect
Best,
Frank
muta...@gmail.com
2021-07-13 04:34:21 UTC
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
but it become a five byte
bb 44 33 22 11 MOV ebx.imm32
66 bb 44 33 MOV bx.imm16
This 4 byte one looks like it will work on both PM32
and RM16.
That's exactly what I'm after.
It will all work, right?
No, the 66 override reverses the default operand size.
so it makes 16 bit within PM32 and 32 bit within PM16/RM.
No. On a real 8086. The x'66' will be ignored. So a real
8086 and PM32 will both work, right?

BFN. Paul.
wolfgang kern
2021-07-13 05:15:58 UTC
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
but it become a five byte
bb 44 33 22 11 MOV ebx.imm32
66 bb 44 33 MOV bx.imm16
This 4 byte one looks like it will work on both PM32
and RM16.
That's exactly what I'm after.
It will all work, right?
No, the 66 override reverses the default operand size.
so it makes 16 bit within PM32 and 32 bit within PM16/RM.
No. On a real 8086. The x'66' will be ignored. So a real
8086 and PM32 will both work, right?
Not sure why you want a 16-bit operand in PM32. The 66 "may"
be ignored by the 8086/88, but it could be an alias for something else as well.

If you always want to use 16-bit code, then use PM16.
The only catch is that all segment registers work differently in PM16.

AND with PM16 you can use 64KB blocks anywhere within 4GB.
__
wolfgang
muta...@gmail.com
2021-07-13 08:05:17 UTC
Post by wolfgang kern
Post by ***@gmail.com
No. On a real 8086. The x'66' will be ignored. So a real
8086 and PM32 will both work, right?
Not sure why you want a 16 bit operand in PM32.
I'm trying to do 16-bit programming with a bigger
EFFECTIVE shift than 4 bits.
Post by wolfgang kern
the 66 "may"
be ignored by 8086/88, could be an alias for other as well.
How can it be an alias on an 8086?
Post by wolfgang kern
If you want to always use 16 bit code then use PM16.
You know what? That's a great idea! I didn't think of the
murky PM16 world.
Post by wolfgang kern
only but all segment registers are different in PM16.
They are selectors. But I can make them look like
segment registers with a particular shift.
Post by wolfgang kern
AND with PM16 you can use 64KB blocks anywhere within 4GB.
I assume I still have 16384 selectors, so I will still be limited
to a total of 1 GiB.

BFN. Paul.
wolfgang kern
2021-07-13 15:30:05 UTC
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
No. On a real 8086. The x'66' will be ignored. So a real
8086 and PM32 will both work, right?
Not sure why you want a 16 bit operand in PM32.
I'm trying to do 16-bit programming with a bigger
EFFECTIVE shift than 4 bits.
Post by wolfgang kern
the 66 "may"
be ignored by 8086/88, could be an alias for other as well.
How can it be an alias on an 8086?
I would need to remove several layers of dust from my collection
of Intel books to check whether it is mentioned at all.
The 0x60.. opcode group could be an alias for 0x70.. or 0x50..,
similar to 0x82, which is (was until 2000) an alias for 0x80.
Post by ***@gmail.com
Post by wolfgang kern
If you want to always use 16 bit code then use PM16.
You know what? That's a great idea! I didn't think of the
murky PM16 world.
Post by wolfgang kern
only but all segment registers are different in PM16.
They are selectors. But I can make them look like
segment registers with a particular shift.
A shift won't do what you want; look at the layout of descriptors.
Post by ***@gmail.com
Post by wolfgang kern
AND with PM16 you can use 64KB blocks anywhere within 4GB.
I assume I still have 16384 selectors, so I will still be limited
to a total of 1 GiB.
No, you can have at most 8191 selectors and you'll need CS-DS pairs,
so 4095*64KB ~ 512MB.
But try what older EMMs did: have only a few variable selectors
and put your address-bit extension right into the start field.

You could even define these start addresses as C variables..
oh, did I say this yet? :)
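
A sketch of that idea, assuming the standard 8-byte 386 segment descriptor layout (limit 0..15 in bytes 0-1, base 0..23 in bytes 2-4, access byte 5, flags and limit 16..19 in byte 6, base 24..31 in byte 7). The function names and the choice of access byte 0x93 are illustrative only; a real kernel would also have to reload the segment register after changing the descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Point an existing descriptor at a new linear base address. */
void set_descriptor_base(uint8_t d[8], uint32_t base)
{
    assert(d != 0);
    d[2] = (uint8_t)(base & 0xFF);          /* base 0..7   */
    d[3] = (uint8_t)((base >> 8) & 0xFF);   /* base 8..15  */
    d[4] = (uint8_t)((base >> 16) & 0xFF);  /* base 16..23 */
    d[7] = (uint8_t)((base >> 24) & 0xFF);  /* base 24..31 */
}

/* Build a present, writable, byte-granular 16-bit data descriptor
   covering exactly 64k starting at base. */
void make_data16_descriptor(uint8_t d[8], uint32_t base)
{
    d[0] = 0xFF;  d[1] = 0xFF;              /* limit 0..15 = 0xFFFF */
    d[5] = 0x93;                            /* P=1, DPL=0, data, W=1, A=1 */
    d[6] = 0x00;                            /* G=0, B=0, limit 16..19 = 0 */
    set_descriptor_base(d, base);
}
```

Re-pointing one such descriptor at successive 64k windows would let a handful of selectors reach any linear address, at the cost of a descriptor update per window change.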
__
wolfgang
muta...@gmail.com
2021-07-13 22:32:51 UTC
Post by wolfgang kern
Post by ***@gmail.com
You know what? That's a great idea! I didn't think of the
murky PM16 world.
Post by wolfgang kern
only but all segment registers are different in PM16.
They are selectors. But I can make them look like
segment registers with a particular shift.
Shift wont do what you want, look at the layout of descriptors.
I don't know what you are talking about.

Let me just restate my goal.

With a fixed GDT/LDT maxed out, and (for argument's
sake) a machine with 1 GiB of memory, I want the
equivalent of an 8086 (4-bit shift) except I want
EFFECTIVE 16-bit shifts. What that means is that
every time you do the appropriate adjustment of
the "segment" register, you get to the next 64k
boundary. As opposed to when you do the appropriate
adjustment of the segment register you get to the
next 16-byte boundary.
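
Put as a formula, an effective shift of s bits means linear = (segment << s) + offset. A small sketch, with the function name invented here and the selector details of the 80286/80386 deliberately left out:

```c
#include <assert.h>
#include <stdint.h>

/* Linear address for a segment:offset pair under a given effective shift. */
uint32_t linear_address(uint16_t seg, uint16_t off, unsigned shift)
{
    assert(shift <= 16);                 /* keeps the result within 32 bits */
    return ((uint32_t)seg << shift) + off;
}
```

With a shift of 4, adding 1 to the segment moves 16 bytes; with a shift of 16, it moves 64k.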

On the 8086, the adjustment you do on the segment
register to get to the next boundary is to add 1.
Note that applications on the 8086 do not normally
adjust the segment register in that way at all, at
least not in C-generated code. The only place where
C-generated code does this is when using the rare
huge memory model. Or the equivalent of using
huge pointers.

I have looked at the code generated by Watcom's C
compiler for the huge memory model, and when you
add a 32-bit value to a pointer in the huge memory
model, it calls a routine to do the addition.

I have my own C library. I don't use Watcom's. I only
use their compiler. So currently I don't actually support
huge memory model. I need to write that routine.

How do I write that routine?

Let's say that I have an address of 4000:0000 and I
have been asked to add 64k to it. I know I need to
jump up to the next 64k boundary. Which would
mean adding x'1000' to the above address to get
5000:0000.

I could hardcode that value x'1000'. Actually the
pointers will all be normalized, so I am looking for
multiples of 16, ie 1. I could hardcode both of those
values - 16 and 1. ie divide by 16 and add 1 times
that amount.

Rather than hardcode those numbers, I want to get
them from global variables in my C program.

And I want those global variables to be set by doing a
PDOS/86 INT 21H call.

On an 80286 or 80386, if I linearly map the memory
with selectors, the above 16-bit application will continue
to run, completely unchanged. The only thing that needs
to change is the 16 and 1. On an 80386 the appropriate
selectors are all spaced 0x20 apart, so the 1 will become
0x20. Regardless of whether I am using a 16-bit shift or
a 15-bit shift etc. The only thing that changes is the
divisor. In the case of a 16-bit shift, the divisor will be
0x10000. In the case of a 15-bit shift, the divisor will be
0x8000. ie for every multiple of 0x8000 that you need
to increase the boundary that the segment is pointing
to, adjust the segment register (selector) by a standard
amount (0x20).
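
The routine described above might look roughly like this. `seg_divisor` and `seg_step` stand for the two global variables; how they get filled in (the proposed INT 21H call) is not shown, and all names here are placeholders rather than anything from PDPCLIB or Watcom. The sketch assumes the amount added is small enough that offset plus amount fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t seg; uint16_t off; } hugeptr;

/* Assumed to be set once at startup from an OS query; 8086 defaults. */
uint32_t seg_divisor = 16;   /* bytes covered by one segment step      */
uint16_t seg_step    = 1;    /* amount added to the segment per step   */

/* Add n bytes to a huge pointer and return it normalized, i.e. with
   an offset smaller than seg_divisor. */
hugeptr huge_add(hugeptr p, uint32_t n)
{
    uint32_t total;

    assert(seg_divisor >= 1 && seg_divisor <= 0x10000UL);
    total = (uint32_t)p.off + n;
    p.seg = (uint16_t)(p.seg + (total / seg_divisor) * seg_step);
    p.off = (uint16_t)(total % seg_divisor);
    return p;
}
```

With the defaults, adding 64k to 4000:0000 yields 5000:0000. Setting `seg_divisor` to 0x10000 and `seg_step` to 0x20 gives the selector-based behaviour without touching the routine.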
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
AND with PM16 you can use 64KB blocks anywhere within 4GB.
I assume I still have 16384 selectors, so I will still be limited
to a total of 1 GiB.
No you can have maximal 8191 selectors and you'll need CS-DS pairs.
so 4095*64KB ~ 512MB.
That's 256 MiB.
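
The arithmetic behind both figures, as a quick sketch (the function name is arbitrary): each selector with a 16-bit offset covers at most 64k.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes reachable through n selectors that each cover one 64k segment. */
uint32_t reachable_bytes(uint32_t selectors)
{
    assert(selectors <= 0xFFFFUL);       /* keep the product within 32 bits */
    return (uint32_t)(selectors * 0x10000UL);
}
```

`reachable_bytes(4095)` is just under 256 MiB, and `reachable_bytes(16384)` is exactly 1 GiB.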

And that's a reason to get x'66' working on the 80386 and
8086, so that I can have 1 GiB instead. Although the x'66'
may add say 5% to the program's footprint. And impact
the 8086 too.
Post by wolfgang kern
But try what older EMM did: have only a few variable selectors
and put your address bits extension right into the start field.
you could even define this start-addresses as C-variables..
oh, did I say this yet? :)
I don't understand this proposal, but will it allow 8086
(with ignored x'66') huge memory model programs to
run unchanged on the 80386?

Thanks. Paul.
wolfgang kern
2021-07-14 05:28:51 UTC
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
I assume I still have 16384 selectors, so I will still be limited
to a total of 1 GiB.
No you can have maximal 8191 selectors and you'll need CS-DS pairs.
so 4095*64KB ~ 512MB.
That's 256 MiB.
Yeah, test passed.
Post by ***@gmail.com
And that's a reason to get x'66' working on the 80386 and
8086, so that I can have 1 GiB instead. Although the x'66'
may add say 5% to the program's footprint. And impact
the 8086 too.
Post by wolfgang kern
But try what older EMM did: have only a few variable selectors
and put your address bits extension right into the start field.
you could even define this start-addresses as C-variables..
oh, did I say this yet? :)
I don't understand this proposal, but will it allow 8086
(with ignored x'66') huge memory model programs to
run unchanged on the 80386?
Let me try to show you what descriptors look like:
(copy as text, undo the line wrap, rename to HTML and view)
__
wolfgang

<!--
DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
"http://www.w3.org/TR/html4/strict.dtd"
-->
<html>
<head><title>x86descriptors</title></head>

<!--
translated from page 366 of the "Holy Book of KESYS" Jan.1999,
added x86-64 descriptors 2004.
Author: Wolfgang Kern, Vienna Austria (LEOC, KESYS-development)
-->

<body bgcolor="#c0c0c0" text="#000000">
<basefont face="Lucida Console">

<p style="position:absolute; left:70pt; top:05pt; font-size:12pt;">
<u><b>x86 PM16/32 Descriptors</b></u></p>

<table style="position:absolute; left:70pt; top:40pt; font-size:8pt;"
border="2"; frame="box"; rules="all"; bgcolor="#FFFFFF"; height="200";
width="380"; cellspacing="0"; cellpadding="0"; bordercolor="#000000"; >
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr align="center"; style="font-size:7pt;" > <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>

<td style="font-size:10pt"; align="left" valign="top" rowspan="9"> <b><u>
DATA [GDT,LDT]</u></b><pre>
G 4Kb granular limit
B 32-bit stack
0 MBZ
P present
E expand down [stack]
W writable
A accessed
[93]</pre></td></tr>

<tr align="center"><td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center"><td>6</td>
<td><b>G</td><td><b>B</td><td><b>0</td><td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"><td>5</td><td>P</td><td colspan="2">DPL</td>
<td colspan="2" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>1 0</td> <td><b>E</td> <td><b>W</td> <td><b>A</td></tr>
<tr align="center"><td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr align="center"><td>3</td></tr>
<tr align="center"><td>2</td></tr>
<tr align="center"><td>1</td> <td rowspan="2"; colspan="8">LIMIT
0..15</td></tr>
<tr align="center"><td>0</td></tr>
</table>


<table style="position:absolute; left:380pt; top:40pt;
font-size:8pt;"border="2"; frame="box"; rules="all"; bgcolor="#FFFFFF";
height="200"; width="380"; cellspacing="0"; cellpadding="0";
bordercolor="#808080"; >
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
CODE [GDT,LDT]</u></b><pre>
G 4Kb granular
B 32-bit
0 MBZ
P present
C conforming
R readable
A accessed
[9b]</pre></td></tr>
<tr align="center"> <td>7</td> <td colspan="8" align="center">BASE
24..31</td> </tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>B</td> <td><b>0</td>
<td>x</td>
<td nowrap colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td> <td
colspan="2"
style="border-width:medium;border-color:#000000; border-style:double;
padding:0px;">
<b>1 1</td> <td><b>C</td> <td><b>R</td> <td><b>A</td> </tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"> <td>3</td></tr>
<tr align="center"> <td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:200pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-GATE [IDT]</b></u><pre>
T 32-bit
disables IRQ,
TRAP and NT cleared
until IRET
[86/8e]</pre></td></tr>
<tr align="center"> <td>7</td>
<td colspan="8" rowspan="2" align="center" valign="middle">Offset
16..31</td></tr>
<tr align="center"> <td>6</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan="3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 1 0</td></tr>
<tr align="center"> <td>4</td> <td colspan="8">reserved</td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:380pt; top:200pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16"span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt;"align="center";>
<td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-TRAP [IDT]</b></u><pre>
T 32-bit
IRQ-status unchanged,
TRAP and NT cleared
until IRET
[87/8f]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">Offset
16..31</td></tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan="3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 1 1</td></tr>
<tr align="center"> <td>4</td> <td colspan="8">reserved</td></tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td></tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td><td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td> </tr>
<tr align="center"> <td>0</td></tr> </table>


<table style="position:absolute; left:70pt; top:360pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
TASK-switch [GDT]</u></b><pre>
G 4Kb granular limit
P present
T 32-bit
BS task is busy
[81/89]</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center">
<td>6</td><td><b>G</td><td><b>0</td><td><b>0</td><td>x</td>
<td nowrap colspan="4">LIM 16..19</td></tr>
<tr align="center"><td>5</td><td>P</td><td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double"><b>0</td>
<td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 BS 1</td></tr>
<tr align="center"><td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"><td>3</td></tr>
<tr align="center"><td>2</td></tr>
<tr align="center"><td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"><td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:360pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span=8>
<col width="200">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
LDT [GDT]</b></u> <pre>
0 and reserved: MBZ
[82]</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>x</td>
<td><b>x</td> <td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>0 0 0 1 0</td></tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr> <td>3</td></tr><tr><td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td></tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:520pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" align="middle" span=8>
<col width="200" align="left" valign="top">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
Call-GATE [GDT,LDT]</b></u><pre>
T 32-bit
L LDT (else GDT)
Dwords copied from
caller's stack.
[84/8c]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">Offset
16..31</td></tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td style="border-color:#000000;border-width:medium;border-style:double">
<b>0</td> <td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 0 0</td></tr>
<tr align="center"> <td>4</td> <td colspan="4">0 0 0 0</td>
<td colspan="4">Dwords </td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>L</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">Offset
0..15</td></tr>
<tr align="center"><td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:520pt;
font-size:8pt;"border="2" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" align="middle" span=8>
<col width="200" align="left" valign="top">
</colgroup>
<tr style="font-size:7pt"; align="center";> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
TASK-GATE [GDT,IDT,LDT]</b></u><pre>
T: 32-bit
[85/8d]</pre></td></tr>
<tr align="center"> <td>7</td> <td rowspan="2" colspan="8">reserved</td></tr>
<tr align="center"> <td>6</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td
style="border-color:#000000;border-width:medium;border-style:double">
<b>0</td>
<td><b>T</td>
<td colspan= "3"
style="border-color:#000000;border-width:medium;border-style:double">
<b>1 0 1</td> </tr>

<tr align="center"> <td>4</td> <td colspan="8">reserved</td> </tr>
<tr align="center"> <td>3</td> <td colspan="8">SEGMENT-</td> </tr>
<tr align="center"> <td>2</td> <td colspan="5">SELECTOR</td> <td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">reserved</td>
</tr>
<tr align="center"> <td>0</td></tr></table>

<pre><p style="position:absolute; left:70pt; top:700pt; font-size:10pt;"><b>
comments after uppercase character mean "if bit SET"
* and x mean don't care (can be used by OS as flags)
MBZ Must Be Zero
reserved bits should be written with zeros.
</b></p></pre>




<p style="position:absolute; left:70pt; top:770pt; font-size:12pt;"><b><u>
64-bit descriptors</u></b> (IDT entries, LDT and TSS descriptors are
expanded to 128 bits)</p>

<table style="position:absolute; left:70pt; top:800pt;
font-size:8pt;"border="2"; frame="box" rules="all" bgcolor="#FFFFFF"
height="200" width="380" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"> <b><u>
DATA [GDT,LDT]</b></u>[93]
<pre> G 4Kb granular limit
B 32-bit stack
P present
E expand down [stack]
W writable
A accessed

ALL except P,DPL and "10"
is ignored in LONG Mode
but see FS and GS.</pre></td></tr>
<tr align="center"> <td>7</td><td colspan="8">BASE 24..31</td></tr>
<tr align="center">
<td>6</td><td><b>G</td><td><b>B</td><td><b>0</td><td>x</td>
<td colspan="4">LIM 16..19</td></tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="2" style="border-width:medium; border-color:#000000;
border-style:double;">
<b>1 0</td> <td><b>E</td> <td><b>W</td> <td><b>A</td></tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td></tr>
<tr> <td>3</td></tr><tr><td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td></tr>
<tr> <td>0</td></tr></table>


<table style="position:absolute; left:380pt; top:800pt; font-size:8pt;"
border="2" frame="box" rules="all" bgcolor="#FFFFFF" height="200"
width="380" cellspacing="0" cellpadding="0" bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="8">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><b><u>
CODE [GDT,LDT]</u></b>[9b/98]
<pre> G 4Kb granular *
B 32-bit/16bit *
L 64-bit/compatible mode
P present
C conforming
R readable *
A accessed *

base,limit and "*"
ignored in 64bit mode</pre></td></tr>

<tr align="center"> <td>7</td> <td colspan="8" align="center">BASE
24..31</td> </tr>
<tr align="center"> <td>6</td> <td><b>G</td> <td><b>B</td> <td><b>L</td>
<td>x</td>
<td nowrap colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>5</td> <td>P</td> <td colspan="2">DPL</td> <td
colspan="2"
style="border-width:medium;border-color:#000000; border-style:double;
padding:0px;">
<b>1 1</td> <td><b>C</td> <td><b>R</td> <td><b>A</td> </tr>
<tr align="center"> <td>4</td> <td rowspan="3" colspan="8">BASE
0..23</td> </tr>
<tr align="center"> <td>3</td></tr>
<tr align="center"> <td>2</td></tr>
<tr align="center"> <td>1</td> <td rowspan="2" colspan="8">LIMIT
0..15</td> </tr>
<tr align="center"> <td>0</td></tr>
</table>


<table style="position:absolute; left:70pt; top:960pt;
font-size:8pt;"border="4" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="550" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
INT-GATE/INT-TRAP [IDT]</b></u><pre>
[0e/0f]
V GATE/trap

(GATE disables IRQ.
TRAP-bit and NT cleared
until IRET on both)

IST: Interrupt Stack Table index
</pre></td></tr>
<tr align="center"> <td>E</td> <td colspan="16"
rowspan="2">ignored</td></tr>
<tr align="center"> <td>C</td> </tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> Offset
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="16">Offset 16..31</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 0 1 1 V</td> <td colspan="5">reserved</td><td
colspan="3">IST</td></tr>
<tr align="center"><td>2</td> <td colspan="13">SEGMENT-SELECTOR</td>
<td>x</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"><td>0</td> <td colspan="16">Offset 0..15</td></tr>
</table>


<table style="position:absolute; left:70pt; top:1120pt; font-size:8pt;"
border="4" frame="box" rules="all" bgcolor="#ffffff" height="200"
width="550" cellspacing="0" cellpadding="0" bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td>#</td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
CALLGATE -----------[GDT/LDT]</b></u>[0C]<pre>
L LDT (else GDT)

</pre></td></tr>

<tr align="center"> <td>E</td> <td colspan="16">reserved</td></tr>
<tr align="center"> <td>C</td> <td colspan="3">* * *</td> <td
colspan="5">0 0 0 0 0</td>
<td colspan="8">reserved</td></tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> Offset
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="16">Offset 16..31</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 1 1 0 0</td> <td colspan="8">reserved</td></tr>
<tr align="center"> <td>2</td> <td colspan="13">SEGMENT-SELECTOR</td>
<td><b>L</td>
<td colspan="2">RPL</td> </tr>
<tr align="center"> <td>0</td> <td colspan="16">Offset 0..15</td></tr>
</table>



<table style="position:absolute; left:70pt; top:1280pt;
font-size:8pt;"border="4" frame="box" rules="all" bgcolor="#ffffff"
height="200" width="550" cellspacing="0" cellpadding="0"
bordercolor="#808080">
<colgroup>
<col width="8">
<col width="16" span="16">
<col width="200">
</colgroup>
<tr style="font-size:7pt;" align="center"> <td></td>

<td>15</td><td>14</td><td>13</td><td>12</td><td>11</td><td>10</td><td>9</td><td>8</td>
<td>7</td> <td>6</td> <td>5</td> <td>4</td> <td>3</td> <td>2</td>
<td>1</td> <td>0</td>
<td style="font-size:10pt"; align="left" valign="top" rowspan="9"><u><b>
LDT/TSS---------[GDT]</b></u>[02/09]<pre>
TYPE:
0010 LDT
1001 TSS
1011 busy TSS
G: 4K granular limit
</pre></td></tr>
<tr align="center"> <td>E</td> <td colspan="16">reserved</td></tr>
<tr align="center"> <td>C</td> <td colspan="3"> * * * </td><td
colspan="5">0 0 0 0 0</td>
<td colspan="8">reserved</td></tr>
<tr align="center"> <td>A</td> <td colspan="16" rowspan="2"> BASE
32..63</td></tr>
<tr align="center"> <td>8</td> </tr>
<tr align="center"> <td>6</td> <td colspan="8">BASE 24..31</td>
<td colspan="4"><b> G x x * </td> <td colspan="4">LIM 16..19</td> </tr>
<tr align="center"> <td>4</td> <td>P</td> <td colspan="2">DPL</td>
<td colspan="5"
style="border-color:#000000;border-width:medium;border-style:double">
<b>0 T Y P E</td> <td colspan="8">BASE 16..23</td></tr>
<tr align="center"><td>2</td> <td colspan="16">BASE 0..15</td> </tr>
<tr align="center"><td>0</td> <td colspan="16">LIMIT 0..15</td></tr></table>

<p style="position:absolute; left:40pt; top:1500pt;">eop

</body>
</html>
muta...@gmail.com
2021-07-14 10:59:13 UTC
Permalink
Post by wolfgang kern
Post by ***@gmail.com
That's 256 MiB.
And that's a reason to get x'66' working on the 80386 and
8086, so that I can have 1 GiB instead. Although the x'66'
Sorry, I meant 512 MiB because as you said, I need
separate cs and ds. But it's still double, and running
in PM32 instead of PM16 is cool regardless.
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
But try what older EMM did: have only a few variable selectors
and put your address bits extension right into the start field.
you could even define this start-addresses as C-variables..
oh, did I say this yet? :)
I don't understand this proposal, but will it allow 8086
(with ignored x'66') huge memory model programs to
run unchanged on the 80386?
(copy as text, undo linewrap, rename to HTML and watch)
I have no idea what point you are trying to make. If you
write in German, I'll cut and paste into google translate
and hopefully see what you are talking about.

BFN. Paul.
wolfgang kern
2021-07-14 13:28:53 UTC
Permalink
On 14.07.2021 12:59, ***@gmail.com wrote:
...
Post by ***@gmail.com
Post by wolfgang kern
(copy as text, undo linewrap, rename to HTML and watch)
I have no idea what point you are trying to make. If you
write in German, I'll cut and paste into google translate
and hopefully see what you are talking about.
I appended HTML-source (from my teachers help pages) to show you the
layout of available segment-selectors, no German sentence in there.

So you can see that bit shifts aren't a good choice for your task.
__
wolfgang
muta...@gmail.com
2021-07-14 18:18:58 UTC
Permalink
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
(copy as text, undo linewrap, rename to HTML and watch)
I have no idea what point you are trying to make. If you
write in German, I'll cut and paste into google translate
and hopefully see what you are talking about.
I appended HTML-source (from my teachers help pages) to show you the
layout of available segment-selectors, no German sentence in there.
I didn't say there was German in there. I said that I would
be happy for you to make your point in German and I will
translate it.
Post by wolfgang kern
So you can see that bit shifts aren't a good choice for your task.
Again, I have no idea what you are talking about.

At this point I don't even know if you understand my proposal.

However, I have two more pieces of information:

1. I mentioned doing a divide followed by a multiply. Actually
that can be combined into a single divide so long as the
boundary is a multiple of 2 * GDT size. Which may tie you
down to a particular processor implementation. Probably
better to keep it separate.

2. Since you aren't sure whether the x'66' is a no-op on the
8086, can you suggest a bit of assembler that can be
entered into debug.com on MSDOS so that I can ask the
guy with an XT to try it out and provide the result? I don't
want to recompile all of my C programs with x'66' littered
throughout them only to find out that x'66' makes the
8086 explode because it is an alias of HCF.

Thanks. Paul.
wolfgang kern
2021-07-14 20:03:50 UTC
Permalink
On 14.07.2021 20:18, ***@gmail.com wrote:

[about...]
my English is good enough to understand what you try to achieve.
but it seems you haven't got how segment descriptors are built
and how segment selectors interact with the GDT.

you sure can map many 64KB physical consecutive blocks to also
consecutive segment descriptors i.e.:

0008 base 0000_0000 code limit ffff
0010 base 0000_0000 data
0018 base 0001_0000
0020 base 0001_0000

but the selector must be calculated, simple shift won't help here.
so your idea to make it an easy solution fell flat :)
__
wolfgang
muta...@gmail.com
2021-07-14 22:18:40 UTC
Permalink
Post by wolfgang kern
my English is good enough to understand what you try to achieve.
It's not a question of good English skills. Native English
speakers often don't understand what I am proposing
either.
Post by wolfgang kern
but it seems you haven't got how segment descriptors are built
and how segment selectors interact with the GDT.
Note that I have written an 80386 operating system, so
at some point I do get things, at least partially, but then
I forget them again (perhaps some mental deficiency),
which is why you had to make multiple corrections.
Post by wolfgang kern
you sure can map many 64KB physical consecutive blocks to also
0008 base 0000_0000 code limit ffff
0010 base 0000_0000 data
0018 base 0001_0000
0020 base 0001_0000
but the selector must be calculated, simple shift won't help here.
so your idea to make it an easy solution fell flat :)
I have told you at least twice that my calculation will be
a division and a multiplication. Not a shift.

And the calculation doesn't start from a random value,
it starts from an existing segment/selector.

If I currently have a data (ie not cs) seg:off of 3000:0000,
which on the 80386 with effective 16-bit shifts would be
x'30000' / 64k = 3 * x'10' (distance between data selectors)
+ x'10' (starting point of data selectors, not code selectors)
= x'40', and then the program adds 64k to this pointer, and
so the new target is 4000:0000 there will be a non-hardcoded
divide and multiply, in our case because we're doing 16-bit
effective shifts the divisor is 64k, and because we're using
an 80386 not some theoretical 99986 processor (with a
distance of say 5 bytes), the distance between data selectors
is x'10' so the multiplier is x'10' then you have:

x'40' + 1 * x'10' = x'50'

which will correctly get you to the next selector.

On an 8086 the divisor (provided by the OS to the application
at startup time, exactly the same way when running on an
80386) will always be 16 and the multiplier will always
be 1, so to jump 64k starting at 3000:0000 you will have

64k / 16 = 4096 * 1 = 4096 = x'1000' added to x'3000' and
you have x'4000' so you end up at 4000:0000.
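
The divide-then-multiply rule described above can be sketched in C roughly as follows. This is only an illustration of the arithmetic in this post: the struct, the function name, and the two constant tables are hypothetical, and the sketch assumes the byte count is a multiple of the divisor.

```c
/* Hypothetical pair of values the OS would hand to a huge-model program. */
struct seg_step {
    unsigned long divisor;     /* bytes represented by one step */
    unsigned long multiplier;  /* change in the segment value per step */
};

/* Values taken from the worked example in the post (assumed, not an API). */
const struct seg_step STEP_8086  = { 16UL,    1UL    };
const struct seg_step STEP_80386 = { 65536UL, 0x10UL };

/*
 * Advance a segment (8086) or selector (80386) value by `bytes`.
 * Assumes `bytes` is a multiple of step->divisor.
 */
unsigned int seg_advance(unsigned int seg, unsigned long bytes,
                         const struct seg_step *step)
{
    return (unsigned int)(seg + (bytes / step->divisor) * step->multiplier);
}
```

With these values, advancing x'3000' by 64k on the 8086 gives x'4000', and advancing selector x'40' by 64k on the 80386 gives x'50', matching the figures above.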

Which bit won't work?

I'm sure plenty of people won't like the design, but will it
work or not?

If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structured to keep that x'66' from
harming anything important?

If the answer to the above is "ok it will work, but the trapping
will make it 5% slower", that is within my 10% flexibility, so
the next question is:

What 16-bit instructions are available on the 80386 if I code
x'66' or x'67' wherever necessary?

And finally, with the above instructions, can a Turing
machine be constructed on the 8086 and if so, how
much slower would a C program be compared to if
I had access to the full range of 8086 instructions?

Since this is my own C applications we are talking
about, I probably don't care (even if the year is 1988)
if my program runs 3 times slower than the machine
code generated by the same C compiler if it has
access to the full range of 8086 instructions rather
than just the subset that work on 80386. I'm more
interested in producing a future-proofed 16-bit
executable than being "first and best" in 1988.

BFN. Paul.
muta...@gmail.com
2021-07-15 00:36:29 UTC
Permalink
Post by ***@gmail.com
If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structured to keep that x'66' from
harming anything important?
And same question for a real 80286. There was no
32-bit "other" to worry about.

BFN. Paul.
wolfgang kern
2021-07-15 07:02:49 UTC
Permalink
Post by ***@gmail.com
Post by ***@gmail.com
If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structured to keep that x'66' from
harming anything important?
trap seems impossible (would mean all code run single step debug),
and who knows if it acts as a NOP on all 8086 ?
Post by ***@gmail.com
And same question for a real 80286. There was no
32-bit "other" to worry about.
some brands may ignore it while others may have it as an alias,
so you better forget about using 66 in both worlds.

[your segment descriptor calculation...]
you still haven't realized that base bits 0..23 and 24..31 are
NOT adjacent. There are two bytes in between.
You would have known this if you already wrote a 386 OS.
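
For reference, a minimal C sketch of how an 8-byte 80386 descriptor is laid out, following the tables in the HTML posted earlier in this thread. The function name and parameter order are made up for illustration. The access byte and the flags/limit byte sit between base bits 0..23 and base bits 24..31.

```c
#include <stdint.h>

/*
 * Pack a 32-bit base, a 20-bit limit, the access byte (e.g. 0x93 for a
 * writable data segment) and the 4 flag bits (G, B/D, L, AVL) into the
 * 8 bytes of a legacy segment descriptor.
 */
void pack_descriptor(uint8_t d[8], uint32_t base, uint32_t limit,
                     uint8_t access, uint8_t flags)
{
    d[0] = (uint8_t)(limit & 0xFF);          /* LIMIT 0..7   */
    d[1] = (uint8_t)((limit >> 8) & 0xFF);   /* LIMIT 8..15  */
    d[2] = (uint8_t)(base & 0xFF);           /* BASE 0..7    */
    d[3] = (uint8_t)((base >> 8) & 0xFF);    /* BASE 8..15   */
    d[4] = (uint8_t)((base >> 16) & 0xFF);   /* BASE 16..23  */
    d[5] = access;                           /* P, DPL, type */
    d[6] = (uint8_t)(((flags & 0x0F) << 4) | ((limit >> 16) & 0x0F));
    d[7] = (uint8_t)((base >> 24) & 0xFF);   /* BASE 24..31  */
}
```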
__
wolfgang
muta...@gmail.com
2021-07-15 08:45:03 UTC
Permalink
Post by wolfgang kern
Post by ***@gmail.com
If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structured to keep that x'66' from
harming anything important?
trap seems impossible (would mean all code run single step debug),
and who knows if it acts as a NOP on all 8086 ?
There are different types of 8086???
Post by wolfgang kern
[your segment descriptor calculation...]
you still haven't realized that base bits 0..23 and 24..31 are
NOT adjacent. There are two bytes in between.
You would have known this if you already wrote a 386 OS.
Why do I care if the base bits are not adjacent?

BFN. Paul.
muta...@gmail.com
2021-07-15 10:40:30 UTC
Permalink
Post by ***@gmail.com
Post by wolfgang kern
you still haven't realized that base bits 0..23 and 24..31 are
NOT adjacent. There are two bytes in between.
You would have known this if you already wrote a 386 OS.
Why do I care if the base bits are not adjacent?
Just to be clear.

When PDOS/86 is running on an 80386, at startup time, it will
set all 16384 selectors to map the first 512 MiB of memory,
and then never ever change them.

When PDOS/86 is running on an 8086, at startup time, it will
do nothing.

When PDOS/86 is running in those two different environments,
when it loads a 16-bit APPLICATION into memory, it will adjust all
the segments (a normal part of the load process), but the
adjustment will be completely different for these 2 different CPUs.

tiny, small, medium, compact and large memory model programs
do not ever manipulate segment registers, they only ever load
them. (at least the applications supported by PDOS/86).

Only huge memory model programs manipulate the segment
registers. The PDOS/86 rule for huge memory model programs
is this: when a pointer has a value of e.g. 64k added to it,
and the segment needs to be adjusted by whatever the
representation of 64k is, the adjustment is done with a divide
followed by a multiply. Both the value to divide by, and the value to
multiply by, are required to be obtained from the operating
system when the huge memory model application begins
execution. The values from PDOS/86 when running on an
8086 will always be the same. The values from PDOS/86
when running on an 80386 will always be very different
from when PDOS/86 is running on an 8086.

Two separate PDOS/86 systems, both running on an
80386 with 512 MiB memory will have the exact same
values for the divide and multiply.

But if one PDOS/86 is running on an 80386 with 256 MiB
of memory, it will have a different divisor. The smaller
divisor allows finer granularity, for the same reason that
Intel made the 8086 shift 4 bits instead of 16.
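
A rough C sketch of how the divisor might follow from the memory size. It assumes (my reading of the earlier posts, not something stated as a formula) that half of the 16384 selectors are data selectors and that each one covers an equal share of memory. The constant and function names are hypothetical.

```c
/* Assumption: 16384 selectors, half code and half data, so 8192 blocks. */
#define DATA_SELECTORS 8192UL

/* Bytes covered by each data selector for a given amount of mapped memory. */
unsigned long divisor_for(unsigned long memory_bytes)
{
    return memory_bytes / DATA_SELECTORS;
}
```

Under that assumption 512 MiB gives a divisor of 64 KiB and 256 MiB gives 32 KiB.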

The multiplier on the 80386 will always be 0x10, because
that is the distance between two GDT data (same thing
applies to code) entries.

As far as I can tell, based on your mention of base values
not being adjacent in the GDT entry, you do not understand
my proposal.

I'm not saying that's your fault. I'm not very good at
explaining things, and it's only when I have working code
and can say "look at that", that people understand.

Although sometimes I actually show it working and people
still say it won't work.

BFN. Paul.
Joe Monk
2021-07-15 12:39:26 UTC
Permalink
Post by ***@gmail.com
When PDOS/86 is running on an 80386, at startup time, it will
set all 16384 selectors to map the first 512 MiB of memory,
and then never ever change them.
First off, there aren't 16384 selectors on an 80386.

The index field of a selector is only 13 bits wide. 2^13 = 8192.

Bits 0-1 of a selector are the privilege level, and bit 2 specifies which descriptor table (GDT,LDT) you are indexing into.
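
A small C sketch of that field layout, for illustration only (the function names are made up):

```c
/* Bits 0-1: requested privilege level. */
unsigned int selector_rpl(unsigned int sel)
{
    return sel & 0x3U;
}

/* Bit 2: table indicator, 0 = GDT, 1 = LDT. */
unsigned int selector_is_ldt(unsigned int sel)
{
    return (sel >> 2) & 0x1U;
}

/* Bits 3-15: 13-bit index into the chosen descriptor table. */
unsigned int selector_index(unsigned int sel)
{
    return (sel >> 3) & 0x1FFFU;
}

/* Build a selector from its three fields. */
unsigned int make_selector(unsigned int index, unsigned int use_ldt,
                           unsigned int rpl)
{
    return ((index & 0x1FFFU) << 3) | ((use_ldt & 0x1U) << 2) | (rpl & 0x3U);
}
```

So selector 0x0010 is GDT entry 2 at RPL 0, and the highest GDT selector at RPL 0 is 0xFFF8 (index 8191).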

https://pdos.csail.mit.edu/6.828/2005/readings/i386/c05.htm

Joe
Joe Monk
2021-07-15 13:25:48 UTC
Permalink
And finally, just a little light reading on the differences between 8086 real mode and 80386 real mode.

https://pdos.csail.mit.edu/6.828/2005/readings/i386/s14_07.htm

Joe
muta...@gmail.com
2021-07-15 19:53:12 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
When PDOS/86 is running on an 80386, at startup time, it will
set all 16384 selectors to map the first 512 MiB of memory,
and then never ever change them.
First off, there aren't 16384 selectors on an 80386.
The index field of a selector is only 13 bits wide. 2^13 = 8192.
Bits 0-1 of a selector are the privilege level, and bit 2 specifies which descriptor table (GDT,LDT) you are indexing into.
In my scheme, as far as I can tell, I can line up the 8192
LDT selectors after the 8192 GDT selectors for a total
of 16384 selectors.

I'm trying to run 8086 programs.

Selective 8086 programs.

Specifically programs that deliberately or accidentally
followed rules that I'm making up right now. Once I know
what the rules are, I'll make Open Watcom or some other
compiler conform to the rules. And then I'll recompile
all my programs, now that I know how to "program
properly".

Note that when I created AM32 I found that PDPCLIB
had forced AM31 in some places (in mvssupa) requiring
me to fix that, and recompile all my applications.

BFN. Paul.
Joe Monk
2021-07-15 13:21:26 UTC
Permalink
Post by ***@gmail.com
When PDOS/86 is running on an 80386, at startup time, it will
set all 16384 selectors to map the first 512 MiB of memory,
and then never ever change them.
If you're talking about running in real mode, then you should take note:

The 80386 provides a one Mbyte + 64 Kbyte memory space for an 8086 program. Segment relocation is performed as in the 8086: the 16-bit value in a segment selector is shifted left by four bits to form the base address of a segment. The effective address is extended with four high order zeros and added to the base to form a linear address as Figure 14-1 illustrates. (The linear address is equivalent to the physical address, because paging is not used in real-address mode.) Unlike the 8086, the resulting linear address may have up to 21 significant bits. There is a possibility of a carry when the base address is added to the effective address. On the 8086, the carried bit is truncated, whereas on the 80386 the carried bit is stored in bit position 20 of the linear address.
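
The difference described in that paragraph, as a short C sketch. The function names are made up, and the A20 gate found on PC hardware is deliberately not modelled here.

```c
#include <stdint.h>

/* 80386 real mode: the carry out of bit 19 is kept in bit 20. */
uint32_t linear_80386(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* 8086: the carry is truncated, so addresses wrap at 1 MiB. */
uint32_t linear_8086(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFUL;
}
```

For FFFF:0010 the first function gives 0x100000 and the second gives 0.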

https://pdos.csail.mit.edu/6.828/2005/readings/i386/s14_01.htm

Joe
Joe Monk
2021-07-15 21:30:42 UTC
Permalink
No, I'm running 8086 programs in PM32.
Ah, so V86 mode. Well in PM32 then, your max address space is 1MB + 64k...

"The hardware provides a virtual set of registers (via the TSS), a virtual memory space (the first megabyte of the linear address space of the task), and directly executes all instructions that deal with these registers and with this address space."

https://pdos.csail.mit.edu/6.828/2005/readings/i386/c15.htm

Joe
muta...@gmail.com
2021-07-15 22:04:49 UTC
Permalink
Post by Joe Monk
No, I'm running 8086 programs in PM32.
Ah, so V86 mode.
No. Not V86 mode. Normal 32-bit protected mode.

The OS won't be "normal" though. It will be very odd.

BFN. Paul.
Joe Monk
2021-07-15 22:30:55 UTC
Permalink
Post by ***@gmail.com
No. Not V86 mode. Normal 32-bit protected mode.
How do you expect the processor to execute an 8086 instruction then?

"The 80386 supports execution of one or more 8086, 8088, 80186, or 80188 programs in an 80386 protected-mode environment. An 8086 program runs in this environment as part of a V86 (virtual 8086) task."

Joe
Joe Monk
2021-07-16 00:56:54 UTC
Permalink
The same way that z/Arch running in AM64 executes
a LR R3,R4 perfectly fine.
So too the 80386 will execute "mov ax, bx" perfectly fine,
in exactly the same manner as the 8086.
At least I think it will. I'm more familiar with S/3X0
assembler than x86.
On an S/370, an LR R3,R4 is a 32-bit move. Same for S/390, and z/Arch.

On an 8086, a mov ax,bx is a 16-bit instruction. Same on the 80386. In PM32, that register is called EAX. AX only references the lower 16 bits of EAX.

See the problem? They're not the same operations.

Joe
muta...@gmail.com
2021-07-16 01:48:27 UTC
Permalink
Post by Joe Monk
The same way that z/Arch running in AM64 executes
a LR R3,R4 perfectly fine.
So too the 80386 will execute "mov ax, bx" perfectly fine,
in exactly the same manner as the 8086.
At least I think it will. I'm more familiar with S/3X0
assembler than x86.
On an S/370, an LR R3,R4 is a 32-bit move. Same for S/390, and z/Arch.
On an 8086, a mov ax,bx is a 16-bit instruction. Same on the 80386. In PM32, that register is called EAX. AX only references the lower 16 bits of EAX.
See the problem? They're not the same operations.
In the same way that z/Arch supports both 32-bit and 64-bit operations,
the 80386 supports both 16-bit and 32-bit operations.

You can do a mov ax, bx on both the 8086 and the
80386 (even in normal PM32, forget RM16, forget
V8086).

Or am I missing something?

BFN. Paul.
Joe Monk
2021-07-16 02:31:49 UTC
Permalink
Post by ***@gmail.com
In the same way that z/Arch supports both 32-bit and 64-bit operations,
the 80386 supports both 16-bit and 32-bit operations.
On z/arch there are two different opcodes (LR/LGR) for 32/64bit operations.

So no, not in the same way that z/arch supports 32/64 bit operations.

Joe
muta...@gmail.com
2021-07-16 02:48:40 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
In the same way that z/Arch supports both 32-bit and 64-bit operations,
the 80386 supports both 16-bit and 32-bit operations.
On z/arch there are two different opcodes (LR/LGR) for 32/64bit operations.
True, two different mnemonics for two different opcodes.

This is a feature of the assembler. The assembler on the
mainframe recognizes the LGR and generates a different
opcode, even though the registers are identically named.

On the 80386 the assemblers took a different approach,
and they require you to keep the instruction name the
same, and change the names of the registers. The "mov"
generates a different op code depending on whether
the programmer wrote "ax" or "eax".

The end result is identical though, quibbling aside.
Post by Joe Monk
So no, not in the same way that z/arch supports 32/64 bit operations.
Yes, at the op code level it is identical, quibbling aside.

BFN. Paul.
Joe Monk
2021-07-16 03:09:29 UTC
Permalink
Post by ***@gmail.com
This is a feature of the assembler. The assembler on the
mainframe recognizes the LGR and generates a different
opcode, even though the registers are identically named.
LR is OPCODE 18. OPCODE 18 is ALWAYS a 32-bit operation.
LGR is OPCODE B904. OPCODE B904 is ALWAYS a 64-bit operation.

So just looking at the opcode I know what is going on...

By contrast, the MOV instruction is context sensitive. MOV EAX? 32 bit operation. MOV AX? 16-bit operation. MOV AH? 8-bit operation. MOV is always opcode B8. But just looking at the opcode do I know if I am moving 8 bits, 16 bits or 32 bits? No.

That's why IBM object code is so much easier to work with than x86. And that's why the two machines are worlds apart and don't operate in the same way.

Joe
Joe Monk
2021-07-16 03:12:23 UTC
Permalink
Post by Joe Monk
MOV is always opcode B8.
Sorry, hit Enter too fast.

MOV has 14 different opcodes...

https://c9x.me/x86/html/file_module_x86_id_176.html

Joe
muta...@gmail.com
2021-07-16 03:30:29 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
This is a feature of the assembler. The assembler on the
mainframe recognizes the LGR and generates a different
opcode, even though the registers are identically named.
LR is OPCODE 18. OPCODE 18 is ALWAYS a 32-bit operation.
LGR is OPCODE B904. OPCODE B904 is ALWAYS a 64-bit operation.
Sure. And the 8-bit, 16-bit and 32-bit operations done
in mov al/ah/ax/eax also have unique opcodes.
Post by Joe Monk
So just looking at the opcode I know what is going on...
You do in 80386 machine code too. The CPU wouldn't
recognize the instruction correctly if the opcode wasn't
correct.
Post by Joe Monk
By contrast, the MOV instruction is context sensitive.
MOV EAX? 32 bit operation. MOV AX? 16-bit operation.
MOV AH? 8-bit operation. MOV is always opcode B8.
But just looking at the opcode do I know if I am moving
8 bits, 16 bits or 32 bits? No.
This is a feature of the assembler, and can easily be
changed if you don't like it.

You can code mov.b and mov.w and mov.l if you prefer
(some people do prefer).
Post by Joe Monk
That's why IBM object code is so much easier to work with than x86.
Nope. For starters, most people don't look at the object
code. But for those that do, it's a minor issue to look up
what each opcode does.
Post by Joe Monk
And that's why the two machines are worlds apart and don't operate in the same way.
Nope. The machines are near-identical Turing machines
when looked at from the perspective of a C90-compliant
application.

BFN. Paul.
Joe Monk
2021-07-16 03:53:09 UTC
Permalink
Post by ***@gmail.com
Sure. And the 8-bit, 16-bit and 32-bit operations done
in mov al/ah/ax/eax also have unique opcodes.
Actually, they don't. The same opcode, x'89', is used for 16- and 32-bit operations...

Joe
muta...@gmail.com
2021-07-16 03:56:02 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
Sure. And the 8-bit, 16-bit and 32-bit operations done
in mov al/ah/ax/eax also have unique opcodes.
Actually, they don't. The same opcode, x'89', is used for 16- and 32-bit operations...
There is an x'66' in front of the 16-bit op-code. It wouldn't
work otherwise.

That's why we've been talking about x'66'.

BFN. Paul.
Joe Monk
2021-07-16 04:44:40 UTC
Permalink
Post by ***@gmail.com
There is an x'66' in front of the 16-bit op-code. It wouldn't
work otherwise.
That's why we've been talking about x'66'.
Not Necessarily.

"For programs executed in protected mode, the D-bit in executable-segment descriptors determines the default attribute for both address size and operand size. These default attributes apply to the execution of all instructions in the segment. A value of zero in the D-bit sets the default address size and operand size to 16 bits; a value of one, to 32 bits."

https://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_01.htm

So if the default is set correctly to 16-bits, then there wouldn't be an instruction prefix.

Joe
muta...@gmail.com
2021-07-16 09:11:35 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
There is an x'66' in front of the 16-bit op-code. It wouldn't
work otherwise.
That's why we've been talking about x'66'.
Not Necessarily.
"For programs executed in protected mode, the D-bit in
executable-segment descriptors determines the default
attribute for both address size and operand size. These
default attributes apply to the execution of all instructions
in the segment. A value of zero in the D-bit sets the default
address size and operand size to 16 bits; a value of one, to 32 bits."
So if the default is set correctly to 16-bits, then there wouldn't be an instruction prefix.
Whether an instruction prefix is generated or not is dependent
on the assembler, not the descriptors used at runtime.

My assembler will generate the x'66'.

And there is no such thing as "correctly". PDOS/86 when
running on an 80386 in PM32 will set all the D-bits to 0
to keep everything 16-bit. That's the design. It isn't "incorrect".

BFN. Paul.
Joe Monk
2021-07-16 10:39:48 UTC
Permalink
Post by ***@gmail.com
Post by Joe Monk
So if the default is set correctly to 16-bits, then there wouldn't be an instruction prefix.
Whether an instruction prefix is generated or not is dependent
on the assembler, not the descriptors used at runtime.
My assembler will generate the x'66'.
And there is no such thing as "correctly". PDOS/86 when
running on an 80386 in PM32 will set all the D-bits to 0
to keep everything 16-bit. That's the design. It isn't "incorrect".
x'66' is operand size override prefix.

It flips the operand size from 16 to 32 when the D-bit is 0, and from 32 to 16 when the D-bit is 1. Of course, it only works for the one instruction it prefixes.

So is that what you want? All operands to be 32 bit when the D-bit is 0?

Table 17-1. Effective Size Attributes https://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_01.htm

Segment Default D = ...      0   0   0   0   1   1   1   1   <-------
Operand-Size Prefix 66H      N   N   Y   Y   N   N   Y   Y   <-------
Address-Size Prefix 67H      N   Y   N   Y   N   Y   N   Y

Effective Operand Size      16  16  32  32  32  32  16  16   <-------
Effective Address Size      16  32  16  32  32  16  32  16

Y = Yes, this instruction prefix is present
N = No, this instruction prefix is not present
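
The table boils down to one rule: each prefix flips the segment default for the single instruction it precedes. A minimal sketch in C, assuming nothing beyond Table 17-1; the function names are made up for illustration and are not from the Intel manual.

```c
#include <assert.h>

/* Hypothetical helper (name is illustrative only).
 * d_bit:      0 = 16-bit default segment, 1 = 32-bit default segment
 * has_prefix: nonzero if the relevant prefix (66H or 67H) is present
 * Returns the effective size in bits for that one instruction. */
static int effective_size(int d_bit, int has_prefix)
{
    assert(d_bit == 0 || d_bit == 1);
    int is32 = d_bit ^ (has_prefix ? 1 : 0);   /* a prefix flips the default */
    return is32 ? 32 : 16;
}

/* Operand size depends on D and 66H; address size on D and 67H. */
int effective_operand_size(int d_bit, int has_66)
{
    return effective_size(d_bit, has_66);
}

int effective_address_size(int d_bit, int has_67)
{
    return effective_size(d_bit, has_67);
}
```

So with D = 0 and no x'66', effective_operand_size(0, 0) gives 16, and adding the prefix gives 32, which is the opposite of what a 16-bit program wants.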

Joe
muta...@gmail.com
2021-07-16 11:13:28 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
Post by Joe Monk
So if the default is set correctly to 16-bits, then there wouldn't be an instruction prefix.
Whether an instruction prefix is generated or not is dependent
on the assembler, not the descriptors used at runtime.
My assembler will generate the x'66'.
And there is no such thing as "correctly". PDOS/86 when
running on an 80386 in PM32 will set all the D-bits to 0
to keep everything 16-bit. That's the design. It isn't "incorrect".
x'66' is operand size override prefix.
It flips the operand size from 16 to 32 when the D-bit is 0, and from 32 to 16 when the D-bit is 1. Of course, it only works for the one instruction it prefixes.
So is that what you want? All operands to be 32 bit when the D-bit is 0?
Table 17-1. Effective Size Attributes https://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_01.htm
Segment Default D = ... 0 0 0 0 1 1 1 1 <-------
Operand-Size Prefix 66H N N Y Y N N Y Y <-------
Address-Size Prefix 67H N Y N Y N Y N Y
Effective Operand Size 16 16 32 32 32 32 16 16 <-------
Effective Address Size 16 32 16 32 32 16 32 16
Y = Yes, this instruction prefix is present
N = No, this instruction prefix is not present
Apologies.

And brilliant.

I now have a choice.

I can either not get the assembler to generate the x'66' and
then set all the D-bits to 0.

Or I can get the assembler to generate the x'66' and set all
the D-bits to 1.

I can't see a reason to do the latter when the former is
available. I was thinking it would be useful to do the
latter if I had mixed 32-bit and 16-bit code. But if I was
doing that, I would have a different GDT, like I already
have in current PDOS/386 (ie very minimal). And flip
between the two.

Thank you again, sir.

BFN. Paul.
muta...@gmail.com
2021-07-16 14:32:14 UTC
Permalink
Post by ***@gmail.com
available. I was thinking it would be useful to do the
latter if I had mixed 32-bit and 16-bit code. But if I was
doing that, I would have a different GDT, like I already
have in current PDOS/386 (ie very minimal). And flip
between the two.
Since I can only get 512 MiB for 16-bit programs anyway,
I may as well have the mandatory 6 selectors or whatever
to address the first 3.5 GiB of memory for 32-bit programs
and the rest of the selectors used for 16-bit programs in
the other 512 MiB of memory.

Minus whatever is occupied by video ram or whatever,
obviously.

BFN. Paul.
muta...@gmail.com
2021-07-16 15:06:37 UTC
Permalink
Post by ***@gmail.com
Post by ***@gmail.com
available. I was thinking it would be useful to do the
latter if I had mixed 32-bit and 16-bit code. But if I was
doing that, I would have a different GDT, like I already
have in current PDOS/386 (ie very minimal). And flip
between the two.
Since I can only get 512 MiB for 16-bit programs anyway,
I may as well have the mandatory 6 selectors or whatever
to address the first 3.5 GiB of memory for 32-bit programs
and the rest of the selectors used for 16-bit programs in
the other 512 MiB of memory.
I don't suppose there is a way, in addition to the
above, of having 64-bit programs run in the 4 GiB to
32 GiB region?

According to this:

https://en.wikipedia.org/wiki/X86-64#Operating_modes

there is 16-bit and 32-bit available to some degree, even
in long mode.

My requirement is for the selected subset of 8086
programs that follow "the rules" to work in their
allotted 512 MiB region. So that means the programs
load cs and ds constantly. If that is not available in
this "CM16", it's no use to me, and I'll stick to running
16-bit programs in PM32 for now.

BFN. Paul.
Joe Monk
2021-07-16 15:25:30 UTC
Permalink
In long mode (the 64-bit sub-mode, at least), the processor treats the base of cs, ds, es and ss as 0.

Joe
muta...@gmail.com
2021-07-16 19:51:29 UTC
Permalink
Post by ***@gmail.com
Post by ***@gmail.com
Since I can only get 512 MiB for 16-bit programs anyway,
I may as well have the mandatory 6 selectors or whatever
to address the first 3.5 GiB of memory for 32-bit programs
and the rest of the selectors used for 16-bit programs in
the other 512 MiB of memory.
I don't suppose there is a way, in addition to the
above, of having 64-bit programs run in the 4 GiB to
32 GiB region?
How about a new mode for x64, called CM64,
active in PM32, where if the selector (cs/ds)
has a base of 0xffffffff and a length of 1 (or
something similar), then the x64 long mode
instruction set is active?

In fact, is there any reason why the x64 instruction
set can't always be active? If that's the case then
we are left with the only difference being the
default address size being 64 bits. So that's the
only thing that the dummy base above activates.
It would be similar to having a D "bit" of 2.

How many extra transistors would that require?
As a percentage.

Thanks. Paul.
muta...@gmail.com
2021-07-16 19:37:32 UTC
Permalink
Post by Joe Monk
"For programs executed in protected mode, the D-bit in
executable-segment descriptors determines the default
attribute for both address size and operand size. These
default attributes apply to the execution of all instructions
in the segment. A value of zero in the D-bit sets the default
address size and operand size to 16 bits; a value of one, to 32 bits."
And where does that leave this issue from Rod?
Post by Joe Monk
;32-bit code
00000028 89FB mov ebx,edi
0000002A 8B4720 mov eax,[edi+32]
;16-bit code
00000028 89FB mov bx,di
0000002A 8B4720 mov ax,[bx+0x20]
Can I get the 16-bit flavor to work by just setting
the D-bit to 0?

Or is it still different in PM32?

And if the latter, then how important is that (second) instruction?
Can I just ensure that the C compiler doesn't emit that
instruction? Will I still have a Turing machine operating
close to full speed compared to if I was in RM16 instead?
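
For reference, the two readings of the bytes 8B 47 20 differ only in how the ModRM byte 47 is looked up, and that depends on the effective address size. Going by the quoted manual text, a D-bit of 0 with no x'67' gives a 16-bit address size, so the [bx+0x20] reading should apply. A sketch of just that lookup, handling only mod = 01 (8-bit displacement); the function name is made up for illustration.

```c
#include <assert.h>

/* Base register(s) named by a ModRM byte when mod = 01 (disp8 follows),
 * under 16-bit and 32-bit address size. In 32-bit addressing rm = 4
 * means "SIB byte follows", which is not decoded here. */
const char *modrm_disp8_base(unsigned char modrm, int addr_size)
{
    static const char *const base16[8] = {
        "bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"
    };
    static const char *const base32[8] = {
        "eax", "ecx", "edx", "ebx", "(sib)", "ebp", "esi", "edi"
    };
    assert(addr_size == 16 || addr_size == 32);
    assert((modrm >> 6) == 1);              /* only mod = 01 is handled */
    return (addr_size == 16 ? base16 : base32)[modrm & 7];
}
```

Calling modrm_disp8_base(0x47, 16) yields the bx form and modrm_disp8_base(0x47, 32) yields the edi form, matching the two listings above.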

Thanks. Paul.
a***@math.uni.wroc.pl
2021-07-16 02:46:09 UTC
Permalink
Post by ***@gmail.com
Post by Joe Monk
The same way that z/Arch running in AM64 executes
a LR R3,R4 perfectly fine.
So too the 80386 will execute "mov ax, bx" perfectly fine,
in exactly the same manner as the 8086.
At least I think it will. I'm more familiar with S/3X0
assembler than x86.
On an S/370, an LR R3,R4 is a 32-bit move. Same for S/390, and z/Arch.
On an 8086, a mov ax,bx is a 16-bit instruction. Same on the 80386. In PM32, that register is called EAX. AX only references the lower 16 bits of EAX.
See the problem? They're not the same operations.
In the same way that z/Arch supports both 32-bit and 64-bit operations,
the 80386 supports both 16-bit and 32-bit operations.
You can do a mov ax, bx on both the 8086 and the
80386 (even in normal PM32, forget RM16, forget
V8086).
Or am I missing something?
It is not "the same way". I did not check all details of modes
on z/Arch, but the x86 way is strange: 16-bit operations preserve
upper bits, 32-bit operations extend by zero. The operand-size prefix
reverses the mode. IIUC RISC machines have separate 32-bit and
64-bit instructions; they sign-extend instead of zero-extending
or preserving high bits.
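
A small model of the register-write behaviour described here. I am assuming the zero-extension remark refers to 32-bit writes in x86-64 64-bit mode; on an 80386 a 32-bit write simply fills the whole register. The function names are illustrative only.

```c
#include <stdint.h>

/* Model of one general-purpose register as seen in 64-bit mode. */

/* 16-bit write (e.g. mov ax, bx): the upper 48 bits are preserved. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* 32-bit write (e.g. mov eax, ebx) in 64-bit mode:
 * the upper 32 bits are zeroed, the old contents do not survive. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;
    return (uint64_t)value;
}
```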
--
Waldek Hebisch
a***@math.uni.wroc.pl
2021-07-15 15:11:08 UTC
Permalink
Post by ***@gmail.com
Post by wolfgang kern
you sure can map many 64KB physical consecutive blocks to also
0008 base 0000_0000 code limit ffff
0010 base 0000_0000 data
0018 base 0001_0000
0020 base 0001_0000
but the selector must be calculated, simple shift wont help here.
so your idea to make it an easy solution fell flat :)
I have told you at least twice that my calculation will be
a division and a multiplication. Not a shift.
And the calculation doesn't start from a random value,
it starts from an existing segment/selector.
If I currently have a data (ie not cs) seg:off of 3000:0000,
which on the 80386 with effective 16-bit shifts would be
x'3000' / 64k = 3 * x'10' (distance between data selectors)
+ x'10' (starting point of data selectors, not code selectors)
= x'40', and then the program adds 64k to this pointer, and
so the new target is 4000:0000 there will be a non-hardcoded
divide and multiply, in our case because we're doing 16-bit
effective shifts the divisor is 64k, and because we're using
an 80386 not some theoretical 99986 processor (with a
distance of say 5 bytes), the distance between data selectors
x'40' + 1 * x'10' = x'50'
which will correctly get you to the next selector.
On an 8086 the divisor (provided by the OS to the application
at startup time, exactly the same way when running on an
80386) will always be 16 and the multiplier will always
be 1, so to jump 64k starting at 3000:0000 you will have
64k / 16 = 4096 * 1 = 4096 = x'1000' added to x'3000' and
you have x'4000' so you end up at 4000:0000.
Which bit won't work?
I'm sure plenty of people won't like the design, but will it
work or not?
Depends very much what "works" means. Running unmodified
86 binaries: AFAICS no. You are creating a virtual machine
which with a little care (and assuming divisor 16) will run the
same binary on 86 and 386.
Post by ***@gmail.com
If it works then the next question is - does the 8086 ignore
x'66' and advance to the next byte? Or can x'66' be trapped
and ignored? Or if it is an alias, what effect does it have,
and can memory be structures to keep that x'66' from
harming anything important?
There were several 86 processors: original 86, 88 (in original PC),
"compatible" processors by AMD and Harris, NEC variant (which
had 8080 emulation mode so certainly used different mask). Some machines
used 186 (which at lowest level was incompatible, but some
manufacturers pushed "PC-s" based on them), there is 286. There
were various steppings and speed grades. Steppings fixed bugs
and were supposed to make no change to documented instructions,
but it is hard to tell what could happen to undocumented stuff.
Post by ***@gmail.com
If the answer to the above is "ok it will work, but the trapping
will make it 5% slower", that is within my 10% flexibility, so
Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions. You also have problem of finding
place to keep your divisor. If you keep divisor in register,
that alone will blow up your 10% budget (losing a register on
86 is closer to 20% drop in performance) and introduce serious
incompatibility with traditional 86 code. If you keep divisor in
memory there will be extra segment manipulations to fetch
divisor.
Post by ***@gmail.com
What 16-bit instructions are available on the 80386 if I code
x'66' or x'67' wherever necessary?
And finally, with the above instructions, can a Turing
machine be constructed on the 8086 and if so, how
much slower would a C program be compared to if
I had access to the full range of 8086 instructions?
Since this is my own C applications we are talking
about, I probably don't care (even if the year is 1988)
if my program runs 3 times slower than the machine
code generated by the same C compiler if it has
access to the full range of 8086 instructions rather
than just the subset that work on 80386. I'm more
interested in producing a future-proofed 16-bit
executable than being "first and best" in 1988.
Well, if you have C source, then there is obvious way
to get good speed: recompile for newer machine.
Note that fancy "binary translation" schemes claim
speed comparable to native speed. Even less fancy
schemes should be not much worse than 3 times slower
compared to native. Your scheme for huge model
programs is likely to lead to much more significant
slowdown.

BTW: Forth threaded code scheme leads to speed loss
2-4 times compared to machine code, is quite easy to
implement and "interpreter" is tiny.

OTOH politicians and religious leaders understand quite
well that ideological purity is more important than real
world performance...
--
Waldek Hebisch
Scott Lurndal
2021-07-15 15:47:17 UTC
Permalink
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
If the answer to the above is "ok it will work, but the trapping
will make it 5% slower", that is within my 10% flexibility, so
Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions.
Perhaps in 1985.

2008:

IMUL has a latency of 10 cycles (i.e. it completes in 10 cycles).
It has a throughput of 1 cycle (i.e. there must be one cycle between
successive imuls).

IDIV has circa 70 cycle latency, with 30 cycle throughput.

FMUL has 5 cycle latency and 2 cycle throughput.

(These numbers were as of December 2008, I'm sure they've gotten
better since - downloads the latest copy of 64-ia-32-architectures-optimization-manual.pdf):

2021:

IMUL is down to 3 cycle latency with 1 cycle throughput.
IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
and 90 cycles for 64-bit division.
MUL r64 has 4-5 cycle latency and 1 cycle throughput.


8. The throughput of "DIV/IDIV r64" varies with the number of significant digits in the input RDX:RAX.
The throughput is significantly higher if RDX input is 0, similar to those of "DIV/IDIV r32". If RDX is
not zero, the throughput is significantly lower, as shown in the range. The throughput decreases
(increasing numerical value in cycles) with increasing number of significant bits in the input RDX:RAX
(relative to the number of significant bits of the divisor) or the output quotient. The latency of
"DIV/IDIV r64" also varies with the significant bits of input values. For a given set of input values, the
latency is about the same as the throughput in cycles.

9. The throughput of "DIV/IDIV r32" varies with the number of significant digits in the input EDX:EAX
and/or of the quotient of the division for a given size of significant bits in the divisor r32. The
throughput decreases (increasing numerical value in cycles) with increasing number of significant
bits in the input EDX:EAX or the output quotient. The latency of "DIV/IDIV r32" also varies with the
significant bits of the input values. For a given set of input values, the latency is about the same as
the throughput in cycles.

Post by a***@math.uni.wroc.pl
You also have problem of finding
place to keep your divisor. If you keep divisor in register,
Good thing they added 8 more registers to x86_64 :-).

Not to mention ARMv8 with 32 registers.
a***@math.uni.wroc.pl
2021-07-15 16:04:22 UTC
Permalink
Post by Scott Lurndal
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
If the answer to the above is "ok it will work, but the trapping
will make it 5% slower", that is within my 10% flexibility, so
Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions.
Perhaps in 1985.
Yes. Paul wants to run on real 86 (or rather 88) from 1982.
Post by Scott Lurndal
IMUL has a latency of 10 cycles (i.e. it completes in 10 cycles).
It has a throughput of 1 cycle (i.e. there must be one cycle between
successive imuls).
IDIV has circa 70 cycle latency, with 30 cycle throughput.
Note that this is not so great: peak is 4 instructions per cycle,
average between 1 and 2 per cycle, so this throughput is comparable
to tens of normal instructions. It seems that division goes
in parallel with other instructions, so as long as the number of
divisions is moderate, overall impact is limited. Figures that
I saw were better than what you give above. But Paul wants to put
multiply and divide in the data fetch path. It is likely that his
performance will be limited by latency, so quite bad...
Post by Scott Lurndal
IMUL is down to 3 cycle latency with 1 cycle throughput.
IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
and 90 cycles for 64-bit division.
MUL r64 has 4-5 cycle latency and 1 cycle throughput.
Yes, better than older figures but still not so great (especially
concerning division).
Post by Scott Lurndal
Post by a***@math.uni.wroc.pl
You also have problem of finding
place to keep your divisor. If you keep divisor in register,
Good thing they added 8 more registers to x86_64 :-).
Not to mention ARMv8 with 32 registers.
But Paul insists on code that runs on original 86, so does not
want to use extra registers.
--
Waldek Hebisch
Scott Lurndal
2021-07-15 17:10:39 UTC
Permalink
Post by a***@math.uni.wroc.pl
Post by Scott Lurndal
Post by a***@math.uni.wroc.pl
Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions.
Perhaps in 1985.
Yes. Paul wants to run on real 86 (or rather 88) from 1982.
Post by Scott Lurndal
IMUL is down to 3 cycle latency with 1 cycle throughput.
IDIV latencies aren't listed, but throughput for 32-bit integer division is about 20 cycles
and 90 cycles for 64-bit division.
MUL r64 has 4-5 cycle latency and 1 cycle throughput.
Yes, better than older figures but still not so great (especially
concerning division).
Although if the operations he's contemplating are
by powers of two, then bit shifts are single-cycle high-throughput
operations (ARMv8 actually includes bit shifts in some operands automatically).
muta...@gmail.com
2021-07-15 20:14:25 UTC
Permalink
Post by Scott Lurndal
Although if the operations he's contemplating are
by powers of two, then bit shifts are single-cycle high-throughput
operations (ARMv8 actually includes bit shifts in some operands automatically).
They are in fact always going to be powers of two, and
I could in fact get a right-shift and a left-shift value from
the OS instead of divide and multiply values.
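
A sketch of what stepping a segment value with OS-supplied shifts could look like. The parameter names rshift and lshift, and the function name, are made up; the two example settings come from the numbers used earlier in this thread (divisor 16 and multiplier 1 on the 8086, divisor 64k and data selectors x'10' apart on the 80386).

```c
#include <stdint.h>

/* Hypothetical values an OS could hand to a program at startup:
 *   rshift: log2 of the number of bytes covered by one segment step
 *   lshift: log2 of the numeric distance between consecutive segment values
 * 8086 real mode:          rshift = 4,  lshift = 0
 * 80386, 64 KiB selectors: rshift = 16, lshift = 4
 * Returns the segment value that lies `bytes` further on. */
uint16_t advance_segment(uint16_t seg, uint32_t bytes,
                         unsigned rshift, unsigned lshift)
{
    return (uint16_t)(seg + ((bytes >> rshift) << lshift));
}
```

With these settings, advancing 64k from x'3000' on the 8086 gives x'4000', and advancing 64k from selector x'40' on the 80386 gives x'50', the same results as the divide-and-multiply version.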

That's a great idea. It does rely on the distance between
selectors being a power of 2 though. Is it wise to mandate
such a processor?

Also, it is sounding like this will work in PM32. Will it also
work in long mode? I'm not familiar with their selectors
or anything else.

Thanks. Paul.
Scott Lurndal
2021-07-15 20:27:03 UTC
Permalink
Post by ***@gmail.com
Post by Scott Lurndal
Although if the operations he's contemplating are
by powers of two, then bit shifts are single-cycle high-throughput
operations (ARMv8 actually includes bit shifts in some operands automatically).
They are in fact always going to be powers of two, and
I could in fact get a right-shift and a left-shift value from
the OS instead of divide and multiply values.
That's a great idea. It does rely on the distance between
selectors being a power of 2 though. Is it wise to mandate
such a processor?
Also, it is sounding like this will work in PM32. Will it also
work in long mode? I'm not familiar with their selectors
or anything else.
Long mode doesn't use selectors. There is one flat physical address
space (2^40 to 2^64 bytes in size depending on processor implementation).
Once paging is enabled, the virtual address is a full 64-bits.

Segment registers (e.g. %fs, %gs) can be loaded with an address that will
be summed with the computed operand address when the segment override is
used. There are special instructions to load and swap %fs values. Linux
uses a segment register to point to per-core operating system data.

This is all documented in the Intel reference manuals, which you
should be intimately familiar with if you desire to write an operating
system for the intel processor family.
muta...@gmail.com
2021-07-15 20:53:27 UTC
Permalink
Post by ***@gmail.com
Also, it is sounding like this will work in PM32. Will it also
work in long mode? I'm not familiar with their selectors
or anything else.
Long mode doesn't use selectors. There is one flat physical address
space (2^40 to 2^64 bytes in size depending on processor implementation).
Ok, thanks. In that case, my huge memory model programs,
on a machine with 4 GiB of memory, should be able to do
a divide of 64k followed by a multiply of 1, right?

Do I lose the ability to use some 16-bit instructions,
even compared to PM32?

Scratch that. I just realized - my 8086 programs need
to manipulate segment registers. PM32 with 512 MiB
addressability is as high as I can go unless some other
processor provides more selectors. Or instead of
selectors, flexible shifts instead of a hardcoded 4.
Well, a hardcoded 16 would be fine these days too,
if such a processor (or mode) is made.
Once paging is enabled, the virtual address is a full 64-bits.
I'm not interested in paging except if it fixes a problem
with what I wanted to do un-paged, e.g. fill holes caused
by the A20 line not being enabled, or blocking NULL pointer
assignment, or protecting low memory so the OS can
protect itself. But I don't want to consider that for the
moment. I'm still trying to write 8086 programs "properly".
Segment registers (e.g. %fs, %gs) can be loaded with an address that will
be summed with the computed operand address when the segment override is
used. There are special instructions to load and swap %fs values. Linux
uses a segment register to point to per-core operating system data.
Ok, I don't need that at the moment. I'm just trying to
get the equivalent of MSDOS to work. Single core.
This is all documented in the Intel reference manuals, which you
should be intimately familiar with if you desire to write an operating
system for the intel processor family.
If I had done that, I would have been stuck in a rut.

It took more than 10 years I think for "separate memory"
to be invented (by somitcw) to solve the problem of
multiple ATL address spaces, despite the solution being
very obvious and conceptually simple once verbalized.
Despite plenty of good brains being thrown at the
problem. Absolutely everyone, including him, was stuck
in a rut. Even then he didn't create it in a vacuum, but as
part of yet another discussion on how to solve the problem.

I just need a little bit of specific knowledge to help me on
my way for the ultimate 8086+ processor. If that matches
what Intel/AMD came up with, cool.

BFN. Paul.
muta...@gmail.com
2021-07-15 20:08:26 UTC
Permalink
Depends very much what "works" means. Running unmodified
86 binaries: AFAICS no.
Do you mean running ALL unmodified 8086 binaries?

If so, you are correct.

But all C-generated code, other than huge memory
model, doesn't manipulate the segment registers
as far as hardcoding a 4-bit shift.

So. It depends. Some 8086 binaries will work, some
won't.

I can ensure that all of my C90-compliant programs
work, since even in huge memory model I have control
of the C library with PDPCLIB.
You are creating a virtual machine
Why are you calling it a virtual machine??? By what
definition? Because it doesn't use the full capability
of the 80386?
which with little care (and assuming divisor 16) will run the
same binary on 86 and 386.
Why am I assuming a divisor of 16?
Hmm. On 86 multiply and divide take time comparable to tens
of simpler instructions. You also have problem of finding
place to keep your divisor. If you keep divisor in register,
that alone will blow up your 10% budget (losing a register on
86 is closer to 20% drop in performance) and introduce serious
incompatibility with traditional 86 code. If you keep divisor in
memory there will be extra segment manipulations to fetch
divisor.
I'm not worried at all about performance of huge memory
model programs, which are the only ones that need to do
the divide and multiply.

That's the price you pay when you exceed 64k in a single
data structure on a machine that was only designed to
run 16-bit programs. There comes a point when you're
supposed to be recompiling as 32-bit. If you insist on
maintaining 8086 compatibility, then you will take a
performance hit.

All other memory models will run full speed, other than
any x'66' that are being skipped.
Well, if you have C source, then there is obvious way
to get good speed: recompile for newer machine.
Sure. But I'm trying to construct the best 16-bit
machine at the moment, using existing tools.

Ideally I can have EFFECTIVE 16-bit segment shifts
and address an entire 4 GiB, but they didn't provide
enough selectors for that, and I can only get 512
MiB, so the equivalent of an 8086 with a 13-bit
segment shift instead of a 4-bit segment shift.
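
The 512 MiB figure can be checked with a little arithmetic. A sketch, assuming a single descriptor table, a 13-bit selector index (8192 entries) and 16-bit offsets (64 KiB per selector). It ignores the null descriptor and the split between code and data selectors, so it is an upper bound, and the names are made up.

```c
#include <stdint.h>

/* A selector carries a 13-bit index, so one descriptor table holds at
 * most 8192 entries; with 16-bit offsets each entry covers 64 KiB. */
enum { SELECTOR_INDEX_BITS = 13, OFFSET_BITS = 16 };

uint32_t max_16bit_reach(void)
{
    uint32_t descriptors = (uint32_t)1 << SELECTOR_INDEX_BITS;  /* 8192  */
    uint32_t bytes_each  = (uint32_t)1 << OFFSET_BITS;          /* 65536 */
    return descriptors * bytes_each;                            /* 2^29  */
}
```

2^29 bytes is 512 MiB, the same span a 16-bit segment register would cover with a 13-bit shift (2^16 segment values times 2^13 bytes).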
Note that fancy "binary translation" schemes claim
speed comparable to native speed. Even less fancy
schemes should be not much worse than 3 times slower
compared to native. Your scheme for huge model
programs is likely to lead to much more significant
slowdown.
Huge memory model is rare - I'm not worried about that.
Even if it is 3 times slower (the whole application). It
would be good if other memory models ran at full speed
though, but I don't know how many of the instructions
disappear in PM32.

BFN. Paul.
muta...@gmail.com
2021-07-15 22:55:40 UTC
Permalink
Post by ***@gmail.com
But all C-generated code, other than huge memory
model, doesn't manipulate the segment registers
as far as hardcoding a 4-bit shift.
Well, it depends on details which you did not provide.
If you reserve a register, then all programs, regardless
of mode will be affected, basically in such case you
need recompile with compiler respecting your convention.
No registers need to be reserved. The only thing that
an 8086 program needs to do is not assume that the
shift is 4 bits, and instead make an OS query to find
out a right-shift value (or divide) and a left-shift value
(or multiply) if they wish to *modify* a segment
register with a value that was not set at load time.

And the only reason they would need to do this is
if they are exceeding 64k of memory with a single
pointer and thus need to normalize the pointer.
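
A sketch of what that normalization might look like in a huge-model runtime. The struct, the function name and the rshift/lshift parameters are all hypothetical, standing in for whatever the OS query would return; rshift is assumed to be at most 16 so the remaining offset fits in 16 bits.

```c
#include <stdint.h>

struct huge_ptr {
    uint16_t seg;   /* segment (8086) or selector (80386) */
    uint16_t off;   /* 16-bit offset */
};

/* Add a byte count to seg:off and renormalize, carrying the overflow
 * into the segment part using the OS-supplied shift values.
 *   8086:  rshift = 4,  lshift = 0
 *   80386: rshift = 16, lshift = 4 (data selectors x'10' apart) */
struct huge_ptr huge_add(struct huge_ptr p, uint32_t delta,
                         unsigned rshift, unsigned lshift)
{
    uint64_t total = (uint64_t)p.off + delta;        /* cannot overflow */
    uint64_t mask  = ((uint64_t)1 << rshift) - 1;    /* bits kept in off */

    p.seg = (uint16_t)(p.seg + ((total >> rshift) << lshift));
    p.off = (uint16_t)(total & mask);
    return p;
}
```

The program never hardcodes the 4; it only ever applies the two shifts it was given, so the same binary logic holds on both processors.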

Well, there are other reasons (like setting the
segment to x'b800'), but that is non-C90
code. At the moment I am trying to support C90
code.
Above you claim that only huge mode will be
affected. First, AFAICS most existing binaries are
mixed mode, that is use mixture of near and far pointers.
Far pointers are not a problem. Only huge pointers
are a problem.
Post by ***@gmail.com
I can ensure that all of my C90-compliant programs
work, since even in huge memory model I have control
of the C library with PDPCLIB.
Well, you need cooperation of compiler and linker too.
The compilers are all generating correct code already,
other than the failure to emit x'66' to prepare themselves
for the 80386.

The linker does indeed need to ensure that no individual
function is split over a 64k boundary.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
You are creating a virtual machine
Why are you calling it a virtual machine??? By what
definition? Because it doesn't use the full capability
of the 80386?
Because valid binaries will be broken.
That's a strange definition of a VM. Windows 10 has
broken all of my 8086 binaries. It's not a VM.
You need to recompile specifically for your VM.
I compile for an 8086. The 8086 is not a VM.
My executables are pure 8086 code unless you
count the x'66' no-op. Huge memory model will
use an INT 21H that wasn't provided by MSDOS,
to get a right-shift and left-shift, but that doesn't
create a VM either.

And if you insist on counting the x'66' as "aha,
VM", maybe I can simply make the C compiler not
generate the instructions that need an x'66' to
work in PM32. At this stage I don't know what
happens if those instructions are eliminated. Will
I still have a Turing machine?
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
which with little care (and assuming divisor 16) will run the
same binary on 86 and 386.
Why am I assuming a divisor of 16?
Let us see. What will be machine code for skeleton program
int f1(void) {
    return 1;
}
/* Other, possibly long code */
int f2(void) {
    return 2;
}
/* More other code */
int f3(void) {
    return (f1() + f2());
}
Of course, most interesting is code for 'f3', in particular how
you represent addresses of functions and what sequence you use
for calls. "possibly long code" means that you should be
prepared to handle programs with more than 1 segment of code.
You're talking about medium and large memory model programs.

Those have far code pointers.

When f3() calls f1(), which may be in a different segment, it
does it by doing a far call. The segment that is used in that
call is indeed set by the linker, and indeed, is based on
segments being 4-bit shifts. But that segment gets altered
by the OS when it is loaded (relocated). The algorithm on
MSDOS is very simple - just add the load point's segment.
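
As a rough illustration of that "add the load point's segment" step, here is a
minimal host-side sketch in C. It is not taken from MSDOS or PDOS source; the
names reloc_entry and apply_relocs are hypothetical, and the image is assumed to
be the load module held as a flat byte array with the usual 4-bit shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One relocation table entry: the location (offset, then segment,
   relative to the start of the load module) of a 16-bit segment word. */
struct reloc_entry {
    uint16_t offset;
    uint16_t segment;
};

/* Add load_seg to every segment word named by the relocation table. */
void apply_relocs(uint8_t *image, size_t image_len,
                  const struct reloc_entry *tab, size_t count,
                  uint16_t load_seg)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* Position of the segment word inside the image (4-bit shift). */
        size_t pos = ((size_t)tab[i].segment << 4) + tab[i].offset;
        uint16_t seg;

        assert(pos + 1 < image_len);
        /* The word is stored little-endian. */
        seg = (uint16_t)(image[pos] | (image[pos + 1] << 8));
        seg = (uint16_t)(seg + load_seg);
        image[pos] = (uint8_t)(seg & 0xFF);
        image[pos + 1] = (uint8_t)(seg >> 8);
    }
}
```

Note that only the segment word is touched here; the offset half of the far
pointer is left exactly as the linker wrote it.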

My proposal is that when the binary is loaded by PDOS/86
running in PM32, that the segments will be radically
different. They will in fact all be selectors. So if you have
an 8086 binary that has say f1 at 3000:0000 and f2 at
3005:0000 then in PM32 with 512 MiB (or more, but
not accessible) memory so we use 16-bit shifts, the
segment/selector of both f1 and f2 will be adjusted to
the same, because they both fit within a common
64k base (x'30000'). f2 will then need the *offset*
adjusted to x'0050'. On the 8086 the offset is never
adjusted, but on 80386 I will need the offset adjusted.
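
The f1/f2 arithmetic above can be sketched as follows. This is only an
illustration under stated assumptions: far_addr and remap_to_64k are
hypothetical names, the load address is ignored, and the result is a 64k page
number rather than an actual selector value (the mapping from page number to
selector is not shown).

```c
#include <assert.h>
#include <stdint.h>

struct far_addr {
    uint16_t seg;
    uint16_t off;
};

/* Re-express an 8086 seg:off (4-bit shift) as page:off, where each
   page covers 64k (the effective 16-bit shift). */
struct far_addr remap_to_64k(struct far_addr a)
{
    uint32_t linear = ((uint32_t)a.seg << 4) + a.off;
    struct far_addr r;

    assert(linear <= 0x10FFEFUL);  /* highest address an 8086 pair can form */
    r.seg = (uint16_t)(linear >> 16);      /* 64k page number */
    r.off = (uint16_t)(linear & 0xFFFFU);  /* offset within that page */
    return r;
}
```

With f1 at 3000:0000 and f2 at 3005:0000, both end up in page 3 (base
x'30000'), and f2 gets an offset of x'0050'.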

And yes, this may mean another linker change/format
to support offsets being adjusted. I'm not sure if there
is sufficient information already in current MSDOS
executables to guess where the offset is.
Well, it is for you to justify that your approach will work
in other modes. Note that to run program as PM16 normally
you will at least relink it.
I accept that a relink may be required.
Post by ***@gmail.com
That's the price you pay when you exceed 64k in a single
data structure on a machine that was only designed to
run 16-bit programs. There comes a point when you're
supposed to be recompiling as 32-bit. If you insist on
maintaining 8086 compatibility, then you will take a
performance hit.
Sure, I would recompile instead of inventing strange schemes.
Sure. If you're happy to move from 16-bit programming
to 32-bit programming, then none of this is relevant.

But what would you have done in 1985?

Also, this is a generic problem. Coding 32-bit programs
accessing more than 4 GiB of memory has the same
problem.

And from the application's point of view, there is nothing
strange about this. It's absolutely stock-standard 8086
code unless you count the dead x'66' and the fact that
the 8086 OS vendor insists that you stop hardcoding
the number "4" in your huge pointers and instead call
the OS at startup time to get the 2 values you need.

And the linker format that includes offset adjustments
when those numbers seemingly never get modified,
which in fact, is true for PDOS/86 running on an 8086.
Post by ***@gmail.com
All other memory models will run full speed, other than
any x'66' that are being skipped.
ATM you say this. But you did not say how, in particular
if you use "segment shift" different than 4.
I have laid out my proposal. If there is a specific thing
that I have missed, I can address that if you tell me
what it is.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
Well, if you have C source, then there is obvious way
to get good speed: recompile for newer machine.
Sure. But I'm trying to construct the best 16-bit
machine at the moment, using existing tools.
You are setting strange constraints and getting strange
results.
The 8086 set the constraints. I'm just working within
those constraints, with inside knowledge of what the
80386 is going to give us "in a few years from now
(e.g. 1982)".

The results are not strange. My 16-bit programs have
the ability to work with more than 64k of memory.
On the 8086 that is 1 MiB. On the 80386 that is
512 MiB. What's not to love?
Once you routinely have more than 64k program you need
32-bit pointers and it makes sense to switch to 32-bits.
It's only when you need more than 64k of data in
a single buffer that you need to switch to 32 bit
offsets.
Post by ***@gmail.com
Ideally I can have EFFECTIVE 16-bit segment shifts
and address an entire 4 GiB, but they didn't provide
enough selectors for that, and I can only get 512
MiB, so the equivalent of an 8086 with a 13-bit
segment shift instead of a 4-bit segment shift.
That is your choice. You want single LDT.
Yes, I want a simple machine. After the "simple"
technology is proven I am happy for people to
come up with something better that uses the
full 4 GiB for 16-bit programming.
Note that
each program can have its own LDT and with per-program
LDT you can address much more memory. OTOH 286
segmentation makes sense if you have a lot of programs,
each of them relatively small. By insisting on single
address space you throw out main intended use cases
for 286.
My goal is to get something that looks like MSDOS
working on the 80386, and then maybe the 80286
and x64, and in fact, unrelated processors.

I expect to die before I accomplish all processors in
the world.

Once MSDOS is everywhere, the next step will be to
support the "main intended use cases". I'll leave that
technical challenge to Soolin.
Even if huge model is rare on original 86 (I think that
huge pointers are used in many programs and saying "rare"
underestimates their use), it would be dominant mode on
Are you sure you mean "huge pointers" rather than
"far pointers"?
machines having more memory. Why have more memory if you
can not have large array?
That's an odd question. You can use more memory,
much more than 64k, but you can't have/expect a single
array more than 64k when you're doing 16-bit programming.

It so happens that there is an expensive way of achieving
that on the 8086, but it's not common. Many programs
do lots of small mallocs rather than one huge malloc.
Quite apart from the fact that no-one at all (*), not even in
huge memory model, gives you a working malloc() that
does that. You have to use non-standard farmalloc() to
achieve that, and then you are doing non-C90 which I
don't support.

(*) Ok, so Smaller C does it, but requires an 80386. And
I intend to do it in PDPCLIB too, combined with Watcom
C generating 8086 instructions in huge memory model,
but haven't implemented it yet, so can't prove definitively
that it works either.

BFN. Paul.
a***@math.uni.wroc.pl
2021-07-16 02:36:19 UTC
Permalink
Post by ***@gmail.com
Post by ***@gmail.com
But all C-generated code, other than huge memory
model, doesn't manipulate the segment registers
as far as hardcoding a 4-bit shift.
Well, it depends on details which you did not provide.
If you reserve a register, then all programs, regardless
of mode will be affected, basically in such case you
need recompile with compiler respecting your convention.
No registers need to be reserved. The only thing that
an 8086 program needs to do is not assume that the
shift is 4 bits, and instead make an OS query to find
out a right-shift value (or divide) and a left-shift value
(or multiply) if they wish to *modify* a segment
register with a value that was not set at load time.
OS call to get segment shift? That is horribly inefficient
if you do this for every memory access. And if not, then the
question of where you store the segment shift is back.

AFAICS it would be cheaper to offer Basic-like PEEK and POKE
system calls...
Post by ***@gmail.com
And the only reason they would need to do this is
if they are exceeding 64k of memory with a single
pointer and thus need to normalize the pointer.
Well, there are other reasons (like setting the
segment to x'b800'), but that is non-C90
code. At the moment I am trying to support C90
code.
Above you claim that only huge mode will be
affected. First, AFAICS most existing binaries are
mixed mode, that is use mixture of near and far pointers.
Far pointers are not a problem. Only huge pointers
are a problem.
Post by ***@gmail.com
I can ensure that all of my C90-compliant programs
work, since even in huge memory model I have control
of the C library with PDPCLIB.
Well, you need cooperation of compiler and linker too.
The compilers are all generating correct code already,
other than the failure to emit x'66' to prepare themselves
for the 80386.
The linker does indeed need to ensure that no individual
function is split over a 64k boundary.
I am not sure if compilers supported this, but a single function
bigger than 64k is valid once you allow more than 64k of code.
The compiler can use any mixture of near and far jumps inside.
Post by ***@gmail.com
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
You are creating a virtual machine
Why are you calling it a virtual machine??? By what
definition? Because it doesn't use the full capability
of the 80386?
Because valid binaries will be broken.
That's a strange definition of a VM. Windows 10 has
broken all of my 8086 binaries. It's not a VM.
It is _one_ of the conditions that distinguish a VM. Note that
in 64-bit mode real-mode programs are no longer supported
(you can run them using a VM). Windows 10 probably broke
them by not providing a VM. More to the point: Windows
(or Linux) provides a VM in the sense that some operations
cause the OS to do something special. But as long as you
stay with normal (unprivileged) instructions, behaviour
is as defined by the hardware. You effectively create a new
instruction set, different from the hardware instruction set.
Some instruction sequences which are valid on real
hardware will fail; OTOH some instruction sequences
will get a new meaning. Those two properties mean you
have a new instruction set, which is the distinguishing feature
of a VM. Your VM has a lot of similarity to the 8086 and
your implementation will be simpler than many other VMs.
But it is still a VM.

BTW: In spirit your idea has some similarity to the
NaCl (Native Client) project at Google. Google's idea was to forbid
some 386 instruction sequences. When the forbidden
instruction sequences were absent then (according
to the Google folks) there was no way to execute something
which was not code or, say, overwrite a return address on
the stack. The idea was that a Web browser could check
that the rules are followed and then safely execute
downloaded code in the same address space as the browser,
without risk of malicious behaviour from the downloaded
code. This approach was considered to be a VM, built
using 386 instructions, but one had to compile
programs in a special way (otherwise the rules would be
broken) and one got new properties (a safety guarantee).
Post by ***@gmail.com
You need to recompile specifically for your VM.
I compile for an 8086. The 8086 is not a VM.
My executables are pure 8086 code unless you
count the x'66' no-op. Huge memory model will
use an INT 21H that wasn't provided by MSDOS,
to get a right-shift and left-shift, but that doesn't
create a VM either.
And if you insist on counting the x'66' as "aha,
VM", maybe I can simply make the C compiler not
generate the instructions that need an x'66' to
work in PM32. At this stage I don't know what
happens if those instructions are eliminated. Will
I still have a Turing machine?
In PM16 you do not need prefixes, so this is the smallest
issue. Others are more significant.
Post by ***@gmail.com
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
which with little care (and assuming divisor 16) will run the
same binary on 86 and 386.
Why am I assuming a divisor of 16?
Let us see. What will be machine code for skeleton program
int f1(void) {
    return 1;
}
/* Other, possibly long code */
int f2(void) {
    return 2;
}
/* More other code */
int f3(void) {
    return (f1() + f2());
}
Of course, most interesting is code for 'f3', in particular how
you represent addresses of functions and what sequence you use
for calls. "possibly long code" means that you should be
prepared to handle programs with more than 1 segment of code.
You're talking about medium and large memory model programs.
Those have far code pointers.
When f3() calls f1(), which may be in a different segment, it
does it by doing a far call. The segment that is used in that
call is indeed set by the linker, and indeed, is based on
segments being 4-bit shifts. But that segment gets altered
by the OS when it is loaded (relocated). The algorithm on
MSDOS is very simple - just add the load point's segment.
So you assume a "typical" MSDOS executable. Some MSDOS compilers
generated self-relocating executables (relocation was done
by a compiler-generated stub). And when you speak about 8086
executables, then an absolute executable is valid. Such an executable
could work on a PC without an OS. Or under MSDOS as long as
the required memory area was free.
Post by ***@gmail.com
My proposal is that when the binary is loaded by PDOS/86
running in PM32, that the segments will be radically
different. They will in fact all be selectors. So if you have
an 8086 binary that has say f1 at 3000:0000 and f2 at
3005:0000 then in PM32 with 512 MiB (or more, but
not accessible) memory so we use 16-bit shifts, the
segment/selector of both f1 and f2 will be adjusted to
the same, because they both fit within a common
64k base (x'30000'). f2 will then need the *offset*
adjusted to x'0050'. On the 8086 the offset is never
adjusted, but on 80386 I will need the offset adjusted.
And yes, this may mean another linker change/format
to support offsets being adjusted. I'm not sure if there
is sufficient information already in current MSDOS
executables to guess where the offset is.
Well, it is for you to justify that your approach will work
in other modes. Note that to run program as PM16 normally
you will at least relink it.
I accept that a relink may be required.
What about casts between data and function pointers? Formally
this is undefined in C90, but works in large model. And
implementation is just a copy. In PM if you implement this
via copy, then program will crash if it tries to access
data via such pointer. Such use is probably rare, but in
big program rare uses will appear somewhere, so this is
restriction on programs that you can run.
Post by ***@gmail.com
Post by ***@gmail.com
That's the price you pay when you exceed 64k in a single
data structure on a machine that was only designed to
run 16-bit programs. There comes a point when you're
supposed to be recompiling as 32-bit. If you insist on
maintaining 8086 compatibility, then you will take a
performance hit.
Sure, I would recompile instead of inventing strange schemes.
Sure. If you're happy to move from 16-bit programming
to 32-bit programming, then none of this is relevant.
But what would you have done in 1985?
In 1985 I got a ZX Spectrum. No problem with segmentation.
I would have preferred an Atari ST (with the 32-bit Motorola 68000),
but the Atari was too expensive (I think it was still cheaper
than a PC in a comparable configuration).
Post by ***@gmail.com
Also, this is a generic problem. Coding 32-bit programs
accessing more than 4 GiB of memory has the same
problem.
If you need to access more than 64k memory you need more
than 16 address bits. If more than 4G, you need more
than 32 address bits. There is no way around this.
You can play tricks to reduce cost of address bits.
Once you have enough transistors the most sensible way
is to have enough address bits in the general registers.
For Intel, after starting with the 8086, probably the most
sensible way would have been to go straight to 32-bit registers,
leaving the old segment registers for backward compatibility.
Or maybe increase the segment registers to 32 bits, with
old instructions loading only 16 bits with a shift and
new ones loading the full segment register.
Post by ***@gmail.com
And from the application's point of view, there is nothing
strange about this. It's absolutely stock-standard 8086
code unless you count the dead x'66' and the fact that
the 8086 OS vendor insists that you stop hardcoding
the number "4" in your huge pointers and instead call
the OS at startup time to get the 2 values you need.
And the linker format that includes offset adjustments
when those numbers seemingly never get modified,
which in fact, is true for PDOS/86 running on an 8086.
Post by ***@gmail.com
All other memory models will run full speed, other than
any x'66' that are being skipped.
ATM you say this. But you did not say how, in particular
if you use "segment shift" different than 4.
I have laid out my proposal. If there is a specific thing
that I have missed, I can address that if you tell me
what it is.
You assume that you provide your compiler and linker, which
will avoid any dependence on "segment shift", and you
have programs which do not make (non-portable) tricks
like convert pointer to long, do some arithmetic and
convert back. With such assumption programs should
work, but you may exclude surprisingly many real programs.
And it is not clear what you gain: if your program needs
more than 1 meg it will not work on original 8086.
If it works in 1 meg, then standard 8086 "segment shift"
is enough. And if you have better machine, why to cripple
it with notion of "segment shift"? Under your rules
you can put segments in memory however you wish and
program should work. In particular, you can set segment
limit at what is used and pack them tightly one after
another. Your approach essentially forces gaps, when
data or routines do not fit in remaining space.
Post by ***@gmail.com
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
Well, if you have C source, then there is obvious way
to get good speed: recompile for newer machine.
Sure. But I'm trying to construct the best 16-bit
machine at the moment, using existing tools.
You are setting strange constraints and getting strange
results.
The 8086 set the constraints. I'm just working within
those constraints, with inside knowledge of what the
80386 is going to give us "in a few years from now
(e.g. 1982)".
The results are not strange. My 16-bit programs have
the ability to work with more than 64k of memory.
On the 8086 that is 1 MiB. On the 80386 that is
512 MiB. What's not to love?
Once you routinely have more than 64k program you need
32-bit pointers and it makes sense to switch to 32-bits.
It's only when you need more than 64k of data in
a single buffer that you need to switch to 32 bit
offsets.
You did not get what I wrote: to point to arbitrary
location you need segment+offset, that is 16+16 = 32 bits.
So you need exactly the same memory for pointers as
program using flat 32-bit pointers. And you need more
instructions, because in many cases you need separate
instruction for segment and offset. If you care about
space you can use a few "base" pointers which are
32-bit and do the rest via offsets which may be 16 bit.
Of course, to make use of 16-bit offsets kosher in
C90 you need to allocate chunks smaller than 64k and
use offsets only within a chunk. This is not much
different than what happens in 16-bit code: you
can reduce pointers to offsets only if you arrange
data so that they are in single segment.
Post by ***@gmail.com
Post by ***@gmail.com
Ideally I can have EFFECTIVE 16-bit segment shifts
and address an entire 4 GiB, but they didn't provide
enough selectors for that, and I can only get 512
MiB, so the equivalent of an 8086 with a 13-bit
segment shift instead of a 4-bit segment shift.
That is your choice. You want single LDT.
Yes, I want a simple machine. After the "simple"
technology is proven I am happy for people to
come up with something better that uses the
full 4 GiB for 16-bit programming.
Note that
each program can have its own LDT and with per-program
LDT you can address much more memory. OTOH 286
segmentation makes sense if you have a lot of programs,
each of them relatively small. By insisting on single
address space you throw out main intended use cases
for 286.
My goal is to get something that looks like MSDOS
working on the 80386, and then maybe the 80286
and x64, and in fact, unrelated processors.
What does it mean "looks like MSDOS"? Some sources
claim that CP/M commands and later MSDOS command
were modelled after CMS. So does CMS qualify as
"looks like MSDOS"? Note that in CMS there is no
segment nonsense (it uses 24-bit address mode on 370).
System calls are via SVC and DIAGNOSE. OTOH CMS is
single tasking, once you have access to a disk you
may trash the filesystem (if you really want to).
Post by ***@gmail.com
I expect to die before I accomplish all processors in
the world.
Once MSDOS is everywhere, the next step will be to
support the "main intended use cases". I'll leave that
technical challenge to Soolin.
Even if huge model is rare on original 86 (I think that
huge pointers are used in many programs and saying "rare"
underestimates their use), it would be dominant mode on
Are you sure you mean "huge pointers" rather than
"far pointers"?
Yes. Once you have a single array bigger than 64k you need
huge pointers. A lot of programs put main data in single
buffer, bigger than 64k. Program may do 99% of work with
small offsets, but a few huge pointers are essential.
The main reason to avoid huge pointers is their high execution
time. But it still easier and faster to do work combining
near and huge pointers, than staying uniformly with far
pointers.
Post by ***@gmail.com
machines having more memory. Why have more memory if you
can not have large array?
That's an odd question. You can use more memory,
much more than 64k, but you can't have/expect a single
array more than 64k when you're doing 16-bit programming.
Yes. So when I got my own PC I made sure it can run 32-bit
code and went almost exclusively 32-bit. Few times when
I run "the same" code in 16-bit and 32-bit version
32-bit was significantly faster (because gcc that I used
for 32-bit had much better code generator than 16-bit
compilers available to me). At OS level I used DOS
(and DJGPP) for some time, but it quickly became
clear that Unix (386BSD and later Linux) gave me much
better performance on the same hardware. In the 16-bit era
I was mostly a Pascal programmer; once I became comfortable
with C there was no more reason to create 16-bit
programs (OK, I did a CDROM driver to make sure that
proprietary MSDOS program worked with our hardware,
boot sector for my toy OS and few system hacks).

My point is that trying to stick to 16-bit model is
really limiting once you have more memory.
--
Waldek Hebisch
muta...@gmail.com
2021-07-16 09:00:50 UTC
Permalink
Post by ***@gmail.com
No registers need to be reserved. The only thing that
an 8086 program needs to do is not assume that the
shift is 4 bits, and instead make an OS query to find
out a right-shift value (or divide) and a left-shift value
(or multiply) if they wish to *modify* a segment
register with a value that was not set at load time.
OS call to get segment shift? That is horribly inefficient
if you do this for every memory access.
The executable needs to do this once at startup. Any
executable that uses huge pointers. ie mainly huge
memory model programs, which are rare. So rare in
fact that Turbo C++ doesn't even generate the
required assembler code. Watcom does though.

The result of the OS call can then be saved in global
variables for use by a C runtime library provided that
works with a suitable compiler like Watcom.
And if not, then the
question of where you store the segment shift is back.
I don't understand that question.
AFAICS it would be cheaper to offer Basic-like PEEK and POKE
system calls...
No, it is a minor change to existing huge pointer
manipulation. You just access a couple of global
variables whenever you normalize the pointer
instead of hardcoding the number "4".
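
A minimal sketch of what that could look like, written as host-side C for
illustration only. Everything named here is an assumption: g_seg_shift stands
in for the value obtained once at startup from the (unspecified) INT 21H query,
huge_ptr and huge_add are hypothetical names, and the model assumes a uniform
layout where linear address = segment << shift + offset. Real PM32 selectors
(which step by 8 and carry extra low bits) are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Would be filled in once at startup from the OS query; 4 on a real 8086. */
unsigned g_seg_shift = 4;

struct huge_ptr {
    uint16_t seg;
    uint16_t off;
};

/* Add n bytes to a huge pointer and return it normalized, using the
   OS-provided shift instead of a hardcoded 4. */
struct huge_ptr huge_add(struct huge_ptr p, uint32_t n)
{
    uint64_t total = (uint64_t)p.off + n;  /* may exceed 16 bits */
    uint64_t mask = ((uint64_t)1 << g_seg_shift) - 1;
    struct huge_ptr r;

    assert(g_seg_shift >= 1 && g_seg_shift <= 16);
    /* Whole units of (1 << shift) bytes are carried into the segment. */
    r.seg = (uint16_t)(p.seg + (total >> g_seg_shift));
    /* The remainder stays in the offset. */
    r.off = (uint16_t)(total & mask);
    return r;
}
```

With a shift of 4, adding x'20' to 3000:FFF0 gives 4001:0000. With a shift of
16, adding x'20' to 0003:FFF0 gives 0004:0010. The calling code is the same in
both cases; only the global differs.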
Post by ***@gmail.com
The linker does indeed need to ensure that no individual
function is split over a 64k boundary.
I am not sure if compilers supported this, but a single function
bigger than 64k is valid once you allow more than 64k of code.
The compiler can use any mixture of near and far jumps inside.
Ok, if you are mixing near and far code pointers,
in a single function that exceeds 64k, you will
indeed exceed the ability of the OS to load that
on a non-16-byte boundary.

So that won't be supported.
Post by ***@gmail.com
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
You are creating a virtual machine
Why are you calling it a virtual machine??? By what
definition? Because it doesn't use the full capability
of the 80386?
Because valid binaries will be broken.
That's a strange definition of a VM. Windows 10 has
broken all of my 8086 binaries. It's not a VM.
It is _one_ of the conditions that distinguish a VM. Note that
in 64-bit mode real-mode programs are no longer supported
(you can run them using a VM). Windows 10 probably broke
them by not providing a VM. More to the point: Windows
(or Linux) provides a VM in the sense that some operations
cause the OS to do something special. But as long as you
stay with normal (unprivileged) instructions, behaviour
is as defined by the hardware. You effectively create a new
instruction set, different from the hardware instruction set.
Some instruction sequences which are valid on real
hardware will fail; OTOH some instruction sequences
will get a new meaning. Those two properties mean you
have a new instruction set, which is the distinguishing feature
of a VM. Your VM has a lot of similarity to the 8086 and
your implementation will be simpler than many other VMs.
But it is still a VM.
Not by the above definition. First of all, I'm not providing
a VM at all. I'm providing an operating system and
applications. Both of which are pretty simple. You can't
make them very much simpler. There is no virtual hardware,
no instruction interpretation. No new instructions. All
applications will consist of a set of instructions that
are common to both 8086 and 80386. Like mov ax, bx

There's nothing virtual about that. And there's nothing
machine about that. I won't even bother to activate
virtual memory via CR3, never mind an entire virtual machine.
BTW: In spirit your idea has some similarity to the
NaCl (Native Client) project at Google. Google's idea was to forbid
some 386 instruction sequences. When the forbidden
instruction sequences were absent then (according
to the Google folks) there was no way to execute something
which was not code or, say, overwrite a return address on
the stack. The idea was that a Web browser could check
that the rules are followed and then safely execute
downloaded code in the same address space as the browser,
without risk of malicious behaviour from the downloaded
code. This approach was considered to be a VM, built
using 386 instructions, but one had to compile
programs in a special way (otherwise the rules would be
broken) and one got new properties (a safety guarantee).
You can call a cat a dog if you want, but it's still a cat.

That's not a VM. Using a subset of available CPU
instructions is not a VM. It's neither virtual, nor a
machine. It's a SUBSET. When you compile a C
program with perhaps a -march=i486 option when you are
using a Pentium, it will restrict itself to 486 instructions
to be generated too. That's not a VM. It's a simple
application that runs on the 80486 and above.
Post by ***@gmail.com
You need to recompile specifically for your VM.
I compile for an 8086. The 8086 is not a VM.
My executables are pure 8086 code unless you
count the x'66' no-op. Huge memory model will
use an INT 21H that wasn't provided by MSDOS,
to get a right-shift and left-shift, but that doesn't
create a VM either.
And if you insist on counting the x'66' as "aha,
VM", maybe I can simply make the C compiler not
generate the instructions that need an x'66' to
work in PM32. At this stage I don't know what
happens if those instructions are eliminated. Will
I still have a Turing machine?
In PM16 you do not need prefixes, so this is the smallest
issue. Others are more significant.
Ok. But I'm more interested in PM32. Why not take it
to the max?
Post by ***@gmail.com
When f3() calls f1(), which may be in a different segment, it
does it by doing a far call. The segment that is used in that
call is indeed set by the linker, and indeed, is based on
segments being 4-bit shifts. But that segment gets altered
by the OS when it is loaded (relocated). The algorithm on
MSDOS is very simple - just add the load point's segment.
So you assume a "typical" MSDOS executable. Some MSDOS compilers
generated self-relocating executables (relocation was done
by a compiler-generated stub). And when you speak about 8086
executables, then an absolute executable is valid. Such an executable
could work on a PC without an OS. Or under MSDOS as long as
the required memory area was free.
Those things will not be supported. I'm not trying to solve
the entire world's problems in one hit. I'm trying to create
a set of rules for anyone who wants to future-proof their
16-bit 8086 applications.

If your particular application can't live within the rules, then
so be it, it won't run on the next generation of processor,
ie the 8086+, which supports 512 MiB of memory with
13-bit shifts. I believe the 80386 has the equivalent
already.
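
For reference, the 1 MiB / 512 MiB / 4 GiB figures in this thread all come from
the same arithmetic, sketched below. The function name span_bytes is
hypothetical, and the 13-bit figure assumes a single descriptor table of 8192
entries with one 64k segment per entry.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by 2^selector_bits distinct segment starts that are
   spaced 2^spacing_bits bytes apart. */
uint64_t span_bytes(unsigned selector_bits, unsigned spacing_bits)
{
    assert(selector_bits + spacing_bits < 64);
    return (uint64_t)1 << (selector_bits + spacing_bits);
}
```

So span_bytes(16, 4) is the 8086's 1 MiB (16-bit segment, 4-bit shift),
span_bytes(13, 16) is 512 MiB (8192 selectors of 64k each, or equivalently a
16-bit segment with a 13-bit shift), and span_bytes(16, 16) would be the full
4 GiB that a true 16-bit shift would need.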
Post by ***@gmail.com
Well, it is for you to justify that your approach will work
in other modes. Note that to run program as PM16 normally
you will at least relink it.
I accept that a relink may be required.
What about casts between data and function pointers? Formally
this is undefined in C90,
Yes, for exactly this reason. Works fine today on the
8086. Breaks when you move to the 80386 in PM32.
but works in large model. And
implementation is just a copy. In PM if you implement this
via copy, then program will crash if it tries to access
data via such pointer. Such use is probably rare, but in
big program rare uses will appear somewhere, so this is
restriction on programs that you can run.
gccwin is a 3 MB executable and there is not a single
such occurrence that I am aware of. The "-ansi -pedantic"
didn't pick it up, anyway. I don't know beyond that.
That's 400,000 lines of C code. What's your definition
of "big"?
Post by ***@gmail.com
Post by ***@gmail.com
That's the price you pay when you exceed 64k in a single
data structure on a machine that was only designed to
run 16-bit programs. There comes a point when you're
supposed to be recompiling as 32-bit. If you insist on
maintaining 8086 compatibility, then you will take a
performance hit.
Sure, I would recompile instead of inventing strange schemes.
Sure. If you're happy to move from 16-bit programming
to 32-bit programming, then none of this is relevant.
But what would you have done in 1985?
In 1985 I got ZX-Spectrum. No problem with segmentation.
I would prefer to get Atari ST (with 32-bit Motorola 68000),
but Atari was too expensive (I think it was still cheaper
than a PC in comparable configuration).
What would you have done in 1985 if your EMPLOYER
bought an 8086 and said "please write a program to
do xyz. And make sure it's future-proofed because I
hear there's a fantastic new processor coming out
next year that is going to address 512 MiB of memory
for 16-bit applications. No more of this 4-bit shift crap.
Finally Gates and Intel have got it right."
Post by ***@gmail.com
Also, this is a generic problem. Coding 32-bit programs
accessing more than 4 GiB of memory have the same
problem.
If you need to access more than 64k memory you need more
than 16 address bits. If more than 4G, you need more
than 32 address bits. There is no way around this.
You can play tricks to reduce cost of address bits.
Once you have enough transistors the most sensible way
is to have enough address bits in general registers.
For Intel, after starting with 8086
First of all are we now agreed that:

1. 8086 was the correct technical solution at the time it
was produced.

2. The 8086 silicon was perfect.

3. Intel stuffed up in the documentation by not mentioning
"by the way, assemblers should generate x'66' in front of
a mov ax, bx instruction in order to be future-proofed. If
you omit the x'66', it will still work, but we won't catch that,
and your program will break on the next generation of
processors".

4. Intel stuffed up in the documentation by not saying x'66'
is at the moment a single byte no-op, but will have a
meaning later.

5. Intel stuffed up in the documentation by not saying
"please don't assume 4-bit segment shifts, just because
that's what the 8086 does currently. We may come out
with an 8086+ next year which bumps that up to 6."

6. Microsoft stuffed up by not providing an INT 21H call
to obtain a right and left shift to adjust segments by.

Yes, I have the benefit of 20/20 hindsight, unlike them,
so it's not really fair, but that's what technically happened.

I'm not sure if some mathematician/computer science
person could have sat down and calculated this in say 1970,
as a generic solution to the "address range exceeding
register range problem".
probably most
sensible way would be to go straight to 32-bit registers,
leaving old segment registers for backward compatibility.
Or maybe increase segment registers to 32 bits, with
old instructions loading only 16 bits with shift and
new one loading full segment register.
Well, this is an interesting proposition. What precisely
do you think Intel should have done and why? What
exactly are you trying to achieve?
Post by ***@gmail.com
ATM you say this. But you did not say how, in particular
if you use "segment shift" different than 4.
I have laid out my proposal. If there is a specific thing
that I have missed, I can address that if you tell me
what it is.
You assume that you provide your compiler and linker, which
will avoid any dependence on "segment shift", and you
have programs which do not make (non-portable) tricks
like convert pointer to long, do some arithmetic and
convert back. With such assumption programs should
work,
Exactly. We both believe it should work. That's a great
starting point.
but you may exclude surprisingly many real programs.
I'm interested in C90-compliant programs. I fully
understand that I am excluding others. But at least
there is hope.
And it is not clear what you gain: if your program needs
more than 1 meg it will not work on original 8086.
If it works in 1 meg, then standard 8086 "segment shift"
is enough. And if you have better machine, why to cripple
it with notion of "segment shift"?
Some programs don't have a particular limit.
Micro-emacs will edit any file you want, until you
run out of memory. A 16-bit version of micro-emacs
will suddenly start editing 512 MiB files. What's not
to love? In 1986 even. It wasn't theoretical. There was
a processor capable of editing 512 MiB files, while
still running on the 8086. It was only missing an
operating system and a set of rules.

Maybe if micro-emacs had been written in compliance
with "the rules", someone would have made the effort
to put out an 8086+ with 6-bit or 8-bit shifts which would
have covered all the files I ever wanted to edit. I used to
come up against hard limits on the 8086.
Under your rules
you can put segments in memory however you wish and
program should work. In particular, you can set segment
limit at what is used and pack them tightly one after
another. Your approach essentially forces gaps, when
data or routines do not fit in remaining space.
The shift amount is flexible. If you have 512 MiB of
RAM, you don't care about the gaps.

If you have 2 MiB of RAM, you may well care about
the gaps, so use a 5-bit shift. What alternative is there?
Doing 4 instead of 5 won't help you.
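
The trade-off between reach and gaps can be put in numbers. The following is only a sketch under the assumptions of this thread: 16-bit segments, 16-bit offsets, and a segment shift between 0 and 16. The function names are invented for illustration and belong to no real API. With a shift of 4 the top address is just over 1 MiB, with 5 just over 2 MiB, and with 13 just over 512 MiB, while the worst-case padding per block is one byte less than 2^shift.

```c
#include <assert.h>
#include <stdint.h>

/* Highest linear address reachable with a 16-bit segment and a
   16-bit offset: (0xFFFF << shift) + 0xFFFF. Hypothetical helper. */
uint32_t max_linear_address(unsigned shift)
{
    assert(shift <= 16);
    return ((uint32_t)0xFFFFu << shift) + 0xFFFFu;
}

/* Distance in bytes between two consecutive segment values,
   i.e. the granularity at which segments can be placed. */
uint32_t paragraph_size(unsigned shift)
{
    assert(shift <= 16);
    return (uint32_t)1 << shift;
}

/* Bytes left unused when a block of `size` bytes is followed by a
   block that must start on the next segment boundary. */
uint32_t padding_bytes(uint32_t size, unsigned shift)
{
    uint32_t unit = paragraph_size(shift);
    return (unit - (size % unit)) % unit;
}
```

For a 100-byte block, padding_bytes() gives 12 bytes at a shift of 4 and 8092 bytes at a shift of 13, which is the kind of gap being argued about above.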
Post by ***@gmail.com
Once you routinely have more than 64k program you need
32-bit pointers and it makes sense to switch to 32-bits.
It's only when you need more than 64k of data in
a single buffer that you need to switch to 32 bit
offsets.
You did not get what I wrote: to point to arbitrary
location you need segment+offset, that is 16+16 = 32 bits.
So you need exactly the same memory for pointers as
program using flat 32-bit pointers. And you need more
instructions, because in many cases you need separate
instruction for segment and offset. If you care about
space you can use a few "base" pointers which are
32-bit and do the rest via offsets which may be 16 bit.
Of course, to make use of 16-bit offsets kosher in
C90 you need to allocate chunks smaller than 64k and
use offsets only within chunk. This is not much
different than what happens in 16-bit code: you
can reduce pointers to offsets only if you arrange
data so that they are in single segment.
There are multiple memory models on the 8086 to
choose from, and they are all in fact, generic. The
same memory models apply to a processor with
32-bit registers designed to access more than
4 GiB of memory. Doesn't have to be x86-based.
Post by ***@gmail.com
My goal is to get something that looks like MSDOS
working on the 80386, and then maybe the 80286
and x64, and in fact, unrelated processors.
What does it mean "looks like MSDOS"? Some sources
claim that CP/M commands and later MSDOS command
were modelled after CMS. So does CMS qualify as
"looks like MSDOS"? Note that in CMS there is no
segment nonsense (it uses 24-bit address mode on 370).
System calls are via SVC and DIAGNOSE. OTOH CMS is
single tasking, once you have access to a disk you
may trash filesystem (if you really want to).
I expect to be able to connect via a VT100 and type
"dir" and see files called fred.exe or whatever, and
subdirectories and if I type "fred" it runs some
program (I don't care if the instructions are S/370,
and in fact, I won't even be able to detect that).
I then want to go "e fred.c" to run Micro-emacs to
edit my C program and then type "gcc -S -I . -o fred.s fred.c"
to convert my C program into assembler.

That's enough.

BTW, have you ever tried PDOS/3X0? It runs on
the S/370 you mentioned. In AM24 as you described.
Post by ***@gmail.com
Are you sure you mean "huge pointers" rather than
"far pointers"?
Yes. Once you have a single array bigger than 64k you need
huge pointers. A lot of programs put main data in single
buffer, bigger than 64k. Program may do 99% of work with
small offsets, but a few huge pointers are essential.
Main reason to avoid huge pointers is their high execution
time. But it still easier and faster to do work combining
near and huge pointers, than staying uniformly with far
pointers.
No C90-compliant 8086 compiler I know of will generate such code.
If it exists, I don't support that either.
Post by ***@gmail.com
machines having more memory. Why have more memory if you
can not have large array?
That's an odd question. You can use more memory,
much more than 64k, but you can't have/expect a single
array more than 64k when you're doing 16-bit programming.
Yes. So when I got my own PC I made sure it can run 32-bit
code and went almost exclusively 32-bit. Few times when
I run "the same" code in 16-bit and 32-bit version
32-bit was significantly faster (because gcc that I used
for 32-bit had much better code generator than 16-bit
compilers available to me). At OS level I used DOS
(and DJGPP) for some time, but it quickly became
clear that Unix (386BSD and later Linux) gives me much
better performance on the same hardware. In 16-bit era
I was mostly Pascal programmer, once I became comfortable
with C there was no more reason to create 16-bit
programs (OK, I did a CDROM driver to make sure that
proprietary MSDOS program worked with our hardware,
boot sector for my toy OS and few system hacks).
My point is that trying to stick to 16-bit model is
really limiting once you have more memory.
None of this discussion is about what to do if you have
plenty of memory and 32-bit registers. I totally agree
with you, and I almost exclusively write 32-bit programs.
All my focus is on PDOS/386, not PDOS/86. But I still
occasionally think about PDOS/86, because I chose
large memory model instead of huge, because I didn't
know what huge actually meant, and now that I know
what it is, I'm irked.

BFN. Paul.
Joe Monk
2021-07-16 22:59:07 UTC
Permalink
If Intel thinks the 8086 was a kludge, that's up to them.
But I disagree with Intel. It was the correct technical
solution.
Intel does in fact think the 8086/8088 was a kludge.

Post 8080/8085, we were supposed to get the iAPX 432. However, when the chip wars happened, the iAPX 432 had some bugs that needed to be worked out. So, they produced the 8086/88 as a stopgap.

"The iAPX 432 was referred to as a "micromainframe", designed to be programmed entirely in high-level languages. The instruction set architecture was also entirely new and a significant departure from Intel's previous 8008 and 8080 processors as the iAPX 432 programming model is a stack machine with no visible general-purpose registers. It supports object-oriented programming, garbage collection and multitasking as well as more conventional memory management directly in hardware and microcode. Direct support for various data structures is also intended to allow modern operating systems to be implemented using far less program code than for ordinary processors. Intel iMAX 432 is a discontinued operating system for the 432, written entirely in Ada, and Ada was also the intended primary language for application programming. In some aspects, it may be seen as a high-level language computer architecture."

https://en.wikipedia.org/wiki/Intel_iAPX_432

Joe
muta...@gmail.com
2021-07-17 00:14:52 UTC
Permalink
Post by Joe Monk
If Intel thinks the 8086 was a kludge, that's up to them.
But I disagree with Intel. It was the correct technical
solution.
Intel does in fact think the 8086/8088 was a kludge.
Post 8080/8085, we were supposed to get the iAPX 432.
the iAPX 432 programming model is a stack machine with no visible general-purpose registers.
That processor with no visible registers sounds like a
pie-in-the-sky design to me. You may as well design
the x64 in 1970. You can do anything on paper.

Regardless, it doesn't have to be exactly the 8086.

The generic thing is segmented memory with a
segment shift that may or may not be the same
as the register size.

And with an instruction set that matches an actual
processor of that same register size that was
effectively operating in tiny memory model (ie it had
no concept of segment registers, and the implied
segment registers were effectively the same,
effectively 0).

Explain to me why segmented memory is not the
right approach to solving this problem in an
environment with severe limits on memory, but
still more than a single register can address.

BFN. Paul.
muta...@gmail.com
2021-07-17 02:09:23 UTC
Permalink
Post by ***@gmail.com
That processor with no visible registers sounds like a
pie-in-the-sky design to me. You may as well design
the x64 in 1970. You can do anything on paper.
But it's not ... it was actually produced along with an OS, written in ADA.
Maybe I used the wrong term. You could have produced
an x64 processor in 2000 BC, implemented in the form
of Egyptian slaves. It would have been ridiculously slow,
but it would "work" for some value of "work".

You can do anything you want for fun, but I haven't
heard of anyone taking away registers. The most I've
seen anyone do is reduce the number of instructions
which managed to produce a better result. Anyone
can produce a worse result.
Post by ***@gmail.com
Explain to me why segmented memory is not the
right approach to solving this problem in an
environment with severe limits on memory, but
still more than a single register can address.
The right approach, IMHO, has always been linear addressing.
Think about how the mainframe does it ... linear address space,
with an ASID. So you could have x^asid linear address spaces.
That's not the issue being addressed. What happens when a
single application wants to access more than 4 GiB of memory
on the mainframe, and the memory is indeed available, but
you only have 32-bit registers?

tiny memory model is great, but eventually you need to
move to the other memory models and buy yourself
3 segment registers. I don't think you need any more
than that. I don't think I have any code that uses more
than cs/ds/es.
If you remember the z80, then you understand. The z80,
as a 16-bit address bus processor, was limited to 64K of
directly addressable memory. But with bank-switching,
you could put lot of memory on a system, and switch it in
and out. There were many minicomputers in the '80s that did that.
I think you're right - that would work too. But those
different banks are basically just segments themselves.
The model is still the same. Just a different way of
loading a segment register.

And that buys you what, compared to a proper segment
register that allows fine-grained segment shifts to
enable packing?

BFN. Paul.
Joe Monk
2021-07-17 11:49:04 UTC
Permalink
Post by ***@gmail.com
That's not the issue being addressed. What happens when a
single application wants to access more than 4 GiB of memory
on the mainframe, and the memory is indeed available, but
you only have 32-bit registers?
You're joking, right? Stop and think how we actually did it with 31-bit addressing back in the MVS/XA days.

Physical memory is not a constraint...

Joe
muta...@gmail.com
2021-07-17 12:01:10 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
That's not the issue being addressed. What happens when a
single application wants to access more than 4 GiB of memory
on the mainframe, and the memory is indeed available, but
you only have 32-bit registers?
You're joking, right? Stop and think how we actually did it with 31-bit addressing back in the MVS/XA days.
Physical memory is not a constraint...
I have no idea what you are talking about. Some applications
exceed the capacity of a 2 GiB/4 GiB address space, and the
cleanest solution to that problem in the absence of 64-bit
registers is segmentation.

You just need to recompile your application in the compact
memory model and the job is done.

BFN. Paul.
Joe Monk
2021-07-17 12:27:54 UTC
Permalink
Post by ***@gmail.com
I have no idea what you are talking about. Some applications
exceed the capacity of a 2 GiB/4 GiB address space, and the
cleanest solution to that problem in the absence of 64-bit
registers is segmentation.
Think about this ... How could CICS literally run thousands of users, each with their own address space?

We didn't "segment" anything.

Joe
muta...@gmail.com
2021-07-17 12:35:25 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
I have no idea what you are talking about. Some applications
exceed the capacity of a 2 GiB/4 GiB address space, and the
cleanest solution to that problem in the absence of 64-bit
registers is segmentation.
Think about this ... How could CICS literally run thousands of users, each with their own address space?
We didn't "segment" anything.
Those applications didn't address more than 4 GiB in a
single address space.

If you want to run some sort of video-editing software
under CICS that needs to edit a 16 GiB video file in
memory you need segmentation if you only have
32-bit registers. And even that won't work in compact
memory model unless the video is logically divided
into smaller chunks that are a maximum of 4 GiB.
If you need a single indivisible buffer of more than
4 GiB you need huge memory model.

It depends on your application.

And the exact same rules apply to 16-bit registers
when accessing more than 64k of data. There is
nothing magical about the number 16. Or 32.

BFN. Paul.
Joe Monk
2021-07-17 12:40:52 UTC
Permalink
Post by ***@gmail.com
If you want to run some sort of video-editing software
under CICS that needs to edit a 16 GiB video file in
memory you need segmentation if you only have
32-bit registers. And even that won't work in compact
memory model unless the video is logically divided
into smaller chunks that are a maximum of 4 GiB.
If you need a single indivisible buffer of more than
4 GiB you need huge memory model.
First off, even today, no application would edit a 16GB file in memory. That would be asinine.

Nothing says the *entire* file has to be in memory at the same time. It is called windowing ... a technique very commonly used.

Same thing with DB2. If I run a SQL query, and open a cursor on that query, nothing says the result set has to all be in memory at the same time.

Think outside of the small box you seem to be stuck in.

Joe
muta...@gmail.com
2021-07-17 12:51:35 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
If you want to run some sort of video-editing software
under CICS that needs to edit a 16 GiB video file in
memory you need segmentation if you only have
32-bit registers. And even that won't work in compact
memory model unless the video is logically divided
into smaller chunks that are a maximum of 4 GiB.
If you need a single indivisible buffer of more than
4 GiB you need huge memory model.
First off, even today, no application would edit a 16GB file in memory. That would be asinine.
Your omniscience of all extant applications in the world
today is showing again.
Post by Joe Monk
Nothing says the *entire* file has to be in memory at the same time. It is called windowing ... a technique very commonly used.
You can "window" in 16-bit too.
Post by Joe Monk
Same thing with DB2. If I run a SQL query, and open a cursor on that query, nothing says the result set has to all be in memory at the same time.
Nothing says that about 16-bit either.
Post by Joe Monk
Think outside of the small box you seem to be stuck in.
Try taking your own advice.

BFN. Paul.
Joe Monk
2021-07-17 13:31:44 UTC
Permalink
Post by ***@gmail.com
Your omniscience of all extant applications in the world
today is showing again.
Well, let's see ... I used to work on an application that paid 250,000 users their pensions every month. So I actually have real world applications experience. And I used to be a sysprog on multiple different mainframes, for the power company, for oil & gas, and for banks.
Post by ***@gmail.com
Post by Joe Monk
Nothing says the *entire* file has to be in memory at the same time. It is called windowing ... a technique very commonly used.
You can "window" in 16-bit too.
Sure.
Post by ***@gmail.com
Post by Joe Monk
Same thing with DB2. If I run a SQL query, and open a cursor on that query, nothing says the result set has to all be in memory at the same time.
Nothing says that about 16-bit either.
Sure.
Post by ***@gmail.com
Post by Joe Monk
Think outside of the small box you seem to be stuck in.
Try taking your own advice.
My software has been commercially used to produce profits, to the tunes of millions of dollars...

You?

Joe
muta...@gmail.com
2021-07-17 13:37:59 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
Your omniscience of all extant applications in the world
today is showing again.
Well, let's see ... I used to work on an application that paid
250,000 users their pensions every month. So I actually
have real world applications experience. And I used to be
a sysprog on multiple different mainframes, for the power
company, for oil & gas, and for banks.
And that's what is required to make someone omniscient?
Cool.
Post by Joe Monk
Post by ***@gmail.com
Post by Joe Monk
Think outside of the small box you seem to be stuck in.
Try taking your own advice.
My software has been commercially used to produce profits, to the tunes of millions of dollars...
You?
My software has not been commercially used to produce profits, to the tunes of millions of dollars...

And I farted twice today.

Your point?

BFN. Paul.
Joe Monk
2021-07-17 14:18:41 UTC
Permalink
Post by ***@gmail.com
Your point?
My point is that I have a better view of the forest. I have made people's lives better thru my software...

I have written software that people actually depended on.

Joe
muta...@gmail.com
2021-07-17 14:28:28 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
Your point?
My point is that I have a better view of the forest. I have made people's lives better thru my software...
I have written software that people actually depended on.
Ok, so you have a theory that thinking outside the box is
measured by an algorithm that involves average profit,
number of companies, and number of humans whose
lives have been made "better" by an as-yet-unstated
measure.

Personally I measure it by the number of human skulls
crushed with a sledgehammer. It's largely a race between
me and Pol Pot, and the latter, although slightly ahead at
the moment, is currently in a lull.

So that's why we're having a disconnect here. It's a
semantic debate.

BFN. Paul.
Joe Monk
2021-07-17 18:00:40 UTC
Permalink
Post by ***@gmail.com
Ok, so you have a theory that thinking outside the box is
measured by an algorithm that involves average profit,
number of companies, and number of humans whose
lives have been made "better" by an as-yet-unstated
measure.
It's called success. When software is running on mainframes at the top 6% of the banks in the USA and generating millions of dollars in revenue and supporting the lives of many programmers, as well as the people who operate the system and the people who receive pension checks from the system, that's success.

Joe
muta...@gmail.com
2021-07-17 22:17:52 UTC
Permalink
Post by Joe Monk
Post by ***@gmail.com
Ok, so you have a theory that thinking outside the box is
measured by an algorithm that involves average profit,
number of companies, and number of humans whose
lives have been made "better" by an as-yet-unstated
measure.
It's called success. When software is running on mainframes
at the top 6% of the banks in the USA and generating millions
of dollars in revenue and supporting the lives of many
programmers, as well as the people who operate the system
and the people who receive pension checks from the system,
that's success.
There are some who would argue that "financially successful"
and "popular" are different concepts to "thinking outside the
box". But my advice to you if you meet someone like that is
to crush their skull with a sledgehammer.

BFN. Paul.

Scott Lurndal
2021-07-17 14:39:27 UTC
Permalink
Post by ***@gmail.com
Post by Joe Monk
If Intel thinks the 8086 was a kludge, that's up to them.
But I disagree with Intel. It was the correct technical
solution.
Intel does in fact think the 8086/8088 was a kludge.
Post 8080/8085, we were supposed to get the iAPX 432.
the iAPX 432 programming model is a stack machine with no visible general-purpose registers.
That processor with no visible registers sounds like a
pie-in-the-sky design to me. You may as well design
the x64 in 1970. You can do anything on paper.
There was at least one such system in existence by 1961.

The Burroughs B5000/B5500.

By 1972, the HP-3000 was released which also
had no programmer visible registers.

Both were stack-based architectures programmed in high
level languages only (The B5500 OS was written in a
flavor of Algol, while the HP-3000 OS was written
in SPL/3000).
a***@math.uni.wroc.pl
2021-07-17 03:21:37 UTC
Permalink
Fortunately you answered this question above: you store
segment shift in memory. Still, it would be interesting
to see machine code that you want for
p++;
actual instructions, not some handwaving.
Ok, Watcom C generates a call to a function with
the expectation of particular registers, so ideally
the code needs to be in assembler, which I'm not
very good at. I'm better in C.
In addition, in huge memory model, pointers are
normalized as far as I am aware, so the offset will
always be less than 16.
Would you like to see some C code that does a p++
with the proposed arbitrary segment shifts?
No, I would like to see _assembler_ generated with support
for arbitrary segment shifts. I know how to handle
normal 4 bit shift, I know how to implement what you want.
But that is slow code. I do not know if you can produce
better code or just did not realize how bad is resulting
code.
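
For reference, the arithmetic behind a normalized huge-pointer increment with an arbitrary segment shift can be sketched in C. This is not the assembler asked for above and says nothing about how fast compiled code would be. It only shows the computation, assuming the shift (0 to 16) is supplied by the caller, for example after being read from memory as discussed. The type and function names are made up for illustration. A normalized pointer here keeps its offset below 2^shift, which is the "offset less than 16" rule when the shift is 4.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative huge pointer: 16-bit segment plus 16-bit offset. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} huge_ptr;

huge_ptr make_huge(uint16_t seg, uint16_t off)
{
    huge_ptr p;
    p.seg = seg;
    p.off = off;
    return p;
}

/* Linear address for a given segment shift. */
uint32_t huge_to_linear(huge_ptr p, unsigned shift)
{
    assert(shift <= 16);
    return ((uint32_t)p.seg << shift) + p.off;
}

/* Split a linear address so that the offset is below 2^shift. */
huge_ptr huge_normalize(uint32_t linear, unsigned shift)
{
    assert(shift <= 16);
    assert((linear >> shift) <= 0xFFFFu);   /* segment must fit in 16 bits */
    return make_huge((uint16_t)(linear >> shift),
                     (uint16_t)(linear & (((uint32_t)1 << shift) - 1u)));
}

/* p + n in bytes, returned in normalized form. */
huge_ptr huge_add(huge_ptr p, uint32_t n, unsigned shift)
{
    uint32_t linear = huge_to_linear(p, shift);
    assert(n <= 0xFFFFFFFFu - linear);      /* no 32-bit wrap-around */
    return huge_normalize(linear + n, shift);
}
```

With a shift of 4, incrementing 1000:000F gives 1001:0000. With a shift of 13, incrementing 0001:1FFF gives 0002:0000. A compiler would have to emit the equivalent of the shift, mask and add for every such increment unless it can prove that the offset does not overflow.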
I think we're probably in a semantic debate now.
If you want to call everything a VM, it doesn't affect
my design. Just a few months ago someone told
me that PDOS/386 wasn't an operating system, it
was a file manager and API or something like that.
Well, PDOS/386 is not an operating system, at best is
just part of operating system. But I know that
you really want to call it operating system, so I am
simply ready to translate from your terminology to
accepted one.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
In PM16 you do not need prefixes, so this is smallest
issue. Others are more significant.
Ok. But I'm more interested in PM32. Why not take it
to the max?
You mean why not solve more non-problems? Using PM16 may
be too easy, but it was designed for compatibility with
real mode, in particular you get 16-bit address size and
16-bit operand size.
I now know that I can avoid the prefixes, in PM32,
simply by setting the D bit appropriately in all the
segments. Why would I bother learning how to
get into PM16?
If you want to do anything (as opposed to asking other folks
to do coding for you) you need to look up descriptor
format in Intel docs. You will see that each descriptor
has type field. There are several types, but only two
are relevant here: 16-bit code descriptor and 32-bit code
descriptor. If you do far control transfer to descriptor
marked as 32-bit code, then processor switches to 32-bit
mode. If you do far control transfer to descriptor
marked as 16-bit code, then processor switches to 16-bit
mode. If you really know how to get into PM32, you
should also know how to get into PM16. Traditional sequence
to get into PM32 from real mode is as follows:

1) load GDT
2) set PM bit in CR0
3) short jump to flush prefetch queue
4) far jump to code-32 segment

After step 3 processor is in PM16 (short jump is needed to
ensure that following instructions will really work in PM16).
Step 4 is to switch mode to 32-bit.
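
As a rough illustration of the descriptors involved, here is a C sketch that packs an 80386 segment descriptor into its 8-byte form. In that format the 16-bit versus 32-bit distinction for a code segment is carried by the D (default size) bit in the flags nibble, which is the bit mentioned earlier in this exchange. The function and macro names are invented for this example. The code only builds the table entries; the actual mode switch (loading the GDT, setting the PM bit in CR0, the jumps) still needs assembler.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_FLAG_D    0x4u   /* D/B bit: 1 = 32-bit default, 0 = 16-bit  */
#define DESC_FLAG_G    0x8u   /* granularity: limit counted in 4 KiB units */
#define ACCESS_CODE_RX 0x9Au  /* present, ring 0, code, readable           */

/* Pack base, 20-bit limit, access byte and flags nibble into the
   64-bit descriptor layout used in the GDT. */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d;
    assert(limit <= 0xFFFFFu);
    assert(flags <= 0xFu);
    d  = (uint64_t)(limit & 0xFFFFu);              /* limit 15:0  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base 23:0   -> bits 16-39 */
    d |= (uint64_t)access << 40;                   /* access byte -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit 19:16 -> bits 48-51 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* flags       -> bits 52-55 */
    d |= (uint64_t)(base >> 24) << 56;             /* base 31:24  -> bits 56-63 */
    return d;
}

/* Nonzero if the descriptor has the D bit (bit 54) set. */
int descriptor_is_32bit(uint64_t d)
{
    return (int)((d >> 54) & 1u);
}
```

A flat 32-bit code segment is make_descriptor(0, 0xFFFFF, ACCESS_CODE_RX, DESC_FLAG_G | DESC_FLAG_D), which yields the familiar value 0x00CF9A000000FFFF. A 16-bit code segment with a 64 KiB limit is make_descriptor(0, 0xFFFF, ACCESS_CODE_RX, 0). A far jump to a selector for the first gives 32-bit code, and to the second gives 16-bit code.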
Most C programs are full of crap. They won't even
compile, nevermind run. They will do an unconditional
#include of the non-existent sys/types.h
Well, switch to better system/compiler which provides
sys/types.h
And programs were routinely "future-proofed" by isolating
tricky/nonportable parts in small subroutines. Everybody
accepted need to recompile for new machine and rewrites
of small non-portable parts. Later in my practice, I
if problem was expected original programmer would work
around it, but clearly it did not happen for unexpected
problems.
If you're happy to have a 16-bit executable, a 32-bit
executable, and a 64-bit executable, even for a
printf("hello, world\n"); then yes, you can solve the
problem that way.
On my current system support for running 32-bit and 16-bit
executables is not installed, so there are only 64-bit
ones. I am happy to recompile if needed. In fact, for
rarely used programs I keep only sources and no binaries,
and compile when needed. And "hello world" is more
often compiled than run.
But I would also like to have the
option of coding the hello world as 16-bit, produce
a single executable, and have it work everywhere.
There are various "solutions" to "work everywhere", but
in modern times "everywhere" does not include 16-bit
systems.
If anyone has a specific requirement for a 32-bit
version of the program the onus is on them to
recompile it.
I was surprised to wake up one day and find that
my programs in c:\dospath stopped working.
For decades I had had no reason to recompile them.
It would have been far better if I had woken up and
I could suddenly edit 512 MiB files with them. Or
whatever the application was that just allocated
memory in chunks as required. Like maybe
Turbo C++ can suddenly start building 50 MB
executables.
Dreams are nicer than reality. My modern 16-bit coding
was for machines with small memory, between 512 byte RAM
and 2K RAM (there was also flash, 512 byte RAM machine
also had 16k flash). I am not worried that old 16-bit
programs would not run on those machines and that I need
to recompile.
Post by ***@gmail.com
2. The 8086 silicon was perfect.
Here you are making things up. 8086 was a compromise, it
worked and won against several competitors. But "won" does
not mean it was better than existing competitors, and almost
surely it was possible to make better processor.
Not when the requirement is to run 8080 CP/M
programs in a memory-restricted environment.
Note that 8086 was _not_ binary compatible with 8080.
You had to re-assemble. AFAIK various Z80 based system
offered better compatibility (and they had various kludges
to go beyond 64k).
Post by ***@gmail.com
3. Intel stuffed up in the documentation by not mentioning
"by the way, assemblers should generate x'66' in front of
a mov ax, bx instruction in order to be future-proofed. If
you omit the x'66', it will still work, but we won't catch that,
and your program will break on the next generation of
processors".
- 8086 code could be run with almost no restriction for 25
years after introduction of 8086. With small restrictions
it runs up to now.
Not addressing 512 MiB of memory it doesn't.
- you miss notion of "present value" of future gain. In 1978
it was much more important to make sure that 8086 worked
well, than worry about future processors. Exactly because
of this argument 8086 was reasonable. Otherwise you would
go directly to 32 address bits (possibly 20 address bits
saying that other 12 will be implemented in the future).
- in 1985 Intel could decide that to get 32-bit addresses
(or operands) one needs a prefix (that is effectively
do not have PM32). You may debate if using mode bit
(that is introduction on PM32) was right approach to
compatibility, but adding prefixes in 1978 made no sense.
I don't know what you're talking about. I didn't ask for
money to be spent or silicon to be changed. I asked
for about 4 lines of English to be written on a piece
of paper by Intel.
Those lines have substantial cost. Intel offered an assembler,
assuming that Intel put advice in documentation, but
did not follow it in their assembler, that would encourage
other folks to ignore advice from manual. And, as explained
this advice would be counterproductive, as the same thing
can be done without prefixes.

When 386 appeared Intel explained quite well various options
for running older code.
Post by ***@gmail.com
5. Intel stuffed up in the documentation by not saying
"please don't assume 4-bit segment shifts, just because
that's what the 8086 does currently. We may come out
with an 8086+ next year which bumps that up to 6."
They did not introduce different segment shift, so no need
to put such speculation in documentation.
Ok, then YOU stuffed up. YOU should have said "hey
guys, Intel just released an 8086 processor, but I can
see they failed to tell OS developers and application
programmers to not hardcode the number 4 in their
applications - always get two shift values via an OS
call if you need to manipulate the segment. That way
we may be able to have the same executables address
4 GiB one day. Or at least more than 1 MiB".
Running "the same executables" with bigger memory was no-goal
for most folks. And folks that wanted this had their ways.
You want other folks to solve your problems, and somehow
do not notice that they solve _their_ problems quite well.
Post by ***@gmail.com
6. Microsoft stuffed up by not providing an INT 21H call
to obtain a right and left shift to adjust segments by.
Yes, I have the benefit of 20/20 hindsight, unlike them,
so it's not really fair, but that's what technically happened.
Even with hindsight you can not propose viable solution.
Why is 1 MiB maximum more viable than 512 MiB maximum?
Trying to push 16-bit code much beyond 1M is not viable.
Intel knew that and offered 32-bit path.
Had
Intel stayed with 16 bits + kludges to enlarge address space
they would almost surely lose to Motorola or RISC vendors.
Around 1985 there was 32-bit ARM.
I'm not suggesting they stayed with 16-bit kludges.
I'm suggesting that they support up to 16-bit shifts
for 16-bit code, 32-bit shifts for 32-bit code, and
64-bit shifts for 64-bit code. That's what segmentation
allows.
In other words, you propose to replace reasonable temporary
kludge by permanently using much worse variant...
Post by ***@gmail.com
I'm not sure if some mathematician/computer science
person could have sat down and calculated this in say 1970,
as a generic solution to the "address range exceeding
register range problem".
IBM had a trick in 360/20: at hardware level it was 16-bit
processor. Instruction set officially was 32-bit. But
it signaled error on any operation that could not be done
using 16-bits. In effect, program that run correctly
(not signaling error) on 360/20 would run correctly on
bigger machines. And in principle it could use more
memory, as operations that failed on 360/20 could work
on bigger machine.
That sounds like a flat 32-bit model implemented on 16-bit
hardware. That's the equivalent of segmentation with a
16-bit shift. Which defeats the purpose of 4-bit shifts.
Intel didn't use the number 4 instead of the number 16
for fun.
There was no segmentation, 360/20 was limited to 64k memory
(max that true 16-bit machine could handle). You needed
better model to get more memory. AFAIK all other models
had 32-bit registers (but there were rather severe restrictions
on max supported memory). But if you think that 32-bit
registers are too expensive and memory is cheap enough to
have more of it you could create fictional 360/27 having
say 2 or 3 register extended to 20 bits. You could then
use those registers to access 1M, with binary compatibility
with bigger machines. And with hindsight, you could correct
24-bit limitation.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
probably most
sensible way would be to go straight to 32-bit registers,
leaving old segment registers for backward compatibility.
Or maybe increase segment registers to 32 bits, with
old instructions loading only 16 bits with shift and
new one loading full segment register.
Well, this is an interesting proposition. What precisely
do you think Intel should have done and why? What
exactly are you trying to achieve?
Variant 1: extend data registers and addresses to 32-bits,
keeping segment registers as in 8086 (but without a20 wraparound).
This would be similar to "unreal" mode, except that operations
on segment registers would _not_ add limits or wraparound.
That would save cost of machinery to handle descriptors.
At first processor internally could perform 32-bit operations as
pairs of 16-bit operations (so there would be rather little
extra cost beyond added high parts of registers). Such processor
could be simpler than 286. Old 16-bit programs would
execute as before. Programs using 32-bit addresses could
use up to 4G of memory.
Variant 2: Extend segment registers to 32-bits. Add new
instructions to load 32-bit data to segment registers and
few other instructions for arithmetic. AFAICS such extension
would allow running old programs in 1 meg and new programs
could use all memory by proper manipulation of segment
registers.
Neither variant allows old 8086 programs to use more than
1 meg, you need to recompile them to use new instructions.
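For illustration only, the address calculation implied by Variant 1 can be sketched in C90. This is a reading of the description above, not anything Intel built: the function name variant1_linear is hypothetical, the segment keeps the 8086's 4-bit shift, the offset is widened to 32 bits, and nothing wraps at 1 MiB.

```c
#include <assert.h>

/* Hypothetical model of "Variant 1": 8086-style 16-bit segment register,
   32-bit offset, no wraparound at the 1 MiB boundary.
   unsigned long is at least 32 bits in C90. */
unsigned long variant1_linear(unsigned int seg, unsigned long off32)
{
    assert(seg <= 0xFFFFU);
    /* Segment contributes seg * 16, exactly as on the 8086. */
    return (((unsigned long)seg << 4) + off32) & 0xFFFFFFFFUL;
}
```

An old program that only ever forms 16-bit offsets stays within about 1 MiB plus 64 KiB, while a recompiled program passing a full 32-bit offset reaches the whole 4 GiB range with the segment left at 0.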
Ok, you're trying to solve a different problem than I am.
I see no reason for 16-bit programs to be limited to 1 MiB.
Why would any software be designed like that? Hardware,
yes. Software, no.
I see reason: true 16-bit program will use 16-bit addresses,
so will be limited to 64k. You can play some trick to use
mostly 16-bit address and utilize more memory. But it quickly
becomes tedious and does not stay within your beloved C90.
So you move to 32-bit addresses (large model). But then it
is only natural to move completely 32-bit. Extra factor
is hardware. You may prefer 16-bit hardware because it
is simpler. But if you have more than 1M memory this
gets silly: you are adding large silicon area in memory
chips and skimp on small addition to processor.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
but you may exclude surprisingly many real programs.
I'm interested in C90-compliant programs. I fully
understand that I am excluding others. But at least
there is hope.
In 1985 there were probably no real C90-programs. I mean
that some tiny toy programs could "by accident" be
C90-compliant, but it is not reasonable to expect programs
in 1985 to comply with future rules.
Most K&R C programs are C90-compliant.
I saw a lot of code that assumed that signed arithmetic is
modulo power of 2. Standard said it is not kosher.
And many similar things: K&R wrote that C gives you what
machine instructions produce, so people tested and used
inferred properties.
ANSI didn't
introduce any surprises.
It was clear that standard will cover common practice, and
declare other things as extensions. But before standard
was well advanced it was not clear what a common practice
is. Extra factor was that before standard was approved
there was no pressure to avoid non-standard features:
users of such features instead argued that they should be
in the standard.
And in modern times
proportions of _programs_ that stay within C90 is tiny
(99% of code may be C90, but 1% may perform crucial
non-portable tasks).
I'd like to get C90 working with no VM and no multitasking
to my satisfaction as a starting point.
Noting that other people added VM and multitasking
to PDOS/386 already and I added a few #defines to
try to switch it off, but wasn't fully successful.
I do not get why you are so against multitasking?
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
Under your rules
you can put segments in memory however you wish and
program should work. In particular, you can set segment
limit at what is used and pack them tightly one after
another. Your approach essentially forces gaps, when
data or routines do not fit in remaining space.
The shift amount is flexible. If you have 512 MiB of
RAM, you don't care about the gaps.
When I got 512M machine my first program used carefully
designed space-optimized data structures. It could not
run on smaller machines, 512M was the smallest that it
would fit. And due to large memory machine was more
expensive than average machines. And when I pay for
memory I do not want to waste it...
I'm not going to cry to you for your 512 MiB machine
when everyone else is stuck at 1 MiB.
At that time average was closer to 64M...
Post by ***@gmail.com
If you have 2 MiB of RAM, you may well care about
the gaps, so use a 5-bit shift. What alternative is there?
Doing 4 instead of 5 won't help you.
Use what is there: segment can have arbitrary origin (recommended
is origin divisible by 16), you can pack them tightly, move
or even swap to disc.
I don't know what you are talking about. The only
way you can address 2 MiB RAM with 16-bit
programs is to have 5-bit shifts minimum. Yes,
the packing isn't as tight - 32 bytes instead of
16 bytes, but the space wasted by packing
failure is always going to be far more compensated
by the fact that you just doubled addressable
memory.
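The arithmetic behind the 4-bit versus 5-bit comparison can be written down as a small C90 sketch. The function names and the idea of passing the shift as a parameter are assumptions made for illustration; no such API exists in MSDOS.

```c
#include <assert.h>

/* Hypothetical helper: translate seg:off to a linear address for a
   given segment shift (4 on a real 8086). */
unsigned long linear_address(unsigned int seg, unsigned int off,
                             unsigned int shift)
{
    assert(shift <= 16);
    assert(seg <= 0xFFFFU && off <= 0xFFFFU);
    return ((unsigned long)seg << shift) + off;
}

/* Distance in bytes between the starts of consecutive segments.
   This is the packing granularity: 16 bytes for shift 4, 32 for shift 5. */
unsigned long segment_granularity(unsigned int shift)
{
    assert(shift <= 16);
    return 1UL << shift;
}

/* Nominal addressable space: 65536 segment values times the granularity. */
unsigned long nominal_space(unsigned int shift)
{
    assert(shift <= 15); /* 65536 << 16 does not fit in 32 bits */
    return 65536UL << shift;
}
```

With these definitions a shift of 4 gives 1 MiB, a shift of 5 gives 2 MiB with 32-byte packing, and a shift of 13 gives the 512 MiB figure used throughout this thread.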
You are preaching segmentation, but when it comes to details
you do not know how 286/386 segmentation works. Have you
ever initialized GDT?
Another thing: you write as if you invented something. Various
If I didn't invent anything, and valid 8086 programs
exist that can run unchanged and access 512 MiB
on a suitable processor, then why are people as
recently as 48 hours ago telling me that what I want
is impossible, and giving me a link, and why didn't
you respond and tell them "it's totally possible", and
why are you asking me to give you actual code?
Details matter. You admitted that your original claim
("8086 binaries" without extra qualification meaning all
8086 binaries) is not what you intended to claim, that
you need well-behaved programs. When we sum all those
"non supported" and "must be done by compiler/linker/OS"
we get quite different claim. Do you get difference
between "there exists" and "all" and that in normal
conversation you need to explicitly say "there exists"?

Intel gave guidelines how to port programs to PM16. Unlike
you they understand that real programs have hidden assumptions
and how to remove them so that program works well in PM16.
And they do not shout how much memory PM16 could
utilize, they give you info from which you can work
out max but they understand in real program other
factors will limit you. And they understand that
"there exists" is really weak claim and that your
goal is not the goal for many developers.

Another thing is huge model: it may have some advantages
for lazy folks, but normal lazy folks would just use
32-bit mode when needed. You keep claiming how wonderful
your variation of huge mode is, but when we got
to details you basically say that you do not care
about speed for such programs. You seem still
ignore extra overhead that your mode would introduce
compared to standard huge mode.
segmentation schemes were studied and in 1985 it was well
known that technically you could use different segment
shift. But it was also known that such machine would
have no advantages, so nobody tried to make it.
I have no idea why a Taiwanese-made 8086+ couldn't
have stormed the market by addressing say 4 MiB
instead of 1 MiB. If it was because of patents, that's
not a technical question to discuss. If it was because
Gates was a jackass, not setting clear rules, that's
something that can be belatedly fixed, which is what
I am doing now. If only for my own programs.
I do not think there were problems with patents or similar.
NEC processors were 8086 with special extension. NEC
extension (Z80 compatibility) was potentially much more
attractive than messing with segments, but did not
create big demand for NEC processors. AFAICS before
1990 machine with more than 1M memory would be quite
expensive so to utilize memory you would like inside
best available processor, which points towards 386.
In fact, 286 could address big memory, so your 8086+
would have to compete with 286. Hypothetical
8086+ could be marginally cheaper than 286, but
in expensive product it would not significantly increase
sales.

You may ask why 286 with large memory did not get
more use. Part is costs/benefits thing: when
memory got reasonably cheap 386 SX was only marginally
more expensive than 286, and offered advantages of 32-bits.

Software played some role, but with appropriate compiler
you could create segmented 286 program running under
MSDOS and using all available memory. I do not
remember seeing such program in the wild (it is possible
that I simply forgot). I remember 8086 programs that
stayed within first 1M and 386 programs that could use
more memory. While not a fair sample I would guess
that 16-bit segmented model was much less popular
than 386-mode later.
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
My goal is to get something that looks like MSDOS
working on the 80386, and then maybe the 80286
and x64, and in fact, unrelated processors.
What does it mean "looks like MSDOS"? Some sources
claim that CP/M commands and later MSDOS command
were modelled after CMS. So does CMS qualify as
"looks like MSDOS"? Note that in CMS there is no
segment nonsense (it uses 24-bit address mode on 370).
System calls are via SVC and DIAGNOSE. OTOH CMS is
single tasking, once you have access to a disk you
may trash filesystem (if you really want to).
I expect to be able to connect via a VT100 and type
"dir" and see files called fred.exe or whatever, and
subdirectories and if I type "fred" it runs some
program (I don't care if the instructions are S/370,
and in fact, I won't even be able to detect that).
AFAIK this works.
I'm from the mainframe world. They are called "module"
instead of "exe", there's no "C:\" prompt.
You did not mention "C:\" earlier. I took that
"whatever" covers CMS modules and execs.
There are no
subdirectories,
You did not say how deep you want to get. PDS gives
you one level.
and there is no VT100 support.
Hmm, there is support for dumb async teletypes. AFAIK VT100
can work as dumb async teletype. Of course, using VT100
as dumb async teletype underutilizes it, but I see no
reason why it would not work. Anyway, my linux console
emulates VT100 (with extensions), telnet to hercules emulates
async connections and CMS (configured to expect dumb line
terminal) worked fine in such setup.
Post by ***@gmail.com
I then want to go "e fred.c" to run Micro-emacs to
edit my C program and then type "gcc -S -I . -o fred.s fred.c"
to convert my C program into assembler.
Well, with your gcc there is well-known size problem, but
No, this is wrong too. GCCCMS actually works perfectly
fine on the S/370. Why wouldn't it? The module is only
3 MB in size and most program source code isn't that
large that it needs the remaining 13 MiB even with
optimization on.
OK, good for you.
switching to s390 mode when needed you should be able
to work around this. Micro-emacs want char-by-char input
and mainframe devices were not designed to support this.
Traditionally that is true, but depending on your definition,
this restriction has been lifted.
AFAIK there is nothing in CMS preventing char-by-char
There is no CMS call that would enable it.
That is the least of the problems: if new device only works in
char-by-char mode there is no need for any enable call.

But you need to be able to hook new device, working differently
than "standard" ones. No, I did not look deeply into this.
But I checked and there are tables to which you can hook
your drivers. And if you really want you can add your
own system calls.
input, but you would have to provide low-level support
(and apparently you want other folks to do this).
I want other folk to do what with what?
As I wrote above. When there is some real low-level work
to do you want vendors to provide drivers/BIOS. Or you
want somebody else to write code to your PDOS.
Post by ***@gmail.com
That's enough.
BTW, have you ever tried PDOS/3X0? It runs on
the S/370 you mentioned. In AM24 as you described.
No.
Ok, well it runs micro-emacs. Not very well though,
but those problems can be overcome as far as I
know.
For me micro-emacs is a non-goal. I am more interested
when things work as intended by IBM.
--
Waldek Hebisch
muta...@gmail.com
2021-07-17 05:08:17 UTC
Permalink
Post by a***@math.uni.wroc.pl
Would you like to see some C code that does a p++
with the proposed arbitrary segment shifts?
No, I would like to see _assembler_ generated with support
for arbitrary segment shifts.
I don't understand this. We may be talking cross-purposes.
Assembler generated ... from what? A C90 program
compiled in huge memory model using Watcom C?

That generated C doesn't change one iota.

Nor does the code generated from any other memory
model.
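As a rough picture of what a p++ on a huge pointer has to do when the shift is not fixed at 4, here is a C90 sketch. It is not Watcom's generated code or runtime helper. The name huge_inc, the packing of the far pointer into one 32-bit value (segment in the high 16 bits, offset in the low 16), and the shift being passed as an argument are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical huge-pointer increment with a run-time segment shift.
   farptr holds the segment in bits 31..16 and the offset in bits 15..0. */
unsigned long huge_inc(unsigned long farptr, unsigned int shift)
{
    unsigned long seg = (farptr >> 16) & 0xFFFFUL;
    unsigned long off = farptr & 0xFFFFUL;

    assert(shift <= 16);
    off = (off + 1UL) & 0xFFFFUL;
    if (off == 0UL) {
        /* Offset wrapped: step the segment forward by 64 KiB, which is
           0x10000 >> shift segment units (0x1000 when the shift is 4,
           1 when the shift is 16). */
        seg = (seg + (0x10000UL >> shift)) & 0xFFFFUL;
    }
    return (seg << 16) | off;
}
```

The only place the shift appears is the segment step on wraparound, so a compiler that hardcodes 0x1000 there is tied to a 4-bit shift, while one that loads the step from a value supplied by the OS is not. The cost of that extra load is the overhead being argued about.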
Post by a***@math.uni.wroc.pl
I know how to handle
normal 4 bit shift, I know to implement what you want.
But that is slow code. I do not know if you can produce
better code or just did not realize how bad is resulting
code.
I have never actually produced a huge memory model
program in my life. Everything I have wanted to do,
starting with hello world and moving up, can be done
in large memory model max.

If only large memory model programs and below work
with 512 MiB of memory, so be it. It doesn't actually
affect any code I have already written.

It does affect a future program though, for which I
wish to use huge memory model, now that I know it
exists. That program is not performance-critical. I
don't care if the entire application is 3 times slower.
I bet it isn't even 1% slower though, but it will be
impossible to measure either way. At least by
observance.

Getting huge memory model to work at all with
512 MiB is icing on the cake.
Post by a***@math.uni.wroc.pl
I think we're probably in a semantic debate now.
If you want to call everything a VM, it doesn't affect
my design. Just a few months ago someone told
me that PDOS/386 wasn't an operating system, it
was a file manager and API or something like that.
Well, PDOS/386 is not an operating system, at best is
just part of operating system. But I know that
you really want to call it operating system, so I am
simply ready to translate from your terminology to
accepted one.
Wow. Ok, the last person who made that claim didn't
actually answer me when I asked him what definition
of "operating system" he was using. So maybe you
can answer.

1. Was MSDOS an operating system, and if so, why?

2. Why is PDOS/386 not an operating system?
Post by a***@math.uni.wroc.pl
I now know that I can avoid the prefixes, in PM32,
simply by setting the D bit appropriately in all the
segments. Why would I bother learning how to
get into PM16?
If you want to do anything (as opposed to asking other folks
to do coding for you)
What are you talking about? I've been working on
PDOS for 27 years. This is not a future tense
situation.
Post by a***@math.uni.wroc.pl
you need to look up descriptor
format in Intel docs. You will see that each descriptor
I already consulted that something like 24 years ago.
The knowledge is now encapsulated in public domain
code, and I no longer have any interest in it except
just now when I started thinking about PDOS/86
again.
Post by a***@math.uni.wroc.pl
has type field. There are several types, but only two
are relevant here: 16-bit code descriptor and 32-bit code
descriptor. If you do far control transfer to descriptor
marked as 32-bit code, then processor switches to 32-bit
mode. If you do far control transfer to descriptor
marked as 16-bit code, then processor switches to 16-bit
mode. If you really know how to get into PM32, you
should also know how to get into PM16.
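On the 80386 the 16-bit versus 32-bit distinction for a code segment is carried by the D bit (bit 22 of the descriptor's high doubleword), the same bit mentioned earlier in this post. The C90 sketch below packs the two halves of a descriptor; the function and macro names are made up for illustration, and loading the GDT and making the far transfer are not shown.

```c
#include <assert.h>

#define DESC_FLAG_D 0x4U /* D bit: 1 = 32-bit code, 0 = 16-bit code */
#define DESC_FLAG_G 0x8U /* G bit: limit is counted in 4 KiB units */

/* Low doubleword: base bits 15..0 in the top half, limit bits 15..0 below. */
unsigned long descriptor_low(unsigned long base, unsigned long limit)
{
    assert(limit <= 0xFFFFFUL);
    return ((base & 0xFFFFUL) << 16) | (limit & 0xFFFFUL);
}

/* High doubleword: base 31..24, flags nibble, limit 19..16,
   access byte, base 23..16. */
unsigned long descriptor_high(unsigned long base, unsigned long limit,
                              unsigned int access, unsigned int flags)
{
    assert(limit <= 0xFFFFFUL);
    assert(access <= 0xFFU && flags <= 0xFU);
    return (base & 0xFF000000UL)
         | ((unsigned long)flags << 20)
         | (limit & 0xF0000UL)
         | ((unsigned long)access << 8)
         | ((base >> 16) & 0xFFUL);
}

/* Nonzero when the descriptor selects 32-bit default operand size. */
int descriptor_is_32bit(unsigned long high)
{
    return (int)((high >> 22) & 1UL);
}
```

With access byte 0x9A (present, ring 0, readable code), passing DESC_FLAG_G | DESC_FLAG_D gives the usual flat PM32 code descriptor, and passing 0 for the flags with a 64 KiB limit gives a PM16 code descriptor.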
It's more "knew" than "know". And I only focused on
getting into PM32, not stopping to think whether I
had just passed PM16 and whether that would be
useful for anything. I wanted a flat 32-bit address
space as priority. I had very little interest in 16-bit
coding until just recently.
Post by a***@math.uni.wroc.pl
Most C programs are full of crap. They won't even
compile, nevermind run. They will do an unconditional
#include of the non-existent sys/types.h
Well, switch to better system/compiler which provides
sys/types.h
No, I will switch to better applications, that don't
include sys/types.h and actually follow the C90
rules.

I'm not expecting you to follow my route. But don't
expect me to follow yours either.
Post by a***@math.uni.wroc.pl
On my current system support for running 32-bit and 16-bit
executables is not installed, so there are only 64-bit
one. I am happy to recompile if needed. In fact, for
rarely used programs I keep only sources and no binaries,
and compile when needed. And "hello world" is more
often compiled than run.
Again, you are free to do whatever you like. Me, I want
my 16-bit programs to be clean, meaning no hardcoding
of the number 4.

I also want my S/3X0 programs to be clean, ie no
messing with the top address bit, so that they can
run as AM32.
Post by a***@math.uni.wroc.pl
But I would also like to have the
option of coding the hello world as 16-bit, produce
a single executable, and have it work everywhere.
There are various "solutions" to "work everywhere", but
in modern times "everywhere" does not include 16-bit
systems.
Not everyone shares that opinion, even in modern
times. I cater for people who like the idea of clean
16-bit 8086 programs running on both an 8086
and addressing 512 MiB on an 80386.
Post by a***@math.uni.wroc.pl
I was surprised to wake up one day and find that
my programs in c:\dospath stopped working.
For decades I had had no reason to recompile them.
It would have been far better if I had woken up and
I could suddenly edit 512 MiB files with them. Or
whatever the application was that just allocated
memory in chunks as required. Like maybe
Turbo C++ can suddenly start building 50 MB
executables.
Dreams are nicer than reality. My modern 16-bit coding
was for machines with small memory, between 512 byte RAM
and 2K RAM (there was also flash, 512 byte RAM machine
also had 16k flash). I am not worried that old 16-bit
programs would not run on those machines and that I need
to recompile.
That's fine. I don't care if you're not worried. I care, and
I'm interested in a technical discussion. I'm not trying
to get you to buy current PDOS/86 or future PDOS/86.
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
2. The 8086 silicon was perfect.
Here you are making things up. 8086 was a compromise, it
worked and won against several competitors. But "won" does
not mean it was better then existing competitors, and almost
surely it was possible to make better processor.
Not when the requirement is to run 8080 CP/M
programs in a memory-restricted environment.
Note that 8086 was _not_ binary compatible with 8080.
Yes, I didn't say they were. I said it was a requirement.
And Intel had the correct technical solution to that
requirement. It is a very clean solution.
Post by a***@math.uni.wroc.pl
I don't know what you're talking about. I didn't ask for
money to be spent or silicon to be changed. I asked
for about 4 lines of English to be written on a piece
of paper by Intel.
Those lines have substantial cost. Intel offered an assembler,
assuming that Intel put advice in documentation, but
did not follow it in their assembler, that would encourage
other folks to ignore advice from manual. And, as explained
this advice would be counterproductive, as the same thing
can be done without prefixes.
Ok, fair enough. If Intel themselves were in the assembler
business, yes, they would have needed to generate the
correct opcode sequence, which means someone
would have needed to type x'66' under my old proposal
that has since been replaced.
Post by a***@math.uni.wroc.pl
Ok, then YOU stuffed up. YOU should have said "hey
guys, Intel just released an 8086 processor, but I can
see they failed to tell OS developers and application
programmers to not hardcode the number 4 in their
applications - always get two shift values via an OS
call if you need to manipulate the segment. That way
we may be able to have the same executables address
4 GiB one day. Or at least more than 1 MiB".
Running "the same executables" with bigger memory was no-goal
for most folks. And folks that wanted this had their ways.
Ok, fine. Then *I* stuffed up, by not explaining in
the 1970s that *I* wanted clean 16-bit executables.
Mea culpa.
Post by a***@math.uni.wroc.pl
You want other folks to solve your problems, and somehow
do not notice that they solve _their_ problems quite well.
Ok, fair enough. Yes, I don't mind how other people
solve their problems. I want to have clean 16-bit
executables for my own satisfaction.

If other people don't mind waking up and seeing their
executables suddenly have a message "sorry, doesn't
work any more" instead of suddenly being able to
edit 512 MiB files, that's fine. We're in different markets.
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
6. Microsoft stuffed up by not providing an INT 21H call
to obtain a right and left shift to adjust segments by.
Yes, I have the benefit of 20/20 hindsight, unlike them,
so it's not really fair, but that's what technically happened.
Even with hindsight you can not propose viable solution.
Why is 1 MiB maximum more viable than 512 MiB maximum?
Trying to push 16-bit code much beyond 1M is not viable.
That is not true. There is nothing to push. With segmentation,
16-bit code naturally converts into a 4 GiB address space.
No work is required by application developers. There is indeed
work required from all the other components though.
Post by a***@math.uni.wroc.pl
Intel knew that and offered 32-bit path.
A 32-bit path is another option, and a perfectly fine option too.
But it requires a recompile and excludes the 8086.
Post by a***@math.uni.wroc.pl
Had
Intel stayed with 16 bits + kludges to enlarge address space
they would almost surely lose to Motorola or RISC vendors.
Around 1985 there was 32-bit ARM.
I'm not suggesting they stayed with 16-bit kludges.
I'm suggesting that they support up to 16-bit shifts
for 16-bit code, 32-bit shifts for 32-bit code, and
64-bit shifts for 64-bit code. That's what segmentation
allows.
In other words, you propose to replace reasonable temporary
kludge by permanently using much worse variant...
I don't understand:

1. Much worse by what measure? 4 GiB vs 1 MiB is not worse.
Are we talking cross-purposes?

2. 16-bit is not temporary. It still exists, even in long mode.
Post by a***@math.uni.wroc.pl
That sounds like a flat 32-bit model implemented on 16-bit
hardware. That's the equivalent of segmentation with a
16-bit shift. Which defeats the purpose of 4-bit shifts.
Intel didn't use the number 4 instead of the number 16
for fun.
There was no segmentation, 360/20 was limited to 64k memory
(max that true 16-bit machine could handle).
A 32-bit register with the top 16 bits always 0 is the
equivalent of a segment register set to 0.
Post by a***@math.uni.wroc.pl
You needed
better model to get more memory. AFAIK all other models
had 32-bit registers (but there were rather severe restrictions
on max supported memory). But if you think that 32-bit
registers are too expensive and memory is cheap enough to
have more of it you could create fictional 360/27 having
say 2 or 3 register extended to 20 bits.
That exceeds the 16-bit registers. That's not achieving
what the 8086 achieved.
Post by a***@math.uni.wroc.pl
You could then
use those registers to access 1M, with binary compatibility
with bigger machines. And with hindsight, you could correct
24-bit limitation.
There is no 24-bit limitation. That is down to individual
programmers choosing to pollute unused bits. If you
write your S/370 programs properly, they work fine,
at an unchanged binary level, on a AM32 processor like
the S/380.
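The "polluting unused bits" point can be shown with a small C90 model. It only models how many bits of an address word the hardware looks at; the function name effective_address and the idea of passing the addressing mode as a bit count are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical model: the address the machine actually uses when only
   the low addr_bits of a 32-bit address word are significant
   (24 for AM24, 32 for the AM32 mode discussed here). */
unsigned long effective_address(unsigned long word, unsigned int addr_bits)
{
    assert(addr_bits >= 1 && addr_bits <= 32);
    if (addr_bits == 32) {
        return word & 0xFFFFFFFFUL;
    }
    return word & ((1UL << addr_bits) - 1UL);
}
```

A word such as 0x80012345, where a program has parked a flag in the top byte, resolves to 0x012345 under AM24 but to a different location under AM32. A word with a clean top byte resolves to the same location in both modes, which is the property the unchanged binaries rely on.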
Post by a***@math.uni.wroc.pl
I see no reason for 16-bit programs to be limited to 1 MiB.
Why would any software be designed like that? Hardware,
yes. Software, no.
I see reason: true 16-bit program will use 16-bit addresses,
so will be limited to 64k. You can play some trick to use
mostly 16-bit address and utilize more memory. But it quickly
becomes tedious and does not stay within your beloved C90.
So you move to 32-bit addresses (large model). But then it
is only natural to move completely 32-bit.
It is natural to move completely to 256-bit too. That
doesn't mean that 32-bit doesn't have a place in life.
Nor does it mean that 16-bit doesn't have a place in
life. These are natural progressions and there is no
reason to not design them cleanly. There is nothing
magical about 1 MiB. 512 MiB is just as good. Some
would even say better. E.g. you can edit files that are
512 MiB in size instead of 1 MiB in size.
Post by a***@math.uni.wroc.pl
Extra factor
is hardware. You may prefer 16-bit hardware because it
is simpler. But if you have more than 1M memory this
gets silly: you are adding large silicon area in memory
chips and skimp on small addition to processor.
Adding 32-bit instructions to a 16-bit processor is
not a "small addition".
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
but you may exclude surprisingly many real programs.
I'm interested in C90-compliant programs. I fully
understand that I am excluding others. But at least
there is hope.
In 1985 there were probably no real C90-programs. I mean
that some tiny toy programs could "by accident" be
C90-compliant, but it is not reasonable to expect programs
in 1985 to comply with future rules.
Most K&R C programs are C90-compliant.
I saw a lot of code that assumed that signed arithmetic is
modulo power of 2. Standard said it is not kosher.
I don't actually know this level of detail.
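For readers who want the detail: in C90 an overflowing signed addition has undefined behaviour, so code that relies on wraparound modulo a power of 2 is outside the standard even if it happens to work on a given machine. A portable check, sketched here with hypothetical function names, tests the operands before adding.

```c
#include <assert.h>
#include <limits.h>

/* Returns nonzero when a + b cannot be represented in an int.
   Only comparisons and subtractions that cannot themselves overflow
   are used. */
int add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return 0;
}

/* Adds two ints, refusing (via assert) to rely on wraparound. */
int safe_add(int a, int b)
{
    assert(!add_would_overflow(a, b));
    return a + b;
}
```

Pre-standard code often wrote the sum first and then compared it with one operand to detect the wrap, which is exactly the kind of inferred machine property described above.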
Post by a***@math.uni.wroc.pl
And many similar things: K&R wrote that C gives you what
machine instructions produce, so people tested and used
inferred properties.
Ok, I don't know.
Post by a***@math.uni.wroc.pl
ANSI didn't
introduce any surprises.
It was clear that standard will cover common practice, and
declare other things as extensions. But before standard
was well advanced it was not clear what a common practice
is. Extra factor was that before standard was approved
users of such features instead argued that they should be
in the standard.
Ok. Maybe it would be better to abstract the problem and
assume that ANSI/ISO got their act together say 3 years
after K&R 1 was published. I'm more interested in the
theoretical (actually, real) problem that segmentation
was introduced to solve.
Post by a***@math.uni.wroc.pl
And in modern times
proportions of _programs_ that stay within C90 is tiny
(99% of code may be C90, but 1% may perform crucial
non-portable tasks).
I'd like to get C90 working with no VM and no multitasking
to my satisfaction as a starting point.
Noting that other people added VM and multitasking
to PDOS/386 already and I added a few #defines to
try to switch it off, but wasn't fully successful.
I do not get why you are so against multitasking?
I want to have a simple replacement for MSDOS. Just
refine it a little and make it 32-bit. When that is working
to my satisfaction we can pollute a beautiful expression
with marketing crap.
Post by a***@math.uni.wroc.pl
I don't know what you are talking about. The only
way you can address 2 MiB RAM with 16-bit
programs is to have 5-bit shifts minimum. Yes,
the packing isn't as tight - 32 bytes instead of
16 bytes, but the space wasted by packing
failure is always going to be far more compensated
by the fact that you just doubled addressable
memory.
You are preaching segmentation, but when it comes to details
you do not know how 286/386 segmentation works. Have you
ever initialized GDT?
Of course. How the hell do you think I wrote PDOS/386?
What do you think PDOS/386 even is? And who wrote it?
Post by a***@math.uni.wroc.pl
Another thing: you write as if you invented something. Various
If I didn't invent anything, and valid 8086 programs
exist that can run unchanged and access 512 MiB
on a suitable processor, then why are people as
recently as 48 hours ago telling me that what I want
is impossible, and giving me a link, and why didn't
you respond and tell them "it's totally possible", and
why are you asking me to give you actual code?
Details matter. You admitted that your original claim
("8086 binaries" without extra qualification meaning all
8086 binaries) is not what you intended to claim, that
you need well-behaved programs. When we sum all those
"non supported" and "must be done by compiler/linker/OS"
we get quite different claim. Do you get difference
between "there exists" and "all" and that in normal
conversation you need to explicitly say "there exists"?
If I worded it sloppily, sorry. I have explained the general
thrust of what I wanted over the last several years, and
the latest round is just a continuation of that. I don't
remember what I already explained and who was
listening.
Post by a***@math.uni.wroc.pl
Intel gave guidelines how to port programs to PM16. Unlike
you they understand that real programs have hidden assumptions
and how to remove them so that program works well in PM16.
And they do not shout how much memory PM16 could
utilize, they give you info from which you can work
out max but they understand in real program other
factors will limit you. And they understand that
"there exists" is really weak claim and that your
goal is not the goal for many developers.
I am not in the same market as Intel. I'm not receiving
money so I do whatever I want.

If someone pays me to hardcode the number 4, I'll
write 50 million 4s until I get RSI.
Post by a***@math.uni.wroc.pl
Another thing is huge model: it may have some advantages
for lazy folks, but normal lazy folks would just use
32-bit mode when needed.
I don't have a problem with tiny model 32-bit applications.
Almost all of my executables are this already. All the
binaries at pdos.org are this. Except for IO.SYS.
Post by a***@math.uni.wroc.pl
You keep claiming how wonderful
your variation of huge mode is, but when we got
Huge model is not wonderful at all. Any variation. It
is expensive. Which is why I have never built one in
my life.

It is my large memory model executables that will
be wonderful once the support infrastructure is in
place.
Post by a***@math.uni.wroc.pl
to details you basically say that you do not care
about speed for such programs. You still seem to
ignore the extra overhead that your mode would introduce
compared to standard huge mode.
Yes, when processors have increased in speed
1000-fold or whatever since 8086 huge memory
model was invented, and noting that Turbo C
doesn't even generate the necessary code to
support that model, I don't care.

I care that when I type "fdisk" it says "command
not found" and I care that when I woke up one
day and went "zcalc" it said "no longer supported".
Post by a***@math.uni.wroc.pl
In fact, the 286 could address big memory, so your 8086+
would have to compete with the 286. A hypothetical
8086+ could be marginally cheaper than the 286, but
in an expensive product it would not significantly increase
sales.
Ok, I'm not particularly concerned about there being
a physical 8086+ processor. 16-bit programs using
256 MiB on an 80286 and 512 MiB on an 80386 is
sufficient.
Post by a***@math.uni.wroc.pl
Software played some role, but with an appropriate compiler
you could create a segmented 286 program running under
MSDOS and using all available memory.
What do you mean? I can produce an executable that
works with 1 MiB on 8086 and more than 1 MiB on
80286? Was it restricted to large memory model so
that segments didn't need to be adjusted?
Post by a***@math.uni.wroc.pl
I do not
remember seeing such a program in the wild (it is possible
that I simply forgot). I remember 8086 programs that
stayed within the first 1M and 386 programs that could use
more memory. While not a fair sample, I would guess
that the 16-bit segmented model was much less popular
than 386 mode later.
32-bit offsets are fantastic. I'm not arguing against
them. That doesn't mean that 16-bit offsets shouldn't
be as good as they can get. Which is 4 GiB theoretical
and 512 MiB practical.
Post by a***@math.uni.wroc.pl
There are no
subdirectories,
You did not say how deep you want to get. PDS gives
you one level.
CMS doesn't have PDSes. Did you mean MVS?
Post by a***@math.uni.wroc.pl
and there is no VT100 support.
Hmm, there is support for dumb async teletypes. AFAIK VT100
can work as dumb async teletype. Of course, using VT100
I meant the escape sequences.
Post by a***@math.uni.wroc.pl
AFAIK there is nothing in CMS preventing char-by-char
There is no CMS call that would enable it.
That is the least of the problems: if the new device only works in
char-by-char mode there is no need for any enable call.
But you need to be able to hook a new device, working differently
than "standard" ones. No, I did not look deeply into this.
But I checked and there are tables to which you can hook
your drivers. And if you really want you can add your
own system calls.
At the point of writing my own system calls I am producing
a new OS, and yes, with a new OS, not CMS, you can indeed
support character mode terminals. There is no restriction
at the CCW level.
Post by a***@math.uni.wroc.pl
input, but you would have to provide low-level support
(and apparently you want other folks to do this).
I want other folk to do what with what?
As I wrote above. When there is some real low-level work
to do you want vendors to provide drivers/BIOS.
The vendors have already been doing that for decades.
It is only when they stopped doing that that I started
complaining, as they were invalidating software.

Yes, I never expected to have to get into the BIOS business.
Post by a***@math.uni.wroc.pl
Or you want somebody else to write code for your PDOS.
For decades I was the one coding, with something in the
documentation saying that I don't know proper OS design,
I need some help with the design.

After enough coding, I finally figured out a design I am
happy with.

So now I am working on the design and looking to employ
others to do the actual coding.

The progression from coder to designer is not particularly
abnormal.
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
That's enough.
BTW, have you ever tried PDOS/3X0? It runs on
the S/370 you mentioned. In AM24 as you described.
No.
Ok, well it runs micro-emacs. Not very well though,
but those problems can be overcome as far as I
know.
For me micro-emacs is a non-goal. I am more interested
when things work as intended by IBM.
Ok, you personally are not my target market then. IBM
already has you covered.

That doesn't prevent us from having a technical
discussion about alternatives to IBM though, even
if you dislike them. And even if you have your life
savings in IBM shares. Well, maybe not then.

BFN. Paul.
Rod Pemberton
2021-07-14 03:46:28 UTC
Permalink
On Tue, 13 Jul 2021 17:30:05 +0200
Post by wolfgang kern
Post by ***@gmail.com
How can it be an alias on an 8086?
would need to remove several layers of dust from my collection
of Intel books to check if it is mentioned at all.
the code 0x60..group could be alias for 0x70.. or 0x50..
similar to 0x82 which is (was until 2000) an alias for 0x80
That would be interesting to know for sure, as I've not noticed that
set of aliases mentioned anywhere.
--
The Chinese have such difficulty with English ... The word is not
"reunification" but "revenge".
wolfgang kern
2021-07-14 05:17:18 UTC
Permalink
Post by Rod Pemberton
Post by wolfgang kern
Post by ***@gmail.com
How can it be an alias on an 8086?
would need to remove several layers of dust from my collection
of Intel books to check if it is mentioned at all.
the code 0x60..group could be alias for 0x70.. or 0x50..
similar to 0x82 which is (was until 2000) an alias for 0x80
That would be interesting to know for sure, as I've not noticed that
set of aliases mentioned anywhere.
They were never documented by the vendors, but back then (1995) when I
wrote my value tracking code analyzer as part of disassembler and
debugger I tested all code variants and found _some_ alias on +486.
Sorry, I lost my handwritten notes on it due to a water disaster.
Meanwhile sandpile.org show new usage of many opcodes, but there are
still a lot doubles left undocumented especially in the 0Fxx groups.
__
wolfgang
muta...@gmail.com
2021-07-13 01:41:38 UTC
Permalink
Post by ***@gmail.com
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
I looked up Wikipedia:

https://en.wikipedia.org/wiki/Global_Descriptor_Table

and there are 8192*2 selectors available, meaning that
16-bit programs can access 1 GiB maximum.

4-bit segment shifts make sense when there is 1 MiB of
memory available.

5-bit when there is 2 MiB.

6-bit when there is 4 MiB.

So the pattern is:

C:\devel\bochs>zcalc 65536*2**14/1024/1024
Calculated Value is 1024.000000
Thank you for using the calculator

13 bit shifts are the last purposeful one, to give 512 MiB,
and after that, you may as well just use 16 bit shifts.

Given the restrictions of the 80386.
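
The arithmetic behind that pattern can be sketched as follows. This is only an illustration of the figures quoted above, not code from PDOS; the function name span_bytes is made up for the example, and the small overshoot from the last segment extending 64k past its base is ignored.

```python
MIB = 1 << 20

def span_bytes(segment_values, shift_bits):
    """Bytes covered when each of `segment_values` segment numbers
    is spaced 2**shift_bits bytes apart (hypothetical helper)."""
    return segment_values << shift_bits

# 8086-style: all 65536 segment values, varying the shift.
for shift in (4, 5, 6, 13):
    print(f"{shift:2d}-bit shift: {span_bytes(65536, shift) // MIB} MiB")

# 80386 with 8192*2 selectors, one per consecutive 64k block (16-bit shift).
print(f"16384 selectors: {span_bytes(16384, 16) // MIB} MiB")
```

Under these assumptions each extra bit of shift doubles the reachable memory, and the 16384-selector case matches the zcalc result of 1024.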

BFN. Paul.
wolfgang kern
2021-07-13 03:31:26 UTC
Permalink
Post by ***@gmail.com
Post by ***@gmail.com
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
https://en.wikipedia.org/wiki/Global_Descriptor_Table
and there are 8192*2 selectors available, meaning that
16-bit programs can access 1 GiB maximum.
4-bit segment shifts make sense when there is 1 MiB of
memory available.
you totally misunderstood how descriptors work.
1. GDT/LDT are only in effect in PM and VM
2. a GDT entry can span a 4GB memory range.
3. a descriptor can be set for either 16 or 32 bit limit.
4. there are several descriptor types [code/data/TSS].
5. the count of entries got nothing to do with accessible range.
6. segment descriptors ARE NOT extended RM-segment registers,
they work completely differently.
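
To make points 2, 3 and 5 concrete, here is a minimal sketch of how an 8-byte 80386 segment descriptor is laid out. The helper name pack_descriptor is hypothetical; the access byte values 0x92 (writable data) and 0x9A (readable code) are the usual ones for ring 0 segments, and a flags nibble of 0 is assumed to mean byte granularity with 16-bit default size.

```python
import struct

def pack_descriptor(base, limit, access, flags=0):
    """Pack one 8-byte segment descriptor (illustrative helper).

    base   : 32-bit linear base address
    limit  : 20-bit segment limit
    access : access byte (present, DPL, type)
    flags  : upper nibble of byte 6 (granularity, default size)
    """
    if not 0 <= base < (1 << 32):
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit < (1 << 20):
        raise ValueError("limit must fit in 20 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                               # limit bits 0..15
        base & 0xFFFF,                                # base bits 0..15
        (base >> 16) & 0xFF,                          # base bits 16..23
        access & 0xFF,                                # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF), # flags, limit 16..19
        (base >> 24) & 0xFF,                          # base bits 24..31
    )

# A 64k data segment starting at linear address 0x10000.
print(pack_descriptor(0x00010000, 0xFFFF, 0x92).hex())
```

The base and limit live inside the descriptor itself, which is why the position of an entry in the table says nothing about the address it covers.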
__
wolfgang
muta...@gmail.com
2021-07-13 04:13:32 UTC
Permalink
Post by wolfgang kern
Post by ***@gmail.com
Post by ***@gmail.com
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
https://en.wikipedia.org/wiki/Global_Descriptor_Table
and there are 8192*2 selectors available, meaning that
16-bit programs can access 1 GiB maximum.
4-bit segment shifts make sense when there is 1 MiB of
memory available.
you totally misunderstood how descriptors work.
1. GDT/LDT are only in effect in PM and VM
2. a GDT entry can span a 4GB memory range.
3. a descriptor can be set for either 16 or 32 bit limit.
4. there are several descriptor types [code/data/TSS].
5. the count of entries got nothing to do with accessible range.
6. segment descriptors ARE NOT extended RM-segment registers,
they work completely differently.
You misunderstand my intentions.

I will set up 16384 selectors, each pointing to consecutive
64k buffers. If the user has 1 GiB of memory, then we are
using EFFECTIVE 16-bit shifts. What that means is that
selector 0 points to the first 64k. The second selector points
to address x'10000' for a length of 64k. The third selector
points to address x'20000' for a length of 64k.

So x'10000' * 16384 will eventually get us up to 1 GiB.

If we only had 512 MiB installed we would instead use
15 bit shifts to give better granularity.

So selector 0 points to address 0. Selector 1 points to
address x'8000' for a length of 64k. Selector 2 points to
address x'10000' for a length of 64k.

x'8000' * 16384 will eventually get us up to 512 MiB.
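
The two layouts described above can be sketched like this. It only reproduces the arithmetic in this post (block number shifted left to get the base, fixed 64k length); the names block_base and layout are invented for the example, and the numbering here is by block, not by the selector value the CPU would actually load.

```python
SELECTOR_COUNT = 16384   # figure used in this post
SEGMENT_LIMIT = 0xFFFF   # every block is 64k long

def block_base(block_index, shift_bits):
    """Linear base address of the n-th block for a given effective shift."""
    return block_index << shift_bits

def layout(shift_bits, count=SELECTOR_COUNT):
    """List of (base, limit) pairs, one per block."""
    return [(block_base(i, shift_bits), SEGMENT_LIMIT) for i in range(count)]

for shift in (16, 15):
    bases = [hex(base) for base, _ in layout(shift)[:3]]
    top_mib = block_base(SELECTOR_COUNT, shift) >> 20
    print(f"{shift}-bit shift: first bases {bases}, covers {top_mib} MiB")
```

With 15-bit shifts the bases are 32k apart while each block is still 64k long, so neighbouring blocks overlap by 32k; that overlap is what gives the finer granularity.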

BFN. Paul.
wolfgang kern
2021-07-13 05:56:33 UTC
Permalink
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
Post by ***@gmail.com
24-bit addressing, and a theoretical x86 processor with
32-bit addressing. Come to think of it, it might be possible
to use an actual 80386 to do effective 16-bit segment
shifts. Or surely I can at least match the 80286 and do
(effective) 8-bit shifts. That would be a load of fun.
I guess it depends how many selectors I can define on
the 80386. I'll run everything in supervisor mode, so I
can use both GDT and LDT if that helps.
https://en.wikipedia.org/wiki/Global_Descriptor_Table
and there are 8192*2 selectors available, meaning that
16-bit programs can access 1 GiB maximum.
4-bit segment shifts make sense when there is 1 MiB of
memory available.
you totally misunderstood how descriptors work.
1. GDT/LDT are only in effect in PM and VM
2. a GDT entry can span a 4GB memory range.
3. a descriptor can be set for either 16 or 32 bit limit.
4. there are several descriptor types [code/data/TSS].
5. the count of entries got nothing to do with accessible range.
6. segment descriptors ARE NOT extended RM-segment registers,
they work completely differently.
You misunderstand my intentions.
I will set up 16384 selectors, each pointing to consecutive
64k buffers. If the user has 1 GiB of memory, then we are
using EFFECTIVE 16-bit shifts. What that means is that
selector 0 points to the first 64k. The second selector points
to address x'10000' for a length of 64k. The third selector
points to address x'20000' for a length of 64k.
So x'10000' * 16384 will eventually get us up to 1 GiB.
If we only had 512 MiB installed we would instead use
15 bit shifts to give better granularity.
So selector 0 points to address 0. Selector 1 points to
address x'8000' for a length of 64k. Selector 2 points to
address x'10000' for a length of 64k.
x'8000' * 16384 will eventually get us up to 512 MiB.
your calculation sucks :)

and any valid selector can start at any address within 4GB.
(but any means aligned to bounds here) the selector number
got nothing to do with the address it is pointing to !

selector 0 is invalid
first usable selector number is 8 which is also the offset
in the GDTable (numbers 9..15 are same but restricted PL)
next usable is 0x10 and so on

but you may need at least two selectors per 64K block, one
for code and one for data (some folks have third for stack)

So even if you fill the GDT with max possible entries:
last will be 0xFFF8, that means (65536-8)/16 = 4095 pairs

hope you realize that your GDT uses a full 64K block by itself.
my GDT solution has only 32 entries, including 6 VBIOS selectors.
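
For reference, a small sketch of how a selector value breaks down, as far as I understand the 80386 format: bits 3..15 are the table index, bit 2 picks GDT or LDT, and bits 0..1 are the requested privilege level. The function names are made up for the illustration.

```python
def decode_selector(selector):
    """Split a 16-bit selector into its three fields."""
    return {
        "index": selector >> 3,                     # entry number in the table
        "table": "LDT" if selector & 4 else "GDT",  # table indicator bit
        "rpl": selector & 3,                        # requested privilege level
    }

def make_selector(index, use_ldt=False, rpl=0):
    """Build a selector value from its fields."""
    return (index << 3) | (4 if use_ldt else 0) | (rpl & 3)

for value in (0x08, 0x0B, 0x0C, 0x10, 0xFFF8):
    print(hex(value), decode_selector(value))

# A full table holds 8192 entries of 8 bytes, which is the 64K mentioned above.
print(8192 * 8, (65536 - 8) // 16)
```

Under this reading, values 8 to 11 name GDT entry 1 at privilege levels 0 to 3, while 12 to 15 have the table bit set and name LDT entry 1. Stepping by 8 moves to the next entry, and stepping by 16 moves to the next code/data pair.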
__
wolfgang
muta...@gmail.com
2021-07-13 07:47:59 UTC
Permalink
Post by wolfgang kern
and any valid selector can start at any address within 4GB.
(but any means aligned to bounds here) the selector number
got nothing to do with the address it is pointing to !
There is no point going above 1 GB when I am
doing 16-bit programming.
Post by wolfgang kern
selector 0 is invalid
first usable selector number is 8 which is also the offset
in the GDTable (numbers 9..15 are same but restricted PL)
next usable is 0x10 and so on
Oh, ok. That's fine. I can live with that. I just need to
make the rules for huge pointers that when you
cross the 64k boundary, the segment gets incremented
by 0x10 or whatever.
Post by wolfgang kern
but you may need at least two selectors per 64K block, one
for code and one for data (some folks have third for stack)
Is there a reason I can't just have one selector for each
64k block?
Post by wolfgang kern
last will be 0xFFF8, that means (65536-8)/16 = 4095 pairs
Ok.
Post by wolfgang kern
hope you realize that your GDT uses a full 64K block by itself.
That's fine. That's peanuts for what I'm doing.

BFN. Paul.
wolfgang kern
2021-07-13 15:36:05 UTC
Permalink
On 13.07.2021 09:47, ***@gmail.com wrote:
...
Post by ***@gmail.com
Post by wolfgang kern
but you may need at least two selectors per 64K block, one
for code and one for data (some folks have third for stack)
Is there a reason I can't just have one selector for each
64k block?
you can have that, but either 64K code only or 64K data only.
but application code usually needs both within one 64 K block.
__
wolfgang
muta...@gmail.com
2021-07-13 22:04:01 UTC
Permalink
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
but you may need at least two selectors per 64K block, one
for code and one for data (some folks have third for stack)
Is there a reason I can't just have one selector for each
64k block?
you can have that, but either 64K code only or 64K data only.
but application code usually needs both within one 64 K block.
I see. That's not very nice. But given that the selectors
don't start from a neat 0 anyway, I guess pairs are OK.

The assembler generated from C code in the huge memory
model will not care. It just needs to know how much to
increment the "segment" (selector) whenever it crosses
a 64k boundary.
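
A rough sketch of that huge pointer rule, assuming non-overlapping 64k blocks (effective 16-bit shifts) and a selector stride supplied by the OS: 8 if there is one selector per block, 16 if they come in code/data pairs. The name huge_add is hypothetical; a real compiler would emit this as a few instructions rather than a call.

```python
BLOCK_SIZE = 1 << 16  # bytes reachable through one 16-bit offset

def huge_add(selector, offset, delta, selector_stride):
    """Advance a selector:offset huge pointer by `delta` bytes.

    Whenever the offset crosses a 64k boundary, the selector moves by
    `selector_stride` per block crossed (assumed to come from the OS).
    """
    blocks, new_offset = divmod(offset + delta, BLOCK_SIZE)
    return selector + blocks * selector_stride, new_offset

# Crossing one boundary with paired selectors (stride 16).
sel, off = huge_add(0x10, 0xFFF0, 0x20, 16)
print(hex(sel), hex(off))
```

The generated code never needs to know which memory the selector covers, only the stride, which is the point being made above.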

BFN. Paul.
muta...@gmail.com
2021-07-13 06:48:18 UTC
Permalink
Post by ***@gmail.com
13 bit shifts are the last purposeful one, to give 512 MiB,
and after that, you may as well just use 16 bit shifts.
Given the restrictions of the 80386.
Actually, we always want to max out the selectors. It is
the granularity that needs to give.

Until we're down to 4-bit shifts as there is no point
going below that. So we can reduce the number of
selectors if we're down to that much memory. Which
happens when we have 256k of memory.

So only if we have less than 256k of memory would
we be in a realistic position to reduce the number of
selectors.
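
One way to express that rule, as a sketch only: keep all 16384 selectors in use and pick the smallest shift that still covers installed memory, never going below 4 bits or above 16. The function names and the 16384 figure (taken from the earlier post, before any allowance for selector stride or code/data pairs) are assumptions for the example.

```python
def choose_shift(memory_bytes, selectors=16384, min_shift=4, max_shift=16):
    """Smallest shift for which `selectors` blocks cover `memory_bytes`."""
    shift = min_shift
    while shift < max_shift and (selectors << shift) < memory_bytes:
        shift += 1
    return shift

def selectors_needed(memory_bytes, shift, max_selectors=16384):
    """Selectors actually required at this shift (ceiling division)."""
    return min(max_selectors, -(-memory_bytes // (1 << shift)))

for mem in (1 << 30, 512 << 20, 256 << 10, 128 << 10):
    s = choose_shift(mem)
    print(f"{mem >> 10} KiB: {s}-bit shift, {selectors_needed(mem, s)} selectors")
```

With these numbers the selector count only drops below 16384 once memory is under 256k, which agrees with the reasoning above.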

BFN. Paul.