Discussion:
The EA jump immediately after enabling protected mode by setting PE in CR0
James Harris
2015-04-26 18:54:05 UTC
You know that, per Intel's directions, after setting the CR0 PE flag
with

mov eax, cr0
or al, 1
mov cr0, eax

we are expected to have something like

jmp seg:pmode_running

I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.

1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.
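For reference, the selector loaded by that jump packs three fields: the descriptor index in bits 15..3, the table indicator (TI, 0 = GDT) in bit 2, and the requested privilege level (RPL) in bits 1..0, which is 0 for ring-0 kernel code. The following is a minimal Python sketch of that arithmetic; `make_selector` is a hypothetical helper, not part of any toolchain.

```python
def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Compose an x86 segment selector: index in bits 15..3, TI in bit 2, RPL in bits 1..0."""
    if not 0 <= index < 8192:
        raise ValueError("descriptor index must fit in 13 bits")
    if ti not in (0, 1):
        raise ValueError("TI is a single bit (0 = GDT, 1 = LDT)")
    if not 0 <= rpl <= 3:
        raise ValueError("RPL must be 0..3")
    return (index << 3) | (ti << 2) | rpl

# GDT entry 1, TI = 0, RPL = 0 gives the familiar 0x08; entry 2 gives 0x10.
print(hex(make_selector(1)))  # 0x8
print(hex(make_selector(2)))  # 0x10
```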

I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.

3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as

EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)

where the Ss are the selector and the Os are the offset as hex bytes.

4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as

66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)

5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.

If the executing code is in the low 64k of a descriptor's space then we
can encode the simple

EA oo oo ss ss

because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.

66 EA oo oo oo oo ss ss

To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
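The encodings in points 3 and 4, and why this example rules out the short form, can be checked with a small Python sketch. It is not an assembler: `encode_jmp_far` is a hypothetical helper, and the selector 0x08 is just an assumed GDT entry 1. The 0x66 prefix is emitted whenever the requested offset size differs from the default operand size of the code that executes the jump.

```python
import struct

def encode_jmp_far(offset: int, selector: int, code_bits: int, offset_bits: int) -> bytes:
    """Encode a direct far JMP (opcode EA, ptr16:16 or ptr16:32).

    code_bits   -- default operand size of the segment executing the jump (16 or 32)
    offset_bits -- size of the offset field wanted (16 or 32)
    A 0x66 operand-size prefix is added whenever the two differ.
    """
    if code_bits not in (16, 32) or offset_bits not in (16, 32):
        raise ValueError("sizes must be 16 or 32")
    prefix = b"\x66" if code_bits != offset_bits else b""
    # struct.pack raises struct.error if the offset does not fit the chosen field
    offset_field = struct.pack("<H" if offset_bits == 16 else "<I", offset)
    return prefix + b"\xEA" + offset_field + struct.pack("<H", selector)

# Still executing as 16-bit code right after the MOV to CR0:
print(encode_jmp_far(0x2345, 0x08, 16, 16).hex(" "))   # ea 45 23 08 00
print(encode_jmp_far(0x12345, 0x08, 16, 32).hex(" "))  # 66 ea 45 23 01 00 08 00
try:
    encode_jmp_far(0x12345, 0x08, 16, 16)              # offset too big for ptr16:16
except struct.error as err:
    print("16-bit form impossible:", err)
```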

Solutions?

Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as

EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to

66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
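The arithmetic behind the two solutions can be sketched in Python as follows (helper names are hypothetical, and segment limits are ignored for brevity): the offset field is the target's linear address minus the base of the descriptor the new CS selects, and only when that difference fits in 16 bits can the 5-byte form be used from 16-bit code.

```python
def jump_offset(target_linear: int, cs_base: int) -> int:
    """Offset field of the far jump: target linear address minus the new CS base."""
    offset = target_linear - cs_base
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("target is not addressable from this segment base")
    return offset

def jump_length(target_linear: int, cs_base: int) -> int:
    """Bytes needed when jumping from 16-bit code: 5 for EA ptr16:16, 8 for 66 EA ptr16:32."""
    return 5 if jump_offset(target_linear, cs_base) <= 0xFFFF else 8

TARGET = 0x12345  # the example target from the post
for base in (0x0, 0x10000):  # flat descriptor (solution 2) vs. temporary descriptor (solution 1)
    print(f"base {base:#07x}: offset {jump_offset(TARGET, base):#x}, {jump_length(TARGET, base)} bytes")
# base 0x00000: offset 0x12345, 8 bytes
# base 0x10000: offset 0x2345, 5 bytes
```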

I should say that the above was written as I worked out what I think
was going on, so it may contain errors; I would welcome your
corrections.

Interesting subtlety, no?

Any thoughts/comments?

James
wolfgang kern
2015-04-26 21:42:21 UTC
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.
The value in a PM seg-reg is nothing other than the offset within
the GDT (the lower bits are ignored/used elsewhere, so the GDT must
be 8-byte aligned). Nothing is shifted here.
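Point 1's description ("upper 13 bits") and this one ("offset within the GDT") come to the same arithmetic: clearing the low three bits of a selector leaves index*8, which is the byte offset of the 8-byte descriptor in the table. A hypothetical Python helper, assuming TI = 0 and an illustrative GDT base of 0x0500:

```python
def descriptor_address(gdt_base: int, selector: int) -> int:
    """Linear address of the GDT descriptor a selector refers to (TI = 0 assumed)."""
    if selector & 0b100:
        raise ValueError("TI bit set: selector refers to the LDT, not the GDT")
    # Clearing TI and RPL leaves index * 8, the byte offset of the 8-byte descriptor.
    return gdt_base + (selector & 0xFFF8)

# Hypothetical GDT at 0x0500: selector 0x0B (index 1, RPL 3) -> descriptor at 0x0508.
print(hex(descriptor_address(0x0500, 0x0B)))  # 0x508
```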
Post by James Harris
2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.
?? Isn't this far jmp 'the switch point'?
Post by James Harris
I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as
66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
Right, the far jmp is 'the switch'.
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.
If the executing code is in the low 64k of a descriptor's space then we
can encode the simple
EA oo oo ss ss
because the offset can fit in 16 bits.
Yes, and it also works if the new PM-CS uses RM-CS*16 as its base;
because the offsets are then equal, it may help stupid asm-tools too.
If a boot sequence uses 0:7c00, the PM base can be flat (zero) as well.
Post by James Harris
But if the executing code is above the 64k mark relative to the
start of the segment then we need to encode the 32-bit form for
16-bit mode, i.e.
66 EA oo oo oo oo ss ss
Sure, that's the only way if it won't fit into 16 bits.
Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as
EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)
Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to
66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)
When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
I should say that the above is just as written after working out what I
think was going on and may contain errors for which I would welcome your
corrections.
Interesting subtlety, no?
Any thoughts/comments?
My OS switches back and forth between modes, so it uses both variants:
the shorter one for PM16<->RM and the 8-byte form for PM16/RM<->PM32.

__
wolfgang
James Harris
2015-04-27 08:58:56 UTC
...
Post by James Harris
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3
bits) and also to set the low bits of CS so that they contain the CPL
and TI, all of which should be zero.
the value in a PM seg-reg is nothing else than the offset within the
GDT (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
aligned). Nothing is shifted here.
Did you see somewhere that the GDT must be so aligned? For sure it is a
good idea but is it needed?
Post by James Harris
2. On the 386 any kind of jump was needed immediately following the
MOV to CR0 - even a near jump - in order to flush the prefetch queue.
On Pentium Pro and later (and maybe even on the Pentium 1) there is
no need to flush the queue but the far jump keeps things compatible
as it will flush the prefetch queue on early CPUs as well as load the
Pmode CS on all of them.
?? isn't this far jmp 'the switch point'.
Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
processing in Pmode before doing a far jump. That post-PE processing
could even include the LGDT instruction!

It seems the 486 was similar to the 386 in what was required after
setting PE. Link below, but it is a large download:

https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

...
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
Right, the far jmp is 'the switch'.
Not on the 386 or 486. Two more that you would classify as for the
museum...?

James
wolfgang kern
2015-04-27 13:20:16 UTC
Post by James Harris
...
Post by James Harris
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.
the value in a PM seg-reg is nothing else than the offset within the GDT
(lower bits are ignored/used elsewhere, so the GDT must be 8 byte
aligned). Nothing is shifted here.
Did you see somewhere that the GDT must be so aligned?
For sure it is a good idea but is it needed?
Yes! All manuals from the 286 onward seem to say so.
Yes! It will raise an exception on the first access otherwise.
Post by James Harris
Post by James Harris
2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.
?? isn't this far jmp 'the switch point'.
Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
processing in Pmode before doing a far jump. That post-PE processing
could even include the LGDT instruction!
It seems the 486 was similar to the 386 in what was required after
https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

I have all these expensive old Intel books on my shelf.
But much time has passed since I read them.
Post by James Harris
Post by James Harris
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
Right, the far jmp is 'the switch'.
Not on the 386 or 486. Two more that you would classify as for the
museum...?
:) Yes, almost; I haven't seen a working 386 in a long time,
and I never had a 386 nor any Intels after the 80286.

I would have to look at these dusty old books to confirm your statement,
but what I once tried on my old AMD 486 seemed to work the same way as
modern CPUs on this matter: strange behavior if instructions are between
CR0-wr and jmp-far because this 'intermediate mode' may not work as
expected.
Even if we know a few brave enough to make such a mode a special case,
there is no guarantee that this works on another CPU too.
__
wolfgang
James Harris
2015-04-27 14:19:25 UTC
"wolfgang kern" <***@never.at> wrote in message news:mhld4g$avp$***@speranza.aioe.org...

...
Post by wolfgang kern
Post by James Harris
so the GDT must be 8 byte aligned ....
Did you see somewhere that the GDT must be so aligned?
For sure it is a good idea but is it needed?
Yes! seems all manuals from 286 onward tell it.
Yes! it will raise an exception on the first access otherwise.
Could it have been the case for the 286 and possibly some but not all
later processors? I ask because I have found a comment in a PPro manual,
as follows: "The base addresses of the GDT should be aligned on an
eight-byte boundary to yield the best processor performance."

That seems to say that it is advisable, and hence not mandatory, on the
PPro. The manual is the PPro Family Developer's Manual Vol 3: OS
Writer's Guide.

After a search I also found "The base addresses of the GDT and IDT
should be aligned on an eight-byte boundary to maximize performance of
cache line fills." in a PPro manual vol 3.

Again, "should" not "must"!

...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
Right, the far jmp is 'the switch'.
Not on the 386 or 486. Two more that you would classify as for the
museum...?
:) Yes, almost; I haven't seen a working 386 in a long time,
and I never had a 386 nor any Intels after the 80286.
I would have to look at these dusty old books to confirm your
statement,
but what I once tried on my old AMD 486 seemed to work the same way as
modern CPUs on this matter: strange behavior if instructions are between
CR0-wr and jmp-far because this 'intermediate mode' may not work as
expected.
Even if we know a few brave enough to make such a mode a special case,
there is no guarantee that this works on another CPU too.
Yes, this is unimportant as long as we follow the standard sequence -
until someone asks why a certain piece of code is not working,
of course.

FYI

http://css.csail.mit.edu/6.858/2014/readings/i386/s10_03.htm

Here's a quote from a 486 manual dated 1990:

"Protected mode is entered by setting the PE bit in the CR0 register.
Either an LMSW or MOV CR0 instruction may be used to set this bit (the
MSW register is part of the CR0 register). Because the processor
overlaps the interpretation of several instructions, it is necessary to
discard the instructions which already have been read into the
processor. A JMP instruction immediately after the LMSW instruction
changes the flow of execution, so it has the effect of emptying the
processor of instructions which have been fetched or decoded.

"After entering protected mode, the segment registers continue to hold
the contents they had in real address mode. Software should reload all
the segment registers. Execution in protected mode begins with a CPL of
0."

So it seems that for 386 and 486 only a jmp is needed. I found a Pentium
manual which seems to say that no jump is needed for the Pentium but
should be included for compatibility with the 386 and 486.

Of course, this is just for enabling Pmode ... enabling paging has its
own stipulations for forward and backward compatibility. (The Pentium
manual is helpful here but off topic for this thread.)

James
Rod Pemberton
2015-04-27 04:13:36 UTC
On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
<***@gmail.com> wrote:

[snip]
Post by James Harris
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
Of course, mixed-mode code, i.e., using o32 (66h) and a32 (67h) in
NASM, will only work from one of the early processors onward, probably the 386.
That is, you may need the o32 override to use the 32-bit form in 16-bit code,
but that won't work on the oldest processors.
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
I'm not clear on all the details here ...

My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is updated
for PM which is done via a far jump setting the CS selector. The PM
selector determines a PM 16-bit or PM 32-bit code segment.
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct?
Seems so ...

If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.
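Rod's 07C0h example in numbers. This is a minimal Python sketch; the label offset 0x0100 and the helper names are hypothetical, and it assumes the code was assembled with ORG 0 relative to the real-mode CS.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset

def pm_offset(rm_cs: int, label_offset: int, pm_base: int) -> int:
    """Offset a far jump needs to reach rm_cs:label_offset through a PM descriptor at pm_base."""
    return linear(rm_cs, label_offset) - pm_base

LABEL = 0x0100  # hypothetical label offset as assembled (ORG 0, RM CS = 07C0h)
print(hex(LABEL))                             # 0x100: what the assembler would emit
print(hex(pm_offset(0x07C0, LABEL, 0)))       # 0x7d00: what a base-0 descriptor needs
print(hex(pm_offset(0x07C0, LABEL, 0x7C00)))  # 0x100: equal again if PM base = RM CS * 16
```

The last line is the case wolfgang described, where the PM CS base is RM CS * 16 so the assembler's offsets remain valid.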

This is probably why I've found using far returns and indirect far jumps
to be easier. You can compute the offset at runtime to match your code's
actual location relative to the base address for the descriptor.
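Whichever indirect transfer is used, the runtime-computed offset has to be stored in the layout the CPU reads. Below is a Python sketch of the memory operand for an indirect far JMP (m16:16, or m16:32 when the 0x66 prefix is used from 16-bit code) and of the stack image that a 32-bit operand-size RETF pops (EIP first, then a dword whose low 16 bits go to CS). The helper names and the selector 0x08 are assumptions for illustration.

```python
import struct

def far_ptr16(offset: int, selector: int) -> bytes:
    """m16:16 operand: offset word, then selector word."""
    return struct.pack("<HH", offset, selector)

def far_ptr32(offset: int, selector: int) -> bytes:
    """m16:32 operand (used with 0x66 from 16-bit code): offset dword, then selector word."""
    return struct.pack("<IH", offset, selector)

def retf32_stack(offset: int, selector: int) -> bytes:
    """Stack image, lowest address first, popped by a 32-bit operand-size RETF:
    EIP dword, then a dword whose low 16 bits are loaded into CS."""
    return struct.pack("<II", offset, selector)

print(far_ptr32(0x12345, 0x08).hex(" "))     # 45 23 01 00 08 00
print(retf32_stack(0x12345, 0x08).hex(" "))  # 45 23 01 00 08 00 00 00
```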
Post by James Harris
If so then we have to be
careful which jump form is encoded, as follows.
If the executing code is in the low 64k of a descriptor's space then we
can encode the simple
EA oo oo ss ss
because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.
66 EA oo oo oo oo ss ss
As noted above, overrides may only work on the 386 or later.
IIRC, you're wanting your code to work on an 8086.
Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
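Option 3 amounts to building a temporary code descriptor whose base equals RM CS * 16. This Python sketch shows the standard 8-byte descriptor layout, assuming a present, ring-0, execute/read code segment (access byte 0x9A); `code_descriptor` and the 07C0h CS value are hypothetical.

```python
import struct

def code_descriptor(base: int, limit: int, big: bool, granular: bool = False) -> bytes:
    """8-byte GDT descriptor for a present, ring-0, execute/read code segment."""
    if not 0 <= base <= 0xFFFFFFFF or not 0 <= limit <= 0xFFFFF:
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    access = 0x9A  # P=1, DPL=0, S=1, type=1010 (code, execute/read)
    flags = (0x8 if granular else 0) | (0x4 if big else 0)  # G bit, D bit
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                        # limit 15:0
        base & 0xFFFF,                         # base 15:0
        (base >> 16) & 0xFF,                   # base 23:16
        access,
        (flags << 4) | ((limit >> 16) & 0xF),  # G, D/B, L, AVL | limit 19:16
        (base >> 24) & 0xFF,                   # base 31:24
    )

# Temporary 16-bit descriptor whose base matches a hypothetical RM CS of 07C0h.
print(code_descriptor(base=0x07C0 << 4, limit=0xFFFF, big=False).hex(" "))  # ff ff 00 7c 00 9a 00 00
```

With `base=0, limit=0xFFFFF, big=True, granular=True` the same helper produces the familiar flat descriptor `ff ff 00 00 00 9a cf 00`, one possible final value; as noted above, selectors may need reloading after the base changes.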


Rod Pemberton
--
Cars kill more people than guns in the U.S.
Yet, no one is trying to take away your car.
wolfgang kern
2015-04-27 06:01:49 UTC
Rod Pemberton mentioned:

...
Post by Rod Pemberton
As noted above, overrides may only work on the 386 or later.
IIRC, you're wanting your code to work on an 8086.
Set PM on an 8086?
James would need to try much harder then :)
... now at least there is VM86.
__
wolfgang
James Harris
2015-04-27 13:02:28 UTC
Post by Rod Pemberton
On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
...
Post by Rod Pemberton
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
I'm not clear on all the details here ...
My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is updated
for PM which is done via a far jump setting the CS selector. The PM
selector determines a PM 16-bit or PM 32-bit code segment.
Those specifics seem to depend on which processor you are using. The
Pentium's requirements are different from those for 386 and 486, for
example. Fortunately Intel suggest a sequence that will work on anything
from a 386 upwards so we don't need to think about the differences.
Post by Rod Pemberton
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct?
Seems so ...
If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.
This is probably why I've found using far returns and indirect far jumps
to be easier. You can compute the offset at runtime to match your code's
actual location relative to the base address for the descriptor.
Yes, as long as you use the 32-bit version of the indirect jump - i.e.
with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
allows the dword keyword for that.)

...
Post by Rod Pemberton
Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
I like the idea of the indirect far call or, at least, an indirect far
jump because it makes the values easy to calculate. I am not so sure
about the far return. It may work in some or all cases but doesn't seem
to be guaranteed to work.

Your (3) sounds like one I suggested. In fact the whole GDT can be
temporary. In my case I would rather allocate space for the real GDT
once memory management is working so a temporary GDT is good just to get
Pmode working.

James
Rod Pemberton
2015-04-27 22:46:29 UTC
On Mon, 27 Apr 2015 09:02:28 -0400, James Harris
Post by James Harris
Post by Rod Pemberton
On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
I'm not clear on all the details here ...
My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is
updated for PM which is done via a far jump setting the CS selector.
The PM selector determines a PM 16-bit or PM 32-bit code segment.
Those specifics seem to depend on which processor you are using. The
Pentium's requirements are different from those for 386 and 486, for
example. Fortunately Intel suggest a sequence that will work on anything
from a 386 upwards so we don't need to think about the differences.
True, the manuals vary on the process needed. AIR, we've discussed some
aspects of that in the past, which were many and varied.

You deviated from my point that was in response to you saying "... the
CPU is still in 16-bit mode. Right?" AIUI, you're not fully in PM simply
by setting CR0.PE. The far jump also is required to activate PM for the
executing code segment.
Post by James Harris
Post by Rod Pemberton
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct?
Seems so ...
If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.
This is probably why I've found using far returns and indirect far
jumps to be easier. You can compute the offset at runtime to match
your code's actual location relative to the base address for the
descriptor.
Yes, as long as you use the 32-bit version of the indirect jump - i.e.
with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
allows the dword keyword for that.)
Why would you need the 32-bit version when the code is to be loaded
below 1MB? ...
Post by James Harris
Post by Rod Pemberton
Post by James Harris
To make an example, say that the code that will enable Pmode is
located in memory so that the jump target is at physical address
0x12345. If the GDT entry for privileged code has been set to describe
all of memory, i.e. from address 0 to address ff....fff, then it will
be impossible to use the 16-bit form of the EA jump instruction.
Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
I like the idea of the indirect far call or, at least, an indirect far
jump because it makes the values easy to calculate.
...
Post by James Harris
I am not so sure about the far return. It may work in some or all
cases but doesn't seem to be guaranteed to work.
Why do you say it doesn't seem to be guaranteed to work? ...
I don't recall ever seeing anything in the manuals to indicate that.
Shouldn't any control transfer which sets CS to a PM selector work?
Post by James Harris
Your (3) sounds like one I suggested. In fact the whole GDT can be
temporary. In my case I would rather allocate space for the real GDT
once memory management is working so a temporary GDT is good just to get
Pmode working.
I took your comments to mean not re-using the same descriptor,
whereas mine was to re-use the same descriptor.


Rod Pemberton
James Harris
2015-04-28 20:08:54 UTC
Post by Rod Pemberton
On Mon, 27 Apr 2015 09:02:28 -0400, James Harris
Post by James Harris
Post by Rod Pemberton
On Sun, 26 Apr 2015 14:54:05 -0400, James Harris
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
I'm not clear on all the details here ...
My understanding is that CR0 enables PM, but the code segment is still
functioning as 16-bit RM until the CS segment descriptor cache is
updated for PM which is done via a far jump setting the CS selector.
The PM selector determines a PM 16-bit or PM 32-bit code segment.
Those specifics seem to depend on which processor you are using. The
Pentium's requirements are different from those for 386 and 486, for
example. Fortunately Intel suggest a sequence that will work on anything
from a 386 upwards so we don't need to think about the differences.
True, the manuals vary on the process needed. AIR, we've discussed some
aspects of that in the past, which were many and varied.
You deviated from my point that was in response to you saying "... the
CPU is still in 16-bit mode. Right?" AIUI, you're not fully in PM simply
by setting CR0.PE. The far jump also is required to activate PM for the
executing code segment.
I didn't mean to deviate but I can see I wasn't explicit. I meant that,
perhaps, for the 386 and 486 you really are in Pmode once the prefetch
queue has drained. I know that the segment registers will not have been
loaded with Pmode selectors but the early manuals seem to say that you
are nonetheless in Pmode. For example, I expect that data accesses will
be 32-bit by default at that point.
Post by Rod Pemberton
Post by James Harris
Post by Rod Pemberton
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct?
Seems so ...
If you use the offset generated by the assembler label, it's likely to
be wrong since your PM base address for the code selector is likely to
be different from the address for the RM code segment in CS. E.g., RM
CS might be 07C0h while PM CS base address might be zero. So, the
assembler offset is relative to 07C0h, while the instruction offset
would need to be relative to zero.
This is probably why I've found using far returns and indirect far
jumps to be easier. You can compute the offset at runtime to match
your code's actual location relative to the base address for the
descriptor.
Yes, as long as you use the 32-bit version of the indirect jump - i.e.
with the 0x66 prefix - the jump target can be anywhere. (I think Nasm
allows the dword keyword for that.)
Why would you need the 32-bit version when the code is to be loaded
below 1MB? ...
Why 1MB Rod? Wouldn't you need the 32-bit version above 64k?
Specifically, I think it would be needed any time the jump target was
more than 64k beyond the new CS.

Notably, even if the calculated address is above 64k the assembler may
*silently* truncate the jump offset to 16 bits so it's something to be
aware of. Some innocuous
change elsewhere in the code could make the code bigger and thereby push
the jump target over the threshold and cause the code to fail. So it may
be a good idea to code the dword version anyway. That would make the
code more resilient against changes.
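The truncation risk is easy to see numerically. This is not a claim about any particular assembler's behaviour; the Python sketch below only shows what a 16-bit offset field does to 0x12345 if the upper bits are dropped, and a build-time check that refuses to do so. Both helper names are hypothetical.

```python
import struct

def truncating_word(value: int) -> bytes:
    """What silently dropping the upper bits does: keep only the low 16 bits."""
    return struct.pack("<H", value & 0xFFFF)

def checked_word(value: int) -> bytes:
    """Refuse to emit a 16-bit offset that cannot hold the value."""
    if not 0 <= value <= 0xFFFF:
        raise OverflowError(f"offset {value:#x} does not fit in 16 bits; use the dword form")
    return struct.pack("<H", value)

target = 0x12345  # jump target offset relative to a flat (base 0) CS
print(hex(int.from_bytes(truncating_word(target), "little")))  # 0x2345, i.e. the wrong place
try:
    checked_word(target)
except OverflowError as err:
    print(err)  # offset 0x12345 does not fit in 16 bits; use the dword form
```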
Post by Rod Pemberton
Post by James Harris
Post by Rod Pemberton
Post by James Harris
To make an example, say that the code that will enable Pmode is
located in memory so that the jump target is at physical address
0x12345. If the GDT entry for privileged code has been set to describe
all of memory, i.e. from address 0 to address ff....fff, then it will
be impossible to use the 16-bit form of the EA jump instruction.
Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
I like the idea of the indirect far call or, at least, an indirect far
jump because it makes the values easy to calculate.
...
Post by James Harris
I am not so sure about the far return. It may work in some or all
cases but doesn't seem to be guaranteed to work.
Why do you say it doesn't seem to be guaranteed to work? ...
I don't recall ever seeing anything in the manuals to indicate that.
Shouldn't any control transfer which sets CS to a PM selector work?
Because Intel say that for backwards and forwards compatibility to do X
and X includes a far jump and a far call but not a far return.

Of course, Intel is not the only manufacturer. At one time there were
many. Hopefully they work with Intel's suggested initialisation
sequence.
Post by Rod Pemberton
Post by James Harris
Your (3) sounds like one I suggested. In fact the whole GDT can be
temporary. In my case I would rather allocate space for the real GDT
once memory management is working so a temporary GDT is good just to get
Pmode working.
I took your comments to mean not re-using the same descriptor,
whereas mine was to re-use the same descriptor.
OK.

James
Rod Pemberton
2015-04-29 00:06:05 UTC
Post by Rod Pemberton
Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
Of course, the easiest solution is a PM descriptor base address of zero,
and matching RM CS equal to zero. But, CS == 0 brings up the entire
CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
over countless times now.


Rod Pemberton
--
Cars kill more people than guns in the U.S.
Yet, no one is trying to take away your car.
James Harris
2015-04-29 08:44:22 UTC
On Mon, 27 Apr 2015 00:13:36 -0400, Rod Pemberton
Post by Rod Pemberton
Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
1) use a far return
2) use an indirect far call
3) set the PM descriptor base address for CS to match the RM segment
for CS, but only temporarily. After updating the base address to
its final value, you may need to reload the selectors.
Of course, the easiest solution is a PM descriptor base address of zero,
and matching RM CS equal to zero.
You would still need the "dword" or 0x66 variant to get to 0x12345,
wouldn't you? I've come to think that having the dword variant in at all
times is probably a good idea because it is more robust, albeit that it
is three bytes longer.

Put another way, if the dword variant were the default then, when it was
omitted, someone could ask "why did you omit dword there?" and get the
answer "because we are guaranteed that the branch target will be within
64k of the Pmode CS" or similar. Having it omitted by default
(especially in sample code) leads to it not being considered and
therefore to code which is either broken or fragile.

I may also use a temporary GDT entry (or a temporary GDT) which has a
privileged CS descriptor set to match the RM CS=0 point. That makes the
transition code simpler.

Another good option is the indirect jump you mentioned. That, too,
allows the target address to be calculated.

At the end of the day this has been an interesting thread for me because
I have realised that it can be too easy simply not to think about what
that jump does. Wolfgang had sussed it out but it is something I had
never thought about in enough detail because sample code just makes it
look like a simple jump - often a jump to the next instruction. Easy,
right? No! As we have discussed, it is only that simple under certain
conditions, not always.

Another interesting finding was that the immediate jump naturally has
16-bit operands, not 32-bit ones. So in assembly source it's right to
place the "bits 32" directive *after* the jump and not before it.
But, CS == 0 brings up the entire
CS=0,IP=7C00h versus CS=07C0h,IP=0 debate, again, which we've gone
over countless times now.
AIUI that is (or was) a different issue. Although the boot sector code
(if any) will be loaded at 0x7c00 there is no guarantee that it will
move itself out of the way and load any following code there. So the
subsequent code may be somewhere completely different.

Even with CS=0 a jump to the suggested jump-target address of 0x12345
would need the dword variant wouldn't it, because of the target being
beyond the 64k point?

James
James Harris
2021-06-12 19:06:11 UTC
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.
Going back to this old thread as I have some more information.

The Sybex book Programming the 80386 says on its back cover that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't even know existed (but see below).

The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.

Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.

As for what determines whether a descriptor is 32-bit or 16-bit, remember that apart from Base and Limit a descriptor has one byte and one nibble for other fields. I thought the difference might be in the descriptor type code (which is in the byte) but it's actually in the B bit (Intel's D/B flag, called D in code-segment descriptors), which is in the nibble. See

https://en.wikipedia.org/wiki/Segment_descriptor#Structure

Basically, B=0 means 16-bit and B=1 means 32-bit.

I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.
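A minimal Python sketch of that layout may help; it is not from the thread. The helper names `make_descriptor` and `default_size` are hypothetical, and the packing follows the standard IA-32 descriptor format, in which the flags nibble (bits 52 to 55) holds G, D/B, L (reserved on IA-32) and AVL. The access byte 0x9A used below marks a present, ring-0, execute/read code segment.

```python
def make_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack an 8-byte IA-32 segment descriptor (hypothetical helper).

    flags is the high nibble of byte 6: G (8), D/B (4), L (2), AVL (1).
    """
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF):
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    if not (0 <= access <= 0xFF and 0 <= flags <= 0xF):
        raise ValueError("access is one byte and flags one nibble")
    return bytes([
        limit & 0xFF,                          # limit 7..0
        (limit >> 8) & 0xFF,                   # limit 15..8
        base & 0xFF,                           # base 7..0
        (base >> 8) & 0xFF,                    # base 15..8
        (base >> 16) & 0xFF,                   # base 23..16
        access,                                # P, DPL, S, type
        (flags << 4) | ((limit >> 16) & 0xF),  # flags nibble | limit 19..16
        (base >> 24) & 0xFF,                   # base 31..24
    ])


def default_size(desc: bytes) -> int:
    """16 or 32, read from the D/B bit (bit 6 of byte 6)."""
    return 32 if desc[6] & 0x40 else 16


FLAG_G = 0x8   # limit counts 4 KiB units
FLAG_DB = 0x4  # B=1: 32-bit default operand/address size

code32 = make_descriptor(0, 0xFFFFF, 0x9A, FLAG_G | FLAG_DB)
code16 = make_descriptor(0, 0xFFFF, 0x9A, 0)
print(code32.hex(" "), default_size(code32))  # ff ff 00 00 00 9a cf 00 32
print(code16.hex(" "), default_size(code16))  # ff ff 00 00 00 9a 00 00 16
```

Dropping FLAG_DB from the first call would give a 16-bit code segment that still spans the full 4 GiB, since only the B bit decides the default size.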
Post by James Harris
2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.
The different stages of enabling PMode are now possibly easier to guess at.

1. After enabling CR0.PE

The processor will be in 16-bit PM, but early processors (including the 386 and 486) did not automatically flush the prefetch queue, so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so, but ISTM informative to see exactly what is likely to change and when.

2. After flushing the prefetch queue

This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.

From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in

https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt

includes the following:

LGDT tGDT__pword

; switch to protected mode
MOV EAX,CR0
MOV EAX,1
MOV CR0,EAX

; clear prefetch queue
JMP SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)
MOV BX,FLAT_DES-Temp_GDT
MOV DS,BX
MOV ES,BX
MOV SS,BX

Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.

FWIW, the code also goes on to do a bunch of other stuff such as

; initialize stack pointer to some (arbitrary) RAM location
MOV ESP, OFFSET end_Temp_GDT

; copy eprom GDT to RAM
MOV ESI, DWORD PTR GDT_eprom +2 ; get base of eprom GDT
MOV EDI,GDTbase                 ; point ES:EDI to GDT base in RAM.
MOV CX,WORD PTR gdt_eprom +0    ; limit of eprom GDT
INC CX
SHR CX,1                        ; easier to move words
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;copy eprom IDT to RAM
MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
MOV EDI,IDTbase
MOV CX,WORD PTR idt_eprom +0
INC CX
SHR CX,1
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

etc, all before setting CS to a 32-bit descriptor.


3. After loading CS (via a far jump or far call, or by a TSS switch) to refer to a 32-bit descriptor.

The code will finally be in PM32.
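The botched CR0 update in the Intel listing above is easy to see if CR0 is modelled as a plain integer. This Python sketch is only an illustration: the function names are hypothetical, the starting value is made up, and real CR0 has reserved and hardwired bits that the sketch ignores. The point is that MOV EAX,1 discards whatever MOV EAX,CR0 has just read, whereas OR EAX,1 preserves it.

```python
PE = 1 << 0  # CR0.PE, Protection Enable
ET = 1 << 4  # CR0.ET, Extension Type


def intended_update(cr0: int) -> int:
    """MOV EAX,CR0 / OR EAX,1 / MOV CR0,EAX: set PE, keep every other bit."""
    eax = cr0
    eax |= 1
    return eax


def listing_update(cr0: int) -> int:
    """MOV EAX,CR0 / MOV EAX,1 / MOV CR0,EAX: the value read is overwritten."""
    eax = cr0
    eax = 1  # the earlier read of CR0 is lost here
    return eax


before = ET  # hypothetical starting value with only ET set
print(hex(intended_update(before)))  # 0x11
print(hex(listing_update(before)))   # 0x1 (ET has been cleared)
```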
Post by James Harris
I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
The above info implies that it's the /short/ (16-bit offset) form of the far jump which is required.

...
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
That turns out to be right.
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.
If the executing code is in the low 64k of a descriptor's space then we
can encode the simple
EA oo oo ss ss
because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.
66 EA oo oo oo oo ss ss
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as
EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)
Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to
66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)
When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
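The two encodings in the solutions above can be checked with a small Python sketch. It assumes code running with a 16-bit default operand size (as just after setting CR0.PE); the helper name `far_jmp_16bit_code` and the selector value 0x08 are hypothetical. The encoded offset is measured from the base of the target segment, and a 0x66 prefix is needed once that offset no longer fits in 16 bits.

```python
import struct


def far_jmp_16bit_code(target: int, seg_base: int, selector: int) -> bytes:
    """Encode a direct far JMP (opcode EA) executed in 16-bit code.

    target   -- linear address of the jump target
    seg_base -- base address of the descriptor that `selector` refers to
    The offset is target - seg_base: it is measured from the start of the
    target segment, not from the jump instruction.
    """
    offset = target - seg_base
    if offset < 0:
        raise ValueError("target lies below the segment base")
    if offset <= 0xFFFF:
        # EA oo oo ss ss
        return b"\xEA" + struct.pack("<HH", offset, selector)
    if offset <= 0xFFFFFFFF:
        # 66 EA oo oo oo oo ss ss: operand-size prefix selects a 32-bit offset
        return b"\x66\xEA" + struct.pack("<IH", offset, selector)
    raise ValueError("offset does not fit in 32 bits")


SEL = 0x08  # hypothetical GDT selector for the code descriptor

sol1 = far_jmp_16bit_code(0x12345, 0x10000, SEL)  # Solution 1: base 0x10000
sol2 = far_jmp_16bit_code(0x12345, 0x00000, SEL)  # Solution 2: base 0
print(sol1.hex(" "))  # ea 45 23 08 00
print(sol2.hex(" "))  # 66 ea 45 23 01 00 08 00
```

The sketch picks the shorter form whenever the offset fits; always emitting the 0x66 form, as argued for earlier in the thread, costs exactly the three extra bytes seen here.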
--
James Harris
James Harris
2021-06-12 19:23:35 UTC
Post by James Harris
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
...
Post by James Harris
Post by James Harris
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
The above info implies that it's the /short/ version of the jump which is required.
I found some code from AMD which confirms it. Note the two dws in the
following code fragment.

mov eax, 000000011h
mov cr0, eax

;Execute a far jump to turn protected mode on.
;code16_sel must point to the previously-established 16-bit
;code descriptor located in the GDT (for the code currently
;being executed).
db 0eah ;Far jump...
dw offset now_in_prot ; to offset...
dw code16_sel ; in current code segment


That's from the AMD64 Architecture Programmer’s Manual Volume 2: System
Programming, publication number 24593 revision 3.11 December 2005.


That's a good, clear confirmation.

BTW, it's unusual to see another CR0 bit set at the same time: 11h sets bit 4 (ET) as well as bit 0 (PE)....
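To check which CR0 bits AMD's value 000000011h sets, a small Python decoder is enough. The bit positions are the architecturally defined CR0 flags; the function name `cr0_flags` is hypothetical and reserved bits are simply not listed.

```python
# Architecturally defined CR0 bits (position -> name).
CR0_BITS = {
    0: "PE",   # Protection Enable
    1: "MP",   # Monitor Coprocessor
    2: "EM",   # Emulation
    3: "TS",   # Task Switched
    4: "ET",   # Extension Type
    5: "NE",   # Numeric Error (486+)
    16: "WP",  # Write Protect (486+)
    18: "AM",  # Alignment Mask (486+)
    29: "NW",  # Not Write-through (486+)
    30: "CD",  # Cache Disable (486+)
    31: "PG",  # Paging
}


def cr0_flags(value: int) -> list:
    """Names of the defined CR0 bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(CR0_BITS.items()) if (value >> bit) & 1]


print(cr0_flags(0x00000011))  # ['PE', 'ET']
```

So 11h sets bit 0 (PE) and bit 4 (ET); bit 1 (MP) is left clear.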
--
James Harris
James Harris
2022-02-10 14:47:52 UTC
Post by James Harris
Post by James Harris
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
...
Post by James Harris
Post by James Harris
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
The above info implies that it's the /short/ version of the jump which is required.
I found some code from AMD which confirms it.
A further point about the machine's state just after setting CR0 bit 0
and some queries which can be taken as rhetorical as they are somewhat
academic but may be of note nonetheless if you are interested in the detail.

It boils down to how instructions are decoded after setting CR0.PE.

From memory, it is advised to put a jump immediately after enabling
Pmode so as to flush the instruction pipeline on early processors 386,
486, and possibly Pentium before Pentium Pro. But is the pipeline
flushed before or after the jump is executed? IOW, is the jump itself
decoded in real mode or in PMode16 (the state the processor is put in by
the load of CR0)?

Perhaps all jump instructions assemble to the same bytes so it wouldn't
matter. Certainly a jump to the next instruction will be coded as EB00
in either 16-bit or 32-bit mode.
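Why EB 00 is mode-independent can be shown with a minimal Python sketch of the JMP rel8 encoding (the function name and addresses are hypothetical): the one-byte displacement is measured from the end of the two-byte instruction, and nothing in it depends on the default operand size.

```python
def short_jmp(addr: int, target: int) -> bytes:
    """Encode JMP rel8 (opcode EB) located at `addr` and jumping to `target`.

    The displacement is relative to the next instruction, i.e. addr + 2,
    and must fit in a signed byte.
    """
    disp = target - (addr + 2)
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, disp & 0xFF])


print(short_jmp(0x7C10, 0x7C12).hex())  # eb00 ('jmp $+2', next instruction)
print(short_jmp(0x7C10, 0x7C10).hex())  # ebfe ('jmp $', loop forever)
```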

In fact, as the assembler (Nasm, in this case but the point is
presumably generally applicable) recognises only "bits 16" or "bits 32"
and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
they do in Real mode.

But then if the encodings are the same then why would flushing the
instruction queue be necessary?

To be clear, where I believe a "bits 32" directive sits is /after/ the
far jump as in

mov cr0, eax ;Set bit 0
... (potentially a large number of instructions)
jmp seg:offset
bits 32
offset:

It seems a bit of a conundrum and leads to the obvious question: exactly
what differences are there between instruction decoding in real mode and
in PM16 (the mode immediately after setting CR0 bit 0)?

As I say, this is all largely academic but if you happen to know the
answer without doing any research do say as the details look interesting.
--
James Harris
a***@math.uni.wroc.pl
2022-02-10 18:34:48 UTC
Post by James Harris
Post by James Harris
Post by James Harris
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
...
Post by James Harris
Post by James Harris
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
The above info implies that it's the /short/ version of the jump which is required.
I found some code from AMD which confirms it.
A further point about the machine's state just after setting CR0 bit 0
and some queries which can be taken as rhetorical as they are somewhat
academic but may be of note nonetheless if you are interested in the detail.
It boils down to how instructions are decoded after setting CR0.PE.
From memory, it is advised to put a jump immediately after enabling
Pmode so as to flush the instruction pipeline on early processors 386,
486, and possibly Pentium before Pentium Pro. But is the pipeline
flushed before or after the jump is executed? IOW, is the jump itself
decoded in real mode or in PMode16 (the state the processor is put in by
the load of CR0)?
Your question is confused. Instructions are decoded in parallel with
execution of previous instructions. This may happen a few clocks
before execution. So _normally_ the jump is decoded before CR0 bit 0
is set, i.e. in real mode. But I can imagine a rather obscure
situation in which the jump is decoded after CR0 bit 0 is set, i.e. in
protected mode.
Post by James Harris
Perhaps all jump instructions assemble to the same bytes so it wouldn't
matter. Certainly a jump to the next instruction will be coded as EB00
in either 16-bit or 32-bit mode.
That is the recommended instruction, safe under all circumstances.
Post by James Harris
In fact, as the assembler (Nasm, in this case but the point is
presumably generally applicable) recognises only "bits 16" or "bits 32"
and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
they do in Real mode.
But then if the encodings are the same then why would flushing the
instruction queue be necessary?
The correspondence between mnemonics and instruction bits is the same,
so there is no need for an extra assembler directive. But the processor
treats instruction bits differently depending on mode.
Post by James Harris
To be clear, where I believe a "bits 32" directive sits is /after/ the
far jump as in
mov cr0, eax ;Set bit 0
... (potentially a large number of instructions)
jmp seg:offset
bits 32
It seems a bit of a conundrum and leads to the obvious question: exactly
what differences are there between instruction decoding in real mode and
in PM16 (the mode immediately after setting CR0 bit 0)?
Note: the "canonical" sequence has _two_ jumps: one short jump to
flush the decode queue and a long (far) jump to load CS. Namely, setting
CR0 bit 0 gives you 16-bit protected mode. You switch to 32-bit
mode by loading a 32-bit segment descriptor.

IIRC the simpler sequence failed (on an actual 386). But I did my
tests more than 25 years ago, so I am not 100% confident in my
memory. So, if you want to be sure, or want to know what happens
beyond simple fails/works, then get a real 386 and run enough tests...
OTOH I would not expect detailed explanations: during the switch the
processor is in a strange transitory mode, so it can exhibit
weird behaviour. Intel documented how to avoid the pitfalls,
but they have no interest in providing a deeper explanation.
--
Waldek Hebisch
James Harris
2022-02-11 11:41:23 UTC
...
Post by a***@math.uni.wroc.pl
Post by James Harris
From memory, it is advised to put a jump immediately after enabling
Pmode so as to flush the instruction pipeline on early processors 386,
486, and possibly Pentium before Pentium Pro. But is the pipeline
flushed before or after the jump is executed? IOW, is the jump itself
decoded in real mode or in PMode16 (the state the processor is put in by
the load of CR0)?
Your question is confused. Instructions are decoded in parallel with
execution of previous instructions. This may happen a few clocks
before execution. So _normally_ the jump is decoded before CR0 bit 0
is set, i.e. in real mode. But I can imagine a rather obscure
situation in which the jump is decoded after CR0 bit 0 is set, i.e. in
protected mode.
The point is that the encoding of relative jumps appears to be the same
in Real Mode and 16-bit Protected Mode, so why does it matter which mode
they are decoded in?
Post by a***@math.uni.wroc.pl
Post by James Harris
Perhaps all jump instructions assemble to the same bytes so it wouldn't
matter. Certainly a jump to the next instruction will be coded as EB00
in either 16-bit or 32-bit mode.
That is the recommended instruction, safe under all circumstances.
So, it seems, are other jumps. And possibly most instructions.

Any idea which instructions would be coded differently in RM and PM16?

AFAICS almost none. I am beginning to suspect it's only direct memory
accesses (not relative ones such as relative jmp and call).
Post by a***@math.uni.wroc.pl
Post by James Harris
In fact, as the assembler (Nasm, in this case but the point is
presumably generally applicable) recognises only "bits 16" or "bits 32"
and not Rmode and Pmode perhaps instructions in PM16 decode exactly as
they do in Real mode.
But then if the encodings are the same then why would flushing the
instruction queue be necessary?
The correspondence between mnemonics and instruction bits is the same,
so there is no need for an extra assembler directive. But the processor
treats instruction bits differently depending on mode.
Can you think of an example other than direct memory accesses?
Post by a***@math.uni.wroc.pl
Post by James Harris
To be clear, where I believe a "bits 32" directive sits is /after/ the
far jump as in
mov cr0, eax ;Set bit 0
... (potentially a large number of instructions)
jmp seg:offset
bits 32
It seems a bit of a conundrum and leads to the obvious question: exactly
what differences are there between instruction decoding in real mode and
in PM16 (the mode immediately after setting CR0 bit 0)?
Note: the "canonical" sequence has _two_ jumps: one short jump to
flush the decode queue and a long (far) jump to load CS. Namely, setting
CR0 bit 0 gives you 16-bit protected mode. You switch to 32-bit
mode by loading a 32-bit segment descriptor.
Yes, and one can apparently mix the two modes - some segments in PM16
and some in PM32.
Post by a***@math.uni.wroc.pl
IIRC the simpler sequence failed (on an actual 386). But I did my
tests more than 25 years ago, so I am not 100% confident in my
memory. So, if you want to be sure, or want to know what happens
beyond simple fails/works, then get a real 386 and run enough tests...
OTOH I would not expect detailed explanations: during the switch the
processor is in a strange transitory mode, so it can exhibit
weird behaviour. Intel documented how to avoid the pitfalls,
but they have no interest in providing a deeper explanation.
Indeed.
--
James Harris
wolfgang kern
2022-02-10 22:33:00 UTC
On 10/02/2022 15:47, James Harris wrote:
...
Post by James Harris
It seems a bit of a conundrum and leads to the obvious question: exactly
what differences are there between instruction decoding in real mode and
in PM16 (the mode immediately after setting CR0 bit 0)?
As I say, this is all largely academic but if you happen to know the
answer without doing any research do say as the details look interesting.
1. This EB 00 after writing CR0 was never required, at least not by me.
2. Setting PE does nothing on its own; the CPU remains in real mode until
the far jump, which changes the interpretation from segment to descriptor,
and it's 16:16 code without a prefix.

my RM->PM switches look like:
MOV eax,CR0
OR eax,1
MOV CR0,eax
push 0x20 ;prepared selectors
pop ds
push 0x20 ;20=flat data
pop es
push 0x10 ;10=restricted stack
pop ss
mov esp,0000xxxx
jmp 0018:PM16 or jmp 66 0028:PM32 or even jmp 66 0038:LM64

PM16:
...
PM32:
...
__
wolfgang
James Harris
2022-02-12 11:08:39 UTC
I think I may have come up with a clearer insight into what happens at
each step of enabling Pmode - and it's very little! See below for details.
Post by wolfgang kern
...
exactly what differences are there between instruction decoding in
real mode and in PM16 (the mode immediately after setting CR0 bit 0)?
As I say, this is all largely academic but if you happen to know the
answer without doing any research do say as the details look interesting.
1. This EB 00 after writing CR0 was never required, at least not by me.
From what I've found recently it looks as though it would be rare for
anyone to need that jump. (Though it or something like it is still right
to include to cover the unusual cases.)
Post by wolfgang kern
2. Setting PE does nothing on its own; the CPU remains in real mode until
   the far jump, which changes the interpretation from segment to descriptor,
   and it's 16:16 code without a prefix.
I am not sure that's right, Wolfgang. I am beginning to think that once
PE is set the processor will be in 16-bit Protected Mode (PM16); in that
mode the encoding of instructions will be identical to RM; and the main
differences will be when loading segment registers. There may also be
some differences when /using/ segment registers but see below.

I have been looking at the 80286 manuals to find out more about PM16. It
turns out that it has the same addressing limitations as the 8086, i.e.

BP/BX + SI/DI + displacement

and apparently the same encoding.

(Ref. 80286 and 80287 programmer's manual 1987 section 2.3.3 and table 2-1)

If PM16 has the same encoding of instructions and of addressing modes as
RM then the question arises as to what that jmp to flush the decode
queue is for?

After setting PE there will be zero or more of the following bytes
already decoded. (The number will vary according to the processor, its
caching, and the dynamic flow of prior instructions.) Those encodings
and meanings will be the same in PM16 as they were in RM for most
instructions, so the question is: where does it matter?

I can think of two potential cases:

1. /Loading/ a segment register.

When a segment register is loaded in PM16 the processor gets a memory
descriptor from a descriptor table much as it does in PM32. The
descriptor includes a base and a limit. AFAICS this is the first
relevant difference from RM. A segment register reloaded in PM16 would
not necessarily refer to the same address as it would in RM. Consider a
sequence such as

lmsw ax ;Enter protected mode
mov ds, bx

In that, if the MOV has already been decoded, then segment register DS
would be loaded with the value of BX, but if the MOV has not been
decoded, then DS would be loaded with the descriptor which BX selects.

2. Using a segment register.

The second potential scenario is /using/ a segment register. Consider

lmsw ax ;Enter protected mode
mov ax, [val]

The [val] reference would use DS so what is the code likely to do?

It is possible that a processor would trigger an interrupt to indicate
an exception because DS was not valid for PM. It could even lock up.

But there is another possibility. The 80286 and later CPUs could have
been designed to use the internal PM-type descriptors at all times, even
when running in RM. They would essentially need to make sure that when a
segment register is loaded in Real Mode that the base address, limit and
flags are set appropriately:

base = seg << 4
limit = 0xffff
flags = don't check anything

The hardware engineers could have taken either approach but I suspect
the latter, for three reasons. First, Pmode operation can subsume
Rmode operation, so the difference need only show up when segment
registers are loaded. Second, they did exactly that with the IVT (on the
286 and later the real-mode IVT is located via the IDTR, which defaults
to base 0). And third, they
would probably have had to do that for the code segment. I say that
because the CPU has to continue to retrieve bytes of code after the
switch to PM16 and as we've seen from Intel examples, such code could
continue for an arbitrary number of bytes before loading a new selector
into CS. So accesses relative to CS would /have/ to continue to use the
old interpretation of CS until the code reloaded it. It would be natural
to apply the same rules to DS and other segment registers.

IOW, after enabling Pmode the JMP instruction was required to ensure
that subsequent loads of segment registers used the Pmode interpretation
rather than the Rmode one ... and possibly nothing else.
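The difference described for `mov ds, bx` can be sketched in Python. This models only the segment base, under simplifying assumptions: the GDT is a hypothetical dict from descriptor index to base, limits, LDT selectors, privilege checks and the present bit are left out, and `ds_base_after_load` is a made-up name. It also shows how a selector splits into the 13-bit index, the TI bit and the RPL.

```python
def split_selector(sel: int) -> tuple:
    """Split a selector into (index, TI, RPL): bits 15..3, bit 2, bits 1..0."""
    return sel >> 3, (sel >> 2) & 1, sel & 3


def ds_base_after_load(value: int, pe: bool, gdt_bases: dict):
    """Base address DS would have after 'mov ds, <value>' (simplified model).

    Returns None for a null selector in protected mode: loading it into DS is
    allowed, but any later memory access through DS would fault.
    """
    if not pe:
        return value << 4  # real mode: the value is a paragraph number
    index, ti, _rpl = split_selector(value)
    if ti:
        raise NotImplementedError("LDT selectors are not modelled")
    if index == 0:
        return None
    if index not in gdt_bases:
        raise ValueError(f"selector {value:#x} is beyond the GDT (#GP)")
    return gdt_bases[index]


GDT_BASES = {1: 0x00000, 2: 0x10000}  # hypothetical descriptors 1 and 2

print(hex(ds_base_after_load(0x10, pe=False, gdt_bases=GDT_BASES)))  # 0x100
print(hex(ds_base_after_load(0x10, pe=True, gdt_bases=GDT_BASES)))   # 0x10000
```

The same value in BX gives DS a base of 0x100 if the load is interpreted as RM, but 0x10000 if it is interpreted as PM, which is exactly the ambiguity the flushing jump removes.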

Of course, it's still right to include an immediate jump for portability
and this is all speculation but ISTM that it does give a simple and
consistent model of the likely state of the processor between enabling
Pmode and reloading segment registers: Whether in Rmode or Pmode the PE
bit primarily determines what effect loading a segment register has.

In summary:

mov cr0, eax ;Enter Pmode
... nothing changed in the CPU other than the PE bit
... instructions using same encoding & addressing as in Rmode
mov ax, 8 ;Operates normally
jmp $ + 2
... still nothing else changed in the architectural state of the
... CPU except that we have ensured that the following segment
... register access will be interpreted correctly for Pmode
mov ds, ax
... DS now has base, limit and protections as loaded from GDT

That's it. Feel free to disagree.

But if I am right then it's amazing how little changes in the CPU
between each step.
--
James Harris
wolfgang kern
2022-02-12 12:01:27 UTC
On 12/02/2022 12:08, James Harris wrote:
...
Post by James Harris
Post by wolfgang kern
2. Setting PE does nothing on its own; the CPU remains in real mode until
    the far jump, which changes the interpretation from segment to descriptor,
    and it's 16:16 code without a prefix.
I am not sure that's right, Wolfgang. I am beginning to think that once
PE is set the processor will be in 16-bit Protected Mode (PM16); in that
mode the encoding of instructions will be identical to RM; and the main
differences will be when loading segment registers. There may also be
some differences when /using/ segment registers but see below.
...
Post by James Harris
  mov ds, ax
  ... DS now has base, limit and protections as loaded from GDT
That's it. Feel free to disagree.
Yeah, you're partly right :) The CPU isn't in PM unless you alter CS.
But a write to a data segment register invokes UNREAL mode.
The only thing that is different with the PE bit set is that the interpretation
changes from segment to descriptor, but only when a segreg is written to.

After setting PE you could do:
mov ds,[variable] ;the var is still a real mode DS address
;and DS became a descriptor after this.
also:
mov esp,[cs:d16] ;uses the current CS range
Post by James Harris
But if I am right then it's amazing how little changes in the CPU
between each step.
I see only one besides the final jump.

As long as CS remains untouched there are no privilege checks, so it acts
as in real mode for ALL "otherwise protected" instructions.
That's why I said the change occurs only on a write to CS.
OK, I forgot the UNREAL exception here, even though I use it a lot.
__
wolfgang
James Harris
2022-02-12 16:35:49 UTC
Post by wolfgang kern
...
Post by James Harris
Post by wolfgang kern
2. Setting PE does nothing on its own; the CPU remains in real mode until
    the far jump, which changes the interpretation from segment to descriptor,
    and it's 16:16 code without a prefix.
I am not sure that's right, Wolfgang. I am beginning to think that
once PE is set the processor will be in 16-bit Protected Mode (PM16);
in that mode the encoding of instructions will be identical to RM; and
the main differences will be when loading segment registers. There may
also be some differences when /using/ segment registers but see below.
...
Post by James Harris
   mov ds, ax
   ... DS now has base, limit and protections as loaded from GDT
That's it. Feel free to disagree.
Yeah, you're partly right :) The CPU isn't in PM unless you alter CS.
Why could it not be in PM running a 16-bit code segment?

Remember the isolated nibble in the IA32 descriptor just after
Base31::24. Its bits can be seen as

GWZA

where
G = Granularity
W = Wide (0 means a 16-bit segment - yes, even in Pmode)
Z = Zero
A = Available for system programmer to use

That bit must be zero on the 286. Even on 386+, however, with W = 0 you
would still have a 16-bit segment. The default size of addresses and
operations would be 16-bit and I gather it would operate just as Real
Mode does except when segment registers are written to.
Post by wolfgang kern
But a write to a data segment register invokes UNREAL mode.
OK.
Post by wolfgang kern
The only thing that is different with the PE bit set is that the interpretation
changes from segment to descriptor, but only when a segreg is written to.
Yes, that's what I was suggesting. And it would only matter when a
segreg is loaded.
Post by wolfgang kern
   mov ds,[variable]      ;the var is still a real mode DS address
                          ;and DS became a descriptor after this.
   mov esp,[cs:d16]       ;uses the current CS range
Post by James Harris
But if I am right then it's amazing how little changes in the CPU
between each step.
I see only one besides the final jump.
As long as CS remains untouched there are no privilege checks, so it acts
as in real mode for ALL "otherwise protected" instructions.
That's why I said the change occurs only on a write to CS.
OK, I forgot the UNREAL exception here, even though I use it a lot.
Even with no change to CS, wouldn't there be protection checks on data
accesses which have had their segment registers loaded? For example,
consider loading just ES as in

lmsw ax ;Switch to Pmode
jmp $ + 2
mov ax, 16
mov es, ax

After that I'd suggest that even though CS has not been reloaded

mov al, [es: val]

would include protection and range checks.

Furthermore, you could consider that accesses off DS would also include
checks but that the internal descriptor would have the limit set to
0xffff so nothing would be out of range.
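The idea that every access is checked against the cached limit, with a limit of 0xFFFF making any single byte at a 16-bit offset legal, can be sketched in Python. This is a simplified model, assuming an expand-up data segment and ignoring type and privilege checks; `CachedSegment`, `SegmentLimitFault` and `linear_address` are hypothetical names, and the fault stands in for #GP (or #SS for stack accesses).

```python
from dataclasses import dataclass

@dataclass
class CachedSegment:
    """Hidden part of a segment register (only base and limit modelled)."""
    base: int
    limit: int  # highest valid offset of an expand-up segment

class SegmentLimitFault(Exception):
    """Stands in for #GP (or #SS for stack-segment accesses)."""

def linear_address(seg: CachedSegment, offset: int, size: int = 1) -> int:
    # Every byte of the access must lie at or below the cached limit.
    if offset < 0 or offset + size - 1 > seg.limit:
        raise SegmentLimitFault(f"offset {offset:#x} size {size} beyond limit {seg.limit:#x}")
    return (seg.base + offset) & 0xFFFFFFFF

# Base as a real-mode load of 0x1234 would set it, with a 64K limit.
ds = CachedSegment(base=0x12340, limit=0xFFFF)
print(hex(linear_address(ds, 0xFFFF)))      # prints 0x2233f
try:
    linear_address(ds, 0xFFFF, size=2)      # a word straddling the limit
except SegmentLimitFault as e:
    print("fault:", e)
```

In this model the check is always performed, but with a 0xFFFF limit it only becomes visible for accesses that cross the end of the segment.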

Put another way, PM16 would look very like RM but it would not be RM
because loads of segment registers would have a different meaning; and
the only difference between RM and PM16 would be in how loads of segment
registers are carried out.

You say the CPU would still be operating in Rmode. I am putting forward
a suggestion that it would, in fact, be operating in Pm16 but that the
two are so similar that a lot of code would operate as though the CPU
were still in Rmode.

It makes sense! :-)
--
James Harris
wolfgang kern
2022-02-12 21:31:20 UTC
Post by James Harris
Post by wolfgang kern
...
Post by James Harris
Post by wolfgang kern
2. setting PE does nothing on its own, the CPU remain in real mode until
    the far jump which changes interpretation from segment to descriptor.
    and its a 16:16 code without prefix
I am not sure that's right, Wolfgang. I am beginning to think that
once PE is set the processor will be in 16-bit Protected Mode (PM16);
in that mode the encoding of instructions will be identical to RM;
and the main differences will be when loading segment registers.
There may also be some differences when /using/ segment registers but
see below.
...
Post by James Harris
   mov ds, ax
   ... DS now has base, limit and protections as loaded from GDT
That's it. Feel free to disagree.
Yeah you're partly right :) the CPU isn't in PM unless you alter CS.
Why could it not be in PM running a 16-bit code segment?
Remember the isolated nibble in the IA32 descriptor just after
Base31::24. Its bits can be seen as
  GWZA
where
  G = Granularity
  W = Wide (0 means a 16-bit segment - yes, even in Pmode)
  Z = Zero
  A = Available for system programmer to use
That bit must be zero on the 286. Even on 386+, however, with W = 0 you
would still have a 16-bit segment. The default size of addresses and
operations would be 16-bit and I gather it would operate just as Real
Mode does except when segment registers are written to.
I could not see any protection on data ranges while in UnReal mode.
Addresses beyond the limits just wrap around (as they do within RM).
My Unreal DS is flat 4GB, but my stack is only 64K to match RM stack.
Post by James Harris
Post by wolfgang kern
but write to a data segment register invokes UNREAL mode.
OK.
Post by wolfgang kern
the only thing which is different with a set PE-bit is interpretation
changes from segment to descriptor but only when segreg is written to.
Yes, that's what I was suggesting. And it would only matter when a
segreg is loaded.
Post by wolfgang kern
As long CS remain untouched there are no privilege checks, so it acts
like in real mode for ALL "otherwise protected" instructions.
That's why I said the change occur only on write CS.
OK I forgot the UNREAL exception here even I use that a lot.
Even with no change to CS wouldn't there be protection checks on data
accesses which have had their segment registers loaded? For example,
consider loading just ES as in
  lmsw ax   ;Switch to Pmode
  jmp $ + 2
  mov ax, 16
  mov es, ax
After that I'd suggest that even though CS has not been reloaded
  mov al, [es: val]
would include protection and range checks.
I can't confirm this but, to be honest, I never tried it intentionally.
I did find a typo (DS==0x2B instead of 0x28) that had been there for
many years without causing any access restrictions.
Post by James Harris
Furthermore, you could consider that accesses off DS would also include
checks but that the internal descriptor would have the limit set to
0xffff so nothing would be out of range.
You could set up limits smaller than 64K on Unreal data segments.
This might raise a real-mode exception, because it is still in RM :)
Post by James Harris
Put another way, PM16 would look very like RM but it would not be RM
because loads of segment registers would have a different meaning; and
the only difference between RM and PM16 would be in how loads of segment
registers are carried out.
You say the CPU would still be operating in Rmode. I am putting forward
a suggestion that it would, in fact, be operating in Pm16 but that the
two are so similar that a lot of code would operate as though the CPU
were still in Rmode.
Try forcing exceptions in Unreal mode to see which handlers react.
Post by James Harris
It makes sense! :-)
OK, some very old stuff may work differently...
__
wolfgang
James Harris
2022-02-13 15:50:40 UTC
...
Post by wolfgang kern
Post by James Harris
Why could it not be in PM running a 16-bit code segment?
...
Post by wolfgang kern
Post by James Harris
   G = Granularity
   W = Wide (0 means a 16-bit segment - yes, even in Pmode)
   Z = Zero
   A = Available for system programmer to use
That bit must be zero on the 286. Even on 386+, however, with W = 0
you would still have a 16-bit segment. The default size of addresses
and operations would be 16-bit and I gather it would operate just as
Real Mode does except when segment registers are written to.
I could not see any protection on Data-ranges while in UnReal mode.
address beyond limits just wrap around (as it does within RM).
My Unreal DS is flat 4GB, but my stack is only 64K to match RM stack.
Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
implicit stack references.

Rather than having all segments 32-bit or all segments 16-bit it is
looking more and more likely that a programmer could use any arbitrary
mix of 16-bit and 32-bit segments - even on current processors - so
having a 'big' code segment would make operands and addresses default to
32-bit while simultaneously having a 'small' stack segment would make
implicit stack references use SP rather than ESP.
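A minimal Python sketch of the stack-size point, assuming only the SS descriptor's B bit decides whether a PUSH uses ESP or SP (limit checks and the code segment's operand size are ignored). The function `push16` and the dict used as memory are hypothetical.

```python
def push16(esp: int, ss_big: bool, value: int, memory: dict, ss_base: int = 0) -> int:
    """Push a 16-bit value and return the new ESP.

    ss_big=True models an SS descriptor with B=1: all 32 bits of ESP are used.
    ss_big=False models B=0: only SP is used and ESP's upper half is untouched.
    """
    if ss_big:
        new_esp = (esp - 2) & 0xFFFFFFFF
        offset = new_esp
    else:
        sp = (esp - 2) & 0xFFFF
        new_esp = (esp & 0xFFFF0000) | sp
        offset = sp
    memory[ss_base + offset] = value & 0xFF            # little-endian: low byte first
    memory[ss_base + offset + 1] = (value >> 8) & 0xFF
    return new_esp

mem = {}
print(hex(push16(0x00010000, ss_big=False, value=0x1234, memory=mem)))  # prints 0x1fffe
print(hex(push16(0x00010000, ss_big=True, value=0x1234, memory=mem)))   # prints 0xfffe
```

In the 16-bit case the push lands at SS:FFFE even though ESP was 0x10000, because the upper half of ESP plays no part in the address.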

Further, loading CS with a selector for a 32-bit ('big') descriptor will
only affect the code segment. One or more data segments could still be
16-bit.

All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is* Protected
mode with:

1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register

and very little else.

...
Post by wolfgang kern
Post by James Harris
   lmsw ax   ;Switch to Pmode
   jmp $ + 2
   mov ax, 16
   mov es, ax
After that I'd suggest that even though CS has not been reloaded
   mov al, [es: val]
would include protection and range checks.
I can't confirm this, but to be honest I never tried by intention,
and I figured a typo with DS==0x2B instead of 0x28 after many years
w/o causing any access restrictions.
OK. Using 2B rather than 28 (and I can see why they might be confused
visually!) would set the bottom two bits which I think would give that
selector user privilege rather than supervisor privilege.
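To make the 0x28/0x2B point concrete, here is a small Python sketch of how a selector splits into index, TI and RPL, together with the protected-mode rule for loading DS/ES/FS/GS with a data segment selector: max(CPL, RPL) must not exceed the descriptor's DPL, otherwise #GP. The function names are hypothetical; the rule is the one documented for MOV to a data segment register.

```python
def decode_selector(sel: int) -> tuple[int, int, int]:
    """Split a 16-bit selector into (index, TI, RPL)."""
    return (sel >> 3) & 0x1FFF, (sel >> 2) & 1, sel & 3

def data_load_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Protected-mode privilege check for MOV DS/ES/FS/GS with a data segment."""
    return max(cpl, rpl) <= dpl

for sel in (0x28, 0x2B):
    index, ti, rpl = decode_selector(sel)
    print(hex(sel), index, ti, rpl, data_load_allowed(cpl=0, rpl=rpl, dpl=0))
# prints:
# 0x28 5 0 0 True
# 0x2b 5 0 3 False
```

Both selectors point at the same GDT entry (index 5); with a DPL 0 data descriptor there, loading 0x2B in protected mode would be expected to fault even at CPL 0, whereas a real-mode segment load makes no such check.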
Post by wolfgang kern
Post by James Harris
Furthermore, you could consider that accesses off DS would also
include checks but that the internal descriptor would have the limit
set to 0xffff so nothing would be out of range.
You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM  :)
Or PM16. :-)

Whoever wrote the Wikipedia article on Unreal mode seems to back up my
supposition. After saying that Unreal mode is not really a separate
addressing mode it says:

... the 80286 and all later x86 processors use the base address, size
and other attributes stored in their internal segment descriptor cache
whenever computing effective memory addresses, even in real mode.

https://en.wikipedia.org/wiki/Unreal_mode
--
James Harris
wolfgang kern
2022-02-13 21:14:11 UTC
On 13/02/2022 16:50, James Harris wrote:
...
Post by James Harris
Post by wolfgang kern
Post by James Harris
Why could it not be in PM running a 16-bit code segment?
At which address would you see such PM16 code?
While true RM is limited to FFFF:FFFF (aka HMA minus 16),
PM16 blocks can reside anywhere within 4GB.
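The FFFF:FFFF arithmetic can be checked with a short Python sketch of real-mode address formation (segment * 16 + offset). The `a20_enabled` switch is an assumption added to show the 8086-style wrap at 1MB, and the function name is hypothetical. It only handles 16-bit offsets, so it does not cover Unreal mode's larger limits.

```python
def real_mode_linear(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Real-mode linear address: segment * 16 + offset (both 16-bit)."""
    linear = (segment << 4) + offset
    if not a20_enabled:
        linear &= 0xFFFFF   # with A20 masked the address wraps at 1MB, as on the 8086
    return linear

top = real_mode_linear(0xFFFF, 0xFFFF)
print(hex(top))                    # prints 0x10ffef
print(top - 0x100000 + 1)          # prints 65520
print(hex(real_mode_linear(0xFFFF, 0xFFFF, a20_enabled=False)))  # prints 0xffef
```

The part above 1MB that real mode can reach is 65520 bytes, i.e. 16 bytes short of a full 64K, which is where "HMA minus 16" comes from.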
...
Post by James Harris
Post by wolfgang kern
I could not see any protection on Data-ranges while in UnReal mode.
address beyond limits just wrap around (as it does within RM).
My Unreal DS is flat 4GB, but my stack is only 64K to match RM stack.
Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
implicit stack references.
I use ESP with its upper half zeroed and a 64K limit in the HMA for both
RM and PM32, because all my code in all modes shares one single stack.
Post by James Harris
Rather than having all segments 32-bit or all segments 16-bit it is
looking more and more likely that a programmer could use any arbitrary
mix of 16-bit and 32-bit segments - even on current processors - so
having a 'big' code segment would make operands and addresses default to
32-bit while simultaneously having a 'small' stack segment would make
implicit stack references use SP rather than ESP.
I have 16 bit RM code alive beside PM16, PM32 and LM64.
*true RM for BIOS calls and fastest hardware needs.
*UnReal during startup and mode switches
*PM16 as intermediate links and one protected core
*PM32 all OS specific
*LM64 except for the switches only used for user data.
Post by James Harris
Further, loading CS with a selector for a 32-bit ('big') descriptor will
only affect the code segment. One or more data segments could still be
16-bit.
Been there :) It crashed if you left the data selectors RM-styled.
Post by James Harris
All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is* Protected
1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register
Just a matter of point of view?
....
Post by James Harris
Post by wolfgang kern
Post by James Harris
   mov al, [es: val]
would include protection and range checks.
I can't confirm this, but to be honest I never tried by intention,
and I figured a typo with DS==0x2B instead of 0x28 after many years
w/o causing any access restrictions.
OK. Using 2B rather than 28 (and I can see why they might be confused
visually!) would set the bottom two bits which I think would give that
selector user privilege rather than supervisor privilege.
Post by wolfgang kern
Post by James Harris
Furthermore, you could consider that accesses off DS would also
include checks but that the internal descriptor would have the limit
set to 0xffff so nothing would be out of range.
You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM  :)
Or PM16. :-)
:) Look at the AMD pages which list the RM<>PM instruction differences,
and then check a few of them to see whether it is in RM or PM16.
Hint: some instructions are privileged and a few aren't allowed.
Post by James Harris
Whoever wrote the Wikipedia article on Unreal mode seems to back up my
supposition. After saying that Unreal mode is not really a separate
... the 80286 and all later x86 processors use the base address, size
and other attributes stored in their internal segment descriptor cache
whenever computing effective memory addresses, even in real mode.
https://en.wikipedia.org/wiki/Unreal_mode
Sure, these internal descriptors default to RM limits.
But too many gadgeteers and wannabes have made the wiki untrustworthy,
and everything there is pretty dated anyway.
__
wolfgang
James Harris
2022-02-14 16:10:07 UTC
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Why could it not be in PM running a 16-bit code segment?
at which address would you see such PM16 code?
Within range 0 to 64k. See below.
Post by wolfgang kern
while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
PM16 blocks can reside anywhere within 4GB.
I don't think so: PM16 is limited to 64k code segments. The following is
from the 80286 documentation.

The programmer views the virtual address space on the 80286 as a
collection of up to sixteen thousand linear subspaces, each with a
specified size or length. Each of these linear address spaces is called
a segment. A segment is a logical unit of contiguous memory. Segment
sizes may range from one byte up to 64K (65,536) bytes.

...
Post by wolfgang kern
Post by James Harris
Further, loading CS with a selector for a 32-bit ('big') descriptor
will only affect the code segment. One or more data segments could
still be 16-bit.
been there :) it crashed if you leave data selectors RM styled.
It should work, AFAICS.
Post by wolfgang kern
Post by James Harris
All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is* Protected
1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register
just a point of view matter ?
To me this is more about gaining an insight into what is likely
happening inside the processor, and thereby making it easier to understand.

...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Furthermore, you could consider that accesses off DS would also
include checks but that the internal descriptor would have the limit
set to 0xffff so nothing would be out of range.
You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM  :)
Or PM16. :-)
:) look at the AMD pages which lists RM<>PM instruction differences,
and then check on a few to see if it is in RM or PM16.
hint: some instructions are privileged and a few aren't allowed.
What about RM code? ISTM that a lot of RM code could work unchanged in PM16.
--
James Harris
wolfgang kern
2022-02-16 03:01:37 UTC
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Why could it not be in PM running a 16-bit code segment?
at which address would you see such PM16 code?
Within range 0 to 64k. See below.
This would mean starting from 0000:0000,
but what value will the CS descriptor have then?
Post by James Harris
Post by wolfgang kern
while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
PM16 blocks can reside anywhere within 4GB.
I don't think so: PM16 is limited to 64k code segments. The following is
from the 80286 documentation.
The programmer views the virtual address space on the 80286 as a
collection of up to sixteen thousand linear subspaces, each with a
specified size or length. Each of these linear address spaces is called
a segment. A segment is a logical unit of contiguous memory. Segment
sizes may range from one byte up to 64K (65,536) bytes.
I too own ye olde 286 manuals,
but later documentation says:

Protected Mode.
In this mode, the processor supports virtual-memory and physical-memory
spaces of 4 Gbytes and operand sizes of 16 or 32 bits. All segment
translation, segment protection, and hardware multitasking functions are
available. System software can use segmentation to relocate effective
addresses in virtual-address space. If paging is not enabled, virtual
addresses are equal to physical addresses. Paging can be optionally
enabled to allow translation of virtual addresses to physical addresses
and to use the page-based memory-protection mechanisms.

I once tested setting the base of a PM16 code descriptor in the 3rd GB.
It worked, but there wasn't much sense in doing that.
Post by James Harris
Post by wolfgang kern
Post by James Harris
Further, loading CS with a selector for a 32-bit ('big') descriptor
will only affect the code segment. One or more data segments could
still be 16-bit.
been there :) it crashed if you leave data selectors RM styled.
It should work, AFAICS.
This BIG REAL mode (tried by a few who were brave enough) suffers a lot:
it can't use INT, IRQ and EXC by normal means, and the workaround is a PITA.
Post by James Harris
Post by wolfgang kern
Post by James Harris
All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is*
1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register
just a point of view matter ?
I see it the other way: PM came a long while after RM.
Post by James Harris
To me this is more about gaining an insight into what is likely
happening inside the processor, and thereby making it easier to understand.
But not much to learn if you're stuck with 286 ...:)
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Furthermore, you could consider that accesses off DS would also
include checks but that the internal descriptor would have the
limit set to 0xffff so nothing would be out of range.
You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM  :)
Or PM16. :-)
:) look at the AMD pages which lists RM<>PM instruction differences,
and then check on a few to see if it is in RM or PM16.
hint: some instructions are privileged and a few aren't allowed.
What about RM code? ISTM that a lot of RM code could work unchanged in PM16.
Again: look at the privileged instructions; they work in RM but may fail in PM.
__
wolfgang
James Harris
2022-02-21 07:03:57 UTC
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Why could it not be in PM running a 16-bit code segment?
at which address would you see such PM16 code?
Within range 0 to 64k. See below.
this would mean starting from 0000:0000
but what value will the CS descriptor have then ?
Your question may be rhetorical but if not then I'd say that the CS
descriptor would have the D bit (the B bit by any other name) as zero.
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
while trueRM is limited to FFFF:FFFF (aka HMA minus 16),
PM16 blocks can reside anywhere within 4GB.
I don't think so: PM16 is limited to 64k code segments. The following
is from the 80286 documentation.
The programmer views the virtual address space on the 80286 as a
collection of up to sixteen thousand linear subspaces, each with a
specified size or length. Each of these linear address spaces is
called a segment. A segment is a logical unit of contiguous memory.
Segment sizes may range from one byte up to 64K (65,536) bytes.
me too owe ye olde 286 manuals.
If you want to find definitions of ye olde PM16 then ye olde manuals are
the place to look. :)
Post by wolfgang kern
Protected Mode.
In this mode, the processor supports virtual-memory and physical-memory
spaces of 4 Gbytes and operand sizes of 16 or 32 bits. All segment
translation, segment protection, and hardware multitasking functions are
available. System software can use segmentation to relocate effective
addresses in virtual-address space. If paging is not enabled, virtual
addresses are equal to physical addresses. Paging can be optionally
enabled to allow translation of virtual addresses to physical addresses
and to use the page-based memory-protection mechanisms.
I can't see the relevance of that. If it's not clear, the point in what
I posted from the 80286 manual was:

"Segment sizes may range from one byte up to 64K (65,536) bytes."

IOW PM16 segments cannot be larger than 64k so when you say that PM16
blocks can be anywhere in 4GB then I don't think that can be right. It
is true of what's called Unreal Mode (386 and above) but not of PM16
(286 and above).

If you mean PM16 "segments" then I think that still cannot be right.
PM16 is limited to 24-bit addresses (16MB).
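To put numbers on the two positions, here is a minimal Python sketch of linear-address formation for a 16-bit offset into a segment. It assumes the 286 holds a 24-bit base and has 24 address lines (so sums wrap at 16MB), while a 386 or later holds a 32-bit base even when the descriptor's D/B bit is 0. The function is hypothetical and ignores limits and paging.

```python
def linear_16bit_segment(base: int, offset: int, cpu: str) -> int:
    """Linear address for a 16-bit offset into a segment with the given base."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit offsets")
    if cpu == "286":
        if base > 0xFFFFFF:
            raise ValueError("a 286 descriptor holds only a 24-bit base")
        return (base + offset) & 0xFFFFFF    # assumed wrap on 24 address lines
    if cpu == "386":
        return (base + offset) & 0xFFFFFFFF  # 32-bit base even when D/B = 0
    raise ValueError(f"unknown cpu model: {cpu}")

print(hex(linear_16bit_segment(0xFF0000, 0xFFFF, "286")))     # prints 0xffffff
print(hex(linear_16bit_segment(0x80000000, 0x0100, "386")))   # prints 0x80000100
```

On this model a 286 cannot place a PM16 segment above 16MB, while a 386-style descriptor with D/B = 0 can still have its base in the third GB.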

...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
All this would make Real mode little more than a subset of Protected
mode. Or, put another way, one could say that Real mode *is*
1. certain values in the segment descriptors
2. different rules as to what it means to load a segment register
just a point of view matter ?
I see it the other way: PM came a long while after RM.
Post by James Harris
To me this is more about gaining an insight into what is likely
happening inside the processor, and thereby making it easier to understand.
But not much to learn if you're stuck with 286 ...:)
Well, don't modern CPUs still support the 80286's PM16, and isn't PM16
the mode that they are switched in to when CR0.PE is set before any
segment register is loaded?
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Furthermore, you could consider that accesses off DS would also
include checks but that the internal descriptor would have the
limit set to 0xffff so nothing would be out of range.
You could setup smaller than 64K limits on Unreal Data segments.
this might raise a real mode exception because still in RM  :)
Or PM16. :-)
:) look at the AMD pages which lists RM<>PM instruction differences,
and then check on a few to see if it is in RM or PM16.
hint: some instructions are privileged and a few aren't allowed.
What about RM code? ISTM that a lot of RM code could work unchanged in PM16.
again: look at privileged instructions, work in RM but may fail in PM.
RM would have to work very hard to have privileged instructions. ;-)

Everything I've found recently suggests that after CR0.PE is set (and
the prefetch queue flushed if necessary) the processor will be in
Protected Mode whereas you, IIUC, are sure it would remain in Real Mode
until CS is reloaded.

If so, can you think of an instruction or sequence which would
distinguish between the two?
--
James Harris
wolfgang kern
2022-02-21 14:33:55 UTC
On 21/02/2022 08:03, James Harris wrote:
...
Post by James Harris
Post by James Harris
What about RM code? ISTM that a lot of RM code could work unchanged
in PM16.
again: look at privileged instructions, work in RM but may fail in PM.
RM would have to work very hard to have privileged instructions. ;-)
Everything I've found recently suggests that after CR0.PE is set (and
the prefetch queue flushed if necessary) the processor will be in
Protected Mode whereas you, IIUC, are sure it would remain in Real Mode
until CS is reloaded.
If so, can you think of an instruction or sequence which would
distinguish between the two?
Give me some time to check in detail; I'll be back on this soon.
__
wolfgang
wolfgang kern
2022-02-21 17:18:17 UTC
Post by wolfgang kern
Post by James Harris
If so, can you think of an instruction or sequence which would
distinguish between the two?
give me some time to check in deep detail, I'll be back on this soon.
Here we go:
This works in RM as long as it points to a far RETURN
(you can test this yourself):
9A xxxx 0000 call far 0000:xxxx ;raises an exception if PM

PM only (these raise an invalid-opcode exception in RM):
63 xx ARPL (seems obsolete)
0F 00 /1 STR an easy test possibility ie:
0F 00 c8 STR ax |eax |rax
0F 00 08 xx STR [mem16] ; all except RM

0F 00 /0 SLDT r16/r32/r64/m16
0F 00 /2 LLDT rm16

0F 00 /3 LTR
0F 02 /r LAR
0F 03 /r,[m] LSL

These work in Unreal mode once correctly loaded:
0F 00 /4 VERR
0F 00 /5 VERW
__
wolfgang
James Harris
2022-02-21 18:35:22 UTC
Post by wolfgang kern
Post by wolfgang kern
Post by James Harris
If so, can you think of an instruction or sequence which would
distinguish between the two?
give me some time to check in deep detail, I'll be back on this soon.
you can test this yourself
9A xxxx 0000 call far 0000:xxxx ;raise exception if PM
63 xx        ARPL  (seems obsolete)
0F 00 c8     STR ax |eax |rax
0F 00 08 xx  STR [mem16] ; all except RM
0F 00 /0     SLDT r16/r32/r64/m16
0F 00 /2     LLDT rm16
0F 00 /3     LTR
0F 02 /r     LAR
0F 03 /r,[m] LSL
these work in UNreal after correct loaded
0F 00 /4     VERR
0F 00 /5     VERW
Sorry, I wasn't clear enough. What I was thinking about was a legitimate
instruction or sequence thereof which would work differently in RM and
PM16. ARPL and similar are not RM instructions.

For example, here's an instruction which is legitimate in both modes but
executes differently:

mov ds, ax

In PM16 it loads the hidden part of DS from the appropriate descriptor
table. In RM it doesn't.

What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_ CS is
reloaded.

AISI the different behaviour of MOV DS, AX shows that the processor /is/
in Protected Mode, not Real Mode, even if it's PM16 rather than PM32.

IIRC you thought the processor would still be in RM. If I'm wrong,
therefore, perhaps you can think of another instruction (a legitimate
one) or sequence which could be executed after setting CR0.PE but before
loading CS which would tell whether the CPU was in RM or PM.
--
James Harris
wolfgang kern
2022-02-21 20:42:03 UTC
Post by James Harris
Post by wolfgang kern
Post by wolfgang kern
Post by James Harris
If so, can you think of an instruction or sequence which would
distinguish between the two?
give me some time to check in deep detail, I'll be back on this soon.
you can test this yourself
9A xxxx 0000 call far 0000:xxxx ;raise exception if PM
63 xx        ARPL  (seems obsolete)
0F 00 c8     STR ax |eax |rax
0F 00 08 xx  STR [mem16] ; all except RM
0F 00 /0     SLDT r16/r32/r64/m16
0F 00 /2     LLDT rm16
0F 00 /3     LTR
0F 02 /r     LAR
0F 03 /r,[m] LSL
these work in UNreal after correct loaded
0F 00 /4     VERR
0F 00 /5     VERW
Sorry, I wasn't clear enough. What I was thinking about was a legitimate
instruction or sequence thereof which would work differently in RM and
PM16. ARPL and similar are not RM instructions.
For example, here's an instruction which is legitimate in both modes but
  mov ds, ax
In PM16 it loads the hidden part of DS from the appropriate descriptor
table. In RM it doesn't.
What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_ CS is
reloaded.
AISI the different behaviour of MOV DS, AX shows that the processor /is/
in Protected Mode, not Real Mode, even if it's PM16 rather than PM32.
IIRC you thought the processor would still be in RM. If I'm wrong,
therefore, perhaps you can think of another instruction (a legitimate
one) or sequence which could be executed after setting CR0.PE but before
loading CS which would tell whether the CPU was in RM or PM.
assume or make your RM-CS 07c0 before the switch
and try this after it:
push cs
pop ax ;ax show the RM-segment and nothing else

or try self-modify and check where the change happens:
mov word [cs:00FE],31c8 ;or whatsoever. might crash if PM
__
wolfgang
James Harris
2022-02-22 10:12:19 UTC
...
Post by wolfgang kern
Post by James Harris
What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_ CS
is reloaded.
...
Post by wolfgang kern
assume or make your RM-CS 07c0 before the switch
 push cs
 pop ax   ;ax show the RM-segment and nothing else
If the /user-visible/ part of CS is 07c0 then wouldn't that end up in AX
in either mode?
Post by wolfgang kern
mov word [cs:00FE],31c8  ;or whatsoever. might crash if PM
I can't see how modifying the instruction stream would do anything. RM
encodings are valid in PM16!

Consider an instruction such as

mov ds, ax

In Protected Mode that would do

DS.base = from descriptor
DS.limit = from descriptor
DS.access_rights = from descriptor

Wouldn't it make sense for the same instruction in Real Mode to do as
follows?

DS.base = AX shl 4
DS.limit = 0xffff
DS.access_rights = unrestricted

Then the same architectural parts (the hidden parts) could be used in
either RM or PM. That would keep the hardware design simpler and more
consistent than having two entirely separate modes.

In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even when
in Real Mode (PE=0).
--
James Harris
wolfgang kern
2022-02-24 17:51:41 UTC
Post by James Harris
...
Post by wolfgang kern
Post by James Harris
What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_ CS
is reloaded.
...
Post by wolfgang kern
assume or make your RM-CS 07c0 before the switch
  push cs
  pop ax   ;ax show the RM-segment and nothing else
If the /user-visible/ part of CS is 07c0 then wouldn't that end up in AX
in either mode?
Only in RM but not within PM, here AX would show a descriptor value.
Post by James Harris
Post by wolfgang kern
mov word [cs:00FE],31c8  ;or whatsoever. might crash if PM
I can't see how modifying the instruction stream would do anything. RM
encodings are valid in PM16!
OK, this wasn't a good example. I meant it as a crash test, because
exceptions work quite differently.
Post by James Harris
Consider an instruction such as
  mov ds, ax
In Protected Mode that would do
    DS.base = from descriptor
    DS.limit = from descriptor
    DS.access_rights = from descriptor
Wouldn't it make sense for the same instruction in Real Mode to do as
follows?
    DS.base = AX shl 4
    DS.limit = 0xffff
    DS.access_rights = unrestricted
Then the same architectural parts (the hidden parts) could be used in
either RM or PM. That would keep the hardware design simpler and more
consistent than having two entirely separate modes.
X86 grew up in large steps, so we see historical remains here and there.
Post by James Harris
In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even when
in Real Mode (PE=0).
yes, Unreal may not be designed by intention, but it became handy :)
__
wolfgang
James Harris
2022-02-25 15:31:00 UTC
Post by wolfgang kern
Post by James Harris
...
Post by wolfgang kern
Post by James Harris
What I am trying to prove is that once CR0.PE is set (and prefetch
queues flushed) the processor will be in PM, not RM, even _before_
CS is reloaded.
...
Post by wolfgang kern
assume or make your RM-CS 07c0 before the switch
  push cs
  pop ax   ;ax show the RM-segment and nothing else
If the /user-visible/ part of CS is 07c0 then wouldn't that end up in
AX in either mode?
Only in RM but not within PM, here AX would show a descriptor value.
Wouldn't AX hold the selector - 16 bits? Descriptors are too wide for AX.

In fact, there's a diagram for early CPUs which I've not seen for later
ones. It sets out the /internal/ structure of segment registers as they
were at the time. It shows CS, DS, etc as 64-bit where the top 16 bits
are the visible selector. Full contents:

* 16 bits: selector <-- the bits we see in the segreg
* 8 bits: access rights
* 24 bits: base address (24 being all that was needed in days of yore)
* 16 bits: segment size (by which I think they mean 'limit')

That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
PROGRAMMER'S REFERENCE MANUAL 1987.

These days they are a little wider but a "MOV AX,DS" instruction will
still copy just the 16-bit selector field to AX.

The converse "MOV DS,AX" instruction will copy AX to the DS selector
field and, in PM, load the other fields from the descriptor table
whereas in RM it will load the base with 'selector shl 4' (see below).

...
Post by wolfgang kern
Post by James Harris
Consider an instruction such as
   mov ds, ax
In Protected Mode that would do
     DS.base = from descriptor
     DS.limit = from descriptor
     DS.access_rights = from descriptor
Wouldn't it make sense for the same instruction in Real Mode to do as
follows?
     DS.base = AX shl 4
     DS.limit = 0xffff
     DS.access_rights = unrestricted
It turns out to be even simpler. According to the document below, in
Real Mode Intel CPUs load only the base, leaving limit and access_rights
unchanged.

To quote: "In real mode, when a segment register is loaded, only the
base field is changed, in particular the value placed into the base is
selector*16."
Post by wolfgang kern
Post by James Harris
Then the same architectural parts (the hidden parts) could be used in
either RM or PM. That would keep the hardware design simpler and more
consistent than having two entirely separate modes.
X86 grew up in large steps, so we see historical remains here and there.
Post by James Harris
In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even
when in Real Mode (PE=0).
yes, Unreal mode may not have been designed intentionally, but it became handy :)
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".

I found a great writeup of Unreal Mode at

http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/

It includes sources which make clear that in Real Mode only the base
address can be changed so, as it says, "8086-compatible attributes must
be loaded into the data segment descriptor registers before switching to
real mode".
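The unreal-mode trick follows directly from that rule. The sketch below is another toy Python model with a hypothetical two-entry GDT. It loads DS with a 4 GiB descriptor while PE=1, returns to PE=0, and then reloads DS in Real Mode. The base changes, but the 4 GiB limit survives, so an offset above 0xFFFF no longer faults. `GeneralProtection` merely stands in for the #GP a real CPU would raise on a limit violation.

```python
from dataclasses import dataclass


class GeneralProtection(Exception):
    """Stands in for the #GP raised on a limit violation."""


@dataclass
class SegReg:
    selector: int = 0
    base: int = 0
    limit: int = 0xFFFF   # 8086-compatible limit, as typically left by the BIOS


def load(seg: SegReg, value: int, pe: int, gdt: list) -> None:
    """Segment-register load; the behaviour depends only on CR0.PE."""
    if pe:
        seg.base, seg.limit = gdt[value >> 3]   # descriptor is cached
    else:
        seg.base = value << 4                   # limit left untouched
    seg.selector = value


def linear(seg: SegReg, offset: int) -> int:
    """Every access uses the cached base and limit, in either mode."""
    if offset > seg.limit:
        raise GeneralProtection(f"offset {offset:#x} > limit {seg.limit:#x}")
    return seg.base + offset


GDT = [None, (0, 0xFFFF_FFFF)]   # hypothetical: null entry, flat data at 0x08

ds = SegReg()
load(ds, 0x0000, pe=0, gdt=GDT)
try:
    linear(ds, 0x10_0000)
except GeneralProtection as exc:
    print("real mode:", exc)

load(ds, 0x08, pe=1, gdt=GDT)    # PE=1: cache the 4 GiB limit
load(ds, 0x0000, pe=0, gdt=GDT)  # PE=0 again: only selector and base change
print("unreal mode:", hex(linear(ds, 0x10_0000)))
```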
--
James Harris
James Harris
2022-02-25 16:30:14 UTC
Permalink
On 25/02/2022 15:31, James Harris wrote:

...
Post by James Harris
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
This has been a very revealing foray into what has to go on inside a CPU
in each step of enabling PMode. The surprising thing is how simple each
step is.

Here's a summary of where I've got to:

lgdt

That loads one register, the GDTR, and so tells the CPU where to look
for the Global Descriptor Table when we load a segment register in PMode.

cli

That sets a single bit in EFLAGS. It's recommended because we are about
to change /how interrupts are handled/ and so don't want an interrupt to
fire asynchronously mid change.

mov eax, cr0
or eax, 1
mov cr0, eax

That sets a single bit in CR0 (the PE bit). Setting that bit _fully_
enables PMode. What magic does it do? What else does the CPU do at this
point other than set the bit? AFAICS, absolutely nothing! Zilch! Nada!

What that bit being set does, however, is affect future behaviour. Once
the bit is set the processor will respond differently in certain cases
such as loading a segment register or responding to an interrupt. But
nothing other than the bit changes immediately, and that bit being set
means the CPU /is/ in PMode.

jmp flushed

That jump (typically to the next instruction) changes no architectural
state and it's not even needed on modern CPUs. All it does is flush the
prefetch queue on early processors which don't do so automatically. Most
instructions decode the same way in either mode, so it would likely only
matter if a segment register were to be loaded in the next few
instructions. (But a jump is, of course, still right to include in all
cases for portability.)

As said, we will already be in PMode but it will be PM16 and we
typically want to use 32-bit accesses.

mov ds, bx

That would load DS with whatever base, limit and control bits we have
put in the selected descriptor - typically changing the limit field of
the segment from 0xffff to 0xffff_ffff although if we update DS before
CS then accesses to the segment would still, by default, use 16-bit
offsets: it doesn't make a lot of sense to load DS only but I put it
first, here, to emphasise that it is possible.

jmp codeseg:pm32start

That would load CS with whatever base, limit and control bits we have
put in the selected descriptor. (IOW essentially the same as loading a
data segment register.)
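The far jump is where the PM16 versus PM32 distinction first shows up in the encoding. In a 16-bit code segment (CS.D=0), `EA` takes a 16-bit offset, and a 32-bit offset needs the 0x66 operand-size prefix. The Python helper below is a hypothetical illustration of that byte layout: opcode, then offset, then selector, all little-endian. In NASM, the 32-bit form inside `bits 16` code is normally written `jmp dword codeseg:pm32start`.

```python
import struct


def encode_far_jmp(selector: int, offset: int, offset_bits: int, cs_d: int) -> bytes:
    """Bytes of JMP ptr16:16 / ptr16:32 (opcode EA) as executed with CS.D=cs_d."""
    default_bits = 32 if cs_d else 16
    # 0x66 flips the operand size, which here is the size of the offset field.
    prefix = b"" if offset_bits == default_bits else b"\x66"
    if offset_bits == 16:
        if offset > 0xFFFF:
            raise ValueError("offset does not fit in 16 bits")
        offset_field = struct.pack("<H", offset)
    elif offset_bits == 32:
        offset_field = struct.pack("<I", offset)
    else:
        raise ValueError("offset_bits must be 16 or 32")
    return prefix + b"\xEA" + offset_field + struct.pack("<H", selector)


# Still in PM16 right after setting PE, jumping to a 32-bit offset:
print(encode_far_jmp(0x08, 0x0010_0000, 32, cs_d=0).hex(" "))
# 66 ea 00 00 10 00 08 00
```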

bits 32
pm32start:

I add those just to emphasise that although we would be in PMode as soon
as PE is set it would be PM16 and it's only from the label that the
instruction stream would be 32-bit PM32.


While in PMode we can switch in either direction between 16-bit and
32-bit code (both still in PMode, i.e. we can switch between PM16 and
PM32) simply by loading CS (using jump, call or task switch) with a
descriptor which has the D bit clear or set, as required.

CS.D sets the default address and operand sizes. SS.B sets whether SP or
ESP is used in stack operations such as pushes, calls, returns and pops.
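For other data segments (such as DS) the B bit has no effect unless the
segment is expand-down, in which case it sets the upper bound (0xffff
when clear, 0xffff_ffff when set).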
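Those rules amount to a small lookup. The Python sketch below covers the CS.D defaults plus the two standard override prefixes (0x66 for operand size, 0x67 for address size). The function names are made up for illustration.

```python
def effective_sizes(cs_d: int, prefix_66: bool = False, prefix_67: bool = False) -> tuple:
    """(operand bits, address bits) for an instruction in a code segment with CS.D=cs_d."""
    default, other = (32, 16) if cs_d else (16, 32)
    operand = other if prefix_66 else default
    address = other if prefix_67 else default
    return operand, address


def stack_pointer(ss_b: int) -> str:
    """SS.B picks the stack pointer used by PUSH, POP, CALL and RET."""
    return "ESP" if ss_b else "SP"


# Right after setting PE, CS typically still holds its real-mode cache (D=0): PM16.
print(effective_sizes(0), effective_sizes(0, prefix_66=True), stack_pointer(0))
```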


The above all comes with disclaimers but it's simple and flexible,
consistent with documentation, and it makes sense because as per Occam's
Razor there's no need to make things any more complex. So I offer it as,
if nothing else, a potential way to understand the internals of the
transition to PM32.
--
James Harris
wolfgang kern
2022-02-25 18:10:28 UTC
Permalink
...
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
assume or make your RM-CS 07c0 before the switch
  push cs
  pop ax   ;ax shows the RM segment and nothing else
If the /user-visible/ part of CS is 07c0 then wouldn't that end up in
AX in either mode?
Only in RM but not within PM, here AX would show a descriptor value.
Wouldn't AX hold the selector - 16 bits? Descriptors are too wide for AX.
the selector for a descriptor, so yes.
Post by James Harris
In fact, there's a diagram for early CPUs which I've not seen for later
ones. It sets out the /internal/ structure of segment registers as they
were at the time. It shows CS, DS, etc as 64-bit where the top 16 bits
* 16 bits: selector        <-- the bits we see in the segreg
*  8 bits: access rights
* 24 bits: base address (24 being all that was needed in days of yore)
* 16 bits: segment size (by which I think they mean 'limit')
That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
PROGRAMMER'S REFERENCE MANUAL 1987.
now try to read more recent stuff:
a selector is 16 bits wide but the lowest 3 bits aren't used for selecting,
and as zero isn't allowed, 08 is the first valid value.
a descriptor (a GDT entry) contains 64 bits (leaving LM64 aside for now):

So a selector addresses a GDT entry just by its value (ANDed with FFF8).
data and code descriptors are defined as:

00 limit 0..15
02 base 0..23
05 |P|DPL|10|E|W|A| for data |P|DPL|11|C|R|A| for code
   (other descriptor types differ here)
06 |G|B|0|X| limit 16..19
07 base 24..31

I once posted my descriptor page in CLAX but lost it on a defective PC.
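The byte layout in that table can be checked with a short decoder. The sketch below is Python, using only `struct`, and handles legacy (non-LM) code and data descriptors. The sample values are the usual flat 4 GiB code and data entries. They are assumed examples, not values taken from this thread.

```python
import struct


def decode_descriptor(raw: bytes) -> dict:
    """Decode an 8-byte code/data segment descriptor (legacy, non-LM layout)."""
    limit_lo, base_lo, base_mid, access, flags_limit_hi, base_hi = struct.unpack("<HHBBBB", raw)
    limit = limit_lo | ((flags_limit_hi & 0x0F) << 16)   # limit 16..19 sits in byte 6
    granularity = (flags_limit_hi >> 7) & 1
    if granularity:                                      # G=1: limit counts 4 KiB units
        limit = (limit << 12) | 0xFFF
    return {
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "limit": limit,
        "present": (access >> 7) & 1,
        "dpl": (access >> 5) & 3,
        "s": (access >> 4) & 1,                  # 1 = code/data, 0 = system
        "code": (access >> 3) & 1,               # 1 = code, 0 = data
        "default_32": (flags_limit_hi >> 6) & 1, # the D/B bit
    }


flat_code = bytes.fromhex("FFFF0000009ACF00")
flat_data = bytes.fromhex("FFFF00000092CF00")
print(decode_descriptor(flat_code))
print(decode_descriptor(flat_data)["code"])
```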
Post by James Harris
These days they are a little wider but a "MOV AX,DS" instruction will
still copy just the 16-bit selector field to AX.
Oh, yes.
...
Post by James Harris
Post by wolfgang kern
X86 grew up in large steps, so we see historical remains here and there.
Post by James Harris
In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even
when in Real Mode (PE=0).
yes, Unreal mode may not have been designed intentionally, but it became handy :)
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
If it were in PM then all the PM instructions I listed earlier would
not crash or raise exceptions. Go figure :)
Post by James Harris
I found a great writeup of Unreal Mode at
  http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/
It includes sources which make clear that in Real Mode only the base
address can be changed so, as it says, "8086-compatible  attributes must
be loaded into the data segment descriptor registers before switching to
real mode".
Yes, my switches from any PM to RM need to reset the RM limits only before
calling BIOS functions; none of my other functions care.
__
wolfgang
wolfgang kern
2022-02-25 22:32:24 UTC
Permalink
Post by wolfgang kern
...
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
assume or make your RM-CS 07c0 before the switch
  push cs
  pop ax   ;ax shows the RM segment and nothing else
If the /user-visible/ part of CS is 07c0 then wouldn't that end up
in AX in either mode?
Only in RM but not within PM, here AX would show a descriptor value.
Wouldn't AX hold the selector - 16 bits? Descriptors are too wide for AX.
the selector for a descriptor, so yes.
Post by James Harris
In fact, there's a diagram for early CPUs which I've not seen for
later ones. It sets out the /internal/ structure of segment registers
as they were at the time. It shows CS, DS, etc as 64-bit where the top
* 16 bits: selector        <-- the bits we see in the segreg
*  8 bits: access rights
* 24 bits: base address (24 being all that was needed in days of yore)
* 16 bits: segment size (by which I think they mean 'limit')
That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
PROGRAMMER'S REFERENCE MANUAL 1987.
a selector is 16 bits wide but the lowest 3 bits aren't used for selecting,
and as zero isn't allowed, 08 is the first valid value.
corrected byte>>bits
Post by wolfgang kern
So a selector addresses a GDT entry just by its value (ANDed with FFF8).
00 limit 0..15
02 base  0..23
05 |P|DPL|10|E|W|A|  for data   |P|DPL|11|C|R|A| for code
   (other descriptor types differ here)
06 |G|B|0|X| limit 16..19
07 base 24..31
I once posted my descriptor page in CLAX but lost it on a defective PC.
Post by James Harris
These days they are a little wider but a "MOV AX,DS" instruction will
still copy just the 16-bit selector field to AX.
Oh, yes.
...
Post by James Harris
Post by wolfgang kern
X86 grew up in large steps, so we see historical remains here and there.
Post by James Harris
In fact, surely the so-called Unreal Mode only works because the CPU
uses the hidden parts of the segment registers at all times - even
when in Real Mode (PE=0).
yes, Unreal mode may not have been designed intentionally, but it became handy :)
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
If it were in PM then all the PM instructions I listed earlier would
not crash or raise exceptions. Go figure :)
Post by James Harris
I found a great writeup of Unreal Mode at
   http://www.os2museum.com/wp/a-brief-history-of-unreal-mode/
It includes sources which make clear that in Real Mode only the base
address can be changed so, as it says, "8086-compatible  attributes
must be loaded into the data segment descriptor registers before
switching to real mode".
Yes, my switches from any PM to RM need to reset the RM limits only before
calling BIOS functions; none of my other functions care.
__
wolfgang
James Harris
2022-02-26 17:04:56 UTC
Permalink
...
Post by wolfgang kern
Post by James Harris
* 16 bits: selector        <-- the bits we see in the segreg
*  8 bits: access rights
* 24 bits: base address (24 being all that was needed in days of yore)
* 16 bits: segment size (by which I think they mean 'limit')
That's from Figure 6·8 Memory Management Registers of 80286 AND 80287
PROGRAMMER'S REFERENCE MANUAL 1987.
a selector is 16 bits wide but the lowest 3 bits aren't used for selecting.
The parts of a selector today are

<index> <TI> <RPL>

As it happens, they were the same back in the 286 days. :)
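Here is how those three fields split out for the selector values that come up in this thread. The sketch is Python, and `split_selector` is a made-up helper name.

```python
def split_selector(selector: int) -> tuple:
    """Split a 16-bit selector into (index, TI, RPL)."""
    index = selector >> 3          # which 8-byte descriptor
    ti = (selector >> 2) & 1       # 0 = GDT, 1 = LDT
    rpl = selector & 3             # requested privilege level
    return index, ti, rpl


for sel in (0x0008, 0x0007, 0x07C0):
    index, ti, rpl = split_selector(sel)
    print(f"{sel:#06x}: index={index} TI={ti} RPL={rpl} offset={sel & 0xFFF8:#06x}")
```

So 7 would name entry 0 of the LDT at RPL 3, and 0x07C0 read as a selector would name GDT entry 248.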
Post by wolfgang kern
and as zero isn't allowed, 08 is the first valid value.
For sure, 8 is the lowest selector which can be loaded into the register
in PMode although if one wanted to hack around unnecessarily ... then
one could probably get

mov ax, ds

to put a number lower than 8 (e.g. 7 in the code below) into AX even in
PMode.

I say that based on the understanding that a segment register has these
parts

<selector> <permissions> <base> <limit>

and that a load in Real Mode sets two of them: selector and base.

For example, starting in real mode

mov ax, 7
mov es, ax ; ok in real mode, sets ES /selector and base/

mov eax, cr0
or eax, 1
mov cr0, eax ; enter protected mode
jmp $ + 2

mov ax, es

then that should put a 7 into AX even though the CPU is in Pmode.

Just for fun. :)
Post by wolfgang kern
So a selector addresses a GDT entry just by its value (ANDed with FFF8).
00 limit 0..15
02 base  0..23
05 |P|DPL|10|E|W|A|  for data   |P|DPL|11|C|R|A| for code
   (other descriptor types differ here)
06 |G|B|0|X| limit 16..19
07 base 24..31
Yes, the descriptors I've been looking at are 64-bit and my comments are
only about RM and PM. As far as this discussion is concerned I've not
looked at LM at all.

...
Post by wolfgang kern
Post by James Harris
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
If it were in PM then all the PM instructions I listed earlier would
not crash or raise exceptions. Go figure :)
AIUI - at least on Intel - the instructions you listed should all
execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
happens on your hardware?
--
James Harris
wolfgang kern
2022-02-26 22:13:15 UTC
Permalink
On 26/02/2022 18:04, James Harris wrote:
...
Post by James Harris
For example, starting in real mode
  mov ax, 7
  mov es, ax  ; ok in real mode, sets ES /selector and base/
  mov eax, cr0
  or eax, 1
  mov cr0, eax  ; enter protected mode
  jmp $ + 2
  mov ax, es
then that should put a 7 into AX even though the CPU is in Pmode.
Just for fun. :)
Yes of course; even a first attempt to access memory through ES would crash.
I ask again: what do you assume the base, limit and PL to be in this pre-PM state?

If the base were 0000 or 07C0 then none of my switches would work.
And why should it be 07C0? My switches aren't in this region.
Post by James Harris
Post by wolfgang kern
Post by James Harris
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
If it were in PM then all the PM instructions I listed earlier
would not crash or raise exceptions. Go figure :)
AIUI - at least on Intel - the instructions you listed should all
execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
happens on your hardware?
Not quite accurate: the AMD docs state explicitly that privileged
instructions work only in PM and not in UNREAL mode.
I didn't try it deliberately, but I remember some hard crashes during my
first attempts (>30 years ago) at making mixed modes work.

So between setting PE and loading CS there is the UNREAL story where
the CPU behaves partly as in PM (segment registers) but won't execute
PM-only code.
__
wolfgang
James Harris
2022-03-05 15:41:41 UTC
Permalink
Post by wolfgang kern
...
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  ; ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  ; enter protected mode
   jmp $ + 2
   mov ax, es
then that should put a 7 into AX even though the CPU is in Pmode.
Just for fun. :)
Yes of course; even a first attempt to access memory through ES would crash.
I ask again: what do you assume the base, limit and PL to be in this pre-PM state?
Well, I wouldn't /recommend/ writing code with such an access. It would
probably go against the specs and might be inconsistent between
processors. But if one did then here's what I think the basic mechanism
would be - as may be implemented by the CPU's engineers.

The thing to bear in mind is that segment registers have multiple fields,
most of which are hidden in Rmode. One could consider ES as having these
parts:

ES.selector
ES.base
ES.limit
ES.attributes

The only part we normally see is the selector.

Given that model of a segment register, the code

mov ax, 7
mov es, ax

when run in Rmode would have the following effect

ES.selector = 7
ES.base = 112 (7 times 16)

and, importantly, _nothing else_. It would only change the selector and
the base. It would not alter the other parts of ES. Hence they would
retain the values they had before, i.e.

ES.limit = whatever was there from before
ES.attributes = whatever was there from before

Therefore, if ES.limit was large enough and ES.attributes allowed the
access then

mov ax, [es:0]

would, in the model, load AX with the contents of address 112 (the
base). But you say it would crash. Could that be because the limit
and/or the attributes did not permit the access?

Remember that although the CPU powers up in Real mode and is typically
in Real mode when we get control it has probably taken a trip to Pmode
and back since being booted. If nothing else, the BIOS has to change to
Pmode (set PE = 1) so it can check the RAM.

In Pmode, the BIOS will have been able to control what is loaded into
base, limit and attributes of the segment registers. We don't know what
that will have been but we can guess at what's likely. When it changes
back to RM to boot our code it will probably have set

limit = 64k
attributes = writable, present, expand up, byte granularity, etc

With those values the segment register's hidden parts will be correctly
set for Rmode operation. In Rmode, loads of segment registers will only
have to change selector and base.
Post by wolfgang kern
If the base were 0000 or 07C0 then none of my switches would work.
And why should it be 07C0? My switches aren't in this region.
In Rmode moving 7 into ES would set the base to 112, AIUI, not anything
like 0x7c0.

By contrast, in Real mode loading ES with 0x07c0 would set the base to
0x7c00 - exactly as required for RM operation.
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL TIMES,
even when they are running in what we call "Real Mode".
If it were in PM then all the PM instructions I listed earlier
would not crash or raise exceptions. Go figure :)
AIUI - at least on Intel - the instructions you listed should all
execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
happens on your hardware?
Not quite accurate: the AMD docs state explicitly that privileged
instructions work only in PM and not in UNREAL mode.
I'd like to see that. I've only been looking at Intel so far. Do you
have a reference (manual and section)?
Post by wolfgang kern
I didn't try it deliberately, but I remember some hard crashes during my
first attempts (>30 years ago) at making mixed modes work.
So between setting PE and loading CS there is the UNREAL story where
the CPU behaves partly as in PM (segment registers) but won't execute PM-only code.
Well, AIUI what's called "unreal" is the opposite: being in Real mode
(PE=0) but with unusual settings for the segments - typically a 4G limit
rather than 64k.

What specifically do you mean by 'PM-only code' that the CPU won't execute?

I could see a chip manufacturer /adding/ limitations on what can be done
in each mode (as Intel have done, e.g. in detecting invalid opcodes) but
they would have had to add those on top of the basic mechanisms.
--
James Harris
wolfgang kern
2022-03-07 09:57:30 UTC
Permalink
On 05/03/2022 16:41, James Harris wrote:
...
Post by James Harris
Post by wolfgang kern
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  ; ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  ; enter protected mode
   jmp $ + 2
try this here:
0F 00 c8 STR ax
You win this dispute if it doesn't crash :)
Post by James Harris
Post by wolfgang kern
Post by James Harris
   mov ax, es
then that should put a 7 into AX even though the CPU is in Pmode.
Just for fun. :)
Yes of course; even a first attempt to access memory through ES would crash.
I ask again: what do you assume the base, limit and PL to be in this pre-PM state?
Well, I wouldn't /recommend/ writing code with such an access. It would
probably go against the specs and might be inconsistent between
processors. But if one did then here's what I think the basic mechanism
would be - as may be implemented by the CPU's engineers.
A selector value of 7 can't be accessed in PM (it will raise an exception).
my question was about what you think is CS RM or PM limited ?
Post by James Harris
The thing to bear in mind is that segment registers have multiple fields,
most of which are hidden in Rmode. One could consider ES as having these
  ES.selector
a PM selector is just a pointer into the GDT (ANDed FFF8)
stored in seg-reg but not elsewhere.
Post by James Harris
  ES.base
  ES.limit
  ES.attributes
The only part we normally see is the selector.
normally hidden? I see everything in a hex-dump of the GDT :)
Post by James Harris
Given that model of a segment register, the code
  mov ax, 7
  mov es, ax
when run in Rmode would have the following effect
  ES.selector = 7
  ES.base     = 112 (7 times 16)
and, importantly, _nothing else_. It would only change the selector and
the base. It would not alter the other parts of ES. Hence they would
retain the values they had before, i.e.
  ES.limit      = whatever was there from before
  ES.attributes = whatever was there from before
Therefore, if ES.limit was large enough and ES.attributes allowed the
access then
  mov ax, [es:0]
would, in the model, load AX with the contents of address 112 (the
base). But you say it would crash. Could that be because the limit
and/or the attributes did not permit the access?
it would crash for sure if run in PM (0007 isn't valid).
Post by James Harris
Remember that although the CPU powers up in Real mode and is typically
in Real mode when we get control it has probably taken a trip to Pmode
and back since being booted. If nothing else, the BIOS has to change to
Pmode (set PE = 1) so it can check the RAM.
In Pmode, the BIOS will have been able to control what is loaded into
base, limit and attributes of the segment registers. We don't know what
that will have been but we can guess at what's likely. When it changes
back to RM to boot our code it will probably have set
Yes, you can find tombstones in RAM of what the BIOS left behind after power-up.
I once copied the first 128MB of RAM to disk before loading all of my OS...
Post by James Harris
  limit      = 64k
  attributes = writable, present, expand up, byte granularity, etc
With those values the segment register's hidden parts will be correctly
set for Rmode operation. In Rmode, loads of segment registers will only
have to change selector and base.
Yes. Though I'd recommend using the term "selector" for PM only.
Post by James Harris
Post by wolfgang kern
If the base were 0000 or 07C0 then none of my switches would work.
And why should it be 07C0? My switches aren't in this region.
In Rmode moving 7 into ES would set the base to 112, AIUI, not anything
like 0x7c0.
By contrast, in Real mode loading ES with 0x07c0 would set the base to
0x7c00 - exactly as required for RM operation.
you said that for CS earlier in this thread...
and that's what I see unanswered.
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
Yes, and Unreal Mode shows that CPUs use PM mechanisms AT ALL
TIMES, even when they are running in what we call "Real Mode".
MOV eax,CR0
OR al,1
MOV CR0,eax
LDS si,[cs:0400] ;my 1st switch block is position independent
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
If it were in PM then all the PM instructions I listed earlier
would not crash or raise exceptions. Go figure :)
AIUI - at least on Intel - the instructions you listed should all
execute if PE = 1 and raise exception 6 if PE = 0. Is that not what
happens on your hardware?
Not quite accurate: the AMD docs state explicitly that privileged
instructions work only in PM and not in UNREAL mode.
I'd like to see that. I've only been looking at Intel so far. Do you
have a reference (manual and section)?
I remember it well but forgot where I read it.
I've started to scan my docs ... >600 PDFs may take a while...
You might find this sentence in the Intel docs as well (not older than 486+).
Post by James Harris
Post by wolfgang kern
I didn't try it deliberately, but I remember some hard crashes during my
first attempts (>30 years ago) at making mixed modes work.
So between setting PE and loading CS there is the UNREAL story where
the CPU behaves partly as in PM (segment registers) but won't execute PM-only code.
Well, AIUI what's called "unreal" is the opposite: being in Real mode
(PE=0) but with unusual settings for the segments - typically a 4G limit
rather than 64k.
What specifically do you mean by 'PM-only code' that the CPU won't execute?
again:
PM only (raise invalid opcode exception in RM):
63 xx ARPL (seems obsolete now)

0F 00 /1 STR an easy test possibility ie:
0F 00 c8 STR ax |eax |rax
0F 00 08 xx STR [mem16] ; all except RM

0F 00 /0 SLDT r16/r32/r64/m16
0F 00 /2 LLDT rm16
0F 00 /3 LTR
0F 02 /r LAR
0F 03 /r,[m] LSL
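The `/digit` notation in that list is the reg field of the ModRM byte. A small Python decoder for the 0F 00 group makes the `0F 00 c8` = `STR ax` example concrete. It is only a sketch: it names register operands with 16-bit register names (assuming 16-bit operand size) and reports memory forms generically rather than decoding the addressing mode.

```python
GROUP_0F00 = {0: "SLDT", 1: "STR", 2: "LLDT", 3: "LTR", 4: "VERR", 5: "VERW"}
REG16 = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")


def decode_0f00(modrm: int) -> str:
    """Decode the ModRM byte that follows opcode 0F 00."""
    mod = modrm >> 6            # 3 = register operand, otherwise memory
    reg = (modrm >> 3) & 7      # the /digit: selects the instruction
    rm = modrm & 7
    name = GROUP_0F00.get(reg, "(other)")
    if mod == 3:
        return f"{name} {REG16[rm]}"
    return f"{name} m16 (mod={mod}, rm={rm})"


for byte in (0xC8, 0xC0, 0xD8, 0x08):
    print(f"0F 00 {byte:02X}: {decode_0f00(byte)}")
```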
Post by James Harris
I could see a chip manufacturer /adding/ limitations on what can be done
in each mode (as Intel have done, e.g. in detecting invalid opcodes) but
they would have had to add those on top of the basic mechanisms.
I think it started with 286, added features vs. backwards compatibility.
__
wolfgang
James Harris
2022-03-11 11:45:25 UTC
Permalink
Post by wolfgang kern
...
Post by James Harris
Post by wolfgang kern
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  ; ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  ; enter protected mode
   jmp $ + 2
0F 00 c8     STR ax
You win this dispute if it doesn't crash :)
That's kind of you, Wolfgang, ... since it doesn't crash! :)

When I tested it it set AX to 0 which is what one would expect to be in
TR's selector at reset.

Why did you think it would crash?
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
   mov ax, es
then that should put a 7 into AX even though the CPU is in Pmode.
Just for fun. :)
Yes of course; even a first attempt to access memory through ES would crash.
I ask again: what do you assume the base, limit and PL to be in this pre-PM state?
Well, I wouldn't /recommend/ writing code with such an access. It
would probably go against the specs and might be inconsistent between
processors. But if one did then here's what I think the basic
mechanism would be - as may be implemented by the CPU's engineers.
A selector value of 7 can't be accessed in PM (it will raise an exception).
Be careful not to confuse loading with storing or using. A selector
value of 7 cannot be /loaded/ when in PM but it can be loaded before
switching to Pmode. Take this code:

mov ax, 7
mov fs, ax

mov eax, cr0
or al, 1
mov cr0, eax
jmp $ + 2

mov dx, fs
mov cx, [fs:0]

I tried that, too, and it works. The value copied from FS to DX is 7,
FS's selector which was the segment value loaded in RM and which cannot
be loaded in PM.

Furthermore, the access to FS:0 also works without crashing. No
exception is generated.

Try it. What do you get?
Post by wolfgang kern
my question was about what you think is CS RM or PM limited ?
If I understood the question I'd try to answer it but I cannot parse it. :(

...
Post by wolfgang kern
Post by James Harris
   mov ax, [es:0]
would, in the model, load AX with the contents of address 112 (the
base). But you say it would crash. Could that be because the limit
and/or the attributes did not permit the access?
it would crash for sure if run in PM (0007 isn't valid).
Why would it crash? (As above, it doesn't.) The /selector/ is not used
to access memory. The selector is only used to load the important parts
of a descriptor (BASE, LIMIT and ATTRIBUTES) into the CPU. Once they are
loaded future memory accesses can proceed without referring to the
selector at all.

The memory address accessed is BASE + OFFSET where BASE is in the hidden
portion of the descriptor and OFFSET is in the instruction (0 in this case).

In Protected mode a load sets BASE and the other fields to whatever
values are read from the descriptor table.

In Real mode a load sets BASE to 16 * the segment and leaves the other
fields alone.

But once the fields of a descriptor have been set, memory accesses can
be carried out in the same way in either mode.

This is a remarkably simple model. And it's clean. There's no need to
make it complicated. Segreg loads operate differently in RM and PM but
once the load has taken place then the descriptor has all the info it
needs to handle future memory accesses without using the
segment/selector value as part of the access.
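That model, and the FS experiment above, can be sketched in Python. It is a toy under stated assumptions: a hypothetical two-entry GDT, no LDT loaded, and `GeneralProtection` standing in for #GP. In the sketch, setting PE is just a flag change that touches no segment register. A value loaded into FS in Real Mode keeps working for accesses. Reloading FS with 7 after PE=1 goes through the descriptor tables and faults. A null selector loads but cannot be used.

```python
from dataclasses import dataclass


class GeneralProtection(Exception):
    """Stands in for #GP in this toy model."""


@dataclass
class SegReg:
    selector: int
    base: int
    limit: int
    usable: bool = True


def load_real(seg: SegReg, value: int) -> None:
    seg.selector, seg.base = value, value << 4   # limit and usable untouched


def load_protected(seg: SegReg, value: int, gdt: list) -> None:
    if (value & 0xFFFC) == 0:                    # null selector: loads, but unusable
        seg.selector, seg.usable = value, False
        return
    if value & 4:                                # TI=1, but no LDT in this model
        raise GeneralProtection(f"{value:#06x}: no LDT loaded")
    if (value >> 3) >= len(gdt):
        raise GeneralProtection(f"{value:#06x}: beyond GDT limit")
    seg.base, seg.limit = gdt[value >> 3]
    seg.selector, seg.usable = value, True


def access(seg: SegReg, offset: int) -> int:
    """Linear address = cached base + offset; the selector is never consulted."""
    if not seg.usable or offset > seg.limit:
        raise GeneralProtection("unusable segment or offset beyond limit")
    return seg.base + offset


gdt = [None, (0, 0xFFFF_FFFF)]     # hypothetical: null entry, flat data at 0x08
fs = SegReg(selector=0, base=0, limit=0xFFFF)
cr0 = 0

load_real(fs, 7)                    # mov ax, 7 / mov fs, ax
cr0 |= 1                            # mov cr0, eax: only the PE bit changes
print(fs.selector, hex(access(fs, 0)))   # mov dx, fs / mov cx, [fs:0]

try:
    load_protected(fs, 7, gdt)      # a reload now consults the tables
except GeneralProtection as exc:
    print("reload faults:", exc)
```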

...
Post by wolfgang kern
Yes, you can find tombstones in RAM of what the BIOS left behind after power-up.
I once copied the first 128MB of RAM to disk before loading all of my OS...
That's the kind of thing I would do. :)

...
Post by wolfgang kern
Post by James Harris
In Rmode moving 7 into ES would set the base to 112, AIUI, not
anything like 0x7c0.
By contrast, in Real mode loading ES with 0x07c0 would set the base to
0x7c00 - exactly as required for RM operation.
you said that for CS earlier in this thread...
and that's what I see unanswered.
I'll answer if I can but what is the question?!

...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Not quite accurate: the AMD docs state explicitly that privileged
instructions work only in PM and not in UNREAL mode.
I'd like to see that. I've only been looking at Intel so far. Do you
have a reference (manual and section)?
I remember it well but forgot where I read it.
I've started to scan my docs ... >600 PDFs may take a while...
You might find this sentence in the Intel docs as well (not older than 486+).
The word "unreal" doesn't appear anywhere in my full 7-volume set of
Intel manuals. That's unsurprising as it's not really a separate mode,
just Real mode with unusual limits in the segment descriptors. Maybe AMD
is the same.

To be clear, though, what's called "unreal mode" is really Real mode
(PE=0) with unusual segment settings. No PM instructions should work.

But the state after setting PE and before reloading CS is not unreal
mode. I suggest it's Protected mode but with D=0 hence 16-bit Pmode.
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
I didn't try it deliberately, but I remember some hard crashes during
my first attempts (>30 years ago) at making mixed modes work.
From what I've seen, interrupts, in particular, could be 'challenging'
with mixed segreg sizes.
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
So between setting PE and loading CS there is the UNREAL story where
the CPU behaves partly as in PM (segment registers) but won't execute PM-only code.
Well, AIUI what's called "unreal" is the opposite: being in Real mode
(PE=0) but with unusual settings for the segments - typically a 4G
limit rather than 64k.
What specifically do you mean by 'PM-only code' that the CPU won't execute?
63 xx        ARPL  (seems obsolete now)
0F 00 c8     STR ax |eax |rax
0F 00 08 xx  STR [mem16] ; all except RM
0F 00 /0     SLDT r16/r32/r64/m16
0F 00 /2     LLDT rm16
0F 00 /3     LTR
0F 02 /r     LAR
0F 03 /r,[m] LSL
OK. I haven't tested them all but as above, STR executes. Further, so
does LSL. And Intel is very clear:

"The LSL instruction is not recognized in real-address mode"

Again, that backs up my thesis: PM is defined by PE=1 and nothing else.
Nothing else has been changed.

Are you expecting those instructions to execute differently in PM16
compared with what they would do after setting PE=1 and before reloading CS?
--
James Harris
wolfgang kern
2022-03-13 20:18:21 UTC
Permalink
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  ; ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  ; enter protected mode
   jmp $ + 2
0F 00 c8     STR ax
You win this dispute if it doesn't crash :)
That's kind of you, Wolfgang, ... since it doesn't crash! :)
When I tested it it set AX to 0 which is what one would expect to be in
TR's selector at reset.
Why did you think it would crash?
It would crash if in RM. So you're right and it is already in PM then.
Interesting; which CPU did you try it on?
My notes for exactly this code on an AMD 486 say: EXC_06 due to RM (~1994).

[about data selector 07]
...
Post by James Harris
  mov cx, [fs:0]
I tried that, too, and it works. The value copied from FS to DX is 7,
FS's selector which was the segment value loaded in RM and which cannot
be loaded in PM.
You can load a data segment register with 0000 in PM too,
but better never use it for any access.
Post by James Harris
Furthermore, the access to FS:0 also works without crashing. No
exception is generated.
yes, because it is still unmodified after setting PE.
Post by James Harris
Try it. What do you get?
no need, I know that. Here's why I said it will crash:
try your line after the final transition to PM.

mov cx, [fs:0] ; if FS is 0007 then it will crash

...
Post by James Harris
Post by wolfgang kern
it would crash for sure if run in PM (0007 isn't valid).
Why would it crash? (As above, it doesn't.) The /selector/ is not used
to access memory. The selector is only used to load the important parts
of a descriptor (BASE, LIMIT and ATTRIBUTES) into the CPU. Once they are
loaded future memory accesses can proceed without referring to the
selector at all.
OK, not much sense to do it except to use FS as a temporary scratch pad
...
Post by James Harris
Post by wolfgang kern
you said that for CS earlier in this thread...
and that's what I see unanswered.
I'll answer if I can but what is the question?!
:) it was about: if it's in PM after setting PE, does CS remain RM?
I can answer this now myself:
all segment registers keep their base/limit as long as they are not
altered, regardless of the PE-bit status.
Post by James Harris
...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
not quite accurate, AMD docs state explicitly that privileged instructions
work only in PM and not in UNREAL mode.
I'd like to see that. I've only been looking at Intel so far. Do you
have a reference (manual and section)?
I remember it well but forgot where I read it;
I've started to scan my docs ... >600 PDFs may take a while...
You might find this sentence in Intel docs as well (486-era or newer)
The word "unreal" doesn't appear anywhere in my full 7-volume set of
Intel manuals. That's unsurprising as it's not really a separate mode,
just Real mode with unusual limits in the segment descriptors. Maybe AMD
is the same.
To be clear, though, what's called "unreal mode" is really Real mode
(PE=0) with unusual segment settings. No PM instructions should work.
Yes, Unreal starts with resetting PE and a far jump. So what I remembered
reading was right: privileged PM code won't run in (un)Real Mode.
Post by James Harris
But the state after setting PE and before reloading CS is not unreal
mode. I suggest it's Protected mode but with D=0 hence 16-bit Pmode.
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
I didn't try it intentionally, but I remember some hard crashes during
my first attempts (>30 years ago) at making mixed modes work.
From what I've seen, interrupts, in particular, could be 'challenging'
with mixed segreg sizes.
Not that much of a challenge if you use my way:
all INTs and IRQs use the same stacks and the same variables, and do
exactly the same thing in RM, Unreal, PM16, PM32 and LM.
Also, all exceptions from all modes were linked to a single debugger.

[...]
Post by James Harris
Again, that backs up my thesis: PM is defined by PE=1 and nothing else.
Nothing else had been changed.
YOU WON. It took a while to convince myself to see it differently :)
Post by James Harris
Are you expecting those instructions to execute differently in PM16
compared with what they would do after setting PE=1 and before reloading CS?
I had some bad experiences with elaborate mode switches back then, so in
the end every switch became less smart, i.e. simpler, and was easier to
handle too. It seems an old wiki note was burned into my brain:
"the transition from RM to PM occur only on the final far jump"
__
wolfgang
wolfgang kern
2022-03-15 07:45:35 UTC
[...]
Post by wolfgang kern
YOU WON. It took a while to convince myself to see it differently :)
I had to give my workstation to an employer, so I can't check it myself.
One more instruction may be worth checking between setting PE and loading CS:

2E 89 06 xx xx mov [cs:xxxx],ax ;use any harmless address here

Would setting PE make CS non-writable?
The RM default won't even test that.
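The test instruction above breaks down as follows: a CS segment-override prefix (2E), MOV r/m16,r16 (89), and a ModRM byte of 06 (mod=00, reg=000 for AX, rm=110). In 16-bit addressing, that ModRM value means a bare 16-bit displacement, which is why two displacement bytes follow. The following is a minimal Python sketch that assembles it. The helper name is hypothetical, and the sketch assumes 16-bit addressing (CS.D=0, no 67 prefix).

```python
import struct

# Minimal sketch (hypothetical helper): assemble "mov [cs:disp16], ax" for
# 16-bit addressing, i.e. CS.D=0 and no 67 prefix.
CS_OVERRIDE = 0x2E   # segment-override prefix for CS
MOV_RM16_R16 = 0x89  # MOV r/m16, r16


def encode_mov_cs_mem_ax(disp16: int) -> bytes:
    # ModRM: mod=00, reg=000 (AX), rm=110 is the direct [disp16] form
    # in 16-bit addressing.
    modrm = (0b00 << 6) | (0b000 << 3) | 0b110
    # struct.pack raises struct.error if disp16 does not fit in 16 bits;
    # the displacement is stored little-endian.
    return bytes([CS_OVERRIDE, MOV_RM16_R16, modrm]) + struct.pack("<H", disp16)
```

For instance, a displacement of 1234h yields `2E 89 06 34 12`.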
__
wolfgang
James Harris
2022-03-15 15:54:20 UTC
Post by wolfgang kern
[...]
Post by wolfgang kern
YOU WON. It took a while to convince myself to see it differently :)
I had to give my workstation to an employer, so I can't check it myself
I had a similar logistics problem. My current development is on Fat16
and runs in a VM. To test it on real hardware I'd have to write to a
partition on another machine's HD. That was too much work to set up
straight away so I pulled out some old code which has a Fat12 loader.

It was a bit of a nuisance as I had to build/mount/write/unmount,
transfer a floppy and reboot for each test but it was adequate as long
as there weren't too many experiments to run.
Post by wolfgang kern
2E 89 06 xx xx   mov [cs:xxxx],ax   ;use any harmless address here
Would setting PE make CS non-writable?
The RM default won't even test that.
That's a good test but I haven't got time to run it ATM and I'm going to
be away for a few days. Will reply to posts when I get back.

We should try to predict what will happen. What would you expect? I
don't know yet but I'll have a think about it.
--
James Harris
James Harris
2022-03-23 17:50:22 UTC
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  <--- ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  <--- enter protected mode
   jmp $ + 2
0F 00 c8     STR ax
You win this dispute if it doesn't crash :)
That's kind of you, Wolfgang, ... since it doesn't crash! :)
When I tested it, it set AX to 0 which is what one would expect to be
in TR's selector at reset.
Why did you think it would crash?
It would crash if in RM. So you're right and it is already in PM then.
Interesting, which CPU did you try it on?
I tried in VirtualBox first but as that's emulated I dug out an old test
machine which has an Intel Atom. Same behaviour on both.
Post by wolfgang kern
my notes for exactly this code on AMD 486 say: EXC_06 due to RM (~1994).
[about data selector 07]
...
Post by James Harris
   mov cx, [fs:0]
I tried that, too, and it works. The value copied from FS to DX is 7,
FS's selector which was the segment value loaded in RM and which
cannot be loaded in PM.
you can load a data selector with 0000 also in PM
but better never use it for any access.
I realised after posting that 7 was a poor choice of value in one way
but it turned out to be good in another.

It was a poor choice as it's a valid selector in Pmode. I should have
chosen 0 to 3 which are always invalid (in the sense of being inaccessible).

On the other hand it was a good choice as in Pmode it refers to the LDT.
There was no LDT so access via it would have failed in Pmode.
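A selector is made up of three fields: index (bits 15..3), TI (bit 2, 0 = GDT and 1 = LDT), and RPL (bits 1..0). That layout is why 7 is index 0 in the LDT with RPL 3, and why 0 to 3 are all the null selector. The following is a minimal Python sketch of the split. The names are hypothetical.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # descriptor index within the GDT or LDT
    ti: int     # table indicator: 0 = GDT, 1 = LDT
    rpl: int    # requested privilege level


def decode_selector(sel: int) -> Selector:
    sel &= 0xFFFF
    return Selector(index=sel >> 3, ti=(sel >> 2) & 1, rpl=sel & 3)


def is_null_selector(sel: int) -> bool:
    # GDT index 0 with any RPL (values 0..3) is the null selector.
    return (sel & 0xFFFC) == 0
```

`decode_selector(7)` gives `Selector(index=0, ti=1, rpl=3)`, which is the LDT reference described above.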

Either way, it shows that the processor was still using the segreg
settings (Base, Limit, Attributes) carried over from Rmode.

But I think we need to go a little further and say that once PE is set
the CPU will use the settings the BIOS left in the descriptor of that
segreg. Remember the model that in Pmode all four parts of the segreg
are loaded (selector, base, limit, attributes) whereas in Rmode only
selector and base are changed. If that's the case then when we get
control the hidden parts of the descriptor will not necessarily be
pristine as they would have been on reset but they will be whatever the
BIOS left there. All the more reason, then, to reload all segment
registers after setting PE=1.
Post by wolfgang kern
Post by James Harris
Furthermore, the access to FS:0 also works without crashing. No
exception is generated.
yes, because it is still unmodified after setting PE.
Yes.
Post by wolfgang kern
Post by James Harris
Try it. What do you get?
try your line after the final transition to PM.
  mov cx, [fs:0]   ; if FS is 0007 then it will crash
You seem to agree elsewhere in your post that the CPU is in Pmode when
PE=1 so by "the final transition to PM" I presume you mean loading
segment registers to switch to /32-bit/ PM.

Primarily:
CS.B (aka "D") says whether the PM mode is 16-bit or 32-bit
SS.B says whether to use SP or ESP

the B bits in the other segment registers don't seem to matter much
unless the segment is expand down. Though it makes sense to fully reload
all segment registers if only to ensure we are starting from a clean
state with nothing left over from the BIOS.
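Both CS.D and SS.B come from the same bit of the 8-byte descriptor: bit 6 of byte 6 (bit 22 of the high dword). The other bits in that nibble are G, L and AVL, and the low nibble holds limit bits 19..16. The following is a minimal Python sketch showing where a loaded CS or SS gets its 16/32-bit width. The function names are hypothetical, and the example bytes describe a conventional flat 4 GB descriptor.

```python
def descriptor_flags(desc: bytes) -> dict:
    if len(desc) != 8:
        raise ValueError("a segment descriptor is 8 bytes")
    access = desc[5]        # P, DPL, S, type
    flags = desc[6] >> 4    # G, D/B, L, AVL (low nibble is limit 19:16)
    return {
        "present": bool(access & 0x80),
        "code": bool(access & 0x10) and bool(access & 0x08),  # S=1, executable
        "d_b": bool(flags & 0x4),            # D on code, B on data/stack
        "granularity_4k": bool(flags & 0x8),
    }


def segment_width(desc: bytes) -> int:
    """16 or 32: CS default operand/address size, or SS stack-address size."""
    return 32 if descriptor_flags(desc)["d_b"] else 16
```

With `bytes.fromhex("ffff0000009acf00")` (flat 32-bit code), `segment_width` returns 32. If byte 6 is cleared to 00, it returns 16.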

...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
you said that for CS earlier in this thread...
and that's what I see unanswered.
I'll answer if I can but what is the question?!
:) it was about: if it's in PM after setting PE, does CS remain RM?
all segment registers keep their base/limit as long as they are not
altered, regardless of the PE-bit status.
Yes, I think that's the key to understanding this. For each segment
register the CPU uses Base, Limit and Attributes at all times and a
segreg load does:

if PE == 0 (* real mode *)
    selector = segment value
    base = segment value * 16
else if PE == 1 (* protected mode *)
    selector = selector value
    base, limit, attributes = from the descriptor
endif

To reiterate, AIUI in Real mode loading a segreg sets only Selector and
Base whereas in Protected mode it also sets Limit and Attributes.
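The model above can be sketched in Python as follows. Loading a segment register in RM touches only the selector and base. Address generation uses only the cached base and limit. The selector is therefore never consulted after the load, which matches the FS=7 experiment. This is a simplified model, not a CPU description: the `gdt` dictionary is hypothetical, and null-selector, TI, privilege, present and expand-down handling are all omitted.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    selector: int    # the only visible part
    base: int        # hidden
    limit: int       # hidden
    attributes: int  # hidden


def load_segreg(seg: SegReg, value: int, pe: int, gdt: dict) -> None:
    """Model of MOV sreg, r16. `gdt` maps selector & 0xFFF8 to
    (base, limit, attributes); protection checks are omitted."""
    value &= 0xFFFF
    if pe == 0:  # real mode: only selector and base change
        seg.selector, seg.base = value, value << 4
    else:        # protected mode: the whole descriptor is cached
        base, limit, attrs = gdt[value & 0xFFF8]
        seg.selector, seg.base, seg.limit, seg.attributes = value, base, limit, attrs


def linear(seg: SegReg, offset: int) -> int:
    """Address generation uses only the cached parts, never the selector."""
    if offset > seg.limit:  # expand-up segments only
        raise IndexError("beyond limit: a real CPU raises #GP (or #SS)")
    return (seg.base + offset) & 0xFFFFFFFF
```

After `load_segreg(es, 7, pe=0, gdt={})`, ES has selector 7 and base 112, and its limit is whatever it was before. Setting PE afterwards changes none of that until ES is reloaded.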

For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR but
I don't know a way to get the base.

On Cyrix there's SVDC to write the entire Descriptor to ten bytes of memory.

...
Post by wolfgang kern
Post by James Harris
Again, that backs up my thesis: PM is defined by PE=1 and nothing
else. Nothing else had been changed.
YOU WON. It took a while to convince myself to see it differently :)
Thanks, though it wasn't a competition. Just an exploration of the topic.
Post by wolfgang kern
Post by James Harris
Are you expecting those instructions to execute differently in PM16
compared with what they would do after setting PE=1 and before reloading CS?
I had some bad experiences with elaborate mode switches back then, so in
the end every switch became less smart, i.e. simpler, and was easier to handle too.
"the transition from RM to PM occur only on the final far jump"
I've heard similar. Perhaps whoever wrote that didn't know about PM16 -
which I also didn't until a few weeks ago.
--
James Harris
wolfgang kern
2022-03-25 07:33:46 UTC
On 23/03/2022 18:50, James Harris wrote:
[agreed..]
Post by James Harris
For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR but
I don't know a way to get the base.
get_base: ;selector in eax (a GDT selector), assume DS=flat
MOV esi,anybuffer
SGDT [ds:esi] ;buffer = GDT limit (word) + GDT base (dword)
MOV esi,[ds:esi+2] ;now you know where the GDT resides
AND eax,FFF8 ;clear TI and RPL: this is already index*8
ADD esi,eax ;so no further scaling by 8 is needed
MOV ecx,[esi+2] ;low 24 bits of base
AND ecx,00FFFFFF
MOV bl,[esi+7] ;high 8 bits of base
SHL ebx,24 ;24 is decimal here
OR ecx,ebx ;ecx holds the 32 bit base of selector eax now

it works on both code and data descriptors.
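The same table walk can be sketched in Python. SGDT stores a 6-byte image (16-bit limit, then 32-bit base). The masked selector is already the byte offset (index * 8). The base is scattered over descriptor bytes 2-4 and 7. The `memory` and `sgdt_image` arguments are hypothetical stand-ins for flat linear memory and the SGDT buffer. The result reflects the GDT contents, not necessarily what is cached in a segment register loaded in RM.

```python
import struct


def descriptor_base(desc: bytes) -> int:
    """Base 15:0 in bytes 2-3, 23:16 in byte 4, 31:24 in byte 7."""
    return desc[2] | desc[3] << 8 | desc[4] << 16 | desc[7] << 24


def base_of_selector(memory: bytes, sgdt_image: bytes, selector: int) -> int:
    """Sketch of get_base: `memory` stands in for flat linear memory and
    `sgdt_image` for the 6 bytes SGDT stores (16-bit limit, 32-bit base)."""
    if selector & 0b100:
        raise ValueError("TI=1: this sketch only walks the GDT, not an LDT")
    gdt_limit, gdt_base = struct.unpack("<HI", sgdt_image)
    offset = selector & 0xFFF8  # already index * 8: no extra scaling
    if offset + 7 > gdt_limit:
        raise ValueError("selector lies beyond the GDT limit")
    return descriptor_base(memory[gdt_base + offset : gdt_base + offset + 8])
```

With a GDT at 100h whose entry 2 holds base 12345678h, selectors 10h and 13h both return 12345678h.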
Post by James Harris
On Cyrix there's SVDC to write the entire Descriptor to ten bytes of memory.
I never used it and won't recommend relying on any exotic CPU.
...
Post by James Harris
Post by wolfgang kern
"the transition from RM to PM occur only on the final far jump"
I've heard similar. Perhaps whoever wrote that didn't know about PM16 -
which I also didn't until a few weeks ago.
I too learned something new, even if it won't affect any of my code :)
__
wolfgang
James Harris
2022-03-25 09:50:11 UTC
Post by wolfgang kern
[agreed..]
Post by James Harris
For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR
but I don't know a way to get the base.
get_base:        ;selector in eax (a GDT selector), assume DS=flat
  MOV esi,anybuffer
  SGDT [ds:esi]   ;buffer = GDT limit (word) + GDT base (dword)
  MOV esi,[ds:esi+2]    ;now you know where the GDT resides
  AND eax,FFF8    ;clear TI and RPL: this is already index*8
  ADD esi,eax           ;so no further scaling by 8 is needed
  MOV ecx,[esi+2]       ;low 24 bits of base
  AND ecx,00FFFFFF
  MOV bl,[esi+7]        ;high 8 bits of base
  SHL ebx,24            ;24 is decimal here
  OR ecx,ebx            ;ecx holds the 32 bit base of selector eax now
it works on both code and data descriptors.
What if the segreg (and, hence, its Base) had been loaded before
switching to Pmode?
Post by wolfgang kern
Post by James Harris
On Cyrix there's SVDC to write the entire Descriptor to ten bytes of memory.
I never used it and won't recommend relying on any exotic CPU.
Nor me. I want my code to be maximally portable. That means either
programming for the lowest common denominator (what you call "the
museum") or having ways to adapt to earlier CPUs.
--
James Harris
wolfgang kern
2022-03-25 19:46:06 UTC
Post by James Harris
Post by wolfgang kern
[agreed..]
Post by James Harris
For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR
but I don't know a way to get the base.
get_base:        ;selector in eax (a GDT selector), assume DS=flat
   MOV esi,anybuffer
   SGDT [ds:esi]   ;buffer = GDT limit (word) + GDT base (dword)
   MOV esi,[ds:esi+2]    ;now you know where the GDT resides
   AND eax,FFF8    ;clear TI and RPL: this is already index*8
   ADD esi,eax           ;so no further scaling by 8 is needed
   MOV ecx,[esi+2]       ;low 24 bits of base
   AND ecx,00FFFFFF
   MOV bl,[esi+7]        ;high 8 bits of base
   SHL ebx,24            ;24 is decimal here
   OR ecx,ebx            ;ecx holds the 32 bit base of selector eax now
it works on both code and data descriptors.
What if the segreg (and, hence, its Base) had been loaded before
switching to Pmode?
As long as a GDT is already installed, this also works in RM with 66 and
67 overrides.
__
wolfgang
James Harris
2022-03-26 06:52:37 UTC
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
[agreed..]
Post by James Harris
For Intel and probably AMD it looks as though one can get the CPU to
report the Limit with an LSL instruction and the Attributes with LAR
but I don't know a way to get the base.
get_base:        ;selector in eax (a GDT selector), assume DS=flat
   MOV esi,anybuffer
   SGDT [ds:esi]   ;buffer = GDT limit (word) + GDT base (dword)
   MOV esi,[ds:esi+2]    ;now you know where the GDT resides
   AND eax,FFF8    ;clear TI and RPL: this is already index*8
   ADD esi,eax           ;so no further scaling by 8 is needed
   MOV ecx,[esi+2]       ;low 24 bits of base
   AND ecx,00FFFFFF
   MOV bl,[esi+7]        ;high 8 bits of base
   SHL ebx,24            ;24 is decimal here
   OR ecx,ebx            ;ecx holds the 32 bit base of selector eax now
it works on both code and data descriptors.
What if the segreg (and, hence, its Base) had been loaded before
switching to Pmode?
As long as a GDT is already installed, this also works in RM with 66 and
67 overrides.
That's not what I mean. Rather, let's say selector and base had been
loaded in Real Mode, then you switched to Protected Mode, your SGDT
wouldn't tell you anything useful for such a case because the base
wouldn't have been loaded from the GDT.

In short, LSL will report the limit but nothing (AFAIK) will report the
base. It's not a problem, BTW, just an observation that there's no
equivalent of LSL for the base.
--
James Harris
wolfgang kern
2022-03-26 18:31:14 UTC
Post by James Harris
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
[agreed..]
Post by James Harris
For Intel and probably AMD it looks as though one can get the CPU
to report the Limit with an LSL instruction and the Attributes with
LAR but I don't know a way to get the base.
get_base:        ;selector in eax (a GDT selector), assume DS=flat
   MOV esi,anybuffer
   SGDT [ds:esi]   ;buffer = GDT limit (word) + GDT base (dword)
   MOV esi,[ds:esi+2]    ;now you know where the GDT resides
   AND eax,FFF8    ;clear TI and RPL: this is already index*8
   ADD esi,eax           ;so no further scaling by 8 is needed
   MOV ecx,[esi+2]       ;low 24 bits of base
   AND ecx,00FFFFFF
   MOV bl,[esi+7]        ;high 8 bits of base
   SHL ebx,24            ;24 is decimal here
   OR ecx,ebx            ;ecx holds the 32 bit base of selector eax now
it works on both code and data descriptors.
What if the segreg (and, hence, its Base) had been loaded before
switching to Pmode?
As long as a GDT is already installed, this also works in RM with 66 and
67 overrides.
That's not what I mean. Rather, let's say selector and base had been
loaded in Real Mode, then you switched to Protected Mode, your SGDT
wouldn't tell you anything useful for such a case because the base
wouldn't have been loaded from the GDT.
True after power-up or a hard RESET. If the BIOS may have played with it,
then there must be valid GDT entries, at least for SS and DS.

But you can't switch to PM without valid entries in the GDT :)
I won't recommend trying this test between setting PE and loading CS.
Post by James Harris
In short, LSL will report the limit but nothing (AFAIK) will report the
base. It's not a problem, BTW, just an observation that there's no
equivalent of LSL for the base.
But right, while in RM you can't be sure whether a GDT exists or not.
Only after RESET can you rely on the default settings.

In the end I never needed that test because I decided what goes where
and how, all by myself :)
__
wolfgang

Scott Lurndal
2022-03-14 16:26:00 UTC
Post by James Harris
Post by wolfgang kern
...
Post by James Harris
For example, starting in real mode
   mov ax, 7
   mov es, ax  <--- ok in real mode, sets ES /selector and base/
   mov eax, cr0
   or eax, 1
   mov cr0, eax  <--- enter protected mode
   jmp $ + 2
   mov ax, es
then that should put a 7 into AX even though the CPU is in Pmode.
Just for fun. :)
yes of course, even a first attempt to access with ES would crash.
I ask again: what do you assume the base, limit and PL to be in this pre-PM state?
Well, I wouldn't /recommend/ writing code with such an access. It would
probably go against the specs and might be inconsistent between
processors. But if one did then here's what I think the basic mechanism
would be - as may be implemented by the CPU's engineers.
The thing to bear in mind is that segment registers have multiple fields,
most of which are hidden in Rmode. One could consider ES as having these
fields:
ES.selector
ES.base
ES.limit
ES.attributes
The only part we normally see is the selector.
Given that model of a segment register, the code
mov ax, 7
mov es, ax
when run in Rmode would have the following effect
ES.selector = 7
ES.base = 112 (7 times 16)
and, importantly, _nothing else_. It would only change the selector and
the base. It would not alter the other parts of ES. Hence they would
retain the values they had before, i.e.
ES.limit = whatever was there from before
ES.attributes = whatever was there from before
Therefore, if ES.limit was large enough and ES.attributes allowed the
access then
mov ax, [es:0]
would, in the model, load AX with the contents of address 112 (the
base). But you say it would crash. Could that be because the limit
and/or the attributes did not permit the access?
Remember that although the CPU powers up in Real mode and is typically
in Real mode when we get control it has probably taken a trip to Pmode
and back since being booted. If nothing else, the BIOS has to change to
Pmode (set PE = 1) so it can check the RAM.
In Pmode, the BIOS will have been able to control what is loaded into
base, limit and attributes of the segment registers. We don't know what
that will have been but we can guess at what's likely. When it changes
back to RM to boot our code it will probably have set
limit = 64k
attributes = writable, present, expand up, byte granularity, etc
With those values the segment register's hidden parts will be correctly
set for Rmode operation. In Rmode, loads of segment registers will only
have to change selector and base.
Chapter 8 80286 Compatibility (80386 System Software Writer's Guide - 1987):

"In protected mode, the processor interprets an instruction according
to the content of the descriptors that are in effect at the time the
instruction is executed. For example, suppose a JMP instruction target
is 100,000 bytes from the beginning of the code segment. The instruction
faults if the code segment was produced by a 80286 translator because
the code limit is 64 Kbytes. The same instruction doesn't fault if the
descriptor specifies a larger limit (i.e. if it was created by an 80386
translator). Thus code segments are self-identifying; they establish
either a 80286 or 80386 "execution environment" for each instruction.
Note, however, that the 80386 does not trap an attempt by a 80286
code segment to execute an 80386 instruction that is undefined for a
80286."

Chapter 9 8086 Compatibility

"A debugged 8086 binary should not contain 80286 or 80386 instructions
because such an instruction causes undefined behavior if executed by
an 8086. Nevertheless, if such instructions are present in an 8086
program that is executed in real or V86 mode, they can be executed
as shown in table 9-1. Note that as described in the next section,
the 80386 32-bit instruction and operand addressing extensions can be
executed in real and V86 modes but are subject to 64 Kbyte limit
checking. For example, an attempt to JMP or CALL to an offset
greater than 64k results in a #GP, as does an attempt to address an
operand located at an offset higher than 64K.


...
"However, descriptors do not exist in real mode and V86 mode; therefore,
the contents of the descriptor registers in these modes are called
pseudodescriptors. ... The processor initializes some pseudodescriptor
values when it is reset (see Chapter 6) and loads others when it
switches from protected mode to real mode.

"Software running in real mode or V86 mode can change the base address
in a descriptor register. In these modes, the 80386 interprets a
selector operand as a 16-bit address. When the processor loads a
descriptor register in real mode or V86 mode, it shifts the selector
value left by four and loads the resulting base address into the associated
descriptor register.

"The pseudodescriptor-based addressing used by the 80386 in real and
V86 modes differs from 8086 addressing (but is identical to 80286
addressing) at the one-megabyte 8086 address space boundary. (ed. A20)
Under the same conditions, the 80386 running in real or V86 mode
generates a linear address while the 8086 would have wrapped. The
maximum linear address in real mode is 0x10ffef.
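The 0x10ffef figure follows directly from the real-mode base computation: segment FFFFh gives base FFFF0h, and adding the largest 16-bit offset FFFFh gives 10FFEFh. An 8086, with only 20 address lines, wraps that to 0FFEFh. The following Python sketch shows the arithmetic. The `a20_enabled` flag is a hypothetical stand-in for the A20 gate.

```python
def rm_linear(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    addr = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    # With A20 masked (or on an 8086), only 20 address bits survive.
    return addr if a20_enabled else addr & 0xFFFFF
```

`rm_linear(0xFFFF, 0xFFFF)` is 0x10FFEF. With `a20_enabled=False`, the same address wraps to 0xFFEF.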

As an aside, the IAPX 86/88 manual indicates that the BIU (Bus
Interface Unit) prefetches six instruction bytes on the 8086 and
three instruction bytes on the 8088.
wolfgang kern
2022-02-14 09:56:40 UTC
On 13/02/2022 16:50, James Harris wrote:

[about stack...]
Post by James Harris
Not sure what you mean but AIUI the B bit (big bit) of the SS descriptor
selects the size of stack pointer (32-bit ESP or 16-bit SP) used for
implicit stack references.
Rather than having all segments 32-bit or all segments 16-bit it is
looking more and more likely that a programmer could use any arbitrary
mix of 16-bit and 32-bit segments - even on current processors - so
having a 'big' code segment would make operands and addresses default to
32-bit while simultaneously having a 'small' stack segment would make
implicit stack references use SP rather than ESP.
you mean implicit stack references (all push pop call return)?

BUT how about
PM32:
8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?
RM:
67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack
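Both byte sequences above decode the same way once addressing is 32-bit, whether from CS.D=1 or from the 67 prefix in 16-bit code. ModRM 44 is mod=01, reg=000, rm=100, so a SIB byte follows. SIB 24 means no index with base ESP, and FC is a disp8 of -4. Only the operand size decides whether reg=000 is EAX or AX. The following is a minimal Python sketch of that decode. It is hypothetical and handles only this mod=01-with-SIB shape.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_mov_sib(code: bytes, operand_32: bool) -> str:
    """Decode 8B /r with 32-bit addressing, mod=01 and a SIB byte."""
    if code[0] != 0x8B:
        raise ValueError("not MOV r, r/m")
    modrm, sib = code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm != 0b100:
        raise ValueError("sketch handles only mod=01 with a SIB byte")
    scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7
    disp8 = int.from_bytes(code[3:4], "little", signed=True)
    dest = (REG32 if operand_32 else REG16)[reg]
    addr = REG32[base]
    if index != 0b100:  # index field 100 means "no index"
        addr += f"+{REG32[index]}*{1 << scale}"
    sign = "-" if disp8 < 0 else "+"
    return f"mov {dest},[{addr}{sign}{abs(disp8):02x}]"
```

`decode_mov_sib(bytes.fromhex("8b4424fc"), operand_32=True)` gives `mov eax,[esp-04]`. With `operand_32=False`, it gives `mov ax,[esp-04]`.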

and I'm not sure yet if my mixed-code CALL/RET works on SP only due to
my 16-bit stack. OK, I use 66 c3 and 66 E8xxxxxxxx here and there and my
esp is always in the 16-bit range (initially decided to fit BIOS calls).
So I never noticed it's using only SP.
__
wolfgang
James Harris
2022-02-14 15:55:44 UTC
Post by wolfgang kern
[about stack...]
I have previously only ever had to think about Real Mode (which is
always 16-bit) or the form of Protected Mode which is entirely 32-bit,
i.e. where all segments are 32-bit. The idea of having PMode where some
segments are 16-bit and others are 32-bit is entirely new to me but I
think it is yielding insights into how the processor works.
Post by wolfgang kern
Post by James Harris
Not sure what you mean but AIUI the B bit (big bit) of the SS
descriptor selects the size of stack pointer (32-bit ESP or 16-bit SP)
used for implicit stack references.
Rather than having all segments 32-bit or all segments 16-bit it is
looking more and more likely that a programmer could use any arbitrary
mix of 16-bit and 32-bit segments - even on current processors - so
having a 'big' code segment would make operands and addresses default
to 32-bit while simultaneously having a 'small' stack segment would
make implicit stack references use SP rather than ESP.
you mean implicit stack references (all push pop call return)?
Yes.

When working with a mix of 16-bit and 32-bit segments it seems there are
at least THREE sizes we need to be aware of.

Address Size
Operand Size
Stack Address Size

The following definition of PUSH depends on the latter two. Note that
whether to use SP or ESP depends on StackAddrSize:


IF StackAddrSize = 16
THEN
    IF OperandSize = 16 THEN
        SP := SP - 2;
        (SS:SP) := (SOURCE); (* word assignment *)
    ELSE
        SP := SP - 4;
        (SS:SP) := (SOURCE); (* dword assignment *)
    FI;
ELSE (* StackAddrSize = 32 *)
    IF OperandSize = 16
    THEN
        ESP := ESP - 2;
        (SS:ESP) := (SOURCE); (* word assignment *)
    ELSE
        ESP := ESP - 4;
        (SS:ESP) := (SOURCE); (* dword assignment *)
    FI;
FI;

Ref: http://www.scs.stanford.edu/05au-cs240c/lab/i386/PUSH.htm
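The PUSH definition above can be modelled in Python to show the two sizes acting independently. OperandSize sets how many bytes are written, while StackAddrSize decides whether SP (wrapping within 64K, upper half of ESP untouched) or the full ESP is decremented. This is a hypothetical sketch: `regs`, `memory` and `ss_base` are stand-ins, and limit and privilege checks are left out.

```python
def push(regs: dict, memory: bytearray, ss_base: int, value: int,
         operand_size: int, stack_addr_size: int) -> None:
    """Model of PUSH. operand_size comes from CS.D (and 66); stack_addr_size
    from SS.B. Limit and privilege checks are omitted."""
    n = operand_size // 8
    if stack_addr_size == 16:
        sp = (regs["esp"] - n) & 0xFFFF                # SP wraps within 64K
        regs["esp"] = (regs["esp"] & 0xFFFF0000) | sp  # upper ESP untouched
        addr = ss_base + sp
    else:
        regs["esp"] = (regs["esp"] - n) & 0xFFFFFFFF
        addr = ss_base + regs["esp"]
    memory[addr:addr + n] = (value & ((1 << operand_size) - 1)).to_bytes(n, "little")
```

A 32-bit push on a 16-bit stack still stores four bytes but moves only SP, which fits the earlier observation about 66 E8 calls with a small stack.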


So where are such sizes defined?

AddressSize and OperandSize are defined by the code segment's D bit
possibly overridden with an AdSiz or OpSiz prefix. (I think that bit
also determines how instruction bytes are interpreted. The 16-bit RM and
PM16 have essentially the same encodings but the encodings for 32-bit
code are different.)

StackAddrSize is more interesting - at least to me. Where is it defined?
It turns out that: "The stack address-size attribute is controlled by
the B-bit of the data-segment descriptor in the SS register."

Ref: http://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_01.htm

So that's the key: whether to use SP or ESP depends on the B bit in the
SS descriptor.

Notably, while the B bit sets only StackAddrSize the D bit sets the
defaults for both AddressSize and OperandSize.

BTW, the D bit is what Intel calls the B bit when it's on a code
descriptor. I don't know why they didn't use the same name.
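Outside 64-bit mode, each size prefix simply flips one default set by CS.D: 66 for operand size and 67 for address size. The stack-address size is separate and comes from SS.B. The following is a minimal Python sketch. The function name is hypothetical.

```python
def effective_sizes(cs_d: int, prefixes: bytes) -> tuple[int, int]:
    """Return (operand_size, address_size) in bits for non-64-bit modes."""
    default = 32 if cs_d else 16
    other = 16 if cs_d else 32
    opsize = other if 0x66 in prefixes else default
    adsize = other if 0x67 in prefixes else default
    return opsize, adsize
```

In 16-bit code (CS.D=0), a 67 prefix gives 32-bit addressing with 16-bit operands. In 32-bit code, 66 67 together yield 16-bit operands and 16-bit addressing.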
Post by wolfgang kern
BUT how about
8B 44 24 fc     mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
How do you interpret those?

BTW, what happens when referring to BP or EBP as in

mov eax, [ebp + 4]
sub ebp, 8

Does such code use the SS descriptor's B bit?
Post by wolfgang kern
and I'm not sure yet if my mixed-code CALL/RET works on SP only due to
my 16-bit stack. OK, I use 66 c3 and 66 E8xxxxxxxx here and there and my
esp is always in the 16-bit range (initially decided to fit BIOS calls).
So I never noticed it's using only SP.
Does the above help?
--
James Harris
wolfgang kern
2022-02-16 02:07:08 UTC
Post by James Harris
Post by wolfgang kern
[about stack...]
I have previously only ever had to think about Real Mode (which is
always 16-bit) or the form of Protected Mode which is entirely 32-bit,
i.e. where all segments are 32-bit. The idea of having PMode where some
segments are 16-bit and others are 32-bit is entirely new to me but I
think it is yielding insights into how the processor works.
I played around a lot with these options and finally decided on a mix.

[the Big bit...]
you mean implicit stack references (all push pop call return)?
Post by James Harris
Yes.
When working with a mix of 16-bit and 32-bit segments it seems there are
at least THREE sizes we need to be aware of.
...
Post by James Harris
So where are such sizes defined?
...
I knew all that :) just didn't remember because there were no problems.
Post by James Harris
Post by wolfgang kern
BUT how about
8B 44 24 fc     mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
How do you interpret those?
my disassembler does this for me.
Post by James Harris
BTW, what happens when referring to BP or EBP as in
  mov eax, [ebp + 4]
  sub ebp, 8
Does such code use the SS descriptor's B bit?
Yes, at least on CPUs which still support the B bit.
Post by James Harris
Post by wolfgang kern
and I'm not sure yet if my mixed-code CALL/RET works on SP only due to
my 16-bit stack. OK, I use 66 c3 and 66 E8xxxxxxxx here and there and
my esp is always in the 16-bit range (initially decided to fit BIOS calls).
So I never noticed it's using only SP.
Does the above help?
:) thanks it helped to remember.
__
wolfgang
James Harris
2022-02-21 12:13:57 UTC
...
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
BUT how about
8B 44 24 fc     mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
How do you interpret those?
my disassembler does this for me.
:) I was asking what you thought they meant (in the context of
different descriptor settings).

I've been looking into this a bit more and will have a go at it. See if
you think I've got it right or not.



Your first example:

PM32:
8B 44 24 fc mov eax,[esp-04] ;SP or ESP depending on seg-size ?

AIUI:

* that will NOT use the SS segment's 'B' bit (which selects SP or ESP
but does so only for /implicit/ references to the stack)

* it will use the CS segment's D bit being set to 1 in two ways:
1) it will have adsiz as 32-bit so recognising ESP rather than SP
2) it will have opsiz as 32-bit so recognising EAX rather than AX

* the accessible range will depend on DS.limit, not SS.limit

* it will access ESP relative to DS.base, not relative to SS.base -
which could be a source of significant confusion if they don't match.



Your second example:

RM:
67 8B 44 24 fc mov ax,[esp-04] ;could have an UnReal flat big stack

* again, won't use anything about the SS segment

* will use CS.D=0 to recognise AX

* will use CS.D=0 and adsiz to recognise ESP

* addressable range will depend on DS.limit, not SS.limit

* linear address will depend on DS.base, not SS.base.



Note that it appears that the Big bit on DS (i.e. DS.B) is ignored even
though the instructions access DS. Also, the base and limit and
everything else from SS are ignored for those instructions.

The findings may be unexpected (they certainly surprised me) but see the
section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
MEMORY MANAGEMENT in Intel manuals.
Post by wolfgang kern
Post by James Harris
BTW, what happens when referring to BP or EBP as in
   mov eax, [ebp + 4]
   sub ebp, 8
Does such code use the SS descriptor's B bit?
Yes, at least on CPUs which still support the B bit.
Any advance on "Yes"? :) As above, the Segment Descriptors section of
the Intel manual suggests otherwise.
--
James Harris
wolfgang kern
2022-02-21 14:30:43 UTC
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
BUT how about
8B 44 24 fc     mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
How do you interpret those?
my disassembler does this for me.
:)  I was asking what you thought they meant (in the context of
different descriptor settings).
I've been looking into this a bit more and will have a go at it. See if
you think I've got it right or not.
can't give you an A here ...
both my example instructions and yours use SS by default.
  8B 44 24 fc  mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
* that will NOT use the SS segment's 'B' bit (which selects SP or ESP
but does so only for /implicit/ references to the stack)
  1) it will have adsiz as 32-bit so recognising ESP rather than SP
  2) it will have opsiz as 32-bit so recognising EAX rather than AX
* the accessible range will depend on DS.limit, not SS.limit
* it will access ESP relative to DS.base, not relative to SS.base -
which could be a source of significant confusion if they don't match.
My SS uses a PM16 data-descriptor and my DS a Flat PM32 one.
No confusion detected :)
  67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
* again, won't use anything about the SS segment
* will use CS.D=0 to recognise AX
yeah, RM uses 16 bit for AX by default, but 67 allows SIB (uses SS)
* will use CS.D=0 and adsiz to recognise ESP
there aren't any [SP] addressing modes in RM nor in PM except
PUSH/POP/CALL/RET.
* addressable range will depend on DS.limit, not SS.limit
* linear address will depend on DS.base, not SS.base.
this is wrong!
Note that it appears that the Big bit on DS (i.e. DS.B) is ignored even
though the instructions access DS. Also, the base and limit and
everything else from SS are ignored for those instructions.
The findings may be unexpected (they certainly surprised me) but see the
section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
MEMORY MANAGEMENT in Intel manuals.
Are such false statements really printed in Intel docs?
Perhaps things have changed meanwhile (40 years later); 80186/80286 books
may not apply to 80386++.

My AMD docs seem to follow the truth and I can confirm that all
[eBP/eSP]-based instructions including SIB-styled use SS.
Post by wolfgang kern
Post by James Harris
BTW, what happens when referring to BP or EBP as in
   mov eax, [ebp + 4]
   sub ebp, 8
Does such code use the SS descriptor's B bit?
Yes, at least on CPUs which still support the B bit.
Any advance on "Yes"? :)  As above, the Segment Descriptors section of
the Intel manual suggests otherwise.
__
wolfgang
Scott Lurndal
2022-02-21 15:48:11 UTC
Post by wolfgang kern
Post by James Harris
The findings may be unexpected (they certainly surprised me) but see the
section entitled Segment Descriptors in CHAPTER 3 of PROTECTED-MODE
MEMORY MANAGEMENT in Intel manuals.
Are such false statements really printed in Intel docs?
Perhaps things have changed meanwhile (40 years later); 80186/80286 books
may not apply to 80386++.
My AMD docs seem to follow the truth and I can confirm that all
[eBP/eSP]-based instructions including SIB-styled use SS.
The QEMU source base is a useful reference for these types of
questions.
James Harris
2022-02-21 18:02:26 UTC
Post by wolfgang kern
Post by wolfgang kern
Post by James Harris
Post by wolfgang kern
BUT how about
8B 44 24 fc     mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
How do you interpret those?
my disassembler does this for me.
:)  I was asking what you thought they meant (in the context of
different descriptor settings).
I've been looking into this a bit more and will have a go at it. See
if you think I've got it right or not.
can't give you an A here ...
both of my and your example instructions use SS by default.
You are right about using SS. Since the earlier post I've found in the
Intel manual under Default Segment Selection Rules the text for uses of
SS and it mentions ESP as well as EBP: "Any memory reference which uses
the ESP or EBP register as a base register." I'd missed off ESP. I will
correct my comments, below, though please don't get diverted by that
because the selection of base register is a side issue compared with the
significance(s) of the D/B bit.
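A minimal sketch of the rule quoted above follows. It covers only 32-bit ModRM/SIB addressing with no segment-override prefix. The function name is hypothetical. String instructions, implicit stack operations and 16-bit addressing (where BP rather than EBP selects SS) are left out.

```python
def default_segment_32(base_reg):
    """Default segment for a 32-bit ModRM/SIB memory operand (no override prefix).

    base_reg is the *base* register name, or None when the operand has no
    base (e.g. [disp32] or [index*scale+disp32]). The index register never
    influences the choice.
    """
    if base_reg is not None and base_reg.upper() in ("ESP", "EBP"):
        return "SS"
    return "DS"


# Operands from the thread, plus one that uses EBP only as an index
for operand, base in [("[esp-4]", "ESP"), ("[ebp+4]", "EBP"), ("[ebx]", "EBX"),
                      ("[ebx+ebp*2]", "EBX")]:
    print(operand, "->", default_segment_32(base))
```

Only the base register matters here. [ebx+ebp*2] therefore stays DS-relative, while [ebp+ebx*2] would be SS-relative.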

I should say that IME assemblers have no directives for RM or PM32 but
only for "16-bit" and "32-bit" and AISI they are correct in that because
for both of the 16-bit modes (RM and PM16) the instruction encodings are
identical.

Therefore, AISI your second example (which you labelled "RM") cannot
really be RM as it refers to ESP so I presume you mean PM16 (which uses
16-bit encodings but can refer to 32-bit registers).

BTW, to be clear, you referred to /my/ examples but I just copied yours
so there are only two examples, not four.
Post by wolfgang kern
   8B 44 24 fc  mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
* that will NOT use the SS segment's 'B' bit (which selects SP or ESP
but does so only for /implicit/ references to the stack)
I stand by that part of the assessment (until told otherwise!). The B
bit is only used for implicit stack operations and is ignored for
explicit ones.
Post by wolfgang kern
   1) it will have adsiz as 32-bit so recognising ESP rather than SP
   2) it will have opsiz as 32-bit so recognising EAX rather than AX
I stand by that, too.
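To check the opsiz/adsiz reasoning mechanically, here is a sketch of a decoder for just the two byte sequences under discussion. It handles opcode 8Bh with ModRM mod=01, rm=100 (SIB byte plus disp8). The decoder is hypothetical and deliberately tiny. It shows how CS.D sets the defaults and how each of the 66h/67h prefixes flips one of them.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_load(code, cs_d):
    """Decode 8Bh /r with a SIB byte and disp8 (mod=01, rm=100).

    cs_d is the D bit of the current code segment (True = 32-bit default).
    Returns (destination register, base register, index or None, displacement).
    """
    i = 0
    op_override = addr_override = False
    while code[i] in (0x66, 0x67):            # size-override prefixes
        if code[i] == 0x66:
            op_override = True
        else:
            addr_override = True
        i += 1
    op32 = bool(cs_d) != op_override          # each prefix flips the CS.D default
    addr32 = bool(cs_d) != addr_override
    if code[i] != 0x8B:
        raise ValueError("only opcode 8Bh is handled")
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if not (addr32 and mod == 1 and rm == 4):
        raise ValueError("not the 32-bit SIB+disp8 form handled here")
    sib = code[i + 2]
    scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7
    disp = int.from_bytes(code[i + 3:i + 4], "little", signed=True)
    dest = (REG32 if op32 else REG16)[reg]
    idx = None if index == 4 else f"{REG32[index]}*{1 << scale}"   # index=100b: none
    return dest, REG32[base], idx, disp


print(decode_mov_load(bytes.fromhex("8b4424fc"), cs_d=True))     # PM32 code
print(decode_mov_load(bytes.fromhex("678b4424fc"), cs_d=False))  # 16-bit code
```

Without the 67h prefix in 16-bit code the same bytes form a different instruction. ModRM 44h then means [SI+disp8], so 8B 44 24 is MOV AX,[SI+24h] and FCh is left over as a separate CLD. The sketch rejects that case instead of decoding it.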
Post by wolfgang kern
* the accessible range will depend on DS.limit, not SS.limit
* it will access ESP relative to DS.base, not relative to SS.base -
which could be a source of significant confusion if they don't match.
My SS uses a PM16 data-descriptor and my DS a Flat PM32 one.
No confusion detected :)
Agreed. If MOV EAX,[ESP - 4] uses ESP as a base register then it /will/
use SS.limit and SS.base - although for that instruction the CPU will
still ignore SS.B. Agreed?
Post by wolfgang kern
   67 8B 44 24 fc  mov ax,[esp-04]   ;could have an UnReal flat big stack
* again, won't use anything about the SS segment
* will use CS.D=0 to recognise AX
yeah, RM uses 16 bit for AX by default, but 67 allows SIB (uses SS)
Agreed.
Post by wolfgang kern
* will use CS.D=0 and adsiz to recognise ESP
there aren't any [SP] addressing modes in RM nor in PM except
PUSH/POP/CALL/RET.
* addressable range will depend on DS.limit, not SS.limit
* linear address will depend on DS.base, not SS.base.
this is wrong!
Agreed. Issue as above. SS.limit and SS.base will be used.
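Taking the agreed position (SS.base and SS.limit apply to an explicit [ESP-4], SS.B plays no part), here is a sketch of the resulting address calculation. It assumes an expand-up SS descriptor and ignores paging, privilege and alignment checks. The #SS exception is modelled by a Python exception. All names are hypothetical.

```python
class StackFault(Exception):
    """Stands in for the #SS exception raised on an SS limit violation."""


def ss_access(base_value, disp, size, ss_base, ss_limit, addr32=True):
    """Linear address of an explicit SS-relative access such as [ESP-4].

    addr32 selects 32-bit or 16-bit address size for the effective address.
    """
    mask = 0xFFFFFFFF if addr32 else 0xFFFF
    offset = (base_value + disp) & mask      # the effective address wraps at the address size
    if offset + size - 1 > ss_limit:         # every byte accessed must lie within the limit
        raise StackFault(f"offset {offset:#x}+{size} exceeds SS.limit {ss_limit:#x}")
    return (ss_base + offset) & 0xFFFFFFFF


# A 64K stack segment based at 20000h, as with a PM16 data descriptor for SS
print(hex(ss_access(0x1000, -4, 4, ss_base=0x20000, ss_limit=0xFFFF)))  # 0x20ffc
```

With a 32-bit address size and ESP=0, the offset becomes FFFFFFFCh. That is outside a 64K limit, so the access faults instead of wrapping. With a 16-bit address size the same arithmetic wraps to FFFCh.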
Post by wolfgang kern
Note that it appears that the Big bit on DS (i.e. DS.B) is ignored
even though the instructions access DS. Also, the base and limit and
everything else from SS are ignored for those instructions.
The findings may be unexpected (they certainly surprised me) but see
the section entitled Segment Descriptors in CHAPTER 3 of
PROTECTED-MODE MEMORY MANAGEMENT in Intel manuals.
Are such false statements really printed in Intel docs?
No, what Intel says is as follows. Remember this is about the B bit (aka
D bit on a code seg) which is supposed to be 0 for 16-bit and 1 for
32-bit operation - and about which I think we disagree.




D/B (default operation size/default stack pointer size and/or upper
bound) flag

Performs different functions depending on whether the segment descriptor
is an executable code segment, an expand-down data segment, or a stack
segment. (This flag should always be set to 1 for 32-bit code and data
segments and to 0 for 16-bit code and data segments.)

• Executable code segment. The flag is called the D flag and it
indicates the default length for effective addresses and operands
referenced by instructions in the segment. If the flag is set, 32-bit
addresses and 32-bit or 8-bit operands are assumed; if it is clear,
16-bit addresses and 16-bit or 8-bit operands are assumed. The
instruction prefix 66H can be used to select an operand size other than
the default, and the prefix 67H can be used select an address size other
than the default.

• Stack segment (data segment pointed to by the SS register). The flag
is called the B (big) flag and it specifies the size of the stack
pointer used for implicit stack operations (such as pushes, pops, and
calls). If the flag is set, a 32-bit stack pointer is used, which is
stored in the 32-bit ESP register; if the flag is clear, a 16-bit stack
pointer is used, which is stored in the 16-bit SP register. If the stack
segment is set up to be an expand-down data segment (described in the
next paragraph), the B flag also specifies the upper bound of the stack
segment.

• Expand-down data segment. The flag is called the B flag and it
specifies the upper bound of the segment. If the flag is set, the upper
bound is FFFFFFFFH (4 GBytes); if the flag is clear, the upper bound is
FFFFH (64 KBytes).




That's all the Intel docs say in that section about the D/B flag. Some
points of note:

* The bit is in the last 16 bits of a descriptor, so it will be zero for
PM16 (286-style) descriptors, in which all those bits are expected to be zero.

* Although Intel say the bit should be set to 1 on all 32-bit segments,
according to the description Intel gives, above, the CPU doesn't check
the bit on any data segment other than SS, apart from using it as the
upper bound of an expand-down segment.

* Even on the stack segment the bit still doesn't influence address size
or operand size of any explicit references to that segment.

IOW even if the B bit were to be clear on DS, ES etc you could still
access them as normal as 32-bit segments.
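Per the Intel text quoted above, B has no effect on an expand-up data segment but does set the upper bound of an expand-down one. Here is a sketch of the corresponding offset check. It assumes byte granularity (G=0) and ignores everything except the limit. The function name is hypothetical.

```python
def offset_in_bounds(offset, size, limit, expand_down, b_flag):
    """True if bytes offset..offset+size-1 are valid for a data segment."""
    last = offset + size - 1
    if not expand_down:
        return last <= limit                  # expand-up: B plays no part
    upper = 0xFFFFFFFF if b_flag else 0xFFFF  # expand-down: B sets the upper bound
    return offset > limit and last <= upper   # valid range is limit+1 .. upper


# Expand-up, B=0, limit just under 1 MB: a 32-bit offset works regardless of B
print(offset_in_bounds(0x80000, 4, 0xFFFFF, expand_down=False, b_flag=False))   # True
# Expand-down with limit 0FFFh: B decides whether 10000h is inside the segment
print(offset_in_bounds(0x10000, 4, 0x0FFF, expand_down=True, b_flag=False))    # False
print(offset_in_bounds(0x10000, 4, 0x0FFF, expand_down=True, b_flag=True))     # True
```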

Going a little further, the B bit is distinct from the G (granularity)
bit so you could define a limit in terms of 4k chunks and access all 4G
with the B bit still being zero.

Surprised? I was.
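To make the G versus B distinction concrete, here is a sketch that extracts the main fields from a raw 8-byte descriptor. It uses the standard layout: limit 19:16 in the low nibble of byte 6, D/B in bit 6 of byte 6, G in bit 7 of byte 6, and base 31:24 in byte 7. The helper name is hypothetical, and it does no validation beyond the length.

```python
def decode_descriptor(raw):
    """Split an 8-byte GDT/LDT segment descriptor into its main fields."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is 8 bytes")
    d = int.from_bytes(raw, "little")
    limit = (d & 0xFFFF) | (((d >> 48) & 0xF) << 16)         # limit 15:0 and 19:16
    base = ((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24)
    g = (d >> 55) & 1                                         # granularity
    db = (d >> 54) & 1                                        # D/B flag
    access = (d >> 40) & 0xFF                                 # P, DPL, S, type
    if g:
        limit = (limit << 12) | 0xFFF                         # 4K units -> byte limit
    return {"base": base, "limit": limit, "access": access, "G": g, "D/B": db}


# Flat read/write data segment with G=1 but B=0 (byte 6 = 8Fh rather than CFh)
print(decode_descriptor(bytes.fromhex("ffff000000928f00")))
```

With G=1 the 20-bit limit FFFFFh becomes a byte limit of FFFFFFFFh, even though D/B is 0.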
Post by wolfgang kern
Post by wolfgang kern
Post by James Harris
BTW, what happens when referring to BP or EBP as in
   mov eax, [ebp + 4]
   sub ebp, 8
Does such code use the SS descriptor's B bit?
Yes, at least on CPUs which still support the B bit.
Any advance on "Yes"? :)  As above, the Segment Descriptors section of
the Intel manual suggests otherwise.
I stand by that, too. For the reasons mentioned above it seems that your
MOV-SUB pair would _not_ take any notice of the SS.B bit.

As ever, further corrections welcome. :)
--
James Harris
wolfgang kern
2022-02-21 21:27:43 UTC
Post by James Harris
....
Therefore, AISI your second example (which you labelled "RM") cannot
really be RM as it refers to ESP so I presume you mean PM16 (which uses
16-bit encodings but can refer to 32-bit registers).
NO, the 67 override alters only the addressing modes.
RM example: 67 8B 24 24 mov SP,[esp] ;just FYI, it doesn't make much sense.
Post by James Harris
Post by wolfgang kern
   8B 44 24 fc  mov eax,[esp-04]  ;SP or ESP depending on seg-size ?
* that will NOT use the SS segment's 'B' bit (which selects SP or ESP
but does so only for /implicit/ references to the stack)
I stand by that part of the assessment (until told otherwise!). The B
bit is only used for implicit stack operations and is ignored for
explicit ones.
might be worth checking this again; I kept my ESP at 0000xxxx anyway,
so perhaps BIOS calls weren't my only reason 30 years ago :)
Post by James Harris
Post by wolfgang kern
   1) it will have adsiz as 32-bit so recognising ESP rather than SP
   2) it will have opsiz as 32-bit so recognising EAX rather than AX
I stand by that, too.
Yeah, this is default of PM32 :)
...
Post by James Harris
Post by wolfgang kern
My SS uses a PM16 data-descriptor and my DS a Flat PM32 one.
No confusion detected :)
Agreed. If MOV EAX,[ESP - 4] uses ESP as a base register then it /will/
use SS.limit and SS.base - although for that instruction the CPU will
still ignore SS.B. Agreed?
Fine if it uses ESP, but it won't ignore the limits.
My stack wraps around at 64K (if not misaligned)... so as if SP in use.
...Could be that my point of view differs because my stack resides on a
"normal" 64K data segment.
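The 64K wrap described here matches what the Intel text says about implicit stack operations when SS.B=0: only SP is used as the stack pointer. Here is a sketch of the new ESP after an implicit PUSH. The limit check is left out, and the function name is hypothetical.

```python
def push_esp_after(esp, ss_b, size):
    """New ESP after an implicit PUSH of `size` bytes, per the SS.B flag."""
    if ss_b:
        return (esp - size) & 0xFFFFFFFF      # B=1: all of ESP is the stack pointer
    sp = (esp - size) & 0xFFFF                # B=0: only SP is used, wrapping at 64K
    return (esp & 0xFFFF0000) | sp            # the upper half of ESP is not modified


print(hex(push_esp_after(0x00010000, ss_b=False, size=2)))  # 0x1fffe: SP wrapped, upper half kept
print(hex(push_esp_after(0x00010000, ss_b=True, size=2)))   # 0xfffe: full ESP decremented
```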
__
wolfgang
Scott Lurndal
2022-02-13 17:17:27 UTC
Post by James Harris
I think I may have come up with a clearer insight into what happens at
each step of enabling Pmode - and it's very little! See below for details.
Post by wolfgang kern
...
exactly what differences are there between instruction decoding in
real mode and in PM16 (the mode immediately after setting CR0 bit 0)?
As I say, this is all largely academic but if you happen to know the
answer without doing any research do say as the details look interesting.
1. This EB 00 after writing CR0 was never required, at least not by me.
From what I've found recently it looks as though it would be rare for
anyone to need that jump. (Though it or something like it is still right
to include to cover the unusual cases.)
Post by wolfgang kern
2. Setting PE does nothing on its own; the CPU remains in real mode until
   the far jump, which changes the interpretation from segment to descriptor,
   and it's 16:16 code without a prefix.
I am not sure that's right, Wolfgang. I am beginning to think that once
PE is set the processor will be in 16-bit Protected Mode (PM16); in that
mode the encoding of instructions will be identical to RM; and the main
differences will be when loading segment registers. There may also be
some differences when /using/ segment registers but see below.
I suspect that this behavior varies based on the processor generation.

The original 80286 in 1982 (design started in the late 70's) is neither
pipelined nor superscalar.

While it is likely that the 8086 (1978) used a PLA
to drive the instruction decode and execution, much like the 6502 (1975),
that's definitely not the case for the 80286.

https://www.pagetable.com/?p=39 (the 6502 block diagram here is blurry, a
better version is available via google images).

The 80286 broke the chip up into unit blocks (Instruction, Bus, Execution
and Address), where the instruction unit produced a queue (FIFO in hardware
terms) of decoded instructions passed to the execution unit.

This meant that there is a 'depth-of-decode-queue' window between
decoding and executing an instruction; for those instructions whose
decode stage involved knowledge of the PM flag but were decoded
before the instruction to set the flag was executed, they'll be
executing in an indeterminate state. Unless the programmer knows
the absolute depth of the queue under all circumstances, the
programmer cannot make any assumptions about the environment of
any instructions between setting the PM flag and loading the CS
register via a jump instruction.

https://electronicsdesk.com/80286-microprocessor.html
James Harris
2022-02-14 16:23:01 UTC
...
Post by Scott Lurndal
Post by James Harris
Post by wolfgang kern
1. This EB 00 after writing CR0 was never required, at least not by me.
From what I've found recently it looks as though it would be rare for
anyone to need that jump. (Though it or something like it is still right
to include to cover the unusual cases.)
...
Post by Scott Lurndal
I suspect that this behavior varies based on the processor generation.
Yes, if you are talking about the length of the decode queue then I
agree. It will depend on the specific processor and it's
non-architectural so cannot be predetermined.

...
Post by Scott Lurndal
This meant that there is a 'depth-of-decode-queue' window between
decoding and executing an instruction; for those instructions whose
decode stage involved knowledge of the PM flag but were decoded
before the instruction to set the flag was executed, they'll be
executing in an indeterminate state. Unless the programmer knows
the absolute depth of the queue under all circumstances, the
programmer cannot make any assumptions about the environment of
any instructions between setting the PM flag and loading the CS
register via a jump instruction.
Yes, it's possible that all instructions would decode under the wrong
assumptions in certain processors; one would have to run tests to find
out for sure. But it's interesting that PM16 uses the /same/ instruction
encodings and addressing limitations as RM, and that the main semantic
differences are in the interpretation of loading segment registers.
--
James Harris
Rod Pemberton
2021-06-12 23:31:20 UTC
On Sat, 12 Jun 2021 12:06:11 -0700 (PDT)
Post by James Harris
The Sybex book Programming the 80386 says on the back that its
authors are one of the 80386's logic designers and the 80386's (and
thus x86-32's) chief architect, so they ought to know a thing or two
about the design! On page 605 the book says that after setting the PE
bit the processor enters "16-bit protected mode" - which I didn't
know even existed (but see below).
I didn't fully re-read the 2015 thread, but I think my perspective
hasn't changed any. I.e., after enabling CR0.PE, it's the reloading of
the CS selector which causes the processor to switch into protected
mode, be it 16-bit PM or 32-bit PM. Without the jump to re-load CS, CS
is still set to a 16-bit RM segment. I.e., the processor is still
executing 16-bit RM code after CR0.PE is set, but prior to a far jump
re-loading CS with a valid selector for PM. In other words, simply
enabling CR0.PE doesn't activate PM, the code is still 16-bit RM code,
but perhaps some of the PM features have been enabled by setting CR0.PE.

When CR0.PE is enabled, CS isn't a selector but a segment, and so CS
doesn't point to a descriptor. No descriptor means no CS.ar.D bit
which determines a 16-bit PM or 32-bit PM code segment. CS won't be
set to a selector with a valid descriptor with a CS.ar.D bit, until a
far jump re-loads CS, but only after CR0.PE is enabled.
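A toy model may help pin down what actually changes at each step, whatever one calls the intermediate state. CS has a visible part and a hidden (cached) part holding base, limit and the D bit. In this sketch, setting PE touches neither, so until the far jump the processor keeps using the real-mode base and a D value of 0. ToyCPU, CachedSegment, far_jump and the dict standing in for the GDT are all hypothetical. Privilege checks, descriptor type checks and everything else are omitted.

```python
from dataclasses import dataclass


@dataclass
class CachedSegment:
    selector: int   # the visible 16-bit value
    base: int       # hidden part: linear base address
    limit: int      # hidden part: segment limit
    d: int          # hidden part: default-size (D) bit


class ToyCPU:
    """Hypothetical model: only CR0.PE and CS, nothing else."""

    def __init__(self, cs_value):
        self.cr0 = 0
        # Typical real-mode state: base = value*16, 64K limit, 16-bit default size
        self.cs = CachedSegment(cs_value, cs_value << 4, 0xFFFF, 0)

    def set_pe(self):
        self.cr0 |= 1                           # CS and its hidden part are untouched

    def far_jump(self, selector, gdt):
        """Load CS in protected mode; gdt maps index -> (base, limit, d)."""
        if not self.cr0 & 1:
            raise RuntimeError("this toy only models the protected-mode load")
        base, limit, d = gdt[selector >> 3]     # assumes TI=0 (GDT) and ignores RPL
        self.cs = CachedSegment(selector, base, limit, d)


cpu = ToyCPU(0x1000)
cpu.set_pe()
print(cpu.cs)                                   # still the real-mode base 10000h, D=0
cpu.far_jump(0x08, {1: (0, 0xFFFFFFFF, 1)})
print(cpu.cs)                                   # now base 0, 4 GB limit, D=1
```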
Post by James Harris
The different stages of enabling PMode are now possibly easier to guess at.
1. After enabling CR0.PE
The processor will be in 16-bit PM but early processors (including
386 and 486) did not automatically flush the prefetch queue so they
could have already decoded some of the following bytes as RM. I would
guess that many such decodings would be different but that some could
be the same. If that's right then some instructions could be validly
executed here even without flushing the prefetch queue. There's no
value in doing so but ISTM informative to see exactly what is likely
to change and when.
I'm still of the mind that after CR0.PE is set, a valid 16-bit PM
selector must be loaded into CS via a far jump to enable 16-bit PM or
32-bit PM. Otherwise, CS still contains a 16-bit RM segment, and the
processor is still executing 16-bit RM code, perhaps with some PM
features enabled due to CR0.PE being set.
--
What is hidden in the ground, when found, is hidden there again?
wolfgang kern
2021-06-13 13:54:09 UTC
Post by James Harris
Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.
Going back to this old thread as I have some more information.
The Sybex book Programming the 80386 says on the back that its authors are one of the 80386's logic designers and the 80386's (and thus x86-32's) chief architect, so they ought to know a thing or two about the design! On page 605 the book says that after setting the PE bit the processor enters "16-bit protected mode" - which I didn't know even existed (but see below).
Old (some newer as well) docs are often written with doubtful wording.
I remember my early attempts to enter PM (on a 486): it doesn't work that
way. PM16 or PM32 starts exactly at the point where CS is altered.
Post by James Harris
The processor apparently stays in that mode (Protected Mode but 16-bit) for as long as we want, and what changes it to 32-bit PMode is an instruction which loads CS with the selector for a 32-bit descriptor.
PM16 and PM32 differ by just one bit in the code segment descriptor (the D bit).
Post by James Harris
Correspondingly, it would appear to be possible to change /back/ to 16-bit PMode by loading CS with the selector of a 16-bit descriptor.
Yes, my mode switch from PM32 to RM16 needs one step to PM16 in between
but the switches from RM16 to PM32 or LM don't need a PM16 step.

...
Post by James Harris
https://en.wikipedia.org/wiki/Segment_descriptor#Structure
Basically, B=0 means 16-bit and B=1 means 32-bit.
I knew about the bit, of course, but never really understood it; and I didn't realise that B=0 was the mode the processor executed in after enabling PMode prior to reloading CS.
yeah, it wasn't too well documented :)
Post by James Harris
The different stages of enabling PMode are now possibly easier to guess at.
1. After enabling CR0.PE
The processor will be in 16-bit PM
NO!
Post by James Harris
but early processors (including 386 and 486) did not automatically flush the prefetch queue so they could have already decoded some of the following bytes as RM. I would guess that many such decodings would be different but that some could be the same. If that's right then some instructions could be validly executed here even without flushing the prefetch queue. There's no value in doing so but ISTM informative to see exactly what is likely to change and when.
2. After flushing the prefetch queue
This will apparently be true 16-bit PM with a 64k limit on code addresses - and probably the same for data addresses.
From this point it turns out that contrary to normal practice one could load selectors for 32-bit /data/ descriptors while still running the code in 16-bit mode. I say that because the code in
https://archive.org/stream/bitsavers_intel80386ammersReferenceManual1986_27457025/230985-001_80386_Programmers_Reference_Manual_1986_djvu.txt
it's a bit dated :)
Post by James Harris
LGDT tGDT__pword
missing: CLI
and the GDT should already contain valid entries here.
Post by James Harris
; switch to protected mode
correction: prepare to switch
Post by James Harris
MOV EAX,CR0
MOV EAX,1
MOV CR0,EAX
redundant:
;> ; clear prefetch queue
;> JMP SHORT flush
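The CR0 update in the listing above is botched, as noted further down. A small sketch of the two updates, treating CR0 as a plain integer, shows why. The OR version keeps every other CR0 bit. The MOV EAX,1 version throws away the value just read and clears everything except PE, including ET (bit 4) in this example. The function names are hypothetical.

```python
CR0_PE = 1 << 0    # protection enable
CR0_ET = 1 << 4    # extension type (coprocessor type on the 386)


def set_pe_or(cr0):
    """mov eax,cr0 / or al,1 / mov cr0,eax"""
    return cr0 | CR0_PE


def set_pe_mov(cr0):
    """mov eax,cr0 / mov eax,1 / mov cr0,eax: the value read from CR0 is discarded"""
    return CR0_PE


before = CR0_ET
print(hex(set_pe_or(before)), hex(set_pe_mov(before)))   # 0x11 0x1
```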
Post by James Harris
; set DS,ES,SS to address flat linear space (0 ... 4GB)
MOV BX,FLAT_DES-Temp_GDT
MOV DS,BX
MOV ES,BX
MOV SS,BX
Note the data selectors being loaded before the code selector (which the code changes much later) - and the botched update of CR0 which appeared in many Intel sources of the time.
It doesn't matter where the data/stack selectors are initialized as long as
they aren't used. I set DS and SS:ESP before the far jump, and ES, FS, GS after it.
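An expression like FLAT_DES-Temp_GDT can be used directly as a selector. The reason is the layout from point 1 of the original post: descriptor index in bits 15..3, TI in bit 2, RPL in bits 1..0. Each descriptor is 8 bytes, so with TI=0 and RPL=0 the selector equals the descriptor's byte offset in the GDT. Here is a sketch, with a hypothetical helper name.

```python
def make_selector(index, ti=0, rpl=0):
    """Selector = descriptor index in bits 15..3, TI in bit 2, RPL in bits 1..0."""
    if not (0 <= index <= 8191 and ti in (0, 1) and 0 <= rpl <= 3):
        raise ValueError("index 0..8191, TI 0 or 1, RPL 0..3")
    return (index << 3) | (ti << 2) | rpl


DESCRIPTOR_SIZE = 8
# With TI=0 and RPL=0 the selector is the descriptor's byte offset in the GDT
for index in (1, 2, 3):
    print(index, hex(make_selector(index)), hex(index * DESCRIPTOR_SIZE))
```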
Post by James Harris
FWIW, the code also goes on to do a bunch of other stuff such as
; initialize stack pointer to some (arbitrary) RAM location
MOV ESP, OFFSET end_Temp_GDT
wherever you want it to be :)
Post by James Harris
; copy eprom GDT to RAM
MOV ESI, DWORD PTR GDT_eprom +2 ; get base of eprom GDT
MOV EDI,GDTbase                 ; point ES:EDI to GDT base in RAM.
MOV CX,WORD PTR gdt_eprom +0    ; limit of eprom GDT
INC CX
SHR CX,1                        ; easier to move words
CLD
REP MOVS WORD PTR ES : [EDI] , WORD PTR DS:[ESI]
;copy eprom IDT to RAM
MOV ESI, DWORD PTR IDT_eprom +2 ; get base of eprom IDT
MOV EDI,IDTbase
MOV CX,WORD PTR idt_eprom +0
INC CX
SHR CX,1
CLD
REP MOVS WORD PTR ES : [EDI] , WORD PTR DS:[ESI]
this ROM GDT may point to an already wiped RAM area !!!
modern (already old now) BIOSes use only temporary PM32 and LM.
Post by James Harris
etc, all before setting CS to a 32-bit descriptor.
3. After loading CS (via a far jump or far call or by a TSS switch) to refer to a 32-bit descriptor.
I wouldn't use a far CALL nor (total worse) a task-switch here.
if you're brave you could use what I do for mode switches but only after
the initial PM and stack setup:

PUSH EFL ;needs 66 if within 16 bit code
CLI
PUSH new_descriptor ;I use immediate constants here
PUSH new_offset ;
IRET
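This trick relies on IRET popping the instruction pointer first, then CS, then EFLAGS, which is the reverse of the push order. Because EFLAGS is pushed before the CLI, IRET restores the IF value from before the CLI. Here is a sketch of the resulting 12-byte frame. It assumes 32-bit operand size throughout (PUSHFD, 32-bit pushes, IRETD) and a return to the same privilege level. In 16-bit code each of those instructions would need the 66h prefix to build this layout. The function names are hypothetical.

```python
import struct


def iretd_frame(eflags, selector, offset):
    """Stack contents (lowest address first) after PUSHFD, PUSH sel, PUSH off."""
    # Pushes go downwards, so the last value pushed (the offset) is at [ESP].
    return struct.pack("<III", offset, selector, eflags)


def iretd_pop(frame):
    """What IRETD takes back, for a return to the same privilege level."""
    eip, cs_slot, eflags = struct.unpack_from("<III", frame)
    return eip, cs_slot & 0xFFFF, eflags     # only the low 16 bits of the CS slot are used


frame = iretd_frame(eflags=0x202, selector=0x08, offset=0x00102345)
print(frame.hex(" "))
print(iretd_pop(frame))
```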
Post by James Harris
The code will finally be in PM32.
could be in PM16 as well.
Post by James Harris
Post by James Harris
I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.
OK, nothing new on the matter to me.
Post by James Harris
Post by James Harris
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
The above info implies that it's the /short/ version of the jump which is required.
yes of course because this jump is still RM16 [NOT PM16] code.
Post by James Harris
Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?
That turns out to be right.
Yes still in 16 bit Realmode, until CS is loaded.
Post by James Harris
Post by James Harris
6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.
If the executing code is in the low 64k of a descriptor's space then we
can encode the simple
EA oo oo ss ss
because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.
66 EA oo oo oo oo ss ss
only if there is already code loaded (guess how to do before PM..).
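The choice of jump form can be made mechanically. This sketch emits the EAh far jump in memory order: offset first (little endian), then selector. The offset is taken as measured from the target segment's base. Right after setting PE the jump still executes as 16-bit code (code32=False). far_jmp_bytes is a hypothetical helper, not part of any assembler.

```python
def far_jmp_bytes(selector, offset, code32):
    """Encode JMP FAR selector:offset (opcode EAh) for 16-bit or 32-bit code.

    The offset is measured from the base of the target segment, not from
    the jump itself. In 16-bit code a 16-bit offset is used when it fits;
    otherwise the 66h prefix selects the 32-bit offset form.
    """
    if not (0 <= selector <= 0xFFFF and 0 <= offset <= 0xFFFFFFFF):
        raise ValueError("selector must fit in 16 bits and offset in 32 bits")
    use32 = bool(code32) or offset > 0xFFFF
    prefix = b"\x66" if use32 != bool(code32) else b""
    return (prefix + b"\xEA" + offset.to_bytes(4 if use32 else 2, "little")
            + selector.to_bytes(2, "little"))


print(far_jmp_bytes(0x08, 0x2345, code32=False).hex(" "))    # ea 45 23 08 00
print(far_jmp_bytes(0x18, 0x102345, code32=False).hex(" "))  # 66 ea 45 23 10 00 18 00
```

The sketch never emits the prefixed 16-bit-offset form in 32-bit code, because the plain 32-bit form can always be used there.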
Post by James Harris
Post by James Harris
Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as
EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)
0000:0600 66 EA 45 23 10 00 18 00 jmp far 0018:102345 ;assume FLAT CS
FFFF:2355 ;aka 0010:2345 PM32 code
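Here is the offset arithmetic behind both approaches, as a sketch with hypothetical helper names. The jump offset is the target's linear address minus the base of the segment named by the selector. The real-mode translation assumes A20 is enabled; with A20 disabled, FFFF:2355 would wrap to 02345h.

```python
def jump_offset(target_linear, segment_base):
    """Offset to put in a far jump so that base + offset reaches the target."""
    offset = target_linear - segment_base
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("target lies below the segment base or beyond 4 GB")
    return offset


def real_mode_linear(segment, offset):
    """Real-mode address translation with A20 enabled: segment*16 + offset."""
    return (segment << 4) + offset


# Solution 1: a temporary descriptor based at 10000h keeps the offset within 16 bits
print(hex(jump_offset(0x12345, 0x10000)))                     # 0x2345
# FFFF:2355 in real mode is linear 102345h, the offset needed with a flat CS
print(hex(jump_offset(real_mode_linear(0xFFFF, 0x2355), 0)))  # 0x102345
```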