The EA jump immediately after enabling protected mode by setting PE in CR0

Discussion:

James Harris

2015-04-26 18:54:05 UTC

You know that, per Intel's directions, after setting the CR0 PE flag
with

mov eax, cr0
or al, 1
mov cr0, eax

we are expected to have something like

jmp seg:pmode_running

I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.

1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.
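
As an illustration of the selector layout described in point 1, here is a minimal Python sketch. The function names are made up for this example. It packs a table index into the upper 13 bits, the TI bit into bit 2, and the privilege bits into bits 0-1 (the RPL field of a selector, which in CS reflects the CPL).

```python
def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Pack index (13 bits), TI (1 bit) and RPL (2 bits) into a 16-bit selector."""
    if not 0 <= index < 8192:
        raise ValueError("index must fit in 13 bits")
    if ti not in (0, 1):
        raise ValueError("TI must be 0 (GDT) or 1 (LDT)")
    if not 0 <= rpl <= 3:
        raise ValueError("RPL must be 0..3")
    return (index << 3) | (ti << 2) | rpl


def split_selector(selector: int) -> tuple:
    """Return (index, ti, rpl) for a 16-bit selector."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


# GDT entry 1 with TI = 0 and RPL = 0, as a ring-0 code selector would be.
print(hex(make_selector(1)))  # 0x8
```

With TI and RPL both zero the selector is simply the entry number times 8.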

I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.

3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as

EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)

where the Ss are the selector and the Os are the offset as hex bytes.

4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as

66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)

5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

6. Now, where it gets interesting is that the offset field of the EA
jump instruction seems to be an offset not from the jump instruction but
from the start of the segment. Is that correct? If so then we have to be
careful which jump form is encoded, as follows.

If the executing code is in the low 64k of a descriptor's space then we
can encode the simple

EA oo oo ss ss

because the offset can fit in 16 bits. But if the executing code is
above the 64k mark relative to the start of the segment then we need to
encode the 32-bit form for 16-bit mode, i.e.

66 EA oo oo oo oo ss ss

To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
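
To make the byte layouts in points 3 to 6 concrete, the following Python sketch builds the EA encoding for a given offset, selector and current code size. It is only an illustration: `encode_far_jmp` is a hypothetical helper and the selector value 0x08 is an assumed stand-in for the "ss ss" bytes above.

```python
import struct


def encode_far_jmp(offset: int, selector: int, offset_bits: int,
                   code_bits: int = 16) -> bytes:
    """Encode a direct far jump (opcode EA) with a 16- or 32-bit offset.

    code_bits is the default operand size the CPU is using when it
    decodes the instruction; a 0x66 prefix is emitted when the offset
    size differs from that default.
    """
    if offset_bits not in (16, 32) or code_bits not in (16, 32):
        raise ValueError("sizes must be 16 or 32")
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in %d bits" % offset_bits)
    prefix = b"\x66" if offset_bits != code_bits else b""
    fmt = "<HH" if offset_bits == 16 else "<IH"  # little-endian offset, then selector
    return prefix + b"\xEA" + struct.pack(fmt, offset, selector)


# Assumed selector 0x08; CPU still decoding as 16-bit code.
print(encode_far_jmp(0x2345, 0x08, offset_bits=16).hex(" "))   # ea 45 23 08 00
print(encode_far_jmp(0x12345, 0x08, offset_bits=32).hex(" "))  # 66 ea 45 23 01 00 08 00
```

Asking for the 16-bit form with offset 0x12345 raises `ValueError`, which is the situation the example above describes.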

Solutions?

Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as

EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)

Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to

66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)

When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
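
The arithmetic behind the two solutions can be sketched in Python as below. The helper names are invented for this illustration; the addresses are the ones used in the example above.

```python
def jump_offset(target_physical: int, segment_base: int) -> int:
    """Offset of the jump target relative to the code segment's base."""
    offset = target_physical - segment_base
    if offset < 0:
        raise ValueError("target lies below the segment base")
    return offset


def offset_bits_needed(offset: int) -> int:
    """16 if the offset fits the short EA form, otherwise 32."""
    return 16 if offset <= 0xFFFF else 32


TARGET = 0x12345

# Solution 1: temporary descriptor based at 0x10000.
s1 = jump_offset(TARGET, 0x10000)
print(hex(s1), offset_bits_needed(s1))  # 0x2345 16

# Solution 2: flat descriptor based at 0, so the 32-bit form is required.
s2 = jump_offset(TARGET, 0)
print(hex(s2), offset_bits_needed(s2))  # 0x12345 32
```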

I should say that the above is just as written after working out what I
think was going on and may contain errors for which I would welcome your
corrections.

Interesting subtlety, no?

Any thoughts/comments?

James

wolfgang kern

2015-04-26 21:42:21 UTC

Post by James Harris
You know that, per Intel's directions, after setting the CR0 PE flag
with
mov eax, cr0
or al, 1
mov cr0, eax
we are expected to have something like
jmp seg:pmode_running
I had taken that jump instruction for granted but the recent Qemu/GDT
thread has brought up some issues about the jump, as follows.
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

the value in a PM seg-reg is nothing else than the offset within
the GDT (lower bits are ignored/used elsewhere, so the GDT must
be 8 byte aligned). Nothing is shifted here.
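
The two descriptions (an entry number shifted left 3 bits, and a byte offset into the GDT) produce the same number, since each descriptor is 8 bytes long. A small Python sketch, with made-up function names and a hypothetical GDT base address:

```python
def descriptor_offset(selector: int) -> int:
    """Byte offset of the descriptor in its table: selector with TI/RPL masked off."""
    return selector & 0xFFF8


def descriptor_address(table_base: int, selector: int) -> int:
    """Linear address of the 8-byte descriptor named by the selector."""
    return table_base + descriptor_offset(selector)


GDT_BASE = 0x0800  # hypothetical linear address of the GDT

for index in (1, 2, 3):
    selector = index << 3                            # "entry number" view
    assert descriptor_offset(selector) == index * 8  # "byte offset" view

print(hex(descriptor_address(GDT_BASE, 0x08)))  # 0x808
```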

Post by James Harris
2. On the 386 any kind of jump was needed immediately following the MOV
to CR0 - even a near jump - in order to flush the prefetch queue. On
Pentium Pro and later (and maybe even on the Pentium 1) there is no need
to flush the queue but the far jump keeps things compatible as it will
flush the prefetch queue on early CPUs as well as load the Pmode CS on
all of them.

?? isn't this far jmp 'the switch point'.

Post by James Harris
I knew the above but the following points are of particular interest
just now as I had not considered them before - or if I had then I had
forgotten the subtleties of the problem.
3. That jump instruction has a 16-bit form and a 32-bit form. It is
encoded in hex as
EA oo oo ss ss (16-bit form)
EA oo oo oo oo ss ss (32-bit form)
where the Ss are the selector and the Os are the offset as hex bytes.
4. Depending on the mode the CPU thinks it is operating in at the time
that it hits the jump instruction the 16-bit and 32-bit forms may need
to be encoded with a leading 0x66 so they appear as
66 EA oo oo ss ss (16-bit form in 32-bit mode)
66 EA oo oo oo oo ss ss (32-bit form in 16-bit mode)
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

Right, the far jmp is 'the switch'.

Yes, and it also works if the new PM-CS uses RM-CS*16 as its base, and
because the offsets are then equal it may help stupid asm-tools too.
If a boot sequence uses 0:7c00 the PM base can be flat (zero) as well.
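
A rough Python sketch of that idea follows: a descriptor whose base is the real-mode CS times 16 keeps every offset unchanged across the switch. The helper names, the access byte 0x9A (present, ring 0, readable code) and the 16-bit, byte-granular flags are assumptions chosen for the example, not values taken from this thread.

```python
import struct


def real_mode_base(segment: int) -> int:
    """Linear base address of a real-mode segment (segment * 16)."""
    return (segment & 0xFFFF) << 4


def pack_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack an 8-byte segment descriptor in its in-memory layout."""
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                               # limit 15:0
        base & 0xFFFF,                                # base 15:0
        (base >> 16) & 0xFF,                          # base 23:16
        access & 0xFF,                                # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF), # flags, limit 19:16
        (base >> 24) & 0xFF,                          # base 31:24
    )


RM_CS = 0x1000   # example real-mode code segment
OFFSET = 0x2345  # same offset before and after the switch

base = real_mode_base(RM_CS)
print(hex(base + OFFSET))  # 0x12345
print(pack_descriptor(base, 0xFFFF, access=0x9A, flags=0x0).hex(" "))
# ff ff 00 00 01 9a 00 00
```

For a boot sector running at 0:7c00 the same function gives a base of zero, so a flat descriptor already matches the real-mode offsets.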

Post by James Harris
But if the executing code is above the 64k mark relative to the
start of the segment then we need to encode the 32-bit form for
16-bit mode, i.e.
66 EA oo oo oo oo ss ss

Sure, that's the only way if it won't fit into 16 bits.

Post by James Harris
To make an example, say that the code that will enable Pmode is located
in memory so that the jump target is at physical address 0x12345. If the
GDT entry for privileged code has been set to describe all of memory,
i.e. from address 0 to address ff....fff, then it will be impossible to
use the 16-bit form of the EA jump instruction. Correct?
Solutions?
Solution 1. Set up a temporary GDT entry to point to the place in memory
where the code is running. In the above case, the GDT entry could point
at 0x10000 and then the jump offset would be 0x2345, leading to the jump
instruction being encoded as
EA 45 23 ss ss (bytes shown in memory order, i.e. little endian)
Solution 2. Modify the jump instruction so that the code's location
relative to the start of the privileged code segment does not matter.
That leads to
66 EA 45 23 01 00 ss ss (bytes shown in little-endian order)
When I did this before I used a temporary GDT entry to point to the
executing code, i.e. solution 1, but solution 2 also has merits.
I should say that the above is just as written after working out what I
think was going on and may contain errors for which I would welcome your
corrections.
Interesting subtlety, no?
Any thoughts/comments?

My OS switches back and forth between modes, so it contains both variants:
the shorter form for PM16<->RM and the 8-byte form for PM16/RM<->PM32.

__
wolfgang

James Harris

2015-04-27 08:58:56 UTC

...

Post by James Harris
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3
bits) and also to set the low bits of CS so that they contain the CPL
and TI, all of which should be zero.

the value in a PM seg-reg is nothing else than the offset within the
GDT (lower bits are ignored/used elsewhere, so the GDT must be 8 byte
aligned). Nothing is shifted here.

Did you see somewhere that the GDT must be so aligned? For sure it is a
good idea but is it needed?

Post by James Harris
2. On the 386 any kind of jump was needed immediately following the
MOV to CR0 - even a near jump - in order to flush the prefetch queue.
On Pentium Pro and later (and maybe even on the Pentium 1) there is
no need to flush the queue but the far jump keeps things compatible
as it will flush the prefetch queue on early CPUs as well as load the
Pmode CS on all of them.

?? isn't this far jmp 'the switch point'.

Well, AIUI on the 386 you could set CR0.PE and go off and do a bunch of
processing in Pmode before doing a far jump. That post-PE processing
could even include the LGDT instruction!

It seems the 486 was similar to the 386 in what was required after
setting PE. The link is below, but it is a large download:

https://ia601608.us.archive.org/22/items/bitsavers_intel80486mmersReferenceManual1990_29642780/i486_Processor_Programmers_Reference_Manual_1990.pdf

...

Post by James Harris
5. Immediately after the MOV to CR0 to set the pmode bit the CPU is
still in 16-bit mode. Right?

Right, the far jmp is 'the switch'.

Not on the 386 or 486. Two more that you would classify as for the
museum...?

James

wolfgang kern

2015-04-27 13:20:16 UTC

Post by James Harris
...

Post by James Harris
1. The jump appears to be necessary in order to put the correct pmode
GDT entry number in CS (in its upper 13 bits, i.e. shifted left 3 bits)
and also to set the low bits of CS so that they contain the CPL and TI,
all of which should be zero.

the value in a PM seg-reg is nothing else than the offset within the GDT
(lower bits are ignored/used elsewhere, so the GDT must be 8 byte
aligned). Nothing is shifted here.

Did you see somewhere that the GDT must be so aligned?
For sure it is a good idea but is it needed?

Yes! It seems all manuals from the 286 onward say so.
Yes! Otherwise it will raise an exception on the first access.