32-bit x64

muta...@gmail.com

2021-03-13 00:56:33 UTC

Is it possible to write x64 code that doesn't
disturb the top 32 bits of registers? As far
as possible I would like to stick to using
pure 80386 instructions and registers.

I'm only interested in application code, e.g.
printf("hello, world\n");

Then we could run 32-bit executables on a
64-bit OS without needing to switch modes.
A different executable, sure, but that's not
my fault. That's the fault of the 80386 and
x64 designers who apparently didn't think
ahead.

Thanks. Paul.

Alexei A. Frounze

2021-03-13 02:20:02 UTC

Post by ***@gmail.com
Is it possible to write x64 code that doesn't
disturb the top 32 bits of registers?

Not quite.
Either you get them zeroed or you need to use carefully
chosen and scheduled 64-bit instructions that would preserve
those 32 most significant bits.
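
To make the two cases concrete, here is a minimal C model of what happens to a 64-bit register when a 32-bit result lands in it. It is only a sketch: it models register contents as plain integers rather than touching real registers, and the helper names are made up for illustration. The first helper mirrors the zero-extension that a 32-bit destination gets in 64-bit mode; the second mirrors the "careful" route of merging the new low half with the saved upper half using 64-bit operations.

```c
#include <assert.h>
#include <stdint.h>

/* Models a write to a 32-bit register in 64-bit mode: the result is
   zero-extended, so the previous upper 32 bits are lost. */
static uint64_t write32_zero_extend(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;               /* the old contents do not survive */
    return (uint64_t)value;
}

/* Models the careful alternative: keep the upper half of the old value
   and replace only the low 32 bits, using 64-bit operations. */
static uint64_t write32_preserve_high(uint64_t old_reg, uint32_t value)
{
    return (old_reg & 0xFFFFFFFF00000000ULL) | (uint64_t)value;
}

/* Small usage example with an address just above the 4 GiB mark. */
static void demo_upper_bits(void)
{
    uint64_t reg = 0x0000000100001000ULL;
    assert(write32_zero_extend(reg, 0x2000) == 0x0000000000002000ULL);
    assert(write32_preserve_high(reg, 0x2000) == 0x0000000100002000ULL);
}
```

The second form costs extra instructions on every 32-bit result, which is presumably what "carefully chosen and scheduled" refers to.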

Post by ***@gmail.com
As far
as possible I would like to stick to using
pure 80386 instructions and registers.

Some instructions aren't available in 64-bit/long mode because
their opcodes were taken for other things (e.g. REX prefixes)
or there's no way to choose a particular operand/address
size with the 66h and 67h prefixes anymore.

Post by ***@gmail.com
I'm only interested in application code, e.g.
printf("hello, world\n");
Then we could run 32-bit executables on a
64-bit OS without needing to switch modes.

32-bit and 16-bit protected mode programs can coexist with
64-bit/long mode programs and switching between them
shouldn't be problematic (should be less problematic than
getting real or virtual 8086 mode anyway).

Alex

muta...@gmail.com

2021-03-13 11:12:05 UTC

Post by Alexei A. Frounze

Post by ***@gmail.com
Is it possible to write x64 code that doesn't
disturb the top 32 bits of registers?

Not quite.
Either you get them zeroed or you need to use carefully
chosen and scheduled 64-bit instructions that would preserve
those 32 most significant bits.

Ok, I don't want them zeroed. But I am happy to be
careful with x64 instructions.

For simplicity, let's say I put the 32-bit executable
at location 0x1 0000 0000 (4 GiB) and load all the
64-bit registers with that hex value.

All memory references if done in 64-bit mode will
access memory in that 4 GiB to 8 GiB region, which
is what I am after.

But I can't change any of the high 32-bits, or all
my memory references will be screwed.

Post by Alexei A. Frounze

Post by ***@gmail.com
As far
as possible I would like to stick to using
pure 80386 instructions and registers.

I don't need to change address size. That will always
be 64-bit. But I need to be able to add a 32-bit integer
to a 64-bit address rather than requiring an instruction
that uses a 64-bit integer.
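
As a rough illustration of that requirement, the following C sketch forms addresses inside a 4 GiB window and adds a signed 32-bit value to them without ever changing the upper 32 bits. The base value is the 4 GiB mark from the example above; the macro and function names are hypothetical, and the wrap-around inside the window is an assumption about the desired behaviour rather than something the hardware does by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window base: the 4 GiB mark used as the example location. */
#define WINDOW_BASE 0x0000000100000000ULL

/* Combine the fixed upper 32 bits with a 32-bit offset. */
static uint64_t window_address(uint64_t base, uint32_t offset)
{
    assert((base & 0xFFFFFFFFULL) == 0);   /* base must be 4 GiB aligned */
    return base | (uint64_t)offset;
}

/* Add a signed 32-bit value to a 64-bit address while staying inside the
   window: the arithmetic is done modulo 2^32 on the low half only. */
static uint64_t window_add(uint64_t address, int32_t delta)
{
    uint32_t low = (uint32_t)address;      /* current 32-bit offset */
    low += (uint32_t)delta;                /* wraps modulo 2^32 */
    return (address & 0xFFFFFFFF00000000ULL) | (uint64_t)low;
}
```

For example, `window_add(window_address(WINDOW_BASE, 0xFFFFFFFF), 1)` wraps back to `WINDOW_BASE` instead of spilling into the next 4 GiB region, which is what a plain 64-bit add would do.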

Post by Alexei A. Frounze

Post by ***@gmail.com
I'm only interested in application code, e.g.
printf("hello, world\n");
Then we could run 32-bit executables on a
64-bit OS without needing to switch modes.

I don't want to switch modes, which is presumably a
privileged instruction. I want to do this from user code.
64-bit user code loading and invoking a 32-bit program
at the 4 GiB location (or actually some other 4 GiB
boundary, obtained by allocating 8 GiB from Windows).

BFN. Paul.

Alexei A. Frounze

2021-03-14 03:29:31 UTC

On Saturday, March 13, 2021 at 3:12:06 AM UTC-8, ***@gmail.com wrote:
...

Post by ***@gmail.com
I don't want to switch modes, which is presumably a
privileged instruction. I want to do this from user code.
64-bit user code loading and invoking a 32-bit program
at the 4 GiB location (or actually some other 4 GiB
boundary, obtained by allocating 8 GiB from Windows).

You can translate 16-bit code into 64-bit code and stop
making doomed imaginary experiments.
I think it's Rod who's been praising binary translation.

OTOH, current x86-64 CPUs from Intel and AMD support
virtualization in hardware and you could use that too.

At the same time, nowadays there isn't much value in
old 16-bit code or compatibility with long gone systems.
You could simply interpret/emulate old code at effective
speeds close to those of the native systems of the past.

Alex

Rod Pemberton

2021-03-14 04:04:38 UTC

On Sat, 13 Mar 2021 19:29:31 -0800 (PST), Alexei A. Frounze wrote:

You can translate 16-bit code into 64-bit code and stop
making doomed imaginary experiments.

Yes.

On the technical merits, V86 is ideal for
executing 16-bit RM code under control of
32-bit PM, as long as the code is executing
natively on x86, i.e., not being emulated.

Except, V86 mode didn't fit well with my OS
design goals. So, I wasn't interested in
V86 mode. Apparently, it doesn't fit
with Paul's goals either.

OTOH, current x86-64 CPUs from Intel and AMD support
virtualization in hardware and you could use that too.

Yes.

I haven't really looked into virtualization
for my OS, but I suspect it has the
same set of issues as V86 mode.

I seem to recall that he's been executing
his x86 code on an emulator for an
IBM mainframe. If so, he probably wants
a software solution instead of hardware,
but he'll have to state his desires here.

I think it's Rod who's been praising binary translation.

Yes.

The DEC Alpha's FX!32 was excellent for
x86 code emulation, and emitted binary
translated code for future execution.

https://en.wikipedia.org/wiki/FX!32

The problem that I ran into with binary
translation was that I couldn't find
anything that converted 16-bit x86
to 32-bit x86. Mostly, binary translation
seemed to be from one processor to another.
It's like trying to find a C-to-C compiler ...

--
Clinton: biter. Trump: grabber. Cuomo: groper. Biden: mauler.

muta...@gmail.com

2021-03-15 06:29:47 UTC

Post by Rod Pemberton
I seem to recall that he's been executing
his x86 code on an emulator for an
IBM mainframe. If so, he probably wants
a software solution instead of hardware,
but he'll have to state his desires here.

Actually I didn't know *why* I wanted to do that,
I just thought it would be an interesting thing
to do.

But you are totally correct. If Hercules/380 has
been built as x64 so that it can present an entire
4 GiB to the S/380 box, so that I can then run
PDOS/3X0, and it also gets the entire 4 GiB, I
then want to be able to load 32-bit x64
executables into the address space managed
by PDOS/3X0 and then switch to an 80386+
coprocessor and execute that executable and
have it run at native speed, with full EBCDIC
data, and then return to PDOS/3X0 without
anyone at all knowing or caring or complaining.

So again - what 32-bit instructions can I actually
execute while in x64 mode? Enough to do a
GCC compile? GCC must not access memory
beyond its allocated region.

Note that when I say "switch coprocessor", I
mean that Hercules/380, while running x64
instructions with 64-bit pointers, will do this:

static int (*genstart)(void (*cbfunc)(void *cbdata, int funccode, void *retptr, char *str), void *cbdata);

rc = genstart(cbfunc, NULL);

ie a bog standard function call, which the OS has
no knowledge of.

Hercules/380 itself will also have no knowledge of
what it is doing. Only PDOS/3X0 knows what it is
doing, as it has executed 32-bit S/380 instructions
to set up an environment ready for the above
function call, and just said to Hercules "everything
is set to go, just call the function at this address
please".

Thank you for teasing out what was actually possible
if you bypass the hardware.

BFN. Paul.

muta...@gmail.com

2021-03-15 17:05:34 UTC

Post by Alexei A. Frounze

You can translate 16-bit code into 64-bit code and stop
making doomed imaginary experiments.

How about long mode processors get extended by
the addition of ECS, EDS and EES as brand new
registers, which allows the 64-bit address space
to be (identically) mapped via two different methods,
and it is a user-mode instruction to switch between these
two different modes, so that I can run my 80386
software in peace, ie in long mode.

BFN. Paul.

muta...@gmail.com

2021-03-15 17:25:17 UTC

Post by ***@gmail.com
How about long mode processors get extended by
the addition of ECS, EDS and EES as brand new
registers, which allows the 64-bit address space
to be (identically) mapped via two different methods,
and it is a user-mode instruction to switch between these
two different modes, so that I can run my 80386
software in peace, ie in long mode.

And another user-mode instruction to start using
cs/ds/es as maps onto the first 4 GiB of the long
mode address space using 16-bit shifts of the
segment registers, and another user-mode instruction
to do the same thing except with 4-bit shifts of the
segment registers to map the first 1 MiB + 64k of
the address space, and another user-mode
instruction to make it just 1 MiB.
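
The address arithmetic behind those three proposed modes can be sketched in C as follows. This assumes 16-bit segment and 16-bit offset values throughout, which the post does not spell out, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address for seg:off when the segment value is shifted left by
   `shift` bits before the offset is added. */
static uint64_t seg_linear(uint16_t seg, uint16_t off, unsigned shift)
{
    assert(shift == 4 || shift == 16);     /* the two schemes discussed */
    return ((uint64_t)seg << shift) + off;
}

/* The 4-bit scheme again, but wrapped at 1 MiB the way an 8086 wraps. */
static uint64_t seg_linear_wrap_1mib(uint16_t seg, uint16_t off)
{
    return seg_linear(seg, off, 4) & 0xFFFFFULL;
}
```

With 4-bit shifts the highest reachable address is FFFF:FFFF = 0x10FFEF (1 MiB + 64 KiB - 16 bytes), or 0x0FFEF once wrapped at 1 MiB; with 16-bit shifts FFFF:FFFF lands on 0xFFFFFFFF, the top of the first 4 GiB.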

Why should long mode give a shit about me doing that?
It's perfectly logical.

All without privilege. I don't want to be able to modify
the interrupt vectors.

BFN. Paul.

muta...@gmail.com

2021-03-15 17:38:16 UTC

Post by ***@gmail.com
All without privilege. I don't want to be able to modify
the interrupt vectors.

Actually, why can't I change the interrupt vectors in
memory location around 0? These aren't the real
interrupts and will just point to memory within my
address space. Who cares?

So I want to change where INT 21H points to, as
well as being able to do an INT 21H and get to that
location. The absolute location that INT 21H points
to will differ depending on whether I activated 4-bit
shifts or 16-bit shifts, my choice.
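
A small C sketch of that lookup, assuming the classic real-mode vector layout (4 bytes per vector starting at address 0: a 16-bit offset followed by a 16-bit segment, both little-endian) and a memory image passed in as a byte array. The function name and the `shift` parameter are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read vector `n` from a memory image that starts at address 0 and
   return the absolute address it points to for the given segment shift. */
static uint64_t vector_target(const uint8_t *mem, size_t mem_len,
                              unsigned n, unsigned shift)
{
    size_t at = (size_t)n * 4;             /* INT 21H lives at 0x84 */
    assert(n < 256 && at + 4 <= mem_len);

    uint16_t off = (uint16_t)(mem[at]     | (mem[at + 1] << 8));
    uint16_t seg = (uint16_t)(mem[at + 2] | (mem[at + 3] << 8));
    return ((uint64_t)seg << shift) + off;
}
```

The same stored vector, say 2000:1234, resolves to 0x21234 under 4-bit shifts and to 0x20001234 under 16-bit shifts.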

BFN. Paul.

muta...@gmail.com

2021-03-15 18:34:45 UTC

Post by ***@gmail.com
location. The absolute location that INT 21H points
to will differ depending on whether I activated 4-bit
shifts or 16-bit shifts, my choice.

Assuming I have activated 16-bit shifts, and I am
loading a medium memory model MSDOS executable,
is it possible to adjust the offsets within the executable,
not just the segment?

Thanks. Paul.

muta...@gmail.com

2021-03-15 18:44:44 UTC

Note that Windows is free to install its crap above
the 4 GiB location of MY address space. Preferably
somewhere around the 16 EiB mark.

I want the ENTIRE region below 4 GiB of MY
address space to myself.

BFN. Paul.

muta...@gmail.com

2021-03-15 19:55:33 UTC

Assuming all this is set up, via long mode processor
enhancements and Windows fixing their shit, is there
any reason why when someone does an "out"
instruction that that can't invoke a routine at a
particular address in memory (within 1 MiB or
4 GiB depending on the shift) so that I can interpret
what they're trying to do?

How about another set of interrupt vectors by
port number?
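
In software terms, "interrupt vectors by port number" could look something like the C sketch below. This is purely a model of the idea, not a description of any existing processor feature; all names are hypothetical, and it assumes something (an emulator or a trap handler) has already decoded the "out" instruction into a port and a byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*port_handler)(uint16_t port, uint8_t value, void *ctx);

/* One slot per 16-bit I/O port number; NULL means no handler installed. */
static port_handler port_vectors[65536];

static void set_port_vector(uint16_t port, port_handler handler)
{
    port_vectors[port] = handler;
}

/* Called when the guest executes OUT. Returns 1 if a handler ran,
   0 if the port has no vector and the caller must decide what to do. */
static int dispatch_out(uint16_t port, uint8_t value, void *ctx)
{
    port_handler handler = port_vectors[port];
    if (handler == NULL)
        return 0;
    handler(port, value, ctx);
    return 1;
}

/* Example handler: remember the last byte written to the port. */
static void record_last_byte(uint16_t port, uint8_t value, void *ctx)
{
    assert(ctx != NULL);
    (void)port;
    *(uint8_t *)ctx = value;
}
```

A caller would install `record_last_byte` (or a real device model) with `set_port_vector()` and then route every trapped "out" through `dispatch_out()`.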

Thanks. Paul.

Scott Lurndal

2021-03-15 22:29:27 UTC

Post by ***@gmail.com
Assuming all this is set up, via long mode processor
enhancements and Windows fixing their shit, is there
any reason why when someone does an "out"
instruction that that can't invoke a routine at a
particular address in memory (within 1 MiB or
4 GiB depending on the shift) so that I can interpret
what they're trying to do?

Yes, of course there is. And there has been for well over
a decade. If you'd bother to familiarize yourself with the
state of the art in processors, rather than something that
wasn't even state of the art in its day....

Hint. Do a search for Intel VMX.

muta...@gmail.com

2021-03-16 01:07:57 UTC

I prefer inventing my own processors. Ones that can
run 16-bit programs, including ones that expect
address wrap, and ones that don't, all at native speed
in long mode. THAT is state of the art.

I think a fundamental problem might be that people are
trying to run MSDOS itself rather than conforming
MSDOS applications. They haven't considered the
possibility of someone providing a different OS to
cope with the MSDOS apps, instead of relegating
them to a different VM, and then even abandoning
that, and then abandoning the 80386 to some
SysWOW crap too.

BFN. Paul.

muta...@gmail.com

2021-03-16 01:52:58 UTC

Post by ***@gmail.com
I think a fundamental problem might be that people are
trying to run MSDOS itself rather than conforming
MSDOS applications. They haven't considered the
possibility of someone providing a different OS to
cope with the MSDOS apps, instead of relegating
them to a different VM, and then even abandoning
that, and then abandoning the 80386 to some
SysWOW crap too.

There should have only been limited (or no) attempt to
support applications that write directly to 0xb8000.
That area should have just been application memory
with the new environment. If you don't follow the
rules, you shouldn't expect your applications to run.
But if you follow the rules, you should expect them
to continue to run even in long mode, including the
possibility of 4 GiB worth of MSDOS programs
being spawned.

And it should have been designed into the system
how to load absolute addresses (or rather, why
you shouldn't be). And none of this LONG and
SHORT and int32 crap either. As if one day you
could seriously go #define LONG short and have
your applications work. This is not abstraction.

The correct abstraction is to allow an 80386 program
to invoke a 68000 program if a suitable coprocessor
is available. Or an 8086 application. By switching down
to real mode, or at least, appearing to, and then taking
the INT 21H service request and passing it back up to
the caller to process. Where the caller could be a
68000, not necessarily an x64 in long mode.

The S/390 has this already to some extent. The ability
*in user mode* to switch down to a different AMODE.
So long as you branch down to low memory before
doing the switch, everything is fine.

BFN. Paul.

muta...@gmail.com

2021-03-16 06:26:45 UTC

My latest analysis is here:

https://groups.io/g/hercules-380/message/201

BFN. Paul.

wolfgang kern

2021-03-17 06:15:03 UTC

Post by ***@gmail.com
There should have only been limited (or no) attempt to
support applications that write directly to 0xb8000.
That area should have just been application memory
with the new environment. If you don't follow the
rules, you shouldn't expect your applications to run.
But if you follow the rules, you should expect them
to continue to run even in long mode, including the
possibility of 4 GiB worth of MSDOS programs
being spawned.

Been there: I can run any 16-bit code in Long Mode, but I emulate
DOS functions, trap all I/O access and all INTs, and convert A0000-BFFFF
access to the LFB (usually at the 3rd GB) to allow all my GUI support.
__
wolfgang

muta...@gmail.com

2021-03-18 15:46:57 UTC

Post by wolfgang kern

Been there: I can run any 16-bit code in Long Mode, but I emulate
DOS functions, trap all I/O access and all INTs, and convert A0000-BFFFF
access to the LFB (usually at the 3rd GB) to allow all my GUI support.

The 16-bit code you run - are you running 8086 instructions
or x64 instructions?

If the former, how is that possible? Are you doing interpretation?

Also, when a x64 PC starts in legacy mode, it is presumably
in real mode. Are the x64 instructions valid? Do they interfere
with 8086 instructions? And are addresses 16:64?

Thanks. Paul.

Rod Pemberton

2021-03-19 00:02:01 UTC

On Thu, 18 Mar 2021 08:46:57 -0700 (PDT)

Post by ***@gmail.com
The 16-bit code you run - are you running 8086 instructions
or x64 instructions?

CM16 is the 64-bit equivalent of PM16 for 32-bit protected mode.

Read up on x86 processor modes by looking at this nice chart:
https://www.sandpile.org/x86/mode.htm

From the chart, x86 has 6 16-bit modes, 3 32-bit modes, and 1 64-bit
mode. That's 10 modes in total.

RM = real mode
VM = virtual mode i.e., versions of V86 mode
PM = protected mode
CM = compatibility mode

--
Clinton: biter. Trump: grabber. Cuomo: groper. Biden: mauler.

muta...@gmail.com

2021-03-19 10:14:23 UTC

Post by Rod Pemberton
CM16 is the 64-bit equivalent of PM16 for 32-bit protected mode.

Oh I see. Thanks.

Post by Rod Pemberton
https://www.sandpile.org/x86/mode.htm

Ok, so I think that chart answers my other question.
There is an RM32 that allows me to do 16:32
addressing, but it won't give me 16:64.

BFN. Paul.

wolfgang kern

2021-03-20 07:53:02 UTC

Post by ***@gmail.com

Post by Rod Pemberton
CM16 is the 64-bit equivalent of PM16 for 32-bit protected mode.

Oh I see. Thanks.

Post by Rod Pemberton
https://www.sandpile.org/x86/mode.htm

Ok, so I think that chart answers my other question.
There is an RM32 that allows me to do 16:32
addressing, but it won't give me 16:64.

16:64 can't be done because Long Mode doesn't really have segments
(in 64-bit mode the CS/DS/ES/SS bases are treated as zero).
64 bits means more than 10^19 address locations; you need more? :)

Long mode works only with paging anyway, so use that instead.
__
wolfgang

wolfgang kern

2021-03-20 08:02:21 UTC

Post by ***@gmail.com

Post by wolfgang kern

Been there: I can run any 16-bit code in Long Mode, but I emulate
DOS functions, trap all I/O access and all INTs, and convert A0000-BFFFF
access to the LFB (usually at the 3rd GB) to allow all my GUI support.

The 16-bit code you run - are you running 8086 instructions
or x64 instructions?

CM16, as Rod already answered.

Post by ***@gmail.com
If the former, how is that possible? Are you doing interpretation?
Also, when a x64 PC starts in legacy mode, it is presumably
in real mode. Are the x64 instructions valid? Do they interfere
with 8086 instructions? And are addresses 16:64?

64-bit mode is different from RM and PM; it has another instruction set.
Explaining all of this here won't fit in a Usenet reply ...
You'll find all the details in the manuals [RTFM!]: AMD 1..6 and Intel 1..3.
__
wolfgang

Rod Pemberton

2021-03-20 11:12:17 UTC

On Sat, 20 Mar 2021 09:02:21 +0100

Post by wolfgang kern
CM16, as Rod already answered this

Wolfgang, what is your favorite processor mode(s)?

Do you like PM64?

Are the CM16/CM32 modes easy to use?

--
Countries that won't talk to Biden: North Korea, China, Russia, Iran.

wolfgang kern

2021-03-20 20:29:14 UTC

Post by Rod Pemberton
On Sat, 20 Mar 2021 09:02:21 +0100

Post by wolfgang kern
CM16, as Rod already answered this

Wolfgang, what is your favorite processor mode(s)?
Do you like PM64?
Are the CM16/CM32 modes easy to use?

My favorite? Z80.
I sold only PM32 with a restricted link to RM16 modules, even
though all the switches for LM, CM, PM and RM are in the OS core.

There was just no demand from clients to do more. My tools
can make use of all "mixtures" :) but I rarely use them anyway.

CM16/32<->LM switching is pretty easy and fast too.
RM to LM is short as well, but LM to RM needs several stages.

My preference was/is always short code, so LM64 is not my best choice,
even though some interesting powerful instructions work only within LM.
__
wolfgang

muta...@gmail.com

2021-03-22 09:30:50 UTC

Post by ***@gmail.com
Assuming I have activated 16-bit shifts, and I am
loading a medium memory model MSDOS executable,
is it possible to adjust the offsets within the executable,
not just the segment?

According to this:

https://wiki.osdev.org/MZ

The relocation table contains both a segment and an
offset. And my existing code looks like this:

/* This 16:16 arithmetic will work because the exeStart
offset is 0. */
fixSeg = (unsigned int *)
((unsigned long)exeStart + relocStart[relocI]);
*fixSeg = *fixSeg + addSeg;

In other words, the pointer is pointing to a segment to
be corrected. It is unclear if there is an offset before
or after that segment. I would assume that these are
far pointers, so there is probably an offset at that
location too, even if it isn't official.
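
For comparison, here is a minimal sketch of the whole fix-up loop in C, working on an MZ file image held in a flat byte array rather than with 16:16 pointers. The header offsets used (relocation count at 0x06, header size in paragraphs at 0x08, relocation table offset at 0x18) and the entry layout (16-bit offset, then 16-bit segment) are the standard MZ ones; the function names and the flat-buffer approach are assumptions made for the example, not the PDOS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));           /* little-endian read */
}

static void wr16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Add load_seg to every segment word named by the relocation table.
   Returns the number of fix-ups applied, or -1 if the file is malformed
   (in which case earlier fix-ups may already have been applied). */
static int mz_relocate(uint8_t *file, size_t len, uint16_t load_seg)
{
    assert(file != NULL);
    if (len < 0x1C || file[0] != 'M' || file[1] != 'Z')
        return -1;

    size_t count = rd16(file + 0x06);                /* relocation entries */
    size_t image = (size_t)rd16(file + 0x08) * 16;   /* start of load module */
    size_t table = rd16(file + 0x18);                /* relocation table offset */
    if (image > len || table + count * 4 > len)
        return -1;

    for (size_t i = 0; i < count; i++) {
        const uint8_t *entry = file + table + i * 4;
        /* entry = offset, then segment, both relative to the load module */
        size_t where = image + (size_t)rd16(entry + 2) * 16 + rd16(entry);
        if (where + 2 > len)
            return -1;
        wr16(file + where, (uint16_t)(rd16(file + where) + load_seg));
    }
    return (int)count;
}
```

Note that the table only says where a segment word sits; nothing in it says whether a 16-bit offset is stored next to that word, which is the open question above.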

Perhaps the solution is to simply say that only executables
that have an offset there are supported, and most
executables would satisfy that.

I think I would be happy if we backdated some sensible
rules to the dawn of the 8086, to future-proof it. I don't
really care if in practice that means we need to recompile
all 8086 executables to get proper relocation information
in the executable, or perhaps proper startup code, or
perhaps a proper OS call (or register) on entry to the
startup code, to dynamically accept a shift factor.

BFN. Paul.

muta...@gmail.com

2021-03-22 09:43:51 UTC

Post by ***@gmail.com
I think I would be happy if we backdated some sensible
rules to the dawn of the 8086, to future-proof it. I don't
really care if in practice that means we need to recompile
all 8086 executables to get proper relocation information
in the executable, or perhaps proper startup code, or
perhaps a proper OS call (or register) on entry to the
startup code, to dynamically accept a shift factor.

Note that it was also necessary to backdate a sensible
rule for Amiga executables, to give the OS an opportunity
to override the SysBase in situations (such as different
hardware, such as the Atari) where the OS doesn't have
access to absolute location 4 in memory. It is very silly
hardcoding the number "4" in every application. Well, as
a default, that is fine, but the ability to override it should
have been provided. Here is the relevant (new) code:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/amistart.asm

BFN. Paul.

muta...@gmail.com

2021-03-22 10:31:29 UTC

Post by ***@gmail.com
https://wiki.osdev.org/MZ

Note also that it says:

If both the minimum and maximum allocation fields are cleared, MS-DOS will attempt to load the executable as high as possible in memory.

That sounds very promising. It means it is not asking
for a particular amount of memory to run (so it
doesn't matter if that size is traditionally in 16-byte
multiples). So if Microsoft had encouraged all new
programs to zero out these fields and ... this ... and
... that - we would have been able to solve this
problem over time.
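
A loader could test for that convention with something like the C sketch below. The field offsets (minimum allocation at 0x0A, maximum allocation at 0x0C, both 16-bit little-endian paragraph counts) are the standard MZ header ones; the function name is made up, and treating "both zero" as a compatibility signal is only the idea floated here, not an existing rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the MZ header has both allocation fields cleared,
   0 otherwise (including when the buffer is not an MZ header). */
static int mz_alloc_fields_cleared(const uint8_t *header, size_t len)
{
    assert(header != NULL);
    if (len < 0x0E || header[0] != 'M' || header[1] != 'Z')
        return 0;

    uint16_t min_alloc = (uint16_t)(header[0x0A] | (header[0x0B] << 8));
    uint16_t max_alloc = (uint16_t)(header[0x0C] | (header[0x0D] << 8));
    return min_alloc == 0 && max_alloc == 0;
}
```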

As it stands, the problem was "solved" with this cute
message:

C:\dospath>dir zcalc.exe
...
1992-10-25 12:01 33,808 zcalc.exe
...

C:\dospath>zcalc
This version of C:\dospath\zcalc.exe is not compatible with the version of Windows you're running. Check your computer's system information and then contact the software publisher.

I notice that the "MZ" format is extensible too. If
something needed to be added to support 16 bit
shifts, it could have been. Or an additional API
that it is dependent upon to retrieve that information,
and perhaps if it uses another API to obtain memory
using a "long" instead of requesting "paragraphs" in
an "int". And then there is the API itself. I would
instead like to receive control using the osfunc()
interface so that I could then make osfunc() calls
instead of doing INT 21H. Is there a way to make
this work? Obviously everything can be rewritten,
but is there a way to make it work and still provide
compatibility for some time?

I have the same situation with AmigaPDOS (designed
to replace AmigaOS). It needs to provide a SysBase
to run D7-compliant AmigaOS executables that get
their services via SysBase (normally location 4), while
still executing "generic" 68000 PDOS executables that
conform to the osfunc() "standard". I was planning on
keeping the executable format identical (Amiga Hunk)
but changing the first 4 bytes (magic cookie identifier)
to be 00000F3F instead of 000003F3, signifying the
different interface. But is that the right approach or is
there some philosophy that should be followed instead?
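
For what it's worth, telling the two apart at load time is cheap. The C sketch below reads the first four bytes as a big-endian value (the 68000 byte order) and classifies the file; 000003F3 is the usual Amiga hunk header identifier, 00000F3F is the alternative value proposed above, and the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum exe_kind { EXE_UNKNOWN, EXE_AMIGA_HUNK, EXE_PDOS_GENERIC };

/* Classify an executable by its first 4 bytes, read big-endian. */
static enum exe_kind classify_hunk_magic(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return EXE_UNKNOWN;

    uint32_t magic = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                     ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    if (magic == 0x000003F3UL)
        return EXE_AMIGA_HUNK;       /* services via SysBase */
    if (magic == 0x00000F3FUL)
        return EXE_PDOS_GENERIC;     /* proposed: services via osfunc() */
    return EXE_UNKNOWN;
}
```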

Thanks. Paul.

muta...@gmail.com

2021-03-16 10:37:12 UTC

Post by Alexei A. Frounze

You can translate 16-bit code into 64-bit code and stop
making doomed imaginary experiments.

I have been thinking about this more. I believe it should
be possible to write what I am calling 16-bit and 32-bit
programs under x64. You may call it a 64-bit program,
but I call it a 32-bit program. All address references will
be 64-bit. But prior to running the program, the high
32 bits of each address register are set to a particular
location in memory, e.g. 1 if you were loaded at the
4 GiB mark.

It doesn't matter if the ENTRY instruction pushes all
64-bit registers onto the stack etc. That is transparent
to the 32-bit program. The 32-bit program merely
does 32-bit arithmetic to registers, and if that causes
or necessitates the top 32-bits (or top 1000 bits) being
cleared to 0 with a:

sub rax,rax

that's fine, so long as when you've finished your
instruction it is restored with

mov rax,rdx

so that the high 32-bits or 1000 bits are restored to
whatever they were before (ie the pointer that you
actually need).

It is the job of the C compiler to sort this out for you.
All you need to do is tell it you want a 32-bit program
in long mode.

Another solution would be for the C compiler to have
dedicated address and data registers so that it stops
disturbing the high 32 bits or 1000 bits of registers.

BFN. Paul.

Paul Edwards

2023-08-16 23:14:38 UTC

Post by ***@gmail.com
Is it possible to write x64 code that doesn't
disturb the top 32 bits of registers? As far
as possible I would like to stick to using
pure 80386 instructions and registers.
I'm only interested in application code, e.g.
printf("hello, world\n");
Then we could run 32-bit executables on a
64-bit OS without needing to switch modes.
A different executable, sure, but that's not
my fault. That's the fault of the 80386 and
x64 designers who apparently didn't think
ahead.

I have this working now as proof of concept.

See the bottom of "University Challenge x64" at http://pdos.org

Note that this is running (some) Win32 executables in LM64
under UEFI still in boot services.

This is NOT running 16-bit executables.

BFN. Paul.
