Discussion:
assembler: short/long jump
Alexei A. Frounze
2015-02-10 09:32:24 UTC
I'm still contemplating the idea of implementing a simple assembler for Smaller C to make it fully self-sufficient, easily portable (someone asked for a C compiler for xv6 on Stack Overflow) and a tad faster as it turns out that NASM can be horribly slow (I think I've mentioned that before).

But there's still one unsolved technical problem. Namely, I need NASM's ability to automatically substitute short and long jumps as necessary, that is, either with an 8-bit relative address or with a 16/32-bit one, depending on how far the target location is from the jump instruction.
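For reference, the sizes involved can be sketched in C as below. This is a minimal sketch with hypothetical helper names, not code from Smaller C or NASM. It encodes the standard x86 facts: a short JMP/Jcc (EB cb / 7x cb) is 2 bytes, a near JMP (E9) is 3 bytes with rel16 or 5 with rel32, and a near Jcc (0F 8x) is 4 or 6 bytes. The displacement is counted from the end of the jump, so rel8 reaches 128 bytes back and 127 bytes forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* rel8 covers -128..+127 bytes, measured from the end of the jump. */
static bool fits_rel8(int32_t disp)
{
    return disp >= -128 && disp <= 127;
}

/* Encoded length of a near (long) JMP or Jcc in 16- or 32-bit code. */
static int near_jump_len(bool conditional, int bits)
{
    assert(bits == 16 || bits == 32);
    int imm = bits / 8;              /* rel16 = 2 bytes, rel32 = 4 bytes */
    return conditional ? 2 + imm     /* 0F 8x cw/cd */
                       : 1 + imm;    /* E9 cw/cd    */
}

/* Length to use for a jump at address 'at' whose target is already known
   (the backward-jump case): 2 bytes (EB cb / 7x cb) if rel8 reaches. */
static int jump_len_known_target(bool conditional, int32_t at, int32_t target, int bits)
{
    return fits_rel8(target - (at + 2)) ? 2 : near_jump_len(conditional, bits);
}
```

The boundary cases are worth keeping in mind: a backward target 126 bytes before the jump's start still fits (displacement -128), but 127 bytes before does not.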

It looks like it's not a trivial problem. I may have inquired about it before (Rod might be able to confirm), but I don't remember ever finding or learning a reasonably good solution.

There's one solution that I came up with over the weekend, though. I wonder if you could suggest improvements or something radically better. Before I state it, I should probably note that I've considered a dynamic programming solution, but it looks like the problem can't be trivially reduced to identical subproblems.

Components of the problem:
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar to relative jumps (e.g. "mov ax, label2 - label1")

Desirable qualities of the solution:
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
- cost not higher than quadratic with small coefficient

So, here's what I have so far...

Consider a normal, line-by-line assembling process.

If a relative jump instruction is encountered and its target label precedes it (=we've seen that label defined), figure out which relative offset should be there (8- or 16/32-bit) and go on.

If the jump's target label is unknown (=defined somewhere ahead), note the position of this jump, choose an 8-bit relative offset and go on until another 127 or more bytes of code are generated. If the target label is encountered somewhere among the instructions in these 127 bytes, keep the 8-bit relative address and go on. Otherwise, restart assembling from that jump instruction, but use a 16/32-bit relative address now.

After this pass all label addresses are known and can be encoded into instructions.

This is the basic idea.

There are two flaws, however.

First, assembling the same 127 instructions again is bad. So, they should be cached.

Second, there may be other jump instructions between the first instruction and its target label and those other jump instructions may also need to be changed from 8-bit relative addresses to 16/32-bit ones, which has the effect of moving the target label farther away and possibly triggering reassembly of one or more of the preceding jumps. You can end up with a chain reaction.

A possible (imperfect) workaround might be this... While assembling instructions that follow a jump instruction, note all instructions whose length isn't known yet (align, other jumps) and maintain a lower and upper bound of the size of the code assembled so far since the jump instruction. If the jump target label is found before the upper bound reaches 127 bytes, the 8-bit relative address can be kept and the process continued. Otherwise it should be switched to 16/32-bit relative address and the code will be reassembled from after the jump.
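The bound-tracking rule just described might be sketched like this. It is a hypothetical C model, not Smaller C code: each item following the forward jump carries a minimum and maximum possible size (a fixed instruction has equal bounds, an undecided jump ranges from 2 bytes to its near size, "align n" from 0 to n-1), and the caller is assumed to have scanned as far as the target label. Besides keep-short versus go-long, the lower bound lets the sketch tell a forced long jump from a merely conservative one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one item following a forward jump. */
struct span_item {
    int  min_len;    /* smallest possible size (e.g. 2 for an undecided jump, 0 for align) */
    int  max_len;    /* largest possible size (e.g. the near size, or n-1 for "align n") */
    bool is_target;  /* true for the jump's target label (a zero-size item) */
};

enum jump_choice { SHORT_SAFE, LONG_REQUIRED, LONG_CONSERVATIVE };

/* items[] must run from the item right after the jump up to the target label. */
static enum jump_choice decide_forward_jump(const struct span_item *items, int n)
{
    int lo = 0, hi = 0;   /* bounds on the distance from the end of a short jump */
    int i;
    for (i = 0; i < n && !items[i].is_target; i++) {
        assert(0 <= items[i].min_len && items[i].min_len <= items[i].max_len);
        lo += items[i].min_len;
        hi += items[i].max_len;
        if (lo > 127)
            return LONG_REQUIRED;     /* out of range even in the best case */
    }
    assert(i < n && "the target label must be among the items");
    if (hi <= 127)
        return SHORT_SAFE;            /* in range even in the worst case */
    return LONG_CONSERVATIVE;         /* undecided: fall back to the long form */
}
```

Only the conservative outcome can cost bytes compared to an optimal layout, which is the imperfection the paragraph admits.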

Instructions like "mov ax, label2 - label1" complicate things further, but such instructions should be rare and I can always choose the longest encoding for them.

With this I should be able to make most jumps short when possible and have relatively little code size overhead from unnecessarily long relative addresses and immediates. The time spent in reassembly should be limited because the reassembly window is short (127 bytes at most) and most of the instructions in the window should not change and can be cached.

What do you think?

Alex
Benjamin David Lunt
2015-02-10 16:10:14 UTC
Hi Alex,

[I wonder why sometimes Outlook Express doesn't prefix the quoted
text with '>' as in below. Sometimes I just add it myself, but
with this one being quite long... So sorry for the top post.]

Here is my solution. I haven't implemented it in my assembler yet,
but when I get the time and interest, I will.

Make your assembler at least three passes. The first pass finds
the offset to all labels and objects that will be referenced by
any type of instruction: Jmp, jcc, etc.

Make all of these references use the largest size. For example,
if the user has placed [BITS 16] at the top, use all 16-bit
references, even though you know an 8-bit reference will work.
If the user placed [BITS 32], use all 32-bit references,
even though an 8-bit reference would still work.

(Note, a backward reference that fits in 8 bits can be stored as an
8-bit reference, since you already know the offset. However, all
forward references must use the largest size.)

When you get to the end of the first pass, you have an exact
size and reference to all instructions, including any .align
issues.

At this point, you could skip and move to the last pass and
assemble the code to object code. Done.

The middle pass(es), however, are there to "shrink" these references.

Start at the top. When you get to your first reference, calculate
the distance. If it fits in something smaller than 32 bits, shrink it.
Then adjust every reference from this offset to the end by the
difference. Move to the next reference, check, shrink, etc.

When you get to an .align, see if the alignment can be moved
to the next lower alignment. If so, do so, and shrink every
reference after this by the difference the move made.

Continue to the end.

Now, repeat the loop. Continue doing passes until you find
no more references to shrink.
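The start-long, shrink-until-nothing-changes loop described above could look roughly like this minimal C sketch. The item model and names are hypothetical. It deliberately leaves out align directives: with alignment, shrinking one jump can make some padding grow, so a jump located between the shrunk one and the align can end up farther from a target past the align, and a real implementation would have to re-check (and possibly re-grow) such jumps. Without align, lengths only ever decrease, so a jump that fits once keeps fitting and there is at most one changing pass per jump. Unlike Ben's description, it lays the code out once per pass instead of adjusting later addresses right after each shrink; since lengths only decrease, distances taken from the start-of-pass layout are never too small, so this is safe but may need an extra pass. max_passes stands in for a pass limit like the one Ben attributes to NASM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical item model: a fixed-size instruction or a relative jump
   whose target is the start of another item (given by index). */
struct item {
    bool is_jump;
    bool conditional;   /* Jcc vs JMP: only matters for the near size */
    int  len;           /* current encoded length in bytes */
    int  target;        /* jumps only: index of the item jumped to */
};

enum { SHORT_LEN = 2 };

static int near_len(const struct item *it, int bits)
{
    assert(bits == 16 || bits == 32);
    return (it->conditional ? 2 : 1) + bits / 8;
}

/* addr[i] = address of item i; addr[n] = end of the code. */
static void layout(const struct item *items, int n, long *addr)
{
    addr[0] = 0;
    for (int i = 0; i < n; i++)
        addr[i + 1] = addr[i] + items[i].len;
}

/* Start with every jump long, then shrink whatever fits, pass after pass,
   until a pass changes nothing or max_passes is reached. addr needs n + 1
   entries. Returns the number of passes that shrank something. */
static int relax_jumps(struct item *items, int n, long *addr, int bits, int max_passes)
{
    for (int i = 0; i < n; i++)
        if (items[i].is_jump)
            items[i].len = near_len(&items[i], bits);

    int passes = 0;
    bool changed = true;
    while (changed && passes < max_passes) {
        changed = false;
        layout(items, n, addr);
        for (int i = 0; i < n; i++) {
            struct item *it = &items[i];
            if (!it->is_jump || it->len == SHORT_LEN)
                continue;
            /* Displacement the short form would need, measured from its end. */
            long disp = (it->target > i)
                      ? addr[it->target] - addr[i + 1]             /* forward */
                      : addr[it->target] - (addr[i] + SHORT_LEN);  /* backward */
            if (disp >= -128 && disp <= 127) {
                it->len = SHORT_LEN;
                changed = true;
            }
        }
        if (changed)
            passes++;
    }
    layout(items, n, addr);   /* final addresses for the output pass */
    return passes;
}
```

With two forward jumps to the same label, separated by 124 bytes, the first one only becomes short after the second has shrunk: the chain effect from Alex's post, resolved here by a second pass. Capping max_passes at 1 leaves the first jump long, which is still correct, just bigger.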

For example, look at the following:

mov ax,1234h
jmp next ; 32-bit relative jump
...
...
...
align 16 ; at 0x00001231 in memory
...
...
next:
...
...

Let's assume that there are just over 128 bytes of code between
the jmp and the 'next' label. The first pass to shrink the code
will find that the jmp instruction can be shrunk to a 16-bit
relative jump. Great. Move on to the next item, which happens
to be the 'align 16' item. Now that we have removed one byte from
the jmp instruction, the align can move 16 bytes to a new alignment.
It was at 0x00001231 and made the next instruction at 0x00001240,
but since we eliminated a byte from the jmp instruction, the alignment
is now at 0x00001230, removing 16 bytes from every forward reference.

Now the jmp instruction can be shrunk even more to an 8-bit relative
jump since we moved the 'next' label closer by 16 bytes.

However, don't worry about it at this point. Continue on shrinking
what you can in the forward direction. As soon as you make another
pass, using the exact same assembler code, the next pass will find
that the jmp instruction can now be made an 8-bit relative and make
the change.
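The padding arithmetic in the example above can be checked with a few lines of C. This is a sketch: align_pad and after_align are hypothetical helpers that assume a power-of-two alignment. At 0x1231, "align 16" emits 15 bytes of padding and the next instruction lands at 0x1240; once the jmp is a byte shorter, the directive sits at 0x1230, emits nothing, and everything after it moves 16 bytes earlier. Measured from the end of the jmp, which itself moved one byte earlier, the displacement the jmp has to encode shrinks by 15.

```c
#include <assert.h>
#include <stdint.h>

/* Padding bytes emitted by "align a" at address addr; a must be a power of two. */
static uint32_t align_pad(uint32_t addr, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (a - (addr & (a - 1))) & (a - 1);
}

/* Address of the first instruction after the align directive. */
static uint32_t after_align(uint32_t addr, uint32_t a)
{
    return addr + align_pad(addr, a);
}
```

For instance, after_align(0x1231, 16) gives 0x1240 and after_align(0x1230, 16) gives 0x1230.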

This is where NASM's "number of passes" command-line option comes in.
The more passes, the smaller the code, but the longer it takes
to assemble it. NASM will stop looping once it finds that no more
modifications can be made.

For example, if you tell NASM to do 30 passes, but it only takes
4 passes, and the 5th pass would do nothing, NASM stops, and continues
with the last pass to assemble to object code.

In other words, don't worry about any references lower in memory than
where you are currently checking. Only shrink anything from this
point on. The next pass will find anything before this point.

Except for extremely detailed and aligned code, usually only 4 passes
are needed, the first pass, the two middle passes, and the last pass.

I hope this makes sense.

Ben
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Forever Young Software
http://www.fysnet.net/index.htm
http://www.fysnet.net/osdesign_book_series.htm
To reply by email, please remove the zzzzzz's

Batteries not included, some Assembly required.
Alexei A. Frounze
2015-02-12 10:04:19 UTC
Post by Benjamin David Lunt
Hi Alex,
[I wonder why sometimes Outlook Express doesn't prefix the quoted
text with '>' as in below. Sometimes I just add it myself, but
with this one being quite long... So sorry for the top post.]
Sorry, I forgot about this peculiarity of Google Groups' web
interface. I could either use "Windows Live castrated Male" (modern
day Outlook Express) or try to remember to break long lines manually,
which is what I'm doing now.
Post by Benjamin David Lunt
Here is my solution. I haven't implemented it in my assembler yet,
but when I get the time and interest, I will.
Make your assembler at least three passes. The first pass finds
the offset to all labels and objects that will be referenced by
any type of instruction: Jmp, jcc, etc.
Make all of these references use the largest size. For example,
if the user has placed [BITS 16] at the top, use all 16-bit
references, even though you know an 8-bit reference will work.
If the user placed [BITS 32], use all 32-bit references,
even though an 8-bit reference would still work.
(Note, a backward reference that fits in 8 bits can be stored as an
8-bit reference, since you already know the offset. However, all
forward references must use the largest size.)
When you get to the end of the first pass, you have an exact
size and reference to all instructions, including any .align
issues.
At this point, you could skip and move to the last pass and
assemble the code to object code. Done.
The middle pass(es), however, are there to "shrink" these references.
Start at the top. When you get to your first reference, calculate
the distance. If it fits in something smaller than 32 bits, shrink it.
Then adjust every reference from this offset to the end by the
difference. Move to the next reference, check, shrink, etc.
Right. That looks quadratic in the number of jumps and labels.
This is why I don't quite like it. The only obvious and worse
solution is to try both options for every jump independently,
which would make it exponential. :)

...
Post by Benjamin David Lunt
Except for extremely detailed and aligned code, usually only 4 passes
are needed, the first pass, the two middle passes, and the last pass.
Or 5. Dunno. I don't have any stats at the moment. Perhaps I could
run NASM on Smaller C's output with a set of different -O<N> options to see
how many passes NASM does (i.e., after which N there's no difference in output).
Post by Benjamin David Lunt
I hope this makes sense.
Sure, thanks a lot!

Alex
wolfgang kern
2015-02-10 20:04:17 UTC
Alexei A. Frounze wrote:

sorry for top posting as well
(same problem as Ben with missing quote marks, which came from Google!)

My first idea was similar to Ben's reply, to make all forward references
large ones and later reduce their size if applicable.

But I think a compiler could figure out all references in a first pass and
immediately decide whether each fits a short jmp/jcc or needs a long one.
And of course this first pass must then be able to look ahead 127 bytes.
Seems you are already on such a path ...
__
wolfgang
Rod Pemberton
2015-02-10 23:44:40 UTC
On Tue, 10 Feb 2015 04:32:24 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler for
Smaller C to make it fully self-sufficient, easily portable (someone
asked for a C compiler for xv6 on Stack Overflow) and a tad faster as it
turns out that NASM can be horribly slow (I think I've mentioned that
before).
But there's still one unsolved technical problem. Namely, I need NASM's
ability to automatically substitute short and long jumps as necessary,
that is, either with an 8-bit relative address or with a 16/32-bit one,
depending on how far the target location is from the jump instruction.
It looks like it's not a trivial problem. I may have inquired about it
before (Rod might be able to confirm), but I don't remember ever finding
or learning a reasonably good solution.
I vaguely recall a short conversation somewhat recently.
Post by Alexei A. Frounze
There's one solution that I came up with over the weekend, though. I wonder
if you could suggest improvements or something radically better. Before
I state it I should probably note that I've considered a dynamic
programming solution, but it looks like the problem can't be trivially
reduced to identical subproblems.
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar
to relative jumps (e.g. "mov ax, label2 - label1")
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
- cost not higher than quadratic with small coefficient
So, here's what I have so far...
Consider a normal, line-by-line assembling process.
If a relative jump instruction is encountered and its target label
precedes it (=we've seen that label defined), figure out which relative
offset should be there (8- or 16/32-bit) and go on.
If the jump's target label is unknown (=defined somewhere ahead), note
the position of this jump, choose an 8-bit relative offset and go on
until another 127 or more bytes of code are generated. If the target
label is encountered somewhere between the instructions from these 127
bytes, keep the 8-bit relative address and go on. Otherwise, restart
assembling from that jump instruction but use a 16/32-bit relative
address now.
Why wouldn't you choose the large 16/32-bit relative address first?
I.e., I'd think that larger branches would be more frequent.

If you choose the larger branch by default, you can NOP pad the 8-bit
relative jump correction, since it should be shorter. This only adds
a byte NOP (or two etc) for the branch not taken path. This seems
very easy to do to me, but of course, wastes some bytes. Is it
enough to seriously affect code size or execution speed? ...
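The NOP-padding idea, applied to a conditional jump in 16-bit code, might look like the sketch below (a hypothetical helper, not taken from any assembler). The slot keeps the 4-byte size of the near form (0F 8x rel16), so all later addresses stay valid. When the target is within rel8 range, a 2-byte short Jcc (7x rel8) goes in front and the remaining two bytes become NOPs (0x90), executed only when the branch falls through. As the sketch makes visible, the slot is the same size either way, so nothing is saved in space.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a 4-byte slot reserved for a near Jcc in 16-bit code.
   cc is the condition nibble (4 = Z/E, 5 = NZ/NE, ...); at is the slot address. */
static void emit_jcc_slot16(uint8_t slot[4], unsigned cc, uint16_t at, uint16_t target)
{
    assert(cc < 16);
    int32_t short_disp = (int32_t)target - (at + 2);      /* from end of short Jcc */
    if (short_disp >= -128 && short_disp <= 127) {
        slot[0] = (uint8_t)(0x70 | cc);                   /* Jcc rel8 */
        slot[1] = (uint8_t)short_disp;
        slot[2] = 0x90;                                   /* NOP */
        slot[3] = 0x90;                                   /* NOP */
    } else {
        uint16_t near_disp = (uint16_t)(target - (at + 4)); /* from end of near Jcc */
        slot[0] = 0x0F;                                   /* Jcc rel16 */
        slot[1] = (uint8_t)(0x80 | cc);
        slot[2] = (uint8_t)(near_disp & 0xFF);            /* little-endian */
        slot[3] = (uint8_t)(near_disp >> 8);
    }
}
```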

You could also choose to always use a jump for the path taken:

jnz label ; could be short or long

becomes

jz over ; short jump with the inverted condition
jmp label ; always long jump
over:

but this has much the same effect as choosing the larger conditional
branch by default.
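Rewriting a conditional jump as an inverted short Jcc over an unconditional near JMP is the usual way to get a long conditional branch on CPUs before the 80386, which lack the 0F 8x near Jcc forms. A sketch for 16-bit code follows (a hypothetical helper); it relies on the x86 encoding detail that each Jcc condition and its inverse differ only in the lowest opcode bit, e.g. 74 JZ / 75 JNZ.

```c
#include <assert.h>
#include <stdint.h>

/* Emit "J(!cc) over / JMP near target / over:" at address 'at' in 16-bit code.
   JMP rel16 (E9 cw) is 3 bytes, so the short Jcc skips exactly 3 bytes. */
static int emit_long_jcc_8086(uint8_t out[5], unsigned cc, uint16_t at, uint16_t target)
{
    assert(cc < 16);
    uint16_t disp = (uint16_t)(target - (at + 5)); /* measured from end of the JMP */
    out[0] = (uint8_t)(0x70 | (cc ^ 1));          /* inverted condition, short */
    out[1] = 3;                                    /* skip the 3-byte JMP */
    out[2] = 0xE9;                                 /* JMP rel16 */
    out[3] = (uint8_t)(disp & 0xFF);
    out[4] = (uint8_t)(disp >> 8);
    return 5;
}
```

The pair costs 5 bytes, versus 4 for the 80386 near Jcc (0F 8x rel16) and 2 for a short Jcc.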

If you choose the larger branch and shorten it by a byte (or two etc)
for the 8-bit branch correction, you should be able to simply move
the compiled byte sequence up, *if* there are no intervening branch
or jump or call instructions. I.e., the compiled sequence would have to
consist entirely of relative addresses and address-independent
instructions.

Maybe, reworking the generation of jump/branches for C's control-flow
could minimize the branches required, or re-position them to be less
of an issue? E.g., less interference due to fewer labels ... For
example, each type of loop can be written to test at either the
beginning or end. So, one implementation will have more branches
or jumps than the other, or may have branches which are shorter or
closer together.

You could also separate the control-flow from the code.
E.g.,

backward_label1:
jnz forward_label1
; loop body
; ...
; ...
; ... too big
; ...
; ...
; ...
jmp backward_label1
forward_label1:

Becomes:

body1:
; loop body
; ...
; ...
; ... still big
; ...
; ...
; ...
ret

backward_label1:
jnz forward_label1
call body1
jmp backward_label1
forward_label1:

This shortens the length between the jumps and the labels
of control-flow. I.e., it can allow much more control-flow
using the 127 byte range short form, before the 16/32-bit
branch address form is required. It's very easy to
create a large loop body in C. Now, that loop can have
another loop or maybe two or three around it before the 127
bytes of range are consumed. This compacts the body to one
call instruction, which does slow the code execution ...
Post by Alexei A. Frounze
After this pass all label addresses are known and can
be encoded into instructions.
This is the basic idea.
There are two flaws, however.
First, assembling the same 127 instructions again is bad. So,
they should be cached.
I wouldn't say it's bad. It'll take a few milli- or
micro-seconds or somesuch, which can add up if it's done
frequently. You'll want to avoid creating a rip up and
retry compiler, if possible. ;-)

Does the compiled sequence change any? I think that if the
compiled sequence doesn't have any intermediate branches or
jumps or calls, then the compiled byte sequence doesn't change
any. I.e., all code should be independent or relative without
control-flow.

So, there should be some/many situations where you don't need
to recompile, just move the bytes up for the shorter branch.
But, you may need to recompile for other situations, e.g.,
intervening control-flow.
Post by Alexei A. Frounze
Second, there may be other jump instructions between the first
instruction and its target label and those other jump instructions
may also need to be changed from 8-bit relative addresses to 16/32-
bit ones, which has the effect of moving the target label farther
away and possibly triggering reassembly of one or more of the
preceding jumps. You can end up with a chain reaction.
Choose long branches first, rework for short? ... The long branches
will only throw the code off by a byte or a few, i.e., the long
address is either 1) exactly what it needs to be, if it's actually
a long branch or 2) very close to what it needs to be but slightly
larger, if it's actually a short branch. If it's actually a short,
you can subtract one (or two etc) to adjust.

My assumption is that it's easier to remove the extra byte(s), in
general, than it is to insert additional bytes, but that's just
an assumption. E.g., if the code is moved up, you can leave the
next label after that section where it is. This prevents forward
code computations from needing to be recomputed. Add a NOP pad to
the end of the moved-up sequence, i.e., tail-pad. If the sequence
branches out or returns near the padding, there is no execution
cost for the NOP, just space cost. If it falls through or continues,
then there is an execution cost for the NOP.
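Removing the extra bytes without disturbing later labels could be sketched like this for a near Jcc in 16-bit code (a hypothetical helper). It assumes what the paragraph assumes: the bytes after the jump, up to the next label, are position-independent, and the jump's target lies outside that section, so only the jump itself needs re-encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* code[0..len) is a section at address 'at' that starts with a near Jcc
   (0F 8x rel16, 4 bytes). If 'target' is in rel8 range, rewrite it as a short
   Jcc (7x rel8, 2 bytes), move the rest of the section up by 2 bytes and
   NOP-pad the tail, so the section length and every later label stay put. */
static bool shrink_jcc_tail_pad(uint8_t *code, size_t len, long at, long target)
{
    assert(len >= 4 && code[0] == 0x0F && (code[1] & 0xF0) == 0x80);
    assert(target <= at || target >= at + (long)len); /* target outside the moved bytes */
    long disp = target - (at + 2);
    if (disp < -128 || disp > 127)
        return false;                       /* keep the near form */
    uint8_t cc = code[1] & 0x0F;
    memmove(code + 2, code + 4, len - 4);   /* shift the position-independent rest up */
    code[0] = (uint8_t)(0x70 | cc);
    code[1] = (uint8_t)disp;
    code[len - 2] = 0x90;                   /* tail padding (NOP) */
    code[len - 1] = 0x90;
    return true;
}
```

Because the padding sits at the end of the section, the NOPs cost execution time only when control falls through them.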
Post by Alexei A. Frounze
A possible (imperfect) workaround might be this... While assembling
instructions that follow a jump instruction, note all instructions
whose length isn't known yet (align, other jumps) and maintain a
lower and upper bound of the size of the code assembled so far since
the jump instruction.
Relocate? With some extra branches/jumps/calls, you could relocate any
instructions to later or earlier in the compile sequence, in order
to have enough time or the needed info to determine the complete length.
This is basically the same as separating code and data into text and
data segments, except you're moving code and not data. So, you need
some additional branch/jump code thrown in to fix the code.


Rod Pemberton
Alexei A. Frounze
2015-02-12 10:26:57 UTC
Post by Rod Pemberton
...
Why wouldn't you choose the large 16/32-bit relative address first?
I.e., I'd think that larger branches would be more frequent.
If you choose the larger branch by default, you can NOP pad the 8-bit
relative jump correction, since it should be shorter. This only adds
a byte NOP (or two etc) for the branch not taken path. This seems
very easy to do to me, but of course, wastes some bytes. Is it
enough to seriously affect code size or execution speed? ...
I don't see the point in changing one long jump into a short jump plus a NOP: it's more work everywhere (in the assembler and in the CPU) and gives no size reduction. I'd really like smaller code, which is still important for code compiled for real mode / DOS. A 16-bit-only version of Smaller C uses almost the entire 64K segment for code. I've mentioned this before. I still want that version to fit into 64K of code. Jumps contribute quite a lot to the size. At one point I determined that there was roughly one jump per 7 other instructions, and the average instruction length is 3 bytes. So there may be ~65536/3/8 = 2730 jump instructions, and the overhead of using long instructions can be up to ~2730*1.5 = 4095 bytes in the worst case. I don't have that much free space. Even an extra 2K is too much here.
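The estimate above, spelled out as a trivial C sketch using the figures from the post. The post does not say where the 1.5 extra bytes per jump come from; one plausible reading, which is an assumption here, is an average of +1 byte for JMP (EB rel8 vs E9 rel16) and +2 bytes for Jcc (7x rel8 vs 0F 8x rel16) in 16-bit code.

```c
#include <assert.h>

/* Jumps in a code segment, assuming one jump per 'insns_per_jump' instructions
   (one jump per 7 other instructions = 1 in 8). */
static long estimate_jumps(long seg_bytes, long avg_insn_len, long insns_per_jump)
{
    assert(avg_insn_len > 0 && insns_per_jump > 0);
    return seg_bytes / avg_insn_len / insns_per_jump;
}

/* Worst-case growth if every jump takes the long form, at ~1.5 extra bytes each. */
static long long_form_overhead(long jumps)
{
    return jumps * 3 / 2;
}
```

With integer division, estimate_jumps(65536, 3, 8) gives 2730 and long_form_overhead(2730) gives 4095, matching the numbers in the post.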
Post by Rod Pemberton
jnz label ; could be short or long
becomes
jz over
jmp label ; always long jump
over:
but this has much the same effect as choosing the larger conditional
branch by default.
I'm not concerned about supporting pre-i80386 CPUs (or pre-i80186?), where long forms of conditional jumps are unavailable, hence there's no point in doing this.
Post by Rod Pemberton
If you chose the larger branch and shorten it by a byte (or two etc)
for the 8-bit branch correction, you should be able to simply move
the compiled byte sequence up, *if* there are no intervening branch
or jump or call instructions. I.e., the compiled sequence may be
entirely comprised of relative addresses and address independent
instructions.
Sure, some simple optimizations can be done.
Post by Rod Pemberton
Maybe, reworking the generation of jump/branches for C's control-flow
could minimize the branches required, or re-position them to be less
of an issue?
I'm not going to make Smaller C an optimizing compiler any time soon (if ever). So, the answer is no.

...

Thanks!
Alex
Rod Pemberton
2015-02-11 00:02:39 UTC
On Tue, 10 Feb 2015 04:32:24 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
If the jump's target label is unknown (=defined somewhere ahead), note
the position of this jump,
One of the things I did with my language I'm working on was
to make each output stage be text, not binary, until the
very last. This means you can go back and correct the
text if you know what and where. E.g., for forward references,
you save the label name and where it's located, then when
you come across the label definition, you stop parsing the
main file, then search through the list of saved forward
references, use the saved location to correct the missing
info, then resume parsing the main file. This can be done
with the main file and one temp file in C. Of course, writing
the code to use a text file essentially as a database in
C is a bit fiddly, but conceptually trivial. I wouldn't want to
do this in 16-bit assembly under DOS though ...
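The same fixup list can also be kept in memory instead of in a temp file. The sketch below is that in-memory variant (a substitution for Rod's text-file approach), assuming 16-bit absolute address fields in the output; the table size and the names `note_forward_ref` and `define_label` are hypothetical. It also shows why backpatching alone doesn't settle the short/long question: the size of each field has to be fixed before the label's address is known.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIXUPS 32

/* A pending forward reference: a 16-bit address field at `offset` in the
   output buffer that must receive the address of label `name`. */
struct fixup {
    char name[16];
    unsigned offset;
};

static struct fixup pending[MAX_FIXUPS];
static int npending;

/* Record a reference to a label that hasn't been defined yet. */
void note_forward_ref(const char *name, unsigned offset)
{
    assert(npending < MAX_FIXUPS && strlen(name) < sizeof pending[0].name);
    strcpy(pending[npending].name, name);
    pending[npending].offset = offset;
    npending++;
}

/* On a label definition, patch every pending reference to it and drop it
   from the list. Returns the number of references patched. */
int define_label(unsigned char *out, const char *name, unsigned addr)
{
    int i = 0, patched = 0;

    while (i < npending) {
        if (strcmp(pending[i].name, name) == 0) {
            out[pending[i].offset]     = (unsigned char)(addr & 0xFF);
            out[pending[i].offset + 1] = (unsigned char)((addr >> 8) & 0xFF);
            pending[i] = pending[--npending];   /* unordered removal */
            patched++;
        } else {
            i++;
        }
    }
    return patched;
}
```

A relative jump would store `addr - (offset + 2)` instead of `addr`, but either way the 2-byte field was chosen up front.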


Rod Pemberton
Alexei A. Frounze
2015-02-12 10:30:28 UTC
Post by Rod Pemberton
On Tue, 10 Feb 2015 04:32:24 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
If the jump's target label is unknown (=defined somewhere ahead), note
the position of this jump,
One of the things I did with my language I'm working on was
to make each output stage be text, not binary, until the
very last. This means you can go back and correct the
text if you know what and where.
Except when you're working on the original input and not its
copy or an intermediate result. But yes, some tricks can be
done as long as you have enough memory or disk space for temp
files.

...

Alex
James Harris
2015-02-11 11:39:54 UTC
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler for
Smaller C to make it fully self-sufficient, easily portable (someone asked
for a C compiler for xv6 on Stack Overflow) and a tad faster as it turns
out that NASM can be horribly slow (I think I've mentioned that before).
What is xv6?

Because Nasm assembles so well for so many targets I wonder if it is a good
idea to replace it. When you say that Nasm is slow I must say I've never
seen that. Could it be that it is only "slow" when it has to do what you are
now looking to do yourself so that your solution may ultimately be just as
slow?
Post by Alexei A. Frounze
But there's still one unsolved technical problem. Namely, I need NASM's
ability to automatically substitute short and long jumps as necessary,
that is, either with an 8-bit relative address or with a 16/32-bit one,
depending on how far the target location is from the jump instruction.
It looks like it's not a trivial problem. I may have inquired about it
before (Rod might be able to confirm), but I don't remember ever finding
or learning a reasonably good solution.
There's one solution that I came up during the weekend, though. I wonder
if you could suggest improvements or something radically better. Before I
state it I should probably note that I've considered a dynamic programming
solution, but it looks like the problem can't be trivially reduced to
identical subproblems.
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar
to relative jumps (e.g. "mov ax, label2 - label1")
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
- cost not higher than quadratic with small coefficient
IIUC this is a problem with two solutions: 1) generate full-size jumps and
iteratively reduce them, or 2) remove all jumps and build them up
byte-by-byte as needed.

I see you have align directives to allow for, so option 2 may be easier. I
think the process is to remove jumps so that fall-throughs are elided
(they become zero-length jumps), then to scan through, increasing each jump to
the minimum size it has to be at the time, and then to repeat that scan
until there are no more changes to make. Once that has been done you have
set each jump to the minimum size it has to be. The algorithm is guaranteed
to complete because it only ever increases jump sizes and it stops when
there are no more changes to make.

...
Post by Alexei A. Frounze
What do you think?
Keep Nasm!

James
Alexei A. Frounze
2015-02-12 08:38:43 UTC
Post by James Harris
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler for
Smaller C to make it fully self-sufficient, easily portable (someone asked
for a C compiler for xv6 on Stack Overflow) and a tad faster as it turns
out that NASM can be horribly slow (I think I've mentioned that before).
What is xv6?
http://en.wikipedia.org/wiki/Xv6 ?
Post by James Harris
Because Nasm assembles so well for so many targets I wonder if it is a good
idea to replace it.
You're sounding like NASM is endangered or something. :)
Post by James Harris
When you say that Nasm is slow I must say I've never
seen that.
Because most NASM input is small and because NASM is rarely run in DOSBox or something very old and slow. If you combine large input with a slow environment, you're screwed. With sufficiently large input you can get screwed even when running NASM natively on Windows. My non-optimizing compiler generates lots of asm code with lots of labels (some are clearly redundant as they have no instructions in between) and jumps.
Post by James Harris
Could it be that it is only "slow" when it has to do what you are
now looking to do yourself so that your solution may ultimately be just as
slow?
Perhaps. Perhaps not. But I think I have an idea of where its weak points are.

...
Post by James Harris
Keep Nasm!
I now absolutely must erase it from my system and then from all websites! :)

Seriously, it should be of no concern to you. I like NASM and I will use it where it makes sense. If I implement a replacement for Smaller C, it will still be syntax-compatible and produce the same kind of ELF object files, but it will be smaller, faster and more portable (NASM now requires C99, AFAIR; ANSI C isn't enough anymore, and you need some other tools besides a C compiler to build NASM). It'll be a drop-in replacement specifically tailored to the basic needs of the compiler. I'm planning to add an option to choose one of the two assemblers, just like there's now an option to invoke gcc as a preprocessor. If you like the combination of Smaller C with NASM so much (did I infer it correctly?;), you'll have it.

Thanks,
Alex
James Harris
2015-02-12 09:21:57 UTC
Post by Alexei A. Frounze
If you like so much the combination of Smaller C with NASM (did I
infer it correctly?;), you'll have it.
I have been following your posts about Smaller C with interest and it is
near to becoming something I might be able to use but I haven't tried it
yet. I did not post about it already because I figured you have enough
to do (!) but I occasionally check the wiki to see if it yet supports
what I need. If you want to know, the only omissions I have noticed are
struct pass-by-value and 32-bit or wider ints in 8086 code. There may be
other things I have missed but from what I have seen so far, other than
those two features it seems to do what I would need to replace bcc on
Linux.

James
Alexei A. Frounze
2015-02-12 09:50:17 UTC
Post by James Harris
Post by Alexei A. Frounze
If you like so much the combination of Smaller C with NASM (did I
infer it correctly?;), you'll have it.
I have been following your posts about Smaller C with interest and it is
near to becoming something I might be able to use but I haven't tried it
yet. I did not post about it already because I figured you have enough
to do (!) but I occasionally check the wiki to see if it yet supports
what I need. If you want to know, the only omissions I have noticed are
struct pass-by-value
I've been thinking about that one. It looks like I can make functions return structures easily with a few small changes. But passing structures is still undecided in terms of implementation. Right now all function parameters take a machine word on the stack and are simply pushed one by one. This simplistic scheme must be redone, and some extra code is needed to copy structures. I'm still unsure whether I should make the caller do it (as is usually done) or the callee (which would let me keep the machine-word-per-parameter logic).
Post by James Harris
and 32-bit or wider ints in 8086 code.
I've been thinking of adding a medium memory model that would be between the small and the huge models (use fixed font):
  Model    int     long    void*        void(*)()    Segmentation Details
+ small    16-bit  n/a     16-bit/near  16-bit/near  one segment (up to 64KB) for code; one segment (up to 64KB) for data/stack; ds=es=ss; cs != ss
- medium   32-bit  32-bit  16-bit/near  32-bit/flat  up to 1MB for code; one segment (up to 64KB) for data/stack; ds=es=ss; cs != ss
+ huge     32-bit  32-bit  32-bit/flat  32-bit/flat  up to 1MB for code and data; one segment (up to 64KB) for stack; segment registers unrelated to each other; cs, ds, es vary, ss fixed

It would solve the performance and size issues of the huge model at the expense of having a single 64KB segment for data and stack and it would solve the apparent deficiency of the tiny and small memory models that lack 32-bit types. IOW, it would be a quite usable memory model for real mode.

But in order to implement it I need to split tokIdent into two subtypes, one for data identifiers and the other for code identifiers, so I can generate different code for data pointers and code pointers. This doesn't look attractive. I have two options. I could literally introduce another kind of token and add it to the many switches and ifs where the two are handled the same way, as well as to the few places where there's a difference. Or I could make the associated index into the table of identifiers carry this bit of information, as I do in many other places (e.g. positive for data, negative for code), and then I'd need to adjust all the places where the index is used to extract the identifier's name.
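The second option, folding the data/code distinction into the sign of the identifier index, might be sketched as below. The table contents, index values and helper names are hypothetical, not Smaller C's actual identifier table. Routing every name lookup through one accessor is one way to keep the "adjust all the places" cost down.

```c
#include <assert.h>

/* Hypothetical identifier table; slot 0 is unused so that every real index
   is positive and its sign is free to carry the data/code flag. */
static const char *const ident_names[] = { "", "main", "buf", "printf" };

int tag_as_code(int index)    { assert(index > 0); return -index; }
int is_code_ident(int tagged) { return tagged < 0; }

/* The single place that strips the flag before touching the table. */
const char *ident_name(int tagged)
{
    int index = tagged < 0 ? -tagged : tagged;
    assert(index > 0 && index < (int)(sizeof ident_names / sizeof ident_names[0]));
    return ident_names[index];
}
```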

Real mode is a mess that induces and favors more mess. :)
Post by James Harris
There may be
other things I have missed but from what I have seen so far, other than
those two features it seems to do what I would need to replace bcc on
Linux.
Yep, it could serve as a replacement eventually (I hear bcc is buggy; not that Smaller C never had bugs!:).

Alex
James Harris
2015-02-12 18:13:58 UTC
Post by Alexei A. Frounze
Post by James Harris
Post by Alexei A. Frounze
If you like so much the combination of Smaller C with NASM (did I
infer it correctly?;), you'll have it.
I have been following your posts about Smaller C with interest and it is
near to becoming something I might be able to use but I haven't tried it
yet. I did not post about it already because I figured you have enough
to do (!) but I occasionally check the wiki to see if it yet
supports
what I need. If you want to know, the only omissions I have noticed are
struct pass-by-value
I've been thinking about that one. It looks like I can make functions
return structures easily with few small changes. But passing
structures is still undecided in terms of implementation. Right now
all function parameters take a machine word on the stack and are
simply pushed one by one. This simplistic scheme must be redone and
some extra code is needed to copy structures (still unsure if I should
make the caller do it (as is usually done) or the callee (I could keep
the machine word per parameter logic)).
It occurs to me that the two things I mentioned, integers wider than the
native int size and structs, may need similar treatment: either passing
a hidden pointer (which would preserve your one-word-per-parameter
scheme) or copying them via the stack frame (which would not). Whatever
you decide for one may work for the other too. I noticed that the bcc
compiler calls memcpy to copy structs in and out but, sadly, has a bug
in its copy-in call so that I have to pass pointers to structs in the C
source code.
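James's hidden-pointer option might be lowered like this, with the caller making the copy (the usual convention). This is a sketch only: `struct rect`, `scaled_area_lowered` and `call_scaled_area` are hypothetical names, and this is not how Smaller C currently handles structs.

```c
#include <assert.h>
#include <stddef.h>

struct rect { int w, h; };

/* Source form:  int scaled_area(struct rect r) { r.w *= 2; return r.w * r.h; }
   Lowered form: the struct parameter becomes a pointer to a caller-made copy,
   so each parameter still occupies one machine word on the stack. */
static int scaled_area_lowered(struct rect *r)
{
    r->w *= 2;                   /* modifies the copy, not the caller's object */
    return r->w * r->h;
}

/* Source form: n = scaled_area(box); */
int call_scaled_area(const struct rect *box)
{
    struct rect tmp;

    assert(box != NULL);
    tmp = *box;                  /* caller-side copy keeps by-value semantics */
    return scaled_area_lowered(&tmp);
}
```

With the callee doing the copy instead, the caller would pass the original's address and the callee would copy `*r` into a local before modifying it. A 32-bit `long` on the 8086 could be passed the same way.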
Post by Alexei A. Frounze
Post by James Harris
and 32-bit or wider ints in 8086 code.
I've been thinking of adding a medium memory model that would be
Model int long void* void(*)() Segmentation Details
+ small 16-bit n/a 16-bit/near 16-bit/near one segment (up
to 64KB) for code; one segment (up to 64KB) for data/stack; ds=es=ss;
cs != ss
- medium 32-bit 32-bit 16-bit/near 32-bit/flat up to 1MB for
code; one segment (up to 64KB) for data/stack; ds=es=ss; cs != ss
+ huge 32-bit 32-bit 32-bit/flat 32-bit/flat up to 1MB for
code and data; one segment (up to 64KB) for stack; unrelated to each
other segment registers; cs, ds, es vary, ss fixed
A medium model (large code, small data) may be very useful if I could
get it to run at boot time by linking it with some startup assembly
code. I know you produce Elf object files and it looks like your linker
would accept -flat16 to produce a flat binary. If I put a __start symbol
at the beginning of a startup asm module and made it the first module in
the load module then AFAICS I would have no section alignment issues. I
hope that the linker would place the first-named object module first in
the output load module. Does it link them all in order?

That said, a small model would do me just now, i.e. where DS=SS but CS
can differ.

I saw in another reply that you are not interested in targeting less
than a 386 (or 186). That was a slight disappointment. I still keep my
boot code as pure 8086. I only use 80386 code if necessary and I carry
out a check before using anything above 8086 instructions.

...
Post by Alexei A. Frounze
Post by James Harris
There may be
other things I have missed but from what I have seen so far, other than
those two features it seems to do what I would need to replace bcc on
Linux.
Yep, it could serve as a replacement eventually (I hear bcc is buggy;
not that Smaller C never had bugs!:).
Yes, and it is not under active development whereas yours is. bcc does
produce 8086 code but that aside I think the Smaller C compiler is not
far behind supporting the things I have used bcc for.

James
Alexei A. Frounze
2015-02-14 12:10:29 UTC
Post by James Harris
Post by Alexei A. Frounze
Post by James Harris
Post by Alexei A. Frounze
If you like so much the combination of Smaller C with NASM (did I
infer it correctly?;), you'll have it.
I have been following your posts about Smaller C with interest and it is
near to becoming something I might be able to use but I haven't tried it
yet. I did not post about it already because I figured you have enough
to do (!) but I occasionally check the wiki to see if it yet supports
what I need. If you want to know, the only omissions I have noticed are
struct pass-by-value
I've been thinking about that one. It looks like I can make functions
return structures easily with few small changes. But passing
structures is still undecided in terms of implementation. Right now
all function parameters take a machine word on the stack and are
simply pushed one by one. This simplistic scheme must be redone and
some extra code is needed to copy structures (still unsure if I should
make the caller do it (as is usually done) or the callee (I could keep
the machine word per parameter logic)).
It occurs to me that the two things I mentioned, integers wider than the
native int size and structs, may need similar treatment: either passing
a hidden pointer (which would preserve your one-word-per-parameter
scheme) or copying them via the stack frame (which would not).
In theory, one could pre-declare special kinds of structures to pose as long integers and generate calls out to dedicated arithmetic functions. Poor man's long longs. :)
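Such "poor man's long longs" might look like this: a pair of 16-bit halves that the compiler never operates on directly, only through generated calls to helper functions. This is a hypothetical sketch with made-up names; it uses pointers rather than by-value structs because the compiler can't pass structs by value yet.

```c
#include <assert.h>
#include <stddef.h>

/* A 32-bit value kept as two 16-bit halves (hypothetical layout). */
struct plong { unsigned short lo, hi; };

/* r = a + b, with the carry out of the low half propagated into the high half.
   Both halves are computed before anything is stored, so r may alias a or b. */
void plong_add(struct plong *r, const struct plong *a, const struct plong *b)
{
    unsigned long lo, hi;

    assert(r != NULL && a != NULL && b != NULL);
    lo = (unsigned long)a->lo + b->lo;              /* up to 17 bits */
    hi = (unsigned long)a->hi + b->hi + (lo >> 16); /* add the carry */
    r->lo = (unsigned short)(lo & 0xFFFFu);
    r->hi = (unsigned short)(hi & 0xFFFFu);
}

/* Host-side helper for checking results. */
unsigned long plong_value(const struct plong *x)
{
    return ((unsigned long)x->hi << 16) | x->lo;
}
```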
Post by James Harris
Whatever
you decide for one may work for the other too. I noticed that the bcc
compiler calls memcpy to copy structs in and out but, sadly, has a bug
in its copy-in call so that I have to pass pointers to structs in the C
source code.
Nasty stuff.
Post by James Harris
Post by Alexei A. Frounze
Post by James Harris
and 32-bit or wider ints in 8086 code.
I've been thinking of adding a medium memory model that would be
Model int long void* void(*)() Segmentation Details
+ small 16-bit n/a 16-bit/near 16-bit/near one segment (up
to 64KB) for code; one segment (up to 64KB) for data/stack; ds=es=ss;
cs != ss
- medium 32-bit 32-bit 16-bit/near 32-bit/flat up to 1MB for
code; one segment (up to 64KB) for data/stack; ds=es=ss; cs != ss
+ huge 32-bit 32-bit 32-bit/flat 32-bit/flat up to 1MB for
code and data; one segment (up to 64KB) for stack; unrelated to each
other segment registers; cs, ds, es vary, ss fixed
A medium model (large code, small data) may be very useful if I could
get it to run at boot time by linking it with some startup assembly
code.
Yep. Perfectly doable (tested!) with the tiny, small and huge memory models and my FAT bootsectors. I've been thinking of putting together a small demo/tutorial showing how to make a hell'o'world "kernel" using Smaller C, NASM and a bit of my FAT code (including the FAT bootsectors). But it needs some work to be usable and to require minimum code and tools to build (I don't want to include my entire FAT module (first, it wouldn't compile with Smaller C as-is; second, it's too big for a small demo) and the tools dependent on it).
Post by James Harris
I know you produce Elf object files
Not me, NASM. :)
Post by James Harris
and it looks like your linker
would accept -flat16 to produce a flat binary. If I put a __start symbol
at the beginning of a startup asm module and made it the first module in
the load module then AFAICS I would have no section alignment issues. I
hope that the linker would place the first-named object module first in
the output load module. Does it link them all in order?
See the doc: https://github.com/alexfru/SmallerC/wiki/Smaller-C-Linker-Wiki:

* The linker sorts sections alphabetically within the same type of section (type being a combination of code/data, readable/writable, initialized/uninitialized). The sections appear in this order: ".text" (if exists), other code sections, read-only data sections (e.g. ".rodata"), writable initialized data sections (e.g. ".data"), uninitialized/zero-initialized sections (e.g. ".bss"). If in doubt, generate and examine the map file.

* You can specify the "origin" for Windows, Linux and flat executables. In Windows/PE and Linux/ELF executables it's the image base address, the address at which the PE/ELF headers will be loaded, which then will be followed by the code and data sections. In flat executables it's the address/offset at which the first byte of the file will be loaded for execution. The first byte in flat executables is part of the first executable instruction. Example: when linking into DOS .COM executables, the implicit origin is naturally set to 0x100 by the linker itself.

* -origin <number> Specifies the origin for the executable file as an integer constant, decimal, hex or octal (e.g. 10, 0xA or 012 would all specify the same value). In Windows/PE and Linux/ELF executables it's the image base address, the address in memory at which the PE/ELF headers will be loaded, which then will be followed by the code and data sections. In flat executables it's the address/offset in memory at which the first byte of the file will be loaded for execution. The (E)IP register is expected to have this address/offset at start. The first byte in flat executables is part of the first executable instruction. Example: when linking into DOS .COM executables, the implicit origin is naturally set to 0x100 by the linker itself. This option should typically be used only when making flat executables. In other cases the default origin value chosen by the linker should be sufficient.

* -flat16 Must be specified when linking into 16-bit flat executables similar to DOS .COM programs. If the entry point is not at the very beginning, the linker will insert at the beginning a jump instruction to the entry point. Note: forces section alignment greater than 4 (x86 ELF files usually have 16-byte alignment) to 4 to save space.

* -flat32 Must be specified when linking into 32-bit flat executables. If the entry point is not at the very beginning, the linker will insert at the beginning a jump instruction to the entry point.
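The ordering rule in the first bullet can be illustrated with a `qsort` comparator. This is only a sketch of the documented order, not the linker's actual code; `struct section` and its flag fields are hypothetical simplifications of the "type" notion the doc describes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical section descriptor. */
struct section {
    const char *name;
    int is_code, is_writable, is_initialized;
};

/* Rank per the documented order: .text, other code, read-only data,
   writable initialized data, uninitialized/zero-initialized data. */
static int section_rank(const struct section *s)
{
    if (s->is_code)
        return strcmp(s->name, ".text") == 0 ? 0 : 1;
    if (!s->is_initialized)
        return 4;
    return s->is_writable ? 3 : 2;
}

static int section_cmp(const void *pa, const void *pb)
{
    const struct section *a = pa, *b = pb;
    int ra = section_rank(a), rb = section_rank(b);

    if (ra != rb)
        return ra < rb ? -1 : 1;
    return strcmp(a->name, b->name);   /* alphabetical within a type */
}

void sort_sections(struct section *s, size_t n)
{
    assert(n == 0 || s != NULL);
    qsort(s, n, sizeof *s, section_cmp);
}
```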

Good enough?
Post by James Harris
That said, a small model would do me just now, i.e. where DS=SS but CS
can differ.
I saw in another reply that you are not interested in targetting less
than a 386 (or 186). That was a slight disappointment. I still keep my
boot code as pure 8086. I only use 80386 code if necessary and I carry
out a check before using anything above 8086 instructions.
In my opinion, there's little point in writing code compatible with
8086/8088/80186/80286. I never even owned an 8086/8088 machine and I
last had an 80286 in like 1996. Unless you have a very specific task
of writing code for some old system or some odd embedded system built
on 80286-, it's a complete waste of time. You have participated in
the discussion of how cumbersome 16-bit x86 is for specific parts of
compilers (e.g. register allocation), right? The x86 code generator
deals with this crap to some extent. But I hate it. And I'll hate it
even more to support x86 all the way down to 8086. If you really want
this support, you'll most likely have to implement it yourself. I can
explain to you how things work and answer any questions, but I don't want
to write that code.

Alex
Rod Pemberton
2015-02-14 20:29:42 UTC
On Sat, 14 Feb 2015 07:10:29 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by James Harris
I saw in another reply that you are not interested in targetting less
than a 386 (or 186). That was a slight disappointment. I still keep my
boot code as pure 8086. I only use 80386 code if necessary and I carry
out a check before using anything above 8086 instructions.
In my opinion, there's little point in writing code compatible with
8086/8088/80186/80286. I never even owned an 8086/8088 machine and I
last had an 80286 in like 1996. Unless you have a very specific task
of writing code for some old system or some odd embedded system built
on 80286-, it's a complete waste of time. You have participated in
the discussion of how cumbersome 16-bit x86 is for specific parts of
compilers (e.g. register allocation), right?
The DJGPP (GCC for DOS) compiler doesn't produce 16-bit code,
but you need 16-bit code for a DOS compiler to work on DOS.

So, they use assembly in various forms and extra utils, mostly
for 16-bit support, such as DJASM and TASM to build. E.g.,
they compile 16-bit DJASM code to binary, dump it as hex, include
the hex in a C array, and call the array as code. E.g., they
use TASM to compile the 16-bit C code for the DPMI host. E.g.,
they have special utilities in C to add or remove executable
stubs to COFF objects or to convert TASM .map symbol file to
a C include file.

.s and .S files - 32-bit GAS assembly
.asm files - 32-bit GAS assembly
.asm files - 16-bit TASM probably ... or MASM
.asm files - 16-bit DJASM
.c files and a .h file - inlined GAS assembly in C
.h file - raw, C style hex bytes which is included
in the middle of a C array in another file of .c
.c files - C character arrays of hex bytes with
preprocessor defines for bytes to fill in
and an assembly listing in a comments /* */ column
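The "dump as hex, include the hex in a C array" step is simple enough to sketch. The hypothetical `emit_c_array` helper below is not one of DJGPP's utilities; it only shows the idea, assuming the binary has already been assembled (e.g. by NASM with `-f bin`) and read into memory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print `data` as a C array initializer named `name`, 12 bytes per row. */
void emit_c_array(FILE *f, const char *name, const unsigned char *data, size_t n)
{
    size_t i;

    assert(f != NULL && n > 0);
    /* the array name must consist of C identifier characters only */
    assert(strspn(name, "abcdefghijklmnopqrstuvwxyz"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_") == strlen(name));
    fprintf(f, "static const unsigned char %s[%lu] = {", name, (unsigned long)n);
    for (i = 0; i < n; i++) {
        fputs(i == 0 ? "\n    " : (i % 12 == 0 ? ",\n    " : ", "), f);
        fprintf(f, "0x%02X", (unsigned)data[i]);
    }
    fputs("\n};\n", f);
}
```

The generated text can then be saved as a header and `#include`d into the C source that needs the 16-bit code.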

The point is the old standard version of NASM, i.e., 0.98.39,
which is never going to change, works on DOS and can be used to
compile 16-bit assembly, dumped as hex, converted to an array,
and included in C code for use where needed. It's probably
convenient to also keep one "dead" 16-bit C compiler around
too, perhaps TASM. Of course, these are DOS utilities. James
might be able to use them with DOSBox, dosemu, or Qemu.
Although, James might want something for Linux such as AS86
which is part of bin86. I know James is aware of ELKS and BCC.
Another option is to code in hex like Wolfgang. There shouldn't
be too much need of 16-bit code. The bootloader can take care
of that for you, e.g., multi-boot via Grub or Grub4DOS start
your code in 32-bit PM and so can SYSLINUX COM32 format. Or,
you can use 16-bit DJASM. It's not that big and might compile
cleanly for Linux. It has a makefile and code for GCC and Yacc.
GAS (GNU AS assembler) also supports (or did support) 16-bit
code segments with .code16 and .code16gcc directives. Of
course, there are a number of bootloader projects, e.g.,
Syslinux, LILO, Grub, etc that all must use 16-bit utilities
to build, and probably many others which deal with 16-bit
code or BIOS code like LinuxBIOS, FreeVGA, MAME, MESS, Wine,
Cygwin, MingW, etc.

https://linux.web.cern.ch/linux/scientific4/docs/rhel-as-en-4/i386-16bit.html
http://cs.nyu.edu/courses/spring03/G22.2130-001/assembly_howto.txt


James was looking at network booting:

http://cloudboot.org/
http://ipxe.org/


Rod Pemberton
Alexei A. Frounze
2015-02-14 22:58:14 UTC
Post by Rod Pemberton
On Sat, 14 Feb 2015 07:10:29 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by James Harris
I saw in another reply that you are not interested in targetting less
than a 386 (or 186). That was a slight disappointment. I still keep my
boot code as pure 8086. I only use 80386 code if necessary and I carry
out a check before using anything above 8086 instructions.
In my opinion, there's little point in writing code compatible with
8086/8088/80186/80286. I never even owned an 8086/8088 machine and I
last had an 80286 in like 1996. Unless you have a very specific task
of writing code for some old system or some odd embedded system built
on 80286-, it's a complete waste of time. You have participated in
the discussion of how cumbersome 16-bit x86 is for specific parts of
compilers (e.g. register allocation), right?
The DJGPP (GCC for DOS) compiler doesn't produce 16-bit code,
but you need 16-bit code for a DOS compiler to work on DOS.
So, they use assembly in various forms and extra utils, mostly
for 16-bit support, such as DJASM and TASM to build. E.g.,
the compile 16-bit DJASM code to binary, dump as hex, include
the hex in a C array, and call the array as code. E.g., they
use TASM to compile the 16-bit C code for the DPMI host. E.g.,
they have special utilities in C to add or remove executable
stubs to COFF objects or to convert TASM .map symbol file to
a C include file.
Right, however DJASM requires external tools to be compiled and
it's a very minimal assembler (more minimal than I need) whose
only purpose is to compile 16-bit startup code that would take
care of .EXE parameter parsing, DPMI setup and such.

Alex
James Harris
2015-03-29 20:35:39 UTC
"Alexei A. Frounze" <***@gmail.com> wrote in message news:4d40b077-a3fe-42ca-872b-***@googlegroups.com...

...
Post by Alexei A. Frounze
Good enough?
Yes, I think I could work with that.

...
Post by Alexei A. Frounze
In my opinion, there's little point in writing code compatible with
8086/8088/80186/80286. I never even owned an 8086/8088 machine ...
There are still 8086 emulators.
Post by Alexei A. Frounze
You have participated in
the discussion of how cumbersome 16-bit x86 is for specific parts of
compilers (e.g. register allocation), right?
It is hard to generate code for if you want to produce good code but you
don't optimise anyway, do you?

...
Post by Alexei A. Frounze
You really want this support, you'll most likely have to implement
it yourself. I can explain you how things work and answer any
questions, but I don't want to write that code.
Understandable.

James
Alexei A. Frounze
2015-03-30 00:50:35 UTC
Post by James Harris
...
Post by Alexei A. Frounze
Good enough?
Yes, I think I could work with that.
Cool.
Post by James Harris
...
Post by Alexei A. Frounze
In my opinion, there's little point in writing code compatible with
8086/8088/80186/80286. I never even owned an 8086/8088 machine ...
There are still 8086 emulators.
There are better emulators, too. :)
Post by James Harris
Post by Alexei A. Frounze
You have participated in
the discussion of how cumbersome 16-bit x86 is for specific parts of
compilers (e.g. register allocation), right?
It is hard to generate code for if you want to produce good code but you
don't optimise anyway, do you?
Typically, if the architecture isn't too awkward, the difference between
optimized (think -O2) and unoptimized (think -O0) code is about 2x. When it's
awkward, it can be more than that (unoptimized being inadequate), and
when you get to a 5x or greater difference you start to notice it, whereas 2x would be
just fine. Segmented 16-bit x86 world is awkward.
Post by James Harris
...
Post by Alexei A. Frounze
You really want this support, you'll most likely have to implement
it yourself. I can explain you how things work and answer any
questions, but I don't want to write that code.
Understandable.
I've spent some time trying to add the medium memory model, but it
looks like it needs more undivided attention than I currently have.
Ditto for passing and returning structures by value. However, it
looks like I might actually be able to support the latter if I don't
try to get rid of temporary objects and double copying of those
(the return statement copies the structure to the caller-supplied
location and the caller always allocates a temp object to receive it
even if the value returned by the function is assigned to another
object and can actually be copied there directly). This isn't very
good, but much easier to implement than what you'd expect from a
compiler.
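The simple return scheme described above, with a caller-supplied result location and an always-allocated temporary, might be lowered like this. It is a hypothetical sketch; `struct point`, `make_point_lowered` and `assign_from_call` are made-up names that only illustrate where the two copies happen.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x, y; };

/* Source: struct point make_point(int x, int y) { struct point p = { x, y }; return p; }
   Lowered: the caller passes the address to return into. */
static void make_point_lowered(struct point *ret, int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    *ret = p;             /* copy 1: return statement -> caller-supplied location */
}

/* Source: *dst = make_point(x, y);
   The simple scheme always receives the result in a temporary first. */
void assign_from_call(struct point *dst, int x, int y)
{
    struct point tmp;

    assert(dst != NULL);
    make_point_lowered(&tmp, x, y);
    *dst = tmp;           /* copy 2: temporary -> destination */
}
```

The skipped optimization would pass `dst` straight to the callee and drop copy 2, which is only safe when `dst` can't alias anything the callee reads.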

Alex

Rod Pemberton
2015-02-13 00:59:17 UTC
On Thu, 12 Feb 2015 03:38:43 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by James Harris
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler
for Smaller C to make it fully self-sufficient, easily portable
(someone asked for a C compiler for xv6 on Stack Overflow) and
a tad faster as it turns out that NASM can be horribly slow
(I think I've mentioned that before).
What is xv6?
http://en.wikipedia.org/wiki/Xv6 ?
Well, I assumed that was a typo for x86 ... Silly me!

ANSI C is good. MIT license (?) is good.

So, where's DOS port for DJGPP? ... ;-)

Wikipedia says xv6 is for an operating systems course ...
Shouldn't they be in a compiler course first?
They could be patching up PCC for their C compiler
in that course.

:-)

Have you read up on the project any? How did they decide
to handle multiprocessor support for x86 in ANSI C?
setjmp/longjmp? inlined assembly?
Post by Alexei A. Frounze
Post by James Harris
Keep Nasm!
I now absolutely must erase it from my system and then from all websites! :)
Seriously, it should be of no concern to you.
I'm not sure about James, but _using_ NASM is of no concern to me.
*NOT* using NASM is of concern to me, since I prefer to use NASM.

:-)

E.g., some of Japheth's code I'm interested in working on is in
JWASM. JWASM is Japheth's modified version of OpenWatcom's WASM.
WASM is a MASM clone. Some of his code didn't build with his
own JWASM ... ! I'm also seeing newer NASM code that won't compile
with 0.98.39. The NASM team dropped support for DOS for a while,
then someone started building NASM for DOS again. So, that
concerns me too, i.e., NASM code may not compile for DOS.

Of course, I have yet to look at your project, sorry, time,
life, Usenet, lots of good TV, etc.
Post by Alexei A. Frounze
If I implement a replacement for Smaller C, it will still be
syntax-compatible and produce the same kind of ELF object files, but it
will be smaller, faster and more portable
Is that worth the work? It sounds like another big project.
There are a bunch of NASM syntax clones around, e.g., FASM, YASM,
etc. If speed is the only issue, FASM is supposedly very fast.

http://flatassembler.net/
http://yasm.tortall.net/


Rod Pemberton
Alexei A. Frounze
2015-02-14 12:33:41 UTC
Post by Rod Pemberton
On Thu, 12 Feb 2015 03:38:43 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by James Harris
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler
for Smaller C to make it fully self-sufficient, easily portable
(someone asked for a C compiler for xv6 on Stack Overflow) and
a tad faster as it turns out that NASM can be horribly slow
(I think I've mentioned that before).
What is xv6?
http://en.wikipedia.org/wiki/Xv6 ?
Well, I assumed that was a typo for x86 ... Silly me!
ANSI C is good. MIT license (?) is good.
So, where's DOS port for DJGPP? ... ;-)
No clue.
Post by Rod Pemberton
Wikipedia says xv6 is for an operating systems course ...
Shouldn't they be in a compiler course first?
They could be patching up PCC for their C compiler
in that course.
:-)
Perhaps. Perhaps they wanted a free ride.
Post by Rod Pemberton
Have you read up on the project any? How did they decide
to handle multiprocessor support for x86 in ANSI C?
setjmp/longjmp? inlined assembly?
No clue.

I only answered a few specific questions based on what I know
about x86 and what I could find in their kernel source code.
That's all I know about the system.
Post by Rod Pemberton
Post by Alexei A. Frounze
Post by James Harris
Keep Nasm!
I now absolutely must erase it from my system and then from all websites! :)
Seriously, it should be of no concern to you.
I'm not sure about James, but _using_ NASM is of no concern to me.
*NOT* using NASM is of concern to me, since I prefer to use NASM.
:-)
E.g., some of Japheth's code I'm interested in working on is in
JWASM. JWASM is Japheth's modified version of OpenWatcom's WASM.
WASM is a MASM clone. Some of his code didn't build with his
own JWASM ... ! I'm also seeing newer NASM code that won't compile
with 0.98.39. The NASM team dropped support for DOS for a while,
then someone started building NASM for DOS again. So, that
concerns me too, i.e., NASM code may not compile for DOS.
Of course, I have yet to look at your project, sorry, time,
life, Usenet, lots of good TV, etc.
TV is bad for ya. :)
Post by Rod Pemberton
Post by Alexei A. Frounze
If I implement a NASM replacement for Smaller C, it will still be
syntax-compatible and produce the same kind of ELF object files, but it
will be smaller, faster and more portable.
Is that worth the work? It sounds like another big project.
A subset is easy to implement.
Post by Rod Pemberton
There are a bunch of NASM syntax clones around, e.g., FASM, YASM,
etc. If speed is the only issue, FASM is supposedly very fast.
http://flatassembler.net/
http://yasm.tortall.net/
Of all those, which meet all of the following requirements?
- straight ANSI C is sufficient to compile, no POSIX/Linux/Windows APIs used
- one doesn't need a bunch of other tools (sh, bison, yacc and other zoo inhabitants) to compile it
- it can be easily compiled for and run in DOS and DOSBox (=small, efficient)
- supports 16-bit and 32-bit code and sections in both
- emits ELF (with 16-bit relocation extensions for 16-bit code)

I bet there's none. Probably none meets more than 3 of the 5 criteria.
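For reference on the last criterion: as far as I know, the 16-bit relocation extensions are the GNU additions to the i386 ELF psABI. They are R_386_16 (type 20, computed as S + A) and R_386_PC16 (type 21, computed as S + A - P), and both are defined in glibc's elf.h. Below is a minimal C sketch of applying one such relocation from a REL section, where the addend A is stored in the 16-bit field being patched. The helper name apply_rel16 and its overflow policy are assumptions for illustration. They are not taken from NASM, fasm or any linker.

```c
#include <stdint.h>

/* GNU i386 ELF extensions for 16-bit code (values as in glibc's elf.h). */
#define R_386_16   20  /* word16: S + A     */
#define R_386_PC16 21  /* word16: S + A - P */

/* Hypothetical helper: apply one REL-style 16-bit relocation.
   field points at the 2-byte little-endian patch site, which also holds
   the implicit addend A; S is the symbol value, P the address of field.
   Returns 0 on success, -1 on an unsupported type or 16-bit overflow. */
static int apply_rel16(uint8_t *field, int type, int32_t S, int32_t P)
{
    /* read the stored addend and sign-extend it */
    int32_t A = (int16_t)(uint16_t)(field[0] | (field[1] << 8));
    int32_t v;
    uint32_t u;

    switch (type) {
    case R_386_16:   v = S + A;     break;
    case R_386_PC16: v = S + A - P; break;
    default:         return -1;
    }
    /* accept anything that fits in 16 bits as either signed or unsigned */
    if (v < -32768 || v > 65535)
        return -1;
    u = (uint32_t)v;                 /* two's complement bit pattern */
    field[0] = (uint8_t)(u & 0xFF);
    field[1] = (uint8_t)((u >> 8) & 0xFF);
    return 0;
}
```

For instance, take a PC16 fixup located at 10h and targeting 100h, whose field already holds the addend -2 (FEh FFh). After the fixup, the field becomes EEh 00h.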

Alex
Melzzzzz
2015-02-14 13:03:15 UTC
On Sat, 14 Feb 2015 04:33:41 -0800 (PST)
Post by Alexei A. Frounze
Post by Rod Pemberton
On Thu, 12 Feb 2015 03:38:43 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by James Harris
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple
assembler for Smaller C to make it fully self-sufficient,
easily portable (someone asked for a C compiler for xv6 on
Stack Overflow) and a tad faster as it turns out that NASM can
be horribly slow (I think I've mentioned that before).
What is xv6?
http://en.wikipedia.org/wiki/Xv6 ?
Well, I assumed that was a typo for x86 ... Silly me!
ANSI C is good. MIT license (?) is good.
So, where's the DOS port for DJGPP? ... ;-)
No clue.
Post by Rod Pemberton
Wikipedia says xv6 is for an operating systems course ...
Shouldn't they be in a compiler course first?
They could be patching up PCC for their C compiler
in that course.
:-)
Perhaps. Perhaps they wanted a free ride.
Post by Rod Pemberton
Have you read up on the project any? How did they decide
to handle multiprocessor support for x86 in ANSI C?
setjmp/longjmp? inlined assembly?
No clue.
I only answered a few specific questions based on what I know
about x86 and what I could find in their kernel source code.
That's all I know about the system.
Post by Rod Pemberton
Post by Alexei A. Frounze
Post by James Harris
Keep Nasm!
I now absolutely must erase it from my system and then from all websites! :)
Seriously, it should be of no concern to you.
I'm not sure about James, but _using_ NASM is of no concern to me.
*NOT* using NASM is of concern to me, since I prefer to use NASM.
:-)
E.g., some of Japheth's code I'm interested in working on is in
JWASM. JWASM is Japheth's modified version of OpenWatcom's WASM.
WASM is a MASM clone. Some of his code didn't build with his
own JWASM ... ! I'm also seeing newer NASM code that won't compile
with 0.98.39. The NASM team dropped support for DOS for a while,
then someone started building NASM for DOS again. So, that
concerns me too, i.e., NASM code may not compile for DOS.
Of course, I have yet to look at your project, sorry, time,
life, Usenet, lots of good TV, etc.
TV is bad for ya. :)
Post by Rod Pemberton
Post by Alexei A. Frounze
If I implement a NASM replacement for Smaller C, it will still be
syntax-compatible and produce the same kind of ELF object files,
but it will be smaller, faster and more portable.
Is that worth the work? It sounds like another big project.
A subset is easy to implement.
Post by Rod Pemberton
There are a bunch of NASM syntax clones around, e.g., FASM, YASM,
etc. If speed is the only issue, FASM is supposedly very fast.
http://flatassembler.net/
http://yasm.tortall.net/
- straight ANSI C is sufficient to compile, no POSIX/Linux/Windows APIs used
- one doesn't need a bunch of other tools (sh, bison, yacc and other
zoo inhabitants) to compile it
- it can be easily compiled for and run in DOS and DOSBox (=small, efficient)
- supports 16-bit and 32-bit code and sections in both
- emits ELF (with 16-bit relocation extensions for 16-bit code)
I bet there's none. Probably none meets more than 3 of the 5 criteria.
fasm is written in fasm itself; you can assemble it on any of the three
platforms it supports (DOS, Windows and Linux) for any of the three
platforms without any external tools.
fasm began as a 16-bit assembler; now it supports 64-bit, too.
If you put use16 when the ELF format is used, I guess it generates
16-bit relocations.
I guess that it fits all your criteria.
Post by Alexei A. Frounze
Alex
Alexei A. Frounze
2015-02-14 22:19:07 UTC
On Saturday, February 14, 2015 at 5:03:17 AM UTC-8, Melzzzzz wrote:
...
Post by Melzzzzz
fasm is written in fasm itself; you can assemble it on any of the three
platforms it supports (DOS, Windows and Linux) for any of the three
platforms without any external tools.
Looks like it requires DPMI in DOS and it doesn't try to load any DPMI host / DOS extender by itself nor has any built-in. I need to run CWSDPMI.EXE manually before FASM.EXE. Not good.
Post by Melzzzzz
fasm began as a 16-bit assembler; now it supports 64-bit, too.
I don't care about 64-bit right now.
Post by Melzzzzz
If you put use16 when the ELF format is used, I guess it generates
16-bit relocations.
I guess that it fits all your criteria.
Close but not quite. While I can use it to compile 32-bit code into ELF object files and make 32-bit protected mode apps for Windows and Linux, I get a number of errors when I try to use ELF for 16-bit code, which clearly indicates that this use wasn't even anticipated.

Try uncommenting the commented lines or parts of the lines to get the same errors as I'm getting:

----8<----
format elf

use16

extrn _puts ;:near ; error: invalid argument.

section '.text' executable ;use16 ; error: extra characters on line.

public _main
_main:
; push msg ; error: invalid use of symbol.
; call _puts ; error: address sizes do not agree.
add sp, 2
; call _fxn
ret


section '.data' writable ;use16 ; error: extra characters on line.

msg db "Hello, World!",0


section '.text' executable ;use16 ; error: extra characters on line.

_fxn:
; push msg2 ; error: invalid use of symbol.
; mov ax, [pmsg2]
push ax
; call _puts ; error: address sizes do not agree.
add sp, 2
ret


section '.data' writable

;pmsg2 dw msg2 ; error: invalid use of symbol.
msg2 db "Bye!",0
----8<----

Alex
Melzzzzz
2015-02-15 01:29:27 UTC
On Sat, 14 Feb 2015 14:19:07 -0800 (PST)
Post by Alexei A. Frounze
...
Post by Melzzzzz
fasm is written in fasm itself; you can assemble it on any of
the three platforms it supports (DOS, Windows and Linux) for any of the
three platforms without any external tools.
Looks like it requires DPMI in DOS and it doesn't try to load any
DPMI host / DOS extender by itself nor has any built-in. I need to
run CWSDPMI.EXE manually before FASM.EXE. Not good.
Hm, I don't know. I use Linux.
Post by Alexei A. Frounze
Post by Melzzzzz
fasm began as a 16-bit assembler; now it supports 64-bit, too.
I don't care about 64-bit right now.
;)
Post by Alexei A. Frounze
Post by Melzzzzz
If you put use16 when the ELF format is used, I guess it generates
16-bit relocations.
I guess that it fits all your criteria.
Close but not quite. While I can use it to compile 32-bit code into
ELF object files and make 32-bit protected mode apps for Windows and
Linux, I get a number of errors when I try to use ELF for 16-bit
code, which clearly indicates that this use wasn't even
anticipated.
Try uncommenting the commented lines or parts of the lines to get the
Ok.
NimbUs
2015-02-15 11:14:57 UTC
Alexei A. Frounze wrote in news:15557501-a369-4c54-a799-***@googlegroups.com:

...FASM...
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to
load any DPMI host / DOS extender by itself nor has any built-
in. I need to run CWSDPMI.EXE manually before FASM.EXE. Not
good.

Not correct! FASM (the DOS version) will run in either of 2
configurations:

- raw (real mode) DOS, no EMM active, or
- under DPMI (not VCPI).

FASM's raw mode should be of particular interest to you, Alexei:
absent a DPMI host, FASM runs in a mostly undocumented CPU
mode, full 32-bit (un)real, i.e. PE=0 but with all segments being
32-bit, INCLUDING CS!

An almost unique feat of FASM. Though a similar mode had been
used earlier by Helix (the "cloaking" API), I believe the author
of FASM rediscovered it independently.
--
NimbUs
Alexei A. Frounze
2015-02-15 11:48:23 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:15557501-a369-4c54-a799-
...FASM...
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to
load any DPMI host / DOS extender by itself nor has any built-
in. I need to run CWSDPMI.EXE manually before FASM.EXE. Not
good.
Not correct! FASM (the DOS version) will run in either of 2
configurations:
- raw (real mode) DOS, no EMM active, or
- under DPMI (not VCPI).
FASM's raw mode should be of particular interest to you, Alexei:
absent a DPMI host, FASM runs in a mostly undocumented CPU
mode, full 32-bit (un)real, i.e. PE=0 but with all segments being
32-bit, INCLUDING CS!
An almost unique feat of FASM. Though a similar mode had been
used earlier by Helix (the "cloaking" API), I believe the author
of FASM rediscovered it independently.
Not quite correct either. :) It looks like there's a bug.
Can you spot it in this excerpt from SOURCE\DOS\MODES.INC (from fasm17122.zip)?:

----8<----
no_dpmi:
smsw ax
test al,1
jz no_real32
call init_error
db 'system is in protected mode without 32-bit DPMI services',24h
no_real32:
call init_error
db 'processor is not able to enter 32-bit real mode',24h
----8<----

Hint: I'm getting "error: processor is not able to enter 32-bit real mode" in DOSBox.

Alex
Alexei A. Frounze
2015-02-15 12:05:32 UTC
Post by Alexei A. Frounze
Post by NimbUs
Alexei A. Frounze wrote in news:15557501-a369-4c54-a799-
...FASM...
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to
load any DPMI host / DOS extender by itself nor has any built-
in. I need to run CWSDPMI.EXE manually before FASM.EXE. Not
good.
Not correct! FASM (the DOS version) will run in either of 2
configurations:
- raw (real mode) DOS, no EMM active, or
- under DPMI (not VCPI).
FASM's raw mode should be of particular interest to you, Alexei:
absent a DPMI host, FASM runs in a mostly undocumented CPU
mode, full 32-bit (un)real, i.e. PE=0 but with all segments being
32-bit, INCLUDING CS!
An almost unique feat of FASM. Though a similar mode had been
used earlier by Helix (the "cloaking" API), I believe the author
of FASM rediscovered it independently.
Not quite correct either. :) It looks like there's a bug.
----8<----
no_dpmi:
smsw ax
test al,1
jz no_real32
call init_error
db 'system is in protected mode without 32-bit DPMI services',24h
no_real32:
call init_error
db 'processor is not able to enter 32-bit real mode',24h
----8<----
Hint: I'm getting "error: processor is not able to enter 32-bit real mode" in DOSBox.
Alex
Actually, I should probably take that back. This is probably how the code ends up there in my case:

----8<----
jmp 1 shl 3:test_pm32
no_rm32:
sti
jmp dpmi
test_pm32:
use32
mov eax,cr0
and al,not 1
mov cr0,eax
mov ebx,0FFFFh
jmp modes:test_rm32
test_rm32:
inc ebx
jz short no_rm32
----8<----

It may be that DOSBox doesn't support 32-bit code in unreal mode ("inc ebx" is interpreted as "inc bx"). That sucks for both FASM and DOSBox.
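The test above, together with the no_dpmi branch quoted earlier in the thread, can be restated in C. This shows why DOSBox ends up reporting that the processor cannot enter 32-bit real mode. It is a simplified model, not emulator code. It tracks only opcode 43h, whose meaning depends on the code segment's default operand size. It also takes as an assumption the observation above that DOSBox falls back to a 16-bit default once CR0.PE is cleared. All function names are hypothetical.

```c
#include <stdint.h>

/* Opcode 43h is "inc ebx" when the default operand size is 32 bits and
   "inc bx" when it is 16 bits. Returns the new EBX and sets *zf. */
static uint32_t exec_opcode_43h(uint32_t ebx, int default_32bit, int *zf)
{
    if (default_32bit) {
        ebx += 1;                                      /* inc ebx */
        *zf = (ebx == 0);
    } else {
        uint16_t bx = (uint16_t)((ebx + 1) & 0xFFFFu); /* inc bx */
        ebx = (ebx & 0xFFFF0000u) | bx;                /* upper half untouched */
        *zf = (bx == 0);
    }
    return ebx;
}

/* test_rm32: mov ebx,0FFFFh / inc ebx / jz short no_rm32.
   Returns 1 when the jump is NOT taken, i.e. 32-bit (un)real mode works. */
static int rm32_test_passes(int cs_stays_32bit)
{
    int zf;
    (void)exec_opcode_43h(0xFFFFu, cs_stays_32bit, &zf);
    return !zf;
}

/* no_dpmi: smsw ax / test al,1 / jz no_real32. Bit 0 of the MSW is CR0.PE. */
static const char *no_dpmi_message(uint16_t msw)
{
    if (msw & 1)   /* already in protected (e.g. V86) mode */
        return "system is in protected mode without 32-bit DPMI services";
    return "processor is not able to enter 32-bit real mode";
}
```

Under that assumption, rm32_test_passes(0) returns 0: BX wraps to 0 and the jz is taken. DOSBox's real mode reports PE=0, so no_dpmi_message(0) then selects the "not able to enter 32-bit real mode" string, which matches the observed error.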

Alex
NimbUs
2015-02-15 12:25:54 UTC
Alexei A. Frounze wrote in news:37c25afc-6f31-4b16-9ce2-
Post by Alexei A. Frounze
It may be that DOSBox doesn't support 32-bit code in unreal mode
See my earlier reply.
Post by Alexei A. Frounze
("inc ebx" is interpreted as "inc bx"). That sucks for both
FASM and DOSBox.

DOSBox is advertised as an environment for playing old DOS
games. No more, no less. You ain't supposed to use it as a
development platform!

However, did you try FASM under DPMI in DOSBox? Try different
DPMI hosts (CWSDPMI may have its own problems; I'd try Japheth's
instead). If FASM still doesn't run, then it's clearly
DOSBox's emulation bug.

Cheers
--
Nim'
Alexei A. Frounze
2015-02-16 01:12:59 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:37c25afc-6f31-4b16-9ce2-
Post by Alexei A. Frounze
It may be that DOSBox doesn't support 32-bit code in unreal mode
See my earlier reply.
Post by Alexei A. Frounze
("inc ebx" is interpreted as "inc bx"). That sucks for both
FASM and DOSBox.
DOSBox is advertised as an environment for playing old DOS
games. No more, no less. You ain't supposed to use it as a
development platform!
Could you educate me on where the line is between gaming and
development? Preferably, not in the terms of what the user
does, why and what they feel about it (both activities can be
thought of as entertainment), but in software terms. Like, is
"inc (e)bx" supposed to work differently in a game vs in a
compiler? How would DOSBox differentiate between games and
development tools and why? Do you know that games routinely
use tricks, hacks and undocumented functionality to extract
the most performance from the platform? Why can they do that
while dev tools can't?

Perhaps, the true point of statements like yours is to
discourage any use of DOSBox in important projects, where
emulation bugs can lead to larger problems than in games.

However, I can see that Turbo C/C++/Pascal/Assembler/
Debugger work well in DOSBox. And so does NASM (unless you
feed it a huge input file). DJGPP kind of works (there's some
problem with long file names in included headers). My compiler
works. I think I've run Watcom Debugger in it as well.

So, DOSBox is not too bad of an environment to do simple
development and basic testing for real mode, DOS or DPMI.

I know ^C doesn't work in DOSBox as in DOS. I can live with
that. Most of the time ^C use implies that something went
wrong. You can't miss that. So, it's not a big deal.
Post by NimbUs
However, did you try FASM under DPMI in DOSBox?
Yes. It worked but I had to manually load CWSDPMI.EXE.

Alex
NimbUs
2015-02-16 11:47:48 UTC
Alexei A. Frounze wrote in news:e43e7cb7-4233-4c88-8a09-
On Sunday, February 15, 2015 at 4:25:55 AM UTC-8, NimbUs
wrote:
....
Post by NimbUs
DOSBox is advertised as an environment for playing old DOS
games. No more, no less. You ain't supposed to use it as a
development platform!
Could you educate me on where the line is between gaming and
development? Preferably, not in the terms of what the user
does, why and what they feel about it (both activities can be
thought of as entertainment), but in software terms.
I have no desire nor pretense to "educate" you, Alexeï.
Besides, I know of DOSBox only by hearsay, or rather from
reading, not from personal experience with the thing...
Like, is
"inc (e)bx" supposed to work differently in a game vs in a
compiler?
I guess not. Were you saying, incidentally, that "inc ebx"
does not work properly under DOSBox, or is that not what you've
been writing? I have no idea whether that is true; frankly, I
would double-check before believing such a strange assertion,
that is, if I were interested in running DOSBox, which I am
not.

There is a guy going by the nym "DOS386" who appears to have
first-hand experience struggling in vain against the DOSBox
devs. On forums such as BTTR.DE and the FASM forum, "DOS386"
keeps saying the DOSBox team has repeatedly proven they are
not interested in fixing bugs that they deem of no
significance to supported DOS games.
How would DOSBox differentiate between games and
development tools and why?
According to the man I named above, there are a lot of bugs in
the DOSBox emulation which the developers refuse to fix as
long as they do not affect the old 'games'. I think DPMI is
one area he has mentioned as completely broken.
Do you know that games routinely
use tricks, hacks and undocumented functionality to extract
the most performance from the platform? Why can they do that
while dev tools can't?
For details, please consult with "DOS386"... I am not
prejudiced, and if DOSBox works for you and the tools you
need, all the better! What is your OS, btw, Linux? Is DOSBox
fast enough compared to VirtualBox, say?
Perhaps, the true point of statements like yours is to
discourage any use of DOSBox in important projects, where
emulation bugs can lead to larger problems than in games.
Well, coming back to FASM, it is *32-bit* code unlike most
other x86 assemblers. IF 32-bit instructions (inc ebx?) can't
be trusted under DOSBox, you certainly cannot reliably use
FASM even under a proper DPMI host.
However, I can see that Turbo C/C++/Pascal/Assembler/
Debugger work well in DOSBox. And so does NASM (unless you
feed it a huge input file).
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
IA-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.

Cheers
--
NimbUs
Alexei A. Frounze
2015-02-16 12:20:14 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:e43e7cb7-4233-4c88-8a09-
On Sunday, February 15, 2015 at 4:25:55 AM UTC-8, NimbUs
....
Post by NimbUs
DOSBox is advertised as an environment for playing old DOS
games. No more, no less. You ain't supposed to use it as a
development platform!
Could you educate me on where the line is between gaming and
development? Preferably, not in the terms of what the user
does, why and what they feel about it (both activities can be
thought of as entertainment), but in software terms.
I have no desire nor pretense to "educate" you, Alexeï.
Besides, I know of DOSBox only by hearsay, or rather from
reading, not from personal experience with the thing...
Like, is
"inc (e)bx" supposed to work differently in a game vs in a
compiler?
I guess not. Were you saying, incidentally, that "inc ebx"
does not work properly under DOSBox, or is that not what you've
been writing?
Here I simply chose the same instruction as used in FASM to test
for unreal mode but I didn't mean this particular problem of
DOSBox with unreal mode. I should've probably used a different
one to avoid confusion.
Post by NimbUs
I have no idea whether that is true; frankly, I
would double-check before believing such a strange assertion,
that is, if I were interested in running DOSBox, which I am
not.
There is a guy going by the nym "DOS386" who appears to have
first-hand experience struggling in vain against the DOSBox
devs. On forums such as BTTR.DE and the FASM forum, "DOS386"
keeps saying the DOSBox team has repeatedly proven they are
not interested in fixing bugs that they deem of no
significance to supported DOS games.
Yep. I've got two notifications recently about the fixes to
the bugs that I reported a year and a half ago being finally
committed. And there was some odd problem with DOSBox's dll
that caused weird keyboard mapping on my computers. I don't
know what it was, but recompilation fixed it. Indeed, the
fixing and the testing stories are not great.
Post by NimbUs
How would DOSBox differentiate between games and
development tools and why?
According to the man I named above, there are a lot of bugs in
the DOSBox emulation which the developers refuse to fix as
long as they do not affect the old 'games'. I think DPMI is
one area he has mentioned as completely broken.
Could be, to some extent. So far I haven't had problems with
it as my use of it under DOSBox is limited to NASM.
Post by NimbUs
Do you know that games routinely
use tricks, hacks and undocumented functionality to extract
the most performance from the platform? Why can they do that
while dev tools can't?
For details, please consult with "DOS386"... I am not
prejudiced, and if DOSBox works for you and the tools you
need, all the better! What is your OS, btw, Linux? Is DOSBox
fast enough compared to VirtualBox, say?
I run DOSBox on Windows. It's not fast with default settings,
but good enough for many simple things. When I need it faster,
I use a keyboard shortcut to give it more CPU cycles to burn.
Post by NimbUs
Perhaps, the true point of statements like yours is to
discourage any use of DOSBox in important projects, where
emulation bugs can lead to larger problems than in games.
Well, coming back to FASM, it is *32-bit* code unlike most
other x86 assemblers. IF 32-bit instructions (inc ebx?) can't
be trusted under DOSBox, you certainly cannot reliably use
FASM even under a proper DPMI host.
It's not that instruction per se, the problem is that FASM
expects that the CPU will stay in 32-bit mode when CR0.PE is
cleared. Apparently, DOSBox switches the CPU to 16-bit mode
when CR0.PE goes to 0, which is why the test instruction
(inc (e)bx) is interpreted differently and which is why the
unreal mode test fails and FASM refuses to operate under
DOSBox unless DPMI is present.
This behavior that FASM relies on is not warranted by the
CPU documentation. The documentation clearly states the
steps to switch out of protected mode into real mode and
FASM isn't doing it as documented/recommended.
Post by NimbUs
However, I can see that Turbo C/C++/Pascal/Assembler/
Debugger work well in DOSBox. And so does NASM (unless you
feed it a huge input file).
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
IA-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.
There's no such thing as "fully compatible" when one relies on
what's not even documented/defined. Compatible with what?

Alex
NimbUs
2015-02-16 13:02:25 UTC
Alexei A. Frounze wrote in news:0734bdc5-a9c4-4b6b-8fca-***@googlegroups.com:



(snip...)
Post by Alexei A. Frounze
Post by NimbUs
Well, coming back to FASM, it is *32-bit* code unlike most
other x86 assemblers. IF 32-bit instructions (inc ebx?) can't
be trusted under DOSBox, you certainly cannot reliably use
FASM even under a proper DPMI host.
It's not that instruction per se, the problem is that FASM
expects that the CPU will stay in 32-bit mode when CR0.PE is
cleared. Apparently, DOSBox switches the CPU to 16-bit mode
when CR0.PE goes to 0, which is why the test instruction
(inc (e)bx) is interpreted differently and which is why the
unreal mode test fails and FASM refuses to operate under
DOSBox unless DPMI is present.
Indeed. Since 32-bit-real mode does not operate properly under
DOSBox, you must use DPMI. By the way DPMI is considered the
normal environment to execute FASM under DOS, 32-bit-(un)real
is a "bonus".

Under DPMI, FASM is a full 32-bit client, unlike TASM or MASM
or NASM which are 16-bit. If you should find that FASM fails under
DOSBox (using DPMI, of course), then it can be because of bugs
in DOSBox's emulation of 32-bit instructions. This is what I
meant in the quoted lines.
Post by Alexei A. Frounze
This behavior that FASM relies on is not warranted by the
CPU documentation. The documentation clearly states the
steps to switch out of protected mode into real mode and
FASM isn't doing it as documented/recommended.
All Intel 386-and-later CPUs operate in that way, as well as all
clones (with an old, anecdotal exception: the Cyrix 586).
Post by Alexei A. Frounze
Post by NimbUs
Post by Alexei A. Frounze
However, I can see that Turbo C/C++/Pascal/Assembler/
Debugger work well in DOSBox. And so does NASM (unless you
feed it a huge input file).
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
IA-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.
There's no such thing as "fully compatible" when one relies on
what's not even documented/defined. Compatible with what?
Not the question. I hope the confusion has been dissipated by
now.
--
NimbUs
Alexei A. Frounze
2015-02-20 08:56:23 UTC
On Monday, February 16, 2015 at 5:02:26 AM UTC-8, NimbUs wrote:
...
Post by NimbUs
Indeed. Since 32-bit-real mode does not operate properly under
DOSBox, you must use DPMI. By the way DPMI is considered the
normal environment to execute FASM under DOS, 32-bit-(un)real
is a "bonus".
The problem is this particular combination of the "bonus", which
is known to have problems, and the fact that DPMI is not
something present in DOS by default and that FASM does not attempt to
load CWSDPMI or its own DPMI host, thus falling "back" to the
"bonus", which may not work and indeed does not work in some
environments.

This can be remedied in most cases in several ways without the
user having to know or deal with FASM failures to operate:
- ship with and launch CWSDPMI or another DPMI host
- rewrite the code so it can compile for 16-bit real mode
(in most cases it would just involve additional address
and operand size prefixes; change how subroutines are called
and returned (e.g. have a separate segment for each)), which
should be doable with FASM's preprocessor
- simply run the code in proper 32-bit protected mode and
provide enough plumbing for interrupt servicing and for calls
into BIOS and DOS (virtual 8086 mode may help simplify
things)
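Here is a rough illustration of the "additional address and operand size prefixes" in the second option. In a use16 segment, a 32-bit register operand needs the 66h operand-size override; a 32-bit memory address would also need the 67h address-size override. In a use32 segment, the same 66h prefix selects the 16-bit form instead. The C sketch below covers only the one-byte INC r16/r32 encoding (40h plus the register number, not available in 64-bit mode). encode_inc_reg and the REG_* names are made up for the example.

```c
#include <stddef.h>

/* Register numbers as encoded in x86 opcodes/ModRM. */
enum { REG_AX, REG_CX, REG_DX, REG_BX, REG_SP, REG_BP, REG_SI, REG_DI };

/* Hypothetical encoder for the short form of INC r16/r32 (opcode 40h+reg).
   seg_bits: default operand size of the code segment (16 for use16,
   32 for use32); op_bits: operand size wanted (16 or 32).
   Writes at most 2 bytes to out and returns how many were written. */
static size_t encode_inc_reg(unsigned char *out, int seg_bits, int op_bits, int reg)
{
    size_t n = 0;
    if (op_bits != seg_bits)
        out[n++] = 0x66;                 /* operand-size override */
    out[n++] = (unsigned char)(0x40 + reg);
    return n;
}
```

So inc ebx becomes 66h 43h under use16 and plain 43h under use32. It is that bare 43h, executed with a 16-bit default operand size, that turns into inc bx.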
Post by NimbUs
Under DPMI, FASM is a full 32-bit client, unlike TASM or MASM
or NASM which are 16-bit. If you should find that FASM fails under
DOSBox (using DPMI, of course), then it can be because of bugs
in DOSBox's emulation of 32-bit instructions. This is what I
meant in the quoted lines.
I don't know what you mean by NASM being a 16-bit client. NASM
for DOS is compiled with DJGPP, which uses 32-bit DPMI.
Post by NimbUs
Post by Alexei A. Frounze
This behavior that FASM relies on is not warranted by the
CPU documentation. The documentation clearly states the
steps to switch out of protected mode into real mode and
FASM isn't doing it as documented/recommended.
All Intel 386-and-later CPUs operate in that way, as well as all
clones (with an old, anecdotal exception: the Cyrix 586).
Would you be bold enough to define the officially undefined
behavior of the shift and rotate instructions in the cases
where the shift count is equal to or greater than the
operand size? Just curious. :) I have reverse-engineered
3 different implementations from the observed results.
Two of them belonged to different Intel CPUs and one to
AMD CPUs.

I've also noticed different behavior with NULL selectors
on Intel vs AMD CPUs in 64-bit mode. While I don't
remember the details nor ever reading about different
implementations, this was something odd to discover.

IOW, I wonder how far you're willing to go defining
undefined behavior as what you think it is/should be/
must be? Until the first fiasco of your assumptions?
Until a bug report from one of your customers?

Alex
NimbUs
2015-02-20 10:41:31 UTC
Alexei A. Frounze wrote in news:bb66cc1c-5c9c-4b51-8a83-
Post by Alexei A. Frounze
On Monday, February 16, 2015 at 5:02:26 AM UTC-8, NimbUs
...
Post by NimbUs
Indeed. Since 32-bit-real mode does not operate properly under
DOSBox, you must use DPMI. By the way DPMI is considered the
normal environment to execute FASM under DOS, 32-bit-(un)real
is a "bonus".
The problem is this particular combination of the "bonus", which
is known to have problems, and the fact that DPMI is not
something present in DOS by default and that FASM does not attempt to
load CWSDPMI or its own DPMI host,
You mean FASM's real32 host, which has nothing to do with DPMI,
obviously.
Post by Alexei A. Frounze
thus falling "back" to the
"bonus", which may not work and indeed does not work in some
environments.
Fortunately there are alternatives. Use the alternative that
works in your environment of choice, instead of railing
against the ones that don't!
Post by Alexei A. Frounze
This can be remedied in most cases in several ways without the
user having to know or deal with FASM failures to operate:
- ship with and launch CWSDPMI or another DPMI host
- rewrite the code so it can compile for 16-bit real mode
(in most cases it would just involve additional address
and operand size prefixes; change how subroutines are called
and returned (e.g. have a separate segment for each)), which
should be doable with FASM's preprocessor
- simply run the code in proper 32-bit protected mode and
provide enough plumbing for interrupt servicing and for calls
into BIOS and DOS (virtual 8086 mode may help simplify
things)
Post by NimbUs
Under DPMI, FASM is a full 32-bit client, unlike TASM or MASM
or NASM which are 16-bit. If you should find that FASM fails under
DOSBox (using DPMI, of course), then it can be because of bugs
in DOSBox's emulation of 32-bit instructions. This is what I
meant in the quoted lines.
I don't know what you mean by NASM being a 16-bit client. NASM
for DOS is compiled with DJGPP, which uses 32-bit DPMI.
My bad then. At least TASM and MASM are 16-bit beasts at
heart...
Post by Alexei A. Frounze
Post by NimbUs
Post by Alexei A. Frounze
This behavior that FASM relies on is not warranted by the
CPU documentation. The documentation clearly states the
steps to switch out of protected mode into real mode and
FASM isn't doing it as documented/recommended.
All Intel 386-and-later CPUs operate in that way, as well as all
clones (with an old, anecdotal exception: the Cyrix 586).
Would you be bold enough to define the officially undefined
behavior of the shift and rotate instructions in the cases
where the shift count is equal to or greater than the
operand size? Just curious. :) I have reverse-engineered
3 different implementations from the observed results.
Two of them belonged to different Intel CPUs and one to
AMD CPUs.
IIRC shifts /are/ documented by Intel such that the count is
reduced modulo 32 (32-bit operands) or 16 (16-bit), IOW higher
count bits (if any) are discarded. This was not the case in
8086/8088, possibly 186. Some old non-Intel clones might
differ.
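For comparison with the recollection above: as I read the Intel SDM entry for SAL/SAR/SHL/SHR, on the 286 and later the count is masked to 5 bits regardless of whether the operand is 8, 16 or 32 bits wide (6 bits only for 64-bit operands in 64-bit mode). So a 16-bit shift by 17 is not reduced modulo 16, and the SDM notes that the 8086 did not mask the count at all. What the SDM leaves undefined in these cases is flag state: CF for SHL/SHR counts at or above the operand width, and OF for multi-bit shifts. The result itself is defined. Below is a small C model of the resulting values for 16- and 32-bit SHL/SHR. The function names are hypothetical, and flags are not modelled.

```c
#include <stdint.h>

/* Model of the RESULT of x86 SHL/SHR (286 and later): the count is masked
   to 5 bits for 8-, 16- and 32-bit operands alike. Flags are not modelled. */
static uint32_t shl32(uint32_t v, unsigned count)
{
    count &= 31;                 /* 5-bit mask, so 33 acts as 1 */
    return v << count;           /* count <= 31: well-defined in C */
}

static uint16_t shl16(uint16_t v, unsigned count)
{
    count &= 31;                 /* NOT reduced modulo 16 */
    /* widen first: shifting the promoted int by up to 31 could overflow */
    return (uint16_t)(((uint32_t)v << count) & 0xFFFFu);
}

static uint16_t shr16(uint16_t v, unsigned count)
{
    count &= 31;
    return (uint16_t)((uint32_t)v >> count);
}
```

For example, shl16(1, 17) returns 0, whereas a modulo-16 reading would give 2. shl32(1, 33) returns 2 because only the low 5 bits of the count are used.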
Post by Alexei A. Frounze
I've also noticed different behavior with NULL selectors
on Intel vs AMD CPUs in 64-bit mode. While I don't
remember the details nor ever reading about different
implementations, this was something odd to discover.
It is /not/ secret knowledge that Intel's reimplementation of
AMD64 is far from accurate. Nor does Intel deign to acknowledge
that "ia64" is not their invention :=)
Post by Alexei A. Frounze
IOW, I wonder how far you're willing to go defining
undefined behavior as what you think it is/should be/
must be? Until the first fiasco of your assumptions?
Until a bug report from one of your customers?
I wouldn't want to have to define "undefined"! It is at best
a moving target...
--
Nim'
Alexei A. Frounze
2015-02-20 11:07:28 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:bb66cc1c-5c9c-4b51-8a83-
Post by Alexei A. Frounze
On Monday, February 16, 2015 at 5:02:26 AM UTC-8, NimbUs
...
Post by NimbUs
Indeed. Since 32-bit real mode does not operate properly under
DOSBox, you must use DPMI. By the way, DPMI is considered the
normal environment to execute FASM under DOS; 32-bit (un)real
is a "bonus".
The problem is this particular combination of the "bonus",
which is known to have problems, and the fact that DPMI is not
something present in DOS by default and that FASM does
attempt to load CWSDPMI or its own DPMI host,
You mean FASM's real32 host, which has nothing to do with DPMI
- obviously.
Post by Alexei A. Frounze
thus falling "back" to the
"bonus", which may not work and indeed does not work in some
environments.
Fortunately there are alternatives. Use the alternative that
works in your environment of choice, instead of railing
against the ones that don't!
You missed the point that I'd already voiced: UX.

...
Post by NimbUs
Post by Alexei A. Frounze
Would you be bold enough to define the officially undefined
behavior of the shift and rotate instructions in the cases
where the shift count is equal to or greater than the
operand size? Just curious. :) I have reverse-engineered
3 different implementations from the observed results.
Two of them belonged to different Intel CPUs and one to
AMD CPUs.
IIRC shifts /are/ documented by Intel such that the count is
reduced modulo 32 (32-bit operands) or 16 (16-bit); IOW, higher
count bits (if any) are discarded. This was not the case on the
8086/8088, possibly the 186. Some old non-Intel clones might
differ.
Check your documentation.

----8<----
IA-32 Architecture Compatibility

The 8086 does not mask the shift count. However, all other
IA-32 processors (starting with the Intel 286 processor) do
mask the shift count to 5 bits, resulting in a maximum count
of 31. This masking is done in all operating modes (including
the virtual-8086 mode) to reduce the maximum execution time
of the instructions.

Operation

IF 64-Bit Mode and using REX.W
THEN
countMASK <-3FH;
ELSE
countMASK <-1FH;
FI
tempCOUNT <-(COUNT AND countMASK);
...

Flags Affected

The CF flag contains the value of the last bit shifted out
of the destination operand; it is undefined for SHL and
SHR instructions where the count is greater than or equal
to the size (in bits) of the destination operand. The OF
flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined. The SF, ZF, and PF
flags are set according to the result. If the count is 0,
the flags are not affected. For a nonzero count, the AF
flag is undefined.
----8<----
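The "Operation" pseudocode above can be modeled in a few lines of C. This is a minimal sketch with hypothetical function names. It models only the result of a 16-bit SHL, not the flags, and it takes the 8086 behavior from the compatibility note rather than from testing real hardware. It also shows that the masking is visible in results, not just in timing: a count of 33 becomes 1 after masking on the 286 and later, while an 8086 that does not mask shifts all 33 times and leaves zero.

```c
#include <stdint.h>

/* Count mask per the quoted pseudocode: 0x3F only for 64-bit operands
   (REX.W in 64-bit mode), otherwise 0x1F. */
static unsigned count_mask(int rex_w_64bit)
{
    return rex_w_64bit ? 0x3Fu : 0x1Fu;
}

/* 16-bit SHL as described for the 286 and later: the count is masked
   to 5 bits first (tempCOUNT = COUNT AND countMASK). */
static uint16_t shl16_masked(uint16_t value, unsigned count)
{
    unsigned temp_count = count & count_mask(0);
    /* A 32-bit unsigned shift by 0..31 is well-defined in C; the cast
       back to 16 bits drops everything shifted out of the operand. */
    return (uint16_t)((uint32_t)value << temp_count);
}

/* 16-bit SHL as the note says the 8086 behaves: no masking, so any
   count of 16 or more shifts every bit out. */
static uint16_t shl16_unmasked(uint16_t value, unsigned count)
{
    return count >= 16 ? 0 : (uint16_t)((uint32_t)value << count);
}
```

For example, `shl16_masked(0x1234, 33)` yields `0x2468` (effectively a shift by 1), while `shl16_unmasked(0x1234, 33)` yields `0`.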

...
Post by NimbUs
Post by Alexei A. Frounze
IOW, I wonder how far you're willing to go defining
undefined behavior as what you think it is/should be/
must be? Until the first fiasco of your assumptions?
Until a bug report from one of your customers?
I wouldn't want to have to define "undefined"! It is at best
a moving target...
And that's the case in point! DOSBox does not try to define
in a particular way a lot of things that are undefined.
FASM, OTOH, does effectively define what's undefined by
requiring a particular implementation of the undefined.

Can you see it?

Alex
NimbUs
2015-02-20 11:46:15 UTC
Alexei A. Frounze wrote in news:47f57e38-f116-4cbf-b96d-
Post by Alexei A. Frounze
IA-32 Architecture Compatibility
The 8086 does not mask the shift count. However, all other
IA-32 processors (starting with the Intel 286 processor) do
mask the shift count to 5 bits, resulting in a maximum count
of 31. This masking is done in all operating modes (including
the virtual-8086 mode) to reduce the maximum execution time
of the instructions.
Does different masking affect the visible result of certain
shift operations, other than the time it would take to do the
full count of useless shifts/rotations ?
Post by Alexei A. Frounze
----8<----
Post by NimbUs
I wouldn't want to have to define "undefined"! It is at best
a moving target...
And that's the case in point! DOSBox does not try to define
in a particular way a lot of things that are undefined.
FASM, OTOH, does effectively define what's undefined by
requiring a particular implementation of the undefined.
Can you see it?
I can't see "it". FASM has multiple targets - OSes,
processors - on which it runs. The real-32 mode of operation
under DOS is just one of its host environments; the FASM
documentation clearly states it will run on most, not all,
IA-32 compatible processors ever produced. The emulation of a
DOS system running on an IA-32 processor which DOSBox attempts
to provide is not one on which the RM32 mode of FASM will run.
So what ? I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
--
Nim'
Alexei A. Frounze
2015-02-20 12:20:10 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:47f57e38-f116-4cbf-b96d-
Post by Alexei A. Frounze
IA-32 Architecture Compatibility
The 8086 does not mask the shift count. However, all other
IA-32 processors (starting with the Intel 286 processor) do
mask the shift count to 5 bits, resulting in a maximum count
of 31. This masking is done in all operating modes (including
the virtual-8086 mode) to reduce the maximum execution time
of the instructions.
Does different masking affect the visible result of certain
shift operations, other than the time it would take to do the
full count of useless shifts/rotations ?
I'm not sure what you mean here. I was talking about the
undefined result under certain conditions (explicitly stated
in the doc, you snipped that most relevant part for some reason).
Post by NimbUs
Post by Alexei A. Frounze
----8<----
Post by NimbUs
I wouldn't want to have to define "undefined"! It is at best
a moving target...
And that's the case in point! DOSBox does not try to define
in a particular way a lot of things that are undefined.
FASM, OTOH, does effectively define what's undefined by
requiring a particular implementation of the undefined.
Can you see it?
I can't see "it".
My hopes were wasted.
Post by NimbUs
FASM has multiple targets - OSes,
processors - on which it runs. The real-32 mode of operation
under DOS is just one of its host environments; the FASM
documentation clearly states it will run on most, not all,
IA-32 compatible processors ever produced. The emulation of a
DOS system running on an IA-32 processor which DOSBox attempts
to provide is not one on which the RM32 mode of FASM will run.
So what ?
UX. The user isn't told explicitly that FASM is doing something
very rare and may therefore fail to operate. It isn't saying how
to work around it.
Post by NimbUs
I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
We aren't discussing other limitations and problems.
Just these:
- dependence on undefined behavior beyond what's normal
(4G-1 limits in data segments in real mode are de facto
normal, almost universally known and supported, while
32-bit real mode isn't)
- not loading a DPMI host
- the user isn't told what they should know and is
potentially inconvenienced by FASM's failure to operate
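The "de facto normal" trick in the first item is what is usually called unreal mode. A data segment register is loaded in protected mode with a descriptor whose limit is 4 GiB - 1, and after switching back to real mode the CPU keeps using the cached limit. The sketch below only shows how such a descriptor is encoded. The mode switch itself needs assembly and is not shown, and the helper names are hypothetical. "32-bit real mode", as the term is presumably used here, additionally relies on a cached code segment with the D bit (0x4 in the flags nibble) set. That is the part the post calls rare.

```c
#include <stdint.h>

/* Hypothetical helper: pack an 8-byte IA-32 segment descriptor.
   Bit layout: 0-15 limit[15:0], 16-39 base[23:0], 40-47 access byte,
   48-51 limit[19:16], 52-55 flags (G, D/B, L, AVL), 56-63 base[31:24]. */
static uint64_t gdt_descriptor(uint32_t base, uint32_t limit20,
                               uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit20 & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit20 >> 16) & 0xFu) << 48;
    d |= (uint64_t)(flags & 0xFu) << 52;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}

/* Hypothetical helper: byte-granular limit derived from the 20-bit
   field. With G=1 (flags bit 0x8) the field counts 4 KiB units, so
   0xFFFFF becomes 0xFFFFFFFF, the "4G-1" limit. */
static uint32_t effective_limit(uint32_t limit20, uint8_t flags)
{
    uint32_t field = limit20 & 0xFFFFFu;
    return (flags & 0x8u) ? ((field << 12) | 0xFFFu) : field;
}
```

For a flat read/write ring-0 data segment (access byte 0x92) with G=1 and B=0, `gdt_descriptor(0, 0xFFFFF, 0x92, 0x8)` gives `0x008F92000000FFFF`, and `effective_limit(0xFFFFF, 0x8)` gives `0xFFFFFFFF`.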

Alex
NimbUs
2015-02-20 16:03:17 UTC
Alexei A. Frounze wrote in news:3da49b8f-82a9-4cd8-90bb-
Post by Alexei A. Frounze
Post by NimbUs
Does different masking affect the visible result of certain
shift operations, other than the time it would take to do the full
count of useless shifts/rotations?
I'm not sure what you mean here. I was talking about the
undefined result under certain conditions (explicitly stated
in the doc, you snipped that most relevant part for some
reason).

Any improper snipping was not intentional; I apologise
if it appeared to be.

I don't see /what/ undefined results you have in mind here,
however. Please remember we are talking about the effect of
instructions, documented or not, on IA-32 architecture. ISTM
there is little left undefined about the operation and intended
effect of shifts & rotations, is there?
Specifically, your mention of "undefined result under certain
conditions (explicitly stated in the doc)" deserves
qualification: which Intel "doc" says that, and exactly where?
Please enlighten me!
Post by Alexei A. Frounze
Post by NimbUs
Post by Alexei A. Frounze
Can you see it?
I can't see "it".
My hopes were wasted.
:=)
Post by Alexei A. Frounze
Post by NimbUs
FASM has multiple targets - OSes, processors - on which it
runs. The real-32 mode of operation under DOS is just one of
its host environments; the FASM documentation clearly states
it will run on most, not all, IA-32 compatible processors ever
produced. The emulation of a DOS system running on an IA-32
processor which DOSBox attempts to provide is not one on which
the RM32 mode of FASM will run. So what?
UX. The user isn't told explicitly that FASM is doing something
very rare and may therefore fail to operate. It isn't saying how
to work around it.
FASM for DOS is supposed to run on a DOS machine with a 32-bit
capable, Intel-386-compatible processor: it does, without
qualification, either under a DPMI-host or in real mode,
without an EMM. It was NOT designed to run in so-called,
incompatible, DOS-Box (although it seems it does, when using
proper settings).

RTFM! I'm sure it says DPMI may be needed. It is customary
for software to rely on a DPMI host running before it starts. In
particular, DPMI is what makes FASM run in a Windows "DOS
box" - which you would expect more developers to use as an
environment for assembly than DOSBox.
Post by Alexei A. Frounze
Post by NimbUs
I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
Post by Alexei A. Frounze
We aren't discussing other limitations and problems.
- dependence on undefined behavior beyond what's normal
(4G-1 limits in data segments in real mode are de facto
normal, almost universally known and supported, while
32-bit real mode isn't)
- not loading a DPMI host
- the user isn't told what they should know and is
potentially inconvenienced by FASM's failure to operate
ISTM the problem is on your side : RTFM !

Anyway, I have not been pushing for your use of FASM; someone
else mentioned it in this thread. Since IIRC you mentioned
having problems with FASM relating to DPMI, I was stupid
enough to mention that FASM /could/ use RM32 instead (under
proper circumstances). I think I wasn't even aware of your
using DOSBox when writing that post, otherwise I would have
immediately suggested a switch to a better environment, on
real metal or emulated.
--
Nim'
Rod Pemberton
2015-02-21 00:54:28 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:3da49b8f-82a9-4cd8-90bb-
ISTM the problem is on your side : RTFM !
Yes, Alexei can easily push the conversation or
argument until someone gets angry. Others here can
as well. So, I don't think he does so
intentionally, and you shouldn't hold it against him.
Post by NimbUs
Anyway I have not been pushing for your use of FASM,
someone else mentioned it in this thread.
Yeah, *I* mentioned it. I'm still here. There aren't
too many people who post and leave in this small group.

Alexei said he was going to develop a NASM syntax assembler
with just enough capability to support his C compiler. As
someone who has way too many incomplete personal projects,
I mentioned that this might be another big project ... It
seemed reasonable to me to mention that. He also mentioned
that NASM could be slow. So, I suggested other NASM syntax
assemblers: FASM, YASM.
Post by NimbUs
[...] I would have suggested immediately a switch
to a better environment, on real metal or emulated.
Unfortunately, I seem to be one of the last people here
to still use bare metal or real hardware for testing and
development. I suspect wolfgang does too. Steve might
also. Based on some of Alexei's online code, I'd say
he did that in the past as well.

Most seem to have moved on to emulators on Windows or
Linux to speed up development, or to enable them to
use more powerful tools available with their host OS, e.g.,
compilers, assemblers. Of course, emulators can find
other coding mistakes, as Alexei has demonstrated.

However, when your development becomes emulation-bound,
you see projects like "ReactOS", which are very advanced
yet fail to run on any real hardware. This is due to
their dependence on using emulators to develop. When
people complain that ReactOS won't work on their machine,
they're asked what emulator they used!


Rod Pemberton
Alexei A. Frounze
2015-02-23 08:00:53 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:3da49b8f-82a9-4cd8-90bb-
Post by Alexei A. Frounze
Post by NimbUs
Does different masking affect the visible result of certain
shift operations, other than the time it would take to do the full
count of useless shifts/rotations?
I'm not sure what you mean here. I was talking about the
undefined result under certain conditions (explicitly stated
in the doc, you snipped that most relevant part for some
reason).
Any improper snipping was not intentional; I apologise
if it appeared to be.
I don't see /what/ undefined results you have in mind here
however.
The snipped part with undefined results included:

----8<----
Flags Affected

The CF flag contains the value of the last bit shifted out
of the destination operand; it is undefined for SHL and
SHR instructions where the count is greater than or equal
to the size (in bits) of the destination operand. The OF
flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined. The SF, ZF, and PF
flags are set according to the result. If the count is 0,
the flags are not affected. For a nonzero count, the AF
flag is undefined.
----8<----

It looks like you ignored it.
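The CF rules in the excerpt can be written as a small C helper. This is a sketch with a hypothetical name and interface. It assumes "the count" in the flags text means the count after masking, and it covers only SHL and SHR, not OF or AF. Under that reading, because the count is masked to 5 bits first (6 bits with REX.W), the undefined case can only arise for 8- and 16-bit operands. Those are exactly the cases where implementations are free to differ.

```c
#include <stdint.h>

/* The three CF outcomes the quoted "Flags Affected" text distinguishes. */
enum cf_status { CF_UNCHANGED, CF_DEFINED, CF_UNDEFINED };

/* CF after SHL (is_left != 0) or SHR on an operand of `bits` bits
   (8, 16, 32 or 64). `value` is the original destination operand and
   `count` the already-masked count. *cf is written only for CF_DEFINED. */
static enum cf_status shift_cf(uint64_t value, unsigned bits,
                               unsigned count, int is_left, int *cf)
{
    if (count == 0)
        return CF_UNCHANGED;  /* "If the count is 0, the flags are not affected." */
    if (count >= bits)
        return CF_UNDEFINED;  /* count >= operand size: CF is undefined */
    /* Last bit shifted out: bit (bits - count) for SHL, bit (count - 1) for SHR. */
    *cf = (int)((value >> (is_left ? bits - count : count - 1)) & 1u);
    return CF_DEFINED;
}
```

For instance, a 16-bit SHL by 17 reports `CF_UNDEFINED`, while an 8-bit SHL of 0x80 by 1 reports `CF_DEFINED` with CF = 1.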
Post by NimbUs
Please remember we are talking about the effect of
instructions, documented or not, on IA-32 architecture. ISTM
there is little left undefined about the operation and intended
effect of shifts & rotations, is there?
Specifically, your mention of "undefined result under certain
conditions (explicitly stated in the doc)" deserves
qualification: which Intel "doc" says that, and exactly where?
Please enlighten me!
You asked for it. The excerpt, part of which you snipped
and (from what I can see) ignored, is from Vol. 2B of
Intel(R) 64 and IA-32 Architectures Software Developer's Manual
from May 2011. It's in the section SAL/SAR/SHL/SHR-Shift.
Post by NimbUs
Post by Alexei A. Frounze
Post by NimbUs
Post by Alexei A. Frounze
Can you see it?
I can't see "it".
My hopes were wasted.
:=)
Post by Alexei A. Frounze
Post by NimbUs
FASM has multiple targets - OSes, processors - on which it
runs. The real-32 mode of operation under DOS is just one of
its host environments; the FASM documentation clearly states
it will run on most, not all, IA-32 compatible processors ever
produced. The emulation of a DOS system running on an IA-32
processor which DOSBox attempts to provide is not one on which
the RM32 mode of FASM will run. So what?
UX. The user isn't told explicitly that FASM is doing something
very rare and may therefore fail to operate. It isn't saying how
to work around it.
FASM for DOS is supposed to run on a DOS machine with a 32-bit
capable, Intel-386-compatible processor: it does, without
qualification, either under a DPMI-host or in real mode,
without an EMM. It was NOT designed to run in so-called,
incompatible, DOS-Box (although it seems it does, when using
proper settings).
Incompatible with what? With something that nobody even
defined? :)
Post by NimbUs
RTFM !
And so do you! :)
Post by NimbUs
I'm sure it says DPMI may be needed. It is customary
for software to rely on a DPMI host running before it starts. In
particular, DPMI is what makes FASM run in a Windows "DOS
box" - which you would expect more developers to use as an
environment for assembly than DOSBox.
Perhaps. Perhaps not. NTVDM is not available in 64-bit Windows.
And older 32-bit Windows systems (XP and downwards) are
increasingly harder to keep alive and use.

These days people typically have 64-bit Windows 7 or 8.

So, about your only option with modern Windows is
to install what's known as "XP Mode" on Windows 7, which is
not available on Vista or 8. Further, that Windows XP virtual
machine has additional limitations, not found in regular
Windows XP. One of them is lack of graphics modes. Even if
you can run the DOS version of FASM in it, you will likely
not choose this as your normal DOS environment because
of those limitations, because of some inconvenience of use,
because you'll have to ditch it when you move to Windows
8 or 10 or whatever.

You could, OTOH, have a completely separate VM with DOS and
most of the PC hardware features in it. That would be
better. But as far as I can see, it's not very easy to copy files
to and from such VMs. Virtual network in DOS is probably
out of question. You can use virtual floppy and CD disks to
share data. Again, that's a number of extra steps nobody
really wants to do (and few know or can find out how).

DOSBox, OTOH, supports graphics modes, runs pretty much
everywhere and works off the host OS file system. It
doesn't require having a special VM environment, or
fully licensed DOS (there's FreeDOS, I hear), and it
won't suddenly stop working because the next version of
Windows removes or breaks something that DOS support
requires (as has gradually happened ever since
Windows 2000 and XP).

Yes, DOSBox has bugs (some are quite bad) and its
compatibility with DOS and PC hardware is partial.
But it's still good for many things, and in some cases
its usability beats that of other virtual environments.

So, if you develop or maintain software for DOS
you should probably take into account the world around
and consider making your software work well and not have
odd usability issues in DOSBox as well.

Just like you can't force me to use FASM, I can't force
FASM's author to make appropriate adjustments.
But that (the adjustments) would be a very good thing to do.
Post by NimbUs
Post by Alexei A. Frounze
Post by NimbUs
I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
Post by Alexei A. Frounze
We aren't discussing other limitations and problems.
- dependence on undefined behavior beyond what's normal
(4G-1 limits in data segments in real mode are de facto
normal, almost universally known and supported, while
32-bit real mode isn't)
- not loading a DPMI host
- the user isn't told what they should know and is
potentially inconvenienced by FASM's failure to operate
ISTM the problem is on your side : RTFM !
I did. But I think you didn't.

You asked me about the undefined results per the Intel
doc and I showed you where exactly to read about that.
Why did you have to ask? You never read the doc? Or your
selective reading of it didn't include these parts because
you thought they were unimportant? Either way, looks like
you need to RTFM.

I also did read the online version of the FASM manual
and looked at the PDF version of it in the zip files
with the binaries and sources.

The PDF version (1.7) is a bit behind the online version
(1.71) and doesn't even mention DPMI or DOS in its
section 1.1.1 System requirements. The online version
is a tad better and states:

"
DOS version requires an OS compatible with MS DOS 2.0
and either true real mode environment or DPMI.
"

However it fails to explain that "true real mode environment"
is something very important and may be a problem.

Neither of the versions of the manual says what to do
if either DPMI isn't present or there's no "true real mode
environment" or there's neither, as is the case when
FASM is used under DOSBox.

I even read the relevant parts of the FASM source code.
And now I know what "true real mode environment" means
and how it can be a problem. I also know that FASM doesn't
come with a DPMI host nor tries to load one that would work
(e.g. CWSDPMI). I know how to work around these issues.

However, if I cannot expect FASM to be a bit more compatible
with various environments and therefore more user friendly,
I would at least expect FASM's documentation to diligently
explain exactly what the system requirements are and to
provide a few hints/workarounds in clear text to help the
user.

So, who's left to RTFM?
Post by NimbUs
Anyway, I have not been pushing for your use of FASM; someone
else mentioned it in this thread. Since IIRC you mentioned
having problems with FASM relating to DPMI, I was stupid
enough to mention that FASM /could/ use RM32 instead (under
proper circumstances). I think I wasn't even aware of your
using DOSBox when writing that post, otherwise I would have
immediately suggested a switch to a better environment, on
real metal or emulated.
Performance aside, NASM works under DOSBox. And I was
expecting at least the same "QoS" from FASM. :)

Alex
Bertho G
2015-02-23 12:09:48 UTC
Alexei A. Frounze wrote in news:37382959-20b3-4138-a0a4-
***@googlegroups.com:

This time I'll try not to snip as much as I usually
do. I hate overlong Usenet posts, however; not everyone has
fast internet even these days...
Post by Alexei A. Frounze
Post by NimbUs
I don't see /what/ undefined results you have in mind here
however.
----8<----
Flags Affected
The CF flag contains the value of the last bit shifted out
of the destination operand; it is undefined for SHL and
SHR instructions where the count is greater than or equal
to the size (in bits) of the destination operand. The OF
flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined. The SF, ZF, and PF
flags are set according to the result. If the count is 0,
the flags are not affected. For a nonzero count, the AF
flag is undefined.
----8<----
It looks like you ignored it.
1. Well, I do not think in-depth discussion of shift
instructions, undefined or unused flags, etc, had much
relevance to the subject

2. There are so many docs, including "official" Intel
processor manuals, sometimes conflicting. In the past there
used to be (somewhat expensive) printed manuals from Intel
Press; I have, or rather used to have, some in my possession.
The advantage (or disadvantage?) of print relative to electronic
documentation, which has become the norm, is that with
electronic docs the publisher can and does make them disappear,
or worse, change bits over time without always acknowledging it
:=(

3. You quoted from recent IA-64 docs, while I had IA-32
manuals in mind (386, 486, Pentium+). While IA-64 in principle
incorporates IA-32, it cannot be a reference for IA-32 any
more than DOSBox could be a reference DOS!
Post by Alexei A. Frounze
Post by NimbUs
Please enlighten me !
You asked for it.
I did, and so thank you !
Post by Alexei A. Frounze
The excerpt, part of which you snipped
(and from what I can see) ignored, is from Vol. 2B of
Intel(R) 64 and IA-32 Architectures Software Developer's Manual
from May 2011. It's in the section SAL/SAR/SHL/SHR-Shift.
No relevance to FASM's RM32 or, per Tomasz's naming, "flat
real mode" (FRM) - terminology varies.
Post by Alexei A. Frounze
Post by NimbUs
FASM for DOS is supposed to run on a DOS machine with a 32-bit
capable, Intel-386-compatible processor: it does, without
qualification, either under a DPMI host or in real mode,
without an EMM. It was NOT designed to run in so-called,
incompatible, DOS-Box (although it seems it does, when using
proper settings).
Incompatible with what? With something that nobody even
defined? :)
Incompatible with industry-standard PC AT - ISA ...
Post by Alexei A. Frounze
Post by NimbUs
RTFM !
And so do you! :)
Post by NimbUs
I'm sure it says DPMI may be needed. It is customary
for software to rely on a DPMI host running before it starts. In
particular, DPMI is what makes FASM run in a Windows "DOS
box" - which you would expect more developers to use as an
environment for assembly than DOSBox.
Perhaps. Perhaps not. NTVDM is not available in 64-bit Windows.

Blame Microsoft, not someone else! Anyway, Windows, any
bitness, is not DOS. You are the one who insists on running
FASM for DOS in a non-DOS system !

Do you realise that there exists a FASM version for
MS-Windows, too? In fact there is not one, but TWO versions of
FASMW: one console and one GUI. Did you try them? Maybe
they suit your needs better than the DOS versions!
All versions of FASM will assemble identically from source for
any supported target, AFAIK.
Post by Alexei A. Frounze
And older 32-bit Windows systems (XP and downwards) are
increasingly harder to keep alive and use.
These days people typically have 64-bit Windows 7 or 8.
As a DOS developer, you are by definition targeting users
who are able to run DOS successfully, not the mainstream.
Post by Alexei A. Frounze
So, about your only option with modern Windows is
to install what's known as "XP Mode" on Windows 7, which is
not available on Vista or 8. Further, that Windows XP virtual
machine has additional limitations, not found in regular
Windows XP. One of them is lack of graphics modes. Even if
you can run the DOS version of FASM in it, you will likely
not choose this as your normal DOS environment because
of those limitations, because of some inconvenience of use,
because you'll have to ditch it when you move to Windows
8 or 10 or whatever.
You could, OTOH, have a completely separate VM with DOS and
most of the PC hardware features in it. That would be
better. But as far as I can see, it's not very easy to copy files
to and from such VMs. Virtual network in DOS is probably
out of question.
Virtual networking in DOS is not difficult. I can't speak for
DOSBox of course.
Post by Alexei A. Frounze
You can use virtual floppy and CD disks to
share data. Again, that's a number of extra steps nobody
really wants to do (and few know or can find out how).
DOSBox, OTOH, supports graphics modes, runs pretty much
everywhere and works off the host OS file system. It
doesn't require having a special VM environment, or
fully licensed DOS (there's FreeDOS, I hear), and it
won't suddenly stop working because the next version of
Windows removes or breaks something that DOS support
requires (as has gradually happened ever since
Windows 2000 and XP).
Yes, DOSBox has bugs (some are quite bad) and its
compatibility with DOS and PC hardware is partial.
But it's still good for many things, and in some cases
its usability beats that of other virtual environments.
So, if you develop or maintain software for DOS
you should probably take into account the world around
and consider making your software work well and not have
odd usability issues in DOSBox as well.
Just like you can't force me to use FASM, I can't force
FASM's author to make appropriate adjustments.
But that (the adjustments) would be a very good thing to do.
Post by NimbUs
Post by Alexei A. Frounze
Post by NimbUs
I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
Post by Alexei A. Frounze
We aren't discussing other limitations and problems.
- dependence on undefined behavior beyond what's normal
(4G-1 limits in data segments in real mode are de facto
normal, almost universally known and supported, while
32-bit real mode isn't)
- not loading a DPMI host
- the user isn't told what they should know and is
potentially inconvenienced by FASM's failure to operate
ISTM the problem is on your side : RTFM !
I did. But I think you didn't.
You asked me about the undefined results per the Intel
doc and I showed you where exactly to read about that.
Why did you have to ask? You never read the doc? Or your
selective reading of it didn't include these parts because
you thought they were unimportant? Either way, looks like
you need to RTFM.
I also did read the online version of the FASM manual
and looked at the PDF version of it in the zip files
with the binaries and sources.
The PDF version (1.7) is a bit behind the online version
(1.71) and doesn't even mention DPMI or DOS in its
section 1.1.1 System requirements. The online version
"
DOS version requires an OS compatible with MS DOS 2.0
and either true real mode environment or DPMI.
"
However it fails to explain that "true real mode environment"
is something very important and may be a problem.
My opinion is that FASM's electronic documentation, including
the website and forum, is of outstanding quality. I do not take
your claims as being in good faith, either.
Post by Alexei A. Frounze
Neither of the versions of the manual says what to do
if either DPMI isn't present or there's no "true real mode
environment" or there's neither, as is the case when
FASM is used under DOSBox.
Bullsh*t !
Post by Alexei A. Frounze
I even read the relevant parts of the FASM source code.
And now I know what "true real mode environment" means
and how it can be a problem. I also know that FASM doesn't
come with a DPMI host nor tries to load one that would work
(e.g. CWSDPMI). I know how to work around these issues.
I hope you do.
Post by Alexei A. Frounze
However, if I cannot expect FASM to be a bit more compatible
with various environments and therefore more user friendly,
I would at least expect FASM's documentation to diligently
explain exactly what the system requirements are and to
provide a few hints/workarounds in clear text to help the
user.
The "user" of an assembler is supposed to be a developer who
understands, like you do, the meaning of "real mode DOS
environment" and how to launch a DPMI host, if need be.
Post by Alexei A. Frounze
So, who's left to RTFM?
Post by NimbUs
Anyway, I have not been pushing for your use of FASM; someone
else mentioned it in this thread. Since IIRC you mentioned
having problems with FASM relating to DPMI, I was stupid
enough to mention that FASM /could/ use RM32 instead (under
proper circumstances). I think I wasn't even aware of your
using DOSBox when writing that post, otherwise I would have
immediately suggested a switch to a better environment, on
real metal or emulated.
Performance aside, NASM works under DOSBox. And I was
expecting at least the same "QoS" from FASM. :)
Clearly you didn't try very hard to love FASM :=)
Since NASM and DOSBox make you happy, you're better off
keeping with them.

Feel free to consider the subthread finished, or reply if you
need to, either way I am done with it.

Best regards
--
Nim'
Alexei A. Frounze
2015-02-23 13:13:07 UTC
Permalink
Post by Bertho G
Alexei A. Frounze wrote in news:37382959-20b3-4138-a0a4-
I'll be trying, this time, not to snip as much as I usually
do. I hate overlong usenet posts however, not everyone has
fast internet even these days...
Post by Alexei A. Frounze
Post by NimbUs
I don't see /what/ undefined results you have in mind here
however.
----8<----
Flags Affected
The CF flag contains the value of the last bit shifted out
of the destination operand; it is undefined for SHL and
SHR instructions where the count is greater than or equal
to the size (in bits) of the destination operand. The OF
flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined. The SF, ZF, and PF
flags are set according to the result. If the count is 0,
the flags are not affected. For a nonzero count, the AF
flag is undefined.
----8<----
It looks like you ignored it.
1. Well, I do not think in-depth discussion of shift
instructions, undefined or unused flags, etc, had much
relevance to the subject
It was an example of something you can't really define
or rely on. Because it's officially undefined (in the
CPU documentation) and because the behavior is known
to differ on different CPUs.

The situation with 32-bit real mode is very similar.
You may claim that all CPUs do support it (except,
perhaps, some obscure clones), but it's undefined and
there's no guarantee it will be supported in the future
or properly emulated. And we know it's not properly
emulated.

That's how it's relevant.
Post by Bertho G
2. There are so many docs, including "official" Intel
processor manuals, sometimes conflicting. In the past there
used to be (somewhat expensive) printed manuals from Intel
presses, I have or rather used to have some in my possession.
The advantage (or disadvantage ?) relative to electronic
documentation which has become the norm is that, with
electronic docs, the publisher can and does make them disappear, or,
worse, change bits over time without always acknowledging it
:=(
I know their documentation sucks. Their i80386 (and perhaps
i80486) manuals were the best IMO.
Post by Bertho G
3. you quoted from recent IA-64 docs, while I had IA-32
manuals in mind (386, 486, Pentium+). While IA-64 in principle
incorporates IA-32, it cannot be a reference for IA-32 any
more than DOSBox could be a reference-DOS !
I don't see your point here. These manuals cover 16-bit and
32-bit modes as well as 64-bit ones.
Post by Bertho G
Post by Alexei A. Frounze
Post by NimbUs
Please enlighten me !
You asked for it.
I did, and so thank you !
Post by Alexei A. Frounze
The excerpt, part of which you snipped
(and from what I can see) ignored, is from Vol. 2B of
Intel(R) 64 and IA-32 Architectures Software Developer's
Manual from May 2011. It's in the section SAL/SAR/SHL/SHR-Shift.
No relevance to FASM's RM32 or, per Tomasz's naming, "flat
real mode" (FRM) - terminology varies.
I spoke about relevance above.
Post by Bertho G
Post by Alexei A. Frounze
Post by NimbUs
FASM for DOS is supposed to run on a DOS machine with a 32-bit
capable, Intel-386-compatible processor: it does, without
qualification, either under a DPMI host or in real mode,
without an EMM. It was NOT designed to run in the so-called,
incompatible, DOS-Box (although it seems it does, when using
proper settings).
Incompatible with what? With something that nobody even
defined? :)
Incompatible with industry-standard PC AT - ISA ...
Would you mind providing a reference number for said standard?
Full title, publisher, publication date, ISBN, etc, please?
Post by Bertho G
Post by Alexei A. Frounze
Post by NimbUs
RTFM !
And so do you! :)
Post by NimbUs
I'm sure it says DPMI may be needed. It is customary
for software to rely on a DPMI host running before it starts.
In particular, DPMI is what makes FASM run in a Windows "DOS
box" - which you would expect more developers to use as an
environment for assembly than DOSBox.
Perhaps. Perhaps not. NTVDM is not available in 64-bit
Windows.
Blame Microsoft, not someone else!
So, you're suggesting that Microsoft should've provided an extra
module to emulate the CPU because of inaccessibility of the
real or virtual 8086 modes while the OS was running in the new
64-bit mode? That much effort to support virtually dead old DOS
business? Mind you, NTVDM is still available in 32-bit versions
of Windows Vista and Windows 7. They didn't completely remove it
there. But they had neither a cheap way nor the incentive to keep
it working in 64-bit Windows.

You can blame Microsoft all you want. But AMD and Intel were
the ones who first contributed to ending DOS support in Windows.
Post by Bertho G
Anyway, Windows, any
bitness, is not DOS. You are the one who insists on running
FASM for DOS in a non-DOS system !
FreeDOS isn't DOS either, right? Because it didn't come
from Microsoft or IBM or whoever else was in the business back
in the days. And I may not have the original disks anymore.

Doesn't matter. DOSBox is close enough to DOS and for the
most part it must be sufficient for a relatively simple tool
like an assembler, which reads a file, crunches some numbers
and spews out another file.
Post by Bertho G
Do you realise that there exists a FASM version for MS-
Windows, too ?
Thanks for noting! If you had paid enough attention, you'd know
that my compiler, in the context of which we started talking
about using one assembler or another or implementing a
dedicated small one, exists as DOS, Windows and Linux binaries
and therefore depends on an assembler that can run on all 3 systems.
If I were to use FASM instead of NASM with my compiler,
I'd have to either work around FASM issues with RM32 (or
whatever one calls it) and DPMI or fix them or devote
an extra subsection of my documentation to cover what's
not covered in the FASM proper documentation. And
for some odd reason you call my mention of said
deficiency of the documentation bullshit.
Post by Bertho G
In fact there are not one, but TWO versions of
FASMW : one Console, and one GUI. Did you try them ? Maybe
they suit your needs better than the DOS versions !
All versions of FASM will assemble identically from source for
any supported target, AFAIK.
Post by Alexei A. Frounze
And older 32-bit Windows systems (XP and downwards) are
increasingly harder to keep alive and use.
These days people typically have 64-bit Windows 7 or 8.
As a DOS developer, you are by definition targeting users
who are able to run DOS successfully, not mainstream.
Not necessarily. There are people who learn and aren't pros
in DOS or assembly yet. There are certain advantages to
learning assembly and hardware programming in DOS. The hardware
is directly accessible and there's quite a bit of documentation
and code from the old days that can still be used for the purpose.
I see it as a viable stepping stone to things like protected
mode and so on.
And DOSBox eases the part of installation and configuration for
very basic usage patterns such as running a game or a
text/file-oriented tool like an assembler.
Post by Bertho G
Post by Alexei A. Frounze
So, with that, just about your only option with modern Windows is
to install what's known as "XP Mode" on Windows 7, which is
not available on Vista or 8. Further, that Windows XP virtual
machine has additional limitations, not found in regular
Windows XP. One of them is lack of graphics modes. Even if
you can run the DOS version of FASM in it, you will likely
not choose this as your normal DOS environment because
of those limitations, because of some inconvenience of use,
because you'll have to ditch it when you move to Windows
8 or 10 or whatever.
You could, OTOH, have a completely separate VM with DOS and
most of the PC hardware features in it. That would be
better. But as I can see, it's not very easy to copy files
to and from such VMs. Virtual networking in DOS is probably
out of the question.
Virtual networking in DOS is not difficult. I can't speak for
DOSBox of course.
I'm not sure about it. Last time I used network in DOS
environment it was either something hardware-specific or it was
DOS box in Windows and some Windows-specific APIs exposed to
DOS were in use. I mean, you may be right. Or not. But I can
see that one would need a DOS driver for the network device(s)
supported by the VM, a TCP stack for DOS and some tools for
DOS to actually exchange data over some other protocols over
TCP. As far as I know, MSDOS never included a TCP stack and
it was always 3rd party tools for network stuff. Correct me
if I'm wrong. Or provide details on what tools exist now
for making DOS (running in a VM) reachable over network.
Post by Bertho G
Post by Alexei A. Frounze
You can use virtual floppy and CD disks to
share data. Again, that's a number of extra steps nobody
really wants to do (and few know or can find out how).
DOSBox, OTOH, supports graphics modes, runs pretty much
everywhere and works off the host OS file system. It
doesn't require having a special VM environment, or
fully licensed DOS (there's FreeDOS, I hear) and it
won't suddenly stop working because the next version of
Windows removes or breaks something that DOS support
requires (as gradually happened all the way since
Windows 2000 and XP).
Yes, DOSBox has bugs (some are quite bad) and its
compatibility with DOS and PC hardware is partial.
But it's still good for many things and in some cases
its usability beats that of other virtual environments.
So, if you develop or maintain software for DOS
you should probably take into account the world around
and consider making your software work well and not have
odd usability issues in DOSBox as well.
Just like you can't force me to use FASM, I can't force
the FASM's author to make appropriate adjustments.
But that (the adjustments) would be a very good thing to do.
Post by NimbUs
Post by Alexei A. Frounze
Post by NimbUs
I'm told by people who are familiar with DOSbox that
it has other annoying limitations... How is that FASM's
problem ?
Post by Alexei A. Frounze
We aren't discussing other limitations and problems.
- dependence on undefined behavior beyond what's normal
(4G-1 limits in data segments in real mode is de-facto
normal, almost universally known and supported, while
32-bit real mode isn't)
- not loading DPMI host
- the user isn't told what they should know and
potentially inconvenienced by FASM failure to operate
ISTM the problem is on your side : RTFM !
I did. But I think you didn't.
You asked me about the undefined results per the Intel
doc and I showed you where exactly to read about that.
Why did you have to ask? You never read the doc? Or your
selective reading of it didn't include these parts because
you thought they were unimportant? Either way, looks like
you need to RTFM.
I also did read the online version of the FASM manual
and looked at the PDF version of it in the zip files
with the binaries and sources.
The PDF version (1.7) is a bit behind the online version
(1.71) and doesn't even mention DPMI or DOS in its
section 1.1.1 System requirements. The online version says:
"
DOS version requires an OS compatible with MS DOS 2.0
and either true real mode environment or DPMI.
"
However, it fails to explain that "true real mode environment"
is something very important and may be a problem.
My opinion is that FASM's electronic documentation, including
website and forum is of outstanding quality. I do not take
your claims as being in good faith, either.
Sure, it's your choice and opinion.
Post by Bertho G
Post by Alexei A. Frounze
Neither of the versions of the manual says what to do
if either DPMI isn't present or there's no "true real mode
environment" or there's neither, as is the case when
FASM is used under DOSBox.
Bullsh*t !
Could you please elaborate on what you mean by that?
Can you point me to a location in the manual where
"true real mode environment" is explained? Or how to
get DPMI if there's none and specifically a list
of a few compatible DPMI hosts with instructions on
how to run them? If you can't, then that's bullsh*t.
Post by Bertho G
Post by Alexei A. Frounze
I did even read the relevant parts of the FASM source code.
And now I know what "true real mode environment" means
and how it can be a problem. I also know that FASM doesn't
come with a DPMI host nor tries to load one that would work
(e.g. CWSDPMI). I know how to work around these issues.
I hope you do.
Post by Alexei A. Frounze
However, if I cannot expect FASM to be a bit more compatible
with various environments and therefore more user friendly,
I would at least expect the FASM documentation to diligently
explain exactly what the system requirements are and
provide a few hints/workarounds in clear text to help the
user.
The "user" of an assembler is supposed to be a developer who
understands, like you do, the meaning of "real mode DOS
environment" and how to launch a DPMI host, if need be.
Addressed above.
Post by Bertho G
Post by Alexei A. Frounze
So, who's left to RTFM?
Post by NimbUs
Anyway I have not been pushing for your use of FASM, someone
else mentioned it in this thread. Since IIRC you mentioned
having problems with FASM relating to DPMI, I was - stupid
enough - to mention that FASM /could/ use RM32 instead (under
proper circumstances). I think I wasn't even aware of your
using DOSBox when writing that post, otherwise I would have
suggested immediately a switch to a better environment, on
real metal or emulated.
Performance aside, NASM works under DOSBox. And I was
expecting at least the same "QoS" from FASM. :)
Clearly you didn't try very hard to love FASM :=)
It would be tough love. :)
Post by Bertho G
Since NASM and DOSBox make you happy, you're better off
keeping with them.
You bet! :)
Post by Bertho G
Feel free to consider the subthread finished, or reply if you
need to, either way I am done with it.
I'll be sad to see no continuation. Just kidding. But if you
don't want to continue the thread, it's fine with me. Don't reply.

Alex
P.S. thanks for participating in the thread!
Rod Pemberton
2015-02-21 00:54:09 UTC
Permalink
On Fri, 20 Feb 2015 07:20:10 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by NimbUs
Alexei A. Frounze wrote in news:47f57e38-f116-4cbf-b96d-
FASM has multiple targets - OSes,
processors, on which it runs. The real-32 mode of operation
under DOS is just one of its host environments, the FASM
documentation clearly states it will run on most, not all,
ia32 compatible processors ever produced. The emulation of a
DOS system running on a ia-32 processor which DOS-box attempts
to provide is not one on which the RM32 mode of FASM will run.
So what ?
UX. The user isn't told explicitly that FASM is doing
something very rare and may therefore fail to operate.
Earlier, you said FASM will fail RM32 startup, if RM32
isn't found. If FASM exits _cleanly_ under those
circumstances, that's sufficient for me. I'd be worried
though that they might've not reset some descriptors, or
something hidden in the processor state remains changed,
etc. Ctrl-alt-delete.

I expect most code that I run to be compiled by DJGPP
or OpenWatcom. So, my machine is set up for XMS and
self-loading DPMI apps. It's not set up for EMS, VCPI,
DPMS. It's also not set up for automatically loading
DPMI since multiple DPMI hosts would conflict. I.e.,
the DPMI startup for FASM should work for me on RM
MS-DOS, Windows 98/SE console, and Linux under DOSBox,
Qemu, and dosemu. Although, I have had issues with Qemu
and DPMI. If I had my way, I'd prefer for DPMI to
be loaded automatically, e.g., always present like
in Windows 98/SE console. Unfortunately, only a few
DPMI hosts work with both DJGPP and OpenWatcom.
Supposedly, Daniel Borca's D3X is one of those.
Post by Alexei A. Frounze
It isn't saying how to workaround it.
AIUI, the only place RM32 should attempt to start for
FASM is under RM DOS, where it's easy to start or set up
a DPMI host for it. So, basically, you're complaining
about FASM not self-loading a DPMI host, which seems
trivial to me. And, you know how to, and I also told
you, how to fix it. D3X and many other DPMI hosts can
stub the executable.
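The startup choice being argued about here (use a DPMI host if one is present, attempt RM32 only in true real mode) can be sketched as below. This is a hypothetical policy for illustration, not FASM's actual startup code, and the function names are made up. The register values are assumed to come from INT 2Fh AX=1687h, the DPMI detection call (result layout as given in Ralf Brown's Interrupt List), and from SMSW, whose PE bit reads as 1 when DOS runs under a V86 monitor such as an EMM386-style memory manager.

```python
# Hypothetical sketch: choosing a DOS startup path as described in the
# thread (DPMI if present, RM32 only in true real mode). Register values
# are passed in; on real DOS they would come from INT 2Fh AX=1687h and SMSW.

def parse_dpmi_1687(ax, bx, cl, dx, si):
    """Interpret the registers returned by INT 2Fh AX=1687h."""
    if ax != 0:
        return None                          # AX != 0: no DPMI host installed
    return {
        "supports_32bit": bool(bx & 1),      # BX bit 0: 32-bit clients supported
        "cpu": cl,                           # CL: 2 = 286, 3 = 386, 4 = 486, ...
        "version": (dx >> 8, dx & 0xFF),     # DH = major, DL = minor (0.90 -> 0, 90)
        "private_paragraphs": si,            # SI: paragraphs of private data needed
    }

def choose_startup(dpmi, msw):
    """Hypothetical policy, not FASM's code."""
    if dpmi is not None and dpmi["supports_32bit"]:
        return "dpmi"                        # 32-bit code needs a 32-bit-capable host
    if (msw & 1) == 0:
        return "rm32"                        # PE clear: true real mode, no V86 monitor
    return "error"                           # V86 and no DPMI: tell the user what to load


info = parse_dpmi_1687(ax=0, bx=1, cl=3, dx=0x005A, si=0x10)
print(info["version"], choose_startup(info, msw=0x0001))   # prints: (0, 90) dpmi
```

A PE check alone cannot recognise an environment that, per this thread, presents real mode but does not run RM32 correctly (DOSBox); catching that case would need some actual RM32 self-test, which this sketch does not attempt.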
Post by Alexei A. Frounze
- not loading DPMI host
So, you'd rather have FASM be setup to:

1) self-load a DPMI host
2) drop all support for RM32

That would make it just like any other DOS DPMI app.
I think we get that. It's not that way. So, if you
intend to use it, you must select some other option.
I.e., stub it, patch it, recompile it, etc.


Rod Pemberton
Alexei A. Frounze
2015-02-23 08:11:54 UTC
Permalink
Post by Rod Pemberton
On Fri, 20 Feb 2015 07:20:10 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by NimbUs
Alexei A. Frounze wrote in news:47f57e38-f116-4cbf-b96d-
FASM has multiple targets - OSes,
processors, on which it runs. The real-32 mode of operation
under DOS is just one of its host environments, the FASM
documentation clearly states it will run on most, not all,
ia32 compatible processors ever produced. The emulation of a
DOS system running on a ia-32 processor which DOS-box attempts
to provide is not one on which the RM32 mode of FASM will run.
So what ?
UX. The user isn't told explicitly that FASM is doing
something very rare and may therefore fail to operate.
Earlier, you said FASM will fail RM32 startup, if RM32
isn't found. If FASM exits _cleanly_ under those
circumstances, that's sufficient for me. I'd be worried
though that they might've not reset some descriptors, or
something hidden in the processor state remains changed,
etc. Ctrl-alt-delete.
That isn't sufficient for me. :)
Post by Rod Pemberton
I expect most code that I run to be compiled by DJGPP
or OpenWatcom. So, my machine is set up for XMS and
self-loading DPMI apps. It's not set up for EMS, VCPI,
DPMS. It's also not set up for automatically loading
DPMI since multiple DPMI hosts would conflict. I.e.,
the DPMI startup for FASM should work for me on RM
MS-DOS, Windows 98/SE console, and Linux under DOSBox,
Qemu, and dosemu. Although, I have had issues with Qemu
and DPMI. If I had my way, I'd prefer for DPMI to
be loaded automatically, e.g., always present like
in Windows 98/SE console. Unfortunately, only a few
DPMI hosts work with both DJGPP and OpenWatcom.
Supposedly, Daniel Borca's D3X is one of those.
Post by Alexei A. Frounze
It isn't saying how to workaround it.
AIUI, the only place RM32 should attempt to start for
FASM is under RM DOS, where it's easy to start or set up
a DPMI host for it. So, basically, you're complaining
about FASM not self-loading a DPMI host, which seems
trivial to me. And, you know how to, and I also told
you, how to fix it. D3X and many other DPMI hosts can
stub the executable.
I'm complaining, as you put it, about a particular
combination of things:
- DPMI
- RM32
- poor documentation on the part of system requirements
Post by Rod Pemberton
Post by Alexei A. Frounze
- not loading DPMI host
1) self-load a DPMI host
2) drop all support for RM32
That would make it just like any other DOS DPMI app.
I think we get that. It's not that way. So, if you
intend to use it, you must select some other option.
I.e., stub it, patch it, recompile it, etc.
Which I don't want to do because at the moment I don't
intend to use FASM.

Alex
Rod Pemberton
2015-02-21 00:51:13 UTC
Permalink
On Fri, 20 Feb 2015 06:07:28 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Post by NimbUs
Alexei A. Frounze wrote in news:bb66cc1c-5c9c-4b51-8a83-
Post by Alexei A. Frounze
Would you be bold enough to define the officially undefined
behavior of the shift and rotate instructions in the cases
where the shift count is equal to or greater than the
operand size? Just curious. :) I have reverse-engineered
3 different implementations from the observed results.
Two of them belonged to different Intel CPUs and one to
AMD CPUs.
IIRC shifts /are/ documented by Intel such that the count be
reduced modulo 32 (32 bit operands) or 16 (16-bit), IOW higher
count bits (if any) are discarded. This was not the case in
8086/8088, possibly 186. Some old non-Intel clones might
differ.
Check your documentation.
----8<----
IA-32 Architecture Compatibility
The 8086 does not mask the shift count. However, all other
IA-32 processors (starting with the Intel 286 processor) do
mask the shift count to 5 bits, resulting in a maximum count
of 31. This masking is done in all operating modes (including
the virtual-8086 mode) to reduce the maximum execution time
of the instructions.
Operation
IF 64-Bit Mode and using REX.W
THEN
countMASK <-3FH;
ELSE
countMASK <-1FH;
FI
tempCOUNT <-(COUNT AND countMASK);
...
Flags Affected
The CF flag contains the value of the last bit shifted out
of the destination operand; it is undefined for SHL and
SHR instructions where the count is greater than or equal
to the size (in bits) of the destination operand. The OF
flag is affected only for 1-bit shifts (see "Description"
above); otherwise, it is undefined. The SF, ZF, and PF
flags are set according to the result. If the count is 0,
the flags are not affected. For a nonzero count, the AF
flag is undefined.
----8<----
FYI, RBIL's 86BUGS.LST says 8086, 8088, NEC V20 and V30
don't mask the shift count by AND-ing with 0x1F on "SHL, SAL,
SHR, SAR, ROL, RCL, ROR, RCR" instructions and "all
xxxD variants".
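The count masking in the quoted Operation pseudocode can be modelled in a few lines. This sketch only computes the shifted value for SHL and deliberately ignores the flags, since the excerpt leaves CF undefined for counts at or above the operand size and OF undefined for counts other than 1. The mask_count=False path approximates the unmasked 8086/8088 behaviour mentioned above; the function name is made up.

```python
# Sketch of SHL's count handling per the quoted SDM pseudocode.
# Only the result value is modelled; flags are left out on purpose.

def shl(value, count, width, mask_count=True):
    """Shift `value` left by `count` within a `width`-bit operand."""
    if mask_count:
        # countMASK is 3FH only for 64-bit operands (REX.W), otherwise 1FH,
        # so 8- and 16-bit operands are masked to 5 bits too, not modulo size.
        count &= 0x3F if width == 64 else 0x1F
    # mask_count=False approximates the 8086/8088: the full count is used.
    return (value << count) & ((1 << width) - 1)


for cnt in (16, 17, 32):
    print(cnt, hex(shl(0x1234, cnt, 16)), hex(shl(0x1234, cnt, 16, mask_count=False)))
```

With a 16-bit operand, counts of 16 and 17 clear the value on both models, while a count of 32 leaves it unchanged on a masking CPU (32 AND 1FH = 0) but clears it on an unmasked 8086.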
Post by Alexei A. Frounze
And that's the case in point! DOSBox does not try to define
in a particular way a lot of things that are undefined.
DOSBox is really bad at implementing things correctly.
I'm completely amazed that so much works properly with
all the issues I've seen with stuff being wrong, at least,
for Linux. Maybe, that's a testament to programmers who
avoid tying their code to the machine.


Rod Pemberton
Melzzzzz
2015-02-16 12:37:47 UTC
Permalink
On 16 Feb 2015 11:47:48 GMT
Post by NimbUs
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
ia-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.
Hm, are you sure that fasm requires 32 bit CPU? It started
as 16 bit assembler?
It can produce mz DOS format with 16 bit executables?
Post by NimbUs
Cheers
Alexei A. Frounze
2015-02-16 12:43:00 UTC
Permalink
Post by Melzzzzz
On 16 Feb 2015 11:47:48 GMT
Post by NimbUs
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
ia-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.
Hm, are you sure that fasm requires 32 bit CPU?
Yes. The online manual says it.
Post by Melzzzzz
It started as 16 bit assembler?
It can produce mz DOS format with 16 bit executables?
Yes. The online manual says it.

In case you're lost:
http://flatassembler.net/docs.php?article=manual

Alex
Melzzzzz
2015-02-16 12:59:45 UTC
Permalink
On Mon, 16 Feb 2015 04:43:00 -0800 (PST)
Post by Alexei A. Frounze
Post by Melzzzzz
On 16 Feb 2015 11:47:48 GMT
Post by NimbUs
Because those are 16-bit code albeit running "DOS-extended".
FASM is *32-bit* native code, hence needs a fully compatible
ia-32 CPU (or emulation thereof) which, it appears, DOSBox
does not quite provide.
Hm, are you sure that fasm requires 32 bit CPU?
Yes. The online manual says it.
fasm itself requires dpmi but produced executables do not,
I think.
Post by Alexei A. Frounze
Post by Melzzzzz
It started as 16 bit assembler?
It can produce mz DOS format with 16 bit executables?
Yes. The online manual says it.
http://flatassembler.net/docs.php?article=manual
Thanks.
Post by Alexei A. Frounze
Alex
Rod Pemberton
2015-02-17 07:21:17 UTC
Permalink
Post by NimbUs
For details, please consult with "DOS386"...
I don't recall seeing him here or anywhere on Usenet
in a while. IIRC, he used to post to Usenet. I'm
thinking it was comp.os.msdos.programmer, but I didn't
confirm that ... I could've seen his name on "DOS ain't
dead" forums or MSFN.org or Vogons, etc.
Post by NimbUs
Well, coming back to FASM, it is *32-bit* code unlike most
other X86 assemblers.
Yes, I do recall reading about Thomasz (?) doing this
some years ago. I assumed it was the undocumented
"unreal" mode which is 16-bit, but your posts seem to
indicate it's 32-bit ... I.e., perhaps, it's what
Sandpile.org refers to as RM32 here:

http://www.sandpile.org/x86/mode.htm


Rod Pemberton
NimbUs
2015-02-17 11:53:04 UTC
Permalink
Post by Rod Pemberton
Yes, I do recall reading about Thomasz (?) doing this
some years ago. I assumed it was the undocumented
"unreal" mode which is 16-bit, but your posts seem to
indicate it's 32-bit ... I.e., perhaps, it's what
http://www.sandpile.org/x86/mode.htm
Yep, I assume it is also what Sandpile is referring to.
IIRC you can find useful reference to RM32 in Ralf Brown's
celebrated interrupt list, too -> search keywords: "cloaking",
"Helix"...
--
NimbUs
wolfgang kern
2015-02-17 18:25:11 UTC
Permalink
Post by NimbUs
Post by Rod Pemberton
Yes, I do recall reading about Thomasz (?) doing this
some years ago. I assumed it was the undocumented
"unreal" mode which is 16-bit, but your posts seem to
indicate it's 32-bit ... I.e., perhaps, it's what
http://www.sandpile.org/x86/mode.htm
Yep, I assume it is also what Sandpile is referring to.
IIRC you can find useful reference to RM32 in Ralf Brown's
celebrated interrupt list, too -> search keywords: "cloaking",
"Helix"...
Beside what other references may tell or not...
I'm experienced with Unreal-mode, which sometimes is also
called BigReal which is another thing.

The one I still use sometimes is:

CS=limited to 64k real-mode (after a farJMP)
DS,ES,SS can be left unlimited (max 4GB anyway) after a previous
Flat PM-setup and so can access data within 4GB from RM.
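The big real/unreal mechanism wolfgang describes can be modelled as below. The assumption, matching the usual description of this mode, is that a real-mode segment load only rewrites the cached base (segment * 16) and leaves the cached limit from protected mode untouched; A20 is assumed to be enabled, and the class and constant names are made up for illustration.

```python
# Minimal model of real-mode data addressing with a hidden descriptor cache.
# Assumption: loading a segment register in real mode sets base = selector * 16
# but keeps the cached limit, which is what unreal/big real mode relies on.
# A20 is assumed enabled (no wraparound at 1 MB).

REAL_MODE_LIMIT = 0xFFFF       # limit after reset, normal real mode
FLAT_LIMIT = 0xFFFFFFFF        # limit left behind by a flat 4 GB PM descriptor

class LimitFault(Exception):
    """Stands in for the #GP (or #SS for the stack) a limit violation raises."""

class SegmentCache:
    def __init__(self, limit=REAL_MODE_LIMIT):
        self.base = 0
        self.limit = limit

    def load_real_mode(self, selector):
        self.base = (selector & 0xFFFF) << 4   # the limit is deliberately untouched

    def linear(self, offset):
        if offset > self.limit:
            raise LimitFault(hex(offset))
        return (self.base + offset) & 0xFFFFFFFF


ds = SegmentCache(limit=FLAT_LIMIT)   # as if just back from a flat PM setup
ds.load_real_mode(0x0000)
print(hex(ds.linear(0x00B8000)), hex(ds.linear(0x100000)))   # prints: 0xb8000 0x100000
```

CS is left out on purpose: as described above, it goes back to a 64K real-mode segment after the far jump, and only the data segments keep the large limit.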

Tomasz once showed that also an unlimited (former Flat) CS
will work under certain circumstances after return from PM32.

But as I interpret this idea, it just keeps CS in PM32 even though
bit 0 of CR0 was cleared.
And it works, but just until any INT/IRQ/far call/far jump occurs.
IIRC Tomasz also had a workaround for these situations and made
it work too with some software overhead.
But this is a very limited and restrictive mode in my opinion.
His target may or may not have been to promote or use this mode,
as I understood it was just an example of what's possible at all.

Dunno which are the most used or correct terms for modes like
this, so I keep onto "UNreal vs. BigReal" as mentioned above.

And btw:
I've seen an old FASM variant which produces only 16-bit code and
doesn't need any DPMI nor a 32-bit environment.

If I ever would use any ASM-compiler I'd prefer FASM over NASM,
just because FASM doesn't need cmd-line-opt/link/make/etc-shit.

And if I ever write an ASM-compiler, it would be a smart one and
for sure not a small nor the fastest one. All this just to produce
code that is fast and small and nothing else, and fully renounce
programmers' convenience.
__
wolfgang
NimbUs
2015-02-18 11:09:06 UTC
Permalink
Post by wolfgang kern
CS=limited to 64k real-mode (after a farJMP)
DS,ES,SS can be left unlimited (max 4GB anyway) after a previous
Flat PM-setup and so can access data within 4GB from RM.
A very trivial, logical extension of the real address mode,
used at least by every BIOS since the first Compaq-386 ...
Post by wolfgang kern
Tomasz once showed that also an unlimited (former Flat) CS
will work under certain circumstances after return from PM32.
But as I interpret this idea, it just keeps CS in PM32 even though
bit 0 of CR0 was cleared.
The important criterion for "huge real" aka RM32 is *CS.D=1*
Post by wolfgang kern
And it works, but just until any INT/IRQ/far call/far jump occurs.

/Wrong/! far transfers including int's do not reset the cached
CS.D bit ("D" is the "default" or "big" bit in the cached CS
descriptor).
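To make the "D" bit concrete: it is bit 22 of the high doubleword of a segment descriptor, next to the granularity bit G (bit 23). The sketch below decodes a descriptor given as a 64-bit integer; 0x00CF9A000000FFFF is the common flat 32-bit code descriptor (D=1, 4 GB limit) and 0x00009A000000FFFF a 16-bit one. Whether a real-mode far transfer preserves the cached D bit, as claimed here, is CPU behaviour this code does not try to model.

```python
# Decode the fields of an x86 segment descriptor held in a 64-bit integer.

def decode_descriptor(desc):
    low = desc & 0xFFFFFFFF
    high = desc >> 32
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    limit = (low & 0xFFFF) | (high & 0x000F0000)   # limit bits 16..19 sit in high bits 16..19
    g = (high >> 23) & 1        # G: limit counted in 4 KB units
    d = (high >> 22) & 1        # D/B: 32-bit default operand/address size for code
    if g:
        limit = (limit << 12) | 0xFFF
    return {
        "base": base,
        "limit": limit,
        "G": g,
        "D": d,
        "code": bool((high >> 11) & 1),   # type bit 3: executable segment
    }


for name, desc in (("flat32 code", 0x00CF9A000000FFFF), ("16-bit code", 0x00009A000000FFFF)):
    f = decode_descriptor(desc)
    print(name, "D =", f["D"], "limit =", hex(f["limit"]))
```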
Post by wolfgang kern
IIRC Tomasz also had a workaround for these situations and made
it work too with some software overhead.
There are a couple problems with int's (not far jmp's or
calls). The main difficulty stems from the fact that the real
mode IVT (16:16 bit int vectors), and the very way the X86
firmware processes interrupts are /not changed/ to accommodate
the "D" bit. Some magic sleight of hand is required to
rearrange things on the stack before returning from an
interrupt in RM32, and in the most general° case some way must
be found to "track" the high-16 bits of EIP (because only IP
is saved on interrupts).

° The latter is not done by Tomasz because he imposes unto
himself to keep his /code/ segment size under 64 k. /Data/
segment size of course unrestricted.
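The interrupt problem described above can be illustrated with a small model. It follows the real-address-mode INT pseudocode in the Intel SDM: each IVT entry is a 4-byte offset:segment pair at linear address vector * 4, and the processor pushes 16-bit FLAGS, CS and IP, so the upper half of EIP is not saved. The tracked_high parameter is a hypothetical stand-in for whatever bookkeeping an RM32 handler would need; keeping code under 64 KB, as Tomasz does per the post above, makes it always zero. The vector contents (0070:1460) are arbitrary example values.

```python
import struct

def ivt_vector(ivt, vector):
    """Return (segment, offset) for `vector` from an IVT image starting at 0:0."""
    offset, segment = struct.unpack_from("<HH", ivt, vector * 4)
    return segment, offset

def int_frame(flags, cs, eip):
    """Words a real-mode INT pushes, in push order: FLAGS, CS, IP (16 bits each)."""
    return [flags & 0xFFFF, cs & 0xFFFF, eip & 0xFFFF]

def resume_eip(frame, tracked_high=0):
    """The frame holds only IP; code above 64 KB must get the high half elsewhere."""
    return (tracked_high << 16) | frame[2]


ivt = bytearray(1024)                                   # 256 vectors * 4 bytes
struct.pack_into("<HH", ivt, 0x21 * 4, 0x1460, 0x0070)  # arbitrary example vector
print(ivt_vector(ivt, 0x21))

frame = int_frame(flags=0x0202, cs=0x1000, eip=0x00012345)
print(hex(resume_eip(frame)), hex(resume_eip(frame, tracked_high=1)))
```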
Post by wolfgang kern
But this is a very limited and restrictive mode in my opinion.

I respect your opinion albeit a tad uninformed.
Fair comparisons of the seldom used huge real mode with other,
more common, systems of DOS extension have yet to be done,
based on compared overheads and also different scopes of the
methods. The evident advantage of "huge" - RM32 - is that it
allows one to use the /same/ 32-bit main program under DOS
as under 32-bit native environments such as Win32 and
various flavours of *X... I'm not sure the overhead (for
interrupt processing) is much worse, if at all, as compared
with a classic DOS-extension scheme, DPMI for instance.
Post by wolfgang kern
His target may or may not have been to promote or use this mode,
as I understood it was just an example of what's possible at all.

I think Tomasz - as a curious, gifted individual - developed
his setup independently as an exercise in unconventional
programming and discovery of "undocumented" opportunities (as
earlier said, others had done similar and even further setups
exploiting the ideas of RM32); he applied this self-taught
experience quite naturally to produce a true DOS executable
while developing the multi-target FASM-32.
Post by wolfgang kern
Dunno which are the most used or correct terms for modes like
this, so I keep onto "UNreal vs. BigReal" as mentioned above.

Because Intel did not officially endorse, and was even
reluctant to acknowledge the combinations which use "real
mode" (more exactly, PE=0) while /not/ emulating an 8086,
there is no universally accepted terminology. Big real is most
often used for the 'vanilla' real mode with only some, or all,
/data/ segment descriptor limits increased. "Huge" real may
be used by some for 32-bit /CS/, but Tomasz prefers to call it
"flat real mode" ... whatever the "name of the Rose" ...

Finally, the reason this huge real mode was not more widely
used, apart from ignorance and a lack of imagination among the
industry, is probably that it appeared strange, even "wild"
and nobody dared predict future CPUs would remain RM-32
capable. Of course they have remained compatible, even
AMD/Intel X64 continue to operate in this not-so-wild mode.
Admittedly OTOH one could make a case that RM-32 is just an
oddity that serves no purpose which cannot be addressed in
other manners... Personally, I'm not dogmatic!


Regards...
--
NimbUs
Rod Pemberton
2015-02-19 01:29:18 UTC
Permalink
The important criterion for "huge real" aka RM32 is *CS.D=1*
...
[...] far transfers including int's do not reset the cached
CS.D bit ("D" is the "default" or "big" bit in the cached CS
descriptor).
...
There are a couple problems with int's (not far jmp's or
calls). The main difficulty stems from the fact that the real
mode IVT (16:16 bit int vectors), and the very way the X86
firmware processes interrupts are /not changed/ to accommodate
the "D" bit.
Has anyone found out if the IVT works as 16:32 bit?
Or, are you saying the IVT remains 16:16 bit in RM32?
Some magic sleight of hand is required to
rearrange things on the stack before returning from an
interrupt in RM32,
Can you say what changed in more detail?
and in the most general° case some way must
be found to "track" the high-16 bits of EIP (because only IP
is saved on interrupts).
So, you're saying CS.D=1 doesn't change the IP size from
16-bits to 32-bits in RM32? That's rather odd ...
Finally, the reason this huge real mode was not more widely
used, apart from ignorance and a lack of imagination among the
industry, is probably that it appeared strange, even "wild"
and nobody dared predict future CPUs would remain RM-32
capable. Of course they have remained compatible, even
AMD/Intel X64 continue to operate in this not-so-wild mode.
Admittedly OTOH one could make a case that RM-32 is just an
oddity that serves no purpose which cannot be addressed in
other manners... Personally, I'm not dogmatic!
I doubt that it's there without purpose ...
We just don't know what the purpose is or was.


I'm surprised that Tomasz Grysztar or his followers didn't
do a complete analysis of this mode and publish it!


Rod Pemberton
Bertho G
2015-02-19 11:36:43 UTC
On Wed, 18 Feb 2015 06:09:06 -0500, NimbUs
Post by NimbUs
The important criterion for "huge real" aka RM32 is *CS.D=1*
[...] far transfers including int's do not reset the cached
CS.D bit ("D" is the "default" or "big" bit in the cached CS
descriptor).
...
Post by NimbUs
There are a couple problems with int's (not far jmp's or
calls). The main difficulty stems from the fact that the
real mode IVT (16:16 bit int vectors), and the very way the
X86 firmware processes interrupts are /not changed/ to
accommodate the "D" bit.
Has anyone found out if the IVT works as 16:32 bit?
Or, are you saying the IVT remains 16:16 bit in RM32?
Correct. This is the main embarrassment with RM32.

...
Post by NimbUs
and in the most general° case some way must
be found to "track" the high-16 bits of EIP (because only
IP is saved on interrupts).
So, you're saying CS.D=1 doesn't change the IP size from
16-bits to 32-bits in RM32? That's rather odd ...
No, what I said is interrupts operate exactly the same
irrespective of the "D" control bit, saving 16-bitty flags,
CS, IP on the stack, with the consequence that the high 16-
bits of Eflags and, importantly, EIP, are /not/ saved.

That's no problem as long as the code instructions can be
contained in less than 64k and restricted to the first 64k of
the code segment. Which is the case of Tomasz's FASM.
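To make the EIP truncation concrete, here is a minimal C sketch that models (it is not CPU or FASM code) the 6-byte frame a real-mode INT pushes and what a 16-bit IRET restores. The names `rm_int_frame`, `rm_int_push`, `rm_iret_eip` and `survives_interrupt` are hypothetical; the sketch assumes the handler returns with a 16-bit IRET and ignores the stack rearrangement an RM32 handler actually has to do.

```c
#include <stdint.h>

/* Hypothetical model of the frame a real-mode (PE=0) INT pushes:
   16-bit FLAGS, CS and IP, regardless of the cached CS.D bit. */
struct rm_int_frame {
    uint16_t flags;
    uint16_t cs;
    uint16_t ip;
};

/* What the CPU saves on an interrupt: only the low 16 bits of
   EFLAGS and EIP end up on the stack. */
static struct rm_int_frame rm_int_push(uint32_t eflags, uint16_t cs, uint32_t eip)
{
    struct rm_int_frame f;
    f.flags = (uint16_t)(eflags & 0xFFFFu);
    f.cs = cs;
    f.ip = (uint16_t)(eip & 0xFFFFu);   /* high word of EIP is lost here */
    return f;
}

/* What a 16-bit IRET restores: EIP comes back with its high word zero. */
static uint32_t rm_iret_eip(struct rm_int_frame f)
{
    return (uint32_t)f.ip;
}

/* Code resumes at the right place after an interrupt only if it was
   executing below offset 64K. The FLAGS and CS values are arbitrary. */
static int survives_interrupt(uint32_t eip)
{
    return rm_iret_eip(rm_int_push(0x202u, 0x1000u, eip)) == eip;
}
```

With this model `survives_interrupt(0xFFF0)` holds while `survives_interrupt(0x12345)` does not, since 0x12345 comes back as 0x2345; keeping all code in the first 64K sidesteps the problem, as described above.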

The other published setup of RM32 - which I recalled,
possibly wrongly, as Helix's - provided for huge code sizes,
necessitating the tracking of high EIP across interrupts
(ingeniously and somewhat ironically kept in... CR2! - which
then has to be modified upon the infrequent jumps across 64K
boundaries inside the RM32 code segment)
Post by NimbUs
Finally, the reason this huge real mode was not more widely
used, apart from ignorance and a lack of imagination among
the industry, is probably that it appeared strange, even "wild"
and nobody dared predict future CPUs would remain RM-32
capable. ....
I doubt that it's there without purpose ...
We just don't know what the purpose is or was.
It is there because...of 'orthogonality' in the processor
features, and because nobody at Intel felt like there was a
reason to add /special/ interdiction.

The right way to think of X86 "modes", I think, is to
understand there are no separate modes, "real" vs "protected"
for instance. There never were two processors under one hood,
instead there is one X86-32, fetching, decoding and executing
instructions under the control of a few modal "feature
control" bits. By combining the bits and registers, one has
access to a gradation of "modes".
I'm surprised that Tomasz Grysztar or his followers didn't
do a complete analysis of this mode and publish it!
His code is published for your examination. Really there is
little mystery that can't be deduced by careful and /critical/
study of sometimes inaccurate, sometimes almost deliberately
lying, Intel 80386+ processor manuals - completed by personal
experimentation. The worst way of learning X86 intrinsic
behaviour with respect to processor "modes" - is to /believe
blindly/ Intel manuals. The best way is to read Intel manuals
- understanding that the processor's design is logical and
features are /orthogonal/. Mostly. Most features that were
/advertised/ as "protected mode" only are /not/ magically
turned off when PE=0, rather they are used in such a way that
they contribute to emulating the 16-bit, "real", address mode
of an i8086.

To develop a little further what I have in mind, one should
NOT mentally separate real mode / 16-bit protected / 32 bit
protected mode / SMM, and so on. Rather one should think in
terms of individual control registers and bits, descriptor
caches etc.
--
Bertho
Rod Pemberton
2015-02-20 00:40:29 UTC
Post by Bertho G
On Wed, 18 Feb 2015 06:09:06 -0500, NimbUs
Post by NimbUs
There are a couple problems with int's (not far jmp's or
calls). The main difficulty stems from the fact that the
real mode IVT (16:16 bit int vectors), and the very way
the X86 firmware processes interrupts are /not changed/ to
accommodate the "D" bit.
Has anyone found out if the IVT works as 16:32 bit?
Or, are you saying the IVT remains 16:16 bit in RM32?
Correct. This is the main embarrassment with RM32.
Post by NimbUs
and in the most general° case some way must
be found to "track" the high-16 bits of EIP (because only
IP is saved on interrupts).
So, you're saying CS.D=1 doesn't change the IP size from
16-bits to 32-bits in RM32? That's rather odd ...
No, what I said is interrupts operate exactly the same
irrespective of the "D" control bit, saving 16-bitty flags,
CS, IP on the stack, with the consequence that the high 16-
bits of Eflags and, importantly, EIP, are /not/ saved.
That's no problem as long as the code instructions can be
contained in less than 64k and restricted to the first 64k of
the code segment. Which is the case of Tomasz's FASM.
I would speculate that this seems to be more of an extension
to 16-bit RM, than an exclusive mode. I.e., it would seem
to me that you're supposed to use 16-bit RM for most code and
interrupts, but you can jump to and execute code in a 32-bit
RM segment with CS.D=1. Wouldn't this be much the same
as using 32-bit PM and jumping to a 16-bit PM segment? ...
Post by Bertho G
I doubt that it's there without purpose ...
We just don't know what the purpose is or was.
It is there because...of 'orthogonality' in the processor
features, and because nobody at Intel felt like there was a
reason to add /special/ interdiction.
I was going to speculate that it was the result of combination
of the logic functionality at the hardware level, but whenever
I say such electrical engineering things to other programmers,
they get hostile, irritated, and argumentative ...
Post by Bertho G
The right way to think of X86 "modes", I think, is to
understand there are no separate modes, "real" vs "protected"
for instance. There never were two processors under one hood,
instead there is one X86-32, fetching, decoding and executing
instructions under the control of a few modal "feature
control" bits. By combining the bits and registers, one has
access to a gradation of "modes".
Yes.
Post by Bertho G
I'm surprised that Tomasz Grysztar or his followers didn't
do a complete analysis of this mode and publish it!
His code is published for your examination. Really there is
little mystery that can't be deduced by careful and /critical/
study of sometimes inaccurate, sometimes almost deliberately
lying, Intel 80386+ processor manuals - completed by personal
experimentation. The worst way of learning X86 intrinsic
behaviour with respect to processor "modes" - is to /believe
blindly/ Intel manuals. The best way is to read Intel manuals
- understanding that the processor's design is logical and
features are /orthogonal/. Mostly. Most features that were
/advertised/ as "protected mode" only are /not/ magically
turned off when PE=0, rather they are used in such a way that
they contribute to emulating the 16-bit, "real", address mode
of an i8086.
Yes, Robert Collins, an American author, provided much insight
in that regard, i.e., his in-depth articles on undocumented
capabilities of x86, usually for Dr. Dobbs magazine.

Robert Collins' Intel Secrets
http://www.rcollins.org/secrets/
Post by Bertho G
To develop a little further what I have in mind, one should
NOT mentally separate real mode / 16-bit protected / 32 bit
protected mode / SMM, and so on. Rather one should think in
terms of individual control registers and bits, descriptor
caches etc.
...


Rod Pemberton
NimbUs
2015-02-20 11:26:31 UTC
Post by Rod Pemberton
I would speculate that this seems to be more of an extension
to 16-bit RM, than an exclusive mode. I.e., it would seem
to me that you're supposed to use 16-bit RM for most code
and interrupts,
I can't see where I have suggested such a thing. On the
contrary, I have tried to make it clear that the "big" bit
CS.D is never reset spontaneously by the CPU, even when
interrupts occur. Interrupts can be processed in 32-bit code,
or - for DOS and BIOS "extension" purposes - interfaced to
underlying 16-bit RM code; however, getting this interface right
is the tricky, and challenging, part of (re)inventing the
interface.
Post by Rod Pemberton
but you can jump to and execute code in a 32-bit
RM segment with CS.D=1. Wouldn't this be much the same
as using 32-bit PM and jumping to a 16-bit PM segment? ...
PM has its own challenges, privilege levels ('rings') for one,
which we do not use in RM at all (though privilege /would be/
honored even in "ordinary" real mode if we wanted to, they
have no use case).
In RM32 - user code - we do /not/ do intersegment jumps or
calls; they are useless and would be difficult to cope with.
Rather, we use "near-but-not-so-near" jumps and calls - recall
that "near" jumps have a 32-bit wide target and can address a
full four gigabytes in a 32-bit code segment, including RM32.
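A small C sketch of how a near relative jump's target is formed may help here: the sign-extended displacement is added to the address of the next instruction, and the operand size decides whether the result keeps all 32 bits or is cut to 16. `near_target` is a hypothetical helper, not part of any assembler, and CS limit checks are not modeled.

```c
#include <stdint.h>

/* Hypothetical helper: target of a near relative JMP/CALL.
   next_eip:     address of the instruction following the jump
   disp:         sign-extended rel8/rel16/rel32 displacement
   operand_size: 16 or 32 (32 is the default when CS.D=1, as in RM32) */
static uint32_t near_target(uint32_t next_eip, int32_t disp, int operand_size)
{
    uint32_t t = next_eip + (uint32_t)disp;   /* wraps modulo 2^32 */
    if (operand_size == 16)
        t &= 0xFFFFu;                         /* 16-bit operand size clears EIP[31:16] */
    return t;
}
```

For example, a jump whose next instruction is at 0x10000 with displacement 0x100000 lands at 0x110000 in a 32-bit code segment, whereas under a 16-bit operand size the same sum wraps to 0x0000.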

FASM uses a "restricted" flavour of RM32 where all its code is
contained in less than 64k-bytes anyway.
Post by Rod Pemberton
Post by Bertho G
Post by Rod Pemberton
I doubt that it's there without purpose ...
We just don't know what the purpose is or was.
It is there because...of 'orthogonality' in the processor
features, and because nobody at Intel felt like there was a
reason to add /special/ interdiction.
I was going to speculate that it was the result of combination
of the logic functionality at the hardware level, but whenever
I say such electrical engineering things to other programmers,
they get hostile, irritated, and argumentative ...
The way Intel designed an emulation of a 8086 inside their new
baby, the i80286, was economic, clean and clever. Rather than
integrate a full 8086 on a corner of the 80286 silicon, and
later 386+, they put there the minimum possible extra-
circuitry to (more or less precisely) achieve the illusion.
The "real address mode" was initially only intended for the
early stages of booting up a "protected mode" OS, as well as
an initial marketing aid for 80286s, allowing the execution of
older systems such as DOS, that were supposed to disappear from
use after a transition period (!)

In general they weren't going to add special checks and
barriers to purposefully prevent "tweaking" the real mode in
ways not officially sanctioned.

One notable exception to this practical rule was Intel 486+
processors explicitly prohibiting a combination of "paging"
with "real mode" - although it had been secretly possible in
386s (using loadall).
Post by Rod Pemberton
Post by Bertho G
features are /orthogonal/. Mostly. Most features that were
/advertised/ as "protected mode" only are /not/ magically
turned off when PE=0, rather they are used in such a way
that they contribute to emulating the 16-bit, "real", address
mode of an i8086.
Yes, Robert Collins, an American author, provided much
insight in that regard, i.e., his in-depth articles on undocumented
capabilities of x86, usually for Dr. Dobbs magazine.
Robert Collins' Intel Secrets
http://www.rcollins.org/secrets/
Indeed, a very talented reporter and experimenter.
Intel reps /hated/ him, for no valid reason I can grasp...
--
NimbUs
wolfgang kern
2015-02-19 12:16:39 UTC
Post by NimbUs
Post by wolfgang kern
CS=limited to 64k real-mode (after a farJMP)
DS,ES,SS can be left unlimited (max 4GB anyway) after a
previous Flat PM-setup and so can access data within 4GB from RM.
A very trivial, logical extension of the real address mode,
used at least by every BIOS since the first Compaq-386 ...
Yes.
Post by NimbUs
Post by wolfgang kern
Tomasz once showed that also an unlimited (former Flat) CS
will work under certain circumstances after return from PM32.
But as I interpret this idea, it just keeps CS in PM32 even
though bit 0 of CR0 is cleared.
The important criterion for "huge real" aka RM32 is *CS.D=1*
Post by wolfgang kern
And it works, but just until any of INT/IRQ/farCall/FarJump
occurs.
/Wrong/! far transfers including int's do not reset the cached
CS.D bit ("D" is the "default" or "big" bit in the cached CS
descriptor).
Ok, my memory seems to fade away when it comes to discussions
'that long' ago, so I didn't remember the set Big-Bit in CS.
Post by NimbUs
Post by wolfgang kern
IIRC Tomasz also had a workaround for these situations and
made it work too with some software overhead.
There are a couple problems with int's (not far jmp's or
calls). The main difficulty stems from the fact that the real
mode IVT (16:16 bit int vectors), and the very way the X86
firmware processes interrupts are /not changed/ to accommodate
the "D" bit. Some magic sleight of hand is required to
rearrange things on the stack before returning from an
interrupt in RM32, and in the most general° case some way must
be found to "track" the high-16 bits of EIP (because only IP
is saved on interrupts).
° The latter is not done by Tomasz because he imposes unto
himself to keep his /code/ segment size under 64 k. /Data/
segment size of course unrestricted.
Post by wolfgang kern
But this is a very limited and restricting mode in my
opinion.
I respect your opinion albeit a tad uninformed.
Fair comparisons of the seldom used huge real mode with other,
more common, systems of DOS extension have yet to be done,
based on compared overheads and also different scopes of the
methods. The evident advantage of "huge" - RM32 - is that it
allows one to use the /same/ 32-bit main program under DOS
as under 32-bit native environments such as Win32 and
various flavours of *X... I'm not sure the overhead (for
interrupt processing) is much worse, if at all, as compared
with a classic DOS-extension scheme, DPMI for instance.
Post by wolfgang kern
His target may not have been to promote or use this mode,
as I understood it was just an example of what's possible
at all.
I think Tomasz - as a curious, gifted individual - developed
his setup independently as an exercise in unconventional
programming and discovery of "undocumented" opportunities (as
earlier said, others had done similar and even further set ups
exploiting the ideas of RM32); he applied this self taught
experience quite naturally to produce a true DOS executable
while developing the multi-target FASM-32.
I too like to play with opportunities, so I once checked
various settings but never made RM32 work as expected.
What I figured and learned was just enough for a fully working
code-mix of trueRM16/PM16/PM32 in my OS w/o task-switches and
only one common stack by having all IRQs handled equally in all
modes. And when I think of how this all works, my intermode
link may temporarily be in such an RM32 state.
Post by NimbUs
Post by wolfgang kern
Dunno which are the most used or correct terms for modes
like this, so I keep onto "UNreal vs. BigReal" as mentioned
above.
Because Intel did not officially endorse, and was even
reluctant to acknowledge, the combinations which use "real
mode" (more exactly, PE=0) while /not/ emulating an 8086,
there is no universally accepted terminology. Big real is most
often used for the 'vanilla' real mode with only some, or all,
/data/ segment descriptor limits increased. "Huge" real may
be used by some for 32-bit /CS/, but Tomasz prefers to call it
"flat real mode" ... whatever the "name of the Rose" ...
Finally, the reason this huge real mode was not more widely
used, apart from ignorance and a lack of imagination among the
industry, is probably that it appeared strange, even "wild"
and nobody dared predict future CPUs would remain RM-32
capable. Of course they have remained compatible, even
AMD/Intel X64 continue to operate in this not-so-wild mode.
Admittedly OTOH one could make a case that RM-32 is just an
oddity that serves no purpose which cannot be addressed in
other manners... Personally, I'm not dogmatic!
Yeah, undocumented features were never recommended by vendors
and so often just never tried.

__
wolfgang
Rod Pemberton
2015-02-17 21:05:10 UTC
Post by NimbUs
Post by Rod Pemberton
Yes, I do recall reading about Thomasz (?) doing this
some years ago. I assumed it was the undocumented
"unreal" mode which is 16-bit, but your posts seem to
indicate it's 32-bit ... I.e., perhaps, it's what
http://www.sandpile.org/x86/mode.htm
Yep, I assume it is also what Sandpile is referring to.
...
Post by NimbUs
IIRC you can find useful reference to RM32 in Ralf Brown's
celebrated interrupt list, too -> search keywords: "cloaking",
"Helix"...
It seems you once knew of it:
http://board.flatassembler.net/topic.php?p=174199

But, so far, I'm not finding any mention of RM32 in RBIL,
that I recognize ... i.e., not under RM32, "huge real mode",
or in the Helix API descriptions. RBIL does describe Helix's
RAM-MAN/386 API, but nothing indicates it's RM32, as far as
I can tell.

Googling finds that Helix's Cloaking, RAM-MAN/386, Netroom
technologies etc used 32-bit protected-mode (PM32) versions
of BIOS and VGA bios. Supposedly, the BIOS was a ported
version of an updated BIOS, updated for the time period of
the product: 1991 to 1993. They also provided a disk cache,
mouse driver, and MSCDEX which used their cloaking API.
Wikipedia indicates their API is similar to DPMS (not DPMI).

I can't find any mention of RM32 anywhere in regards to Helix's
products ... Everything I've come across says they used PM32,
including old magazine articles from the era via Google Books.

I'm willing to hear more from you on this, but that's all
the time I have to look into this.


Rod Pemberton
Rod Pemberton
2015-02-17 07:21:25 UTC
On Sun, 15 Feb 2015 20:12:59 -0500, Alexei A. Frounze
[...] DOSBox [...] DJGPP [...]
(there's some problem with long file names in included headers).
LFNs don't seem to work with DOSBox under Linux, either.

DJGPP needs LFNs to include <sys/limits.h> etc.
That's the only include I've had problems with,
without LFNs available. You can substitute <dir.h>
for some of the things in <sys/limits.h> with DJGPP.
That will work without LFNs.

DOSBox doesn't support E820h or most other BIOS
memory calls either. This could possibly cause
problems with DPMI memory. I.e., they emulate
an old machine.

IIRC, someone on "DOS ain't dead" forums or perhaps
Vogons posted an LFN patched version of DOSBox.
Melzzzz or perhaps Nimbus might've already mentioned
that ...

DOSLFN
http://adoxa.altervista.org/doslfn/


Rod Pemberton
NimbUs
2015-02-15 12:16:08 UTC
Alexei A. Frounze wrote in news:dcc6c3aa-57ae-48f6-9427-
Post by Alexei A. Frounze
Hint: I'm getting "error: processor is not able to enter 32-
bit real mode" in DOSBox.

In order to "enter 32-bit-real mode", FASM (the DOS version)
must start in true-real mode. I never ran DOSBox but I'm sure
it is no true real mode DOS environment by any definition :=)

If you want to experience FASM's raw mode, you must run it on
real DOS "iron", no memory manager (HIMEM i.e. XMS is OK,
optionally).

The special 32-bit unreal mode is also incompatible with such
"virtual machine" environments as VMWare and VPC due to bugs
in the latter (not FASM's fault!). It works as it should
under VirtualBox, and Bochs.

Cheers
--
Nim'
Alexei A. Frounze
2015-02-16 00:50:37 UTC
Post by NimbUs
Alexei A. Frounze wrote in news:dcc6c3aa-57ae-48f6-9427-
Post by Alexei A. Frounze
Hint: I'm getting "error: processor is not able to enter 32-
bit real mode" in DOSBox.
In order to "enter 32-bit-real mode", FASM (the DOS version)
must start in true-real mode. I never ran DOSBox but I'm sure
it is no true real mode DOS environment by any definition :=)
If you want to experience FASM's raw mode, you must run it on
real DOS "iron", no memory manager (HIMEM i.e. XMS is OK,
optionally).
The special 32-bit unreal mode is also incompatible with such
"virtual machine" environments as VMWare and VPC due to bugs
in the latter (not FASM's fault!). It works as it should
under VirtualBox, and Bochs.
You do understand that unreal/big real mode is still not
officially documented (even though there's one occurrence of
"big real" in "Intel(R) 64 and IA-32 Architectures Software
Developer's Manual" from May 2011, which among the other typos
and inconsistencies only tells of Intel's sloppiness on the
documentation side) and the most typical setup includes only
4G-1 limits in data segment descriptors?

So, when FASM uses undocumented CPU features in unintended
ways, who's here to blame?

If you consider the fact that at least 3 popular VM
environments don't support said unintended use of undocumented
features and if you combine that with the lack of proper
information on the matter (there's only "DOS version requires
an OS compatible with MS DOS 2.0 and either true real mode
environment or DPMI.") and the resulting user experience,
you get extremely close to saying "fnck you, user, it's your
problem, not ours". :)

Don't defend FASM. :)

Alex
NimbUs
2015-02-16 14:18:53 UTC
Alexei A. Frounze wrote in news:7095911a-d168-4f94-9039-
Post by Alexei A. Frounze
So, when FASM uses undocumented CPU features in unintended
ways, who's here to blame?
You do understand FASM for DOS does NOT require or rely on
what you are calling undocumented features? No, I think you
have not yet realised: FASM is absolutely standard, by-the-
rules, 32-bit code that runs unmodified under several
different setups - it is the SAME 32-bit code under Linux,
under 32-bit-DPMI, or under the special 32-bit-unreal mode.

Only the latter makes use of seldom used or "undocumented"
features, and only in the interface shim, comparable to the
DPMI host, not in the assembler itself.

FASM under DOS+DPMI is a regular 32-bit host which makes no
use of any undocumented feature. Full stop.

The FASM assembler was written as 32-bit code from start and,
as I think Tomasz explained on his site, the 32-bit-unreal set
up was developed later, as an exercise or proof of concept. It
was a success, since it allowed existing 32-bit code to run
unmodified under 16-bit mode DOS - a better alternative than
rewriting FASM from scratch as 16 bit code to run it under a
16-bit DOS extender.
--
Nim
NimbUs
2015-02-27 21:45:12 UTC
I've looked back & stumbled upon this old conversation from
2003 on the FASM forum, which you guys might find illuminating.

Significant excerpt :

*Tomasz Grysztar*
Assembly Artist
Location: Kraków, Poland :

Ralph, you are talking about the flat real mode - some people
do call it unreal mode also, but I prefer to use the "unreal
mode" name for the real mode that executes native 32-bit code
(without prefixes). Current DOS version of fasm uses it, so
you can look there to see what it looks like.

*Ralph :*

Hm, I always wondered if it was possible to do that. I didn't
think the 32-bit flag would carry over into real-mode. That's
a really cool, although useless, trick. :)

*Tomasz Grysztar* :

Why useless? On 386 processor unREAL version of fasm is
actually the fastest one. And on modern processor it has the
same speed as 32-bit protected mode version.
The only problem when executing 32-bit code in real mode are
the interrupts - in fasm I deal with them by using the
additional IDT solely for 32-bit mode, which redirects all
interrupts to 16-bit mode handlers (with quick mode switches).
It was most surprising for me that even the IDT base address
for real mode can be moved! This allows to run 32-bit code
with interrupts enabled, though all the 32-bit code must still
fit in the first 64kB of segment, because high word of EIP is
lost when interrupt is executed in real mode (even with 32-bit
code enabled), to execute code from higher offsets you have to
disable the interrupts first.
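Tomasz's observation that the real-mode IDT base can be moved can be illustrated with a minimal C model of real-mode dispatch: the CPU reads a 4-byte offset:segment pair (offset first, little-endian) from IDTR.base + 4 * vector; IDTR.base is 0 after reset but can be changed with LIDT. The function names and the flat `mem` array standing in for physical memory are assumptions for illustration only, and the IDTR limit check is not modeled. Note the entries stay 16:16 even when CS.D=1.

```c
#include <stdint.h>

/* One real-mode vector: a 16:16 far pointer, stored offset first. */
struct rm_far_ptr {
    uint16_t offset;
    uint16_t segment;
};

/* Hypothetical lookup: read the vector from a table at idtr_base inside
   a byte array that stands in for physical memory. */
static struct rm_far_ptr ivt_entry(const uint8_t *mem, uint32_t idtr_base, uint8_t vector)
{
    const uint8_t *p = mem + idtr_base + 4u * vector;
    struct rm_far_ptr fp;
    fp.offset  = (uint16_t)(p[0] | (p[1] << 8));   /* little-endian word */
    fp.segment = (uint16_t)(p[2] | (p[3] << 8));
    return fp;
}

/* Real-mode linear address of a segment:offset pair. */
static uint32_t rm_linear(struct rm_far_ptr fp)
{
    return ((uint32_t)fp.segment << 4) + fp.offset;
}
```

Pointing `idtr_base` at a different table, as fasm does for its 32-bit code, changes which handler a given vector reaches without touching the standard IVT at address 0.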

*Ralph* :

Useless because in my opinion 16-bit code died years ago.
There is really no excuse for programming anything worthwhile
in it anymore, especially not with everyone slowly migrating
to 64-bit. No one uses 16-bit applications, and if they do
there is bound to be a 32-bit version available. It might be
easier to learn, but that's not really applicable in this
case. I can see how the 386 version can run the fastest, but
realistically, who still uses (and I mean uses, not has) a 386?
And for anyone planning on responding with "me", go to your
local school or computer store and check around the back by
the dumpsters. There's bound to be some better hardware there.
:)

As for the IDT base thing, I would've never suspected that.
That's a neat trick I'm sure someone other than myself can put
to good use. :)

*Tomasz Grysztar* :

This is 32-bit code, so the death of 16-bit one doesn't apply.
(Wink!)
And generally unREAL mode gives you many of the advantages of
protected mode (like 32-bit code and access to 4GB addressing
space), while keeping it much simpler when you need to program
some low-level OS-like features. Ability to call both 16-bit
and 32-bit BIOS functions in very simple way and very quickly,
no need for the selectors, full control over hardware - that
makes the unREAL an ideal solution when you want to program
something really low-level.

*Ralph* :

Well, I guess you got a point there. However, can't you
achieve the same end result with protected mode if you just
ignore all the additional bloat intel decided would be cool to
throw in? Only use a single segment, run everything at ring0,
etc.
You wouldn't get BIOS access, but real men don't need that
anyway. :)

*Tomasz Grysztar* :

Yes, you're right, but you would have to write that everything
yourself as well....
______________________________________________________

It's there :
<http://board.flatassembler.net/topic.php?p=5024>
--
Nim'
wolfgang kern
2015-02-27 22:31:29 UTC
NimbUs dug up an oldie:

Thanks for bringing it back to my memory. Long before this, I
chose what Tomasz's last sentence here suggests:
"Write everything yourself"
So I use only a few BIOS calls during boot-up and then do almost
everything my own way. I didn't check further on RM32, even though it
could have worked for me too, because there was no guarantee that
future CPUs would offer the same opportunity (looks like it still works).
__
wolfgang
sorry for top-posting yet.
NimbUs
2015-02-15 12:18:23 UTC
Alexei A. Frounze wrote in news:dcc6c3aa-57ae-48f6-9427-
Post by Alexei A. Frounze
Hint: I'm getting "error: processor is not able to enter 32-bit
real mode" in DOSBox.
In order to "enter 32-bit-real mode", FASM (the DOS version)
must start in true-real mode. I never ran DOSBox but I'm sure
it is no true real mode DOS environment by any definition :=)

If you want to experience FASM's raw mode, you must run it on
real DOS "iron", no memory manager (HIMEM i.e. XMS is OK,
optionally).

The special 32-bit unreal mode is also incompatible with such
"virtual machine" environments as VMWare and VPC due to bugs
in those (not FASM's fault!). It works as it should
under VirtualBox, and Bochs.

Cheers
--
Nim'
Rod Pemberton
2015-02-15 14:10:28 UTC
On Sat, 14 Feb 2015 17:19:07 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to load
any DPMI host / DOS extender by itself nor has any built-in.
I need to run CWSDPMI.EXE manually before FASM.EXE. Not good.
I thought you were familiar with these options for CWSDPMI.

To make CWSDPMI permanent:
CWSDPMI -p

To disable CWSDPMI swap:
CWSDPMI -s-

To disable CWSDPMI's DPMI 1.0 functions:
CWSDPMI -x

To unload CWSDPMI:
CWSDPMI -u

From Charles Sandmann's CWSDPMI r7 intro:
http://homer.rice.edu/~sandmann/cwsdpmi/


NimbUs mentioned the DPMI host by Japheth (Andreas Grech).
Japheth's website has been down, but it seems he is still
posting on various websites. So the speculation on the "DOS ain't
dead" forum that he had passed away is probably wrong.

Wayback Machine archives of Japheth's website (first the HXRT page, then the main page):
http://web.archive.org/web/20130514072745/http://www.japheth.de/HX.html
http://web.archive.org/web/20130607202225/http://japheth.de/

There are many other DPMI hosts.

Years ago, I tracked down about half of the 31 or more DPMI
hosts out there. Narech Koumar's DOS/32A, Daniel Borca's
D3X, Bob Smith's DPMIONE, and Michael Devore's CAUSEWAY are
all popular.

PMODEDJ (PMODETSR.EXE), which comes with DJGPP, would probably
work with FASM, assuming FASM has no need for a full DOS
extender. It won't fix FASM's failure to load a DPMI host by itself.

Some DPMI hosts can stub and re-stub other programs. IIRC,
D3X was one of those. So, maybe you just need to stub
FASM with DPMI loader code.


Are you using Linux or Windows? For Linux,

dosemu has built-in DPMI.
DOSBox doesn't have built-in DPMI.
QEMU doesn't have built-in DPMI.


To mount a drive directly, follow what's below.

For DOSBox, add a mount command in DOSBox's autoexec.bat
to mount your drive:

mount C /media/drive_mount_point_in_linux

For QEMU, specify the drive device in the start command:

qemu-kvm -boot c -hda /dev/linux_device

For dosemu, you put a link in a specific directory to
the mount point of your drive:

cd /root/.dosemu/drives
ln -s /media/drive_mount_point_in_linux c


Robert Riebisch's "DOS ain't dead" forum
http://www.bttr-software.de/forum/forum.php


Rod Pemberton
Alexei A. Frounze
2015-02-16 03:42:48 UTC
Post by Rod Pemberton
On Sat, 14 Feb 2015 17:19:07 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to load
any DPMI host / DOS extender by itself nor has any built-in.
I need to run CWSDPMI.EXE manually before FASM.EXE. Not good.
I thought you were familiar with these options for CWSDPMI.
CWSDPMI -p
Still, that needs to be done explicitly more than zero times.

Alex
Rod Pemberton
2015-02-16 12:10:05 UTC
On Sun, 15 Feb 2015 22:42:48 -0500, Alexei A. Frounze
Post by Rod Pemberton
On Sat, 14 Feb 2015 17:19:07 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to load
any DPMI host / DOS extender by itself nor has any built-in.
I need to run CWSDPMI.EXE manually before FASM.EXE. Not good.
I thought you were familiar with these options for CWSDPMI.
CWSDPMI -p
Still, that needs to be done explicitly ...
Yes: EDIT autoexec.bat, and change it back when you're done.

Or, create a .bat, FASM.bat, which execs CWSDPMI and FASM.
I don't recall the precedence of .bat .exe .com in DOS.
So, maybe make sure FASM.exe isn't on your path and FASM.bat
is, but there is a fix for that below.

E.g., something like this in your fasm.bat:

C:\CWSDPMI -p
C:\FASM\FASM.EXE %1 %2 %3 %4 %5 %6 %7 %8 %9

Generally, I add an underscore to the name of a .bat to
distinguish setup .bat's from the raw executable; e.g.,
for gcc my .bat is gcc_.bat. This means you don't have
to worry about the path or execution precedence. So,
I use gcc_ to set up gcc's paths, environment variables,
etc., and gcc to actually use gcc once it's set up.

I would think that stubbing with D3X etc. would be easier ...
it being a one-time deal and all.
... more than zero times.
I'm not sure exactly what you mean by that, since you didn't
specify the situation where you'd need to reload CWSDPMI
repeatedly. I can think of a couple situations where that
might be the case, but those aren't what I'd expect to be
normal use.

For normal situations, CWSDPMI -p causes CWSDPMI to remain
resident, until you reboot or you unload via CWSDPMI -u,
*if* you haven't botched up the IVT interrupt used to detect
DPMI (on 2Fh) or the interrupt to exit DPMI and DOS (on 21h).
I.e., under normal circumstances, there is no need to reload
since it'll stay resident. CWSDPMI only loads a single
instance even if you start additional apps via spawn or system.
PMODEDJ (PMODETSR.EXE) will load an instance for each app.


Rod Pemberton
Alexei A. Frounze
2015-02-16 12:35:44 UTC
Post by Rod Pemberton
On Sun, 15 Feb 2015 22:42:48 -0500, Alexei A. Frounze
Post by Rod Pemberton
On Sat, 14 Feb 2015 17:19:07 -0500, Alexei A. Frounze
Post by Alexei A. Frounze
Looks like it requires DPMI in DOS and it doesn't try to load
any DPMI host / DOS extender by itself nor has any built-in.
I need to run CWSDPMI.EXE manually before FASM.EXE. Not good.
I thought you were familiar with these options for CWSDPMI.
CWSDPMI -p
Still, that needs to be done explicitly ...
Yes. EDIT autoexec.bat. Change back when done using.
There's no autoexec.bat in DOSBox. :) But there's a
configuration file with a dedicated section.
Post by Rod Pemberton
Or, create a .bat, FASM.bat, which execs CWSDPMI and FASM.
I don't recall the precedence of .bat .exe .com in DOS.
So, maybe make sure FASM.exe isn't on your path and FASM.bat
is, but there is a fix for that below.
C:\CWSDPMI -p
C:\FASM\FASM.EXE %1 %2 %3 %4 %5 %6 %7 %8 %9
...

You seem to be assuming I'm going to use FASM after all the
discoveries I've made about it. I don't intend to switch to
it because it clearly doesn't have the functionality my
compiler needs and because its reliability and usability
under DOS are questionable. I can clearly work around these
problems manually (use a different VM environment, edit
autoexec.bat, etc), but I don't want to make my compiler's
users have to deal with them as well. Sorry, but FASM is
just not good enough for me here.
Post by Rod Pemberton
I would think that stubbing with D3X etc would be easier ...
It being a one-time deal and all.
I don't want to maintain and distribute a modified version
of a 3rd party tool. Nor do I want to contribute to it.
Not in this case.
Post by Rod Pemberton
... more than zero times.
I'm not sure exactly what you mean by that, ...
One is more than zero. That's the whole point. Additional
editing of autoexec.bat counts as one as well in my book.

Alex
s***@yahoo.com
2015-02-14 17:45:59 UTC
Post by Alexei A. Frounze
I'm still contemplating the idea of implementing a simple assembler for Smaller C to make it fully self-sufficient, easily portable (someone asked for a C compiler for xv6 on Stack Overflow) and a tad faster as it turns out that NASM can be horribly slow (I think I've mentioned that before).
But there's still one unsolved technical problem. Namely, I need NASM's ability to automatically substitute short and long jumps as necessary, that is, either with an 8-bit relative address or with a 16/32-bit one, depending on how far the target location is from the jump instruction.
It looks like it's not a trivial problem. I may have inquired about it before (Rod might be able to confirm), but I don't remember ever finding or learning a reasonably good solution.
There's one solution that I came up with over the weekend, though. I wonder if you could suggest improvements or something radically better. Before I state it, I should probably note that I've considered a dynamic programming solution, but it looks like the problem can't be trivially reduced to identical subproblems.
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar to relative jumps (e.g. "mov ax, label2 - label1")
As for this one, AX takes a word value, so the instruction length is the same either way.

The value of (label2-label1) has to be range-checked and filled in once those addresses are known. Take the edge case of "mov ax, end-of-file - start-of-file": this expression cannot be resolved until all the forward relative jumps have been fixed.
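To illustrate the point: because MOV AX, imm16 (opcode B8 in 16-bit code) always carries a 2-byte immediate, an assembler can emit a placeholder, record a fixup, and patch the value once every jump has been sized. A minimal Python sketch; the helper names and the (position, left, right) fixup layout are hypothetical:

```python
import struct

def emit_mov_ax_diff(code, fixups, left, right):
    """Emit 'mov ax, left - right' (B8 iw) with a zero placeholder immediate."""
    code.append(0xB8)                         # MOV AX, imm16 in 16-bit code
    fixups.append((len(code), left, right))   # hypothetical fixup record
    code.extend(b"\x00\x00")                  # patched later by apply_fixups()

def apply_fixups(code, fixups, labels):
    """Patch each deferred difference once all label addresses are final."""
    for pos, left, right in fixups:
        value = labels[left] - labels[right]
        # Accept anything representable as a signed or unsigned 16-bit value.
        if not -0x8000 <= value <= 0xFFFF:
            raise ValueError(f"{left} - {right} = {value} does not fit in 16 bits")
        struct.pack_into("<H", code, pos, value & 0xFFFF)

# Usage: the difference is only known after everything else is laid out.
code, fixups = bytearray(), []
emit_mov_ax_diff(code, fixups, "end_of_file", "start_of_file")
code.extend(b"\x90" * 0x1234)                 # stand-in for the rest of the program
apply_fixups(code, fixups, {"start_of_file": 0, "end_of_file": len(code)})
```

Since the instruction length never changes, only the value is deferred; the size problem stays confined to the jumps.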
Post by Alexei A. Frounze
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
This seems to rule out intermediate files, and multiple passes as well? I mean intermediate files that hold dynamic data or tables, to off-load the assembler's internal dynamic data space.
Post by Alexei A. Frounze
- cost not higher than quadratic with small coefficient
So, here's what I have so far...
Consider a normal, line-by-line assembling process.
If a relative jump instruction is encountered and its target label precedes it (=we've seen that label defined), figure out which relative offset should be there (8- or 16/32-bit) and go on.
The caveat here is that there must not be a pending Forward Reference (FR).

Resolving the FR would be needed before this current relative jump to the Known Reference (KR) can be calculated.
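One way to act on this caveat, sketched in Python under simplifying assumptions (16-bit code, plain JMP only: EB rel8 is 2 bytes, E9 rel16 is 3). Only pending forward jumps lying between the label and the backward jump can change its distance, so the assembler can bound the displacement and decide early whenever both bounds agree. The function name and the "undecided" result are hypothetical:

```python
SHORT_JMP = 2                   # EB rel8
NEAR_JMP16 = 3                  # E9 rel16 (16-bit code)
GROW = NEAR_JMP16 - SHORT_JMP   # bytes a pending JMP gains if it becomes long

def classify_backward_jump(target, here, pending_growth):
    """Classify a backward JMP from address `here` to label address `target`.

    Both addresses assume every still-pending forward jump is short.
    `pending_growth` holds, for each pending jump located between the label
    and this jump, how many bytes it would grow if it ended up long.
    """
    best = target - (here + SHORT_JMP)   # rel8 if no pending jump grows
    worst = best - sum(pending_growth)   # rel8 if all of them grow
    if worst >= -128:
        return "short"                   # fits whatever happens
    if best < -128:
        return "long"                    # cannot fit even in the best case
    return "undecided"                   # depends on the pending FRs
```

An "undecided" jump could simply be encoded long (wasting a byte) or be revisited once the pending references between it and its label are resolved.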
Post by Alexei A. Frounze
If the jump's target label is unknown (=defined somewhere ahead), note the position of this jump, choose an 8-bit relative offset and go on until another 127 or more bytes of code are generated. If the target label is encountered somewhere between the instructions from these 127 bytes, keep the 8-bit relative address and go on. Otherwise, restart assembling from that jump instruction but use a 16/32-bit relative address now.
Yeah, but..

jmp Exit ; FR-1
.
.
jmp Exit ; FR-2
.
jmp Error ; FR-1.error
.
jmp Exit ; FR-3
.
.
.
Exit: ; KR - Known Reference

Let's say FR-3 is a short reference.
Let's say FR-2 is a short reference iff FR-1.error is a short reference; otherwise it would be a long one. Presumably FR-1 is a long reference, and FR-1.error is unknown.

How do you hope to cope with this in 'line-by-line' assembly?

(1) So, we have to process FR-3 first, triggered by the appearance of KR.
(2) FR-2 is suspended until FR-1.error is resolved.
(3) FR-3 is likewise suspended.
(4) Actually, FR-1.error needs to be resolved first.

This out-of-order requirement to resolve FR-3 first, coupled with the logical need to wait for FR-1.error to be resolved (which actually has to be done first), sort of indicates a recursive, stacked solution of some sort: perhaps a software stack or, more likely, a dynamic tree structure. A tree node is created when an FR is found and traversed when a matching KR is found; a new FR branches to the right, and right leaves are processed before left leaves.

Parsing the assembly text can build the tree in partial stages.

We still need to handle distances in bytes. Fortunately, we only need to classify them as Short (-128..+127) or Long to fix the relative jmp instruction size.
AD - Assumed Distance in bytes. (Short or Long)
KD - Known Distance in bytes.

So perhaps a node for an FR contains three fields: AD, KD and AsmTextLineNumber (not counting the node links).
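A rough Python rendering of such a node, for illustration only: the class, field and function names are hypothetical, and a flat list scanned newest-first stands in for the tree. When a label is defined, the matching references get their KD filled in, and those whose short assumption failed are reported:

```python
from dataclasses import dataclass
from typing import Optional

SHORT_LEN, LONG_LEN = 2, 3  # JMP rel8 (EB cb) vs JMP rel16 (E9 cw)

@dataclass
class ForwardRef:
    label: str                            # target label, not yet defined
    line: int                             # AsmTextLineNumber of the jump
    address: int                          # where the jump was emitted
    assumed_short: bool = True            # AD: assumed distance class
    known_distance: Optional[int] = None  # KD: set once the label appears

def resolve(pending, label, label_address):
    """Resolve refs to a newly defined label, newest first.

    Returns the refs whose short assumption turned out to be wrong;
    refs to other labels stay in `pending`.
    """
    too_far = []
    for ref in reversed(pending):
        if ref.label != label:
            continue
        length = SHORT_LEN if ref.assumed_short else LONG_LEN
        ref.known_distance = label_address - (ref.address + length)
        if ref.assumed_short and ref.known_distance > 127:
            too_far.append(ref)
    pending[:] = [ref for ref in pending if ref.label != label]
    return too_far

# The example above with made-up addresses: FR-1, FR-2, FR-1.error, FR-3.
pending = [ForwardRef("Exit", 1, 0), ForwardRef("Exit", 4, 60),
           ForwardRef("Error", 6, 100), ForwardRef("Exit", 8, 150)]
too_far = resolve(pending, "Exit", 200)
```

This only detects the failures; it does not handle the knock-on effect described above (growing FR-2 moves Exit further from FR-1 too), which still needs a re-check or a restart.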

Oops, out of time..

hth,

Steve
Post by Alexei A. Frounze
After this pass all label addresses are known and can be encoded into instructions.
This is the basic idea.
There are two flaws, however.
First, assembling the same instructions (up to 127 bytes of them) again is wasteful, so they should be cached.
Second, there may be other jump instructions between the first instruction and its target label and those other jump instructions may also need to be changed from 8-bit relative addresses to 16/32-bit ones, which has the effect of moving the target label farther away and possibly triggering reassembly of one or more of the preceding jumps. You can end up with a chain reaction.
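For comparison, the chain reaction is exactly what the classic relaxation approach handles: start every jump short, lay out addresses, grow each jump whose rel8 no longer fits, and repeat until nothing changes. This is not the windowed scheme proposed here, and it re-walks the whole program each round, so the item sizes must be kept in memory. Because jumps only ever grow, it stops after at most one round per jump plus one, i.e. quadratic in the worst case. A minimal Python sketch, assuming 16-bit code with only unconditional JMPs (EB rel8 = 2 bytes, E9 rel16 = 3 bytes) and a hypothetical item format:

```python
SHORT, LONG = 2, 3  # JMP rel8 (EB cb) vs JMP rel16 (E9 cw) in 16-bit code

def relax(program):
    """Size jumps by iterating to a fixpoint: start short, only ever grow.

    `program` items: ("insn", size), ("jmp", label) or ("label", name);
    every label is assumed to be defined somewhere in `program`.
    Returns (jump sizes keyed by item index, label addresses).
    """
    long_jumps = set()
    while True:
        # Lay out addresses under the current short/long choices.
        labels, jump_at, addr = {}, {}, 0
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                labels[arg] = addr
            elif kind == "jmp":
                jump_at[i] = addr
                addr += LONG if i in long_jumps else SHORT
            else:
                addr += arg
        # Grow every short jump whose rel8 no longer fits.
        grown = False
        for i, start in jump_at.items():
            if i not in long_jumps:
                disp = labels[program[i][1]] - (start + SHORT)
                if not -128 <= disp <= 127:
                    long_jumps.add(i)
                    grown = True
        if not grown:
            return {i: LONG if i in long_jumps else SHORT for i in jump_at}, labels

# Chain reaction: growing the jump to M pushes L out of the first jump's reach.
sizes, labels = relax([("jmp", "L"), ("insn", 100), ("jmp", "M"), ("insn", 25),
                       ("label", "L"), ("insn", 200), ("label", "M")])
```

In the example, the first round only grows the jump to M; that extra byte moves L to 130, so the second round grows the jump to L as well.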
A possible (imperfect) workaround might be this... While assembling instructions that follow a jump instruction, note all instructions whose length isn't known yet (align, other jumps) and maintain a lower and upper bound of the size of the code assembled so far since the jump instruction. If the jump target label is found before the upper bound reaches 127 bytes, the 8-bit relative address can be kept and the process continued. Otherwise it should be switched to 16/32-bit relative address and the code will be reassembled from after the jump.
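The bound tracking described above might look like this in Python for a single pending forward JMP. Assumptions, all hypothetical: 16-bit code, unsized jumps are 2 or 3 bytes, an align n directive pads 0 to n-1 bytes, and the function returns the decision together with both bounds so a caller can tell a certain "long" (lower bound already past 127) from a merely conservative one:

```python
def decide_forward_jump(items, label):
    """Decide rel8 vs rel16 for one pending forward JMP.

    `items` are the ("insn", size), ("jmp", target), ("align", n) and
    ("label", name) entries following the jump. Returns (decision, lo, hi),
    where lo/hi bound the bytes assembled since the end of the jump.
    """
    lo = hi = 0
    for kind, arg in items:
        if kind == "label" and arg == label:
            return "short", lo, hi       # reached while hi <= 127
        if kind == "insn":
            lo, hi = lo + arg, hi + arg
        elif kind == "jmp":
            lo, hi = lo + 2, hi + 3      # not sized yet: EB rel8 .. E9 rel16
        elif kind == "align":
            hi += arg - 1                # 0 .. n-1 bytes of padding
        if hi > 127:
            return "long", lo, hi        # rel8 can no longer be guaranteed
    return "long", lo, hi                # label not seen: far or external
```

With 120 bytes of code followed by "align 16", the upper bound reaches 135 and the jump becomes long even though the real padding might have let it stay short; that is the imperfection of deciding from bounds.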
Instructions like "mov ax, label2 - label1" complicate things further, but such instructions should be rare and I can always choose the longest encoding for them.
With this I should be able to make most jumps short when possible and have relatively little code size overhead from unnecessarily long relative addresses and immediates. The time spent in reassembly should be limited because the reassembly window is short (127 bytes at most) and most of the instructions in the window should not change and can be cached.
What do you think?
Alex
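A compact Python sketch of the proposed scheme, under the same simplifying assumptions (16-bit code, JMP only, hypothetical item format and names). It snapshots the state before every jump instead of caching assembled bytes, which would be too memory-hungry for real-mode DOS but keeps the restart logic visible. Forward jumps start short; once more than 127 bytes follow a short jump without its label (or the label never appears, e.g. it is external), that jump is made long for good and assembling restarts from it. Restoring the snapshot also reopens earlier short jumps whose labels lie past the restart point, which covers the chain reaction and the backward-jump caveat:

```python
SHORT, LONG = 2, 3   # JMP rel8 (EB cb) vs JMP rel16 (E9 cw) in 16-bit code

def assemble_with_restarts(program):
    """Assume short jumps; restart from a jump once its rel8 can't fit.

    `program` items: ("insn", size), ("jmp", label) or ("label", name).
    Returns (jump sizes keyed by item index, label addresses).
    """
    long_jumps, sizes, snap = set(), {}, {}
    i, addr, labels, pending = 0, 0, {}, []   # pending: (index, end, label)
    while True:
        if i == len(program):
            if not pending:
                return sizes, labels
            bad = pending[0][0]               # label never defined: external
        else:
            kind, arg = program[i]
            if kind == "label":
                labels[arg] = addr
                pending = [p for p in pending if p[2] != arg]
            elif kind == "insn":
                addr += arg
            else:                             # "jmp"
                snap[i] = (addr, dict(labels), list(pending))
                if arg in labels:             # backward: distance known now
                    short = labels[arg] - (addr + SHORT) >= -128
                else:                         # forward: optimistic unless grown
                    short = i not in long_jumps
                    if short:
                        pending.append((i, addr + SHORT, arg))
                sizes[i] = SHORT if short else LONG
                addr += sizes[i]
            i += 1
            # A short forward jump is too far once >127 bytes follow it.
            bad = next((k for k, end, _ in pending if addr - end > 127), None)
            if bad is None:
                continue
        long_jumps.add(bad)                   # make it long for good and
        addr, labels, pending = snap[bad]     # restart assembling from it;
        labels, pending = dict(labels), list(pending)
        i = bad                               # earlier open jumps get re-checked

# Chain reaction: growing the jump to M pushes L out of the first jump's reach.
sizes, labels = assemble_with_restarts(
    [("jmp", "L"), ("insn", 100), ("jmp", "M"), ("insn", 25),
     ("label", "L"), ("insn", 200), ("label", "M")])
```

Each restart adds a new jump to the long set, so the number of restarts is bounded by the number of jumps.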
Alexei A. Frounze
2015-02-14 22:52:49 UTC
Post by s***@yahoo.com
Post by Alexei A. Frounze
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar to relative jumps (e.g. "mov ax, label2 - label1")
As for this one, AX takes a word value, so the instruction length is the same either way.
The value of (label2-label1) has to be range-checked and filled in once those addresses are known. Take the edge case of "mov ax, end-of-file - start-of-file": this expression cannot be resolved until all the forward relative jumps have been fixed.
You're right, there's no MOV instruction that loads an immediate shorter than its 16-bit or 32-bit destination. But there's PUSH with a sign-extended byte immediate, and there's the 8-bit signed displacement in instructions encoded with a ModR/M byte.
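For these cases the size choice is the same "does it fit in a signed byte" test as for jumps. A small Python sketch with hypothetical function names: PUSH imm (6A ib vs 68 iw/id, using the mode's default operand size, no 66h prefix) and MOV AX, [BP+disp] in 16-bit code (ModR/M 46h with disp8 vs 86h with disp16). When the value is a label difference that isn't known yet, the instruction size is unknown too, which puts these instructions in the same boat as relative jumps:

```python
import struct

def fits_simm8(value):
    """True if value survives sign-extension from a single byte."""
    return -128 <= value <= 127

def encode_push_imm(value, bits=16):
    """PUSH imm: 6A ib (sign-extended byte) if possible, else 68 iw/id."""
    if fits_simm8(value):
        return bytes([0x6A]) + struct.pack("<b", value)
    fmt = "<H" if bits == 16 else "<I"
    return bytes([0x68]) + struct.pack(fmt, value & ((1 << bits) - 1))

def encode_mov_ax_bp(disp):
    """MOV AX, [BP+disp] (8B /r), 16-bit addressing, reg=AX, rm=110."""
    if fits_simm8(disp):
        return bytes([0x8B, 0x46]) + struct.pack("<b", disp)      # mod=01, disp8
    return bytes([0x8B, 0x86]) + struct.pack("<H", disp & 0xFFFF)  # mod=10, disp16
```

Note that [BP] with a zero displacement still needs the disp8 form, because mod=00 with rm=110 means a direct 16-bit address instead.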
Post by s***@yahoo.com
Post by Alexei A. Frounze
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
This seems to rule out intermediate files, and multiple passes as well? I mean intermediate files that hold dynamic data or tables, to off-load the assembler's internal dynamic data space.
Post by Alexei A. Frounze
- cost not higher than quadratic with small coefficient
So, here's what I have so far...
Consider a normal, line-by-line assembling process.
If a relative jump instruction is encountered and its target label precedes it (=we've seen that label defined), figure out which relative offset should be there (8- or 16/32-bit) and go on.
The caveat here is that there must not be a pending Forward Reference (FR).
Resolving the FR would be needed before this current relative jump to the Known Reference (KR) can be calculated.
Post by Alexei A. Frounze
If the jump's target label is unknown (=defined somewhere ahead), note the position of this jump, choose an 8-bit relative offset and go on until another 127 or more bytes of code are generated. If the target label is encountered somewhere between the instructions from these 127 bytes, keep the 8-bit relative address and go on. Otherwise, restart assembling from that jump instruction but use a 16/32-bit relative address now.
Yeah, but..
jmp Exit ; FR-1
.
.
jmp Exit ; FR-2
.
jmp Error ; FR-1.error
.
jmp Exit ; FR-3
.
.
.
Exit: ; KR - Known Reference
Let's say FR-3 is a short reference.
Let's say FR-2 is a short reference iff FR-1.error is a short reference; otherwise it would be a long one. Presumably FR-1 is a long reference, and FR-1.error is unknown.
How do you hope to cope with this in 'line-by-line' assembly?
(1) So, we have to process FR-3 first, triggered by the appearance of KR.
(2) FR-2 is suspended until FR-1.error is resolved.
(3) FR-3 is likewise suspended.
(4) Actually, FR-1.error needs to be resolved first.
This out-of-order requirement to resolve FR-3 first, coupled with the logical need to wait for FR-1.error to be resolved (which actually has to be done first), sort of indicates a recursive, stacked solution of some sort: perhaps a software stack or, more likely, a dynamic tree structure. A tree node is created when an FR is found and traversed when a matching KR is found; a new FR branches to the right, and right leaves are processed before left leaves.
Parsing the assembly text can build the tree in partial stages.
We still need to handle distances in bytes. Fortunately, we only need to classify them as Short (-128..+127) or Long to fix the relative jmp instruction size.
AD - Assumed Distance in bytes. (Short or Long)
KD - Known Distance in bytes.
So perhaps a node for an FR contains three fields: AD, KD and AsmTextLineNumber (not counting the node links).
I don't see a problem here. If the label Error isn't found within 127 bytes of jmp Error, it will be assumed to be far away (and to need a 16/32-bit relative address) or external, and assembling will restart at jmp Error. I don't need to change the order of assembling, just restart from an earlier point.

Alex