Alexei A. Frounze
2015-02-10 09:32:24 UTC
I'm still contemplating the idea of implementing a simple assembler for Smaller C to make it fully self-sufficient, easily portable (someone asked for a C compiler for xv6 on Stack Overflow) and a tad faster as it turns out that NASM can be horribly slow (I think I've mentioned that before).
But there's still one unsolved technical problem. Namely, I need NASM's ability to automatically substitute short and long jumps as necessary, that is, either with an 8-bit relative address or with a 16/32-bit one, depending on how far the target location is from the jump instruction.
It looks like it's not a trivial problem. I may have inquired about it before (Rod might be able to confirm), but I don't remember ever finding or learning a reasonably good solution.
There's one solution that I came up during the weekend, though. I wonder if you could suggest improvements or something radically better. Before I state it I should probably note that I've considered a dynamic programming solution, but it looks like the problem can't be trivially reduced to identical subproblems.
Components of the problem:
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar to relative jumps (e.g. "mov ax, label2 - label1")
Desirable qualities of the solution:
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
- cost not higher than quadratic with small coefficient
So, here's what I have so far...
Consider a normal, line-by-line assembling process.
If a relative jump instruction is encountered and its target label precedes it (=we've seen that label defined), figure out which relative offset should be there (8- or 16/32-bit) and go on.
If the jump's target label is unknown (=defined somewhere ahead), note the position of this jump, chose an 8-bit relative offset and go on until another 127 or more bytes of code are generated. If the target label is encountered somewhere between the instructions from these 127 bytes, keep the 8-bit relative address and go on. Otherwise, restart assembling from that jump instruction but use a 16/32-bit relative address now.
After this pass all label addresses are known and can be encoded into instructions.
This is the basic idea.
There are two flaws, however.
First, assembling the same 127 instructions again is bad. So, they should be cached.
Second, there may be other jump instructions between the first instruction and its target label and those other jump instructions may also need to be changed from 8-bit relative addresses to 16/32-bit ones, which has the effect of moving the target label farther away and possibly triggering reassembly of one or more of the preceding jumps. You can end up with a chain reaction.
A possible (imperfect) workaround might be this... While assembling instructions that follow a jump instruction, note all instructions whose length isn't known yet (align, other jumps) and maintain a lower and upper bound of the size of the code assembled so far since the jump instruction. If the jump target label is found before the upper bound reaches 127 bytes, the 8-bit relative address can be kept and the process continued. Otherwise it should be switched to 16/32-bit relative address and the code will be reassembled from after the jump.
Instructions like "mov ax, label2 - label1" complicate things further, but such instructions should be rare and I can always choose the longest encoding for them.
With this should be able to make most jumps short when possible and have relatively little code size overhead from the unnecessarily long relative addresses and immediates. The time spent in reassembly should be limited because the reassembly window is short (127 bytes at most) and most of the instructions in the window should not change and can be cached.
What do you think?
Alex
But there's still one unsolved technical problem. Namely, I need NASM's ability to automatically substitute short and long jumps as necessary, that is, either with an 8-bit relative address or with a 16/32-bit one, depending on how far the target location is from the jump instruction.
It looks like it's not a trivial problem. I may have inquired about it before (Rod might be able to confirm), but I don't remember ever finding or learning a reasonably good solution.
There's one solution that I came up during the weekend, though. I wonder if you could suggest improvements or something radically better. Before I state it I should probably note that I've considered a dynamic programming solution, but it looks like the problem can't be trivially reduced to identical subproblems.
Components of the problem:
- regular instructions of known/fixed length
- relative jump instructions whose length isn't known beforehand
- align directives
- other instructions whose length may be subject to optimization similar to relative jumps (e.g. "mov ax, label2 - label1")
Desirable qualities of the solution:
- little memory overhead (should work in real mode in DOS)
- little I/O overhead
- cost not higher than quadratic with small coefficient
So, here's what I have so far...
Consider a normal, line-by-line assembling process.
If a relative jump instruction is encountered and its target label precedes it (=we've seen that label defined), figure out which relative offset should be there (8- or 16/32-bit) and go on.
If the jump's target label is unknown (=defined somewhere ahead), note the position of this jump, chose an 8-bit relative offset and go on until another 127 or more bytes of code are generated. If the target label is encountered somewhere between the instructions from these 127 bytes, keep the 8-bit relative address and go on. Otherwise, restart assembling from that jump instruction but use a 16/32-bit relative address now.
After this pass all label addresses are known and can be encoded into instructions.
This is the basic idea.
There are two flaws, however.
First, assembling the same 127 instructions again is bad. So, they should be cached.
Second, there may be other jump instructions between the first instruction and its target label and those other jump instructions may also need to be changed from 8-bit relative addresses to 16/32-bit ones, which has the effect of moving the target label farther away and possibly triggering reassembly of one or more of the preceding jumps. You can end up with a chain reaction.
A possible (imperfect) workaround might be this... While assembling instructions that follow a jump instruction, note all instructions whose length isn't known yet (align, other jumps) and maintain a lower and upper bound of the size of the code assembled so far since the jump instruction. If the jump target label is found before the upper bound reaches 127 bytes, the 8-bit relative address can be kept and the process continued. Otherwise it should be switched to 16/32-bit relative address and the code will be reassembled from after the jump.
Instructions like "mov ax, label2 - label1" complicate things further, but such instructions should be rare and I can always choose the longest encoding for them.
With this should be able to make most jumps short when possible and have relatively little code size overhead from the unnecessarily long relative addresses and immediates. The time spent in reassembly should be limited because the reassembly window is short (127 bytes at most) and most of the instructions in the window should not change and can be cached.
What do you think?
Alex