Discussion:
MSDOS divide by zero
muta...@gmail.com
2021-05-05 02:43:30 UTC
Now that I've had some success with 80386, I
wondered why MSDOS didn't generate MVS-style
debug information when it crashed.

So I used the same divide-by-zero crash code, as
an MSDOS executable.

Freedos printed out "Divide error" and that's it.

PDOS/86 hung.

I don't have an INT 0 handler in PDOS/86, but I
had an expectation that the BIOS would have
installed default handlers for all exception conditions,
even if it was just "INT 0H invoked" and halted.

I put in this code:

printf("interrupt 0 handler is %p\n", *(char **)0);

and was surprised to see this under both Freedos and PDOS/86:

interrupt 0 handler is 0072:CB5B

I was expecting 0000:0000 for PDOS/86, to explain
the hang.

And I didn't expect the values to be the same.

I am also surprised that the address doesn't seem
to point to anything sensible. That is low memory
which seems to be used by the stack of my IO.SYS and
MSDOS.SYS.
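A minimal sketch of decoding an IVT entry by hand, assuming a copy of the first 1 KiB of memory is already available as a byte array (obtaining it, e.g. through a far pointer, is not shown; the struct and helper names are hypothetical). Each vector is 4 bytes, offset first and then segment, both little-endian, and the real-mode linear address is segment * 16 + offset. By that arithmetic 0072:CB5B is linear 0xD27B (roughly 52 KiB, so it is indeed low memory), while F000:FF53 is 0xFFF53, inside the BIOS ROM.

```c
#include <stdint.h>

/* A real-mode far pointer. In the IVT each 4-byte entry stores the
   16-bit offset first, then the 16-bit segment, both little-endian. */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

/* Decode interrupt vector 'vec' from a copy of the 1 KiB IVT at 0000:0000. */
static struct far_ptr ivt_entry(const uint8_t ivt[1024], unsigned vec)
{
    const uint8_t *p = ivt + 4u * (vec & 0xFFu);
    struct far_ptr fp;

    fp.off = (uint16_t)(p[0] | (p[1] << 8));
    fp.seg = (uint16_t)(p[2] | (p[3] << 8));
    return fp;
}

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear_addr(struct far_ptr fp)
{
    return ((uint32_t)fp.seg << 4) + fp.off;
}
```

For example, an IVT whose first four bytes are 5B CB 72 00 decodes to 0072:CB5B for INT 0.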

Anyone know what is happening?

Thanks. Paul.
muta...@gmail.com
2021-05-05 04:33:33 UTC
Post by ***@gmail.com
Now that I've had some success with 80386, I
wondered why MSDOS didn't generate MVS-style
debug information when it crashed.
I've thought of one reason - lack of memory protection
so the OS can't protect itself. You'll never get a Trap D.

Is there any option available on the 80386 (e.g. V86) or even
later processors, such as "unreal mode", that would
allow e.g. paging to be enabled, so that when PDOS/86
(not PDOS/386) is running, before it executes application
code it marks the OS pages, and interrupts, as read-only?
The 16-bit OS would still be using segmented memory,
the only difference is that address ranges below
40000 or whatever would be read-only.
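On the V86 question: a 386 in virtual-8086 mode with paging enabled supports roughly this, because V86 code runs at privilege level 3 and a write to a page whose R/W bit is clear raises a page fault (INT 0Eh) in the 32-bit monitor. A minimal sketch of the page-table side only, assuming "40000" means hex 0x40000 (256 KiB), 4 KiB pages and an identity-mapped first megabyte; the monitor, the IDT and the switch into V86 mode are not shown, and the macro and function names are hypothetical. One caveat: the BIOS data area at 0x400 shares the first 4 KiB page with the IVT and is updated by BIOS interrupt handlers (the timer tick, for example), so a real monitor would have to permit or emulate those writes, and PDOS/86 itself would need the monitor to lift the protection while kernel code runs.

```c
#include <stdint.h>

/* 32-bit (non-PAE) page-table entry flags. */
#define PTE_PRESENT   0x001u
#define PTE_WRITABLE  0x002u   /* R/W: clear = read-only for CPL 3 (V86) */
#define PTE_USER      0x004u   /* U/S: accessible from CPL 3 */

#define V86_PAGE_SIZE 0x1000u
#define LOW_MB_PAGES  256u     /* 1 MiB / 4 KiB */

/* Identity-map the first 1 MiB for a V86 task. Pages below
   'protect_limit' are readable but not writable from V86 code, so a
   stray write there (e.g. through a NULL far pointer into the IVT)
   raises a page fault in the 32-bit monitor instead of silently
   corrupting the OS. */
static void build_low_page_table(uint32_t pt[LOW_MB_PAGES], uint32_t protect_limit)
{
    for (uint32_t i = 0; i < LOW_MB_PAGES; i++) {
        uint32_t addr  = i * V86_PAGE_SIZE;
        uint32_t flags = PTE_PRESENT | PTE_USER;

        if (addr >= protect_limit)
            flags |= PTE_WRITABLE;
        pt[i] = addr | flags;
    }
}
```

With protect_limit = 0x40000 the first 64 entries are read-only and the remaining 192 are writable.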

That means you can't hook an interrupt unless you go
through the official channel, but that's fine by me too.

It's not designed to stop a nefarious application from
hooking INT 21H, it's designed to protect against
genuine NULL-pointer assignment and things like that.
A graceful crash, with PDOS/86 printing out/logging the
registers and stack trace.

BFN. Paul.
muta...@gmail.com
2021-05-05 05:13:29 UTC
Post by ***@gmail.com
printf("interrupt 0 handler is %p\n", *(char **)0);
interrupt 0 handler is 0072:CB5B
It looks like it is a compiler (Watcom) issue.

I changed the code to:

char **m3 = 0;

printf("interrupt 0 handler is %p\n", *m3);

and the assembler looked much better, and I got sensible
and different values, and PDOS/86 has:

interrupt 0 handler is F000:FF53

which looks heaps better, and is presumably a silent
"halt" or something in the BIOS.

So I'm in a position to replace INT 0.

BFN. Paul.
muta...@gmail.com
2021-05-05 05:38:46 UTC
Post by ***@gmail.com
interrupt 0 handler is F000:FF53
which looks heaps better, and is presumably a silent
"halt" or something in the BIOS.
You know, this is pretty deadly. It means the computer
freezes with no indication of what went wrong. If the
BIOS is going to do that, then it is imperative that the
OS replaces each deadly BIOS handler.

How many are there like this? I guess it is hardware-generated
interrupts we're talking about. And I guess they are different
for 8086 and 80386. Well, for the 80386 in protected mode, interrupt
handlers are completely non-existent. And I don't think I can just get
the 80386 interrupts to call their real mode equivalents by default.
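On "how many are there": a sketch of a lookup table for the CPU exceptions that arrive through the real-mode IVT, keyed by the first CPU family that raises each one. The table, field names and function are illustrative and approximate, not exhaustive (x87 error reporting, for one, is left out). Two things make these awkward on a PC: an AT-compatible BIOS programs the master 8259 to vectors 08h-0Fh, so an exception such as 0Dh shares a vector with a hardware IRQ (IRQ5), and INT 5 is also the BIOS Print Screen service. The table also suggests why the machine hung: on IBM-compatible BIOSes, Bochs's included, F000:FF53 is traditionally just an IRET, and since a 286 or later saves the address of the faulting instruction, the IRET likely re-executes the IDIV, which faults again, indefinitely.

```c
#include <stddef.h>

/* CPU exceptions delivered through the real-mode IVT, with the first
   CPU family that raises each one (86 = 8086/8088). Approximate, not
   exhaustive. */
struct rm_exception {
    unsigned vector;
    unsigned min_cpu;
    const char *name;
};

static const struct rm_exception rm_exceptions[] = {
    { 0x00,  86, "divide error" },
    { 0x01,  86, "single step" },
    { 0x02,  86, "NMI" },
    { 0x03,  86, "breakpoint (INT3)" },
    { 0x04,  86, "overflow (INTO)" },
    { 0x05, 186, "BOUND range exceeded" },   /* also BIOS Print Screen */
    { 0x06, 186, "invalid opcode" },
    { 0x07, 286, "coprocessor not available" },
    { 0x0D, 286, "segment overrun / general protection" }, /* shares IRQ5 */
};

/* Name of the exception raised through 'vector' on CPU family 'cpu',
   or NULL if that CPU raises none there (the vector may still be used
   by hardware IRQs or software interrupts). */
static const char *rm_exception_name(unsigned vector, unsigned cpu)
{
    size_t n = sizeof rm_exceptions / sizeof rm_exceptions[0];

    for (size_t i = 0; i < n; i++)
        if (rm_exceptions[i].vector == vector && cpu >= rm_exceptions[i].min_cpu)
            return rm_exceptions[i].name;
    return NULL;
}
```

An OS that wants diagnostics for all of these would install a handler for each listed vector, and for the shared ones would have to check the 8259's in-service register to tell an IRQ from an exception.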

Thanks. Paul.
muta...@gmail.com
2021-05-05 07:11:21 UTC
I am having considerable success:

A:\]pdptest
warning - failed to open pdptest.com
welcome to pdptest
main function is at 561B:0000
interrupt 0 handler is 1079:4810
m1 is at 6315:0FC2
m2 is at 6315:0FC6
m1 is 0000:0000
m1 is now 0000:0005
got a divide by zero
AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
regptrs is 6315:0FA6
halting


But I have noticed that the IP is C1.

Here is the disassembly:

00C1 F7 FB idiv bx

The documentation says:

http://www.ctyme.com/intr/rb-0001.htm

Notes: On an 8086/8088, the return address points to the following instruction. On an 80286+, the return address points to the divide instruction. An 8086/8088 will generate this interrupt if the result of a division is 80h (byte) or 8000h (word)


But I'm getting the same behavior from 8086 as 80386,
i.e. it is pointing to the faulting instruction.

Maybe it's because Bochs is configured as an 80386
rather than 8086, even though it is running real mode.
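The RBIL note is consistent with what was observed if Bochs is emulating a 386: from the 286 on, the saved CS:IP of a divide error points at the faulting instruction itself, so a dump routine can decode the bytes at that address to display it. On an 8086 the saved IP is past the instruction, and stepping backwards over variable-length x86 code is not reliable in general. A minimal sketch that decodes only the register form of the F7 group (opcode F7 followed by a ModRM byte with mod = 3), which is enough for F7 FB; the array, struct and function names are hypothetical.

```c
#include <stdint.h>

/* Operations of the F7 group, indexed by ModRM bits 5-3
   (/1 is an undocumented alias of /0 on most x86 CPUs). */
static const char *const grp3_ops[8] = {
    "test", "test", "not", "neg", "mul", "imul", "div", "idiv"
};

/* 16-bit registers, indexed by ModRM bits 2-0 when mod == 3. */
static const char *const reg16_names[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

struct grp3_insn {
    const char *mnemonic;
    const char *operand;
    unsigned length;   /* total bytes, including the F7 opcode */
};

/* Decode "F7 modrm" where the operand is a register (mod == 3).
   Returns 1 on success, 0 for memory operands, which this sketch
   does not handle. */
static int decode_f7_reg(uint8_t modrm, struct grp3_insn *out)
{
    unsigned mod = modrm >> 6;
    unsigned reg = (modrm >> 3) & 7u;   /* selects the operation */
    unsigned rm  = modrm & 7u;          /* selects the register  */

    if (mod != 3)
        return 0;
    out->mnemonic = grp3_ops[reg];
    out->operand  = reg16_names[rm];
    out->length   = (reg <= 1) ? 4u : 2u;  /* TEST r16, imm16 carries 2 more bytes */
    return 1;
}
```

For F7 FB, ModRM FB is binary 11 111 011: mod 3, operation 7 (idiv), register 3 (bx), two bytes long.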

BTW, this is not rocket science. Why didn't MSDOS
print out the above diagnostics? And I haven't
finished yet.

BFN. Paul.
muta...@gmail.com
2021-05-05 11:32:23 UTC
Post by ***@gmail.com
BTW, this is not rocket science. Why didn't MSDOS
print out the above diagnostics? And I haven't
finished yet.
I guess the main thing you want MSDOS to diagnose
is memory access violations, but the 8086 can't do
that, so maybe that's why they didn't bother with
sophisticated diagnosis. Just debugging divide by
zero situations isn't going to help much.

But an alternate processor, that is real mode plus
memory protection, could have made MSDOS
diagnosis come alive (and under Bochs we have
the ability to change the hardware).

Regardless, I've hit two problems.

One is that Watcom seems to be generating assembler
that doesn't handle bp the way I expected. Here is the code:

0000 main_:
0000 52 push dx
0001 56 push si
0002 57 push di
0003 55 push bp
0004 89 E5 mov bp,sp
0006 83 EC 26 sub sp,0x0026

The fact that it is only pushing bp after pushing some
other random registers means I can't get the return
address, and the next bp in the chain, from a predictable
spot. I think there needs to be a convention for MSDOS
like MVS has. Since I have source code to Watcom, GCC
and Smaller C, I'll see if some sensible standard can
be created.
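What such a convention would buy can be shown with a minimal BP-chain walker, assuming a hypothetical rule that every function begins with `push bp` / `mov bp,sp` and is reached by a near call, so [bp] holds the caller's BP and [bp+2] the return IP. With the Watcom prologue above (dx, si and di pushed before bp), the return IP sits at [bp+8] in that function instead, and the offset changes with how many registers each function saves, which is exactly the problem. The stack segment is modelled as a 64 KiB byte array; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit word from a 64 KiB segment image;
   the offset arithmetic wraps at 64 KiB like a real-mode segment. */
static uint16_t rd16(const uint8_t *seg, uint16_t off)
{
    return (uint16_t)(seg[off] | (seg[(uint16_t)(off + 1)] << 8));
}

/* Walk the BP chain in a 64 KiB copy of the stack segment, assuming the
   hypothetical convention "push bp / mov bp,sp" plus near calls:
   [bp] = caller's BP, [bp+2] = return IP. Stores up to 'max' return IPs
   in 'out' and returns how many were found. Stops at BP == 0 or when
   the chain does not move up the stack (end of chain or corruption). */
static size_t walk_bp_chain(const uint8_t *ss_img, uint16_t bp,
                            uint16_t *out, size_t max)
{
    size_t n = 0;

    while (bp != 0 && n < max) {
        uint16_t caller_bp = rd16(ss_img, bp);
        out[n++] = rd16(ss_img, (uint16_t)(bp + 2));
        if (caller_bp <= bp)
            break;
        bp = caller_bp;
    }
    return n;
}
```

The check that each saved BP is higher than the current one relies on the stack growing downwards, and keeps a corrupted chain from looping forever.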

The next problem is that wdis claims to have generated
this code:

07DF 8C D9 mov cx,ds
07E1 9A 00 00 00 00 call main_
07E6 89 C3 mov bx,ax

But after a lot of effort, I found out that that is not true,
the actual code is:

005F30 D90E3EE8 3AA189C3 0EE80800 89D889EC ..>.:...........

0E 3EE8 3AA1

I'm guessing that this constitutes a relative branch,
and that has the benefit of not requiring relocation
at load time.

However, that return address, offset 07E6 you see above,
is actually address 561B:5EC6

And it is supposed to be calling main_ at 561B:0000, which
is the load point (but not the entry point).

I've crunched that number 3EE8 3AA1, i.e. A13A:E83E
every which way I can think of, and I can see that the
Axxx is going to do an address wrap at A000 and
effectively do a subtraction, which is what we need
to go backwards, but I don't see how it can get back
to 561B:0000.

I also tried using debug:

C:\DEVEL\PDOS\SRC>debug
-e 561b:5ec1
561B:5EC1 00.0e 00.3e 00.e8 00.3a 00.a1 00.
-u 561b:5ec1
561B:5EC1 0E PUSH CS
561B:5EC2 3E SEG DS (unused)
561B:5EC3 E83AA1 CALL 0000
561B:5EC6 0000 ADD [BX+SI],AL
561B:5EC8 0000 ADD [BX+SI],AL

But that x'0e' seems to be an unknown instruction.

Any ideas?

Thanks. Paul.



AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
module loaded at 561B:0000, entry point 561B:551E
interrupt address is 561B:00C1
regptrs is 6315:0FA6
previous function's return address is 561B:5EC6
byte at retaddr is 89
byte at retaddr + 1 is C3
byte at retaddr - 5 is 0E
calladdr after subtracting 5 is 561B:5EC1
at calladdr - 2 is 8C
at calladdr - 1 is D9
byte at calladdr is 0E
byte at calladdr + 1 is 3E
byte at calladdr + 2 is E8
byte at calladdr + 3 is 3A
byte at calladdr + 4 is A1
byte at calladdr + 5 is 89
it probably called A13A:E83E
or maybe A13AE83E
call by relative address as indicated by 0E
newaddr is 561B:4704
halting
muta...@gmail.com
2021-05-07 10:20:31 UTC
On Wednesday, May 5, 2021 at 9:32:24 PM UTC+10, ***@gmail.com wrote:

This mystery has now been solved.
Post by ***@gmail.com
-u 561b:5ec1
561B:5EC1 0E PUSH CS
561B:5EC2 3E SEG DS (unused)
561B:5EC3 E83AA1 CALL 0000
561B:5EC6 0000 ADD [BX+SI],AL
561B:5EC8 0000 ADD [BX+SI],AL
But actually it is exactly correct.

CS is pushed, allowing a far return.
Then a stupid DS override is done. Don't know why
a NOP wasn't used.
Then a near call is done, since this code all fits in
a 64k segment.
The call value is A13A, which, when added to 5EC6
causes an offset wraparound to 0.
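The same analysis can be automated in a dump routine. A minimal sketch, assuming a 64 KiB image of the code segment and a return IP taken from the stack (the function names are hypothetical): it looks for E8 rel16 three bytes back, or 9A off16 seg16 five bytes back, and for E8 computes the target modulo 64 KiB, which turns 5EC6 + A13A into 0000. It is only a heuristic: data bytes can happen to look like a call, and for Watcom's PUSH CS / DS prefix / CALL NEAR sequence it reports the near call, with the PUSH CS meaning the callee will return far.

```c
#include <stdint.h>

/* Read a little-endian word from a 64 KiB code segment image. */
static uint16_t cs_word(const uint8_t *cs_img, uint16_t off)
{
    return (uint16_t)(cs_img[off] | (cs_img[(uint16_t)(off + 1)] << 8));
}

/* Guess which call instruction precedes 'ret_ip'. Recognises:
     E8 rel16          near call, 3 bytes, target = ret_ip + rel16 (mod 64K)
     9A off16 seg16    far call, 5 bytes, absolute target
   Returns 1 and sets *seg:*off on a match (seg = cs for a near call),
   0 if neither pattern is present. */
static int guess_call_target(const uint8_t *cs_img, uint16_t cs, uint16_t ret_ip,
                             uint16_t *seg, uint16_t *off)
{
    uint16_t near_at = (uint16_t)(ret_ip - 3);
    uint16_t far_at  = (uint16_t)(ret_ip - 5);

    if (cs_img[near_at] == 0xE8) {
        uint16_t rel = cs_word(cs_img, (uint16_t)(near_at + 1));
        *seg = cs;
        *off = (uint16_t)(ret_ip + rel);   /* 16-bit wraparound */
        return 1;
    }
    if (cs_img[far_at] == 0x9A) {
        *off = cs_word(cs_img, (uint16_t)(far_at + 1));
        *seg = cs_word(cs_img, (uint16_t)(far_at + 3));
        return 1;
    }
    return 0;
}
```

Fed the bytes 0E 3E E8 3A A1 at 561B:5EC1 and the return IP 5EC6, it reports a near call to 561B:0000.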

CS:0000 is actually what we're after. Now my dump
looks like this:

got a divide by zero
AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
module loaded at 561B:0000, entry point 561B:551E
interrupt address is 561B:00C1
regptrs is 6315:0FA6
retaddr is 561B:5EC6
and that caller might have done a near call to 561B:0000
halting


And I have confirmed that Watcom is generating everything
needed to debug this problem, if PDOS/86 provides the
raw information.

wcl with "-d1" followed by wdis with "-s" gives:

rc = main(argc, argv);
07DC L$14:
07DC BB 20 48 mov bx,offset DGROUP:L$25
07DF 8C D9 mov cx,ds
07E1 9A 00 00 00 00 call main_
07E6 89 C3 mov bx,ax

Executables could do with a standardized calling
convention so that PDOS/86 can provide a nice
stack traceback. Otherwise a custom tool would need to
be written, which works so long as PDOS/86 provides the raw
data in its dump file.

Then debugging MSDOS applications won't look much
different to MVS. I'm happy with MVS.

BFN. Paul.
wolfgang kern
2021-05-05 16:23:36 UTC
I wondered why MSDOS didn't generate MVS-style
debug information when it crashed.
DOS (depending on version) creates its own exception handlers.
So I used the same divide-by-zero crash code, as
an MSDOS executable.
My test for div0 is just two bytes: D4 00.
Freedos printed out "Divide error" and that's it.
PDOS/86 hung.
I don't have an INT 0 handler in PDOS/86, but I
had an expectation that the BIOS would have
installed default handlers for all exception conditions,
even if it was just "INT 0H invoked" and halted.
printf("interrupt 0 handler is %p\n", *(char **)0);
this is just text, and who knows what the compiler may produce for it?
interrupt 0 handler is 0072:CB5B
I was expecting 0000:0000 for PDOS/86, to explain
the hang.
[address 0:0 is the location of the INT_0 far pointer]
[this pointer is initially set by BIOS to FFFF:.... ]
[DOS and all other OS may alter or hook location 0:0 ]
And I didn't expect the values to be the same.
your PDOS may just use a copy of FreeDos?
I am also surprised in that that address doesn't seem
to point to anything sensible. That is low memory
which seems to be used by my IO.SYS and MSDOS.SYS
stack.
Anyone know what is happening?
0072:CB5B seems odd to me but may actually really point
to an exception handler. Easy to check with DOS-debug:

RCS 0072 U CB5B

Be aware that every DOS handler creates a stack in a pretty
weird way. There are always a few data bytes within the code,
perhaps just to fool disassembly attempts like the one you tried.
__
wolfgang
muta...@gmail.com
2021-05-05 21:09:47 UTC
Post by wolfgang kern
I wondered why MSDOS didn't generate MVS-style
debug information when it crashed.
DOS (depending on version) creates its own exception handlers.
Yes, and why didn't those exception handlers
print out registers etc to aid in debugging?
Post by wolfgang kern
printf("interrupt 0 handler is %p\n", *(char **)0);
this is just text, and who knows what the compiler may produce for it?
Not sure what you mean by that.

The code looks OK to me, but obviously didn't
work. Maybe it needs to see a (void *)0 first.
Post by wolfgang kern
And I didn't expect the values to be the same.
your PDOS may just use a copy of FreeDos?
What do you mean by that? They are independent
products, and I reboot to switch between them.

BFN. Paul.
wolfgang kern
2021-05-05 22:23:08 UTC
Post by ***@gmail.com
Post by wolfgang kern
I wondered why MSDOS didn't generate MVS-style
debug information when it crashed.
DOS (depending on version) creates its own exception handlers.
Yes, and why didn't those exception handlers
print out registers etc to aid in debugging?
Post by wolfgang kern
printf("interrupt 0 handler is %p\n", *(char **)0);
DOS exception handlers won't print debug info because the
debugger is an add-on file and not part of the OS.
windoze, Loonix, KESYS and most other OSes include a debugger;
DOS just didn't.
Post by ***@gmail.com
Post by wolfgang kern
this is just text, and who knows what the compiler may produce for it?
Not sure what you mean by that.
I can't see what code the compiler produced for this.
Post by ***@gmail.com
The code looks OK to me, but obviously didn't
work. Maybe it needs to see a (void *)0 first.
Post by wolfgang kern
And I didn't expect the values to be the same.
your PDOS may just use a copy of FreeDos?
What do you mean by that? They are independent
products, and I reboot to switch between them.
independent, but use identical pointers? :)

Have you tried this (repeat):

0072:CB5B seems odd to me but may actually really point
to an exception handler. Easy to check with DOS-debug:

RCS 0072 U CB5B

Be aware that every DOS handler creates a stack in a pretty
weird way. There are always a few data bytes within the code,
perhaps just to fool disassembly attempts like the one you tried.
__
wolfgang
muta...@gmail.com
2021-05-05 23:36:18 UTC
Post by wolfgang kern
Post by ***@gmail.com
Yes, and why didn't those exception handlers
print out registers etc to aid in debugging?
Post by wolfgang kern
Post by ***@gmail.com
printf("interrupt 0 handler is %p\n", *(char **)0);
DOS exception handlers won't print debug info because the
debugger is an add-on file and not part of the OS.
windoze, Loonix, KESYS and most other OSes include a debugger;
DOS just didn't.
I think we're talking cross-purposes.

I'm not talking about a debugger. I'm talking about
diagnosis information.

On any system, when an application divides by 0,
you need to find out what went wrong.

To do that you need to know the name of the program
that crashed, where it was loaded in memory, what
the CS:IP was when it crashed, what the registers
were when it crashed, and what the stack traceback
was, and then the entire application memory, code
plus data plus stack plus BSS plus heap.
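The register part of such a dump is simple to produce once the exception handler has saved the registers. A minimal sketch that formats them in the same three-line layout as the PDOS/86 divide-by-zero output earlier in the thread; the struct and function names are hypothetical, and a real dump would also want SS:SP for the stack traceback.

```c
#include <stdint.h>
#include <stdio.h>

/* Registers as saved by a (hypothetical) INT 0 handler. */
struct saved_regs {
    uint16_t ax, bx, cx, dx, si, di, ds, es, bp, cs, ip, flags;
};

/* Format the registers in the three-line layout of the PDOS/86 dump.
   Returns the number of characters snprintf wanted to write. */
static int format_regs(char *buf, size_t size, const struct saved_regs *r)
{
    return snprintf(buf, size,
                    "AX %04X BX %04X CX %04X DX %04X\n"
                    "SI %04X DI %04X DS %04X ES %04X\n"
                    "BP %04X CS %04X IP %04X FLAGS %04X\n",
                    (unsigned)r->ax, (unsigned)r->bx, (unsigned)r->cx, (unsigned)r->dx,
                    (unsigned)r->si, (unsigned)r->di, (unsigned)r->ds, (unsigned)r->es,
                    (unsigned)r->bp, (unsigned)r->cs, (unsigned)r->ip, (unsigned)r->flags);
}
```

Filled with the values from that crash (AX 0005, CX 5C12, CS 561B, IP 00C1 and so on), it reproduces the three register lines exactly.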

You then send all that information to the author of
the failing program, or possibly the OS vendor (you
never know who has the bug, and it could be yet
another bit of software that caused the problem).
They then *attempt* to figure out what went wrong.
They may fail, and require the problem to be
reproducible before they can fix it. But normally the
dump is sufficient. Or they may ask for tracing to be
activated.

They may have some debug tools/debugger, but
that is a separate question, beyond scope.

That's what I'm used to on MVS, way back when I
started in the 1980s.

The concept of printing "Divide error" and that's all,
is beyond my comprehension.

What is the end user supposed to do when he sees
that printed on his screen instead of perhaps the
customer's date of birth which is what he was trying
to obtain when he ran his application?

If his application is called CUSTINFO.EXE, is he
supposed to look up the CUSTINFO manual and
see what "Divide error" means when returned as a
date of birth?

What are YOUR expectations?
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
your PDOS may just use a copy of FreeDos?
What do you mean by that? They are independent
products, and I reboot to switch between them.
independent, but use identical pointers? :)
Are you suggesting that I took source code from
Freedos to use in PDOS/86?

No, I did not. Not one single byte. I'm not sure I've
ever even looked at the Freedos kernel. It's a waste
of time, and even harmful, as I may become stuck in
a rut if I see how someone else did something. I didn't
look at MSDOS either. I'm not interested in other
people's copyrighted code when I'm trying to replace
it. I prefer a clean room implementation, and
ask for technical details in English in this forum. I would
look at the MSDOS source code if it was the only source
of some technical information though. But I'm not even
writing in assembler anyway.

I did look at the Freedos fdisk though, to see what it would
take to port to PDOS/386. But since then I have had my
own idea for how to write an fdisk equivalent, which is
radically different, and is to treat the whole disk as a file
and use fseek/fwrite to implement fdisk.
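A minimal sketch of the fdisk-via-fseek/fwrite idea, assuming the OS can present the whole disk as an ordinary binary FILE (how PDOS would expose that is not shown; a disk image or tmpfile() stands in for it, and the macro and function names are hypothetical). It relies only on the standard MBR layout: four 16-byte partition entries starting at byte 446 (0x1BE) of sector 0, and the 55 AA signature at bytes 510-511. It fills in only the LBA start and length and leaves the CHS fields zero, which older BIOSes and DOS versions that rely on CHS would not accept.

```c
#include <stdint.h>
#include <stdio.h>

#define MBR_TABLE_OFFSET 446L   /* 0x1BE: first of four partition entries */
#define MBR_ENTRY_SIZE   16
#define MBR_SIG_OFFSET   510L   /* bytes 0x55, 0xAA */

/* Store a 32-bit value little-endian. */
static void put32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Write MBR partition entry 'index' (0-3) of a disk opened as a binary
   FILE, plus the boot signature. Only the LBA fields are filled in; the
   CHS fields stay zero in this sketch. Returns 0 on success, -1 on error. */
static int write_partition(FILE *disk, int index, uint8_t type,
                           uint32_t lba_start, uint32_t sectors, int active)
{
    static const uint8_t sig[2] = { 0x55, 0xAA };
    uint8_t e[MBR_ENTRY_SIZE] = { 0 };

    if (index < 0 || index > 3)
        return -1;
    e[0] = active ? 0x80 : 0x00;   /* boot indicator */
    e[4] = type;                   /* system ID, e.g. 0x06 = FAT16 */
    put32le(e + 8, lba_start);     /* first sector (LBA) */
    put32le(e + 12, sectors);      /* length in sectors */

    if (fseek(disk, MBR_TABLE_OFFSET + (long)index * MBR_ENTRY_SIZE, SEEK_SET) != 0
        || fwrite(e, 1, sizeof e, disk) != sizeof e)
        return -1;
    if (fseek(disk, MBR_SIG_OFFSET, SEEK_SET) != 0
        || fwrite(sig, 1, sizeof sig, disk) != sizeof sig)
        return -1;
    return fflush(disk) == 0 ? 0 : -1;
}
```

Reading a partition back is the mirror image: fseek to the same offset and fread 16 bytes.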
Post by wolfgang kern
0072:CB5B seems odd to me but may actually really point
I didn't reply to that because you were replying to an
old message. I changed the code and got better
generated assembler and got a better result, and
already posted that.

BFN. Paul.
wolfgang kern
2021-05-06 13:09:38 UTC
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
Yes, and why didn't those exception handlers
print out registers etc to aid in debugging?
DOS exception handlers won't print debug info because the
debugger is an add-on file and not part of the OS.
windoze, Loonix, KESYS and most other OSes include a debugger;
DOS just didn't.
I think we're talking cross-purposes.
I'm not talking about a debugger. I'm talking about
diagnosis information.
how could it make any diagnosis without a debugger?
what part of this line didn't you understand?

"DOS doesn't have it !"

exception handlers will only know the faulting address but
nothing about registers nor about calling instances.

DOS IOsys terminates on exception with an error code.
DOS programmers know about it and write BATCH-files:

ECHO on
c:\com\test.com
ECHO TESTING test.com
if errorlevel 1 goto err_1
...
:err_1
echo error xxx occurred.
c:\dos\debug.com
exit
__
wolfgang
muta...@gmail.com
2021-05-06 14:20:53 UTC
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
Yes, and why didn't those exception handlers
print out registers etc to aid in debugging?
DOS exception handlers won't print debug info because the
debugger is an add-on file and not part of the OS.
windoze, Loonix, KESYS and most other OSes include a debugger;
DOS just didn't.
I think we're talking cross-purposes.
I'm not talking about a debugger. I'm talking about
diagnosis information.
how could it make any diagnosis without a debugger?
I still don't understand. To answer your question
above, like this:

got a divide by zero
AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
Post by wolfgang kern
what part of this line didn't you understand?
"DOS doesn't have it !"
The question is "why doesn't DOS have it?".
Post by wolfgang kern
exception handlers will only know the faulting address but
nothing about registers nor about calling instances.
DOS has that information available, and doesn't
let anyone know. Why?
Post by wolfgang kern
DOS IOsys terminates on exception with an error code.
ECHO on
c:\com\test.com
ECHO TESTING test.com
if errorlevel 1 goto err_1
...
echo error xxx occurred.
c:\dos\debug.com
exit
That's completely useless for reporting a bug in test.com
to the software vendor.

Hey software vendor, there's a bug in your program!
DOS tells me so!

How often does it happen?

Once every 3 months.

What instruction does it fail on?

What's an instruction?

Where's the dump file?

What's a dump file?

BFN. Paul.
wolfgang kern
2021-05-07 06:40:20 UTC
On 06.05.2021 16:20, ***@gmail.com wrote:
...
Post by ***@gmail.com
Post by wolfgang kern
Post by ***@gmail.com
I'm not talking about a debugger. I'm talking about
diagnosis information.
how could it make any diagnosis without a debugger?
I still don't understand. To answer your question
got a divide by zero
AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
The above comes from a debugger or from a hand-made
exception handler, but not from IOSYS.
Post by ***@gmail.com
Post by wolfgang kern
what part of this line didn't you understand?
"DOS doesn't have it !"
The question is "why doesn't DOS have it?".
Ask Bill Gates. I assume it was a money issue... not needed 50 years ago.
Post by ***@gmail.com
Post by wolfgang kern
exception handlers will only know the faulting address but
nothing about registers nor about calling instances.
DOS has that information available, and doesn't
let anyone know. Why?
DOS doesn't gather this information because there is no chance
at all for a program to continue after a fault exception.
But opcode CC, aka INT3 (not INT 03, which encodes as CD 03), aka
breakpoint, works as a trap.

Perhaps your DOS version has more debug support than what I see.
__
wolfgang
muta...@gmail.com
2021-05-07 06:51:59 UTC
Post by wolfgang kern
Post by ***@gmail.com
got a divide by zero
AX 0005 BX 0000 CX 5C12 DX 0000
SI 6F70 DI 0000 DS 5C12 ES 5C12
BP 0FE8 CS 561B IP 00C1 FLAGS 0216
The above comes from a debugger or from a hand-made
exception handler, but not from IOSYS.
It doesn't come from any of the above. It comes from
MSDOS.SYS which is what PDOS/86 calls the kernel.
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
what of this line didn't you understand ?
"DOS doesn't have it !"
The question is "why doesn't DOS have it?".
Ask Bill Gates. I assume it was a money issue... not needed 50 years ago.
I thought Gates had lots of money. Surely it was
needed 50 years ago? How else do you report
bugs on your production machine?

That's how MVS did it. Why not MSDOS? Any reason
why PDOS/86 shouldn't do it?
Post by wolfgang kern
Post by ***@gmail.com
Post by wolfgang kern
exception handlers will only know the faulting address but
nothing about registers nor about calling instances.
DOS has that information available, and doesn't
let anyone know. Why?
DOS doesn't gather this information because there is no chance
at all for a program to continue after a fault exception.
I don't want the program to continue. I want DOS to
report full registers at time of error, stack traceback,
complete memory dump (all 640k) to a disk file
called c:\dump.txt, ready to be sent to the software
vendor for diagnosis.
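Writing the memory part of such a dump file needs little more than a hex formatter. A minimal sketch using the same line layout as the hex dump quoted earlier in the thread (6-digit address, four groups of four bytes, then printable ASCII); the function names are hypothetical, and the FILE is assumed to be already open on the dump file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one 16-byte dump line: hex address, four groups of four
   bytes, then the bytes as ASCII ('.' for anything outside 0x20-0x7E).
   'buf' needs room for 64 characters including the terminator. */
static void format_dump_line(char *buf, uint32_t addr, const uint8_t b[16])
{
    int n = sprintf(buf, "%06lX", (unsigned long)addr);

    for (int i = 0; i < 16; i++) {
        if (i % 4 == 0)
            buf[n++] = ' ';
        n += sprintf(buf + n, "%02X", (unsigned)b[i]);
    }
    buf[n++] = ' ';
    for (int i = 0; i < 16; i++)
        buf[n++] = (b[i] >= 0x20 && b[i] <= 0x7E) ? (char)b[i] : '.';
    buf[n] = '\0';
}

/* Write 'len' bytes of the memory image 'mem', labelled from address
   'base', to an open dump file. A short final line is padded with
   zero bytes in this sketch. */
static void dump_memory(FILE *out, uint32_t base, const uint8_t *mem, size_t len)
{
    char line[64];

    for (size_t off = 0; off < len; off += 16) {
        uint8_t b[16] = {0};
        size_t chunk = (len - off < 16) ? len - off : 16;

        memcpy(b, mem + off, chunk);
        format_dump_line(line, base + (uint32_t)off, b);
        fprintf(out, "%s\n", line);
    }
}
```

Dumping all 640 KiB this way produces 40960 lines of about 60 characters, roughly 2.4 MB of text, so a binary image plus a small header may be a better fit for a floppy-era disk.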
Post by wolfgang kern
Perhaps your DOS version has more debug support than what I see.
I don't quite understand that sentence. I'm not sure
this is called "debug support", and I'm not sure what
you are seeing.

BFN. Paul.
James Harris
2021-05-23 18:09:21 UTC
Post by wolfgang kern
I wondered why MSDOS didn't generate MVS-style
debug information when it crashed.
DOS (depending on version) creates its own exception handlers.
So I used the same divide-by-zero crash code, as
an MSDOS executable.
My test for div0 is just two bytes: D4 00.
That's AAM (which carries out a division) with divisor as 0 instead of
10. Rather than 'test for' I presume you mean that's what you use to
/generate/ div0?

Worth noting, perhaps, that on many CPUs it's undocumented so is not
guaranteed to work on them all. Dividing by a memory word or byte that's
a constant zero may be safer, though definitely not as short!
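A minimal C model of why D4 00 faults, following the description above: AAM imm8 divides AL by the immediate, leaving the quotient in AH and the remainder in AL, and the assembler's plain AAM uses imm8 = 10 (D4 0A). With imm8 = 0 the CPU raises a divide error (INT 0) instead. The companion AAD (D5 ib) multiplies rather than divides, so it cannot fault. Flags are not modelled, the function names are hypothetical, and, as noted in the thread, immediates other than 10 were long undocumented, so individual CPUs may differ.

```c
#include <stdint.h>

/* AAM imm8 (D4 ib): AH = AL / imm8, AL = AL % imm8.
   Returns -1 where the CPU would raise a divide error (INT 0),
   leaving AH and AL unchanged; returns 0 otherwise. */
static int aam(uint8_t *ah, uint8_t *al, uint8_t imm8)
{
    uint8_t a = *al;

    if (imm8 == 0)
        return -1;              /* D4 00: divide by zero -> INT 0 */
    *ah = (uint8_t)(a / imm8);
    *al = (uint8_t)(a % imm8);
    return 0;
}

/* AAD imm8 (D5 ib): AL = (AL + AH * imm8) & 0xFF, AH = 0. */
static void aad(uint8_t *ah, uint8_t *al, uint8_t imm8)
{
    *al = (uint8_t)(*al + *ah * imm8);
    *ah = 0;
}
```

With AL = 53 (35h), AAM 10 gives AH = 5, AL = 3, and AAD 10 turns that back into AL = 53, which is the MULADD/DIVMOD pairing mentioned in the reply.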
--
James Harris
wolfgang kern
2021-05-24 08:18:09 UTC
Post by James Harris
Post by wolfgang kern
My test for div0 is just two bytes: D4 00.
That's AAM (which carries out a division) with divisor as 0 instead of
10. Rather than 'test for' I presume you mean that's what you use to
/generate/ div0?
Yes.
Post by James Harris
Worth noting, perhaps, that on many CPUs it's undocumented so is not
guaranteed to work on them all. Dividing by a memory word or byte that's
a constant zero may be safer, though definitely not as short!
It was finally documented by AMD, in the description of immediate bytes
other than 0A for AAM/AAD. Although slow and obsolete now, they were quite handy.
My disassembler recognizes them as MULADD*_imm8 and DIVMOD/_imm8.
__
wolfgang
