Discussion:
primitive debugging
muta...@gmail.com
2021-04-27 19:07:58 UTC
I currently have a situation where "ld" is giving
an error. When I put in printfs to isolate the
fault, they took me back to fwrite() returning a
large negative number, and when I went to
debug that, it took me away from that error.
At the moment, after the last message is
printed, I don't even know something as basic
as whether the CPU is still executing
instructions or not.

That's because I found out that my application can
write to address 0, and can also divide by zero,
without any indication of a problem.

The problem is almost certainly in either
PDPCLIB or PDOS, but it is only revealed when
a specific application exercises it.

If I were running Hercules it would show the PSW
every second so I would at least know whether it
is actually running. But with Bochs I know nothing.
Yes, I could use the debug version of Bochs instead,
and do it that way, but that's cheating. I want to know
I can do this on raw metal on a primitive processor
that doesn't have debugging facilities.

What I would like to do is insert INT 24H (or some
other suitable number) after every machine
instruction, and get the OS to print the current CPU
instruction, and later some registers.

I don't care how long it takes for the output to be
printed. I've always wanted to spend more time
with my family anyway (yeah, right!).

Everything will be loaded at consistent addresses,
and I can print those out before doing the INT 24H
thing so that I can interpret the results.

I need to coax GCC to insert INT 24H instructions
everywhere.

Or is there some other technique? And is there a
different number than 24H that would be appropriate?

Thanks. Paul.
Scott Lurndal
2021-04-27 19:16:40 UTC
Post by ***@gmail.com
I currently have a situation where "ld" is giving
an error, and when I put in printfs to isolate the
fault, it took me back to fwrite() returning a
large negative number, and when I went to
debug that, it took me away from that error
and at the moment, after the last message
printed, I don't even know something as basic
as whether the CPU is even executing
instructions or not.
Have your code write a value to port 80 periodically.

That's the debug output port. You can get port 80 cards
if your mainboard doesn't have a port 80 7-segment display.
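A minimal sketch of the port 80 suggestion in C, assuming GCC inline assembly and code running with enough I/O privilege to execute OUT (ring 0, or CPL <= IOPL). The names post_code and last_post_code and the BARE_METAL switch are made up for this example; in a hosted build the code is only logged, so the logic can be exercised without the hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Last checkpoint written; lets a hosted test see what happened. */
static uint8_t last_post_code;

/* Hypothetical helper: send a one-byte checkpoint to I/O port 0x80,
 * where a POST card or on-board display will show it. */
static void post_code(uint8_t code)
{
    last_post_code = code;
#ifdef BARE_METAL
    /* OUT imm8, AL: the "a" constraint places code in AL. */
    __asm__ volatile ("outb %0, $0x80" : : "a"(code));
#else
    fprintf(stderr, "POST %02X\n", (unsigned)code);
#endif
}
```

Calling post_code(0x01), post_code(0x02) and so on at successive points means the display freezes on the last checkpoint reached when the machine stops.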
muta...@gmail.com
2021-04-27 21:31:11 UTC
Post by Scott Lurndal
Post by ***@gmail.com
I currently have a situation where "ld" is giving
an error, and when I put in printfs to isolate the
fault, it took me back to fwrite() returning a
large negative number, and when I went to
debug that, it took me away from that error
and at the moment, after the last message
printed, I don't even know something as basic
as whether the CPU is even executing
instructions or not.
Have your code write a value to port 80 periodically.
That's the debug output port. You can get port 80 cards
if your mainboard doesn't have a port 80 7-segment display.
That's fantastic!

https://stackoverflow.com/questions/6793899/what-does-the-0x80-port-address-connect-to

https://www.intel.com.au/content/www/au/en/support/articles/000005500/boards-and-kits.html#port80h

BTW, another idea I had was instead of doing an INT 24H
or whatever at an instruction level I could get GCC to insert
a printf of __FILE__ and __LINE__ in every valid place to do so.
Possibly via an INT 25H instead of a library routine. More of
a puts than a printf actually.
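The __FILE__/__LINE__ idea can be sketched with a macro. This assumes the output goes to stderr; routing it through a software interrupt instead is not shown, and the names TRACE, trace_hit, trace_file and trace_line are hypothetical. The calls have to be placed in the source, by hand or with a script.

```c
#include <stdio.h>

/* Last location reported, kept so a test can check it. */
static const char *trace_file;
static int trace_line;

/* Hypothetical trace hook: record and print "file:line". Under PDOS this
 * could be replaced by a software interrupt; here it writes to stderr. */
static void trace_hit(const char *file, int line)
{
    trace_file = file;
    trace_line = line;
    fprintf(stderr, "%s:%d\n", file, line);
}

/* Put TRACE(); between statements; the last line printed before a
 * hang shows how far execution got. */
#define TRACE() trace_hit(__FILE__, __LINE__)
```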

BFN. Paul.
muta...@gmail.com
2021-04-27 22:04:32 UTC
The next thing I would like to know is this: when I
have identified a place of interest, and the OS is
printing out a debug message via INT 21H, how
do I get the OS to chain through the entire call
stack, printing out the return addresses?

The OS is currently in a.out format while the caller
is in PECOFF format if that complicates things.

And actually, if I have that information printed out
via port 80 or whatever, I would be able to write a
utility to match the addresses to a line of source
code.

I don't mind recompiling my application with
optimization off.

What are the steps I need to do?

I might be able to recompile the app in a.out format
or the OS in PECOFF format if that would be helpful,
but I don't think I'm ready for the latter yet.

Thanks. Paul.
muta...@gmail.com
2021-04-28 08:38:19 UTC
Post by ***@gmail.com
The next thing I would like to know is that when I
have identified a place of interest, and the OS is
printing out a debug message via INT 21H, how
Actually, I'll probably try to implement the required
trap for divide by zero. Currently it just hangs.
Some other traps are intercepted.
Post by ***@gmail.com
do I get the OS to chain through the entire call
stack printing out the return addresses?
I'm familiar with S/370 where save areas are chained
via R13, but from my knowledge of 80386 assembler
programming, there is no way of knowing how much
the stack pointer has been increased by a routine,
which makes it impossible to chain back unless a new
convention is introduced.

I'm guessing that's what a "frame pointer" is, and what
is required is for assembler programs to also follow
that convention. I've never seen any assembler code
that does that though.

I haven't seen very much 80386 assembler code though. :-)

Actually, it also means that everyone needs to be using
the same calling convention too, unlike Watcom, which
puts parameters in registers.

Maybe that's why I never see an 80386 stack traceback
from the OS (including OS/2) when an exception occurs
like I see on MVS. Because there is no agreed convention.

In my case I'm happy to assume a consistent convention
if there is something good to choose from. If it works
for both GCC and Smaller C, that's probably good enough.

And I assume I'll have to make some changes to my
assembler code as well.

BFN. Paul.
muta...@gmail.com
2021-04-28 08:54:18 UTC
Post by ***@gmail.com
I'm familiar with S/370 where save areas are chained
via R13, but from my knowledge of 80386 assembler
programming, there is no way of knowing how much
the stack pointer has been increased by a routine,
which makes it impossible to chain back unless a new
convention is introduced.
Actually, I guess that's what I'm after. I would like
PDOS/386 to be as good as MVS as far as debugging
is concerned (getting decent system dumps whenever
you have a memory access violation).

Unless there is some fundamental technical problem
preventing that from happening. In which case I'd at
least like to understand what it is.

BFN. Paul.
muta...@gmail.com
2021-04-28 23:58:39 UTC
Post by ***@gmail.com
Actually, I'll probably try to implement the required
trap for divide by zero. Currently it just hangs.
Some other traps are intercepted.
I have done this and get:

C:\]pdptest
warning - failed to open pdptest.com
exeStart is 00400000
entry point is 00401000
welcome to pdptest
Divide by zero fault occured (Protected Mode Exception 0x0)
EAX 00000005 EBX 00401000 ECX 00000001 EDX 00000000
ESI 00000001 EDI 000157E8 FLAGS 00000000
regs[-1] is E0012F4C
regs[-2] is 00000000
regs[-3] is 0001F888
regs[-4] is 00455FB0
regs[-5] is 00401000
regs[-6] is 00000005
regs[-7] is 00401000
regs[-8] is 608927EB
regs[-9] is 0006014C
System halting

I confirmed that the EAX value is as per my application.

There was some debate over what regs[-1] is, in an
attempt to do a traceback.

I thought it was the return address from the divide by 0
interrupt.

Unfortunately because of the virtual memory crap that was
added to PDOS/386 I don't know what the situation is.

I'm thinking of ripping all that out now that Alica is no longer
around to support it and returning to a simple, understandable
system for developers.

Anyway, I decided to do some more experimentation, and
I found that my PECOFF gcc and my a.out gcc were producing
identical code, other than an "ident". I might see if I can get
rid of that too.

That "ident" doesn't seem to appear in the executable, so I'm
wondering why e.g. my a.out version of "as" is 454k while the
Win32 version is 356k. The C library used by the a.out version
is only 65k, not enough to account for the difference.

I then compiled with -fomit-frame-pointer to see what
happened, and now I know what a frame pointer is:

C:\devel\pdos\src\xxx5>diff win.s foo.s
5,6d4
< pushl %ebp
< movl %esp, %ebp
9c7,8
< movl 12(%ebp), %ebx
---
> movl 16(%esp), %ebx
> movl 12(%esp), %esi
11d9
< movl 8(%ebp), %esi
18c16
< leal -8(%ebp), %esp
---
> addl $20, %esp
21d18
< leave

So the frame pointer is just (e)bp, pushed on entry and then set to the current stack pointer.

And basically everyone is doing that anyway, so that isn't
controversial or new.

And now that I think about it, the ebp value should be
enough to trace back the call stack.

And produce a formatted dump just like MVS has - way cool!
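Following the EBP chain can be sketched as below, assuming the standard GCC prologue from the diff (pushl %ebp; movl %esp, %ebp), so that each frame holds the caller's saved EBP at 0(%ebp) and the return address at 4(%ebp). The struct and function names are hypothetical, and a real dump routine would also check that every pointer stays inside the stack.

```c
#include <stdint.h>
#include <stdio.h>

/* Layout at the address held in EBP after "pushl %ebp; movl %esp, %ebp".
 * On 32-bit x86 the fields are at 0(%ebp) and 4(%ebp); both fields are
 * pointer-sized, so on x86-64 they match 0(%rbp) and 8(%rbp). */
struct frame {
    struct frame *prev;   /* caller's saved EBP */
    uintptr_t ret;        /* return address into the caller */
};

/* Print up to max return addresses, starting from frame fp.
 * Returns the number of frames visited. */
static int walk_frames(const struct frame *fp, int max)
{
    int n = 0;
    while (fp != NULL && n < max) {
        printf("frame %d: ebp=%p return=%08lX\n",
               n, (const void *)fp, (unsigned long)fp->ret);
        n++;
        /* The stack grows down, so a caller's frame must be higher up.
         * Anything else means the end of the chain (or a corrupt one). */
        if ((uintptr_t)fp->prev <= (uintptr_t)fp)
            break;
        fp = fp->prev;
    }
    return n;
}
```

The starting value would be the EBP saved by the interrupt handler; inside GCC-compiled C code, __builtin_frame_address(0) gives the current frame. Requiring each caller's frame to be higher also stops the walk on a cyclic chain.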

BFN. Paul.
muta...@gmail.com
2021-04-29 00:00:28 UTC
Post by ***@gmail.com
I'm thinking of ripping all that out now that Alica is no longer
around to support it and returning to a simple, understandable
system for developers.
And now that relocations are being generated for
Windows executables.

BFN. Paul.
muta...@gmail.com
2021-05-02 08:54:13 UTC
Post by ***@gmail.com
Unfortunately because of the virtual memory crap that was
added to PDOS/386 I don't know what the situation is.
I'm thinking of ripping all that out now that Alica is no longer
around to support it and returning to a simple, understandable
system for developers.
I have #ifdef'ed out the VM code. It wasn't a big effort,
but there is still a side-effect that needs to be tracked
down.

My graphics is working again now too.

Anyway, with a fresh snapshot, it seems that regs[-1]
is pointing to the stack of the previous application
(command.com) that launched this one (pdptest.exe).

At some level it makes sense that the stacks are linked,
because each application gets its own stack.

Maybe that means that the interrupt location is on the
stack of the failing executable, so I need to regain access
to that.

BFN. Paul.



C:\]pdptest
warning - failed to open pdptest.com
exeStart is 002559B0
entry point is 002559B0
warning - still using a.out format
welcome to pdptest
main function is at 00255B91
Divide by zero fault occured (Protected Mode Exception 0x0)
EAX 00000005 EBX 0026345F ECX 002A45F8 EDX 00000000
ESI 00000001 EDI 00015140 FLAGS 00000000
regs[-1] is 0025581C
regs[-2] is 00000000
regs[-3] is 0001E720
regs[-4] is 002A472C
regs[-5] is 0026345F
regs[-6] is 00000005
regs[-7] is 0026345F
regs[-8] is 00000001
regs[-9] is 0025584C
regs[-10] is 00000001
System halting
muta...@gmail.com
2021-05-02 10:22:24 UTC
Anyway, with fresh snapshot, it seems that the regs[-1]
is pointing to the stack of the previous application
(command.com) that launched this one (pdptest.exe).
At some level it makes sense that the stacks are linked,
because each application gets its own stack.
Maybe that means that the interrupt location is on the
stack of the failing executable, so I need to regain access
to that.
No, the problem is that I am overly familiar with MVS
and expected the stack to grow up.

A fresh display shows that a chain exists within my
current stack.

Next I'll see if I can follow that chain somewhere, and
see if I find the interrupt location along the way.

BFN. Paul.



C:\]pdptest
warning - failed to open pdptest.com
exeStart is 002559B0
entry point is 002559B0
warning - still using a.out format
welcome to pdptest
main function is at 00255B91
stack is around 002A471C
Divide by zero fault occured (Protected Mode Exception 0x0)
EAX 00000005 EBX 0026345F ECX 002A45F0 EDX 00000000
ESI 00000001 EDI 002A471C FLAGS 00000000
regs are at 0025581C
regs[8] is 002A46BC
regs[9] is 00000030
regs[10] is 00000000
regs[11] is 00000000
regs[12] is 00CF9300
regs[13] is 0000FFFF
regs[14] is 000041CE
regs[15] is 00000000
regs[16] is 000041A6
regs[17] is 00000246
System halting
muta...@gmail.com
2021-05-02 12:19:23 UTC
Post by ***@gmail.com
Next I'll see if I can follow that chain somewhere, and
see if I find the interrupt location along the way.
I think I'm nearly there. regs[8], which I believe is EBP,
points to some data, and the "next regs[8]" in that data
seems to be what I need.

warning - failed to open pdptest.com
exeStart is 002559B0
entry point is 002559B0
warning - still using a.out format
welcome to pdptest
main function is at 00255B91
stack is around 002A471C
Divide by zero fault occured (Protected Mode Exception 0x0)
EAX 00000005 EBX 0026345F ECX 002A45F0 EDX 00000000
ESI 00000001 EDI 002A471C FLAGS 00000000
regs are at 0025581C
regs[8] is 002A46BC
next regs[0] is 00000000
next regs[1] is 00254B00
next regs[2] is 0026345F
next regs[3] is 002542EC
next regs[4] is 00000030
next regs[5] is 00000021
next regs[6] is 00000030
next regs[7] is 00000005
next regs[8] is 00255BEB
next regs[9] is 00000028
next regs[10] is 00010202
System halting

I believe I need to decipher this:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/src/protints.s

to confirm that regs[8] is where it should be.

.text 0x00000000 0x10 pdosst32.o
0x00000000 __pdosst32
.text 0x00000010 0x4e0 pdptest.o
0x000001e1 main
.text 0x000004f0 0x5a0 pdos.a(start.o)
0x00000a2c _cexit
0x00000a00 __exit
0x00000a16 _exit
0x00000a32 _c_exit
0x000004f1 __start


88 0220 A1E40400 movl _crash2, %eax
91 022b F73DE004 idivl _crash1
92 0231 A3E40400 movl %eax, _crash2

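If the interrupt returned to the next instruction, the return
address would be 231 (offset in the listing) + 10 (start of
pdptest.o's .text in the map) + 2559B0 (load address) = 255BF1.

We have 255BEB, which is 6 lower and points to 231 - 6 = 22B,
which is the idivl instruction itself.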
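The same arithmetic in C, as a small sketch for converting between listing offsets and absolute addresses. The constants are the load address and the .text start taken from the output and map above, treated as fixed for this one run; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the thread, treated as fixed for this sketch. */
#define LOAD_BASE    0x2559B0u   /* exeStart: where the module was loaded */
#define PDPTEST_TEXT 0x10u       /* start of pdptest.o's .text, from the ld map */

/* Offset in the "as -a" listing of pdptest.o -> absolute address. */
static uint32_t listing_to_abs(uint32_t listing_off)
{
    return LOAD_BASE + PDPTEST_TEXT + listing_off;
}

/* Absolute address (e.g. one found on the stack) -> listing offset. */
static uint32_t abs_to_listing(uint32_t addr)
{
    assert(addr >= LOAD_BASE + PDPTEST_TEXT);
    return addr - LOAD_BASE - PDPTEST_TEXT;
}
```

listing_to_abs(0x231) gives 0x255BF1, and abs_to_listing(0x255BEB) gives 0x22B, the offset of the idivl.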

The thing is, I was expecting to see another EBP
on the stack, because both gotint() and int0()
should be pushing EBP. But maybe I can't see
the one for int0(), and the one for gotint() is the
original regs[8] of 2A46BC. Yes, that would seem
to make sense.

I believe the 0x28 seen on the stack is the segment
register for called code (cs), and 0x30 is called data (ds).
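The 0x28 and 0x30 values can be checked by splitting them as protected-mode segment selectors: bits 0-1 are the requested privilege level, bit 2 selects GDT or LDT, and bits 3-15 are the descriptor index. The helper names are hypothetical. Both values have the low three bits clear, so they are GDT entries 5 and 6 used at privilege level 0, and the OS's 0x10 is GDT entry 2; what PDOS/386 actually puts in those slots is not shown here.

```c
#include <stdint.h>

/* Fields of an x86 protected-mode segment selector. */
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }         /* requested privilege level */
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }  /* 0 = GDT, 1 = LDT */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }         /* descriptor number */
```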

I am wondering whether this:

next regs[8] is 00255BEB
next regs[9] is 00000028

is a pair that constitutes a far pointer and is expected
to be the subject of a retf. But I was expecting it
to be the subject of an iret, so there should be flags
there, not a segment. Unless iret does all of segment,
address and flags??? But flags are supposedly 0
according to my display.

BFN. Paul.
muta...@gmail.com
2021-05-03 12:31:22 UTC
On Sunday, May 2, 2021 at 10:19:24 PM UTC+10, ***@gmail.com wrote:

I have had enormous success. And here is my
current understanding. In MVS convention a
routine is normally entered via R15, and all
registers are saved, which includes R15, so
you know the entry point. In addition there is
often an "eyecatcher".

For 80386 to have the equivalent it would need:

push ebp
mov ebp, esp
call next          ; pushes the address of the eyecatcher
eyecatcher db 'my_routine'
next:
push all registers, not just ones that are going to
be used by this function, and in a consistent
order.

There would appear to be no restriction on doing that.

Without that, all is not lost. You do have the return
address available, and if relative calls are used
exclusively (which is probably the case with
C-generated code, though presumably not with calls
through function pointers), you can figure out the
address that was called.

It's probably worth switching to MVS standard, but
maybe that can be an exercise for another day.
Post by ***@gmail.com
I think I'm nearly there. regs[8], which I believe is EBP,
is pointing to some data, where regs[8] has what I need,
I think.
It turns out that my interrupt code switches stacks,
and regs[8] is actually a pointer to the previous stack.
Post by ***@gmail.com
https://sourceforge.net/p/pdos/gitcode/ci/master/tree/src/protints.s
That has now been deciphered to a large extent, and
here is the documentation:

/ This is for interrupts that should not alter
/ the flags, like the timer interrupt

/ by the time we get here, the following things are on the stack:
/ original eax, original ds (stored as doubleword), original intnum

/ And because this is an interrupt that does not push an error
/ code, above those 3 dwords are the EIP, cs (stored as a
/ dword), and the flags (also stored as a dword). All three of
/ those things will be popped when we do an iret.

/ Above that is completely unpredictable, as it is just whatever
/ the application had pushed onto the stack before the
/ interrupt occurred.

/ gotint() will receive the interrupt plus an array of registers.
/ It will then pass it on to the specific interrupt handler, just
/ passing the register array. This pointer is all that the
/ interrupt will have to work with, so we need to calculate
/ everything from that spot.

/ In addition, the stack is actually switched (to the caller's
/ stack, but with the OS (0x10) ss) prior to the register array
/ being constructed. I think
/ this switch is done to allow the interrupt to terminate the
/ called program if it wishes to do so. The stack pointer before
/ the switch is available on the new stack, after the registers
/ (EAX, EBX, ECX, EDX, ESI, EDI) then cflag, then flags. Although
/ at time of writing, the flags are not being correctly passed
/ up to gotint().

/ We don't switch stack if this is the highest level, ie ss of 0x10
/ Note that when an application is running it will have an ss of
/ 0x30 which is "spawn_data"

/ old stack looks like this at the time of stack switch:
/ unpredictable stack data pushed by application during execution,
/ before interruption
/ flags
/ cs
/ eip
/ eax
/ ds
/ old intnum
/ previous interrupt's ss (saveess)
/ previous interrupt's esp (saveesp)
/ ebx (not sure why it is needed, seems to do with flags)
/ previous interrupt's eax (saveeax)
/ previous interrupt's ebx (saveebx)
/ The above is what saveesp will be pointing to
/ ebp is temporarily stored here, and will remain if there is a
/ stack switch done
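The stack layout documented above can be expressed as a C struct overlay. This sketch assumes every slot is a dword and that the comment lists the slots in push order, from higher to lower addresses, so the struct (lowest address first) reverses it; struct old_stack and read_old_stack are hypothetical names. As a cross-check, the eleven "next regs" values from the earlier dump fit this order: slot 2 is 0026345F (the EBX in the dump), slot 7 is 5 (EAX), and slots 8 to 10 are EIP 00255BEB, CS 28 and flags 00010202.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One dword per slot, lowest address first; saveesp points at the
 * first field. Field names are invented for this sketch. */
struct old_stack {
    uint32_t saveebx;   /* previous interrupt's ebx */
    uint32_t saveeax;   /* previous interrupt's eax */
    uint32_t ebx;       /* purpose unclear, per the comments */
    uint32_t saveesp;   /* previous interrupt's esp */
    uint32_t saveess;   /* previous interrupt's ss */
    uint32_t intnum;    /* old intnum */
    uint32_t ds;        /* stored as a dword */
    uint32_t eax;       /* original eax */
    uint32_t eip;       /* pushed by the CPU; iret pops eip, cs, flags */
    uint32_t cs;
    uint32_t eflags;
};

_Static_assert(sizeof(struct old_stack) == 11 * sizeof(uint32_t),
               "struct old_stack must have no padding");

/* Copy the slots out of a raw stack dump (memcpy avoids aliasing problems). */
static struct old_stack read_old_stack(const uint32_t *p)
{
    struct old_stack s;
    assert(p != NULL);
    memcpy(&s, p, sizeof s);
    return s;
}
```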
Post by ***@gmail.com
return address from interrupt should be 231 + 10 + 2559B0 = 255BF1
We have a 255BEB which is wrong by 6 and points to 231 - 6 = 22B
which is the idivl instruction itself.
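I believe this is because divide by zero is a "fault"
class exception: the CPU saves the address of the faulting
instruction rather than the next one, so that you can
potentially fix the data and rerun the instruction rather
than ignore it and continue with the next instruction. That
would especially make sense with a page fault.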
Post by ***@gmail.com
next regs[8] is 00255BEB
next regs[9] is 00000028
is a pair that constitutes a far pointer and is expected
to be the subject of a retf. But I was expecting it
to be the subject of an iret, so there should be flags
there, not a segment. Unless iret does all of segment,
address and flags???
Yes, iret pops all three: EIP, CS and the flags.
Post by ***@gmail.com
But flags are supposedly 0 according to my display.
The value at the flags position was not being set. Rather
than fix that, I just ignored it and got the flags from the
location on the stack when the interrupt occurred, since
I'm getting other information anyway.
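To read the flags word that iret pops, it helps to decode the individual bits. This sketch names a few architectural EFLAGS bits (positions as documented in the Intel manuals); iret_frame and describe_eflags are hypothetical names. For the 00010202 in the final dump it yields "IF RF": interrupts were enabled and the resume flag is set, plus bit 1, which always reads as 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFLAGS_CF  (1u << 0)    /* carry */
#define EFLAGS_RSV (1u << 1)    /* reserved, always reads as 1 */
#define EFLAGS_ZF  (1u << 6)    /* zero */
#define EFLAGS_SF  (1u << 7)    /* sign */
#define EFLAGS_TF  (1u << 8)    /* trap flag (single-step) */
#define EFLAGS_IF  (1u << 9)    /* interrupts enabled */
#define EFLAGS_DF  (1u << 10)   /* direction */
#define EFLAGS_OF  (1u << 11)   /* overflow */
#define EFLAGS_RF  (1u << 16)   /* resume */
#define EFLAGS_VM  (1u << 17)   /* virtual-8086 mode */

/* What iret pops, lowest address first, when the interrupt did not change
 * privilege level (otherwise ESP and SS follow) and after any error code
 * has been removed. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;      /* selector in the low 16 bits */
    uint32_t eflags;
};

/* Write the names of the set flags, separated by spaces, into buf. */
static void describe_eflags(uint32_t f, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } names[] = {
        {EFLAGS_CF, "CF"}, {EFLAGS_ZF, "ZF"}, {EFLAGS_SF, "SF"},
        {EFLAGS_TF, "TF"}, {EFLAGS_IF, "IF"}, {EFLAGS_DF, "DF"},
        {EFLAGS_OF, "OF"}, {EFLAGS_RF, "RF"}, {EFLAGS_VM, "VM"},
    };
    assert(len >= 32);    /* 9 names, 8 spaces and a NUL need 27 bytes */
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (f & names[i].bit) {
            if (buf[0] != '\0')
                strcat(buf, " ");
            strcat(buf, names[i].name);
        }
    }
}
```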

I was able to match up all the data and I will be able to
produce the full stack chain. And "ld" has a "Map" option
and "as" has a "-a" option providing the same ability MVS
has to look at listings to match everything up. I need to
use the "-g" option on gccwin to generate line numbers
into the assembler source. Coding "-O2" along with "-g"
is perfectly fine. Note that "as -a" also puts the C source
code (not just the line number that gccwin produced)
into the assembler listing.
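With the "ld" map, matching an address back to a function is a nearest-symbol-below search. This sketch hard-codes the handful of symbols from the map excerpt earlier in the thread, sorted by offset within the module; a real tool would parse the map file, and the names sym, syms and lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym { uint32_t off; const char *name; };

/* From the ld map excerpt: offsets within the module, ascending. */
static const struct sym syms[] = {
    {0x000, "__pdosst32"},
    {0x1e1, "main"},
    {0x4f1, "__start"},
    {0xa00, "__exit"},
    {0xa16, "_exit"},
    {0xa2c, "_cexit"},
    {0xa32, "_c_exit"},
};

/* Return the name of the last symbol at or below addr (given the module's
 * load address) and store the distance into it in *delta. Returns NULL if
 * addr is below the module. No upper bound is checked. */
static const char *lookup(uint32_t addr, uint32_t base, uint32_t *delta)
{
    const struct sym *best = NULL;

    assert(delta != NULL);
    if (addr < base)
        return NULL;
    uint32_t off = addr - base;
    for (size_t i = 0; i < sizeof syms / sizeof syms[0]; i++)
        if (syms[i].off <= off)
            best = &syms[i];
    if (best == NULL)
        return NULL;
    *delta = off - best->off;
    return best->name;
}
```

For example, the interrupt address 00255BEB from the final dump comes out as main+5A, and the previous return address 0025631F as __start+47E.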

BFN. Paul.



C:\]pdptest
warning - failed to open pdptest.com
warning - still using a.out format
welcome to pdptest
main function is at 00255B91
stack is around 002A471C
Divide by zero fault occurred (Protected Mode Exception 0x0)
EAX 00000005 EBX 0026345F ECX 002A45F0 EDX 00000000
ESI 00000001 EDI 002A471C
module loaded at 002559B0, entry point 002559B0
regs are at 00255820
old stack starts at 002A46BC
EBP is probably 002A472C
interrupt address is 00255BEB
flags are 00010202
EBP chain to EBP is 002A4744
previous function's return address is 0025631F
called address was possibly relative FFFFF872
which would make it absolute address 00255B91
System halting


Additional notes:

main is at 1e1 which makes it 2559B0 + 1e1 = 255B91
