Discussion:
clang bug
muta...@gmail.com
2023-03-01 16:06:26 UTC
Hello.

Shouldn't there be two mov instructions for 2 parameters?

I only see the second.

.ident "Android (7714059, based on r416183c1) clang version 12.0.8 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)"

mystart takes 2 parameters. I can see the second parameter:

movl %eax, 4(%esp)

but I don't see the first parameter, which I expect to be inserted
at 0(%esp)

.text
.p2align 4
.globl _start
.type _start, @function
_start:
.LFB0:
.cfi_startproc
endbr32
pushl %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
subl $8, %esp
.cfi_def_cfa_offset 16
leal 12(%esp), %eax
subl $8, %esp
.cfi_def_cfa_offset 24
movl %eax, paul
leal 24(%esp), %eax
pushl %eax



.text
.file "linstart.c"
.globl _start # -- Begin function _start
.p2align 4, 0x90
.type _start,@function
_start: # @_start
# %bb.0:
pushl %esi
subl $8, %esp
leal 12(%esp), %eax
movl %eax, paul
leal 16(%esp), %eax
movl %eax, 4(%esp)
calll __mystart
movl %eax, %esi
movl %eax, (%esp)
calll __exita
movl %esi, %eax
addl $8, %esp
popl %esi
retl
.Lfunc_end0:



/* Startup code for Linux */
/* written by Paul Edwards */
/* released to the public domain */

#include "errno.h"
#include "stddef.h"

/* malloc calls get this */
static char membuf[31000000];
static char *newmembuf = membuf;

extern int __mystart(int argc, char **argv);
extern int __exita(int rc);

int *paul;

#ifdef NEED_MPROTECT
extern int __mprotect(void *buf, size_t len, int prot);

#define PROT_READ 1
#define PROT_WRITE 2
#define PROT_EXEC 4
#endif

/* We can get away with a minimal startup code, plus make it
a C program. There is no return address. Instead, on the
stack is a count, followed by all the parameters as pointers */

int _start(char *p)
{
    int rc;
    char *argv[2] = { "prog", NULL };

#ifdef NEED_MPROTECT
    /* make malloced memory executable */
    /* most environments already make the memory executable */
    /* but some certainly don't */
    /* there doesn't appear to be a syscall to get the page size to
       ensure page alignment (as required), and I read that some
       environments have 4k page sizes but mprotect requires 16k
       alignment. So for now we'll just go with 16k */
    size_t blksize = 16 * 1024;
    size_t numblks;

    newmembuf = membuf + blksize; /* could waste memory here */
    newmembuf = newmembuf - (unsigned int)newmembuf % blksize;
    numblks = sizeof membuf / blksize;
    numblks -= 2; /* if already aligned, we wasted an extra block */
    rc = __mprotect(newmembuf,
                    numblks * blksize,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
    if (rc != 0) return (rc);
#endif

    /* I don't know what the official rules for ARM are, but
       looking at the stack on entry showed that this code
       would work */
#ifdef __ARM__

#if defined(__UNOPT__)
    rc = __mystart(*(int *)(&p + 5), &p + 6);
#else
    rc = __start(*(int *)(&p + 6), &p + 7);
#endif

#else
    paul = (int *)(&p - 1);
    rc = __mystart(*(int *)(&p - 1), &p);
    /* rc = __start(1, argv); */
#endif
    __exita(rc);
    return (rc);
}


void *__allocmem(size_t size)
{
    return (newmembuf);
}


#if defined(__WATCOMC__)

#define CTYP __cdecl

/* this is invoked by long double manipulations
in stdio.c and needs to be done properly */

int CTYP _CHP(void)
{
    return (0);
}

/* don't know what these are */

void CTYP cstart_(void) { return; }
void CTYP _argc(void) { return; }
void CTYP argc(void) { return; }
void CTYP _8087(void) { return; }

#endif



Holy cow I need a real computer
muta...@gmail.com
2023-03-02 04:53:49 UTC
Ok, I'm back on my computer now and was able to raise a bug report:

https://github.com/llvm/llvm-project/issues/61112

BFN. Paul.
Alexei A. Frounze
2023-03-02 05:34:08 UTC
Post by ***@gmail.com
https://github.com/llvm/llvm-project/issues/61112
BFN. Paul.
Without looking any further: you're taking the address of a parameter (p)
and doing pointer arithmetic on it. The only valid arithmetic here
would be adding 0 or 1, but not 5, 6 or 7. But even if you add 1 to &p,
you aren't allowed to dereference it (as in *(&p + 1)).
You can only subtract the 1 back (as in (&p + 1) - 1) or compute
the difference between that and &p (as in (&p + 1) - &p or the other way around).
Your shady pointer manipulation is undefined behavior, and
I can almost guarantee your clang bug will soon be closed as invalid.

Write clean code. OK, write cleaner code. Maybe refresh your ANSI C
to get there.

Alexey
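
The rule described above can be sketched in a few lines of C. This is only an illustration, not code from the thread: the function name one_past_distance is made up, and it performs nothing beyond the operations the post lists as valid on the address of a single object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: uses only the pointer arithmetic that C
   permits on the address of a single (non-array) object. */
ptrdiff_t one_past_distance(char *p)
{
    char **base = &p;         /* &p + 0: points at the parameter itself */
    char **end  = &p + 1;     /* one past the object: may be formed and compared */

    assert(end - 1 == base);  /* subtracting the 1 back is allowed */

    /* Not allowed: *end, &p + 2, &p - 1. Each is undefined behavior,
       so the compiler is free to assume it never happens. */
    return end - base;        /* difference of the two pointers */
}
```

Calling one_past_distance("prog") returns 1. The expressions in linstart.c (&p - 1, &p + 5 and so on) all fall into the "not allowed" group in the comment.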
muta...@gmail.com
2023-03-02 06:47:08 UTC
"undefined behavior" shouldn't mean "let's leave the stack
uninitialized, and not even issue a warning". It just means
you can't guarantee what the result will be.

But you can still give a sensible result.

In this case, I am trying to inspect the stack. I'm using
knowledge of the x86 as to what the stack looks like,
and the calling convention.

Yes. That code won't work on every platform in the world.

But it should work perfectly fine when I know the layout.

It is not in the spirit of C to catch every undefined behavior
and throw an error. And in this case, even throwing an error
would be better than silent failure and garbage.

Even if setting a pointer to the arbitrary value 0xb8000 and
indexing up 1000 bytes is undefined behavior, that doesn't
mean it should be disallowed either.

BFN. Paul.
Alexei A. Frounze
2023-03-02 08:30:11 UTC
Post by ***@gmail.com
"undefined behavior" shouldn't mean "let's leave the stack
uninitialized, and not even issue a warning". It just means
you can't guarantee what the result will be.
This is wishful thinking. The standard means that all bets are off
regardless of your thinking of what should or should not be.
Post by ***@gmail.com
But you can still give a sensible result.
Sure. Garbage in, garbage out, as is the case here, is a sensible result.
Post by ***@gmail.com
In this case, I am trying to inspect the stack. I'm using
knowledge of the x86 as to what the stack looks like,
and the calling convention.
If the compiler inlines some code or uses link-time optimization,
there may be no stack at all. This likely won't happen if the caller
is some assembly code, but throw in enough C and it becomes
a possibility.
Post by ***@gmail.com
Yes. That code won't work on every platform in the world.
But it should work perfectly fine when I know the layout.
If you use an ancient compiler or disable optimizations in a modern
one, maybe.
Post by ***@gmail.com
It is not in the spirit of C to catch every undefined behavior
and throw an error. And in this case, even throwing an error
would be better than silent failure and garbage.
The standard does not require an implementation to detect undefined
behavior at compile or run time, or to report it. In some cases such
detection is impractically expensive if not outright impossible.
Post by ***@gmail.com
Even if setting a pointer to the arbitrary value 0xb8000 and
indexing up 1000 bytes is undefined behavior, that doesn't
mean it should be disallowed either.
That's less relevant to your current problem, I think.

Alex
Alexei A. Frounze
2023-03-02 08:40:43 UTC
If you really want to manipulate stuff directly on the stack,
declare a structure type with those things that will be on
the stack. On the caller side push the struct contents to the
stack and pass ESP as an argument to your C function
(make sure to abide by the ABI w.r.t. saved and non-saved
registers and ESP alignment (ESP may be expected to be
a multiple of 8 or 16, not just a multiple of 4)).

Alex
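
A minimal sketch of the struct approach suggested above, under stated assumptions: the initial stack is taken to hold a word-sized argument count followed directly by the argv pointers and a NULL terminator, and the names entry_stack, c_entry, demo_start and demo_run are hypothetical. The assembly stub in the comment is untested and only indicates the idea of handing the entry value of ESP to C as an ordinary argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed layout of the initial process stack (hypothetical type). */
struct entry_stack {
    long  argc;     /* pointer-sized on the usual Linux ABIs */
    char *argv[];   /* argc pointers, then a NULL terminator */
};

/* Hypothetical C-side entry point. A small assembly stub would pass
   the value ESP held on entry, for example (i386, AT&T syntax, untested):
       movl  %esp, %eax
       pushl %eax
       call  c_entry
   From here on the stack is read through an ordinary pointer argument,
   not through arithmetic on the address of a parameter. */
int c_entry(struct entry_stack *sp, int (*start)(int argc, char **argv))
{
    return start((int)sp->argc, sp->argv);
}

/* Stand-in for __mystart, for demonstration only: returns argc if
   argv is NULL-terminated where expected, otherwise -1. */
int demo_start(int argc, char **argv)
{
    return (argv[argc] == NULL) ? argc : -1;
}

/* Demonstration only: build a fake "stack" on the heap and walk it. */
int demo_run(void)
{
    static char name[] = "prog";
    struct entry_stack *fake = malloc(sizeof *fake + 2 * sizeof(char *));
    int rc;

    if (fake == NULL) return -1;
    fake->argc = 1;
    fake->argv[0] = name;
    fake->argv[1] = NULL;
    rc = c_entry(fake, demo_start);
    free(fake);
    return rc;
}
```

Here demo_run() returns 1. In a real startup file the stub would be the actual _start, so the kernel-defined stack layout is decoded in one place and the C compiler never sees out-of-bounds arithmetic on &p.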
muta...@gmail.com
2023-03-02 08:59:38 UTC
Post by Alexei A. Frounze
Post by ***@gmail.com
"undefined behavior" shouldn't mean "let's leave the stack
uninitialized, and not even issue a warning". It just means
you can't guarantee what the result will be.
This is wishful thinking. The standard means that all bets are off
regardless of your thinking of what should or should not be.
Sure. The standard doesn't stop the compiler from
detecting the unportable code and reformatting my
hard disk. That doesn't mean it should. There is a
sensible thing to do, and it could have done it.

There's even an asshole thing to do - throw a compile
error.

But it's simply diabolical to introduce a random value.
Post by Alexei A. Frounze
Post by ***@gmail.com
But you can still give a sensible result.
Sure. Garbage in, garbage out, as is the case here, is a sensible result.
No it isn't. The perfectly fine stack manipulation is the
sensible result.
Post by Alexei A. Frounze
Post by ***@gmail.com
In this case, I am trying to inspect the stack. I'm using
knowledge of the x86 as to what the stack looks like,
and the calling convention.
If the compiler inlines some code or uses link-time optimization,
there may be no stack at all. This likely won't happen if the caller
is some assembly code, but throw in enough C and it becomes
a possibility.
The caller is an external unit of work. The parameter passing
mechanism needs to be fixed in place.

It is the equivalent of assembler.

That is what I want - a compile time option that assumes that
this is an independent body of work called by assembler.

If only ancient compilers support that concept, then fine,
I'm going to use a modern ancient compiler.
Post by Alexei A. Frounze
Post by ***@gmail.com
Yes. That code won't work on every platform in the world.
But it should work perfectly fine when I know the layout.
If you use an ancient compiler or disable optimizations in a modern
one, maybe.
Or use gcc. It didn't have a problem.
Post by Alexei A. Frounze
Post by ***@gmail.com
It is not in the spirit of C to catch every undefined behavior
and throw an error. And in this case, even throwing an error
would be better than silent failure and garbage.
The standard does not require an implementation to detect undefined
behavior at compile or run time, or to report it. In some cases such
detection is impractically expensive if not outright impossible.
See above about reformatting the hard disk.
Post by Alexei A. Frounze
Post by ***@gmail.com
Even if setting a pointer to the arbitrary value 0xb8000 and
indexing up 1000 bytes is undefined behavior, that doesn't
mean it should be disallowed either.
That's less relevant to your current problem, I think.
No. It's exactly relevant.
Post by Alexei A. Frounze
If you really want to manipulate stuff directly on the stack,
declare a structure type with those things that will be on
the stack. On the caller side push the struct contents to the
stack and pass ESP as an argument to your C function
(make sure to abide by the ABI w.r.t. saved and non-saved
registers and ESP alignment (ESP may be expected to be
a multiple of 8 or 16, not just a multiple of 4)).
I can't control the external caller.

Which is the Linux OS.

BFN. Paul.
