Discussion:
subc
muta...@gmail.com
2022-01-26 00:57:32 UTC
I have had considerable success in modifying subc
(a public domain C90-subset) to at least be able to
compile PDOS-generic.

I have high hopes for this combination of public
domain OS and public domain C compiler.

My modified version can be found in subc*.zip in
custom.zip from http://pdos.org

BFN. Paul.
Rod Pemberton
2022-01-29 19:52:12 UTC
On Tue, 25 Jan 2022 16:57:32 -0800 (PST)
Post by ***@gmail.com
I have had considerable success in modifying subc
(a public domain C90-subset) to at least be able to
compile PDOS-generic.
I have high hopes for this combination of public
domain OS and public domain C compiler.
My modified version can be found in subc*.zip in
custom.zip from http://pdos.org
Did you ditch Alexei Frounze's Smaller C?

Being able to compile your OS with a minimal C
compiler is a good thing, if you're bootstrapping
the OS for the first time. However,
to code a fully developed OS, you need some
of those more advanced features of C, or you
must code replacement functionality yourself.

(list of some C compilers)
https://gist.github.com/P1n3appl3/7512d3526d165c16c8668d4f85afa20d

Of the C compilers on that list, I'd probably
look at TCC or LCC. TCC is capable of compiling
Linux on the fly. It's also small and fast.
LCC is the base compiler for LCC-Win32/64.
Small C (by Ron Cain) was a bit too limited,
missing structs etc. SubC is also very limited.
I recall looking at LadSoft's CC386 for Win32,
but I didn't use it. LadSoft's new project
is OrangeC. As I've stated previously, I'm
using DJGPP (GCC based) and OpenWatcom.
--
Biden is proving that Trump was correct all along.
muta...@gmail.com
2022-01-29 20:36:48 UTC
Post by Rod Pemberton
Post by ***@gmail.com
I have had considerable success in modifying subc
(a public domain C90-subset) to at least be able to
compile PDOS-generic.
I have high hopes for this combination of public
domain OS and public domain C compiler.
My modified version can be found in subc*.zip in
custom.zip from http://pdos.org
Did you ditch Alexei Frounze's Smaller C?
I only ever used it for its huge memory model DOS
support. But when I found out that Watcom had
proper support for huge memory model, using 8086
instructions, I switched to that. But in the last few
hours I realized I could probably switch again, by
modifying SubC to provide the required huge memory
model support, and only 32-bit integers.

The other thing I am interested in doing is adding
support for S/370. Since I want my work to be public
domain, I need a public domain base, not someone
else's explicitly copyrighted work.
Post by Rod Pemberton
Being able to compile your OS with a minimal C
compiler is a good thing, if you're bootstrapping
the OS for the first time. However,
to code a fully developed OS, you need some
of those more advanced features of C, or you
must code replacement functionality yourself.
With modifications that I am willing to make, my
FAT processing code and my memory manager
were able to be compiled, meaning PDOS-generic
could be built, which is my main interest.

What features do you think are lacking in SubC?

BFN. Paul.
a***@math.uni.wroc.pl
2022-01-30 00:24:43 UTC
Post by ***@gmail.com
With modifications that I am willing to make, my
FAT processing code and my memory manager
were able to be compiled, meaning PDOS-generic
could be built, which is my main interest.
What features do you think are lacking in SubC?
First few that I checked:
- macros with parameters
- structs that contain other structs as members
- returning structs by value (compiler accepts this but generates
code which can not work)
- initializing structures within function
- initializing structure containing array
- passing structure by value as argument of function
- two dimensional arrays

And of course low QOI: poor object code and misleading diagnostics.
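
For reference, the list above can be turned into a small C90 test file. This is only a sketch: every name in it (SQUARE, point, rect, row, make_point, rect_area, feature_check) is made up for illustration, and it says nothing about how SubC itself reacts to each construct.

```c
#include <assert.h>

#define SQUARE(x) ((x) * (x))            /* macro with a parameter */

struct point { int x; int y; };
struct rect  { struct point min; struct point max; };  /* struct inside a struct */
struct row   { int cell[3]; };           /* struct containing an array */

/* Returns a struct by value. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* Takes a struct by value. */
int rect_area(struct rect r)
{
    return (r.max.x - r.min.x) * (r.max.y - r.min.y);
}

int feature_check(void)
{
    struct row r = { { 1, 2, 3 } };      /* struct initialized inside a function */
    int grid[2][2];                      /* two-dimensional array */
    struct rect box;

    box.min = make_point(0, 0);
    box.max = make_point(SQUARE(2), r.cell[2]);
    grid[0][0] = 1; grid[0][1] = 2;
    grid[1][0] = 3; grid[1][1] = 4;
    assert(grid[1][1] == 4);
    return rect_area(box) + grid[1][0];
}
```

A compiler that handles all seven items should accept this file without diagnostics, and feature_check() should return the same value as it does under any other C90 compiler.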
--
Waldek Hebisch
muta...@gmail.com
2022-01-30 00:48:29 UTC
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
With modifications that I am willing to make, my
FAT processing code and my memory manager
were able to be compiled, meaning PDOS-generic
could be built, which is my main interest.
What features do you think are lacking in SubC?
- macros with parameters
I run "pdcc -E" (also public domain, ships with PDOS)
as a step prior to executing SubC, so I don't care about
preprocessor limitations in SubC as #line is the only
thing it ever sees.
Post by a***@math.uni.wroc.pl
- structs that contain other structs as members
- returning structs by value (compiler accepts this but generates
code which can not work)
- initializing structures within function
- initializing structure containing array
- passing structure by value as argument of function
- two dimensional arrays
I haven't had a need for any of that stuff in my OS-related
code so far, and can presumably work around it if I do. Or
I may be able to modify SubC myself, as I have bought
Nils's book and am studying the code. Some of it was too
difficult for me to understand though, so I'm hoping no
problem occurs there. I have noticed that I'm not very good
with algorithms. Most code that I have written hasn't
involved complex algorithms. And so far I haven't been able
to find anyone willing to modify SubC even if I pay them.

BFN. Paul.
a***@math.uni.wroc.pl
2022-01-30 17:01:59 UTC
Post by ***@gmail.com
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
With modifications that I am willing to make, my
FAT processing code and my memory manager
were able to be compiled, meaning PDOS-generic
could be built, which is my main interest.
What features do you think are lacking in SubC?
- macros with parameters
I run "pdcc -E" (also public domain, ships with PDOS)
as a step prior to executing SubC, so I don't care about
preprocessor limitations in SubC as #line is the only
thing it ever sees.
Post by a***@math.uni.wroc.pl
- structs that contain other structs as members
- returning structs by value (compiler accepts this but generates
code which can not work)
- initializing structures within function
- initializing structure containing array
- passing structure by value as argument of function
- two dimensional arrays
I haven't had a need for any of that stuff in my OS-related
code so far, and can presumably work around it if I do. Or
I may be able to modify SubC myself, as I have bought
Nils's book and am studying the code. Some of it was too
difficult for me to understand though, so I'm hoping no
problem occurs there. I have noticed that I'm not very good
with algorithms. Most code that I have written hasn't
involved complex algorithms. And so far I haven't been able
to find anyone willing to modify SubC even if I pay them.
Well, compiler algorithms were hard to invent. But now
they are described in books and it is not hard to
reproduce algorithm from a book.

Concerning SubC, it looks like rather bad start for
a full C compiler. There are design decisions that
make it more complex than necessary while simultaneously
limiting what it can do. In particular, using two symbol
tables and an accumulator model for code generation.

OTOH first Google hit on SubC is:

https://github.com/DoctorWkt/SubC

where a guy at the end is doing some modifications
to SubC...
--
Waldek Hebisch
Alexei A. Frounze
2022-01-30 19:40:23 UTC
...
Post by a***@math.uni.wroc.pl
Post by ***@gmail.com
I haven't had a need for any of that stuff in my OS-related
code so far, and can presumably work around it if I do. Or
I may be able to modify SubC myself, as I have bought
Nils's book and am studying the code. Some of it was too
difficult for me to understand though, so I'm hoping no
problem occurs there. I have noticed that I'm not very good
with algorithms. Most code that I have written hasn't
involved complex algorithms. And so far I haven't been able
to find anyone willing to modify SubC even if I pay them.
[This is more of a response to Paul than Waldek].

The code has to be understandable and has to have some
potential for anyone to mess with it.
I think so far only Ben Lunt has been willing to go through the
ugly Smaller C source to try to adapt it to his needs.
I haven't heard of any other attempts to get into it as deep
as he did.
I'm not willing to make improvements to Smaller C myself
because Smaller C suffers from the same design problems that
SubC and the original Small C (by Hendrix and Cain) do.
It's hard to modify it in order to extend and improve significantly
beyond what's already there.
That's the very thing Waldek is saying below.
...
Post by a***@math.uni.wroc.pl
Concerning SubC, it looks like rather bad start for
a full C compiler. There are design decisions that
make it more complex than necessary while simultaneously
limiting what it can do. In particular, using two symbol
tables and an accumulator model for code generation.
Yep, there are many different design problems that
severely limit further development. It's more fruitful to
redo the bad design and rewrite the implementation.

Alex
muta...@gmail.com
2022-01-31 04:31:21 UTC
Post by Alexei A. Frounze
I'm not willing to make improvements to Smaller C myself
because Smaller C suffers from the same design problems that
SubC and the original Small C (by Hendrix and Cain) do.
It's hard to modify it in order to extend and improve significantly
beyond what's already there.
Interesting.
Post by Alexei A. Frounze
Yep, there are many different design problems that
severely limit further development. It's more fruitful to
redo the bad design and rewrite the implementation.
Yes, it would be great if someone was willing to sit
down and write the 400,000 lines of GCC 3.2.3 and
release it to the public domain.

It hasn't happened in the last 50 years and I'm not
particularly expecting that to change (although
gcc 3.2.3 will eventually fall into the public domain
if you can figure out who the longest-living author
is, wait for them to die, then wait another 70 years).

We have to work within this natural phenomenon.

There is a small number of people, close to 1, who have the
skills required to write a C compiler and a willingness to
release that unconditionally.

Then there are a small number of people, possibly also
close to 1, who actually appreciate that and are willing to
use that inferior product rather than gcc/Visual C/clang.

So for it "to happen" you can't just choose the perfect
option. You have to make tough decisions within
realistic choices.

I can live with int = long = short = 32-bits. That doesn't
stop it from being C90-compliant. In my natural coding
style I never code "short" anyway.
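
As a quick sanity check on that claim: C90 only sets minimum ranges for the integer types (at least 16 bits for short and int, at least 32 bits for long), so making all three 32 bits wide stays within the rules. A minimal sketch that checks whichever compiler builds it; the function names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 when the integer ranges meet the C90 minimums. */
int meets_c90_minimums(void)
{
    return SHRT_MAX >= 32767
        && INT_MAX  >= 32767
        && LONG_MAX >= 2147483647L;
}

/* Returns 1 when short, int and long all have the same size,
   which is the layout proposed above. */
int all_same_width(void)
{
    return sizeof(short) == sizeof(int) && sizeof(int) == sizeof(long);
}

/* Width of int in bits, counting any padding bits. */
int int_bits(void)
{
    assert(CHAR_BIT >= 8);               /* also a C90 minimum */
    return (int)(sizeof(int) * CHAR_BIT);
}
```

On a compiler built the way described above, all_same_width() would return 1 and int_bits() would return 32, while meets_c90_minimums() should return 1 on any conforming implementation.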

I also never use floating point and I'm surprised C90
puts a burden to support complicated mathematical
functions. My response to that is to trim C90. I'm doing
something different from "the market" which is racing
ahead with new versions of the C standard, or entirely
new languages.

So what is achievable within the constraints, which I may
or may not have sufficiently outlined?

BFN. Paul.
Rod Pemberton
2022-02-02 06:38:50 UTC
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
"***@gmail.com" <***@gmail.com> wrote:

[snip]
Post by ***@gmail.com
We have to work within this natural phenomenon.
That's why I keep bringing up open source code.

I'm not really a fan of open source, but one
needs to have plenty of tools available, and
there are only so many choices out there. Most
of the free stuff has migrated to open source.
So, if you don't have a lot of money ...
Post by ***@gmail.com
There is a small number of people, close to 1, who have the
skills required to write a C compiler and a willingness to
release that unconditionally.
Then there are a small number of people, possibly also
close to 1, who actually appreciate that and are willing to
use that inferior product rather than gcc/Visual C/clang.
What I've been hoping for - about a decade now - is a
transpiler for C. A transpiler is a source-to-source
compiler, e.g., Fortran-to-C etc.

Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
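
To make the idea concrete, here is a hand-written sketch of the kind of lowering such a tool might perform on one small C99 feature (a loop-scoped declaration). The function sum and the comment layout are illustrative assumptions only; no existing tool is being described.

```c
#include <assert.h>

/* C99 input, shown as a comment so that this file itself stays C89:
 *
 *   int sum(const int *v, int n) {
 *       int total = 0;
 *       for (int i = 0; i < n; i++)    (i is scoped to the loop)
 *           total += v[i];
 *       return total;
 *   }
 */

/* C89 output: the loop variable is hoisted to the top of the block. */
int sum(const int *v, int n)
{
    int total = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

A real translator would also have to rename hoisted variables when they clash with outer names, and deal with designated initializers, mixed declarations, VLAs and so on, which is where most of the work would be.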

It would be great if the CompCert C compiler project
had this feature. Then, you could "dumb-ify" any piece
of C code, so that it would bootstrap and compile with
any old simple C compiler. Other projects that might
be ideal for this could be LLVM or Necula's CIL or
maybe even TCC. Maybe even GCC with a new machine
description could work.
--
Is Biden up to his neck, soon to drown?
Scott Lurndal
2022-02-02 15:54:49 UTC
Post by Rod Pemberton
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
Why on earth would you want this?
Rod Pemberton
2022-02-05 07:03:43 UTC
On Wed, 02 Feb 2022 15:54:49 GMT
Post by Scott Lurndal
Post by Rod Pemberton
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
Why on earth would you want this?
Why on earth did you stop reading at that point? ...
In the middle of the last paragraph:

RP> [...]
RP> Then, you could "dumb-ify" any piece
RP> of C code, so that it would bootstrap
RP> and compile with any old simple C compiler.
RP> [...]
--
Is Biden up to his neck, soon to drown?
Scott Lurndal
2022-02-05 15:58:35 UTC
Post by Rod Pemberton
On Wed, 02 Feb 2022 15:54:49 GMT
Post by Scott Lurndal
Post by Rod Pemberton
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
Why on earth would you want this?
Why on earth did you stop reading at that point? ...
RP> [...]
RP> Then, you could "dumb-ify" any piece
RP> of C code, so that it would bootstrap
RP> and compile with any old simple C compiler.
RP> [...]
The question stands - why would you want to do that?
Rod Pemberton
2022-02-07 07:01:39 UTC
On Sat, 05 Feb 2022 15:58:35 GMT
Post by Scott Lurndal
Post by Rod Pemberton
On Wed, 02 Feb 2022 15:54:49 GMT
Post by Scott Lurndal
Post by Rod Pemberton
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
Why on earth would you want this?
Why on earth did you stop reading at that point? ...
RP> [...]
RP> Then, you could "dumb-ify" any piece
RP> of C code, so that it would bootstrap
RP> and compile with any old simple C compiler.
RP> [...]
The question stands
No, it really doesn't ... It was answered.
Post by Scott Lurndal
why would you want to do that?
Why would someone want to bootstrap code to a new OS?
Is that really the question that you're asking me? ...


This is alt.os.development. We generally discuss
coding operating systems, actual real-world code,
which we've written ourselves. So, you've managed
to code an OS, you've bootstrapped a simple C
compiler or a home-brew assembler for your OS.
Once you've coded your own operating system (OS),
you next need some utilities to develop your OS
further. Your choices are to write them yourself,
or bootstrap code someone(s) else wrote, or
cross-compile code someone(s) else wrote.
Cross-compiling is obviously the last choice here,
because doing so usually requires significant
porting of the utility, and may require updates or
modifications to the cross-compiler tool chain.
So, you go find a bunch of pre-written utilities,
e.g., think GNUish MS-DOS project, so that you can
start using your OS and develop your own utilities.
However, you can only do that, if you can bootstrap
those utilities. Bootstrapping requires that the
utilities compile on your OS using the simple C
compiler you've bootstrapped, or assembled with
your home-brew assembler. Alternately, you
may wish to bootstrap a more powerful C compiler,
with the initial compile being done by that
simple C compiler. Most modern C compilers
aren't capable of self-hosting anymore, i.e.,
bootstrapping, meaning that you need a powerful
C compiler in order to compile the powerful C
compiler you want for your OS. Hence, you're
back to some type of cross-compile, which
requires porting of the utility and possibly
tool chain updates or modifications.

Clear?

(Yes, I'm already assuming that you disagree.)
--
Is Biden up to his neck, soon to drown?
muta...@gmail.com
2022-02-07 08:23:38 UTC
which we've written ourselves. So, you've managed
to code an OS, you've bootstrapped a simple C
compiler or a home-brew assembler for your OS.
simple C compiler. Most modern C compilers
aren't capable of self-hosting anymore, i.e.,
bootstrapping,
I'm not 100% sure I understand, but what is the difference
between running a simple C compiler and running the
modified, C90-compliant GCC 3.2.3 that I created?

A larger executable (3 MB), more memory required to run
it, but is that an issue?

BFN. Paul.
Scott Lurndal
2022-02-07 15:16:34 UTC
Permalink
Post by Rod Pemberton
On Sat, 05 Feb 2022 15:58:35 GMT
Post by Scott Lurndal
The question stands
No, it really doesn't ... It was answered.
Post by Scott Lurndal
why would you want to do that?
Why would someone want to bootstrap code to a new OS?
Is that really the question that you're asking me? ...
This is alt.os.development. We generally discuss
coding operating systems, actual real-world code,
which we've written ourselves.
Actually, you discuss using ancient, obsolete operating
systems (DOS, PDDOS) on ridiculously obsolete hardware
with crippled compilers, with contempt dripping from your
posts about anything more modern or widely used.
s***@yahoo.com
2022-02-02 19:28:25 UTC
Post by Rod Pemberton
On Sun, 30 Jan 2022 20:31:21 -0800 (PST)
[snip]
Post by ***@gmail.com
We have to work within this natural phenomenon.
That's why I keep bringing up open source code.
I'm not really a fan of open source, but one
needs to have plenty of tools available, and
there are only so many choices out there. Most
of the free stuff has migrated to open source.
So, if you don't have a lot of money ...
Post by ***@gmail.com
There is a small number of people, close to 1, who have the
skills required to write a C compiler and a willingness to
release that unconditionally.
Then there are a small number of people, possibly also
close to 1, who actually appreciate that and are willing to
use that inferior product rather than gcc/Visual C/clang.
What I've been hoping for - about a decade now - is a
transpiler for C. A transpiler is a source-to-source
compiler, e.g., Fortran-to-C etc.
Specifically, I'd like a C-to-C compiler. The compiler
would take modern C99 or C11 etc and output the code as
C89 or maybe even K&R C. Ideally, a valid subset of C
for C89 would be even better.
It would be great if the CompCert C compiler project
had this feature. Then, you could "dumb-ify" any piece
of C code, so that it would bootstrap and compile with
any old simple C compiler. Other projects that might
be ideal for this could be LLVM or Necula's CIL or
maybe even TCC. Maybe even GCC with a new machine
description could work.
--
Is Biden up to his neck, soon to drown?
Well, for a start, there is this.. (I think you may be aware of..)

/* Copyright (C) 1989, 1997, 1998, 1999 Aladdin Enterprises. All rights reserved. */

/*$Id: ansi2knr.c,v 1.14 1999/04/13 14:44:33 meyering Exp $*/
/* Convert ANSI C function definitions to K&R ("traditional C") syntax */

/*
ansi2knr is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY. No author or distributor accepts responsibility to anyone for the
consequences of using it or for whether it serves any particular purpose or
works at all, unless he says so in writing. Refer to the GNU General Public
License (the "GPL") for full details.

Everyone is granted permission to copy, modify and redistribute ansi2knr,
but only under the conditions described in the GPL. A copy of this license
is supposed to have been given to you along with ansi2knr so you can know
your rights and responsibilities. It should be in a file named COPYLEFT,
or, if there is no file named COPYLEFT, a file named COPYING. Among other
things, the copyright notice and this notice must be preserved on all
copies.

We explicitly state here what we believe is already implied by the GPL: if
the ansi2knr program is distributed as a separate set of sources and a
separate executable file which are aggregated on a storage medium together
with another program, this in itself does not bring the other program under
the GPL, nor does the mere fact that such a program or the procedures for
constructing it invoke the ansi2knr executable bring any other part of the
program under the GPL.
*/


~~ other points, more broadly ~~

o Small-C has structures, <if> you will allow an array[] to qualify as a primitive structure type.
It certainly treats an array as a structure of symbol, where char sym[0..9] holds a fixed length symbol name;
sym[10] holds a type value, ... sym[12] holds the low byte and sym[13] holds the high byte of a word sized pointer. AIR
So here we have, a string, byte data, and pointer, as elements of a structure, implemented as an array of char. A bit tortuous perhaps.
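
A minimal sketch of that layout, using only the offsets recalled above (name in bytes 0..9, type in byte 10, pointer in bytes 12 and 13). The 14-byte entry size, the macro names and the helper functions are assumptions for illustration, not the actual Small-C source:

```c
#include <assert.h>
#include <string.h>

#define SYM_SIZE    14   /* bytes per entry (assumed) */
#define SYM_NAME     0   /* sym[0..9]: fixed-length name */
#define SYM_NAMELEN 10
#define SYM_TYPE    10   /* sym[10]: type value */
#define SYM_PTR_LO  12   /* sym[12]: low byte of the pointer */
#define SYM_PTR_HI  13   /* sym[13]: high byte of the pointer */

/* Fills one entry; shorter names are padded with zero bytes. */
void sym_set(unsigned char *sym, const char *name, int type, unsigned addr)
{
    size_t n = strlen(name);

    assert(n <= SYM_NAMELEN);
    memset(sym, 0, SYM_SIZE);
    memcpy(sym + SYM_NAME, name, n);
    sym[SYM_TYPE]   = (unsigned char)type;
    sym[SYM_PTR_LO] = (unsigned char)(addr & 0xFF);
    sym[SYM_PTR_HI] = (unsigned char)((addr >> 8) & 0xFF);
}

/* Reassembles the 16-bit pointer from its two bytes. */
unsigned sym_addr(const unsigned char *sym)
{
    return (unsigned)sym[SYM_PTR_LO] | ((unsigned)sym[SYM_PTR_HI] << 8);
}
```

Every field access goes through a hand-maintained offset, which is exactly the bookkeeping a struct declaration would hand over to the compiler.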

You know, getting a HLL compiler that can compile itself, on a micro, has its own allure, and 'subsets' of C was an initial solution. [*]

o In the 8080/8086 world the pointer & int are treated as the same type in Small-C, ah because they are ~~ until you choose to deal with 'far pointers' of segment:offset.

[*] And, (I believe) the allure of a HLL is that it grants flow control constructs such as {if,then,else}, {do while}, etc. ~~ which are more obscure to implement in an asm program without macro preprocessing. -- Yet there is RATFOR (Rational Fortran) for Fortran.
A lot of what I'm saying should be taken in the historical context of the times these tools came about, the hardware available at those times, and the solutions that fit them.

o Small-C is a single pass compiler, yet other commercial products used multi-pass compiling with intermediate results held in temporary files. Say a first pass to collect function identifiers & their meta-data, would eliminate the 'forward reference' problems and the need to prototype functions in a header.h, make the compiler do that.


Steve
a***@math.uni.wroc.pl
2022-02-05 00:26:56 UTC
Post by s***@yahoo.com
~~ other points, more broadly ~~
o Small-C has structures, <if> you will allow an array[] to qualify as a primitive structure type.
No, I do not. Equally well you can say that Unix 'cp' is a C compiler,
just pretend that assembler is C...
Post by s***@yahoo.com
It certainly treats an array as a structure of symbol, where char sym[0..9] holds a fixed length symbol name;
sym[10] holds a type value, ... sym[12] holds the low byte and sym[13] holds the high byte of a word sized pointer. AIR
So here we have, a string, byte data, and pointer, as elements of a structure, implemented as an array of char. A bit tortuous perhaps.
You know, getting a HLL compiler that can compile itself, on a micro, has its own allure, and 'subsets' of C was an initial solution. [*]
Going for subset is quite reasonable. But structures are absolutely
fundamental and need only tiny code to implement.
Post by s***@yahoo.com
[*] And, (I believe) the allure of a HLL is that it grants flow control constructs such as {if,then,else}, {do while}, etc. ~~ which are more obscure to implement in an asm program without macro preprocessing. -- Yet there is RATFOR (Rational Fortran) for Fortran.
Well, the first thing is arithmetic expressions (including array indexing),
that is basically what early Fortran was about. Then there are
structures (one good thing in Cobol, copied and improved in later
languages). Some assemblers (IBM mainframe assembler and I think
also MASM) have reasonably good support for structures. It would
be silly to create a "HLL" that is lower level than assemblers.

Of course, structural control statements are now also indispensable
part of HLL, but they came third, after expressions and structures.
Post by s***@yahoo.com
A lot of what I'm saying should be taken in the historical context of the times these tools came about, the hardware available at those times, and the solutions that fit them.
o Small-C is a single pass compiler, yet other commercial products used multi-pass compiling with intermediate results held in temporary files. Say a first pass to collect function identifiers & their meta-data, would eliminate the 'forward reference' problems and the need to prototype functions in a header.h, make the compiler do that.
Single pass versus multi pass is almost as old as compilers.
Burroughs early Algol is claimed to be single pass. Basically
when you want simple compiler and have enough RAM to hold
whole compiler + symbol table in RAM there are advantages to
single pass approach. You may want multiple passes for optimization
or to fit the compiler in available RAM. In ACK C there was
a pass to eliminate unused prototypes. It was done because if
you include several headers symbol table may get big.
In ACK next pass could use much smaller symbol table,
freeing memory for other uses.
--
Waldek Hebisch
muta...@gmail.com
2022-01-31 04:17:14 UTC
Well, compiler algorithms were hard to invent. But now
they are described in books and it is not hard to
reproduce algorithm from a book.
It is when you don't understand the algorithm in the book.

My brain wasn't designed for complex algorithms.

I have other skills though.
Concerning SubC, it looks like rather bad start for
a full C compiler.
"full" meaning what level of the standard? I'm only
interested in C90.

And if there is difficulty in getting to C90, especially
finding someone willing to write "released to the
public domain", I'm interested in trimming down the
C90 standard to something that is actually
achievable.
There are design decisions that
make it more complex than necessary while simultaneously
limiting what it can do. In particular, using two symbol
tables and an accumulator model for code generation.
Could you explain these problems in more detail please?
I'd like to understand what is wrong with it.
https://github.com/DoctorWkt/SubC
where a guy at the end is doing some modifications
to SubC...
Thanks. I emailed him to see if we can maybe work together.

BFN. Paul.
a***@math.uni.wroc.pl
2022-01-31 19:56:15 UTC
Post by ***@gmail.com
Well, compiler algorithms were hard to invent. But now
they are described in books and it is not hard to
reproduce algorithm from a book.
It is when you don't understand the algorithm in the book.
My brain wasn't designed for complex algorithms.
I have other skills though.
Hmm, algorithms in a compiler are really not more complicated
than algorithms in an OS, database, BBS or even text editor.

Actually main thing is bookkeeping: you need to make sure
that right information is at places that need it.

There is problem of scale if you go for full compiler.
But this is not too bad. Self compiling toy compiler
for subset of C should be doable in 2-3 kloc. At about
20-25 kloc you should be able to get C90 compliance.
In our time those are not really big programs.
Post by ***@gmail.com
Concerning SubC, it looks like rather bad start for
a full C compiler.
"full" meaning what level of the standard? I'm only
interested in C90.
Already C90 has many things which are problematic to add
to SubC. For me full support for structs is non-negotiable.
That does not need much code, but AFAICS it involves changing
frequently used compiler data structures, which means a lot
of code to change.

Floating point is another thing: it is reasonable to skip
floating point in initial toy and add it only at later
stage. But compiler structure should support easy addition,
which means supporting different modes of variables. That
essentially means that if you want to add floating point
later you should support different sizes of short and long
relatively early.

Another thing is bitfields: personally I make little use
of them, but sometimes they are quite useful. And there
are a lot of programs using them.

Let me add that if I were to pick my subset of C it
probably would include limited VLAs. Namely, AFAIK code
like:

void mat_add(int n, int a[n][n], int b[n][n]) {
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            a[i][j] += b[i][j];
        }
    }
}

is valid C99. One could write essentially identical
code in Fortran66, and it is not very hard to support
(full VLA support needs much more effort).
Post by ***@gmail.com
There are design decisions that
make it more complex than necessary simultanously
limiting what it can do. In particular using two symbol
tables and acumulator model for code generation.
Could you explain these problems in more detail please?
I'd like to understand what is wrong with it.
Scope in programming languages is recursive. You may claim
that in C there are "really" only two scopes: file scope and
function scope, but C90 standard says differently. So,
for correct implementation of C you need to handle multiple
scopes. Now, there are well established methods to handle
multiple scopes using a single symbol table. The simplest one
just puts new definitions at the end of the symbol table. It also
remembers position in symbol table corresponding to start of
scope and when inner scope ends simply resets end to remembered
position (which effectively removes all entries from inner
scope and restores entries from outer scope). There is
variation of this using hash tables. So two symbol
tables in SubC means that you get _more_ complicated
compiler which is less capable than simpler one.
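
The single-table scheme described above can be sketched in a few lines of C. This is only an illustration under assumptions: the table size, the struct sym fields and all function names are made up here, and nothing below is taken from SubC.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 256                     /* arbitrary capacity for the sketch */

struct sym { const char *name; int type; };

static struct sym table[MAX_SYMS];
static int top = 0;                      /* index of the next free slot */

/* Entering a scope just remembers where the table currently ends. */
int scope_enter(void)
{
    return top;
}

/* Leaving a scope resets the end, dropping every inner definition
   and making any shadowed outer definition visible again. */
void scope_leave(int mark)
{
    assert(mark >= 0 && mark <= top);
    top = mark;
}

/* New definitions always go at the end of the table. */
void declare(const char *name, int type)
{
    assert(top < MAX_SYMS);
    table[top].name = name;
    table[top].type = type;
    top++;
}

/* Search newest-first so inner definitions shadow outer ones.
   Returns the type, or -1 when the name is not declared. */
int lookup(const char *name)
{
    int i;

    for (i = top - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return table[i].type;
    return -1;
}
```

Nesting depth is unlimited because each block only needs to keep its own mark, typically in a local variable of the recursive function that parses the block.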

Actually, the two symbol tables are tip of iceberg.
Natural handling of types is recursive: pointer type
contains type of thing pointed to, structure has list
of fields and each of fields has its own type. In
compiler it is convenient to have function types,
such type has type of return value and argument types
(C does not have function types but has pointers to
functions and function type in compiler simplifies
handling of pointers). SubC apparently encodes
several combinations of types as integers and
uses switches to handle them. Each switch branch
needs its own code and due to combinations you
get more code. With recursive handling you have
less cases, so simpler code and more capable
compiler.
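
A minimal sketch of the recursive representation, to show why it needs fewer cases. The type kinds, the field names and the 4-byte sizes for int and pointers are assumptions chosen for the example, not anything from SubC:

```c
#include <assert.h>
#include <stddef.h>

enum kind { T_CHAR, T_INT, T_PTR, T_ARRAY };

struct type {
    enum kind kind;
    const struct type *base;             /* pointee or element type, else NULL */
    size_t count;                        /* number of elements, arrays only */
};

/* One case per kind; combinations such as "array of pointers to
   arrays of int" need no code of their own. */
size_t type_size(const struct type *t)
{
    switch (t->kind) {
    case T_CHAR:  return 1;
    case T_INT:   return 4;              /* assumed 32-bit int */
    case T_PTR:   return 4;              /* assumed 32-bit pointers */
    case T_ARRAY: return t->count * type_size(t->base);
    }
    assert(0 && "unknown type kind");
    return 0;
}
```

With a flat integer encoding, each supported combination (int, pointer to int, array of int, array of pointer to int, ...) tends to need its own switch branch. Here a struct or function kind would add one more case each and then compose with everything else for free.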

After second thought I think there is reason for
unnecessary complexity in SubC. Namely, IIUC
originally SubC had no support for structures.
Nils wanted self-compilation which implied
no structures in compiler. Recursive approach
heavily depends on structures so Nils probably
thought that without structures "flat" approach
is simpler. But avoidance of structures (I saw
only one structure type in SubC source) means
that code is more complicated and harder to
understand.

Concerning accumulator machine: there are several
machine models usable for code generation. Basically,
main part of compiler generates instructions for
simple abstract machine and code generator is
responsible for translating instructions from
abstract version to actual machine code. In
simplest version each abstract instruction is
separately translated to one or more machine
instructions.

Popular abstract machines are stack machines
and 3 address representation. What is wrong
with accumulator machine as abstract target?
Well, there is one accumulator so there are
a lot of moves to/from the accumulator. Which
means more work when generating instructions
for abstract machine and more work in code
generator. And you either get poor object
code or spend significant effort combining
abstract instructions into machine instructions.
Compared to that 3 address representation
means that when generating abstract instructions
you need to allocate space for temporaries
(which is easy) and except of temporaries
other parts are easier. When generating
machine instructions from 3 address
representation code generator is simpler
and has more opportunity to better use machine
instructions.
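
A minimal sketch of generating 3 address code from an
expression tree, to show how cheap the temporaries are.
Everything here (struct expr, struct emitter, gen, the
textual "t0 = c * d" form) is hypothetical and only for
illustration; a real compiler would emit records, not text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: a leaf names a variable, an inner node has an operator. */
struct expr {
    char op;                       /* 0 for a leaf, otherwise '+', '-' or '*' */
    const char *name;              /* variable name, leaves only */
    const struct expr *lhs, *rhs;  /* operands, inner nodes only */
};

struct emitter {
    char text[512];  /* generated code, one instruction per line */
    int next_temp;   /* allocating a temporary is just bumping a counter */
};

/* Emits code for e and stores the name of the operand holding its value in out. */
void gen(struct emitter *em, const struct expr *e, char *out, size_t outsz)
{
    char l[16], r[16], line[64];

    if (e->op == 0) {              /* a leaf needs no code at all */
        snprintf(out, outsz, "%s", e->name);
        return;
    }
    gen(em, e->lhs, l, sizeof l);
    gen(em, e->rhs, r, sizeof r);
    snprintf(out, outsz, "t%d", em->next_temp++);
    snprintf(line, sizeof line, "%s = %s %c %s\n", out, l, e->op, r);
    assert(strlen(em->text) + strlen(line) < sizeof em->text);
    strcat(em->text, line);
}
```

For b + c * d this produces two instructions, each naming both
of its operands and its result, so the machine code generator
can pick registers or memory operands for each one separately
instead of routing everything through a single accumulator.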

Let me add that I did a really simple code
generator using 3 address representation
(as part of a toy compiler). This was mainly
to see how bad the generated code would be.
Initially it was really bad, maybe a little
worse than code from SubC. But I noticed
a few little tricks. The main one is that
each expression has some destination and
the machine code generator directly produces
the result in this destination. At the abstract
level a little care ensured that the final
destination is propagated down, so many
moves generated by the earlier version are no
longer needed.
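
A minimal sketch of the destination trick, assuming a
textual 3 address form. This is not the toy compiler
mentioned above; struct expr, struct emitter, gen_into and
operand are hypothetical names. The caller passes the final
destination down, so the top-level operation writes straight
into it and only nested subexpressions get temporaries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct expr {
    char op;                       /* 0 for a leaf, otherwise '+', '-' or '*' */
    const char *name;              /* variable name, leaves only */
    const struct expr *lhs, *rhs;  /* operands, inner nodes only */
};

struct emitter {
    char text[512];  /* generated code, one instruction per line */
    int next_temp;   /* counter used to name temporaries */
};

static void emit(struct emitter *em, const char *line)
{
    assert(strlen(em->text) + strlen(line) < sizeof em->text);
    strcat(em->text, line);
}

void gen_into(struct emitter *em, const struct expr *e, const char *dest);

/* Names an operand: a leaf is used in place, anything else goes to a fresh temporary. */
static void operand(struct emitter *em, const struct expr *e, char *out, size_t outsz)
{
    if (e->op == 0) {
        snprintf(out, outsz, "%s", e->name);
    } else {
        snprintf(out, outsz, "t%d", em->next_temp++);
        gen_into(em, e, out);
    }
}

/* Emits code that leaves the value of e in dest, with no trailing copy. */
void gen_into(struct emitter *em, const struct expr *e, const char *dest)
{
    char l[16], r[16], line[64];

    if (e->op == 0) {
        snprintf(line, sizeof line, "%s = %s\n", dest, e->name);
    } else {
        operand(em, e->lhs, l, sizeof l);
        operand(em, e->rhs, r, sizeof r);
        snprintf(line, sizeof line, "%s = %s %c %s\n", dest, l, e->op, r);
    }
    emit(em, line);
}
```

For a = b + c * d a generator that always returns a temporary
needs a final "a = t1" move; here the addition targets a
directly and that move never exists.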

Maybe as another comparison, I also did a machine
code generator for a "production" compiler (this was
an addition to an existing compiler; the other parts were
already done, but did not support machines that I wanted).
This was a fancy language having more features than
C. The code generator for x86_64 has 2291 lines.
The code generator for 32-bit arm has 1579.
Here the representation is again 3 address representation
(but there is an explicit stack and an address may be
on the stack). I dare to say that both generate
_much_ better code than SubC. The code is
not as good as I would wish, mainly because
register allocation is done outside and is
quite naive (the first few variables go to
available registers, the others are on the stack).
As I mentioned, the language is rather fancy.
Many things were handled in machine independent code.
But my code generator had to do loads/stores of
various sizes (including arbitrarily sized bitfields)
and had special support for some higher-level
constructs. My point here is that you can
get much better machine code than SubC in a
relatively simple compiler (in particular, code
was essentially generated separately for each
abstract instruction). But design choices
matter: due to an earlier design choice I was
stuck with suboptimal register allocation.
OTOH the representation of addresses had a place
for offset and autoincrement/autodecrement.
Which means that on x86_64 the equivalent of

a = t[i + 5];

could translate to a single machine instruction
(assuming that a was allocated in a register,
t and i were available in registers and the
size was OK for x86 SIB mode). Similarly
on arm the equivalent of

*(q++) = *(p++);

translated to two machine instructions (thanks
to the availability of autoincrement on arm).

Anyway, you want public domain code, but a licence does
not prevent you from learning from licenced code.
You can adopt the same design choices in your
program. Or when something does not work well
you can avoid it.
--
Waldek Hebisch
muta...@gmail.com
2022-02-01 14:44:50 UTC
Permalink
On Tuesday, February 1, 2022 at 6:56:26 AM UTC+11, ***@math.uni.wroc.pl wrote:

Hi Waldek. Thanks for your response.
Post by a***@math.uni.wroc.pl
Hmm, algorithms in compiler are really not more complicated
than algorithms in OS, database, BBS or even text editor.
Actually I struggled writing the FAT handling code
for writing too. It took years before I was able to
manage FAT16 only. Then someone from Slovakia
came along and made it support writing FAT12,
plus read and write FAT32, very quickly.
Post by a***@math.uni.wroc.pl
There is problem of scale if you go for full compiler.
But this is not too bad. Self compiling toy compiler
for subset of C should be doable in 2-3 kloc. At about
20-25 kloc you should be able to get C90 compliance.
In our time those are not really big programs.
20 kloc is very big and is not going to happen. What
we know is that we got 6 kloc in the last 50 years.
That's the real-world constraint.
Post by a***@math.uni.wroc.pl
Already C90 have many things which are problematic to add
to SubC. For me non-negotiable is full support for structs.
Ok. In my life I've never attempted to put a struct
on the stack, so I'm not going to miss anything.
Post by a***@math.uni.wroc.pl
Floating point is another thing: it is reasonable to skip
floating point in initial toy and add it only at later
stage. But compiler structure should support easy addition,
which means supporting different modes of variables. Which
essentially means that if you want to add floating point
later you should support different sizes of short and long
relatively early.
Ok. I'll just assume I can't have floating point until
someone does a rewrite.
Post by a***@math.uni.wroc.pl
Another thing is bitfields: personally I make little use
of them, but sometimes they are quite useful. And there
are a lot of programs using them.
If it can support my own code, that's a bloody good
start. And I don't think I have ever coded a bitfield.
Post by a***@math.uni.wroc.pl
Scope in programming languages is recursive. You may claim
that in C there are "really" only two scopes: file scope and
function scope, but C90 standard says differently. So,
for correct implementation of C you need to handle multiple
scopes.
Ok, so that's variables that are defined in blocks instead
of at the beginning of a function. That did in fact bite me,
as I occasionally define variables in blocks, but I am
basically happy to "fix" that code so that it complies with
what SubC requires.
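
A minimal sketch of that kind of "fix", assuming the
restriction is exactly as described here (declarations only
at the start of a function). The function sum_plus_last is a
made-up example, not code from PDOS.

```c
#include <assert.h>

/*
 * Block-scoped form, valid C90 but assumed here to be rejected:
 *
 *     if (n > 0) {
 *         int last;
 *         last = v[n - 1];
 *         total = total + last;
 *     }
 */

/* Same logic with every declaration hoisted to the top of the function. */
int sum_plus_last(int *v, int n)
{
    int total;
    int last;  /* hoisted out of the if block below */
    int i;

    assert(n >= 0);
    total = 0;
    for (i = 0; i < n; i++)
        total = total + v[i];
    if (n > 0) {
        last = v[n - 1];
        total = total + last;
    }
    return total;
}
```

The cost of hoisting is only that the variable is visible in
the whole function; the generated behaviour is unchanged.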
Post by a***@math.uni.wroc.pl
Actually, the two symbol tables are just the tip of the
iceberg. Natural handling of types is recursive: a pointer
type contains the type of the thing pointed to, a structure
has a list of fields and each field has its own type. In
a compiler it is convenient to have function types;
such a type holds the type of the return value and the
argument types (C has no objects of function type, only
pointers to functions, but a function type in the compiler
simplifies handling of those pointers). SubC apparently
encodes several combinations of types as integers and
uses switches to handle them. Each switch branch
needs its own code, and because of the combinations you
get more code. With recursive handling you have
fewer cases, so simpler code and a more capable
compiler.
Ok, I didn't fully understand that, but maybe SubC is
more understandable with some concrete discrete
types rather than generic recursion. I guess I will find
out when I reach that bit of the book.
Post by a***@math.uni.wroc.pl
On second thought, I think there is a reason for the
unnecessary complexity in SubC. Namely, IIUC
originally SubC had no support for structures.
Nils wanted self-compilation, which implied
no structures in the compiler. The recursive approach
depends heavily on structures, so Nils probably
thought that without structures the "flat" approach
is simpler. But avoidance of structures (I saw
only one structure type in the SubC source) means
that the code is more complicated and harder to
understand.
Ok.
Post by a***@math.uni.wroc.pl
Concerning the accumulator machine: there are several
machine models usable for code generation. Basically,
the main part of the compiler generates instructions for
a simple abstract machine and the code generator is
responsible for translating instructions from the
abstract version to actual machine code. In the
simplest version each abstract instruction is
separately translated to one or more machine
instructions.
I see.
Post by a***@math.uni.wroc.pl
Popular abstract machines are stack machines
and 3 address representation. What is wrong
with an accumulator machine as the abstract target?
Well, there is one accumulator, so there are
a lot of moves to/from the accumulator. Which
means more work when generating instructions
for the abstract machine and more work in the code
generator. And you either get poor object
code or spend significant effort combining
abstract instructions into machine instructions.
It might be easier to understand though, which
would be beneficial to me so that I can actually
support the product. Regarding poor object code -
I don't care about that. I just want the thing to work.
If a Good Samaritan turns up in the next 50 years
and writes a better public domain C compiler, that's
great. Otherwise, it's looking increasingly likely that
I am going to make SubC work for a practical task.

BFN. Paul.
Johann 'Myrkraverk' Oskarsson
2022-02-06 09:18:49 UTC
Permalink
Post by ***@gmail.com
Well, compiler algorithms were hard to invent. But now
they are described in books and it is not hard to
reproduce algorithm from a book.
It is when you don't understand the algorithm in the book.
My brain wasn't designed for complex algorithms.
Have you read Holub's compiler book? Most of it is expressed
in C code, and I find it fairly easy to understand. The most
accessible compiler book I've come across.

You can find the pdf and source at

https://holub.com/compiler/

and while the compiler itself is nothing fancy, and isn't
/public domain/, it could be a good start for such a project
if you really want to make a compiler.

[All that said, I have my own projects, and am not interested
in adding more, outside of paid developer contracts at work,
so I won't be writing any public domain compiler on my own.]

Good luck,
--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk
James Harris
2022-02-10 13:27:49 UTC
Permalink
Post by ***@gmail.com
Post by Rod Pemberton
Post by ***@gmail.com
I have had considerable success in modifying subc
(a public domain C90-subset) to at least be able to
compile PDOS-generic.
I have high hopes for this combination of public
domain OS and public domain C compiler.
My modified version can be found in subc*.zip in
custom.zip from http://pdos.org
Did you ditch Alexei Frounze's Smaller C?
I only ever used it for its huge memory model DOS
support. But when I found out that Watcom had
proper support for huge memory model, using 8086
instructions, I switched to that. But in the last few
hours I realized I could probably switch again, by
modifying SubC to provide the required huge memory
model support, and only 32-bit integers.
The other thing I am interested in doing is adding
support for S/370. Since I want my work to be public
domain, I need a public domain base, not someone
else's explicitly copyrighted work.
I am not sure what your requirements are but AIUI a certain Andrew
Jenner wrote a true 8086 backend for GCC and someone called TK Chia
added far pointers to it. A good keyword for it is ia16 and there's some
info at

* https://gitlab.com/tkchia/gcc-ia16
* https://launchpad.net/~tkchia/+archive/ubuntu/build-ia16/
--
James Harris
muta...@gmail.com
2022-03-16 11:33:16 UTC
Permalink
Is there a reason why SubC would not allow local
variables to be initialized? Is it more difficult to
parse?

If so, maybe I should stop initializing local variables
rather than attempt to change SubC.
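
A minimal sketch of what avoiding initializers looks like in
practice, assuming the restriction is as described here. The
function count_spaces is a made-up example.

```c
#include <assert.h>

/*
 * Form with initializers, assumed here to be rejected:
 *
 *     int count = 0;
 *     char *p = s;
 */

/* Same function with each initializer split into a declaration and an assignment. */
int count_spaces(char *s)
{
    int count;
    char *p;

    assert(s != 0);
    count = 0;  /* was: int count = 0; */
    p = s;      /* was: char *p = s;   */
    while (*p != '\0') {
        if (*p == ' ')
            count++;
        p++;
    }
    return count;
}
```

The rewrite is mechanical for scalar locals. Initialized
local arrays would need element-by-element assignment instead.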

BFN. Paul.
