« THIS IS A FORK OF THE CALCULATOR, WHICH WILL COMPILE RAINER'S "LC" LANGUAGE »/*
Note: code forces a divide-by-zero error in order to allow point of error
to be trapped by gdb, thus allowing a backtrace.
To do:
const arrays not yet assigned
also need to handle local variables properly. currently manually renamed, eg k => breakline_k
while/loop/break to be added
see codegen_c entry for DEFPROC - tweak to determine int or void
current bug:
#line 263
// 8440: SEQUENCE
// 6875: DEFPROC
*** Not Implemented: codegen_three_address(DEFPROC)
// 8435: SEQUENCE
// 8430: LINENO
#line 263
// 8432: SEQUENCE
// 8001: DEFPROC
*** Not Implemented: codegen_three_address(DEFPROC)
// 8426: DEFPROC
*** Not Implemented: codegen_three_address(DEFPROC)
// 19663: SEQUENCE
// 19658: LINENO
#line 538
// 19660: SEQUENCE
// 8941: FNCALL
// 8938: TAG
// 0: LINE
// 1: NOOP
// 0: LINE
// 1: NOOP
// 0: LINE
// 1: NOOP
... repeats ad inf ...
Walk_AST() - one pass for main body, another for procs, third for initialised
decls, fourth for uninit decls
create a codegen_src() procedure to re-output source
fix occassional ASTcodes that pun on other ops.
print reconstituted source at sequence points, rather than trying to print actual
source. cst would have been better than AST however. can i link concrete syntax
array (a) to AST (a) ???? (is a[] reused per statement? don't think so)
REGRESSION TEST 45 FAILING! works when a label inserted...
relatively soon: interpreter first, 3-address next.
Replace AST with linearised code. use something like Robertson 3rd pass technique
of blocks of irreducable code to be stitched together between things like branches
ret, call etc.
instead of outputting decls, procs, and code all interspersed, have a tree-walk
scan the code for each type of data and output only that type so that we
can not only plant the __main__: label properly, but also output procs
after the code, and output data to an appropriate psect.
code changed to force decl order at parse time
fix local decls in procs. array decls, consts. fold <number> (-num)
ADD SOURCE CODE TO AST PROPERLY. Perhaps hook it in at <SS> level? Add in
all lines not previously added, for lineno = thisline? <-------- working on this now.
error recovery currently broken. 'bestparse' syntax error pointer
precedence for bitwise & and |. Duplicate full C exprs?
array params? (dope vectors)
Need to ensure that the language supports '//' comments
Seemed to hang indefinitely when given a badly formed const: '\'
Lesson roadmap: minimise stack depth by rotating symmetric nodes,
leads to minimising number of registers... (same algorithm, reuse code)
lesson: static stack depth equates to R<n>
Add the no. of cells in a trip to the trip itself (immediately after
the opcode - adjust the macros accordingly...) so that things like
function definitions and function calls are not linked lists.
the trip-debugging code also needs to be adjusted to match.
have the opcode field point to an opcode trip?
similar level of flattening for ops at the same precedence level?
(why, apart from it looks nicer as a tree in graphviz????)
do a proper symbol table and generate a tree describing a declaration.
generalise getvar and use 'var's where appropriate, clean up
current messy code where we extract the string
replace the lexer with the simpler one from the precedence demo?
CFG: SERIALISE THE AST! Needed for loops/conditionals/gotos. how much of
the non-flow-control can be left as a tree??? Do we need a 'link' field
in the AST (or should I use a separate data structure altogether? There's
no real need to...) Is this merely a case of flattening the tree by
using the 'SEQUENCE' operator? How about just adding a label field
to a sequence op? Is the LHS always code and the RHS always a link?
Serialised AST *requires* REDIR nodes. Probably a lot of link scanning,
since we can't easily compact code or index into it when making optimisation
changes.
to add: '&', distinguish between BOOL and LOG AND,
if/then/else while/for/until/break signal??? (error handling?)
procedures/functions multiple result syntax? but not until the recent
messes caused by adding simple functions have been cleaned up. Don't
let the code deteriorate *and* bloat at the same time.
Add I/O (put/get char from stdin/outfile, and keyboard/screen)
arrays, structs, pointers??? Can we keep this a 'safe' language?
strings as indexes into string pool. range checked. no absolute
ram addresses ever.
(Note: interpreter does *not* require serialisation! Can't let Tim
get away with taking this shortcut!)
See rainer's "LC" - are we missing any features it needs? Can we get
away with a version that uses 32-bit ints and no chars/words? (maybe
justify it by using UTF8 for our character set?)
do we canonicalise all ast items so that a recursive compile() does
the right thing, vs having to look down particular branches with a
different procedure because the same node might have two different
meanings in different contexts? I.e. an AST cleanup phase to make
the actual code generation phase simpler. (& more drop-through cases)
(first example that springs to mind - VARs in param list) maybe
some times same code *needs* to be interpreted twice in different
ways, eg one pass over params to add to symbol table and declare
local storage, and then a second to pop from stack at entry to proc.
conflating code by drop-through in switch statements.
should DEFPROC just be another DECLARE object with a type of FN?
Add handler to generate code in different psects so that we can
effectively switch output streams to, say, write procedures at the
end of the file to avoid having to jump over them...
(fix the trailing '\' bug in ctohtml/cfold! - or have taken read the C
source, and warn if \ followed by a non-blank non-comment line?)
extract procs to smaller files - plug-in swap of modules as new
features added. Need a #include pre-processor (to regenerate single-
file format for folded display)? (handle "..." includes only, not <>'s)
(move cfold to windows... update unix copy with portability fixes
before updating with function-call code.)
generate HTML listing files and embed graphviz images. remove ascii
tree stuff (which is rather big)
Revisit having return address on data stack? How awkward would
that be?
Tree traversal to detect recursion? How expensive? (PDP15 soln)
Wasn't even thinking about optimisations yet! (SSA etc) [what was
the thing call that recombined a loop? Y-node? Combinator?] - it was a Phi-node I think
However...
'clean' functions allow compile-time constant folding of
function calls. Is there a way to generalise this with the
current interpreter rather than the usual methods of
constant folding? Maybe passing back something like 'NAN'
which causes fast exit from procs/loops etc whenever
encountered???? how about "a = b + badfn() * c + d",
where c is known to be 0. NAN scheme would cause early
return and miss opportunity to optimise.
inline replacement using AST is more powerful than C's
macros because you can obey scoping rules and make it
identical to a real fn call. (Reminiscent of lambda
functions/functional language programming)
almost forgot the old 'tail recursion' hack. is there
any way we can generalise it and get even more cases?
Question: do a version that duplicates Rainer's LC so that we can compile
his port of ecce? Or edit the ecce port to match the new language (esp.
Tim's version of the language, as a test piece...)
Code generation - external calls via DLLs. Calling conventions,
register conventions. Use of two conventions - one for internal
calling, another for external callers that follows system standards.
Simple stack machine - is there a macro assembler for windows
that will let us compile using the current output???
If not should we write one? :-)
Re earlier comment of psects - generate procedures to a serialised
AST, allows us to output the code in arbitrary order (see Robertson's
talk at Edinburgh, anecdote paper on Ackerman's) i.e. can we do
the same tricks as Robertson's pass three by juggling AST nodes
at some appropriate stage in the conversion from AST to CFG to code?
*/
// You are reading a C source file displayed using 'folds'. Click on one of the
// sections below to unfold its contents. Reclicking, or clicking on the expanded
// text, will collapse the expanded text back to the summary.
// If you are interested in the software used here (rather than the actual program,
// which is a demonstration compiler for the Yahoo 'compilers101' group) then
// you can read about it at my blog:
//
// http://techennui.blogspot.com/2008/04/semi-literate-programming.html
« Include files, consts, incidental globals »
« Parser support »
« Data structures »
// We use a stringpool, and strings are indexes into this pool. This
// is useful for the same reasons that the AST is an array indexed
// by integers rather than a struct with pointers. It may also
// save space by reusing common strings. And we get a free tag
// to describe strings. (Also we can compare strings just with
// an integer tag comparison, if we ever want to)
// not everything uses this yet. search for malloc and realloc and strdup and free to find potential problems
#define MAXPOOL (1024*32)
char stringpool[MAXPOOL];
int nextstring = 0;
int str_to_pool(char *s)
{
int tag;
for (tag = 0; tag <= nextstring; tag++) {
if (strcmp(stringpool+tag, s) == 0) {
return tag; /* found, one way or another */
}
}
}
« C[] is the source character token stream »
« A[] is the Analysis record. Contents point to a C[] structure when the item is a BIP,
otherwise the format is [phraseno] [altno] [no-of-subphrases] [subphrases and/or BIPs...]
for example, if A[25] contained 10, that would mean that the token for A[25] was stored
in C[10]. (And the C string would be at &stringpool[c[A[25]]]?) »
/* variables used by line-reconstruction (lexer) */
FILE *sourcefile;
char *curfile;
int startline = TRUE, whitespace = TRUE, lineno = 1, col = 0, ch, peek;
« Support procedures »
« Incidental I/O support »
« Support proc for storage management »/*
I'm reasonably sure that I have a bug in this procedure. Compiling ecce crashes
if MINSIZE is 1024. By greatly increasing the initial allocation we avoid whatever
the problem is. MUST BE FIXED. TO DO
Should check null returns from calloc/realloc - may be as simple as running out of
space. Possibly a problem with lack of contiguous blocks available - worst case,
no block can be reused and we end up with N^2/2 space used instead of N.
GOT IT! *arraysize = MINSIZE should be *arraysize = MINSIZE-1
*/
static void *makespace_(void *c, int nextfree, int *arraysize, int objsize) {
#define MINSIZE (1024*16)
if ((c == NULL) || (*arraysize == 0) || (nextfree == 0)) { // TOO MANY TESTS! Need to pick the right one!
c = calloc(MINSIZE, objsize); *arraysize = MINSIZE-1;
} else if (nextfree >= *arraysize) {
*arraysize = (*arraysize * 2) + 1;
c = (void *)realloc(c, (*arraysize+1) * objsize); // 0:arraysize, inclusive. eg 0:15, 0:127 etc
// this was causing a crash on the second call, on Windows/TCC. c was NULL on first call.
}
if (c == NULL) {fprintf(stderr, "makespace: %s\n", strerror(errno)); exit(errno);}
return c;
}
#define makespace(c, nextfree, arraysize) c = (typeof(c))makespace_(c, nextfree, &arraysize, sizeof(c[0]))
« Support procs for storing lexical units »void stores(char *s, int lineno, int col, int type, char *fname) {
int tag;
if (nextstring + strlen(s) + 1 >= MAXPOOL) exit(1); // TO DO: add message
strcpy(stringpool+nextstring, s); /* Create a backstop for when not found */
tag = str_to_pool(s);
if (tag == nextstring) nextstring += strlen(s)+1; /* Not found, add it */
makespace(c, nextfree, arraysize);
c[nextfree].s = stringpool+tag; c[nextfree].l = lineno; c[nextfree].col = col;
c[nextfree].f = fname; c[nextfree].t = type;
nextfree++;
}
void storec(int ch, int lineno, int col, int type, char *fname) {
onecharstr[ch*2] = ch; onecharstr[ch*2+1] = '\0'; // convert char to 1-char string before saving.
stores(&onecharstr[ch*2], lineno, col, type, fname);
}
« simple proc to recognise if a token is a keyword »
« Main procedures - the parser and the code generator (which embodies the grammar) »
« Line reconstruction, which for this language equates to lexing »
static int xfgetc(FILE *f);
static void xungetc(int c, FILE *f);
void line_reconstruction(void)
{
« Pre-process input ready for parsing. Tokens are stored in array C[] » for (;;) {
ch = xfgetc(sourcefile); if (ch == EOF) break;
ch &= 255; // int, positive.
peek = xfgetc(sourcefile); xungetc(peek, sourcefile);
if (isalpha(ch)) {
« token or keyword » } else if (isdigit(ch)) {
« Number » } else switch (ch) {
case '$':
« Hex constant \$[0-9a-fA-F]+ »
// Q: store the '$' in the string or not?
whitespace = FALSE;
col++;
if (ishex(peek)) {
int nextfree = 0, numsize = 0;
char *number = NULL;
for (;;) {
makespace(number, nextfree, numsize);
ch = xfgetc(sourcefile);
if (ishex(ch)) {
col++;
number[nextfree++] = ch;
} else {
number[nextfree] = '\0'; xungetc(ch, sourcefile);
break;
}
}
stores(number, lineno, col, TYPE_HEXINT, curfile);
free(number);
} else {
// Warn: probably an error... should not be any naked '$' symbols.
// If the error to be prined would have been a generic syntax
// error at the same location, then maybe give a more informative
// error message such as "Unexpected character '$' near: ..."
// On the other hand the generic mechanism probably reports this
// almost as accurately.
storec(ch, lineno, col++, TYPE_CHAR, curfile);
}
break;
case '\'': // Handle 'c' character constants
case '"': // Handle "string"
« literals » {
int nextfree = 0, strsize = 0, quotech = ch;
char *string = NULL;
whitespace = FALSE;
col++;
for (;;) {
ch = xfgetc(sourcefile); // Newlines are allowed
col++;
makespace(string, nextfree, strsize);
if (ch == '\\') {
ch = xfgetc(sourcefile); col++;
if (ch == '\\') { string[nextfree++] = ch;
} else if (ch == '\'') { string[nextfree++] = '\'';
} else if (ch == '"') { string[nextfree++] = '"';
} else if (ch == 'n') { string[nextfree++] = '\n';
} else if (ch == 'r') { string[nextfree++] = '\r';
} else if (ch == 't') { string[nextfree++] = '\t';
} else if (ch == 'x') {
int x, x1, x2;
x1 = xfgetc(sourcefile); col++;
if (!ishex(x1)) {
// WARN: Bad format
continue;
}
x2 = xfgetc(sourcefile); col++;
if (!ishex(x2)) {
// WARN: Bad format
continue;
}
x = (hextobin(x1)<<4) | hextobin(x2);
if (x == 0) {
// WARN: embedded NUL in a string is asking for trouble...
}
string[nextfree++] = x;
} else {
// Warn of unknown (to me) \x escape. Probably an error.
string[nextfree++] = '\\'; string[nextfree++] = ch;
}
} else if (ch != quotech) { string[nextfree++] = ch;
} else {
string[nextfree] = '\0';
break;
}
}
if (quotech == '\'') {
if (strlen(string) == 1) {
} else if (strlen(string) <= 4) {
// Warn that 'xx' as a 32-bit int is a non-standard extension
} else {
// Warn that this is probably a string with the wrong type of quote.
}
}
stores(string, lineno, col, (quotech == '\'' ? TYPE_CHARCONST : TYPE_STRING), curfile);
free(string);
}
break;
case '/':
« COMMENTS (or just a divide symbol) »
// WHITESPACE
case '\n': lineno++;
case '\r': startline = TRUE; col = 0; whitespace = TRUE;
break;
case '\t':
case ' ': col++; // Does not affect whitespace
break;
// DIRECTIVES
case '#':
// If we interpret any #-directives while lexing, we don't want to
// do an expensive test on every token, so what we can do is set
// a countdown timer on the introductory token (either this '#'
// or the actual keyword such as 'ifdef') and then test that the
// *previous* tokens match when the timer hits 0, eg
// C[cp-3] == '#' && C[cp-2] == 'include' ... etc
if (!whitespace) {
// WARN: probably an error... should not be any '#' symbols in the
// middle of a line. (This language uses "!=" or "<>" for not-equal)
}
// Drop through
default:
whitespace = FALSE;
storec(ch, lineno, col++, TYPE_CHAR, curfile);
}
}
// set up a dummy at the end because we sometimes look ahead by 1
// in the parsing code and don't want to hit uninitialised data.
c[nextfree].t = TYPE_EOF;
c[nextfree].s = "<EOF>";
c[nextfree].l = lineno;
c[nextfree].col = col;
}
« Abstract Syntax Tree data structures »typedef int TRIP; // A 'trip' was originally a 'triple' of <opcode, operand, operand>
// Now we use n-ary tuples but retain the name for nostalgia's sake...
#define MAXTRIPS (1024*100)
int AST[MAXTRIPS]; /* Should use flex arrays here too... */
int nexttrip = 0;
#ifdef DEBUG
int checkast(TRIP idx, int lineno)
{
if (idx < 0) {
fprintf(stderr, "Run-time error at line %d: negative index AST[%d] is not valid!\n", lineno, idx);
idx = idx/0;
} else if (idx >= MAXTRIPS) {
fprintf(stderr, "Run-time error at line %d: AST[%d] is out of range (max %d)!\n", lineno, idx, MAXTRIPS);
idx = idx/0;
} else {
return idx;
}
}
#else
#define checkast(x,l) (x)
#endif
#define opsym(root) AST[checkast(root,__LINE__)]
#define leftchild(root) AST[checkast((root)+1,__LINE__)]
#define rightchild(root) AST[checkast((root)+2,__LINE__)]
#define nthchild(root, n) AST[checkast((root)+n,__LINE__)]
« Several tables describing the AST opcodes. Some are only relevant to diagnostic output. »
// the first 3 tables are used by the compiler:
typedef enum opcode {
« names of AST operators » ERROR, NOOP, // we're now *only* using NOOP as a placeholder when swapping out DEFPROC etc... TEMP HACK
SEQUENCE, DEFPROC, DEFPARAM, PARAM, ARRAYEL,
CONST, TAG, LABEL, OPERATOR,
ASSIGNSCALAR, ASSIGNARRAY, DECLARESCALAR, DECLAREARRAY, VAR, IFTHEN, IFTHENELSE,
INPUT, PRINT, FNCALL, PROCCALL, RETURN,
BAND, BOR,
ADD, SUB, MUL, DIV, MOD,
LSH, RSH, LAND, LOR, EXP,
NEG, NOT, PAREN,
EQ, NE, LT, GT, LE, GE,
ASS, INDEX,
INT, REAL,
WHILE, LOOP, REPEATIF, BREAK, CONTINUE, GOTO,
REDIRECT, LINEAR_CODE,
LINE, LINENO,
// ---------- anything below this line is only for use in serialise_AST
B, BT, BF, //BEQ, BNE, BLT, BGT, BLE, BGE,
MAX_OPCODE
} OPCODE;
int prio[] = {
« priority level of infix operators »/*
Ordered according to http://www.difranco.net/cop2220/op-prec.htm
unary neg, not 200
exp 100 ?
* / % MUL DIV MOD 90 <-- left-assoc broken TO DO!!!!!
+ - ADD SUB 80 <-- already fixed left associativity
<< >> RSH LSH 70 <-- left-assoc broken (or is this a right-assoc operator?)
the rest are not used in a context where associativity matters.
< <= > >= LT LE GT GE 60
== != EQ NEQ 50
& BAND 40
| BOR 30
&& LOGAND 20
|| LOGOR 10
*/
999/*ERROR*/, 999/*NOOP*/,
999/*SEQUENCE*/, 999/*DEFPROC*/, 999/*DEFPARAM*/, 999/*PARAM*/, 999/*ARRAYEL*/,
999/*CONST*/, 999/*TAG*/, 999/*LABEL*/, 999/*OPERATOR*/,
999/*ASSIGNSCALAR*/, 999/*ASSIGNARRAY*/, 999/*DECLARESCALAR*/, 999/*DECLAREARRAY*/, 999/*VAR*/, 999/*IFTHEN*/, 999/*IFTHENELSE*/,
999/*INPUT*/, 999/*PRINT*/, 999/*FNCALL*/, 999/*PROCCALL*/, 999/*RETURN*/,
40/*BAND*/, 30/*BOR*/,
80/*ADD*/, 80/*SUB*/, 90/*MUL*/, 90/*DIV*/, 90/*MOD*/,
70/*LSH*/, 70/*RSH*/, 20/*LAND*/, 10/*LOR*/, 100/*EXP*/,
200/*NEG*/, 200/*NOT*/, 999/*PAREN*/,
50/*EQ*/, 50/*NE*/, 60/*LT*/, 60/*GT*/, 60/*LE*/, 60/*GE*/,
1/*ASS*/, 999/*INDEX - probably does need a priority... */,
999/*INT*/, 999/*REAL*/,
999/*WHILE*/, 999/*LOOP*/, 999/*REPEATIF*/, 999/*BREAK*/, 999/*CONTINUE*/, 999/*GOTO*/,
999/*REDIRECT*/, 999/*LINEAR_CODE*/,
999/*LINE*/, 999/*LINENO*/,
// ---------- anything below this line is only for use in serialise_AST
999/*B*/, 999/*BT*/, 999/*BF*/, //999/*BEQ*/, 999/*BNE*/, 999/*BLT*/, 999/*BGT*/, 999/*BLE*/, 999/*BGE*/,
};
int arity[] = {
« Number of operands in the tuple for this opcode. Most are triples. » 1, 1,
3, 4 /* DEFPROC */, 3 /* DEFPARAM*/, 3 /* PARAM */, 3 /* ARRAYEL - why 3? Why not 1? - check once rest of code filled in for this new item ... */,
3, 3, 2 /* LABEL */, 3,
3 /*ASSIGNSCALAR*/, 4 /*ASSIGNARRAY*/, 3 /* DECLARESCALAR */, 4 /* DECLAREARRAY */, 3, 3, 4,
2, 2, 3 /* FNCALL */, 3 /* PROCCALL */, /* Variable! Store in the struct? */ 2 /* RETURN */,
3, 3,
3, 3, 3, 3, 3,
3, 3, 3, 3, 3,
/* NEG */ 2, /* NOT */ 2, /* PAREN */2,
3, 3, 3, 3, 3, 3,
3, 3 /*INDEX*/,
3, 3,
3, 3/*LOOP - wrong?*/, 3, 3, 3, 2 /*GOTO*/,
3, 2/*LINEAR_CODE*/,
3, 2,
2/*B*/, 2/*BT*/, 2/*BF*/, //2/*BEQ*/, 2/*BNE*/, 2/*BLT*/, 2/*BGT*/, 2/*BLE*/, 2/*BGE*/,
};
// name was intended only for debugging, but in this initial implementation
// it is also being used as the source of mnemonics for the code generator.
// In real life, the machine opcodes would _not_ map 1:1 with AST operators.
char *name[] = {
« ascii representation of AST operator names for debugging » "ERROR", "NOOP",
"SEQUENCE", "DEFPROC", "DEFPARAM", "PARAM", "ARRAYEL",
"CONST", "TAG", "LABEL", "OPERATOR",
"ASSIGNSCALAR", "ASSIGNARRAY", "DECLARESCALAR", "DECLAREARRAY", "VAR", "IFTHEN", "IFTHENELSE",
"INPUT", "PRINT", "FNCALL", "PROCCALL", "RETURN",
"BAND", "BOR",
"ADD", "SUB", "MUL", "DIV", "MOD",
"LSH", "RSH", "LAND", "LOR", "EXP",
"NEG", "NOT", "PAREN",
"CMPEQ", "CMPNE", "CMPLT", "CMPGT", "CMPLE", "CMPGE",
"ASS", "INDEX",
"INT", "REAL",
"WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO",
"REDIRECT", "LINEAR_CODE",
"LINE", "LINENO",
"B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE",
};
// the following tables are only used for debugging:
char *shortname[] = {
« ascii representation of AST operators for display when drawing trees » "ERROR", "%)" /* %) */,
"';'", "DEFPROC", "()" /* def (% */, "','", "','",
"CONST", "TAG", "LABEL", "OPERATOR",
"=", "[]=", "DECLARESCALAR", "array", "var", "'if (...) then'", "'if (...) then ... else ...'",
"INPUT", "PRINT", "(%" /* (% */, "(%", "=>",
"'&&'", "'||'",
"'+'", "'-'", "'*'", "'/'", "'%'",
"'<<'", "'>>'", "'&'", "'|'", "'^'",
"'-'", "'~'", "'()'",
"'=='", "'!='", "'<'", "'>'", "'<='", "'>='",
"'='", "'[]'",
"INT", "REAL",
"WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO",
"REDIRECT", "LINEAR_CODE",
"LINE", "LINENO",
"B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE",
};
char *c_infix_op[] = {
« ascii representation of infix operators for display. Non-infix ops have dummy values. » "ERROR", "NOOP",
";", "DEFPROC", "DEFPARAM", ",", ",",
"CONST", "TAG", "LABEL", "OPERATOR",
"=", "[]=", "DECLARESCALAR", "DECLAREARRAY", "VAR", "if (...) then", "if (...) then ... else ...",
"INPUT", "PRINT", "FNCALL", "PROCCALL", "RETURN",
"&&", "||",
"+", "-", "*", "/", "%",
"<<", ">>", "&", "|", "^",
"-", "~", "()",
"==", "!=", "<", ">", "<=", ">=",
"=", "[]",
"INT", "REAL",
"WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO",
"REDIRECT", "LINEAR_CODE",
"LINE", "LINENO",
"B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE",
};
int display_children[] = {
« In tree-drawing code, how many children do we draw for this node? Not always the same as the arity. » 0, 0,
2, 3 /* DEFPROC */, 2 /* DEFPARAM */, 2 /* PARAM */, 2 /* ARRAYEL */,
0, 0, 1, 0,
2 /* ASSIGNSCALAR */, 3 /* ASSIGNARRAY */, 1 /* DECLARESCALAR - name, type, initval */, 2 /* DECLAREARRAY - name, type, bounds initvals */, 0, 2, 3,
1, 1, 2 /* FNCALL */, 2 /* PROCCALL */, /* Variable no of params */ 1 /* RETURN */,
2, 2,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2,
1, 1, 1,
2, 2, 2, 2, 2, 2,
2, 2,
1, 0,
1/*WHILE*/, 0/*LOOP*/, 2/*REPEATIF*/, 0, 0, 1 /*GOTO*/,
0/*REDIRECT*/, 1/*LINEAR_CODE*/,
0, 0/*LINENO - suppressed for now - was causing crash in Walk_AST*/,
1/*B*/, 1/*BT*/, 1/*BF*/, //1/*BEQ*/, 1/*BNE*/, 1/*BLT*/, 1/*BGT*/, 1/*BLE*/, 1/*BGE*/,
};
int display_leftchild[] = {
« Do we display the left child of the node? » FALSE, FALSE,
TRUE, FALSE, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, FALSE,
TRUE /* ASSIGNSCALAR */, TRUE /* ASSIGNARRAY */, TRUE /*DECLARESCALAR*/, TRUE /*DECLAREARRAY*/, FALSE, TRUE, TRUE,
TRUE, TRUE, TRUE /* FNCALL */, TRUE /* PROCCALL */, TRUE /* RETURN */,
TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE,
FALSE, FALSE,
TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
TRUE, TRUE,
FALSE, TRUE,
TRUE/*B*/, TRUE/*BT*/, TRUE/*BF*/, //TRUE/*BEQ*/, TRUE/*BNE*/, TRUE/*BLT*/, TRUE/*BGT*/, TRUE/*BLE*/, TRUE/*BGE*/,
};
int display_rightchild[] = {
« Do we display the right child of the node? » FALSE, FALSE,
TRUE, FALSE, TRUE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE,
TRUE /* ASSIGNSCALAR */, TRUE /* ASSIGNARRAY */, FALSE, TRUE, FALSE, TRUE, TRUE,
FALSE, FALSE, TRUE /* FNCALL */, TRUE /* PROCCALL */, /* Variable no of args >= 0 */ FALSE /* RETURN */,
TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE,
FALSE, FALSE,
TRUE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE,
FALSE, FALSE,
FALSE/*B*/, FALSE/*BT*/, FALSE/*BF*/, //FALSE/*BEQ*/, FALSE/*BNE*/, FALSE/*BLT*/, FALSE/*BGT*/, FALSE/*BLE*/, FALSE/*BGE*/,
};
static char xline[1024]; // make off heap instead, shorten binary
static int xi = 0, xcur_line = 1;
static void xungetc(int c, FILE *f)
{
if (xi > 0) {
xi -= 1;
} else if (c == '\n') {
xcur_line -= 1;
xi = strlen(xline);
}
ungetc(c, f);
}
TRIP make_binary_tuple(OPCODE op, TRIP parm1, TRIP parm2);
static int xfgetc(FILE *f)
{
static int last_line = -1;
int c, ch;
c = fgetc(f);
if (c == EOF) return EOF;
ch = c&255;
if (ch == '\n') {
xline[xi] = '\0'; xi = 0;
if (last_line != xcur_line) {
strcpy(stringpool+nextstring, xline); (void)make_binary_tuple(LINE, xcur_line, str_to_pool(xline));
last_line = xcur_line;
}
xcur_line += 1;
} else xline[xi++] = ch;
if (xi == 1023) xi = 1022;
xline[xi] = '\0';
return c;
}
char *nameof(TRIP t)
{
« Extract the name of a variable or the value of a const. Stored in stringpool.
Space is never reclaimed, it just grows linearly. Still, has to be better than
using malloc and all the overhead/fragmentation that goes with it. » char *result = stringpool+nextstring;
int poolptr, op;
if (nextstring + 128 >= MAXPOOL) exit(1); // TO DO: add error message
if (t == -1) {
sprintf(result, "(null)");
} else {
op = opsym(t);
if (op == CONST) {
sprintf(result, "%d", rightchild(t));
} else if (op == VAR) { // for simpler diagrams, skip a level
sprintf(result, "%s", stringpool+rightchild(leftchild(t))); // punt to tag
} else if (op == TAG) { // for simpler diagrams, skip a level
sprintf(result, "%s", stringpool+rightchild(t));
} else {
sprintf(result, "%s", shortname[op]);
}
}
poolptr = str_to_pool(result);
if (poolptr == nextstring) nextstring += strlen(stringpool+poolptr)+1; /* Not found, add it */
return(stringpool+poolptr);
}
#ifdef DEBUG
int checkop(int idx, char *caller)
{
if (idx < 0) {
fprintf(stderr, "Run-time error: negative index arity[%d] is not valid (in %s)!\n", idx, caller);
idx = idx/0;
} else if (idx >= MAX_OPCODE) {
fprintf(stderr, "Run-time error: opsym(%d) is out of range (max %d)! (in %s)\n", idx, (int)MAX_OPCODE, caller);
idx = idx/0;
} else {
return idx;
}
}
#else
#define checkop(i,c) (i)
#endif
#define tripsize(i) arity[checkop((int)opsym(i),"tripsize")]
#if defined(DEBUG_TRIPS_AFTER) || defined(DEBUG_TRIPS_DURING) || defined(DEBUG_TRIPS_CODE)
void printtrip(TRIP i);
#endif
// note we're taking a slight shortcut here by storing the opcode directly in the
// first field. If fact a cleaner design might have been to make that field point to
// an 'opcode' trip, which is a unary operator whose sole parameter is the opcode.
// this would require one extra indirection at the point of access, which could
// be hidden in a procedure call anyway.
TRIP make_unary_tuple(OPCODE op, TRIP parm1) {
« Create a tuple for a unary operator »}
TRIP make_binary_tuple(OPCODE op, TRIP parm1, TRIP parm2) {
« Create a tuple for a binary operator »}
TRIP mkop(OPCODE op) {
return make_binary_tuple(OPERATOR, op, -1); // rightchild could be ptr to string?
}
static int latest_line = -1;
TRIP sequence(TRIP parm1, TRIP parm2) {
TRIP t1,t2,t3;
t1 = make_unary_tuple(LINENO, latest_line);
t2 = make_binary_tuple(SEQUENCE, parm1, parm2);
t3 = make_binary_tuple(SEQUENCE, t1, t2);
return t3;
}
void showline(int line)
{
int i, l;
l = c[0].l;
for (i = 0; i < nextfree; i++) {
//if (c[i].l != l) {if (c[i].l == line) fprintf(outfile, "\n %4d: ", c[i].l); l = c[i].l;}
if (c[i].t == TYPE_CHARCONST) {
if (c[i].l == line) fprintf(outfile, "'%s' ", c[i].s);
} else {
if (c[i].l == line) fprintf(outfile, "%s ", c[i].s);
}
}
// fprintf(outfile, "\n\n");
}
TRIP make_nary_tuple(OPCODE op, TRIP parm1, ...) {
« Create a tuple for an n-ary operator. Uses stdargs for arbitrary no. of params »}
TRIP make_proc_name(int pooloffset) {
return make_binary_tuple(TAG, 0, pooloffset);
}
TRIP make_int_const(int datatype, char *value) {
return make_binary_tuple(CONST, datatype, (int)atol(value)); // this casting will cause problems on 64 bit machines
}
TRIP make_string_const(int datatype, char *value) {
return make_binary_tuple(CONST, datatype, (int)value);
}
TRIP make_real_const(int datatype, char *value) {
return make_binary_tuple(CONST, datatype, (int)atof(value) /* Hacky. Should fix this. */);
}
TRIP getvar_from_tag(TRIP tag)
{
int i;
TRIP trip;
for (i = 0; i < nexttrip; i++) {
trip = leftchild(i);
if ((opsym(i) == VAR) && (rightchild(i) == INT) &&
(opsym(trip) == TAG) && (rightchild(trip) == tag)
) return i;
}
}
TRIP getvar(char *s) { /* tag must exist */
« Look up a tag in the string pool. » /* getvar needs to be more complex. Currently it just maps
from a string to the triple for a pre-declared var, adding
any unrecognised string as an int. Should really add it as
an error type, and handle scope rules */
int i, trip, tag;
// TO DO: use str_to_pool instead
strcpy(stringpool+nextstring, s); /* Create a backstop in case not found */
for (tag = 0; tag <= nextstring; tag++) {
if (strcmp(stringpool+tag, s) == 0) {
break; /* found, one way or another */
}
}
if (tag == nextstring) { /* Not found, auto-declare it */
nextstring += strlen(s)+1;
fprintf(stderr, "getvar_1: cannot find declaration for '%s'! - autodeclaring as 'INT'\n", s);
trip = make_binary_tuple(TAG, 0, tag);
return make_binary_tuple(VAR, trip, INT);
}
// unfortunately after the recent restructuring to use the string pool
// for everything, we may now have the string in the pool already even
// though we haven't created the corresponding VAR
/* having located the stringpool entry,
now find the appropriate declaration whose tag is using it */
for (i = 0; i < nexttrip; i++) {
trip = leftchild(i);
if ((opsym(i) == VAR) && (rightchild(i) == INT) &&
(opsym(trip) == TAG) && (rightchild(trip) == tag)
) return i;
}
fprintf(stderr, "getvar_2: cannot find declaration for '%s'! - autodeclaring as 'INT'\n", s);
trip = make_binary_tuple(TAG, 0, tag);
return make_binary_tuple(VAR, trip, INT);
}
TRIP newtag(char *s) { /* tag must *not* exist */
« Create a new tag and add it to the stringpool. » return make_binary_tuple(TAG, 0, tag);
}
TRIP new_or_existingtag(char *s) { /* tag may exist */
« Create a new tag and add it to the stringpool. » return make_binary_tuple(TAG, 0, tag);
}
« Debugging »#if defined(DEBUG_TRIPS_AFTER) || defined(DEBUG_TRIPS_DURING) || defined(DEBUG_TRIPS_CODE)
« Diagnostic procedure to print a triple. »« This section really isn't very interesting. It's not structured for
folding, and it's not relevant to the compiler algorithms.
You can comfortably skip this part of the code... »/* this is ONLY used for diagnostics and is the same in all test harness programs */
void printtrip(TRIP i) {
int op, parm1, parm2;
if (i < 0) {
fprintf(outfile, "** Too small: TRIP %d\n", i); (void)*(int *)0;
}
if (i >= MAXTRIPS) {
fprintf(outfile, "** Too large: TRIP %d\n", i); (void)*(int *)0;
}
op = opsym(i);
parm1 = leftchild(i);
parm2 = rightchild(i);
switch (opsym(i)) {
case LINENO:
fprintf(outfile, "// %d: LINENO %d\n", i, parm1);
break;
case DECLARESCALAR:
fprintf(outfile, "// %d: %s [@AST %d] ; TO DO - modify for different types\n",
i, name[op], parm1);
break;
case VAR:
fprintf(outfile, "// %d: %s %s[@AST %d] TYPE=%s\n",
i, name[op], stringpool+rightchild(parm1), parm1, name[parm2]);
break;
case TAG: /* parm1 not used for the moment */
fprintf(outfile, "// %d: %s %s[@Stringpool %d]\n",
i, name[op], stringpool+parm2, parm2);
break;
case DEFPROC:
fprintf(outfile, "// %d: %s \"%s\" [arglist @AST %d] [fnbody @AST %d]\n",
i, name[op], stringpool+rightchild(leftchild(parm1)), parm2, nthchild(i, 3));
break;
case CONST:
fprintf(outfile, "// %d: %s %s %d\n",
i, name[op], name[parm1], parm2);
break;
case OPERATOR:
fprintf(outfile, "// %d: %s %s (Orphaned AST cell)\n",
i, name[op], name[parm1]);
break;
case ASSIGNSCALAR:
fprintf(outfile, "// %d: %s [declared @AST %d] [value @AST %d]\n",
i, name[op], parm1, parm2);
break;
case IFTHENELSE:
fprintf(outfile, "// %d: %s [condition @AST %d] [then-statement @AST %d] [else-statements @AST %d]\n",
i, name[op], parm1, parm2, nthchild(i, 3));
break;
case IFTHEN:
fprintf(outfile, "// %d: %s [condition @AST %d] [then-statement @AST %d]\n",
i, name[op], parm1, parm2);
break;
case LINE:
fprintf(outfile, "// %d: %s %d: %s\n",
i, name[op], parm1, &stringpool[parm2]);
break;
case LAND:
case BAND:
case LOR:
case BOR:
case ADD:
case SUB:
case MUL:
case DIV:
case MOD:
case LSH:
case RSH:
case EXP:
fprintf(outfile, "// %d: %s [@AST %d] [@AST %d]\n",
i, name[op], parm1, parm2);
break;
case EQ:
case NE:
case LT:
case GT:
case LE:
case GE:
fprintf(outfile, "// %d: [@AST %d] %s [@AST %d]\n",
i, parm1, name[op], parm2);
break;
case NEG:
case NOT:
fprintf(outfile, "// %d: %s [@AST %d]\n",
i, name[op], parm1);
break;
default:
if (arity[opsym(i)] == 3) {
fprintf(outfile, "// %d: %s %d %d\n",
i, name[op], parm1, parm2);
} else if (arity[opsym(i)] == 2) {
fprintf(outfile, "// %d: %s [@AST %d]\n",
i, name[op], parm1);
} else {
fprintf(outfile, "// %d: %s ...? (%d ops)\n",
i, name[op], arity[opsym(i)]);
}
break;
}
}
#endif
#ifdef DEBUG_TREES
« Ascii-art debugging procedure for drawing trees »« OK, look, you really don't want to expand this section. It's not structured for
folding, and it's not relevant to the compiler algorithms. Trust me, you can skip
this... (and did I mention it was huge and monolithic???) »
int del = 4/*was 1*/; /* distance of graph columns */
int eps = 3; /* distance of graph lines */
/* interface for drawing (can be replaced by "real" graphic using GD or other) */
void graphInit (void);
void graphFinish();
void graphBox (char *s, int *w, int *h);
void graphDrawBox (char *s, int c, int l);
void graphDrawArrow (int c1, int l1, int c2, int l2);
/* recursive drawing of the syntax tree */
void exNode (int trip, int c, int l, int *ce, int *cm, int depth, int *needed);
/*****************************************************************************/
/* main entry point of the manipulation of the syntax tree */
/* draw_tree is taken from the yacc/lex demo by Thomas Niemann
at http://epaperpress.com/lexandyacc/
I wrote an ascii tree-drawing package of my own, but it does not
handle nodes with more than 2 children. My package uses diagonal
links and Niemann's uses a rectilinear grid; when possible I use
my own package as the diagonal tree looks more natural, falling back
to the rectilinear layout only when necessary. They cannot be mixed
within the one display.
I'm in the process of adding a third option, which is to use the
graphical layout language of DOT and generate images rather than
ascii art, so that the images can be embedded in HTML output.
*/
void draw_tree_orig(int root);
void dottree(int root)
{
char *operator;
TRIP leftkid, rightkid;
int linkno = 0;
operator = nameof(root);
leftkid = leftchild(root);
rightkid = rightchild(root);
fprintf(outfile, "\"node%d\" [\n label = \"<f0> %d: %s ", root, root, operator);
if (display_leftchild[opsym(root)]) fprintf(outfile, "| <f%0d> %d: ", ++linkno, root+1);
if (display_rightchild[opsym(root)]) fprintf(outfile, "| <f%0d> %d: ", ++linkno, root+2);
linkno = 0;
fprintf(outfile, "\"\n shape = \"record\"\n];\n\n");
if (display_leftchild[opsym(root)]) dottree(leftkid);
if (display_rightchild[opsym(root)]) dottree(rightkid);
if (display_leftchild[opsym(root)])
fprintf(outfile, "\"node%0d\":f1 -> \"node%0d\":f0 [\n id = %d\n];\n\n",
root, leftchild(root), 888);
if (display_rightchild[opsym(root)])
fprintf(outfile, "\"node%0d\":f2 -> \"node%0d\":f0 [\n id = %d\n];\n\n",
root, rightchild(root), 999);
}
void draw_tree(int trip) {
int rte, rtm, needed;
fprintf(outfile, "\nTree for AST[%d]:\n", trip);
graphInit ();
needed = FALSE;
exNode (trip, 0, 0, &rte, &rtm, 0, &needed);
if (needed) {
graphFinish();
} else {
draw_tree_orig(trip);
}
#ifdef DOT_SUPPORT
fprintf(outfile, "digraph g {\n");
fprintf(outfile, "graph [\n");
fprintf(outfile, " rankdir = \"LR\"\n");
fprintf(outfile, "];\n\n");
fprintf(outfile, "node [\n");
fprintf(outfile, " fontsize = \"16\"\n");
fprintf(outfile, " shape = \"ellipse\"\n");
fprintf(outfile, "];\n\n");
fprintf(outfile, "edge [\n");
fprintf(outfile, "];\n\n");
dottree(trip);
fprintf(outfile, "}\n");
#endif
}
/*c----cm---ce----> drawing of leaf-nodes
l leaf-info
*/
/*c---------------cm--------------ce----> drawing of non-leaf-nodes
l node-info
* |
* ------------- ...----
* | | |
* v v v
* child1 child2 ... child-n
* che che che
*cs cs cs cs
*
*/
void indentsp(int d)
{
int i;
for (i = 0; i < d*4; i++) {
putchar(' ');
}
}
void exNode
( int trip,
int c, int l, /* start column and line of node */
int *ce, int *cm, /* resulting end column and mid of node */
int depth, int *needed
)
{
int op;
int w, h; /* node width and height */
char *s; /* node text */
int cbar; /* "real" start column of node (centred above subnodes) */
int k; /* child number */
int che, chm; /* end column and mid of children */
int cs; /* start column of children */
char word[40];
//indentsp(depth);fprintf(outfile, "start: TRIP=%d startcol=%d startline=%d\n", trip, c, l);
if (trip == -1) return;
op = opsym(trip);
if (display_children[op] >= 3) *needed = TRUE;
s = nameof(trip);
sprintf(word, "%s", s);
s = word;
//indentsp(depth);fprintf(outfile, "graphbox: s = %s\n", s);
/* construct node text box */
graphBox (s, &w, &h);
cbar = c;
//indentsp(depth);fprintf(outfile, "assign: c=%d\n", c);
*ce = c + w;
*cm = c + w / 2;
/* node is leaf */
if (
(op == CONST ||
op == VAR ||
display_children[op] == 0)
) {
//indentsp(depth);fprintf(outfile, "drawbox: s = %s cbar=%d\n", s, cbar);
graphDrawBox (s, cbar, l);
return;
}
/* node has children */
cs = c;
//indentsp(depth);fprintf(outfile, "node has %d children: cs=c=%d\n", display_children[op], c);
for (k = 1; k <= display_children[op]; k++) {
//indentsp(depth);fprintf(outfile, "%d: exnode1 %d cs=%d\n", k, nthchild(trip, k), cs);
exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed);
cs = che;
}
/* total node width */
if (w < che - c) {
cbar += (che - c - w) / 2;
*ce = che;
*cm = (c + che) / 2;
}
/* draw node */
//indentsp(depth);fprintf(outfile, "cbar=%d\n", cbar);
graphDrawBox (s, cbar, l);
/* draw arrows (not optimal: children are drawn a second time) */
cs = c;
for (k = 1; k <= display_children[op]; k++) {
//indentsp(depth);fprintf(outfile, "%d: exnode2 %d cs=%d\n", k, nthchild(trip, k), cs);
exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed);
graphDrawArrow (*cm, l+h, chm, l+h+eps-1);
cs = che;
}
}
/* interface for drawing */
#define lmax 2000
#define cmax 2000
char graph[lmax][cmax]; /* array for ASCII-Graphic */
void graphTest (int l, int c)
{ int ok;
ok = 1;
if (l < 0) ok = 0;
if (l >= lmax) ok = 0;
if (c < 0) ok = 0;
if (c >= cmax) ok = 0;
if (ok) return;
printf ("\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d",
l, c, lmax, cmax);
// fprintf (stderr, "\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d",
// l, c, lmax, cmax);
{
int i, j;
int lmx=20, cmx=60;
for (i = 0; i < lmx; i++) {
for (j = cmx-1; j > 0 && graph[i][j] == ' '; j--);
graph[i][cmx-1] = 0;
if (j < cmx-1) graph[i][j+1] = 0;
if (graph[i][j] == ' ') graph[i][j] = 0;
}
for (i = lmx-1; i > 0 && graph[i][0] == 0; i--);
printf ("\n");
for (j = 0; j <= i; j++) printf ("\n // %s", graph[j]);
printf("\n");
};
exit (1);
}
void graphInit (void) {
int i, j;
for (i = 0; i < lmax; i++) {
for (j = 0; j < cmax; j++) {
graph[i][j] = ' ';
}
}
}
void graphFinish() {
int i, j;
char *s;
for (i = 0; i < lmax; i++) {
for (j = cmax-1; j > 0 && graph[i][j] == ' '; j--);
graph[i][cmax-1] = 0;
if (j < cmax-1) graph[i][j+1] = 0;
if (graph[i][j] == ' ') graph[i][j] = 0;
}
for (i = lmax-1; i > 0 && graph[i][0] == 0; i--);
printf ("\n");
for (j = 0; j <= i; j++) {
char *p;
s = graph[j];
// hacks to slightly improve formatting
if (j == 0) s += 2;
else if (j == 1) s += 1;
else if ((p=strchr(s, '|')) == NULL || p[1]=='|') s += 2;
printf ("\n // %s", s);
}
printf("\n\n");
}
void graphBox (char *s, int *w, int *h) {
*w = strlen (s) + del;
*h = 1;
}
void graphDrawBox (char *s, int c, int l) {
int i;
//fprintf(outfile, "c=%d strlen=%d del=%d\n", c, strlen(s), del);
graphTest (l, c+strlen(s)-1+del);
for (i = 0; i < strlen (s); i++) {
graph[l][c+i+del] = s[i];
}
}
void graphDrawArrow (int c1, int l1, int c2, int l2) {
int m;
graphTest (l1, c1);
graphTest (l2, c2);
m = (l1 + l2) / 2;
while (l1 != m) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; }
while (c1 != c2) { graph[l1][c1] = '-'; if (c1 < c2) c1++; else c1--; }
while (l1 != l2) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; }
graph[l1][c1] = '|';
}
#endif
/*-----------------------------------------------------------------------*/
/*-----------------------------------------------------------------------*/
#ifdef DEBUG_TREES
/* See drawtree.c for test harness. Small mods made for this interface. */
/* NOTE!!! Now we have n-ary trees, this will not work. ifthenelse breaks */
static int tree_debug = (0!=0);
static int vertical = (0==0);
static int horizontal = (0==0);
static int wide = (0!=0);
static int trim = (0==0);
static int testone = (0==0);
// row col
long pic[256][256]; // 0..255 is char, >= 256 is ptr to string
int oldtextblit(int row, int col, char *src)
{
// post-processing string expansion
int l = 0;
for (;;) {
if (*src == '\0') break;
if (tree_debug) fprintf(stderr, "1: Planting '%c' at [%d][%d]\n", *src, row, col);
pic[row][col++] = *src++;
l += 1;
}
return l;
}
int textblit(int row, int col, char *src)
{
// store pointer to string, unpack later on output
int l = strlen(src);
strcpy(stringpool+nextstring, src); pic[row][col] = 256+str_to_pool(src);
return (l+(wide?3:1))>>(wide?2:1); // half size because on diagonal
}
void layout(int id, int idx, int rowoff, int coloff, int *depth, int *width)
{
char *operator;
int op;
int leftkid, rightkid;
int leftkiddepth = 0, leftkidwidth = 0;
int rightkiddepth = 0, rightkidwidth = 0;
int deltadepth = 0, deltawidth = 0;
int i;
if (tree_debug) fprintf(stderr, ">> %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width);
if (idx == -1) return; // was NOOP, now (null)
operator = nameof(idx);
leftkid = leftchild(idx);
rightkid = rightchild(idx);
// Anchor the corner node.
(void)textblit(rowoff, coloff, operator); /* not strcpy - don't copy NUL */
deltawidth = 1;
if (display_rightchild[opsym(idx)]) {
int len = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))+1; // text on the diagonal
while (len-- > 1) {deltawidth += 1; pic[rowoff][coloff-1+deltawidth] = (vertical ? '\\' : '-');}
// attach the RHS tree
if (tree_debug) fprintf(stderr, "Recursing to right node %d\n", rightkid);
layout(2*id, rightkid, rowoff, coloff+deltawidth, &rightkiddepth, &rightkidwidth);
deltadepth = rightkiddepth;
} else {
deltadepth = 1; /* The op itself */
}
// testing: correcting a typo
if (((strlen(operator)+(wide?3:1))>>(wide?2:1)) >= deltawidth) deltawidth = ((strlen(operator)+(wide?3:1))>>(wide?2:1))+2;
if (display_leftchild[opsym(idx)]) {
// draw the down link
// calculate extra height
if ((((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))) > deltadepth) {
deltadepth = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1));
}
for (i = 1; i < deltadepth+1 /* +1 for spacer row */; i++) {
if (tree_debug) fprintf(stderr, "2: Planting '%c' at [%d][%d]\n", '/', rowoff+i, coloff);
pic[rowoff+i][coloff] = (horizontal ? '/' : '|');
}
// recurse on the LHS tree
if (tree_debug) fprintf(stderr, "Recursing to left node %d\n", leftkid);
layout(2*id+1, leftkid, rowoff+deltadepth+1, coloff, &leftkiddepth, &leftkidwidth);
*depth = (*depth) + leftkiddepth + deltadepth + 1;
} else *depth = (*depth) + deltadepth;
if (rightkidwidth+deltawidth > leftkidwidth) {
*width = (rightkidwidth+deltawidth);
} else {
*width = leftkidwidth;
}
if (tree_debug) fprintf(stderr, "<< %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width);
}
void draw_tree_orig(int root)
{
int depth = 0, width = 0, row, col, offset, trimmable;
fprintf(outfile, "\n");
// Init.
for (col = 0; col < 256; col++) {
for (row = 0; row < 256; row++) {
pic[row][col] = ' ';
}
}
/* Generate layout */
layout(1, root, 128, 0, &depth, &width);
if (tree_debug) fprintf(stderr, "Dump layout: rows = %d cols = %d\n", depth, width);
if (tree_debug) fflush(stderr);
if (vertical) {
/* apply vertical shear first */
offset = 1;
for (col = 1; col < 256; col++) {
// move this column down by 'offset'
for (row = 255; row > offset; row--) {
pic[row][col] = pic[row-offset][col]; pic[row-offset][col] = ' ';
}
offset += 1;
}
}
if (horizontal) {
/* apply horizontal shear next */
row = 255; // start at bottom of drawing
offset = 0;
for (;;) {
static long temp[1024];
for (col = 0; col < 256; col++) {
temp[col] = ' ';
}
for (col = 0; col < 256; col++) {
temp[col*2+offset] = pic[row][col];
temp[col*2+offset+1] = ' ';
}
for (col = 0; col < 256; col++) {
pic[row][col] = temp[col];
}
if (row == 0) break;
offset += 1; /* more shear on next row up */
row -= 1;
}
}
if (trim) {
trimmable = (0==0);
for (;;) {
for (row = 0; row < 256; row++) {
if (pic[row][0] != ' ') {
trimmable = (0!=0);
break;
}
}
if (!trimmable) break;
for (row = 0; row < 256; row++) {
for (col = 0; col+1 < 256; col++) {
pic[row][col] = pic[row][col+1];
}
pic[row][255] = ' ';
}
}
}
if (wide) {
/* apply widening last */
row = 255; // start at bottom of drawing
offset = 0;
for (;;) {
static long temp[1024];
for (col = 0; col < 256; col++) {
temp[col] = ' ';
}
for (col = 0; col < 256; col++) {
temp[col*2+offset] = pic[row][col];
temp[col*2+offset+1] = ' ';
}
for (col = 0; col < 256; col++) {
pic[row][col] = temp[col];
}
if (row == 0) break;
row -= 1;
}
}
/* display tree */
for (row = 0; row < 256; row++) {
trimmable = (0 == 0);
for (col = 0; col < 256; col++) {
if (pic[row][col] != ' ') {
trimmable = (0!=0);
break;
}
}
if (!trimmable) {
fprintf(outfile, " "); // INDENT
for (col = 255; col >= 0; col--) {
if (pic[row][col] != ' ') break;
pic[row][col] = '\0';
}
printf(" // ");
for (col = 0; col < 256; col++) {
if ((pic[row][col] < -128) || (pic[row][col] > 255)) {
oldtextblit(row, col, &stringpool[pic[row][col]-256]);
} else {
if (pic[row][col] == '\0') break;
putchar(pic[row][col]);
}
}
putchar('\n');
}
}
putchar('\n');
fflush(outfile);
return;
}
#endif /* DEBUG_TREES */
« Parsing. (Table driven top-down recursive-descent parser) »
« The parser used here is based on a design by Tony Brooker which was
originally used in Atlas Autocode and the "Compiler Compiler".
It generates a concrete syntax tree rather than the abstract
syntax tree more popular in modern compilers. A later phase
converts from concrete to abstract.
Note that the parsing procedure here is just a piece of code
to walk a pre-built table. There is nothing in this section
which reflects the grammar, if that is what you are looking for.
You'll find the grammar embedded in the 'compile()' procedure
in the following section.
»
int cp = 0; // code pointer. Has to be global state.
int ap = 0; // Analysis record pointer.
int parse(int pp, int depth) // depth is only for indentation in diags
{
« Main parsing procedure. This is a table-driven parser, with the tables
being generated from the grammar rules embedded in the 'compile' procedure
below. The result of the parse is a tree structure, and the values of the
nodes in the tree structure are used to drive a large 'case' statement
which selects the actions to be performed after a successful parse.
There is no grammatical structure embedded in this procedure. If you're
looking for the grammar definition, see the procedure called 'compile' instead.
» int saved_cp, saved_ap, i, gp, alts, match;
char saved_desc[256];
« Initialisation »
for (i = 0; i < alts; i++) {
« Starting with the root phrase, recursively examine each alternative » int each, phrases = gram[gp++], phrase_count, gap = 0;
cp = saved_cp;
ap = saved_ap;
if (ap+3 > next_free_a) next_free_a = ap+3;
makespace(A, next_free_a, a_size);
A[ap++] = pp; // record which phrase (could be done outside loop)
A[ap++] = i; // and which alt.
// Count slots needed. *Could* be precalculated and stored
// in the grammar, either embedded (after the ALT) or as a
// separate table
for (each = 0; each < phrases; each++) if (gram[gp+each] >= 512) gap++;
A[ap++] = gap; // Count of alts (gap)
// ap+gap now points to the slot after the space required, which
// is where the first subphrase will be stored.
ap = ap+gap; // recursive subphrases are stored after this phrase.
// ap is left updated if successful.
// successfully parsed phrases are stored in A[saved_ap+3+n]
if (saved_ap+3+gap > next_free_a) next_free_a = saved_ap+3+gap;
makespace(A, next_free_a, a_size);
« Debug » // this loop is only for diagnostics
#ifdef DEBUG_PARSER
if (debug_parser) {
char *saved_descp;
fprintf(outfile, "\n");
indent(depth, outfile);
fprintf(outfile, "Alternative %d: (%d phrases) ", i+1, phrases);
saved_descp = saved_desc; *saved_descp = '\0';
for (each = 0; each < phrases; each++) {
int phrase = gram[gp+each];
if (phrase < 256) {
saved_descp += sprintf(saved_descp, " '%c'", phrase);
} else if (phrase < 512) {
saved_descp += sprintf(saved_descp, " \"%s\"/%d", keyword[phrase-256], phrase-256);
} else if (phrase < 512+MAX_BIP) {
saved_descp += sprintf(saved_descp, " {%s/BIP%d}", phrasename[phrase-512], BIP[phrase-512]);
} else {
saved_descp += sprintf(saved_descp, " <%s/%d>", phrasename[phrase-512], phrase);
}
}
fprintf(outfile, "%s\n", saved_desc);
fflush(outfile);
}
#endif
match = TRUE; // stays true if all subphrases match
phrase_count = 0; // only phrases which make it into the A record,
// i.e. excluding literals and keywords
for (each = 0; each < phrases; each++) {
« Within a single grammar rule (alternative), ensure that each subphrase is present » int phrase = gram[gp+each];
« Debug »#ifdef DEBUG_PARSER
if (debug_parser) {
indent(depth, outfile);
fprintf(outfile, "Input token stream = '%s' '%s' '%s' ...\n",
(cp < nextfree ? c[cp].s : "EOF"),
(cp+1 < nextfree ? c[cp+1].s : "EOF"),
(cp+2 < nextfree ? c[cp+2].s : "EOF"));
}
#endif
if (cp > bestparse) {
static char s[128]; // off heap instead? - binary size
#ifdef DEBUG_PARSER
if (phrase < 256) {
sprintf(s, "'%c'", phrase);
} else if (phrase < 512) {
sprintf(s, "\"%s\"", keyword[phrase-256]);
} else if (phrase < 512+MAX_BIP) {
sprintf(s, "{%s}", phrasename[phrase-512]);
} else {
sprintf(s, "<%s>", phrasename[phrase-512]);
}
#endif
looking_for = s;
bestparse = cp;
}
#ifdef DEBUG_PARSER
if (debug_parser) indent(depth, outfile);
#endif
if (phrase < 256) {
« Literal » } else if (phrase < 512) {
« Keyword » } else if (phrase < 512+MAX_BIP) {
« Built-in phrase » } else {
« Recursive call to parser for a subphrase » }
« debug » if (!match) break;
}
gp += phrases; // move over all phrases, to next alt
if (match) break;
#ifdef DEBUG_PARSER
else if (debug_parser) {
indent(depth, outfile);
fprintf(outfile, "** Alternative %d FAILED.\n", i+1);
}
#endif
// gp now points to first element (length) of next alt, or start of next phrase
}
return(match);
}
« Convert the concrete syntax tree into an abstract syntax tree.
(the grammar itself is also embedded in this section) »
« This is primarily the main 'compile()' procedure, which is actually where
the grammar of the language is defined. The grammar is extracted from
in-line comments, and converted into a table by the 'takeon' program
which you can find in the same directory as this file.
(You can view the extracted grammar in file "varcalc.g")
The style of compiler on which this design is based actually goes
directly from concrete syntax tree to code generation - but that was
from the days when memory was tight. Since most modern compilers -
and especially books about them - are AST-based, we'll take that
extra step here in order to give our students an AST-based compiler
to experiment with.
»
« Compiler tree-walking support code »«
In previous compilers, I had to write custom code for every tree-based
optimisation, in order to walk down the tree to the right place to find
the leaves to be optimised. In this one, I have a generic tree-walking
procedure which can walk the entire program, but it can be customised
so that it takes action only on specific phrases. This is possible in
this design only because each set of subphrases stores the count of
subphrases befoe it - thus allowing a generic tree-walking procedure
that doesn't have to know what each node consists of until it happens
across a node of the type it is looking for.
However this only walks the concrete syntax tree - there's a separate
Walk_AST() procedure to do the same to the AST.
»void walk_analysis_record(int ap, int depth, int wanted(int phraseno), void perform(int ap, int before, int depth))
{
int i;
if (wanted(A[ap])) perform(ap, TRUE, depth);
for (i = 3; i < A[ap+2]+3; i++) {
if (A[A[ap+i]] >= 512+MAX_BIP) walk_analysis_record(A[ap+i], depth+1, wanted, perform);
}
if (wanted(A[ap])) perform(ap, FALSE, depth);
}
int want_all(int phraseno) {
return TRUE;
}
void print_all(int ap, int before, int depth) {
#ifdef DEBUG_PARSER
int saved_ap = ap;
int phrase = A[ap++];
int alt = A[ap++];
int phrases = A[ap++]; // defined subphrases
int i;
indent(depth, stderr);
fprintf(stderr, "<%s%s/%d> ", (before ? "" : "/"), phrasename[phrase-512], alt);
for (i = 0; i < (3+phrases); i++) {
fprintf(stderr, "A[%d] = %d, ", saved_ap+i, A[saved_ap+i]);
}
fprintf(stderr, "\n");
#endif
}
/*
This is similar to the previous walking procedure for the concrete syntax tree.
Under development, not yet debugged. Problem may be conceptual - is it always
safe to walk all children of a node? I think we need a bitmap array of flags
for each node to say whether each child is walkable or a terminal.
Ideally terminals would only be on ops with no children...
The tree-printing code also needs this same improvement - for the moment
it only the first two children that have information in a special-case table...
(see tables display_children/display_leftchild etc)
*/
void Walk_AST(TRIP p, int depth, int wanted(TRIP p), void perform(TRIP p, int before, int depth))
{
int i;
if ((p == -1) || (AST[p] == -1)) return;
//fprintf(stderr, "Walk_AST(%d): %s\n", p, name[opsym(p)]);
if (wanted(p)) perform(p, TRUE, depth);
// the extra tests below are rather hacky and could be removed if the AST were
// better defined, so that only teminal nodes contained content other than pointers to other AST nodes
for (i = 1; i < tripsize(p); i++) {
if (nthchild(p, i) != -1) {
//fprintf(stderr, " child(%d): %s\n", i, name[opsym(nthchild(p, i))]);
if (display_children[opsym(nthchild(p, i))] > 0) Walk_AST(nthchild(p, i), depth+1, wanted, perform);
}
}
if (wanted(p)) perform(p, FALSE, depth);
}
int want_decls(TRIP p) {
return ((opsym(p) == DECLAREARRAY) || (opsym(p) == DECLARESCALAR));
}
int want_procs(TRIP p) {
return (opsym(p) == DEFPROC);
}
int test_DEFPROC_or_NOOP(TRIP p) {
return ((opsym(p) == DEFPROC) || (opsym(p) == NOOP) || want_decls(p));
}
int test_DECL_or_NOOP(TRIP p) {
return ((opsym(p) == NOOP) || want_decls(p));
}
void hide_one_DEFPROC_and_DECLS(TRIP p, int before, int depth) {
if (before) {
if (opsym(p) == DEFPROC)
opsym(p) = NOOP;
else if (opsym(p) == DECLARESCALAR)
opsym(p) = NOOP;
else if (opsym(p) == DECLAREARRAY)
opsym(p) = NOOP;
}
}
void hide_DEFPROC_and_NOOP_and_DECLS(TRIP root)
{
Walk_AST(root, 0, test_DEFPROC_or_NOOP, hide_one_DEFPROC_and_DECLS);
}
void hide_one_DECL(TRIP p, int before, int depth) {
// static locals in procedures were being printed twice
if (before) {
if (opsym(p) == DECLARESCALAR)
opsym(p) = NOOP;
else if (opsym(p) == DECLAREARRAY)
opsym(p) = NOOP;
}
}
void hide_DECLS_and_NOOP(TRIP root)
{
Walk_AST(root, 0, test_DECL_or_NOOP, hide_one_DECL);
}
int test_return(TRIP p) {
return (opsym(p) == RETURN);
}
static int return_type;
void record_return_types(TRIP p, int before, int depth) {
if (before) {
return_type = leftchild(p);
}
}
int locate_returns(TRIP procroot)
{
return_type = -1; // will be overridden if a result is returned
Walk_AST(procroot, 0, test_return, record_return_types);
return return_type;
}
void codegen_stack(TRIP root);
void output_stack_code(TRIP p, int before, int depth) {
if (!before) return;
codegen_stack(p);
}
void codegen_c(TRIP root);
void output_c(TRIP p, int before, int depth) {
if (before) {
codegen_c(p);
}
}
void output_top_level_statements(int ap, int depth, void perform(int ap, int depth))
{
int i;
if ((ap == -1) || (AST[ap] == -1)) return;
if (opsym(ap) == SEQUENCE) {
output_top_level_statements(leftchild(ap), depth+1, perform);
output_top_level_statements(rightchild(ap), depth+1, perform);
} else perform(ap, depth);
}
void print_all_AST(int ap, int before, int depth) {
int saved_ap = ap;
int phrase = A[ap++];
int alt = A[ap++];
int phrases = A[ap++]; // defined subphrases
int i;
#ifdef DEBUG_PARSER
indent(depth, stderr);
fprintf(stderr, "<%s%s/%d> ", (before ? "" : "/"), phrasename[phrase-512], alt);
for (i = 0; i < (3+phrases); i++) {
fprintf(stderr, "A[%d] = %d, ", saved_ap+i, A[saved_ap+i]);
}
fprintf(stderr, "\n");
#endif
}
void print_trees(int ap, int depth) {
#ifdef DEBUG_PARSER
#ifdef DRAW_TREES
if (opsym(ap) != SEQUENCE) {
draw_tree(ap);
return;
}
if (opsym(leftchild(ap)) != SEQUENCE) {
draw_tree(leftchild(ap));
}
if (opsym(rightchild(ap)) != SEQUENCE) {
draw_tree(rightchild(ap));
}
#endif
#endif
}
TRIP compile(int ap, int depth)
{
« Main code-generation procedure. This is called after parsing,
with parameters which describe the parse tree. By jumping to the
corresponding statement in the large 'case' below, we execute the
actions associated with the parse-tree nodes.
The grammar which was used to build the parse tables is extracted
from the source below (from comments marked with "//\\") and the
tables are built with the associated 'takeon' program. (See this
same directory for the source. It's quite short...) »
int saved_ap;
int phrase; // A[ap] is the phrase number. A[ap+1] is the alt.
int alt; // For consistency, in BIPs, the Alt should always be 1
// although that may not be the case at the moment :-(
int phrases; // defined subphrases
int i;
TRIP t1, t2, t3, t4, t5; // Temporaries
// The following ecce command executed on this file will generate varcalc.g:
// ecce -c "(v.//\\.s..(v/ /e)?m,k)0;%c" varcalc.c varcalc.g
// May later tweak takeon.c to read from varcalc.c rather than varcalc.g
// thus simplifying build process and becoming more like yacc.
saved_ap = ap;
phrase = A[ap++];
alt = A[ap++];
phrases = A[ap++];
#ifdef DEBUG
// fprintf(outfile, "compile(A[%d], %d) phrase=%s\n", saved_ap, depth, phrasename[phrase-512]);
#endif
switch (phrase) {
« Built-in phrases »//\\ # BIPS (Built-in Phrases) are linked to the type-code returned
//\\ # by the line-reconstruction code (aka lexer)
//\\
//\\ # These *must* come first.
// See TYPE_* in first page for the values to use.
//\\
//\\ B<IDENT>=1;
case P_IDENT: // NEED A makevar CALL!
if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l;
////////////////////////////////////////////////////////fprintf(outfile, "%d: %s\n", c[A[ap]].l, c[A[ap]].s);
return new_or_existingtag(c[A[ap]].s);
//\\ B<NUM>=5;
case P_NUM:
if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l;
////////////////////////////////////////////////////////fprintf(outfile, "%d: %s\n", c[A[ap]].l, c[A[ap]].s);
return make_int_const(/*INT*/TYPE_INT, c[A[ap]].s);
//\\ B<STRING>=2;
case P_STRING:
if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l;
////////////////////////////////////////////////////////fprintf(outfile, "%d: %s\n", c[A[ap]].l, c[A[ap]].s);
return make_string_const(TYPE_STRING, c[A[ap]].s);
//\\ B<CHARLIT>=3;
case P_CHARLIT:
if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l;
////////////////////////////////////////////////////////fprintf(outfile, "%d: %s\n", c[A[ap]].l, c[A[ap]].s);
return make_binary_tuple(CONST, /*INT*/TYPE_CHARCONST, (int)*c[A[ap]].s); // NEEDS TO BE DECORATED WITH THE INFO THAT THIS WAS AN ASCII SYMBOL!
//\\
//\\ # Phrase definitions. PROGRAM is the main entry point.
//\\
case P_PROGRAM:
//\\ P<PROGRAM> = <DECLARATIONS> <SSLIST>;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
return sequence(t1, t2); // break;
case P_DECLARATIONS:
//\\ P<DECLARATIONS> = <VARDECL> <DECLARATIONS>,
//\\ <PROCDECL> <DECLARATIONS>,
//\\ <DEFPROC> <DECLARATIONS>,
//\\ <ARRAY> <DECLARATIONS>,
//\\ <EXTERN> <DECLARATIONS>,
//\\ ;
if (alt == 5) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
return sequence(t1, t2); // break;
case P_PROCDECLARATIONS:
//\\ P<PROCDECLARATIONS> = <VARDECL> <DECLARATIONS>,
//\\ <ARRAY> <DECLARATIONS>,
//\\ <EXTERN> <DECLARATIONS>,
//\\ ;
if (alt == 3) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
return sequence(t1, t2); // break;
case P_PROCDECL:
//\\ P<PROCDECL> = "proc" <EXISTINGVAR> '(' <ARGLIST> ')' <PROCDECLARATIONS> <SSLIST> "end" <OPTSEMI>;
// some <SS>'s need to checked for and rejected semantically rather than syntactically
t1 = compile(A[ap], depth+1); // procname
t2 = compile(A[ap+1], depth+1); // args
t3 = compile(A[ap+2], depth+1); // decls - some invalid
t4 = compile(A[ap+3], depth+1); // sslist - ditto
if (t3 == -1) t5 = t4; else t5 = sequence(t3, t4);
return make_nary_tuple(DEFPROC, t1, t2, t5);
case P_SS: // simple statement that can occur almost anywhere
//\\
//\\ P<SS> = <EXISTINGVAR> '=' <BOOLEXPR> <OPTIF> <OPTSEMI>,
if (alt == 0) {
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
t3 = make_binary_tuple(ASSIGNSCALAR, t1, t2);
t4 = compile(A[ap+2], depth+1);
if (t4 != -1) {
rightchild(t4) = t3; // plug the 'then' part into the IFTHEN opcode
return t4;
}
return t3;
} else if (alt == 1) {
//\\ <EXISTINGVAR> '[' <EXPR> ']' '=' <BOOLEXPR> <OPTIF> <OPTSEMI>,
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
t3 = compile(A[ap+2], depth+1);
t4 = make_nary_tuple(ASSIGNARRAY, t1, t2, t3);
t5 = compile(A[ap+3], depth+1);
if (t5 != -1) {
rightchild(t5) = t4; // plug the 'then' part into the IFTHEN opcode
return t5;
}
return t4;
} else if (alt == 2) {
//\\ <PROCCALL> <OPTIF> <OPTSEMI>,
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
rightchild(t2) = t1;
return t2;
} else if ((alt == 3) || (alt == 4)) {
//\\ <IFSEQ> <OPTSEMI>,
//\\ <LOOP> <OPTSEMI>,
return compile(A[ap], depth+1);
} else if ((alt == 5) || (alt == 6)) {
//\\ "while" '(' <BOOLEXPR> ')' "do" <SSLIST> "endwhile" <OPTSEMI>,
//\\ "loop" <SSLIST> "endloop" <OPTSEMI>,
// while, loop - NOT IMPL.
return -1;
} else if (alt == 7) { // return var - error if not in a function
//\\ "return" <BOOLEXPR> <OPTIF> <OPTSEMI>,
t1 = make_unary_tuple(RETURN, compile(A[ap], depth+1)); // function result
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
rightchild(t2) = t1;
return t2;
} else if (alt == 8) { // error if not in a procedure
//\\ "return" <OPTIF> <OPTSEMI>,
t1 = make_unary_tuple(RETURN, -1); // procedure return
t2 = compile(A[ap], depth+1);
if (t2 == -1) return t1;
rightchild(t2) = t1;
return t2;
} else if (alt == 9) { // jump
//\\ "jump" <EXISTINGVAR> <OPTIF> <OPTSEMI>
t1 = make_unary_tuple(GOTO, compile(A[ap], depth+1));
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
rightchild(t2) = t1;
return t2;
}
//\\ ;
case P_SSLIST:
//\\ P<SSLIST> = <OPTLABELS> <SS> <SSLIST>,
//\\ ;
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
t3 = compile(A[ap+2], depth+1);
if (t1 == -1) t4 = t2; else t4 = sequence(t1, t2);
if (t3 == -1) return t4;
return sequence(t4, t3);
case P_CONSTDECL:
//\\
//\\ P<CONSTDECL> = "const" <EXISTINGVAR> '=' <NUMBER> <OPTSEMI>;
return -1; // not implemented
// Declarations need to be in the context of global or local to a procedure,
// and the latter are either static or auto.
case P_VARDECLIST:
//\\
//\\ P<VARDECLIST> = ',' <VARDEC> <VARDECLIST>,
//\\ ;
if (alt == 1) return -1; // drop through
case P_VARDECL:
//\\
//\\ P<VARDECL> = "static" <VARDEC> <VARDECLIST> <OPTSEMI>;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
return sequence(t1, t2);
case P_VARDEC: // = <BOOLEXPR> ?????????????????
//\\
//\\ P<VARDEC> = <IDENT> '=' <BOOLEXPR>,
//\\ <IDENT>;
t1 = compile(A[ap], depth+1);
if (alt == 0) {
t2 = compile(A[ap+1], depth+1);
t3 = make_binary_tuple(VAR, t1, INT);
t4 = make_binary_tuple(DECLARESCALAR, t1, t2);
t5 = make_binary_tuple(ASSIGNSCALAR, t3, t2);
return sequence(t4, t5); // Or should we subsume the ASSIGN into the DECLARE?
} else if (alt == 1) {
return make_binary_tuple(DECLARESCALAR, t1, -1 /* no init */);
}
case P_OPTLABELS: // 0 or more labels
//\\ P<OPTLABELS> = <LABEL> <OPTLABELS>,
//\\ ;
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (t2 == -1) return t1;
return sequence(t1, t2);
case P_LABEL:
//\\ P<LABEL> = <EXISTINGVAR> ':';
return make_unary_tuple(LABEL, compile(A[ap], depth+1));
//\\ P<SYNTAXERROR> = ;
case P_SYNTAXERROR: // also not implemented yet
fprintf(stderr, "*** Syntax error. Details later.\n");
exit(1);
//\\ P<EXISTINGVAR> = <IDENT>;
case P_EXISTINGVAR:
t1 = compile(A[ap], depth+1);
return make_binary_tuple(VAR, t1, INT);
// WRONG!!! Needs to *find* existing tuple with this tag - need proper symbol table management TO DO BUG use getvar?
// return getvar_from_tag(t1);
//\\ P<NEWVAR> = <IDENT>;
case P_NEWVAR:
t1 = compile(A[ap], depth+1);
return make_binary_tuple(VAR, t1, INT);
case P_EXTERN:
//\\ P<EXTERN> = "extern" <EXISTINGVAR> <OPTSEMI>;
return -1; // not implemented
case P_ARGLIST:
//\\ P<ARGLIST> = <EXISTINGVAR> <ARGLIST>,
//\\ ;
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(DEFPARAM, t1, t2);
case P_ARRAY:
//\\
//\\ P<ARRAY> = "const" "array" <EXISTINGVAR> '[' <NUM> ']' '=' <INITLIST> <OPTSEMI>,
//\\ "array" <IDENT> '[' <NUM> ']' <OPTSEMI>;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
if (alt == 0) {
t3 = compile(A[ap+2], depth+1);
} else t3 = -1;
return make_nary_tuple(DECLAREARRAY, t1, t2, t3);
case P_INITLIST:
//\\
//\\ P<INITLIST> = <NUM> <INITLIST>,
//\\ ;
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(ARRAYEL /* was: PARAM*/, t1, t2);
case P_DEFPROC:
//\\
//\\ P<DEFPROC> = "let" <EXISTINGVAR> '(' <FORMALS> ')' '=' <BOOLEXPR> <OPTSEMI>;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
t3 = compile(A[ap+2], depth+1);
return make_nary_tuple(DEFPROC, t1, t2, t3);
case P_FORMALS:
//\\
//\\ P<FORMALS> = <EXISTINGVAR> <RESTOFFORMALS>;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(DEFPARAM, t1, t2);
case P_RESTOFFORMALS:
//\\
//\\ P<RESTOFFORMALS> = ',' <EXISTINGVAR> <RESTOFFORMALS>, ;
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(DEFPARAM, t1, t2);
case P_IFSEQ:
//\\ P<IFSEQ> = "if" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART> "finish";
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+2], depth+1);
t3 = compile(A[ap+3], depth+1);
if (t3 == -1) return make_binary_tuple(IFTHEN, t1, t2);
return make_nary_tuple(IFTHENELSE, t1, t2, t3);
case P_OPTELSEPART: // an elseif is returned as if it were a simple elsepart, which may contain a nested if/then/else
//\\ P<OPTELSEPART> = "else" "if" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART>,
//\\ "elseif" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART>,
//\\ "else" <OPTSEMI> <SSLIST>,
//\\ ;
if (alt == 3) return -1;
if (alt == 2) return compile(A[ap+1], depth+1);
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+2], depth+1);
t3 = compile(A[ap+3], depth+1);
if (t3 == -1) return make_binary_tuple(IFTHEN, t1, t2);
return make_nary_tuple(IFTHENELSE, t1, t2, t3);
case P_LOOP: // Change this: pull the condition out of the OPTIF and adjust the codegen where it is used to match. Remember this is an AST not a CST.
//\\ P<LOOP> = "cycle" <OPTSEMI> <SSLIST> "repeat" <OPTIF>;
t1 = compile(A[ap+1], depth+1);
t2 = compile(A[ap+2], depth+1);
return make_binary_tuple(REPEATIF, t1, t2);
case P_PROCCALL:
//\\
//\\ P<PROCCALL> = <EXISTINGVAR> '(' <PARAMLIST> ')',
//\\ "print" <STRING>;
if (alt == 0) {
t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool)
t2 = compile(A[ap+1], depth+1); // expr param
t3 = leftchild(t1); // TAG (fn name)
t4 = rightchild(t3); // stringpool offset
return make_binary_tuple(PROCCALL, make_proc_name(t4), t2);
} else {
return make_unary_tuple(PRINT, compile(A[ap], depth+1)); // for now
}
case P_OPTIF:
//\\
//\\ P<OPTIF> = "if" <BOOLEXPR>,
//\\ ;
if (alt == 1) return -1;
return make_binary_tuple(IFTHEN, compile(A[ap], depth+1), -1); // 'then' part is a hole to be filled.
case P_BOOLEXPR:
//\\
//\\ P<BOOLEXPR> = <BOOLTERM> <RESTOFBOOLTERM>;
« (Click here to expand the code) »
case P_RESTOFBOOLTERM:
//\\
//\\ P<RESTOFBOOLTERM> = <BOROP> <BOOLTERM> <RESTOFBOOLTERM>,
//\\ ;
« (Click here to expand the code) »
case P_BOOLTERM:
//\\
//\\ P<BOOLTERM> = <BOOLFACTOR> <RESTOFBOOLFACTOR>;
« (Click here to expand the code) »
case P_RESTOFBOOLFACTOR:
//\\
//\\ P<RESTOFBOOLFACTOR> = <BANDOP> <BOOLFACTOR> <RESTOFBOOLFACTOR>,
//\\ ;
« (Click here to expand the code) »
case P_BOOLFACTOR:
//\\
//\\ P<BOOLFACTOR> = <OPTNOT> <RELATION>;
« (Click here to expand the code) »
case P_OPTNOT:
//\\
//\\ P<OPTNOT> = '!',
//\\ ;
if (alt == 0) return mkop(NOT);
return -1;
case P_OPTSEMI:
//\\
//\\ P<OPTSEMI> = ';',
//\\ ;
return -1;
case P_RELATION:
//\\
//\\ P<RELATION> = <EXPR> <RESTOFRELATION>;
« (Click here to expand the code) »
case P_RESTOFRELATION:
//\\
//\\ P<RESTOFRELATION> = <RELOP> <EXPR>,
//\\ ;
« (Click here to expand the code) »
case P_EXPR:
//\\
//\\ P<EXPR> = <SUM> <RESTOFEXPR>;
« (Click here to expand the code) »
case P_RESTOFEXPR:
//\\
//\\ P<RESTOFEXPR> = <SHIFTOP> <SUM> <RESTOFEXPR>,
//\\ ;
« (Click here to expand the code) »
case P_UNOP:
//\\
//\\ P<UNOP> = '+', '-', '~';
return (alt == 0 ? -1 : (alt == 1 ? mkop(NEG) : mkop(NOT)));
case P_OPTUNOP:
//\\
//\\ P<OPTUNOP> = <UNOP>,
//\\ ;
« (Click here to expand the code) »
case P_ADDOP: // DO NOT USE AS UNARY!!!
//\\
//\\ P<ADDOP> = '+', '-';
return (alt ==0 ? mkop(ADD) : mkop(SUB));
case P_OPTADDOP:
//\\
//\\ P<OPTADDOP> = <ADDOP>,
//\\ ;
« (Click here to expand the code) »
// fixed a bug here by reorderinhg tree nodes.
// A - B + C was being parsed as A - (B + C) which is equivalent to A - B - C !!!!!
case P_SUM:
//\\
//\\ P<SUM> = <OPTUNOP> <TERM> <RESTOFSUM>;
« (Click here to expand the code) »
case P_RESTOFSUM:
//\\
//\\ P<RESTOFSUM> = <ADDOP> <TERM> <RESTOFSUM>,
//\\ ;
« (Click here to expand the code) »
case P_TERM:
//\\
//\\ P<TERM> = <BITFACTOR> <RESTOFTERM>;
« (Click here to expand the code) »
case P_RESTOFTERM:
//\\
//\\ P<RESTOFTERM> = <MULOP> <BITFACTOR> <RESTOFTERM>,
//\\ ;
« (Click here to expand the code) »
//---------------- adding:
case P_BITFACTOR:
//\\
//\\ P<BITFACTOR> = <FACTOR> <RESTOFBITFACTOR>;
« (Click here to expand the code) »
case P_RESTOFBITFACTOR:
//\\
//\\ P<RESTOFBITFACTOR> = <BITOP> <FACTOR> <RESTOFBITFACTOR>,
//\\ ;
« (Click here to expand the code) »
case P_BITOP: // STILL TO test 'OR'...
//\\
//\\ P<BITOP> = '&', '|';
return (alt == 0 ? mkop(LAND) : mkop(LOR));
//----------------
case P_BOROP:
//\\
//\\ P<BOROP> = '||';
return mkop(BOR);
case P_BANDOP:
//\\
//\\ P<BANDOP> = '&&';
return mkop(BAND);
case P_RELOP:
//\\
//\\ P<RELOP> = '<>', '<=', '<', '>=', '>', '!=', '==', '#', '=';
switch (alt) {
case 0:
case 7:
case 5: return mkop(NE);
case 1: return mkop(LE);
case 2: return mkop(LT);
case 3: return mkop(GE);
case 4: return mkop(GT);
case 6:
case 8: return mkop(EQ);
}
case P_SHIFTOP:
//\\
//\\ P<SHIFTOP> = '<<', '>>';
return (alt == 0 ? mkop(LSH) : mkop(RSH));
case P_MULOP:
//\\
//\\ P<MULOP> = '*', '/', '%';
return (alt == 0 ? mkop(MUL) : (alt == 1 ? mkop(DIV) : mkop(MOD)));
case P_FACTOR:
//\\
//\\ P<FACTOR> = '(' <BOOLEXPR> ')',
//\\ <EXISTINGVAR> '(' <PARAMLIST> ')',
//\\ <EXISTINGVAR> '[' <EXPR> ']',
//\\ <EXISTINGVAR>,
//\\ <NUMBER>;
if (alt == 0) {
return compile(A[ap], depth+1);
} else if (alt == 1) {
t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool)
t2 = compile(A[ap+1], depth+1); // expr param
t3 = leftchild(t1); // TAG (fn name)
t4 = rightchild(t3); // stringpool offset
return make_binary_tuple(FNCALL, make_proc_name(t4), t2);
} else if (alt == 2) { // indexed array element
t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool)
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(INDEX, t1, t2);
} else if (alt == 3) {
return compile(A[ap], depth+1);
} else { // alt = 4
return compile(A[ap], depth+1);
}
case P_EXPRORSTRING:
//\\ P<EXPRORSTRING> = <EXPR>, <STRING>;
t1 = compile(A[ap], depth+1);
return t1;
case P_PARAMLIST:
//\\
//\\ P<PARAMLIST> = <EXPRORSTRING> <RESTOFPARAMLIST>,
//\\ ;
//\\
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(PARAM, t1, t2);
case P_RESTOFPARAMLIST:
//\\
//\\ P<RESTOFPARAMLIST> = ',' <EXPRORSTRING> <RESTOFPARAMLIST>,
//\\ ;
//\\
if (alt == 1) return -1;
t1 = compile(A[ap], depth+1);
t2 = compile(A[ap+1], depth+1);
return make_binary_tuple(PARAM, t1, t2);
case P_NUMBER:
//\\
//\\ P<NUMBER> = <NUM>,
//\\ <UNOP> <NUM>,
//\\ <CHARLIT>;
t1 = compile(A[ap], depth+1);
if (alt == 1) {
return make_unary_tuple(leftchild(t1), compile(A[ap+1], depth+1));
}
return t1;
//\\
//\\ E
//\\ # 'E' is end of grammar. Everything after this is ignored.
default:
fprintf(stderr, "*** Internal error at line %d. ap=%d phrase=%d", __LINE__, ap, phrase);
if (((phrase-512) >= 0) && ((phrase-512) < MAX_PHRASE))
fprintf(stderr, "\n (possible missing \"case P_%s:\" in compile()?)",
#ifdef DEBUG_PARSER
phrasename[phrase-512]
#else
"something"
#endif
);
fprintf(stderr, "\n");
#ifdef DEBUG
t5=t5/0;
#endif
exit(2);
}
return(-1); // DUMMY TRIP, NOTHING TO RETURN
}
« Code generators - one for three-address, one for a stack machine, one that generates structured C,
and one run-time interpreter. At the moment, function definitions and calls have only been added
to the stack-based machine and the C translator »static int nextlab = 0;
« basic codegen_three_address() generates a simple 3-address intermediate code »// Codegen is the guts of the compiler, which effectively serialises
// the AST into sequentially executable statements. A further phase
// is required to actually generate executable code.
« Support I/O procs for codegen »void declare(char *name)
{
fprintf(outfile, " int %s;\n", name);
}
void loadconst(char *format, TRIP dest, int value)
{
fprintf(outfile, " ");
fprintf(outfile, format, dest, value);
}
void loadvar(char *format, TRIP dest, char *varname)
{
fprintf(outfile, " ");
fprintf(outfile, format, dest, varname);
}
void store(char *format, char *varname, TRIP source)
{
fprintf(outfile, " ");
fprintf(outfile, format, varname, source);
}
void operate(char *format, TRIP dest, TRIP leftop, char *op, TRIP rightop)
{
fprintf(outfile, " ");
fprintf(outfile, format, dest, leftop, op, rightop);
}
void monoperate(char *format, TRIP dest, char *op, TRIP leftop)
{
fprintf(outfile, " ");
fprintf(outfile, format, dest, op, leftop);
}
void put_goto(int lab)
{
fprintf(outfile, " ");
fprintf(outfile, "goto L%02d;\n", lab);
}
void put_ifgoto(int cond, int lab, int sense)
{
fprintf(outfile, " ");
fprintf(outfile, "if (%s_t%d) goto L%02d;\n", (sense ? "" : "!"), cond, lab);
}
void input(TRIP i)
{ /* TACC BUG! CANNOT ESCAPE double quotes properly */
fprintf(outfile, " ");
fprintf(outfile, "fprintf(stderr, %c%s: %c); fflush(stderr); fscanf(stdin, %c%%d%c, &%s);\n",
'"', nameof(leftchild(i)), '"',
'"', '"', nameof(leftchild(i)));
}
void print(TRIP i)
{
// fprintf(outfile, " ");
// fprintf(outfile, "fprintf(outfile, %c%%d\\n%c, _t%d); fflush(outfile);\n", '"', '"', leftchild(i));
}
// this macro was used in debugging, not really needed now.
#define put_label(n) xput_label(n, __LINE__)
void xput_label(int lab, int line)
{
fprintf(outfile, "L%02d:\n", lab);
}
void codegen_three_address(TRIP root) {
if (root == -1) return;
fprintf(outfile, " // %d: %s\n", root, name[opsym(root)]);
switch (opsym(root)) {
case NOOP:
break;
case CONST:
loadconst("_t%d = %d;\n", root, rightchild(root));
break;
case DECLARESCALAR: // TO DO: modify for different decl types
root = leftchild(root);
declare(stringpool+rightchild(leftchild(root)));
break;
case VAR:
loadvar("_t%d = %s;\n", root, nameof(root));
break;
case ASSIGNSCALAR:
codegen_three_address(rightchild(root));
store("%s = _t%d;\n", nameof(leftchild(root)), rightchild(root));
break;
#ifdef TODO
case ASSIGNARRAY:
codegen_stack(nthchild(root, 3));
fprintf(outfile, " PUSH &%s\n", nameof(leftchild(root)));
// push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TO DO)
// might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high>
codegen_stack(rightchild(root)); // array index
fprintf(outfile, " ADD\n");
fprintf(outfile, " POPI\n");
break;
case LABEL:
fprintf(outfile, "L_%s:\n", nameof(leftchild(root)));
break;
case DECLARESCALAR:
//fprintf(outfile, " NOOP ; DECLARESCALAR. TBD.\n");
break;
case DECLAREARRAY:
//fprintf(outfile, " NOOP ; DECLAREARRAY. TBD.\n");
break;
case DEFPROC:
current_function_name = stringpool+rightchild(leftchild(leftchild(root)));
fprintf(outfile, "%s:\n", current_function_name);
// first we use the paramlist in rightchild to push a temporary definition
// of each parameter on the namespace stack
// push_temporary_parameter_definitions(rightchild(root)); // (and also generate code to pop params from stack to locals)
codegen_stack(rightchild(root));
// then we compile code for the definition, which will pick up local parameters
// for those idents rather than any globals of the same name
// first pop the params off the data stack and assign to locals.
// later implementation won't use static locals but will pick up off stack directly
codegen_stack(nthchild(root, 3));
// restore_temporary_parameter_definitions(rightchild(root));
// TO DO: don't output RET if last statement was 'return'
fprintf(outfile, " RET 0\n"); // NOTE WE'RE ONLY HANDLING ONE IMPLICIT RESULT BY DEFAULT
break;
case DEFPARAM:
// small tweak to ensure parameters popped in reverse order to undo pushes to stack
codegen_stack(rightchild(root));
if (opsym(leftchild(root)) == DEFPARAM) codegen_stack(leftchild(root)); else {
int formal = rightchild(leftchild(leftchild(root)));
int local;
static char locals[128]; // off heap? (size)
sprintf(locals, "%s", stringpool+formal);
if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TO DO: add message
strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */
local = str_to_pool(locals);
if (local != nextstring) {
//////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals);
} else nextstring += strlen(locals)+1; /* Not found, add it */
rightchild(leftchild(leftchild(root))) = local; // replace for the context of this fn def
// the parameter at leftchild(root) is a "VAR" - currently we can cheat
// and dive into the string directly, because our model of params (and variables)
// is so dumb. However when we start using types properly this will have to change.
fprintf(outfile, " PARAM %s\n", locals);
}
break;
#endif
case ARRAYEL: // TO DO
case PARAM:
codegen_three_address(leftchild(root));
codegen_three_address(rightchild(root));
break;
#ifdef TODO
case RETURN:
codegen_stack(leftchild(root)); // result is just a <BOOLEXPR>, or -1...
fprintf(outfile, " RET 0\n");
break;
#endif
case SEQUENCE:
codegen_three_address(leftchild(root));
codegen_three_address(rightchild(root));
break;
#ifdef TODO
case IFTHEN:
{
int lab = ++nextlab;
// for a proper branch, need to look at root node here...
codegen_stack(leftchild(root)); // push the condition on the stack (True or False)
fprintf(outfile, " BF F_%d\n", lab);
codegen_stack(rightchild(root));
fprintf(outfile, "F_%d:\n", lab);
}
break;
case IFTHENELSE:
{
int lab = ++nextlab;
// for a proper branch, need to look at root node here...
codegen_stack(leftchild(root)); // push the condition on the stack (True or False)
fprintf(outfile, " BF T_%d\n", lab);
codegen_stack(rightchild(root));
fprintf(outfile, " B E_%d\n", lab);
fprintf(outfile, "T_%d:\n", lab);
codegen_stack(nthchild(root, 3));
fprintf(outfile, "E_%d:\n", lab);
}
break;
case REPEATIF:
{
int lab = ++nextlab;
// for a proper branch, need to look at root node here...
fprintf(outfile, "B_%d:\n", lab);
codegen_stack(leftchild(root));
if (rightchild(root) != -1) {
codegen_stack(leftchild(rightchild(root)));
fprintf(outfile, " BT B_%d\n", lab);
} else {
fprintf(outfile, " B B_%d\n", lab);
}
}
break;
case GOTO:
fprintf(outfile, " B %s\n", stringpool+rightchild(leftchild(leftchild(root))));
break;
case PROCCALL:
case FNCALL:
codegen_stack(rightchild(root)); // zero or more params
fprintf(outfile, " CALL %s\n", stringpool+rightchild(leftchild(root)));
break;
case INDEX:
fprintf(outfile, " PUSH &%s\n", nameof(leftchild(root)));
// push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TO DO)
// might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high>
codegen_stack(rightchild(root)); // array index
fprintf(outfile, " ADD\n");
fprintf(outfile, " PUSHI\n");
break;
#endif
case INPUT:
input(root);
break;
case PRINT:
codegen_three_address(leftchild(root));
print(root);
break;
case NEG:
case NOT:
codegen_three_address(leftchild(root));
monoperate("_t%d = %s_t%d;\n", root, c_infix_op[opsym(root)], leftchild(root));
break;
case LINENO:
fprintf(outfile, "#line %d\n", leftchild(root));
break;
default:
/* Be careful not to default anything other than binary operators! */
if (arity[opsym(root)] != 3) {
fprintf(outfile, "*** Not Implemented: codegen_three_address(%s)\n", name[opsym(root)]);
break;
}
codegen_three_address(leftchild(root));
codegen_three_address(rightchild(root));
operate("_t%d = (_t%d %s _t%d);\n", root,
leftchild(root), c_infix_op[opsym(root)], rightchild(root));
break;
}
}
« codegen_c outputs structured c »
char *t[16*1024]; // An array is overkill. Keeping the last token in 'pending'
// would be enough! This was in case anything more complicated
// turned out to be needed, while developing the fix. Will change later.
int last_tok = -1;
void emit_c(char *s)
{
int lastch;
// The output is flat and must be run through gnu indent before use.
if (s == NULL) {
// flush
int i;
for (i = 0; i <= last_tok; i++) {
fprintf(outfile, "%s", t[i]);
fprintf(outfile, "%c", (strcmp(t[i], "}") != 0 ? ' ' : '\n'));
free(t[i]);
}
fprintf(outfile, "\n");
last_tok = -1;
return;
}
if (last_tok < 0) { t[++last_tok] = strdup(s); return; }
// fix the 'semicolon problem' by suppressing extra ones after ';', '{', and '}'
lastch = t[last_tok][strlen(t[last_tok])-1];
if ((strcmp(s, ";") == 0) && ((lastch == ':') || (lastch == '\n'))) {
t[++last_tok] = strdup("\n"); // unsatisfactory quick hack for labels
return;
}
if ((strcmp(s, ";") == 0) && (
(lastch == ';') ||
(lastch == '{') ||
(lastch == '}')
)) return;
t[++last_tok] = strdup(s);
}
void codegen_c(TRIP root) { // Walk the AST and output structured C
// ADD: rsym psym selin selout xprompt trap
// to do: brackets for reinserting precedence; 'x' symbols; save comments; invert until conds
static char tok[128]; // off heap? (size)
if (root == -1) return;
switch (opsym(root)) {
case NOOP:
break;
case LINENO:
break;
case CONST:
if (leftchild(root) == TYPE_CHARCONST) {
tok[0] = '\'';
if (rightchild(root) == '\'' || rightchild(root) == '\\') {
tok[1] = '\\';
tok[2] = rightchild(root);
tok[3] = '\'';
tok[4] = '\0';
} else {
tok[1] = rightchild(root);
tok[2] = '\'';
tok[3] = '\0';
}
} else { // assume TYPE_CONST
sprintf(tok, "%d", rightchild(root));
}
emit_c(tok);
break;
case VAR:
emit_c(nameof(root));
break;
case ASSIGNSCALAR:
emit_c(nameof(leftchild(root))); emit_c("=");
codegen_c(rightchild(root));
break;
case ASSIGNARRAY:
sprintf(tok, "%s[", nameof(leftchild(root))); emit_c(tok);
codegen_c(rightchild(root)); // array index
emit_c("]"); emit_c("=");
codegen_c(nthchild(root, 3));
break;
case LABEL:
sprintf(tok, "%s:", nameof(leftchild(root))); emit_c(tok);
break;
case DECLARESCALAR: // TO DO: modify for different decl types
emit_c("static int"); emit_c(nameof(leftchild(root))); emit_c(";");
break;
case DECLAREARRAY:
if (nthchild(root, 3) != -1) emit_c("const"); else emit_c("static");
emit_c("int"); emit_c(nameof(leftchild(root)));
emit_c("["); codegen_c(rightchild(root)); emit_c("]");
if (nthchild(root, 3) != -1) {
emit_c("=");emit_c("{");codegen_c(nthchild(root, 3));emit_c("};"); // forced semicolon
} else emit_c(";");
break;
case DEFPROC:
// treewalk the function body. If 'return' has a parameter, emit int; otherwise emit void... TO DO
if (locate_returns(root) != -1) emit_c("int"); else emit_c("void");
emit_c(stringpool+rightchild(leftchild(leftchild(root)))); emit_c("(");
if (rightchild(root) != -1) {
codegen_c(rightchild(root)); // PARAM list - basically just a comma-separated list
} else {
emit_c("void");
}
emit_c(")"); emit_c("{");
codegen_c(nthchild(root, 3));
emit_c(";"); emit_c("}");
break;
case DEFPARAM:
if (opsym(leftchild(root)) == DEFPARAM) {
codegen_c(leftchild(root));
} else {
emit_c("int"); emit_c(stringpool+rightchild(leftchild(leftchild(root))));
}
if (rightchild(root) != -1) {
emit_c(",");
codegen_c(rightchild(root));
}
break;
case ARRAYEL: // TO DO
case PARAM:
codegen_c(leftchild(root));
if (rightchild(root) != -1) emit_c(",");
codegen_c(rightchild(root));
break;
case RETURN:
emit_c("return");
if (leftchild(root) != -1) codegen_c(leftchild(root)); // result is just a <BOOLEXPR>, or -1...
break;
case SEQUENCE:
{TRIP child;
child = leftchild(root);
codegen_c(child);
if (child == -1) {
} else if (opsym(child) == PROCCALL) {
emit_c(";");
} else emit_c(";");
child = rightchild(root);
codegen_c(child);
if (child == -1) {
} else if (opsym(child) == PROCCALL) {
emit_c(";"); // need to check this is correct. Write some test edge cases.
}
}
break;
case IFTHEN:
emit_c("if"); emit_c("(");
codegen_c(leftchild(root));
emit_c(")");
if (opsym(rightchild(root)) == SEQUENCE) emit_c("{");
codegen_c(rightchild(root));
if (opsym(rightchild(root)) == SEQUENCE) {emit_c(";"); emit_c("}");}
break;
case IFTHENELSE:
emit_c("if"); emit_c("(");
codegen_c(leftchild(root));
emit_c(")");
if (opsym(rightchild(root)) == SEQUENCE) emit_c("{");
codegen_c(rightchild(root));
if (opsym(rightchild(root)) == SEQUENCE) {emit_c(";"); emit_c("}");} else emit_c(";");
emit_c("else");
if (opsym(nthchild(root, 3)) == SEQUENCE) emit_c("{");
codegen_c(nthchild(root, 3));
if (opsym(nthchild(root, 3)) == SEQUENCE) {emit_c(";"); emit_c("}");} else emit_c(";");
break;
case REPEATIF:
if (rightchild(root) != -1) {
emit_c("do"); emit_c("{");
codegen_c(leftchild(root));
emit_c(";"); emit_c("}"); emit_c("while"); emit_c("(");
codegen_c(leftchild(rightchild(root)));
emit_c(")");
} else {
emit_c("for"); emit_c("(;;)"); emit_c("{");
codegen_c(leftchild(root));
emit_c(";"); emit_c("}");
}
break;
case GOTO:
emit_c("goto"); emit_c(stringpool+rightchild(leftchild(leftchild(root))));
break;
case PROCCALL:
case FNCALL:
emit_c(stringpool+rightchild(leftchild(root))); emit_c("(");
codegen_c(rightchild(root)); // zero or more params
emit_c(")");
break;
case INDEX:
emit_c(nameof(leftchild(root))); emit_c("[");
codegen_c(rightchild(root)); // array index
emit_c("]");
break;
#ifdef TODO
case INPUT:
input(root);
break;
case PRINT:
codegen_three_address(leftchild(root));
print(root);
break;
#endif
case NEG:
case NOT:
// ( don't forget to check for operator precedence and insert brackets if needed )
emit_c(c_infix_op[opsym(root)]);
codegen_c(leftchild(root));
break;
default:
/* Be careful not to default anything other than binary operators! */
if ((root < 0) || (root >= MAXTRIPS)) {
fprintf(outfile, "*** Out of range: codegen_c(%d)\n", root); break;
} else {
int op = opsym(root);
if (arity[op] != 3) {
if ((op < 0) || (op >= MAX_OPCODE)) {
fprintf(outfile, "*** Not Implemented: codegen_c(%d)\n", op); break;
} else {
fprintf(outfile, "*** Not Implemented: codegen_c(%s)\n", name[op]); break;
}
}
/*
Given a node in your expression tree, you print it out using a
straightforward algorithm which recursively prints out its two
subtrees, with the operator inbetween. All you need to decide
is whether to print brackets around each of the subtrees.
You don't need to if the precedence of the subtree operator is
higher (in the target language) than that of the middle operator,
nor, of course, if the subtree is leafy.
You do need to if it's lower.
That only leaves the question of what to do if it's the same.
In general, it's non-trivial, but in practice all your operators
are going to be left-associative (except assignments and
unaries which come out in the wash), so this means (doesn't
it?) that the left tree won't need them and the right one might.
*/
{
int leftbranch = leftchild(root);
int left_op = opsym(leftbranch);
int left_prec = prio[left_op];
int mid_prec = prio[op];
if (left_prec < mid_prec) emit_c("(");
codegen_c(leftchild(root));
if (left_prec < mid_prec) emit_c(")");
}
emit_c(c_infix_op[op]);
{
int rightbranch = rightchild(root);
int right_op = opsym(rightbranch);
int right_prec = prio[right_op];
int mid_prec = prio[op];
if (right_prec < mid_prec) emit_c("(");
codegen_c(rightchild(root));
if (right_prec < mid_prec) emit_c(")");
}
break;
}
}
}
« codegen_stack generates for a stack machine »static char *SPACES =
" ";
void stack_emit(char *label, char *opcode, char *addressing_mode, char *operand, char *comment)
{
int col = 0, spaces;
col += fprintf(outfile, "%s", label);
if (strlen(label) >= 9) spaces = 1; else spaces = 10-strlen(label);
col += fprintf(outfile, SPACES+strlen(SPACES)-spaces-1);
col += fprintf(outfile, "%s", opcode);
if (strlen(opcode) >= 6) spaces = 1; else spaces = 7-strlen(opcode);
col += fprintf(outfile, SPACES+strlen(SPACES)-spaces-1);
col += fprintf(outfile, "%s%s", addressing_mode, operand);
#ifdef COMMENTS_SUPPORTED
if (*comment != '\0') {
if (col >= 40) spaces = 1; else spaces = 25-strlen(operand);
col += fprintf(outfile, SPACES+strlen(SPACES)-spaces-1);
fprintf(outfile, "%s", comment);
}
#endif
fprintf(outfile, "\n");
}
static char *stackasm_const(TRIP root)
{
static char tok[128]; // heap?
if (leftchild(root) == TYPE_CHARCONST) {
tok[0] = '\'';
if (rightchild(root) == '\'' || rightchild(root) == '\\') {
tok[1] = '\\';
tok[2] = rightchild(root);
tok[3] = '\'';
tok[4] = '\0';
} else if (rightchild(root) == '\n') {
tok[1] = '\\';
tok[2] = 'n';
tok[3] = '\'';
tok[4] = '\0';
} else {
tok[1] = rightchild(root);
tok[2] = '\'';
tok[3] = '\0';
}
} else if (leftchild(root) == TYPE_STRING) {
char *s = (char *)rightchild(root);
int p = 0;
tok[p++] = '"';
while (*s != '\0') {
if (*s == '\'' || *s == '\\') {
tok[p++] = '\\';
tok[p++] = *s;
tok[p++] = '\'';
} else if (*s == '\n') {
tok[p++] = '\\';
tok[p++] = 'n';
tok[p++] = '\'';
} else {
tok[p++] = *s;
}
s += 1;
}
tok[p++] = '"';
tok[p++] = '\0';
} else { // assume TYPE_CONST
sprintf(tok, "%d", rightchild(root));
}
return tok; // for safety, could use stringpool?
}
void codegen_stack(TRIP root) {
static char *current_function_name=""; // temp hack. real soln involves a scope/block-stack
static int nextlab = 1000;
char tok[128]; /* MUST NOT BE STATIC! */
if (root == -1) return;
switch (opsym(root)) {
case NOOP:
//stack_emit("", "; Suppressed declaration", "", "", "; If not, check the source. Use a -1 rather than NOOP where possible");
break;
case LINENO:
// output source code from last line to this line here ... proving problematic
break;
case CONST:
if ((leftchild(root) == TYPE_CHARCONST) || (leftchild(root) == TYPE_INT)) {
stack_emit("", "PUSH", "#", stackasm_const(root), "");
} else if (leftchild(root) == TYPE_STRING) {
stack_emit("", "PUSHS", "", stackasm_const(root), "");
} else {
// not yet handled
exit(99);
}
break;
case VAR:
// TO DO: if the var is actually the name of an array, you push the address rather than the value (ie the first word at the address)
// still waiting for name table managememt.
stack_emit("", "PUSH", "", nameof(root), "");
break;
case ASSIGNSCALAR:
codegen_stack(rightchild(root));
stack_emit("", "POP", "", nameof(leftchild(root)), "");
break;
case ASSIGNARRAY:
codegen_stack(nthchild(root, 3));
stack_emit("", "PUSH", "&", nameof(leftchild(root)), "array base");
// push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TO DO)
// might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high>
// Note if the array is a procedure parameter, '&whatever' doesn't work, it has to be 'whatever'
codegen_stack(rightchild(root)); // array index
// PUSHI and POPI instructions supercede calculating offsets explicitly.
// stack_emit("", "PUSH", "#", "2", "slow but will do for now");
// stack_emit("", "LSH", "", "", "scale array offset to dword");
// stack_emit("", "ADD", "", "", "add array offset");
stack_emit("", "POPI", "", "4", "assign array element"); // still need machine-dependent word size
break;
case LABEL:
{
char lab[128];
sprintf(lab, "%s:", nameof(leftchild(root)));
stack_emit(lab, "", "", "", ""); // Add "L_" in front of user labels?
}
break;
case DECLARESCALAR:
// initialisedscalar should use .data - currently we are generating a separate explicit assignment
{
char lab[128];
sprintf(lab, "%s:", nameof(leftchild(root)));
// TO DO: use .name in procedures relative to proc name? Whatever soln we use depends on having proper name tables
stack_emit(lab, ".WORD", "", "1", "; static. if this is in a procedure, there's a name clash.");
}
break;
case DECLAREARRAY:
{
char lab[128];
sprintf(lab, "%s:", nameof(leftchild(root)));
// TO DO: need to enter name & type info into name tables, so that when used (eg as a parameter to
// and external call), we know that we have to push the *address* of the array, rather than the
// contents of the first word of the array.
// name table handling is common to all code generators, and should probably be handled at the point
// where the tuples are created rather than where they are used. I'm really not at all sure that
// I have the structure of this thing well enough thought out yet that I know what I should be doing here.
if (nthchild(root, 3) != -1) {
stack_emit("", ".WORD", "", stackasm_const(rightchild(root)), ""); // dopevector precedes array contents
stack_emit(lab, ".ARRAY", "", "", ""); // start of const data
codegen_stack(nthchild(root, 3)); // a .data nnnn statement for each item
} else {
stack_emit(lab, ".WORD", "", stackasm_const(rightchild(root)), ""); // rightchild() is wrong. Need to look inside
}
}
break;
case DEFPROC:
current_function_name = stringpool+rightchild(leftchild(leftchild(root)));
{
char procs[128];
sprintf(procs, "%s:", current_function_name);
stack_emit(procs, ".PROC", "", "", "; Proc/fn entry point");
}
// first we use the paramlist in rightchild to push a temporary definition
// of each parameter on the namespace stack
// push_temporary_parameter_definitions(rightchild(root)); // (and also generate code to pop params from stack to locals)
codegen_stack(rightchild(root));
// then we compile code for the definition, which will pick up local parameters
// for those idents rather than any globals of the same name
// first pop the params off the data stack and assign to locals.
// later implementation won't use static locals but will pick up off stack directly
codegen_stack(nthchild(root, 3));
// restore_temporary_parameter_definitions(rightchild(root));
// TO DO: don't output RET if last statement was 'return'
stack_emit("", "RET", "0", "", "");
break;
case DEFPARAM:
// small tweak to ensure parameters popped in reverse order to undo pushes to stack
codegen_stack(rightchild(root));
if (opsym(leftchild(root)) == DEFPARAM) codegen_stack(leftchild(root)); else {
int formal = rightchild(leftchild(leftchild(root)));
int local;
static char locals[128]; // heap???
sprintf(locals, "%s", stringpool+formal);
if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TO DO: add message
strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */
local = str_to_pool(locals);
if (local != nextstring) {
//////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals);
} else nextstring += strlen(locals)+1; /* Not found, add it */
rightchild(leftchild(leftchild(root))) = local; // replace for the context of this fn def
// the parameter at leftchild(root) is a "VAR" - currently we can cheat
// and dive into the string directly, because our model of params (and variables)
// is so dumb. However when we start using types properly this will have to change.
stack_emit("", ".PARAM", "", locals, "");
}
break;
case ARRAYEL: // TO DO
{TRIP t1;
t1 = leftchild(root); // <NUM> -- don't use default codegen which is CONST (resulting in PUSH #num)
// make_binary_tuple(CONST, datatype, (int)atol(value))
if (opsym(t1) != CONST) {
fprintf(stderr, "Assertion failed. type is %d\n", opsym(t1)); exit(0);
}
stack_emit("", ".CONST", "", stackasm_const(t1), "");
codegen_stack(rightchild(root));
}
break;
case PARAM:
codegen_stack(leftchild(root));
codegen_stack(rightchild(root));
break;
case RETURN:
if (leftchild(root) == -1) {
stack_emit("", "RET", "0", "", "");
} else {
codegen_stack(leftchild(root)); // result is just a <BOOLEXPR>, or -1...
stack_emit("", "RET", "1", "", "");
}
break;
case SEQUENCE:
codegen_stack(leftchild(root));
codegen_stack(rightchild(root));
break;
case IFTHEN:
{
int lab = ++nextlab;
char labs[128];
// for a proper branch, need to look at root node here...
codegen_stack(leftchild(root)); // push the condition on the stack (True or False)
sprintf(tok, "F_%d", lab);
stack_emit("", "BF", "", tok, "");
codegen_stack(rightchild(root));
sprintf(labs, "%s:", tok); stack_emit(labs, "", "", "", "");
}
break;
case IFTHENELSE:
{
int lab = ++nextlab;
// for a proper branch, need to look at root node here...
codegen_stack(leftchild(root)); // push the condition on the stack (True or False)
sprintf(tok, "T_%d", lab);
stack_emit("", "BF", "", tok, "");
codegen_stack(rightchild(root));
sprintf(tok, "E_%d", lab);
stack_emit("", "B", "", tok, "");
sprintf(tok, "T_%d:", lab);
stack_emit(tok, "", "", "", "");
codegen_stack(nthchild(root, 3));
sprintf(tok, "E_%d:", lab);
stack_emit(tok, "", "", "", "");
}
break;
case REPEATIF:
{
int lab = ++nextlab;
// for a proper branch, need to look at root node here...
sprintf(tok, "B_%d:", lab);
stack_emit(tok, "", "", "", "");
sprintf(tok, "B_%d", lab);
codegen_stack(leftchild(root));
if (rightchild(root) != -1) {
codegen_stack(leftchild(rightchild(root)));
stack_emit("", "BT", "", tok, "");
} else {
stack_emit("", "B", "", tok, "");
}
}
break;
case GOTO:
stack_emit("", "B", "", stringpool+rightchild(leftchild(leftchild(root))), "");
break;
case PROCCALL:
case FNCALL:
{
int paramcount = 0;
char tmp[128];
TRIP p, p2;
p = rightchild(root);
while (p != -1) {
paramcount += 1;
p2 = leftchild(p); // the param -- could do with some assert checks here, just in case
p = rightchild(p); // link or -1
}
codegen_stack(rightchild(root)); // zero or more params
sprintf(tmp, "%s,%d,%d", stringpool+rightchild(leftchild(root)), paramcount, (opsym(root) == FNCALL ? 1 : 0));
stack_emit("", "CALL", "", tmp, "function, params, results");
}
break;
case INDEX:
stack_emit("", "PUSH", "&", nameof(leftchild(root)), "");
// push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TO DO)
// might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high>
codegen_stack(rightchild(root)); // array index
// stack_emit("", "PUSH", "#", "2", "slow but will do for now");
// stack_emit("", "LSH", "", "", "scale array offset to dword");
// stack_emit("", "ADD", "", "", "");
stack_emit("", "PUSHI", "4", "", ""); // redefined! INDEX then push indirect
break;
#ifdef TODO
case INPUT:
input(root);
break;
#endif
case PRINT:
codegen_stack(leftchild(root));
stack_emit("", "PRINT", "", "", "");
break;
case NEG:
case NOT:
codegen_stack(leftchild(root));
stack_emit("", name[opsym(root)], "", "", "");
break;
default:
/* Be careful not to default anything other than binary operators! */
if ((root < 0) || (root >= MAXTRIPS)) {
fprintf(outfile, "*** Out of range: codegen_stack(%d)\n", root); break;
} else {
int op = opsym(root);
if (arity[op] != 3) {
if ((op < 0) || (op >= MAX_OPCODE)) {
fprintf(outfile, "*** Not Implemented: codegen_stack(%d)\n", op); break;
} else {
fprintf(outfile, "*** Not Implemented: codegen_stack(%s)\n", name[op]); break;
}
}
codegen_stack(leftchild(root));
codegen_stack(rightchild(root));
stack_emit("", name[op], "", "", "");
break;
}
}
}
« Run-time interpreter: Cheap & nasty code execution directly from the AST! »
// Trivial run-time stack implementation:
static int stack[128]; // heap?
static int stackp = -1;
void Push(int val) {
fprintf(stderr, "Push(%d)\n", val);
stack[++stackp] = val;
}
int Pop(void) {
fprintf(stderr, "Pop() -> %d\n", stack[stackp]);
return stack[stackp--];
}
« (a little hack) »
// We cannot execute the AST in the obvious manner because this language (and our test program)
// use jumps all over the place. We have to flatten and serialise all the control flow structures
// such as repeat/until loops etc. We can probably keep procedures as a recursive call however
// since jumping into and out of procedures is not allowed.
// No problem leaving expressions and assignments etc as high-level objects. We don't need to
// compile down to anything as basic as a byte code.
// convert these to come off heap???
static int CODE[16*1024]; // These two arrays are very temp hacks
static int Memory[640*1024]; // old PC size :-)
static int CODEPC = 0;
typedef int CODEP;
CODEP linear_code(TRIP orig)
{
CODEP here = CODEPC;
CODE[CODEPC] = make_unary_tuple(LINEAR_CODE, orig);
CODEPC += 1;
return here;
}
CODEP plant_code(TRIP orig)
{
CODEP here = CODEPC;
CODE[CODEPC] = orig;
CODEPC += 1;
return here;
}
// prototype symbol table management - VERY TEMPORARY
CODEP lookup_linearised_proc(char *name)
{
fprintf(stderr, "NOT IMPLEMENTED: lookup_linearised_proc(%s)\n", name);
return 0;
}
CODEP lookup_jump_label(char *name)
{
fprintf(stderr, "NOT IMPLEMENTED: lookup_jump_label(%s)\n", name);
return 0;
}
void define_linearised_proc(char *name, CODEP addr)
{
fprintf(stderr, "NOT IMPLEMENTED: define_linearised_proc(%s, %d)\n", name, addr);
}
void define_jump_label(char *name, CODEP addr)
{
fprintf(stderr, "NOT IMPLEMENTED: define_jump_label(%s, %d)\n", name, addr);
}
void serialise_AST(TRIP root) { // make a few tweaks to the data structure to make execution easier
static char *current_function_name;
int tmp1, tmp2;
if (root == -1) return;
switch (opsym(root)) {
case LABEL:
// enter (nameof(leftchild(root)), CODEPC) into table for later retrieval by lookup_jump_label
define_jump_label(nameof(leftchild(root)), CODEPC);
break;
case DEFPROC:
// need to enter address of function in a table, and linearise the code in the function too.
// enter the procedure name in a table of functions mapping to linearised code addresses
current_function_name = stringpool+rightchild(leftchild(leftchild(root)));
define_linearised_proc(current_function_name, CODEPC);
// first we use the paramlist in rightchild to push a temporary definition
// of each parameter on the namespace stack
// push_temporary_parameter_definitions(rightchild(PC)); // (and also generate code to pop params from stack to locals)
serialise_AST(rightchild(root));
// then we compile code for the definition, which will pick up local parameters
// for those idents rather than any globals of the same name
// first pop the params off the data stack and assign to locals.
// later implementation won't use static locals but will pick up off stack directly
serialise_AST(nthchild(root, 3));
// restore_temporary_parameter_definitions(rightchild(PC));
plant_code(make_unary_tuple(RETURN, -1)); // fallback return for procs - error if function
break;
case PROCCALL:
case FNCALL:
// look up address of function from table, substitute address into leftchild(root) element
// We'll probably insist that a function is fully defined before it is used, otherwise we need to patch like with jump labels
linear_code(rightchild(root)); // zero or more params
// plant updated copy of call:
plant_code(make_unary_tuple(opsym(root), lookup_linearised_proc(stringpool+rightchild(leftchild(root)))));
break;
case RETURN:
// plant code to push return parameter if present, then plant the return opcode (ie copy of this trip)
linear_code(leftchild(root)); // result is just a <BOOLEXPR>, or -1...
plant_code(make_unary_tuple(RETURN, -1)); // simple return, param already handled
break;
case SEQUENCE:
serialise_AST(leftchild(root));
serialise_AST(rightchild(root));
break;
case IFTHEN:
linear_code(leftchild(root)); // push the condition on the stack (True or False)
plant_code(tmp1 = make_unary_tuple(BF, -1));
serialise_AST(rightchild(root));
leftchild(tmp1) = CODEPC;
break;
case IFTHENELSE: // same sort of thing as above
linear_code(leftchild(root)); // push the condition on the stack (True or False)
plant_code(tmp1 = make_unary_tuple(BF, -1)); // jump over 'then' part
serialise_AST(rightchild(root));
plant_code(tmp2 = make_unary_tuple(B, -1)); // jump over 'else' part
leftchild(tmp1) = CODEPC;
serialise_AST(nthchild(root, 3));
leftchild(tmp2) = CODEPC;
break;
case REPEATIF:
tmp1 = CODEPC; // note this address for jump back
serialise_AST(leftchild(root));
if (rightchild(root) != -1) {
linear_code(leftchild(rightchild(root))); // test condition
plant_code(make_unary_tuple(BT, tmp1)); // conditional branch back to start of loop
} else {
plant_code(make_unary_tuple(B, tmp1)); // unconditional branch back to start of loop
}
break;
case GOTO:
// look up code address of label. may not be planted yet so will need to plug later. Chain back?
tmp1 = lookup_jump_label(stringpool+rightchild(leftchild(leftchild(root))));
plant_code(make_unary_tuple(B, tmp1)); // jump over 'else' part
break;
default:
linear_code(root); // shouldn't be any flow control issues in remaining opcodes
break;
}
}
void execute_AST(TRIP PC) {
int op, opd;
TRIP left, right;
if (PC == -1) return;
// remaining opcodes must not do any flow-control, just simple imperative statements
fprintf(stderr, "execute_AST(%d)\n", PC);
fprintf(stderr, " %s (%d)\n", name[opsym(PC)], opsym(PC));
switch (opsym(PC)) {
case NOOP:
break;
case LINENO:
break;
case CONST:
Push(rightchild(PC));
break;
case VAR:
Push(variable_contents(PC)); // this and ASSIGNSCALAR may be broken - see prog2.t test
break;
case ASSIGNSCALAR:
execute_AST(rightchild(PC));
variable_contents(leftchild(PC)) = Pop(); // not sure if this is working...
fprintf(stderr, " %s = %d\n", // Until I add a "print" command to this language, we can see the results of
nameof(leftchild(PC)), // computations by a simple hack, which is to print out the value of any assignments.
variable_contents(leftchild(PC)));
break;
case ASSIGNARRAY:
#ifdef TODO
codegen_stack(nthchild(PC, 3));
stack_emit("", "PUSH", "&", nameof(leftchild(PC)), "");
// push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TO DO)
// might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high>
codegen_stack(rightchild(PC)); // array index
stack_emit("", "ADD", "", "", "");
stack_emit("", "POPI", "", "", "");
#endif
break;
case LABEL:
#ifdef TODO
stack_emit(nameof(leftchild(PC)), "", "", "", ""); // Add "L_" in front of user labels?
#endif
break;
case DECLARESCALAR:
#ifdef TODO
// initialisedscalar should use .data - currently we are generating a separate explicit assignment
stack_emit(nameof(leftchild(PC)), ".WORD", "", "1", "");
#endif
break;
case DECLAREARRAY:
#ifdef TODO
if (nthchild(PC, 3) != -1) {
stack_emit(nameof(leftchild(PC)), ".WORD", "", stackasm_const(rightchild(PC)), ""); //number of words to follow - can be removed
//codegen_stack(nthchild(PC, 3)); // a .data nnnn statement for each item
} else {
stack_emit(nameof(leftchild(PC)), ".WORD", "", stackasm_const(rightchild(PC)), ""); // rightchild() is wrong. Need to look inside
}
#endif
break;
case DEFPARAM:
#ifdef TODO
// small tweak to ensure parameters popped in reverse order to undo pushes to stack
codegen_stack(rightchild(PC));
if (opsym(leftchild(PC)) == DEFPARAM) codegen_stack(leftchild(PC)); else {
int formal = rightchild(leftchild(leftchild(PC)));
int local;
static char locals[128]; // heap?
sprintf(locals, "%s", stringpool+formal);
if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TO DO: add message
strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */
local = str_to_pool(locals);
if (local != nextstring) {
//////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals);
} else nextstring += strlen(locals)+1; /* Not found, add it */
rightchild(leftchild(leftchild(PC))) = local; // replace for the context of this fn def
// the parameter at leftchild(PC) is a "VAR" - currently we can cheat
// and dive into the string directly, because our model of params (and variables)
// is so dumb. However when we start using types properly this will have to change.
stack_emit("", "PARAM", "", locals, "");
}
#endif
break;
case ARRAYEL: // TO DO
case PARAM:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
break;
case INDEX:
// push_address(nameof(leftchild(PC))); - or ? - push_address(leftchild(PC));
execute_AST(rightchild(PC)); // calculate array index
right = Pop(); left = Pop();
Push(Memory[left+right]); // contents of memory at address+offset (integer offset, not byte offset)
break;
case INPUT:
#ifdef TODO
input(PC);
#endif
break;
case PRINT:
#ifdef TODO
execute_AST(leftchild(PC));
print(PC);
#endif
break;
case NEG:
execute_AST(leftchild(PC));
Push(-Pop());
break;
case NOT:
execute_AST(leftchild(PC));
Push(!Pop()); // Boolean NOT, not bitwise NOT
break;
// BAND, BOR,
case LAND:
case BAND:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left && right);
break;
case LOR:
case BOR:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left || right);
break;
// ADD, SUB, MUL, DIV, MOD,
case ADD:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left + right);
break;
case SUB:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left - right);
break;
case MUL:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left * right);
break;
case DIV:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop();
if (right == 0) {
fprintf(stderr, "Run-time error: divide by zero\n"); exit(1);
}
Push((int)((int)left / (int)right));
break;
case MOD:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop();
if (right == 0) {
fprintf(stderr, "Run-time error: divide by zero\n"); exit(1);
}
Push((int)((int)left % (int)right));
break;
// LSH, RSH, EXP,
case LSH:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left << right);
break;
case RSH:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left >> right);
break;
case EXP:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop();
//Push(iexp(left, right));
{int temp = left;
while (right >= 2) {
left = left * temp; right -= 1;
}
}
Push(left);
break;
// EQ, NE, LT, GT, LE, GE,
case EQ:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left == right);
break;
case NE:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left != right);
break;
case LT:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left < right);
break;
case GT:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left > right);
break;
case LE:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left <= right);
break;
case GE:
execute_AST(leftchild(PC));
execute_AST(rightchild(PC));
right = Pop(); left = Pop(); Push(left >= right);
break;
// These opcodes should never be executed as they've already been converted to B/BT/BF
case IFTHEN:
case IFTHENELSE:
case REPEATIF:
case GOTO:
case PROCCALL: // handled in level above
case FNCALL: // handled in level above
case RETURN:
case DEFPROC: // linearised already
case SEQUENCE: // should not happen after serialisation!
default:
// INTERNAL ERROR!
fprintf(outfile, "*** Not Implemented: execute_AST(%s)\n", name[opsym(PC)]);
exit(1);
break;
}
}
static TRIP pcstack[1024]; // move to heap?
static int pcstackp = 0;
void push_returnaddr(TRIP PC)
{
pcstack[pcstackp++] = PC;
}
TRIP pop_returnaddr(void)
{
return pcstack[--pcstackp];
}
void execute_serialised_AST(CODEP PC) {
int op, opd;
TRIP trip, left, right;
// *only* special AST objects for handling flow control. Simple imperatives are called the old way
// THIS COULD BE DONE WITH A CODE[] ARRAY. DOES NOT NEED TO OVERLOAD AST[]
for (;;) {
trip = CODE[PC];
op = opsym(trip);
fprintf(stderr, "PC: %04x (op=%s (%d))\n", PC, name[op], op);
switch (op) {
case B: if (leftchild(trip) == -2) exit(0); // hack to terminate program
PC = leftchild(trip); break;
case BT: opd = Pop(); if (opd) PC = leftchild(trip); else PC += 1; break;
case BF: opd = Pop(); if (!opd) PC = leftchild(trip); else PC += 1; break;
case PROCCALL: push_returnaddr(PC+1); PC = leftchild(trip); break; // untested
case FNCALL: push_returnaddr(PC+1); PC = leftchild(trip); break;
case RETURN: PC = pop_returnaddr(); break;
case LINEAR_CODE: execute_AST(leftchild(trip)); PC += 1; break;
default:
fprintf(stderr, "execute_serialised_AST: bad opcode %d\n", op);
exit(1);
}
fprintf(stderr, "PC after: %04x\n", PC);
}
}
int main(int argc, char **argv) {
int opt_3address = FALSE, opt_debug = FALSE, opt_stack = FALSE, opt_c = FALSE, opt_execute = FALSE, opt_optimiser = FALSE;
char *s;
#ifdef DEBUG
// GDB backtrace facility!
// extern void restart_under_gdb(int argc, char **argv);
// restart_under_gdb(argc, argv);
#endif
« Handle program arguments »
« Get clean version of executable name. Should work on most existing systems (2006) »
if ((argc >= 2) && strcmp(argv[1], "-O") == 0) {
argv++; argc--; opt_optimiser = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-d") == 0) {
argv++; argc--; debug_parser = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-3") == 0) {
argv++; argc--; opt_3address = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-s") == 0) {
argv++; argc--; opt_stack = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-c") == 0) {
argv++; argc--; opt_c = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-e") == 0) {
argv++; argc--; opt_execute = TRUE;
}
if ((argc >= 2) && strcmp(argv[1], "-h") == 0) {
fprintf(stderr, "%s:\n", progname);
fprintf(stderr, "\t-3\tgenerate low-level 3-address code using c\n");
fprintf(stderr, "\t-c\tgenerate high-level c translation\n");
fprintf(stderr, "\t-s\tgenerate stack-based assembly code\n");
fprintf(stderr, "\t-e\texecute directly\n");
fprintf(stderr, "\t-d\tdebug\n");
fprintf(stderr, "\t-h\thelp (this info)\n");
exit(0);
}
if (argc != 2) {
fprintf(stderr, "syntax: %s [-3cdehs] filename\n", progname);
exit(1);
}
if (!(opt_3address || opt_c || opt_execute || opt_stack)) opt_stack = TRUE;
sourcefile = fopen(argv[1], "r");
if (sourcefile == NULL) {
fprintf(stderr, "%s: %s - %s\n", progname, strerror(errno), argv[1]);
exit(errno);
}
if (opt_execute) {
outfile = stdout;
} else {
char *s;
sprintf(outname, "%s", argv[1]);
s = strrchr(outname, '.');
if (s == NULL) s = outname+strlen(outname);
if (opt_3address || opt_c) {
sprintf(s, "%s", ".c");
} else if (opt_stack) {
sprintf(s, "%s", ".asm");
} else {
fprintf(stderr, "Won't\n"); exit(123); // shouldn't happen
}
outfile = fopen(outname, "w");
if (outfile == NULL) {
fprintf(stderr, "%s: cannot output to %s - %s\n", progname, outname, strerror(errno));
}
}
curfile = argv[1]; startline = TRUE; whitespace = TRUE;
onecharstr = (char *)malloc(512);
« Lexical scan » line_reconstruction(); // Effectively, lexing.
« Debug the lexed tokens? »#ifdef DEBUG_PARSER
if (debug_parser) {
int i; // DEBUG ONLY
fprintf(stderr, "\nLexical token stream:\n\n");
for (i = 0; i < nextfree; i++) {
fprintf(stderr, "C[%d] => %s, line %d, col %d: [%0d] %s\n",
i, c[i].f, c[i].l, c[i].col, c[i].t, c[i].s);
}
}
#endif
« Call the parser »
#ifdef DEBUG_TRIPS_DURING
fprintf(outfile, "Trips before patching holes:\n");
#endif
if (!parse(PHRASE_BASE, 0)) {
« Attempt to print a sensible error if the parse failed » if (bestparse == nextfree) {
fprintf(stderr, "\"%s\", Line %d, Col %d: Premature end of file while looking for %s\n",
argv[1], c[bestparse].l, c[bestparse].col+1, looking_for);
} else {
int i;
fprintf(stderr, "\"%s\", Line %d, Col %d: Syntax error while looking for %s near ",
argv[1], c[bestparse].l, c[bestparse].col+1, looking_for);
for (i = bestparse; i < bestparse+3; i++) {
if (i == nextfree) {
fprintf(stderr, "<End of file>");
break;
}
switch (c[i].t) {
case TYPE_HEXINT:
fprintf(stderr, "$"); // *OR* ... We could put the '$' back in front of the string
/* drop through */ // and probably save much code whenever printing. Use str+1
case TYPE_TAG:
case TYPE_CHAR:
case TYPE_INT:
case TYPE_KEYWORD:
fprintf(stderr, "%s", c[i].s);
break;
case TYPE_STRING:
fprintf(stderr, "\"%s\"", c[i].s);
break;
case TYPE_CHARCONST:
fprintf(stderr, "'%s'", c[i].s);
break;
}
fprintf(stderr, (i == (bestparse+2) ? " ..." : " "));
}
fprintf(stderr, "\n");
}
exit(1);
}
« Generate code »
« Debugging »
{int program;
« Debugging »
program = compile(0, 0);
if (nexttrip <= 0) {
fprintf(outfile, "\nError: no code generated! (nexttrip = %d)\n", nexttrip); exit(0);
}
« Debugging »
#ifdef DEBUG
if (opt_debug) {
int i, l;
l = c[0].l;
fprintf(outfile, "\ntokens:\n\n %4d: ", l); fflush(outfile);
for (i = 0; i < nextfree; i++) {
if (c[i].l != l) {fprintf(outfile, "\n %4d: ", c[i].l); l = c[i].l;}
if (c[i].t == TYPE_CHARCONST) {
fprintf(outfile, "'%s' ", c[i].s);
} else {
fprintf(outfile, "%s ", c[i].s);
}
}
fprintf(outfile, "\n\n");
}
#endif
// Now generate the output code from the AST.
if (opt_3address) {
fprintf(outfile, "\nAST serialised into three-address code:\n\n");
codegen_three_address(program);
} else if (opt_stack) {
stack_emit("", "","","","; Stack-based code");
Walk_AST(program, 0, want_decls, output_stack_code);
hide_DECLS_and_NOOP(program); // Hide local decls
Walk_AST(program, 0, want_procs, output_stack_code);
stack_emit("__start:", "","","","; Main Entry Point");
hide_DEFPROC_and_NOOP_and_DECLS(program); // Hide procedures and local decls (irreversible, so this codegen better come last)
codegen_stack(program);
stack_emit("", "EXIT","","","");
} else if (opt_c) {
emit_c("#include <stdio.h>\n");
emit_c("#include <stdlib.h>\n");
emit_c("\n");
emit_c("static int instream = 0, outstream = 0;\n");
emit_c("FILE *infile = NULL, *secondary = NULL, *outfile = NULL;\n");
emit_c("\n");
emit_c("int trap(int mask)\n");
emit_c("{\n");
emit_c(" return (0);\n");
emit_c("}\n");
emit_c("\n");
emit_c("void dosignal(int i, int j, int k)\n");
emit_c("{\n");
emit_c(" if (i == 0) exit(0);\n");
emit_c(" if ((i == 14) && (j == 2)) {\n");
emit_c(" fprintf(stderr, \"Missing switch label: '%c'\\n\", k); exit(1);\n");
emit_c(" }\n");
emit_c(" fprintf(stderr, \"Unhandled signal %d %d %d\\n\", i, j, k);\n");
emit_c(" exit(1);\n");
emit_c("}\n");
emit_c("\n");
emit_c("int consoleget(void)\n");
emit_c("{\n");
emit_c(" int c;\n");
emit_c(" c = fgetc( stdin );\n");
emit_c(" return c;\n");
emit_c("}\n");
emit_c("\n");
emit_c("int fileget(int stream)\n");
emit_c("{\n");
emit_c(" int c;\n");
emit_c(" c = fgetc( (stream == 1 ? infile : secondary) );\n");
emit_c(" return c;\n");
emit_c("}\n");
emit_c("\n");
emit_c("void putsym(int c)\n");
emit_c("{\n");
emit_c(" fputc(c, outfile);\n");
emit_c("}\n");
emit_c("\n");
emit_c("void psym(int c)\n");
emit_c("{\n");
emit_c(" fputc(c, outfile);\n");
emit_c("}\n");
emit_c("\n");
emit_c("void xprompt(int c)\n");
emit_c("{\n");
emit_c(" fputc(c, outfile);\n");
emit_c("}\n");
emit_c("\n");
Walk_AST(program, 0, want_decls, output_c); // output top-level declarations only, not initialisations if dynamic (TO DO)
emit_c("\n");
Walk_AST(program, 0, want_procs, output_c);
hide_DEFPROC_and_NOOP_and_DECLS(program); // Hide procedures and local decls (irreversible, so this codegen better come last)
emit_c("int main(int argc, char **argv)"); emit_c("{");
emit_c(" if (argc != 3) {\n");
emit_c(" fprintf(stderr, \"syntax: %s infile outfile\\n\", argv[0]);\n");
emit_c(" exit(1);\n");
emit_c(" }\n");
emit_c(" if (strcmp(argv[1], argv[2]) == 0) {\n");
emit_c(" fprintf(stderr, \"%s: output file cannot overwrite input file\\n\", argv[0]);\n");
emit_c(" exit(1);\n");
emit_c(" }\n");
emit_c(" infile = fopen(argv[1], \"r\");\n");
emit_c(" if (infile == NULL) {\n");
emit_c(" fprintf(stderr, \"%s: cannot read file '%s'\\n\", argv[0], argv[1]);\n");
emit_c(" exit(1);\n");
emit_c(" }\n");
emit_c(" outfile = fopen(argv[2], \"w\");\n");
emit_c(" if (outfile == NULL) {\n");
emit_c(" fprintf(stderr, \"%s: cannot write file '%s'\\n\", argv[0], argv[2]);\n");
emit_c(" exit(1);\n");
emit_c(" }\n");
codegen_c(program); // output initialisation of declared variables but not the actual declarations. TO DO!!!
emit_c("exit(0);");
emit_c("}");
emit_c(NULL); // and flush
} else if (opt_execute) {
fprintf(stderr, "\nRun-time interpretation.\n\nSerialise AST:\n");
CODEPC = 0;
serialise_AST(program); // reassign result to program ?
linear_code(make_unary_tuple(B, -2)); // exit at end of program
// can we fix up back pointers merely by resetting 'nexttrip' and calling the same code again
// to generate linear AST in same sequence, now we know where everything is going to be stored?
// (jump label destinations, procedure entry points etc)
// start address is wrong. need to note where the '__start' label would have been planted.
// note it during serialisation and use it when starting execution
// declarations, especially const arrays, still to be done.
// there's a problem with 'var' tuples. should only be one, at point of declaration?
// not sure yet if it is a conceptual problem or a bug
#if defined(DEBUG_TRIPS_AFTER) || defined(DEBUG_TRIPS_DURING) || defined(DEBUG_TRIPS_CODE)
{
int i;
for (i = 0; i < CODEPC;i++) {
fprintf(stderr, "CODE %04x: ", i);
if (opsym(CODE[i]) == LINEAR_CODE) {
int trip = leftchild(CODE[i]);
fprintf(stderr, " ... tree starting with ... ");
printtrip(trip);
} else {
// print special trip
printtrip(CODE[i]);
}
}
}
#endif
fprintf(stderr, "\n\nExecute Serialised AST:\n");
execute_serialised_AST(0);
}
}
exit(0); return(1);
}