This is not the first IMP to C translator. Back in the 1990s, when IMP was still a live language, Peter Stephens of Edinburgh Portable Compilers Ltd (and previously of the Edinburgh Regional Computing Centre) wrote a good IMP to C translator (unsurprisingly called "imptoc") for Imp80, which was the final standardised version of IMP at Edinburgh. It was based on the portable Imp80 compiler in use at the time, so it had extremely good compatibility with Imp80.
Unfortunately, by the time the source code was released to the Edinburgh Computer History Project many years later, software rot had set in and some IMP language constructs now caused the compiler to crash and exit without any output. (I believe the specific language area that was problematic was some part of the %record handling.) We did attempt to fix the code, but because the grammar that had been used to produce the parser had become separated from the code, we were unable to match the code that handled the problematic areas to the grammar that described them. There were also some language features of Imp80 that simply could not be implemented accurately by the C compilers of that time. That particular imptoc program was therefore more appropriate for translating IMP sources to C as a one-off operation, with the intent of manually correcting any deficiencies and maintaining the translated program in C form from then on.
This translator is therefore a new project, written from scratch, that uses a new parser which has the significant feature of embedding the textual input form of the grammar in the source code, in the hope that it will never become separated from the code that handles the grammar rules in the future. (In the intervening years between Peter Stephens's imptoc compiler and this project, I worked on a Skimp to C project and two other IMP to C projects. These were R&D projects and never intended to be full working compilers - they all required considerable manual intervention. One of the IMP to C projects was a translator for ICode, the intermediate code output by Peter Robertson's Imp77 compiler. This one was never intended to produce maintainable C code - it was only ever meant as an alternative code-generator for Imp77. If it had worked. These projects were part of my learning curve.)
Because IMP is an Algol-60-like language, it makes heavy use of nested procedures. As you may know, these were not supported by C back in the 1990s (and even now are not supported by any of the official C standards). However, we now have access to the GNU project's GCC compiler, which *does* provide proper support for nested procedures, so the output of our new IMP to C translator (now called "imps") targets GCC alone. (Although the rival "Clang" compiler does have a facility that approximates nested procedures, it does not allow mutual recursion between nested procedures, so it is unsuitable as a target for our IMP to C translator.)
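To make the dependence on GCC concrete, here is a minimal sketch of the kind of C that relies on it. It is not taken from the imps output; the function names are made up for illustration. It shows two nested functions that call each other and share a local variable of the enclosing procedure, using GCC's `auto` forward declaration for nested functions.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C extension: nested functions. Needs GCC; not standard C.
   Hypothetical example: returns 1 if n is even, 0 if odd, and stores
   in *calls how many nested calls were needed to decide. */
int is_even_counted(unsigned n, int *calls)
{
    int count = 0;              /* local of the enclosing procedure */

    assert(calls != NULL);

    /* 'auto' forward declarations allow nested functions to be
       called before their definitions, hence mutual recursion. */
    auto int even(unsigned k);
    auto int odd(unsigned k);

    /* Both nested functions update 'count' in the enclosing scope. */
    int even(unsigned k) { count++; return k == 0 ? 1 : odd(k - 1); }
    int odd(unsigned k)  { count++; return k == 0 ? 0 : even(k - 1); }

    int result = even(n);
    *calls = count;
    return result;
}
```

Calling `is_even_counted(4, &calls)` returns 1 and leaves 5 in `calls` (even, odd, even, odd, even).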
There are two primary uses of a language translator like this one. The first is to convert old sources into a current language so that they can be maintained in that language from then on; the second is to make the conversion transparent and present the user with the appearance of a native compiler for the original language. As usual with projects born of Edinburgh, we attempt to handle both cases! To do so, some compromises have to be made:
1) When generating code that is to be converted once and maintained in C, we translate some constructs in a way that is usually correct but may not be in some obscure corner cases. When this happens we warn the user and encourage them to check that area of the translation carefully.
2) When using the translator as a compiler for code maintained in IMP, some of the translations may verge on the verbose, if not the unreadable, to ensure 100% compatibility with standard IMP behaviour. Since the user should not be looking at the intermediate C stage anyway, this is not considered a major problem.
Note that both of these options are the extreme cases and have to be requested using command-line options. The default translation is a compromise between the two: relatively readable C code, but code which handles some constructs (such as strings) the IMP way rather than the C way.
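As an illustration of what handling strings "the IMP way" can mean in C, the sketch below stores a string as a length byte followed by up to 255 characters, with no terminating NUL. The type and function names are hypothetical, and the representation actually used by imps may differ. The silent truncation in `imp_from_c` is also just an assumption made for this example.

```c
#include <assert.h>
#include <string.h>

#define IMP_MAXLEN 255          /* a length byte can count at most 255 */

/* Hypothetical IMP-style string: length first, no NUL terminator. */
typedef struct {
    unsigned char len;
    char text[IMP_MAXLEN];
} imp_string;

/* Build an IMP-style string from a C string (truncating at 255). */
imp_string imp_from_c(const char *s)
{
    imp_string r;
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    if (n > IMP_MAXLEN) n = IMP_MAXLEN;
    r.len = (unsigned char)n;
    memcpy(r.text, s, n);
    return r;
}

/* Append src to dst, as IMP's '.' operator would.
   Returns 0 on success, -1 if the result would exceed 255 characters. */
int imp_concat(imp_string *dst, const imp_string *src)
{
    if ((size_t)dst->len + src->len > IMP_MAXLEN) return -1;
    memcpy(dst->text + dst->len, src->text, src->len);
    dst->len = (unsigned char)(dst->len + src->len);
    return 0;
}
```

With this layout the length is always known without scanning, and overflow has to be checked explicitly, which is part of why such code reads less like idiomatic C.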
When I started this project it was with the intention of being able to translate all the variations of IMP that I knew of (and believe me, there are many). I did expect there to be a few areas where one old IMP compiler's interpretation of a statement was at odds with another compiler's idea of how to handle it, but I thought these would be few and could be handled with compile-time command-line options. How wrong I was! The current scheme therefore is this: before translating with our code, we will run the front-end syntax checking passes of all the old compilers, and from how well they handle the source code (i.e. how many errors they generate), we'll set all the options corresponding to the version of the compiler that is the best fit. We've already built front ends and syntax checkers for Imp9 (Old EMAS IMP), Imp77, Imp80, PDP15 Imp, and the non-standard Imp80 for the 68000-based Edinburgh Advanced Personal Machines (APMs), so we're pretty confident that this scheme will work. As a bonus, we can use the front end of the chosen compiler to produce the listing file and error messages, so the IMP to C translator does not have to do those housekeeping chores too! This simplifies its code to some extent.
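The best-fit selection described above could be sketched as follows. This is only an illustration under assumptions: the `checker_result` structure and `best_fit` function are hypothetical names, and how the error counts are actually gathered from each front end is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one front end's syntax-checking pass. */
typedef struct {
    const char *dialect;    /* e.g. "Imp77" or "Imp80" */
    int errors;             /* number of errors that checker reported */
} checker_result;

/* Return the index of the dialect whose checker reported the fewest
   errors. Ties go to the earliest entry; returns -1 for an empty list. */
int best_fit(const checker_result *results, size_t count)
{
    int best = -1;

    assert(results != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (best < 0 || results[i].errors < results[best].errors)
            best = (int)i;
    }
    return best;
}
```

The translator's dialect options would then be set from the chosen entry, and the same front end reused for the listing file and error messages.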
The parser is a table-driven, top-down recursive-descent parser in the style of most of the parsers used at Edinburgh (in turn derived from Tony Brooker's parser from the Compiler-Compiler project and used in creating Atlas Autocode and Algol60 implementations). It creates a concrete syntax tree, or 'analysis record', from the IMP source. An automatically generated program in turn converts this into an Abstract Syntax Tree (AST), which is initially almost a 1:1 mapping of the analysis record, although somewhat simplified. This automatically generated skeleton code is available for the programmer to modify in order to produce a properly abstract AST representing the IMP program structure.
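For readers unfamiliar with this style of parser, here is a toy sketch of the idea. The grammar, table encoding and record layout are all invented for this example and are not the ones used by imps. A single generic routine walks a grammar table, tries each alternative of a rule in order, backtracks on failure, and appends the number of each alternative it tries to a flat analysis record.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Item codes: END ends an alternative, DIGIT matches one digit,
   RULE + n refers to rule n, anything else is a literal character. */
enum { END = 0, DIGIT = 1, RULE = 256 };
enum { EXPR, REST, TERM };
#define REC_MAX 64

typedef struct { int nalts; int alt[3][4]; } rule;

/* Toy grammar: EXPR = TERM REST
                REST = '+' TERM REST | '-' TERM REST | (empty)
                TERM = digit | '(' EXPR ')'                      */
static const rule grammar[] = {
    [EXPR] = { 1, { { RULE + TERM, RULE + REST, END } } },
    [REST] = { 3, { { '+', RULE + TERM, RULE + REST, END },
                    { '-', RULE + TERM, RULE + REST, END },
                    { END } } },
    [TERM] = { 2, { { DIGIT, END },
                    { '(', RULE + EXPR, ')', END } } },
};

typedef struct {
    const char *src;    /* input text */
    int pos;            /* current position in src */
    int rec[REC_MAX];   /* analysis record: 1-based alternative numbers */
    int len;            /* entries used in rec */
} parser;

/* Try each alternative of rule r in turn; 1 on success, 0 on failure. */
static int parse_rule(parser *p, int r)
{
    for (int a = 0; a < grammar[r].nalts; a++) {
        int save_pos = p->pos, save_len = p->len, ok = 1;
        if (p->len >= REC_MAX) return 0;
        p->rec[p->len++] = a + 1;               /* note the alternative */
        for (const int *it = grammar[r].alt[a]; ok && *it != END; it++) {
            if (*it >= RULE) {
                ok = parse_rule(p, *it - RULE);
            } else {
                char c = p->src[p->pos];
                ok = (*it == DIGIT) ? (isdigit((unsigned char)c) != 0)
                                    : (c == *it);
                if (ok) p->pos++;
            }
        }
        if (ok) return 1;
        p->pos = save_pos;                      /* backtrack */
        p->len = save_len;
    }
    return 0;
}

/* Parse all of text as an EXPR; returns 1 if the whole input matched. */
int analyse(parser *p, const char *text)
{
    assert(p != NULL && text != NULL);
    p->src = text; p->pos = 0; p->len = 0;
    return parse_rule(p, EXPR) && text[p->pos] == '\0';
}
```

For the input `1+2` the record comes out as 1, 1, 1, 1, 3: one entry per rule visited, in the order the rules were entered. Everything about the language lives in the table, so changing the grammar does not mean rewriting the parsing routine.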
The AST is then 'compiled' by another module of the system into another type of AST that more closely represents the C program that is to be generated. (This data structure is called the 'C AST' or 'CAST' in the source.) Depending on expediency, some parts of the compilation process are handled using the AST and others using the CAST.
The final action is to output C code from the CAST.
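The last two stages can be pictured with a very small sketch. All the types and names here are hypothetical; the real AST and CAST in imps are certainly richer. An IMP-shaped expression tree is lowered into a C-shaped tree, mapping IMP's `=`, `#` and `//` operators to C's `==`, `!=` and `/`, and the C-shaped tree is then printed fully parenthesised.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IMP-side AST: names, constants and three operators. */
typedef enum { IMP_NAME, IMP_CONST, IMP_EQ, IMP_NE, IMP_IDIV } imp_kind;
typedef struct imp_node {
    imp_kind kind;
    const char *text;               /* spelling of a name or constant */
    struct imp_node *left, *right;
} imp_node;

/* Hypothetical C-side AST ("CAST"): op is NULL for leaves. */
typedef struct cast_node {
    const char *op;                 /* C operator spelling */
    const char *text;
    struct cast_node *left, *right;
} cast_node;

#define POOL_MAX 32
typedef struct { cast_node nodes[POOL_MAX]; int used; } cast_pool;

/* 'Compile' an IMP AST into a CAST, allocating from a fixed pool.
   Returns NULL for an empty subtree or if the pool is exhausted. */
cast_node *lower(const imp_node *n, cast_pool *pool)
{
    if (n == NULL || pool->used >= POOL_MAX) return NULL;
    cast_node *c = &pool->nodes[pool->used++];
    c->text = n->text;
    c->op = NULL;
    switch (n->kind) {
    case IMP_EQ:   c->op = "=="; break;     /* IMP  =  */
    case IMP_NE:   c->op = "!="; break;     /* IMP  #  */
    case IMP_IDIV: c->op = "/";  break;     /* IMP  // */
    default: break;                         /* leaf: name or constant */
    }
    c->left = lower(n->left, pool);
    c->right = lower(n->right, pool);
    return c;
}

/* Append s to the NUL-terminated text in buf, truncating at cap. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);
    snprintf(buf + used, cap - used, "%s", s);
}

/* Output C source text for a CAST into buf (which must start empty). */
void emit(const cast_node *c, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    if (c == NULL) return;
    if (c->op == NULL) { append(buf, cap, c->text); return; }
    append(buf, cap, "(");
    emit(c->left, buf, cap);
    append(buf, cap, " ");
    append(buf, cap, c->op);
    append(buf, cap, " ");
    emit(c->right, buf, cap);
    append(buf, cap, ")");
}
```

Given a tree for the IMP condition `a // b # 0`, `lower` followed by `emit` produces the text `((a / b) != 0)`. Keeping the two trees separate means decisions about IMP semantics and decisions about C layout can each be made on the representation where they are easiest.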