A High Level Assembler For the ICL 2900
Philip A. F. Hartley
Computer Science 4 Project
May 1976
CHAPTER 1.
Introduction.
The goal of the project was to write a high level
assembler for the ICL 2900 series of computers. This
problem immediately separated into two distinct subproblems:
1. Gaining an understanding of the machine.
2. Choosing the form of the language.
The 2970 had been used for development work over the
last year by ERCC, so there was a lot of first hand
experience of the machine around. The first problem,
therefore, could be overcome by reading the machine manuals
and then talking to people who had experience of the machine
in order to clear up the points which needed clarification.
The choice of HAL as the structure of the language was
arrived at quickly. Once HAL had been selected, the
problem of where to start arose: whether to start from
scratch, or to take an already existing HAL, determine the
machine dependent parts and alter or replace these to
produce 2900 code. The latter technique was adopted, with
EMAS chosen as the development machine on which to
implement HAL-2900.
It is important to see this project in a greater context
than just a fourth year programming exercise. This was the
first project undertaken in the department which involved
direct contact with the machine structure. It was therefore
felt that the project could play a large part in helping
later projects come to grips with the machine at an earlier
stage than was possible in this project, due to lack of
adequate documentation.
CHAPTER 2.
Approach to the 2900.
In order to write a good compiler or high level
assembler for a machine the implementor must have an
extremely good knowledge of the machine both in terms of its
order code and of the overall structure of the system. A
considerable part of the project was spent trying to
understand the machine by going through the manufacturer's
documentation.
It was quite evident that none of the documentation gave
an adequate 'middle of the road' view of the system - it was
either too glossy or extremely detailed and virtually
unreadable.
The experience of the EMAS project team was that the
manufacturer's documentation was not quite good enough; some
details were either confused, ambiguous or even omitted.
It was for these reasons, both for the project's own
sake and to aid the dissemination of information about the
machine, that Appendix A was written, a detailed, but not
exhaustive, description and discussion of the 2970 system.
It was intended to give the reasonably experienced computer
scientist an introduction to the main concepts of the
system, without getting bogged down in too much detail. But
it was also designed as a start to the documentation which
would complement that of the manufacturer inasmuch as it
would contain clarification of details of the system which
were not adequately covered elsewhere.
CHAPTER 3.
The approach to the high level assembler for the 2900
General
A high level assembler is a programming language in
which there are program structuring - and often data
structuring - facilities which one expects to find in a high
level language but which allows direct access to all machine
facilities, including all machine instructions, registers
and memory.
The first high level assembler was N. Wirth's PL360
[1] for the IBM 360/370. It was originally written as a
tool to aid the implementation of an ALGOL 60 compiler but
it soon became evident that this type of language bridged
the rather wide gap between assemblers and high level
languages.
The advantage of high level assemblers over normal
assemblers is, as Bell and Wichmann say in [3], "that they
are an easy method of exploiting the hardware without
becoming entangled in machine code". The number of mistakes
eliminated by being able to write arbitrary expressions
instead of having to generate the instructions by hand shows
how great these advantages are.
The advantage over high level languages is precisely
that one can get direct access to machine facilities.
Techniques used to access data or program can be tailored
to the particular application rather than being very general
as is necessary in a compiler. Certainly, on a machine like
the Interdata 70/74, by equivalencing a variable name with
a register, 2 bytes per access to that variable can be saved
over a program in which the variable was equivalenced to a
memory location (that is, the difference between a
register-register form and a register-indexed form of
instruction). Therefore, unless the compiler can do global
optimisation, a difficult and expensive procedure, as much
as 2 bytes per access can be gained in the high level
assembler over the high level language. If a variable is
accessed 100 times, then up to 200 bytes can be saved on the
size of the code - a very important factor when programming
on machines with a small memory.
A high level assembler for the 2900?
But why a high level assembler for the 2900? The 2900
is not a multi-register machine so the advantage of
assigning a variable name to a register quoted above does
not apply.
There are, however, some short forms (e.g. the LNB+n
addressing mode), and the main advantages lie in the areas
where the available high level languages are deficient. For
example, in IMP [4] there are no direct methods of
manipulating 64 bit integer quantities, and the escape
descriptor for dynamic data structuring cannot be used to
full advantage.
At the moment, the alternative to programming in a high
level language on the 2900 is assembler. There is one
tremendous disadvantage of working at this level on the
2900: the primitive level interface is not guaranteed to
remain the same. This makes the use of such techniques
impracticable for all but the smallest applications, even
though system VME/K was written in MAPLE/STAPLE.
Choice of the language.
There were three constraints which the language had to
satisfy, in order to maximise the advantages stated above.
1. The language should not enforce any method of
accessing data or program.
2. The language should allow easy access to all
machine types and functions.
3. It should be possible to produce code as efficient
as that produced by hand.
HAL languages, designed by H. Dewar of the Department
of Computer Science, are a family of assemblers whose high
level features are basically the same on all machines but
whose low level features are tailored to the particular
machine for which the HAL is designed. The high level
features include block structure (only to restrict the scope
of macros and variable names), assignment statements,
conditional statements, controlled loops, a macro scheme and
a statement for equivalencing variable names to registers,
memory locations or constants. The language allows for the
explicit positioning of code and data.
One of the main features of the language is to separate
the program into two main sections; a declarative section
and a control section. Most of the control section can be
machine independent, operating on names which are bound to
machine resources in the declarative section.
This means that the binding of a variable name in the
declarative section can be changed from a store location to
a register, or from a long form to a short form, for
example, without altering the control section.
The language had to be extended to cater for the machine
types, namely descriptors, which do not exist on any other
machine for which HAL is implemented.
HAL was chosen because it satisfied the above
constraints and because of the great deal of programming
experience I already had with HAL70 and the small amount
with HAL7502. It is very important, I feel, to be
familiar with the language which is being implemented.
Method of implementation.
Once HAL was chosen, a decision had to be made: to
start from scratch or to convert an already existing HAL
implementation to HAL2900.
This was an extremely difficult decision to make. The
first method would have involved a great deal of
're-inventing the wheel' type of work. Therefore the second
technique was chosen, accepting the risk that the
program might have to be rejected, wasting a
great deal of time.
HAL70 was examined since it was by far the most
established HAL language in use. It is quite a large and
complicated program and took a long time to understand. It
was written for an 8K PDP15 and is therefore necessarily
very tightly written, which made it even more difficult to
understand.
A brief description of the machine independent HAL
The main function of the machine independent part of the
HAL program is to convert source statements into a more
convenient semi-reverse polish form (not true reverse
polish, as will be seen later). This includes the tasks of
reducing names to an internal form and dictionary
handling. This section handles conditional statements
(converting them to a set of assembler labels and jump
directives), the macro scheme, conditional assembly,
definition and redefinition of variable names, block
structure, setting of the assembler location counter, the
production of the listing file and error reports.
The machine dependent section generates the object file,
evaluates expressions and conditions, planting code where
necessary, generating jumps and planting data.
The two sections are not totally independent, especially in
the area of assembler jumps where they become slightly
entangled in order to optimise the short form of the jump
instruction. Some machine dependent features to do with
descriptors appear in the main reverse polish conversion
routine but these will be discussed later.
Choice of machine.
On which machine was the language to be implemented?
EMAS was chosen for several reasons. Firstly, the 2900 is
a 32 bit machine and the assembler would have to manipulate
values of up to 32 bits in length. EMAS was the only
readily available machine on which this could be done
easily.
More importantly, there were extensive facilities for
the generation and manipulation of 2900 object files
available on EMAS.
Other reasons for choosing EMAS were that IMP on the
Interdatas was not fully debugged, the operating system was
not designed for general purpose use anyhow, and the
machines were not reliable enough at the start of the
project. The PDP15 was not big enough with only 8K words
of store - although HAL70 and HAL7502 both fit into this.
CHAPTER 4.
Implementation details.
For reference, most of the machine dependent section of
the assembler is contained in routine ASSEMBLE starting at
line 244 in appendix D.
The internal representation of a tag is by 3*16 and 1*32
bit values.
+--------+--------+   +--------+----------------+
|        |        |   |        |                |
|  tag1  |  tag2  |   |  type  |      val       |
|        |        |   |        |                |
+--------+--------+   +--------+----------------+
    16       16           16           32
TAG1 represents the first 3 characters of a tag (in base 36
format) and TAG2 holds characters 4 - 6. A tag is a
sequence of letters followed by a sequence of digits of
which only the first 6 are significant.
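As a rough illustration of this packing, the following Python sketch encodes a tag into its TAG1 and TAG2 halves. The digit assignment (0 to 9 for digits, 10 to 35 for letters) and the padding of short tags with the zero digit are assumptions made for the example; the report does not say how HAL does either. Under this padding a short tag and the same tag followed by zeroes would collide, so a real implementation must treat short tags differently.

```python
# Sketch only: the digit values and the padding rule are assumptions.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical base 36 alphabet

def pack3(chars):
    """Pack up to 3 characters into one base 36 number (fits in 16 bits)."""
    value = 0
    for ch in chars.ljust(3, "0"):  # assumed padding with the zero digit
        value = value * 36 + DIGITS.index(ch)
    return value

def pack_tag(tag):
    """Return (TAG1, TAG2) for a tag; only the first 6 characters count."""
    tag = tag.upper()[:6]
    return pack3(tag[0:3]), pack3(tag[3:6])

# The largest 3 character value is 36**3 - 1 = 46655, below 2**16 = 65536,
# which is why three characters fit in one 16 bit field.
```

Because only six characters are kept, `pack_tag("ASSEMBLE")` and `pack_tag("ASSEMB")` give the same pair.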
The type part is interpreted as follows:
bit 0 unused
bit 1 machine instruction
bit 2 macro
bit 3 forward reference
bit 4 undefined tag
bit 5 relocatable
bit 6 pseudo register
bit 7 register
bit 8 memory reference
bits 9 - 11 type of reference
    =0 => W(REG+VAL)
    =1 => L(REG+VAL)
    =2 => B(REG+VAL)
    =3 => IS(VAL) or IS(B) if REG = B
    =4 => @(REG+VAL)
    =5 => @DR(REG+VAL)
    =6 => @(REG+VAL)(B)
bits 12 - 15 index register (REG, above)
The registers are represented by :
LNB = 1, XNB = 2, PC = 3, SSN = 4, TOS = 5
DR = 6, B = 8, SF = 9, ACC = 10
Zero in the index register field implies that no index
register is being used.
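The layout of the type part can be sketched as a small decoder. This is an illustration only, not the code of the assembler: it assumes that bit 0 is the most significant bit of the 16 bit word and that the 'type of reference' field occupies three bits ending at bit 11, which is what the mask values quoted in this report (x'10', x'100', x'200') imply. The function name and the returned dictionary are hypothetical.

```python
# Sketch only: bit 0 is taken as the most significant bit of the 16 bit word.
FLAGS = {
    "machine instruction": 0x4000,  # bit 1
    "macro":               0x2000,  # bit 2
    "forward reference":   0x1000,  # bit 3
    "undefined tag":       0x0800,  # bit 4
    "relocatable":         0x0400,  # bit 5
    "pseudo register":     0x0200,  # bit 6
    "register":            0x0100,  # bit 7
    "memory reference":    0x0080,  # bit 8
}
REF_FORMS = ["W(REG+VAL)", "L(REG+VAL)", "B(REG+VAL)", "IS(VAL)",
             "@(REG+VAL)", "@DR(REG+VAL)", "@(REG+VAL)(B)"]
REGISTERS = {0: None, 1: "LNB", 2: "XNB", 3: "PC", 4: "SSN", 5: "TOS",
             6: "DR", 8: "B", 9: "SF", 10: "ACC"}

def decode_type(type_word):
    """Split a 16 bit type word into flags, reference form and index."""
    flags = [name for name, mask in FLAGS.items() if type_word & mask]
    ref = (type_word >> 4) & 0x7            # assumed 3 bit field
    reg = type_word & 0xF                   # bits 12 - 15
    if "pseudo register" in flags:
        index = "R%d" % reg                 # field holds a pseudo register
    else:
        index = REGISTERS.get(reg, "unknown")
    form = REF_FORMS[ref] if ref < len(REF_FORMS) else "unknown"
    return {"flags": flags, "reference": form, "index": index}
```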
Examples
In the following examples, and throughout the rest of the
report, ( , ) represents a (TYPE, VAL) pair, LMASK = x'10'
(i.e. 1 set in 'type of reference'), PSREG = x'200' (i.e.
'pseudo register' set to 1) and REGISTER = x'100' (i.e.
'register' set to 1). The expression on the left hand side
is what actually appears in the source; the expression on
the right hand side is the internal representation of the
left hand side.
lnb      = (register, lnb)
w(lnb+4) = (lnb, 1)        / lnb is offset by a number of words
                           / so a conversion is done from the
                           / offset stated in the source (which
                           / is always bytes) to words.
l(pc+16) = (lmask+pc, 8)   / again the conversion but to half
                           / words this time.
w(r0+16) = (psreg+0, 16)   / no conversion for pseudo registers
                           / until the actual evaluation of the
                           / pseudo reg is required.
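The conversions in these examples can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the assembler's own routine: the function `encode_ref` and the table `UNIT_BYTES` are hypothetical names, and only the registers whose scaling is stated or shown in this report (LNB and SSN in words of 4 bytes, PC in half-words of 2 bytes) are included. Following the examples, no flag other than 'pseudo register' is set in the type.

```python
LMASK = 0x10      # 1 in 'type of reference': an L(...) reference
PSREG = 0x200     # 'pseudo register' flag
LNB, PC, SSN = 1, 3, 4

# Bytes per hardware offset unit for each index register (assumed table).
UNIT_BYTES = {LNB: 4, SSN: 4, PC: 2}

def encode_ref(form, reg, byte_offset, pseudo=False):
    """Return the (TYPE, VAL) pair for w(reg+offset) or l(reg+offset)."""
    kind = {"w": 0, "l": LMASK}[form]
    if pseudo:
        # Conversion is deferred until the pseudo register is evaluated.
        return (PSREG + kind + reg, byte_offset)
    unit = UNIT_BYTES[reg]
    if byte_offset % unit:
        raise ValueError("offset is not a whole number of units")
    return (kind + reg, byte_offset // unit)
```

For instance, `encode_ref("w", LNB, 4)` gives `(LNB, 1)` and `encode_ref("l", PC, 16)` gives `(LMASK + PC, 8)`, matching the pairs listed above.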
Therefore an assembly time expression, to which a
variable name can be equivalenced, for example, is one whose
result can be represented by a (TYPE, VAL) pair, without
generating any code.
An operand in the reverse polish form is therefore
represented by a (TYPE, VAL) pair (TAG1 and TAG2 can be
discarded since they are only required for dictionary
look-up and these operations have been completed by the time
the reverse polish is generated). An operator is
represented by a negative number. (The switch ASS in
ASSEMBLE is indexed by the operator value). For example,
given the following definitions:
$def a = w(lnb+4), b = w(lnb+8)
$def c = l(ssn+32)
(i.e. define A, B and C to have the values of the
assembly time expressions on the right hand sides of the
appropriate equals signs)
then the reverse polish form generated for the source
statement:
c = a+b+3
(i.e. assign to the memory location or register synonymed
by C the value obtained by adding the contents of the
memory location, register or constant synonymed by A to
those of the memory location, register or constant
synonymed by B, and then adding 3)
would be:
LMASK+SSN, 8, store,
LNB, 1, LNB, 2, add, 0, 3, add, store
!
where 'store', 'load' and 'add' are the operators in the
reverse polish.
Routine ASSEMBLE is then called and the expression
starting at the position indicated is evaluated. When the
'load' operation is met the pointer is returned to the start
of the expression and when the 'store' operation is reached,
the evaluation is terminated.
When an operation is met in the reverse polish the
appropriate procedure is invoked by jumping through the
switch ASS (in routine ASSEMBLE) indexed by the operation
value. For binary operations the left operand is delivered
in (TYPE1, VAL1) and the right operand in (TYPE, VAL) and
the result is expected to be returned in (TYPE, VAL). A
unary operation is invoked with its operand in (TYPE, VAL)
and the result returned in the same.
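The dispatch convention can be sketched in Python as follows. This is a simplified illustration, not routine ASSEMBLE itself: the operator codes, the use of an explicit stack, and the restriction to constant operands (TYPE 0) are all assumptions made so that the example can run without planting any code.

```python
# Sketch only: operator codes and constant-only operands are hypothetical.
ADD, SUB, NEG = -1, -2, -3          # operators are negative numbers
CONST = 0                           # TYPE 0 taken to mean a constant

def op_add(type1, val1, type_, val):
    return CONST, val1 + val

def op_sub(type1, val1, type_, val):
    return CONST, val1 - val

def op_neg(type_, val):
    return CONST, -val

BINARY = {ADD: op_add, SUB: op_sub}  # plays the part of the switch ASS
UNARY = {NEG: op_neg}

def evaluate(polish):
    """Evaluate a flat stream of (TYPE, VAL) operands and operator codes."""
    stack, i = [], 0
    while i < len(polish):
        item = polish[i]
        if item >= 0:                       # operand: TYPE then VAL
            stack.append((item, polish[i + 1]))
            i += 2
        elif item in BINARY:
            type_, val = stack.pop()        # right operand in (TYPE, VAL)
            type1, val1 = stack.pop()       # left operand in (TYPE1, VAL1)
            stack.append(BINARY[item](type1, val1, type_, val))
            i += 1
        else:
            type_, val = stack.pop()        # unary operand in (TYPE, VAL)
            stack.append(UNARY[item](type_, val))
            i += 1
    return stack.pop()
```

In the real assembler an operation on non-constant operands would plant instructions and return a (TYPE, VAL) pair describing where the result lives, rather than computing a number.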
Descriptors
The descriptor provides a method of accessing a regular,
usually static, set of data items. The descriptor itself is
a 64 bit quantity of which the first (most significant) 32
bits define the type of the data being accessed (1, 8, 32,
64 bit length, for example), whether any modifier applied is
to be scaled and an upper bound on the size of the modifier.
The second (least significant) 32 bits contain the address
of the first item of the set.
The notation chosen for descriptors was @DESC for
non-modified descriptors and @DESC(MOD) for modified
descriptors. This was chosen as being the most consistent
with the syntax of other indirect forms in HAL; e.g.
W(ADDR), B(ADDR), etc.
The symbol '@' could then be treated as an operation -
unary for non-modified descriptors and binary for
descriptors to which modification is applied. The
particular operation to be invoked is worked out from
context. It is quite straightforward but not trivial since,
in the modified case, the '@' is being treated as a prefixed
binary operator, whereas all other binary operators are
infix. Of course, the treatment of unary '@' is quite
consistent with unary minus (-) and not (\).
The technique of 'evaluation' of descriptors is to try
to fit the descriptor reference to one of the available
addressing modes and if this is not possible, to load the
DR (a register which holds a descriptor) with the
descriptor. The modifier, if any, is loaded into the B
register unless it fits one of the modifier addressing modes
which is consistent with the descriptor reference. For
example,
@(L(LNB+4))(W(LNB+12))
would be coerced to
@DR(W(LNB+12))
where DR gets L(LNB+4), and
@(L(XNB+40))(L+M)
would be coerced to
@(L(XNB+40))(B)
where B gets L+M, given that L+M will not fit one of the
modifier addressing modes.
Registers
Each register on the 2900 is meant to serve a particular
purpose. For instance, the XNB is meant for use as an
index register and no computation can be performed on it.
The accumulator is meant to hold intermediate results of
expressions and for performing computation and it cannot be
used as an index register. The B register is semi-flexible
in that a limited number of computational functions can be
performed with it (on 32 bit quantities) and it has a
route into the stack, although its primary function is to
hold modifiers.
Because registers have such set functions and a
particular register has to be used when its function is
required, it was difficult to see how the concept of the
temporary register specification (the '$TEMP' directive)
would fit into the HAL2900 scheme. After all, the stack
and the accumulator can be used to hold partial results of
expressions so that explicit specification of temporary
registers is unnecessary. The only register it appeared to
be necessary to let the user claim or release was the B
register since there are a number of instructions to allow
it to be used as a cycle control variable.
This, however, turns out to be a rather short sighted
view. There are situations, especially with XNB, where the
user may want to load a register with a particular value and
be sure that the register will not be used later by the
assembler for its own purposes. If the assembler does need
to use the register, then it should flag the occasion by
generating an error message.
A problem with index registers is that offsets are of
different lengths depending on the register in use. For
example, offsets from the LNB are taken by the hardware to
mean a number of words; offsets from the PC are taken to be
a number of half-words (i.e. 2 bytes). But when a register
is stored, if it can be, then the value is a byte address.
The assembler, therefore, assumes that all offsets it finds
are a number of bytes and does the appropriate conversions.
The only cases in which this is not true are when it detects
SF = sf+(....) or LNB = sf-(....) in which case it uses
the instructions ASF and RALN (Adjust Stack Front and
Raise Local Name Base) which assume an operand which is
a number of words.
The evaluation of expressions
At first sight the 2900 is ideal for the evaluation of
expressions; it has a stack, the top item of which (top of
stack - TOS) can be accessed by a primary addressing mode
and an accumulator (the real top of stack) which can be
loaded, stacked and loaded, or stored using any of the primary
addressing modes.
On top of this the B register can be used to a limited
extent as an accumulator to perform addition, subtraction
(but not reverse subtraction) and multiplication on 32 bit
quantities. It is better to use the B register when
possible since its arithmetic is faster than that of the
main accumulator.
The drawbacks come in the way the multiple length
accumulator is handled. Operations cannot be performed
directly between a 64 bit accumulator and 32 bit twos
complement directly or indirectly (i.e. via a descriptor)
addressed values. In order to convert a 32 bit value to a
64 bit value with sign extension, the value must either be
loaded into a 32 bit accumulator, whose size is then changed
to 64 bits by a direct modification of the program status
register, or be loaded directly into a 64 bit accumulator
via a descriptor. In either case the result is a 32 bit
value extended on the left to 64 bits with zeroes. The sign
extension must be done by program: a logical left shift of
32 bits followed by an arithmetic right shift of 32
bits. There are other cases when the accumulator causes
problems, mostly to do with conversion, but they do not
concern this project.
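The shift sequence for sign extension can be checked with a small Python model of a 64 bit accumulator. This is only a sketch of the arithmetic; the function name is hypothetical and no actual 2900 instructions are represented.

```python
MASK64 = (1 << 64) - 1

def sign_extend_32_to_64(acc):
    """acc holds a 32 bit value extended to 64 bits with zeroes."""
    acc = (acc << 32) & MASK64          # left shift logical 32 bits
    if acc & (1 << 63):                 # reinterpret the pattern as signed
        acc -= 1 << 64
    acc >>= 32                          # right shift arithmetic 32 bits
    return acc & MASK64                 # back to a 64 bit pattern
```

A value with its top bit set, such as x'80000000', comes back with ones in the upper 32 bits, while a positive value is unchanged.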
The assembler, therefore, had to handle expressions
consisting of operands potentially of different lengths and
had to decide what conversions should be done.
There is no way at compile time, however, of telling the
length of indirectly addressed data items. There are 2
choices here: either a new notation is introduced to specify
the length of the item implicitly or some explicit way of
describing the lengths of all indirectly addressed items is
employed.
The latter method was chosen. The former method would
certainly be the more flexible, but a notation to describe
all types of descriptors would perhaps have been
cumbersome. The latter method seemed to cover the majority
of cases.
So the '$ACC' directive was introduced. If the
directive '$ACC 32' is given then all indirectly addressed
items up to the next '$ACC' directive are assumed to be 32
bit twos complement values. Similarly for '$ACC 64'.
The problems which now existed were when to use the B
register as an accumulator and when to coerce data to
another length.
The choice of accumulator was made at the start of an
expression. The B register was chosen when '$ACC 32' was
in force and all operations involved in the expression could
be performed on it. This only seemed to have one
disadvantage; that there was no reverse subtraction on the
B register and the occurrence of this operation could not be
detected by a simple scanning of the source text (as could
be done to detect other non-B register operations). So
when this operation was required in the evaluation of an
expression it was converted to a negate and add operation.
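The selection rule and the treatment of reverse subtraction can be sketched as follows. The function names and the representation of operators as strings are hypothetical; the set of B register operations is taken from the description above (addition, subtraction and multiplication).

```python
B_OPERATIONS = {"+", "-", "*"}   # what the B register is said to perform

def choose_accumulator(acc_length, operators):
    """Pick 'B' or 'ACC' at the start of an expression."""
    if acc_length == 32 and set(operators) <= B_OPERATIONS:
        return "B"
    return "ACC"

def reverse_subtract(b_value, operand):
    """Compute operand - B on the B register as a negate then an add."""
    return (-b_value) + operand
```

For example, `choose_accumulator(32, ["+", "*"])` selects B, while any 64 bit expression or any operator outside the set falls back to the main accumulator.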
The length to which all operands should be converted was
taken to be indicated by the current '$ACC' directive.
This seemed like the most desirable length at the time but
perhaps the length of the LHS (if it existed) would have
been a better choice.
Conversion at present is always done when a variable is
required as an operand. There is, however, quite a
considerable amount of code required to do conversion so
perhaps it would have been a better idea to do the
conversion only when necessary, deferring the conversion
when 2 operands of the same length are combined in a binary
operation, and converting if the operands are of dissimilar
length, for example. This gives the advantage of doing
CONVERT(J+K+L) rather than CONVERT(J)+CONVERT(K)+CONVERT(L).
After having written some test programs and having
examined the code produced there is, perhaps, a better
scheme for evaluating expressions which comes to mind.
Take, for example, the expression:
dest = @desc(mod)+w(addr)
The destination of the expression involved in DESC is the
DR, therefore the operands should be coerced to 64 bits
without sign extension. MOD will, unless some other
'less strong' addressing mode will suffice, be destined for
the B register. Therefore operands should be treated as
signed 32 bit twos complement quantities and coerced to this
length, if necessary. Similarly for ADDR except that the
possible final destination will be XNB. The whole
expression should then be coerced to the length of DEST.
This is difficult with the present structure since a
binary operation sees only 2 operands; it knows nothing
about the environment of the expression. This could be done
by inserting more information into the reverse polish to
indicate what kind of expression is coming up so the
appropriate accumulator and coercion length choices can be
made. There are still problems with expressions which have
no destination, conditions, for example, and thus no
sensible guess can be made about coercion length. Perhaps
the '$ACC' directive may be the choice.
Assignments
Assignments on the 2900 are not as straightforward as on
multi-register machines. The difficulty comes because
different registers and memory locations are accessed by
different instructions whereas on multi-register machines
they are accessed more uniformly. There are special cases
with ACC = dr which has to be detected in order to use CYD
(copy the descriptor register into the accumulator) and with
B and TOS since they are treated as registers but, unlike
the other registers, can be accessed directly by a primary
addressing mode.
Pseudo registers
In multi-register machines like the IBM 360/370 and the
Interdata 70/74 it is possible to access a number of
separately addressable areas by assigning a register to
point to the base of each area. Elements of each area can
then be accessed by specifying an offset from the
appropriate base register.
The 2900 does not have multiple index registers. There
is only one such register, the XNB.
A facility in the assembler is therefore provided to
allow a number of memory locations to act like index
registers, the XNB being loaded with the appropriate value
when required.
For this purpose, sixteen pseudo registers are provided,
R0 - R15. A pseudo register can be equivalenced in the
'$DEF' statement to any expression capable of assembly time
evaluation. The pseudo register can then be used in place
of a real index register in both assembly time and run time
expressions.
This effectively allows the user to equivalence a
variable name with an expression which has an extra level of
indirection (not meaning via a descriptor this time), for
example, an expression like W(W(LNB+4)+16) in the following
manner:
$def r0 = w(lnb+4)
$def k = w(r0+16)
When the variable name K is referenced, the XNB would be
automatically loaded with W(LNB+4) and the (TYPE, VAL)
pair altered to represent W(XNB+16).
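The rewriting step can be illustrated with a Python sketch. It is a model of the idea only: references are held as simple tuples rather than (TYPE, VAL) pairs, the required register load is shown as a line of HAL-style text, and the names `resolve` and `pseudo_defs` are hypothetical.

```python
def resolve(ref, pseudo_defs):
    """ref is (form, base, byte_offset), e.g. ('W', 'R0', 16).

    Returns (loads, new_ref): the register loads needed first, written in
    HAL notation, and the reference rewritten to use XNB.
    """
    form, base, offset = ref
    if base not in pseudo_defs:
        return [], ref                      # a real register: nothing to do
    loads = ["XNB = " + pseudo_defs[base]]  # XNB picks up the pseudo register
    return loads, (form, "XNB", offset)

pseudo_defs = {"R0": "W(LNB+4)"}            # from: $def r0 = w(lnb+4)
k = ("W", "R0", 16)                         # from: $def k = w(r0+16)
```

Resolving `k` yields one load, `XNB = W(LNB+4)`, and the rewritten reference `("W", "XNB", 16)`.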
There is a rather obvious generalisation to this but it
would probably create more problems than it would solve.
For instance, when does one actually 'evaluate' the object
and load the appropriate registers? The method which is
implemented is rather clumsy but it covers the majority of
cases in which indirection is required.
Segmentation
The 2900 is a segmented machine. That is, logical
program and data sections can be separated by placing them
in different segments.
It is worthwhile discussing here how much the loader, or
rather the program which generates a file for the loader,
can affect the way the assembler operates.
The system routine LPUT is the program which generates
ERCC 2900 object files on EMAS. A program passes
fragments of information to LPUT and it assembles them into
a format suitable for other programs to use as input in
order to produce proper loader files. It is, however,
primarily designed for use by compilers. The restrictions
which this brings are far reaching. For instance, in the
process of the compilation of an IMP program, information
for a limited number of segments is generated. These
segments are the code segment, the GLA segment for static
data, the procedure linkage table for external procedure
linking and another segment for storing diagnostic
information. LPUT is tailored to this sort of use; it is
not intended to produce general object files, containing
information for an arbitrary number of segments.
If LPUT is used, therefore, there is a restricted number
of segments for use, of which only two are really useful,
the code and the GLA segments. It would obviously be
better to use a more generalised system. This could be done
by generating loader information directly but there seems to
be such a plethora of ICL loader formats that this might
lead to unacceptable inflexibility. The chosen loader
format may become available on only one system, for
instance, or the specification of the format, like the
primitive level interface, may change.
It was felt, therefore, that the only way of producing a
working system, no matter how limited that system is, was to
use the facilities already available.
So, at the moment, code can only be dumped in two
segments, the code and GLA segments, although this does not
preclude the linking in of other segments at run time. The
mechanism for depositing values in either segment is as
follows: when the assembler is entered it is set up as
currently dumping in the code segment. To change to the
GLA segment, the current location counter is captured (by
the $DEF var = * facility) and then changed (by the '$LOC'
directive) to a value containing a pseudo register
component. The assembler assumes that when the location
counter has a component of a pseudo register, then it is
required to dump code in the GLA segment.
This can be generalised by specifying two expressions in
the '$LOC' directive; the segment required and the location
counter within that segment. Some segments would have
predefined values: 0 for the code segment and 1 for the GLA
segment, say.
Optimisation
The implementors of some languages (of PL11 [2], for
example) feel that the code produced by the assembler should
be as 'clear' as possible. That is, it should not try to
hide anything from the programmer, and that a simple
inspection of the source program should be enough to 'guess'
what code should be produced. They therefore feel that
optimisation should only be done in very rare, but well
defined cases, since it leads to obscure code being
produced.
Certainly, the assembler should not use any memory
locations unless they are specifically set aside by the
programmer but optimisation in terms of remembering the
contents of registers through basic blocks (i.e. sections of
code with one entry point and one exit and containing no
machine instructions) plays a large part in satisfying the
third constraint of the language mentioned in chapter
three, that the code produced by the assembler should be
efficient.
The main optimisation done, in fact, is to remember the
contents of registers. This can be done by having a
(TYPE, VAL) pair associated with each register whose
contents are to be recorded. When the register is loaded,
the duplet describing the item being loaded is deposited in
the relevant (TYPE, VAL) pair of the register.
The problem here is that there are some (TYPE, VAL)
forms which cannot be accessed directly, for example, the
value of the contents of an index register plus an offset
(an immediate form in Interdata terms), or an access to a
byte which has to be loaded via a descriptor. But by the
time the code has been generated to produce a form which can
be directly referenced, the original (TYPE, VAL) pair has
been lost. This is a difficult problem to get round and it
has not really been solved, but the assembler still catches
a lot of important cases.
The condition code, and what set it, is also remembered
but this is relatively minor since, unlike the Interdata,
the loading of a variable into a register does not affect
the condition code. So the only time this optimisation will
produce a result is when the same comparison is done twice
within a short enough space so that the register involved
does not get corrupted.
Register contents are forgotten whenever a value is
written to store unless it is the contents of the register
itself which are being written or the register contains a
constant. A better way would be to forget all memory
references in registers if the store was to be done via a
descriptor, but if the store was direct then only to forget
references to that same location in other registers. This
is not done at the moment because it would be too clumsy
with the present set up - another area for improvement.
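The register memory and its forgetting rule can be sketched in Python. This is a model of the scheme as described, not the assembler's code: the class, its method names and the `is_constant` predicate are hypothetical, and (TYPE, VAL) pairs are plain tuples.

```python
class RegisterMemo:
    """Remember which (TYPE, VAL) pair each register currently holds."""

    def __init__(self):
        self.contents = {}                  # register name -> (TYPE, VAL)

    def load(self, reg, pair):
        """Return True if a load instruction must actually be planted."""
        if self.contents.get(reg) == pair:
            return False                    # already there: no code needed
        self.contents[reg] = pair
        return True

    def store(self, source_reg, is_constant):
        """Forget register contents after a write to store.

        is_constant(pair) is a predicate supplied by the caller. An entry
        survives if it belongs to the register being written or if it
        describes a constant.
        """
        self.contents = {
            reg: pair for reg, pair in self.contents.items()
            if reg == source_reg or is_constant(pair)
        }
```

The refinement suggested above, forgetting only references to the location actually written when the store is direct, would replace the blanket filter in `store` with a comparison against the destination pair.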
Other features
All other HALs, so far, have generated 'stand alone'
programs. That is, there were no facilities for generating
external references or entry points. For the 2900
implementation this is impracticable since HAL2900 programs
will almost always want to interface with other entities in
the system, for example, to do I/O.
Two directives have been introduced for this purpose,
'$EXT' and '$ENT'. Both are followed by a name, '$EXT'
instructing the loader to deposit in the succeeding two
locations a descriptor via which the object referred to by
the given name can be accessed, and '$ENT' informing the
loader that the succeeding two locations will contain a
descriptor defining an entry point which will be externally
referred to by the given name.
Current state of the program
The assembler is now available on EMAS for general use.
Programs have been run on the New Range Simulator on
EMAS, but not, as yet, on the real machine, although this
should not cause any problems.
CHAPTER 5.
Conclusions.
Chapter 3 discusses in detail what the advantages might
be of programming in high level assemblers but after
programming a little in HAL2900 it becomes clear that very
little can be gained over high level languages on this
machine. This is connected with the fact that the architecture
of the machine enforces quite strict techniques on the
programmer, for example, of routine calling. Very little is
saved in terms of code produced and is it really worth it to
sacrifice excellent diagnostics for what is basically just a
reduction in the awkwardness of handling certain items? I
think not.
But the prime object of the project was not to produce a
production system, although I wanted to get something
working, but more to learn how to write such a system,
perhaps producing a platform to a useable system on the way.
I certainly feel I have provided the latter.
I have learned a lot about the process of compilation,
the problems of evaluating expressions at this level, the
problems of optimisation, and how to manage a large program.
The project would have been worthwhile just to gain this
experience.
CHAPTER 6.
References and acknowledgements
[1] PL360 - A Programming Language for the IBM360.
Wirth, N.
JACM, vol 15 (1968), p37
[2] PL11 - A Programming Language for the PDP11
Russell, R.
CERN report #74-24
[3] PL516 - Programming Language for the Honeywell DDP516
Bell and Wichmann
Software, vol 1 (1971), p61
[4] IMP Programming Language and Compiler
Stevens, P. D.
Computer Journal, vol 17 (1965), #3
Many thanks go to my supervisor, Nick Shelness, without
whose help, advice, good ideas, and general bringing down to
earth, I wouldn't have finished the project.
Also to H. Dewar who suggested the project in the first
place (in a slightly altered form) and without whose amazing
program I couldn't possibly have got as far.
Thanks to Jeff Tansley for manuals and advice.
Thanks to P. Stevens, G. Millard, R. Wickham and
others in ERCC and the RCO who gave me advice.