Edinburgh University Computer Science Department

Motorola 88100 Assembler User Notes

This document relates to a locally written assembler for the Motorola 88100 processor. It describes the input language accepted, as well as the organisation of the output object file produced.

The assembler is designed to be used in conjunction with high-level language compilers not yet written. In particular this means that there is support for external linkage, i.e. cross references (both to code and data objects) between separately compiled or assembled modules. It also means that each separately compiled module is likely to consist of two areas, one containing code, the other static data and external linkage information. It is assumed that at run time this data area will be pointed at by a base register. One of the registers "reserved for the linker", R29, has been commandeered for this purpose, and has been called SB for Static Base.

To supplement the assembler, a linking loader is available, which takes several separately assembled (or compiled) object files and combines them into a single image suitable for loading into EPROM. This is also described.

The user (assembly language programmer) is expected to be familiar with the architecture and instruction set of the processor, and/or to have access to Motorola's "MC88100 RISC Microprocessor User's Manual".

This is a one-pass assembler. Therefore the listing file which it produces while processing the source input cannot show the actual branch displacements for forward references. The generated binary code is therefore omitted from the listing altogether, which as a result contains only line numbers, address offsets, source text, and error messages. The generated code is not output on the fly, but buffered internally to allow forward references to be fixed up as soon as the relevant labels are reached.
As an option, an additional listing file may be generated from the buffered code, which consists of code address, code value, and disassembled code, but without the original source text or line numbers. This feature is particularly useful in the development stage of the assembler, as it may be used to check that it is working correctly.

Using the assembler

The assembler is available on the APMs, in directory ASSEM on filestores B and C. The command to invoke it is:

    ASSEM:ASS88K file {-{NO}OBJ} {-{NO}LIS} {-{NO}DIS}

The single parameter is the name of the file to be assembled. A file name extension of ".88K" is assumed, and the output will go to three files with extensions ".OBJ", ".LIS", and ".DIS", respectively. The options -OBJ, -NOOBJ, -LIS, -NOLIS, -DIS, -NODIS (which may be abbreviated to -O etc) may be used to enable or disable generation of these respective files. The default options are -OBJ -LIS -NODIS.

Using the loader

The loader will link together a number of object modules, and build an image file suitable for putting into EPROM. The command is:

    ASSEM:LOAD88K infile1, infile2, ... / outfile {-codebase=??} {-database=??}

The input file names are all assumed to have extension ".OBJ"; the output file will have extension ".ROM". If "/outfile" is omitted, the output file will have the same name as the first input file. Specification of the code and data area base addresses is optional (they default to 0 and the size of the code area, respectively). Their values are used in the computation of addresses of external objects. The "??" values are specified in hexadecimal.

The layout of the output image is simply the juxtaposition of the code areas of all the input files (in the order in which they were specified in the command), followed by the similarly juxtaposed data areas, in the same order, initialised as specified in the respective object files, with all links required by import/export directives filled in appropriately.

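As a rough illustration of the layout just described, the following Python sketch computes where each module's code and data areas would start. It is not the loader's own code: the function name `layout`, the representation of a module as a (name, code size, data size) tuple, and the use of byte counts for the sizes are all assumptions made for the example.

```python
def layout(modules, codebase=0, database=None):
    """Return {name: (code_address, data_address)} for each module.

    `modules` is a list of (name, code_size, data_size) tuples in command
    line order; sizes are taken to be in bytes (an assumption).
    """
    total_code = sum(code_size for _, code_size, _ in modules)
    if database is None:
        # Default stated in the text: the size of the code area.
        database = total_code
    addresses = {}
    code_addr, data_addr = codebase, database
    for name, code_size, data_size in modules:
        addresses[name] = (code_addr, data_addr)
        code_addr += code_size   # code areas are juxtaposed in order
        data_addr += data_size   # data areas follow, in the same order
    return addresses


if __name__ == "__main__":
    for name, (code, data) in layout([("MAIN", 0x100, 0x20),
                                      ("LIB", 0x80, 0x10)]).items():
        print(f"{name}: code at 16_{code:X}, data at 16_{data:X}")
```

Passing an explicit `database` value in the sketch corresponds to supplying -database= on the loader command line.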
If it is sufficient for the data areas to be read-only, the above defaults are satisfactory. Normally, however, the programmer will expect these to be writeable, so a non-zero value for the base address of the data area (corresponding to writeable RAM in the address space of the hardware) will almost always need to be supplied.

As this is ROM-based code, it is assumed that the main (first) file will begin with an exception vector table. The first word of this contains the first instruction to be executed, and the loader will overwrite the second word with two halfwords, namely the sums of the sizes (in words, not bytes) of the code and data areas, respectively, of all the files being linked together.

It is recommended that the main program begins by copying the ROM image of the data areas into RAM. To do this, make the first instruction of the vector table a BSR to the start of the main program; this points R1 at the two aforementioned halfwords. It may be a good idea to make the first word of the data area an external self-reference, in order to convey the RAM area start address to the program. The alternative of having that value specified as a constant in the program involves the risk of it becoming inconsistent with the value passed as parameter to the linker.

Example:

            BSR     COPY            ; first word of vector table
            PLANT   0               ; dummy second word
            ...                     ; rest of exception vector table
    SELF    DATAEXPORT SELF         ; export start of data area
            DATAIMPORT SELF         ; and import it again
    COPY    LD.HU   T1,0(R1)        ; T1 = size of code area
            LD.HU   T2,2(R1)        ; T2 = size of data area
            SUBU    R1,R1,4         ; point R1 at start of code area
            LDA     SB,R1[T1]       ; point SB at start of ROM data area
            LD      T1,SELF         ; point T1 at start of RAM data area
    LOOP    SUBU    T2,T2,1         ; decrement word count
            LD      T3,SB[T2]       ; copy using count as index,
            ST      T3,T1[T2]       ; i.e. copy last word first
            BCND    GT0,T2,LOOP     ; go round for more until T2=0

Pre-defined operand names

The assembler knows the names of the general registers (R0-R31), of the CPU control registers (CR0-CR20), and of the FPU control registers (FCR0-FCR8, FCR62, FCR63). Bearing in mind conventions suggested in the manual, the assembler recognises alternative names for the general registers, as follows:

    P1-P8     R2-R9      Eight registers for passing PARAMETERs to procedures
    T1-T4     R13-R10    Four TEMPORARY working registers
    V1-V12    R14-R25    Twelve registers to be used as local VARIABLES
    B1-B2     R27-R26    Two local BASE registers
    PB        R28        The PROGRAM base pointer
    SB        R29        The STATIC base pointer
    FP        R30        The FRAME pointer
    SP        R31        The STACK pointer

Regarding R26-R29, the words "reserved for linker" in the manual are loosely interpreted here as "base registers for various purposes including external linkage between separately compiled modules". The use of SB has already been explained. L1 and L2 may be used by compilers to access intermediate level variables, for high-level languages which allow this (Imp and Pascal do; C, not being high-level, does not). PB is a spare "Global Data Pointer" register. The exact purpose of L1, L2, and PB is left open to be defined by the compiler designer.

The following alternative names for CPU and FPU control registers are recognised, and the registers they refer to are those stated in the manual:

    PID, PSR, EPSR, SSBR, SXIP, SNIP, SFIP, VBR, DMT0, DMD0, DMA0, DMT1,
    DMD1, DMA1, DMT2, DMD2, DMA2, SR0, SR1, SR2, SR3, FPECR, FPHS1, FPLS1,
    FPHS2, FPLS2, FPPT, FPRH, FPRL, FPIT, FPSR, FPCR.

Various constants are also pre-defined, especially for use in BB0/BB1 instructions following a comparison or for use in the BCND instruction. These constants are:

    For BCND:     GT0, LE0, EQ0, NE0, GE0, LT0
                  (refer to BCND instruction in manual).
    For BB0/BB1:  NC, CP, EQ, NE, GT, LE, LT, GE, HI, LS, LO, HS, OU, IB, IN, OB
                  (refer to CMP and FCMP instructions in manual for meanings).

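The alternative general register names can be pictured as a simple lookup table. The Python sketch below builds one from the ranges listed in the register table above; it is only an illustration, not the assembler's own symbol table. The function names are invented for the example, and reading a descending range such as T1-T4 = R13-R10 as T1=R13, T2=R12, T3=R11, T4=R10 is an assumption.

```python
def register_aliases():
    """Map the alternative names to general register numbers."""
    aliases = {}
    for i in range(8):             # P1-P8  -> R2-R9
        aliases[f"P{i + 1}"] = 2 + i
    for i in range(4):             # T1-T4  -> R13-R10 (descending, assumed)
        aliases[f"T{i + 1}"] = 13 - i
    for i in range(12):            # V1-V12 -> R14-R25
        aliases[f"V{i + 1}"] = 14 + i
    for i in range(2):             # B1-B2  -> R27-R26 (descending, assumed)
        aliases[f"B{i + 1}"] = 27 - i
    aliases.update({"PB": 28, "SB": 29, "FP": 30, "SP": 31})
    return aliases


def resolve(name):
    """Return the register number for a name such as 'R5', 't2' or 'SP'."""
    name = name.upper()            # case is not distinguished in names
    table = register_aliases()
    if name in table:
        return table[name]
    digits = name[1:]
    if name.startswith("R") and digits.isdecimal() and int(digits) <= 31:
        return int(digits)
    raise KeyError(name)


if __name__ == "__main__":
    for n in ("P1", "T1", "T4", "V12", "B1", "SB", "R31"):
        print(n, "= R%d" % resolve(n))
```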
General syntax

The assembler expects at most one statement per line. Comments may be included in the source text in one of two forms. They either begin with a semicolon (';') and extend to the end of the line, or they begin and end with matching curly brackets, in which case they may extend over several lines.

Statements consist of a label, an instruction, and parameters. The label and instruction are optional, and the number of parameters required depends on the instruction, if present. A label, if present, must start at the beginning of the line, and may be followed by a colon (':'). Instructions may not commence at the beginning of a line, i.e. if they are not labelled, the line must begin with at least one space.

An "instruction" can take one of two forms. It is either a real instruction (i.e. an 88100 opcode) or an assembler directive. If a real instruction is labelled, this has the effect of defining that label to be the address of that instruction. This effect does not pertain to labels attached to directives. Each directive defines its own effect on its label; indeed some directives require a label, while others must not have one. A label without an instruction has the same effect as a label with a real instruction, i.e. it defines the label as being the value of the current code location counter, which is the address of the next instruction.

Names (of instructions, labels, and operands) consist of any number of (well, at most 255) letters, digits, underline, or dollar symbols, but must begin with a letter. Upper and lower case letters are not distinguished. The special name * may be used to refer to the address of the current instruction.

Literal constants consist of signed decimal numbers or quoted ASCII constants (using single quote marks, e.g. 'A', '?', or '''' (note special form of quoted quote)).
Non-decimal numbers are specified using the form RADIX_NUMBER; any radix may be used, but the most useful will be hexadecimal, octal, or binary (e.g. 16_FF80, 8_507, 2_10101). An alternative notation is also accepted (e.g. 0xff80, 0o507, 0b10101).

Single-precision floating-point constants are recognised, and these take the form <integer-part>.<fraction-part><exponent-part>. Note that as the integer-part is mandatory, .1 must be expressed as 0.1. The optional exponent part consists of '@' or 'E' or 'e' followed by a signed integer.

Constant expressions may be formed by stringing together literal or predefined integer constants, using the usual arithmetic operators. Note, however, that the assembler does not observe the usual rules of operator precedence. It evaluates expressions strictly right-to-left, but brackets may be used to over-ride this. The operators recognised are:

    +, -, *, /, %, \, &, !, !!, <<, >>

(% means remainder, \ means exponentiation, ! means or, !! means exclusive or). All these operate on integer expressions only. Although the assembler accepts floating-point numbers in this context, they will be treated as if the IEEE pattern were an integer, and so the effect may or may not be what the programmer expects. "1.5+1", for example, is pretty meaningless, but "1.5&16_ffff" and "1.5>>16" may be used to load (or to OR) the two halves of the number 1.5 into a register.

Syntax of memory operands

The memory referencing instructions (XMEM, LD, ST, and LDA) require one of their parameters to be a register (the one that will be loaded from or stored to or have its contents exchanged with memory); the other parameter identifies the memory operand. The manual shows three notations for this, depending on the addressing mode.
For scaled indexing the notation places the index register, enclosed in square brackets, after the base register (as in ST T1,FP[T2]), while for unscaled and immediate indexing the convention is to place the index, whether it is a constant or a register, after the base register, with a comma in between (as in LD T1,FP,T2 or ST T2,SP,8). While this assembler accepts this comma-form, it also allows the form in which the index is placed before the base register, the latter being enclosed in round brackets (as in LD T1,T2(FP) or ST T1,8(SP)).

Furthermore, it is possible to define a name as being a complete memory reference, using the DEF, DS, or VAR directives. For example, following "FRED DEF R5[R6]" and "BERT DEF 4(FP)", one could write "LD T1,FRED" and "ST T1,BERT", which would mean the same as "LD T1,R5[R6]" and "ST T1,FP,4".

A few notes regarding some instructions

All the instructions mentioned in the manual are recognised. In addition, alternative mnemonics ASR and LSR are accepted for EXT and EXTU, respectively, and alternatives ASL and LSL are accepted for MAK. Remember that ROT means rotate RIGHT.

Contrary to the recommendation in the manual, the assembler does not convert LDA to ADDU in circumstances where this would be preferable; this responsibility is left to the programmer.

Assembler Directives

BEGIN and END may be used to restrict the scope of labels and names, so that these may be re-used. BEGIN/END may be nested to any depth.

INCLUDE may be used to incorporate the contents of another file into the source text. The parameter should be a filename enclosed in quotes.

DEF is used to define a name to be either a constant or an operand reference. A label is mandatory. Examples: "LIMIT DEF 127" or "SP DEF R31" or "I DEF 8(FP)". EQU is accepted as an alternative to DEF. A warning is given if an attempt is made to define a name which already exists. REDEF is provided for occasions on which it is desired to do this.

DC (with optional size suffix .B, .H, .W, .F, or .D) stands for DEFINE CONSTANT, and is used to plant a value (a byte, halfword, or word (the default if no suffix is given), or a single or double precision floating-point number) into the data area. If a label is present, that name is defined as a memory reference relative to base register SB, at the current data offset, which is first aligned to the relevant multiple of 2, 4, or 8 if necessary. Then the value which follows as a constant expression is added to the data area, and the offset is incremented by its size. A list of values separated by commas may be given. If, in the case of DC.B, it is desired to plant a text string, a list of single characters enclosed in quotes may be abbreviated (e.g. 'a','b','c' may be written as 'abc'). If double quotes are used instead, the string length byte is planted first (e.g. "abc" is equivalent to 3,'a','b','c').

DS (with optional suffix .B, .H, .W, .F, or .D) stands for DEFINE STORAGE, and is used to reserve space in the data area without planting initial values. If a label is given, it is defined, as with DC, as a memory reference into the data area. The optional parameter is a constant expression denoting the number of slots (of the size determined by the suffix) to reserve, i.e. the amount by which to advance the data offset. A value of 1 is assumed in the absence of a parameter.

The BASE directive is used to establish the base register and initial offset for subsequent use of the VAR directive, below.

The VAR directive is similar to the DS directive in that it reserves space and defines the label, if any, as being a memory reference at a certain offset (which is incremented appropriately) from some base register.
Example:

            BASE    8(FP)   ; establish current base as 8(FP)
    I       VAR             ; define I as 8(FP), increase offset by 4
    J       VAR     3       ; define J as 12(FP), inc by 12
    K       VAR.H           ; define K as 24(FP), inc by 2
    L       VAR.H           ; define L as 26(FP), inc by 2

The following four directives are concerned with external linkage between modules. The cases numbered 1-4 are discussed below.

    label   CODEIMPORT "name"       ; case 1
    label   CODEEXPORT "name"       ; case 2
    label   DATAIMPORT "name"       ; case 3
    label   DATAEXPORT "name"       ; case 4

Cases 2 and 4 define "name" as an object defined in (i.e. to be exported from) the module being assembled. Case 2 exports an external procedure from the code area, at the current location counter value; case 4 exports an external variable from the data area, at the current data offset. Labels, if present (and they usually will be), are defined in the normal way as a code label (case 2) or a data reference as with the DC/DS directives (case 4).

Cases 1 and 3 assert that "name" is an object defined in (exported from) some other module (to be imported into this one). In both cases the labels (usually present) are defined as a data reference, and in both cases the effect is to add a word (or two) to the data area. In case 3 one word, the address of the external object, is added. In case 1 two words are added, firstly the address of the entry point of the external procedure, and secondly the address of the beginning of the data area belonging to the module containing that procedure. This is needed because the external procedure, once called, may not only need to access static variables of its own, but may indeed need to call further external procedures, the addresses of which will be stored somewhere in its data area. We assume that the called procedure has no way of working out for itself the address of its data area, and therefore it has to be communicated in this way by the calling procedure.
It is the responsibility of the loader to fill in the correct addresses, and the object file contains all the information needed to do this.

A typical call sequence to an external procedure might look like this:

    FRED    CODEIMPORT "BERT"
            {evaluate and pass parameters as needed}
            LD      R1,FRED         ; pick up addr of entry point
            JSR.N   R1              ; call procedure, but first
            LD      T1,FRED+4       ; establish new data pointer

PLANT may be used to generate a value in the instruction stream that cannot otherwise be generated in the normal way. It is followed by a constant expression, which evaluates to a full word. Any label is treated as a normal code label.

Layout of object code file

The object file consists of a two-part header followed by several information sections. The first part of the header contains simple items of information, while the second is a list of sizes of the information sections which follow. The layout of the header is somewhat future-proof in that extra fields may be added at a later date without disturbing the meaning of those which are already defined. This is achieved by incorporating information about the size of the header parts into the first word of the header. Throughout the object file, multi-byte values are presented in big-endian order, i.e. most significant byte first.

At present only two words are defined for the first part of the header. The first word contains two 8-bit bytes giving the sizes (in words) of the two header parts, and a 16-bit magic number, the purpose of which is to identify the file as an 88100 object module; its value is 88100 modulo 65536. As the first and second parts of the header are currently defined to contain two and six words, respectively, the value of the first word of the object file is therefore 16_02065824. The second word is the size of this module's static data area (needed by the loader when it allocates space).

The second part of the header contains the sizes of the six sections which follow.
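The arithmetic behind the first header word can be checked with a few lines of Python. This is only a worked example of the description above; the names `MAGIC`, `header_first_word`, and `parse_first_word` are invented for the illustration and are not part of the assembler or loader.

```python
import struct

MAGIC = 88100 % 65536   # the 16-bit magic number, i.e. 16_5824


def header_first_word(part1_words=2, part2_words=6):
    """Combine the two header-part sizes (in words) with the magic number."""
    return (part1_words << 24) | (part2_words << 16) | MAGIC


def parse_first_word(raw):
    """Split the first four bytes of an object file into its three fields."""
    # ">BBH" = big-endian: two unsigned bytes, then one unsigned halfword
    part1_words, part2_words, magic = struct.unpack(">BBH", raw[:4])
    if magic != MAGIC:
        raise ValueError("not an 88100 object module")
    return part1_words, part2_words


if __name__ == "__main__":
    word = header_first_word()
    print("16_%08X" % word)
    print(parse_first_word(struct.pack(">I", word)))
```

Because the header part sizes are read from the file rather than assumed, a reader written this way would keep working if further header words were added later, which is the future-proofing the text describes.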
The first four sections contain, respectively, information about code imports, code exports, data imports, data exports. The fifth section contains the module's code, and the sixth contains information for initialising its static data area.

The four external linkage sections, not all of which need be present, all have the same layout. For each external name in the section, there appears an offset word (even though data area offsets are limited to two bytes), followed by the name itself (prefixed by a length byte and padded out at the end so that the length byte plus the characters of the name plus the padding occupy a multiple of four bytes). The offset defines where the object being exported is, or where in the data area the loader should fill in the address information of imported objects. The list is terminated by a dummy entry in which the name string is null.

The code section simply consists of the code to be loaded.

The data initialisation section consists of a number of segments (this saves object file space when there are significantly large uninitialised portions in the data area). Each segment begins with a size word (number of bytes), which is followed by an offset word, which is followed by the data. The segment is padded out to a multiple of four bytes, and there is a dummy null segment at the end (consisting just of one zero word).

RWT December 1989, revised 20/12/90 and 25/04/91