Edinburgh University Computer Science Department

Intel i860 Assembler User Notes

This document relates to a locally written assembler for the Intel i860 processor. It describes the input language accepted, as well as the organisation of the object file produced.

The assembler is designed to be used in conjunction with high-level language compilers which have not yet been written. In particular, this means that there is support for external linkage, i.e. cross-references (to both code and data objects) between separately compiled or assembled modules. It also means that each separately compiled module is likely to consist of two areas, one containing code, the other containing static data and external linkage information. It is assumed that at run time the data area will be pointed at by a base register. One of the "local variable" registers (R4) has been commandeered for this purpose, and is called DP, for Data Pointer.

To supplement the assembler, a linking loader is available, which takes several separately assembled (or compiled) object files and combines them into a single image suitable for loading into EPROM. This is also described.

The user (assembly language programmer) is expected to be familiar with the architecture and instruction set of the processor, and/or to have access to Intel's "i860 64-bit Microprocessor Programmer's Reference Manual".

This is a one-pass assembler. Therefore the listing file which it produces while processing the source input cannot show the actual branch displacements for forward references. For this reason the generated binary code is omitted from the listing altogether, which as a result contains only line numbers, address offsets, source text, and error messages. The generated code is not output on the fly, but buffered internally so that forward references can be fixed up as soon as the relevant labels are reached.
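The buffering and fix-up just described is the usual backpatching technique of one-pass assemblers. The following Python sketch illustrates the idea; it is not the assembler's actual implementation. The class and method names are hypothetical, the opcode bits are placeholders, and the branch displacement convention (a signed 26-bit word offset counted from the instruction after the branch) is an assumption made for illustration only.

```python
# A minimal sketch of one-pass forward-reference fix-up ("backpatching").
# Assumed (not taken from the assembler): branches hold a signed 26-bit word
# displacement measured from the instruction after the branch, and the
# opcode bits passed in are arbitrary placeholders.

DISP_MASK = (1 << 26) - 1


class CodeBuffer:
    def __init__(self):
        self.words = []    # buffered instruction words, not yet written out
        self.labels = {}   # label name -> byte address
        self.pending = {}  # label name -> indices of words awaiting a displacement

    def location(self):
        """Current code location counter, in bytes."""
        return 4 * len(self.words)

    def emit(self, word):
        self.words.append(word)

    def emit_branch(self, opcode_bits, target):
        """Emit a branch; if the target is still unknown, remember to patch it."""
        index = len(self.words)
        if target in self.labels:   # backward reference: resolve immediately
            self.words.append(opcode_bits | self._displacement(index, self.labels[target]))
        else:                       # forward reference: leave the field empty for now
            self.words.append(opcode_bits)
            self.pending.setdefault(target, []).append(index)

    def define_label(self, name):
        """Define a label here and patch every earlier forward reference to it."""
        addr = self.location()
        self.labels[name] = addr
        for index in self.pending.pop(name, []):
            self.words[index] |= self._displacement(index, addr)

    @staticmethod
    def _displacement(index, target_addr):
        next_addr = 4 * (index + 1)
        return ((target_addr - next_addr) // 4) & DISP_MASK
```

Each unresolved branch is recorded against its target label and patched in place once the label is defined. This is why a listing produced while the source is still being read cannot show the final displacement.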
As an option, an additional listing file may be generated from the buffered code. It consists of code address, code value, and disassembled code, but without the original source text or line numbers. This feature is particularly useful while the assembler is still being developed, as it may be used to check that the assembler is working correctly.

Using the assembler

The assembler is available on the APMs, in directory ASSEM on filestores B and C. The command to invoke it is:

    ASSEM:ASS860 file {-{NO}OBJ} {-{NO}LIS} {-{NO}DIS}

The only required parameter is the name of the file to be assembled. A file name extension of ".860" is assumed, and the output will go to three files with extensions ".OBJ", ".LIS", and ".DIS" (object file, listing, and disassembly listing, respectively). The options -OBJ, -NOOBJ, -LIS, -NOLIS, -DIS, and -NODIS (which may be abbreviated to -O etc.) may be used to enable or disable generation of the respective files. The default options are -OBJ-LIS-NODIS.

Using the loader

The loader will link together a number of object modules and build an image file suitable for putting into EPROM. The command is:

    ASSEM:LOAD860 infile1, infile2, ... / outfile {-codebase=??} {-database=??}

The input file names are all assumed to have extension ".OBJ"; the output file will have extension ".ROM". If "/outfile" is omitted, the output file will have the same name as the first input file (with extension ".ROM"). Specification of the code and data area base addresses is optional (they default to 0 and the size of the code area, respectively); their values are used in the computation of addresses of external objects. The "??" values are specified in hexadecimal.

The layout of the output image is simply the juxtaposition of the code areas of all the input files (in the order in which they were specified in the command), followed by the similarly juxtaposed data areas, in the same order. The data areas are initialised as specified in the respective object files, with all links required by import/export directives filled in appropriately.
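As a rough illustration of the layout rules above, the following Python sketch assigns each module its code and data base addresses by juxtaposing the areas in command-line order. It also shows the two words a CODEIMPORT slot would receive. The function names and the (name, code size, data size) tuples are hypothetical. Sizes are assumed to be in bytes with no extra alignment between areas, and the default for the data base follows the notes literally (the size of the code area).

```python
# A minimal sketch of the loader's address assignment, under the assumptions
# stated above. Not the loader's actual data structures.

def assign_bases(modules, codebase=0, database=None):
    """modules: list of (name, code_size, data_size) in command-line order.

    Returns {module name: (code base address, data base address)}.
    """
    total_code = sum(code_size for _, code_size, _ in modules)
    if database is None:
        database = total_code  # default stated in the notes: size of the code area
    bases = {}
    code_addr, data_addr = codebase, database
    for name, code_size, data_size in modules:
        bases[name] = (code_addr, data_addr)
        code_addr += code_size   # code areas are juxtaposed...
        data_addr += data_size   # ...and so are the data areas, in the same order
    return bases


def code_import_words(bases, exporter, entry_offset):
    """The two words filled in for a CODEIMPORT: entry point, then the exporter's data base."""
    code_base, data_base = bases[exporter]
    return (code_base + entry_offset, data_base)
```

Take two modules: MAIN, with 16_400 bytes of code and 16_100 of data, and LIB, with 16_200 and 16_40. The defaults give LIB a code base of 16_400 and a data base of 16_700.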
If it is sufficient for the data areas to be read-only, the above defaults are satisfactory. Normally, however, the programmer will expect these areas to be writeable. A non-zero base address for the data area (corresponding to writeable RAM in the address space of the hardware) will therefore almost always need to be supplied.

Since the code is ROM-based, it is assumed that the main (first) file will begin with an exception vector table. The first word of this contains the first instruction to be executed. The loader will overwrite the second word with two halfwords, namely the sums of the sizes (in words, not bytes) of the code and data areas, respectively, of all the files being linked together. It is recommended that the main program begin by copying the ROM image of the data areas into RAM. To do this, ....

Example: ... ; not in yet

Pre-defined operand names

The assembler knows the names of the core registers (R0-R31), of the control registers (FIR, PSR, DIRBASE, DB, FSR, EPSR), and of the FPU registers (F0-F31). Following conventions suggested in the manual, the assembler also recognises alternative names for R2-R31 and for the even-numbered registers F2-F30, as follows:

    SP        R2         The STACK pointer
    FP        R3         The FRAME pointer
    DP        R4         The static DATA pointer
    GP        R5         The GLOBAL data pointer
    V1-V10    R15-R6     Ten registers to be used as local VARIABLES
    P1-P12    R16-R27    Twelve registers for passing PARAMETERS to procedures
    T1-T4     R28-R31    Four TEMPORARY working registers
    FV1-FV7   F2-F14     Seven double floating-point local variables
    FP1-FP6   F16-F26    Six double floating-point parameters
    FT1-FT2   F28-F30    Two double floating-point temporaries

The use of DP has already been explained. GP may be used by compiler and/or system designers as a "Global Data Pointer" to access data which is common to all modules making up the program running in a process.
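The alias table above can be expressed as a small lookup. The following Python sketch builds it and folds case, since upper and lower case letters are not distinguished; the helper names are hypothetical. The V registers run downwards: V1 is R15 and V10 is R6. The floating-point aliases name only even-numbered registers, as double-precision values occupy register pairs.

```python
# A minimal sketch of the alternative register names listed above.
# build_aliases/resolve_register are hypothetical helper names.

def build_aliases():
    aliases = {"SP": "R2", "FP": "R3", "DP": "R4", "GP": "R5"}
    for i in range(1, 11):   # V1..V10 -> R15..R6 (descending)
        aliases[f"V{i}"] = f"R{16 - i}"
    for i in range(1, 13):   # P1..P12 -> R16..R27
        aliases[f"P{i}"] = f"R{15 + i}"
    for i in range(1, 5):    # T1..T4 -> R28..R31
        aliases[f"T{i}"] = f"R{27 + i}"
    for i in range(1, 8):    # FV1..FV7 -> F2..F14 (even registers only)
        aliases[f"FV{i}"] = f"F{2 * i}"
    for i in range(1, 7):    # FP1..FP6 -> F16..F26
        aliases[f"FP{i}"] = f"F{14 + 2 * i}"
    for i in range(1, 3):    # FT1..FT2 -> F28..F30
        aliases[f"FT{i}"] = f"F{26 + 2 * i}"
    return aliases


ALIASES = build_aliases()


def resolve_register(name):
    """Map a register name or alias (any case) to its underlying register."""
    upper = name.upper()
    return ALIASES.get(upper, upper)
```

Note that FP (the frame pointer, R3) and FP1-FP6 (floating-point parameters) are distinct names, so the lookup must match whole names rather than prefixes.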
High-level language compilers may also choose to allocate some of the "local" registers (perhaps V9 and V10) as base registers for intermediate lexical levels, for languages which allow procedures to be nested (Imp and Pascal do; C, not being high-level, does not).

General syntax

The assembler expects at most one statement per line. Comments may be included in the source text in one of two forms. Either they begin with a semicolon (';') and extend to the end of the line, or they begin and end with matching curly brackets, in which case they may extend over several lines.

Statements consist of a label, an instruction, and parameters. The label and instruction are optional, and the number of parameters required depends on the instruction, if present. A label, if present, must start at the beginning of the line, and may be followed by a colon (':'). Instructions may not start at the beginning of a line, i.e. if they are not labelled, the line must begin with at least one space.

An "instruction" can take one of two forms. It is either a real instruction (i.e. an i860 opcode) or an assembler directive. If a real instruction is labelled, this has the effect of defining that label to be the address of that instruction. This does not apply to labels attached to directives: each directive defines its own effect on its label, and indeed some directives require a label, while others must not have one. A label without an instruction has the same effect as a label with a real instruction, i.e. it defines the label as being the value of the current code location counter, which is the address of the next instruction.

Names (of instructions, labels, and operands) consist of any number (well, at most 255) of letters, digits, underline, or dollar symbols, but must begin with a letter. Upper and lower case letters are not distinguished. The special name * may be used to refer to the address of the current instruction.
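The statement layout rules above can be sketched as a line splitter. This Python sketch is an illustration under simplifying assumptions, not the assembler's parser. The names are hypothetical, and only ';' comments are removed: curly-bracket comments spanning lines are not handled, and a ';' inside a quoted constant would be mistaken for a comment.

```python
import re

# A name: a letter, then letters, digits, underline or dollar; 255 characters at most.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_$]{0,254}")


def is_name(text):
    return NAME_RE.fullmatch(text) is not None


def split_statement(line):
    """Return (label, instruction, operands) for one source line.

    Simplification: only ';' comments are stripped, and a ';' inside a
    quoted constant is not recognised as such.
    """
    text = line.split(";", 1)[0].rstrip()
    label = None
    if text and not text[0].isspace():      # a label must start in column 1
        m = NAME_RE.match(text)
        if m is None:
            raise ValueError(f"bad label in {line!r}")
        label = m.group().upper()           # case is not significant
        text = text[m.end():]
        if text.startswith(":"):            # optional colon after the label
            text = text[1:]
    parts = text.split(None, 1)
    instruction = parts[0].upper() if parts else None
    operands = parts[1].strip() if len(parts) > 1 else ""
    return label, instruction, operands
```

For example, "loop: addu v1,dp,r0" yields the label LOOP and the instruction ADDU. A line consisting only of "done" defines the label DONE with no instruction, which, as stated above, labels the next instruction.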
Literal constants consist of signed decimal numbers or quoted ASCII constants (using single quote marks, e.g. 'A', '?', or '''' (note the special form of the quoted quote)). Non-decimal numbers are specified using the form RADIX_NUMBER; any radix may be used, but the most useful will be hexadecimal, octal, or binary (e.g. 16_FF80, 8_507, 2_10101). An alternative notation is also accepted (e.g. 0xff80, 0o507, 0b10101). Single-precision floating-point constants are recognised, and these take the form . Note that as the integer-part is mandatory, .1 must be expressed as 0.1. The optional exponent part consists of '' or 'E' or 'e' followed by a signed integer.

Constant expressions may be formed by stringing together literal or predefined integer constants, using the usual arithmetic operators. Note, however, that the assembler does not observe the usual rules of operator precedence. It evaluates expressions strictly right-to-left, but brackets may be used to override this; for example, 2*3+4 evaluates to 14, whereas (2*3)+4 evaluates to 10. The operators recognised are: +, -, *, /, %, \, &, !, !!, <<, >> (% means remainder, \ means exponentiation, ! means or, !! means exclusive or). All these operate on integer expressions only. Although the assembler accepts floating-point numbers in this context, their IEEE bit patterns will be treated as integers, so the effect may or may not be what the programmer expects. "1.5+1", for example, is pretty meaningless, but "1.5&16_ffff" and "1.5>>16" may be used to load (or to OR) the two halves of the number 1.5 into a register.

Syntax of memory operands

The memory referencing instructions (LD, ST, FLD, FST, etc.) require one of their parameters to be a register (the one that will be loaded from or stored to memory); the other parameter identifies the memory operand. The manual defines the notation for this to be a signed immediate offset or an index register placed before a base register which is enclosed in round brackets (as in LD.L 8(FP),P1 or ST.B P2,P3(P4)).
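The constant expression rules and the memory operand notation can be sketched together in Python. This is an illustrative evaluator, not the assembler's own, and it rests on several assumptions:

- The function names are hypothetical.
- Floating-point literals are assumed to need digits on both sides of the point, with an optional E or e exponent, because the full form is not given above.
- Division and remainder are assumed to truncate towards zero.
- Whether the part before the brackets is an index register or an offset is decided by a caller-supplied is_register test.

```python
import re
import struct

TOKEN_RE = re.compile(r"""\s*(?:
      (?P<float>\d+\.\d+(?:[Ee][+-]?\d+)?)   # assumed float form
    | (?P<radix>\d+_[0-9A-Za-z]+)            # RADIX_NUMBER, e.g. 16_FF80
    | (?P<prefixed>0[xXoObB][0-9A-Fa-f]+)    # 0xff80, 0o507, 0b10101
    | (?P<dec>\d+)
    | (?P<char>'(?:''|[^'])')                # 'A' or '''' (quoted quote)
    | (?P<op>!!|<<|>>|[-+*/%\\&!()])
    )""", re.VERBOSE)


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"bad token at {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _div(a, b):  # assumed: truncate towards zero
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


BINARY = {
    "+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b,
    "/": _div, "%": lambda a, b: a - b * _div(a, b), "\\": lambda a, b: a ** b,
    "&": lambda a, b: a & b, "!": lambda a, b: a | b, "!!": lambda a, b: a ^ b,
    "<<": lambda a, b: a << b, ">>": lambda a, b: a >> b,
}


def evaluate(text):
    """Evaluate a constant expression strictly right-to-left."""
    tokens = tokenize(text) + [(None, None)]  # sentinel marks the end
    pos = 0

    def next_token():
        nonlocal pos
        if tokens[pos][0] is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def operand():
        kind, value = next_token()
        if kind == "op" and value in ("+", "-"):  # sign of an operand
            inner = operand()
            return -inner if value == "-" else inner
        if kind == "op" and value == "(":
            result = expression()
            if next_token() != ("op", ")"):
                raise ValueError("missing )")
            return result
        if kind == "dec":
            return int(value)
        if kind == "radix":
            radix, digits = value.split("_", 1)
            return int(digits, int(radix))
        if kind == "prefixed":
            return int(value, 0)
        if kind == "char":
            return ord("'") if value == "''''" else ord(value[1])
        if kind == "float":  # the IEEE single-precision bit pattern, as an integer
            return struct.unpack(">I", struct.pack(">f", float(value)))[0]
        raise ValueError(f"unexpected {value!r}")

    def expression():
        nonlocal pos
        left = operand()
        kind, value = tokens[pos]
        if kind == "op" and value in BINARY:
            pos += 1
            return BINARY[value](left, expression())  # a op (rest): right-to-left
        return left

    result = expression()
    if tokens[pos][0] is not None:
        raise ValueError("trailing text in expression")
    return result


MEM_RE = re.compile(r"(.*)\(\s*([A-Za-z][A-Za-z0-9_$]*)\s*\)")


def parse_memory_operand(text, is_register):
    """Split 'offset(base)' or 'index(base)'; an offset is a constant expression."""
    m = MEM_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a memory operand: {text!r}")
    prefix, base = m.group(1).strip(), m.group(2).upper()
    if is_register(prefix):
        return ("index", prefix.upper(), base)
    return ("offset", evaluate(prefix), base)
```

Every binary operator takes the whole of the rest of the expression as its right operand. As a result, 10-4-3 evaluates to 9 rather than 3, and brackets are needed to get the conventional result.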
This is the notation accepted by this assembler, but in addition a memory operand may take the form of a single name if it has been predefined to be such an operand using the DEF, DS, or VAR directives. For example, following "FRED DEF 8(FP)" and "BERT DEF P3(P4)", one could write "LD.L FRED,P1" and "ST.B P2,BERT", which would mean the same as the above examples.

A few notes regarding some instructions

All the instructions mentioned in the manual are recognised. In addition, alternative mnemonics ....

Assembler Directives

BEGIN and END may be used to restrict the scope of labels and names, so that these may be re-used. BEGIN/END may be nested to any depth.

INCLUDE may be used to incorporate the contents of another file into the source text. The parameter should be a filename enclosed in quotes.

DEF is used to define a name to be either a constant or an operand reference. A label is mandatory. Examples: "LIMIT DEF 127" or "SP DEF R31" or "I DEF 8(FP)". EQU is accepted as an alternative to DEF. A warning is given if an attempt is made to define a name which already exists; REDEF is provided for occasions on which this is actually intended.

DC (with optional size suffix .B, .H, .W, .F, or .D) stands for DEFINE CONSTANT, and is used to plant a value into the data area: a byte, halfword, or word (the default if no suffix is given), or a single- or double-precision floating-point number. If a label is present, that name is defined as a memory reference relative to base register DP, at the current data offset, which is first aligned to the relevant multiple of 2, 4, or 8 if necessary. Then the value which follows, given as a constant expression, is added to the data area, and the offset is incremented by its size. A list of values separated by commas may be given. If, in the case of DC.B, it is desired to plant a text string, a list of single characters enclosed in quotes may be abbreviated (e.g. 'a','b','c' may be written as 'abc').
If double quotes are used instead, the string length byte is planted first (e.g. "abc" is equivalent to 3,'a','b','c').

DS (with optional suffix .B, .H, .W, .F, or .D) stands for DEFINE STORAGE, and is used to reserve space in the data area without planting initial values. If a label is given, it is defined, as with DC, as a memory reference into the data area. The optional parameter is a constant expression denoting the number of slots (of the size determined by the suffix) to reserve, i.e. the amount by which to advance the data offset. A value of 1 is assumed in the absence of a parameter.

The BASE directive is used to establish the base register and initial offset for subsequent use of the VAR directive, below.

The VAR directive is similar to the DS directive in that it reserves space and defines the label, if any, as being a memory reference at a certain offset (which is incremented appropriately) from some base register. Example:

         BASE   8(FP)     ; establish current base as 8(FP)
I        VAR              ; define I as 8(FP), increase offset by 4
J        VAR    3         ; define J as 12(FP), increase offset by 12
K        VAR.H            ; define K as 24(FP), increase offset by 2
L        VAR.H            ; define L as 26(FP), increase offset by 2

The following four directives are concerned with external linkage between modules. The cases numbered 1-4 are discussed below.

label    CODEIMPORT "name"    ; case 1
label    CODEEXPORT "name"    ; case 2
label    DATAIMPORT "name"    ; case 3
label    DATAEXPORT "name"    ; case 4

Cases 2 and 4 define "name" as an object defined in (i.e. to be exported from) the module being assembled. Case 2 exports an external procedure from the code area, at the current location counter value; case 4 exports an external variable from the data area, at the current data offset. Labels, if present (and they usually will be), are defined in the normal way, as a code label (case 2) or as a data reference, as with the DC/DS directives (case 4).

Cases 1 and 3 assert that "name" is an object defined in (exported from) some other module (to be imported into this one).
In both cases the labels (usually present) are defined as data references, and in both cases the effect is to add a word (or two) to the data area. In case 3 one word, the address of the external object, is added. In case 1 two words are added: firstly the address of the entry point of the external procedure, and secondly the address of the beginning of the data area belonging to the module containing that procedure. The second word is needed because the external procedure, once called, may not only need to access static variables of its own, but may indeed need to call further external procedures, the addresses of which will be stored somewhere in its data area. We assume that the called procedure has no way of working out for itself the address of its data area, and therefore this has to be communicated to it by the calling procedure. It is the responsibility of the loader to fill in the correct addresses, and the object file contains all the information needed to do this. A typical call sequence to an external procedure might look like this:

FRED     CODEIMPORT "BERT"
*** adapt ***
         {evaluate and pass parameters as needed}
         ADDU   V1,DP,R0      ; save our own data area address
         LD     R1,FRED       ; pick up entry point address
         JSR.N  R1            ; call procedure, but first
         LD     DP,FRED+4     ; establish new data pointer
         ADDU   DP,V1,R0      ; restore own data pointer

PLANT may be used to generate a value in the instruction stream that cannot otherwise be generated in the normal way. It is followed by a constant expression, which evaluates to a full word. Any label is treated as a normal code label.

The ORG directive may be used to advance the code location counter. The parameter is the new value of the location counter. Since it is intended that code should be position independent, and as there is no object file directive to load code at a particular address, there should rarely, if ever, be any need to use this directive. Code is always generated starting at notional address zero.
If the programmer knows that the code is going to be loaded starting at, say, 16_800, then beginning the source text with "ORG 16_800" will have the undesired effect of prefixing the generated code with 2048 bytes of rubbish.

The directives DUAL and ENDDUAL may be used to force setting of the D bits in floating-point instructions. Within a DUAL...ENDDUAL section, every floating-point instruction will be treated as if it had been supplied with the 'D.' prefix. In addition, the DUAL directive aligns the location counter to an even instruction boundary, by planting a NOP if the location counter is not divisible by 8.

Layout of object code file

The object file consists of a two-part header followed by several information sections. The first part of the header contains simple items of information, while the second is a list of the sizes of the information sections which follow. The layout of the header is somewhat future-proof in that extra fields may be added at a later date without disturbing the meaning of those which are already defined. This is achieved by incorporating information about the size of the header parts into the first word of the header. Throughout the object file, multi-byte values are presented in little-endian order, i.e. least significant byte first.

At present only two words are defined for the first part of the header. The first word contains two 8-bit bytes giving the sizes (in words) of the two header parts, and a 16-bit magic number which identifies the file as an i860 object module; its value is 860 (16_035C). As the first and second parts of the header are currently defined to contain two and six words, respectively, the value of the first word of the object file is therefore 16_0206035C. The second word is the size of this module's static data area (needed by the loader when it allocates space). The second part of the header contains the sizes of the six sections which follow.
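The first header word can be checked with a little arithmetic. 860 is 16_035C, so with the part sizes 2 and 6 in the two most significant bytes the word is 16_0206035C, stored in the file as the bytes 5C 03 06 02.

The following Python sketch packs and unpacks the header on that basis. The byte positions are inferred from the description, and the function names are hypothetical. The section sizes are treated as opaque numbers, since their units are not specified here. Reading the part sizes from the first word is what lets a reader skip header fields it does not know about.

```python
import struct

MAGIC = 860  # 16_035C


def header_first_word(part1_words=2, part2_words=6):
    # Inferred packing: part-1 size in the top byte, part-2 size in the next
    # byte, magic number in the low halfword.
    return (part1_words << 24) | (part2_words << 16) | MAGIC


def pack_header(data_size, section_sizes):
    """Build both header parts: first word, data area size, then section sizes."""
    words = [header_first_word(2, len(section_sizes)), data_size, *section_sizes]
    return struct.pack("<%dI" % len(words), *words)  # little-endian throughout


def unpack_header(blob):
    (first,) = struct.unpack_from("<I", blob, 0)
    if first & 0xFFFF != MAGIC:
        raise ValueError("not an i860 object module")
    part1, part2 = first >> 24, (first >> 16) & 0xFF
    words = struct.unpack_from("<%dI" % (part1 + part2), blob, 0)
    # Any extra first-part words beyond the two defined ones are skipped.
    return {"data_size": words[1], "section_sizes": list(words[part1:part1 + part2])}
```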
The first four sections contain, respectively, information about code imports, code exports, data imports, and data exports. The fifth section contains the module's code, and the sixth contains information for initialising its static data area.

The four external linkage sections, not all of which need be present, all have the same layout. For each external name in the section, there appears an offset word (even though data area offsets are limited to two bytes), followed by the name itself. The name is prefixed by a length byte and padded out at the end, so that the length byte, the characters of the name, and the padding together occupy a multiple of four bytes. The offset defines where the object being exported is, or where in the data area the loader should fill in the address information of imported objects. The list is terminated by a dummy entry in which the name string is null.

The code section simply consists of the code to be loaded.

The data initialisation section consists of a number of segments (this saves object file space when there are significantly large uninitialised portions in the data area). Each segment begins with a size word (number of bytes), which is followed by an offset word, which is followed by the data. The segment is padded out to a multiple of four bytes, and there is a dummy null segment at the end, consisting of just one zero word.

Document dated 15/03/90 - Rainer W Thonnes
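The encodings of the external linkage sections and the data initialisation section can be sketched in Python as follows. This is an illustration, not the assembler's code. The function names are hypothetical. The size word of a data segment is assumed to count only the data bytes, and the dummy terminating linkage entry is assumed to have an offset of zero.

```python
import struct


def pad4(blob):
    """Pad with zero bytes to a multiple of four bytes."""
    return blob + b"\0" * (-len(blob) % 4)


def linkage_section(entries):
    """entries: list of (offset, name). Returns the encoded section bytes."""
    out = bytearray()
    for offset, name in list(entries) + [(0, "")]:  # dummy null-name terminator
        raw = name.encode("ascii")
        if len(raw) > 255:
            raise ValueError("name too long for a length byte")
        out += struct.pack("<I", offset)             # offset word, little-endian
        out += pad4(bytes([len(raw)]) + raw)         # length byte + name + padding
    return bytes(out)


def data_init_section(segments):
    """segments: list of (offset, data bytes) describing initialised data."""
    out = bytearray()
    for offset, data in segments:
        # size word (assumed: data bytes only), offset word, data, padding
        out += pad4(struct.pack("<II", len(data), offset) + data)
    out += struct.pack("<I", 0)                      # dummy null segment: one zero word
    return bytes(out)
```

For instance, an entry for "BERT" at offset 8 occupies twelve bytes: the offset word, then the length byte 4, the four characters and three bytes of padding.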