Edinburgh University Computer Science Department

                        Intel i860 Assembler User Notes


This document relates  to  a  locally  written  assembler  for  the  Intel  i860
processor.    It   describes  the  input  language  accepted,  as  well  as  the
organisation of the output object file produced.   The assembler is designed  to
be  used in conjunction with high-level language compilers not yet written.   In
particular this means that there is support for  external  linkage,  i.e.  cross
references  (both  to  code  and  data  objects)  between separately compiled or
assembled modules.  It also means that each separately compiled module is likely
to consist of two areas, one containing code, the other static data and external
linkage information.   It is assumed that at run time  the  data  area  will  be
pointed  at by a base register.   One of the "local variable" registers (R4) has
been commandeered for this purpose, and has been called DP for Data Pointer.

To supplement the assembler, a linking loader is available, which takes  several
separately  assembled (or compiled) object files and combines them into a single
image suitable for loading into EPROM.  This is also described.

The user (assembly language programmer) is expected  to  be  familiar  with  the
architecture  and  instruction  set  of  the processor, and/or to have access to
Intel's "i860 64-bit Microprocessor Programmer's Reference Manual".

This is a one-pass assembler.   Therefore the listing  file  which  it  produces
while  processing  the  source input cannot show the actual branch displacements
for forward references.  The generated binary code is therefore omitted from the
listing altogether, which as  a  result  contains  only  line  numbers,  address
offsets,  source text, and error messages.   The generated code is not output on
the fly, but buffered internally to allow forward references to be fixed  up  as
soon  as  the relevant labels are reached.   As an option, an additional listing
file may be generated from the buffered code, which consists  of  code  address,
code  value, and disassembled code, but without the original source text or line
numbers.   This feature is particularly useful in the development stage  of  the
assembler, as it may be used to check that it is working correctly.


Using the assembler

The  assembler  is available on the APMs, in directory ASSEM on filestores B and
C.  The command to invoke it is:

   ASSEM:ASS860 file {-{NO}OBJ} {-{NO}LIS} {-{NO}DIS}

The single parameter is the name of the file  to  be  assembled.   A  file  name
extension  of  ".860"  is  assumed,  and  the output will go to three files with
extensions ".OBJ", ".LIS", and ".DIS", respectively.   The options -OBJ, -NOOBJ,
-LIS,  -NOLIS,  -DIS, -NODIS (which may be abbreviated to -O etc) may be used to
enable or disable generation of these respective files.  The default options are
-OBJ -LIS -NODIS.


Using the loader

The loader will link together a number of object modules,  and  build  an  image
file suitable for putting into EPROM.  The command is:

   ASSEM:LOAD860  infile1, infile2, ... / outfile {-codebase=??} {-database=??}

The  input  file names are all assumed to have extension ".OBJ"; the output file
will have extension ".ROM".  If "/outfile" is omitted, the output file will have
the same name as the first input file.   Specification of the code and data area
base  addresses  is  optional  (they default to 0 and the size of the code area,
respectively); their values are used in the computation of addresses of external
objects.  The "??" values are specified in hexadecimal.

The layout of the output image is simply the juxtaposition of the code areas  of
all  the input files (in the order in which they were specified in the command),
followed by the similarly juxtaposed data areas, in the same order,  initialised
as  specified  in  the  respective  object  files,  with  all  links required by
import/export directives filled in appropriately.
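
The layout rule can be sketched as follows.   This is only an illustration of
the description above, not the loader's actual code: the function name, the
tuple layout, and the use of byte sizes are all assumptions made for the
example, and the default for the data base is taken literally as "the size of
the code area".

```python
def layout_image(modules, codebase=0, database=None):
    """Assign base addresses to each module's code and data areas.

    `modules` is a list of (name, code_size, data_size) tuples, in the order
    the files were given on the command line.  Sizes are in bytes.
    All names here are hypothetical.
    """
    total_code = sum(code_size for _, code_size, _ in modules)
    if database is None:
        # Documented default: the data base is the size of the code area.
        database = total_code

    placed = {}
    code_addr, data_addr = codebase, database
    for name, code_size, data_size in modules:
        # Code areas are juxtaposed first, data areas follow in the same order.
        placed[name] = {"code": code_addr, "data": data_addr}
        code_addr += code_size
        data_addr += data_size
    return placed


example = layout_image([("main", 0x100, 0x20), ("lib", 0x80, 0x10)])
in_ram = layout_image([("main", 0x100, 0x20), ("lib", 0x80, 0x10)],
                      database=0x8000)
```

With the defaults, the data areas start immediately after the combined code;
supplying a data base (here 16_8000, a made-up RAM address) moves only the data
addresses.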

If it is sufficient for the data areas to be read-only, the above  defaults  are
satisfactory.   Normally,  however,  the  programmer  will  expect  these  to be
writeable,  so  a  non-zero  value  for  the  base  address  of  the  data  area
(corresponding  to  writeable  RAM  in  the  address space of the hardware) will
almost always need to be supplied.  As the code is ROM-based, it is assumed that
main (first) file will begin with an exception vector table.   The first word of
this  contains  the  first  instruction  to  be  executed,  and  the loader will
overwrite the second word with two halfwords, namely the sums of the  sizes  (in
words,  not  bytes)  of  the code and data areas, respectively, of all the files
being linked together.

It is recommended that the main program begins by copying the ROM image  of  the
data areas into RAM.  To do this, ....  Example:

     ...                 ; not in yet


Pre-defined operand names

The  assembler  knows  the  names of the core registers (R0-R31), of the control
registers (FIR, PSR, DIRBASE, DB, FSR, EPSR), and of the FPU registers (F0-F31).
Bearing in mind conventions suggested in the manual,  the  assembler  recognises
alternative names for R2-R31 and the even-numbered F2-F30, as follows:

     SP        R2        The STACK pointer
     FP        R3        The FRAME pointer
     DP        R4        The static DATA pointer
     GP        R5        The GLOBAL data pointer
     V1-V10    R15-R6    Ten registers to be used as local VARIABLES
     P1-P12    R16-R27   Twelve registers for passing PARAMETERS to procedures
     T1-T4     R28-R31   Four TEMPORARY working registers
     FV1-FV7   F2-F14    Seven double floating-point local variables
     FP1-FP6   F16-F26   Six double floating-point parameters
     FT1-FT2   F28-F30   Two double floating-point temporaries

The  use  of  DP has already been explained.   GP may be used by compiler and/or
system designers as a "Global Data Pointer" to access data which  is  common  to
all  modules  making  up the program running in a process.   High-level language
compilers may also choose to allocate some (perhaps V9 and V10) of  the  "local"
registers for use as intermediate level base registers for languages which allow
this (Imp and Pascal do; C, not being high-level, does not).


General syntax

The assembler expects at most one statement per line.  Comments may be included
in  the  source  text  in one of two forms.   They either begin with a semicolon
(';') and extend to the end of the line, or they begin  and  end  with  matching
curly brackets, in which case they may extend over several lines.

Statements  consist  of a label, an instruction, and parameters.   The label and
instruction are optional, and the number of parameters required depends  on  the
instruction,  if  present.   A label, if present, must start at the beginning of
the line, and may be followed by a colon (':').   Instructions may not  commence
at  the  beginning of a line, i.e. if they are not labelled, the line must begin
with at least one space.

An "instruction" can take one of two forms.   It is either  a  real  instruction
(i.e.  an  i860  opcode)  or  an assembler directive.   If a real instruction is
labelled, this has the effect of defining that label to be the address  of  that
instruction.   This  effect  does  not pertain to labels attached to directives.
Each particular directive defines its own effect on its label; indeed, some
directives require a label, while others must not have one.   A label
without an instruction has the same effect as a label with a  real  instruction,
i.e.  it  defines  the  label  as  being  the value of the current code location
counter, which is the address of the next instruction.

Names (of instructions, labels, and operands) consist of any number of (well, at
most 255) letters, digits, underline, or dollar symbols, but must begin  with  a
letter.  Upper and lower case letters are not distinguished.  The special name *
may  be  used  to  refer  to  the  address of the current instruction.   Literal
constants consist of signed decimal numbers or  quoted  ASCII  constants  (using
single quote marks, e.g. 'A', '?', or '''' (note special form of quoted quote)).
Non-decimal  numbers are specified using the form RADIX_NUMBER; any radix may be
used, but the most useful will be hexadecimal, octal, or binary  (e.g.  16_FF80,
8_507, 2_10101).   An alternative notation is also accepted (e.g. 0xff80, 0o507,
0b10101).   Single-precision floating-point constants are recognised, and  these
take the form

     <integer-part>.<fraction-part><optional-exponent-part>

Note  that  as the integer-part is mandatory, .1 must be expressed as 0.1.   The
optional exponent part consists of ''  or  'E'  or  'e'  followed  by  a  signed
integer.
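
The integer notations described above can be sketched as a small parser.
This is not the assembler's own code; the function name is made up for the
illustration, and Python's int() only accepts radices 2 to 36, whereas the
assembler is said to accept any radix.

```python
def parse_integer_literal(text):
    """Convert one integer literal, in any of the notations above, to an int."""
    text = text.strip()
    lowered = text.lower()

    # Alternative notation: 0x (hexadecimal), 0o (octal), 0b (binary).
    for prefix, radix in (("0x", 16), ("0o", 8), ("0b", 2)):
        if lowered.startswith(prefix):
            return int(text[2:], radix)

    # RADIX_NUMBER form: the radix itself is written in decimal.
    if "_" in text:
        radix, digits = text.split("_", 1)
        return int(digits, int(radix))

    # Otherwise a (possibly signed) decimal number.
    return int(text)


values = [parse_integer_literal(s)
          for s in ("16_FF80", "0xff80", "8_507", "0o507", "2_10101", "-12")]
```

Both notations give the same value for the same digits, so 16_FF80 and 0xff80
are interchangeable.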

Constant  expressions  may be formed by stringing together literal or predefined
integer constants, using the usual arithmetic operators.   Note,  however,  that
the  assembler  does  not  observe  the usual rules of operator precedence.   It
evaluates expressions strictly  right-to-left,  but  brackets  may  be  used  to
over-ride  this.   The operators recognised are: +, -, *, /, %, \, &, !, !!, <<,
>> (% means remainder, \ means exponentiation, ! means or,  !!  means  exclusive
or).   All  these  operate on integer expressions only.   Although the assembler
accepts floating-point numbers in this context, they will be treated as  if  the
IEEE  pattern  were  integers,  and  so  the  effect  may or may not be what the
programmer  expects.  "1.5+1",  for  example,   is   pretty   meaningless,   but
"1.5&16_ffff"  and  "1.5>>16"  may be used to load (or OR) the two halves of the
number 1.5 into a register.
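
Two points in the above paragraph may be easier to see in a worked sketch.
The first function evaluates a bracket-free expression on the assumption that
"strictly right-to-left" means the rightmost operator is applied first; that
reading, and all the names used, are assumptions of this example.   The second
part shows the IEEE single-precision pattern of 1.5 and its two halves.

```python
import operator
import struct

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(tokens):
    """Evaluate [value, op, value, op, value, ...] with no precedence,
    applying the rightmost operator first."""
    acc = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        acc = OPS[tokens[i]](tokens[i - 1], acc)
    return acc


def ieee_single_bits(value):
    """Return the 32-bit IEEE single-precision pattern of `value` as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


no_precedence = eval_right_to_left([2, "*", 3, "+", 4])   # 2*(3+4)
chained_minus = eval_right_to_left([10, "-", 4, "-", 3])  # 10-(4-3)

bits = ieee_single_bits(1.5)
low_half = bits & 0xFFFF    # corresponds to "1.5&16_ffff"
high_half = bits >> 16      # corresponds to "1.5>>16"
```

Under that reading "2*3+4" gives 14 rather than 10, and "10-4-3" gives 9 rather
than 3, so brackets are advisable wherever the order matters.   The pattern of
1.5 is 16_3FC00000, so its halves are 16_3FC0 and 0.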


Syntax of memory operands

The memory referencing instructions (LD, ST, FLD, FST, etc) require one of their
parameters to be a register (the one that will  be  loaded  from  or  stored  to
memory); the other parameter identifies the memory operand.   The manual defines
the notation for this to be a signed  immediate  offset  or  an  index  register
placed  before  a  base  register  which  is  enclosed  in round brackets (as in
LD.L 8(FP),P1 or  ST.B P2,P3(P4)).   This  is  the  notation  accepted  by  this
assembler,  but  in addition a memory operand may take the form of a single name
if it has been predefined to be such an  operand  using  the  DEF,  DS,  or  VAR
directives.   For example, following "FRED DEF 8(FP)" and "BERT DEF P3(P4)", one
could write "LD.L FRED,P1" and "ST.B P2,BERT", which would mean the same  as  in
the above examples.


A few notes regarding some instructions

All  the  instructions  mentioned  in  the manual are recognised.   In addition,
alternative mnemonics ....


Assembler Directives

BEGIN and END may be used to restrict the scope of labels  and  names,  so  that
these may be re-used.  BEGIN/END may be nested to any depth.

INCLUDE  may be used to incorporate the contents of another file into the source
text.  The parameter should be a filename enclosed in quotes.

DEF is used to define a name to be either a constant or an operand reference.  A
label is mandatory.  Examples: "LIMIT DEF 127" or "SP DEF R31" or "I DEF 8(FP)".
EQU is accepted as an alternative to DEF.   A warning is given if an attempt  is
made to define a name which already exists.   REDEF is provided for occasions on
which it is desired to do this.

DC (with optional size suffix .B, .H, .W, .F, or .D) stands for DEFINE CONSTANT,
and is used to plant a value (a byte, halfword, or word, or a single or double
precision floating-point number; a word is assumed if no suffix is given) into
the data area.   If a label is present, that name is defined as a memory
reference relative to base
register  DP, at the current data offset, which is first aligned to the relevant
multiple of 2, 4, or 8 if necessary.  Then the value which follows as a constant
expression is added to the data area, and the offset is incremented by its size.
A list of values separated by commas may be given.   If, in the case of DC.B, it
is  desired  to  plant  a  text  string, a list of single characters enclosed in
quotes may be abbreviated (e.g. 'a','b','c' may be written as 'abc').  If double
quotes are used instead, the string length byte is planted first (e.g. "abc"  is
equivalent to 3,'a','b','c').
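
As an illustration of the two string forms and of the alignment step, here is
a short sketch.   The function names are hypothetical, and the 255-character
limit is simply what fits in a single length byte.

```python
def align_offset(offset, size):
    """Round the data offset up to the next multiple of `size` (1, 2, 4 or 8)."""
    return (offset + size - 1) // size * size


def dc_b_string(text, counted=False):
    """Bytes planted by DC.B for a single-quoted string, or for a
    double-quoted (length-prefixed) string when `counted` is true."""
    data = text.encode("ascii")
    if counted:
        if len(data) > 255:
            raise ValueError("string too long for a one-byte length")
        return bytes([len(data)]) + data
    return data


plain = dc_b_string("abc")                   # DC.B 'abc'
prefixed = dc_b_string("abc", counted=True)  # DC.B "abc"
```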

DS  (with  optional suffix .B, .H, .W, .F, or .D) stands for DEFINE STORAGE, and
is used to reserve space in the data area without planting initial values.  If a
label is given, it is defined, as with DC, as a memory reference into  the  data
area.   The  optional  parameter is a constant expression denoting the number of
slots (of the size determined by the suffix) to  reserve,  i.e.  the  amount  by
which  to advance the data offset.   A value of 1 is assumed in the absence of a
parameter.

The BASE directive is used to establish the base register and initial offset for
subsequent use of the VAR directive, below.

The VAR directive is similar to the DS directive in that it reserves  space  and
defines  the  label,  if  any,  as  being a memory reference at a certain offset
(which is incremented appropriately) from some base register.  Example:

     BASE 8(FP)     ; establish current base as 8(FP)
I    VAR            ; define I as 8(FP), increase offset by 4
J    VAR 3          ; define J as 12(FP), inc by 12
K    VAR.H          ; define K as 24(FP), inc by 2
L    VAR.H          ; define L as 26(FP), inc by 2
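
The offset arithmetic in this example can be reproduced with a few lines of
code.   The class below is only a sketch: its name is invented, it assumes VAR
takes the same size suffixes as DS, and it performs no alignment, since none is
described for VAR.

```python
SLOT_SIZES = {"": 4, ".B": 1, ".H": 2, ".W": 4, ".F": 4, ".D": 8}


class VarAllocator:
    """Tracks the base register and running offset set up by BASE."""

    def __init__(self, base_register, offset):
        self.base_register = base_register
        self.offset = offset

    def var(self, suffix="", count=1):
        """Return the operand for the next VAR and advance the offset."""
        operand = f"{self.offset}({self.base_register})"
        self.offset += SLOT_SIZES[suffix] * count
        return operand


alloc = VarAllocator("FP", 8)     # BASE 8(FP)
i = alloc.var()                   # I  VAR
j = alloc.var(count=3)            # J  VAR 3
k = alloc.var(".H")               # K  VAR.H
l = alloc.var(".H")               # L  VAR.H
```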


The following four  directives  are  concerned  with  external  linkage  between
modules.  The cases numbered 1-4 are discussed below.

label  CODEIMPORT "name"  ; case 1
label  CODEEXPORT "name"  ; case 2
label  DATAIMPORT "name"  ; case 3
label  DATAEXPORT "name"  ; case 4

Cases  2  and 4 define "name" as an object defined in (i.e. to be exported from)
the module being assembled.  Case 2 exports an external procedure from the  code
area, at the current location counter value; case 4 exports an external variable
from  the  data area, at the current data offset.   Labels, if present (and they
usually will be), are defined in the normal way as a code label (case  2)  or  a
data reference as with the DC/DS directives (case 4).

Cases  1  and  3 assert that "name" is an object defined in (exported from) some
other module (to be imported into this one).   In both cases the labels (usually
present) are defined as a data reference, and in both cases the effect is to add
a  word  (or  two)  to  the  data area.   In case 3 one word, the address of the
external object, is added.   In case 1 two words are added, firstly the  address
of  the  entry  point of the external procedure, and secondly the address of the
beginning of the data area belonging to the module  containing  that  procedure.
This is needed because the external procedure, once called, may not only need to
access static variables of its own, but may indeed need to call further external
procedures,  the  addresses  of which will be stored somewhere in its data area.
We assume that the called procedure has no way of working  out  for  itself  the
address of its data area, and therefore it has to be communicated in this way by
the  calling  procedure.   It is the responsibility of the loader to fill in the
correct addresses, and the object file contains all the information needed to do
this.  A typical call sequence to an external procedure might look like this:

FRED  CODEIMPORT  "BERT"       *** adapt ***

      {evaluate and pass parameters as needed}
      ADDU V1,DP,R0      ; save our own data area address
      LD R1,FRED         ; pick up entry point address
      JSR.N R1           ; call procedure, but first
      LD DP,FRED+4       ; establish new data pointer
      ADDU DP,V1,R0      ; restore own data pointer


PLANT may be used to generate a value in the  instruction  stream  that  cannot
otherwise  be  generated  in  the  normal  way.   It  is  followed by a constant
expression, which evaluates to a full word.   Any label is treated as  a  normal
code label.

The  ORG  directive  may  be  used  to  advance the code location counter.   The
parameter is the new value of the location counter.   Since it is intended  that
code should be position independent, and as there is no object file directive to
load code at a particular address, it is unlikely that there is ever any need to
use this directive.  Code is always generated starting at notional address zero.
If  the  programmer  knows that the code is going to be loaded, say, starting at
16_800, then beginning the source text with "ORG 16_800" will have the undesired
effect of prefixing the generated code with 2048 bytes of rubbish.

The directives DUAL and ENDDUAL may be used to force setting of the  D  bits  in
floating  point  instructions.   Within a DUAL...ENDDUAL section, every floating
point instruction will be treated as if it  had  been  supplied  with  the  'D.'
prefix.   In addition the DUAL directive has the effect of aligning the location
counter to an even instruction boundary, by  planting  a  NOP  if  the  location
counter is not divisible by 8.


Layout of object code file

The  object  file  consists of a two-part header followed by several information
sections.   The first part of the header contains simple items  of  information,
while  the  second  is a list of sizes of the information sections which follow.
The layout of the header is somewhat future-proof in that extra  fields  may  be
added  at a later date without disturbing the meaning of those which are already
defined.   This is achieved by incorporating information about the size  of  the
header  parts  into  the first word of the header.   Throughout the object file,
multi-byte values are presented in little-endian order, i.e.  least  significant
byte first.

At  present  only  two words are defined for the first part of the header.   The
first word contains two 8-bit bytes giving the  sizes  (in  words)  of  the  two
header parts, and a 16-bit magic number, the purpose of which is to identify the
file as an i860 object module; its value is 860 (16_035C).   As the first and
second parts of the header are currently defined to contain two and six words,
respectively, the value of the first word of the object file is therefore
16_0206035C.   The
second word is the size of this module's static data area (needed by the  loader
when it allocates space).
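
The packing of the first word can be checked with a short calculation.   The
sketch assumes that the first-part size occupies the most significant byte, the
second-part size the next byte, and the magic number the low 16 bits; the
function name is made up for the example.   Since 860 is 16_35C in hexadecimal,
this packing gives 16_0206035C.

```python
import struct

MAGIC = 860  # decimal; 16_035C in hexadecimal


def first_header_word(part1_words=2, part2_words=6, magic=MAGIC):
    """Pack the two header-part sizes and the magic number into one word."""
    return (part1_words << 24) | (part2_words << 16) | magic


word = first_header_word()
# Multi-byte values are stored least significant byte first.
on_disk = struct.pack("<I", word)
```

In the file itself the word therefore appears as the bytes 5C 03 06 02.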

The  second  part  of  the  header  contains the sizes of the six sections which
follow.   The first four sections contain, respectively, information about  code
imports,  code exports, data imports, data exports.   The fifth section contains
the module's code, and the  sixth  contains  information  for  initialising  its
static data area.

The  four  external linkage sections, not all of which need be present, all have
the same layout.  For each external name in the section, there appears an offset
word (even though data area offsets are limited to two bytes), followed  by  the
name  itself  (prefixed  by  a length byte and padded out at the end so that the
length byte plus the characters of the name plus the padding occupy  a  multiple
of four bytes).  The offset defines where the object being exported is, or where
in  the  data area the loader should fill in the address information of imported
objects.   The list is terminated by a dummy entry in which the name  string  is
null.
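
One entry of a linkage section, as described above, might be built like this.
The sketch is not taken from the assembler: the function names are invented,
and it assumes that padding bytes are zero and that the terminating dummy entry
still carries an offset word (of zero).

```python
import struct


def encode_linkage_entry(offset, name):
    """Offset word, then length byte + name, padded to a multiple of four."""
    raw = name.encode("ascii")
    body = bytes([len(raw)]) + raw
    body += b"\0" * (-len(body) % 4)        # pad length byte + name + padding
    return struct.pack("<I", offset) + body  # little-endian offset word


def encode_linkage_section(entries):
    """Encode (offset, name) pairs followed by the null-name terminator."""
    encoded = b"".join(encode_linkage_entry(o, n) for o, n in entries)
    return encoded + encode_linkage_entry(0, "")


entry = encode_linkage_entry(8, "BERT")
section = encode_linkage_section([(8, "BERT")])
```

For the name "BERT" the length byte and four characters need three bytes of
padding, so the whole entry occupies twelve bytes.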

The code section simply consists of the code to be loaded.

The  data  initialisation  section  consists of a number of segments (this saves
object file space when there are significantly large uninitialised  portions  in
the  data area).   Each segment begins with a size word (number of bytes), which
is followed by an offset word, which is followed by the data.   The  segment  is
padded out to a multiple of four bytes, and there is a dummy null segment at the
end (consisting just of one zero word).
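
A loader might walk the data initialisation section roughly as follows.   This
is a sketch only: the function name is invented, the size word is assumed to
count the data bytes alone (excluding padding and the two leading words), and
uninitialised bytes are assumed to be left as zero.

```python
import struct


def apply_data_segments(section, data_size):
    """Build the initial contents of a data area from its segment list."""
    area = bytearray(data_size)
    pos = 0
    while True:
        (size,) = struct.unpack_from("<I", section, pos)
        pos += 4
        if size == 0:
            break                       # dummy null segment ends the list
        (offset,) = struct.unpack_from("<I", section, pos)
        pos += 4
        area[offset:offset + size] = section[pos:pos + size]
        pos += (size + 3) & ~3          # skip data plus padding
    return bytes(area)


# One segment: 3 bytes "abc" at offset 4, one pad byte, then the terminator.
sample = struct.pack("<II", 3, 4) + b"abc\0" + struct.pack("<I", 0)
initial = apply_data_segments(sample, 8)
```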

Document dated 15/03/90 - Rainer W Thonnes