Edinburgh University Computer Science Department

                      Motorola 88100 Assembler User Notes


This document relates to a locally written  assembler  for  the  Motorola  88100
processor.    It   describes  the  input  language  accepted,  as  well  as  the
organisation of the output object file produced.   The assembler is designed  to
be  used in conjunction with high-level language compilers not yet written.   In
particular this means that there is support for  external  linkage,  i.e.  cross
references  (both  to  code  and  data  objects)  between separately compiled or
assembled modules.  It also means that each separately compiled module is likely
to consist of two areas, one containing code, the other static data and external
linkage information.   It is assumed that at run time this  data  area  will  be
pointed at by a base register.   One of the registers "reserved for the linker",
R29, has been commandeered for this purpose, and has been called SB  for  Static
Base.

To  supplement the assembler, a linking loader is available, which takes several
separately assembled (or compiled) object files and combines them into a  single
image suitable for loading into EPROM.  This is also described.

The  user  (assembly  language  programmer)  is expected to be familiar with the
architecture and instruction set of the processor,  and/or  to  have  access  to
Motorola's "MC88100 RISC Microprocessor User's Manual".

This  is  a  one-pass  assembler.   Therefore the listing file which it produces
while processing the source input cannot show the  actual  branch  displacements
for forward references.  The generated binary code is therefore omitted from the
listing  altogether,  which  as  a  result  contains  only line numbers, address
offsets, source text, and error messages.   The generated code is not output  on
the  fly,  but buffered internally to allow forward references to be fixed up as
soon as the relevant labels are reached.   As an option, an  additional  listing
file  may  be  generated from the buffered code, which consists of code address,
code value, and disassembled code, but without the original source text or  line
numbers.   This  feature  is particularly useful in the development stage of the
assembler, as it may be used to check that it is working correctly.


Using the assembler

The assembler is available on the APMs, in directory ASSEM on filestores  B  and
C.  The command to invoke it is:

   ASSEM:ASS88K file {-{NO}OBJ} {-{NO}LIS} {-{NO}DIS}

The first (positional) parameter is the name of the file to be assembled.  A file
name extension of ".88K" is assumed, and the output will go to three files with
extensions ".OBJ", ".LIS", and ".DIS", respectively.  The options -OBJ, -NOOBJ,
-LIS, -NOLIS, -DIS, and -NODIS (which may be abbreviated, e.g. -OBJ to -O) may be
used to enable or disable generation of these respective files.  The default
options are -OBJ-LIS-NODIS.


Using the loader

The  loader  will  link  together a number of object modules, and build an image
file suitable for putting into EPROM.  The command is:

   ASSEM:LOAD88K  infile1, infile2, ... / outfile {-codebase=??} {-database=??}

The input file names are all assumed to have extension ".OBJ", and the output
file will have extension ".ROM".  If "/outfile" is omitted, the output file will
have the same name as the first input file.  Specification of the code and data
area base addresses is optional (they default to 0 and to the size of the code
area, respectively); their values are used in computing the addresses of
external objects.  The "??" values are specified in hexadecimal.

The  layout of the output image is simply the juxtaposition of the code areas of
all the input files (in the order in which they were specified in the  command),
followed  by the similarly juxtaposed data areas, in the same order, initialised
as specified in  the  respective  object  files,  with  all  links  required  by
import/export directives filled in appropriately.
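The address arithmetic implied by this layout can be sketched as follows. This is a minimal Python model written under stated assumptions, not the loader itself: the function name layout is hypothetical, sizes are taken to be in bytes, and when -database is omitted the data areas are assumed to start at the total size of the code areas, as the default described above suggests.

```python
def layout(modules, codebase=0, database=None):
    """Return (code_bases, data_bases), one entry per module.

    modules  -- list of (code_size, data_size) pairs in bytes, in the
                order the object files were named on the command line
    codebase -- address of the first code area (loader default: 0)
    database -- address of the first data area; if omitted, it defaults
                to the total size of the code areas
    """
    code_bases, data_bases = [], []
    addr = codebase
    for code_size, _ in modules:      # code areas are juxtaposed first
        code_bases.append(addr)
        addr += code_size
    if database is None:
        database = addr - codebase    # default: size of the code area
    addr = database
    for _, data_size in modules:      # then the data areas, in the same order
        data_bases.append(addr)
        addr += data_size
    return code_bases, data_bases
```

For example, with two modules whose code areas are 100 and 40 bytes long, the second module's code starts at 100 and, by default, the first data area starts at 140.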

If  it  is sufficient for the data areas to be read-only, the above defaults are
satisfactory.   Normally, however,  the  programmer  will  expect  these  to  be
writeable,  so  a  non-zero  value  for  the  base  address  of  the  data  area
(corresponding to writeable RAM in the  address  space  of  the  hardware)  will
almost always need to be supplied.  As the code is ROM based, it is assumed that
the main (first) file will begin with an exception vector table.  The first word of
this contains the  first  instruction  to  be  executed,  and  the  loader  will
overwrite  the  second word with two halfwords, namely the sums of the sizes (in
words, not bytes) of the code and data areas, respectively,  of  all  the  files
being linked together.
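The size word written into the vector table might be computed as in the following Python sketch. The function patch_size_word is hypothetical; it assumes that both area sizes are whole numbers of words and that the image uses big-endian byte order like the object file, so that the code size occupies the first halfword.

```python
import struct

def patch_size_word(image, code_bytes, data_bytes):
    """Overwrite word 1 of the image with the code and data sizes in words.

    image is a bytearray holding the linked image; big-endian byte
    order is assumed, as for the object file.
    """
    if code_bytes % 4 or data_bytes % 4:
        raise ValueError("area sizes must be whole words")
    code_words, data_words = code_bytes // 4, data_bytes // 4
    if code_words > 0xFFFF or data_words > 0xFFFF:
        raise ValueError("size does not fit in a halfword")
    # first halfword: total code size, second halfword: total data size
    image[4:8] = struct.pack(">HH", code_words, data_words)
    return image
```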

It is recommended that the main program begin by copying the ROM image of the
data areas into RAM.  To do this, make the first instruction of the vector table
a BSR to the start of the main program; since BSR leaves in R1 the address of
the following word, this points R1 at the two aforementioned halfwords.
It may be a good idea to make the first word of the data area an
external  self-reference,  in  order to convey the RAM area start address to the
program.   The alternative of having that value specified as a constant  in  the
program  involves  the risk of it becoming inconsistent with the value passed as
parameter to the linker.  Example:

     BSR COPY            ; first word of vector table
     PLANT 0             ; dummy second word
     ...                 ; rest of exception vector table

SELF DATAEXPORT "SELF"   ; export start of data area
     DATAIMPORT "SELF"   ; and import it again

COPY LD.HU T1,0(R1)      ; T1 = size of code area
     LD.HU T2,2(R1)      ; T2 = size of data area
     SUBU R1,R1,4        ; point R1 at start of code area
     LDA SB,R1[T1]       ; point SB at start of ROM data area
     LD T1,SELF          ; point T1 at start of RAM data area
LOOP SUBU T2,T2,1        ; decrement word count
     LD T3,SB[T2]        ; copy using count as index,
     ST T3,T1[T2]        ;   i.e. copy last word first
     BCND GT0,T2,LOOP    ; go round for more until T2=0
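The address arithmetic in this routine can be checked with a small Python model in which memory is a dictionary of words indexed by byte address. This sketch is illustrative only: the function copy_data and the addresses in the test are hypothetical, and it assumes that the BSR is the very first word of the code area, so that R1 holds 4 on entry. Each statement is annotated with the instruction it models.

```python
def copy_data(mem, r1):
    """Model of the COPY routine; mem maps byte addresses to 32-bit words.

    r1 is the return address left by the BSR, i.e. the address of the
    second vector-table word holding the two halfword sizes.
    """
    word = mem[r1]
    t1 = word >> 16                # LD.HU T1,0(R1): code size in words
    t2 = word & 0xFFFF             # LD.HU T2,2(R1): data size in words
    r1 -= 4                        # SUBU R1,R1,4: start of code area
    sb = r1 + 4 * t1               # LDA SB,R1[T1]: index scaled by 4
    t1 = mem[sb]                   # LD T1,SELF: SELF is at data offset 0
    while True:
        t2 -= 1                                # SUBU T2,T2,1
        mem[t1 + 4 * t2] = mem[sb + 4 * t2]    # LD T3,SB[T2]; ST T3,T1[T2]
        if not t2 > 0:                         # BCND GT0,T2,LOOP
            break
    return t1
```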


Pre-defined operand names

The assembler knows the names of the general  registers  (R0-R31),  of  the  CPU
control  registers  (CR0-CR20),  and  of  the  FPU control registers (FCR0-FCR8,
FCR62, FCR63).   Bearing in  mind  conventions  suggested  in  the  manual,  the
assembler recognises alternative names for the general registers, as follows:

     P1-P8     R2-R9     Eight registers for passing PARAMETERs to procedures
     T1-T4     R13-R10   Four TEMPORARY working registers
     V1-V12    R14-R25   Twelve registers to be used as local VARIABLES
     B1-B2     R27-R26   Two local BASE registers
     PB        R28       The PROGRAM base pointer
     SB        R29       The STATIC base pointer
     FP        R30       The FRAME pointer
     SP        R31       The STACK pointer
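The alias table can also be expressed as a lookup, which makes the reversed numbering of T1-T4 and B1-B2 explicit. ALIASES and register_number are hypothetical names used for this Python sketch only; they are not part of the assembler.

```python
# Alias name -> general register number, following the table above.
ALIASES = {}
ALIASES.update({f"P{i}": i + 1 for i in range(1, 9)})    # P1-P8  = R2-R9
ALIASES.update({f"T{i}": 14 - i for i in range(1, 5)})   # T1-T4  = R13-R10
ALIASES.update({f"V{i}": i + 13 for i in range(1, 13)})  # V1-V12 = R14-R25
ALIASES.update({f"B{i}": 28 - i for i in range(1, 3)})   # B1-B2  = R27-R26
ALIASES.update({"PB": 28, "SB": 29, "FP": 30, "SP": 31})

def register_number(name):
    """Resolve Rn or an alias; upper and lower case are not distinguished."""
    name = name.upper()
    if name in ALIASES:
        return ALIASES[name]
    if name.startswith("R") and name[1:].isdigit() and int(name[1:]) <= 31:
        return int(name[1:])
    raise KeyError(name)
```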

Regarding  R26-R29,  the  words  "reserved for linker" in the manual are loosely
interpreted here as "base registers  for  various  purposes  including  external
linkage between separately compiled modules".  The use of SB has already been
explained.  B1 and B2 may be used by compilers to access intermediate level
variables, for high-level languages which allow this (Imp and Pascal do; C, not
being high-level, does not).  PB is a spare "Global Data Pointer" register.  The
exact purpose of B1, B2, and PB is left open to be defined by the compiler
designer.

The  following  alternative  names  for  CPU  and  FPU  control  registers   are
recognised, and the registers they refer to are those stated in the manual.
PID, PSR, EPSR, SSBR, SXIP, SNIP, SFIP, VBR, DMT0, DMD0, DMA0, DMT1, DMD1, DMA1,
DMT2,  DMD2,  DMA2, SR0, SR1, SR2, SR3, FPECR, FPHS1, FPLS1, FPHS2, FPLS2, FPPT,
FPRH, FPRL, FPIT, FPSR, FPCR.

Various  constants  are  also  pre-defined,  especially  for  use   in   BB0/BB1
instructions  following a comparison or for use in the BCND instruction.   These
constants are:
For BCND: GT0, LE0, EQ0, NE0, GE0, LT0 (refer to BCND instruction in manual).
For BB0/BB1: NC, CP, EQ, NE, GT, LE, LT, GE, HI, LS, LO,  HS,  OU,  IB,  IN,  OB
(refer to CMP and FCMP instructions in manual for meanings).


General syntax

The assembler accepts at most one statement per line.  Comments may be included
in  the  source  text  in one of two forms.   They either begin with a semicolon
(';') and extend to the end of the line, or they begin  and  end  with  matching
curly brackets, in which case they may extend over several lines.

Statements  consist  of a label, an instruction, and parameters.   The label and
instruction are optional, and the number of parameters required depends  on  the
instruction,  if  present.   A label, if present, must start at the beginning of
the line, and may be followed by a colon (':').   Instructions may not  commence
at  the  beginning of a line, i.e. if they are not labelled, the line must begin
with at least one space.
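The rules for labels, instructions, and semicolon comments can be illustrated with a short Python sketch that splits one source line into its three fields. The function split_statement is hypothetical; it assumes that a label is separated from the instruction by white space, and it ignores semicolons inside quotes so that a constant such as ';' survives.

```python
def split_statement(line):
    """Split a source line into (label, instruction, operands).

    A ';' outside quotes starts a comment.  Curly-bracket comments,
    which may span several lines, are not handled by this sketch.
    """
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None              # '''' simply closes and reopens
        elif ch in "'\"":
            quote = ch
        elif ch == ";":
            line = line[:i]
            break
    label, rest = None, line
    if line[:1] and not line[0].isspace():  # a label starts in column 1
        parts = line.split(None, 1)
        label = parts[0].rstrip(":")      # optional trailing colon
        rest = parts[1] if len(parts) > 1 else ""
    fields = rest.split(None, 1)
    instruction = fields[0].upper() if fields else None  # case-insensitive
    operands = fields[1].strip() if len(fields) > 1 else ""
    return label, instruction, operands
```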

An "instruction" can take one of two forms.   It is either  a  real  instruction
(i.e.  an  88100  opcode)  or an assembler directive.   If a real instruction is
labelled, this has the effect of defining that label to be the address  of  that
instruction.  This does not apply to labels attached to directives: each
directive defines its own effect on its label, and indeed some directives
require a label, while others must not have one.  A label without an
instruction has the same effect as a label with a real instruction, i.e. it
defines the label as being the value of the current code location counter,
which is the address of the next instruction.

Names (of instructions, labels, and operands) consist of any number of (well, at
most 255) letters, digits, underline, or dollar symbols, but must begin  with  a
letter.  Upper and lower case letters are not distinguished.  The special name *
may  be  used  to  refer  to  the  address of the current instruction.   Literal
constants consist of signed decimal numbers or  quoted  ASCII  constants  (using
single quote marks, e.g. 'A', '?', or '''' (note special form of quoted quote)).
Non-decimal  numbers are specified using the form RADIX_NUMBER; any radix may be
used, but the most useful will be hexadecimal, octal, or binary  (e.g.  16_FF80,
8_507, 2_10101).   An alternative notation is also accepted (e.g. 0xff80, 0o507,
0b10101).   Single-precision floating-point constants are recognised, and  these
take the form

     <integer-part>.<fraction-part><optional-exponent-part>

Note  that  as the integer-part is mandatory, .1 must be expressed as 0.1.   The
optional exponent part consists of '@' or  'E'  or  'e'  followed  by  a  signed
integer.
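The literal forms described above can be summarised by a Python sketch that converts one literal to the integer value it would contribute to an expression. parse_literal is a hypothetical name; radices are limited to 2-36 by Python's int, quoted constants are limited to a single character, and floating-point literals are returned as their IEEE single-precision bit patterns.

```python
import re
import struct

def parse_literal(text):
    """Return the integer value of one literal constant.

    A floating-point literal yields its IEEE single-precision bit
    pattern, which is how it behaves in integer expressions.
    """
    t = text.strip()
    if len(t) >= 3 and t[0] == t[-1] == "'":           # quoted ASCII constant
        body = t[1:-1].replace("''", "'")               # '''' denotes a quote
        if len(body) != 1:
            raise ValueError("expected a single character")
        return ord(body)
    m = re.fullmatch(r"([+-]?\d+)\.(\d+)(?:[@Ee]([+-]?\d+))?", t)
    if m:                                               # integer part is mandatory
        number = float(f"{m[1]}.{m[2]}e{m[3] or 0}")
        return struct.unpack(">I", struct.pack(">f", number))[0]
    m = re.fullmatch(r"(\d+)_([0-9A-Za-z]+)", t)
    if m:                                               # RADIX_NUMBER, e.g. 16_FF80
        return int(m[2], int(m[1]))
    m = re.fullmatch(r"0([xXoObB])([0-9A-Fa-f]+)", t)
    if m:                                               # 0xff80, 0o507, 0b10101
        return int(m[2], {"x": 16, "o": 8, "b": 2}[m[1].lower()])
    return int(t, 10)                                   # signed decimal
```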

Constant expressions may be formed by stringing together literal or predefined
integer constants, using the usual arithmetic operators.  Note, however, that
the assembler does not observe the usual rules of operator precedence.  It
evaluates expressions strictly right-to-left (so 2*3+4 means 2*(3+4), i.e. 14),
but brackets may be used to override this.  The operators recognised are: +, -,
*, /, %, \, &, !, !!, <<, >> (% means remainder, \ means exponentiation, ! means
or, !! means exclusive or).  All these operate on integer expressions only.
Although the assembler accepts floating-point numbers in this context, they will
be treated as if their IEEE bit patterns were integers, so the effect may or may
not be what the programmer expects.  "1.5+1", for example, is pretty
meaningless, but "1.5>>16" and "1.5&16_ffff" may be used to load the high and
low halves of the number 1.5 into a register (e.g. with OR.U and OR).
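Because strict right-to-left evaluation is easy to misread, here is a minimal Python evaluator that follows the rule: each operator takes everything to its right as its right operand unless brackets intervene. It is a sketch, not the assembler's code: only unsigned decimal literals are handled, Python integers do not wrap at 32 bits, and division is assumed to truncate towards zero.

```python
import re

TOKEN = re.compile(r"\s*(\d+|!!|<<|>>|[-+*/%\\&!()])")

def _div(a, b):
    """Integer division, assumed here to truncate towards zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": _div,
    "%": lambda a, b: a - b * _div(a, b),   # remainder, consistent with "/"
    "\\": lambda a, b: a ** b,               # exponentiation
    "&": lambda a, b: a & b,
    "!": lambda a, b: a | b,                 # or
    "!!": lambda a, b: a ^ b,                # exclusive or
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at column {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def _expr(tokens, pos):
    """expr := operand [operator expr]; recursing on the right means
    the rightmost operator is applied first."""
    left, pos = _operand(tokens, pos)
    if pos < len(tokens) and tokens[pos] in OPS:
        op = tokens[pos]
        right, pos = _expr(tokens, pos + 1)
        return OPS[op](left, right), pos
    return left, pos

def _operand(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("operand expected")
    if tokens[pos] == "(":
        value, pos = _expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing ')'")
        return value, pos + 1
    if not tokens[pos].isdigit():
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return int(tokens[pos]), pos + 1

def evaluate(text):
    tokens = tokenize(text)
    value, pos = _expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return value
```

For instance, evaluate("1<<4+1") gives 32, not 17. Since 1069547520 is 16_3FC00000, the IEEE bit pattern of 1.5, evaluate("1069547520>>16") gives 16320 (16_3FC0), the high half referred to by "1.5>>16".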


Syntax of memory operands

The memory referencing instructions (XMEM, LD, ST, and LDA) require one of their
parameters to be a register (the one that will be loaded from or  stored  to  or
have its contents exchanged with memory); the other parameter identifies the
memory operand.   The manual shows three notations for this,  depending  on  the
addressing  mode.   For  scaled indexing the notation places the index register,
enclosed in square brackets, after the base register (as in ST T1,FP[T2]), while
for unscaled and immediate indexing  the  convention  is  to  place  the  index,
whether  it  is  a constant or a register, after the base register, with a comma
in between (as in LD T1,FP,T2 or ST T2,SP,8).  While this assembler accepts this
comma-form, it also allows the form in which the index is placed before the base
register,  the  latter  being  enclosed in round brackets (as in LD T1,T2(FP) or
ST T1,8(SP)).
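The equivalence between the bracketed and comma forms amounts to a simple rewrite, sketched here in Python. normalise_memory_operand is a hypothetical helper; it assumes that the base is a register name and that everything before the final bracketed name is the index, and it leaves the scaled form and the comma form untouched.

```python
import re

def normalise_memory_operand(text):
    """Rewrite the 'index(base)' form into the manual's comma form.

    'T2(FP)' -> 'FP,T2' and '8(SP)' -> 'SP,8'; the scaled form
    'FP[T2]' and the comma form are returned unchanged.
    """
    t = text.replace(" ", "")
    m = re.fullmatch(r"(.+)\(([A-Za-z][\w$]*)\)", t)
    if m:
        index, base = m.groups()
        return f"{base},{index}"
    return t
```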

Furthermore, it is possible  to  define  a  name  as  being  a  complete  memory
reference,  using  the  DEF,  DS,  or  VAR  directives.   For example, following
"FRED DEF R5[R6]"  and  "BERT DEF 4(FP)",  one  could  write  "LD T1,FRED"   and
"ST T1,BERT", which would mean the same as "LD T1,R5[R6]" and "ST T1,FP,4".


A few notes regarding some instructions

All  the  instructions  mentioned  in  the manual are recognised.   In addition,
alternative mnemonics ASR and LSR are accepted for EXT and EXTU, respectively,
and alternatives ASL and LSL are accepted for MAK.  Remember that ROT means
rotate RIGHT.

Contrary to the recommendation in the manual, the assembler does not convert LDA
to ADDU in circumstances where this would be preferable; this responsibility is
left to the programmer.


Assembler Directives

BEGIN and END may be used to restrict the scope of labels  and  names,  so  that
these may be re-used.  BEGIN/END may be nested to any depth.

INCLUDE  may be used to incorporate the contents of another file into the source
text.  The parameter should be a filename enclosed in quotes.

DEF is used to define a name to be either a constant or an operand reference.  A
label is mandatory.  Examples: "LIMIT DEF 127" or "SP DEF R31" or "I DEF 8(FP)".
EQU is accepted as an alternative to DEF.   A warning is given if an attempt  is
made to define a name which already exists.   REDEF is provided for occasions on
which it is desired to do this.

DC (with optional size suffix .B, .H, .W, .F, or .D) stands for DEFINE CONSTANT,
and is used to plant a value into the data area: a byte, halfword, or word (the
last being used if no suffix is given), or a single- or double-precision
floating-point number.  If
a label is present, that name is defined as a memory reference relative to  base
register  SB, at the current data offset, which is first aligned to the relevant
multiple of 2, 4, or 8 if necessary.  Then the value which follows as a constant
expression is added to the data area, and the offset is incremented by its size.
A list of values separated by commas may be given.   If, in the case of DC.B, it
is  desired  to  plant  a  text  string, a list of single characters enclosed in
quotes may be abbreviated (e.g. 'a','b','c' may be written as 'abc').  If double
quotes are used instead, the string length byte is planted first (e.g. "abc"  is
equivalent to 3,'a','b','c').

DS  (with  optional suffix .B, .H, .W, .F, or .D) stands for DEFINE STORAGE, and
is used to reserve space in the data area without planting initial values.  If a
label is given, it is defined, as with DC, as a memory reference into  the  data
area.   The  optional  parameter is a constant expression denoting the number of
slots (of the size determined by the suffix) to  reserve,  i.e.  the  amount  by
which  to advance the data offset.   A value of 1 is assumed in the absence of a
parameter.
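The way DC and DS advance and align the data offset can be modelled with a small Python class. DataArea and its methods are hypothetical; strings are passed in their quoted source form, doubled quote marks inside strings are not handled, integers are truncated to the slot size in two's complement, and big-endian order is assumed, as for the object file.

```python
import struct

SIZES = {"B": 1, "H": 2, "W": 4, "F": 4, "D": 8}

class DataArea:
    """Model of the static data area built by DC and DS (big-endian)."""

    def __init__(self):
        self.data = bytearray()

    def _align(self, suffix):
        size = SIZES[suffix]
        self.data.extend(bytes(-len(self.data) % size))  # pad to a multiple
        return len(self.data)             # offset from SB for any label

    def dc(self, suffix, *values):
        """Plant values; returns the offset a label would be given."""
        offset = self._align(suffix)
        size = SIZES[suffix]
        for v in values:
            if isinstance(v, str):        # quoted text, as written in the source
                if suffix != "B":
                    raise ValueError("strings need DC.B")
                body = v[1:-1].encode("ascii")
                if v[0] == '"':
                    self.data.append(len(body))   # "abc" = 3,'a','b','c'
                self.data.extend(body)
            elif suffix == "F":
                self.data.extend(struct.pack(">f", v))
            elif suffix == "D":
                self.data.extend(struct.pack(">d", v))
            else:                         # two's complement, truncated to size
                self.data.extend((v & ((1 << 8 * size) - 1)).to_bytes(size, "big"))
        return offset

    def ds(self, suffix="W", count=1):
        """Reserve count slots.  They are zero-filled in this model; the
        object file's segmented initialisation data can omit them."""
        offset = self._align(suffix)
        self.data.extend(bytes(SIZES[suffix] * count))
        return offset
```

For example, DC.B "abc" followed by DC.B 'x' occupies offsets 0 to 4, so a following DC.W is aligned up to offset 8.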

The BASE directive is used to establish the base register and initial offset for
subsequent use of the VAR directive, below.

The VAR directive is similar to the DS directive in that it reserves  space  and
defines  the  label,  if  any,  as  being a memory reference at a certain offset
(which is incremented appropriately) from some base register.  Example:

     BASE 8(FP)     ; establish current base as 8(FP)
I    VAR            ; define I as 8(FP), increase offset by 4
J    VAR 3          ; define J as 12(FP), inc by 12
K    VAR.H          ; define K as 24(FP), inc by 2
L    VAR.H          ; define L as 26(FP), inc by 2
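The offset bookkeeping done by BASE and VAR in this example can be sketched in Python as follows. VarAllocator is a hypothetical name; the example does not show whether VAR aligns its offset, so this sketch applies no alignment and simply reproduces the offsets listed.

```python
SIZES = {"B": 1, "H": 2, "W": 4, "F": 4, "D": 8}

class VarAllocator:
    """Tracks the BASE register and the running offset used by VAR."""

    def __init__(self):
        self.register, self.offset = None, 0

    def base(self, offset, register):
        """BASE offset(register): set the base and the starting offset."""
        self.register, self.offset = register, offset

    def var(self, suffix="W", count=1):
        """VAR{.suffix} {count}: return the label's memory reference and
        advance the offset by count slots (no alignment is applied)."""
        ref = f"{self.offset}({self.register})"
        self.offset += SIZES[suffix] * count
        return ref
```

Calling base(8, "FP") and then var(), var("W", 3), var("H"), and var("H") yields 8(FP), 12(FP), 24(FP), and 26(FP), matching the comments in the example.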


The following four  directives  are  concerned  with  external  linkage  between
modules.  The cases numbered 1-4 are discussed below.


label  CODEIMPORT "name"  ; case 1
label  CODEEXPORT "name"  ; case 2
label  DATAIMPORT "name"  ; case 3
label  DATAEXPORT "name"  ; case 4

Cases 2 and 4 define "name" as an object defined in (i.e. to be exported from)
the module being assembled.  Case 2 exports an external procedure from the code
area, at the current location counter value; case 4 exports an external
variable from the data area, at the current data offset.  Labels, if present
(and they usually will be), are defined in the normal way as a code label
(case 2) or as a data reference, as with the DC/DS directives (case 4).

Cases  1  and  3 assert that "name" is an object defined in (exported from) some
other module (to be imported into this one).   In both cases the labels (usually
present) are defined as a data reference, and in both cases the effect is to add
a  word  (or  two)  to  the  data area.   In case 3 one word, the address of the
external object, is added.   In case 1 two words are added, firstly the  address
of  the  entry  point of the external procedure, and secondly the address of the
beginning of the data area belonging to the module  containing  that  procedure.
This is needed because the external procedure, once called, may not only need to
access static variables of its own, but may indeed need to call further external
procedures,  the  addresses  of which will be stored somewhere in its data area.
We assume that the called procedure has no way of working  out  for  itself  the
address of its data area, and therefore it has to be communicated in this way by
the  calling  procedure.   It is the responsibility of the loader to fill in the
correct addresses, and the object file contains all the information needed to do
this.  A typical call sequence to an external procedure might look like this:

FRED  CODEIMPORT  "BERT"

      {evaluate and pass parameters as needed}
      LD R1,FRED         ; pick up addr of entry point
      JSR.N R1           ; call procedure, but first
      LD T1,FRED+4       ; establish new data pointer
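The loader's part in this scheme can be sketched in Python: it collects the exported addresses and then writes one word for each DATAIMPORT and two words for each CODEIMPORT. The function resolve_imports and the dictionary layout of a module are hypothetical; the real loader reads the same information from the linkage sections of the object files. In the call sequence above, FRED and FRED+4 are the two words such a routine fills in.

```python
import struct

def resolve_imports(modules):
    """Fill in the import words of each module's data area in place.

    Each module is a dict with keys "code_base", "data_base", "data"
    (a bytearray) and four lists of (offset, name) pairs:
    "code_imports", "code_exports", "data_imports", "data_exports".
    """
    procs, variables = {}, {}
    for m in modules:
        for offset, name in m["code_exports"]:   # CODEEXPORT: entry point
            procs[name] = (m["code_base"] + offset, m["data_base"])
        for offset, name in m["data_exports"]:   # DATAEXPORT: variable address
            variables[name] = m["data_base"] + offset
    for m in modules:
        for offset, name in m["code_imports"]:   # CODEIMPORT: two words
            entry, their_data = procs[name]      # KeyError if unresolved
            struct.pack_into(">II", m["data"], offset, entry, their_data)
        for offset, name in m["data_imports"]:   # DATAIMPORT: one word
            struct.pack_into(">I", m["data"], offset, variables[name])
```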


PLANT may be used to generate a value in the instruction stream that cannot
otherwise be generated in the normal way.  It is followed by a constant
expression, which evaluates to a full word.   Any label is treated as  a  normal
code label.


Layout of object code file

The  object  file  consists of a two-part header followed by several information
sections.   The first part of the header contains simple items  of  information,
while  the  second  is a list of sizes of the information sections which follow.
The layout of the header is somewhat future-proof in that extra  fields  may  be
added  at a later date without disturbing the meaning of those which are already
defined.   This is achieved by incorporating information about the size  of  the
header  parts  into  the first word of the header.   Throughout the object file,
multi-byte values are presented in big-endian order, i.e. most significant  byte
first.

At  present  only  two words are defined for the first part of the header.   The
first word contains two 8-bit bytes giving the  sizes  (in  words)  of  the  two
header parts, and a 16-bit magic number, the purpose of which is to identify the
file as an 88100 object module; its value is 88100 modulo 65536.  As the first
and second parts of the header are currently defined  to  contain  two  and  six
words, respectively, the value of the first word of the object file is therefore
16_02065824 (bytes 2 and 6, then 88100 - 65536 = 22564 = 16_5824).  The second
word is the size of this module's static data area
(needed by the loader when it allocates space).

The second part of the header contains the  sizes  of  the  six  sections  which
follow.   The  first four sections contain, respectively, information about code
imports, code exports, data imports, data exports.   The fifth section  contains
the  module's  code,  and  the  sixth  contains information for initialising its
static data area.
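A writer and reader for this header might look like the following Python sketch. make_header and read_header are hypothetical names, and the units of the six section sizes are simply passed through unchanged. The reader takes the part sizes from the first word rather than assuming 2 and 6, which is what allows fields to be added later without disturbing existing ones.

```python
import struct

MAGIC = 88100 % 65536              # 16_5824

def make_header(data_size, section_sizes):
    """Build the two-part object file header (big-endian)."""
    if len(section_sizes) != 6:
        raise ValueError("six section sizes expected")
    # first part: sizes of the two parts (2 and 6 words), magic, data size
    first = struct.pack(">BBHI", 2, 6, MAGIC, data_size)
    second = struct.pack(">6I", *section_sizes)
    return first + second

def read_header(blob):
    """Return (data_size, section_sizes, offset of first section)."""
    part1, part2, magic = struct.unpack_from(">BBH", blob, 0)
    if magic != MAGIC:
        raise ValueError("not an 88100 object module")
    words = struct.unpack_from(f">{part1 + part2}I", blob, 0)
    return words[1], list(words[part1:part1 + part2]), 4 * (part1 + part2)
```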

The four external linkage sections, not all of which need be present,  all  have
the same layout.  For each external name in the section, there appears an offset
word  (even  though data area offsets are limited to two bytes), followed by the
name itself (prefixed by a length byte and padded out at the  end  so  that  the
length  byte  plus the characters of the name plus the padding occupy a multiple
of four bytes).  The offset defines where the object being exported is, or where
in the data area the loader should fill in the address information  of  imported
objects.   The  list  is terminated by a dummy entry in which the name string is
null.
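The linkage section layout can be encoded and decoded as in the following Python sketch. The function names are hypothetical, and the offset word of the terminating dummy entry is assumed to be zero, since its value is not specified.

```python
import struct

def encode_linkage(entries):
    """entries: list of (offset, name); returns the section bytes."""
    out = bytearray()
    for offset, name in list(entries) + [(0, "")]:   # dummy entry ends the list
        raw = name.encode("ascii")
        if len(raw) > 255:
            raise ValueError("name too long for a length byte")
        field = bytes([len(raw)]) + raw
        out += struct.pack(">I", offset) + field + bytes(-len(field) % 4)
    return bytes(out)

def decode_linkage(blob, pos=0):
    """Return ([(offset, name), ...], position after the dummy entry)."""
    entries = []
    while True:
        (offset,) = struct.unpack_from(">I", blob, pos)
        length = blob[pos + 4]
        name = blob[pos + 5:pos + 5 + length].decode("ascii")
        pos += 4 + (1 + length + 3) // 4 * 4
        if length == 0:
            return entries, pos
        entries.append((offset, name))
```

An entry for BERT at offset 8 thus occupies 12 bytes: the offset word, then the length byte 4, the four characters, and three bytes of padding.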

The code section simply consists of the code to be loaded.

The data initialisation section consists of a number  of  segments  (this  saves
object  file  space when there are significantly large uninitialised portions in
the data area).   Each segment begins with a size word (number of bytes),  which
is  followed  by an offset word, which is followed by the data.   The segment is
padded out to a multiple of four bytes, and there is a dummy null segment at the
end (consisting just of one zero word).
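The data initialisation segments can be produced and replayed as in this Python sketch. The function names are hypothetical; the size word is assumed to count the data bytes before padding, and empty segments are skipped because a zero size word marks the end of the section.

```python
import struct

def encode_segments(segments):
    """segments: list of (offset, data) pairs; returns the section bytes."""
    out = bytearray()
    for offset, data in segments:
        if not data:
            continue                          # a zero size would end the list
        out += struct.pack(">II", len(data), offset)
        out += data + bytes(-len(data) % 4)   # pad to a multiple of 4 bytes
    out += bytes(4)                           # dummy null segment
    return bytes(out)

def apply_segments(blob, data_size):
    """Rebuild a data area of data_size bytes from the section bytes."""
    area = bytearray(data_size)               # uninitialised parts stay zero here
    pos = 0
    while True:
        (size,) = struct.unpack_from(">I", blob, pos)
        if size == 0:
            return area
        (offset,) = struct.unpack_from(">I", blob, pos + 4)
        area[offset:offset + size] = blob[pos + 8:pos + 8 + size]
        pos += 8 + size + (-size % 4)
```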

RWT December 1989, revised 20/12/90 and 25/04/91