\documentstyle[a4,12pt]{article}
\begin{document}
\author{Rainer Thonnes}
\title{APM Cross-Assembler for Acorn ARM}
\maketitle
\parskip .1 in
\setcounter{secnumdepth}{10}
\parindent 0in
\section{Preamble}
Cross-Assembler for the ARM (Acorn RISC Machine) Processor

\section{USER NOTES}

The APM command "ASSEM:ARM HARRY" will assemble the sequence of statements
held in file HARRY.ARM, and generate two output files. Object code (in pure
binary) is sent to HARRY.OBJ, and a listing is sent to HARRY.LIS.

The assembler recognises all the mnemonics listed in the "Instruction Set"
section of VLSI Technology Inc's VL86C010 data sheet as well as the locally
defined assembler directives described below.

Each line in the source file may be no longer than 255 characters, and will
contain either a machine instruction, a directive, or neither. In addition each
line may be labelled or may contain a comment (or both). The part of a line to
the right of any unquoted semicolon is treated as a comment and ignored. Labels,
where they occur, must begin in column 1, and be separated from the statement
proper by one or more spaces (or by a colon and any number of spaces).
Non-labelled statements must have a space in column 1.
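
The layout rules above might be illustrated by the following fragment. It is only a sketch: the label names LOOP and DONE are hypothetical, and the instructions are chosen purely to show where labels, statements and comments go.

\small\tt \begin{verbatim}
LOOP   SUBS R0,R0,#1   ; label begins in column 1, then one or more spaces
       BNE  LOOP       ; unlabelled statement: column 1 holds a space
DONE:  MOV  R1,#0      ; label followed by a colon and spaces
\end{verbatim}\rm  \normalsize 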

Labels and other names, used to define constant values or fixed addresses, may
be defined by the programmer. Apart from '\_', names must contain alphanumeric
characters only; upper and lower case are not distinguished. There is no
restriction on the length of names. Names must be distinct from all the
predefined mnemonics, which include all the opcodes and the condition code, shift, and
block transfer addressing mode specifiers. The name "*" may be used to denote
the address of the current instruction. The names R0-R15, PC, and LINK are
predefined for convenience only (i.e. they are eligible for redefinition if need
be), and have the values 0-15, 15, 14, respectively.

Literals are either quoted characters or numbers. Numbers are interpreted in
decimal radix unless overridden using '\_'-notation. So 2\_101111, 8\_57, 16\_2F,
and 47 are all the same thing. An 'H' suffix is accepted as an alternative for
the '16\_' prefix, but the first character must still be in the range 0-9, so if
a number begins with a letter then a leading 0 must be added, for example 2FH is
the same as 16\_2F, and 16\_C37 would be written as 0C37H. Literals may be signed
using '-' (minus) or '$\backslash$' (not). Quoted constants consist of up to four ASCII
characters enclosed in single quote marks. Quote marks themselves may be quoted
by doubling them up. Where fewer than four characters appear, they are
right-aligned within the 32-bit word. So \verb|'C'| is the same as 16\_43,
\verb|'a b'| is 16\_612062, and \verb|'''!'''''| is 16\_27212727.
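
As an illustrative sketch, each of the following statements spells the same value, 47, in a different way. It assumes the \# literal syntax of the data sheet, as used in the data processing examples later in this document; the choice of MOV and R0 is arbitrary.

\small\tt \begin{verbatim}
   MOV R0,#47          ; decimal
   MOV R0,#2_101111    ; binary
   MOV R0,#8_57        ; octal
   MOV R0,#16_2F       ; hexadecimal, using the '16_' prefix
   MOV R0,#2FH         ; hexadecimal, using the 'H' suffix
   MOV R0,#'/'         ; quoted character: ASCII '/' is 16_2F
\end{verbatim}\rm  \normalsize 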

In most contexts constant expressions, evaluated at assembly time, are
allowed. All the usual integer operations are supported: +, -, *, /,
\% (remainder), \& (and), ! (or), !! (exclusive or), $<$$<$ (shift left),
$>$$>$ (shift right). These operations are
performed strictly left-to-right, i.e. the usual precedence rules do not apply.
No bracketing is allowed.
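
A few worked cases may make the left-to-right rule concrete. Each line shows an expression only, not a complete statement, and the values in the comments were worked out by hand from the rule as stated above rather than taken from assembler output.

\small\tt \begin{verbatim}
   2+3*4      ; (2+3)*4 = 20, not 2+(3*4) = 14
   3*4+2      ; (3*4)+2 = 14: reordering takes the place of brackets
   10-2*3     ; (10-2)*3 = 24
   1<<4+1     ; (1<<4)+1 = 17
   17%5!8     ; (17%5)!8 = 2!8 = 10
\end{verbatim}\rm  \normalsize 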


\section{Assembler directives}

The directive "=" equates a name to a constant expression. The name begins in
column 1, as if it were a label. It is followed (possibly after a few spaces)
by an equals sign, which in turn is followed by the expression.
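
A minimal sketch of the "=" directive follows. The names BUFSIZE and LASTOFF are hypothetical, and the second line assumes that a previously defined name may appear in a constant expression.

\small\tt \begin{verbatim}
BUFSIZE = 16_100         ; name in column 1, then "=", then the expression
LASTOFF = BUFSIZE-1*4    ; left to right: (16_100-1)*4 = 16_3FC
\end{verbatim}\rm  \normalsize 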

The directive "ORG" is used to specify an alternative base address for the code
being assembled. At most one such directive may appear in the source file, and
it must appear before any statement which generates code (because the object
code, being pure binary, cannot contain switching directives). The effect of
the ORG statement is to determine the values generated for the offset fields in
instructions using PC-relative addressing. In the absence of an ORG directive
the code is assumed to start at location zero.
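
The following fragment sketches how ORG might be used. It assumes that ORG is written as an unlabelled statement with a single constant expression as its operand, which the description above implies but does not spell out; the address 16\_8000 and the label START are hypothetical.

\small\tt \begin{verbatim}
       ORG  16_8000    ; must precede any statement which generates code
START  MOV  R0,#0      ; assembled as if located at address 16_8000
       B    START      ; the branch offset itself is PC-relative
\end{verbatim}\rm  \normalsize 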

The directive "DATA" is used to plant in-line data. The data operand is a full
32-bit word (because code must always be word-aligned). Text string data may be
generated with the aid of multi-character quoted constants described above,
possibly in combination with constant expressions. For example, the two strings
"on" and "mat" may be generated using 'no'$<$$<$8+2 and 'tam'$<$$<$8+3 to give a
length-prefixed, right-aligned form, bearing in mind that the ARM treats byte 0 in
memory as the low-order byte of word 0 (opposite to the 68000 way).
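
Written out as statements, the two strings above might look as follows. This is a sketch which assumes DATA is written as an unlabelled statement with one expression as its operand; the word values and byte orders in the comments were worked out by hand from the left-to-right evaluation rule and the byte ordering just described.

\small\tt \begin{verbatim}
   DATA 'no'<<8+2     ; word 16_006E6F02
                      ;   bytes in memory: 02 6F 6E 00 (length 2, "on")
   DATA 'tam'<<8+3    ; word 16_74616D03
                      ;   bytes in memory: 03 6D 61 74 (length 3, "mat")
\end{verbatim}\rm  \normalsize 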

The "END" directive marks the end of the source file.


\section{Summary of instruction mnemonics (or their components) and operand syntax}

All ARM instructions are conditional (i.e. are only executed if the condition
code bits in the PSR are such that the condition specified in the COND field of
the instruction is true). The instruction mnemonics are formed by concatenating
the basic opcode mnemonic, the condition mnemonic (if not supplied, "always"
is assumed), and any instruction-specific option flags.

The condition mnemonics known to the assembler are those listed in the data
sheet (EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL, NV), plus HS
and LO (which are alternatives for CS and CC).

\subsection{"BRANCH" instructions}

The opcodes B (branch) and BL (branch and link, i.e. call) may, like all
other instruction mnemonics, occur on their own or together with a condition
specifier. B is the same as BAL, and BL is the same as BLAL. The opcode is
followed by an operand which is a constant expression, usually a label, giving
the address of the instruction to be executed next.
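
As a sketch of how the branch opcodes combine with condition specifiers, consider the fragment below. The labels SKIP and CLEAR are hypothetical, and returning with MOV PC,LINK is a common ARM convention assumed here rather than something this assembler requires.

\small\tt \begin{verbatim}
       CMP  R0,#0
       BEQ  SKIP       ; conditional branch: taken only if R0 was zero
       BL   CLEAR      ; branch and link: call the routine at CLEAR
SKIP   B    SKIP       ; B is BAL: here an endless loop, used as a stop
CLEAR  MOV  R0,#0
       MOV  PC,LINK    ; return to the instruction after the BL
\end{verbatim}\rm  \normalsize 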


\subsection{"DATA PROCESSING" instructions}

These are as listed in the data sheet (AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC,
TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN). The opcode may be followed by the
option letter S or P. There are two operands (for compare and move
instructions) or three (for the rest). The first and middle operands are always
registers; the first of these (except in compare instructions) is always the
destination. The last operand may be a register (possibly shifted) or a literal.
General operand syntax, including the shift code mnemonics (ASL, LSL, LSR, ASR, ROR,
RRX), is as in the data sheet. Examples:

\small\tt \begin{verbatim}   MVNEQ R0,#16_FF000       ; set R0 to 16_FFF00FFF if "zero" is true.
   ADDS  R1,R1,R1,LSL #4    ; set R1 to R1+(R1<<4), i.e. multiply by 17,
                            ; and set the condition codes.
\end{verbatim}\rm  \normalsize 
\subsection{"MULTIPLY" instructions}

MUL (multiply) and MLA (multiply and accumulate) are accepted, and take three
(MUL) or four (MLA) operands, which are all register numbers.

\small\tt \begin{verbatim}   MULS R1,R2,R3    ; sets R1 to R2*R3, and affects the condition codes
   MLA  R1,R2,R3,R4 ; sets R1 to R2*R3+R4
\end{verbatim}\rm  \normalsize 
The assembler checks that the restrictions (that the first register must not be
the PC or the same as the second) are complied with. Where it is required to
replace the contents of R1 with R1*R2, you must write

\small\tt \begin{verbatim}   MUL R1,R2,R1    instead of    MUL R1,R1,R2
\end{verbatim}\rm  \normalsize 
and indeed in this case the assembler does the switch automatically and issues a
warning.


\subsection{"SINGLE DATA TRANSFER" instructions}

The basic STR or LDR opcode may be followed by the letters B (indicating byte
rather than word transfer) and/or T (see data sheet). The opcode is then
followed by two operands, the first of which is the register to be stored or
loaded; the second specifies the address of the operand in memory. The same
syntax as in the data sheet is used to specify the various addressing modes:

\small\tt \begin{verbatim}   LDRVC R5,TEMP       ; (TEMP is a label) If the overflow flag is clear, then
                       ;   load the contents of location TEMP into R5.
                       ;   The assembler will convert TEMP into an offset
                       ;   form using PC as the base register.
   STR R0,[R1]         ; Store the contents of R0 in the memory location
                       ;   pointed at by R1.
   ...,[R1,#8]         ; Specifies the location 8 bytes on from where R1 points
       [R1,#8]!        ; As above, but replaces R1 with R1+8 at the same time.
       [R1,R2]         ; Refers to the location whose address is the sum of
                       ;   the contents of registers 1 and 2.
       [R1,R2]!        ; As above and also replaces R1 with R1+R2.
       [R1,R2,LSL #1]  ; Here the address used is R1+R2*2
       [R1],#12        ; Specifies the location R1 points at, then adds 12 to R1.
       [R1],-R2,LSL #2 ; Specifies the location R1 points at, then subtracts
                       ;   R2*4 from R1.
\end{verbatim}\rm  \normalsize 
Where an immediate offset is used (as in \#8 or \#12 above), it must be in the
range -4095 to +4095 (or, unless byte transfers are involved, and assuming the
contents of the base register are divisible by four, in the range -4092 to
+4092). In the case of a label, the label must be within 4092 bytes of the
instruction after the next, because the PC, which serves as the base register,
is two instructions (8 bytes) ahead of the current instruction.


\subsection{"BLOCK DATA TRANSFER" (i.e. multiple register transfer) instructions}

STM or LDM, followed by one of the eight addressing mode specifiers (IB, IA,
DB, DA, ED, FD, EA, FA), forms the opcode for these instructions. There are two
operands. The first is the register containing the base address for the
transfer; the second is either a single register (although in this case it would
probably be more appropriate to use a STR or LDR instruction instead) or a list
of registers enclosed in curly brackets. Within the list, registers are
separated by commas (for enumeration) or dashes (for sequencing), so R2-R5 and
R5-R2 both mean the same as R2,R3,R4,R5, or as R5,R2,R4,R3. The first operand
may be followed by '!' if the base register is to be incremented or decremented
by the size of the transfer (otherwise it is normally left unchanged). The
register list may be followed by '$\hat{ }${}' to force updating of the PSR
flags or a user bank transfer.
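
A typical use of these instructions is to save and restore registers around a subroutine body, which might be sketched as follows. The use of R13 as a stack pointer and of the FD (full descending) mode is a common ARM convention assumed here for illustration, not something this assembler imposes.

\small\tt \begin{verbatim}
   STMFD R13!,{R0-R3,LINK}  ; push R0 to R3 and LINK, updating R13
                            ; (body of the routine would go here)
   LDMFD R13!,{R0-R3,PC}    ; pop R0 to R3; loading PC from the saved
                            ;   LINK value returns to the caller
\end{verbatim}\rm  \normalsize 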


\subsection{"SOFTWARE INTERRUPT" instruction}

The SWI opcode (with condition specifier if appropriate) takes a single
literal operand.


\subsection{"COPROCESSOR" instructions}

These are not implemented.



\section{Coda}
Documentation dated 07/11/88

\vspace{.75in} assem:arm.doc printed on 14/03/89 at 15.27

\newpage
\tableofcontents
\end{document}