Non-ASCII characters are represented using HTML entities. These include £ for pound (currency).
USING IMP
An informal introduction
Issue 1.1
Peter S. Robertson
Lattice Logic Ltd. 1986
This document is intended as an informal introduction to the IMP
language for people with a general understanding of the ideas
and concepts of programming. It introduces sample programs or
program fragments and discusses various features of the language
which have been used. Also included are comments on why things
are done the way they are and the benefits and disadvantages of
the choices which were made during the development of the
language. For a more formal and detailed description of the
language please refer to "The IMP Language".
Throughout the text the IMP language will be referred to simply
as IMP.
Part of the philosophy of IMP is to provide convenience forms of
the standard constructions which, if used with care, can greatly
improve the readability of programs. These convenience forms
need never be used as the standard forms will always work.
First of all, here is a trivial program which prints out:
Hello there.
%begin {simple first program}
Printstring("Hello there.")
newline
%end
%end %of %file
It would be a good idea at this stage simply to get this program
into your machine, exactly as written above, and attempt to
compile and run it. This will give you some feeling for how
easy (or more likely how difficult) it is to generate the
necessary incantations to get the operating system to do
anything.
If the compiler produces error messages then you have probably
mistyped the program. If the program is exactly the same as
given here then the fault lies outwith the scope of this
document.
Even though this program is so simple, it illustrates many of
the features which give IMP its 'flavour'.
In the example above the text between braces (the curly brackets
{ and }) is considered to be a comment and along with the braces
will be ignored by the compiler. If the closing brace is
omitted the compiler will assume that there should have been one
immediately before the end of the line, so if you have a comment
which extends over several lines, each line must start with an
opening brace.
While it is considered good programming practice to include
apposite comments, their use will be minimised in this document
as they would probably distract the reader, especially as the
programs will be described in the text for it is the program
statements themselves and not the algorithms they implement
which are of interest.
1
Next, some 'words' start with a % character and some do not.
The reason for this is to divide the 'words' of a program into
two totally distinct categories: KEYWORDS and IDENTIFIERS.
KEYWORDS start with a % and 'belong' to the compiler, while
IDENTIFIERS do not start with a % and 'belong' to the
programmer. When writing in IMP, if an identifier is more
easily understood with spaces in it, put them in.
For example, our original program could be rewritten:
%begin
print string("Hello there.")
new line
%end
%end %of %file
The % character in keywords is often thought of as underlining
the keyword, because when writing programs on paper it is much
faster to underline than to write a percent sign. The exact
definition of the effect of percent is that it underlines
everything following it stopping at the first character which is
not a letter. Hence %end %of %file could equally well be
written %endoffile.
There are absolutely no restrictions on what characters can
appear (or not appear if they are invisible) inside quotes; if
your editor or operating system will let you put characters in
the program the compiler will accept them. The only problem
might be the effect of these characters on output devices if
listings are generated from the compiler. In particular there
is no need for the obscure 'escape sequences' of some languages
to include spaces, newlines, tabs or whatever into a program.
Some programmers may prefer to rewrite the program as:
%begin
printstring("Hello there.
")
%end
%endoffile
The disadvantage of allowing newlines inside quotes is that if
the closing quote is omitted the compiler will remain in text
mode and suck in the rest of the program, eventually giving a
fault such as 'String too long' or 'Input ended'.
In practice this is never much of a problem, especially as the
compiler will mark line numbers in the listing file to show
whenever it is still in text mode at the start of a line:
1 %begin
2 printstring("Hello there.
3" ")
4 %end
5 %endoffile
2
IMP requires that statements are terminated by a newline or a
semicolon; newlines are not ignored. The only time a semicolon
is needed is if you want two or more statements on the same
line:
%begin
printstring("Hello there."); newline
%end
%end %of %file
If you want to break a statement over several lines each line
break must be preceded by a hyphen (which is otherwise ignored),
or must come after a comma or the keywords %AND or %OR.
%begin
print string -
("Hello there.")
newline
%end
%end -
%offile
The question of where statements start and finish is one of the
more obscure parts of IMP and does cause some difficulty for
beginners, but this is learned as one becomes more familiar with
the language.
Now for a program which is a little more adventurous:
%begin {program to add two numbers}
%integer First, Second, Sum
read(first) {input first number}
read(second) {input second number}
sum = first+second
write(sum, 0) {output the sum}
newline
%end
%endoffile
This program is made up of one block (the bit between %begin and
%end).
Within the block is a declaration (%integer) and imperative
statements (read ..., sum = ..., etc.). Within each block in a
program declarations must come before imperative statements.
When a block is executed the declarations cause various objects
(integers, reals etc.) to be created and given identifiers by
means of which they can be referenced. When the block is left
any such objects are destroyed and the identifiers loose their
meaning.
The program above creates three objects which can hold integer
(whole number) values and calls them: first, second, and sum.
Such objects are commonly called variables. These variables may
3
be given values and subsequently the values they have been given
may be retrieved for furthur use. Note that if an attempt is
made to take the value of a variable before any value has been
given to it the program will signal an error (unassigned
variable). This check that variables have been given values
catches one of the most frequent programming bugs (at least in
the experience of the author), but sadly very few languages
bother with it. Some languages give every new variable the
value zero (or its equivalent). This often lets programs
stagger on and fail long after the point at which a particular
variable should have been initialised, or worse, the program
just produces a credible but wrong answer. The more common
approach is for new variables to be left with whatever rubbish
is lying about in the memory, giving rise to programs which run
sometimes and fail at other times depending what was happening
previously.
The statement:
Sum = First + Second
is an assignment which computes the value of the expression on
the right hand side of the equals sign and assigns that value to
the variable on the left hand side of the equals sign. The
expression can be as simple or as complex as you like but it
must give a result which is the same type (sort of thing) as the
final destination. For example as First and Second are integers
their sum must be an integer, and hence may be assigned to the
integer variable Sum.
However, the division operator (/) always gives an answer which
is real (fractional) and hence the assignment:
Sum = First / Second
would be faulted by the compiler as Sum cannot hold real values.
If an integer result is required as the result of dividing two
integer values, the integer division operator (//) must be used.
This performs the division and then discards any remainder.
The definition of IMP gives and number of operators which may be
used to form expressions (add, ,subtract, multiply, etc.) and
also defines how the expressions are to be evaluated, hence
A+B*C means 'multiply B by C and then add in A' but not 'add A
to B and multiply the answer by C'. It is to be strongly
recommended that wherever there might be the slightest confusion
don't be clever; use brackets to make the meaning obvious, i.e.
write A+(B*C) or (A+B)*C. A very common mistake is to write
A/2*B intending to get A/(2*B) but in fact getting (A/2)*B.
READ, WRITE, and NEWLINE are examples of routine calls, and the
things in brackets after READ and WRITE are parameters. NEWLINE
has no parameters and so is not followed by any brackets (some
languages would insist that the call be followed by empty
brackets: NEWLINE(), perhaps consistent but definitely
irritating).
4
Note the difference in the parameters in READ(FIRST) and
WRITE(SUM, 0). Because READ inputs a value from outside the
program and assigns it to the variable given as a parameter,
FIRST in this case, it is not the value in FIRST which is passed
to the routine but the object FIRST itself. Such a parameter is
said to be passed by reference. Conversely, the parameters to
WRITE (SUM and zero) represent the value to be output (SUM) and
the minimum number of characters to be output (0), and hence it
is the values which are important. These parameters are said to
be passed by value. Whether parameters are passed by reference
or by value depends on how the procedure was defined; this will
be discussed later.
As the call to WRITE only requires the value of SUM as its first
parameter, and that value just happens to be First+Second the
program could be rewritten:
%begin
%integer First, Second
read(first)
read(second)
write(first+second, 0)
newline
%end
%endoffile
Now for a program which introduces conditional statements. It
simply reads in two numbers and outputs the relation between
them.
%begin
%integer X, Y
Read(X); Read(Y)
%if X > Y %start
Printstring("The first was larger")
%finish %else %if Y > X %start
Printstring("The second was larger")
%finish %else %start
Printstring("They were equal")
%finish
Newline
%end
%end %of %file
This subject causes the most difficulty for beginners, mainly
because it is different from most other languages. The first
point to note is that lines starting %if..... and %finish......
are all complete statements and must be terminated by a newline
or a semicolon. Secondly the bits between %start and %finish
may contain as many statements as you like, including more
conditional statements. Thirdly, the %else %if clause may be
repeated as often as you wish (including zero times).
5
Finally, if there is nothing to be done in the case when none of
the previous conditions has been satisfied then the final
%finish %else %start may be omitted altogether.
This is the most general form of conditional structure and will
always work. The difficulty results from the fact that
conditions occur so frequently and are usually so simple that
use of the general form can be like cracking a nut with a
sledgehammer. Consequently IMP provides convenience forms
which, if used with care, can lead to more readable programs.
It is the firmly-held opinion of the author that extra effort
spent in organising and writing programs is well worthwhile;
programs are usually only written once but read many times.
Therefore all the emphasis should go in attempting to make the
program readable and its logic clear. Demanding that the
standard form be used everywhere does not help to make code more
understandable.
The first simplification is for the trivial case where something
simple is to be done if a condition is true. This could be
written:
%if [condition] %start
Do something simple
%finish
but a simpler and more readable variant is:
Do something simple %if [condition]
after all, that was almost the form of words used to describe
the problem in the first place!
The second change again follows from everyday English usage.
The effect of the condition may be inverted by changing the
keyword %if into %unless. This should be used with care as
while it can make code clearer, misuse can make them very
unclear:
Average = Total/Number %unless Number = 0
is clear enough, but what about:
%unless %not 1 <= N <= 9 %or M # 23 %start {??????}
The final form is just a way to remove some of the wood so that
the trees can be seen. Any statement starting with the keyword
%FINISH and ending with the keyword %START may be rewritten with
both of those keywords omitted.
6
Hence the original program could be written in what we think is
a more understandable form:
%begin
%integer X, Y
Read(X); Read(Y)
%if X > Y %start
Printstring("The first is larger")
%else %if Y > X
Printstring("The second is larger")
%else
Printstring("They were equal")
%finish
Newline
%end
%endoffile
IMP does not see conditions (things like X > Y) as being
'expressions' which give a boolean value, consequently IMP does
not have boolean variables. Instead, IMP sees conditions as
questions, so instead of the statement:
%if A = B ......
being thought of as 'does A=B have the value TRUE' it is thought
of as 'is A equal to B'. This may seem a trivial point but it
can have can have a major effect on the language.
For example, in IMP the statement:
Do something special %if A = B %and C = D
means exactly what most people would understand by the English
statement formed by removing the percent signs. Pascal, for
example would insist on having brackets round the components
(A=B), (C=D).
Trying to keep as close to common English usage does help to
make a language more readable (within limits) but it can cause
problems when English is ambiguous. For example, what is the
exact meaning of 'Bring me an apple or a pear and an orange'.
Many computer languages resolve the ambiguity of AND and OR by
means of precedence rules which bear no relation to everyday
usage: AND is done before OR, the analogy being that AND =
MULTIPLY and OR = ADD. Rather than leave the possibility of
getting this wrong (and complex conditions are difficult enough
anyway) IMP resolves the ambiguity by refusing to accept both
AND and OR in the same condition unless the meaning is made
clear with brackets. Hence the fruity example would become:
'Bring me (an apple and a pear) or an orange' or
'Bring me an apple and (a pear or an orange)'.
7
The next program will take in a sequence of 'words' (character
sequences delimited by spaces or newlines) and count them. The
program stops when it finds the word "%file" and so it can use
itself as input for test purposes.
%begin {program to count words}
%string(63) Word
%integer Number of words
Number of words = 0
%cycle
read(word)
Number of words = Number of words+1
%repeat %until Word as = "%file"
printstring("There were ")
write(Number of words, 0)
printstring(" words")
newline
%end
%end %of %file
Before getting into the main ideas in this program it is worth
discussing the routine READ in a little detail. You may have
noticed that in this example READ is given a string variable as
its parameter while the previous example gave it an integer
variable. This looks suspiciously like the non-standard
procedures which are commonly used to perform input/output
operations. Non-standard means that (taking Pascal as an
example) although READ and WRITE look like ordinary procedures
because they can take parameters of almost any type they cannot
be defined in Pascal, as all user-defined parameters must have a
fixed type. In IMP this is not the case as there is a 'general
type' reference parameter which will accept a variable of any
type. The functions ADDR, SIZE OF and TYPE OF are available for
making use of such parameters.
In the same way that %integer introduces variables which may
hold integer values, %string introduces variables which may hold
string values, where a string is just a sequence of up to 255
characters. Any string variable has three properties: the
number of characters it currently contains (its LENGTH), the
maximum number of characters is can contain (its MAXIMUM
LENGTH), and the actual characters themselves. Whenever a
string variable is declared the maximum length of the string
must be specified (63 characters in this example), and must be
an integer in the range 1 to 255 inclusive. The reason for this
limitation is so that the length of a string can be held as a
sort of invisible character at the start of the actual
characters. While this is not guaranteed to be the way in which
strings will be implemented, to the knowledge of the author no
compiler handles them differently. The only place where the
knowledge that a string of maximum length N will take up N+1
characters-worth of storage is when the function SIZE OF is
used. SIZEOF(WORD) will return the value 64, which is one
greater than the maximum length of the string.
8
A frequent gripe against IMP is 'strings limited to 255
characters are useless'. There is no question that strings of
any length would be ideal, but in practice the limit is not
often a problem, and IMP strings are a lot more powerful and
convenient (and useful) than no strings at all or the highly
restrictive strings offered by the more common languages.
The main point of the example is to introduce cycles (or loops).
In general any sequence of statements may be repeated by
enclosing them in the statements: %CYCLE and %REPEAT. Note that
%CYCLE and %REPEAT are statements and must be terminated (by a
newline or a semicolon). If no furthur action is taken the
cycle will continue indefinitely, so some means must be provided
for terminating the loop. IMP provides one general mechanism
and three 'syntactic sugarings' of common cases. The general
mechanism is to use the instruction %EXIT, execution of which
causes the loop to be terminated and control to pass to the
statements following the corresponding %REPEAT. %EXIT can only
be used to terminate one loop at a time; it cannot take you out
of nested loops in one go.
The three common cases are provided by adding conditional
clauses to either the %CYCLE or the %REPEAT statement, and they
are:
Simple form Expanded form
%while [condition] %cycle %cycle
......... %exit %unless [condition]
%repeat .........
%repeat
%cycle %cycle
......... .........
%repeat %until [condition] %exit %if [condition]
%repeat
%for V = F, B, L %cycle Temp1 = B; Temp2 = L
......... V = F-Temp1
%repeat %cycle
%exit %if V = Temp2
V = V+Temp1
.........
%repeat
Apart from simplicity there are no reasons why you shouldn't
just stick to the %exit form of loops in all cases if you find
it easier. In fact if the loop stops in the middle there is
little choice other than contorting the program to force it into
a %while or %until form.
9
In a direct parallel to the %if statement, IMP provides
convenience forms of the %while, %until, and %for loops when the
'body' of the loop is very simple. E.g.
Buy something %while Money left # 0
X = X*10 %until X > 100
Visit(Patient) %for Patient = First, 1, Last
Beware that %while always does the test before the action which
may not be executed at all, whereas %until always does the test
after performing the action at least once.
The next program counts the number of letters, digits, and other
characters in a piece of text, but instead of detecting the end
of the text by looking for a special data item, it just waits
for the event 'Input Ended' to be signalled instead.
%begin {program to count letters and digits}
%constant %integer Input Ended = 9
%integer Sym, Letters, Digits, Others
%on %event Input Ended %start
Printstring("There were ")
Write(Letters, 0); Printstring(" letters, ")
Write(Digits, 0); Printstring(" digits, and ")
Write(Others, 0); Printstring(" other characters")
Newline
%stop
%finish
Letters = 0; Digits = 0; Others = 0
%cycle
Readsymbol(Sym)
%if 'A' <= Sym <= 'z' %or 'a' <= Sym <= 'z' %start
Letters = Letters+1
%else %if '0' <= Sym <= '9'
Digits = Digits+1
%else
Others as Others+1
%finish
%repeat
%end
%endoffile
The statement %on %event Input Ended %start can be thought of as
a sort of condition which is never satisfied. This means that
when the %on statement is reached control will always pass by
the bit between %start and %finish (the 'Event block') and carry
on from after the %finish (or the %else if one is specified).
However, if during the execution of the rest of the block the
event (or events) mentioned in the %on statement is signalled,
control is immediately passed to the first statement of the
event block and execution continues from there. Any block may
contain an %on statement, but there may only be one in a block
and it must come immediately after the declarations (if there
are any). In general it is not possible to resume execution
from the point at which the event was signalled.
10
The exact definition of this event mechanism is a little more
complicated but this description should be enough for the time
being.
This program demonstrates the use of the single quote to provide
the internal value (an integer) corresponding to any character.
For example in the ASCII character set the constant 'A' is
indistinguishable from the constant 65. Again, there are no
limitations on the characters which can be placed between single
quotes, so that ' ' is the value of a space (32 in ASCII) and '
' is the value of a newline (10 in ASCII). Because the newline
constant can make it awkward to read a program, the named
constant NL is available as a substitute.
The three tests of SYM are of interest as they are examples of
'double-sided conditions'. Effectively 'A' <= Sym <= 'Z' is an
abbreviation for the condition: 'A' <= Sym %and Sym <= 'Z'.
This sort of condition is very useful for testing for ranges of
values as in this example. Note that the example assumes that
there are no 'holes' in the character set and that the letters
and digits are in sequence (beware of EBCDIC!).
In an earlier example the standard routine READ was used to
input 'words' which were then counted. This program made the
assumption that no word contained more than 63 characters.
However, the implementation-provided routine READ cannot know
this and will attempt to input words of any length. If it
encounters a word of 64 characters or more it will quite happily
take it in and then fail with 'string overflow' when trying to
assign the value to its parameter.
11
The next program overcomes this problem by redefining the
routine READ so that it simply truncates any words with too many
characters.
%begin {program to count words revisited}
%string(63) Word
%integer Number of words = 0
%routine Read (%string (63)%name Text)
%integer Sym
Text = ""
%cycle
readsymbol(Sym)
%return %if Sym = ' ' %or Sym = NL
Text = Text . To String(Sym) %unless Length(Text) = 63
%repeat
%end
%on %event 9 %start {end of input}
Printstring("There were ")
Write(Number of words, 0)
Printstring(" words")
Newline
%else {first entry comes here}
%cycle
Read(word)
Number of words = Number of words+1
%repeat
%finish
%end
%endoffile
The definition of the routine READ is a block, nested within the
%begin-%end block of the main program. Whenever an identifier
is declared in a block it remains available for use throughout
the rest of that block and any blocks which are subsequently
nested within it. The only exception to this is the case of
labels which are visible in the block in which they were defined
but are not visible in any nested blocks. Labels will be
discussed in a later section. If an identifier is redeclared
within a nested block, that definition effectively masks out
access to the outer definition. The result of all this is that
in the sample program references to READ will access the
newly-defined procedure and not the standard procedure, which
along with all standard procedures is defined in a sort of
'super block' which contains the whole program.
In brackets following the procedure identifier, READ, is a list
of declarations which define the number and type of the
parameters which must be given at each call. The %NAME suffix
specifies that the parameter is to be passed by reference, that
is, in this example any reference to TEXT within the procedure
will be exactly equivalent to a reference to the string variable
actually given in the call.
12
Execution of the routine is terminated whenever the instruction
%RETURN is executed. For convenience the %END of a routine is
considered to be an abbreviation for %RETURN; %END. Note that
this is only true for routines, other types of procedure must
have explicit terminating statements or the compiler will flag
an error (RESULT missing).
Three new features of IMP are illustrated by the statement:
Text = Text . To String(Sym) %unless Length(Text) = 63
The dot is the only operator available to form string
expressions and indicates concatenation. Concatenation is
simply the joining-together of the two string operands to give a
new string. The number of characters in this new string will be
the sum of the numbers of characters in the original strings,
not the sum of their maximum lengths; magic padding characters
are never inserted.
The identifier TO STRING refers to a standard function which
takes as its parameter an integer expression which must give a
value corresponding to a character value (commonly any value in
the range 0 to 255 inclusive). The result is a string value of
length one character, that character being the value of the
parameter.
For example, the following two assignments both give the same
value to the variable Text:
Text = "H"
Text = To String('H')
Once again, note the difference between "H" which is a string
value and 'H' which is an integer value.
The identifier LENGTH refers to a standard function which takes
as its parameter a reference to a string variable and returns as
its result the number of characters currently contained in the
string.
In passing it is worth commenting on the initial assignment to
Text:
Text = ""
The double-quotes contain no characters, that is, a string of
length zero; this is termed a null string.
Before leaving the routine READ itself it should be clear that
the definition as given will accept string variables as
parameters only if they are defined to have a maximum length of
exactly 63. In the program in question this is no limitation,
but if the procedure were wanted in other contexts it would be
useful if it could accept strings of any maximum length.
13
To specify this the maximum length of the parameter definition
should be changed to a star:
%routine Read(%string(*)%name Text)
%integer Sym
Text = ""
%cycle
readsymbol(Sym)
%return %if Sym = ' ' %or Sym = NL
Text = Text . To String(Sym) -
%unless Length(Text) = Size of(Text)-1
%repeat
%end
Also the reference to 63 has been changed to Size of(Text)-1.
'Size of' returns the size of the storage allocated to the
variable given as its parameter. Remember that the -1 is needed
as there is a one character overhead in strings to hold the
current length.
Finally, there are two minor points about the complete program.
The first is the declarations
%integer Number of Words = 0
which may loosely be thought of as a contraction of:
%integer Number of Words
Number of Words = 0
although strictly the initialisation is performed when the
variable is created, but this is only significant when %OWN
variable are concerned. %OWN variables will be discussed later.
The second is the use of an %ELSE clause with an %on %event
statement. The %else clause is executed when control first
reaches the %on statement, and is skipped if the %on clause is
executed following the signalling of a suitable event,
The following program demonstrates the definition and use of a
function. In addition it shows the method by which constants
can be given identifiers by means of the %constant declaration.
Such identifiers may be used wherever their value may be given
as a literal constant. This provides a convenient way of
parameterising a program so that it is easier to read and
change.
14
The example also uses the CHAR NO map which returns a reference
to the N'th character in the given string variable. An error
will be signalled if Char No attempts to access characters which
are not within the string, i.e. if N is outwith the range 1 to
Length(First parameter) inclusive.
%begin {program to test capitalisation function}
%routine Read Line(%string(*)%name Line)
%integer Sym
Line = ""
%cycle
Readsymbol(Sym)
%exit %if Sym = NL
Line = Line.Tostring(Sym)
%repeat
%end
%string(127)%function Capital form of(%string(127) Who)
%constant %integer Shift = 'A'-'a'
%integer Up, N, Sym
%for N = 1, 1, Length(Who) %cycle
Sym = Char No(Who, N)
Sym = Sym-Shift %if 'A' <> Sym <= 'Z' {to lower case}
%if 'a' <= Sym <= 'z' %start
%if Up # 0 %start
Sym = Sym+Shift {to upper case}
Up = 0
%finish
%else
Up = 1
%finish
Char No(Who, N) = Sym
%repeat
%result = Who
%end
%string(80) Line
%cycle
Read Line(Line); %exit %if Line = ""
Printstring(Capital form of(Line))
Newline
%repeat
%end
%endoffile
This time the declaration of the parameter for the procedure
does not end with %NAME and so the parameter is passed by value,
that is when the procedure is called an ordinary string variable
(Who) is created and it is assigned the value of the string
expression given as the parameter to the call. The use of a
star for the maximum length of a string value parameter is not
permitted.
In the same way that %RETURN terminates the execution of a
%ROUTINE, the instruction %RESULT=..... terminates the execution
of a %FUNCTION. The expression following the equals sign must
produce a value of the same type as the function (a string value
in this case).
15
A common feature of languages which have been influenced by
FORTRAN or ALGOL 60 is that they return the result of functions
by means of a rather unpleasant pun on the function name. The
mechanism is roughly that the use of the function identifier on
the left hand side of an assignment specifies the 'result so
far' but does not terminate the function. Eventually when the
END of the function is reached the 'result so far' is returned
as the actual result. IMP does not (indeed cannot) permit this
as apart from the generally unpleasant nature of the pun and its
associated difficulties, there would be no way of returning the
result of a %MAP, as use of the map's identifier on the left of
an assignment would quite naturally be seen as a recursive call
on the map!
Within a function there may be as many %RESULT= statements as
you wish; the execution of any one of the will immediately
terminate the function.
For example, the following function returns the ordinal number
of an upper-case letter, when that letter is from the EBCDIC
character set. The difficulty is caused by the fact that EBCDIC
has 'holes' between the letters; their values are not
consecutive as they are in ASCII.
%integer %function Ebcdic letter number(%integer Sym)
%constant %integer Ea = 16_C1, Ei = 16_C9,
Ej = 16_D1, Er = 16_D9,
Es = 16_E2, Ez = 16_E9
%result = Sym-Ea+1 %if Ea <= Sym <= Ei
%result = Sym-Ej+9 %if Ej <= Sym <= Er
%result = Sym-Es+18 %if Es <= Sym <= Ez
%result = 0
%end
The equivalent function for ASCII would be:
%integer %function Ascii letter number(%integer Sym)
%result = Sym - 'A' +1
%end
Note the form of the constants defining the EBCDIC values the
the letters A,I,J,R,S,Z. The 16_ specifies that the following
constant is expressed in base 16 (hexadecimal). In such
constants the letters (upper or lower case) represent the
'digits' 10 (A), 11 (B) , 12 (C) etc. The notation is quite
general and any base greater than one can be specified. For
example: Octal is 8_77715, Binary is 2_010101110 and base
seventeen is 17_ABCDEFG. The notation may also be used for real
constants. This is especially useful when the limit of accuracy
is required as putting the constant into the base used by the
machine can give more accuracy than expressing it in decimal.
For example PI could be defined as:
%constant %long %real PI = 16_3.243F 6A89
16
Now for a program using real variables.
%begin {reals}
%real %function Compound interest(%real Capital, Rate,
%integer Years)
%real Balance
%integer Years Left
Balance = Capital
Years Left = Years
%while Years Left > 0 %cycle
Years Left = Years Left-1
Balance = Balance+(Balance*Rate/100)
%repeat
%result = Balance-Capital
%end
%real Cap, Rate
%integer Time
Read(Cap); Read(Rate); Read(Time)
Printstring("The interest on £"); Print(Cap, 0, 2)
Printstring(" at "); Print(Rate, 0, 2)
Printstring("% per annum for")
Write(Time, 0); Printstring(" years is £")
Print(Compound interest(Cap, Rate, Time), 0, 2)
Newline
%end
%endoffile
The function Compound interest takes three parameters, two real
and one integer, all of which are passed by value. Again, this
means that the function creates three variables and copies into
them the values given in the call. Apart from this
initialisation there are absolutely no differences between
parameters and other variables declared in a procedure.
The program uses the two standard output routines WRITE and
PRINT to generate its output. WRITE outputs an integer value
using its second parameter to control the minimum number of
characters output. PRINT outputs a real value using the second
parameter to control the minimum size of the part before the
decimal point, and its third parameter to control the actual
number of places printed after the decimal point, if the third
parameter of print is zero the decimal point and the fractional
part of the number are not output.
So far, all the variables declared inside blocks have been
destroyed when the execution of the blocks terminated. In
several cases it is convenient for procedures to be able to
exist in different 'states', that is to remember what they did
last. This could be achieved by using variables declared
outside the procedures (global variables) but then there would
be no protection against other procedures altering those
varaiables (perhaps as the result of a typing error). This is
where %OWN variables are useful. An %OWN variable is identical
17
to equivalent non-%OWN variables in every respect except that
they effectively always exist, at least as far as the program
which declares them is concerned. This means that they are not
created and destroyed like other variables but exist throughout
the execution of the whole program. As they are not destroyed
when control passes from a procedure their values will be
retained and will be available for use on subsequent calls of
the procedure. However, access to the identifier of an %OWN
variable is still limited to the block in which it was declared
and to blocks subsequently defined within that block.
%constant %integer FF = 12
%routine Print and Suppress(%integer Sym)
%own %integer Previous = FF
%return %if Sym = FF %and Previous = FF
Previous = Sym
Printsymbol(Sym)
%end
This routine is intended for use in place of PRIMTSYMBOL when
the output is to be sent to a printer and consecutive form feeds
(FF) are to be suppressed. The %OWN variable PREVIOUS is used
to remember the character last output and to stop printing
consecutive ones.
Unlike the initialisation of non-%OWN variables the statement:
%own %integer Previous = FF
does not mean:
%own %integer Previous
Previous = FF
as this would set PREVIOUS to FF every time the routine were
called. Rather, the initialisation is performed when the
variable PREVIOUS is created, which is effectively when the
program containing the routine starts execution.
The basic data types provided by IMP are integer, real, and
string. There are several ways in which these types may be used
to create more complicated objects. The first of these is by
means of record variables.
A record is a variable which is made up from a collection of
other variables. The collection of other variables is described
using a %RECORD
%FORMAT declaration. This defines the type, order, and
identifiers of the components and gives an identifier, the
format identifier, to the complete collection.
18
The format identifier can then be used to create objects with
the internal structure described by the %FORMAT declaration:
%record %format Person(%string(63) Surname, Prename,
%integer Age,
%real Weight)
%record (Person) Fred
This defines the variable FRED to be a record containing two
string variables, one integer and one real variable. To extract
a particular variable from a record just follow the reference to
the record by an underline followed by the identifier of the
required component variable. For example:
Fred_Prename = "Frederic"
Fred_Surname = "Chopin"
Printstring(Fred_Prename." ".Fred_Surname)
Records may be used in the same ways as the other sorts of
variable discussed previously, in particular they may be passed
as parameters, by value or be reference, and may be the results
of record functions and record maps. The only operations
available on records as complete entities are to copy one into
another of the same format (Rec1=Rec2) or to set the complete
record to zero (Rec=O).
%recordformat Coordinate(%real X, Y)
%real %function Distance between(%record(Coordinate) Point1, Point2)
%external %real %function %spec Sqrt(%real Arg)
%real Dx, Dy
Dx = Point1_X - Point2_X
Dy = Point1_Y - Point2_Y
%result = Sqrt(Dx^2 + Dy^2)
%end
This program fragment defines a function which operates on
points in a two-dimensional plane, and returns the distance
between two of them. The parameters to DISTANCE BETWEEN are
records passed by value. The user should take care with record
value parameters as they require that the whole record be
copied, not very expensive in this particular case, but with
large records the overhead can be considerable. For this reason
records are more commonly passed by reference, even though they
will not be altered by the procedure. The function makes use of
another function, SQRT, which returns the square root of its
parameter. This function is not defined in this program but
will exist at run-time in some other module. However, as all
identifiers must be declared before they can be used, this
information must be presented to the compiler. This is the
function of the %external statement. The %external keyword
tells the compiler that the identifier about to be declared is
to be made available to the environment outside the program
(commonly a linker of some sort). The %spec keyword tells the
compiler that this declaration is a specification of an
identifier which is not actually being defined by this
19
statement. In other words the complete declaration tells the
compiler what sort of thing SQRT is (a longreal function with
one longreal value parameter), and that it is defined somewhere
in the external environment of the program. For the program
containing the %spec to be able to run, there must be a module
somewhere which defines SQRT.
This module could be written in IMP and the complete module
would look something like:
%external %longreal %function Sqrt(%longreal Arg)
..........
..........
%result = ......
%end
%endoffile
This definition of Sqrt must match the specification statement
used to reference it. In fact they are identical with the
exception that the definition does not contain the keyword
%spec.
The external mechanism is not just limited to procedures but may
be used with variables. For example a module could define an
external record into which various modules can place data:
%record %format Things(%integer Number of washers,
%string(255) the saying of the day,
%real rate of inflation)
%external %record(Things) Useful rubbish
%endoffile
%external declarations like this also give variables the
properties of %OWN variables. If a program or another module
wishes to access this record it just declares it with an
external specifications
%begin
%recordformat Things(%integer Number of washers,
%string(255) the saying of the day,
%real Rate of Inflation)
%external %record(Things) %spec Useful rubbish
Printstring(Useful rubbish_the saying of the day)
newline
%end
%endoffile
20
Now you should be able to see that there is nothing magic about
the routines NEWLINE, PRINTSTRING, WRITE etc. They are just
external routines which the compiler automatically %SPECs for
you. In effect the compiler starts each compilation by
compiling a special file which contains statements like:
%external %routine %spec Newline
%external %routine %spec Write(%integer Value, Places)
%external %routine %spec Print(%real V, %integer B, A)
%external %integer %fn %spec Length(%string(*)%name S)
%external %byte %map %spec Char No(%string(*)%name S, %integer N)
.........
.........
%endoffile
Another way of creating more complex objects is to gather
together a number of objects of the same type as an array.
%begin {counting letters}
%integer %array Times('A':'Z')
%integer Sym, J, N
%on %event 9 %start
%for J = 'A', 1, 'Z' %cycle
N = Times(J)
Printstring("There ")
%if N = 1 %start
Printstring("was ")
%else
Printstring("were ")
%finish
write (N, 0)
Printsymbol(J); Printstring("'s")
newline
%repeat
%stop
%finish
Times(J) = 0 %for J = 'A', 1, 'Z'
%cycle
Readsymbol(Sym)
%if 'A' <= Sym <= 'Z' %start
Times(Sym) = Times(Sym)+1
%else %if 'a' <= Sym <= 'z'
Times(Sym-'a'+'A') = Times(Sym-'a'+'A') + 1
%finish
%repeat
%end
%endoffile
21
This program creates an array TIMES with 26 elements:
TIMES('A'), TIMES('B') ..... TIMES('Z') and uses it to
accumulate the number of times each letter (upper or lower case)
appears in the input.
As a general point about efficiency, the expression -'a'+'A' has
the value -32 but it is much clearer to write it in the form
given rather than as the magic value -32. Perhaps even better
would be to define a %constant %integer with the value -'a'+'A'.
In all of these cases though, the compiler will generate the
same machine code so there is nothing at all to be gained by
calculating such constant expressions and obscuring the program
with wierd and wonderful values. In the same way absolutely
nothing is gained by using 65 instead of 'A' where that is
meant, in fact legibility and perhaps character-set independence
is lost by doing it.
Unlike Pascal and FORTRAN the bounds of arrays need not be
constants; it is quite common for a program to calculate the
size of arrays needed and then create them dynamically. The
only restriction is that all arrays must have a non-negative
number of elements. In other words the upper bound (the second
one) minus the lower bound (the first one) plus one must not be
negative.
%integer %array A(1:0) {is valid}
%integer %array B(2:0) {is not}
Consistently in the definition of IMP repetitions of zero or
more times are always valid whereas negative repetitions are
not. Does anyone know what it means to execute a loop -1 times?
Do you do it backwards once?
The program also illustrates the use of a %for clause to
initialise all of the elements of an array. This is a place
where IMP is a little weak; it would not be difficult to permit
simple operations on complete arrays as a direct parallel to the
operations on complete records viz. copying and initialisation.
This could be included as part of the continuing evolution of
the language.
Records and array may be combined, that is you can have arrays
of records and arrays within records, although any arrays inside
record formats must have constant bounds and be one-dimensional.
The following program uses an array of records. It also uses a
specification of a routine which is defined in the same block.
This is to enable the routine to be used before is is actually
defined. Whether you put routines first and then use them, or
put specs first with the routines at the end of the program is
purely a matter of taste; in general it has no effect on the
efficiency of the program.
22
%begin
%constant %integer Max Items = 100
%recordformat Inf(%string(63) Word, %integer Occurred)
%record(Inf)%array Item(l:Max Items+1)
%string(63) Word
%integer Items in = 0
%constant %string(3) End Mark = "*E*"
%routine %spec Add word to table
%cycle
Read(word)
%exit %if Word = End Mark
add word to table
%repeat
%for J = 1, 1, Items in %cycle
printstring(Item(J)_Word)
printstring(" occurred ")
write(Item(J)_Occurred, 0)
printstring(" time")
printstring("s") %if Item(J)_Occurred # 1
newline
%repeat
%routine Add word to table
%integer P
{insert the word provisionally}
Items in = Items in+1
Item(Items in)_Word = Word
Item(Items in)_Occurred = 0
{now look for it}
%for P as 1, 1, Items in %cycle
%exit %if Item(P)_Word = Word
%repeat
{remove it if duplicated}
Items in = Items in-1 %if P # Items in
Item(P)_Occurred = Item(P)_Occurred+1
%end
%end
%endoffile
It has been mentioned previously that parameters can be passed
by reference, that is a reference to a variable can be assigned
to the parameter rather than the actual value of that variable.
Variable which can hold references to other variables are called
'pointer variables' and they may be declared and used like other
variables.
23
For example, the procedure "Add word to table" described above
could be rewritten:
%routine Add word to table
%integer P
%record(Inf)%name New, Old
{insert the word provisionally}
Items in = Items in+1
New == Item (Items in)
New_Word = Word
New_Occurred = 0
{now look for it}
%for P = 1, 1, Items in %cycle
Old == Item(P)
%repeat %until Old Word = Word
{remove it if duplicated}
Items in = Items in-1 %unless New === Old
Old_Occurred = Old_Occurred+1
%end
The assignment Old == Item(P) assigns to Old a reference to
Item(P). This can be thought of as making Old 'point at'
Item(P). It is important to realise that it is the current
value of P which is used in the sense that it after the ==
assignment P is altered Old will still point at the same element
of the array Item.
Except when being used as the left hand side of an ==
assignment, use of a pointer variable is exactly equivalent to
the use of the variable to which it is pointing. For example
given:
%integer X = 0, Y = 1
%integer %name N
N == X
The assignment Y = N is equivalent to Y = X, N = 3 is equivalent
to X = 3, M == N is equivalent to M == X, and Read(N) is
equivalent to Read(X).
Now the mechanism of passing parameters can be understood a
little more clearly, parameters passed by value are assigned
using = and those passed by reference are assigned using ==.
Pointer variables may be used as components of records as the
following program fragment shows. It is part of a program which
manipulates lists of cells, with each cell using a pointer
variable to point at the next one. The record NULL is a dummy
record used to mark the end of the lists.
24
%begin
%record format Cellfm(%integer Data, %record(Cellfm)%name Link)
%record(Cellfm)%array Cells(1:Max Cells)
%record(Cellfm) Null
%record(Cellfm)%map New Cell
%own integer Last = 0
%signal 14,1,Last %if Last = Max Cells {none left}
Last = Last+1
%result == Cells(Last)
%end
%record(Cellfm)%map Copy of(%record(Cellfm)%name List)
%record(Cellfm) Head
%record(Cellfm)%name End, Cell
End == Head
%while List ## Null %cycle
Cell == New Cell
Cell_Data = List_Data
End_Link == Cell
End == Cell
List == List_Link
%repeat
End_Link == Null
%result == Head_Link
%end
%record (Cellfm)%map Reversed copy of (%record (Cellfm)%name List)
%record(Cellfm)%name New, Cell
New == Null
%while List ## Null %cycle
Cell == New Cell
Cell_Data = List_Data
Cell_Link == New
New == Cell
List == List_Link
%repeat
%result == New
%end
.......
The example also shows the use of %result == to give the result
of a %map. This is an exact parallel to functions where
%result = is used to return a value, while in a map %result ==
is used to return a reference. The compiler will report a fault
if an attempt is made to use %result= in a map or %result== in a
function.
25
The next to functions demonstrate the use of the standard
function SUB STRING to split a string into fragments:
%integer %function Index(%string(255) Data, Pattern)
{returns the index of the first occurrence of Pattern
{in the string Data. Zero is returned if the pattern
{cannot be found}
%integer Chars Left, Here, Limit, Len
Len = Length(Pattern)
Limit = Length(Data)-Len+1 {limit of search}
Here = 0
%while Here < Limit %cycle
Here = Here+1
%result = Here %if -
Sub String(Data, Here, Here+Len-1) = Pattern
%repeat
%result = 0
%end
%routine Insert Today(%string(*)%name Line)
%integer Pos
%string(255) Before, After
Pos = Index(Line, "*DATE*")
%if Pos # 0 %start {found}
Before = Sub String(Line, 1, Pos-1)
After = Sub String(Line, Pos+6-1, Length(Line))
Line = Before . Date . After
%finish
%end
SubString simply returns as its result the string made up of the
sequence of characters between and including the characters at
the positions specified by the second and third parameters.
E.g. Substring("123456", 2, 4) = "234". If the third parameter
is equal to the second a string of length 1 is returned, while
is the third parameter is one less than the second a null string
is returned.
Date is a standard string function which returns the date in the
system-standard format. Similarly there is a function Time
which does the same thing for the time of day.
As the operation programmed in the previous example is quite
common IMP provides a unique instruction for doing it. This is
termed 'string resolution' and looks like a backwards sort of
assignment using ->.
%routine Insert Today(%string(*)%name Line)
%string(255) Before, After
%if Line -> Before .("*DATE*"). After %start
Line = Before . Date . After
%finish
%end
26
The string expression in brackets is evaluated and the string
variable on the left (Line) is searched for that value. If the
value is found the characters to the left are assigned to the
variable to the left of the bracket (Before), and those on the
right of the pattern are assigned to the variable on the right
(After). A string resolution instruction has the strange but
useful property that it can either be used on its own as an
instruction, or, as in the example, it can be used as a
condition. When so used the success of the resolution satisfies
the condition and the implied assignments are carried out (so
beware, this condition has a side-effect!). If the resolution
fails, that is the pattern cannot be found, the condition is not
satisfied and no assignments are performed. When a resolution
is used as an instruction failure causes an event to be
signalled (Resolution fails).
Either or both of the variables outside the brackets may be left
out in which case the corresponding fragments of the original
string are simply discarded. Hence the condition:
%if S -> (Rude Word) %start
asks the question: 'does the string S contain within it the
string contained in Rude Word?'.
Single-dimensional arrays of constants may be declared:
%external %string (3) %function Month(%integer N)
%constant %string (3) %array M(1:12) =
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Mov", "Dec"
%result = M(N) %if 1 <= N <= 12
%result = "???"
%end
%endoffile
Procedures may be textually nested inside other procedures:
{1} %routine Print Hex(%integer N, Width)
%integer Places
%routine Hex Digit(%integer D)
%if D <= 9 %then Printsymbol(D+'0') -
%else Printsymbol(D-10+'A')
%end
%return %if Width <= 0
%for Places = (Width-1)*4, -4, 0 %cycle
Hex Digit ( (N>>Places)&15 )
%repeat
%end
27
{2} %routine Print to Base(%integer Base, N)
%routine Print Digit(%integer D)
%if D <> 9 %then Printsymbol(D+'0') -
%else Printsymbol(D-10+'A')
%end
Print to Base(Base, N//Base) %unless N < Base
Print Digit(Rem(N, Base))
%end
The second example (Print to Base) demonstrates that procedures
in IMP may be recursive, that is they may be defined in terms of
themselves. This routine will output any positive integer to
any base greater than 1, although the output will be a bit odd
for bases greater than 36.
28