Strongly-typed dimensions in a programming language

I wrote this back in 2012 - it was sketched out as a design for a text-based calculator app but my real motive was to design a syntax that I could layer over C to add dimensioned units to my C programs. I didn't actually get around to implementing it, but ten years later I'm looking at it again and still not seeing any problems with the design.

However, I note that in the intervening years some small part of these features has made its way into C++ (from C++11 onwards).

Firstly, let me introduce the type system. Names introduced by 'type' are strong basic types.

You cannot assign any variable of one basic type to a variable of any other basic type.

type m;                  // a basic type, meters
type s;                  // seconds.
type g;                  // grams

We can have simple aliases, for instance:

type sec = s;
type secs = s;

These are actually a special case of derived types, which merely change the scale of a variable. For example:

type cm = m * 100.0;     // centimeters = 100 * meters

Now, we're likely to have many derived types that look similar, so here we introduce a mechanism to create a generic relationship between types. This may look like a C macro, but the semantics are stronger than simple macro substitution:

type k(x) = x / 1000.0;  // kg, km, etc

We instantiate the generic above like so:

type km = k(m);          // kilometers = meters/1000
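
As a rough illustration of how these declarations might be held internally, here is a minimal Python sketch. The names Unit, basic, scaled and to_base are invented for this example and are not part of the design; the scale is stored as "how many of this unit make one base unit", which matches cm = m * 100.0.

```python
from collections import namedtuple

# Hypothetical representation: 'per_base' is how many of this unit make one
# base unit (100.0 for cm); 'base' names the underlying basic type.
Unit = namedtuple("Unit", ["per_base", "base"])

def basic(name):
    """type m;  -- a strong basic type."""
    return Unit(1.0, name)

def scaled(unit, factor):
    """type cm = m * 100.0;  -- same basic type, different scale."""
    return Unit(unit.per_base * factor, unit.base)

def k(unit):
    """type k(x) = x / 1000.0;  -- the generic 'kilo' relationship."""
    return scaled(unit, 1 / 1000.0)

m = basic("m")
s = basic("s")
g = basic("g")
sec = secs = s            # an alias is a derived type with a scale of 1
cm = scaled(m, 100.0)
km = k(m)
kg = k(g)

def to_base(value, unit):
    """Convert a value written in 'unit' into its basic type."""
    return value / unit.per_base

print(to_base(250, cm), cm.base)   # 2.5 m
```

Because k is an ordinary function of a unit here, k(m) and k(g) share one definition, which is the point of the generic form.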

Our unit-aware calculator would work quite nicely with just basic units, because it will create the derived units itself when you combine basic units, as in this example:

time = 3s
dist = 90m
speed = dist/time
print speed
30 m/s

Note that we can just as happily assign a complex unit directly:

speed = 45 m/s

However sometimes we'll want to enter data as if it were a simple type when in fact it is a derived type. So if we don't want to say:

gravity = 9.81 m / s^2

we can say instead:

gravity = 9.81 mpss

which we allow by defining:

type mpss = m / (s * s)       // or m / s^2, or m * s^-2

This is a complex derived type comprising more than one basic type.

Here's the syntax for constants:

3s
3 secs
9.81mpss
9.81 m/s/s         (left associative, ie   (m/s) / s   )
9.81 m/s^2
9.81 m/(s*s)
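
To check that the different spellings really do mean the same thing, here is a small Python sketch of a recursive-descent parser for the unit part of a constant (the numeric prefix is left out). The function name parse_units and the dictionary form of the result (base type mapped to exponent) are assumptions made for the example, and there is no error handling for malformed input.

```python
import re

def parse_units(text):
    """Parse e.g. 'm/s^2' into {'m': 1, 's': -2} (base type -> exponent)."""
    tokens = re.findall(r"[A-Za-z_]\w*|-?\d+|[*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok == "(":
            dims = expr()
            take()                      # consume the closing ')'
            return dims
        return {tok: 1}

    def factor():
        dims = atom()
        if peek() == "^":
            take()
            power = int(take())
            dims = {name: exp * power for name, exp in dims.items()}
        return dims

    def expr():
        # '*' and '/' are handled in a loop, so they associate to the left.
        dims = factor()
        while peek() in ("*", "/"):
            sign = 1 if take() == "*" else -1
            for name, exp in factor().items():
                dims[name] = dims.get(name, 0) + sign * exp
        return {name: exp for name, exp in dims.items() if exp != 0}

    return expr()

# All four spellings normalise to {'m': 1, 's': -2}.
for text in ("m/s/s", "m/s^2", "m/(s*s)", "m * s^-2"):
    print(text, "->", parse_units(text))
```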

We have a few implicit rules.

Variables don't need to be declared; they're typed automatically depending on the value assigned to them. If you want to predeclare them, just assign them a value of 0 with the appropriate units, eg:

speed = 0 m/s

or if you want, you can omit the constant and just enter:

speed = m/s

Finally, there is a default type, "pure number", which carries no units. This is used in expressions, for example:

rod = 1.2 m
chain = 22 * rod

chain is of type 'm' and value 26.4

Now, with all this set up, here's what you *cannot* do!

You cannot assign a pure number to a typed variable.

You cannot assign a variable or constant of one type to a variable of another type.

An attempt to do either of these will result in an error.
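
A minimal Python sketch of how these two rules could be enforced follows. It assumes that the first assignment fixes a variable's type, and that a pure number is represented by an empty set of dimensions; the names assign, variables and UnitError are made up for the example.

```python
class UnitError(Exception):
    """Raised when an assignment would mix incompatible types."""

variables = {}   # name -> (magnitude, dims); dims == {} means a pure number

def assign(name, magnitude, dims):
    """Bind name to a value; the first assignment fixes the variable's type."""
    if name in variables:
        declared = variables[name][1]
        if declared != dims:
            raise UnitError(f"cannot assign {dims or 'pure number'} "
                            f"to {name}, which is {declared or 'pure number'}")
    variables[name] = (magnitude, dims)

assign("speed", 0, {"m": 1, "s": -1})      # speed = 0 m/s   (predeclaration)
assign("speed", 45, {"m": 1, "s": -1})     # speed = 45 m/s  (allowed)

for bad_dims in ({}, {"s": 1}):            # speed = 45   and   speed = 45 s
    try:
        assign("speed", 45, bad_dims)
    except UnitError as err:
        print("error:", err)
```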

With this, we can now use our strongly typed calculator. We can ask for an expression to be printed like so:

print distance

The print command may take multiple parameters, including literal strings:

print "It is " distance miles " from Dallas to Houston"

(literal strings do not take type modifiers)

We can modify print requests to scale units to a preferred size. For example:

type feet = m / 0.3048        // 1 foot = 0.3048 m
distance = 50 feet
print distance/2 km

This use of the type suffix is slightly different from previous uses. The expression is calculated as above using the base types, then the type suffix is examined to confirm that it is a simple unit scaling expression of the base type. (This may be recursive, eg if km are first defined in terms of cm, and cm in terms of meters)
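
The recursive check on the suffix might look something like the following Python sketch. It only covers single-unit suffixes such as km, not compound ones such as miles/hr. The table layout and the names definitions, resolve and show are assumptions for illustration, as is the figure of 0.3048 m for one foot.

```python
# Hypothetical table of scaled types: name -> (parent unit, factor), where
# "type cm = m * 100" is stored as ("m", 100.0). Basic types have no entry.
definitions = {
    "cm": ("m", 100.0),
    "km": ("cm", 1 / 100000.0),   # km defined via cm, to exercise the recursion
    "feet": ("m", 1 / 0.3048),    # assumes the international foot (0.3048 m)
}

def resolve(unit):
    """Return (base type, units per base unit), following definitions recursively."""
    if unit not in definitions:
        return unit, 1.0
    parent, factor = definitions[unit]
    base, parent_factor = resolve(parent)
    return base, parent_factor * factor

def show(magnitude, base, suffix):
    """Render a base-unit magnitude in the unit named by the print suffix."""
    suffix_base, per_base = resolve(suffix)
    if suffix_base != base:
        raise ValueError(f"cannot print a value in {base} as {suffix}")
    return f"{magnitude * per_base:g} {suffix}"

distance = 50 / resolve("feet")[1]     # distance = 50 feet, held in meters
print(show(distance / 2, "m", "km"))   # print distance/2 km  ->  0.00762 km
```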

Here are some examples:

type usec = s * 1000000
type G = m / s / s / 9.81
moon_gravity = .6G
drop_time = .03s
drop_height = .5 * moon_gravity * drop_time^2
print "an object which takes " drop_time usec " to fall on the moon must have been dropped from a height of " drop_height cm

type gal
tank = 12gal
used = tank/4
type miles = km * 5/8
trip = 200 km
consumption = trip / used
print consumption miles/gal
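
For reference, the arithmetic behind these two examples can be checked with plain Python floats. This takes 1 G as 9.81 m/s^2 and 1 mile as 1.6 km (the 8:5 ratio), keeps the example's figure of .6G, and holds every value in base units by hand.

```python
# Everything is held in base units (m, s, gal).
G = 9.81                      # m/s^2 in one G
moon_gravity = 0.6 * G        # the example's figure, 5.886 m/s^2
drop_time = 0.03              # s
drop_height = 0.5 * moon_gravity * drop_time ** 2   # m

print(f"{drop_time * 1_000_000:g} usec")   # 30000 usec
print(f"{drop_height * 100:.4f} cm")       # 0.2649 cm

tank = 12.0                   # gal
used = tank / 4               # gal
trip = 200 * 1000.0           # 200 km held as meters
consumption = trip / used     # m/gal
miles_per_m = (1 / 1000.0) * 5 / 8   # assumes 1 mile = 1.6 km
print(f"{consumption * miles_per_m:.2f} miles/gal")   # 41.67 miles/gal
```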

NOTE: All numbers are stored in terms of the base type. All the derived type information is lost:

type m
type km = m/1000
dist = 20 km
print dist
20000 m

So if you want an answer in a scaled unit, you have to ask for it by name:

print dist km
20 km

TO DO: user-defined functions, eg calculating the time it takes an object to drop as a function of height and gravity.

After writing the above, I accidentally (while searching for the formula for acceleration) stumbled across something remarkably similar: http://www.isr.umd.edu/~austin/aladdin.d/quantity-defn.html

Reading that caused me to add one more function:

There is a 'magic' function, magnitude(), which returns a pure number without type information. magnitude can be modified in the same way as a print statement, so magnitude(speed) defaults to the base type, but magnitude(speed) miles/hr first converts to mph and then returns the numeric value. (It is 'magic' because unlike any other function, it takes a parameter that can be of any type, and does not preserve that type information in its result. cf C's sizeof/typeof special functions.)

Similarly there is a string function units() which returns the type information of a variable as a printable string. This is useful if you have made a long and complex calculation and want to know what the units of the result are.

Printing a value is equivalent to printing its magnitude followed by its units.
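
A possible shape for these two functions, sketched in Python. A value is assumed to be a (number, dims) pair with dims mapping each base type to its exponent, and the optional per_base argument stands in for the unit suffix; both conventions are assumptions of the sketch rather than part of the design.

```python
def units(dims):
    """Format {'m': 1, 's': -2} as 'm/s^2'."""
    def power(name, exp):
        return name if exp == 1 else f"{name}^{exp}"
    top = [power(n, e) for n, e in dims.items() if e > 0]
    bottom = [power(n, -e) for n, e in dims.items() if e < 0]
    text = "*".join(top) or "1"
    if bottom:
        text += "/" + "/".join(bottom)
    return text

def magnitude(value, per_base=1.0):
    """Return the bare number, rescaled first if a unit suffix was given."""
    return value[0] * per_base

speed = (30.0, {"m": 1, "s": -1})          # 30 m/s, held in base units
MPH_PER_MPS = 3600 / 1600                  # uses the 1 mile = 1.6 km approximation

print(magnitude(speed), units(speed[1]))   # 30.0 m/s
print(magnitude(speed, MPH_PER_MPS))       # 67.5
```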

The description above is for a calculator and assumes a dynamic typing structure. It's really no extra effort to enforce "var x : type" declarations and limit statements to what can be statically compiled, so that this can be used within a 'traditional' programming language. We'll probably need that anyway, when we add procedures and functions. And as the Aladdin link above mentions, engineering calculations often need matrices (i.e. arrays), which should not be a problem to add.

Limitations: variables are scalar quantities of complex units. At the moment this cannot handle compound values, e.g. complex numbers or vectors (such as an angle/magnitude pair).

Internal representation: each number is a triple of a magnitude, a list of numerator types, and a list of denominator types.

For example, moon_gravity is a triple of 5.886; m; s, s
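
Here is a minimal Python sketch of that triple, with multiplication and division cancelling any type that ends up on both sides. The helper names make, multiply and divide are invented for the example, and sorting the lists is only there to give a canonical order.

```python
from collections import Counter

def make(magnitude, numer=(), denom=()):
    """Build the triple, cancelling types that appear on both sides."""
    n, d = Counter(numer), Counter(denom)
    return (magnitude, sorted((n - d).elements()), sorted((d - n).elements()))

def multiply(a, b):
    return make(a[0] * b[0], a[1] + b[1], a[2] + b[2])

def divide(a, b):
    # Dividing by b swaps b's numerator and denominator lists.
    return make(a[0] / b[0], a[1] + b[2], a[2] + b[1])

dist = make(90.0, ["m"])
time = make(3.0, ["s"])
speed = divide(dist, time)
print(speed)                      # (30.0, ['m'], ['s'])

moon_gravity = make(0.6 * 9.81, ["m"], ["s", "s"])   # the triple 5.886; m; s, s
fall = multiply(moon_gravity, multiply(time, time))  # the seconds cancel
print(fall[1], fall[2])           # ['m'] []
```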

(Question: is there anything which can't be handled with such a trivial numerator/denominator scheme? I suspect there may be several things that don't fit well...)

So, reasonably well specified now; all that remains is to implement it :-)

Extensions: reimplement using either rational numbers or arbitrary precision reals (or rational bignums, ie both?)

To add: when I was looking for this document recently after forgetting where I filed it (or posted it), I found several other dimensional systems (all bookmarked now). Need to add info about them here, with links.

The main thing I realised after first writing this is that it can be done as a layer on top of C, rather than as a stand-alone language or as a simple calculator. The only question I have to settle is whether it needs lexical or dynamic scope. If lexical, extensions should be done in a syntax similar to that of the C preprocessor. If dynamic, more information needs to be passed around at runtime; that would require parsing the C properly and outputting a modified program, rather than handling it 'on the cheap' with a simpler pre-processor.