Cryptograms - source code
This is a resource of the
wordgame-programmers@egroups.com
mailing list.
Click to subscribe to wordgame-programmers
Cryptograms, as published in newspapers and puzzle books, are a 1:1
encryption cypher, with the restriction that no letter may stand
for itself. Although there are many web pages on the net which
"help you solve" these, they generally are merely a substitute
for pen and paper, and offer no active guidance.
If you've never tackled a crypto quote puzzle, read this
introduction page or
this short tutorial first.
When you're ready to try one, here is our own
'Crypto-quote of the day'. Sometimes
there may be a short delay as the server fetches a new quotation
from another net site.
We concern ourselves here with code which solves these cyphers
for you. I had read about programs on the net which solve these
using genetic algorithms and recently I did indeed find one,
though it turns out the traditional methods work pretty well too.
I include here my own attempt to solve these problems
using letter patterns and brute force searching. When it works,
it's actually remarkably fast. When it doesn't work, it's
awful. This was experimental code and can't be considered working,
though with a little hand-holding it has solved many puzzles; however
Karl Dunn's code below based on roughly the same principles
does work and works very well.
It's interesting though - Karl's code will make a valid decode for
some cryptoquotes, but it's not the right one. It's obviously
not the right one to a human, but not to a program. For example:
- Ubty lzm vz dy xzq j kzg dyrtqadtu,
D rbdyn j vzzs rbdyv rz jen de dx
rbtl tatq oqtee pbjqvte.
maps to
- WHEN YOU GO IN FOR A JOB INTERVIEW,
I THINK A GOOP THING TO ASK IS IF
THEY EVER DRESS CHARGES.
It is clearly meant to be "GOOD THING" and "PRESS CHARGES".
A much tougher example of text that can be decoded in two completely
different ways is a Dual Cryptogram.
By the way, the best the genetic algorithm could do with the above, after
expending a lot more CPU than the classic tree search algorithm,
was
- When mob go in for a lox interview, I think a good thing to ask is if them ever press charges.
A nice improvement to such code would be to generate multiple
solutions rather than just one, and use some sort of parser -
or word frequency table - to pick the most likely one. Other
interesting developments could be: when Karl's code produces a
partial solution, feed it to the G.A. to improve it. (The G.A.
version 1.2 code specifically has a feature to accept already decoded
words).
Or when
neither can fully decode the text, do an exhaustive search
exchanging pairs of letters methodically, keeping any that
improve the solution, until no more improvements can be made.
This is not quite what the GA is doing, because the GA swaps
random pairs. This does all 26x26. A bad algorithm at the
start of the problem (it's exponentially expensive if you apply
it recursively), but maybe not too expensive on a mostly-solved
example that just needs 3 or 4 letters to fall into place, ie only
go two (26x26 * 26x26) or three (26x26 * 26x26 * 26x26) levels
deep. Whatever can be done in a second or two on a modern
computer.
The example solutions for this problem are an excellent demonstration
that a problem can be tackled in more than one way, because we have
a third solution below from Robert Muth that relies on trigram
frequencies to do the decryption.
Finally - for the standard 'crypto quote' puzzles - and I emphasise
the 'quote' part of the name - how about a specialist
dictionary of quotable people's names which can be used for the
attribution part, separate from the main text?
- cryptogram/
My partial solution using letter patterns, as described above. Some time
after initially writing that, and having had a chance to study some of the
code here and realise a neater way to structure the problem, I wrote
this: a subroutine to take a pattern such as DYRTQADTU and return a
list of possible matching words such as
alfridary
buildable
coradical
decapodal
decapodan
embracery
ethnogeny
impartial
inflatile
interview
metasomal
operatory
ophiuroid
ornithoid
overshort
overstory
recontrol
sclerosed
squamosal
tidewater
touristry
unfactual
unleagued
unleaguer
unprocure
unrescued
A more subtle version of the same code uses a wordlist which has
been precomputed according to the canonical word pattern, thus
restricting the search space considerably at the cost of much more Ram.
- kdunn/
The first good cryptogram solver I have found on the net is
that of Karl Dunn.
Karl was kind enough to release the sources of this for us, and a local
copy is here on the archive. His original sources
are here.
Although I haven't finished studying his code, I did note
one feature that might be worth working on.
He assumes the only 1-letter words
are A and I. If there are 1-letter words present that are not those
(O for the wings of a dove!) the program fails completely. Note
the general problem here is not the lack of the word "O" but the
backtracking failure that occurs when a 1-letter word is not found
at all. Cryptogram solvers must assume that some words in
the solution will not be present in their dictionary, and still
produce the best answer. This is a very minor niggle however, because
this is the best solver in the archive.
(FYI the genetic code also has problems with 1-letter words). Here is
Karl's writeup of how his program works.
- David Eppstein's solver
This is a recent addition to the archive, but I've put it near
the top instead of following the usual chronological order of addition
because it's a very good program that is worth studying. The
algorithm is consise, and does a few
minor tweaks that the others don't which seems to make a big
difference; it is heavily based on word-frequency probabilities,
and as a result gets the test example 100% right very quickly.
It's written in Java and downloads a large wordlist, so you need a
fast connection to try out the applet, but the source is short
and available.
- crypto_ga/
To contrast with the above two examples (which use basically the same
algorithm, although of course the second one above actually works
which is more than can be said of my own hack above), here we have
a genetic algorithm solution from John Escobar and Tom Kiehl.
Again, we keep a local copy in case it disappears off the net...
(Requires linking with galib)
- Ramkumar Natarajan's cryptogram solver (with sources)
On our example above this gives: When you go in por a job interview, I think a goof thing to ask is ip they ever dress charges.
- Genetic Programming
Genetic programming tutorial using crypto solver as an example. Very poor
performance compared to the others in this page, doesn't exactly stand out as
a poster boy for genetic algorithms. Source code in ADA included.
- crypto_hart/
I recently discovered this interesting discussion about substitution cyphers
at the Applied Cryptography Project. The author, Christopher Mooney, examines another cryptogram decoder by George Hart and
compares its performance to that of Karl Dunn's referenced above.
- cbw/
Crypt Breakers Workbench. A veteran Unix tool for cracking all
sorts of encryptions. Don't know if it specifically does these,
but it might do or might be made to do.
- Playfair Cypher
- Cryptquote
Cryptogram solver written in TCL. Biggest problem with this code is
that the initial guess is started off by making a frequency table
and assuming the most frequent letter is 'e'. Many cryptograms are
designed so that they do not follow standard 'etaoin...' frequencies;
indeed some have no "e"s at all.
- Cryptogram solver from John K Taber
This is mostly is Intel Assembler. From the large crypto
archive at funet.fi
- Solving cryptoquotes in parallel
No code, but a good writeup. This person's solution doesn't appear to
decompose well as a parallel problem, according to his notes. Perhaps
if he had written a good linear solution first he might have found
better ways to decompose it into a parallel solution.
- Simple Cipher Breakers
Robert Muth's decryptor, which primarily relies on trigram
frequencies.
Feeding it the test text from above makes it go away for 20 seconds
and then come back with either
"when you go in xor a lod interview i think a goof thing to ask
is ix they ever press charges" (apparently worked out using trigrams)
or "when mob go in xor a joy interview, i think a goof
thing to ask is ix them ever dress charges" (biasing the results with
scores using dictionary lookups).
- Crypto Cracker
No source here. CGI gives similar but not identical results to Muth's
code above. I suspect it's the same code with a different wordlist, or
at least another trigram-based decryptor.
- Crypto cracker written in Euphoria
- An online cryptogram solver at 1-across
The first time I tried this I was very impressed because it was
the only solver that got the example on this page 100%
right. I couldn't decide whether it was a general purpose
solver, or just has a large database of solved cryptograms.
A little tinkering showed that it was a proper solver, but the
weird thing was that on subsequent attempts, it never got it right
again. It seems to have a random element to its solutions, and never
gave the same answer twice.
- Some info on letter frequencies
- Decrypto v3.0
I haven't tried 3.0 yet, but 2.1 had big problems - it actually got within
about 3 letters of the test cryptogram above (none wrong, some missing) but
due to an incomplete dictionary never finished it, and at the end of the
run (which took ages) it simply said it could not find a solution, instead
of printing it's best effort. Proper homepage is http://edder.mit.edu/shareware/, however
the site was offline whenever I tried to access it.
Latest news: this software is up to Version 5 and is considerably improved.
It returns multiple decryptions, which for our test example included the
correct one (quite near the top of the list). It does suffer a little
in the user interface department (I was unable to get the Mac version to
run at all, but the PC one worked OK) but the author, Ed Olson, is now working on
a better user interface. The new home of the code is http://www.ravenousbirds.com/shareware/.
Even later news! :-) The author, Ed Olsen, has now written a
web
interface to Decrypto, and explains
the algorithm he uses. The Windows version (v7.1) has now moved here.
It's pretty fast and the results aren't bad...
When you go in for a mob interview, I think a good thing to ask is if they ever press charges.
When you go in for a lob interview, I think a good thing to ask is if they ever press charges.
When you go in for a box interview, I think a good thing to ask is if they ever press charges.
When you go in for a job interview, I think a good thing to ask is if they ever press charges.
When mob go in for a you interview, I think a good thing to ask is if them ever press charges.
When mob go in for a loy interview, I think a good thing to ask is if them ever press charges.
When mob go in for a lou interview, I think a good thing to ask is if them ever press charges.
When mob go in for a joy interview, I think a good thing to ask is if them ever press charges.
Found 8 solutions
CPU Time Used: 0.00 seconds
- Pointers to some more advanced decrypting tools
Vaguely related to this field are cryptarithms
and phone number words.
Return to the Archive overview.
Keywords: cryptograms, cyphergrams, cryptoquotes,
cryptogram, cyphergram, cryptoquote,
crypto-grams, cypher-grams, crypto-quotes,
crypto-gram, cypher-gram, crypto-quote,
caesar cypher, vignere cypher, rot13