Substitution Cipher: Algorithm
The basic algorithm of the substitution cipher breaker
can be characterized as hillclimbing with several
random starting points. Our "landscape" consists of all permutations
(substitutions) of the 27 characters allowed in the cipher-text.
We will distinguish three phases:
Phase 1: Picking a Starting Point
Originally we picked an arbitrary permutation of the
alphabet. However, it proved to be better to start with
a permutation that is "close" to the optimal permutation
according to a single letter (monogram) frequency analysis.
By close we mean that the permutation can be obtained by a few
transpositions (swaps) of neighbouring elements from this optimal
permutation.
Phase 2: Climbing the Hill using Trigram Scoring
Giving a permutation to start with and a
scoring scheme based on trigrams that
assigns values to permutations. We now try to improve the permutation.
That is we try to modify the permutation slightly so that the score
of the modified permutation is higher than the score of the original
one. We do so until we cannot improve the permutation anymore.
As "slight modifications" we only consider transpositions.
Phase 3: Climbing the Hill using Dictionary Scoring
This phase tries to fine tune the permutation that was the result
of phase 2. It also employs hillclimbing but uses a different
dictionary based scoring scheme.
Phase 1 to 3 are repeated several times. The permutations that had the highest
scores are saved.
Somtimes the dictionary optimization (phase 3) does more harm
than good especially if there are many words in the text that are not in the
dictionary. We therefore save not only the permutatation that had the highest
dictionary score but also the one that had the highest trigram score.
Last Update: 97/06/20