cwc is a crossword compiler that I wrote in the summer of 1999. The algorithm is inspired by the work of Sik Cambon Jensen, but improved in a few places. The crossword compiler takes a word list and a pattern as input and tries to fill the words from the word list into the grid formation resulting in a filled crossword grid. cwc does not generate clues.
Request for help: I would like to write an addon to cwc to format crosswords with clues (human input) for printing. I have tried outputting LaTeX code, but LaTeX won't hyphenate words in \parbox'es and similare constructs (needed for the danish crossword formatting style where clues are placed inside the grid cells). If you have a suggestion on how to solve this or easily output another format, please write to me at larsch@cs.auc.dk.
Note: cwc requires GNU Make, g++ and the Standard Template Library to compile. Version 1.0.1 has been succesfully compiled and run on Linux, Solaris and HPUX.
In brief cwc works by searching the complete search tree for solutions to the grid. However, the search is optimized in order to avoid really trying all possible solutions. When a letter is to be placed in a specific cell and no solutions are present according to the current letters in the grid, the compiler "backtracks", i.e. removes previous choices in order to allow for a now letter at the present cell.
Several backtrack schemes can be used. The simplest will simply backtrack to the most recent filled cell and continue from there. However, replacing the letter in that cell may not lead to a new solution at the present cell at all. Therefor cwc backtracks only to cells that the present cell ``depends'' on. That is, cells in the same words. Changing them may lead to a new solution.
The algorithm cwc uses is procedural and uses a static walking heuristic. It works letter by letters and thus tends to be slower than word by word algorithms but for complex grids, the algorithm is just as fast or even faster.
One of the improvements I made was an optimization of the backtracking algorithm. Sik describes an "optimal backtrack choice", but this choice can only be used for the first step (else, the compiler risks ignoring possible solutions). By remembering all possible backtrack choices and reusing these choices in subsequent backtracking situations the "optimal backtrack choice" can always be used. (See Sik's thesis for explanation of these terms).
The second most important improvement I made was to remember reuse the letter in a cell from a previous attempt if the cell is not a backtracking cell (the cell is backtracked past). This tends to speed the compilation process up a lot when working on grid where the algorithm tend to work in different areas at once. Effectively, if the algorithm fails in one of the areas, attempts in the other areas are reused even thought they are backtracked beyond.
Finally, I spend quite a lot of time profiling and optimizing the program. This showed to pay off because of two reasons. First the algorithm uses the dictionary a lot (actually, 95% of the compilation time is spend in the dictionary). Second, I have written cwc in C++ and due to the object orientation, the responsibility is heavily distributed and thus, certain functions (like constructors for simple objects) tended to be activated millions of times. Inlining these constructors resulted in a heavy speedup.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.