CPS 100E, Fall 1999, Assignment 4

CPS 100E Fall 1999: Word Ladder

Due: Wednesday, November 24, by midnight

24 points

You'll write a program to find word-ladders: connections from one word to another formed by changing one letter at a time with the constraint that each time a letter is changed, the resulting string of letters is a word. For example, to turn stone into money, one possible ladder is (replace 't' by 'h', replace 'o' by 'i', etc.):

stone
shone
shine
chine
chins
coins
corns
cores
cones
coney
money

All of these words that make up the ladder can be found in a dictionary (or a data file specified by the user).

This assignment provides practice with pointers, lists, queues, and recursion.

Getting Started

Change into your cps100e directory and create a directory called assign4. Change into that directory and copy the following files from the professor's home directory using the Unix command cp (do not forget the trailing dot "."):

cp ~rcduvall/cps100e/assign/assign4/* .

If you type the Unix command ls you should see the files listed below.

Makefile, doladder.cpp, ladder.h, ladder.cpp

Additionally, two large data files are provided for you to test your program. So you can access these files without specifying a long path-name, create a link to the data files in your assign4 directory by typing:

ln -s ~rcduvall/data/knuth.dat
ln -s ~rcduvall/data/words5.dat

For this assignment, you'll be using a database of all the English five-letter words compiled by Don Knuth over the past thirty years. The file knuth.dat has 5,757 words in it as well as some extraneous information in it. Code to read this file is already written for you to use. The code ignores lines that begin with *, and only processes the first five characters on other lines. Knuth asks that the file not be altered, hence these restrictions.

There is also a smaller file of data (3,200 words). You can, of course, make your own data files (strongly recommended for debugging!): the format will be one 5-letter word per line.

Input/Output

Your program will find the shortest word ladder between two words entered by the user when running the program. There may be more than one shortest ladder, your program must find one, but not all such ladders.

Your program should:

Read and store the words from a file given by the user on the command-line (already done).
Prompt the user to enter two five-letter words, and output a shortest ladder from the first word to the second.
Repeat step 2 until one of the words entered by the user is not of length 5.

The following is a sample run of the program:

> doladder
Enter two 5-letter words (length != 5 to end): smart brain
smart
start
stark
stack
slack
black
blank
bland
brand
braid
brain
Enter two 5-letter words (length != 5 to end): angel devil
There is no path from angel to devil
Enter two 5-letter words (length != 5 to end): no more

Algorithm

A ladder is found by putting the starting word on a queue, then putting all words one letter away from this word on the queue, then putting all words two letters away on the queue, then all words three letters away, etc. As each word is taken off the queue, if the last (target) word is found the process can stop (there may be other words on the queue, but they'll be ignored). This process is guaranteed to find the shortest word ladder (please explain why in your README file).

A word w is not actually stored on the queue, but instead a pointer to a struct containing w is stored. The other fields of the struct are a pointer to the word that is one letter away from w and that caused w to be put on the queue (i.e., the word's predecessor in a potential ladder), and a boolean that records whether or not this word has been put onto queue (i.e., if the algorithm has yet visited this word). For example, if w is bears, then pointers to structs containing dears, fears, gears, beard (and so on) are enqueued with each struct pointing to bears since this word preceded the others and caused them to be enqueued. The first word does not have a predecessor, so its ancestor field is 0/NULL. The boolean field prevents a word from being put on the queue multiple times and potentially creating a circular ladder.

This is diagrammed below

Thus, store all of the words from the file in a vector of type LadderNode * (this is already done in Ladder::Load). The LadderNode struct is used to store the word from the data file, the previous LadderNode that it may be one letter apart from (this link is formed during the construction of the ladder), and a sentinel that marks that the node has already been considered on the queue. Thus after Ladder::Load has been called, the vector may look like the following:

The first word (entered by the user) is looked up in the list of words, and a pointer to the struct containing the word is enqueued. If the first word is not in the list, you should create a LadderNode struct containing the first word and add it to the queue (as if it had been in the list of words). Once you have dealt with the special case of adding the first word onto the queue (because it has to be a special case is why you can fake that the first word was in the list already), then repeat the dequeue/enqueue process below.

Dequeue an element (it is a pointer).
Find each word one letter apart from the dequeued word. If one of these is the target word, you're done (or if one of the words is one apart from the target word you're done).
Otherwise, if each of the words one apart is not the last word, enqueue the word if it hasn't been enqueued before (you can use the IsVisited member function in a LadderNode to determine if a word has been enqueued before --- initially all nodes are not on the queue and that function returns false). This means each word is enqueued at most once.

When the target word is derived, you will need to print out the ladder from the first word to the target word. The ancestor pointer in the LadderNode stores information that will allow the ladder to be recreated, you may need to use recursion or a vector since the ladder will be backwards (but should be printed properly). Instead, you could try to search backwards, so to find a ladder from smart to brain search from brain to smart.

It should also be possible to find target words that are not initially in your list of words by using the same trick, of creating a new temporary LadderNode, that you did for the first word.

Coding

The code provided you reads a set of words from a user specified file (either on the command line or via prompt) and prompts the user for two words to check if they can form a word ladder. It then calls some empty member functions of the class Ladder. You must develop the class Ladder that has been started for you, by filling in the public member functions and perhaps adding more private member functions.

You must implement the public functions, Find and Print, as specified in the Ladder header file. You may implement as many other private member functions as you need. A private member function is a helper function for other member functions, but should not be available to be called by the user of the class. Making a helper function private ensures that only other member functions can access the helper function, but client programs cannot.

Use these helper functions liberally, because points will be taken off of your program for each function you write that is longer than twenty lines (and even more for each additional five lines above fifteen). You get in the habit of writing small, single purpose, functions that are easier to write and thus easier to debug.

Additionally, you should not include any "magic" values in your program. If you must use magic values, make a constant and refer to the constant within your code. Again, points will be taken off for each magic value found within your code.

You should use the standard templated queue class documented here.

Submitting

When your program compiles and produces the correct output, you should create a README file for this and all assignments. In this file, include your name, the date, and an estimate of how long you worked on the assignment. You must also include a list of names of all those people (students, professor, TAs, tutor) with whom you consulted on the assignment. See the rules for collaboration in the CPS 100e syllabus. Finally you must explain why the algorithm described in this handout is guaranteed to generate the shortest word ladder.

To submit your assignment, type:

submit100e assign4 README doladder.cpp ladder.h ladder.cpp

You should receive a message telling you that the program was submitted correctly.

To see that files were submitted correctly, you can use the submit100e command with just the assignment name and it will list the files submitted. That is, type submit100e assign4

Extra Credit

Adding the following functionality will earn you extra credit.

Stack (3 points)
Replace the queue with a stack and discuss (in your README file) the differences in behavior between the stack and queue versions of the program.
Good Matches (6 points)
Write a new version of the Ladder class. This program should process the words so that only "good" matches are tried when ladders are found. The preprocessing step will take a long time, but word ladders will be found very quickly.
The idea is that for each word, all words one-letter away are determined (and stored somehow) when the words are loaded. When looking for candidate words to enqueue, only words that are one-letter away (these are already known) are checked for previous use. This saves searching through the entire list of words and checking whether each is one letter away.

To submit the extra credit assignment, use the name assign4X.