From: "Jean-Charles Meyrignac" To: Subject: Word square Date: Thu, 2 Mar 2000 00:08:23 +0100 Hi ! I've just discovered your page and mailing list, and I am very interested into word squares. Some years ago, I wrote some programs to search these squares, and I personally found the following 8*8 french square, using an Atari ST 8Mhz: PAPELARD ANOMALIE POLIMENT EMINENCE LAMENTER ALENTOUR RINCEUSE DETERRER I used a small wordlist containing no plural nor conjugation. Bernard Yqueaux found a double 8*8 wordsquare. I recently found the french OSD2 wordlist (comparable to OSPD wordlist, but in french) and would like to extend the search using a distributed computation. Since I have started a distributed project -check http://euler.free.fr/ -, I would like to explore all types of squares with one or more holes in it (I'm not sure about the term 'hole', in french we use 'case noire', that means 'black square'). The goal will be to find crosswords with the minimal number of holes on every size. For example, it will compute 8*8 with no hole, or 8*9 with 0 or 1 hole, etc... The user will have to choose the size of the crossword and the number of holes. My own program to compute wordsquares uses a different approach from yours. I didn't know the DAWG algorithm at that time, so I used fast lookup tables. I'm currently analyzing your program, and already found one major improvement ! (In fact, I used a trick I found on my previous program !). Also, my program performs a search using a bottom-up approach using a frequency-sorted wordlist. In french, the order of the most frequent letters are ESANTIRULOC, etc... I used this order to define a new alphabet, where E is replaced by 1, S by 2, etc... Then I reverse the words, and I sort them. My method is VERY efficient for finding quickly a FIRST solution. Of course, I'll mail you as soon as possible with more explanations. Jean-Charles Meyrignac Date: Thu, 2 Mar 2000 11:35:24 -0600 (CST) From: Graham Toal To: jean-charles.meyrignac@vnumail.com Subject: Re: Word square > Hi ! Salut! > I've just discovered your page and mailing list, and I am very interested > into word squares. > > Some years ago, I wrote some programs to search these squares, and I > personally found the following 8*8 french square, using an Atari ST 8Mhz: > > PAPELARD > ANOMALIE > POLIMENT > EMINENCE > LAMENTER > ALENTOUR > RINCEUSE > DETERRER > > I used a small wordlist containing no plural nor conjugation. > Bernard Yqueaux found a double 8*8 wordsquare. > > I recently found the french OSD2 wordlist (comparable to OSPD wordlist, but > in french) and would like to extend the search using a distributed > computation. Yeah, I found that one too! Have you also found ODS3? :-) (It's out there somewhere; it was a little harder to find. Actually I forget where I found it. I think it was embedded in someone's free scrabble program.) > Since I have started a distributed project -check http://euler.free.fr/ -, I Hmm. I can't find anything there abour word squares. > would like to explore all types of squares with one or more holes in it (I'm > not sure about the term 'hole', in french we use 'case noire', that means > 'black square'). The goal will be to find crosswords with the minimal number > of holes on every size. > For example, it will compute 8*8 with no hole, or 8*9 with 0 or 1 hole, > etc... > > The user will have to choose the size of the crossword and the number of > holes. > > My own program to compute wordsquares uses a different approach from yours. > I didn't know the DAWG algorithm at that time, so I used fast lookup tables. > > I'm currently analyzing your program, and already found one major > improvement ! (In fact, I used a trick I found on my previous program !). Excellent. I'm looking forward to hearing how that works. (Even better, to getting some code back from you :-) ) > Also, my program performs a search using a bottom-up approach using a > frequency-sorted wordlist. > In french, the order of the most frequent letters are ESANTIRULOC, etc... > I used this order to define a new alphabet, where E is replaced by 1, S by > 2, etc... > Then I reverse the words, and I sort them. > My method is VERY efficient for finding quickly a FIRST solution. Yep. I did consider that, but since I wrote the program in order to find 10x10 squares - and there may only be 0 or 1 of them - I didn't think it was worth the effort to find the first solution faster because that only works if there are a lot of solutions. Also for the smaller squares, it tended to find something pretty fast anyway. (Not in seconds perhaps, but I wasn't generating them interactively like on a web page so I wasn't too bothered by that either) > Of course, I'll mail you as soon as possible with more explanations. I'm looking forward to it! can I suggest that you subscribe to the wordgame-programmers amiling list, and post the info there rather than as personal mail to me alone? It's not a very high-volume mailing list but there are a few people there who would definitely be interested. Thanks Graham From: "Jean-Charles Meyrignac" To: Subject: Optimisation Date: Thu, 2 Mar 2000 20:13:46 +0100 Hi ! I've finally tested my optimisations. The program now runs 2 times faster than before, except that the output takes a lot of time. You'll find attached the 2 versions, the first one is yours, and the second one is the modified version. What trick did I use ? Well, that's very simple, since there are 26 letters in most of the alphabets, I can store letters from 'A' to 'Z' on 26 bits, instead of 26 characters. The following code does the conversion: 1 << (character & 63) This way, 'A' will become 2, 'B' will become 4, etc... Thus, there is no more strings in the program ! Jean-Charles PS: Here are the benchmarks: 4*4: 3718565 solutions found in 110 seconds with your program, and 54 secs with my version 5*5: 314261 solutions in 586 seconds with yours and 266 secs with my version. Date: Thu, 2 Mar 2000 16:36:40 -0600 (CST) From: Graham Toal To: jean-charles.meyrignac@vnumail.com Subject: Re: Word square > > PS Did you look at http://www.gtoal.com/athome/scrabble/laeuter/ > > which is some other improvements to the wordsquare program by Martin > Laeuter? > > > Hum... > It seems that his code generator is written in Perl language, and I have no > access to a Perl interpreter (I'm using Win32, and the Activeperl package is > a little bit too huge for my modem). > > Can you send me an output for 6*6 square ? It's in http://www.gtoal.com/athome/scrabble/laeuter/generated-6.c How can Perl be too large for your modem when you just copied 250Mb of word games sources! Surely Perl isn't *that* bloated??! :-) G To: Date: Sun, 5 Mar 2000 18:15:07 +0100 From: "Jean-Charles Meyrignac" Subject: [wgp] Grid filling Hi ! I'm new to this list, and would like to speak about some wordgames. The first one is a grid filling contest that is held by a french game magazine. This game is called 'Maxiscore' and the purpose of this contest is to fill a grid with letters, and these letters are weighted. As an example, here is the last contest: we had to fill the following grid (I used 0 and 1, since they are fully aligned, and the cases that are represented by 0 cannot contain a letter): 1111110111111 1010100010101 1011111111101 1010101010101 1111101011111 1010001000101 1011111111101 0001001001000 1111101011111 1001001001001 1011111111101 1001010101001 1111110111111 with the following letters: N weighting 5 O weighting 6 C weighting 4 T weigthing 6 A weighting 4 M weighting 2 B weighting 1 U weighting 3 L weighting 1 E weighting 2 The goal of this game is to fill the grid ONLY with these letters, and we have to reach the highest possible score. For example, the word TONTON gives us 6+6+5+6+6+5=34 points. The highest score achieved in this contest has been 541 points reached by 575 people ! With my own program, I reached only 539 points :-( I used a list of 1277 words (the contest disallows conjugations and plural and the reuse of words, and some other restictions). I computed some days for this contest, but due to mis-preparation, I didn't have the time to finish my computations. Also, it was a lot slower than expected ! My question is: is there a method which can achieve the best score in a given time ? The second problem is also about grid filling. In the 'Science & Vie' number 781 dated October 1982, a man called Patrick Vuillaume published some interesting grids. Of course, some of these grids are classic, like the 8*8 wordsquares, but others are not. For example: REAMORCER ENDOCARPE ADEN BAIS MONOTONES OC TATERA RABOTEURS CRANEUSES EPIERRERA RESSASSAS (this time I haven't been able to align the words). This is, to my knowledge, the 'most filled' 9*9 square in french, there are only 2 holes. In the article, there is also one 10*10 with 4 holes, one 12*10 with 8 holes. Since I'm french and as I'm currently running a distributed project (http://euler.free.fr), I intend to start a distributed crossword project via Internet. The goal will be to fill grids of any size with the minimum number of holes. For example, 10*10 with 0 or 1 hole, 11*10 with less than 2 holes. Thanks to Graham Toal, I found the laeuter program which is able to compute symetric wordsquares, and I'm pretty sure that this program can be extended to solve my problem. Can anybody help me with some documentation about Liang's algorithm ? Thank you. Jean-Charles Meyrignac To: , "Jean-Charles Meyrignac" Date: Sun, 5 Mar 2000 10:46:46 -0700 From: "Gordon" Subject: Re: [wgp] Grid filling Jean-Charles, I recall a researcher in Australia proving that solving similar puzzles was NP-Complete in the early 1990s. I cannot recall his name, nor eaxctly how similar the puzzles were, so I will attempt to dig through my files when I have time (hopefully, within the next few weeks). The implication of NP-Complete is that no algorithm can be guaranteed to find the optimal solution within realistic time constraints as the size of the puzzle increases. This, of course, does not mean that some algorithm might not find optimal solutions for sufficiently small puzzles or do a good job of approximation to optimality for large puzzles. I hope this helps put your problem in perspective, Steven Gordon Date: Sun, 5 Mar 2000 14:10:21 -0600 (CST) From: Graham Toal To: SAGordon@primenet.com Subject: Re: [wgp] Grid filling > The implication of NP-Complete is that no algorithm can be guaranteed to > find the optimal solution within realistic time constraints as the size of > the puzzle increases. This, of course, does not mean that some algorithm > might not find optimal solutions for sufficiently small puzzles or do a > good job of approximation to optimality for large puzzles. One thing to bear in mind algorithmic complexity is usually described in terms of time only, but almost always it is actually a product of time and space with the space factor usually omitted because it is constant. However if you have an algorithm that is say X^2 in time, it might be possible to make it linear in time but X^2 in space. I mention this because the time for these big square problems can be days or weeks, but the space is usually trivial. It may make sense to throw huge amounts of ram or disk at the problems (for any given size) and bring the time down considerably. For example: lets say you have a 9x9 square with just one hole in it, in the center, such that on those two rows you could place two 4-letter words. Well, instead of having a more complex algorithm to handle the special case, just generate a set of new 9-letter words which is really the combination of all 4-letter words with a space character literally inserted between then. The disk space for this new list is O(w(4)^2) where w(4) means the number of words of length 4. For each shape you can add a similar list (6+2, 5+3, 3+5, 2+6). There will be some point when this ceases to be feasible because of the size of the word lists, but if you don't reach that point, it could be a winner. Then just run the regular 9x9 word square program. Because of the trimming of the search by the dawg data structure, the run time should not be excessively greater, but the space requirements of the dawg are indeed much greater. Perhaps, however, within the limits of your system. (This might all be unnecessary; for all I know it might be trivially easy to generate the cross product of say the 5 and 3 letter lists on demand while running the search, without too great a programming overhead... I've never tried it.) G To: wordgame-programmers@onelist.com Date: Wed, 29 Jan 1958 11:31:00 -0600 (CST) From: Graham Toal Subject: Re: [wgp] Grid filling I messed up my mailer today so here are two replies that I'm reposting - G. ------------------------------------------------------------------------ "Jean-Charles Meyrignac" wrote: > This game is called 'Maxiscore' and the purpose of this contest is to fill a > grid with letters, and these letters are weighted. The wordsquare programs you've been looking at are highly optimised for the specific subset of problems which are simple squares. The generic problem is equivalent to that of generating an arbitrary crossword. You might find the crossword programs in the archives are a better place to start than the word square programs. I don't think they're as efficient but they are much more general. See for example: http://www.gtoal.com/athome/scrabble/cross/ http://www.gtoal.com/athome/scrabble/cword/ Both of these crossword generators allow seeding of the crossword grid with chosen words, which they will then work around. It occurs to me that a possible heuristic for generating high-scoring grid fills (assuming that an exhaustive search is intractible) would be to find the highest scoring play for each position, then place as many of those that are compatible with each other (even if this means placing only a small subset of words which don't interact), then run a recursive filling search on the remaining board. A "greedy algorithm". I'm sure the obvious greedy algorith (place the highest scoring word first, then recurse) has already been tried? Did it run into hill-climbing issues? I also wonder if the standard solution for hill-climbing problems might be applicable here: simulated annealing? I.e. get a solution any way you can, then randomly remove any word from the grid. Try a higher-scoring word, and backtrack out from that word removing words which don't intersect, replacing them with words which do, to form the smallest replacement that you can which is compatible. Keep ones which are better. > Thanks to Graham Toal, I found the laeuter program which is able to compute > symetric wordsquares, and I'm pretty sure that this program can be extended > to solve my problem. I believe Martin Laeuter is still a member of this list. > Can anybody help me with some documentation about Liang's algorithm ? I am 99% certain that Liang's Stanford thesis on hyphenation which explains the packed trie data structure very well is *not* online anywhere. You will almost certainly have to contact Stanford for a copy. It would be a service to the net to scan it in if you get one. I did get a copy many years ago when I worked in the TeX field but that was before scanners were common, and I'm afraid I've mislaid it since then. If I ever find it I'll scan it in myself, if no-one else has by then. The thesis paper is: Liang, F. M. Word hy-phen-a-tion by com-put-er / by Franklin Mark Liang. Word hyphenation by computer. Thesis (Ph. D.)--Stanford University, 1983. I don't know where Liang would be contactable by email for permission to reproduce part of his thesis, though perhaps his Stanford tutor, Don Knuth, or Knuth's secretary might know. (I think I heard that Knuth doesn't handle email much himself any more) I could have sworn I remember writing this algorithm up myself, but if I did, I didn't save it anywhere convenient :-( Actually trie vs dawg isn't critical to these problems. It may give a factor of 2 in speed, but space is the same. 2x speedup may not be significant here. G PS Folks: It's real nice to see that the effort to share code is starting to pay off! If anyone here still has some word game sources that they're willing to release to the archives, please email me. Don't worry if it's not your prettiest code and you'd rather wait until you tidy it up - people really don't mind. If I waited until I'd rewritten everything that I didn't think was perfect, I'd never release *anything* ! :-) ------------------------------------------------------------------------- SAGordon@primenet.com wrote: > The implication of NP-Complete is that no algorithm can be guaranteed to > find the optimal solution within realistic time constraints as the size of > the puzzle increases. This, of course, does not mean that some algorithm > might not find optimal solutions for sufficiently small puzzles or do a > good job of approximation to optimality for large puzzles. One thing to bear in mind: algorithmic complexity is usually described in terms of time only, but almost always it is actually a product of time and space with the space factor usually omitted because it is constant. However if you have an algorithm that is say X^2 in time, it might be possible to make it linear in time but X^2 in space. I mention this because the time for these big square problems can be days or weeks, but the space is usually trivial. It may make sense to throw huge amounts of ram or disk at the problems (for any given size) and bring the time down considerably. For example: lets say you have a 9x9 square with just one hole in it, in the center, such that on those two rows you could place two 4-letter words. Well, instead of having a more complex algorithm to handle the special case, just generate a set of new 9-letter words which is really the combination of all 4-letter words with a space character literally inserted between then. The disk space for this new list is O(w(4)^2) where w(4) means the number of words of length 4. For each shape you can add a similar list (6+2, 5+3, 3+5, 2+6). There will be some point when this ceases to be feasible because of the size of the word lists, but if you don't reach that point, it could be a winner. Then just run the regular 9x9 word square program. Because of the trimming of the search by the dawg data structure, the run time should not be excessively greater, but the space requirements of the dawg are indeed much greater. Perhaps, however, within the limits of your system. (This might all be unnecessary; for all I know it might be trivially easy to generate the cross product of say the 5 and 3 letter lists on demand while running the search, without too great a programming overhead... I've never tried it.) From: "Jean-Charles Meyrignac" To: "Graham Toal" Subject: Maxiscore Date: Tue, 7 Mar 2000 00:01:36 +0100 Hi ! You convinced me ;-) Here is my Maxiscore program with full sources, and the compiled Windows executable. The original source is called CROISE.C and tries to load the file LISTE4, now renamed DICO.TXT. The new version has an internal assembly core for faster computations and is highly impossible to transcribe, but that's the price of efficiency. Anyway, I think that it could be useful for readers. The algorithm I used is simply to put the words in a certain order, which has been manually defined. In the Windows version, I added a refinment. First, the computation is done during idle cycles, so the program is not blocking. Second, I added a duration parameter which allows to compute during a certain amount of time on each word. This little trick helped me to improve the solution a lot, since I found a better solution just by putting this value to 60 seconds. Also, I didn't check yet your suggested techniques, since I thought that I could speed up the program so that it could be able to explore the whole combinations. I have to admit that I was wrong, but this was not so evident at first sight. I've not worked on the program since the last contest, but I have no doubt that I'll work on it very soon (there is one contest every year). I'll surely test some of your tricks on the problem. Thanks. Jean-Charles PS: I'll post an answer for the 9*9 crossword on the mailing list tomorrow. ----------------------------------------------------------------------------- ************* FINAL OUTCOME *************** Hello, Some time ago, I sent a message explaining I'd like to find a method to fill a grid with some constraints on a contest. This contest is held once a year by a game magazine called 'Tele 7 Jeux' and is called 'Maxiscore'. Since the last time, I improved my program by adding some assembly routines, so the program is now very fast (but I've already found some other improvements). Ok, now, I will talk about the contest. The goal of the contest was to fill a grid with the letters: INCROYABLE (in english: incredible). Each letter had a value: I=2 N=1 C=6 R=6 O=1 Y=2 A=4 B=3 L=5 E=4 and the goal was to have the highest scoring grid. With my program, I found the following solution: CARRER CRACRA :64 A E R E A R :28 R CARCERALE R :57 A R E N L C I :28 CREER C ECALE :55 A E E L R :23 L RECARRELE E :54 C C A :16 RACLE L ACCRO :53 A A E E C :22 C RICERCARE R :56 E R R L E E :29 RECELE ECRIER :57 Sum=542 (the grid is readable with fixed-width font). Since the competition is tough, I was anxious to wait for the results (in the previous contests, there were a lot of participants who found the best solutions). The results have been published at the beginning of May. The best solutions were: CARRER CRACRA E E R A R R R CARCAILLE R C Y E C E A I LACER C RANCE E L E C R R ECAILLERE E A E E CLERC R ACCRO R R E R C E RECERCLER R E L R A E E RECELE BERCER Sum=541 CARRER CRACRA E E R A R R R CARCAILLE R C Y E C E A I LACER C RANCE E L E C R R ECAILLERE E A E E CLERC R ACCRO R R E R C E RECERCLER R E L R A E E RECELE BARRER Sum=541 and 898 participants found these grids !!! What sounded strange to me was the fact that the best grid was 541, since I found 542 ? In fact, it was written that we had to use ALL the letters of the word, and mine didn't include the Y letter). Argh. The 20 final winners were judged by their definition of the word 'INCROYABLE'. (my definition was 'Faisait des manieres sous le Directoire'). Jean-Charles Meyrignac ---------------------------------------------------------------------------- To: "Wordgame-Programmers" From: "Jean-Charles Meyrignac" Date: Sun, 18 Mar 2001 10:57:08 +0100 Subject: [wgp] New episode of the Maxiscore saga. Every year, in March, a magazine called "Télé 7 Jeux" proposes a contest called "Le championnat de France de Mots Fléchés" (something like: French crossword championship). The subsidiary question called 'Maxiscore' is quite interesting for us, since it requires to fill a grid with the 10 letters of a word, where every letter of this word has a value. This year, the grid has been changed, in fact, it has been simplified. Last year, I wrote a program, which you can find on Graham Toal's excellent site, to find the best possible solution, but my program was way too slow and couldn't explore all the grids, so I didn't find the optimal grid. This time, I carefully rewrote my program (it's now the third incarnation of my program since 3 years !) and obtained a VERY fast program, since the exploration of ALL grids has been achieved in less than 2 hours. I decided not to publish my code due to the fact that the winning prize is really interesting (and the contest is annual), but I can send short explanations to anyone asking me. Oh, some details of the 2001 grid. The word is PSYCHIATRE where: P=3, S=4, Y=1, C=3, H=1, I=4, A=4, T=6, R=4, E=4 There are strong restrictions on the words to fill the following grid: ...... ...... . . ...... ...... . . ....... . . . . . .... . .... . . . ....... . . . .... . .... . . . . . ....... . . ...... ...... where every point represents a letter(I'm sorry, but if you really want to look the grid, view it with fixed font). Finally, I have found 1,080 solutions with 472 total points. Jean-Charles