------------------------------------
CPT Dictionary 1.0
Linux (x86, glibc) version, Java JDK/JRE 1.1, 1.2 or 1.3 required.
Shareware, free for non-commercial use. Freely distributable.
Updated: 20-June-2000
------------------------------------

DESCRIPTION
-----------
CPT Dictionary is a browser for dictionary files (CTrees) created by the program CPT Word Lists 1.0.

Features:
- browsing/searching in any standard encoding, including decomposition and bidi support;
- creates a display list of all words or clues (definitions);
- supports inverted indexes;
- many options for keyboard input, searching, and the information to be shown;
- can be localized by the user.

The distribution contains two sample dictionaries, just to test the installation:
- "The Unofficial Smiley Dictionary" is a CTree with clues (smile.dic).
- "2000K" is an artificial word list of two million words, stored in a 9KB file (2000K.wlz).

The only documentation for now is this file; for the details, see the description of CPT Word Lists.

SYSTEM REQUIREMENTS
-------------------
- Supported OS: Linux (x86), tested on Red Hat 5.x, 6.x, and compatible (with Corel Linux 1.0 we ran into a problem with the installation of the JRE, but it works). For Win 9X and Win NT/2K there is a separate distribution.
- Requires 600 KB of disk space and 32 MB RAM.
- This is a Java program, and Sun's JDK/JRE 1.1.6 or greater (or a compatible VM) is needed - see LinuxJRE.txt and JavaFonts.txt.

INSTALL
-------
1. Extract this archive into a temporary directory.
2. Edit the "install" script to reflect your Java VM. Run it as root/su from the temporary directory in an xterm (it runs under X Window). This will start the wizard, and CPT Dictionary will be installed according to your choices.
3. The installation program (install.class) is a self-extracting class file. Its contents are extracted during the installation and two directories are created:
   - the target one chosen for the installation;
   - /bin directory for the uninstall program (see UNINSTALL below).
   If your window manager is recognized (KDE, Gnome, Window Maker, Fvwm, ...), entries for CPT Dictionary are added to the desktop menu.
4. If the installation fails, you can still extract the program from the data.zip file.
5. If you have problems running CPT Dictionary, check/modify the generated cpt_dc10 script to reflect your JDK/JRE environment, especially if you installed a new version of the JRE after installing this program.

UNINSTALL
---------
To uninstall, do one of the following:
1. Click on the 'uninstall' item added to the desktop menu. This works only if the installation program recognized your window manager and added a menu folder 'Crossword Power Tools'; if not, see 2.
2. Go to your /bin and run ./juninst /UnInst where is the path where you installed the program. If you have installed a new version of the JRE after installing this program, check/modify 'juninst' in your /bin subdirectory.

After the uninstallation, the '/bin' directory is not removed, because it serves all CPT packages. If you don't have any other CPT program, you can delete it.

DOCUMENTATION
-------------
A. Introduction

The program can do extremely fast and incredibly slow searches, depending on the settings. The rules of thumb are:
- when using regular expressions, do not put excessive '*' or '?' at the beginning of the search pattern - the search will be optimized if the pattern starts with a real letter;
- do not set 'Unicode Normalization' if you don't know what it means in the specific case (usually, it will switch off most of the optimizations);
- when the main search list is clues, choose 'Browse Style';
- open the dictionaries in 'Low' memory/speed mode; the other modes are for users who know what they are doing (see the documentation of CPT Word Lists).

The rules above matter for big dictionaries with thousands or millions of words. To be clear, 'extremely fast' means 'less than a second' (e.g. searching for a word in a CTree of 5 million words), and 'incredibly slow' means 'more than 10 minutes' (e.g. searching for a clue pattern in a packed dictionary of 150K words with 150K clues in 'Search Style').

With these special notes out of the way, here is a short description of the program.

B. Select Dictionary

After starting the program, click the leftmost button to open a dictionary and/or to add a new one to the list. For now, you can search in only one open file.

The radio button group 'Open selected on start up' lets you choose one dictionary to open automatically, so you can forget about this dialog. 'None' clears any selection made, without browsing the whole list.

The radio button group 'RAM used and search speed' is almost obsolete. In most cases you should select 'Low' (the packed CTrees now have reasonable speed, and the inverted indexes will force 'Low'). If you select 'High' for a big CTree with clues, you will gain real speed for multiple searches in 'Search Style', but opening the dictionary will be very slow.

C. Display Options

The second button on the bar opens a dialog with the following options:

C.1. Format Tab
- 'Right Alignment' should be set for right-to-left scripts.
- 'Shaping' should be set if you need Arabic shaping or if the dictionary is in Thai composed form.
- 'Search Style' means no display list; all matches from the search are shown.
- 'Browse Style' creates a display list and selects only the first match. When you click on a word, you will see the tags and clues linked to it.
- 'Search/Browse in Clues' switches the main search list to the clues, if available.
- 'Browse with Inverted Index' creates/uses a supporting inverted index when searching in clues. If the file does not contain an inverted index, creating it could be very slow. In this mode, when you select a clue (by click or search), you will see all words that link to this clue. The main idea behind the inverted index is to use a dictionary in both directions - e.g. if it is de-en, you can browse it as en-de.

C.2. Tags Text

Use this tab to select the text of the tags to be shown.
Note: 'Wrap Tags/Clues Lines' should not be selected if the clues are in Thai or in an RTL script stored in visual order - the wrapping will not be correct.

C.3. Clues Data

Use this tab to select which clues linked to each word are shown. The clue types are represented by the codes and display text of the tags. Where filtering is not supported, the selection is disabled. This selection also acts as a filter when the clues are the main browse list and an inverted index is used.

D. Search Text Field

Here you can enter a word to search for. Simple regular expressions, bidi, and Unicode notation are supported. Note that the search is for words, not strings: to find a clue entry containing "word", you have to enter the regular expression "*word*". Communication with the clipboard is always in Unicode, and in logical order when the dictionary is stored in logical order (RTL scripts).

You can use the 'Search' button instead of key to start the searching. This button means 'Find Next' when working in 'Browse Style' - the search starts from the list item following the last selected one.

E. Search Options

The first button to the right of the text field opens a dialog for the keyboard input and search options.

E.1. Input Tab
- 'Allow \uxxxx notation' transparently converts \uxxxx encoded characters to Unicode.
- 'Regular expressions' switches on regular expression processing.
- 'Keyboard converter' is an option only for Linux. If set, the encoding selected in the 'Select Font' dialog is used to convert the typed 8-bit characters to Unicode.

E.2. Search Tab
- 'Ignore case' switches on caseless searching.
- 'Special casing' switches on the special Unicode casing rules when changing the letter case.
- 'Stop on first match' is valid for 'Search Style' mode.
- 'Unicode Normalization' applies the selected normalization to the source text and to the search pattern.

E.3. Unicode Tab

Use the radio buttons to select the desired Unicode normalization. The processing for each normalization is described in the documentation of CPT Word Lists.

F. Select Font

The 'a' button opens the dialog for selecting the display font characteristics. For Sun's Java 1.1 the font list is limited to several fonts. For any other JVM the list will contain most of the fonts installed on your OS. Some problems with Java keyboard input can be solved by setting the 'Encoding'. You can type or paste any sample text into the text field to see how it will be shown.

G. Quit

Finally, to stop the program, click the rightmost button. The current settings for the dictionaries in the list will be saved.

H. Localization

If you want the program to talk to you in your language, do the following:

H.1. Replace in cpt_dc10.pr the line ProgramLocale= where is ISO-639 language code, optionally followed by "_" plus ISO-3166 country code. For example, el or el_GR is for Greek, en for English, ru for Russian, etc. This is the easy part.

H.2. Put in the 'locale' directory a file with the name '.msg', which contains the messages in your language.
Use the 'default.msg' file as a template for the translation. There is another 'Readme.txt' file with instructions in the same directory.

H.3. Ensure that in Java's 'font.properties' file the 'dialog.plain.' and 'dialog.bold.' fonts are assigned to your locale font. This step should be done for Java 2 (v1.2 and v1.3) as well.

CONTACT
-------
We are very interested in receiving your comments, suggestions, and bug reports at our email: cpt.software@usa.net
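
Note on the 'Allow \uxxxx notation' option (section E.1): the following is only a rough sketch, in Java, of how such a conversion could work. It is not taken from CPT Dictionary. The class name UnicodeEscapeSketch and the method decode are made up for illustration, and leaving malformed escapes untouched is an assumption, not documented behavior of the program.

```java
// Hypothetical sketch, not CPT Dictionary's actual code.
// It turns each backslash-u escape with four hex digits into one character.
final class UnicodeEscapeSketch {

    /** Returns the input with every well-formed four-digit escape decoded. */
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHex4(input, i + 2);
            if (isEscape) {
                // Parse the four hex digits and append the resulting UTF-16 unit.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Assumption: anything that is not a complete escape is kept as typed.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** True if the four characters starting at 'start' are all hex digits. */
    private static boolean isHex4(String s, int start) {
        for (int k = 0; k < 4; k++) {
            if (Character.digit(s.charAt(start + k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The argument is the eight typed characters: backslash, u, 0, 0, 4, 1, B, C.
        System.out.println(decode("\\u0041BC"));
    }
}
```

In this sketch, a search pattern typed as the escape for U+03B1 followed by '*' would reach the search as the Greek letter alpha followed by '*', while an incomplete escape is passed through unchanged.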