/* Read in a sentence.

   When reading in text character by character (using get0), it is
   essential NEVER to backtrack over a get0 goal: the read is a side
   effect that backtracking does not undo, so the character it read in
   would be lost. So once you have read a character, you must make sure
   it gets used. If it can't be used in one context (e.g. you are
   constructing a word and you reach a space), it must be passed back
   to predicates that can use it.
   Thus, for example, readword has three arguments below. The first is
   a character, potentially the first of a word but perhaps a bit of
   interword filling (a space, a tab, or any other character that is
   accepted by neither single_char nor in_word). The second is
   uninstantiated at first and will eventually be the word. The third
   is also uninstantiated and will finally be instantiated to the
   character that followed the word, i.e. the character that told
   readword it had reached the end of the word.
   Similarly, restsent (which is "higher-level" than readword) has three
   arguments: the first is the word just read, the second is the
   character following that word (which has to be passed on for the
   reasons explained above) and the third is a variable that restsent
   will instantiate to a list of the remaining words of the sentence.
   The first argument is needed so that restsent can tell whether there
   IS any more to be read: the word last read might have been the last.
   This is decided by lastword. The current candidates for the last
   word of a sentence are the dot, the exclamation mark and the question
   mark (all valid atoms, by the way, though this is not a vital
   characteristic).

*/
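
/* Example of use (a sketch; the prompt and the way answers are printed
   depend on your Prolog system). Running the query

       ?- read_in(S).

   and typing the line

       So long, Marvin!

   followed by a newline should bind S to the list

       [so, long, ',', marvin, '!']

   Letters are lower-cased, the comma and the exclamation mark are
   words of their own, and the spaces are discarded. Note that one
   character after the terminator (here the newline) is also consumed,
   because readword always reads the character that follows a word. */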

read_in([W|Ws]) :- get0(C),
                   readword(C,W,C1),
                   restsent(W,C1,Ws).

/* Given a word and the character after it, read in the rest
   of the sentence */
restsent(W,_,[]) :- lastword(W), !.
restsent(W,C,[W1|Ws]) :- readword(C,W1,C1),
                         restsent(W1,C1,Ws).

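/* Read in a single word, given an initial character (C),
   and remembering what character came after the word.
   Clause 1: C is a punctuation mark, which is a word by itself.
   Clause 2: C starts an ordinary word; restword collects the rest.
   Clause 3: C is filler, so discard it and try the next character.
   There is no end-of-file check: if the input ends before a sentence
   terminator is seen, what happens depends on how get0 behaves at end
   of file on your system. */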
readword(C,W,C1) :- single_char(C), ! ,
                    name(W,[C]),
                    get0(C1).
readword(C,W,C2) :- in_word(C,NewC), !,
                    get0(C1),
                    restword(C1,Cs,C2),
                    name(W,[NewC|Cs]).
readword(_,W,C2) :- get0(C1),
                    readword(C1,W,C2).
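
/* Worked illustration of the three readword clauses (a sketch: C is
   the character already read, "then" is the input that is assumed to
   follow it, and the last column is the character handed back):

     C = 44 (,)      then " x"   clause 1             W = ','  next = 32
     C = 72 (H)      then "i!"   clause 2             W = hi   next = 33
     C = 32 (space)  then "ok."  clause 3, clause 2   W = ok   next = 46

   In the last case clause 3 throws the space away and calls readword
   again on the "o", which is then handled by clause 2. */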

/* When in a word, get the rest of it */
restword(C,[NewC|Cs],C2) :- in_word(C,NewC), !,
                            get0(C1),
                            restword(C1,Cs,C2).
restword(C,[],C).

/* These are single character words (given as character codes) */
single_char(44).                         /* , */
single_char(59).                         /* ; */
single_char(58).                         /* : */
single_char(63).                         /* ? */
single_char(33).                         /* ! */
single_char(46).                         /* . */

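/* These characters can be inside a word. The first argument is the
   character read and the second is the character to store. The second
   clause converts upper case letters to lower case */
in_word(C,C) :- C>96, C<123.             /* a-z */
in_word(C,L) :- C>64, C<91, L is C+32.   /* A-Z, stored as a-z */
in_word(C,C) :- C>47, C<58.              /* digits */
in_word(39,39).                          /* apostrophe */
in_word(45,45).                          /* hyphen */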
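
/* A few sample queries against in_word (a sketch; the answers follow
   from the character code ranges used in the clauses):

     ?- in_word(72,L).     gives L = 104    (H is stored as h)
     ?- in_word(55,L).     gives L = 55     (the digit 7 is kept as is)
     ?- in_word(32,L).     fails            (a space ends a word)

   One consequence worth knowing: readword builds each word with name,
   and on most systems name yields a number rather than an atom when
   the characters spell one, so the input 42 would come back as the
   integer 42. */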

/* These terminate a sentence */
lastword('.').
lastword('!').
lastword('?').
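
/* A small test helper, NOT part of the original program. It is only a
   sketch and assumes SWI-Prolog: open_string/2 and
   setup_call_cleanup/3 are SWI-Prolog built-ins, and the name
   read_in_from_text is made up for this illustration. It points the
   current input at a piece of text so that read_in can be tried
   without typing at the terminal. */
read_in_from_text(Text,Words) :-
        current_input(Old),                        /* remember the old input */
        setup_call_cleanup(
            ( open_string(Text,In), set_input(In) ),
            once(read_in(Words)),                  /* get0 now reads from Text */
            ( set_input(Old), close(In) )).        /* always restore the input */

/* For example, the query

       ?- read_in_from_text("Hello, world!",Ws).

   should bind Ws to [hello, ',', world, '!']. */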