Jan 00 Challenge
Volume Number: 16 (2000)
Issue Number: 1
Column Tag: Programmer's Challenge
Programmer's Challenge
by Bob Boonstra, Westford, MA
Peg Triangle
Welcome to the New Year! If you're not one of those people who quibble about whether the new century starts in 2000 or 2001, welcome also to a new century and a new millennium! As I write this, I can't be certain whether the doomsday predictions will turn out to have had any validity, but I'm betting that as you read this, the lights will be on, the telephones will work, the ATMs might be occasionally short of cash, the grocery stores will be replenishing their canned goods, and the pundits will be questioning what the fuss was all about. (At least this is my prediction for the so-called First World - I'm not close enough to the situation in the rest of the world to be as confident.) Of course, as someone whose Real Job included making certain that this transition to January was as boring as any other, I know that there was a lot of hard work invested in ensuring that life went on in the IT world.
But I'm betting that most of you are ready and able to take on this month's Programmer's Challenge. Especially since most of you are presumably Mac users, who didn't (or don't) have to go through the complicated process of remediating their Monopoly-OS machine so that it would continue to function. We're going to make the Challenge a little simpler this month, in hopes of encouraging some new participants to take a shot at making the Top Five.
A while back my daughter presented me with a puzzle she had made in wood-shop. (Yes, my town has progressed to the point where the girls take wood-shop and metallurgy classes and the boys take cooking and sewing.) The puzzle is a triangular block of wood with holes drilled into a triangular pattern: 1 hole in the first row, 2 holes in the second, etc. A peg occupies each of the holes, except one. The puzzle is solved by a sequence of jump moves, where a peg jumps over an adjacent peg into an open hole in the same direction, after which the jumped peg is removed from the board. The object of the game is to remove all of the pegs from the board, except one.
Our Challenge will be a slight generalization of this puzzle. The initial board position will always contain at least one vacant hole, but perhaps more than one. The object will be to minimize the number of pegs remaining on the board at the end of the game, but because of the initial configuration it might not be possible to reduce the number of pegs remaining to one.
The prototype for the code you should write is:
#if defined(__cplusplus)
extern "C" {
#endif
typedef struct TrianglePegPosition {
short row; /* 0..numberOfRows */
short col; /* -row, -row+2, ..., +row */
} TrianglePegPosition;
typedef struct PegJump {
TrianglePegPosition from;
TrianglePegPosition to;
// must satisfy
// ( (to.row == from.row) && (to.col == from.col +- 4) &&
// ((from.row +- 1, from.col +-2) occupied) ||
// (to.row == from.row +- 2) && (to.col == from.col +- 2) &&
// ((from.row +- 1, from.col +-1) occupied) )
} PegJump;
short /* number of moves */ SolvePegTriangle (
short triangleSize, /* number of rows in triangle to solve */
short numInitialPegs, /* number of pegs in starting puzzle position */
TrianglePegPosition initialPegPositions[],
/* peg locations in starting puzzle position */
PegJump pegJumps[]
/* return peg moves that solve the puzzle here, in sequence */
);
#if defined(__cplusplus)
}
#endif
Your SolvePegTriangle routine is provided with the initial puzzle conditions and must store the moves required to solve the puzzle in pegJumps, returning from SolvePegTriangle the number of moves in pegJumps. As initial conditions, SolvePegTriangle is given the dimension of the puzzle (triangleSize), the number of pegs in the initial puzzle state (numInitialPegs), and the position of those initial pegs.
The winner will be the solution that solves a sequence of puzzles at the lowest cost. Each millisecond of execution time used to solve the puzzles will incur a cost of 1 point. Each peg left on the board beyond the first will incur a cost of 1000 points (whether or not it is possible to remove all but one peg from the board).
This will be a native PowerPC Challenge, using the CodeWarrior Pro 5 environment. Solutions may be coded in C, C++, or Pascal. Solutions in Java will also be accepted this month. Java entries must be accompanied by a test driver that uses the interface provided in the problem statement.
Remember that you can get a head start on the Programmer's Challenge by subscribing to the Challenge mailing list. See www.mactech.com/progchallenge/ for details.
Three Months Ago Winner
The October SuperDuperGhost Challenge attracted entries from four of the top six contestants in the Programmer's Challenge points standing, and the top two in the points standing finished one and two. Congratulations to Ernst Munter (Kanata, Ontario) and Tom Saxton for finishing first and second, respectively, in the Ghost Challenge.
The SuperDuperGhost Challenge required contestants to compete in a tournament, where the object was to win a generalized game of Ghost. The concept of Ghost is simple - players spell a word, taking turns adding letters, trying to avoid being the one to add the last letter to a word in the dictionary. SuperDuperGhost complicated the original game by allowing the addition of letters to the end of the string, to the beginning of the string, to the beginning or the end, or finally at any place in the growing string, depending on the mode of the game. Scoring was based on winning games, with a penalty based on execution time, and an additional penalty for forming a word without declaring yourself the loser.
Each of the four entries in the Challenge competed against every other entry, with each player having a chance to play first against each other. With four entries in this Challenge, that meant that there were 12 matches in each round, of which each entry competed in 6. I conducted 10 rounds of competition, times four game modes (addToEndOnly, addToBeginningOnly, addToBeginningOrEnd, addAnywhere), for a total of 480 matches, with each player competing in 60 of each type.
Playing first had a significant advantage. Ernst's winning entry was successful in every single match where it made the first move, regardless of whether the game mode allowed additions to the beginning, end, beginning or end, or arbitrary location within the growing word string. Tom's second place entry did almost as well, winning 117 of the 120 matches where it played first. The third-place entry of Sebastian Maurer won all of the entries where it played first for the game modes in which it chose to compete; however, Sebastian only had time to complete code for two of the modes of the game (addToEndOnly and addToBeginningOnly). The table below lists the number of wins achieved by each entry in each of the four modes of the game.
|
Wins addEnd |
Wins addBeginning |
Wins addEither |
Wins addAnywhere |
Ernst Munter |
38 |
40 |
50 |
49 |
Tom Saxton |
39 |
40 |
49 |
45 |
Sebastian Maurer |
39 |
40 |
0 |
0 |
JG Heithcock |
4 |
0 |
21 |
26 |
In the addToEndOnly and addToBeginningOnly modes, Ernst employs a recursive solution to find a letter that does not complete a word, but which forces an opponent to complete a word. If a win cannot be guaranteed, he selects a letter that admits solutions where his opponent must complete a word. Depending on the dictionary, this strategy always succeeds when Ernst moves first and sometimes succeeds otherwise. In the other two game modes, the code attempts to force the opponent to complete a word, but otherwise tries to guarantee that the opponent is forced to make a word shorter than the shortest word Ernst can be forced to make.
The table below lists, for each of the solutions submitted, the total score achieved, the execution time in milliseconds, the total number of wins achieved, the number of wins when playing first, and the size and language code parameters. As usual, the number in parentheses after the entrant's name is the total number of Challenge points earned in all Challenges prior to this one.
Name |
Score |
Time (msec) |
Total Wins |
Wins play first |
Code Size |
Data Size |
Lang |
Ernst Munter (527) |
17700 | 979 |
177 | 120 |
10412 |
2019 |
C++ |
Tom Saxton (128) |
17250 |
2575 |
173 |
117 |
6132 |
637 |
C |
Sebastian Maurer (70) |
7900 |
1157 |
79 |
60 |
4448 |
187 |
C |
JG Heithcock (39) |
5000 |
1543 |
51 |
28 |
2856 |
97 |
C |
Top Contestants
Listed here are the Top Contestants for the Programmer's Challenge, including everyone who has accumulated 10 or more points during the past two years. The numbers below include points awarded over the 24 most recent contests, including points earned by this month's entrants.
Rank | Name | Points |
1. | Munter, Ernst | 237 |
2. | Saxton, Tom | 126 |
3. | Maurer, Sebastian | 77 |
4. | Boring, Randy | 66 |
5. | Rieken, Willeke | 51 |
6. | Heithcock, JG | 43 |
7. | Shearer, Rob | 34 |
8. | Brown, Pat | 20 |
9. | Hostetter, Mat | 20 |
10. | Mallett, Jeff | 20 |
11. | Jones, Dennis | 12 |
12. | Hart, Alan | 11 |
13. | Hewett, Kevin | 10 |
14. | Murphy, ACC | 10 |
15. | Selengut, Jared | 10 |
16. | Smith, Brad | 10 |
17. | Strout, Joe | 10 |
18. | Varilly, Patrick | 10 |
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use. The points you can win are:
1st place | 20 points |
2nd place | 10 points |
3rd place | 7 points |
4th place | 4 points |
5th place | 2 points |
finding bug | 2 points |
suggesting Challenge | 2 points |
Here is Ernst's winning Ghost solution:
ghost-v2.cp
Copyright © 1999
Ernst Munter, Kanata, ON, Canada
/*
October 3, 1999.
Submission to MacTech Programmer's Challenge for October 99.
Copyright © 1999, Ernst Munter, Kanata, ON, Canada.
"Super Duper Ghost"
Version 2.
Problem Statement
---------
The program plays a game of ghost against one other player. Given a
dictionary, and an initial ghost string, one letter must be added to
the ghost string without making a dictionary word. But the new ghost
string must be part of a dictionary word.
There are four game types:
(0) add a letter only to the end of the ghost string
(1) add a letter only to the beginning.
(2) add a letter only to the either the end or the beginning
(3) add a letter anywhere
Solution Strategy
---------
At each stage of the game the current ghost string is incremented by one
letter to yield a new ghost string. Processing at each stage in the
game is independent of previous steps. A "wordInMind" at one step may
be completely different from the wordInMind at the previous or the next
step.
A full lookahead is employed for game type (0), using recursion. The
objective is to find a letter which does not complete a word, but which
forces the opponent to make a word.
Depending on the actual dictionary, this strategy leads to a certain win
whenever possible.
If a win cannot be assured by the lookahead search, I select a move
which preferably includes words the opponent must make. But of course,
the failed search means that the opponent can win if they play
correctly.
The strategy for game type (1) is identical. In order to be able to use
the same code, all working strings are reversed, and the lookups are
done with a sorted inverse dictionary in which all words are reversed
(PLAIN becomes NIALP).
Modes 2 and 3
-------
Game types 2 and 3 are more complex and the strategy for these modes is
slightly different to account for the more difficult job of finding
valid target words.
Some of the code from the Disambiguator Programmers Challenge of July
1997 is reused to provide a method for finding dictionary words matching
ghost strings according to modes 2 and 3.
But because this lookup is not as fast as the direct binary search used
in modes 0 and 1, full look-ahead cannot be accomplished in a short
enough time. So only a simple heuristic is used:
The words are conceptually divided into odd and even sets. A player who
starts will loose if he makes a word with an odd number of letters.
Conversely, he can win if he can force the opponent to make an
even-length word.
The choice of the next letter to play is thus based on an analysis of
all possible next letters, in all possible positions in the ghost
string.
The first choice, if available, is to use a letter that results in only
words the opponent can make (thus guaranteeing a win).
The second choice is to use a letter where the shortest word I can make
is longer than the shortest word the opponent can make.
Optimizations
-------
The moves for ghostStrings of length 0 and 1 are cached to avoid
recomputation when several games are played.
Assumptions
------
Dictionary words are at least 4, and at most 31 characters long.
Dynamic memory allocated by the program for the inverse dictionary and
the disambiguator tables is about 24 bytes per word of the original
dictionary (depending on actual average word length).
Static memory is about 2K.
*/
#define DBG 0
#include "ghost.h"
#include "newdisambig.h"
#include "myString.h"
#include <string.h>
#include <stdlib.h>
#if DBG
#include <stdio.h>
#else
#define NDEBUG
#endif
#include <assert.h>
/******************** typedefs ************************/
typedef unsigned long ulong;
typedef const char* charPtr;
struct CacheType {
ulong response;// (insertion_point << 8) + character
charPtr* wordInMind;
};
typedef CacheType Cache[32];
/******************** globals **********************/
static charPtr* gDictWords;
static long gNumWords;
static char* gInverseChars;
static charPtr* gInverseDict;
static GameType gGameType;
// Selection of character orders for faster convergence
static char* gSearchOrder="JQXZWKVFBYHPMGUDCLOTNRASIE";
static char* gAlternateOrder="EISARNTOLCDUGMPHYBFVKWZXQJ";
// Caches of the 1st and second character choices
static Cache cache[4];
static CacheType* choices;
/******************** Utilities ***********************/
ClearCaches
static void ClearCaches()
{
memset(cache,0,sizeof(cache));
choices=cache[0];
}
ReverseString
static void ReverseString(char* s,int len)
{
char* l=s;
char* r=s+len-1;
char t=*l;*l=*r;*r=t;
for (int i=1;i<len/2;i++){
char t=*++l;*l=*-r;*r=t;
}
}
Expand
inline void Expand(char* newGhost,char* fragment,
int newGhostLen,int insert)
// copy fragment into newGhost, leaving space for insert
{
strncpy(newGhost,fragment,insert);
strncpy(newGhost+insert+1,fragment+insert,newGhostLen-insert);
newGhost[newGhostLen]=0;
}
CmpStr
static int CmpStr(const void* a,const void* b)
{
return StrCmp4(*((charPtr*)a),*((charPtr*)b));
}
IsinDict
static charPtr* IsinDict(charPtr s)
{
charPtr* x=(charPtr*)bsearch(&s,gDictWords,gNumWords,4,CmpStr);
return x;
}
GetCharPositions
static void GetCharPositions(
char* ghostString,
int ghostLen,
const char* word,
int* charPositions)
{
// Collects character positions for mode 3 from actual word
int k=-1;
for (int i=0;i<ghostLen;i++) {
char c=*ghostString;
char d;
while (0 != (d=word[++k])) {
if (c==d) break;
}
if (d==0)
return;
charPositions[i]=k;
ghostString++;
}
}
// Customized heap sort for the inverse dictionary
Send
static void Send(
// Inserts word wp in wordList as a priority queue
// in preparation for sorting later
charPtr wordList[],
charPtr wp,
ulong numWords)
{
charPtr* base=wordList-1;
charPtr z;
long i=numWords+1,j=i>>1;
while ((j>0)&&StrCmp4(wp,z=base[j])>0) {
base[i]=z;
i=j;
j=i>>1;
}
base[i]=wp;
}
Sort
static void Sort(
charPtr wordList[],
long numWords)
{
// Heap sort step 2, used for final sorting of the wordList.
charPtr* sList=wordList-1;
charPtr x;
int i,j;
charPtr* b=sList+numWords+1;
if (numWords>1) do {
i=1;j=2;
x=sList[numWords-];
*(-b) = sList[1];
if (numWords<=1) {
sList[1]=x;
break;
}
while (j<=numWords) {
if ((j<numWords) &&
(StrCmp4(sList[j],sList[j+1])<0))
j++;
if (StrCmp4(x,sList[j])>=0)
break;
sList[i]=sList[j];
i=j;j+=j;
}
sList[i]=x;
} while(1);
}
/****************** The inverse dictionary **********/
ScanDict
static int ScanDict()
{
// counts characters in all dictionary words
int num=0;
const charPtr* d=gDictWords;
for (int i=0;i<gNumWords;i++,d++)
{
num+=StrLen4(*d);
}
return num;
}
InverseCopy
static void InverseCopy(int numChars)
{
// Copies/inverts all words from gDictWords to gInverseDict
// by doing a character by character copy back to front.
const charPtr* d=gDictWords;
char* dest=gInverseChars+numChars;
int numWords=0;
for (int i=0;i<gNumWords;i++,d++)
{
*-dest=0;
const char* src=*d;
while (*src) {
*-dest = *src++;
}
Send(gInverseDict,dest,numWords++);
}
}
MakeInverseDict
static void MakeInverseDict()
{
if (gInverseDict) // already initialized
return;
int numChars=ScanDict();
gInverseChars=new char[numChars+gNumWords];
gInverseDict=new charPtr[gNumWords];
InverseCopy(numChars+gNumWords);
Sort(gInverseDict,gNumWords);
}
DeleteInverseDict
static void DeleteInverseDict()
{
if (gInverseDict) {
delete [] gInverseDict;
gInverseDict=0;
}
if (gInverseChars) {
delete [] gInverseChars;
gInverseChars=0;
}
}
/************ Finding ghosts in a dictionary ***********/
CmpFragment
static int gLength;
static int CmpFragment(const void* a,const void* b)
{
return strncmp(*((charPtr*)a),*((charPtr*)b),gLength);
}
FindFirst
static charPtr* FindFirst(
charPtr fragment,
int len,
charPtr* dict)
{
charPtr* x;
{
gLength=len;
x=(charPtr*)bsearch(&fragment,dict,gNumWords,4,
CmpFragment);
if (strncmp(fragment,*x,len))
return 0;
while (x>dict) {
charPtr* y=x-1;
if (strncmp(fragment,*y,len))
break;
x=y;
}
}
return x;
}
/************** Choosing the next move *******************/
Choose01
static ulong Choose01(
char* fragment, // word fragment (ghost string)
int fragLen,
charPtr* dict)
// Recursive search for best next character, modes 0 and 1
// Returns char-to-play that would force opp to make a word
{
// Try the whole alphabet, least used letters first
char* tryChar=gSearchOrder;
do {
// tack a character to the end of the fragment
fragment[fragLen-1]=*tryChar;
// find the shortest matching word in the dictionary
charPtr* x=FindFirst(fragment,fragLen,dict);
if (x){
if ((*x)[fragLen]==0) {
// bad choice: we must not make a complete word
continue;
} else {
// call opponent's choice, see if they can win
char next=Choose01(fragment,fragLen+1,dict);
if (next==0){
// opponont is forced to make a word, caller wins with this choice
return *tryChar;
}
}
}
} while (*++tryChar);
// remove letter and terminate shorter string before returning
fragment[fragLen-1]=0;
// failed
return 0;
}
#if DBG
static int S[4];
#endif
Choose23
static ulong Choose23(char* fragment,int newGhostLen)
// Search for best move for modes 2 and 3
// Returns character and insertion point, packed in <16 bits
{
ulong bestResponse=0;
// words are in groups according to length, even and odd.
// myGroups contain words I would complete
// oppGroups contain words the opponent would complete
int myGroups=newGhostLen & 1;
#define oppGroups (1 ^ myGroups)
#define MYGROUP(x) (0==(((x)^newGhostLen)&1))
#define OPPGROUP(x) (0!=(((x)^newGhostLen)&1))
int step=1;
if ((gGameType==addToBeginningOrEnd) &&
(newGhostLen>1))
step=newGhostLen-1;
int delta,bestDelta=-64;
for (int insert=0;insert<newGhostLen;insert+=step)
{
char newGhost[32];
Expand(newGhost,fragment,newGhostLen,insert);
char* tryChar;
if (newGhostLen==1) tryChar=gAlternateOrder;
else tryChar=gSearchOrder;
do {
newGhost[insert]=*tryChar;
int shortest[2];
int numGroups=
ShortestMatches(newGhost,newGhostLen,gGameType,shortest);
if (numGroups==0) // The fragment is not found in any word
continue;
if (numGroups==1)
{
if MYGROUP(shortest[0]) {
if (shortest[0]==newGhostLen) {
// must never make a dictionary word
continue;
}
// Mygroup words are bad, opp loose only if they make a mistake
delta=shortest[0]-64;
if (delta>bestDelta) {
// But keep longest of these choices, it may be all we have
bestDelta=delta;
bestResponse=*tryChar + (insert<<8);
}
} else {
// I am guaranteed to win, return immediately
bestResponse=*tryChar + (insert<<8);
return bestResponse;
}
} else {
// Words in both groups. Keep the one with the highest delta,
// that is where opponents word(s) are shorter than mine.
if MYGROUP(shortest[0]) {
if (shortest[0]==newGhostLen) {
// must never make a dictionary word
continue;
}
delta=shortest[0]-shortest[1];
} else {
if (shortest[1]==newGhostLen) {
// must never make a dictionary word
continue;
}
delta=shortest[1]-shortest[0];
}
if (delta>bestDelta) {
bestDelta=delta;
bestResponse=*tryChar + (insert<<8);
}
}
} while (*++tryChar);
}
return bestResponse;
}
/************** Playing the next move *******************/
PlayMode0
static charPtr* PlayMode0(
int newGhostLen,
char newGhostString[256],
charPtr* dict,
ulong& response)
{
// use recursive search for winning move
response=Choose01(newGhostString,newGhostLen,dict);
if (response) goto OK;
// else
// no winning move was found, so let's use a heuristic choice:
// The preference is for the shortest string of even (odd) length
// which the opponent would have to make. But if all available
// words are of the wrong length (I will have to make it), I choose
// a longer target word to increase the chances for the opponent
// to make a mistake.
charPtr* z=FindFirst(newGhostString,newGhostLen-1,dict);
ulong bestResponse[2]={0,0};
ulong len,bestLen[2];
int preferShort=1 & (newGhostLen+1);
#define preferLong (1-preferShort)
bestLen[preferLong]=0;
bestLen[preferShort]=100;
response=0;
for(;;) {
// get first word which differs in the next letter
do {
ulong nextLetter=(*z)[newGhostLen - 1];
if (nextLetter > response) {
response=nextLetter;
len=StrLen4(*z);
if (len > newGhostLen)
goto Try_This;
}
z++;
if (z>=dict+gNumWords)
break;
// make sure it is still the same word stem
} while(0==strncmp(newGhostString,*z,newGhostLen-1));
no_other_choice:
break;
Try_This:
// record longest and shortest alternatives
if ((len&1)==preferShort){
if (len <= bestLen[preferShort]) {
bestLen[preferShort]=len;
bestResponse[preferShort]=response;
}
} else {
if (len >= bestLen[preferLong]) {
bestLen[preferLong]=len;
bestResponse[preferLong]=response;
}
}
}
if (bestLen[preferShort] < 100) {
response=bestResponse[preferShort];
goto OK;
}
if (bestLen[preferLong] > 0) {
response=bestResponse[preferLong];
goto OK;
}
// Bluffing is not allowed, so we have to give up:
return 0;
OK:
newGhostString[newGhostLen-1]=response;
newGhostString[newGhostLen]=0;
return FindFirst(newGhostString,newGhostLen,dict);
}
PlayMode1
static charPtr* PlayMode1(
int newGhostLen,
char newGhostString[256],
ulong & response)
{
// returns wordInMind
// mode 1 (add at beginning) can be handled by mode 0 logic
// after all strings are reversed.
if (newGhostLen>2)
ReverseString(newGhostString,newGhostLen-1);
charPtr* wordInMind =
PlayMode0(newGhostLen,newGhostString,gInverseDict,response);
if (wordInMind)
{
// have to reverse wordInMind string and ghost string to normal
char buffer[32];
strcpy(buffer,*wordInMind);
int wordLen=strlen(buffer);
ReverseString(buffer,wordLen);
if (newGhostLen>1)
ReverseString(newGhostString,newGhostLen);
// and then find actual word in proper dictionary
wordInMind=FindFirst(buffer,wordLen,gDictWords);
return wordInMind;
}
return 0;
}
PlayMode23
static charPtr* PlayMode23(
int newGhostLen,
char newGhostString[256],
ulong& response)
{
// returns wordInMind
response=Choose23(newGhostString,newGhostLen);
if (response) {
char c=response & 0xFF;
int insert=response>>8;
char newGhost[32];
Expand(newGhost,newGhostString,newGhostLen,insert);
newGhost[insert]=c;
strcpy(newGhostString,newGhost);
charPtr wordInMind=ShortestWord(newGhost,gGameType);
// find the chosen word_in_mind in the dictionary
charPtr* dictWord=IsinDict(wordInMind);
return dictWord;
}
return 0;
}
MakeReport
static int MakeReport(
int ghostLen,
char newGhostString[256],
charPtr* wordInMind,
int charPositions[256])
{
// Returns wordInMindIndex for the actual word chosen
// and computes charPositions from the word, according to mode
int wordLen=strlen(*wordInMind);
switch (gGameType) {
case addToEndOnly:
for (int i=0;i<ghostLen;i++)
charPositions[i]=i;
break;
case addToBeginningOnly:
for (int i=0;i<ghostLen;i++)
charPositions[i]=wordLen-ghostLen+i;
break;
case addToBeginningOrEnd:
{
char* start=strstr(*wordInMind,newGhostString);
assert(start);
for (int i=0;i<ghostLen;i++)
charPositions[i]=start-*wordInMind+i;
}
break;
case addAnywhere:
GetCharPositions(
newGhostString,ghostLen,*wordInMind,charPositions);
}
return wordInMind-gDictWords;
}
/***************** Non-static Functions ********************/
InitSuperDuperGhost
void InitSuperDuperGhost(
const char *dictWords[], /* alphabetically sorted uppercase dictionary words */
long numDictionaryWords /* number of null-terminated words in dictionary */
) {
gDictWords=dictWords;
gNumWords=numDictionaryWords;
ClearCaches();
}
NewGhostGame
void NewGhostGame(
GameType theGameType
) {
if (gGameType != theGameType) {
gGameType=theGameType;
choices=cache[theGameType];
if (gGameType == addToBeginningOnly)
MakeInverseDict();
else if (gGameType > addToBeginningOnly)
InitDisambiguator(gDictWords,gNumWords);
}
}
PlayGhost
void PlayGhost(
const char *ghostString, /* the string so far, null-terminated */
char newGhostString[256], /* new ghostString, one letter added to ghostString */
int *wordInMindIndex, /* your string will match dictWords[wordInMindIndex] */
int charPositions[256]
/* index into dictWords[wordInMindIndex] for each char in newGhostString */
) {
int newGhostLen=strlen(ghostString)+1;
strcpy(newGhostString,ghostString);
newGhostString[newGhostLen]=0;
charPtr* wordInMind=0;
// check cached choices for very short ghosts.
ulong response=0;
if (newGhostLen<=2) {
if (newGhostLen<2) {
wordInMind=choices[0].wordInMind;
if (wordInMind) {
response=choices[0].response;
newGhostString[0]=response;
goto succeed;
}
} else {
wordInMind=choices[31&ghostString[0]].wordInMind;
if (wordInMind) {
response=choices[31&ghostString[0]].response;
if (response>>8) {
newGhostString[1]=response; // append
} else {
newGhostString[0]=response; // insert
newGhostString[1]=ghostString[0];
}
goto succeed;
}
}
}
switch (gGameType) {
case addToEndOnly:
wordInMind=PlayMode0(
newGhostLen,newGhostString,gDictWords,response);
if (wordInMind) break;
else goto fail;
case addToBeginningOnly:
wordInMind=PlayMode1(newGhostLen,newGhostString,response);
if (wordInMind) break;
else goto fail;
case addToBeginningOrEnd:
case addAnywhere:
wordInMind=PlayMode23(newGhostLen,newGhostString,response);
if (wordInMind) break;
else goto fail;
}
// remember choice in cache
if (newGhostLen<=2) {
if (newGhostLen<2){
choices[0].response=response;
choices[0].wordInMind=wordInMind;
} else {
if (gGameType==addToEndOnly) // fixup insertion point
response |= (1<<8);
choices[31&ghostString[0]].response=response;
choices[31&ghostString[0]].wordInMind=wordInMind;
}
}
succeed:
*wordInMindIndex=MakeReport(newGhostLen,
newGhostString,wordInMind,charPositions);
return;
fail:
newGhostString[0]=0;
}
TermSuperDuperGhost
void TermSuperDuperGhost(void)
{
DeleteInverseDict();
TermDisambiguator();
}
newdisambig-v2.cp
#include "newdisambig.h"
#include "myString.h"
#define NDEBUG
#include <assert.h>
/*** Function prototypes ***/
static ulong MakeSubDigest(CCC* xString,char sig[]);
static bool MatchFirstN(CCC* x,char* findString,int minLen,int group);
static bool MatchLastN(CCC* x,char* findString,int minLen,int group);
static bool MatchMiddleN(CCC* x,char* findString,int minLen,int group);
static bool MatchAnyN(CCC* x,char* findString,int minLen,int group);
/*** Static allocations ***/
// charTable serves double duty:
// in parsing, it helps separate wild cards.
// in digest forming, it provides a sort order.
static char charTable[128] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 2, 3, 4, 5, 6, 7, 8, 9,10, 0, 0, 0, 0, 0, 0,
0,25,12,19,18,28,10,16,13,27, 4, 7,20,14,23,21,
15, 3,24,26,22,17, 9, 8, 5,11, 6, 0, 0, 0, 0, 1,
0,25,12,19,18,28,10,16,13,27, 4, 7,20,14,23,21,
15, 3,24,26,22,17, 9, 8, 5,11, 6, 0, 0, 0, 0, 0
};
typedef bool (*StringMatch)(CCC* x,char* findString,int minLen,int group);
static StringMatch stringMatch[4]={
MatchFirstN,
MatchLastN,
MatchMiddleN,
MatchAnyN
};
// The table LSB stores the position of the least
// significant '1' bit in a byte (range 1 to 8),
// a zero-byte reports 0.
// This table is built from nested #defines
#define T1 1
#define T2 T1,2,T1
#define T3 T2,3,T2
#define T4 T3,4,T3
#define T5 T4,5,T4
#define T6 T5,6,T5
#define T7 T6,7,T6
#define T8 T7,8,T7
static char LSB[256]={0,T8};
static CCC** gWordList;
static ulong gNumWords;
#define MIN(a,b) (a&b)
// HASH is just shorthand for charTable.
#define HASH(c) (charTable[c])
struct WordIndex
// The class WordIndex holds a pointer to a word from
// wordList and computes digests and signature for it.
struct WordIndex {
ulong digest1;
CCC* word;
void Init(CCC* wordx) {
CCC* wp=wordx;
int s=HASH(*wp);
ulong dig1=1L<<s;
for (;;) {
int c;
if (0==(c=*++wp)) break;
s=HASH(c);
ulong bit=1L<<s;
dig1 |= bit;
}
digest1=dig1;
word=wordx;
}
void GetSignature(char* sig) {
// computes its signature, i.e. converts the digest to a string
ulong dig1=digest1>>1;
char c=1;
while (dig1){
if (dig1 & 1)
*sig++=c;
c++; dig1 >>=1;
}
*sig=0;
return;
}
CCC* Word(){return word;}
};
struct Page
// The class Page holds indices of 32 words in wordList
// which are of the same length (words >= 31 together).
// Page also contains both word digests for each word,
// in the bits[] array, stored in signature oriented columns.
// A page provides string matching for the 32 words it owns.
struct Page{
CCC* word[32];
ulong fill;
Page* next;
ulong pageDigest1;
ulong bits[32];
void Init(Page* following) {
// clears all and sets linkage.
// memory may already be precleared, but not necessarily
// since pages may overlay the temporary wordIndex
memset(this,0,sizeof(Page));
next=following;
}
int IsFull() {return (fill>=32);}
void Add(WordIndex* wip) {
// Adds one word to a page,
// ORs the word digests into the page digests,
// Also ORs the horizontal bit slice representing word digests
// into the bits array.
char sig[64];
ulong curbit=1L<<fill;
word[fill++]=wip->word;
pageDigest1 |= wip->digest1;
wip->GetSignature(sig);
int c;
char* sigp=sig;
while (0 !=(c=*sigp++)) {
bits[c] |= curbit;
}
}
ulong Match(char sig[])
// Accumulates vertical bit slices from a given signature
// Returns a bit map of likely candidate words
{
int c=sig[0];
ulong acc=bits[c];
while ((acc) && (0 != (c=*++sig))) {
acc &= bits[c];
}
return acc;
}
CCC* GetFirst(
StringMatch matchFun,
ulong acc,
char* findString,
int minLen,
int group)
{
CCC** wp=word-1;
do {
ulong accLo=acc & 0xFF;
if (accLo) {
int j=LSB[accLo];
acc>>=j;
wp+=j;
if (matchFun(*wp,findString,minLen,group))
return *wp;
} else {
acc>>=8;
wp+=8;
}
} while (acc);
return 0;
}
};
struct PageIndex
// The class PageIndex contains a pointer to a page, and
// keeps a copy of the page digest1.
// During the scan, PageIndex provides a screening function
// to eliminate unnecessary page accesses if digest1 can
// already rule out all words on a page.
struct PageIndex {
ulong digest1;
Page* page;
ulong Init(Page* thePage) {
digest1=thePage->pageDigest1;
page=thePage;
return thePage->fill;
}
Page* Screen(ulong findDigest1) {
if ((findDigest1 ^ (findDigest1 & digest1)))
return 0;
return page;
}
};
struct PrivateData
// The class PrivateData is the main structure mapped into the
// private memory space allocated on the heap.
// The pageGroup[] array holds pointers to linked lists of
// pages, according to word length.
// Once all pages are assembled, the page addresses are remapped
// into a linear page index, sorted by ascending word length
// The indexGroup[i] array points to the the first page of each
// group of pages of a given word length i.
static struct PrivateData {
// Page* nextPage; overlay on unused pageGroup[0]
#define nextPage pageGroup[0]
Page* pageGroup[32];
// PageIndex* endOfPageIndex; overlay on indexGroup[0]
#define endOfPageIndex indexGroup[0]
PageIndex* indexGroup[32];
long storageSize;
int bottom[1];
void Init(CCC* wordList[],ulong numWords,long minimumStorage)
{
gWordList=wordList;
gNumWords=numWords;
storageSize=minimumStorage;
WordIndex* wip;
// Initialize word index
WordIndex* wordIndexList=(WordIndex*)bottom;
for (int n=0;n<numWords;n++)
wordIndexList[n].Init(gWordList[n]);
// Unload word index and build pages in linked lists
// based on word length
nextPage=(Page*)this + storageSize/sizeof(Page) - 1;
wip=wordIndexList+numWords;
while (wip>wordIndexList) {
wip-;
if (0==InsertWordInPage(wip))
return;
}
// Map linked lists to a linear index of pages by length
if (0==BuildIndex()) {
nextPage=NULL; // not enough storage
return;
}
}
int InsertWordInPage(WordIndex* wip) {
// Inserts one word in a page, opens a new page if
// none exists or if the current page is full.
int len=strlen(wip->Word());
if (len==0) return 1; // ignore 0-length words
Page* page=pageGroup[len];
if ((page==0) || (page->IsFull())) { Page* temp=page;
page=nextPage-;
// test if the bottom of the growing page array collides
// with the top of the shrinking word index array
if (page <= (Page*)wip) {
// not enough storage, we have to bail out
nextPage=NULL;
return 0;
}
page->Init(temp);
pageGroup[len]=page;
}
page->Add(wip);
return 1;
}
PageIndex* BuildIndex() {
// Builds the page index, starting at this->bottom,
// overwriting storage previously used by word index.
PageIndex* pi=(PageIndex*)bottom;
PageIndex* piTop=(PageIndex*)nextPage;
for (int len=1;len<32;len++) {
Page* page=pageGroup[len];
indexGroup[len]=pi;
int num=0;
while (page) {
if (pi>=piTop)
return 0;
num+=pi++->Init(page);
page=page->next;
}
}
return (endOfPageIndex=pi);
}
CCC* GetShortestWord(
char* findString,
int mode)
{
// Return the shortest word compatible with findstring and mode
char sig[64];
ulong findDigest1=MakeSubDigest(findString,sig);
int minLen=strlen(findString);
for (int group=minLen;group<32;group++)
{
PageIndex* pi=indexGroup[group];
PageIndex* endGroup=
(group<31?indexGroup[group+1]:endOfPageIndex);
for (;pi<endGroup;pi++) {
Page* page=pi->Screen(findDigest1);
if (page) {
ulong acc=page->Match(sig);
if (acc) {
CCC* matchWord=page->GetFirst(
stringMatch[mode&3],
acc,findString,minLen,group);
if (matchWord)
return matchWord;
}
}
}
}
return 0;
}
int GetShortestMatches(
char* findString,
int minLen,
int mode,
int shortest[2])
{
// Finds the shortest word matching findString and mode.
// Returns the length of this word, or 0 if none found
// Only check words of even (odd) length
char sig[64];
ulong findDigest1=MakeSubDigest(findString,sig);
int step=1,numFound=0;
for (int group=minLen;group<32;group+=step) {
PageIndex* pi=indexGroup[group];
PageIndex* endGroup=
(group<31?indexGroup[group+1]:endOfPageIndex);
for (;pi<endGroup;pi++) {
Page* page=pi->Screen(findDigest1);
if (page) {
ulong acc=page->Match(sig);
if (acc) {
CCC* matchWord=page->GetFirst(
stringMatch[mode&3],
acc,findString,minLen,group);
if (matchWord) {
shortest[numFound++]=group;
// return as soon as two matches are found.
if (numFound >= 2)
return numFound;
// if this was the first match in an even (odd) group,
// we continue with the odd (even) groups only.
group-;
step=2;
break;
}
}
}
}
}
return numFound;
}
}* PD;
//*************** external access functions (API) **************
InitDisambiguator
void InitDisambiguator(
CCC * wordList[],
ulong numWords ){
// Sets up the private data structure
if (PD) // already done
return;
long wordIndexStore=numWords*sizeof(WordIndex);
long pageStore=(31+numWords/32)*sizeof(Page);
long pageIndexStore=pageStore * sizeof(PageIndex) / sizeof(Page);
long minimumStorage=
sizeof(*PD) +
(wordIndexStore>pageStore+pageIndexStore?
wordIndexStore:pageStore+pageIndexStore);
PD=(PrivateData*)(new char[minimumStorage]);
memset(PD,0,minimumStorage);
PD->Init(wordList,numWords,minimumStorage);
}
ShortestMatches
int /*number found*/ ShortestMatches(
// Finds the shortest odd and even words matching findString
// Returns only the lengths of the 0, 1, or 2 words
char *findString,
int len,
int mode,
int shortest[2]
) {
assert(PD);
int numFound=
PD->GetShortestMatches(findString,len,mode,shortest);
return numFound;
}
ShortestWord
CCC* ShortestWord(char *findString,int mode)
{
return PD->GetShortestWord(findString,mode);
}
TermDisambiguator
void TermDisambiguator(void)
{
if (PD)
{
delete [] PD;// was allocated with new[]
PD=0;
}
}
//******* Static Functions **********
MatchFirstN
static bool MatchFirstN(
CCC* x,
char* findString,
int minLen,
int group) {
#pragma unused(group)
// match of x against findString, first minLen letters
return (0==strncmp(x,findString,minLen));
}
MatchLastN
static bool MatchLastN(
CCC* x,
char* findString,
int minLen,
int group) {
// match of x against findString, last minLen letters
return (0==strcmp(x+group-minLen,findString));
}
MatchMiddleN
static bool MatchMiddleN(
CCC* x,
char* findString,
int minLen,
int group) {
// match of x against findString, any consecutive minLen letters
for (int d=0;d<=group-minLen;d++) {
if (0==strncmp(x+d,findString,minLen))
return true;
}
return false;
}
MatchAnyN
static bool MatchAnyN(
CCC* x,
char* findString,
int minLen,
int group) {
// match of x against findString, any minLen letters in order
#pragma unused(group)
for (int i=0;i<minLen;i++) {
char c=*findString;
char d;
while (0 != (d=*x++)) {
if (c==d) break;
}
if (d==0)
return false;
findString++;
}
return true;
}
MakeSubDigest
static ulong MakeSubDigest(
CCC* xString,
char sig[]) {
// Creates a pair of word digests and a signature from
// findString. No duplicate signature characters.
int s=HASH(xString[0]);
ulong digest1=1L<<s;
for (;;) {
int c;
if (0==(c=*++xString)) break;
s=HASH(c);
ulong bit=1L<<s;
if (0==(digest1 & bit)) {
digest1 |= bit;
}
}
// Make the signature in bit order, that is with the
// less frequent English letters first, so that page
// scanning will fail as soon as possible if no word
// in the page will match.
ulong bit1=2;
ulong single=digest1;
int s1;
for (s1=1;s1<=28;s1++) {
if (single & bit1) *sig++=s1;
bit1 += bit1;
}
*sig++=0;
return digest1;
}
mystring.h
#if defined(__cplusplus)
extern "C" {
#endif
#ifndef MYSTRING_H
#define MYSTRING_H
typedef unsigned long ulong;
StrCmp4
inline long StrCmp4(const char* a,const char* b)
// Fast comparator for words of minimum length 4
{
const ulong* wa=(ulong*)(a);
const ulong* wb=(ulong*)(b);
ulong ca=*wa;
ulong cb=*wb;
long d=ca-cb;
if (d) return d;
a+=3,b+=3;
do// (;;)
{
ca=*++a;
cb=*++b;
d=ca-cb;
//if (d) break;// words differ
//if (ca==0) break;// words same, and NULL reached
} while (ca && (d==0));
return d;
}
StrCmpN4
inline long StrCmpN4(const char* a,const char* b,ulong n)
// Fast comparator for words of minimum length 4, up to n letters
{
const ulong* wa=(ulong*)(a);
const ulong* wb=(ulong*)(b);
ulong ca=*wa;
ulong cb=*wb;
long d=ca-cb;
if (d) return d;
a+=3,b+=3;
for (int i=4;i<n;i++)
{
ca=*++a;
cb=*++b;
d=ca-cb;
if (d) break;// words differ
if (ca==0) break;// words same, and NULL reached
}
return d;
}
StrLen4
inline ulong StrLen4(const char* w)
{
ulong len=4;
w+=3;
while (*++w) len++;
return len;
}
#endif
#if defined(__cplusplus)
}
#endif