Dec 95 Challenge
	
| Volume Number: |  | 11 | 
| Issue Number: |  | 12 | 
| Column Tag: |  | Programmer's Challenge | 
Programmer's Challenge	 
	
By Bob Boonstra, Westford, Massachusetts
 Note:  Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Find Again And Again 
This month the Challenge is to write a text search engine that is optimized to operate repeatedly on the same text.  You will be given a block of text, some storage for data structures, and an opportunity to analyze the text before being asked to perform any searches against that text.  Then you will repeatedly be asked to find a specific occurrence of a given word in that block of text.  The prototypes for the code you should write are:
void InitFind(
  char *textToSearch,    /* find words in this block of text  */
  long textLength,       /* number of chars in textToSearch   */
  void *privateStorage,  /* storage for your use              */
  long storageSize       /* number of bytes in privateStorage */
);
long FindWordOccurrence(
                         /* return offset of wordToFind in textToSearch */
  char *wordToFind,      /* find this word in textToSearch    */
  long wordLength,       /* number of chars in wordToFind     */
  long occurrenceToFind, /* find this instance of wordToFind  */
  char *textToSearch,    /* same parameter passed to InitFind */
  long textLength,       /* same parameter passed to InitFind */
  void *privateStorage,  /* same parameter passed to InitFind */
  long storageSize       /* same parameter passed to InitFind */
);
The InitFind routine will be called once for a given block of textLength characters at textToSearch to allow you to analyze the text, create data structures, and store them in privateStorage.  When InitFind is called, storageSize bytes of memory at privateStorage will have been preallocated and initialized to zero.  
FindWordOccurrence is to search for words, where a word is defined as a continuous sequence of alphanumeric characters delimited by a non-alphanumeric character (e.g., space, tab, punctuation, hyphen, CR, NL, or other special character).  Your code should look for complete words - it would be incorrect, for example, to return a value pointing to the word "these" if the wordToFind was "the".  The wordToFind will be a legal word (i.e., no embedded delimiters).  FindWordOccurrence should return the offset in textToSearch of the occurrenceToFind-th instance of wordToFind.  It should return -1 if wordToFind does not occur in textToSearch, or if there are fewer than occurrenceToFind instances of wordToFind.
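To make the whole-word rule concrete, here is a minimal, unoptimized sketch of the search contract described above. It is only a baseline for checking results, not a competitive entry: it ignores privateStorage and rescans the text on every call. The names NaiveFindWordOccurrence and IsWordChar are hypothetical, and the sketch assumes that matching is case-sensitive and that occurrenceToFind counts from 1, neither of which the problem statement spells out.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: nonzero if c can be part of a word. */
static int IsWordChar(char c)
{
    return isalnum((unsigned char)c) != 0;
}

/* Unoptimized baseline (an assumption-laden sketch, not a Challenge entry).
   Scans the text word by word and returns the offset of the
   occurrenceToFind-th (assumed 1-based) whole-word, case-sensitive match,
   or -1 if there are not that many matches. */
long NaiveFindWordOccurrence(const char *wordToFind, long wordLength,
                             long occurrenceToFind,
                             const char *textToSearch, long textLength)
{
    long pos = 0;
    long seen = 0;

    if (wordLength <= 0 || occurrenceToFind <= 0)
        return -1;

    while (pos < textLength) {
        long start;

        /* skip any run of delimiters */
        while (pos < textLength && !IsWordChar(textToSearch[pos]))
            pos++;

        /* collect one complete word */
        start = pos;
        while (pos < textLength && IsWordChar(textToSearch[pos]))
            pos++;

        /* only a word of exactly the right length can match */
        if (pos - start == wordLength &&
            memcmp(textToSearch + start, wordToFind,
                   (size_t)wordLength) == 0) {
            if (++seen == occurrenceToFind)
                return start;
        }
    }
    return -1;
}
```

Because the comparison is made only against complete words, a search for "the" skips over "these" without any special-case logic.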
Both the InitFind and the FindWordOccurrence routines will be timed in determining the winner.  In designing your code, you should assume that FindWordOccurrence will be called approximately 1000 times for each call to InitFind (with the same textToSearch, but possibly differing values of wordToFind and occurrenceToFind).  
There is no predefined limit on textLength - you should handle text of arbitrary length.  The amount of privateStorage available could be very large, but is guaranteed to be at least 64K bytes.  While the test cases will include at least one large textToSearch with a small storageSize, most test cases will provide at least 32 bytes for each occurrence of a word in textToSearch, so you might want to optimize for that condition.
Other fine print: you may not change the input pointed to by textToSearch or wordToFind, and you should not use any static storage other than that provided in privateStorage.  
This will be a native PowerPC Challenge, scored using the latest CodeWarrior compiler.  Good luck, and happy searching.
Programmer's Challenge Mailing List
We are pleased to announce the creation of the Programmer's Challenge Mailing List.  The list will be used to distribute the latest Challenge, provide answers to questions about the current Challenge, and discuss suggestions for future Challenges.  The Challenge problem will be posted to the list each month, sometime between the 20th and the 25th of the month.  This should alleviate problems caused by variations in the publication and mailing date of the magazine, and provide a predictable amount of time to work on each Challenge.
To subscribe to the list, send a message to autoshare@mactech.com with the SUBJECT line "sub challenge YourName", substituting your real name for YourName.  To unsubscribe from the list, send a message to autoshare@mactech.com with the SUBJECT line "unsub challenge".
Note: the list server, autoshare, is set to accept commands in the SUBJECT line, not the body of the message.  If you have any problems, please contact online@mactech.com.
Two Months Ago Winner
The Master Mindreader Challenge inspired ten readers to enter, and all ten solutions gave correct results.  Congratulations to Xan Gregg (Durham, N.C.) for producing the fastest entry and winning the Challenge.
The problem required you to write code that would correctly guess a sequence of colors using a callback routine provided in the problem statement that returned two values for each guess: the number of elements of the guess where the correct color is located in the correct place in the sequence, and the number of elements where the correct color is in an incorrect place in the sequence.  The number of guesses was not an explicit factor in determining the winner, but the time used by the callback routine was included in determining the winner.  Participants correctly noted that this made the relative execution time of the guessing routine and the callback routine a factor in designing a fast solution.  A couple of entries went so far as to offer their own, more efficient, callbacks.  Nice try, but I didn't use them - the callback in the problem was designed to provide a known time penalty for making a guess, and that was the callback I used in evaluating solutions.
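For readers who want to experiment, the two values returned for each guess can be computed along the following lines. This is a sketch only and is not the callback from the original problem statement (which was deliberately slower); SetAnswer, SketchCheckGuess, gAnswer, and gAnswerLength are hypothetical names, and the hidden sequence is assumed to be held in static harness storage. The function has the same shape as the CheckGuessProcPtr type used in the winning solution.

```c
#define kMaxLength 16

/* Hypothetical test-harness state: the hidden sequence to be guessed. */
static unsigned char  gAnswer[kMaxLength];
static unsigned short gAnswerLength;

/* Hypothetical setup routine; lengths above kMaxLength are clamped. */
void SetAnswer(const unsigned char *answer, unsigned short length)
{
    unsigned short i;

    if (length > kMaxLength)
        length = kMaxLength;
    gAnswerLength = length;
    for (i = 0; i < length; i++)
        gAnswer[i] = answer[i];
}

/* Sketch of a scoring callback (assumption: not the original one).
   numInCorrectPos = positions where guess and answer agree.
   numInWrongPos   = colors common to guess and answer, counted with
                     multiplicity, minus those already in the right place. */
void SketchCheckGuess(unsigned char *theGuess,
                      unsigned short *numInCorrectPos,
                      unsigned short *numInWrongPos)
{
    unsigned short answerCount[256] = {0};
    unsigned short guessCount[256]  = {0};
    unsigned short correct = 0;
    unsigned short matched = 0;
    unsigned short i;
    int c;

    for (i = 0; i < gAnswerLength; i++) {
        if (theGuess[i] == gAnswer[i])
            correct++;
        answerCount[gAnswer[i]]++;
        guessCount[theGuess[i]]++;
    }

    /* each color contributes the smaller of its two counts */
    for (c = 0; c < 256; c++)
        matched += (answerCount[c] < guessCount[c]) ? answerCount[c]
                                                    : guessCount[c];

    *numInCorrectPos = correct;
    *numInWrongPos   = (unsigned short)(matched - correct);
}
```

In this sketch a guess value that never appears in the answer, such as 0, simply matches nothing, so it contributes to neither count.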
The callback I supplied had one unanticipated side effect - it permitted callers to supply an out-of-range value for positions in the sequence that they didn't care about for that guess, and six of the entries took advantage of this loophole.  This wasn't what I had intended, and I gave some thought to giving priority to solutions that did not use the loophole.  In the end, however, I decided not to treat these entries any differently, because the solution statement permitted and provided a defined callback behavior for out-of-range guesses.  As it turned out, the winning entry and three of the fastest four entries did not use out-of-range guesses.
Xan's winning code first makes a sequence of guesses to determine how many positions are set to each of the possible colors.  He then starts with an initial guess corresponding to these colors and begins swapping positions to determine how the number of correctly placed colors is affected.  Separate logic handles the cases where the number of correctly placed colors increased or decreased by 0, 1, or 2, all the while keeping track of which color possibilities have been eliminated for each position.  These and other details of Xan's algorithm are documented in the comments to his code.
The table of results below indicates, in addition to execution time, the cumulative number of guesses used by each entry for all test cases.  In general, it shows the expected rough correlation between execution time and the number of guesses, with a significant exception for the second-place entry from Ernst Munter, which took significantly fewer guesses.  Ernst precalculated tables to define the guessing strategy for problems of length 5 or less and devised a technique for partitioning larger problems to use these tables.  Normally I try to discourage the use of extensive precalculated data, but I decided to allow this entry because the amount of data was not unreasonable, because the tables guided the algorithm but did not precalculate a solution, and because I thought the approach was innovative and interesting.  Although including the second-place entry in the article is not possible because of length restrictions, I have included the preamble from Ernst's solution describing his approach.
Here are the times and code sizes for each of the entries.  Numbers in parentheses after a person's name indicate that person's cumulative point total for all previous Challenges, not including this one.  
Name	time	guesses	code	data	out-of-range values used?
Xan Gregg (61)	102	4123	1360	16	no
Ernst Munter (90)	109	2880	6264	5480	limited
Gustav Larsson (60)	116	3700	712	40	no
Greg Linden	127	5002	576	16	no
M. Panchenko (4)	146	5391	344	16	yes
Eric Lengyel (20)	176	6456	312	16	yes
Peter Hance	206	6557	336	16	yes
J. Vineyard (42)	228	9933	328	16	no
Ken Slezak (10)	251	6544	808	16	yes
Stefan Sinclair	259	11058	200	16	yes
Top 20 Contestants of All Time
Here are the Top 20 Contestants for the Programmer's Challenges to date.  The numbers below include points awarded for this month's entrants.  (Note: ties are listed alphabetically by last name - there are more than 20 people listed this month because of ties.)
	Rank	Name	Points
	1.	[Name deleted]	176
	2.	Munter, Ernst	100
	3.	Gregg, Xan	81
	4.	Karsh, Bill	78
	5.	Larsson, Gustav	67
	6.	Stenger, Allen	65
	7.	Riha, Stepan	51
	8.	Goebel, James	49
	9.	Nepsund, Ronald	47
	10.	Cutts, Kevin	46
	11.	Mallett, Jeff	44
	12.	Kasparian, Raffi	42
	13.	Vineyard, Jeremy	42
	14.	Darrah, Dave	31
	15.	Landry, Larry	29
	16.	Elwertowski, Tom	24
	17.	Lee, Johnny	22
	18.	Noll, Robert	22
	19.	Anderson, Troy	20
	20.	Beith, Gary	20
	21.	Burgoyne, Nick	20
	22.	Galway, Will	20
	23.	Israelson, Steve	20
	24.	Landweber, Greg	20
	25.	Lengyel, Eric	20
	26.	Pinkerton, Tom	20
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution, or (3) being the first person to suggest a Challenge that I use.  The points you can win are:
1st place	20 points
2nd place	10 points
3rd place	7 points
4th place	4 points
5th place	2 points
finding bug	2 points
suggesting Challenge	2 points
Here is Xan's winning solution:
MindReader
By Xan Gregg, Durham, N.C.
/*  
  I try to minimize the number of guesses without adding too much complexity to the
  code.  First I figure out how many of each color are present in the answer by
  essentially repeatedly guessing all of each color.
  
  Then I figure out the correct positions one at a time starting at slot 0.  I exchange it
  with each other slot (one at a time) until the correct color is found.  When there is a
  change in the numCorrect response from checkGuess I can tell which of the two
  slots caused the change by looking at my remembered information or, if necessary,
  by performing a second guess with one of the colors in both slots.
  
  The remembered information includes keeping track of colors that were
  determined (via the numCorrect change) to be wrong before and/or after a swap is
  made.  This doesn't help out too often, but it doesn't take much time to record
  compared to calling checkGuess.
  
  While the outer loop determines the color of each slot left-to-right (0 to n-1), I
  found that indexing the inner loop right-to-left instead of left-to-right increased the
  speed by 30% - 40%.  I wish I understood why!
  
  Oddly, the checkGuess function spends most of its time figuring out the numWrong
  value, which we generally ignore.
*/
typedef void (*CheckGuessProcPtr)(
        unsigned char  *theGuess,
        unsigned short *numInCorrectPos,
        unsigned short *numInWrongPos);
#define kMaxLength 16
#define Bit(color) (1L << (long) (color))
/* MindReader */
void MindReader(unsigned char guess[],
        CheckGuessProcPtr checkGuess,
        unsigned short answerLength,
        unsigned short numColors)
{
  long    prevColorsFound;
  long    colorsFound;
  long    curColor;
  long    i, j;
  long    curCorrect;
  long    numOfColor[kMaxLength + 1];  /* 1-based */
  Boolean isCorrect[kMaxLength];
  long    possibilities[kMaxLength];   /* bit fields */
  long    colorBit1;
  long    colorBit2;
  char    color1;
  char    color2;
  long    delta;
  unsigned short  newCorrect;
  unsigned short  newWrong;
  
  /* first find the correct set of colors */
  colorsFound = 0;
  curColor = 1;
  while (colorsFound < answerLength)
   {
    for (i = colorsFound; i < answerLength; i++)
      guess[i] = curColor;
    (*checkGuess)(guess, &newCorrect, &newWrong);
    prevColorsFound = colorsFound;
    colorsFound = newCorrect + newWrong;
    numOfColor[curColor] = colorsFound - prevColorsFound;
    curColor++;
   }
  
  /* now work on the order */
  for (i = 0; i < answerLength; i++)
   {
    isCorrect[i] = false;
    possibilities[i] = -1;  /* all colors */
   }
  curCorrect = newCorrect;
  /* step through every slot, starting at 0 */
  for (i = 0; curCorrect < answerLength; i++)
   {
    if (isCorrect[i])
      continue;
    color1 = guess[i];
    colorBit1 = Bit(color1);
    /* try swapping slot i with every other open */
    /* slot, starting with the last one */
    j = answerLength;
    nextSubSlot:
    j--;
    if (guess[i] == guess[j])
      goto nextSubSlot;
    if (isCorrect[j])
      goto nextSubSlot;
    color2 = guess[j];
    colorBit2 = Bit(color2);
    if ((possibilities[i] & colorBit2) == 0)
      goto nextSubSlot;  /* no hope here */
    /* swap slots i & j and check result */
    guess[i] = color2;
    guess[j] = color1;
    (*checkGuess)(guess, &newCorrect, &newWrong);
    delta = newCorrect - curCorrect;
    if (delta >= 0)
      if (delta == 0)
       {  /* either both are incorrect OR */
                           /* one is correct and answer[i]==answer[j] */
        guess[i] = color1;
        guess[j] = color2;
        if (numOfColor[color1] == 1)
         {  /* color1 can't be in both places */
          possibilities[i] &= ~colorBit1;
          possibilities[j] &= ~colorBit1;
         }
        if (numOfColor[color2] == 1)
         {  /* color2 can't be in both places */
          possibilities[i] &= ~colorBit2;
          possibilities[j] &= ~colorBit2;
         }
       }
      else if (delta == 1)
       {  /* both were wrong, now one is correct */
                        /* find out which is correct */
        curCorrect = newCorrect;
        if ((possibilities[j] & colorBit1) == 0)
         {  /* i must be color2 */
          possibilities[j] &= ~colorBit2;
          numOfColor[color2] -= 1;
          goto nextSlot;
         }
        else if ((possibilities[i] & colorBit2) == 0)
         {  /* j must be color1 */
          isCorrect[j] = true;
          possibilities[i] &= ~colorBit1;
          numOfColor[color1] -= 1;
          color1 = color2;
          colorBit1 = colorBit2;
         }
        else
         {  /* we'll have to make another guess to */
                        /* see which is correct */
          guess[i] = color1;
          (*checkGuess)(guess, &newCorrect, &newWrong);
          if (newCorrect == curCorrect)
           {  /* j must be color1 */
            possibilities[i] &=
                  (~(colorBit1 | colorBit2));
            isCorrect[j] = true;
            guess[i] = color2;
            numOfColor[color1] -= 1;
            color1 = color2;
            colorBit1 = colorBit2;
           }
          else
           {  /* i must be color2 */
            possibilities[j] &=
                  (~(colorBit1 | colorBit2));
            guess[i] = color2;
            numOfColor[color2] -= 1;
            goto nextSlot;
           }
         }
       }
      else  /* delta == 2 */
       {  /* both were wrong, now both correct */
        isCorrect[j] = true;
        numOfColor[color1] -= 1;
        numOfColor[color2] -= 1;
        curCorrect = newCorrect;
        goto nextSlot;
       }
    else  /* delta < 0 */
      if (delta == -1)
       {  /* one was correct before swap, now neither is */
        guess[i] = color1;
        guess[j] = color2;
        if ((possibilities[i] & colorBit1) == 0)
         {  /* color2 in slot j was correct */
          isCorrect[j] = true;
          numOfColor[color2] -= 1;
          possibilities[i] &= ~colorBit2;
         }
        else if ((possibilities[j] & colorBit2) == 0)
         {  /* color1 in slot i was correct */
          possibilities[j] &= ~colorBit1;
          numOfColor[color1] -= 1;
          goto nextSlot;
         }
        else
         {  /* we'll have to make another guess to */
                        /* see which was correct */
          guess[j] = color1;
          (*checkGuess)(guess, &newCorrect, &newWrong);
          if (newCorrect == curCorrect)
           {  /* color1 in slot i was correct */
            possibilities[j] &=
                  (~(colorBit1 | colorBit2));
            guess[j] = color2;
            numOfColor[color1] -= 1;
            goto nextSlot;
           }
          else
           {  /* color2 in slot j was correct */
            possibilities[i] &=
                  (~(colorBit1 | colorBit2));
            guess[j] = color2;
            isCorrect[j] = true;
            numOfColor[color2] -= 1;
           }
         }
       }
      else  /* delta == -2 */
       {  /* both were already correct */
        guess[i] = color1;
        guess[j] = color2;
        isCorrect[j] = true;
        numOfColor[color1] -= 1;
        numOfColor[color2] -= 1;
        goto nextSlot;
       }
    goto nextSubSlot;
    nextSlot: ;
   }
  done: ;
}
Alternative Approach (Description Only)
Copyright 1995, Ernst Munter, Kanata, ON, Canada.
/*
  Problem:
    Find the value of a multidigit code, by a question and answer method.  Each
    question is a guess of the code, the answer is the number of digits that are correct,
    reported as either in correct or wrong positions.
    The challenge is to minimize total time, that is in the first order, keep the number of
    guesses small, since the time to check the guess is included in total time.  But
    spending too much time minimizing the number of guesses is counterproductive.
  Assumptions:
    1. It is OK to guess a color that is not within the range 1 to numColors.  It will not
        be correct or wrong, but it will also not corrupt the CheckGuess function.
    2. The opponent will call with randomly generated correctAnswer codes, and not
        try to defeat the MindReader by learning the solution strategy.
    3. The objective is not to be a true Mindreader, as this could be done by reading into
         the (*checkGuess) code, the address of which is handy.  One would then
         disassemble PowerPC instructions to discover the hidden address of
         correctAnswer.
  Solution:
    It is relatively simple to manually construct solution trees for small N
    (N=answerLength), and make them into a lookup table.
    I have made a table for N=4, and hardcoded the trees for N=2 and N=3.
    The table for N=5 was too large to be done easily by hand, and I wrote a Tree
    Builder program to construct its 246 nodes.  I then hand tuned the 2 smallest parts
    of it.
    I felt that a 246-node tree is about at the limit of what might be tolerable in a
    static array.  The tree for N=6 would have 1400 or so nodes.  There are diminishing
    returns.  Adding the N=5 tree improved the higher splits (2 or 3 splits instead of 3 or
    4), but gained only a few percentage points on the callBack frequency overall.
    To keep the trees manageable, the permutation patterns and the color schemes are
    normalized.
    Now the details:
    Even if numColors > N, there can be at most N distinct colors in the answer, for
    example 5, if answerLength=5.
    And we can arrange a color mapping so that all colors are referred to by index 1, 2,
    3, etc, with the most frequently occurring color labeled #1.
    For N=5, this reduces the possible answers to 7 color schemes, 11111, 11112,
    11122, 11123, 11223, 11234, 12345.
    To solve for N<=5, the function ProcessSlice() only needs the color mapping, and
    a list of the colors, suitably sorted.
    For example, the real answer 73646 can be solved by walking the solution tree in
    4 steps, given the color list 6,3,4,7 and the pattern to be found is 42131.  The
    pattern at the root of the tree T11234, is 11234.
    To obtain the pattern information, I scan the answer with successive guesses
    (somewhat optimized for answer lengths of 2 to 4, to eliminate some obviously
    unneeded calls to checkGuess).  The basic idea is:
    correctAnswer
    7 3 6 4 6
    Six or seven calls to checkGuess, to build the color and color-frequency lists:
    guess       correct wrong   yetToFind colorList
    1 1 1 1 1   0       0       5         -
    2 2 2 2 2   0       0       5         -
    3 3 3 3 3   1       0       4         3
    4 4 4 4 4   1       0       3         3,4
    5 5 5 5 5   0       0       3         3,4
    6 6 6 6 6   2       0       1         6,3,4
    7 7 7 7 7   1       0       0         6,3,4,7
    The last call back is avoided if the color==numColors occurs in the code.
    Then, using the tree, the correct answer is found with four more calls to checkGuess:
                                 goal   42131
    6 6 3 4 7   1       x   tree code   11234
    6 3 6 7 4   2       x               12143
    4 3 6 6 7   2       x               32114
    4 6 6 7 3   1       x               31142
    7 3 6 4 6                           42131 (no other choice)
    This results in a total of 10 or 11 calls, or fewer, to the checkGuess function.
    On average, 10 calls are needed to solve 5-wide answers, when numColors is
    randomly set to a value from 1 to 16.
    For N>5, the size of tree grows very rapidly.  So I decided to split the answer into
    multiple slices, and treat each as separate problems of width 3, 4, or 5:
    6 = 3 + 3
    7 = 4 + 3
    8 = 4 + 4
    9 = 5 + 4
    10 = 5 + 5
    11 = 5 + 6 = 5 + (3 + 3)
    12 = 5 + 7 = 5 + (4 + 3)
    13 = 5 + 8 = 5 + (4 + 4)
    14 = 5 + 9 = 5 + (5 + 4)
    15 = 5 + 10 = 5 + (5 + 5)
    16 = 8 + 8 = (4 + 4) + (4 + 4)
    To create a split, we call checkGuess with guesses of a solid color for the left side,
    and 0s for the right. (e.g. first guess 1 1 1 1 0 0 0 0, to split 8).  As a result,
    correctPos gives the number of 1s in the left slice, and wrongPos, the number of 1s
    in the right slice.  If we get correctPos+wrongPos=4 as an answer, we must call
    again because there might be more than four 1s in the answer;  the guess 0 0 0 0 1 1
    1 1 will do it.
    Performance:
    Overall, I find an almost linear relationship between the total number of call backs
    (CB) and the value of answerLength (AL), approximately CB = AL * 1.26 + 2.84
    when numColors varies randomly from 1 to 16.
*/
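The color normalization described in Ernst's preamble can be illustrated with a short sketch. It is not taken from his entry: NormalizeColors is a hypothetical name, the color values are assumed to lie in the range 1 to 16, and ties in frequency are broken by ascending color number, which is consistent with the 6,3,4,7 color list in his example but is otherwise an assumption. The sketch also works from a known answer purely for illustration, whereas the real entry had to discover the counts through calls to checkGuess.

```c
#include <string.h>

#define kMaxColor 16

/* Sketch of the normalization step (hypothetical helper, not Ernst's code).
   Builds colorList ordered by descending frequency (ties: lower color
   number first, an assumption) and rewrites the answer as 1-based indices
   into that list.  Returns the number of distinct colors found.
   Assumes every answer[i] is in the range 1..kMaxColor. */
int NormalizeColors(const unsigned char *answer, int length,
                    unsigned char *colorList, unsigned char *pattern)
{
    int count[kMaxColor + 1];
    unsigned char indexOf[kMaxColor + 1];
    int numDistinct = 0;
    int i, c;

    memset(count, 0, sizeof count);
    memset(indexOf, 0, sizeof indexOf);
    for (i = 0; i < length; i++)
        count[answer[i]]++;

    /* Repeatedly pick the most frequent remaining color.  count[0] stays
       zero and acts as a sentinel; the strict comparison keeps the lowest
       color number when frequencies tie. */
    for (;;) {
        int best = 0;
        for (c = 1; c <= kMaxColor; c++)
            if (count[c] > count[best])
                best = c;
        if (best == 0)
            break;                       /* no colors left */
        colorList[numDistinct] = (unsigned char)best;
        indexOf[best] = (unsigned char)(++numDistinct);
        count[best] = 0;                 /* remove from further rounds */
    }

    for (i = 0; i < length; i++)
        pattern[i] = indexOf[answer[i]];

    return numDistinct;
}
```

Applied to the answer 7 3 6 4 6 from the preamble, this yields the color list 6,3,4,7 and the pattern 42131, matching the worked example.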