Oct 95 Challenge

Volume Number:		11
Issue Number:		10
Column Tag:		Programmer’s Challenge

Programmer’s Challenge

By Bob Boonstra, Westford, Massachusetts

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.

Master MindReader

This month’s Challenge, MindReader, was suggested by Carl Constantine (Victoria, British Columbia), who earns points for the suggestion. The problem is to write code that will guess a sequence of colors (represented by integers) known only to the caller. You will be provided with a callback routine that can be used to examine a guess and return two values: the number of elements of your guess where the correct color is located in the correct place in the sequence, and the number of elements where the correct color is in an incorrect place in the sequence. You may revise your guess and use the callback routine as many times as you wish.

The prototype of the code you must write is:

typedef void (*CheckGuessProcPtr)( /* callback routine */
 unsigned char  *theGuess,         /* your guess to be checked */
 unsigned short *numInCorrectPos,  /* return value - number in correct position */
 unsigned short *numInWrongPos     /* return value - number in wrong position */
);

void MindReader(
 unsigned char  theAnswer[],  /* preallocated storage to return the sequence you guess */
 CheckGuessProcPtr checkGuess,/* callback */
 unsigned short answerLength, /* length of theAnswer */
 unsigned short numColors /* colors are numbered 1..numColors */
);

You would invoke the callback using code something like this:

unsigned short correctPos,wrongPos;
unsigned char guess[kMaxLength];

(*checkGuess)(guess,&correctPos,&wrongPos);

Given inputs of:

correctAnswer[] = {1,3,4,3};
theGuess[]      = {3,4,4,3};

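the callback would produce the following (the last two elements match exactly, the leading 3 matches the 3 in the answer's second position, and the remaining 4 has no unused counterpart in the answer):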

numInCorrectPos = 2;
numInWrongPos   = 1;

You may assume that answerLength and numColors will each be no larger than 16. The winning entry will be the one which correctly guesses the sequence in the minimum amount of time. The number of times you call the callback routine is not an explicit factor in determining the winner. However, to encourage you to guess efficiently, all execution time used by MindReader, including the time spent in the callback, is included in your time. The code for the callback will be very close to the following:

#define kMaxLength 16
static unsigned short answerLength;
static unsigned char correctAnswer[kMaxLength];
void CheckGuess(
 unsigned char  *theGuess, 
 unsigned short *numCorrectPosition,
 unsigned short *numWrongPosition
) {
  unsigned short correctPosition=0, wrongPosition=0;
  unsigned char  answerUsed[kMaxLength], guessUsed[kMaxLength];
  register unsigned char *guessP = theGuess;
  register unsigned char *correctP = correctAnswer;
  register int i,j;

  /* find correct position matches first */
  for (i=0; i<answerLength; ++i) {
    if (*guessP++ == *correctP++) {
      *(answerUsed+i) = *(guessUsed+i) = 1;
      ++correctPosition; /* increment number in correct position */
    } else {
      *(answerUsed+i) = *(guessUsed+i) = 0;
    }
  }

  /* find wrong position matches */
  guessP = theGuess;
  for (i=0; i<answerLength; ++i) {
    if (!*(guessUsed+i)) {
      register unsigned char *answerUsedP = answerUsed;
      correctP = correctAnswer;
      j = answerLength;
      do {
        if ((!*answerUsedP) && (*guessP == *correctP)) {
          *answerUsedP = 1;
          ++wrongPosition; /* increment number in wrong position */
          goto nextGuess;
        }
        ++correctP; ++answerUsedP;
      } while (--j);
    }
nextGuess:
    ++guessP;
  }

  *numCorrectPosition = correctPosition;
  *numWrongPosition = wrongPosition;
}
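
To make the interface concrete, here is a minimal sketch of a test harness together with a deliberately naive MindReader. It is not a competitive entry and not anyone's submitted solution. The names SketchCheckGuess, sketchSecret, sketchLength, sketchCalls and the SKETCH_ constants are hypothetical, and the scoring routine is a compact histogram-based stand-in for the callback shown above, not the code that will be used for judging. The strategy starts from a guess of all 1s and changes one position at a time: if the correct-position count rises, the new color is right; if it falls, the position really was a 1.

```c
#include <string.h>

#define SKETCH_MAX_LENGTH 16   /* hypothetical; mirrors the stated limit on answerLength */
#define SKETCH_MAX_COLORS 16   /* hypothetical; mirrors the stated limit on numColors */

typedef void (*CheckGuessProcPtr)(unsigned char *theGuess,
                                  unsigned short *numInCorrectPos,
                                  unsigned short *numInWrongPos);

/* Harness state (assumed names): the hidden sequence and a call counter. */
static unsigned char  sketchSecret[SKETCH_MAX_LENGTH];
static unsigned short sketchLength;
static unsigned long  sketchCalls;

/* Stand-in callback: exact matches first, then per-color minimum counts
   over the positions that did not match exactly. */
void SketchCheckGuess(unsigned char *theGuess,
                      unsigned short *numInCorrectPos,
                      unsigned short *numInWrongPos)
{
  unsigned short inAnswer[SKETCH_MAX_COLORS + 1] = { 0 };
  unsigned short inGuess[SKETCH_MAX_COLORS + 1] = { 0 };
  unsigned short correct = 0, wrong = 0, i, c;

  ++sketchCalls;
  for (i = 0; i < sketchLength; ++i) {
    if (theGuess[i] == sketchSecret[i]) {
      ++correct;
    } else {
      ++inAnswer[sketchSecret[i]];
      ++inGuess[theGuess[i]];
    }
  }
  for (c = 1; c <= SKETCH_MAX_COLORS; ++c)
    wrong = (unsigned short)(wrong +
            (inAnswer[c] < inGuess[c] ? inAnswer[c] : inGuess[c]));
  *numInCorrectPos = correct;
  *numInWrongPos = wrong;
}

/* Naive solver: uses only the correct-position count and needs at most
   1 + answerLength*(numColors-1) callback calls. */
void MindReader(unsigned char theAnswer[], CheckGuessProcPtr checkGuess,
                unsigned short answerLength, unsigned short numColors)
{
  unsigned short best, correct, wrong, i, c;

  for (i = 0; i < answerLength; ++i) theAnswer[i] = 1;
  (*checkGuess)(theAnswer, &best, &wrong);

  for (i = 0; i < answerLength && best < answerLength; ++i) {
    for (c = 2; c <= numColors; ++c) {
      theAnswer[i] = (unsigned char)c;
      (*checkGuess)(theAnswer, &correct, &wrong);
      if (correct > best) { best = correct; break; }   /* color c is right here */
      if (correct < best) { theAnswer[i] = 1; break; } /* color 1 was right here */
    }
  }
}
```

A harness would fill sketchSecret and sketchLength, call MindReader(out, SketchCheckGuess, length, colors), and compare out with sketchSecret. Because callback time counts against you, a faster entry would presumably also exploit the wrong-position count, which this sketch ignores.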

The target instruction set for this Challenge will be the PowerPC - I’ll be testing your native code on a 6100/80. This problem will be scored using Symantec C++ version 8.0.3, which Symantec generously provided for use in the Challenge. If you have any questions, please send them to me at one of the Programmer’s Challenge e-mail addresses, or directly to boonstra@ultranet.com.

Two Months Ago Winner

Three people were brave enough to attempt the Diff-Warrior Challenge. Unfortunately, probably due in large part to the complexity of the problem statement, none of them passed my first set of test cases, so it was necessary to relax the test. The winner is Ernst Munter (Kanata, ON), who submitted one of two entries that successfully completed the relaxed test suite. I’ll try to make future Challenges a little less difficult to understand and solve (MindReader should be a little easier). Then again, we don’t want them to become too easy - MacTech has increased the amount of the prize (see the Rules box), and we want you to earn it!

The problem was to generate a procedure for converting oldText into newText. Ernst uses a hash table to find words in oldText that might have been moved to new locations in newText. Blocks of contiguous words that do not have corresponding entries are marked for insertion or deletion, while blocks that do match are marked as text that has been moved, subject to a heuristic that tries to minimize the distance moved. See Ernst’s well-commented code for further insight into his approach.
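
The word-matching idea can be illustrated in isolation. The following is a simplified sketch, not Ernst's code: it indexes the words of the old text in a small bucket table and then counts how many words of the new text find a not-yet-claimed match. All names here (SketchWord, sketch_hash, sketch_count_matches, the SKETCH_ constants) are hypothetical, the bucket count is arbitrary, and a matched word is flagged as used rather than unlinked from its chain as the published solution does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUCKETS   31   /* small prime, chosen arbitrarily for the sketch */
#define SKETCH_MAX_WORDS 64   /* fixed capacity; extra old-text words are ignored */

typedef struct SketchWord {
  const char        *start;   /* first letter of the word in the old text */
  size_t             length;
  int                used;    /* set once a new-text word has claimed it */
  struct SketchWord *next;    /* next word in the same bucket */
} SketchWord;

/* Length-seeded shift-and-add hash over at most the first six characters. */
static unsigned sketch_hash(const char *text, size_t len)
{
  unsigned long acc = (unsigned long)len;
  size_t i, n = len > 6 ? 6 : len;
  for (i = 0; i < n; ++i) acc = (acc << 5) + (unsigned char)text[i];
  return (unsigned)(acc % SKETCH_BUCKETS);
}

/* Returns how many words of newText match an unclaimed word of oldText. */
int sketch_count_matches(const char *oldText, const char *newText)
{
  SketchWord  store[SKETCH_MAX_WORDS];
  SketchWord *buckets[SKETCH_BUCKETS] = { 0 };
  size_t      numWords = 0;
  int         matches = 0;
  const char *p = oldText;

  /* Pass 1: index each old word, appending so chains stay in text order. */
  while (numWords < SKETCH_MAX_WORDS) {
    const char *start;
    SketchWord *w, **link;
    while (*p && !isalpha((unsigned char)*p)) ++p;
    if (!*p) break;
    start = p;
    while (isalpha((unsigned char)*p)) ++p;
    w = &store[numWords++];
    w->start = start;
    w->length = (size_t)(p - start);
    w->used = 0;
    w->next = NULL;
    link = &buckets[sketch_hash(start, w->length)];
    while (*link) link = &(*link)->next;
    *link = w;
  }

  /* Pass 2: each new word claims the earliest unclaimed identical old word. */
  p = newText;
  for (;;) {
    const char *start;
    size_t len;
    SketchWord *w;
    while (*p && !isalpha((unsigned char)*p)) ++p;
    if (!*p) break;
    start = p;
    while (isalpha((unsigned char)*p)) ++p;
    len = (size_t)(p - start);
    for (w = buckets[sketch_hash(start, len)]; w; w = w->next) {
      if (!w->used && w->length == len && memcmp(w->start, start, len) == 0) {
        w->used = 1;
        ++matches;
        break;
      }
    }
  }
  return matches;
}
```

Old words left unclaimed after the second pass correspond to deletion candidates, and new words that find no match correspond to insertions; deciding which matched blocks count as moved is the job of the heuristic in the full solution.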

Here are the time and code sizes for the two most correct entries. Numbers in parens after a person’s name indicate that person’s cumulative point total for all previous Challenges, not including this one.

Name                 time   code

Ernst Munter (70)      55   6318

Ken Slezak            250   3114

Top 20 Contestants of All Time

Here are the Top 20 Contestants for the Programmer’s Challenges to date. The numbers below include points awarded for this month’s entrants. (Note: ties are listed alphabetically by last name; there are more than 20 people listed this month because of ties.)

1. [Name deleted] 176

2. Munter, Ernst 90

3. Karsh, Bill 78

4. Stenger, Allen 65

5. Larsson, Gustav 60

6. Gregg, Xan 51

7. Riha, Stepan 51

8. Goebel, James 49

9. Nepsund, Ronald 47

10. Cutts, Kevin 46

11. Mallett, Jeff 44

12. Kasparian, Raffi 42

13. Vineyard, Jeremy 42

14. Darrah, Dave 31

15. Landry, Larry 29

16. Elwertowski, Tom 24

17. Lee, Johnny 22

18. Noll, Robert 22

19. Anderson, Troy 20

20. Beith, Gary 20

21. Burgoyne, Nick 20

22. Galway, Will 20

23. Israelson, Steve 20

24. Landweber, Greg 20

25. Pinkerton, Tom 20

There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution, or (3) being the first person to suggest a Challenge that I use. The points you can win are:

1st place 20 points

2nd place 10 points

3rd place 7 points

4th place 4 points

5th place 2 points

finding bug 2 points

suggesting Challenge 2 points

Here is Ernst’s winning solution:

FindWordDifferences.c

/*

  Given two texts, compute a series of insert/delete/move instructions that will
  describe the conversion of one text into the other.

  I parse the old text, and create a sequential word list.  The words are further linked
   into lists, accessed through a hash table.

  Then, I parse the second text, and for each word in the second text, I try to find a
  matching word in the first text.  The hashed table of lists helps me find the first
  matching word quickly.  I remove it from the list, and mark it for a potential move
  operation.

  Each matching word found in the first text may either remain in place, or must be
  moved.

  There exists an algorithm to determine the best set of words (most characters) to
  leave undisturbed, and so minimize the amount to be moved.  However, I did not
  have the patience to work this out fully.

  So, as a compromise solution, I just go sequentially through the texts together, and
  make an educated guess for each block of contiguous matching words, to decide
  whether to leave or move them.

  Any words encountered in the new text that are not found in the old text are
  immediately marked for “insert”. After the whole text is scanned, any words or blocks
  left behind in the hash lists, are destined for deletion. Before the parsing, a character
  by character comparison of the two texts is done from each end, to cut off all text
  which is equal and does not need any further analysis.  This may result in the 
  discovery that both texts are identical.  All other special cases, such as only a single 
  change at one end or in the middle of one of the texts, will be automatically handled.

  Before returning the DiffRecs to the caller, I analyze pairs of inserts and deletes to
  eliminate redundancies.

  The program will allocate a fair amount of memory to build the word lists in.  This  
  can be minimized by doing a word count first.  But I prefer to just provide a 
  conservative factor (4 chars per word, white space included) which should be 
  enough for normal texts. Please use the “countWords” directive if memory must be 
  conserved.

  I have allocated a fixed static hash table of 997 entries.  With 65000 bytes, and say, 
  6.5 bytes per word really, each list would contain 10 entries on the average.  
  hashMod can easily be increased to some larger prime number to speed up word 
  lookup for large files.

  I have also provided a lookup table for determining what characters are considered 
  parts of words, and which are white space.  Include foreign letters and digits if that is 
  desirable.  If the texts are computer programs, underscore and other symbols might 
  be included in “alpha”, depending on the language.

*/

#include <stdio.h>
#include <stdlib.h>

#define ulong unsigned long
#define ushort unsigned short

typedef enum {
  deletedText=0, insertedText, movedText } DiffType;

typedef struct {
  DiffType      type;
  ulong         rangeStart;
  ulong         rangeEnd;
  ulong         position;
} DiffRec;

short FindWordDifferences (
  char          *oldText,
  char          *newText,
  ulong         numOldChars,
  ulong         numNewChars,
  DiffRec       diffs[],
  ulong         maxDiffRecs);

#define countWords 0

#if countWords
/*use the actual word count to reserve Snippet space*/
#else
/*use a very conservative estimate to reserve enough*/
#define averageWordSize 4
#endif

#define hashMod 997 //any reasonable prime number

#define inDelete  0 //state values during text scan
#define inMove    1
#define inInsert  2
#define inLimbo   3

#define nullRec   3 //used to mark redundant DiffRecs

/*A word is defined as a contiguous sequence of letters from the set defined in the 
  lookup table.

  The table defaults to 'A'-'Z','a'-'z', but foreign characters or digits are easily included if 
  desired.
*/

#define includeForeign  0
#define includeDigits   0

static char lookup[256] = {
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //control
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //control
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //punctuation
#if includeDigits
  1,1,1,1,1,1,1,1, 1,1,0,0,0,0,0,0,     //digits plus
#else
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //digits plus
#endif
  0,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,     //caps A-O
  1,1,1,1,1,1,1,1, 1,1,1,0,0,0,0,0,     //caps P-Z
  0,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,     //small a-o
  1,1,1,1,1,1,1,1, 1,1,1,0,0,0,0,0,     //small p-z
#if includeForeign
  1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,     //foreign caps
  1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,     //foreign small
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols
  0,0,0,0,0,0,0,0, 0,0,0,1,1,1,1,1,     //symbols plus
  0,0,0,0,0,0,0,0, 1,1,0,0,0,0,0,0,     //symbols plus
#else
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //foreign caps
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //foreign small
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols plus
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //symbols plus
#endif
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,     //graphic chars
  0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0      //graphic chars
};

#define isAlpha(x) lookup[(unsigned char)(x)] //cast avoids a negative index if char is signed

/*A snippet may describe a word or a block of words in the old text and includes the 
  white space. The reference to the new text is used to track the equivalent position in 
  the new text.
*/
typedef struct Snippet {
  DiffType      type;
  char*         oldTextRef;
  char*         newTextRef;
  ulong         length;
  void*         next;
} Snippet;

/*The hashTable is an array of lists where each list contains the snippets (words or 
  blocks) from the old text which hash to the same value, in the order in which they 
  occurred.
*/
typedef struct List {
  Snippet*      first;
  Snippet*      last;
} List;

static List hashTable[hashMod];
static Snippet* snippetStore;   //allocated as needed
static ushort   numSnippets;
static ushort   nextFreeSnippet;


Hash

/*The hash function is used to distribute different snippets into different lists as evenly 
  as possible.
*/
ushort Hash(char* text,ulong numChars)
{
  register ulong accumulator=numChars;
  if (numChars>6) numChars=6;
  switch (numChars) {
case 6:accumulator=(accumulator<<5)+*text++;
case 5:accumulator=(accumulator<<5)+*text++;
case 4:accumulator=(accumulator<<5)+*text++;
case 3:accumulator=(accumulator<<5)+*text++;
case 2:accumulator=(accumulator<<5)+*text++;
case 1:accumulator=(accumulator<<5)+*text++;
case 0:;
  }
  return accumulator % hashMod;
}

ClearSnippetStore
/*Clears all snippets to 0*/
void ClearSnippetStore()
{
  Snippet* s=snippetStore;
  ushort n=numSnippets-1;
  s->type=deletedText;
  s->oldTextRef=0;
  s->newTextRef=0;
  s->length=0;
  s->next=0;
  s++;
  while (n--) *s++=*snippetStore;
}

GetSnippetStore

/*Preallocates an array of snippets to handle oldText. All snippets are initially marked 
  deletedText.
*/
void GetSnippetStore(ulong snippetsAllocated)
{
  ulong memoryRequired;
  if (snippetsAllocated>65535) snippetsAllocated=65535;
  numSnippets=(ushort)snippetsAllocated;
  memoryRequired=numSnippets*sizeof(Snippet);
  snippetStore=(Snippet*)malloc(memoryRequired);
  ClearSnippetStore();
  nextFreeSnippet=0;
}

NewSnippet
/*Assign the next snippet from the preallocated array.*/
Snippet* NewSnippet(char* text)
{
  register Snippet* s;
  s=&(snippetStore[nextFreeSnippet++]);
  s->oldTextRef=text;
  return s;
}

ExtendSnippet
/*Consecutive Snippets are merged into larger blocks*/
int ExtendSnippet(Snippet* oldS,Snippet* s)
{
  if (oldS->oldTextRef+oldS->length==s->oldTextRef) {
    oldS->length+=s->length;
    s->length=0;
    return 1;
  }
  return 0;
}

Record
/*Snippets are attached at end of a list (hashTable[])*/
void Record(Snippet* s)
{ register List* list=
        &(hashTable[Hash(s->oldTextRef,s->length)]);
  register Snippet* lastS=list->last;
  if (lastS) lastS->next=s;
  else list->first=s;
  list->last=s;
}

Match
/*Matches two substrings of length numChars*/
int Match(char* text0,char* text1,ulong numChars)
{
  while (numChars--) {
    if (*text0!=*text1) return 0;
    text0++;
    text1++;
  }
  return 1;
}

FindAndRemoveSnippet

/*Snippets of the oldText are matched against a word from newText.  The first 
  matching word is removed from the hash list.
*/
Snippet* FindAndRemoveSnippet(char* text,ulong numChars)
{
  List* list=&(hashTable[Hash(text,numChars)]);
  Snippet* s=list->first;
  Snippet* father=0;
  while (s) {
    if ((s->length==numChars) &&
      Match(s->oldTextRef,text,numChars)) {
        if (father) {
          father->next=s->next;
          if (list->last==s) list->last=father;
        } else {
          if (list->last==s) list->first=list->last=0;
          else list->first=s->next;
        }
        return s;
      }
    father=s;
    s=(Snippet*)(s->next);
  }
  return 0;
}

StartMoveRecord

/*The following macros set up the 3 different types of DiffRecs.
*/
#define StartMoveRecord                                 \
{ diffPtr->type=movedText;                              \
  diffPtr->rangeStart=block->oldTextRef-oldText;        \
  diffPtr->rangeEnd=diffPtr->rangeStart+block->length-1;\
  diffPtr->position=startOfText->oldTextRef+            \
    startOfText->length-oldText;                        \
}

StartInsertRecord

#define StartInsertRecord                               \
{ diffPtr->type=insertedText;                           \
  diffPtr->rangeStart=text-newText;                     \
  diffPtr->rangeEnd=diffPtr->rangeStart+wordLength-1;   \
  diffPtr->position=startOfText->oldTextRef+            \
    startOfText->length-oldText;                        \
}

StartDeleteRecord

#define StartDeleteRecord                               \
{ diffPtr->type=deletedText;                            \
  diffPtr->rangeStart=block->oldTextRef-oldText;        \
  diffPtr->rangeEnd=diffPtr->rangeStart+block->length-1;\
  diffPtr->position=block->newTextRef-newText;          \
}

MarkDeletedWords

/*The following macro marks all words from startOfText to “toHere” as potential 
  candidates for delete. Any of the words will be overwritten and become movedText if 
  they end up being matched in newText later.
*/
#define MarkDeletedWords(toHere)                        \
{ canDel=startOfText;                                   \
  while (++canDel<toHere)                               \
    if (canDel->type==deletedText)                      \
      canDel->newTextRef=markNew;                       \
}

BestGuess
/*
  A matching word may be found in the old text at a point far beyond the current 
  insertion point.  It would be a shame to declare this word the new insertion point, 
  and then have to move all intervening words (yet to be matched).  It is better to 
  move the smaller block or word up.  This macro is a heuristic attempt to make a 
  sensible guess as to which block is larger, the matching word (100%) or the 
  intervening text (adjusted according to the ratio of remaining chars still required for 
  matching, at 50%).
*/
#define BestGuess                                       \
  (skippedChars*(remainingNew>>1)>                      \
    block->length*remainingOld)

NextWord

/*Words are extracted by first scanning through any preceding white space, then by 
  scanning through the alpha characters, until another white space occurs.

  It is assumed that \x0 does not occur as part of either text.  The last character in the 
  text is then temporarily set to 0 so we easily find the end of text.
*/
#define NextWord                                        \
{ while ((*t)&&(0==isAlpha(*t))) t++;                   \
  while (0!=isAlpha(*t)) t++;                           \
}

EmitRecord 

#define EmitRecord                                      \
{ diffPtr++;                                            \
  if (diffPtr-diffs>=maxDiffRecs) return maxDiffRecs;   \
}

ClearHashTable

#define ClearHashTable                                  \
{ ushort n=hashMod;                                     \
  List* H=hashTable;                                    \
  while (n--) {                                         \
    H->first=0;                                         \
    H->last=0;                                          \
    H++;                                                \
  }                                                     \
}

ParseOldText

/*The function ParseOldText scans the oldText and creates an array of snippets, where 
  each snippet corresponds to one word.  In addition, each snippet is linked into a list, 
  where each list is headed by a List pointer in hashTable. The last character of the text 
  is temporarily set to 0 as an end-of-text marker.
*/
void ParseOldText(
char* oldText,
ulong numSameHead,
ulong numChars)
{
  Snippet*      s;
  char*         text=oldText+numSameHead;
  char*         t;
  char          lastChar=text[numChars-1]; //save it
  text[numChars-1]=0;
  ClearHashTable;
  t=text;
  NewSnippet(text);
  //A null snippet anchors the start of the text.
  do {
    NextWord;
    s=NewSnippet(text);
    s->length=t-text;
    if (0==*t) break;
    Record(s);
    text=t;
  } while (1);
  *t=lastChar;           //restore the last char
  s->length++;
  Record(s);
}

ParseNewText

/* The ParseNewText function scans the newText and tries to match it with the 
   oldText.

   It proceeds by isolating words, and matching them against previously created 
   Snippets (see ParseOldText).

   It is a state machine which tries to agglomerate contiguous blocks of words while it 
   is in either the insert or move state.  When it switches states it creates a DiffRec for 
   the preceding block.

   It moves the “startOfText” in the oldText along as it scans the newText and 
   encounters matching blocks.

   If the accumulated matching (movedText) block lies before the current startOfText, 
   or if it is small and lies far forward, it is made into a movedText record.  But 
   otherwise, it stays unmoved, and becomes the new startOfText.

   Blocks of contiguous words assembled during the insert state (i.e. no matching 
   words in oldText) are also accumulated and result in insertedText records.

   DeletedText records are created later, after all text has been scanned, by collecting 
   the leftovers in oldText (see CollectDeletes below).
*/
short ParseNewText( //returns number of inserted DiffRecs
char* oldText,
char* newText,
DiffRec* diffs,
ulong numSameHead,
ulong numOldChars,
ulong numNewChars,
ulong maxDiffRecs)
{
  Snippet*      s;
  Snippet*      canDel;
  Snippet*      block;
  Snippet*      startOfText=snippetStore;//null snippet
  Snippet*      endOfText=
                NewSnippet(oldText+numSameHead+numOldChars);
  char*         text=newText+numSameHead;
  char*         markNew=text;
  char*         t;
  DiffRec*      diffPtr=diffs;
  ulong         wordLength;
  ulong         remainingNew=numNewChars;
  ulong         remainingOld=numOldChars;
  long          skippedChars;   // signed !
  int           state=inLimbo;
  char          lastChar;
  char          done=0;

  if (0==numNewChars) goto cleanup;
  lastChar=text[numNewChars-1];
  text[numNewChars-1]=0;
  do {
    t=text;
    NextWord;
    wordLength=t-text;
    if (0==*t) {
      *t=lastChar;      //restore last char
      wordLength++;
      done=1;
    }
    if (0!=(s=FindAndRemoveSnippet(text,wordLength))) {
      s->type=movedText;
      remainingOld-=s->length;
      remainingNew-=s->length;
      switch (state) {
      case inMove:
        if (0==ExtendSnippet(block,s)) {
          skippedChars=block->oldTextRef-
            startOfText->oldTextRef-startOfText->length;
          if ((skippedChars<0) || BestGuess) {
            StartMoveRecord;
            EmitRecord;
          } else {
            MarkDeletedWords(block);
            startOfText=block;
          }
          block=s;
        }
        break;
      case inInsert:
        EmitRecord;
      case inLimbo:
        block=s;
        markNew=text;
        state=inMove;
      }
    } else {
      remainingNew-=wordLength;
      switch (state) {
      case inInsert:
        diffPtr->rangeEnd+=wordLength;
        break;
      case inMove:
        skippedChars=block->oldTextRef-
                startOfText->oldTextRef-startOfText->length;
        if ((skippedChars<0) || BestGuess) {
          StartMoveRecord;
          EmitRecord;
        } else {
          MarkDeletedWords(block);
          startOfText=block;
        }
      case inLimbo:
        StartInsertRecord;
        state=inInsert;
      }
    }
    text=t;
  } while (0==done);

cleanup:
  switch (state) {
  case inMove:
    skippedChars=block->oldTextRef-
      startOfText->oldTextRef-startOfText->length;
    if (skippedChars<0) {
      StartMoveRecord;
    } else {
      MarkDeletedWords(endOfText);
      break;
    }
  case inInsert:
    EmitRecord;
  case inLimbo:
    MarkDeletedWords(endOfText);
  }

  return diffPtr-diffs;
}

CollectDeletes

/*The CollectDeletes function scans through the entire snippet array, and detects all
   snippets still marked as deletedText.  Contiguous blocks of these are assembled to 
   generate deletedText DiffRecs.
*/
short CollectDeletes(
DiffRec* diffs,
char* oldText,
char* newText,
ulong maxDiffRecs)
{
  DiffRec*      diffPtr=diffs;
  Snippet*      s=snippetStore;
  Snippet*      block;
  ushort        n=nextFreeSnippet;
  int           state=inLimbo;
  while (n--) {
    if (s->length) {
      if (s->type==deletedText) {
        if (state==inLimbo) {
          block=s;
          state=inDelete;
        } else block->length+=s->length;
      } else if (state==inDelete) {
        StartDeleteRecord;
        EmitRecord;
        state=inLimbo;
      }
    }
    s++;
  }
//cleanup:
  if (state==inDelete) {
    StartDeleteRecord;
    diffPtr++;
  }

  return diffPtr-diffs;
}

WordCount
#if countWords
/*This function could be used to tailor make the number of snippets in snippetStore to 
   the exact number required
*/
short WordCount(char* text,ulong numChars)
{
  register      char*   t=text;
                short   WC=1;
                char    lastChar;
  if (numChars<=1) return numChars;
  lastChar=t[numChars-1];
  t[numChars-1]=0;
  do {
    NextWord;
    WC++;
  } while (*t);
  *t=lastChar;
  return WC;
}
#endif

ExtractNullRecs

/*Null records can result from cancelling common parts of insert/delete pairs, for 
   example insert “[the” and delete “the” would become just insert “[”. The delete 
   record would have a length of 0, i.e. a rangeEnd below rangeStart.  If this occurred at 
   the start of text, the value 0xFFFFFFFF would occur for rangeEnd, surely not a good 
   thing. It is surely best to eliminate those null records.
*/
void ExtractNullRecs(DiffRec* diffs,short numDiffRecs)
{
  DiffRec*      d=diffs+numDiffRecs;
  short         i=numDiffRecs;
  short         numMove;
  while (i--) {
    d--;
    if (d->type==nullRec) {
      DiffRec* dx=d+1;
      numDiffRecs--;
      numMove=numDiffRecs-i;
      while (numMove--) *d++=*dx++;
      numMove=i;
    }
  }
}

ReduceInsDelPairs

/*Since all snippets (and hence DiffRecs) refer to words which usually include leading 
  white space, they might differ as such, but have common white space with different 
  words, or vice versa. The ReduceInsDelPairs attempts to match pairs at the same text 
  location, and remove their common parts.

  The loop relies on the existing order of records, as created by ParseNewText, 
  followed by CollectDeletes. This means delete records are at the end, and all are in 
  the order of the text.  Ins and Del records are compared if they apply to the same 
  insertion/ deletion point, and share common words or white space at either end of 
  the block.  The aggregation of blocks hurts sometimes, in that redundancies in the 
  middle will not be found here.
*/
short ReduceInsDelPairs(
DiffRec* diffs,
short numDiffRecs,
char* oldText,
char* newText)
{
  DiffRec*      dRec=diffs+numDiffRecs-1;
  DiffRec*      iRec=dRec;
  short         eliminated=0;
  while (iRec>diffs) {
    iRec--;
    if (insertedText==iRec->type) break;
  }
  while (dRec>diffs) {
top_of_loop:
    if (deletedText!=dRec->type) break;
    if (dRec->rangeStart>iRec->position) {
      dRec--;
      continue;
    }
    if (dRec->rangeStart==iRec->position) {
//take care of disposable common chars at START of block
      do {
        ulong dStart=dRec->rangeStart;
        ulong iStart=iRec->rangeStart;
        ulong n=0;
        ulong n0=0;
//take care of common white space:
        while (oldText[n+dStart]==newText[n+iStart]) {
          if (isAlpha(oldText[n+dStart])) break;
          n++;
        }
//take care of common alpha but it must be a whole word:
        n0=n;
        while (oldText[n+dStart]==newText[n+iStart]) {
          if (isAlpha(oldText[n+dStart])) {
            n++;
            if (0==isAlpha(oldText[n+dStart])) {//success
              n0=n;
              break;
            }
          }
        }
        if (n0) {
          dRec->position+=dStart-dRec->rangeStart+n0;
          dRec->rangeStart=dStart+n0;
          iRec->position+=iStart-iRec->rangeStart+n0;
          iRec->rangeStart=iStart+n0;
        } else break;
      } while (oldText[dRec->rangeStart]==
          newText[iRec->rangeStart]);

//take care of disposable common chars at END of block
      do {
        ulong dEnd=dRec->rangeEnd;
        ulong iEnd=iRec->rangeEnd;
        ulong n=0;
        ulong n0=0;

//take care of common alpha but it must be a whole word:
        while (oldText[dEnd-n]==newText[iEnd-n]) {
          if (isAlpha(oldText[dEnd-n])) {
            n++;
            if (0==isAlpha(oldText[dEnd-n])) {//success
              n0=n;
              break;
            }
          }
        }
        n=n0;
//take care of common white space:
        while (oldText[dEnd-n]==newText[iEnd-n]) {
          if (isAlpha(oldText[dEnd-n])) break;
          n++;
        }
        if (n) {
          iRec->rangeEnd=iEnd-n;
          dRec->rangeEnd=dEnd-n;
        } else break;
      } while (oldText[dRec->rangeEnd]==
          newText[iRec->rangeEnd]);
      if ((long)(dRec->rangeStart)>(long)(dRec->rangeEnd)) {
        dRec->type=nullRec;
        eliminated++;
      }
      if ((long)(iRec->rangeStart)>(long)(iRec->rangeEnd)) {
        iRec->type=nullRec;
        eliminated++;
      }
    } else while (iRec>diffs) { //find next lower iRec
      iRec--;
      if (insertedText==iRec->type) goto top_of_loop;
    }
    dRec--;
  }
  if (eliminated) ExtractNullRecs(diffs,numDiffRecs);

  return eliminated;
}

FindWordDifferences 

/*The FindWordDifferences function is primarily a collection of calls to the other 
  routines. In addition, it tries to eliminate parts of the texts from processing which are 
  completely identical at the start or the tail ends of the texts. If an insufficient number 
  of DiffRecs was allocated, and maxDiffRecs is reached while processing the texts, the 
  function returns this value without further ado.
*/
short FindWordDifferences (
  char  *oldText,     /* pointer to old version of text */
  char  *newText,     /* pointer to new version of text */
  ulong numOldChars,  /* number of characters in oldText */
  ulong numNewChars,  /* number of characters in newText */
  DiffRec diffs[],    /* pointer to preallocated array
                      where text differences are to be stored */
  ulong maxDiffRecs   /* number of DiffRecs preallocated */
)
{
  ulong           numSameHead,numSameTail;
  long            numUniqueOldChars;
  long            numUniqueNewChars;
  short           numDiffRecs=0;
  register char*  OTx=oldText;
  register char*  NTx=newText;
  register char*  OTlimit;
  long            deltaChars=numOldChars-numNewChars;

  if (deltaChars>0) OTlimit=oldText+numNewChars;
  else OTlimit=oldText+numOldChars;

//check for common head:
  while (OTx<OTlimit) {
    if (*OTx != *NTx) break;
    OTx++;
    NTx++;
  }

  if ((isAlpha(*OTx)) || (isAlpha(*NTx))) {
//backtrack to start of word
    while ((OTx>oldText) && (isAlpha(OTx[-1]))) OTx--;
    if (OTx>oldText) OTx--; //grab previous WSP
  }

  numSameHead=OTx-oldText;

//check for common tail:
  OTx=oldText+numOldChars-1;
  NTx=newText+numNewChars-1;
  if (deltaChars>0) OTlimit=oldText+deltaChars;
  else OTlimit=oldText;
  while (OTx>OTlimit) {
    if (*OTx != *NTx) break;
    OTx--;
    NTx--;
  }

  if ((isAlpha(*OTx)) || (isAlpha(*NTx))) {
//track to end of word
    while ((OTx<oldText+numOldChars-1)
        && (isAlpha(OTx[1]))) OTx++;
  }
  numSameTail=oldText-OTx+numOldChars-1;

  numUniqueOldChars=numOldChars-numSameTail-numSameHead;
  numUniqueNewChars=numNewChars-numSameTail-numSameHead;

/*These values may be negative! If both are negative, old and new texts are identical, 
  and the function can return immediately.
*/
  if ((numUniqueOldChars<0) && (numUniqueNewChars<0))
        return 0;

#if countWords
  GetSnippetStore(3+
        WordCount(oldText+numSameHead,numUniqueOldChars));
#else
  GetSnippetStore(3+
        numUniqueOldChars/averageWordSize);
#endif
  if (numUniqueOldChars>0)
        ParseOldText(oldText,numSameHead,numUniqueOldChars);

  numDiffRecs+=
        ParseNewText(oldText,newText,diffs,
        numSameHead,numUniqueOldChars,numUniqueNewChars,
        maxDiffRecs);

  {
  ulong numDeletes=
        CollectDeletes(&diffs[numDiffRecs],oldText,newText,
        maxDiffRecs-numDiffRecs);

  numDiffRecs+=numDeletes;

  if (numDeletes) numDiffRecs-=
        ReduceInsDelPairs(diffs,numDiffRecs,oldText,newText);
  }

  free(snippetStore);

  return numDiffRecs;
}
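
The head and tail trimming that FindWordDifferences performs before parsing can be shown on its own. This is a minimal sketch under simplifying assumptions: it counts the characters the two texts share at each end and stops there, whereas the solution above also backs up to a word boundary. SketchTrim and sketch_common_ends are hypothetical names, not part of the published code.

```c
#include <stddef.h>

typedef struct {
  size_t head;   /* characters identical at the start of both texts */
  size_t tail;   /* characters identical at the end, not overlapping head */
} SketchTrim;

SketchTrim sketch_common_ends(const char *a, size_t na,
                              const char *b, size_t nb)
{
  SketchTrim r = { 0, 0 };
  size_t limit = na < nb ? na : nb;   /* the shorter text bounds both scans */

  /* scan forward while the texts agree */
  while (r.head < limit && a[r.head] == b[r.head]) ++r.head;

  /* scan backward, never reusing characters already counted in head */
  while (r.tail < limit - r.head &&
         a[na - 1 - r.tail] == b[nb - 1 - r.tail]) ++r.tail;

  return r;
}
```

For "the quick fox" and "the slow fox" this leaves only "quick" and "slow" between the shared ends. When the texts are identical, head equals the full length and tail is 0, which is the case the real function detects and returns from early.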
