Feb 96 Challenge
	
| Volume Number: |  | 12 | 
| Issue Number: |  | 2 | 
| Column Tag: |  | Programmers Challenge | 
Programmers Challenge	
By Bob Boonstra, Westford, Massachusetts
 Note:  Source code files accompanying article are located on MacTech CD-ROM or source code disks.
 Note:  Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Intersecting Rectangles
The Challenge this month is to write a routine that will accept a list of rectangles and calculate a result based on the intersections of those rectangles.  Specifically, your code will return a list of non-overlapping rectangles that contain all points enclosed by an odd (or even) number of the input rectangles.  The prototype for the code you should write is:
void RectangleIntersections(
 const Rect inputRects[], /* list if input rectangles */
 const long numRectsIn,   /* number of inputRects */
 Rect outputRects[], /* preallocated storage for output */
 long *numRectsOut,/* number of outputRects returned */
 const Boolean oddParity  /* see text for explanation */
);
The parameter oddParity indicates whether you are to return rectangles containing points enclosed by an odd number of the numRectsIn inputRects rectangles (oddParity==true) or by an even (nonzero) number of rectangles (oddParity==false).  Sufficient storage for the output will be preallocated for you and pointed to by outputRects.
As an example, if you were given these inputRects:
 {0,10,20,30}, {5,15,20,30}
  and oddParity were true, you might return the following list of outputRects:
 {0,10,5,15}, {0,15,5,30}, {5,10,15,20}
It would also be correct to return a result that combined the first of these rectangles with either of the other two.  If oddParity were false, you would return the following list for the example input:
 {5,15,20,30}
The outputRects must be non-empty and non-overlapping.  In the example, it would be incorrect to return the following for the odd parity case:
 {0,10,5,30} {0,10,20,15}
The outputRects you generate must also be maximal, in the sense that each edge of each of the outputRects should pass through a vertex of one of the inputRects.  That is, for example, I dont want you to return a 1¥1 rectangle representing each point enclosed in the desired number of inputRects.  Before returning, set *numRectsOut to indicate the number of outputRects you generated.
If you need auxiliary storage, you may allocate any reasonable amount within your code using toolbox routines or malloc, but you must deallocate that storage before returning.  (No memory leaks! - Ill be calling your code many times.)
This native PowerPC Challenge will be scored using the latest Metrowerks compiler, with the winner determined by execution time.  If you have any questions, or would like some test data for your code, please send me e-mail at one of the Programmers Challenge addresses, or directly to bob_boonstra@mactech.com.  Test data will also be sent to the Programmers Challenge mailing list, which you can join by sending a message to autoshare@mactech.com with the SUBJECT line sub challenge YourName, substituting your real name for YourName.
Two Months Ago Winner
Eight of the 13 solutions submitted for the Find Again And Again Challenge worked correctly.  Congratulations to Gustav Larsson (Mountain View, CA) for submitting an entry that was significantly faster than the others.  The problem was to write a text search engine optimized to operate repeatedly on the same block of text.  A variety of optimization techniques were represented in the solutions, a couple of which are highlighted in the table of results below.  Several people optimized for the case where the same word was repeatedly searched for.  Some of my tests included this case, and those results are in the columns headed repeat. The random columns shows results for tests that searched for random occurrences of random words.  Each of the tests were run under conditions where only 64KB of auxiliary storage was available, and where much more memory was available.  These conditions were weighted 20% and 80% respectively in calculating the total time, since the problem statement promised that ample memory would usually be provided.  You can see that Gustavs solution performed reasonably well when memory was scarce, and very well when memory was plentiful.
Gustavs solution hashes as many words of the input text as possible in the initialization routine.  He uses the Boyer-Moore-Horspool algorithm to find words in any text that was not parsed during initialization.  Other features of the approach are described in the well-commented code.
Here are the times and code sizes for entries that passed by tests.  Numbers in parentheses after a persons name indicate that persons cumulative point total for all previous Challenges, not including this one. 
	64K Memory	>>64K Memory	code
Name	repeat	random	repeat	random	time	size
Gustav Larsson (67)	1814	3773	62	111	1255	3584
Tom Saxton	46	16400	197	459	3814	2000
Xan Gregg (81)	27	2907	1316	2835	3907	1664
Kevin Cutts (46)	1760	3234	1760	2809	4654	1600
Joseph Ku	8856	14570	121	509	5189	1584
David Cary	60	22665	499	1000	5745	2124
Eric Lengyel (40)	34	10221	29	4697	5831	1188
Ernst Munter (110)	2036	2053	2287	4603	6330	2976
Top Contestants of All time
Here are the Top Contestants for the Programmers Challenges to date, including everyone who has accumulated more than 20 points.  The numbers below include points awarded for this months entrants.  
Rank	Name	Points	Rank	Name	Points
1.	[Name deleted]	176	11.	Mallett, Jeff	44
2.	Munter, Ernst	110	12.	Kasparian, Raffi	42
3.	Gregg, Xan	88	13.	Vineyard, Jeremy	42
4.	Larsson, Gustav	87	14.	Lengyel, Eric	40
5.	Karsh, Bill	80	15.	Darrah, Dave	31
6.	Stenger, Allen	65	16.	Landry, Larry	29
7.	Riha, Stepan	51	17.	Elwertowski, Tom	24
8.	Cutts, Kevin	50	18.	Lee, Johnny	22
9.	Goebel, James	49	19.	Noll, Robert	22
10.	Nepsund, Ronald	47
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use.  The points you can win are:
1st place	20 points	5th place	2 points
2nd place	10 points	finding bug	2 points
3rd place	7 points	suggesting Challenge	2 points
4th place	4 points
Here is Gustavs winning solution:
Find Again and Again
Copyright © 1995 Gustav Larsson
Constants & Types
#define ALPHABET_SIZE 256
#define ALLOC_SIZE(n) ((n+3) & -4L) /* next multiple of 4 */
#define HASH_BUCKETS 1024           /* must be power of 2 */
#define HASH_MASK (HASH_BUCKETS - 1)
#define NO_NULL_CHAR 'A'
#define NULL 0
typedef unsigned char  uchar;
typedef unsigned short ushort;
typedef unsigned long  ulong;
typedef struct Word Word;
typedef struct Occurrence Occurrence;
typedef struct Private Private;
/* 
  A block of occurrence positions.  We pack in as many occurrences as possible into 
  a single block, from 3 to 6 depending on textLength.
  The first entry in the block is always used.  The remaining entries are in use if they 
  are not zero.  These facts are used several places to simplify the code.
*/
struct Occurrence {
  Occurrence *next;
  union {
    ushort pos2[6];   /* 2 bytes/occurrence */
    struct {
      ushort lo[4];
      uchar  hi[4];
    } pos3;           /* 3 bytes/occurrence */
    long pos4[3];     /* 4 bytes/occurrence */
  } p;
};
/*
  There is one Word struct for each distinct word.  The words length is stored in the 
  top eight bits of the hash value.  Theres no need to store the characters in the word 
  since we can just look at the first occurrence (first entry in Word.first).
*/
struct Word {
  Word        *next;
  ulong       hash;
  Occurrence  *last;
  Occurrence  first;
};
/*
  The structure of our private storage.  The hashCodes[] array serves two purposes: it 
  distinguishes alphanumeric from non-alphanumeric characters, and it provides a 
  non-zero hash code for each alphanumeric character.  The endParsedText field will 
  be -1 if there was enough private memory to parse all the text.  Otherwise, it points to 
  the start of the unparsed text.  nullChar is used by the BMH_Search() function when 
  we must search unparsed text for an occurrence.
*/
struct Private {
  ulong hashCodes [ ALPHABET_SIZE ];
  Word  *hashTable [ HASH_BUCKETS ];
  long  endParsedText;  /* start of parsed text */
  long  posBytes;       /* POS_x_BYTES, below */
  char  nullChar;       /* char not appearing in the text */
  long  heap;           /* start of private heap  */
};
Macros
/*
  These macros simplify access to the occurrence positions stored in an Occurrence 
  struct.  Posbytes is a macro argument that is usually set to private->posBytes.  
  However, you can also use a constant for posbytes, which lets the compiler choose 
  the right case at compile time, producing smaller and faster code.
*/
#define POS_2_BYTES 1   /* word position fits in 2 bytes */
#define POS_3_BYTES 0   /* fits in 3 bytes; usual case */
#define POS_4_BYTES 2   /* fits in 4 bytes */
#define GET_POS(pos,occur,index,posbytes)           \
  {                                                 \
    if ( (posbytes) == POS_3_BYTES )                \
      pos = ((long)(occur)->p.pos3.hi[index] << 16) \
          + (occur)->p.pos3.lo[index];              \
    else if ( (posbytes) == POS_2_BYTES )           \
      pos = (occur)->p.pos2[index];                 \
    else                                            \
      pos = (occur)->p.pos4[index];                 \
  }
#define SET_POS(pos,occur,index,posbytes)       \
  {                                             \
    if ( (posbytes) == POS_3_BYTES )            \
    {                                           \
      (occur)->p.pos3.hi[index] = (pos) >> 16;  \
      (occur)->p.pos3.lo[index] = (pos);        \
    }                                           \
    else if ( (posbytes) == POS_2_BYTES )       \
      (occur)->p.pos2[index] = pos;             \
    else                                        \
      (occur)->p.pos4[index] = pos;             \
  }
InitFind
void InitFind (
  char *textToSearch,
  long textLength,
  void *privateStorage,
  long storageSize
)
{
  Private *private = privateStorage;
  private->endParsedText =
    InitFindBody(
        (uchar *)textToSearch,
        textLength,
        privateStorage,
        (uchar *)privateStorage + storageSize
    );
  if ( private->endParsedText != -1 )
    private->nullChar =
      PickNullChar(
            private,
            (uchar *)textToSearch + private->endParsedText,
            (uchar *)textToSearch + textLength );
  else
    private->nullChar = NO_NULL_CHAR;
}
InitFindBody
/*
  This function does most of the work for InitFind().  The arguments have been recast 
  into a more useful form; uchar and ulong are used a lot so that we dont have to 
  worry about the sign, especially when indexing private->hashCodes[].
  The return value is the character position when the unparsed text begins (if we run 
  out of private storage), or -1 if all the text was parsed.
*/
static long InitFindBody (
  uchar   *textToSearch,
  long    textLength,
  Private *private,
  uchar   *endPrivateStorage
)
{
  uchar       *alloc, *textPos, *textEnd, *wordStart;
  long        wordLength;
  ulong       hash, code;
  Word        *word;
  Occurrence  *occur;
/*
  Init table of hash codes.  The remaining entries are guaranteed to be initialized to 
  zero.  The hash codes were chosen so that any two codes differ by at least five bits.
  */
  {
    ulong *table = private->hashCodes;  /* reduces typing */
    table['0'] = 0xFFC0;  table['5'] = 0xF492;
    table['1'] = 0xFE07;  table['6'] = 0xF31E;
    table['2'] = 0xF98B;  table['7'] = 0xF2D9;
    table['3'] = 0xF84C;  table['8'] = 0xCF96;
    table['4'] = 0xF555;  table['9'] = 0xCE51;
    table['A'] = 0xC9DD;  table['N'] = 0xA245;
    table['B'] = 0xC81A;  table['O'] = 0x9F0A;
    table['C'] = 0xC503;  table['P'] = 0x9ECD;
    table['D'] = 0xC4C4;  table['Q'] = 0x9941;
    table['E'] = 0xC348;  table['R'] = 0x9886;
    table['F'] = 0xC28F;  table['S'] = 0x959F;
    table['G'] = 0xAF5C;  table['T'] = 0x9458;
    table['H'] = 0xAE9B;  table['U'] = 0x93D4;
    table['I'] = 0xA917;  table['V'] = 0x9213;
    table['J'] = 0xA8D0;  table['W'] = 0x6DD3;
    table['K'] = 0xA5C9;  table['X'] = 0x6C14;
    table['L'] = 0xA40E;  table['Y'] = 0x6B98;
    table['M'] = 0xA382;  table['Z'] = 0x6A5F;
    table['a'] = 0x6746;  table['n'] = 0x3C88;
    table['b'] = 0x6681;  table['o'] = 0x3B04;
    table['c'] = 0x610D;  table['p'] = 0x3AC3;
    table['d'] = 0x60CA;  table['q'] = 0x37DA;
    table['e'] = 0x5D85;  table['r'] = 0x361D;
    table['f'] = 0x5C42;  table['s'] = 0x3191;
    table['g'] = 0x5BCE;  table['t'] = 0x3056;
    table['h'] = 0x5A09;  table['u'] = 0x0D19;
    table['i'] = 0x5710;  table['v'] = 0x0CDE;
    table['j'] = 0x56D7;  table['w'] = 0x0B52;
    table['k'] = 0x515B;  table['x'] = 0x0A95;
    table['l'] = 0x509C;  table['y'] = 0x078C;
    table['m'] = 0x3D4F;  table['z'] = 0x064B;
  }
  /*  Determine the number of bytes needed to store each occurrence position. */
  if ( textLength <= 0x10000L )
    private->posBytes = POS_2_BYTES;
  else if ( textLength <= 0x1000000L )
    private->posBytes = POS_3_BYTES;
  else
    private->posBytes = POS_4_BYTES;
  /* Set up variables to handle allocation of private storage. */
  alloc = (uchar *)&private->heap;
  /* Parse the text */
  textPos = textToSearch;
  textEnd = textPos + textLength;
  while ( textPos != textEnd )
  {
    /* Search for start of word */
    while ( private->hashCodes[*textPos] == 0 )
    {
      textPos++;
      if ( textPos == textEnd )
        return -1;  /* parse all text */
    }
    wordStart = textPos;
    /* Search for end of word; generate hash value too */
    hash = 0;
    while ( textPos != textEnd &&
            (code = private->hashCodes[ *textPos ]) != 0 )
    {
      hash = (hash << 1) ^ code;
      textPos++;
    }
    wordLength = textPos - wordStart;
    hash = (hash & 0xFFFFFF) | (wordLength << 24);
    /*
      Record the occurrence.  First we see if a Word struct exists for this word and 
      whether we need to allocate a new Occurrence struct.
    */
    word = LookupWord(
                private,
                (char *)textToSearch,
                (char *)wordStart,
                wordLength,
                hash );
    if ( word )
    {
      long allocateNewBlock, blockSize, i, pos;
      /*
        This word has occurred before, so it already has a Word struct.  See if theres 
        room in the last Occurrence block for another entry.  Remember that entry #0 in 
        the Occurrence block is always in use, so we can start checking at entry #1 for a 
        non-zero entry.
      */
      occur = word->last;
      allocateNewBlock = TRUE;
      switch ( private->posBytes )
      {
        case POS_2_BYTES:  blockSize = 6; break;
        case POS_3_BYTES:  blockSize = 4; break;
        case POS_4_BYTES:  blockSize = 3; break;
      }
      for ( i = 1; i < blockSize; i++ )
      {
        GET_POS( pos, occur, i, private->posBytes )
        if ( pos == 0 )
        {
          SET_POS( wordStart - textToSearch, occur, i,
                   private->posBytes )
          allocateNewBlock = FALSE;
          break;
        }
      }
      if ( allocateNewBlock )
      {
        /* Block is full.  Allocate new Occurrence block */
        occur = (Occurrence *) alloc;
        alloc += ALLOC_SIZE( sizeof(Occurrence) );
        if ( alloc >= endPrivateStorage )
          return wordStart-textToSearch; /* out of memory */
        /* Init the new struct and link it to the end of the occurence list. */
        SET_POS( wordStart - textToSearch, occur, 0,
                 private->posBytes )
        word->last->next = occur;
        word->last = occur;
      }
    }
    else
    {
      long i;
/* This is a new word.  Allocate a new Word struct, which contains an Occurrence 
    struct too.  */
      word = (Word *) alloc;
      alloc += ALLOC_SIZE( sizeof(Word) );
      if ( alloc >= endPrivateStorage )
        return wordStart-textToSearch ;  /* out of memory */
      /*  Link it to the start of the Word list, coming off the hash table. */
      word->next = private->hashTable[ hash & HASH_MASK ];
      private->hashTable[ hash & HASH_MASK ] = word;
      /* Init the Word struct */
      word->hash = hash;
      word->last = &word->first;
      /* Init the Occurrence struct */
      SET_POS( wordStart - textToSearch, &word->first, 0,
               private->posBytes )
    }
  }
  /* Finished parsing text */
  return -1;
}
FindWordOccurrence
long FindWordOccurrence (
  char *wordToFind,
  long wordLength,
  long occurrenceToFind,
  char *textToSearch,
  long textLength,
  void *privateStorage,
  long storageSize
)
{
  Private *private = privateStorage;
  Word  *word;
  ulong hash;
  /* Make occurenceToFind zero-based */
  occurrenceToFind--;
  /* Generate hash value for word to find */
  hash = 0;
  {
    long remain = wordLength;
    uchar *p = (uchar *) wordToFind;
    while ( remain > 0 )
    {
      hash = (hash << 1) ^ private->hashCodes[*p++];
      remain--;
    }
    hash = (hash & 0xFFFFFF) | (wordLength << 24);
  }
  /* Look for word/occurrence in hash table */
  word = LookupWord( private, textToSearch, wordToFind,
                     wordLength, hash );
  if ( word )
  {
    Occurrence *occur = &word->first;
    long blockSize, pos, i;
    /* Word exists in hash table, so go down the occurrence list.  */
    switch ( private->posBytes )
    {
      case POS_2_BYTES: blockSize = 6;  break;
      case POS_3_BYTES: blockSize = 4;  break;
      case POS_4_BYTES: blockSize = 3;  break;
    }
    while ( occur && occurrenceToFind >= blockSize )
    {
      occurrenceToFind -= blockSize;
      occur = occur->next;
    }
    if ( occur )
    {
      GET_POS( pos, occur, occurrenceToFind,
               private->posBytes )
      if ( occurrenceToFind == 0 || pos != 0 )
        return pos;
      occurrenceToFind -= blockSize;
    }
    occur = word->last;
    for ( i = 0; i < blockSize; i++ )
    {
      GET_POS( pos, occur, i, private->posBytes )
      if ( pos == 0 )
        occurrenceToFind++;
    }
  }
  /* Not in parsed text, so check the unparsed text */
  if ( private->endParsedText != -1 )
  {
    char *p;
    if ( wordLength > 3 )
      p = BMH_Search(
              private->hashCodes,
              wordToFind,
              wordLength,
              occurrenceToFind,
              textToSearch + private->endParsedText,
              textToSearch + textLength,
              private->nullChar );
    else
      p = SimpleSearch(
              private->hashCodes,
              wordToFind,
              wordLength,
              occurrenceToFind,
              textToSearch + private->endParsedText,
              textToSearch + textLength );
    if (p)
      return (p - textToSearch);
  }
  /* Not found */
  return -1;
}
LookupWord 
/* Look up a word in the hash table */
static Word *LookupWord (
  Private *private,
  char    *textToSearch,
  char    *wordText,
  long    wordLength,
  ulong   hash
)
{
  Word *word = private->hashTable[ hash & HASH_MASK ];
  while ( word )
  {
    if ( word->hash == hash )
    {
      char *w1, *w2;
      long pos, remain = wordLength;
      /*
        The hash values match, so compare characters to make sure its the right word.  
        We already know the word length is correct since the length is contained
        in the upper eight bits of the hash value.
      */
      GET_POS( pos, &word->first, 0, private->posBytes )
      w1 = textToSearch + pos;
      w2 = wordText;
      while ( remain-- > 0 && *w1++ == *w2++ )
        ;
      if ( remain == -1 )
        return word;
    }
    word = word->next;
  }
  return NULL;
}
PickNullChar 
/*
  Find a character that doesnt appear anywhere in the unparsed text.  BMH_Search() is 
  faster if such a character can be found.
*/
static char PickNullChar (
  Private *private,
  uchar   *textStart,
  uchar   *textEnd
)
{
  long i;
  uchar *p, occurs[ ALPHABET_SIZE ];
  for ( i = 0; i < ALPHABET_SIZE; i++ )
    occurs[i] = FALSE;
  for ( p = textStart; p < textEnd; p++ )
    occurs[*p] = TRUE;
  for ( i = 0; i < ALPHABET_SIZE; i++ )
    if ( occurs[i] == FALSE && private->hashCodes[i] == 0 )
      return i;
  return NO_NULL_CHAR;
}
BMH_Search
/*
  Search the unparsed text using the Boyer-Moore-Horspool algorithm.  Ideally a null 
  character is supplied (one that appears in neither the search string nor the text being 
  searched).  This allows the inner loop to be faster.
*/
static char *BMH_Search (
  ulong *hashCodes,       /* private->hashCodes     */
  char  *wordToFind,
  long  wordLength,
  long  occurrenceToFind, /* 0 is first occurrence  */
  char  *textStart,       /* start of unparsed text */
  char  *textEnd,         /* end of unparsed text   */
  char  nullChar          /* private->nullChar      */
)
{
  long  i;
  char  *text, *wordEnd;
  char  word[256];
  long  offset[ ALPHABET_SIZE ];
  /*
    Copy the search string to a private buffer, where
    the first character is the null character.
  */
  word[0] = nullChar;
  for ( i = 0; i < wordLength; i++ )
    word[i+1] = wordToFind[i];
  /* Set up the offset[] lookup table */
  for ( i = 0; i < ALPHABET_SIZE; i++ )
    offset[i] = wordLength;
  for ( i = 1; i < wordLength; i++ )
    offset[ word[i] ] = wordLength - i;
  /* Let the search begin... */
  wordEnd = word + wordLength;
  text = textStart + wordLength - 1;
  if ( nullChar == NO_NULL_CHAR )
  {
    /* No null character, so use a slower inner loop */
    while ( text < textEnd )
    {
      long i;
      char *p, *q;
      for ( i = wordLength, p = wordEnd, q = text;
            i > 0 && *p == *q;
            i--, p--, q-- )
        ;
/*If i == 0, we have found the search string.  Now we make sure that it is delimited.*/
      if ( i == 0 && hashCodes[*q] == 0 &&
           (text+1 == textEnd || hashCodes[text[1]] == 0) )
      {
        if ( occurrenceToFind == 0 )
          return q+1;
        occurrenceToFind--;
      }
      text += offset[*text];
    }
  }
  else
  {
    /* There is a null character (usual case), 
        so we can use a faster and simpler inner loop. */
    while ( text < textEnd )
    {
      char *p, *q;
      for ( p = wordEnd, q = text; *p == *q; p--, q-- )
        ;
      if ( p == word && hashCodes[*q] == 0 &&
           (text+1 == textEnd || hashCodes[text[1]] == 0) )
      {
        if ( occurrenceToFind == 0 )
          return q+1;
        occurrenceToFind--;
      }
      text += offset[*text];
    }
  }
  return NULL;
}
SimpleSearch
/*
  Search the unparsed text using a simple search algorithm.  Note that wordLength 
  must be 1, 2, or 3.  This algorithm runs faster than BMH_Search() for small search 
  strings.
*/
static char *SimpleSearch(
  ulong *hashCodes,       /* private->hashCodes      */
  char  *wordToFind,
  long  wordLength,       /* 1..3                    */
  long  occurrenceToFind, /* 0 is 1st occurrence     */
  char  *textStart,       /* start of unparsed text  */
  char  *textEnd          /* end of all text         */
)
{
  char *text, first;
  first = wordToFind[0];
  text = textStart;
  if ( wordLength == 1 )
  {
    while ( text < textEnd )
    {
      while ( text < textEnd && *text != first )
        text++;
      if ( hashCodes[*(text-1)] == 0 &&
           hashCodes[text[wordLength]] == 0 )
      {
        if ( occurrenceToFind == 0 )
          return text;
        occurrenceToFind--;
      }
    text++;
    }
  }
  else if ( wordLength == 2 )
  {
    while ( text < textEnd )
    {
      while ( text < textEnd && *text != first )
        text++;
      if ( text[1] == wordToFind[1] &&
           hashCodes[*(text-1)] == 0 &&
           hashCodes[text[wordLength]] == 0 )
      {
        if ( occurrenceToFind == 0 )
          return text;
        occurrenceToFind--;
      }
    text++;
    }
  }
  else /* wordLength == 3 */
  {
    while ( text < textEnd )
    {
      while ( text < textEnd && *text != first )
        text++;
      if ( text[1] == wordToFind[1] &&
           text[2] == wordToFind[2] &&
           hashCodes[*(text-1)] == 0 &&
           hashCodes[text[wordLength]] == 0 )
      {
        if ( occurrenceToFind == 0 )
          return text;
        occurrenceToFind--;
      }
    text++;
    }
  }
  return NULL;
}