Feb 96 Challenge
Volume Number: 12
Issue Number: 2
Column Tag: Programmer's Challenge
Programmer's Challenge
By Bob Boonstra, Westford, Massachusetts
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Intersecting Rectangles
The Challenge this month is to write a routine that will accept a list of rectangles and calculate a result based on the intersections of those rectangles. Specifically, your code will return a list of non-overlapping rectangles that contain all points enclosed by an odd (or even) number of the input rectangles. The prototype for the code you should write is:
void RectangleIntersections(
const Rect inputRects[], /* list of input rectangles */
const long numRectsIn, /* number of inputRects */
Rect outputRects[], /* preallocated storage for output */
long *numRectsOut,/* number of outputRects returned */
const Boolean oddParity /* see text for explanation */
);
The parameter oddParity indicates whether you are to return rectangles containing points enclosed by an odd number of the numRectsIn inputRects rectangles (oddParity==true) or by an even (nonzero) number of rectangles (oddParity==false). Sufficient storage for the output will be preallocated for you and pointed to by outputRects.
As an example, if you were given these inputRects:
{0,10,20,30}, {5,15,20,30}
and oddParity were true, you might return the following list of outputRects:
{0,10,5,15}, {0,15,5,30}, {5,10,20,15}
It would also be correct to return a result that combined the first of these rectangles with either of the other two. If oddParity were false, you would return the following list for the example input:
{5,15,20,30}
The outputRects must be non-empty and non-overlapping. In the example, it would be incorrect to return the following for the odd parity case:
{0,10,5,30}, {0,10,20,15}
The outputRects you generate must also be maximal, in the sense that each edge of each of the outputRects should pass through a vertex of one of the inputRects. That is, for example, I don't want you to return a 1×1 rectangle representing each point enclosed in the desired number of inputRects. Before returning, set *numRectsOut to indicate the number of outputRects you generated.
If you need auxiliary storage, you may allocate any reasonable amount within your code using toolbox routines or malloc, but you must deallocate that storage before returning. (No memory leaks! - I'll be calling your code many times.)
This native PowerPC Challenge will be scored using the latest Metrowerks compiler, with the winner determined by execution time. If you have any questions, or would like some test data for your code, please send me e-mail at one of the Programmer's Challenge addresses, or directly to bob_boonstra@mactech.com. Test data will also be sent to the Programmer's Challenge mailing list, which you can join by sending a message to autoshare@mactech.com with the SUBJECT line "sub challenge YourName", substituting your real name for YourName.
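One way to read the requirements above is as a grid problem: every distinct edge coordinate of the inputRects splits the plane into cells, and each cell is covered by a fixed number of rectangles. The following is a minimal, unoptimized sketch of that idea, not a competitive entry. SketchRect and SketchParityCells are hypothetical names, SketchRect stands in for the Toolbox Rect with the same {top, left, bottom, right} field order, and the caller is assumed to supply room for up to (2n-1)*(2n-1) output cells for n input rectangles.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the Toolbox Rect; same {top, left, bottom, right} order. */
typedef struct { short top, left, bottom, right; } SketchRect;

static int SketchCompareShort(const void *a, const void *b)
{
    return (int)*(const short *)a - (int)*(const short *)b;
}

/* Sort v[] and drop duplicates; returns the number of distinct values. */
static long SketchSortUnique(short *v, long n)
{
    long i, out = 0;
    qsort(v, (size_t)n, sizeof(short), SketchCompareShort);
    for (i = 0; i < n; i++)
        if (out == 0 || v[out - 1] != v[i])
            v[out++] = v[i];
    return out;
}

/* Emits one output rectangle per grid cell whose coverage count is nonzero
   and has the requested parity. Returns the number of cells written, or -1
   if the temporary arrays cannot be allocated. */
static long SketchParityCells(const SketchRect in[], long numIn,
                              SketchRect out[], int oddParity)
{
    short *xs, *ys;
    long i, r, c, nx, ny, numOut = 0;

    assert(numIn >= 0);
    if (numIn == 0)
        return 0;
    xs = malloc((size_t)(2 * numIn) * sizeof(short));
    ys = malloc((size_t)(2 * numIn) * sizeof(short));
    if (xs == NULL || ys == NULL) {
        free(xs);
        free(ys);
        return -1;
    }
    for (i = 0; i < numIn; i++) {
        ys[2 * i] = in[i].top;   ys[2 * i + 1] = in[i].bottom;
        xs[2 * i] = in[i].left;  xs[2 * i + 1] = in[i].right;
    }
    nx = SketchSortUnique(xs, 2 * numIn);
    ny = SketchSortUnique(ys, 2 * numIn);

    for (r = 0; r + 1 < ny; r++) {
        for (c = 0; c + 1 < nx; c++) {
            long count = 0;
            /* A cell lies entirely inside or entirely outside each input. */
            for (i = 0; i < numIn; i++)
                if (in[i].top <= ys[r] && ys[r + 1] <= in[i].bottom &&
                    in[i].left <= xs[c] && xs[c + 1] <= in[i].right)
                    count++;
            if (count > 0 && (count & 1) == (oddParity ? 1 : 0)) {
                out[numOut].top = ys[r];
                out[numOut].bottom = ys[r + 1];
                out[numOut].left = xs[c];
                out[numOut].right = xs[c + 1];
                numOut++;
            }
        }
    }
    free(xs);   /* release all temporary storage before returning */
    free(ys);
    return numOut;
}
```

For the two example inputRects, this sketch produces three cells in the odd case and one in the even case. Every cell edge lies on an input coordinate, and the cells never overlap, but the sketch makes no attempt to merge neighboring cells or to run fast.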
Two Months Ago Winner
Eight of the 13 solutions submitted for the Find Again And Again Challenge worked correctly. Congratulations to Gustav Larsson (Mountain View, CA) for submitting an entry that was significantly faster than the others. The problem was to write a text search engine optimized to operate repeatedly on the same block of text. A variety of optimization techniques were represented in the solutions, a couple of which are highlighted in the table of results below. Several people optimized for the case where the same word was repeatedly searched for. Some of my tests included this case, and those results are in the columns headed repeat. The random columns show results for tests that searched for random occurrences of random words. Each test was run twice: once with only 64KB of auxiliary storage available, and once with much more memory available. These conditions were weighted 20% and 80% respectively in calculating the total time, since the problem statement promised that ample memory would usually be provided. You can see that Gustav's solution performed reasonably well when memory was scarce, and very well when memory was plentiful.
Gustav's solution hashes as many words of the input text as possible in the initialization routine. He uses the Boyer-Moore-Horspool algorithm to find words in any text that was not parsed during initialization. Other features of the approach are described in the well-commented code.
Here are the times and code sizes for entries that passed my tests. Numbers in parentheses after a person's name indicate that person's cumulative point total for all previous Challenges, not including this one.
                          64K Memory      >>64K Memory     total    code
Name                    repeat  random  repeat  random      time    size
Gustav Larsson (67)       1814    3773      62     111      1255    3584
Tom Saxton                  46   16400     197     459      3814    2000
Xan Gregg (81)              27    2907    1316    2835      3907    1664
Kevin Cutts (46)          1760    3234    1760    2809      4654    1600
Joseph Ku                 8856   14570     121     509      5189    1584
David Cary                  60   22665     499    1000      5745    2124
Eric Lengyel (40)           34   10221      29    4697      5831    1188
Ernst Munter (110)        2036    2053    2287    4603      6330    2976
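The 20%/80% weighting described above can be checked against the table. The sketch below assumes that the repeat and random times are simply added within each memory condition before the weights are applied; the column does not say so explicitly, and SketchTotalTime is a hypothetical name.

```c
#include <assert.h>

/* Weighted total time: 20% for the 64K runs, 80% for the large-memory runs.
   Assumption: repeat and random times are summed within each condition. */
static double SketchTotalTime(long lowRepeat, long lowRandom,
                              long highRepeat, long highRandom)
{
    assert(lowRepeat >= 0 && lowRandom >= 0);
    assert(highRepeat >= 0 && highRandom >= 0);
    return 0.20 * (double)(lowRepeat + lowRandom)
         + 0.80 * (double)(highRepeat + highRandom);
}
```

With the winning entry's figures, SketchTotalTime(1814, 3773, 62, 111) comes to about 1255.8, in line with the 1255 shown in the time column. The other rows agree to within one unit, and the small differences presumably reflect rounding in the underlying measurements.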
Top Contestants of All Time
Here are the Top Contestants for the Programmer's Challenges to date, including everyone who has accumulated more than 20 points. The numbers below include points awarded for this month's entrants.
Rank  Name               Points     Rank  Name               Points
 1.   [Name deleted]        176     11.   Mallett, Jeff          44
 2.   Munter, Ernst         110     12.   Kasparian, Raffi       42
 3.   Gregg, Xan             88     13.   Vineyard, Jeremy       42
 4.   Larsson, Gustav        87     14.   Lengyel, Eric          40
 5.   Karsh, Bill            80     15.   Darrah, Dave           31
 6.   Stenger, Allen         65     16.   Landry, Larry          29
 7.   Riha, Stepan           51     17.   Elwertowski, Tom       24
 8.   Cutts, Kevin           50     18.   Lee, Johnny            22
 9.   Goebel, James          49     19.   Noll, Robert           22
10.   Nepsund, Ronald        47
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution, or (3) being the first person to suggest a Challenge that I use. The points you can win are:
1st place    20 points     5th place               2 points
2nd place    10 points     finding bug             2 points
3rd place     7 points     suggesting Challenge    2 points
4th place     4 points
Here is Gustav's winning solution:
Find Again and Again
Copyright © 1995 Gustav Larsson
Constants & Types
#define ALPHABET_SIZE 256
#define ALLOC_SIZE(n) ((n+3) & -4L) /* next multiple of 4 */
#define HASH_BUCKETS 1024 /* must be power of 2 */
#define HASH_MASK (HASH_BUCKETS - 1)
#define NO_NULL_CHAR 'A'
#define NULL 0
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;
typedef struct Word Word;
typedef struct Occurrence Occurrence;
typedef struct Private Private;
/*
A block of occurrence positions. We pack as many occurrences as possible into
a single block, from 3 to 6 depending on textLength.
The first entry in the block is always used. The remaining entries are in use if they
are not zero. These facts are used in several places to simplify the code.
*/
struct Occurrence {
Occurrence *next;
union {
ushort pos2[6]; /* 2 bytes/occurrence */
struct {
ushort lo[4];
uchar hi[4];
} pos3; /* 3 bytes/occurrence */
long pos4[3]; /* 4 bytes/occurrence */
} p;
};
/*
There is one Word struct for each distinct word. The word's length is stored in the
top eight bits of the hash value. There's no need to store the characters in the word
since we can just look at the first occurrence (first entry in Word.first).
*/
struct Word {
Word *next;
ulong hash;
Occurrence *last;
Occurrence first;
};
/*
The structure of our private storage. The hashCodes[] array serves two purposes: it
distinguishes alphanumeric from non-alphanumeric characters, and it provides a
non-zero hash code for each alphanumeric character. The endParsedText field will
be -1 if there was enough private memory to parse all the text. Otherwise, it points to
the start of the unparsed text. nullChar is used by the BMH_Search() function when
we must search unparsed text for an occurrence.
*/
struct Private {
ulong hashCodes [ ALPHABET_SIZE ];
Word *hashTable [ HASH_BUCKETS ];
long endParsedText; /* start of unparsed text, or -1 */
long posBytes; /* POS_x_BYTES, below */
char nullChar; /* char not appearing in the text */
long heap; /* start of private heap */
};
Macros
/*
These macros simplify access to the occurrence positions stored in an Occurrence
struct. Posbytes is a macro argument that is usually set to private->posBytes.
However, you can also use a constant for posbytes, which lets the compiler choose
the right case at compile time, producing smaller and faster code.
*/
#define POS_2_BYTES 1 /* word position fits in 2 bytes */
#define POS_3_BYTES 0 /* fits in 3 bytes; usual case */
#define POS_4_BYTES 2 /* fits in 4 bytes */
#define GET_POS(pos,occur,index,posbytes) \
{ \
if ( (posbytes) == POS_3_BYTES ) \
pos = ((long)(occur)->p.pos3.hi[index] << 16) \
+ (occur)->p.pos3.lo[index]; \
else if ( (posbytes) == POS_2_BYTES ) \
pos = (occur)->p.pos2[index]; \
else \
pos = (occur)->p.pos4[index]; \
}
#define SET_POS(pos,occur,index,posbytes) \
{ \
if ( (posbytes) == POS_3_BYTES ) \
{ \
(occur)->p.pos3.hi[index] = (pos) >> 16; \
(occur)->p.pos3.lo[index] = (pos); \
} \
else if ( (posbytes) == POS_2_BYTES ) \
(occur)->p.pos2[index] = pos; \
else \
(occur)->p.pos4[index] = pos; \
}
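The POS_3_BYTES case of GET_POS and SET_POS splits each position into a 16-bit low part and an 8-bit high part. A minimal function-style sketch of that packing, using hypothetical names (SketchPos3, SketchSetPos3, SketchGetPos3) and fixed-size arrays that mirror the pos3 member, might look like this:

```c
#include <assert.h>

/* Mirrors the pos3 member of the Occurrence union: 4 positions, 3 bytes each. */
typedef struct {
    unsigned short lo[4];   /* low 16 bits of each position  */
    unsigned char  hi[4];   /* high 8 bits of each position  */
} SketchPos3;

/* Store a position that fits in 24 bits. */
static void SketchSetPos3(SketchPos3 *block, int index, long pos)
{
    assert(index >= 0 && index < 4);
    assert(pos >= 0 && pos < 0x1000000L);
    block->hi[index] = (unsigned char)(pos >> 16);
    block->lo[index] = (unsigned short)(pos & 0xFFFF);
}

/* Reassemble the 24-bit position from its two parts. */
static long SketchGetPos3(const SketchPos3 *block, int index)
{
    assert(index >= 0 && index < 4);
    return ((long)block->hi[index] << 16) + block->lo[index];
}
```

Storing 0x123456 in a slot leaves 0x12 in hi[] and 0x3456 in lo[], and a slot that was never written reads back as zero, which is what the listing relies on to detect unused entries.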
InitFind
void InitFind (
char *textToSearch,
long textLength,
void *privateStorage,
long storageSize
)
{
Private *private = privateStorage;
private->endParsedText =
InitFindBody(
(uchar *)textToSearch,
textLength,
privateStorage,
(uchar *)privateStorage + storageSize
);
if ( private->endParsedText != -1 )
private->nullChar =
PickNullChar(
private,
(uchar *)textToSearch + private->endParsedText,
(uchar *)textToSearch + textLength );
else
private->nullChar = NO_NULL_CHAR;
}
InitFindBody
/*
This function does most of the work for InitFind(). The arguments have been recast
into a more useful form; uchar and ulong are used a lot so that we don't have to
worry about the sign, especially when indexing private->hashCodes[].
The return value is the character position where the unparsed text begins (if we run
out of private storage), or -1 if all the text was parsed.
*/
static long InitFindBody (
uchar *textToSearch,
long textLength,
Private *private,
uchar *endPrivateStorage
)
{
uchar *alloc, *textPos, *textEnd, *wordStart;
long wordLength;
ulong hash, code;
Word *word;
Occurrence *occur;
/*
Init table of hash codes. The remaining entries are guaranteed to be initialized to
zero. The hash codes were chosen so that any two codes differ by at least five bits.
*/
{
ulong *table = private->hashCodes; /* reduces typing */
table['0'] = 0xFFC0; table['5'] = 0xF492;
table['1'] = 0xFE07; table['6'] = 0xF31E;
table['2'] = 0xF98B; table['7'] = 0xF2D9;
table['3'] = 0xF84C; table['8'] = 0xCF96;
table['4'] = 0xF555; table['9'] = 0xCE51;
table['A'] = 0xC9DD; table['N'] = 0xA245;
table['B'] = 0xC81A; table['O'] = 0x9F0A;
table['C'] = 0xC503; table['P'] = 0x9ECD;
table['D'] = 0xC4C4; table['Q'] = 0x9941;
table['E'] = 0xC348; table['R'] = 0x9886;
table['F'] = 0xC28F; table['S'] = 0x959F;
table['G'] = 0xAF5C; table['T'] = 0x9458;
table['H'] = 0xAE9B; table['U'] = 0x93D4;
table['I'] = 0xA917; table['V'] = 0x9213;
table['J'] = 0xA8D0; table['W'] = 0x6DD3;
table['K'] = 0xA5C9; table['X'] = 0x6C14;
table['L'] = 0xA40E; table['Y'] = 0x6B98;
table['M'] = 0xA382; table['Z'] = 0x6A5F;
table['a'] = 0x6746; table['n'] = 0x3C88;
table['b'] = 0x6681; table['o'] = 0x3B04;
table['c'] = 0x610D; table['p'] = 0x3AC3;
table['d'] = 0x60CA; table['q'] = 0x37DA;
table['e'] = 0x5D85; table['r'] = 0x361D;
table['f'] = 0x5C42; table['s'] = 0x3191;
table['g'] = 0x5BCE; table['t'] = 0x3056;
table['h'] = 0x5A09; table['u'] = 0x0D19;
table['i'] = 0x5710; table['v'] = 0x0CDE;
table['j'] = 0x56D7; table['w'] = 0x0B52;
table['k'] = 0x515B; table['x'] = 0x0A95;
table['l'] = 0x509C; table['y'] = 0x078C;
table['m'] = 0x3D4F; table['z'] = 0x064B;
}
/* Determine the number of bytes needed to store each occurrence position. */
if ( textLength <= 0x10000L )
private->posBytes = POS_2_BYTES;
else if ( textLength <= 0x1000000L )
private->posBytes = POS_3_BYTES;
else
private->posBytes = POS_4_BYTES;
/* Set up variables to handle allocation of private storage. */
alloc = (uchar *)&private->heap;
/* Parse the text */
textPos = textToSearch;
textEnd = textPos + textLength;
while ( textPos != textEnd )
{
/* Search for start of word */
while ( private->hashCodes[*textPos] == 0 )
{
textPos++;
if ( textPos == textEnd )
return -1; /* parsed all text */
}
wordStart = textPos;
/* Search for end of word; generate hash value too */
hash = 0;
while ( textPos != textEnd &&
(code = private->hashCodes[ *textPos ]) != 0 )
{
hash = (hash << 1) ^ code;
textPos++;
}
wordLength = textPos - wordStart;
hash = (hash & 0xFFFFFF) | (wordLength << 24);
/*
Record the occurrence. First we see if a Word struct exists for this word and
whether we need to allocate a new Occurrence struct.
*/
word = LookupWord(
private,
(char *)textToSearch,
(char *)wordStart,
wordLength,
hash );
if ( word )
{
long allocateNewBlock, blockSize, i, pos;
/*
This word has occurred before, so it already has a Word struct. See if there's
room in the last Occurrence block for another entry. Remember that entry #0 in
the Occurrence block is always in use, so we can start checking at entry #1 for a
non-zero entry.
*/
occur = word->last;
allocateNewBlock = TRUE;
switch ( private->posBytes )
{
case POS_2_BYTES: blockSize = 6; break;
case POS_3_BYTES: blockSize = 4; break;
case POS_4_BYTES: blockSize = 3; break;
}
for ( i = 1; i < blockSize; i++ )
{
GET_POS( pos, occur, i, private->posBytes )
if ( pos == 0 )
{
SET_POS( wordStart - textToSearch, occur, i,
private->posBytes )
allocateNewBlock = FALSE;
break;
}
}
if ( allocateNewBlock )
{
/* Block is full. Allocate new Occurrence block */
occur = (Occurrence *) alloc;
alloc += ALLOC_SIZE( sizeof(Occurrence) );
if ( alloc >= endPrivateStorage )
return wordStart-textToSearch; /* out of memory */
/* Init the new struct and link it to the end of the occurrence list. */
SET_POS( wordStart - textToSearch, occur, 0,
private->posBytes )
word->last->next = occur;
word->last = occur;
}
}
else
{
long i;
/* This is a new word. Allocate a new Word struct, which contains an Occurrence
struct too. */
word = (Word *) alloc;
alloc += ALLOC_SIZE( sizeof(Word) );
if ( alloc >= endPrivateStorage )
return wordStart-textToSearch ; /* out of memory */
/* Link it to the start of the Word list, coming off the hash table. */
word->next = private->hashTable[ hash & HASH_MASK ];
private->hashTable[ hash & HASH_MASK ] = word;
/* Init the Word struct */
word->hash = hash;
word->last = &word->first;
/* Init the Occurrence struct */
SET_POS( wordStart - textToSearch, &word->first, 0,
private->posBytes )
}
}
/* Finished parsing text */
return -1;
}
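The word hash built inside InitFindBody() combines a shift-and-XOR over per-character codes with the word length stored in the top eight bits. The sketch below isolates that step. SketchCode is a hypothetical stand-in for private->hashCodes[] (it is not the table from the listing), and the sketch assumes words shorter than 256 characters so that the length fits in eight bits.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for private->hashCodes[]: nonzero for alphanumeric
   characters, zero for everything else. Not the table used in the listing. */
static unsigned long SketchCode(unsigned char c)
{
    return isalnum(c) ? ((0x0101UL * c) | 1UL) : 0UL;
}

/* Hash one word: low 24 bits hold the shift-and-XOR hash, bits 24..31 hold
   the length. Assumes 0 < length < 256. */
static unsigned long SketchHashWord(const unsigned char *word, long length)
{
    unsigned long hash = 0;
    long i;

    assert(length > 0 && length < 256);
    for (i = 0; i < length; i++)
        hash = (hash << 1) ^ SketchCode(word[i]);
    return (hash & 0xFFFFFFUL) | ((unsigned long)length << 24);
}
```

Because the length is part of the value, two words of different lengths can never have equal hashes, which is why LookupWord() can skip a separate length comparison once the hashes match.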
FindWordOccurrence
long FindWordOccurrence (
char *wordToFind,
long wordLength,
long occurrenceToFind,
char *textToSearch,
long textLength,
void *privateStorage,
long storageSize
)
{
Private *private = privateStorage;
Word *word;
ulong hash;
/* Make occurrenceToFind zero-based */
occurrenceToFind--;
/* Generate hash value for word to find */
hash = 0;
{
long remain = wordLength;
uchar *p = (uchar *) wordToFind;
while ( remain > 0 )
{
hash = (hash << 1) ^ private->hashCodes[*p++];
remain--;
}
hash = (hash & 0xFFFFFF) | (wordLength << 24);
}
/* Look for word/occurrence in hash table */
word = LookupWord( private, textToSearch, wordToFind,
wordLength, hash );
if ( word )
{
Occurrence *occur = &word->first;
long blockSize, pos, i;
/* Word exists in hash table, so go down the occurrence list. */
switch ( private->posBytes )
{
case POS_2_BYTES: blockSize = 6; break;
case POS_3_BYTES: blockSize = 4; break;
case POS_4_BYTES: blockSize = 3; break;
}
while ( occur && occurrenceToFind >= blockSize )
{
occurrenceToFind -= blockSize;
occur = occur->next;
}
if ( occur )
{
GET_POS( pos, occur, occurrenceToFind,
private->posBytes )
if ( occurrenceToFind == 0 || pos != 0 )
return pos;
occurrenceToFind -= blockSize;
}
occur = word->last;
for ( i = 0; i < blockSize; i++ )
{
GET_POS( pos, occur, i, private->posBytes )
if ( pos == 0 )
occurrenceToFind++;
}
}
/* Not in parsed text, so check the unparsed text */
if ( private->endParsedText != -1 )
{
char *p;
if ( wordLength > 3 )
p = BMH_Search(
private->hashCodes,
wordToFind,
wordLength,
occurrenceToFind,
textToSearch + private->endParsedText,
textToSearch + textLength,
private->nullChar );
else
p = SimpleSearch(
private->hashCodes,
wordToFind,
wordLength,
occurrenceToFind,
textToSearch + private->endParsedText,
textToSearch + textLength );
if (p)
return (p - textToSearch);
}
/* Not found */
return -1;
}
LookupWord
/* Look up a word in the hash table */
static Word *LookupWord (
Private *private,
char *textToSearch,
char *wordText,
long wordLength,
ulong hash
)
{
Word *word = private->hashTable[ hash & HASH_MASK ];
while ( word )
{
if ( word->hash == hash )
{
char *w1, *w2;
long pos, remain = wordLength;
/*
The hash values match, so compare characters to make sure it's the right word.
We already know the word length is correct since the length is contained
in the upper eight bits of the hash value.
*/
GET_POS( pos, &word->first, 0, private->posBytes )
w1 = textToSearch + pos;
w2 = wordText;
while ( remain-- > 0 && *w1++ == *w2++ )
;
if ( remain == -1 )
return word;
}
word = word->next;
}
return NULL;
}
PickNullChar
/*
Find a character that doesn't appear anywhere in the unparsed text. BMH_Search() is
faster if such a character can be found.
*/
static char PickNullChar (
Private *private,
uchar *textStart,
uchar *textEnd
)
{
long i;
uchar *p, occurs[ ALPHABET_SIZE ];
for ( i = 0; i < ALPHABET_SIZE; i++ )
occurs[i] = FALSE;
for ( p = textStart; p < textEnd; p++ )
occurs[*p] = TRUE;
for ( i = 0; i < ALPHABET_SIZE; i++ )
if ( occurs[i] == FALSE && private->hashCodes[i] == 0 )
return i;
return NO_NULL_CHAR;
}
BMH_Search
/*
Search the unparsed text using the Boyer-Moore-Horspool algorithm. Ideally a null
character is supplied (one that appears in neither the search string nor the text being
searched). This allows the inner loop to be faster.
*/
static char *BMH_Search (
ulong *hashCodes, /* private->hashCodes */
char *wordToFind,
long wordLength,
long occurrenceToFind, /* 0 is first occurrence */
char *textStart, /* start of unparsed text */
char *textEnd, /* end of unparsed text */
char nullChar /* private->nullChar */
)
{
long i;
char *text, *wordEnd;
char word[256];
long offset[ ALPHABET_SIZE ];
/*
Copy the search string to a private buffer, where
the first character is the null character.
*/
word[0] = nullChar;
for ( i = 0; i < wordLength; i++ )
word[i+1] = wordToFind[i];
/* Set up the offset[] lookup table */
for ( i = 0; i < ALPHABET_SIZE; i++ )
offset[i] = wordLength;
for ( i = 1; i < wordLength; i++ )
offset[ word[i] ] = wordLength - i;
/* Let the search begin... */
wordEnd = word + wordLength;
text = textStart + wordLength - 1;
if ( nullChar == NO_NULL_CHAR )
{
/* No null character, so use a slower inner loop */
while ( text < textEnd )
{
long i;
char *p, *q;
for ( i = wordLength, p = wordEnd, q = text;
i > 0 && *p == *q;
i--, p--, q-- )
;
/*If i == 0, we have found the search string. Now we make sure that it is delimited.*/
if ( i == 0 && hashCodes[*q] == 0 &&
(text+1 == textEnd || hashCodes[text[1]] == 0) )
{
if ( occurrenceToFind == 0 )
return q+1;
occurrenceToFind--;
}
text += offset[*text];
}
}
else
{
/* There is a null character (usual case),
so we can use a faster and simpler inner loop. */
while ( text < textEnd )
{
char *p, *q;
for ( p = wordEnd, q = text; *p == *q; p--, q-- )
;
if ( p == word && hashCodes[*q] == 0 &&
(text+1 == textEnd || hashCodes[text[1]] == 0) )
{
if ( occurrenceToFind == 0 )
return q+1;
occurrenceToFind--;
}
text += offset[*text];
}
}
return NULL;
}
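For readers unfamiliar with the Boyer-Moore-Horspool algorithm used by BMH_Search(), here is a plain textbook sketch of it. It is not Gustav's routine: SketchHorspoolFind is a hypothetical name, it finds only the first match, it omits the null-character sentinel and the word-delimiter checks, and it indexes its table with unsigned char values.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ALPHABET 256

/* Returns a pointer to the first occurrence of pat in text, or NULL. */
static const unsigned char *SketchHorspoolFind(
    const unsigned char *text, size_t textLen,
    const unsigned char *pat, size_t patLen)
{
    size_t shift[SKETCH_ALPHABET];
    size_t i, pos;

    assert(patLen > 0);
    if (patLen > textLen)
        return NULL;

    /* Default shift is the whole pattern length. */
    for (i = 0; i < SKETCH_ALPHABET; i++)
        shift[i] = patLen;
    /* Characters in the pattern (except its last) allow a shorter shift. */
    for (i = 0; i + 1 < patLen; i++)
        shift[pat[i]] = patLen - 1 - i;

    pos = 0;
    while (pos + patLen <= textLen) {
        size_t k = patLen;
        /* Compare right to left within the current window. */
        while (k > 0 && text[pos + k - 1] == pat[k - 1])
            k--;
        if (k == 0)
            return text + pos;
        /* Shift by the table entry for the window's last character. */
        pos += shift[text[pos + patLen - 1]];
    }
    return NULL;
}
```

The listing's version layers two things on top of this core: a sentinel character at word[0] that lets the inner comparison loop drop its counter, and a check that each match is bounded by non-alphanumeric characters.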
SimpleSearch
/*
Search the unparsed text using a simple search algorithm. Note that wordLength
must be 1, 2, or 3. This algorithm runs faster than BMH_Search() for small search
strings.
*/
static char *SimpleSearch(
ulong *hashCodes, /* private->hashCodes */
char *wordToFind,
long wordLength, /* 1..3 */
long occurrenceToFind, /* 0 is 1st occurrence */
char *textStart, /* start of unparsed text */
char *textEnd /* end of all text */
)
{
char *text, first;
first = wordToFind[0];
text = textStart;
if ( wordLength == 1 )
{
while ( text < textEnd )
{
while ( text < textEnd && *text != first )
text++;
if ( hashCodes[*(text-1)] == 0 &&
hashCodes[text[wordLength]] == 0 )
{
if ( occurrenceToFind == 0 )
return text;
occurrenceToFind--;
}
text++;
}
}
else if ( wordLength == 2 )
{
while ( text < textEnd )
{
while ( text < textEnd && *text != first )
text++;
if ( text[1] == wordToFind[1] &&
hashCodes[*(text-1)] == 0 &&
hashCodes[text[wordLength]] == 0 )
{
if ( occurrenceToFind == 0 )
return text;
occurrenceToFind--;
}
text++;
}
}
else /* wordLength == 3 */
{
while ( text < textEnd )
{
while ( text < textEnd && *text != first )
text++;
if ( text[1] == wordToFind[1] &&
text[2] == wordToFind[2] &&
hashCodes[*(text-1)] == 0 &&
hashCodes[text[wordLength]] == 0 )
{
if ( occurrenceToFind == 0 )
return text;
occurrenceToFind--;
}
text++;
}
}
return NULL;
}
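Both search routines accept a candidate match only if the characters just before and just after it are not alphanumeric. The sketch below shows that delimiter test as a standalone helper with explicit bounds checks, so that a match at the very start or end of the buffer never reads outside it. SketchIsDelimited is a hypothetical name, and isalnum() stands in for the hashCodes[] lookup used in the listing.

```c
#include <assert.h>
#include <ctype.h>

/* Returns 1 if the wordLength characters at match form a whole word inside
   [bufStart, bufEnd), i.e. they are not adjacent to an alphanumeric
   character on either side. Returns 0 otherwise. */
static int SketchIsDelimited(const unsigned char *bufStart,
                             const unsigned char *bufEnd,
                             const unsigned char *match,
                             long wordLength)
{
    assert(wordLength > 0);
    assert(bufStart <= bufEnd);

    /* The candidate must lie entirely inside the buffer. */
    if (match < bufStart || match > bufEnd || wordLength > bufEnd - match)
        return 0;
    /* Character before the match, if there is one. */
    if (match > bufStart && isalnum(match[-1]))
        return 0;
    /* Character after the match, if there is one. */
    if (match + wordLength < bufEnd && isalnum(match[wordLength]))
        return 0;
    return 1;
}
```

In the text "a cat sat", the three characters at offset 2 pass this test, while the two characters "at" at offset 3 do not, because they are preceded by a letter.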