May 95 Challenge
Volume Number: 11
Issue Number: 5
Column Tag: Programmer's Challenge
Programmer's Challenge
By Mike Scanlin, Mountain View, CA
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Hy-phen-a-tion
Hyphenation algorithms come in two flavors: rule-based and dictionary-based. Of the two, dictionary-based is more reliable but has the downside of requiring a lot of storage space. Rule-based methods require considerably less space and have the option of using an exceptions dictionary to improve accuracy. This month's Challenge is to implement a rule-based hyphenation algorithm.
The algorithm is this: Each part of the word on either side of the hyphen must include a vowel, not counting a final e, es or ed. The part of the word after the hyphen cannot begin with a vowel or a double consonant. No break is made between any two of the following letter combinations: sh, gh, ph, ch, th, wh, gr, pr, cr, tr, wr, br, fr, dr, vowel-r, vowel-n, or om. That means that if any two of those pairs occur next to each other, you can't break them apart (e.g., "from" contains "fr" and "om", which are both on the list; therefore you would not split it between the r and the o). The letter y is not a vowel.
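As a rough illustration of the first two rules only, here is a minimal sketch in C. It is one reading of the rules, not a reference solution: the helper names (IsVowel, PartHasVowel, PassesVowelRules) are hypothetical, "double consonant" is assumed to mean the same consonant twice in a row, and the "final e, es or ed" exclusion is assumed to apply to the end of the whole word. The no-break list of letter pairs is deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Vowels per the rules above: a, e, i, o, u. The letter y is excluded. */
static int IsVowel(char c)
{
    int lc = tolower((unsigned char)c);
    return lc == 'a' || lc == 'e' || lc == 'i' || lc == 'o' || lc == 'u';
}

/* Hypothetical helper: nonzero if word[start..end) contains a vowel.
   Assumption: when the part runs to the end of the word, a final
   "e", "es" or "ed" is skipped before looking for vowels. */
static int PartHasVowel(const char *word, size_t len, size_t start, size_t end)
{
    size_t limit = end;
    size_t i;

    if (end == len && len >= 1) {
        int last = tolower((unsigned char)word[len - 1]);
        int prev = (len >= 2) ? tolower((unsigned char)word[len - 2]) : 0;
        if (prev == 'e' && (last == 's' || last == 'd'))
            limit = len - 2;
        else if (last == 'e')
            limit = len - 1;
    }
    for (i = start; i < limit; i++)
        if (IsVowel(word[i]))
            return 1;
    return 0;
}

/* Hypothetical helper: may a hyphen go immediately before word[pos]?
   Only the first two rules are checked; the no-break pairs are not. */
static int PassesVowelRules(const char *word, size_t len, size_t pos)
{
    if (pos == 0 || pos >= len)
        return 0;
    if (!PartHasVowel(word, len, 0, pos) || !PartHasVowel(word, len, pos, len))
        return 0;
    if (IsVowel(word[pos]))
        return 0;   /* second part may not begin with a vowel */
    if (pos + 1 < len &&
        tolower((unsigned char)word[pos]) == tolower((unsigned char)word[pos + 1]))
        return 0;   /* second part may not begin with a doubled consonant */
    return 1;
}
```

Under these assumptions, "butter" may be split as but-ter (position 3) but not as bu-tter, and "jumped" may not be split before "ped" because the final "ed" does not count as a vowel.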
The two routines you'll write are:
void *
InitHyphenation(maxRAM)
ulong maxRAM;
void
Hyphenate(privateDataPtr, inPtr, outPtr)
void *privateDataPtr;
Str255 *inPtr;
Str255 *outPtr;
The InitHyphenation routine is an untimed routine called once that can set up whatever tables you might want to use. It must not allocate more than maxRAM bytes, which will be between 16K and 64K.
Hyphenate is the routine that does the work. The privateDataPtr parameter is the return value from the InitHyphenation routine. The inPtr parameter is a pointer to a read-only unhyphenated Pascal word (containing only letters a..z and A..Z). The outPtr parameter is a pointer to 256 bytes of space where you store the hyphenated word. You need to preserve the case of the input and you should insert a kHyphen byte (0x2D) everywhere the above algorithm tells you to (yes, the complete output is guaranteed to fit within a Str255).
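To make the Pascal-string handling concrete, here is a minimal sketch of the output step. It assumes a hypothetical breakBefore array (not part of the Challenge interface) that some earlier pass has filled in, with a nonzero entry at index i meaning "put a hyphen before the i-th character". The strings are treated as plain unsigned char buffers whose first byte is the length.

```c
#include <stddef.h>

#define kHyphen 0x2D

/* Hypothetical helper: copy the Pascal string `in` to `out`, inserting
   kHyphen before every 0-based character index i where breakBefore[i]
   is nonzero. in[0] and out[0] are the length bytes. */
static void CopyWithHyphens(const unsigned char *in,
                            unsigned char *out,
                            const unsigned char *breakBefore)
{
    unsigned inLen = in[0];
    unsigned outLen = 0;
    unsigned i;

    for (i = 0; i < inLen; i++) {
        /* never hyphenate before the first character */
        if (i > 0 && breakBefore[i] && outLen < 255)
            out[++outLen] = kHyphen;
        if (outLen < 255)
            out[++outLen] = in[i + 1];   /* characters are copied unchanged, so case is preserved */
    }
    out[0] = (unsigned char)outLen;      /* write the new length byte last */
}
```

For the six-letter input "butter" with a break marked before index 3, the output has length 7 and reads "but-ter". The bounds checks are defensive only; the Challenge guarantees that the output fits.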
Note that this algorithm is valid for English only. It will find about 70% of all valid hyphens and will make mistakes about 45% of the time. It is assumed that if this were part of a real application, a hyphenation exception word list would be kept. That list would be checked before calling this function. You, however, don't need to be concerned with that.
Write to me if you have any questions.
Two Months Ago Winner
Congrats to Gustav Larsson (Mountain View, CA) for winning the Method Dispatcher Challenge. Gustav was a virtual unknown to this column a few months ago but he is rising fast in the Top 20 rankings. The 20 points he wins this month move him from 10th place to 4th place.
Here are the times and code sizes for each entry. Numbers in parens after a person's name indicate that person's cumulative point total for all previous Programmer Challenges, not including this one:
Name                            time   code
Gustav Larsson (40)              261   1040
Kevin Cutts (36)                 304    266
Jeff Mallett (27)                338   1192
Xan Gregg                        349    158
Ernst Munter (51)                385   1466
Thomas Studer                    437    572
David Howarth                    632    130
Scanlin's brute force method     420    122
Everyone who entered implemented some kind of cache. Gustav chose a 10-way set-associative cache. He splits the 16K of usable cache space into 256 cache sets, each of which holds 10 entries. He uses a hash based on the class and method numbers to map to one of the 256 sets. Once he's got that, he checks the 10 entries for a match. If he doesn't find a match, he uses an efficient binary search to look for the method within the class.
In designing any cache, the choice of hash function and amount of set-associativity (if any) is highly dependent on your data and access pattern. Gustav's code could be tuned for a variety of efficient cache uses. Nice job, Gustav.
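The way the hash is split into a set number and a tag can be sketched in a few lines. The formula and the constant 437 come from the comments in Gustav's listing; the CacheKey struct and the MakeKey name are hypothetical and exist only for this illustration.

```c
typedef unsigned short ushort;
typedef unsigned long  ulong;

/* Multiplier used in the winning entry. Its comments note that it must
   exceed the highest class ID (400) so each class/method pair gets a
   distinct hash value. */
#define kMultiplier 437L

/* Hypothetical struct for this sketch: which of the 256 sets to look
   in, and the tag ("id") stored with the entry. */
typedef struct {
    unsigned char set;   /* low byte of the hash: 0..255 */
    ushort        id;    /* remaining bits plus one, so 0 can mean "unused" */
} CacheKey;

static CacheKey MakeKey(ushort classID, ushort methodNumber)
{
    ulong hash = classID + kMultiplier * methodNumber;
    CacheKey key;

    key.set = (unsigned char)(hash & 0xFF);
    key.id  = (ushort)((hash >> 8) + 1);
    return key;
}
```

For example, class 5 and method 3 give a hash of 5 + 437 * 3 = 1316, which lands in set 36 with id 6. A lookup then only has to compare that id against the 10 ids stored in set 36.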
Poker Winner Disqualified
Turns out that I failed to do adequate testing on the winning entry for the Poker Hand Evaluator Challenge, in which Kevin Cutts was the published winner. Unfortunately, I'm going to have to retroactively disqualify Kevin and award the 1st place prize to Gustav Larsson. My apologies to Kevin and the other MacTech readers. Rather than publish Gustav's well-commented winning solution (it's quite long), I'll make it available via e-mail. If you're interested, send me a note at scanlin@genmagic.com or at any of the Programmer Challenge electronic addresses (see p. 2).
This belated win for Gustav means that he's 3 for 3 in the last 3 challenges, which is an unmatched winning streak. Congrats, Gustav! Also, Dave Darrah was the first person to point out the flaws in Kevin's code and so he receives 5 extra points in the cumulative point totals for doing so. Thanks, Dave!
Top 20 Contestants of All Time
Here are the Top 20 Contestants for the 33 Programmer's Challenges to date. The numbers below include points awarded for this month's top 5 entrants. (Note: ties are listed alphabetically by last name -- there are 23 people listed this month because 7 people have 20 points each.)
1. Boonstra, Bob 176
2. Karsh, Bill 71
3. Stenger, Allen 65
4. Larsson, Gustav 60
5. Munter, Ernst 53
6. Riha, Stepan 51
7. Goebel, James 49
8. Cutts, Kevin 46
9. Nepsund, Ronald 40
10. Vineyard, Jeremy 40
11. Darrah, Dave 34
12. Mallett, Jeff 34
13. Landry, Larry 29
14. Elwertowski, Tom 24
15. Kasparian, Raffi 24
16. Lee, Johnny 22
17. Anderson, Troy 20
18. Burgoyne, Nick 20
19. Galway, Will 20
20. Israelson, Steve 20
21. Landweber, Greg 20
22. Noll, Bob 20
23. Pinkerton, Tom 20
There are three ways to earn points: (1) by scoring in the top 5 of any challenge, (2) by being the first person to find a bug in a published winning solution, or (3) by being the first person to suggest a challenge that I use. The points you can win are:
1st place 20 points
2nd place 10 points
3rd place 7 points
4th place 4 points
5th place 2 points
finding bug 5 points
suggesting challenge 2 points
Here is Gustav's winning solution:
Dispatch.c
Copyright ©1995 Gustav K. Larsson
#define kMethodNotFound ((void *) 0)
#define kClassNotFound ((void *) -1)
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;
typedef ushort ClassID;
typedef ushort MethodNumber;
typedef void * MethodAddress;
typedef struct {
    MethodNumber methodNumber;
    MethodAddress methodAddress;
} MethodEntry;
typedef struct {
    ushort inheritedCount;
    ushort inheritedClasses[15];
    MethodNumber largestMethodNumber;
    ushort methodCount;
    MethodEntry methods[];
} Class, *ClassPtr;
/* Each block in the cache is 64 bytes and holds 10 entries. The id field
   distinguishes class/method pairs that map to the same cache block. The
   hits field has a bit for each entry, which is set whenever there is a
   hit on the entry. The "next" field is 0..9 and indicates where to check
   first when deciding which entry to replace. */
#define BLOCK_SIZE 10
typedef struct {
    ushort id [ BLOCK_SIZE ];
    ushort hits;
    ushort next;
    MethodAddress methodAddress [ BLOCK_SIZE ];
} CacheBlock;
static CacheBlock Cache[256];   /* exactly 16K */
GetClassPtr
extern ClassPtr GetClassPtr( ClassID );
MethodAddress FindMethod( ClassID, MethodNumber );
static MethodAddress HandleMiss( ClassID, MethodNumber,
CacheBlock *, ushort );
FindMethod
MethodAddress
FindMethod( ClassID classID, MethodNumber methodNumber )
{
    register ulong hash;
    register ushort *idPtr;
    register CacheBlock *block;

    /* Check cache for this class & method. */

    /* First, generate a hash key that is unique for each class/method pair.
       The formula classID + 437 * method works since 437 is greater than the
       highest class ID, 400. The ideal multiplier depends heavily on the
       distribution of class/method pairs, but 437 seems to give a reasonable
       spread under a variety of conditions.
       The low byte of the hash key becomes the block number, and the higher
       bytes become the "id" (to distinguish class/method pairs that map to
       the same block). Add one to the id so it is never zero (zero indicates
       an unused cache entry). */
    hash = classID + 437L * methodNumber;
    block = &Cache[ (uchar)hash ];
    hash = (hash >> 8) + 1;    /* now hash holds just the id */
    idPtr = block->id;
    if ( *idPtr++ == (ushort)hash ) goto hit0;
    if ( *idPtr++ == (ushort)hash ) goto hit1;
    if ( *idPtr++ == (ushort)hash ) goto hit2;
    if ( *idPtr++ == (ushort)hash ) goto hit3;
    if ( *idPtr++ == (ushort)hash ) goto hit4;
    if ( *idPtr++ == (ushort)hash ) goto hit5;
    if ( *idPtr++ == (ushort)hash ) goto hit6;
    if ( *idPtr++ == (ushort)hash ) goto hit7;
    if ( *idPtr++ == (ushort)hash ) goto hit8;
    if ( *idPtr++ == (ushort)hash ) goto hit9;
    return HandleMiss( classID, methodNumber,
                       block, (ushort)hash );

    /* Handle cache hit */
hit0: block->hits |= 0x001; return block->methodAddress[0];
hit1: block->hits |= 0x002; return block->methodAddress[1];
hit2: block->hits |= 0x004; return block->methodAddress[2];
hit3: block->hits |= 0x008; return block->methodAddress[3];
hit4: block->hits |= 0x010; return block->methodAddress[4];
hit5: block->hits |= 0x020; return block->methodAddress[5];
hit6: block->hits |= 0x040; return block->methodAddress[6];
hit7: block->hits |= 0x080; return block->methodAddress[7];
hit8: block->hits |= 0x100; return block->methodAddress[8];
hit9: block->hits |= 0x200; return block->methodAddress[9];
}
HandleMiss
static MethodAddress
HandleMiss( ClassID classID, MethodNumber methodNumber,
            CacheBlock *blockPtr, ushort id )
{
    register ClassPtr classPtr;
    MethodAddress methodAddress;
    register ushort *temp;    /* shared address register */

    /* Not in cache, so look it up the hard way */
    classPtr = GetClassPtr( classID );
    if ( classPtr != kClassNotFound ) {
        /* Look in this class. Use a binary search. */
        {
            register MethodEntry *methods = classPtr->methods;
            ushort searchSize = classPtr->methodCount;
            register MethodNumber methodReg = methodNumber;
                                    /* method # in a register */
            /* Unroll the binary search for the most common cases. The BINARY
               macro reduces the search to progressively more elementary
               cases. If classPtr->methodCount is greater than 32, we run a
               general binary search until the search is reduced to an
               unrolled case. */
        continueBinarySearch:
            switch ( searchSize )
            {
            bin2: case 2:
                if ( methods[1].methodNumber == methodReg ) {
                    methodAddress = methods[1].methodAddress;
                    goto addCache;
                }
                /* else fall through... */
            bin1: case 1:
                if ( methods[0].methodNumber == methodReg ) {
                    methodAddress = methods[0].methodAddress;
                    goto addCache;
                }
                else goto checkSuperclasses;
#define BINARY(mid,mid1) \
    if ( methodReg < methods[mid].methodNumber ) \
        goto bin##mid;      /* like "hi = mid" */ \
    else { \
        methods += mid;     /* like "lo = mid" */ \
        goto bin##mid1; \
    }
            /* binN: case N: BINARY(N/2,(N+1)/2) */
            bin3: case 3: BINARY(1,2)
            bin4: case 4: BINARY(2,2)
            bin5: case 5: BINARY(2,3)
            bin6: case 6: BINARY(3,3)
            bin7: case 7: BINARY(3,4)
            bin8: case 8: BINARY(4,4)
            bin9: case 9: BINARY(4,5)
            bin10: case 10: BINARY(5,5)
            bin11: case 11: BINARY(5,6)
            bin12: case 12: BINARY(6,6)
            bin13: case 13: BINARY(6,7)
            bin14: case 14: BINARY(7,7)
            bin15: case 15: BINARY(7,8)
            bin16: case 16: BINARY(8,8)
            bin17: case 17: BINARY(8,9)
            bin18: case 18: BINARY(9,9)
            bin19: case 19: BINARY(9,10)
            bin20: case 20: BINARY(10,10)
            bin21: case 21: BINARY(10,11)
            bin22: case 22: BINARY(11,11)
            bin23: case 23: BINARY(11,12)
            bin24: case 24: BINARY(12,12)
            bin25: case 25: BINARY(12,13)
            bin26: case 26: BINARY(13,13)
            bin27: case 27: BINARY(13,14)
            bin28: case 28: BINARY(14,14)
            bin29: case 29: BINARY(14,15)
            bin30: case 30: BINARY(15,15)
            bin31: case 31: BINARY(15,16)
            bin32: case 32: BINARY(16,16)
#define NUM_UNROLLED 32    /* # of unrolled cases */
            default:
                if (methodReg <= classPtr->largestMethodNumber) {
                    register ushort lo, mid, hi, mn;
                    lo = 0;
                    hi = classPtr->methodCount - 1;
                    while ( lo + NUM_UNROLLED - 1 < hi ) {
                        mid = (lo + hi) >> 1;
                        mn = methods[mid].methodNumber;
                        if ( methodReg < mn )
                            hi = mid;
                        else if ( methodReg > mn )
                            lo = mid;
                        else {
                            methodAddress = methods[mid].methodAddress;
                            goto addCache;
                        }
                    }
                    /* reduced to an unrolled case */
                    searchSize = hi-lo+1;
                    methods += lo;
                    goto continueBinarySearch;
                }
            }
        }
        /* Look in superclasses. Eliminating the recursion by keeping a custom
           stack of critical variables doesn't buy us much. Recursion also
           makes the code clearer. */
    checkSuperclasses:
        {
            register ulong i, max = classPtr->inheritedCount;
            /* shared address register points into class list */
            temp = classPtr->inheritedClasses;
            for ( i = 0; i < max; i++ ) {
                methodAddress = FindMethod(*temp++, methodNumber);
                if ( methodAddress != kMethodNotFound )
                    goto addCache;
            }
        }
    }
    /* There are two ways to get here: classPtr is kClassNotFound (hopefully
       rare), or the method was not found in any of the superclasses. Either
       way, put a "not found" entry into the cache. This helps when searching
       complicated inheritance hierarchies or repeatedly looking up the same
       method in several related subclasses. */
    methodAddress = kMethodNotFound;

    /* Add an entry to the cache. */
addCache:
    /* shared address register holds block pointer */
    temp = (ushort*) blockPtr;
#define block ((CacheBlock*) temp)
    {
        register ulong hits = block->hits;
        register ulong next = block->next;
        register ulong mask = 1L << next;
        /* Choose the entry to replace. Look for a cleared bit in "hits"
           starting at "next". If all the bits are 1 initially, go all the
           way around; we are guaranteed to stop since we clear bits as
           we go. */
        while ( hits & mask ) {
            hits &= ~mask;    /* no, clear the bit */
            mask <<= 1;       /* and try next bit */
            if ( mask == (1 << BLOCK_SIZE) )
                mask = 1;
            next++;
        }
        if ( next >= BLOCK_SIZE )
            next -= BLOCK_SIZE;
        block->methodAddress[ next ] = methodAddress;
        block->id[ next ] = id;
        block->hits = hits;
        block->next = ( next < BLOCK_SIZE-1 ? next+1 : 0 );
    }
    return methodAddress;
}
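The replacement loop at the end of HandleMiss resembles what is often called a clock or second-chance policy: each entry's hit bit gives it one reprieve before it is evicted. The following standalone sketch isolates that idea. ChooseVictim is a hypothetical name, and the modulo arithmetic is a simplification of the mask shifting in the listing rather than a copy of it.

```c
#define BLOCK_SIZE 10

/* Hypothetical standalone version of the victim selection.
   *hits holds one reference bit per entry (bits 0..BLOCK_SIZE-1).
   start is the entry to examine first. Returns the index of the
   entry to replace, clearing the bits of entries it passes over. */
static unsigned ChooseVictim(unsigned *hits, unsigned start)
{
    unsigned slot = start % BLOCK_SIZE;

    /* An entry whose bit is set was hit recently: clear the bit
       (its second chance is used up) and move to the next entry.
       The loop must stop, because every bit it visits is cleared. */
    while (*hits & (1u << slot)) {
        *hits &= ~(1u << slot);
        slot = (slot + 1) % BLOCK_SIZE;
    }
    return slot;
}
```

If every bit is set, the scan goes all the way around, clears all ten bits, and replaces the entry it started at. If only entries 0 and 2 were hit and the scan starts at 0, entry 0 loses its bit and entry 1 is replaced.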