Mar 93 Challenge
Volume Number: 9
Issue Number: 3
Column Tag: Programmer's Challenge
Programmer's Challenge
By Mike Scanlin, MacTech Magazine Regular Contributing Author
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Count Unique Words
Most word processors these days have a Count Words command. The accuracy and speed of these commands vary quite a bit. I tested three leading word processors with a document containing 124,829 characters and got three different answers, ranging from 18,446 words to 18,886 words, and times ranging from 4 seconds to 11 seconds. I'm not sure what the correct answer was for that document; it depends on how you define what a word is.
For purposes of this month's challenge, a word is defined as an unbroken run of one or more letters. The input text will contain only upper and lower case letters a to z, spaces, carriage returns, periods and commas (for a total of 56 possible byte values). No digits, hyphens, tabs, other punctuation, etc. Since counting words using this simplified definition is rather trivial, you're going to count the number of unique words instead.
The prototype of the function you write is:
unsigned short CountUniqueWords(textPtr, byteCount)
Ptr             textPtr;
unsigned short  byteCount;
Your function should return the number of unique words (case insensitive) in the input text. The maximum word length for individual words in the input text is 255 characters.
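To make the word definition concrete, here is a minimal, unoptimized sketch of one way to count unique words. It is an illustration only, not a reference or winning solution. It assumes a modern C compiler, substitutes const char * for the Toolbox Ptr type, and uses a hypothetical open-addressing hash table (TABLE_SIZE, slotStart, slotLen and SameWord are names invented for this sketch). Since byteCount is at most 65535 and words must be separated by at least one non-letter, there can be no more than 32768 words, so a 65536-slot table never fills up.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TABLE_SIZE 65536u   /* power of two, larger than the 32768-word maximum */

/* Case-insensitive comparison of two runs of len letters. */
static int SameWord(const char *a, const char *b, unsigned len)
{
    unsigned i;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

unsigned short CountUniqueWords(const char *textPtr, unsigned short byteCount)
{
    /* Each used slot remembers where a word's first occurrence sits in the text. */
    static unsigned       slotStart[TABLE_SIZE];  /* offset + 1; 0 means empty */
    static unsigned short slotLen[TABLE_SIZE];
    unsigned pos = 0, unique = 0;

    assert(textPtr != NULL || byteCount == 0);
    memset(slotStart, 0, sizeof slotStart);

    while (pos < byteCount) {
        unsigned start, len, hash, slot;

        /* Skip spaces, carriage returns, periods and commas. */
        while (pos < byteCount && !isalpha((unsigned char)textPtr[pos]))
            pos++;
        if (pos == byteCount)
            break;

        /* Scan one word, hashing its lower-cased letters as we go. */
        start = pos;
        hash = 5381u;
        while (pos < byteCount && isalpha((unsigned char)textPtr[pos])) {
            hash = hash * 33u + (unsigned)tolower((unsigned char)textPtr[pos]);
            pos++;
        }
        len = pos - start;

        /* Linear probing: stop at a matching word or at an empty slot. */
        slot = hash & (TABLE_SIZE - 1u);
        while (slotStart[slot] != 0) {
            if (slotLen[slot] == len &&
                SameWord(textPtr + slotStart[slot] - 1u, textPtr + start, len))
                break;
            slot = (slot + 1u) & (TABLE_SIZE - 1u);
        }
        if (slotStart[slot] == 0) {     /* first time this word is seen */
            slotStart[slot] = start + 1u;
            slotLen[slot] = (unsigned short)len;
            unique++;
        }
    }
    return (unsigned short)unique;
}
```

For example, the text "The cat. the CAT, a dog" contains six words but only four unique ones (the, cat, a, dog).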
This is the 7th Programmer's Challenge that I've posed to MacTech readers. I have received very little feedback as to what you think of these challenges. Are they too easy, too hard, too uninteresting, or what? Do you want hard-core numerical analysis puzzles (like "write a fast sqrt function") or do you want Mac-specific problems (like "write a fast TileAndStackWindows function") or are things okay as they are? If you have any ideas for future challenges, please send them in (credit will be given in this column if I use one of your ideas). Thanks.
Two Months Ago Winner
The winner of the Travelling Salesman challenge is Ronald Nepsund (Northridge, CA), whose solution was the only one of the five I received that gave correct results. The time-intensive part of solutions to this class of problems is the calculation of the distance between two points, which involves a square root. Ronald uses a precomputed sqrt table for values 0 to 25 to eliminate much of this time.
A couple of people chose the algorithm of "find the closest city to where we currently are and move to that city; repeat until all cities have been visited", which is not correct. An example set of input data that broke every solution but Ronald's is: numCities = 8, startCityIndex = 5, *citiesPtr = {1,1}, {2,1}, {3,1}, {2,2}, {1,3}, {2,3}, {3,3}, {2,4}. If you draw it and work it out by hand (through trial and error) you can see that the minimum path distance is 8.66. There is more than one correct ordering for the optimal path, but all of the optimal paths have that same length.
Here is Ronald's winning solution to the January Challenge:
//***********************************
// Travelling Salesman
// by Ronald M. Nepsund

#include <math.h>
#define fracBase 0x20000000

//There are two 32 by 20 arrays of longs
//which together give the distance between
//any two cities.
//Instead of using Array[i,j] to access
//the array Array[(i<<5)+j] is used
//and two longs are needed to accurately
//measure the distance between cities
//so two arrays of longs are used.
//gDistanceFrac is used to hold the
//fractional part of distance in 1/0x20000000
//of a unit.
long gDistanceInt[640],
     gDistanceFrac[640];

//these are used to represent a path between
//the cities.
Byte gNextCity[20],gOptPath[20];

//how long is the currently selected best
//path so far.
long gBestPathLength,gFracBestPathLength;
unsigned short gNumCities;
unsigned short gStartCityIndex;

//precalculated square root for zero to 25
long qSquTableInt[] =
    {0,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,
     4,4,4,4,4,4,4,4,4,
     5,5,5,5,5,5,5,5,5,5,5};

//precalculated square root - fractional part
//in 1/0x20000000 of a whole unit
long qSquTableFrac[]=
    {0,0,222379212,393016784,0,126738030,
     241317968,346685095,444758425,0,
     87122155,169986639,249162657,
     325102865,398174277,468679365,0,
     66091829,130266726,192682403,
     253476060,312767944,370664138,
     427258795,482635936,0 };
void DoPath(short cityIndex, long InttPathLength,
            long fracPathLength);

//The recursive routine that actually finds
//the shortest path.
void DoPath(register short cityIndex,
            long InttPathLength,
            long fracPathLength)
{
    register short i;
    Boolean lastCity;
    long offset;

    if (fracPathLength > fracBase) {
        //the fractional value variable has
        //exceeded the value of one whole
        //unit
        InttPathLength += 1;
        fracPathLength -= fracBase;
    }

    //Has the path become longer than the
    //shortest path we have already found?
    if (InttPathLength > gBestPathLength ||
        ((InttPathLength == gBestPathLength) &&
         (fracPathLength >= gFracBestPathLength)))
        return;

    //lastCity is used to tell if all the
    //cities have been visited
    lastCity = TRUE;

    //for each city
    for(i = 0; i<gNumCities; i++)
        //if not to same city or already
        //visited city
        if ( i != cityIndex &&
             gNextCity[i] == 0xFF) {
            //not at the end of the path
            lastCity = FALSE;
            //path from city cityIndex to i
            gNextCity[cityIndex] = i;
            //offset into distance arrays
            offset = (cityIndex << 5) + i;
            //go to next city adding the
            //distance to that city to the
            //path length
            DoPath(i,
                   InttPathLength+gDistanceInt[offset],
                   fracPathLength+gDistanceFrac[offset]);
        } //end if and for

    // if this is the last city in the chain and
    // is a shorter path than the previous best
    if ((lastCity) &&
        ((InttPathLength < gBestPathLength) ||
         ((InttPathLength == gBestPathLength) &&
          (fracPathLength < gFracBestPathLength))
        ) ) {
        // make this the current best path
        register long *LPnt1,*LPnt2;

        //this is the current shortest path
        //length now
        gBestPathLength = InttPathLength;
        gFracBestPathLength = fracPathLength;

        //copy path to optPath, four bytes at a time
        LPnt1 = (long *)&gNextCity;
        LPnt2 = (long *)&gOptPath;
        for (i= ((3+gNumCities) >> 2); i>0; i--)
            *LPnt2++ = *LPnt1++;
    } else
        //this city is no longer connected to
        //the next city
        gNextCity[cityIndex] = 0xFF;
}
void InitDistances(unsigned short numCities,
                   Point *citiesPtr);

//initialize two arrays which will give the
//distance between any two cities.
void InitDistances(
    unsigned short numCities,
    Point *citiesPtr)
{
    short i,j,offset;
    register long *LPntl1,*LPntF1,
                  *LPntI2,*LPntF2;
    long dist;
    short deltax,deltay;
    double X;

    //The distance from city i to j is the same
    //as from city j to i.
    //Use pointers into the arrays
    //We will add a constant to the pointers to
    //step through the array
    //instead of doing a multiplication to find
    //the wanted entries in the array

    //how far is it between any two cities
    for (i=0; i<numCities; i++){
        LPntl1 = gDistanceInt + i;
        LPntF1 = gDistanceFrac + i;
        offset = i << 5;
        LPntI2 = gDistanceInt + offset;
        LPntF2 = gDistanceFrac + offset;
        for (j=0; j<=i; j++)
            if (i==j){
                //both pointers are pointing
                //to the same locations in the array
                //distance to the same city is zero
                *LPntI2++ = 0;
                *LPntl1 = 0; LPntl1 += 32;
                *LPntF2++;
                LPntF1 += 32;
            } else {
                //calculate horizontal and vertical
                //distance between city i and j
                deltax = citiesPtr[i].h-
                         citiesPtr[j].h;
                deltay = citiesPtr[i].v-
                         citiesPtr[j].v;

                //The distance between the cities is
                // squareRoot( deltax*deltax +
                //             deltay*deltay)
                //Where you can, do multiplications
                //using shorts instead of longs -
                //They are faster.
                if (-255< deltax && deltax<256)
                    if (-255< deltay && deltay<256)
                        dist = ((long)(deltax*deltax) +
                                (long)(deltay*deltay));
                    else
                        dist = ((long)(deltax*deltax) +
                                (long)deltay*deltay);
                else
                    if (-255< deltay && deltay<256)
                        dist = ((long)deltax*deltax +
                                (long)(deltay*deltay));
                    else
                        dist = ((long)deltax*deltax +
                                (long)deltay*deltay);

                //do squareRoot
                if (dist <= 25) {
                    //use sqrt lookup tables for
                    //0 to 25
                    *LPntI2++ = *LPntl1 =
                        qSquTableInt[dist];
                    LPntl1 += 32;
                    *LPntF2++ = *LPntF1 =
                        qSquTableFrac[dist];
                    LPntF1 += 32;
                } else {
                    X = sqrt(dist);
                    //gDistanceInt[(i<<5) + j] = X;
                    //gDistanceInt[(j<<5) + i] = X;
                    //integer part of distance
                    // between points
                    dist = X;
                    *LPntl1 = *LPntI2++ = dist;
                    LPntl1 += 32;
                    //gDistanceFrac[i<<5 + j] =
                    //  (X - dist) * $20000000;
                    //gDistanceFrac[j<<5 + i] =
                    //  (X - dist) * $20000000;
                    // fractional part
                    dist = (X - dist) * fracBase;
                    *LPntF2++ = *LPntF1 = dist;
                    LPntF1 += 32;
                }
            }
    }
}
void OptimalPath(unsigned short numCities,
                 unsigned short startCityIndex,
                 Point *citiesPtr,Point *optimalPathPtr);

void OptimalPath(numCities,startCityIndex,citiesPtr,
                 optimalPathPtr)
unsigned short numCities;
unsigned short startCityIndex;
Point *citiesPtr;
Point *optimalPathPtr;
{
    register short i,j;
    long time,index;
    double X;

    //generates the tables for the distances
    //between any two cities.
    //This routine takes up most of the time.
    InitDistances(numCities,citiesPtr);

    //0xFF means that there is no path from
    //this city to another
    for (i=0; i<numCities; i++)
        //no paths between cities
        gNextCity[i] = 0xFF;

    gNumCities = numCities;
    gStartCityIndex = startCityIndex;

    //any path done by DoPath will be shorter
    //than this
    gBestPathLength = 0x7FFFFFFF;
    gFracBestPathLength = 0;

    //find the best path
    DoPath(startCityIndex,0,0);

    //put the best path into the form
    //desired for optimalPath
    j=startCityIndex;
    for(i=0; i<numCities; i++) {
        optimalPathPtr[i] = citiesPtr[j];
        j = gOptPath[j];
    }
}
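The listing carries every path length as two longs, a whole part and a fraction in units of 1/0x20000000, so that the search can add and compare lengths without floating point. The sketch below isolates that idea. FixedLen, FixedMake, FixedAdd and FixedLess are hypothetical names invented here, and the carry test uses >= where the listing uses >, so this is a restatement of the technique under those assumptions rather than a copy of Ronald's code.

```c
#include <assert.h>

#define FRAC_BASE 0x20000000L   /* one whole unit, same scale as fracBase */

typedef struct {
    long whole;   /* integer part of the length */
    long frac;    /* fractional part, 0 <= frac < FRAC_BASE */
} FixedLen;

FixedLen FixedMake(long whole, long frac)
{
    FixedLen r;
    assert(frac >= 0 && frac < FRAC_BASE);
    r.whole = whole;
    r.frac = frac;
    return r;
}

/* Add two lengths. The sum of two fractions is below 0x40000000, so it
   still fits in a signed 32-bit long before the carry is taken out. */
FixedLen FixedAdd(FixedLen a, FixedLen b)
{
    FixedLen r;
    r.whole = a.whole + b.whole;
    r.frac = a.frac + b.frac;
    if (r.frac >= FRAC_BASE) {   /* carry into the whole part */
        r.whole += 1;
        r.frac -= FRAC_BASE;
    }
    return r;
}

/* Nonzero when a is strictly shorter than b. */
int FixedLess(FixedLen a, FixedLen b)
{
    return a.whole < b.whole ||
           (a.whole == b.whole && a.frac < b.frac);
}
```

Adding 1.5 and 2.5, written as FixedMake(1, 0x10000000L) and FixedMake(2, 0x10000000L), yields a whole part of 4 and a fraction of 0.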
Rules
Here's how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners up). The prize for the best solution each month is $50 and a limited edition "The Winner! MacTech Magazine Programming Challenge" T-shirt (not to be found in stores).
In order to make fair comparisons between solutions, all solutions must be in ANSI-compatible C. All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with "ANSI Settings" plus "Honor 'register' first" and "Use Global Optimizer" turned on), so beware if you optimize for a different C compiler.
The solution and winners for this month's Programmer's Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.
All solutions should be marked "Attn: Programmer's Challenge Solution" and sent to Xplain Corporation (the publishers of MacTech Magazine) via snail mail or, preferably, e-mail - AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com, and CompuServe: 71552,174. If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on "How to Contact Xplain Corporation".
MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month and all entries are the property of MacTech Magazine upon submission. The submission falls under all the same conventions of an article submission.