Aug 96 Challenge

Volume Number:		12
Issue Number:		8
Column Tag:		Programmer’s Challenge

Programmer’s Challenge

By Bob Boonstra

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.

A-Maze-ing

Deep underground in an abandoned mine. No lantern. No rope. Not much air. Or something like a lack of air that makes you want to leave in a hurry. This month you find yourself lost in an underground maze with nothing except your trusty Challenge solution to help you quickly find your way out.

The prototype for the code you should write is:

typedef Boolean  /* found exit */ (*MoveProc) (
 long xMove,/* amount of attempted move in x direction (1, 0, or -1) */
 long yMove,/* amount of attempted move in y direction (1, 0, or -1) */
 long zMove,/* amount of attempted move in z direction (1, 0, or -1) */
 long *newXPos,  /* new x position after attempted move */
 long *newYPos,  /* new y position after attempted move */
 long *newZPos   /* new z position after attempted move */
);

Boolean /* found exit */ Maze (
 long xPos, /* x component of initial position */
 long yPos, /* y component of initial position */
 long zPos, /* z component of initial position */
 long xSize,/* size of maze in x dimension */
 long ySize,/* size of maze in y dimension */
 long zSize,/* size of maze in z dimension */
 MoveProc MakeAMove, /* callback to attempt a move */
 char *mapStorage/* one byte for each position in the maze */
);

Your Maze routine will be called with your initial position (xPos,yPos,zPos) and the address of a callback routine you can use to attempt to move within the maze. Your code should make sequential calls to the MakeAMove callback until the callback returns TRUE, indicating that you have found your way out of the maze. All exits are on the boundaries of the maze (i.e., at least one coordinate equals 0 or [xyz]Size-1). With each move, you can attempt to move up to one unit in each dimension (x, y, and/or z). The callback will return your new position (*newXPos,*newYPos,*newZPos) after the attempted move. If a passageway exists in the desired direction, the new position will be offset from your previous position by the desired amount. If no passageway exists, the new position will be the same as the previous position (and you will be slightly bruised from walking into solid rock). One slight complicating factor is the combination of darkness and gravity - if you walk into a position where a vertical shaft is open below you, the callback will return a position where the vertical (z) coordinate is reduced the distance you fell. For this reason, moves in a z-only direction are never needed, as a purely downward move will always fail (you will already have fallen), and a purely upward move would be futile (you would fall back to your previous position).

The callback will obviously know which positions in the maze are filled with rock and which are open. In the event you wish to construct a map as you grope around in darkness, the mapStorage parameter will point to xSize*ySize*zSize bytes of storage for your use, initialized to zero by the caller. Once the callback indicates that you have found an exit for the maze, you should return TRUE. If you cannot find an exit, you should return FALSE (rather than loop forever), although this should not happen, because a path to an exit will always exist.

Selection of the winning solution will be based on total execution time of the Maze routine, including the time spent in the callback. The same callback code will be used to test each solution, providing a fixed time penalty for each attempted move.

This will be a native PowerPC Challenge, using the latest CodeWarrior environment. Solutions may be coded in C, C++, or Pascal. To keep the contest focused on code efficiency (rather than compiler efficiency), all solutions will be scored using the Metrowerks compiler for the appropriate language.

Next Month - PowerPC Assembly Language

Next month is the annual assembly language Challenge, and this will be the first year that the Challenge will be in PowerPC assembly language. Depending on the difficulty of the problem, I may provide the electronic version of the problem statement a little early to allow some extra time for you to code your solutions. (See “The New Deadline”, on the previous page, for instructions on subscribing to the Challenge mailings.)

Two Months Ago Winner

Congratulations once again to Ernst Munter (Kanata, ON), for submitting the fastest entry to the Programmer’s Challenge. This Challenge called for implementation of a comparison-free distribution sort algorithm that, in theory, can execute in time that grows linearly with the number of input records. While Ernst won the Postman’s Sort Challenge by default, as the only person to submit an entry, his solution is both efficient and instructive.

The problem statement required that the solution deal with cases when memory was scarce, requiring the distribution sort being performed using temporary disk files. Ernst’s solution also took advantage of the case when ample storage was available. The table below summarizes the results for a test case of 100000 records, given available working storage of 64KB, 640KB, 1MB, and 10MB, respectively. Numbers in parentheses after Ernst’s name indicate his cumulative point total for all previous Challenges, not including this one.

time (seconds)

Name size data Memory: 64K 640K 1MB 10MB

Ernst Munter (174) 4312 1112 2314 221 201 18

I examined a few other cases to indicate the growth in sort time as a function of the number of records. Much more information would be needed for a definitive conclusion, but the growth appears to be somewhere between the O(Nlog2N) behavior of comparison sort algorithms and the theoretical best case linear behavior of a distribution sort. I’ve included the results for a case where the temporary sort files were written to a RAM disk, which clearly indicates the dominant role of disk I/O time.

time (seconds)

Scenario / Memory available	640K	1MB	10MB
20000 records	64	36	3
100000 records	221	201	18
500000 records	1912	1514	644
20000 records (RAM disk)	9	5	2

With this win, Ernst Munter takes over first place in the Programmer’s Challenge points standing. Congratulations, Ernst!

Ernst has vaulted into first place by winning four of the last five Challenge contests. To give greater recognition to contestants like Ernst who are currently participating in the Challenge, I’ve decided to revise slightly the Top Contestants table that we publish each month. From now on, the published point total will include points earned in the twenty-four most recent contests. With this new scheme, a single first-place finish is currently enough to earn you a spot in the Top Contestants table.

Top Twenty Contestants

Here are the top twenty contestants for the Programmer’s Challenge. The numbers below include points awarded over the twenty-four most recent contests, including points earned by this month’s entrants.

Rank	Name	Points	Rank	Name	Points
1.	Munter, Ernst	187	11.	Vineyard, Jeremy	22
2.	Gregg, Xan	92	12.	Cutts, Kevin	21
3.	Larsson, Gustav	87	13.	Picao, Miguel Cruz	21
4.	[Name deleted]	74	14.	Brown, Jorg	20
5.	Lengyel, Eric	40	15.	Gundrum, Eric	20
6.	Lewis, Peter	30	16.	Karsh, Bill	19
7.	Darrah, Dave	29	17.	Stenger, Allen	19
8.	Beith, Gary	24	18.	Mallett, Jeff	17
9.	Kasparian, Raffi	22	19.	Nevard, John	17
10.	Noll, Robert	22	20.	Nicolle, Ludovic	14

There are three ways to earn points: (1) scoring in the top five of any Challenge, (2) being the first person to find a bug in a published winning solution, or (3) being the first person to suggest a Challenge that I use. The points you can win are:

1st place 20 points 5th place 2 points

2nd place 10 points finding bug 2 points

3rd place 7 points suggesting Challenge 2 points

4th place 4 points

Here is Ernst’s winning solution:

PostPlus.cpp

/*
    “Postman’s Sort”

Problem Statement
-----------------
 Sort an address file, using distribution sort, and avoid any direct or disguised 
 (indirect) comparison between sort keys of the input records.

Solution
--------
 It was tempting to create a sorted index tree of the keys, and then sort input records 
 by comparing (descending) the sort tree. However, I think that would not have been 
 in the spirit of the Challenge, which was, I believe, to find a fast implementation of 
 Bob Ramey’s postman’s sort algorithm. Hence, this solution works entirely by 
 distributing records into bins, or stacks, directly indexed by a single character.

 Tabs, and end-of-record markers are converted to \x0, the range \x1 to \x1F is used 
 to sort numeric fields by their length. Characters above ‘a’ are reduced by 0x20, 
 which makes all upper and lower case characters the same. Coincidentally, it also 
 sets {|}~ equal to [\]^.

 Whether the input file fits in memory or not, the principle of sorting is the same:

 All records are parsed into nodes and attached to stacks based on the current sort 
 key, for example the first letter of country. Then each stack is recursively sorted on 
 the next sort key, i.e. the second letter of country, in the same manner.

 If this is done in-memory (MemSort) the stacks are linked and detached from the 
 global STACK array, before sorting each stack.

 If the stack was too large to fit in memory, it is sent in segments to a temporary file 
 on disk. The temp file then contains interleaved segments of stacks which we keep 
 track of in a local variable (bin).

 To continue sorting, the segments of a particular stack are retrieved from disk, and 
 the whole stack may now fit in memory for final sorting. If it does not, it is again 
 sorted into segments and sent to a new temp file on disk.

 The number of temp files needed is equal to the recursion depth of the disk sort, until 
 a stack can be sorted to the finish in memory. The more memory we have, the fewer 
 temp files are needed.

Assumptions
-----------

 tmpfile() is used to create temporary files. Depending on system setup, the system 
 may run out of files to open, and sorting will then stop.

 The order of numerical fields will be natural numerical, that is 9, 12, 30, 100. 
 However, leading zero’s are not expected (or we get 41, 80, 053, 102).

 The length of a purely numeric field to be handled correctly should not exceed 31. 
 Longer numeric fields could get sorted behind fields starting with punctuation.

 The longest record expected should be substantially less than the constant MINMEM 
 which is set to 1024, but this limit can be relaxed. It plays a role only when we are 
 very short of memory.

 All ASCII characters may occur, and will be sorted on. Any control characters other 
 than Tab (0x09) will be treated as end-of-record indicators.

 The sort is not “stable”, in the sense that the order of equivalent records (differing in 
 case only, as an example) is not preserved. Records are tossed on stacks and into 
 bins, and are frequently retrieved in reverse order. Furthermore, running the program 
 with different amounts of storage may result in a different order as well (U/L case).
*/

#include <stdlib.h>
#include <stdio.h>
#include <memory.h>
#include <string.h>

#define uchar unsigned char
#define ushort unsigned short
#define ulong unsigned long

#define NUM_KEYS 224
#define MINMEM   1024
#define MIN_BUFFER 512
#define DEF_BUFFER_SIZE 16384
#define numFields9
#define lastField(numFields-1)
#define TAB_CHAR 0x09
#define END_RECORD 0x0D

typedef struct Node Node;
struct  Node {
 uchar* ref;
 Node*  next;
 Node*  link;
 ushort fld[numFields];
 ushort len;
};

typedef struct tHeader tHeader;
struct  tHeader {
 long next;
 int  size;
 int  key;
};

typedef struct Bin Bin;
struct  Bin {
 FILE* tf;//temp file
 long next; //next segment in temp file
 int  bytesLeft; //in current segment
 long fpos[NUM_KEYS];
};

typedef struct OutputBuffer OutputBuffer;
struct  OutputBuffer {
 int  space;//space left
 int  size; //size of buffer
 uchar c[4];//will be c[size]
};

static  FILE*    theOutFile;
static  OutputBuffer*buffer;
static  int topKey;
static  Node*    STACK[NUM_KEYS];

Macros and prototypes
// Output buffer functions:
// The output buffer accumulates characters to be sent to the output file. The block 
// size is chosen as 16K, but reduced to 8K if less than 128K of storage is available. 
// The buffer isreleased if, after many recursions storage space is severely reduced and 
// we need the memory for sorting.

void  FlushBuffer();
intOpenBuffer(uchar* storage,int storageSize);
void  CloseBuffer();
void  BufferedWrite(uchar* text,int size);
void  BufferedSend(Node* node);

// in-memory sort and parsing functions:

Node* Link(int* numeric);
intTestNumeric(int key,uchar* text,int len);
void  Dist(Node* node,int field,int offset,int test);
void  MemSort(int field,int offset);
intParse(uchar* text,Node* node,
 int size,int field,int offset,int test);

// SStream functions:
// SStreams are virtual files which are interleaved on a single disk file.
// Each SStream may consist of several segments, linked together through small
// headers (tHeader) on the disk. The last segment is pointed to from a “bin”.
// Bins administer access to these streams.

void  InitBin(Bin* bin);
intOpenBin(Bin* bin);
void  ResetBin(Bin* bin);
void  CloseBin(Bin* bin);
intSendToTemp(Bin* bin);
intSegmentSize(Node* node);
void  Send(Node* node,FILE* f);
intSStreamSeek(Bin* bin,int key,long pos);
intSStreamRead(uchar* text0,int size,Bin* bin);
long  SStreamSize(Bin* bin,int key);
void  SendSStreamOut(uchar* text,int storageSize,Bin* bin,int 
 key);

// The DiskSort function is called from PostmansSort, and calls itself recursively. 
// With each recursion, the calling Bin structure must be preserved which reduces the 
// storage available for the called DiskSort function. This wastage is somewhat 
// controlled by recovering the unused part of the Bin structure, since keys in the high 
// ASCII ranges would frequently not be needed.

// Bins could also have been put on the procedure stack, but this could put a more 
// severe limit on recursion depth. (about 1K per Bin), depending on the environment.

intRecoverMemory();
intDiskSort(Bin* inBin, uchar* storage, int storageSize,
 int field, int offset);

#define QUIT(rc) {CloseBin(bin); return rc;}
#define EMPTY_BIN(bin,key) (bin->fpos[key]<0)

/* Ernst included shortCut functions used to detect the case of a single key in a stack 
  of records. These functions were not used in the evaluation, and have been deleted
  for length. They can be found in the online version of the solution. [JRB] */

PostmansSort
pascal OSErr PostmansSort(
 FILE *inFile,   /* stream containing sort input */
 FILE *outFile,  /* stream to contain sort output */
 void *privateStorage,  /* preallocated storage for your use */
 size_t storageSize/* number of bytes of privateStorage */
) {
uchar*  storage=(uchar*)privateStorage;
Bin*  bin=(Bin*)storage;
uchar*  text;
Node* node;

long  fileSize,shortage,curPos;
inttextSize,recovered,rc;
intnextField=lastField;

// Setup storage:

 text=(uchar*)(bin+1);
 theOutFile=outFile;
 storageSize-=OpenBuffer(storage,storageSize);
 storageSize-=sizeof(Bin);
 storageSize-=sizeof(Node);
 if (storageSize<MINMEM) return 10;

// Determine size of input file and read as much as possible

 curPos=ftell(inFile);
 fseek(inFile,0,2);
 fileSize=ftell(inFile);
 fseek(inFile,curPos,0);
 textSize=fread(text,1,storageSize,inFile);

// Parse input data. Nodes grow from the top of storage
// down into the text, and may cause some of it to be reloaded

 node=(Node*)(text+storageSize);
 topKey=0;
 shortage=Parse(text,node,textSize,nextField,0,1);
 curPos=textSize-shortage;

 if (fileSize-curPos==0) {//everything fit

 MemSort(nextField,0);

 } else { //distribute into bins

 InitBin(bin);
 do {
 if (0!=(rc=SendToTemp(bin))) QUIT(rc);
 fseek(inFile,curPos,0);
 textSize=fread(text,1,storageSize,inFile);
 node=(Node*)(text+storageSize)-1;
 shortage=Parse(text,node,textSize,nextField,0,1);
 curPos+=textSize-shortage;

 } while(curPos<fileSize);
 if (0!=(SendToTemp(bin))) QUIT(rc);
 {
 recovered=RecoverMemory();
 if (0!=(rc=DiskSort(bin,text-recovered,
 storageSize+recovered,nextField,0))) 
 QUIT(rc);
 }

 CloseBin(bin);
 }

 FlushBuffer();
 return 0;
}

InitBin
void InitBin(Bin* bin) {
 memset(bin->fpos,-1,sizeof(bin->fpos));
 bin->next=0;
 bin->bytesLeft=0;
 bin->tf=0;
}

OpenBin
int OpenBin(Bin* bin) {
 if (bin->tf==0) {
 if (0==(bin->tf=tmpfile()))
 return 5;// unable to open temp file
 }
 return 0;
}

ResetBin
void ResetBin(Bin* bin) {
 memset(bin->fpos,-1,sizeof(bin->fpos));
 bin->next=0;
 bin->bytesLeft=0;
 if (bin->tf) fseek(bin->tf,0,0);
}

CloseBin
void CloseBin(Bin* bin) {
 if (bin->tf) {
 fclose(bin->tf);
 }
}

Parse
int Parse(uchar* text, Node* node, int size, int field, 
 int offset,int test) {

// returns the shortage, i.e. the difference between the end of the loaded text and the 
// end of the last record parsed.  The shortage is caused by the fact that we load text 
// up to the storage limit, but then overwrite some of it by the parsed nodes which
// grow downwards from the top. In this way we don’t have to guess at the size of a 
// typical record.

uchar*  record=text;
intxfield=0;
uchar*  textEnd=text+size;
 while (text<(uchar*)node) {
 int c=*text++;
 if (c<' ') {
 if (c==TAB_CHAR) {

// mark off those fields
 xfield++;
 if (xfield>lastField) return 4;
 node->fld[xfield]=(ushort)(text-record);
 } else {

// any other CTL-char is treated as END_RECORD:
 int key=*(record+node->fld[field]+offset);

// fill in fields of node and hook to STACK:
 node->ref=record;
 node->link=0;
 node->fld[0]=0;
 node->len=(ushort)(text-record);

// key is the character in the current key position modified to
// 0 if end of field,
// lower case becomes upper case
// length of field if numeric field detected

 if (node->len==1) key=0;
 else if (key>='a') key-=0x20;
 else if (key<' ') key=0;
 else if (test)
 key=TestNumeric(key,node->ref+node->fld[field],
 node->fld[field+1]-node->fld[field]-1);

// keep track of the highest key value used, for future economies.

 node->next=STACK[key];
 if (key>topKey) topKey=key;
 STACK[key]=node--;
 record=text;
 xfield=0;
 }
 }
 if (text>=textEnd)
 return 0;//no shortage
 }

 return textEnd-record;
}

MemSort
static mms;

void MemSort(int field,int offset) {
int endField=(int)STACK[0],numeric=0;
Node* node;

 mms++;
 node=Link(&numeric);
 while (node) {
 Node* nextNode=node->link;
 if (0==node->next) {//no need to sort a single record
 BufferedSend(node);
 endField=0;
 } else if (endField) { //go to next field
 int nextField=field-1;
 endField=0;
 if ((field==7)&&(offset>0)) nextField-=2;
 Dist(node,nextField,0,1);
 MemSort(nextField,0);
 } else if (numeric) {  //test same key position for value
 numeric--;
 Dist(node,field,offset,0);
 MemSort(field,offset);
 } else { //test next key position
 Dist(node,field,offset+1,0);
 MemSort(field,offset+1);
 }
 node=nextNode;
 }
}

Link
Node* Link(int* numeric) {
// detaches stacks under the same key from STACK[] and
// links their first nodes together
Node* node=0;
Node* n;
int key;
 for (key=topKey;key>=0;key--) {
 if (0 != (n=STACK[key])) {
 if ((key<' ') && (key>0)) (*numeric)++;
 n->link=node;
 node=n;
 STACK[key]=0;
 }
 }
 topKey=0;
 return node;
}

Dist
void Dist(Node* node,int field,int offset,int test) {
// scan all nodes in a stack and redistribute them
// back on STACK[] depending on the next key value.
 do {
 Node*  nextNode;
 int    key=*(node->ref+node->fld[field]+offset);
 if (key>='a') key-=0x20;
 else if (key<' ') {
 key=0;
 if (field==0) { //sorting done for this node
 nextNode=node->next;
 node->next=0;
 BufferedSend(node);
 node=nextNode;
 continue;
 }
 } else if (test) {
 key=TestNumeric(key,node->ref+node->fld[field],
 node->fld[field+1]-node->fld[field]-1);
 }
 nextNode=node->next;
 if (key>topKey) topKey=key;
 node->next=STACK[key];
 STACK[key]=node;
 node=nextNode;
 } while (node);
}

TestNumeric
int TestNumeric(int key,uchar* text,int len) {
// returns the field length of a numeric field, otherwise returns key unchanged
int i;
 for (i=0;i<len;i++) {
 int c=*text;
 if (c>'9') return key;
 if (c<'0') return key;
 text++;
 }
 return len;
}

DiskSort
int DiskSort(
 Bin* inBin,
 uchar* storage,
 int storageSize,
 int field,
 int offset) {

// recursively callable version of PostmansSort()

Bin*  bin=(Bin*)storage;
uchar*  text;
Node* node;

long  fileSize;
inttextSize;
long  shortage;
long  curPos;

intkey;
intrc;
intrecovered;
inttopKeyIn=topKey;

 InitBin(bin);
 storageSize-=sizeof(Bin);
 storageSize-=sizeof(Node);
 if (storageSize<MINMEM) {
 if (buffer) {
 storageSize+=buffer->size+sizeof(OutputBuffer)-4;
 CloseBuffer();
 if (storageSize<MINMEM) return 8;
 } else return 9;
 }
 text=(uchar*)(bin+1);
 for (key=0;key<=topKeyIn;key++) {
 int nextField,nextOffset,numericTest;
 if (EMPTY_BIN(inBin,key)) continue;
 if (0==(fileSize=SStreamSize(inBin,key))) continue;

 nextField=field;
 nextOffset=offset;

 topKey=0;
 if (key==0) {   //end of field reached

 if ((nextField==7)&&(nextOffset>0)) nextField=4;

 else if (nextField==0) { //copy whole stream
 SendSStreamOut(text,storageSize,bin,0);
 continue;

 } else nextField--;

 nextOffset=0;
 numericTest=1;
 } else if (key<' ') {  //start of a numeric field
 nextOffset=0;
 numericTest=0;
 } else { //a typical field
 nextOffset++;
 numericTest=0;
 }

 SStreamSeek(inBin,key,0);
 textSize=SStreamRead(text,storageSize,inBin);
 node=(Node*)(text+storageSize);
 shortage=Parse(text,node,textSize,nextField,nextOffset,
 numericTest);
 curPos=textSize-shortage;

 if (fileSize-curPos==0) {//sort in memory

 MemSort(nextField,nextOffset);

 } else { //sort in temp files

 do {
 if (0!=(rc=SendToTemp(bin))) QUIT(rc);
 SStreamSeek(inBin,key,curPos);
 textSize=SStreamRead(text,storageSize,inBin);
 node=(Node*)(text+storageSize)-1;
 shortage=Parse(
 text,node,textSize,
 nextField,nextOffset, numericTest);
 curPos+=textSize-shortage;

 } while(curPos<fileSize);
 if (0!=(rc=SendToTemp(bin))) QUIT(rc);
 {
 recovered=RecoverMemory();
 if (0!=(rc=DiskSort(
 bin, text-recovered,
 storageSize+recovered,
 nextField,nextOffset))) 
 QUIT(rc);
 }

 ResetBin(bin);
 }
 }
 CloseBin(bin);
 return 0;
}

SendToTemp
static sst;

int SendToTemp(Bin* bin) {
// send each segment (stack of equal key records) to disk,
// bin->fpos[key] points to the start of the last segment
// stored. The segment header points to the previous
// segment or -1 if this is the first.

int key,rc;
Node* node;
 if (0!=(rc=OpenBin(bin))) return rc;
 for (key=0;key<NUM_KEYS;key++) {
 if (0 != (node=STACK[key])) {
 tHeader th;
 long pos;
 STACK[key]=0;
 th.size=SegmentSize(node);
 th.next=bin->fpos[key];
 th.key=key+('#'<<8);
 pos=ftell(bin->tf);
 fwrite(&th,1,sizeof(tHeader),bin->tf);
 sst++;
 bin->fpos[key]=pos;
 Send(node,bin->tf);
 }
 }
 return 0;
}

SegmentSize
int SegmentSize(Node* node) {
// The length of a segment needs to be known for the header, before we store the text 
// records. This functions scans through all nodes of a stack and adds up their lengths.
int s=0;
 do {
 s+=node->len;
 node=node->next;
 } while(node);
 return s;
}

Send
void Send(Node* node,FILE* f) {
// Sends a stack to disk, either the temp file, or
// the output file in the case of unbuffered output.
 do {
 fwrite(node->ref,1,node->len,f);
 node=node->next;
 } while(node);
}

SStreamSeek
int SStreamSeek(Bin* bin,int key,long pos) {
// returns error=7 if no SStream, or if seeked beyond size
long fpos=bin->fpos[key];
tHeader th;
 if (0==bin->tf) return 6;
 while (fpos>=0) {
 fseek(bin->tf,fpos,0);
 fread(&th,1,sizeof(th),bin->tf);
 if (pos<th.size) {
 fseek(bin->tf,pos,1);
 bin->bytesLeft=th.size-pos;
 bin->next=th.next;
 return 0;
 } else {
 pos-=th.size;
 fpos=th.next;
 }
 }
 return 7;
}

SStreamRead
int SStreamRead(uchar* text0,int size,Bin* bin) {
// reads from current position reached with SStreamSeek()
// returns the actual number of bytes read
long fpos;
tHeader th;
int bytes2read;
uchar* text=text0;
 if (bin->bytesLeft) {
 if (bin->bytesLeft>size) bytes2read=size;
 else bytes2read=bin->bytesLeft;
 fread(text,1,bytes2read,bin->tf);
 text+=bytes2read; //even if fread comes short
 size-=bytes2read;
 }
 fpos=bin->next;
 while ((size) && (fpos>=0)) {
 fseek(bin->tf,fpos,0);
 fread(&th,1,sizeof(th),bin->tf);
 if (th.size>size) bytes2read=size;
 else bytes2read=th.size;
 fread(text,1,bytes2read,bin->tf);
 text+=bytes2read; //even if fread comes short
 size-=bytes2read;
 fpos=th.next;
 }
 return text-text0;
}

SStreamSize
long SStreamSize(Bin* bin,int key) {
// determines the total SStream size by adding up the sizes of all its segments.
long fpos=bin->fpos[key];
tHeader th;
long size=0;
 while (fpos>=0) {
 fseek(bin->tf,fpos,0);
 fread(&th,1,sizeof(th),bin->tf);
 size+=th.size;
 fpos=th.next;
 }
 return size;
}

SendSStreamOut
void SendSStreamOut(uchar* text,int storageSize,
 Bin* bin,int key) {
// function to copy an entire SStream to the output file, which
// can happen when there is a large number of identical records.
long fpos=bin->fpos[key];
tHeader th;
 while (fpos>=0) {
 fseek(bin->tf,fpos,0);
 fread(&th,1,sizeof(th),bin->tf);
 while(th.size){
 int textSize=storageSize;
 if (textSize>th.size) textSize=th.size;
 fread(text,1,textSize,bin->tf);
 BufferedWrite(text,textSize);
 th.size-=textSize;
 }
 fpos=th.next;
 }
}

RecoverMemory
int RecoverMemory() {
// unused top part of the current bin
 return (NUM_KEYS-topKey-1)*sizeof(long);
}

FlushBuffer
void FlushBuffer() {
int bytes2send=buffer->size-buffer->space;
 if (bytes2send) {
 fwrite(buffer->c,1,bytes2send,theOutFile);
 }
 buffer->space=buffer->size;
}

OpenBuffer
int OpenBuffer(uchar* storage,int storageSize) {
// allocates 1/8 of available memory, up to a maximum of
// DEF_BUFFER_SIZE as output buffer for outFile

int maxSize=storageSize>>3;
int bufferSize=DEF_BUFFER_SIZE;
 while (bufferSize>maxSize) bufferSize>>=1;
 if (bufferSize>=MIN_BUFFER) {
 buffer=(OutputBuffer*)(storage+storageSize-
 (bufferSize+sizeof(OutputBuffer)-4));
 buffer->size=bufferSize;
 buffer->space=bufferSize;
 return bufferSize+sizeof(OutputBuffer)-4;
 } else return 0;
}

CloseBuffer
void CloseBuffer() {
 FlushBuffer();
 buffer=0;
}

BufferedWrite
void BufferedWrite(uchar* text,int size) {
 if (buffer) {
 if (size > buffer->space) {
 int n1=buffer->space;
 int n2=size-buffer->space;
 memcpy(buffer->c+buffer->size-buffer->space,text,n1);
 buffer->space=0;
 FlushBuffer();
 BufferedWrite(text+n1,n2);
 } else {
 memcpy(buffer->c+buffer->size-buffer->space,
 text,size);
 buffer->space-=size;
 }
 } else {
 fwrite(text,1,size,theOutFile);
 }
}

BufferedSend
void BufferedSend(Node* node) {
 if (buffer) {
 do {
 BufferedWrite(node->ref,node->len);
 node=node->next;
 } while(node);
 } else Send(node,theOutFile);
}

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine