Aug 96 Challenge
Volume Number: | | 12
|
Issue Number: | | 8
|
Column Tag: | | Programmers Challenge
|
Programmers Challenge
By Bob Boonstra
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
A-Maze-ing
Deep underground in an abandoned mine. No lantern. No rope. Not much air. Or something like a lack of air that makes you want to leave in a hurry. This month you find yourself lost in an underground maze with nothing except your trusty Challenge solution to help you quickly find your way out.
The prototype for the code you should write is:
typedef Boolean /* found exit */ (*MoveProc) (
long xMove,/* amount of attempted move in x direction (1, 0, or -1) */
long yMove,/* amount of attempted move in y direction (1, 0, or -1) */
long zMove,/* amount of attempted move in z direction (1, 0, or -1) */
long *newXPos, /* new x position after attempted move */
long *newYPos, /* new y position after attempted move */
long *newZPos /* new z position after attempted move */
);
Boolean /* found exit */ Maze (
long xPos, /* x component of initial position */
long yPos, /* y component of initial position */
long zPos, /* z component of initial position */
long xSize,/* size of maze in x dimension */
long ySize,/* size of maze in y dimension */
long zSize,/* size of maze in z dimension */
MoveProc MakeAMove, /* callback to attempt a move */
char *mapStorage/* one byte for each position in the maze */
);
Your Maze routine will be called with your initial position (xPos,yPos,zPos) and the address of a callback routine you can use to attempt to move within the maze. Your code should make sequential calls to the MakeAMove callback until the callback returns TRUE, indicating that you have found your way out of the maze. All exits are on the boundaries of the maze (i.e., at least one coordinate equals 0 or [xyz]Size-1). With each move, you can attempt to move up to one unit in each dimension (x, y, and/or z). The callback will return your new position (*newXPos,*newYPos,*newZPos) after the attempted move. If a passageway exists in the desired direction, the new position will be offset from your previous position by the desired amount. If no passageway exists, the new position will be the same as the previous position (and you will be slightly bruised from walking into solid rock). One slight complicating factor is the combination of darkness and gravity - if you walk into a position where a vertical shaft is open below you, the callback will return a position where the vertical (z) coordinate is reduced the distance you fell. For this reason, moves in a z-only direction are never needed, as a purely downward move will always fail (you will already have fallen), and a purely upward move would be futile (you would fall back to your previous position).
The callback will obviously know which positions in the maze are filled with rock and which are open. In the event you wish to construct a map as you grope around in darkness, the mapStorage parameter will point to xSize*ySize*zSize bytes of storage for your use, initialized to zero by the caller. Once the callback indicates that you have found an exit for the maze, you should return TRUE. If you cannot find an exit, you should return FALSE (rather than loop forever), although this should not happen, because a path to an exit will always exist.
Selection of the winning solution will be based on total execution time of the Maze routine, including the time spent in the callback. The same callback code will be used to test each solution, providing a fixed time penalty for each attempted move.
This will be a native PowerPC Challenge, using the latest CodeWarrior environment. Solutions may be coded in C, C++, or Pascal. To keep the contest focused on code efficiency (rather than compiler efficiency), all solutions will be scored using the Metrowerks compiler for the appropriate language.
Next Month - PowerPC Assembly Language
Next month is the annual assembly language Challenge, and this will be the first year that the Challenge will be in PowerPC assembly language. Depending on the difficulty of the problem, I may provide the electronic version of the problem statement a little early to allow some extra time for you to code your solutions. (See The New Deadline, on the previous page, for instructions on subscribing to the Challenge mailings.)
Two Months Ago Winner
Congratulations once again to Ernst Munter (Kanata, ON), for submitting the fastest entry to the Programmers Challenge. This Challenge called for implementation of a comparison-free distribution sort algorithm that, in theory, can execute in time that grows linearly with the number of input records. While Ernst won the Postmans Sort Challenge by default, as the only person to submit an entry, his solution is both efficient and instructive.
The problem statement required that the solution deal with cases when memory was scarce, requiring the distribution sort being performed using temporary disk files. Ernsts solution also took advantage of the case when ample storage was available. The table below summarizes the results for a test case of 100000 records, given available working storage of 64KB, 640KB, 1MB, and 10MB, respectively. Numbers in parentheses after Ernsts name indicate his cumulative point total for all previous Challenges, not including this one.
time (seconds)
Name size data Memory: 64K 640K 1MB 10MB
Ernst Munter (174) 4312 1112 2314 221 201 18
I examined a few other cases to indicate the growth in sort time as a function of the number of records. Much more information would be needed for a definitive conclusion, but the growth appears to be somewhere between the O(Nlog2N) behavior of comparison sort algorithms and the theoretical best case linear behavior of a distribution sort. Ive included the results for a case where the temporary sort files were written to a RAM disk, which clearly indicates the dominant role of disk I/O time.
time (seconds)
Scenario / Memory available | 640K | 1MB | 10MB
|
20000 records | 64 | 36 | 3
|
100000 records | 221 | 201 | 18
|
500000 records | 1912 | 1514 | 644
|
20000 records (RAM disk) | 9 | 5 | 2
|
With this win, Ernst Munter takes over first place in the Programmers Challenge points standing. Congratulations, Ernst!
Ernst has vaulted into first place by winning four of the last five Challenge contests. To give greater recognition to contestants like Ernst who are currently participating in the Challenge, Ive decided to revise slightly the Top Contestants table that we publish each month. From now on, the published point total will include points earned in the twenty-four most recent contests. With this new scheme, a single first-place finish is currently enough to earn you a spot in the Top Contestants table.
Top Twenty Contestants
Here are the top twenty contestants for the Programmers Challenge. The numbers below include points awarded over the twenty-four most recent contests, including points earned by this months entrants.
Rank | Name | Points | Rank | Name | Points
|
1. | Munter, Ernst | 187 | 11. | Vineyard, Jeremy | 22
|
2. | Gregg, Xan | 92 | 12. | Cutts, Kevin | 21
|
3. | Larsson, Gustav | 87 | 13. | Picao, Miguel Cruz | 21
|
4. | [Name deleted] | 74 | 14. | Brown, Jorg | 20
|
5. | Lengyel, Eric | 40 | 15. | Gundrum, Eric | 20
|
6. | Lewis, Peter | 30 | 16. | Karsh, Bill | 19
|
7. | Darrah, Dave | 29 | 17. | Stenger, Allen | 19
|
8. | Beith, Gary | 24 | 18. | Mallett, Jeff | 17
|
9. | Kasparian, Raffi | 22 | 19. | Nevard, John | 17
|
10. | Noll, Robert | 22 | 20. | Nicolle, Ludovic | 14
|
There are three ways to earn points: (1) scoring in the top five of any Challenge, (2) being the first person to find a bug in a published winning solution, or (3) being the first person to suggest a Challenge that I use. The points you can win are:
1st place 20 points 5th place 2 points
2nd place 10 points finding bug 2 points
3rd place 7 points suggesting Challenge 2 points
4th place 4 points
Here is Ernsts winning solution:
PostPlus.cpp
copyright line:Copyright © 1996 Ernst Munter, Kanata, ON, Canada
/*
Postmans Sort
Problem Statement
-----------------
Sort an address file, using distribution sort, and avoid any direct or disguised
(indirect) comparison between sort keys of the input records.
Solution
--------
It was tempting to create a sorted index tree of the keys, and then sort input records
by comparing (descending) the sort tree. However, I think that would not have been
in the spirit of the Challenge, which was, I believe, to find a fast implementation of
Bob Rameys postmans sort algorithm. Hence, this solution works entirely by
distributing records into bins, or stacks, directly indexed by a single character.
Tabs, and end-of-record markers are converted to \x0, the range \x1 to \x1F is used
to sort numeric fields by their length. Characters above a are reduced by 0x20,
which makes all upper and lower case characters the same. Coincidentally, it also
sets {|}~ equal to [\]^.
Whether the input file fits in memory or not, the principle of sorting is the same:
All records are parsed into nodes and attached to stacks based on the current sort
key, for example the first letter of country. Then each stack is recursively sorted on
the next sort key, i.e. the second letter of country, in the same manner.
If this is done in-memory (MemSort) the stacks are linked and detached from the
global STACK array, before sorting each stack.
If the stack was too large to fit in memory, it is sent in segments to a temporary file
on disk. The temp file then contains interleaved segments of stacks which we keep
track of in a local variable (bin).
To continue sorting, the segments of a particular stack are retrieved from disk, and
the whole stack may now fit in memory for final sorting. If it does not, it is again
sorted into segments and sent to a new temp file on disk.
The number of temp files needed is equal to the recursion depth of the disk sort, until
a stack can be sorted to the finish in memory. The more memory we have, the fewer
temp files are needed.
Assumptions
-----------
tmpfile() is used to create temporary files. Depending on system setup, the system
may run out of files to open, and sorting will then stop.
The order of numerical fields will be natural numerical, that is 9, 12, 30, 100.
However, leading zeros are not expected (or we get 41, 80, 053, 102).
The length of a purely numeric field to be handled correctly should not exceed 31.
Longer numeric fields could get sorted behind fields starting with punctuation.
The longest record expected should be substantially less than the constant MINMEM
which is set to 1024, but this limit can be relaxed. It plays a role only when we are
very short of memory.
All ASCII characters may occur, and will be sorted on. Any control characters other
than Tab (0x09) will be treated as end-of-record indicators.
The sort is not stable, in the sense that the order of equivalent records (differing in
case only, as an example) is not preserved. Records are tossed on stacks and into
bins, and are frequently retrieved in reverse order. Furthermore, running the program
with different amounts of storage may result in a different order as well (U/L case).
*/
#include <stdlib.h>
#include <stdio.h>
#include <memory.h>
#include <string.h>
#define uchar unsigned char
#define ushort unsigned short
#define ulong unsigned long
#define NUM_KEYS 224
#define MINMEM 1024
#define MIN_BUFFER 512
#define DEF_BUFFER_SIZE 16384
#define numFields9
#define lastField(numFields-1)
#define TAB_CHAR 0x09
#define END_RECORD 0x0D
typedef struct Node Node;
struct Node {
uchar* ref;
Node* next;
Node* link;
ushort fld[numFields];
ushort len;
};
typedef struct tHeader tHeader;
struct tHeader {
long next;
int size;
int key;
};
typedef struct Bin Bin;
struct Bin {
FILE* tf;//temp file
long next; //next segment in temp file
int bytesLeft; //in current segment
long fpos[NUM_KEYS];
};
typedef struct OutputBuffer OutputBuffer;
struct OutputBuffer {
int space;//space left
int size; //size of buffer
uchar c[4];//will be c[size]
};
static FILE* theOutFile;
static OutputBuffer*buffer;
static int topKey;
static Node* STACK[NUM_KEYS];
Macros and prototypes
// Output buffer functions:
// The output buffer accumulates characters to be sent to the output file. The block
// size is chosen as 16K, but reduced to 8K if less than 128K of storage is available.
// The buffer isreleased if, after many recursions storage space is severely reduced and
// we need the memory for sorting.
void FlushBuffer();
intOpenBuffer(uchar* storage,int storageSize);
void CloseBuffer();
void BufferedWrite(uchar* text,int size);
void BufferedSend(Node* node);
// in-memory sort and parsing functions:
Node* Link(int* numeric);
intTestNumeric(int key,uchar* text,int len);
void Dist(Node* node,int field,int offset,int test);
void MemSort(int field,int offset);
intParse(uchar* text,Node* node,
int size,int field,int offset,int test);
// SStream functions:
// SStreams are virtual files which are interleaved on a single disk file.
// Each SStream may consist of several segments, linked together through small
// headers (tHeader) on the disk. The last segment is pointed to from a bin.
// Bins administer access to these streams.
void InitBin(Bin* bin);
intOpenBin(Bin* bin);
void ResetBin(Bin* bin);
void CloseBin(Bin* bin);
intSendToTemp(Bin* bin);
intSegmentSize(Node* node);
void Send(Node* node,FILE* f);
intSStreamSeek(Bin* bin,int key,long pos);
intSStreamRead(uchar* text0,int size,Bin* bin);
long SStreamSize(Bin* bin,int key);
void SendSStreamOut(uchar* text,int storageSize,Bin* bin,int
key);
// The DiskSort function is called from PostmansSort, and calls itself recursively.
// With each recursion, the calling Bin structure must be preserved which reduces the
// storage available for the called DiskSort function. This wastage is somewhat
// controlled by recovering the unused part of the Bin structure, since keys in the high
// ASCII ranges would frequently not be needed.
// Bins could also have been put on the procedure stack, but this could put a more
// severe limit on recursion depth. (about 1K per Bin), depending on the environment.
intRecoverMemory();
intDiskSort(Bin* inBin, uchar* storage, int storageSize,
int field, int offset);
#define QUIT(rc) {CloseBin(bin); return rc;}
#define EMPTY_BIN(bin,key) (bin->fpos[key]<0)
/* Ernst included shortCut functions used to detect the case of a single key in a stack
of records. These functions were not used in the evaluation, and have been deleted
for length. They can be found in the online version of the solution. [JRB] */
PostmansSort
pascal OSErr PostmansSort(
FILE *inFile, /* stream containing sort input */
FILE *outFile, /* stream to contain sort output */
void *privateStorage, /* preallocated storage for your use */
size_t storageSize/* number of bytes of privateStorage */
) {
uchar* storage=(uchar*)privateStorage;
Bin* bin=(Bin*)storage;
uchar* text;
Node* node;
long fileSize,shortage,curPos;
inttextSize,recovered,rc;
intnextField=lastField;
// Setup storage:
text=(uchar*)(bin+1);
theOutFile=outFile;
storageSize-=OpenBuffer(storage,storageSize);
storageSize-=sizeof(Bin);
storageSize-=sizeof(Node);
if (storageSize<MINMEM) return 10;
// Determine size of input file and read as much as possible
curPos=ftell(inFile);
fseek(inFile,0,2);
fileSize=ftell(inFile);
fseek(inFile,curPos,0);
textSize=fread(text,1,storageSize,inFile);
// Parse input data. Nodes grow from the top of storage
// down into the text, and may cause some of it to be reloaded
node=(Node*)(text+storageSize);
topKey=0;
shortage=Parse(text,node,textSize,nextField,0,1);
curPos=textSize-shortage;
if (fileSize-curPos==0) {//everything fit
MemSort(nextField,0);
} else { //distribute into bins
InitBin(bin);
do {
if (0!=(rc=SendToTemp(bin))) QUIT(rc);
fseek(inFile,curPos,0);
textSize=fread(text,1,storageSize,inFile);
node=(Node*)(text+storageSize)-1;
shortage=Parse(text,node,textSize,nextField,0,1);
curPos+=textSize-shortage;
} while(curPos<fileSize);
if (0!=(SendToTemp(bin))) QUIT(rc);
{
recovered=RecoverMemory();
if (0!=(rc=DiskSort(bin,text-recovered,
storageSize+recovered,nextField,0)))
QUIT(rc);
}
CloseBin(bin);
}
FlushBuffer();
return 0;
}
InitBin
void InitBin(Bin* bin) {
memset(bin->fpos,-1,sizeof(bin->fpos));
bin->next=0;
bin->bytesLeft=0;
bin->tf=0;
}
OpenBin
int OpenBin(Bin* bin) {
if (bin->tf==0) {
if (0==(bin->tf=tmpfile()))
return 5;// unable to open temp file
}
return 0;
}
ResetBin
void ResetBin(Bin* bin) {
memset(bin->fpos,-1,sizeof(bin->fpos));
bin->next=0;
bin->bytesLeft=0;
if (bin->tf) fseek(bin->tf,0,0);
}
CloseBin
void CloseBin(Bin* bin) {
if (bin->tf) {
fclose(bin->tf);
}
}
Parse
int Parse(uchar* text, Node* node, int size, int field,
int offset,int test) {
// returns the shortage, i.e. the difference between the end of the loaded text and the
// end of the last record parsed. The shortage is caused by the fact that we load text
// up to the storage limit, but then overwrite some of it by the parsed nodes which
// grow downwards from the top. In this way we dont have to guess at the size of a
// typical record.
uchar* record=text;
intxfield=0;
uchar* textEnd=text+size;
while (text<(uchar*)node) {
int c=*text++;
if (c<' ') {
if (c==TAB_CHAR) {
// mark off those fields
xfield++;
if (xfield>lastField) return 4;
node->fld[xfield]=(ushort)(text-record);
} else {
// any other CTL-char is treated as END_RECORD:
int key=*(record+node->fld[field]+offset);
// fill in fields of node and hook to STACK:
node->ref=record;
node->link=0;
node->fld[0]=0;
node->len=(ushort)(text-record);
// key is the character in the current key position modified to
// 0 if end of field,
// lower case becomes upper case
// length of field if numeric field detected
if (node->len==1) key=0;
else if (key>='a') key-=0x20;
else if (key<' ') key=0;
else if (test)
key=TestNumeric(key,node->ref+node->fld[field],
node->fld[field+1]-node->fld[field]-1);
// keep track of the highest key value used, for future economies.
node->next=STACK[key];
if (key>topKey) topKey=key;
STACK[key]=node--;
record=text;
xfield=0;
}
}
if (text>=textEnd)
return 0;//no shortage
}
return textEnd-record;
}
MemSort
static mms;
void MemSort(int field,int offset) {
int endField=(int)STACK[0],numeric=0;
Node* node;
mms++;
node=Link(&numeric);
while (node) {
Node* nextNode=node->link;
if (0==node->next) {//no need to sort a single record
BufferedSend(node);
endField=0;
} else if (endField) { //go to next field
int nextField=field-1;
endField=0;
if ((field==7)&&(offset>0)) nextField-=2;
Dist(node,nextField,0,1);
MemSort(nextField,0);
} else if (numeric) { //test same key position for value
numeric--;
Dist(node,field,offset,0);
MemSort(field,offset);
} else { //test next key position
Dist(node,field,offset+1,0);
MemSort(field,offset+1);
}
node=nextNode;
}
}
Link
Node* Link(int* numeric) {
// detaches stacks under the same key from STACK[] and
// links their first nodes together
Node* node=0;
Node* n;
int key;
for (key=topKey;key>=0;key--) {
if (0 != (n=STACK[key])) {
if ((key<' ') && (key>0)) (*numeric)++;
n->link=node;
node=n;
STACK[key]=0;
}
}
topKey=0;
return node;
}
Dist
void Dist(Node* node,int field,int offset,int test) {
// scan all nodes in a stack and redistribute them
// back on STACK[] depending on the next key value.
do {
Node* nextNode;
int key=*(node->ref+node->fld[field]+offset);
if (key>='a') key-=0x20;
else if (key<' ') {
key=0;
if (field==0) { //sorting done for this node
nextNode=node->next;
node->next=0;
BufferedSend(node);
node=nextNode;
continue;
}
} else if (test) {
key=TestNumeric(key,node->ref+node->fld[field],
node->fld[field+1]-node->fld[field]-1);
}
nextNode=node->next;
if (key>topKey) topKey=key;
node->next=STACK[key];
STACK[key]=node;
node=nextNode;
} while (node);
}
TestNumeric
int TestNumeric(int key,uchar* text,int len) {
// returns the field length of a numeric field, otherwise returns key unchanged
int i;
for (i=0;i<len;i++) {
int c=*text;
if (c>'9') return key;
if (c<'0') return key;
text++;
}
return len;
}
DiskSort
int DiskSort(
Bin* inBin,
uchar* storage,
int storageSize,
int field,
int offset) {
// recursively callable version of PostmansSort()
Bin* bin=(Bin*)storage;
uchar* text;
Node* node;
long fileSize;
inttextSize;
long shortage;
long curPos;
intkey;
intrc;
intrecovered;
inttopKeyIn=topKey;
InitBin(bin);
storageSize-=sizeof(Bin);
storageSize-=sizeof(Node);
if (storageSize<MINMEM) {
if (buffer) {
storageSize+=buffer->size+sizeof(OutputBuffer)-4;
CloseBuffer();
if (storageSize<MINMEM) return 8;
} else return 9;
}
text=(uchar*)(bin+1);
for (key=0;key<=topKeyIn;key++) {
int nextField,nextOffset,numericTest;
if (EMPTY_BIN(inBin,key)) continue;
if (0==(fileSize=SStreamSize(inBin,key))) continue;
nextField=field;
nextOffset=offset;
topKey=0;
if (key==0) { //end of field reached
if ((nextField==7)&&(nextOffset>0)) nextField=4;
else if (nextField==0) { //copy whole stream
SendSStreamOut(text,storageSize,bin,0);
continue;
} else nextField--;
nextOffset=0;
numericTest=1;
} else if (key<' ') { //start of a numeric field
nextOffset=0;
numericTest=0;
} else { //a typical field
nextOffset++;
numericTest=0;
}
SStreamSeek(inBin,key,0);
textSize=SStreamRead(text,storageSize,inBin);
node=(Node*)(text+storageSize);
shortage=Parse(text,node,textSize,nextField,nextOffset,
numericTest);
curPos=textSize-shortage;
if (fileSize-curPos==0) {//sort in memory
MemSort(nextField,nextOffset);
} else { //sort in temp files
do {
if (0!=(rc=SendToTemp(bin))) QUIT(rc);
SStreamSeek(inBin,key,curPos);
textSize=SStreamRead(text,storageSize,inBin);
node=(Node*)(text+storageSize)-1;
shortage=Parse(
text,node,textSize,
nextField,nextOffset, numericTest);
curPos+=textSize-shortage;
} while(curPos<fileSize);
if (0!=(rc=SendToTemp(bin))) QUIT(rc);
{
recovered=RecoverMemory();
if (0!=(rc=DiskSort(
bin, text-recovered,
storageSize+recovered,
nextField,nextOffset)))
QUIT(rc);
}
ResetBin(bin);
}
}
CloseBin(bin);
return 0;
}
SendToTemp
static sst;
int SendToTemp(Bin* bin) {
// send each segment (stack of equal key records) to disk,
// bin->fpos[key] points to the start of the last segment
// stored. The segment header points to the previous
// segment or -1 if this is the first.
int key,rc;
Node* node;
if (0!=(rc=OpenBin(bin))) return rc;
for (key=0;key<NUM_KEYS;key++) {
if (0 != (node=STACK[key])) {
tHeader th;
long pos;
STACK[key]=0;
th.size=SegmentSize(node);
th.next=bin->fpos[key];
th.key=key+('#'<<8);
pos=ftell(bin->tf);
fwrite(&th,1,sizeof(tHeader),bin->tf);
sst++;
bin->fpos[key]=pos;
Send(node,bin->tf);
}
}
return 0;
}
SegmentSize
int SegmentSize(Node* node) {
// The length of a segment needs to be known for the header, before we store the text
// records. This functions scans through all nodes of a stack and adds up their lengths.
int s=0;
do {
s+=node->len;
node=node->next;
} while(node);
return s;
}
Send
void Send(Node* node,FILE* f) {
// Sends a stack to disk, either the temp file, or
// the output file in the case of unbuffered output.
do {
fwrite(node->ref,1,node->len,f);
node=node->next;
} while(node);
}
SStreamSeek
int SStreamSeek(Bin* bin,int key,long pos) {
// returns error=7 if no SStream, or if seeked beyond size
long fpos=bin->fpos[key];
tHeader th;
if (0==bin->tf) return 6;
while (fpos>=0) {
fseek(bin->tf,fpos,0);
fread(&th,1,sizeof(th),bin->tf);
if (pos<th.size) {
fseek(bin->tf,pos,1);
bin->bytesLeft=th.size-pos;
bin->next=th.next;
return 0;
} else {
pos-=th.size;
fpos=th.next;
}
}
return 7;
}
SStreamRead
int SStreamRead(uchar* text0,int size,Bin* bin) {
// reads from current position reached with SStreamSeek()
// returns the actual number of bytes read
long fpos;
tHeader th;
int bytes2read;
uchar* text=text0;
if (bin->bytesLeft) {
if (bin->bytesLeft>size) bytes2read=size;
else bytes2read=bin->bytesLeft;
fread(text,1,bytes2read,bin->tf);
text+=bytes2read; //even if fread comes short
size-=bytes2read;
}
fpos=bin->next;
while ((size) && (fpos>=0)) {
fseek(bin->tf,fpos,0);
fread(&th,1,sizeof(th),bin->tf);
if (th.size>size) bytes2read=size;
else bytes2read=th.size;
fread(text,1,bytes2read,bin->tf);
text+=bytes2read; //even if fread comes short
size-=bytes2read;
fpos=th.next;
}
return text-text0;
}
SStreamSize
long SStreamSize(Bin* bin,int key) {
// determines the total SStream size by adding up the sizes of all its segments.
long fpos=bin->fpos[key];
tHeader th;
long size=0;
while (fpos>=0) {
fseek(bin->tf,fpos,0);
fread(&th,1,sizeof(th),bin->tf);
size+=th.size;
fpos=th.next;
}
return size;
}
SendSStreamOut
void SendSStreamOut(uchar* text,int storageSize,
Bin* bin,int key) {
// function to copy an entire SStream to the output file, which
// can happen when there is a large number of identical records.
long fpos=bin->fpos[key];
tHeader th;
while (fpos>=0) {
fseek(bin->tf,fpos,0);
fread(&th,1,sizeof(th),bin->tf);
while(th.size){
int textSize=storageSize;
if (textSize>th.size) textSize=th.size;
fread(text,1,textSize,bin->tf);
BufferedWrite(text,textSize);
th.size-=textSize;
}
fpos=th.next;
}
}
RecoverMemory
int RecoverMemory() {
// unused top part of the current bin
return (NUM_KEYS-topKey-1)*sizeof(long);
}
FlushBuffer
void FlushBuffer() {
int bytes2send=buffer->size-buffer->space;
if (bytes2send) {
fwrite(buffer->c,1,bytes2send,theOutFile);
}
buffer->space=buffer->size;
}
OpenBuffer
int OpenBuffer(uchar* storage,int storageSize) {
// allocates 1/8 of available memory, up to a maximum of
// DEF_BUFFER_SIZE as output buffer for outFile
int maxSize=storageSize>>3;
int bufferSize=DEF_BUFFER_SIZE;
while (bufferSize>maxSize) bufferSize>>=1;
if (bufferSize>=MIN_BUFFER) {
buffer=(OutputBuffer*)(storage+storageSize-
(bufferSize+sizeof(OutputBuffer)-4));
buffer->size=bufferSize;
buffer->space=bufferSize;
return bufferSize+sizeof(OutputBuffer)-4;
} else return 0;
}
CloseBuffer
void CloseBuffer() {
FlushBuffer();
buffer=0;
}
BufferedWrite
void BufferedWrite(uchar* text,int size) {
if (buffer) {
if (size > buffer->space) {
int n1=buffer->space;
int n2=size-buffer->space;
memcpy(buffer->c+buffer->size-buffer->space,text,n1);
buffer->space=0;
FlushBuffer();
BufferedWrite(text+n1,n2);
} else {
memcpy(buffer->c+buffer->size-buffer->space,
text,size);
buffer->space-=size;
}
} else {
fwrite(text,1,size,theOutFile);
}
}
BufferedSend
void BufferedSend(Node* node) {
if (buffer) {
do {
BufferedWrite(node->ref,node->len);
node=node->next;
} while(node);
} else Send(node,theOutFile);
}