Random Access Files
Volume Number: 2
Issue Number: 9
Column Tag: Basic School
By Dave Kelly, MacTutor Editorial Board
There are two types of data files that can be created and used by your MS Basic program: sequential files and random access files. Sequential files are used more often because they are easy to create, but random access files are more flexible and data can be located faster. A discussion of sequential file I/O operation begins on page 45 of your MS Basic manual (ver. 2.0 or greater). Random Access File I/O starts on page 48. Before we begin our discussion of random access file I/O, I suggest that you refer to those pages.
The purpose of this column is to help you develop an understanding of random access I/O and how to use it in your own programs. It is very easy to understand how data is structured in sequential files. It requires more work to organize a random access file. The organization of the random access file is up to you. I'll try to outline some steps you can use to help organize your file.
First, you should decide just what data you have to store. For example, if you were setting up a mail list database you would need one field each for name, address, city-state, and zipcode. Next decide how many characters will be allowed for each field (25 for name, 30 for address, 25 for city-state, and 5 for zipcode). The total length of an individual record would then be 85 characters.
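As a rough sketch, the mail list record described above could be laid out like this. The file name "Mail List" and the field variable names are only illustrations and do not come from the sample programs.

```basic
' Mail list record layout: 25 + 30 + 25 + 5 = 85 bytes per record
' "Mail List" and the four field names are hypothetical
OPEN "Mail List" AS #1 LEN=85
FIELD #1,25 AS NameFld$,30 AS AddrFld$,25 AS CityFld$,5 AS ZipFld$
PRINT "Record length is";25+30+25+5;"bytes"
CLOSE #1
```

The LEN= value in the OPEN statement should match the sum of the field widths, so it is worth adding them up before writing any data.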
Now decide how many individual records you expect to have in the file. If you don't require too many records and don't expect to ever expand the file, a sequential file may be suitable. This is especially true if you have a lot of RAM to work with and a comparatively small data file. There are some advantages and disadvantages to using a sequential file this way. With a sequential file, all records are read into memory so the disk is only accessed once. The program can then operate on the data much faster than if it had to access the disk for each record. However, if the data has been changed at all, the entire file has to be written back to the disk or the changes will be lost. In the event of a power failure or some other system crash, a random access file would contain all the changes already written to it, but a sequential file would not. Generally, as files get larger, they are better handled by random access methods. A large sequential file could take quite a bit of time just to read from and write to the disk.
Next you should consider how you want to access each record of your random access file. You may want to be able to search for a name or sort the file by zip code. A long and tedious way to do this would be to read through each and every record until the desired record is found. If the program knows exactly which record to read, the access time can be reduced significantly. One way to do this is to create an index file. For example, if you want to find a specific record and you know the contents of one of its fields, you can look in the index file to find the matching field and its record number. For a mail list database you might set up an index file containing all of the names and the record numbers corresponding to the names. Index files may be sequential or random access (for relational databases) but should contain as few fields as possible to optimize data access time. If the index is sequential, it should be kept in memory and updated as the random file is updated.
Figure 1
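Here is a minimal sketch of a sequential index held in memory. It assumes a hypothetical index file named "Mail Index", written earlier with WRITE # so that each entry holds a name followed by its record number. The 500-entry limit, the file names and all variable names are made up for illustration.

```basic
' Load a sequential index into memory, then use it to fetch one record.
' "Mail Index", "Mail List" and the 500-entry limit are assumptions.
DIM IdxName$(500),IdxRec%(500)
Count%=0
OPEN "Mail Index" FOR INPUT AS #2
WHILE NOT EOF(2) AND Count%<500
  Count%=Count%+1
  INPUT #2,IdxName$(Count%),IdxRec%(Count%)
WEND
CLOSE #2
' The random access file itself (85-byte mail list records)
OPEN "Mail List" AS #1 LEN=85
FIELD #1,25 AS NameFld$,30 AS AddrFld$,25 AS CityFld$,5 AS ZipFld$
' Search the in-memory index instead of reading every record from disk
Wanted$="John Smith"
Found%=0
FOR k%=1 TO Count%
  IF IdxName$(k%)=Wanted$ THEN Found%=IdxRec%(k%)
NEXT k%
IF Found%>0 THEN GET #1,Found%: PRINT NameFld$;" ";ZipFld$ ELSE PRINT "Not in index"
CLOSE #1
```

Only one disk access is needed once the record number is known. A sorted index would allow a faster search than the simple loop shown here.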
If indexes are used, some thought must be given to updating and changing the index file. If a record is to be deleted, you might want to delete its index entry, thus removing any reference to the random access file record. This leaves a record available for the later addition of a new record (if you keep track of which records have been deleted). If your file isn't expected to change very much, you may not mind the wasted space taken up by the deleted record. Ideally, you should keep track of the locations of deleted records so that they can be reused when new records are added. Another way to get rid of the wasted records (if you don't want to go to the trouble of keeping track of the deleted records) is to write a program to do "Garbage Collection".
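One possible way to keep track of deleted records is a simple list of free record numbers. The sketch below is only an illustration: the array size, the variable names and the pretend starting count of 10 records are all assumptions.

```basic
' Keep a list of deleted record numbers so they can be reused.
' All names and the limit of 100 are hypothetical.
DIM FreeRec%(100)
FreeCount%=0: LastRec%=10     ' pretend the file already holds 10 records
Victim%=4: GOSUB DeleteRecord
GOSUB NextFreeRecord: PRINT "New record goes in slot";NewRec%
GOSUB NextFreeRecord: PRINT "New record goes in slot";NewRec%
END
DeleteRecord:
' Remember the slot (the index entry would be removed here as well)
IF FreeCount%<100 THEN FreeCount%=FreeCount%+1: FreeRec%(FreeCount%)=Victim%
RETURN
NextFreeRecord:
' Reuse a deleted slot if one exists, otherwise extend the file
IF FreeCount%>0 THEN NewRec%=FreeRec%(FreeCount%): FreeCount%=FreeCount%-1: RETURN
LastRec%=LastRec%+1: NewRec%=LastRec%
RETURN
```

With these starting values the first new record should land in slot 4 (the deleted one) and the second in slot 11. In a real program the free list would have to be saved to disk along with the index.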
Fig. 2 Garbage Collection
A "Garbage Collection" program reads all undeleted records and writes them to a new file. You only have to do "Garbage Collection" when a lot of records have been deleted and you need more space to add new records. "Garbage Collection" might be ok to use if it is automatically performed (with no user intervention). It is NOT desirable for the user of your program to have to keep track of this kind of file handling (when to collect garbage and when not to).
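A garbage collection pass might look something like the sketch below. It rests on several assumptions that are not part of the sample programs: the file names, an 85-byte record, a convention that a deleted record has CHR$(0) in its first byte, and LOF returning the file length in bytes.

```basic
' Copy every undeleted record to a new file, closing up the gaps.
' Assumption: a deleted record is marked with CHR$(0) in its first byte.
OPEN "Mail List" AS #1 LEN=85
FIELD #1,85 AS OldRec$
OPEN "Mail List New" AS #2 LEN=85
FIELD #2,85 AS NewRec$
Total=INT(LOF(1)/85)          ' number of records in the old file
Kept=0
FOR r=1 TO Total
  GET #1,r
  IF ASC(OldRec$)<>0 THEN Kept=Kept+1: LSET NewRec$=OldRec$: PUT #2,Kept
NEXT r
CLOSE #1,#2
' Replace the old file with the compacted copy
KILL "Mail List"
NAME "Mail List New" AS "Mail List"
PRINT Kept;"of";Total;"records kept"
```

Because the surviving records are renumbered, any index file would have to be rebuilt after a pass like this.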
When a record is added to the data file, a new index entry should be created and the new record should be added to the random access file (either as a new record or replacing a previously deleted record). If an existing record is edited and changed, the index file should be updated accordingly. You may want to sort the index file before writing it to the disk. Be sure to save the index file before quitting the program.
Now let's take a look at how the random access file is structured. When you open a file in Basic, a buffer is allocated for each file opened. For random access files the buffer should be set equal to the length of one record (the default buffer size is 128 bytes). It is through this buffer that Basic reads from and writes to the disk. To help you understand what a random access file "looks like", let's create a sample file to examine. The Random Access File program included with this column will create a sample random access file that we can analyze. It creates a random file named "Sample RA File" with a record length of 64 bytes. One advantage of MS Basic random access files is that they require less room on the disk, since Basic stores the numbers in them in a packed binary format. Sequential files are stored as a series of ASCII characters.
To convert numbers to the packed binary format we must use the MKI$, MKS$ and MKD$ functions. To convert the strings back to numbers we must use the CVI, CVS and CVD functions. These are fairly easy to remember if you think of the MK as MaKe and the CV as ConVert. Thus if we want to store an integer number we use MKI$ to MaKe an Integer string and use CVI to ConVert the Integer back again. The sample file shows an example of how to use these MaKe and ConVert functions for integers, single precision and double precision numbers.
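A short experiment along these lines shows the round trip without involving a file at all. The variable name Packed$ is just an illustration; the values are the ones used in the sample program.

```basic
' MaKe a packed string from a number, then ConVert it back
Packed$=MKI$(5)
PRINT "Length of MKI$ string:";LEN(Packed$)
PRINT "Byte values:";ASC(MID$(Packed$,1,1));ASC(MID$(Packed$,2,1))
PRINT "Converted back with CVI:";CVI(Packed$)
' Single and double precision strings are longer
PRINT "MKS$ length:";LEN(MKS$(32769!))
PRINT "MKD$ length:";LEN(MKD$(123456789#))
```

The lengths should come out as 2, 4 and 8 bytes, and the integer 5 should show up as the two bytes 0 and 5, the same pair that appears at the start of the sample file's record.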
As I already mentioned, when the file is opened a buffer is allocated (in this case the length of all the fields is 64). The fields that we want to use must be memory mapped to the buffer area. This is accomplished with the FIELD statement. You may use as many FIELD statements as you like; however, each FIELD statement starts defining its fields at the beginning of the buffer. If you define all your fields on one line (one FIELD statement) then you won't have any problem, but if you have more fields than you want to put in one statement then you will want to use a second FIELD statement. The trick (which the manual does not show you how to do) is to define a dummy variable with the cumulative length of all the previously defined fields before defining your next field. In the sample program the first FIELD statement defines three number fields with a total of 14 bytes. (Integer fields are converted to 2 bytes, single precision to 4 bytes, and double precision to 8 bytes.) In the second FIELD statement a dummy string marks the first part of the buffer, which has already been defined, so that the next field will begin after the previously defined fields. If you didn't know to do this, you could get some strange effects when you read your file back, as the field definitions would overlap.
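The same trick extends to any number of FIELD statements. As a sketch, the 85-byte mail list record from earlier in the column could be split over three statements like this (the file name and all variable names, including the dummy strings Skip1$ and Skip2$, are hypothetical):

```basic
' Each FIELD statement starts again at the beginning of the buffer,
' so a dummy string skips over everything already defined.
OPEN "Mail List" AS #1 LEN=85
FIELD #1,25 AS NameFld$,30 AS AddrFld$
FIELD #1,55 AS Skip1$,25 AS CityFld$     ' 55 = 25 + 30
FIELD #1,80 AS Skip2$,5 AS ZipFld$       ' 80 = 25 + 30 + 25
CLOSE #1
```

The dummy lengths are running totals, so a change to any field width means every later dummy length has to be adjusted too.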
The next important thing that the program must do is to put our data into the buffer so it can be written to a record on the disk. This is accomplished with the LSET or RSET statements. LSET will left justify the string within the defined field length (the string may actually be shorter than the space the field has available). The RSET statement will right justify the string within the field. Every field must be set into the buffer with one of these statements. You should use different variables for the fields in the buffer than the ones you use to manipulate your data. Be sure that you don't use a defined field variable in an INPUT or LET type statement. This will redefine the location that the variable points to (we want it to point to the buffer area). When a record is read from the disk, all the fields defined in the buffer area will contain the data stored on the disk for that record. You only have to reset those fields that you want to change. All the rest of the fields will be left untouched until you read another record into the buffer or set a new value into the field.
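The difference between working variables and field variables can be sketched as follows. CustName$ and CustZip$ are ordinary working variables, while NameFld$ and ZipFld$ live in the buffer; all of these names, and the file name, are assumptions made for this example.

```basic
OPEN "Mail List" AS #1 LEN=85
FIELD #1,25 AS NameFld$,30 AS AddrFld$,25 AS CityFld$,5 AS ZipFld$
' Collect data in ordinary variables, never in the field variables
INPUT "Name";CustName$
INPUT "Zip code";CustZip$
' Move the data into the buffer
LSET NameFld$=CustName$      ' left justified, padded on the right
RSET ZipFld$=CustZip$        ' right justified, padded on the left
LSET AddrFld$="": LSET CityFld$=""
PUT #1,1
CLOSE #1
```

Writing INPUT NameFld$ or NameFld$=CustName$ instead of LSET would detach NameFld$ from the buffer, and the data would never reach the disk.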
To store a record to disk use PUT [# ]filenumber [,record-number ]. To read a record from the disk use GET [# ]filenumber [,record-number ]. The PUT and GET statements write and read the entire record in the buffer. You use PUT after you use the MaKe string functions and use the ConVert functions after using GET. You can find more information on PUT and GET in your Basic manual (pages 220 and 146). Run the sample program to create a random access file we can examine.
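Putting GET and PUT together, changing one field of an existing record might look like the sketch below. The record number 3, the new zip code, the file name and the field names are all made up for illustration.

```basic
' Read a record, change one field, and write the record back
OPEN "Mail List" AS #1 LEN=85
FIELD #1,25 AS NameFld$,30 AS AddrFld$,25 AS CityFld$,5 AS ZipFld$
GET #1,3                     ' fills every field from record 3
LSET ZipFld$="90210"         ' only the zip code is replaced
PUT #1,3                     ' the other fields go back unchanged
CLOSE #1
```

Only the changed field needs an LSET, because GET has already loaded the rest of the record into the buffer.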
The second program included with this column is a random access utility that I developed to analyze the data stored in a random access file. Programs like this have saved me from a lot of problems in the past. With utilities like this one I have been able to repair damaged random access files and determine what buggy random access programs were doing.
The utility program opens with a menu which will allow you to open your file. Choosing Open from the File menu brings up the standard getfile dialog box from which you can choose the file you want to examine. (You should choose "Sample RA File" for this example.) Next, the program asks for the length of the random access file record. If you wrote the program that created the file you should have this available; if you don't know what it is, you can guess. The sample file has 64-byte records, so enter 64 for the length (then click OK).
The File menu now has an active item named Edit (this may be confusing - it is NOT the Edit menu). Selecting Edit from the File menu will bring up a prompt for the record number you want to read. Enter a '1' to read record number one (the sample file only has one record) and click OK. Next the record is read into the buffer and displayed on the screen. The first EDIT FIELD displays the record as it looks. Note that some of the ASCII characters are invisible and can't be seen in the EDIT FIELD. The second EDIT FIELD shows the equivalent ASCII codes of the record. Here the invisible characters can be seen (for example, a '0' is a null character). Either of these two fields can be modified or examined as you like.
The hardest thing to analyze is the numbers which have been converted to strings with the MaKe statements. To make this somewhat easier (though not foolproof) the program provides a way to convert your numbers from strings to numbers and numbers to strings to see how these ConVert/MaKe statements work. The third EDIT FIELD provides the way to enter the number or string to be converted. For example, enter a 5 in the field and select MKI$(integer) Convert from the Convert menu. The integer 5 will be converted to the packed binary format string. Note that the first field stored by our sample file is '0, 5' which was the two byte string made from the integer 5 (see the sample program if you don't follow this). The converted string has been placed in the third EDIT FIELD. The characters there are invisible (0 and 5 ASCII do not print). If you select CVI(string) Convert (2-bytes) from the Convert menu, the string will be converted back to the integer equivalent and displayed.
The rest is up to you as to what you want to do with the utility. It is possible to modify data in the random record by typing the change in one of the first two EDIT FIELDs. Then select the button at the top of the window to write the record. When you select 'OK', the EDIT FIELD which is active (the EDIT FIELD in which the cursor is blinking) will be stored in place of the record. It is possible to convert a number in the Convert EDIT FIELD, then COPY the contents of the EDIT FIELD and PASTE it into the text in the first EDIT FIELD. It may be somewhat difficult to COPY/PASTE invisible characters (because you can't see them to select them), although it is possible. I recommend that you display the converted ASCII equivalent, enter the ASCII codes into the second EDIT FIELD and save the record to the disk.
That's all there is on random access files. Hopefully the utility will help you learn some things about random access by experimentation. Any questions may be directed to me via MacTutor.
' Random Access File
' ©MacTutor 1986
' This program creates a sample Random Access File
Integer%=5: Single!=32769!: Double#=123456789#
Title$="MacTutor, The Macintosh Programming Journal"
OPEN "Sample RA File" AS #1 LEN=64
FIELD #1,2 AS I$,4 AS S$,8 AS D$
FIELD #1,14 AS Dummy$,50 AS T$
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT "Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
WRIT: PRINT"We will now save them to record 1 (record length=64)."
LSET I$=MKI$(Integer%)
LSET S$=MKS$(Single!)
LSET D$=MKD$(Double#)
LSET T$=Title$
PUT #1,1
CLOSE #1
PRINT"Now clear all variables... and print them:"
Integer%=0:Single!=0:Double#=0:Title$=""
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT "Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
PRINT "Now read them back again..."
OPEN "Sample RA File" AS #1 LEN=64
FIELD #1,2 AS I$,4 AS S$,8 AS D$ , 50 AS T$
GET #1,1
LET Integer%=CVI(I$)
LET Single!=CVS(S$)
LET Double#=CVD(D$)
LET Title$=T$
PRINT"Now close the file and print them all..."
CLOSE #1
TEXTFACE(1)
PRINT "Our Variables are: Integer%=";Integer%;"Single!=";Single!
PRINT"Double#=";Double#
PRINT "Title$=";Title$
TEXTFACE(0)
END
' Professor Mac's Random Access Utility
' ©MacTutor 1986
' By Dave Kelly
OPTION BASE 1
DEFINT a-z
WINDOW 1,"",(2,25)-(510,335),3
GOSUB WindowHeader
Recordnumber=1
MENU 1,0,1,"File"
MENU 1,1,1,"Open"
MENU 1,2,0,"Close"
MENU 1,3,0,"Edit"
MENU 1,4,1,"Quit"
MENU 3,0,0,""
MENU 4,0,0,""
MENU 5,0,0,""
False=0: True= NOT False
Fileopen = False
ON MENU GOSUB MenuEvent
MENU ON
WaitForEvent: GOTO WaitForEvent
MenuEvent:
MenuNumber = MENU(0)
MenuItem = MENU(1):MENU
ON MenuNumber GOSUB Filemenu,Editmenu,Convertmenu
RETURN
Filemenu:
ON MenuItem GOSUB OpenFile,CloseFile,FindRecord,Quititem
RETURN
Editmenu:
RETURN
WindowHeader:
TEXTFONT(2):TEXTSIZE(14):TEXTFACE(1)
LOCATE 1,15:PRINT"Random Access Utility"
TEXTSIZE(12):TEXTFACE(0)
RETURN
Quititem:
IF Fileopen = True THEN GOSUB CloseFile
MENU RESET
WINDOW CLOSE 1
END
OpenFile:
Filename$=FILES$(1)
IF Filename$="" THEN GOSUB WindowHeader: RETURN
LOCATE 4,1:PRINT" Enter the length of your Random Access File:"
GOSUB WindowHeader
EDIT FIELD 1,"128",(300,48)-(350,63),1,1
BUTTON 1,1,"OK",(315,130)-(365,180)
GOSUB Loop
Recordlength=VAL(EDIT$(1))
IF Recordlength >32767 OR Recordlength <=0 THEN GOTO OpenFile
BUTTON CLOSE 1
EDIT FIELD CLOSE 1:CLS
OPEN Filename$ AS #1 LEN=Recordlength
FIELD #1,Recordlength AS Random$
Setup:
GOSUB WindowHeader
Fileopen=True
MENU 1,1,0
MENU 1,2,1
MENU 1,3,1
RETURN
CloseFile:
Fileopen=False
MENU 1,1,1
MENU 1,2,0
MENU 1,3,0
CLOSE #1
IF MenuItem <>4 THEN GOSUB WindowHeader
RETURN
GetRecord:
IF Recordnumber=0 THEN PRINT "Record # 0 does not exist":RETURN
GET #1,Recordnumber
R$=Random$
RETURN
StoreRecord:
LSET Random$=R$
PUT #1,Recordnumber
RETURN
FindRecord:
CLS
LOCATE 4,1:PRINT"Enter Record Number to find:"
EDIT FIELD 1,STR$(Recordnumber),(200,48)-(250,63),1,1
BUTTON 1,1,"OK",(315,130)-(365,180)
GOSUB Loop
Recordnumber=VAL(EDIT$(1))
LOCATE 5,1
IF Recordnumber<1 OR Recordnumber > 16777215# THEN PRINT "Number out of range":BEEP:FOR i=1 TO 100:NEXT:GOTO FindRecord
GOSUB GetRecord
EDIT FIELD CLOSE 1
EditRecord:
MENU ON
CLS:GOSUB WindowHeader
BUTTON CLOSE 1
GOSUB DecodeASCII
PRINT "Current Record is #";Recordnumber
LOCATE 17,1:PRINT "Conversion string:"
TEXTFONT(4)
EDIT FIELD 3,"",(10,280)-(90,295),2,1
EDIT FIELD 2,ASCII$,(10,130)-(485,250),1,1
EDIT FIELD 1,R$,(10,40)-(485,125),2,1
TEXTFONT(2)
BUTTON 1,1,"OK",(450,255)-(500,305)
BUTTON 2,1,"Write record after Edit",(275,22)-(450,37),3:b2=False
MENU 3,0,1,"Convert"
MENU 3,1,1,"CVI(string) Convert (2-bytes)" 'Convert 2-byte string
MENU 3,2,1,"CVS(string) Convert (4-bytes)" 'Convert 4-byte string
MENU 3,3,1,"CVD(string) Convert (8-bytes)" 'Convert 8-byte string
MENU 3,4,1,"MKI$(integer) Convert" 'Convert integer
MENU 3,5,1,"MKS$(single-precision) Convert" 'Convert single-precision
MENU 3,6,1,"MKD$(double-precision) Convert" 'Convert double-precision
i=1
EditLoop:
d=DIALOG(0)
IF d=1 THEN buttonpushed=DIALOG(1):IF buttonpushed=1 THEN EditDone ELSE GOSUB Switch
IF d=2 THEN i=DIALOG(2)
IF d=6 AND i=1 THEN EditDone
IF d=7 THEN i=(i MOD 2)+1:EDIT FIELD i
GOTO EditLoop
EditDone:
R$=EDIT$(1)
IF i=2 THEN ASCII$=EDIT$(2): GOSUB EncodeASCII
IF b2 = True THEN GOSUB StoreRecord
EDIT FIELD CLOSE 1
EDIT FIELD CLOSE 2
EDIT FIELD CLOSE 3
BUTTON CLOSE 1:BUTTON CLOSE 2
MENU 3,0,0,""
CLS:GOSUB WindowHeader
RETURN
Convertmenu:
x#=FRE(0)
MENU OFF:TEXTFONT(4)
Convert$=EDIT$(3)
LOCATE 17,18:PRINT STRING$(35," ")
LOCATE 18,18:PRINT STRING$(35," "):LOCATE 17,18
ON MenuItem GOSUB CVIconvert, CVSconvert, CVDconvert, MKIconvert, MKSconvert,MKDconvert
MENU ON:TEXTFONT(2)
RETURN
CVIconvert:
IF LEN(Convert$)<>2 THEN PRINT"Can't convert"; LEN(Convert$);"bytes.":RETURN
IntNumber%=CVI(Convert$)
PRINT "CVI(";CHR$(34);Convert$;CHR$(34);")=";IntNumber%
RETURN
CVSconvert:
IF LEN(Convert$)<>4 THEN PRINT"Can't convert"; LEN(Convert$);"bytes.":RETURN
SingleNumber!=CVS(Convert$)
PRINT "CVS(";CHR$(34);Convert$;CHR$(34);")=";SingleNumber!
RETURN
CVDconvert:
IF LEN(Convert$)<>8 THEN PRINT"Can't convert"; LEN(Convert$);"bytes.":RETURN
DoubleNumber#=CVD(Convert$)
PRINT "CVD(";Convert$;")=";DoubleNumber#
RETURN
MKIconvert:
IF VAL(Convert$)<-32767 OR VAL(Convert$)>32767 THEN PRINT "Number too big!":RETURN
IntNumber%=VAL(Convert$)
NewConvert$=MKI$(IntNumber%)
EDIT FIELD 3,NewConvert$,(10,280)-(90,295)
PRINT "MKI$(";Convert$;")= ASCII:";
PRINT USING " ###";ASC(MID$(NewConvert$,1,1)), ASC(MID$(NewConvert$,2,1))
RETURN
MKSconvert:
' Lower bound mirrors the upper bound so that negative numbers are accepted
IF VAL(Convert$)<-3.3999E+38 OR VAL(Convert$)>3.3999E+38 THEN PRINT "Number too big!":RETURN
SingleNumber!=VAL(Convert$)
NewConvert$=MKS$(SingleNumber!)
EDIT FIELD 3,NewConvert$,(10,280)-(90,295)
PRINT "MKS$(";Convert$;")= ASCII:";
PRINT USING " ###";ASC(MID$(NewConvert$,1,1)), ASC(MID$(NewConvert$,2,1));
PRINT USING " ###";ASC(MID$(NewConvert$,3,1)), ASC(MID$(NewConvert$,4,1))
RETURN
MKDconvert:
' Lower bound mirrors the upper bound so that negative numbers are accepted
IF VAL(Convert$)<-1.789999D+308 OR VAL(Convert$)>1.789999D+308 THEN PRINT "Number too big!":RETURN
DoubleNumber#=VAL(Convert$)
NewConvert$=MKD$(DoubleNumber#)
EDIT FIELD 3,NewConvert$,(10,280)-(90,295)
PRINT "MKD$(x)= ASCII:";
PRINT USING " ###";ASC(MID$(NewConvert$,1,1)), ASC(MID$(NewConvert$,2,1));
PRINT USING " ###";ASC(MID$(NewConvert$,3,1)), ASC(MID$(NewConvert$,4,1))
LOCATE 18,33
PRINT USING " ###";ASC(MID$(NewConvert$,5,1)), ASC(MID$(NewConvert$,6,1));
PRINT USING " ###";ASC(MID$(NewConvert$,7,1)), ASC(MID$(NewConvert$,8,1))
RETURN
Switch:
b2=NOT b2
IF b2=True THEN BUTTON 2,2 ELSE BUTTON 2,1
RETURN
Loop:
d=DIALOG(0)
IF d=1 THEN Done
IF d=6 THEN Done
GOTO Loop
Done:
RETURN
DecodeASCII:
ASCII$=""
FOR i=1 TO Recordlength
ASCIInum$=STR$(ASC(MID$(R$,i,1)))+","
IF LEN(ASCIInum$)=2 THEN ASCIInum$=ASCIInum$
IF LEN(ASCIInum$)=3 THEN ASCIInum$=ASCIInum$
ASCII$=ASCII$+ASCIInum$
NEXT i
RETURN
EncodeASCII:
R$="":commaposition=1
FOR i=1 TO Recordlength
commaplace=INSTR(commaposition,ASCII$,",")
ASCIInum$=MID$(ASCII$,commaposition,commaplace-commaposition) 'take only the digits before the comma
commaposition=commaplace+1
R$=R$+CHR$(VAL(ASCIInum$))
NEXT i
RETURN
Basic Compiler Update News
MacTutor is keeping you up to date on the latest developments of the new Basic products that have been released. There are some new developments which you should be made aware of. Refer to the August 1986 MacTutor for the preliminary review of these products. The status (as of this time) of all of the products we have reviewed is:
MS BASIC (version 2.10): Not a word. Rumors have it that Microsoft is making improvements, including the compiler. No word on when or what.
PCMacBasic: Major Improvements are in the works. I have not yet seen an update.
Softworks Basic: I don't know if any improvements are being made.
True Basic: Improvements are in the works.
ZBasic (version 3.02b) : Zedcor, Inc. has corrected several of the bugs we spoke about in our review and already sent me an updated (beta) copy. (Thank you!) Specifically:
Z-Basic Enhancements
Files can now be located in any folder by volume
The Directory command now works if you specify the pathname (volume), as in DIR ZBASIC DISK:Z FOLDER: Z FILENAME. DIR 1 and DIR 2 did not work. As it stands right now, you need to know the name (exact spelling) to do a directory command.
Eject 1 and Eject 2 now work
You can specify the volume for the filename you want to run with the run statement. RUN filename,vol % This means that you can now run any application from any HFS folder with the RUN statement. This is not documented in the version of the manual that I have.
Mouse clicks in window title bars now work properly. This has not yet been fixed for the Edit menu in the ZBasic compiler program, though.
The default window will not appear in your standalone application if you use the WINDOW OFF statement at the beginning of your program.
Even more improvements/enhancements are in the works so keep watching here for update information.