Data Conversion
Volume Number: | | 4
|
Issue Number: | | 5
|
Column Tag: | | Basic School
|
Quick & Dirty Data Conversion
By Dave Kelly, MacTutor Editorial Board
FILE IMPORT/EXPORT with BASIC
Over the years, one of the most beneficial uses for Basic is the ability to quickly write programs to import/export data to and from data bases. This also provides a quick and dirty way to convert data or suppress unwanted characters. Id like to explain/complain about the process and some of the problems that I have encountered.
My first major data conversion came a couple of years ago when I converted 30,000 names in a mail list from my Apple //e to the Macintosh. Yes, this was quite a feat, mostly because of the bottle neck which the dinky 143K floppy drives allowed on the Apple //. And you thought the Mac 400K drives were small! Anyway, on the Apple //e I was using DB MASTER, a now orphaned product, with some small support still available from a consulting firm in Northern California. There may be some of you that are familiar with DB MASTER (sometimes it was referred to as DB MONSTER). DB MASTER was great for mail lists and still would be today except for its slow speed (thats partly the Apple //s fault).
The process went something like this: First the data needed to be converted from DB MASTER file format to ASCII so that I could direct connect the Mac and the //e together. DB MASTER utilities includes an export utility which would convert files to either DIF or a One Field per Line format. The utilities converted my mail list from my Sider hard disk to about 50 143K floppy disks (UGH!). Next the ASCII needed to be converted from Apple // ASCII to the rest of the world ASCII format. ASCII EXPRESS telecommunications software provides a conversion utility for doing this (If they didnt Id have to use my own Basic program to convert it myself. Next each disk was converted via direct null modem connection to the Mac. Whew!
To make a long story (and many hours) short, the mail list was now on the Mac, but somehow during the process some control characters were intermixed in the files. Omnis 3 would not read the files with these control characters so it was BASIC to the rescue. There are 128 standard ASCII characters of which the first 31 characters are control characters, traditionally used for teletype terminals. The problem was easily solved by writing a Basic routine to read in the file, filter out the bad characters and rewrite the file. Compile the program to speed it up!
Control Character Filter
©MacTutor, 1988
By Dave Kelly
MSBASIC
x$=FILES$(1,TEXT)
IF x$= THEN END
OPEN x$ FOR INPUT AS #1
OPEN Filter Output FOR OUTPUT AS #2
WHILE NOT EOF(1)
Char$=INPUT$(1,1)
IF ASC(Char$)>=32 THEN PRINT #2,Char$;
IF Char$=CHR$(13) THEN PRINT #2,Char$;
WEND
CLOSE #1
CLOSE #2
END
Thats ok for simple filtering. Things may need to be more complex if the original file contains unexpected characters in it. Recently, I came across the need to convert data stored in a FileMaker Plus file in order to do some simple, but unusual calculations. As drafting is done for test equipment the progress is logged into a FileMaker Plus file. Each drawing number is entered with its own charge number and the date is entered indicating the start or completion of stages of the drawing process. Occasionally the boss (who is always right because he is the boss) wants a drawing status report according to charge number (there are multiple drawings per charge number). FileMaker Plus will give the total number of drawings but only after searching for each of the charge numbers and dates individually. BASIC to the rescue.
The Drawing Count program was written to count progress of each of the charge numbers. A file is opened for input and another file for output. Each field was previously exported to a BASIC formatted file and is read by the Basic program in the same order which it was written. The charge number is saved, but each of the date fields is replaced by a flag (0 or 1) which indicated if there was anything in the date field. These 0s and 1s are added together to get the totals after the data is sorted. The sort routine could have been beefed up but, for a quick and dirty program, it does the job. The program was compiled with the MS Basic compiler which provided adequate speed results.
If you are wondering why ZBasic wasnt used here, so am I. Actually, I ran into a small problem which appears to be a bug with ZBasic 4.01 while reading the FileMaker Plus exported text files. Although ZBasics INPUT# statement is supposed to read data from the file until a carriage return, <COMMA>, End-Of-File, or 255 characters are encountered, when ZBasic saw a comma followed by a carriage return, the null between the comma and carriage return was skipped and the next field (from the next record) was read in. The data looks like this:
ITEM1,ITEM2",ITEM3",
ITEM4,ITEM5",ITEM6",ITEM7"
The null field after item3 was skipped and ITEM4 is read in. This is how data is formatted when dumped in comma delimited format as with FileMaker Plus. Sorry, Zedcor.
COORDINATE WINDOW
DEF OPEN TEXTDAVE
BREAK ON
filename$=FILES$(1,TEXT,,vol%)
IF filename$= THEN END
OPEN I,#1,filename$,,vol%
INPUT #1,A$,B$,C$,D$
PRINT @(1,10)A$,B$,C$,D$
CLOSE 1
Another file conversion problem to deal with involves the conversion of text which has been dumped from text fields. FileMaker Plus, for example, inserts an end of line mark at the end of each line when it exports data to a file. When the data is read into another program which doesnt use the end of line character to mark the end of lines (because the text is word wrapped and reformattable). If you read the FileMaker text field into Double Helix II for example, the end of line character shows up in the text field as an undefined character (a box). The filter method explained above can find these characters if you know what character it is.
Fortunately life is much easier for converting data these days than it was even five years ago. Now I use file formats like DIF and SYLK whenever I can when converting between programs that support both formats. The transfer goes a lot less painfully when the format is well defined as the DIF format is. For those not familiar with DIF, it is called the Data Interchange Format and was originally designed to by Software Arts (are they still around?) to transfer data from Visicalc (the father of all spreadsheet programs). The file is completely ASCII but has header information added to help maintain data integrity. (This saved me a few times). This means that the file will undoubtedly be larger than without DIF formatting so be sure that you only use it for file transport rather than data storage especially if your disk free space is critical. SYLK is Microsofts format. Most Microsoft products and a few data base programs such as Omnis 3 and FileMaker Plus support SYLK format. The best way to figure out how these work is to analyze some sample data from a program that exports in one of these formats.
Drawing Count Program
Converts Drawing Log Data created by FileMaker Plus
and computes total number of drawings under each date
By Dave Kelly
©1988
MS BASIC
DIM Charge$(1000),Date1(1000),Date2(1000),Date3(1000), Date4(1000)
DIM tCharge$(20),tDate1(20),tDate2(20),tDate3(20),tDate4(20)
filename$=FILES$(1,TEXT)
IF filename$= THEN END
OPEN I,1,filename$
OPEN O,2,Drawing Count.DATA
PRINT
count=0
CALL MOVETO(10,30)
PRINT Now Reading in Records...
WHILE NOT EOF(1)
count=count+1
CALL MOVETO (10,50)
PRINT count
Charge$(count)=
i$=
A$=
C$=
S$=
INPUT#1,Charge$(count),i$,A$,C$,S$
IF LEN(Charge$(count))>0 THEN
IF RIGHT$(Charge$(count),1)=) THEN
Charge$(count)=MID$(Charge$(count),LEN(Charge$(count))-4,4)
ELSE
Charge$(count)=RIGHT$(Charge$(count),4)
END IF
END IF
IF LEN(i$)>0 THEN Date1(count)=1 ELSE Date1(count)=0
IF LEN(A$)>0 THEN Date2(count)=1 ELSE Date2(count)=0
IF LEN(C$)>0 THEN Date3(count)=1 ELSE Date3(count)=0
IF LEN(S$)>0 THEN Date4(count)=1 ELSE Date4(count)=0
WEND
CALL MOVETO(10,70)
PRINT Now Sorting Data...
CALL Sort(count,Charge$(),Date1(),Date2(),Date3(),Date4())
k=0
FOR j=1 TO count
IF tCharge$(k)<>Charge$(j) THEN k=k+1
tCharge$(k)=Charge$(j)
tDate1(k)=tDate1(k)+Date1(j)
tDate2(k)=tDate2(k)+Date2(j)
tDate3(k)=tDate3(k)+Date3(j)
tDate4(k)=tDate4(k)+Date4(j)
NEXT j
CALL MOVETO(10,90)
PRINT Now Writing to File...
PRINT#2,Charge# Date1 Date2 Date3 Date4"
FOR i=1 TO k
tDate1(i)=tDate1(i)-tDate2(i)
tDate2(i)=tDate2(i)-tDate3(i)
tDate3(i)=tDate3(i)-tDate4(i)
PRINT #2,tCharge$(i),tDate1(i),tDate2(i),tDate3(i),tDate4(i)
PRINT tCharge$(i),tDate1(i),tDate2(i),tDate3(i),tDate4(i)
NEXT i
CLOSE #1
CLOSE #2
PRINTClick to continue...
WHILE MOUSE(0)<>1:WEND
END
SUB Sort(count,Charge$(),Date1(),Date2(),Date3(),Date4()) STATIC
flips=1
WHILE flips
flips=0
FOR i=2 TO count
IF Charge$(i-1)> Charge$(i) THEN
flips=1
SWAP Charge$(i-1),Charge$(i)
SWAP Date1(i-1),Date1(i)
SWAP Date2(i-1),Date2(i)
SWAP Date3(i-1),Date3(i)
SWAP Date4(i-1),Date4(i)
END IF
NEXT
WEND
END SUB
HYPER-CORNER
The new update disk for HyperCard version 1.1 contains a read me stack with three new buttons pertaining to importing and exporting data. If you dont have the new update you can get it from your Apple dealer. Version 1.1 is supposed to fix some bugs that bugged some of us.
The read me stack gives directions for using the new buttons, but Ill try to explain them a little bit here. To try them out, create a new stack with a few fields in it such as the one I created; see fig. 1.
Fig 1. Hyper Dialog
HyperCard Import/Export
The arrow keys were added from the Button Ideas stack to make it easy to check out the data, but you will need to remember the command 1,2,3,4 keys that move you from card to card or you may use the arrow (or option-arrow) keys to move from field to field.
Fig. 2 Export Dialog
The most useful of the buttons is the Export Data button. It allows you to select the background fields you wish to export and the delimiters separating the fields and records. Most database programs will read the tab delimited records (with carriage return between each record). NO, the export button will not export card fields, only background fields. That is understandable since the output needs to be consistent.
I found that the best way to see how the buttons work is to try them out. Clicking on the Export Data (or Export Text) button will produce this dialog which will let you select the background field you desire to output. (See figure 2)
Selecting the Fields for Export
After selecting the All button or after selecting which of the background fields you want to export, you specify the file name to export to. A dialog comes up to ask you for the name of the text file to export to:
Fig. 3 Editable Dialog
Selecting a Filename for Export
There is an XCMD around someplace for calling the Standard File Dialog which would be better to substitute for this dialog. The problem with XCMD routines for general use is that not everyone has them. It would be best to set up your own custom library of XCMDs to use in your own stacks.
I ran into no difficulty getting my data to export to a file. You can examine your data with a word processor to see how the text file looks when the data has been exported. I like using MS Word for examining this kind of file because you can view all the tabs and carriage returns using the Show ¶ menu item.
The Import Button works well too, except that you cant import data into existing background fields. It creates the background fields on the fly. Perhaps someone out there could create a button script that will allow you to specify which field you would like to be the first, second and so on field to be read in. AT LEAST, now these buttons are included with HyperCard. At first, it may appear that the import didnt work, but if you read the instructions in the read me stack, it explains that the new background fields are created one on top of another. To see the fields you will have to peel them apart. (see figure below)
Imported Fields
Ideally, the importing will be done first thing before the stack layout is designed if possible. Importing/Exporting of text with HyperCard has been one of the weak links. With all the interest in HyperCard these days, there should be more XCMD routines and tools we can use to improve the usefulness of HyperCard.