Structure
Volume Number: 1
Issue Number: 8
Column Tag: Special Projects
"The Structure of a Microsoft BASIC Program"
By Mike Steiner, MacTutor Contributing Editor
Unraveling the Mysteries
During a fit of boredom and out of curiosity, I started peeking with fEdit at Microsoft BASIC to see how a program is formatted when saved in compressed mode.
I found that if a character with the high bit set is not preceded by a quote, REM (or a single quote) or DATA, then the BASIC interpreter considers it a keyword, part of the coding of a number, or a syntax error. I received some help from Michael M. Boy of Elgin AZ, who wrote a utility that locates itself in memory (Listing 1). (Programs as presented here are for a 512K Macintosh. Change values as needed for a 128K Mac. Some experimentation may be necessary to find the correct values. When you have found the starting point, you can make the proper adjustments in listings 2 and 3.)
With this utility, we determined that a BASIC program usually starts at location 77002 (decimal) on a 512K Macintosh; however, on some occasions programs loaded four to six bytes lower in memory. Apparently, this happens when another application is run before BASIC is loaded. If the computer is reset, programs load at 77002. Mike then wrote a routine, which I modified (see Listing 3), that prints a hex and ASCII dump of itself. The routine peeks memory from the start of the program until it finds the end of program marker. The memory dump is identical to the file stored on disk, except that the disk file is prefixed with one byte that shows the nature of the program (compressed or protected) and whether it was written in the Binary or Decimal version of BASIC. This, however, is part of another column. Following is a discussion of the format of Microsoft BASIC programs.
Format of Basic Program Lines
If a program line has n bytes, the line format is as follows:
Bytes 1 and 2: If the line is not numbered, the first byte of the line has the high bit cleared. If the line is numbered, then this bit is set. The first two bytes (high order byte first) show the length of the line; however, only the second (low order) hex digit of the first byte is used for the length. The maximum number of bytes in the line is normally 255. However, if there are colons or REM statements automatically inserted by BASIC (see below for a discussion of tokens automatically inserted by BASIC), the length may exceed 255; the longest line I have seen had 259 bytes. The line length includes all bytes in the line, including those used internally by BASIC and not displayed in the program listing, such as the end of line marker.
The third byte is always $00.
Bytes 4 and 5: If the line is numbered, these bytes show the line number, high byte first; the highest line number is 65529. If there is no line number the body of the line starts with byte 4.
Bytes 6 (4 if no line number) through n-1: This is the information you typed in the line.
Byte n: Always $00 to show end of line. This value ($00) may appear within a line, but if it is not at position n, which is coded by the first two bytes, the program recognizes that it is not the end of line marker.
A blank line is represented by 00 04 00 00. This includes the end of line marker. A blank numbered line is shown by 80 06 00 HB LB 00 where HB and LB are the high and low bytes of the line number.
The end of program marker is 00 00 00 00 00, including the end of line marker for the last line of the program. These five zeroes mark the end of the program only when the first byte of the sequence is byte n of a line. The same sequence may also appear in the body of a line as part of the coding of a declared double precision number.
Data Format Within a Line
All text within quotes or following a REM or DATA statement is represented in positive ASCII (i.e. high bit off). However, those characters that are typed in conjunction with the Option key (e.g. Π ÷ etc.) use negative ASCII (i.e. high bit set). Numbers are coded in positive and negative ASCII. The formatting of numbers is quite complex and is beyond the scope of this column.
Reserved words are represented by negative ASCII. There are only 128 negative ASCII bytes possible, and there are over 200 reserved words; therefore some reserved words are represented by pairs of bytes (both in negative ASCII). (See tables 1 and 2.) Any byte with high bit set that is not defined as a reserved word or part of the coding for a number and is not part of a PRINT statement, a REM, or a DATA statement is not displayed in the listing and will cause an error message when program execution reaches it.
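As a rough illustration of the one-byte and two-byte token scheme, here is a minimal Python sketch of a token reader. The dictionaries hold only a handful of entries copied from the token list in this article. The names SINGLE, DOUBLE, PREFIXES, read_token and detokenize are hypothetical. The sketch assumes that variable names and punctuation are stored as plain ASCII, and it deliberately ignores quoted strings, REM and DATA text, and encoded numbers.

```python
# A few entries from the token list; not the full table.
SINGLE = {0x96: "GOSUB", 0x97: "GOTO", 0xAC: "PRINT", 0xAF: "REM", 0xB5: "SIN"}
DOUBLE = {(0xF8, 0x83): "CLS", (0xF8, 0x8F): "END", (0xFB, 0xEF): "TEXTFONT"}
PREFIXES = {0xF8, 0xF9, 0xFA, 0xFB}  # first byte of every two-byte token


def read_token(data, pos):
    """Return (text, next_pos) for the byte or token starting at pos."""
    byte = data[pos]
    if byte in PREFIXES and pos + 1 < len(data):
        # Two-byte reserved word: look up the pair.
        return DOUBLE.get((byte, data[pos + 1]), "?"), pos + 2
    if byte & 0x80:
        # One-byte reserved word (high bit set).
        return SINGLE.get(byte, "?"), pos + 1
    # Positive ASCII is passed through unchanged (assumption).
    return chr(byte), pos + 1


def detokenize(data):
    """Expand a token stream into listing text."""
    out = []
    pos = 0
    while pos < len(data):
        text, pos = read_token(data, pos)
        out.append(text)
    return "".join(out)


listing = detokenize(bytes([0xF8, 0x83, 0x3A, 0xAC, 0x20, 0xB5, 0x28, 0x78, 0x29]))
print(listing)  # CLS:PRINT SIN(x)
```

Unknown high-bit bytes come back as "?" here; per the text above, the real interpreter does not display them at all and raises an error when execution reaches them.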
There are a few special cases: REM, ELSE, GOTO, and GOSUB.
Special Cases
REM: Microsoft BASIC lets you use the apostrophe character as an abbreviation for REM. If you do, it inserts a $3A (colon) and an $AF (REM) before the apostrophe ($E8) token, so what is actually stored is :REM' When BASIC sees these three bytes, it suppresses the :REM ($3A $AF) in the list window. So, you use one extra byte of memory whenever you use an apostrophe instead of REM at the beginning of a line. (If you use REM, you need to put a space after it; with the apostrophe you do not.) Using the apostrophe within a line does not use any extra bytes because if you type REM there, you have to precede it with a colon.
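The listing behavior for the apostrophe can be pictured with a short Python sketch. It only mimics the rule stated above (hide the $3A $AF pair when it directly precedes $E8); the function name visible_bytes is hypothetical, and how BASIC really implements the suppression is not known from this article.

```python
COLON, REM, APOSTROPHE = 0x3A, 0xAF, 0xE8
HIDDEN_REM = bytes([COLON, REM, APOSTROPHE])


def visible_bytes(body):
    """Drop the :REM pair that BASIC stores ahead of an apostrophe token."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        if body[pos:pos + 3] == HIDDEN_REM:
            out.append(APOSTROPHE)  # keep only the apostrophe token
            pos += 3
        else:
            out.append(body[pos])
            pos += 1
    return bytes(out)


stored_apostrophe = HIDDEN_REM + b"note"        # 3 bytes before the text
stored_rem = bytes([REM, 0x20]) + b"note"       # REM plus a space: 2 bytes
print(len(stored_apostrophe) - len(stored_rem))  # 1
```

The final line reproduces the one-byte difference between an apostrophe comment and a typed REM at the start of a line.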
ELSE: Similarly, if you type ELSE in an IF - THEN statement, BASIC precedes it with a non-printing colon if you do not type one. You do not use any extra bytes in this case because your only other option is to type the colon yourself. You decide whether the colon is visible in the program listing by typing it, or not visible by letting BASIC insert it.
GOTO ($97) and GOSUB ($96) are followed by 20 1B 00 00 00 and the label name, if going to a labeled line. If going to a numbered line, the token is followed by 20 0E 00 and the line number, which is represented by two bytes, high byte first.
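The two branch encodings can be written out as a small Python sketch. The function name branch_bytes is hypothetical, and storing the label as plain ASCII with no terminator is an assumption; the article only says that the label name follows the five bytes.

```python
GOTO, GOSUB = 0x97, 0x96


def branch_bytes(token, target):
    """Build the stored bytes for GOTO/GOSUB to a line number or a label."""
    if isinstance(target, int):
        # 20 0E 00, then the line number, high byte first.
        return bytes([token, 0x20, 0x0E, 0x00, target >> 8, target & 0xFF])
    # 20 1B 00 00 00, then the label name (assumed plain ASCII).
    return bytes([token, 0x20, 0x1B, 0x00, 0x00, 0x00]) + target.encode("ascii")


to_line = branch_bytes(GOTO, 100)
to_label = branch_bytes(GOSUB, "elp1")
print(to_line.hex(" "))   # 97 20 0e 00 00 64
print(to_label.hex(" "))  # 96 20 1b 00 00 00 65 6c 70 31
```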
Managing Memory
From the above information, we can see that if available memory is a constraint, you are better off using line numbers rather than line labels in your programs. Line numbers use only two bytes in the line whereas a label uses one byte for each character in the label plus one more for the mandatory colon. Further, each reference to a labeled line elsewhere in the program uses five bytes plus the length of the label, whereas a reference to a numbered line always uses exactly five bytes. Of course, if the line is not referenced anywhere in the program, neither a label nor a line number is needed.
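To make the arithmetic concrete, here is a short Python sketch that applies the byte counts given above. The function names are hypothetical and the counts exclude the GOTO or GOSUB token itself, as the text does.

```python
def label_cost(label, references):
    """Bytes used by a labeled line and every reference to it."""
    definition = len(label) + 1          # label characters plus the colon
    return definition + references * (5 + len(label))


def number_cost(references):
    """Bytes used by a numbered line and every reference to it."""
    return 2 + references * 5            # two bytes in the line, five per reference


# The label from Listing 2, assuming it were referenced three times.
print(label_cost("getfunction", 3))  # 60
print(number_cost(3))                # 17
```

In this example the line number saves 43 bytes, and the gap widens with longer labels or more references.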
Description of the Goodies
Listing 1 is the locator program that finds itself in memory by searching for the REM token in the first line.
Listing 2 is the poke program that will poke the token of your choice into memory to replace a REM statement, thereby self-modifying the program. This is great for getting mathematical input and then executing it to return the value of an inputted function. This same technique was used several years ago on the Apple II by several companies to produce plot packages that could take an inputted function string and plot the results. With this utility, you can accomplish this same technique on the Macintosh.
Listing 3 is a program fragment that will do a memory dump of your program in hex and ASCII.
Table 1 is a listing of the reserved words in Microsoft BASIC, sorted by ASCII code. Use this table with the poke utility to convert remark statements into new BASIC code dynamically.
Basic Listing #1: Locator Program
REM }|{here
x$ = "}|{here" : REM x$ must be the same as the REM on the above line
y$ = LEFT$(x$,1)
x = 42000! : REM start searching at this location; should be suitable for a 128K Mac
FOR i = x TO 512*1024
z$ = CHR$(PEEK(i))
IF z$ <> y$ THEN elp1
a$ = ""
FOR j = 1 TO LEN(x$)-1
a$ = a$ + CHR$(PEEK(i+j))
NEXT j
IF a$ = RIGHT$(x$,LEN(x$)-1) THEN PRINT "we got it at "; i : END
elp1:
IF i = x THEN PRINT "now at ";x : x = i + 1000
NEXT i
REM This program does not give the start of the program. It gives the
REM location where the first character in the REM statement begins. Start
REM of program is lower in memory.
Basic Listing #2: Poke Token in Memory
SUB printit STATIC
SHARED b
PRINT | (b) :REM The vertical bar is a placeholder for the value to be poked.
REM POKE 77031 replaces it with the token for the function to be executed.
REM Run the program and list it again: the vertical bar will have been
REM replaced by the function you selected. DO NOT INSERT ANY TEXT BEFORE
REM THE VERTICAL BAR or the program will not work. The bar itself, however,
REM may be replaced by any character.
END SUB
OPTION BASE 1
DIM funct (5),funct$(5)
DATA 130, 137, 181, 183, 186, ATN, COS, SIN, SQR, TAN
FOR i = 1 TO 5: READ funct (i): NEXT
FOR i = 1 TO 5: READ funct$(i):NEXT
CLS
PRINT"Enter Function you want evaluated"
PRINT
PRINT" 1) ATN 2) COS 3) SIN"
PRINT" 4) SQR 5) TAN"
getfunction: INPUT "Your choice > ",a: IF a<1 OR a>5 THEN getfunction
INPUT "Enter value to be processed > ",b
POKE 77031!,funct (a):REM Poke the token into memory
PRINT: PRINT
PRINT "The "; funct$(a); " of "; b; "is ";
CALL printit
Basic Listing #3: Memory Dump
SUB prtmem STATIC
REM Merge this routine into your program. Then CALL prtmem from the
REM command window.
CLS
CALL TEXTFONT (4)
CALL TEXTSIZE (9)
i= 77001! :REM start of program minus 1
REM Look for end of program marker. NOTE: this may fail if numbers are
REM declared as double precision.
WHILE (PEEK(i) + PEEK(i-1) + PEEK(i-2) + PEEK(i-3) + PEEK(i-4)) <> 0
k = k + 1:i = i + 1
IF k = 1 THEN PRINT USING "######"; i; : PRINT "-";
PRINT RIGHT$("0" + HEX$(PEEK(i)),2);" ";
IF PEEK(i) > 31 THEN b$ = CHR$ (PEEK(i)): ELSE b$ = "."
a$ = a$ +b$
REM Print 8 bytes, then print their ASCII representation
IF k = 8 THEN PRINT TAB (35);a$ : k = 0 : a$ = ""
WEND
IF k<> 0 THEN PRINT TAB (35);a$:REM Print ASCII for last line
END SUB
How This Publication was Created
Since this is our first laser-printed Journal, we will join the other Mac magazines in describing our labor pains. The text of the articles was created by the various members of the editorial staff, contributing editors and board members, using the disk version of MacWrite. The articles were transmitted to the editorial office over MacTerminal, in Mac-to-Mac mode, or sent by US Post on floppy disk. Drawings were done in MacPaint or MacDraw, ver. 1.7. The final composition was done in PageMaker by Aldus. The PageMaker articles were then laser printed and sent to a web press printer. Voilà! A completely electronic magazine, no paste-up!
BASIC TOKEN LIST
BY ASCII NUMBER: MICROSOFT BASIC 2.0
TOKEN DECIMAL HEX
ABS 128 80
ASC 129 81
ATN 130 82
CALL 131 83
CDBL 132 84
CHR$ 133 85
CINT 134 86
CLOSE 135 87
COMMON 136 88
COS 137 89
CVD 138 8A
CVI 139 8B
CVS 140 8C
DATA 141 8D
ELSE 142 8E
EOF 143 8F
EXP 144 90
FIELD 145 91
FIX 146 92
FN 147 93
FOR 148 94
GET 149 95
GOSUB 150 96
GOTO 151 97
IF 152 98
INKEY$ 153 99
INPUT 154 9A
INT 155 9B
LEFT$ 156 9C
LEN 157 9D
LET 158 9E
LINE 159 9F
LOC 161 A1
LOF 162 A2
LOG 163 A3
LSET 164 A4
MID$ 165 A5
MKD$ 166 A6
MKI$ 167 A7
MKS$ 168 A8
NEXT 169 A9
ON 170 AA
OPEN 171 AB
PRINT 172 AC
PUT 173 AD
READ 174 AE
REM 175 AF
RETURN 176 B0
RIGHT$ 177 B1
RND 178 B2
RSET 179 B3
SGN 180 B4
SIN 181 B5
SPACE$ 182 B6
SQR 183 B7
STR$ 184 B8
STRING$ 185 B9
TAN 186 BA
VAL 188 BC
WEND 189 BD
WHILE 190 BE
WRITE 191 BF
STATIC 227 E3
USING 228 E4
TO 229 E5
THEN 230 E6
NOT 231 E7
' (SINGLE QUOTE) 232 E8
> 233 E9
= 234 EA
< 235 EB
+ (PLUS) 236 EC
- (MINUS) 237 ED
* 238 EE
/ 239 EF
^ (CARET) 240 F0
AND 241 F1
OR 242 F2
XOR 243 F3
EQV 244 F4
IMP 245 F5
MOD 246 F6
\ (BACKSLASH) 247 F7
AUTO 248 128 F8 80
CHAIN 248 129 F8 81
CLEAR 248 130 F8 82
CLS 248 131 F8 83
CONT 248 132 F8 84
CSNG 248 133 F8 85
DATE$ 248 134 F8 86
DEFINT 248 135 F8 87
DEFSNG 248 136 F8 88
DEFDBL 248 137 F8 89
DEFSTR 248 138 F8 8A
DEF 248 139 F8 8B
DELETE 248 140 F8 8C
DIM 248 141 F8 8D
EDIT 248 142 F8 8E
END 248 143 F8 8F
ERASE 248 144 F8 90
ERL 248 145 F8 91
ERROR 248 146 F8 92
ERR 248 147 F8 93
FILES 248 148 F8 94
FRE 248 149 F8 95
HEX$ 248 150 F8 96
INSTR 248 151 F8 97
KILL 248 152 F8 98
LIST 248 153 F8 99
LLIST 248 154 F8 9A
LOAD 248 155 F8 9B
LPOS 248 156 F8 9C
LPRINT 248 157 F8 9D
MERGE 248 158 F8 9E
NAME 248 159 F8 9F
NEW 248 160 F8 A0
OCT$ 248 161 F8 A1
OPTION 248 162 F8 A2
PEEK 248 163 F8 A3
POKE 248 164 F8 A4
POS 248 165 F8 A5
RANDOMIZE 248 166 F8 A6
RENUM 248 167 F8 A7
RESTORE 248 168 F8 A8
RESUME 248 169 F8 A9
RUN 248 170 F8 AA
SAVE 248 171 F8 AB
STOP 248 173 F8 AD
SWAP 248 174 F8 AE
SYSTEM 248 175 F8 AF
TIME$ 248 176 F8 B0
TRON 248 177 F8 B1
TROFF 248 178 F8 B2
VARPTR 248 179 F8 B3
WIDTH 248 180 F8 B4
BEEP 248 181 F8 B5
CIRCLE 248 182 F8 B6
LCOPY 248 183 F8 B7
MOUSE 248 184 F8 B8
POINT 248 185 F8 B9
PRESET 248 186 F8 BA
PSET 248 187 F8 BB
RESET 248 188 F8 BC
TIMER 248 189 F8 BD
SUB 248 190 F8 BE
EXIT 248 191 F8 BF
SOUND 248 192 F8 C0
BUTTON 248 193 F8 C1
MENU 248 194 F8 C2
WINDOW 248 195 F8 C3
DIALOG 248 196 F8 C4
LOCATE 248 197 F8 C5
CSRLIN 248 198 F8 C6
LBOUND 248 199 F8 C7
UBOUND 248 200 F8 C8
SHARED 248 201 F8 C9
UCASE$ 248 202 F8 CA
SCROLL 248 203 F8 CB
LIBRARY 248 204 F8 CC
CVSBCD 248 205 F8 CD
CVDBCD 248 206 F8 CE
MKSBCD$ 248 207 F8 CF
MKDBCD$ 248 208 F8 D0
OFF 249 244 F9 F4
BREAK 249 245 F9 F5
WAIT 249 246 F9 F6
USR 249 247 F9 F7
TAB 249 248 F9 F8
STEP 249 249 F9 F9
SPC 249 250 F9 FA
OUTPUT 249 251 F9 FB
BASE 249 252 F9 FC
AS 249 253 F9 FD
APPEND 249 254 F9 FE
ALL 249 255 F9 FF
PICTURE 250 128 FA 80
WAVE 250 129 FA 81
LINETO 251 94 FB 5E
FILLPOLY 251 210 FB D2
INVERTPOLY 251 211 FB D3
ERASEPOLY 251 212 FB D4
PAINTPOLY 251 213 FB D5
FRAMEPOLY 251 214 FB D6
PTAB 251 215 FB D7
FILLARC 251 216 FB D8
INVERTARC 251 217 FB D9
ERASEARC 251 218 FB DA
PAINTARC 251 219 FB DB
FRAMEARC 251 220 FB DC
FILLROUNDRECT 251 221 FB DD
INVERTROUNDRECT 251 222 FB DE
ERASEROUNDRECT 251 223 FB DF
PAINTROUNDRECT 251 224 FB E0
FRAMEROUNDRECT 251 225 FB E1
FILLOVAL 251 226 FB E2
INVERTOVAL 251 227 FB E3
ERASEOVAL 251 228 FB E4
PAINTOVAL 251 229 FB E5
FRAMEOVAL 251 230 FB E6
FILLRECT 251 231 FB E7
INVERTRECT 251 232 FB E8
ERASERECT 251 233 FB E9
PAINTRECT 251 234 FB EA
FRAMERECT 251 235 FB EB
TEXTSIZE 251 236 FB EC
TEXTMODE 251 237 FB ED
TEXTFACE 251 238 FB EE
TEXTFONT 251 239 FB EF
MOVE 251 241 FB F1
MOVETO 251 242 FB F2
PENNORMAL 251 243 FB F3
PENPAT 251 244 FB F4
PENMODE 251 245 FB F5
PENSIZE 251 246 FB F6
GETPEN 251 247 FB F7
SHOWPEN 251 248 FB F8
HIDEPEN 251 249 FB F9
OBSCURECURSOR 251 250 FB FA
SHOWCURSOR 251 251 FB FB
HIDECURSOR 251 252 FB FC
SETCURSOR 251 253 FB FD
INITCURSOR 251 254 FB FE
BACKPAT 251 255 FB FF