Autumn 91 - MACINTOSH DEBUGGING: A WEIRD JOURNEY
MACINTOSH DEBUGGING: A WEIRD JOURNEY INTO THE BELLY OF THE BEAST
BO3B JOHNSON AND FRED HUXHAM
ADAPTED FROM THEIR TALK AT THE WWDC BY DAVE JOHNSON
Macintosh debugging is a strange and difficult task. This article provides a collection
of tried-and-true debugging techniques Bo3b and Fred discussed at Apple's
Worldwide Developers Conference in May 1991. These techniques can ease your
debugging woes and make your life a lot simpler. They're guaranteed to help you
find your bugs earlier on, saving you hours of suffering.
The first thing you should know is that debugging is hard. Drinking gallons of Mountain Dew won't
help much, nor will seeking magic formulas or spreading fresh goat entrails around your keyboard
and chanting. The only way to get better at it is to do it a lot, and even then it's still hard. What
we're going to talk about are a number of techniques that will make debugging a little bit easier.
Notice that the title of this article is "Macintosh Debugging" and not "Macintosh Debuggers." We're
not going to do a comparative review of debuggers. We're not going to show you how to use them.
In fact, we recommend that you buy and use all the ones described here. Each has useful features that
the others don't have. Which you use most often is up to you--pick one as your main debugger and
really get to know it, but keep all of them around.
The main Macintosh debuggers are
- MacsBug from Apple
- TMON (we often refer to version 2.8.4 as Old TMON) and TMON Professional
(version 3.0, called TMON Pro for short) from Icom Simulations, Inc.
- The Debugger from Jasik Designs (we'll call it "Jasik's debugger" here, because
Steve Jasik wrote it, and that's what everybody calls it in conversation)
We'll touch on many of the individual features of these debuggers in this article.
The hardest bugs to find are those that are not reproducible. If you have a crashing bug that can be
reproduced 100 percent of the time, you're well on your way to fixing it. But a bug that crashes your
application only once every few hours, at seemingly random times . . . well, that kind can take days or
weeks to find. Often the ultimate failure of a program is caused by code that executed long ago.
Tracing back to find the real problem can be difficult and extremely time consuming.
The techniques we show you in this article will help turn most of your random bugs into completely
reproducible ones. These techniques are designed to make your software crash or to otherwise alert
you as close as possible to where your code is doing something wrong.
We explain what each technique is, why it works, and any gotchas you need to be aware of. Then we
tell you how to turn it on or invoke it and list some of the common Macintosh programming errors it
will catch. Finally, we show a code sample or two. The code samples were chosen for a number of
reasons:
- The errors in many of them are subtle. We couldn't tell what was wrong with
some of them after not looking at them for a couple months, and we wrote them
in the first place.
- The mistakes are common. We've seen people make these same mistakes time and
time again.
- They're short. They had to fit on one slide at our Worldwide Developers
Conference presentation.
So, on to our first technique . . . .
SET $0 TO $50FFC001
The basic idea here is that the number $0 comes up a lot when things go wrong on the Macintosh.
When you try to allocate memory or read in a resource, and it fails, what gets returned is $0.
Programs should always check to see that when they ask for something from the Toolbox, they
actually get it.
Programs that don't check and use $0 as an address or try to execute from there are asking for
trouble. The code will often work without crashing, but presumably it's not doing what it was meant
to do, since there isn't anything down there that even remotely resembles resources or data in a
program.
Why $50FFC001? Old TMON used this number when we turned on Discipline (more on Discipline
later). This fine number has the following characteristics:
- Used as a pointer (address), $50FFC001 is in funny space on all Macintosh
computers--that is, it's in I/O space, which is currently just blank. Any relative
addresses close by are going to be in I/O space as well, so positive or negative
offsets from that as a base will crash, too. These types of offset are common when
referencing globals or record fields.
- When used as an address, it will cause a bus error on 68020, '030, and '040
machines. Because there's no RAM there, and no device to respond, the hardware
returns a bus error, crashing the program at the exact instruction. Without this
handy number, you not only won't crash, you won't even know the bug exists (for
a while . . . ).
- On 68000 machines, $50FFC001 will cause an address error because it's an odd
number. This also stops the offending code at the exact line that has a bug.
- If the program tries to execute the code at memory location $0, it will crash with
an illegal instruction, since the $50FF is not a valid opcode. This is nice when you
accidentally JSR to $0 and the program tries to run from there. Those low-
memory vectors are certainly not code but don't usually cause a crash until much
later.
- It's easy to recognize because it doesn't look like any normal number. If a program
uses memory location $0 as a source for data, this funny number will be copied
into data structures. If you see it in a valid data structure someplace else you know
there's a bug lurking in the program that's getting data from $0 instead of from
where it should.
Many different funny bus error numbers can be used. Take your pick.
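The properties listed above are easy to check mechanically. What follows is a minimal sketch in portable C, not code from any of the debuggers; the names kBusErrorValue, IsOddAddress, HighWord, and CountBusErrorValues are our own inventions for illustration. It confirms that the number is odd and that its high word is $50FF, and it scans a block of longwords for the number, which is one way to notice that data was copied from memory location $0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The "funny number" stored at memory location $0 (value from the article). */
#define kBusErrorValue UINT32_C(0x50FFC001)

/* An odd address causes an address error on a 68000 for word or long access. */
static int IsOddAddress(uint32_t address)
{
    return (address & 1U) != 0;
}

/* The first word the CPU would fetch if it tried to execute from $0. */
static uint16_t HighWord(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Count the longwords in a data structure that hold the funny number. */
static size_t CountBusErrorValues(const uint32_t *words, size_t count)
{
    size_t hits = 0;
    size_t i;

    assert(words != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (words[i] == kBusErrorValue) {
            hits++;
        }
    }
    return hits;
}
```

A nonzero count from CountBusErrorValues on one of your own records is a strong hint that some field was filled in through a NIL pointer.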
AVAILABILITY
You can find various programs that set up memory location $0 in this helpful way, or you can build
your own.
- EvenBetterBusError (included on the Developer CD Series disc) is a simple INIT
that sets memory location $0 to $50FFC003. It also installs a VBL to make sure
no one changes it.
- Under System 7, the thing that used to be MultiFinder (now the Process
Manager) takes whatever is in memory location $0 when it starts up and saves it.
Periodically it stuffs that saved number back in. If it were a bus error number at
system startup (from an INIT, say), that number would be refreshed very nicely.
With MacsBug, it would be easy to build a dcmd that stuffs memory location $0
during MacsBug INIT, and MultiFinder would then save and restore that number.
- Jasik's debugger has a flag that allows you to turn the option on or off.
- Old TMON will set up the bus error number when Discipline is turned on.
TMON Pro has a script command, NastyO, that will also do this.
- You can put code in your main event loop that stuffs the bus error number into
memory location $0. Be sure to remove it before you ship.
ERRORS CAUGHT
The most obvious catch using this technique is the inadvertent use of NIL handles (or pointers). NIL
handles can come back from the Resource Manager and the Memory Manager during failed calls. If a
program is being sloppy and not checking errors, it's easy to fall into using a NIL handle, and this
technique will flush it out. A double dereference of a NIL handle will crash the computer. Something
like
newArf := aHandle^^.arf;
will crash if aHandle is $0 and we've installed this nice bus error number.
This technique will tell when a program inadvertently jumps off to $0 as a place to execute code,
which can happen from misaligned stacks or from trying to execute a purged code resource.
By watching for the funny numbers to show up in data structures, you can find out when NIL
pointers are being used as the source for data. This is surely not what was meant, and they're easy to
find when a distinctive number points them out. These uses won't crash the computer, of course.
CODE SAMPLE
theHandle = GetResource('dumb', 1);
aChar = **theHandle;
This is easy: the GetResource call may fail. If the 'dumb' resource isn't around, theHandle becomes
NIL. Dereferencing theHandle when it's NIL is a bug, since aChar ends up being some wacko
number out of ROM in the normal case (ROMBase at $0) and cannot be assumed to be what was
desired. This bus error technique will crash the computer right at the **theHandle, pointing out
the lack of error checking.
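Here is a hedged sketch of the check the sample leaves out, written in portable C so it compiles anywhere. The Ptr and Handle typedefs have the same shape as the Toolbox types, but FirstCharOfResource and its result codes are hypothetical; in a real program theHandle would come from GetResource.

```c
#include <assert.h>
#include <stddef.h>

typedef char *Ptr;      /* same shape as the Toolbox types */
typedef Ptr  *Handle;

/* Hypothetical result codes for this sketch only. */
enum { kNoError = 0, kNilHandleError = -1 };

/* Fetch the first byte of a resource, refusing to touch a NIL handle or a
   handle whose master pointer is NIL (as for a purged resource). */
static int FirstCharOfResource(Handle theHandle, char *aChar)
{
    assert(aChar != NULL);
    if (theHandle == NULL || *theHandle == NULL) {
        return kNilHandleError;   /* report the failure instead of reading $0 */
    }
    *aChar = **theHandle;
    return kNoError;
}
```

The point is only that the test happens before the double dereference; what the caller does with the failure (an alert, a default value) is up to the program.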
HEAP SCRAMBLE AND PURGE
With this option on, all movable blocks of memory (handles) are moved, and all purgeable blocks are
purged, whenever memory can be moved or purged--which is different from moving and purging
memory whenever it needs to be moved or purged. This technique is excellent at forcing those
once-a-month crashing bugs to crash more often--like all the time. You should run your entire program
with this option on, in combination with the bus error technique, using all program features and
really putting it through its paces. You'll be glad you did. Because this debugger option simulates
running under low-memory conditions all the time, it stress-tests the program's memory usage.
AVAILABILITY
All the debuggers have this option, but the one most worth using is in Old TMON and TMON Pro,
since it implements both scramble (moving memory) and purge. MacsBug and Jasik's debugger both
have scramble, but they're too slow, and neither has a purge option.
ERRORS CAUGHT
This technique will catch improper usage of dereferenced or purgeable handles, mistakes that fall
into the "easy to make, hard to discover" category. The technique will also catch blocks that are
overwritten accidentally, since there's an implicit heap check each time the heap is scrambled.
Warning: The bugs you find may not be yours.
CODE SAMPLE
aPicture = GetPicture(1);
FailNil(aPicture);
aPtr = NewPtr(500000);
FailNil(aPtr);
aRect = (**aPicture).picFrame;
DrawPicture(aPicture, &aRect);
Here, if the picture is purgeable, it might be purged to make room in the heap for the large pointer
allocated next. This would make aRect garbage, and DrawPicture wouldn't work as intended,
probably drawing nothing. Here's a similar example in Pascal:
aPicture := GetPicture(kResNum);
FailNil(aPicture);
WITH aPicture^^ DO
BEGIN
aPtr := NewPtr(500000);
FailNil(aPtr);
aRect := picFrame;
END; {WITH}
Here, even if the picture isn't purged, the NewPtr call might move it, invalidating the WITH
statement and resulting, again, in a bad aRect.
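The moving half of the problem can be modeled in portable C. This is a toy sketch under stated assumptions, not the Memory Manager: ScrambleBlock is a hypothetical stand-in for any call that can move memory (such as NewPtr under heap scramble), and the Picture record here is cut down to two fields. The safe pattern it shows is to dereference the handle again after such a call instead of relying on a pointer or WITH statement set up beforehand. It does not model purging.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { short top, left, bottom, right; } Rect;
typedef struct { short picSize; Rect picFrame; } Picture;
typedef Picture *PicPtr;
typedef PicPtr  *PicHandle;

/* Toy heap scramble: move the block, fix up the master pointer, and trash
   the old copy. Returns 0 if the move could not be done. */
static int ScrambleBlock(PicHandle h)
{
    PicPtr moved;

    assert(h != NULL && *h != NULL);
    moved = malloc(sizeof(Picture));
    if (moved == NULL) {
        return 0;                        /* block stays where it was */
    }
    memcpy(moved, *h, sizeof(Picture));
    memset(*h, 0xFF, sizeof(Picture));   /* old location is now garbage */
    free(*h);
    *h = moved;                          /* only the master pointer is updated */
    return 1;
}

/* Safe pattern: read the field through the handle AFTER the call that can
   move memory. A PicPtr saved before ScrambleBlock would be stale here. */
static Rect FrameAfterAllocation(PicHandle h)
{
    Rect frame;

    ScrambleBlock(h);            /* stands in for NewPtr moving the heap */
    frame = (**h).picFrame;      /* fresh double dereference */
    return frame;
}
```

Copying picFrame into a local before the allocation works just as well; either way, nothing that points into the block is held across the call.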
ZAPPING HANDLES
The idea here is to trash disposed memory at the time it's disposed of in order to catch subsequent
use of the free blocks. The technique fills disposed memory with bus error numbers, so that if you
attempt to use disposed memory later, the program will crash. A related option is MPW Pascal's -u
option, which initializes local and global variables to $7267.
AVAILABILITY
This technique is implemented as a part of Jasik's Discipline option and is also a dcmd, available on
the Developer CD Series disc, for TMON Pro or MacsBug. You can also just write it into your
program by writing bottleneck routines for disposing of memory (such as MyDisposHandle,
MyDisposPtr) that fill blocks with bus error numbers just before freeing them. The problem with
this is that memory freed by other calls (ReleaseResource, for instance) isn't affected. We
recommend the dcmd or Jasik's Discipline.
ERRORS CAUGHT
This technique will catch reusing deallocated memory or disposing of memory in the wrong order. It
can also catch uninitialized variables, since after you've been running it for a while, much of the free
memory in the heap will be filled with bus error numbers.
CODE SAMPLE
SetWRefCon(aWindowPtr, (long)aHandle);
. . .
DisposeWindow(aWindowPtr);
DisposHandle((Handle) GetWRefCon(aWindowPtr));
The GetWRefCon will work on a disposed window, but it's definitely a bug. Zapping the handles
sets the refCon to a bus error number, forcing the DisposHandle call to fail.
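The bottleneck routine mentioned under Availability can be sketched in portable C. This is an illustration only: ZapBlock and MyDisposeBlock are hypothetical names, and the caller passes the block size because standard C has no equivalent of GetPtrSize. The pointer is declared volatile so an optimizing compiler does not discard stores to memory that is about to be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes of the bus error number $50FFC001, most significant byte first. */
static const unsigned char kZapBytes[4] = { 0x50, 0xFF, 0xC0, 0x01 };

/* Fill a block with repeated copies of the bus error number. */
static void ZapBlock(void *block, size_t size)
{
    volatile unsigned char *bytes = block;
    size_t i;

    assert(block != NULL || size == 0);
    for (i = 0; i < size; i++) {
        bytes[i] = kZapBytes[i % 4];
    }
}

/* Hypothetical bottleneck in the spirit of MyDisposPtr: zap, then free. */
static void MyDisposeBlock(void *block, size_t size)
{
    if (block != NULL) {
        ZapBlock(block, size);
        free(block);
    }
}
```

As the article notes, a wrapper like this only covers memory your own code frees; blocks released by other calls are untouched.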
CHECKSUM $0
Once again, we're dealing with the address $0. This technique, however, is sort of the opposite of the
first one: it catches writing to $0 rather than reading or executing from it.
AVAILABILITY
This one is easy: you can set up a checksum so that you'll drop into the debugger whenever the value
at $0 changes. All the debuggers have a way to do this. Also, EvenBetterBusError sets up a VBL to
detect if $0 changes, but since VBL tasks don't run very often (relative to the CPU, anyway), you'll
probably be far away in your code by the time it notices. It's still much better than nothing, though,
since knowing the bug exists is the first step toward fixing it.
Note that on the IIci the Memory Manager itself changes $0, so you'll get spurious results.
EvenBetterBusError knows about this and ignores it.
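To make the idea concrete, here is a minimal sketch of a checksum watch in portable C. It is not how any of the debuggers implement the feature; ChecksumWatch, WatchBegin, and WatchChanged are hypothetical names, and the rotate-and-XOR sum is just one simple choice. A periodic task (like the VBL described above) would call WatchChanged and drop into the debugger when it returns nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const volatile unsigned char *start;   /* first byte of the watched range */
    size_t                        length;  /* number of bytes watched */
    uint32_t                      savedSum;
} ChecksumWatch;

/* Rotate-left-and-XOR checksum; any change to a single byte alters the sum. */
static uint32_t SumBytes(const volatile unsigned char *start, size_t length)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < length; i++) {
        sum = ((sum << 1) | (sum >> 31)) ^ start[i];
    }
    return sum;
}

/* Remember the current contents of the range. */
static void WatchBegin(ChecksumWatch *watch, const volatile void *start,
                       size_t length)
{
    assert(watch != NULL && start != NULL);
    watch->start = start;
    watch->length = length;
    watch->savedSum = SumBytes(watch->start, length);
}

/* Nonzero if the range no longer matches the saved checksum. */
static int WatchChanged(const ChecksumWatch *watch)
{
    return SumBytes(watch->start, watch->length) != watch->savedSum;
}
```

Because the check runs only when it is polled, it tells you that something wrote to the range, not which instruction did it.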
ERRORS CAUGHT
The errors caught by this technique are much the same as those caught by the first technique, except
that this one catches writes rather than reads. This way, if your code tries to write to address $0 (by
dereferencing a NIL handle or pointer), you'll know.
CODE SAMPLE
aPtr = NewPtr(kBuffSize);
BlockMove(anotherPtr, aPtr, kBuffSize);
This one's pretty obvious: if the NewPtr call fails, aPtr will be NIL, and the BlockMove will stomp
all over low memory. If kBuffSize is big enough, this will take you right out, trashing all your low-
memory vectors and your debugger, too.
DISCIPLINE
Discipline is a debugger feature that checks for bogus parameters to Toolbox calls. It would of course
be nice if the Toolbox itself did more error checking, but for performance reasons it can't. (Be
forewarned that some versions of the system have errors that Discipline will catch.) Discipline is the
perfect development-time test. It catches all those stupid mistakes you make when typing your code
that somehow get past the compiler and may persist for some time before you discover them. It can
literally save you hours tracking down foolish parameter bugs that should never have happened in the
first place.
AVAILABILITY
Old TMON has an early version of Discipline, but there are no checks for Color QuickDraw calls or
later system calls, so its usefulness is limited. There is an INIT version of Discipline (on the Developer
CD Series disc with MacsBug) that works in conjunction with MacsBug or TMON Pro that's quite
usable, if slow and clunky. Jasik's version of Discipline is far and away the best; use it if you can.
ERRORS CAUGHT
As you'd expect, Discipline catches Toolbox calls with bad arguments, like bogus handles, and also
sometimes catches bad environment states, like trying to draw into a bad grafPort.
CODE SAMPLE
aHandle = GetResource('dumb', 1);
FailNil(aHandle);
. . .
DisposHandle(aHandle);
The problem here is that a resource handle has to be thrown away with ReleaseResource, not
DisposHandle. Otherwise, the Resource Manager will get confused since the resource map won't be
properly updated. Sometime later (maybe much later) Very Bad Things will happen.
32-BIT MEMORY MODE
Running in full 32-bit mode in System 7 forces the Memory Manager and the program counter to
use full 32-bit addresses: this is something new on the Macintosh. The old-style (24-bit) Memory
Manager used the top byte of handles to store the block attributes (whether or not the handle was
locked, purgeable, and so forth). By running your program in 32-bit mode, you'll flush out any code
that mucks with the top bits of an address, for any reason, accidentally or on purpose. In the past,
many programs examined or modified block attributes directly. This is a bad idea. Use the Toolbox
calls HGetState and HSetState to get and set block attributes.
AVAILABILITY
You get 32-bit memory mode with System 7, of course! You use the Memory cdev to turn on 32-bit
addressing, available only on machines that have 32-bit-clean ROMs (Macintosh IIfx, IIci, IIsi). You
should also install more than 8 MB of RAM and launch your application first, so that it goes into
memory that requires 32-bit addressing (within the 8 MB area, addresses use only 24 bits). We also
recommend using TMON's heap scramble in 32-bit mode, since the block headers are different.
ERRORS CAUGHT
You can inadvertently mess up addresses in a bunch of ways. Obviously, any code that makes
assumptions about block structures is suspect. Doing signed math on pointers is another one that
comes up pretty often. Any messing with the top bytes of addresses can get you into big trouble,
jumping off into weird space, where you have no business.
CODE SAMPLE
aHandle = (Handle) ((long) aHandle | 0x80000000);
Naturally, this method of locking a handle is not a good idea, since in 32-bit mode the locked bit
isn't even there. Use HLock or HSetState; they'll do the right thing.
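The arithmetic behind this is worth seeing once. The sketch below is plain C on 32-bit integers, not Toolbox code, and the function names are hypothetical. It models the two addressing modes as "ignore the top byte" and "use every bit" to show why setting the top bit appears harmless in 24-bit mode and produces a different address in 32-bit mode.

```c
#include <assert.h>
#include <stdint.h>

#define kTopBit    UINT32_C(0x80000000)   /* bit the sample code sets */
#define k24BitMask UINT32_C(0x00FFFFFF)   /* bits that reach the bus in 24-bit mode */

/* What the sample does: set the top bit of a clean address. */
static uint32_t SetTopBit(uint32_t address)
{
    assert((address & ~k24BitMask) == 0);   /* start from a clean 24-bit address */
    return address | kTopBit;
}

/* In 24-bit mode the top byte is ignored when memory is accessed. */
static uint32_t AddressSeenIn24BitMode(uint32_t address)
{
    return address & k24BitMask;
}

/* In 32-bit mode every bit is part of the address. */
static uint32_t AddressSeenIn32BitMode(uint32_t address)
{
    return address;
}
```

For example, $00123456 with the top bit set is still $123456 in 24-bit mode but becomes $80123456 in 32-bit mode, which is nowhere near the original block.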
FULL COMPILER AND LINKER WARNINGS
Always develop your code with full warnings on. When you're compiling and linking your program,
any number of errors or warnings will be emitted. The errors are for things that are just plain wrong,
so you'll have to fix those immediately. Warnings, however, indicate things that aren't absolutely
wrong, but certainly are questionable as far as the compiler or linker is concerned.
We think you should fix every problem as soon as a warning first appears, even if there's "nothing
wrong" with the code. If you leave the warnings in, little by little they'll pile up, and pretty soon
you'll have pages full of warnings spewing out every time you do a build. You know you won't read
through them every time. You'll probably just redirect the warnings to a file you never look at so that
your worksheet won't be sullied. Then the one warning that will cause a problem will sneak right by
you, and much later you'll find out that the totally nasty, hard-to-find bug that you finally corrected
was one the compiler warned you about a month ago. To avoid this painful experience, deal with the
warnings when they appear, even if they're false alarms.
AVAILABILITY
Use the compiler and linker options that turn on full warnings:
- MPW C++: The "-w2" option turns on the maximum compiler warnings.
- MPW C: Use "-warnings full" ("-w2" does the same thing). In addition, the "-r"
option will warn you if you call a function with no definition.
- MPW Linker: The "-msg keyword" option controls the linker warnings. Keyword is
one or more of these: dup, which enables warnings about duplicate symbols;
multiple, which enables multiple warnings on undefined references to a label (you
can thus find all the undefined references in one link); and warn, which enables
warnings.
- THINK C: Because the compile is stopped when a warning is encountered, it
forces you to fix all warnings. Some people like this; others don't. We do, but you
decide. Be sure that "Check Pointer Types" is turned on in the compiler options.
- Pascal: Most of the things that cause warnings in C are automatically enforced.
If you're coding in C, it's also a good idea to prototype all your routines. This avoids silly errors.
ERRORS CAUGHT
The compiler and linker will tell you about lots of things. Some examples are
- the use of uninitialized variables (which is a real bug)
- bad function arguments
- unused variables (these confuse the code and may be real bugs)
- argument mismatches (probably bugs)
- signed math overflow
In C++, overriding operator new without overriding operator delete is probably a bug and
unintentional. Even if a warning is caused by something intentional, fix it so that the warning won't
appear.
CODE SAMPLE
#define kMagicNumber 12345
. . .
short result;
result = kMagicNumber*99;
The problem with this code is that the multiplication is overflowing a 16-bit short value. If you have
full compiler warnings on, the MPW compiler will let you know this with the following warning
message:
### Warning 276 This assignment may lose some significant bits
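The numbers are easy to work out: 12345 * 99 is 1,222,155 ($12A60B), and the largest value a 16-bit short can hold is 32,767. The following sketch, in portable C with hypothetical helper names, does the multiplication in long arithmetic and checks the range before narrowing. It is one way to make the intent explicit, not the only fix; declaring result as a long is another.

```c
#include <assert.h>
#include <stdint.h>

#define kMagicNumber 12345

/* Nonzero if the value can be stored in 16 signed bits without losing any. */
static int FitsIn16Bits(long value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}

/* Multiply in long arithmetic so the product itself cannot overflow. */
static long MagicProduct(void)
{
    return (long)kMagicNumber * 99L;
}

/* Narrow to short only after checking; the assert is a development-time trap. */
static short CheckedShort(long value)
{
    assert(FitsIn16Bits(value));
    return (short)value;
}
```

Calling CheckedShort(MagicProduct()) in a debugging build stops at the assert, which is the run-time counterpart of the compiler warning.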
MEMORY PROTECTION
This is something you've always wanted: a way to get a protected memory model for the Macintosh.
With memory protection on, memory accesses outside the application's RAM space would be caught
as illegal, giving you the chance to find bad program assumptions and wild references. Only Jasik's
debugger has this feature now.
The protected mode is only partly successful, though, since the Macintosh has nothing that
resembles a standard operating system. The problems stem from how programs are expected to run,
in that references to some low-memory globals are OK, and code and data share the same address
space. Given the anarchy in the system, the way Jasik set it up is to allow protection of applications
only. The protected mode also protects the CODE resources in the application from being
overwritten.
Although this protected mode is not as good as having the OS support protected memory spaces, it's
still a giant leap ahead in terms of finding bugs in your programs. By catching these stray references
during development, you can be assured that the user won't get random crashes because of your
program. This is an ideal development tool for catching latent bugs that don't often show up. Who
knows what a write to a random spot in memory may hit? Sometimes you're just lucky, and those
random "stomper" bugs remain benign, but more often they're insidiously nasty.
AVAILABILITY
This tool is currently implemented only in Jasik's debugger. The memory protection is implemented
using the MMU, and it slows down the machine by around 20 percent. It's a mixed blessing, since it
will crash on any number of spurious errors--use it anyway.
ERRORS CAUGHT
If the application writes to low memory or to the system heap, it's probably not what was desired. A
few cases could be deemed necessary, but in general, any references outside the application heap
space are considered suspect. Certainly, modifying system variables is not a common task that
applications need to support. This memory protection will catch those specific references and give
you the chance to be sure that they're valid and necessary.
Writing to I/O space or screen RAM is another problem this technique will catch. Writing directly
to the screen is bad form, and only tacky programs (and games, which must do it) stoop this low.
Even HyperCard writes directly to the screen; please don't emulate it. Some specialized programs
could make an argument for writing to I/O space, since they may have a device they need to use up
there. This protection will catch those references and point out a logically superior approach, which
is to build a driver to interface to that space, instead of accessing it directly.
CODE SAMPLE
*((long*) 0x16A) = aLong;
The low-memory global Ticks is being modified. Writing to low-memory globals is a Very Bad
Thing to do. This will be caught by memory protection.
LEAKS
A memory leak occurs when a program allocates a block of memory with either NewHandle or
NewPtr (or even with Pascal New or C malloc, both of which turn into NewPtr at a lower level), but
that block is never disposed of, and the reference to it is lost or written over. If a program does this
often enough, it will run out of RAM and probably crash. This leads to the famous statement:
"Properly written Macintosh programs will run for hours, even days, without crashing"--a standing
joke in Developer Technical Support for so long we've forgotten the original source. Naturally, if the
program is leaking in the main event loop, it will crash sooner than if it leaks from some rare
operation. If it leaks at all, it will ultimately fail and crash some poor user.
AVAILABILITY
A simple technique that all debuggers support can tell you whether or not the program is leaking. Do
a Heap Total and check the amount of free space and purgeable space that's available. Run the
program through its paces and then see if the amount of free space plus purgeable space has dropped.
If it has, try again, under the assumption that the program might have loaded some code or other
data the first time around. If it's still smaller, it's likely to be a leak. This approach, of course, only
shows that you have a leak; tracking it down is the hard part. But, hey, you can't start tracking till you
know it's there.
There's a dcmd called Leaks (on the Developer CD Series disc) that runs under both TMON Pro and
MacsBug. The basic premise is to watch all the memory allocations to see if they get disposed of
correctly. Leaks patches the traps NewHandle, DisposHandle, NewPtr, and DisposPtr. When a new
handle or pointer is allocated on the heap, Leaks saves the address into an internal buffer. When the
corresponding DisposHandle or DisposPtr comes by, Leaks looks it up in the list and, if it finds the
same address, dumps that record as having been properly disposed of. Now all those records on the
Leaks list that didn't have the corresponding dispose are candidate memory leaks.
The Macintosh has a lot of fairly dynamic data, so Leaks often ends up getting a number of things on
its list that haven't been disposed of but are not actually leaks. They're just first-time data, or loaded
resources. To avoid false alarms, the Leaks dcmd requires that you perform the operation under
question three times, in order to get three or more items in its list that are similar in size and
allocated from the same place in the program. An operation can be as simple or complex as desired,
since every memory allocation is watched. An example of an operation to watch is to choose New
from a menu and then choose Close, under the assumption that those are complementary functions.
If you do this three times in a row with Leaks turned on, anything that Leaks coughs out will very
likely be a memory leak for that operation.
The dcmd saves a shortened stack crawl of where the memory is being allocated, so that potential
leaks can be found back in the source code.
One problem with Leaks as a dcmd is that if it's installed as part of the TMON Pro startup, it
patches the traps using a tail patch. Tail patches are bad, since they disable bug fixes the system may
have installed on those traps. This could cause a bug to show up in your program that isn't there in
an unpatched system. It's still probably worth the risk, given the functionality Leaks can provide. The
problem doesn't exist with MacsBug, since the traps are patched by the dcmd before the system
patches them.
A vastly superior way around this problem is to provide the Leaks functionality as debugging code,
instead of relying on an external tool. By writing an intermediate routine that acts as a "wrapper"
around any memory allocations your program does, you can watch all the handles and pointers go by,
do your own list management to know when the list should be empty, and dump out the information
when it isn't. By wrapping those allocations, you avoid patching traps (always a good idea). Be sure to
watch for secondary allocations, such as GetResource/DetachResource pairs. You may still want to
run Leaks when you notice memory being lost, but your wrappers don't notice it.
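A wrapper of this kind can be quite small. The sketch below uses malloc and free so that it runs anywhere; a Macintosh version would wrap NewHandle/DisposHandle and NewPtr/DisposPtr instead. TrackedAlloc, TrackedFree, and OutstandingBlocks are hypothetical names, the table size is arbitrary, and the "where" tag is a simple substitute for the stack crawl the Leaks dcmd records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define kMaxTracked 256   /* arbitrary table size for this sketch */

typedef struct {
    void       *block;    /* NULL marks an unused slot */
    size_t      size;
    const char *where;    /* caller-supplied tag, such as a routine name */
} AllocRecord;

static AllocRecord gRecords[kMaxTracked];

/* Allocate a block and remember it. If the table is full the block is
   still returned, it just isn't tracked. */
static void *TrackedAlloc(size_t size, const char *where)
{
    void *block = malloc(size);
    size_t i;

    assert(where != NULL);
    if (block == NULL) {
        return NULL;
    }
    for (i = 0; i < kMaxTracked; i++) {
        if (gRecords[i].block == NULL) {
            gRecords[i].block = block;
            gRecords[i].size = size;
            gRecords[i].where = where;
            break;
        }
    }
    return block;
}

/* Forget the block (if it was tracked) and free it. */
static void TrackedFree(void *block)
{
    size_t i;

    if (block == NULL) {
        return;
    }
    for (i = 0; i < kMaxTracked; i++) {
        if (gRecords[i].block == block) {
            gRecords[i].block = NULL;
            break;
        }
    }
    free(block);
}

/* Number of tracked blocks that have not been freed yet. */
static size_t OutstandingBlocks(void)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < kMaxTracked; i++) {
        if (gRecords[i].block != NULL) {
            count++;
        }
    }
    return count;
}
```

Check OutstandingBlocks at a point where the list should be empty, such as after closing the last document; anything left over is a candidate leak, and its where tag says which routine allocated it.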
ERRORS CAUGHT
Potential memory leaks, but you knew that already.
CODE SAMPLE
anIcon := GetCIcon(kIconId);
PlotCIcon(aRect, anIcon);
DisposHandle(Handle (anIcon));
This orphans any number of handles, because the GetCIcon call will create several extra handles for
pixMaps and color tables. This is an easy error to make, since the GetCIcon returns a CIconHandle,
which seems a lot like a PicHandle. A PicHandle is a single handle, though, and a CIconHandle is a
number of pieces. Always use the corresponding dispose call for a given data structure. In this case,
the appropriate call is DisposCIcon.
STRESS ERROR HANDLING
Here the goal is to see how the program deals with less than perfect situations. Your program won't
always have enough RAM or disk space to run smoothly, and it's best to plan for it. The first step is
to write the code defensively, so that any potential error conditions are caught and handled in the
code. If you don't put in the error-handling code, you're writing software that never expects to be
stressed, which is an unreasonable assumption on the Macintosh.
AVAILABILITY
Try running the program in a memory-critical mode, where it doesn't have enough RAM even to
start up. Users can get into this unfortunate situation by changing the application's partition size.
Rather than crash, put up an alert to tell users what went wrong, and then bail out gracefully. Try
running with just enough RAM to start up, but not enough to open documents. Be sure the program
doesn't crash and does give the user some feedback. Try running in situations where there isn't
enough RAM to edit a document, and make sure it handles them. What happens if you get a
memory-low message, and you try to save? If you can't save, the user will be annoyed. What happens
when you try to print?
Run your program on a locked disk, and try to save files on the locked disk. The errors you get back
should be handled in a nice way, giving the user some feedback. This will often find assumptions in
the code, like, "I'm sure it will always be run from a hard disk."
To see if you handle disk-full errors in a nice way, be sure to try a disk that has varying amounts of
free space left. Here again, if you've only ever tested on a big, old, empty hard disk, it may shock you
to find out that your users are running on a double-floppy-disk Macintosh SE and aren't too happy
that disk-full errors crash the program. A particularly annoying common error is saving over a file on
the disk. Some programs will delete the old file first and then try to save. If a disk-full error occurs,
the old copy of the data has been deleted, leaving the user in a precarious state. Don't force a user to
switch disks, but allow the opportunity.
Especially with the advent of System 7, you should see how your program handles the volume
permissions of AppleShare. Since any Macintosh can now be an AppleShare server, you can definitely
expect to see permission errors added to the list of possible disk errors. Try saving files into folders
you don't have permission to access, and see if the program handles the error properly.
ERRORS CAUGHT
Inappropriate error handling, unnecessary crashes, lack of robustness, and general unfriendliness.
CODE SAMPLE
i := 0;
REPEAT
i := i + 1;
WITH pb DO
BEGIN
ioNamePtr := NIL;
ioVRefnum := 0;
ioDirID := 0;
ioFDirIndex := i;
END;
err := PBGetCatInfo (@pb, False);
UNTIL err <> noErr;
This sample is trying to enumerate all files and directories inside a particular directory by calling
PBGetCatInfo until it gets an error. (Note that this sample does one very important thing:
initializing the ioNamePtr field to NIL to keep it from returning a string at some random place in
memory.) The problem with this loop is that it assumes that any error it finds is the loop termination
case. For an AppleShare volume, you may get something as simple as a permission error for a
directory you don't have access
to. This is probably not the end of the entire search, but the code will bail out. This bug would be
found by trying the program with an AppleShare volume. The appropriate end case would be to look
for the exact error of fnfErr instead or, better, to add the permErr to the conditional.
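The corrected loop logic can be sketched without the File Manager. In the portable C below, the error values match the Toolbox constants, but GetEntryProc, CountEntries, and FakeGetEntry are hypothetical: the callback stands in for one indexed PBGetCatInfo call, and FakeGetEntry simply replays a list of result codes so the loop can be exercised. The loop treats fnfErr as the normal end of the directory, skips entries that return permErr, and reports anything else to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes with the same values as the Toolbox constants. */
enum { noErr = 0, fnfErr = -43, permErr = -54 };

/* Hypothetical stand-in for one indexed PBGetCatInfo call. */
typedef int (*GetEntryProc)(int index, void *context);

/* Count the accessible entries. Returns noErr when the enumeration ran off
   the end of the directory, or the first unexpected error otherwise. */
static int CountEntries(GetEntryProc getEntry, void *context, int *count)
{
    int index = 0;
    int err;

    assert(getEntry != NULL && count != NULL);
    *count = 0;
    for (;;) {
        index++;
        err = getEntry(index, context);
        if (err == noErr) {
            (*count)++;
        } else if (err == permErr) {
            continue;        /* no access to this entry: skip it, keep going */
        } else if (err == fnfErr) {
            return noErr;    /* index past the last entry: normal termination */
        } else {
            return err;      /* a real failure: let the caller handle it */
        }
    }
}

/* Test double: context is an array of result codes that ends with fnfErr
   or another terminating error. */
static int FakeGetEntry(int index, void *context)
{
    const int *results = context;
    return results[index - 1];
}
```

With the result list noErr, permErr, noErr, fnfErr, the loop counts two entries and returns noErr, where the original loop would have stopped after the first one.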
MULTIPLE CONFIGURATION TESTS
This technique goes beyond merely finding the crash-and-burn bugs to help ensure that the program
will run in situations that weren't originally expected. Just fixing crash-and-burn bugs is for amateurs.
Professional software developers want their programs to be as bug-free as possible. As a step toward
this higher level of quality, testing in multiple configurations can give you more confidence that you
haven't made faulty assumptions about the system. The idea is to try the program on a number of
machines in different configurations, looking for combinations that cause unexpected results.
AVAILABILITY
Multiple configuration tests should use the Macintosh Plus as the low-end machine to be sure that
the program runs on 68000-based machines and on ones that have a lot of trap patches. Some of the
code the system supports is not available, like Color QuickDraw. If you use anything like that, you
will crash with an unimplemented trap number error, ID=12. The Macintosh Plus is a good target
for performance testing as well, since it's the slowest machine you might expect to run on. Its small
screen can also point out problems that your users might see in the user interface. For example, some
programs use up so much menu bar space that they run off the edge of the screen. That might not be
noticed until you run the program on a machine with a small screen. If your program specifically
doesn't support low-end machines, you should still put in a test for them and warn the user. Crashing
on a low-end machine is unacceptable, especially when all you needed was a simple check.
Naturally, the multiple configurations include a Macintosh II-class machine to be sure that
assumptions about memory are caught. Because most development is done on Macintosh II
computers, this case will likely be handled as part of the initial testing. It's virtually certain that your
program will be used on a Macintosh II by some users.
Using multiple monitors on a single system can point out some window- or screen-related
assumptions. The current version of the old 512 x 342 fixed-size bug is the assumption that the
MainGDevice is the only monitor in the system. Testing with multiple monitors will point out that
although sometimes the main device is black and white, there's a color device in the system. Should
your users have to change the main device and reboot just to run your program in color?
By testing the program within a color environment, even if it doesn't use color, you'll find any
assumptions about how color might be used or the way bitmaps look. It's a rare (albeit lame) program
that gets to choose the exact Macintosh it should run on.
Try the program under Virtual Memory to see if there are built-in assumptions regarding memory.
Use the program under both System 6 and 7. If the program requires System 7, but a user runs it
under System 6, it should put up an alert and definitely not crash. For the short term, it's obvious
that you cannot assume all users will have either one system or the other. The number of
fundamental differences between the systems is sufficiently large that the only way to gain confidence
that the program will behave properly is to run it under both systems. Some bugs that were never
caught under System 6 may now show up under System 7. The bugs may even be in your code, with
implicit assumptions about how some Toolbox call works.
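The "put up an alert and definitely not crash" check might be shaped something like the following sketch. The MachineInfo structure and the whyWeCantRun function are invented for illustration; on a real Macintosh the answers would come from Gestalt or SysEnvirons, and the returned text would go into an alert. The version number follows the style of Gestalt's system version reply, where 7.0 reads as 0x0700.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of the machine we find ourselves on. */
typedef struct {
    int systemVersion;   /* hex digits as version, e.g. 0x0605 or 0x0700 */
    int hasColorQD;      /* nonzero if Color QuickDraw is present */
} MachineInfo;

/* Returns NULL if we can run, or the text for an alert if we can't. */
static const char *whyWeCantRun(const MachineInfo *m)
{
    if (m->systemVersion < 0x0700)
        return "This program requires System 7.";
    if (!m->hasColorQD)
        return "This program requires Color QuickDraw.";
    return NULL;
}
```

The point is to make this check once, at startup, before touching anything that might not be there.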
Doing a set of functionality tests on these various types of systems will ensure that you can handle the
most common variations of a Macintosh. Tests of this form will give you a better feeling for the
limits of your program and the situations it can handle gracefully. There's usually no drawback to
getting a user's-eye view of your program.
There is a tool called Virtual User (APDA #M0987LL/B) that can help a lot with these kinds of tests.
It allows you to script user interactions so that they can be replayed over and over, and it can execute
scripts on other machines remotely, over AppleTalk. So, for instance, you could write a script that
puts your program through its paces, and then automatically execute that script simultaneously on
lots of differently configured Macintosh systems.
ERRORS CAUGHT
As discussed above, this technique attempts to flush out any assumptions your code makes about the
environment it's running in: color capabilities, screen size, speed, system software version, and so on.
CODE SAMPLE
void Hoohah(void)
{
    long localArray[2500];
    . . .
}
Naturally, this little array is stack hungry and will consume 10K of stack. On a Macintosh II
machine, this is OK, as the default stack is 24K. On the Macintosh Plus, the stack is only 8K, so
when you write into this array you will be writing over the heap, most likely causing a problem. This
type of easy-to-code bug may not be caught until testing on a different machine. Merely because the
code doesn't crash on your machine doesn't mean it's correct.
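The arithmetic is worth spelling out. This small sketch uses the stack sizes quoted above; fitsOnStack is a made-up helper, and int32_t stands in for a 4-byte 68000 long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default stack sizes quoted in the text. */
enum { kPlusStack = 8 * 1024, kMacIIStack = 24 * 1024 };

/* A long is 4 bytes on the 68000, so int32_t stands in for it here:
   2500 elements * 4 bytes = 10,000 bytes of locals. */
enum { kHoohahLocals = 2500 * sizeof(int32_t) };

/* Hypothetical helper: would a frame of this size fit in an
   otherwise empty stack of the given size? */
static int fitsOnStack(size_t frameBytes, size_t stackBytes)
{
    return frameBytes < stackBytes;
}
```

Here fitsOnStack(kHoohahLocals, kMacIIStack) is true and fitsOnStack(kHoohahLocals, kPlusStack) is false. The check is optimistic, since it ignores whatever the callers have already put on the stack.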
ASSERTS
Asserts are added debugging code that you put in to alert you whenever a condition you expect to be true turns out to be false.
They're used to flag unexpected or "can't happen" situations that your code could run into. Asserts
are used only during development and testing; they'll be compiled out of the final code to avoid a
speed hit.
AVAILABILITY
You could write a function called ASSERT that takes a result code and drops into the debugger if the
result is false--or, better yet, writes text to a debugging window. In MPW, you can use __FILE__
and __LINE__ directives to keep track of the location in the source code. Another thing to check for
is bogus parameters to calls, sort of like Discipline. Basically, you want to check any old thing that
will help you ensure consistency and accuracy in your code, the more the merrier, as long as the
asserts don't "fire" all the time. Fix the bugs pointed out by an assert, or toughen up the assert, but
don't turn it off. If you just can't stand writing code to check every possible error, temporarily put in
asserts for the ones that will "never" happen. If an assert goes off, you'd better add some error-
handling code.
The following sample code shows one way to implement ASSERT.
#if DEBUG
#define ASSERT(what) do \
    { if (!(what)) dbgAssert(__FILE__, __LINE__); } while (0)
#else
#define ASSERT(what) ((void)0)
#endif

/* Called when an ASSERT fails: drop into the debugger with a message. */
void dbgAssert(const char* filename, int line)
{
    char msg[256];

    sprintf(msg, "Assertion failed # %s: %d", filename, line);
    debugstr(msg);    /* debugstr takes a C string; no cast needed */
}
In this example, ASSERT is defined by a C macro. If DEBUG is true, the macro expands to a block
of code that checks the argument passed to ASSERT. If the argument is false, the macro calls the
function dbgAssert, passing it the filename and line number on which the ASSERT occurs. If
DEBUG is false, the macro ASSERT expands to nothing. Making the definition of ASSERT
dependent on a DEBUG flag simplifies the task of compiling ASSERTs out of final code.
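If you'd rather have the assert write text somewhere than drop into the debugger, a variation along these lines might do. This is only a sketch: gLastAssert and gAssertCount are hypothetical stand-ins for a debugging window, dbgAssertLog is an invented name, and the output goes to stderr so the code runs anywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef DEBUG
#define DEBUG 1
#endif

/* Hypothetical stand-in for a debugging window: the last message
   and a running count of the asserts that have fired. */
static char gLastAssert[256];
static int  gAssertCount = 0;

static void dbgAssertLog(const char *filename, int line)
{
    snprintf(gLastAssert, sizeof gLastAssert,
             "Assertion failed # %s: %d", filename, line);
    gAssertCount++;
    fprintf(stderr, "%s\n", gLastAssert);
}

#if DEBUG
#define ASSERT(what) do \
    { if (!(what)) dbgAssertLog(__FILE__, __LINE__); } while (0)
#else
#define ASSERT(what) ((void)0)
#endif

/* Example caller: a "can't happen" check on a parameter. */
static int half(int evenNumber)
{
    ASSERT(evenNumber % 2 == 0);
    return evenNumber / 2;
}
```

Calling half(4) is silent; calling half(7) logs one line with the file and line number of the ASSERT and bumps the count, and the program keeps running so you can see what happens next.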
ERRORS CAUGHT
This technique catches all sorts of errors, depending, of course, on how you implement it. Logic
errors, unanticipated end cases that show up in actual use, and situations that the code is not
expecting are some of the possibilities.
CODE SAMPLE
numResources = Count1Resources('PICT');
for (i = 1; i <= numResources; i++) {
    theResource = Get1IndResource('PICT', i);
    ASSERT(theResource != nil);
    RmveResource(theResource);
}
The problem here is that the code doesn't account for the fact that Get1IndResource always starts at
the beginning of the available resources. So the first time through, we get the resource with index 1,
and we remove it. The next time through, we ask for resource 2, but since we removed the resource
at the front of the list, we get what used to be resource 3; we've skipped one. The upshot is that only
half the resources are removed, and then Get1IndResource fails. This is a great example of a "never
fail" situation failing. The ASSERT will catch this one nicely; otherwise, you might not know about
it for a long time. The solution is to always ask for the first resource.
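You can watch the skipping happen without a resource file. The sketch below models the list of PICTs as a plain array; getIndResource and removeResource are made-up stand-ins that only imitate the indexing behavior described above, not the real Resource Manager calls.

```c
#include <assert.h>

enum { kMaxRes = 16 };

/* Hypothetical stand-in for a resource file's list of PICTs. */
static int gRes[kMaxRes];
static int gCount = 0;

static void fillResources(int n)   /* n must not exceed kMaxRes */
{
    gCount = n;
    for (int i = 0; i < n; i++) gRes[i] = 100 + i;
}

/* Like Get1IndResource: a 1-based index into what is left right now.
   Returns 0 (think nil) when the index is out of range. */
static int getIndResource(int index)
{
    return (index >= 1 && index <= gCount) ? gRes[index - 1] : 0;
}

/* Like RmveResource: later entries slide down to fill the gap. */
static void removeResource(int id)
{
    for (int i = 0; i < gCount; i++) {
        if (gRes[i] == id) {
            for (int j = i; j < gCount - 1; j++) gRes[j] = gRes[j + 1];
            gCount--;
            return;
        }
    }
}

/* The loop from the sample; returns how many resources are left. */
static int removeAllBuggy(void)
{
    int n = gCount;
    for (int i = 1; i <= n; i++) {
        int r = getIndResource(i);
        if (r == 0) break;   /* this is where the ASSERT would fire */
        removeResource(r);
    }
    return gCount;
}

/* The fix: always ask for the first resource. */
static int removeAllFixed(void)
{
    int n = gCount;
    for (int i = 1; i <= n; i++)
        removeResource(getIndResource(1));
    return gCount;
}
```

Starting with six resources, removeAllBuggy leaves three behind and trips on its fourth pass, while removeAllFixed leaves none.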
TRACE
Trace is a compiler option that causes a subroutine call to be inserted at the beginning and end of
each of your functions. You have to implement the two routines (%__BP and %__EP), and then the
compiler inserts a JSR %__BP just after the LINK instruction and a JSR %__EP just before UNLK.
This gives you a hook into every procedure that's compiled, which can be extremely useful. Like
asserts, trace is debugging code and will be compiled out of the final version.
AVAILABILITY
Trace is available in all the MPW compilers and in THINK Pascal. THINK C's profiler can be
configured and used in the same sort of way.
ERRORS CAUGHT
By being able to watch every call in your program as it's made, you can more easily spot inefficiencies
in your segmentation and your call chain: If two often-called routines live in different segments,
under low-memory situations you may be swapping code to disk constantly. If you're redrawing your
window 12 times during an update event, you could probably snug things up a little and gain some
performance. You can watch the stack depth change, monitor memory usage and free space, and so
on. Think up specific flow-of-control questions to ask and then tailor your routines to answer them.
Expect to generate far more data than you can look at. Really get to know your program. Go wild.
CODE SAMPLE
PROCEDURE HooHah;
VAR
    localArray: ARRAY[1..2500] OF LongInt;
BEGIN
    . . .
END; {HooHah}
Once again, we're building a stack that's too big for a Macintosh Plus. The stack sniffer will catch it
eventually, but the sniffer runs as a VBL task, and VBL tasks run only about 60 times a second, so you
may be far away from the offending code by then. Trace could watch for it at each JSR and catch it immediately.
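As a rough illustration of what a pair of trace routines could do, here's a sketch in C. The hooks are called by hand here, whereas the compiler's trace option would insert the calls for you, and the frame sizes are passed in explicitly rather than read from the stack pointer. The names traceEnter and traceExit and all the globals are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { kPlusStack = 8 * 1024 };       /* Macintosh Plus default stack */

static size_t gStackInUse = 0;        /* bytes claimed by active frames */
static size_t gDeepest    = 0;        /* high-water mark                */
static const char *gCulprit = NULL;   /* first routine to go too deep   */

/* Hypothetical entry hook, playing the part of %__BP. */
static void traceEnter(const char *name, size_t frameBytes)
{
    gStackInUse += frameBytes;
    if (gStackInUse > gDeepest) gDeepest = gStackInUse;
    if (gStackInUse > (size_t)kPlusStack && gCulprit == NULL)
        gCulprit = name;
}

/* Hypothetical exit hook, playing the part of %__EP. */
static void traceExit(size_t frameBytes)
{
    gStackInUse -= frameBytes;
}

static void HooHah(void)
{
    traceEnter("HooHah", 2500 * 4);   /* the 10,000-byte local array */
    /* . . . */
    traceExit(2500 * 4);
}

static void EventLoop(void)
{
    traceEnter("EventLoop", 64);      /* a modest, made-up frame size */
    HooHah();
    traceExit(64);
}
```

After one call to EventLoop, gCulprit names HooHah as the routine that pushed the stack past 8K, and gDeepest records how deep things got. The culprit is flagged on entry, not a sixtieth of a second later.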
USEFUL COMBINATIONS
All these techniques are powerful by themselves, but they're even better when used in combination.
Use them as early and as often as you can. Some of them are a bit of trouble, but that smidgen of
extra work is paid back many times over in the time saved by not having to track down the stupid
bugs. Use them throughout development, right up to the end. Many bugs show up through
interactions that only begin near the end of the process. Diligent use of these techniques is
guaranteed to find many of the easy bugs, so you can spend your time finding the hard ones, which is
much more interesting and worthwhile.
OK, now armed to the teeth with useful techniques, you're ready to stomp bugs. You know what to
look for and how to flush them out. But you know what? Debugging is still hard.
THE INSIDE STORY OF THE DEBUGGER
BY STEVE JASIK
WHY WRITE A DEBUGGER
Since I didn't have the right connections for selling illegal drugs, I had to consider the alternative of selling
legal addictive drugs to Macintosh developers.
OK, seriously, I wanted to learn about the 68000 architecture. Given my experience writing compilers and
code generators for superscalar RISC mainframes, I decided to write a disassembler for and on the
Macintosh. I introduced my first product, MacNosy, in January 1985. It allowed a fair number of developers
to discover the innards of the Macintosh ROMs, as well as to curse at me for its original TTY interface.
Unhappy with the state of Macintosh debuggers, I decided to write one of my own, using MacNosy as a
foundation. The resulting product, The Debugger, made its international debut in London in November 1986.
Since then, it's been expanded to become a system debugger (it runs at INIT time and is available to debug
any process), to include an incremental Linker for MPW-compiled programs, and more.
THE MACINTOSH INTERFACE
The Debugger uses the Macintosh user interface, or at least my interpretation of it. The windows, menus,
dialogs, and text processing are standard for the Macintosh.
The only real problem was the switch in context. I had to swap in all of low memory ($0 to $1E00 on a
Macintosh II-class machine). This may appear to be a bit expensive, but in comparison with the screen swap,
which is a minimum of 22K on a small-screened Macintosh, it's trivial. The biggest problem in this area is
that some of the values have to be "cross-fertilized" between worlds, and many of the low-memory globals
are not documented.
Using the Macintosh interface became a royal pain as the System 7 group extended the system in such a
way that the basic ROM code assumed the existence of a Layer Manager and MultiFinder functions. In many
cases, I had to "unpatch" the standard code and substitute my own in order to keep The Debugger
functional.
MMU PROTECTION
MMU protection was initially designed so that The Debugger would try to protect the system from destruction
no matter what program was running. As we implemented the design, we found that this goal was
impossible because many of the applications (MPW Shell, ResEdit, Finder) diddled with the system heap. I
ended up protecting the rest of the system only when an application that's being debugged is running.
EASE OF USE
Users have had an influence on the design and feature set in The Debugger. For example, the initial version
of the watchpoint (memory watch) command was very simple. When a user pointed out the usefulness of an
auto reset feature in the command, we added it.
I've tried to use simple commands for the most frequently performed operations in The Debugger. The idea
has been to make common things easy to do. Some of the more complicated operations are difficult to keep
simple, as the scripting capability is limited. SADE, in contrast, has an extensive scripting capability but is
cumbersome to use.
TMON, THEN AND NOW
BY WALDEMAR HORWAT
The first version of TMON was released in late 1984. TMON was a summer project for me at TMQ
Software when I was a junior in high school. I wrote it because I was dreaming about a one-Macintosh
debugger (MacsBug required a terminal at the time) that had a direct-manipulation user interface. Direct
manipulation meant more than just having windows--it meant you would be able to change memory or
registers simply by typing over their values, assemble instructions by typing in a disassembly window, and so
on.
THE ORIGINAL TMON
Memory constraints of the Macintosh 128K forced me to write TMON entirely in assembly language--the
original version used only 16K plus a little additional memory to save the screen. TMON used its own
windowing system to avoid reentrancy problems with debugging programs that call the system. TMON also
included a "User Area," a block of code that could extend TMON. The source code was provided for the
standard user areas, and Darin Adler took great advantage of this facility to add numerous features to
TMON in his Extended User Area.
Writing TMON took a little ingenuity. I didn't have anything that could debug it, so I wrote the entire
program, assembled it, ran it on a Macintosh, and watched it crash. After a couple of dozen builds, I got it
to display its menu bar on the screen. By about build 100, I had a usable memory dump window that I
could then use to debug the rest of TMON.
TMON PRO
Improving a program written entirely in tight assembly language designed for a Macintosh 128K became
intractable, so I switched to MPW C++. Version 3.0 of TMON (TMON Pro) is written half in assembly
language and half in C++. Using C++ turned out to be one of the best ways to debug a program: C++
features such as constructors and destructors prevented a lot of pesky programming errors. The downside of
using a high-level language is that code size grows explosively--TMON 3.0's code is about ten times larger
than TMON 2.8's.
When writing TMON 3.0, I reevaluated earlier design decisions. I opted to continue to concentrate on
debugging at the assembly language level for two reasons. First, there are many bugs that can arise on a
Macintosh that pure source-level debuggers can't handle. Second, I find that I use TMON at least as much
for learning about the Macintosh as I do for debugging.
I sometimes wish I could use the Macintosh windows in TMON. Nevertheless, I decided to remain with
TMON's custom windows for reasons of safety. Until the Macintosh has a real reentrant multitasking system
that can switch to another task at any point in the code, writing such a debugger would either make it prone
to crashing if it was entered at the wrong time or require the debugger to be more dependent on
undocumented operating system internals than I like.
I found that writing TMON 3.0 was much harder and took much longer than writing the original TMON.
Part of this was due to the second-system effect--the product just kept on growing over time. Nevertheless, I
also found that writing TMON 3.0 was difficult because of the loss of the Macintosh "standard." There are
now over a dozen Macintosh models, using the 68000 through the 68040, some with third-party
accelerators, various ROM versions, 24- and 32-bit mode, virtual memory, several versions of the operating
system, and numerous INITs, patches, video cards, and other configuration options. These options present
unique challenges to a low-level debugger such as TMON, which must include special code for many of
them.
Despite the frustration, I think that writing TMON was worth it--it made many developers' lives easier. I plan
to continue to evolve TMON in the future and incorporate suggestions for improvements.
A WORD TO THE WISE FROM FRED
What we've described in this article are a number of tools for doing Macintosh software development. Some
of you are about to say, "Oh, those sound really great, but I don't have time to use them--I'm about to
ship," or whatever. I'd like to tell you a story that a man of sound advice, Jim Reekes, told me: A young boy
walked into a room and saw a man pushing a nail into the wall with his finger. The boy asked him, "Hey,
mister, why don't you go next door and get a hammer?" The man replied, "I don't have time." So the boy
went next door, got a hammer, and came back. The man was still pushing the nail into the wall with his
finger. So the boy hit the man in the head with the hammer, killed him, and took the nail.
BO3B JOHNSON AND FRED HUXHAM didn't want a bio, except to say that they are cohosts of "Lunch with Bo3b and
Fred." We also feel compelled to tell you that in Bo3b's name, the "3" is silent. *
THIRD-PARTY COMPATIBILITY TEST LAB Apple maintains a Third-Party Compatibility Test Lab for the use of Apple
Associates and Partners. The Lab features many preconfigured domestic and international systems, extensive networking
capabilities, support from staff engineers, and so on. If you're an Apple Associate or Partner, and you'd like to make a
test-session appointment or get more information, contact Carol Lockwood at (408)974-5065 or AppleLink
LOCKWOOD1.
Or you can write to Apple Third-Party Test Lab, Apple Computer, Inc., 20525 Mariani Avenue M/S 35-BD, Cupertino, CA
95014.*
RELATED READING Debugging Macintosh Software with MacsBug by Konstantin Othmer and Jim Straus
(Addison-Wesley, 1991) and How to Write Macintosh Software by Scott Knaster (Hayden Books, 1988).*
THANKS TO OUR TECHNICAL REVIEWERS Jim Friedlander, Pete Helme, Jim Reekes*