Code Mechanic
Volume Number: 13 (1997)
Issue Number: 5
Column Tag: develop
Code Mechanic: Better Than Ever Stress Testing
by Dave Evans
There are few things more frustrating than losing access to your debugging tools due to a freeze, because you can't fix what you can't diagnose. The best course is to stop freezes before they start, so I'd like to share a common cause of freezes I've found. I'll also discuss some of the stress-testing options that are available to help you catch freeze-causing problems you might have missed, including an improved debugging tool.
Veteran readers of develop may notice a new title for this column. The previous title, "Balance of Power," was apt for its time, indicating a focus on PowerPC issues. But now that all new MacOS computers are PowerPC-based, everybody's writing about PowerPC, and my efforts in this area are complete. This new title reflects a focus on the mechanics of code tuning, with tips for improving your application's performance and stability, which I hope you'll find just as useful.
Protect Your Vectors
Even if you use a PowerPC-based MacOS computer, the first 256 bytes of memory are dedicated to 680x0 exception vectors, which the 680x0 software emulator uses to emulate 680x0 exceptions and interrupts. On a 680x0-based computer, these values are read by the processor itself when handling an exception or servicing an interrupt.
Under System 7, these important vectors are not memory protected. Any program can read from or write to them, possibly resulting in a serious failure. While not all of the vectors are used, modifying some of them will cause an immediate freeze, leaving you without access to your debugging tools. You probably don't access these vectors intentionally, but it's easy to do so accidentally by de-referencing a nil pointer or empty handle.
Unintentionally reading from these vectors will produce an unpredictable result. Most of the vectors hold the addresses of special system routines, but they can have any value, and the values vary significantly from one computer model to another. As an example of how easy it is to cause a problem in this area, take a look at the following C code, similar to that found in some applications:
front_window = FrontWindow();
if (front_window->windowKind < 0)
    MyDeskAccessoryRoutine(front_window);
The developers didn't realize that FrontWindow can return nil when no windows are open. In that case the application de-references the nil pointer and makes a logical decision based on the sign of the half word at $6C in low memory, which is the high half of the interrupt level 3 vector. On most Macintosh computers released before 1995, this vector pointed into ROM starting at address $40800000. Because of this, the applications would test the high half word value of $4080, and they wouldn't run the desk accessory routine. This was the right behavior, but for the wrong reason; disaster was averted by luck.
On all PCI-based PowerPC computers, ROM starts at location $FFC00000. During the development of these computers, we found that applications with code like the above would crash. Read as a signed half word, the new value $FFC0 is negative, so in code like the example the test succeeds and the desk accessory routine runs with a nil window pointer. We were able to work around the problem by changing the interrupt level 3 vector to point to a routine in RAM. This changed the high half word to a small positive number, and the applications behaved as expected. Still, it would have been better to avoid the problem in the first place. The following code is an example of a better, crash-free approach:
front_window = FrontWindow();
if (front_window && front_window->windowKind < 0)
    MyDeskAccessoryRoutine(front_window);
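To see why moving ROM changed the outcome of the unchecked test, here is a minimal portable sketch of the comparison. The function name would_pass_window_kind_test is hypothetical, and the two addresses are just the ROM base addresses quoted in this column, standing in for whatever the interrupt level 3 vector actually held on a given machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: takes the 32-bit value stored in a vector and
   reports whether its high half word, read as a signed 16-bit
   windowKind field, would pass the "windowKind < 0" test. */
bool would_pass_window_kind_test(uint32_t vector_value)
{
    uint32_t high_half = vector_value >> 16;   /* 0x0000 .. 0xFFFF */

    /* Sign-extend by hand so the result does not depend on how the
       compiler converts out-of-range values to a signed type. */
    int32_t window_kind = (high_half >= 0x8000u)
                              ? (int32_t)high_half - 0x10000
                              : (int32_t)high_half;

    return window_kind < 0;
}
```

With a vector pointing into ROM at $40800000 the function returns false, and with ROM at $FFC00000 it returns true, which matches the change in behavior described above.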
Checking for nil pointers or handles is one way you can avoid these crashes in the first place. Checking for empty handles is another necessary step, since unlocked relocatable blocks that are marked purgeable may disappear any time memory can move.
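The two checks can be folded into one test. The sketch below is an assumption-laden illustration rather than Toolbox code: Ptr and Handle are declared locally as stand-ins for the Toolbox types so the example compiles anywhere, and handle_is_usable is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the Toolbox types, declared here only so the
   sketch is self-contained. */
typedef char *Ptr;
typedef Ptr  *Handle;

/* Hypothetical helper: a handle is safe to de-reference only if the
   handle itself is not nil AND its master pointer is not nil (a nil
   master pointer means the block is empty, for example after a purge). */
bool handle_is_usable(Handle h)
{
    return h != NULL && *h != NULL;
}
```

In real code you would apply a check like this to a handle before de-referencing it whenever memory could have moved since the handle was last known to be valid.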
To detect problems with purgeable blocks, you'll need tools to stress test your application. Utilities that display heap zones, allowing you to compact and purge a heap on demand, are a good start. For serious testing, however, you'll need a stress tool that operates all the time. One good tool for this is MemHell, which will compact and purge your heap whenever a Memory Manager routine that might move or purge memory is called. This slows down execution of your tests, but it will flush out problems with purgeable blocks.
So, while accidentally reading from low memory can cause unexpected results, accidentally writing to low memory can be fatal, and this is one of the most common causes of freezes that I've noticed. You may think this could never happen in your code, because none of your blocks are purgeable and you always check errors after allocating pointers. Think again; there are plenty of other opportunities. Do you check for an error after every GetResource call? Getting an unexpected error (from a corrupted resource file, for example) is one way you can end up with a nil handle. Besides diligent review of your code, you need to do stress testing to flush out possible errors, or freezes are likely to result.
Are You Stressed Enough?
There are a number of tools to help add stress to your testing. I've already mentioned MemHell for finding problems with purgeable blocks. You'll similarly need a tool to find reads and writes to the exception vectors.
The simplest choice is the ubiquitous and venerable EvenBetterBusError, written by Greg Marriott. This tool safeguards the first four bytes of memory, which are very often accidentally written over or read from. To detect reads, it places in the first four bytes of memory a value which when de-referenced will cause a crash. If you use a nil pointer or empty handle, the illegal value is likely to be used as data or de-referenced, leading to a crash. To detect writes, it checks periodically to see if the value that it placed has been overwritten; if so, you'll be notified with a DebugStr message. EvenBetterBusError is included as a dcmd in MacsBug beginning with version 6.5.4.
I've extended EvenBetterBusError to be more aggressive. The new version, YetEvenBetterBusError, writes a value over the first 256 bytes of memory which will cause a crash into your debugger when de-referenced. It also checks periodically for writes to these locations, but more frequently than EvenBetterBusError does. Like EvenBetterBusError, upon noticing a write to these locations it will notify you with a DebugStr message. YetEvenBetterBusError can be found at www.mactech.com.
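The write-detection half of this technique can be sketched in portable C. This is not the tool's source: it guards an ordinary buffer rather than real low memory, the names guard_fill and guard_check are hypothetical, and POISON_BYTE is an arbitrary illustrative pattern (this column does not give the value the real tools write).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_SIZE  256u   /* size of the region being watched        */
#define POISON_BYTE 0xA5u  /* illustrative pattern, not the real value */

/* Fill the guarded region with the known pattern. */
void guard_fill(uint8_t *region)
{
    memset(region, POISON_BYTE, GUARD_SIZE);
}

/* Periodic check: return the offset of the first byte that no longer
   holds the pattern, or -1 if the region is untouched. */
long guard_check(const uint8_t *region)
{
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (region[i] != POISON_BYTE)
            return (long)i;
    }
    return -1;
}
```

A caller would run guard_check at regular intervals and report any offset other than -1, much as the tools report a stray write with a DebugStr message. The more often the check runs, the closer the report lands to the code that did the damage.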
To implement YetEvenBetterBusError, I had to sacrifice some compatibility with existing applications. Any application code that assumes the exception vectors start at address 0 will no longer function correctly. Most applications don't use the exception vectors directly, but some copy protection schemes do modify the vectors.
The correct way to determine the location of the exception vectors is by using the 680x0 instruction MOVEC, which must always be executed in supervisor mode. The location of the first vector is stored in the 680x0 VBR (Vector Base Register). To read the address, you would write the following assembly code:
    _EnterSupervisorMode        ; old sr result in d0
    movec   vbr,a0              ; get the vbr
    move.w  d0,sr               ; restore the old sr
Always use the VBR to find these vectors. Although early versions of the MacOS always placed them at location 0, they're now often elsewhere. When virtual memory is turned on, for example, the vectors will actually reside in the system heap, and the VBR will point to them. To maintain compatibility, however, if virtual memory doesn't handle an exception it calls through to the original vector table at location 0. This is why even with virtual memory on, writing over the low-memory exception vectors can still cause a freeze.
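Once you have the VBR, finding a particular vector is simple arithmetic: each vector is four bytes long, so a vector lives at the base address plus four times its vector number. The sketch below only illustrates that calculation; vector_address is a hypothetical helper, and the VBR value is passed in as a parameter because reading the real register requires the supervisor-mode assembly shown above.

```c
#include <stdint.h>

#define VECTOR_SIZE         4u   /* each 680x0 vector is one long word */
#define LEVEL_3_AUTOVECTOR  27u  /* vector number of interrupt level 3 */

/* Hypothetical helper: address of a vector, given the vector base
   (the value read from the VBR) and the vector number. */
uint32_t vector_address(uint32_t vbr, uint32_t vector_number)
{
    return vbr + vector_number * VECTOR_SIZE;
}
```

With a vector base of 0, vector_address(0, LEVEL_3_AUTOVECTOR) gives $6C, the low-memory location discussed earlier. With any other base, the same vector is found at that base plus $6C, which is why code that hard-codes address 0 can end up reading or patching the wrong table.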
YetEvenBetterBusError is able to overwrite and then monitor the first 256 bytes of memory by moving the exception vector table entirely. So, even when virtual memory is on, with YetEvenBetterBusError installed the original low-memory vectors are never called. This is why some existing applications may be incompatible with YetEvenBetterBusError.
A Cure for Test Anxiety
It's true that fully testing your code to reflect all possible configurations and user actions can be a near-impossible task. But the perceived stability of both your application and the computer depends on how well we all write and test our software. To do the best possible job, use the stress-testing tools mentioned in this column or in the article "Squashing Memory Leaks with TidyHeap" in this issue. Do the right thing: stress test, then relax!
Thanks to Pete Gontier, Chris Jalbert, Bo3b Johnson, Dave Lyons, Quinn "The Eskimo!", and Keith Stattenfield for reviewing this column.