December 95 - Kon & Bal's Puzzle Page: Zoning Out
Kon & Bal's Puzzle Page:
Zoning Out
Konstantin Othmer and Bruce Leak

See if you can solve this programming puzzle, presented in the form of a dialog
between Konstantin Othmer (KON) and Bruce Leak (BAL). The dialog gives clues to
help you. Keep guessing until you're done; your score is the number to the left
of the clue that gave you the correct answer. Even if you never run into the
particular problems being solved here, you'll learn some valuable debugging
techniques that will help you solve your own programming conundrums. And you'll
also learn interesting Macintosh trivia.
BAL I've got a small problem I'd like you to help me with.
KON Who's paying the airfare this time?
BAL Nothing like that. It's really quite straightforward, and surprisingly
reproducible. The problem is that sometimes when I'm using Microsoft Word 5.1a
and I pull down a menu, when I let go of the menu there's garbage on the screen
where the menu was.
KON That was a problem they were having in the beta release, but I think
it's fixed in the final version of Windows 95.
BAL Actually, this is on a Power Macintosh 6100, and I haven't yet
installed Windows 95 on top of my SoftPC, which runs on my 68000, which is
being emulated by Gary's emulator.
KON Microsoft is still in the loop.
BAL Well, it's not just a Microsoft problem. I can't seem to make it happen
with Word by itself. It only seems to happen if I run and quit cc:Mail before
running Word.
KON That darn Justice Department! Without them you could just be running
Microsoft mail, and you probably wouldn't have this problem.
Try running Word; then launch and quit cc:Mail. Does it still happen?
BAL Now Word is working fine. In fact, Word works in every case -- at least
as far as this problem is concerned -- unless I launch and quit cc:Mail before
launching and quitting Word. And the interesting thing is that it only happens
with the Modern Memory Manager on.
KON Just run your machine with the classic Memory Manager. I have problems
running THINK C's debugger when I use the Thread Manager and the Modern Memory
Manager. There's just too many of these kinds of bugs to deal with!
BAL Not so fast, QuickDraw. The Modern Memory Manager gives you lots of
great new features. First of all, your machine will run faster. In addition to
being ported native, it also uses much more efficient algorithms. It keeps
track of free blocks in a separate list, keeps track of heap zones to make
RecoverHandle work better, and has a back pointer so that blocks can be walked
either way, drastically decreasing heap-walking time and making things much
more efficient -- especially when virtual memory is on. Also, the Modern Memory
Manager was designed to be bus error proof, in that it returns from any
internally generated exception by returning an error to its caller (though this
was changed in the latest version of the Modern Memory Manager, as you may have
read in the Balance of Power column in develop Issue 23). Finally, in the old
Memory Manager moving the partition between the system and Process Manager
heaps was a total nightmare; this problem was solved in the Modern Memory
Manager.
KON Anytime you port something native you have two choices: rewrite the
code directly, preserving internal algorithms and data structures, or rethink
and reimplement, preserving only the top-level application interface. The first
choice virtually guarantees compatibility but makes it difficult to maintain in
the future, while the second gives you slightly less compatibility but a much
better upgrade path, better maintainability, and a much more efficient system.
It sounds like they went with the second choice, but at the obvious expense of
some short-term compatibility problems. And it seems like that's what we're
dealing with here.
BAL Thanks for the philosophy lesson. Are you going to solve the problem?
KON OK. Launch and quit cc:Mail and check all the heaps. Look for orphaned
memory, locked blocks being left around, or any other signs of an application
not properly cleaning up after itself.
BAL I need to install MacsBug to do that. I'll install version 6.5d11
because it has some new PowerPC features in case we need them.
KON I'm afraid we will.
BAL So after we quit cc:Mail, the system heap grew some, but all the heaps
seem fine. We have an extra 128-byte pointer, and we have five extra handles
for a total of almost 32K, but three of those (25K) are purgeable.
KON All this extra stuff lying around certainly explains why I have to
reboot every couple of hours.
BAL Yeah, and those OS engineers really worked on that problem. On System
7.5 you get a pretty picture and a nice thermometer bar!
KON So try the patch dcmd. It will tell you what traps have been patched.
Before you run cc:Mail, type
patch s
to
grab a snapshot of all the traps. When you're in cc:Mail, just type
patch
and
you'll get a list of all the traps that have been patched. It's a great way to
find random skankiness.
BAL The only OS trap that they patch is _Rename, and they patch the Toolbox
traps _Pack8, _UserDelay, _SysErr, _LoadSeg, _UnloadSeg, and _ExitToShell.
KON OK, and what's still patched after the application quits?
BAL Nothing. It seems to totally clean up.
KON Wonderful. What does Word patch?
BAL The OS traps _Rename and _CompactMem, and the Toolbox traps _Pack8,
_UserDelay, _HiliteWindow, _FrontWindow, _SysError, _LoadSeg, and
_ExitToShell.
KON There seems to be a lot of overlap. We should check a do-nothing
generic application. I bet the system is magically patching some stuff when it
runs an application.
BAL It turns out that all those traps except _HiliteWindow, _FrontWindow,
_CompactMem, and _UnloadSeg are always getting patched.
KON It figures. Word is augmenting parts of the Memory Manager and getting
in on some Window Manager action, and cc:Mail is playing games with the Segment
Loader. Where's that book on Macintosh programming guidelines?
BAL I don't think they read that in Redmond. By the way, even though menu
code is fairly boilerplate, this one's a mixed bag. Netscape, SimpleText, and
FindFile work fine, but Word and THINK Reference fail consistently.
KON Boy, times have changed. I remember when you used to just dive right
into MacsBug, disassemble a bunch of code, and get to the bottom of these
problems. Now you're looking at what SimpleText does compared to Word!
BAL I'm not the one who's doing it. I don't even touch the computer
anymore. It's one of my henchmen, Paul Young.
KON Anyway, there are two ways the bits behind the menus get redrawn. If
plenty of memory is available, they get back-buffered and restored with
CopyBits. If there's not much memory, an update event is generated.
BAL Since Word is the only application running at the time, we have plenty
of memory.
KON Set a breakpoint on CopyBits and pull a menu down. The first break will
be when the bits are being saved. Let's look at the address, step over the
call, and make sure the right data was put there. When you let the menu up,
you'll break on CopyBits again. Is the source data correct -- that is, is the
source our previous destination?
BAL The base address when the bits are restored isn't the same as the base
address when they get saved.
KON Where is the base address? Is it part of a handle that moved?
BAL The base address for the restore is $40810000.
KON Someone is dereferencing zero! It sounds like the bits are getting
saved in a handle, and somehow the handle is getting trashed. Let's follow the
handle from the save and see what happens to it.
BAL When the bits are being saved, the base address is part of a handle in
MultiFinder temporary memory. The handle is $438 bytes long.
KON What happens to that memory on the restore?
BAL The memory still exists, and the data is fine. It's just that the
PixMap doesn't point there anymore.
KON So we need to figure out where the Menu Manager is storing the PixMap
and why that location is getting trashed.
BAL The Menu Manager uses SaveBits and RestoreBits, which allocate memory
for the pixels using the offscreen buffer calls that return PixMaps. The PixMap
base address does double duty: when it's unlocked it's a handle; when it's
locked it's a pointer. There's a flag in rowBytes to indicate what state it's
in. To go from the locked state to the unlocked state, the GWorld routines call
RecoverHandle.
KON Let's break on RecoverHandle and see what we get back.
BAL It returns 0. But why?
KON It's kind of weird that this happens only with the Modern Memory
Manager. In the old Memory Manager, you had to set the heap zone before calling
RecoverHandle. The Modern Memory Manager relaxed this restriction and keeps
a tree of valid heaps. When you call RecoverHandle, it walks the heap tree. If
cc:Mail is somehow corrupting the tree, RecoverHandle will fail.
BAL Nice theory. How are you going to test that?
KON E.T.O. 17 has a debugging version of the native Memory Manager that
will print out diagnostics anytime weird stuff happens. Let's install it and
reboot.
BAL When you boot, you drop into MacsBug with the message "Bad pointer
being passed to RecoverHandle 00030020." It looks like "PC Exchange" was
loading.
KON Let's try booting with the extensions off. Use the Extensions Manager
so that you can keep MacsBug, the Memory control panel (so that we're sure
we're in the Modern Memory Manager), and the Debugging Memory Manager.
BAL When I run the Extensions Manager, I break into MacsBug with the
message "Bad handle; are you unlocking a fake handle?"
KON A complete treatise on all the memory crimes committed in the Macintosh
is beyond the scope of this column.
BAL Without superfluous extensions, the problem at boot time goes away, but
we still have the problem in Word.
KON Well, let's look at the zones and see if everything looks OK. Let's do
an hz to list all the heap zones.
BAL OK. But hz doesn't use the heap tree, so if you want to check the heap
tree you'll have to do it manually.
KON Great. I'll use the SmartFriends debugging trick and call Jeff to
figure out how to do that.
Jeff The heap tree is part of the zone header. The system zone starts at
$2800, and a pointer to the next zone starts at offset $20. $2820 contains
$1672DF0.
KON That should be the Process Manager zone. But that number is really big.
How could that be? How many fonts do you have installed?!
Jeff Since the system heap can grow, we put the Process Manager zone header
at the end of the block, so we don't have to move the header every time the
heap size changes.
BAL The next zone in the Process Manager is nil, since at the top level
there are only two zones: the system zone and the Process Manager zone.
KON Let's look at the child zones inside the Process Manager.
Jeff The child zones are pointed to by offset $24 in the zone header.
BAL The first child zone is the Word zone, which corresponds to what we got
from hz. And the Word zone header has no child zones.
KON So the world makes sense so far. Does the next zone pointer make
sense?
BAL It's kind of wacky. It points inside the Word heap!
KON That's a problem. Does that zone header look reasonable, at least?
BAL No. It's trash. It looks like Word code.
KON What happens if you don't run cc:Mail before running Word? And how does
the Memory Manager know how to update the zone headers? There's no call to
explicitly destroy zones, only create them.
BAL I'll take the second question first. Zones are created by InitZone, and
they're never explicitly destroyed. In the Modern Memory Manager, there's new
logic in DisposeHandle that checks to see if the handle is a zone; if so, it
assumes the zone is destroyed and updates the heap tree.
KON Will the skankiness ever end?
BAL If I run Word without first running cc:Mail, the heap tree is OK.
KON Now we just need to figure out why the heap tree is getting trashed.
Even though the tree update algorithm is implicit, it seems pretty good at
first blush. Let's go through the failing scenario and compare the heap zones
to the tree and figure out when they diverge.
BAL When we run cc:Mail, hz doesn't agree with the zone structure we get by
walking the heap tree. Here's what the two structures look like:

KON So the cc:Mail zone is smaller than the handle of the memory it's in.
Someone limited the size of the application zone. In the heap tree view, it's
clear why: another zone is being allocated; 32K is left between the zones, and
that space is being used for the stack.
BAL The reason hz can't find the second zone is that before the Modern
Memory Manager, no one kept explicit track of the zones. Basically, the hz
command has to search for the zones. It does this by starting from the system
zone, which is always pointed to by low memory (and is usually located after
the trap tables at $2800). From the system heap zone header, it can find the
zone trailer. Right after that block is the Process Manager zone header. It
walks all the blocks in a zone and finds all the handles that look like other
zones. It starts by assuming that the handle contains a zone, and then checks
to see if the zone header points to a block that looks like a trailer and if
the trailer points back to the zone header. When it looks for zones inside
other zones, it assumes that they begin either at the start of the handle or
right after another zone. Since cc:Mail has its stack space between the two
zones, the hz command can't find it.
KON OK. Unfortunately we're not debugging the hz command. But that probably
gives us a clue as to why the Modern Memory Manager is getting confused. It
seems to keep pretty good track of the zones that are getting created, since
that's easy by just watching InitZone. But it gets confused when the zones are
being disposed of, since it does that by watching DisposeHandle.
BAL Exactly. The heap tree gets trashed when cc:Mail quits, since the
Modern Memory Manager assumes that there's only one zone (and perhaps its
children) in any handle. So when it sees the dispose, it throws away the first
zone and all its children, but it doesn't throw away the second zone. It works
fine with the old Memory Manager since no one ever explicitly keeps track of
all the zones. But the Modern Memory Manager uses the heap tree for
RecoverHandle, and the tree is trashed, so either the machine crashes or you
get garbage.
KON That's pretty interesting. In this case, neither cc:Mail nor Word did
anything wrong. The way cc:Mail used the Memory Manager was nonstandard, and
when the algorithms in the Modern Memory Manager changed, there were some
interesting cases that fell through the cracks. I think the newer version of
cc:Mail no longer allocates zones this way. And the Memory Manager will
undoubtedly soon be smarter.
BAL Nasty.
KON Yeah.
KONSTANTIN OTHMER AND BRUCE LEAK
KON has been holding a steady job at Catapult Entertainment for many months
now, but he spends more time playing soccer than working. BAL is at the front
of the self-employment line and has finally moved out of his hotel and into a
house. Rumor has it that behind the house there's a big archery field.