June 93 - KON & BAL'S PUZZLE PAGE
FINDER++
KONSTANTIN OTHMER AND BRUCE LEAK
See if you can solve this programming puzzle, presented in the form of a dialog
between Konstantin Othmer (KON) and Bruce Leak (BAL). The dialog gives clues
to help you. Keep guessing until you're done; your score is the number to the left of
the clue that gave you the correct answer. These problems are supposed to be tough. If
you don't get a high score, at least you'll learn interesting Macintosh trivia.
KON I'm trying out this new C compiler to see what we can do to make the system and
Finder smaller and faster.
BAL Wait. Since when has the Finder been written in C?
KON It's better than that! It's actually C++, with some assembly routines so that we can
claim the copyright goes back to 1983.
BAL Oh, that explains how the System 7 Finder got so much bigger. I thought it was just
the About box. I'm running System 6 on my PowerBook 100. I'd sure like to get a
smaller and faster version of System 7. Are you making any progress?
KON Well, yes and no. The compiler output is certainly smaller but I haven't nailed down
how much faster it is. When I boot up and the Finder launches, the machine restarts,
which relaunches the Finder, which causes the machine to restart, and so on. It all
happens pretty fast but doesn't seem all that useful.
BAL What machine is this on?
KON Macintosh Classic -- the original, not the Macintosh Classic II.
BAL Somehow the compiler is generating bogus code that causes the system to restart. So I
compare the code from the new compiler to the code from the old compiler, look at
the differences, and see if they make sense.
KON Everything's different: 42,000 bytes went away and the rest is totally different. This
isn't a minor compiler revision. We're talking Advanced Technology here. Where are
you going to look?
BAL OK, OK. Let's debug it. I set an ATB on _Launch and then another on _InitGraf.
KON OK. You break at _Launch and then after you Go you break at _InitGraf.
BAL I set an ATB on _WaitNextEvent.
KON You break at _WaitNextEvent.
BAL I say Go and see if I get back to _WaitNextEvent again.
100 KON The machine reboots almost immediately.
BAL I go back to the same place and instead of saying Go I trace over _WaitNextEvent.
95 KON The machine crashes into MicroBug. But it's not your ordinary crash into MicroBug.
The screen is trashed and you can't type anything. But it looks as though MicroBug is
trying to come up.
BAL Can I hit the NMI button?
90 KON You can press it all you want, but it doesn't do anything. And, by the way, G -1
doesn't work either.
BAL Hmmm. It seems as if something is seriously wrong with _WaitNextEvent. Did you
recompile the Process Manager, any DAs, or other stuff?
85 KON Nope, I only recompiled the Finder. When I get that working, I'll get around to the
rest.
BAL So you didn't recompile the Finder extensions? Since the C++ virtual function tables
are different, all your existing Finder extensions are incompatible and maybe that's
what's hosing you.
80 KON None of the extensions are active, and even if they were, the Finder verifies their
versions. What do you expect? They're object oriented. Of course it works.
BAL Of course. Well, since we couldn't make it across _WaitNextEvent, let's step into it.
75 KON As soon as you step, you get the same weird crash into MicroBug.
BAL I just step into it?
KON Yes.
BAL As soon as I step, pending interrupts come in and kill me. So I disable interrupts with
an SR = 2700 and try stepping again.
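The 68000 status register is 16 bits wide, and the value conventionally used to run in supervisor mode with interrupts masked is $2700. The sketch below decodes such a value into its fields. The function name decode_sr is made up for illustration; the bit positions are those of the 68000 status register.

```python
def decode_sr(sr):
    """Split a 16-bit 68000 status register value into its fields."""
    return {
        "trace": bool(sr & 0x8000),        # bit 15: trace mode
        "supervisor": bool(sr & 0x2000),   # bit 13: supervisor state
        "interrupt_mask": (sr >> 8) & 0x7, # bits 10-8: interrupt priority mask
        "ccr": sr & 0x1F,                  # bits 4-0: condition codes (X N Z V C)
    }

fields = decode_sr(0x2700)
print(fields)
```

With $2700 the supervisor bit is set and the mask is 7, so interrupts at levels 1 through 6 are held off. A level 7 interrupt is nonmaskable and would still get through.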
70 KON Same crash.
BAL Seems like there might be something wrong with MacsBug.
KON Let me make sure I'm following you here. Only the Finder is recompiled and you
blame the strange crashes on MacsBug? I'm going to have trouble selling that one.
BAL Clearly there's something wrong with the recompiled Finder. It's probably trashing
MacsBug memory.
65 KON Come on. MacsBug does some sort of a checksum on itself and tells you if it's been
altered. When you break at _WaitNextEvent, you don't get any messages to that
effect.
BAL You got me there, KON. So you're saying that MacsBug is in perfect working order at
this point. I can do an IL or whatever, but if I step I'm dead?
KON Perfect working order? Same as it ever was. But the Surgeon General has determined
that stepping or tracing at this point causes ill effects.
BAL This is not my beautiful MacsBug. If I trace after I hit _InitGraf, is everything fine?
60 KON No problem.
BAL So I do an
ATB ';t ;g'
which breaks on every trap, traces over it, and then continues. That way I can see what
the last trap I hit was.
55 KON The machine runs for a while, but when you crash and burn into MicroBug, you lose
your MacsBug screen.
BAL Fine. I set up another screen, put MacsBug on that screen using the Monitors control
panel, and use the SWAP command so that MacsBug is always visible. That way when
I crash I can see what just happened.
KON Great strategy for a modular Macintosh, but this is on a Macintosh Classic. I'd let you
figure it out that way except you used up your whole budget flying to North Dakota a
few puzzles ago.
BAL I was hoping you'd forget that. OK, fine. Someone must be trashing low memory, so
I'll use Bo3b Johnson's totally awesome Blat dcmd. It'll catch any read or write from
memory locations $0-$100.
50 KON You're on a Macintosh Classic, which doesn't have an MMU. That dcmd works via
the MMU.
BAL KON! Those correspondence classes are finally paying off. So I'll narrow down the
area that's causing the problem by doing an ATB 10 to skip over 16 ($10) traps at a
time until the machine crashes into MicroBug. If it takes five times to crash, the next
time I'll do an ATB 40, and then an ATB 4, until it crashes. After I do this enough times I'll know
what was the last trap that was successfully executed, and I can go from there.
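BAL's narrowing strategy can be sketched as a coarse-to-fine search. This is only a toy model: the function last_good_trap and the idea of numbering traps from 1 are assumptions made for illustration, and each pass of the loop stands for one reboot of the real machine.

```python
def last_good_trap(first_bad, strides=(0x10, 0x4, 0x1)):
    """Return the count of traps that execute safely before the fatal one.

    first_bad is the (unknown to the debugger) number of the trap whose
    execution kills the machine. Each stride models one reboot: skip the
    traps already known to be safe, then advance `stride` traps per break
    until the next advance would run through the fatal trap.
    """
    safe = 0
    for stride in strides:
        pos = safe                      # like "ATB 40" after the first pass
        while pos + stride < first_bad: # this break is reached without crashing
            pos += stride
        safe = pos                      # last position we actually stopped at
    return safe

# Crashing on the fifth $10-sized step means $40 traps were safe.
print(hex(last_good_trap(0x47, strides=(0x10,))))
print(hex(last_good_trap(0x47)))
```

The first call reproduces BAL's example: four breaks succeed, the fifth crashes, so the next run can skip $40 traps and continue with a smaller stride.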
45 KON Rather than crashing, the machine is now rebooting.
BAL OK, so what's the last trap called before the machine reboots?
40 KON _WaitNextEvent.
BAL Fabulous. Déjà vu. Is this a Never Ending Story? And when I'm at _WaitNextEvent I
can't step or trace or anything?
KON Well, you can't step or trace. That's all you've tried so far.
BAL So I set a breakpoint on the first instruction of _WaitNextEvent and say Go.
35 KON You crash into MicroBug, just like before.
BAL OK, what's the current score? Can we call it quits?
KON I wouldn't say you aced this one. Luckily we're getting paid per word, so let's keep
going.
BAL But when I was at _InitGraf, I could trace. So something's hosing MacsBug between
_InitGraf and _WaitNextEvent. I'll do the ATB 10 trick like before, but this time I'll
try tracing after every break. That way I can figure out where MacsBug is getting
mauled.
30 KON You figure out that you can trace over a call to _InitWindows, but when you trace
over the next trap, a call to _GetResource, you crash into MicroBug.
BAL So I go to _InitWindows and trace until I get to the call to _GetResource. If it's a long
way, I do a T 1000. If that crashes, I reboot and do a T 500, then a T 250, and so on,
until I find the offending instruction.
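The T 1000, T 500, T 250 routine is a bisection over the number of instructions traced. A minimal sketch follows, assuming a hypothetical predicate crashes_when_tracing(n) that reports whether tracing n instructions from the starting point brings the machine down. On real hardware each call costs a reboot.

```python
def find_offending_step(crashes_when_tracing, upper=1000):
    """Return the smallest trace count that crashes.

    Assumes tracing `upper` instructions is already known to crash and
    tracing 0 instructions is safe.
    """
    lo, hi = 0, upper          # lo: known safe count, hi: known crashing count
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes_when_tracing(mid):
            hi = mid           # crash: the culprit is at or before mid
        else:
            lo = mid           # survived: the culprit is after mid
    return hi

# Toy stand-in for the machine: instruction 337 is the fatal one.
print(find_offending_step(lambda n: n >= 337))
```

Ten or so reboots are enough to pin down one instruction out of a thousand, which is why the halving is worth the tedium.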
25 KON The offending instruction is a
MOVE.L d0,20(a2)
BAL What's in A2?
20 KON $100.
BAL Writing to low memory like this sounds like a bad idea. My guess is that A2 is trashed
and we're pounding an important vector. What's at $120?
15 KON That's MacJmp.
BAL Aha! MacJmp is the vector that exception code uses to go to the debugger. Once you
trash that, all bets are off.
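The arithmetic behind BAL's question is just base plus displacement. The sketch below assumes the displacement in the instruction is read as hexadecimal, as BAL does when he asks about $120; the helper name long_write_range is made up for illustration.

```python
MACJMP = 0x120  # low-memory location named in the dialog

def long_write_range(base, displacement):
    """Addresses touched by a MOVE.L to displacement(base): four bytes."""
    start = base + displacement
    return range(start, start + 4)

touched = long_write_range(0x100, 0x20)   # A2 = $100, displacement $20
print([hex(a) for a in touched])
print(MACJMP in touched)
```

A long-word store covers four bytes, so the write replaces the whole vector at MacJmp rather than just part of it.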
KON Yeah, setting ATBs works because MacsBug patches the trap dispatcher and looks for
the A-traps you have breaks on. If it encounters one, it just drops into MacsBug
directly. Other breakpoints are set by replacing the existing instruction with a trap
instruction. When these instructions are processed, they go through MacJmp. When
MacJmp gets trashed, tracing, stepping, and setting breakpoints no longer work, as
we found out.
BAL Nasty.
KON Don't try to finish up so fast! You still haven't figured out why the machine is
rebooting.
BAL The new compiler must do a better job of register allocation and actually use all the
registers in its optimizations. Some Finder glue routine you called must have trashed A2.
KON Exactly. An easy problem to fix, though. The Finder was calling an assembly routine
that hammered A2. After you fix the bug and build a new Finder, the machine still
restarts.
BAL So I set an ATB on _WaitNextEvent, since that was as far as we got last time, and try
to trace over it.
10 KON OK. No problem.
BAL Whew! Finally I get past that _WaitNextEvent. Let's go for two. I say Go and see if
we hit _WaitNextEvent again.
KON Nope. The machine restarts.
BAL After the first _WaitNextEvent I do the trick with T 1000, T 500,
T 250, and so on, until I find the offending instruction or subroutine. If the problem is
occurring in a subroutine, I go into it and do the same thing. At some point this
process has to stop and I'll find the problem instruction.
5 KON The offending instruction is an
LEA 13(a7),a7
BAL Well, that's bogus. Using an odd address on a 68000 will cause an address error.
KON Yeah, but the machine is rebooting.
BAL I get it. It's an odd address in the stack pointer. The Macintosh gets an address error
because of the odd address. When it goes to process the exception, the exception
handler gets an address error trying to push the exception frame onto the stack. If the
Macintosh ran in user mode, it wouldn't have this problem, since it could switch to
supervisor mode -- essentially a clean machine with a properly aligned stack pointer
-- to handle the exception. But since it runs in supervisor mode, hosing the stack
pointer messes the machine up to the point where it can't even handle an exception, so
it reboots.
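BAL's reasoning can be played out in a toy model. Everything here is a simplification made up for illustration: the ToyCPU class, the stack addresses, and the run_bad_lea helper are assumptions, and the only behavior modeled is that a word push to an odd address faults and that exceptions are always stacked on the supervisor stack.

```python
class AddressError(Exception):
    """Word access at an odd address."""

class ToyCPU:
    def __init__(self, supervisor):
        self.supervisor = supervisor
        self.usp = 0x8000  # user stack pointer (arbitrary even value)
        self.ssp = 0x4000  # supervisor stack pointer (arbitrary even value)

    def push_word(self):
        # A7 means SSP in supervisor mode and USP in user mode.
        sp = self.ssp if self.supervisor else self.usp
        if sp % 2:
            raise AddressError(hex(sp))
        if self.supervisor:
            self.ssp = sp - 2
        else:
            self.usp = sp - 2

    def take_exception(self):
        self.supervisor = True     # exception frames go on the supervisor stack
        for _ in range(7):         # a 68000 address error frame is 7 words
            self.push_word()

def run_bad_lea(cpu):
    # LEA 13(a7),a7: add an odd amount to the active stack pointer.
    if cpu.supervisor:
        cpu.ssp += 13
    else:
        cpu.usp += 13
    try:
        cpu.push_word()            # the next ordinary stack access
        return "ok"
    except AddressError:
        try:
            cpu.take_exception()
            return "handled"
        except AddressError:
            return "double fault"  # the handler itself cannot be entered

print(run_bad_lea(ToyCPU(supervisor=True)))
print(run_bad_lea(ToyCPU(supervisor=False)))
```

In the supervisor-mode case the exception frame has nowhere to go, which corresponds to the reboot KON describes. In the user-mode case the supervisor stack pointer is untouched and the exception can be taken normally.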
KON Yeah. We were working on cleaning up the stack after function calls in the compiler
and had a small problem with the way Booleans are handled. Since a Boolean is only a
char, which is one byte, the compiler thought it needed to clean up an odd amount of
space from the stack. Once we explained to the compiler that stacks must be word
aligned, the problem went away.
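The compiler bug KON describes comes down to how argument sizes are summed. Below is a minimal sketch, assuming a hypothetical call whose arguments are three 4-byte values and one 1-byte Boolean; the function names and the argument list are invented for illustration and are not taken from the actual compiler.

```python
def naive_cleanup(arg_sizes):
    """Bytes to pop if each argument is counted at its declared size."""
    return sum(arg_sizes)

def aligned_cleanup(arg_sizes):
    """Bytes to pop when every argument is rounded up to a word (2 bytes)."""
    return sum((size + 1) & ~1 for size in arg_sizes)

args = [4, 4, 4, 1]            # hypothetical: three longs and a Boolean
print(naive_cleanup(args))     # odd total, leaves A7 misaligned
print(aligned_cleanup(args))   # even total, A7 stays word aligned
```

Rounding each argument up, rather than rounding only the final total, matches how the values would actually be pushed: a one-byte argument still occupies a full word on the stack.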
BAL Two bugs in one Puzzle Page!
KON Nasty.
BAL Yeah.
KONSTANTIN OTHMER AND BRUCE LEAK Long-time Puzzle Page fan Al Gore recently invited KON and BAL to upgrade
the White House situation room to BALKON-4, the latest in networked Spaceward Ho! technology. On his lunch hour,
KON debugged Clinton's economic plan and found the memory leak that was causing that $50 billion Medicare shortfall.
BAL is now working on an audio-animatronic Silicon Valley executive so that the President can always have one at his
side.
Bo3b Johnson's Blat dcmd can be found on this issue's CD and on the E.T.O. disc. (The "3" in Bo3b's name is silent.) Blat
is written up in the Macintosh Debugging article in Issue 13 of develop.
SCORING
- 75-100 How long have you been a member of the Liar's Club?
- 50-70 Sharpshooter. You win the (virtual) kewpie doll.
- 25-45 A valiant effort. These puzzles are hard!
- 5-20 Brush up for Issue 15's Puzzle Page.
Thanks to Gary Davidian, scott douglass, and Jean-Charles Mourey for reviewing this column.