September 95 - KON & BAL's Puzzle Page: Video Nightmare
Ian Hendry and Eric Anderson
See if you can solve this programming puzzle, presented in the form of a dialog
between a pseudo KON (Ian Hendry) and BAL (Eric Anderson). The dialog gives
clues to help you. Keep guessing until you're done; your score is the number to
the left of the clue that gave you the correct answer. Even if you never run
into the particular problems being solved here, you'll learn some valuable
debugging techniques that will help you solve your own programming conundrums.
And you'll also learn interesting Macintosh trivia.
BAL I've got one for you, KON: I just updated to System 7.5 on my 8100/80
AV. Everything seemed OK for a while. I was comparing Scenery Animator to
Vistapro and I noticed that my cool new desktop pattern had disappeared. It was
there when I booted, but just as the Finder was coming up, the desktop changed
to a black-and-white pattern.
KON That's easy. Go back to System 7.1 and the world will be fine again.
Next.
BAL Hey, 7.5 is the source of much wonderment. It's really a lot of fun!
KON I don't know that much about 7.5. Metrowerks and THINK C seem to run
fine on 7.1. Is this part of that new puzzle CDEV that was added to spruce up
the system?
BAL Quit trying to change the subject. My desktop pattern went away and I'm
not happy about it.
KON Hmm. Did you change anything on your machine?
BAL I turned on VM for the memory-hungry rendering stuff.
KON So turn off VM and see if the problem goes away.
BAL Wow, that worked great. Now all I have to do is buy $1800 worth of
tariff-enhanced RAM so I can render my flyby of the Pentagon.
KON I won't ask what you're up to these days. My recent stock dealings have
left me low on bail money. Did you try it on 7.1.2?
BAL It didn't happen on straight 7.1.2, but I installed System Update 3.0
and it happened. VM must have changed in this update.
KON Sounds like a VM problem all right. Paste the older VM resources into
the new system. (I love component software.)
BAL The problem is still there. Does this let VM off the hook?
KON VM is never off the hook, but if your only problem is that the desktop
pattern is black and white, maybe you should stop whining and do your work.
BAL No, this is more serious. I opened the Monitors control panel and there
was no depth list and no monitor tile in the rearrange section.
KON Well, this should be pretty straightforward. Does it happen every
time?
BAL No such luck. This one is really evil. I've been trying to get a
reproducible case for days. Sometimes it happens right away, sometimes it goes
away for hours. Once it starts happening, it seems to keep happening across
restarts. It doesn't happen as reliably across shutdowns. It seems to happen
more in millions of colors but will happen at other depths too. Switching the
display back and forth between a 13-inch and a multiple-scan display sometimes
causes the problem to show up. Changing VM and RAM disk settings seems to
affect the reproducibility.
KON Cool! The random bugs are always the most fun. Let's get our trusty
MacsBug and see if we can find where it's going bad. Look at the video driver
and GDevice.
BAL When I try to enter MacsBug, the mouse freezes but MacsBug doesn't come
up.
KON Dang, I hate it when the tools don't work.
BAL Perhaps we should devote this column to model rocketry and video games
instead.
KON Now there's a thought. If I type Command-G, does the mouse unfreeze, or
do I have to reboot?
BAL The machine comes back when you hit Command-G.
KON So MacsBug is there; you just can't see it. I'll use log foo to dump
the output to a file named foo, followed by
dm @@thegdevice gdevice
Then I'll use drvr to see if the video driver is alive.
BAL Nice log trick, KON! The GDevice is fine, and the driver looks as if
it's loaded and active.
KON Drivers and VM sometimes don't get along. Maybe the driver is doing
something wrong. Did you try other video cards?
BAL It seems to happen only on Power Macintosh AV models.
KON It's the nasty "fungus" problem from Issue 17 all over again, or maybe
your card has gone bad.
BAL Well, I'm pretty sure there was no "fungus" around when the AV card was
developed, so it's probably not that. Besides, we're sticking to production
software in this column from now on. Anyway, I thought it might be my card,
too, so I borrowed your card. Thanks for the loaner. I turned on VM and it
worked like a champ . . . for a while. Now I've got the same problem again.
KON You killed my card?
BAL Admittedly, it's a computer that can do anything. But for the purposes
of this column, that idea is pretty far-fetched.
KON Here's what might be happening: Early in the boot process, VM isn't
present. For each card, the system calls the card's PrimaryInit code and
creates a GDevice. When VM loads, it changes the logical address mappings. When
the driver is called again, it assumes a one-to-one logical-to-physical mapping
of RAM, so the card starts responding to bogus address cycles. This confuses
the card's bus translator, and . . .
BAL Whatever. Any other stabs in the dark? Lose five points and try
again.
KON OK. Perhaps there are subtle timing variations when VM is present, and
the video card might have borderline hardware that's affected by these timing
dependencies. Or maybe the card's controller gets into a state where it no
longer responds to its address space.
BAL You're getting desperate. It's not a hardware problem. The declaration
ROM is there and everything looks fine. You can't blame this on the hardware.
Let's once again follow the software decision tree.
KON Yeah, you're probably right. Now that I think about it, those ideas
seem really out there.
So what you're telling me is that the desktop pattern is black and white,
MacsBug isn't working, and the Monitors control panel doesn't show depths or a
monitor tile. Let's find out when MacsBug stops working.
BAL When the machine boots, MacsBug is working, but by the time you get to
the Finder, it's gone. It seems to vanish early in the boot process.
KON See if you get to the first Display Manager call with an atb
DisplayDispatch or atb ABEB.
BAL OK. MacsBug is still alive.
KON I'll set a breakpoint just after the first Display Manager call and
then go.
BAL Yep. There doesn't seem to be a problem now. But the weird thing is, if
you trace over the Display Manager call and then type go, MacsBug will
eventually go away.
KON Wacky. Sounds like a Display Manager bug.
BAL Earlier you said it was a VM bug.
KON Both have been convicted criminals in the past, so you can't blame me for
thinking they're suspects. I'll bet you a buck it turns out to be neither! Do
an atb on the Display Manager call and trace from there until MacsBug goes
away; it shouldn't take too long.
BAL Sorry, MacsBug never goes away. The problem isn't reproduced.
KON So what you're telling me is that if I trace over the Display Manager
call and then go, I can't get into MacsBug after I'm done booting. But if I
keep tracing, the machine boots fine and MacsBug is always available.
BAL That's right. Let me help you along a little bit. There's an
SSecondaryInit call (which runs SecondaryInit code for the video cards) just a
few 680x0 instructions after the first Display Manager call. Does that help at
all?
KON What happens if we do an atb on SSecondaryInit?
BAL I can't reproduce the problem. If I set a breakpoint just after the
Display Manager call and go, the problem disappears. If I do an atb on the
Display Manager call, and either go from there or trace over it and go, the
problem happens. If I trace over it for a few instructions and go, the problem
doesn't happen.
KON So, what are the "few" instructions? It looks like they're the ones
whacking the video driver.
BAL No, they're just a few MOVE instructions to innocuous RAM locations --
nothing that should touch video.
KON What does this Display Manager call do? Could it be hosing anything in
the slots?
BAL I don't think so. I used MacsBug to skip it entirely and the problem
still happens.
KON This isn't getting us anywhere. Maybe the desktop pattern problem has
some better clues. The General Controls control panel in System 7.5 has an INIT
resource that calls HasDepth to decide whether to use the color pattern. It
then sets a PRAM bit to remember whether to use a color pattern across
restarts. When the desktop pattern is black and white, I'll use the log trick
to find out what the HasDepth call is returning.
BAL It returns an error.
KON Aha! Since HasDepth returns an error, the INIT resource thinks it's on
a display that can support only one bit per pixel (black and white), so it
disables the color desktop pattern and resets the PRAM bit; the color desktop
pattern is now gone forever.
BAL OK.
KON Let's trace HasDepth and find out what's wrong.
BAL It looks as if the Slot Manager returns the correct values for the
active functional sResource of the card but fails to find the depth. It returns
-316, an smInitStatVErr. According to Errors.h, this error indicates that the
siInitStatusV field was "negative after primary or secondary init." This means
the card's PrimaryInit or SecondaryInit code returned an error.
KON We can bet it's not PrimaryInit, because the GDevice is good. If the
error had happened in PrimaryInit, QuickDraw would have gotten an
smInitStatVErr when it called the Slot Manager to build the GDevice.
BAL You're finally making some progress!
KON MacsBug also makes Slot Manager calls (when it tries to switch depths),
which explains why it fails. That means the problem must be with the
SSecondaryInit call. Once the Slot Manager gets this error, most Slot Manager
calls return errors.
BAL But this doesn't explain what's causing the Slot Manager to fail to
begin with, or why the problem goes away every time we get close to it with
MacsBug.
KON Maybe we should try this with a BootBug card. Can you still get one?
BAL Maybe, but we're doing pretty well here. Let's keep going a little
longer.
KON Let's try to figure it out by brute logic. What does the SecondaryInit
code look like on this card?
BAL MOVE.L A0,A0.
KON That's it? Two bytes? No RTS? Cool! A bug in the AV card ROM! Does this
mean we all get new cards with the new 2.0 ROM? Maybe they can simplify that
complicated Monitors control panel Options dialog at the same time. How does it
boot at all?
BAL Good question. Designing Cards and Drivers for the Macintosh Family
says that the SecondaryInit entry on a video card is an SExecBlock, which is a
header followed by actual code. The Slot Manager validates the header before it
executes the code. The first byte of an SExecBlock is the revision number, and
the Slot Manager checks for a revision byte of 0x02. Since MOVE.L A0,A0
assembles to the opcode 0x2048, the first byte of the AV card's SecondaryInit
entry is 0x20. That fails the revision check, so the Slot Manager treats the
entry as bogus and will never try to execute the SecondaryInit code.
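The header check BAL describes can be sketched in a few lines of C. This is
only an illustration, not Slot Manager source: the helper names
sexec_header_looks_valid and opcode_to_rom_bytes are hypothetical, and the only
facts taken from the dialog are the expected revision byte (0x02) and the
opcode 0x2048.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Revision byte the dialog says the Slot Manager expects in an SExecBlock. */
enum { kExpectedSExecRevision = 0x02 };

/* Hypothetical helper: split a 16-bit 680x0 opcode into the two bytes as they
   would sit in ROM (the 680x0 is big-endian, so the high byte comes first). */
void opcode_to_rom_bytes(uint16_t opcode, uint8_t out[2])
{
    out[0] = (uint8_t)(opcode >> 8);
    out[1] = (uint8_t)(opcode & 0xFF);
}

/* Hypothetical helper: true only if the entry starts with the expected
   revision byte. A real header has more fields; only the first byte is
   modeled here because that is all the dialog describes. */
bool sexec_header_looks_valid(const uint8_t *entry, size_t length)
{
    return entry != NULL && length >= 1 && entry[0] == kExpectedSExecRevision;
}
```

Passing the two bytes of 0x2048 through sexec_header_looks_valid gives a first
byte of 0x20, so the check fails and the code behind the header would be
skipped.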
KON So it's pretty lame, but it should work, right?
BAL Yes. Remember, we added SecondaryInit to the boot process because some
machines didn't have 32-bit QuickDraw in ROM. On a machine without 32-bit
QuickDraw in ROM, video cards have to disable their functional sResources with
direct bit depths (16 and 32) in their PrimaryInit code, because the
PrimaryInit code runs before the disk is up and the cards can't tell if the
system has 32-bit QuickDraw installed. SecondaryInit was added to give these
cards a chance to reenable those direct depths after 32-bit QuickDraw was
loaded from disk. Power Macs obviously have 32-bit QuickDraw in ROM, and this
card only runs on a Power Mac, so it doesn't need SecondaryInit.
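The division of labor between PrimaryInit and SecondaryInit can be modeled
roughly as follows. This is a toy sketch under stated assumptions: SimDepth,
sim_primary_init, and sim_secondary_init are hypothetical names, and a real
card manipulates functional sResources rather than an array of flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one depth entry on a video card. */
typedef struct {
    int  bits_per_pixel;
    bool enabled;
} SimDepth;

/* The dialog names 16 and 32 as the direct bit depths. */
static bool is_direct_depth(int bits_per_pixel)
{
    return bits_per_pixel == 16 || bits_per_pixel == 32;
}

/* PrimaryInit stand-in: runs before the disk is up. Without 32-bit QuickDraw
   in ROM, the card cannot know whether it will be loaded later, so it turns
   the direct depths off. */
void sim_primary_init(SimDepth *depths, size_t count, bool qd32_in_rom)
{
    for (size_t i = 0; i < count; i++) {
        bool direct = is_direct_depth(depths[i].bits_per_pixel);
        depths[i].enabled = qd32_in_rom || !direct;
    }
}

/* SecondaryInit stand-in: runs after the system is loaded from disk, so it
   can turn the direct depths back on if 32-bit QuickDraw is now present. */
void sim_secondary_init(SimDepth *depths, size_t count, bool qd32_loaded)
{
    if (!qd32_loaded)
        return;
    for (size_t i = 0; i < count; i++) {
        if (is_direct_depth(depths[i].bits_per_pixel))
            depths[i].enabled = true;
    }
}
```

With qd32_in_rom set, sim_primary_init leaves every depth enabled and
sim_secondary_init has nothing left to do, which matches BAL's point that a
Power Mac-only card doesn't need SecondaryInit.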
KON Let's walk through the SSecondaryInit call and see what it does. Why
does VM make a difference? And why is MacsBug causing the problem to go away if
you set breakpoints?
BAL You're just full of questions, aren't you? You're supposed to be giving
the answers!
KON Let's walk through SSecondaryInit.
BAL For each card, it looks for a SecondaryInit entry in the card's ROM.
The entry contains a header followed by the SecondaryInit code. If there's no
SecondaryInit entry on the card, SSecondaryInit bails out early. If there is a
SecondaryInit on the card, the Slot Manager tries to execute it with SExec and
then checks the status from the SExec call. If the status is negative (an
error), the Slot Manager marks the slot with that evil -316 error, and the slot
is bad from there on out.
KON So who is responsible for setting the error?
BAL The code executed by SExec, in this case the SecondaryInit code, should
set the status error. If the header is bad, the code never gets run and the
status never gets set.
KON Let me guess: the boot code never initializes the status before calling
SExec.
BAL Yep. And it's allocated on the stack as a local variable, which means
that the status is set to whatever garbage is left on the stack. At this point
in the boot process you're still in supervisor mode, so MacsBug is sharing your
stack. When MacsBug is used, it pushes stuff onto the stack and then pops it
off when it leaves (changing the garbage below the stack pointer in the process).
That's why setting breakpoints and tracing mask the problem. BootBug also uses
the stack, so it too would have interfered with the bug.
Between the first Display Manager call and the SSecondaryInit, the system
allocates stack space for the SPBlock parameter for the SSecondaryInit call.
After the SPBlock is allocated, the stack pointer is very close to where the
local variables for SSecondaryInit will be allocated. At this point MacsBug's
stack usage will affect those never-initialized local variables.
This is something else to add to your list of gotchas for MacsBug: If you're in
supervisor mode (as you are at this point in the boot process) and you set
breakpoints, MacsBug is sharing your stack, and its use of the stack may affect
uninitialized variables. Later in the boot process, VM switches the machine to
user mode; from then on, MacsBug and applications use different stacks and
MacsBug will not interfere with uninitialized variables on the stack.
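The whole failure can be reproduced in miniature. The sketch below is a
simulation, not the boot code: the stack is an ordinary array, and SimStack,
SimSlot, sim_debugger_visit, sim_sexec, and sim_ssecondary_init are
hypothetical names. Reading a truly uninitialized local is undefined behavior
in C, so the "garbage" is written explicitly. The zero values the simulated
debugger leaves behind are an assumption chosen to show the masking effect.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    kStackWords    = 64,
    kGoodRevision  = 0x02,  /* revision byte from the dialog */
    smInitStatVErr = -316   /* error value from the dialog */
};

/* Simulated supervisor stack; sp is the index of the top and grows downward. */
typedef struct {
    int16_t words[kStackWords];
    int     sp;
} SimStack;

typedef struct {
    bool    has_secondary_init;
    uint8_t first_byte;   /* first byte of the SecondaryInit entry */
    int16_t init_status;  /* 0 = fine, negative = slot marked bad */
} SimSlot;

/* Leave the same leftover value everywhere, as earlier code might have. */
void sim_stack_fill(SimStack *s, int16_t garbage)
{
    for (int i = 0; i < kStackWords; i++)
        s->words[i] = garbage;
    s->sp = kStackWords;
}

/* A debugger sharing this stack pushes state below sp and pops it again.
   The popped words stay behind as new garbage (assumed to be zero here). */
void sim_debugger_visit(SimStack *s, int words_used)
{
    for (int i = 1; i <= words_used && s->sp - i >= 0; i++)
        s->words[s->sp - i] = 0;
}

/* Stand-in for SExec: code behind a bad header never runs, so the status
   is never written. */
static void sim_sexec(const SimSlot *slot, int16_t *status)
{
    if (slot->first_byte != kGoodRevision)
        return;
    *status = 0;  /* hypothetical well-behaved SecondaryInit */
}

void sim_ssecondary_init(SimStack *s, SimSlot *slot)
{
    if (!slot->has_secondary_init)
        return;                          /* bail out early */
    s->sp -= 1;                          /* allocate the local... */
    int16_t *status = &s->words[s->sp];  /* ...but never initialize it */
    sim_sexec(slot, status);
    if (*status < 0)
        slot->init_status = smInitStatVErr;
    s->sp += 1;
}
```

With negative garbage on the stack and a first byte of 0x20, the slot ends up
marked with -316. Call sim_debugger_visit first and the same slot comes through
clean, which is the masking effect BAL describes.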
KON The garbage that VM leaves on the stack (sometimes) happens to be
negative. When the boot code gets to SSecondaryInit and allocates variables on
the stack, it happens to use an area of the stack affected by VM.
BAL Well, I never turn VM on, so I'm always in supervisor mode, and MacsBug
always shares my stack. But now I've finally found a good use for VM: turn it
on when I have a bug that's hard to reproduce when MacsBug gets involved, and
see if it becomes reproducible.
KON That'll slow your machine down.
BAL Nasty.
KON Yeah.
SCORING
75-100 Excellent; you probably have a video-in jack built right into your
head.
50-70 Maybe we should be working for you.
25-45 Maybe you should be working for us.
5-20 Maybe you should stick to television.
Thanks to Rich Collyer, Kent Miller, Mike Puckett, John Yen, KON (Konstantin
Othmer), and BAL (Bruce Leak) for reviewing this column.
IAN HENDRY (AppleLink HENDRY; Internet hendry@apple.com) gets paid by Apple to
work on video stuff. His hobbies include shipping products and collecting new
Engineering managers. He can be found skipping meetings to play Ultimate and
working all hours to make up for it. Ian's going to be a dad soon, and though
he has been in rigorous sleep-deprivation training for years, he's hoping (but
still not certain) that he's ready for what he's gotten himself into.
ERIC ANDERSON (AppleLink ERIC3) skipped out on the OS Services group at Apple
to get away from the chore of working on VM and the Thread Manager. Now he
works as Ian's evil twin on video-related Mac OS issues -- and he gets bugs
assigned to him that state, "When using a multisync monitor with my threaded
test app while VM is on, this funny thing happens." Seeing that there's no
escape, Eric wants more than ever to move to Hawaii and build boats.