March 96 - KON & BAL'S PUZZLE PAGE: Printing, Patching, and Fonts
Dave Hersey and Cameron Esfahani
See if you can solve this programming puzzle, presented in the form of a dialog
between guest puzzlers Dave Hersey and Cameron Esfahani (cam). The dialog gives
clues to help you. Keep guessing until you're done; your score is the number to
the left of the clue that gave you the correct answer. Even if you never run
into the particular problems being solved here, you'll learn some valuable
debugging techniques that will help you solve your own programming conundrums.
Dave Hey cam, it's kinda quiet. Where are KON and BAL?
cam Since the local salad bar closed, I haven't seen KON. BAL disappeared
after he left the video game industry. Have you been getting enough sleep? You
look tired.
Dave I've been under a lot of pressure to track down this bug.
cam Maybe I can help. What's the problem?
Dave I have a Power Mac 6100/66 running System 7.5 with QuickDraw GX 1.1. When
I try to print from a word processor, I get the message "The application has
unexpectedly quit, because an error of type 11 occurred." What's an error of
type 11?
cam That's an unhandled exception from native code. What word processor are
you using?
Dave Um, a very large one in a very large office suite from a very large
company up north.
cam Have you updated to version 1.1.3 of QuickDraw GX?
Dave Yeah. The problem still happens.
cam Does it happen on any other machine?
Dave Yes. It crashes on any Power Mac but works fine on 680x0 machines.
cam Hmm. Is the word processor native on the Power Mac?
Dave Yes -- it's fat.
cam It sure is. But I have the same version of system software and the same
word processor, yet my machine doesn't crash.
Dave Well, I have a standard system installed, but I added a bunch of whizzy
fonts.
cam If I install one of your fonts, will my machine crash?
Dave Sometimes. If you install all my fonts, it crashes all the time.
cam That's easy, then: bad fonts. Here, take out this Thingamajigs font.
Dave No way, man. This is a standard bitmap-only font. It should work. Ike's
machine doesn't have Thingamajigs on it and his machine still crashes.
cam Does he have bitmap-only fonts installed?
Dave Yes.
cam At what point in the printing process do you crash?
Dave The crash occurs just as the application starts spooling the print
file.
cam Is this word processor QuickDraw GX-aware?
Dave Yes. It has support for the new QuickDraw GX print dialogs, and it calls
the QuickDraw GX translator to translate QuickDraw drawing commands into
QuickDraw GX shapes during printing.
cam Good for them. Have you tried to reproduce the crash with other QuickDraw
GX-aware applications?
Dave Yup. I tried to reproduce it with several QuickDraw GX-aware and
QuickDraw GX-savvy applications. No luck.
cam Try running the 680x0 version of this program on your Power Mac. It will
be slow and piggy, but try it anyway.
Dave The problem went away! So, the crash seems to have something to do with
the PowerPC code in this application.
cam Hmm. Let's install MacsBug and take a look at this from the debugger.
Dave I tried that before, but I couldn't see any symbols in the PowerPC code
where it crashes. I couldn't tell which routine the PC was in.
cam You should install the new version of MacsBug. Version 6.5.2 understands
native exceptions and can use embedded symbols.
Dave Nifty. . . . OK, I've done that. But I still crash.
cam Why do you crash? Type how.
Dave MacsBug claims that there was a "PowerPC access exception at 001DB030
ConstructNFNTDirectory+002B4."
cam What does ConstructNFNTDirectory do? Hey, wait, there's Alex Beaman. Alex,
can you help us out here?
Alex Sure. QuickDraw GX views all fonts as type 'sfnt'. It's really elegant:
ConstructNFNTDirectory will make an NFNT font appear to have an 'sfnt'
directory. It can build either just the directory header or the entire
directory, and this is controlled by a Boolean parameter passed into the
function. OK, gotta run!
Dave Thanks, Alex. When I disassemble ConstructNFNTDirectory with MacsBug, I
get this:
ilp ConstructNFNTDirectory
Disassembling PowerPC code from ConstructNFNTDirectory
ConstructNFNTDirectory
+00000 001DAD7C stmw r14,-0x0048(SP)
+00004 001DAD80 mflr r0
+00008 001DAD84 clrlwi r27,r5,0x18
+0000C 001DAD88 addi r28,r3,0x0000
+00010 001DAD8C mfcr r12
...
+00060 001DADDC addi r3,r30,0x0000
+00064 001DADE0 addi r4,r28,0x0000
+00068 001DADE4 bl GetNoLoadResource
...
+000E4 001DAE60 addi r3,r20,0x0000
+000E8 001DAE64 bl ComputeSearchFields
+000EC 001DAE68 crmove cr7_SO,cr7_SO
+000F0 001DAE6C cmpwi cr2,r27,0x0000
...
+002B4 001DB030 *lwzx r5,r19,r5
...
+002F0 001DB06C lhz r5,0x0004(r20)
+002F4 001DB070 li r16,0x0001
+002F8 001DB074 addic r5,r5,0x0001
+002FC 001DB078 sth r5,0x0004(r20)
+00300 001DB07C beq cr2,ConstructNFNTDirectory+00324
...
+003C8 001DB144 addic SP,SP,0x00A0
+003CC 001DB148 mtcrf 0x38,r12
+003D0 001DB14C mtlr r0
+003D4 001DB150 lmw r16,-0x0040(SP)
+003D8 001DB154 blr
cam An access exception means we're trying to read or write to an invalid
address. That, of course, could be caused by many things, such as uninitialized
variables or trashed memory. Let's check the heaps with hc.
Dave Both the system heap and the application heap are fine.
cam OK, I restart the program and use brp in MacsBug to set a breakpoint at
ConstructNFNTDirectory. brp is just like br, except it works for PowerPC code.
After I start printing and the breakpoint is hit, I step through this function
to follow the code flow.
Dave At offset 0x0300 you don't take that branch, and you eventually begin
executing code that will corrupt the QuickDraw GX heap.
cam But that's wrong -- we should've taken that branch. The caller didn't ask
ConstructNFNTDirectory to create the entire directory, just its header; it
didn't allocate enough space for all of it. Check the heaps again.
Dave The heaps seem fine. QuickDraw GX allocates out of its own heap, which
MacsBug doesn't know about. Even if it did know about it, it wouldn't be able
to tell us if the heap was corrupt, as QuickDraw GX has its own memory
manager.
cam Darn, memory corruption bugs are the worst. You can trash memory and not
see the effects of it until you're miles away from that code. OK, why didn't it
take the branch at offset 0x0300?
Dave Well, CR2 is true, so the branch won't be taken.
cam How can you tell that CR2 is true?
Dave The PowerPC chip has eight condition register fields, CR0 through CR7,
stored in nibbles in a 32-bit condition register (Dave Evans talked about this
in his column in develop Issue 21). So the value of CR2 would be bits 8 through
11 of the condition register. The chip has its bits numbered from 0 through 31,
from left to right. We can tell that CR2 contains a true value because bit 2 of
the field (its bits are numbered 0 through 3) isn't set. That bit corresponds to
the equals operator, so the fact that it's 0 means the operands of the compare
that set this field were not equal.
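The bit arithmetic Dave describes can be sketched in a few lines of Python. This is only an illustration, not anything MacsBug provides: the helper names and the two sample condition register values are made up for the example.

```python
# Sketch: pull one 4-bit field out of a 32-bit PowerPC condition register value.
# PowerPC numbers bits from the most significant end, so field n occupies
# bits 4n through 4n+3, and CR2 occupies bits 8 through 11.

FLAG_NAMES = ("LT", "GT", "EQ", "SO")  # bits 0..3 within a field, left to right


def cr_field(cr, n):
    """Return field n (0-7) of a 32-bit CR value as a 4-bit integer."""
    shift = 28 - 4 * n  # distance of the field's last bit from the LSB
    return (cr >> shift) & 0xF


def field_flags(field):
    """Decode a 4-bit field into a dict of its four flags."""
    return {name: bool(field & (0x8 >> i)) for i, name in enumerate(FLAG_NAMES)}


def beq_taken(cr, n):
    """A beq on field n branches only when that field's EQ flag is set."""
    return field_flags(cr_field(cr, n))["EQ"]


# Hypothetical CR values, chosen only for illustration.
cr_after_nonzero_compare = 0x00400000  # CR2 = 0b0100: GT set, EQ clear
cr_after_zero_compare = 0x00200000     # CR2 = 0b0010: EQ set
```

With the first sample value, beq_taken(cr_after_nonzero_compare, 2) is False, which is the situation at offset 0x0300: EQ is clear, so the branch falls through.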
cam Who sets up CR2?
Dave The code at offset 0x00F0. As Alex mentioned, one of the parameters to
this function is a Boolean that controls whether the whole directory is created
or only the header. Because this parameter is a Boolean, the PowerPC processor
can just compare it against 0 and use the result as a flag for later branches.
Parameters passed in PowerPC code are put from left to right into registers R3
through R10; since this parameter is the third parameter to the function, it's
passed to the routine in register R5. (A much better description of this is in
Inside Macintosh: PowerPC System Software.)
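As a rough sketch of that register assignment, the Python below maps a parameter position to its register. It assumes every argument is a word-sized integer or pointer, which is a simplification. It also models the clrlwi r27,r5,0x18 seen at offset 0x0008 of the disassembly, which keeps only the low byte of R5 before the later compare against 0. The function names are invented for the example.

```python
# Sketch: which general-purpose register carries the Nth argument.
# Assumption: word-sized integer or pointer arguments only.

FIRST_PARAM_REG = 3   # R3 holds the first parameter
LAST_PARAM_REG = 10   # R10 holds the eighth


def param_register(position):
    """Map a 1-based parameter position to its register name."""
    if position < 1:
        raise ValueError("parameter positions start at 1")
    reg = FIRST_PARAM_REG + position - 1
    if reg > LAST_PARAM_REG:
        raise ValueError("parameters beyond the eighth are not passed in R3-R10")
    return "R%d" % reg


def clrlwi(value, n):
    """Model clrlwi: clear the n most significant bits of a 32-bit word."""
    return value & (0xFFFFFFFF >> n)


def boolean_is_true(r5):
    """Mimic the clrlwi/cmpwi pair: a nonzero low byte of R5 means true."""
    return clrlwi(r5, 0x18) != 0
```

Here param_register(3) gives "R5", matching the Boolean's position in the argument list.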
cam I love this chip. I'll reexecute the program and get back to the start of
this function and examine CR2.
Dave It starts out false.
cam So someone's trashing it along the way. Well, we can't use some of our
normal tricks for detecting when memory gets trashed. One problem is that step
spy doesn't work yet for PowerPC. Another problem is that we would want to step
spy on CR2, which is a register, and step spy never worked on registers. We'll
have to do this the hard way: let's step through this function, watching CR2
to see just when it gets changed.
Dave The subroutine GetNoLoadResource at offset 0x0068 changes CR2 from false
to true. GetNoLoadResource is a wrapper to GetResource.
cam I restart the program and trace over the GetResource call.
Dave Yep, that's the function that trashes CR2.
cam Is it legal for the compiler to rely on CR2 being preserved across
function calls?
Dave Yes. According to the PowerPC ABI (Application Binary Interface)
documentation -- section 3.6 in the first edition -- CR2 through CR4 are
nonvolatile and need to be saved across function calls.
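The epilogue of ConstructNFNTDirectory shown earlier restores the condition register with mtcrf 0x38,r12. A minimal Python sketch of how that 8-bit field mask selects condition register fields follows. The function names are made up, and the model covers only the field-copy behavior.

```python
def mtcrf_fields(mask):
    """List the CR field numbers selected by an 8-bit mtcrf field mask.

    The most significant mask bit (0x80) selects CR0 and the least
    significant (0x01) selects CR7.
    """
    return [n for n in range(8) if mask & (0x80 >> n)]


def mtcrf(cr, mask, source):
    """Model mtcrf: copy the selected 4-bit fields of source into cr."""
    result = cr
    for n in mtcrf_fields(mask):
        shift = 28 - 4 * n
        field_mask = 0xF << shift
        result = (result & ~field_mask) | (source & field_mask)
    return result & 0xFFFFFFFF
```

mtcrf_fields(0x38) returns [2, 3, 4], so that epilogue restores CR2, CR3, and CR4 from the copy saved in R12 and leaves the other fields alone.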
cam Look at the code for GetResource. Since in System 7.5 GetResource is a
native trap with a routine descriptor, I can use the MacsBug dcmd drd to dump
that out. Here's what I get:
drd GetResource
The RoutineDescriptor at: 011EDFEC
Mixed Mode Magic Trap: AAFE, version: 07,
routine descriptor flags: 00 (NotIndexable),
loadLocation: 00000000, reserved2: 00,
selectorInfo: 00 (No Selector),
routine count: 0000
--- Routine Record 00000000 -----
procInfo: 000002F0, reserved1: 00, ISAType: 01 (kPowerPCISA),
Routine Flags: 0004 (IsAbsolute, IsPrepared, NativeISA,
PassSelector, IsNotDefault), procPtr: 01219EEC,
storedOffset: 00000000, selector: 00000000
Dave There's only one routine associated with the trap and it's the native
implementation.
cam Where's that function? On the Power Mac, every ProcPtr is actually a data
structure that contains the routine's real address and TOC. This is called a
TVector (transition vector). This allows every fragment to have its own
globals, because the correct TOC gets loaded for each routine by the runtime
environment. So, to find the routine's address, you need to dereference the
ProcPtr.
WH 1219EEC^
Address 00E77B78 is in the "Porky WProcessor" heap at 00DFC430
The address is in a CFM fragment "Porky WProcessor" [non-write exec]
It is 00073058 bytes into this heap block:
Start Length Tag Mstr Ptr Lock Prg Type ID File Name
* 00E04B20 003D35D8+0C N
Dave Apparently it's in the heap of the application.
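The dereference cam performs with 1219EEC^ amounts to reading the first word of the transition vector. A small Python sketch of that layout is below, assuming the two fields are consecutive big-endian 32-bit words with the code address first. The sample bytes reuse the code address from the WH output, but the TOC value is invented.

```python
import struct


def parse_tvector(raw):
    """Split the first 8 bytes of a transition vector into (code_address, toc)."""
    code_address, toc = struct.unpack(">II", raw[:8])
    return code_address, toc


# Illustrative bytes only: the code address matches the WH output,
# the TOC value is a made-up placeholder.
example = bytes.fromhex("00E77B78" "00E9F000")
code_address, toc = parse_tvector(example)
```

After this runs, code_address holds 0x00E77B78, the address that WH reported as being inside the application's heap.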
cam So this program is patching GetResource. At least they have a native patch
-- a good habit these days because you don't know what traps will go native
from now on. If you're patching native PowerPC code with 680x0 code,
performance-sensitive code will run slower. For this reason, you should make
all of your patches fat. Let's disassemble the patch on GetResource.
ilp 1219eec^
Disassembling PowerPC code from 1219eec^
No procedure name
00E77B78 stwu SP,-0x0058(SP)
00E77B7C mflr r12
00E77B80 stw r12,0x0060(SP)
00E77B84 stmw r26,0x0040(SP)
00E77B88 stw r3,0x0070(SP)
00E77B8C sth r4,0x0074(SP)
00E77B90 extsh r4,r4
00E77B94 lis r5,0x4D42
00E77B98 ori r5,r5,0x4446
00E77B9C cmplw cr2,r3,r5
...
00E77C10 lmw r26,0x0040(SP)
00E77C14 lwz r12,0x0060(SP)
00E77C18 mtlr r12
00E77C1C addic SP,SP,0x0058
00E77C20 blr
Dave At 0x00E77B9C they do a compare and store the result in CR2. However, they
don't save and restore CR2 across this function, so it's trashed when we return
to ConstructNFNTDirectory.
cam OK, I restart the program and manually save and restore the value of CR2
across the GetResource calls. I do this by futzing with bit 2 in CR2.
Dave Everything prints fine.
cam It looks like a compiler bug. Either they shouldn't be using CR2 or they
should be preserving it. In any case, the GetResource patch is trashing CR2,
and that changes a Boolean which causes us to read in extra data. The caller
never allocated enough space for the extra data, so the QuickDraw GX heap gets
corrupted.
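The chain of events can be imitated with a toy Python model. It is only a sketch: the condition register is a plain integer, the two patch functions are hypothetical stand-ins for the real GetResource patch, and the 'NFNT' argument is an assumed resource type.

```python
CR2_SHIFT = 20               # CR2 sits in bits 8-11, counted from the left
CR2_MASK = 0xF << CR2_SHIFT
LT, GT, EQ = 0x8, 0x4, 0x2   # flag bits within a 4-bit field


def compare_into_cr2(cr, a, b):
    """Model an unsigned compare whose result lands in CR2."""
    flags = EQ if a == b else (LT if a < b else GT)
    return (cr & ~CR2_MASK & 0xFFFFFFFF) | (flags << CR2_SHIFT)


def careless_patch(cr, res_type):
    """Uses CR2 as scratch and returns it modified, as the patch did."""
    return compare_into_cr2(cr, res_type, 0x4D424446)  # 'MBDF'


def careful_patch(cr, res_type):
    """Same compare, but puts the caller's CR2 back before returning."""
    saved = cr & CR2_MASK
    cr = compare_into_cr2(cr, res_type, 0x4D424446)
    return (cr & ~CR2_MASK & 0xFFFFFFFF) | saved


def header_only_branch_taken(patch, build_whole_directory):
    """Set CR2 from the Boolean, call the patch, then test EQ as beq cr2 would."""
    cr = compare_into_cr2(0, int(build_whole_directory), 0)
    cr = patch(cr, 0x4E464E54)  # 'NFNT', a hypothetical argument
    return bool((cr >> CR2_SHIFT) & EQ)
```

With careful_patch and a false Boolean the header-only branch is taken. With careless_patch it is not, which corresponds to the function going on to build the whole directory in a buffer sized for the header.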
Dave Holy cow! A compiler bug. Shouldn't we notify the compiler developer?
cam Well, this company has their own in-house development tools group. They
write their own compilers, linkers, and debuggers. We should contact them
anyway, so that they can create a patch that fixes this problem. [This patch,
"Office4.2x Update for Power Mac," is now available on most online services.]
Dave Why are they patching GetResource?
cam It looks like they were looking for resources of type 'MBDF' (menu bar
definition procedures). I can tell this from the instructions at addresses
0x00E77B94 through 0x00E77B9C. The PowerPC architecture has a limitation of 16
bits on the size of an immediate constant. So, if you wanted to compare a value
against a 32-bit constant, you would have to build the 32-bit value with two
instructions. This is what occurs at addresses 0x00E77B94 and 0x00E77B98, where
they combine 0x4D42 and 0x4446 into a single 32-bit value. If you look at the
ASCII of this constant, it's 'MBDF'. At address 0x00E77B9C, they compare this
constant to the resource type parameter passed to GetResource. Since that
parameter is the first parameter, it will be in register R3.
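The two-instruction constant can be reproduced with a short Python sketch. The lis and ori helpers are simplified models written for this example and cover only the 32-bit behavior needed here.

```python
def lis(imm16):
    """Model lis: place a 16-bit immediate in the upper half, zero the lower."""
    return (imm16 & 0xFFFF) << 16


def ori(value, imm16):
    """Model ori: OR a 16-bit immediate into the lower half."""
    return value | (imm16 & 0xFFFF)


def four_char_code(value):
    """Render a 32-bit value as the four ASCII characters it encodes."""
    return value.to_bytes(4, "big").decode("ascii")


# The immediates from addresses 0x00E77B94 and 0x00E77B98.
r5 = ori(lis(0x4D42), 0x4446)
```

Here r5 ends up as 0x4D424446, and four_char_code(r5) gives 'MBDF'.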
Dave Why didn't we crash when we had only one NFNT font installed?
cam This patch would cause ConstructNFNTDirectory to always overwrite the
buffer passed in. But that wouldn't always cause your machine to freak out. By
adding enough NFNT fonts, we trashed the QuickDraw GX heap significantly enough
to cause the crash.
Dave Wow, all this and it was an application patch that caused the problem!
It sure would have been cool if we could have used the patch dcmd.
cam Yeah. The patch dcmd does work on the Power Mac -- but we didn't know
that was the problem when we started.
Dave It's interesting that it was an application bug. That would explain why I
crash in a spreadsheet application by the same company. They share the same
patch.
cam Nasty.
Dave Yeah.
DAVE HERSEY (AppleLink HERSEY) works in the QuickDraw GX PrintShop level 4
bio-containment facility, thousands of feet beneath the Cupertino R&D
campus. There, he develops PowerPC-native QuickDraw GX printing code, works on
Copland, and relaxes by dabbling with an occasional hot agent over lunch.
CAMERON ESFAHANI (AppleLink DIRTY, Internet dirty@powertalk.apple.com) is the
shortest member of the Graphics team at Apple. To add a few more inches to his
height, he sometimes wears roller blades in meetings. If that doesn't help, he
has been known to don his large purple hat with sparkles.
SCORING
80-100 You could have a promising career writing compilers for a company up
north.
45-70 Dr. MacsBug could always use another assistant.
25-40 Don't worry, it took us a while to figure it out too.
5-20 Visual Basic fan, are you?
Thanks to Alex Beaman, Tom Dowdy, Ron Voss, KON (Konstantin Othmer), and BAL
(Bruce Leak) for reviewing this column.