TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Monosnap 3.5.9 - Versatile screenshot ut...
Monosnap lets you capture screenshots, share files, and record video and .gifs. Features Capture Capture full screen, just part of the screen, or a selected window Make your crop area pixel... Read more
Tweetbot 3 for Twitter 3.3.1 - Popular T...
Tweetbot is a full-featured OS X Twitter client with a lot of personality. Whether it's the meticulously-crafted interface, sounds and animation, or features like multiple timelines and column views... Read more
Tunnelblick 3.8.0 - GUI for OpenVPN.
Tunnelblick is a free, open source graphic user interface for OpenVPN on OS X. It provides easy control of OpenVPN client and/or server connections. It comes as a ready-to-use application with all... Read more
Bookends 13.2.5 - Reference management a...
Bookends is a full-featured bibliography/reference and information-management system for students and professionals. Bookends uses the cloud to sync reference libraries on all the Macs you use.... Read more
Quicken 2019 5.11.2 - Complete personal...
Quicken makes managing your money easier than ever. Whether paying bills, upgrading from Windows, enjoying more reliable downloads, or getting expert product help, Quicken's new and improved features... Read more
Bookends 13.2.5 - Reference management a...
Bookends is a full-featured bibliography/reference and information-management system for students and professionals. Bookends uses the cloud to sync reference libraries on all the Macs you use.... Read more
Quicken 2019 5.11.2 - Complete personal...
Quicken makes managing your money easier than ever. Whether paying bills, upgrading from Windows, enjoying more reliable downloads, or getting expert product help, Quicken's new and improved features... Read more
Dashlane 6.1927.0 - Password manager and...
Dashlane is an award-winning service that revolutionizes the online experience by replacing the drudgery of everyday transactional processes with convenient, automated simplicity - in other words,... Read more
Capo 3.7.4 - Slow down and learn to play...
Capo lets you slow down your favorite songs so you can hear the notes and learn how they are played. With Capo, you can quickly tab out your songs atop a highly-detailed OpenCL-powered spectrogram... Read more
BetterTouchTool 3.153 - Customize multi-...
BetterTouchTool adds many new, fully customizable gestures to the Magic Mouse, Multi-Touch MacBook trackpad, and Magic Trackpad. These gestures are customizable: Magic Mouse: Pinch in / out (zoom)... Read more

Latest Forum Discussions

See All

Void Tyrant guide - Tips and tricks for...
Void Tyrant continues to get a lot of play in these parts. Probably because the game is just so deep and varied. The next stop on our guide series for Void Tyrant is class-specific guides. First up is the Knight, as it’s the first class anyone has... | Read more »
Summon beasts and battle evil in epic re...
Imagine a tale of conlict between factions of good and evil, where rogueish heroes summon beasts to aid them in them in warfare and courageously battle dragons over fields of scorched earth and brimstone - that's exactly the essence of epic fantasy... | Read more »
Upcoming visual novel Arranged shines a...
If you’re in the market for a new type of visual novel designed to inform and make you think deeply about its subject matter, then Arranged by Kabuk Games could be exactly what you’re looking for. It’s a wholly unique take on marital traditions in... | Read more »
TEPPEN guide - The three best decks in T...
TEPPEN’s unique take on the collectible card game genre is exciting. It’s just over a week old, but that isn’t stopping lots of folks from speculating about the long-term viability of the game, as well as changes and additions that will happen over... | Read more »
Intergalactic puzzler Silly Memory serve...
Recently released matching puzzler Silly Memory is helping its fans with their intergalactic journeys this month with some very special offers on in-app purchases. In case you missed it, Silly Memory is the debut title of French based indie... | Read more »
TEPPEN guide - Tips and tricks for new p...
TEPPEN is a wild game that nobody asked for, but I’m sure glad it exists. Who would’ve thought that a CCG featuring Capcom characters could be so cool and weird? In case you’re not completely sure what TEPPEN is, make sure to check out our review... | Read more »
Dr. Mario World guide - Other games that...
We now live in a post-Dr. Mario World world, and I gotta say, things don’t feel too different. Nintendo continues to squirt out bad games on phones, causing all but the most stalwart fans of mobile games to question why they even bother... | Read more »
Strategy RPG Brown Dust introduces its b...
Epic turn-based RPG Brown Dust is set to turn 500 days old next week, and to celebrate, Neowiz has just unveiled its biggest and most exciting update yet, offering a host of new rewards, increased gacha rates, and a brand new feature that will... | Read more »
Dr. Mario World is yet another disappoin...
As soon as I booted up Dr. Mario World, I knew I wasn’t going to have fun with it. Nintendo’s record on phones thus far has been pretty spotty, with things trending downward as of late. [Read more] | Read more »
Retro Space Shooter P.3 is now available...
Shoot-em-ups tend to be a dime a dozen on the App Store, but every so often you come across one gem that aims to shake up the genre in a unique way. Developer Devjgame’s P.3 is the latest game seeking to do so this, working as a love letter to the... | Read more »

Price Scanner via MacPrices.net

Flash sale! New 11″ 1TB WiFi iPad Pros for th...
Amazon has the 11″ 1TB WiFi iPad Pro on sale today for only $1199.99 including free shipping. Their price is $350 off Apple’s MSRP for this model, and it’s the lowest price ever for a 1TB 11″ iPad... Read more
Weekend Deal: 2018 13″ MacBook Airs starting...
B&H Photo has clearance 2018 13″ MacBook Airs available starting at only $999 with all models now available for $200 off Apple’s original MSRP. Overnight shipping, or expedited shipping, is free... Read more
Apple has clearance 10.5″ iPad Pros available...
Apple has Certified Refurbished 2017 10.5″ iPad Pros available starting at $469. An Apple one-year warranty is included with each iPad, outer shells are new, and shipping is free: – 64GB 10″ iPad Pro... Read more
Apple restocks refurbished iPad mini 4 models...
Apple has restocked Certified Refurbished 32GB iPad mini 4 WiFi models for $229 shipped. That’s $70 off original MSRP for the iPad mini 4. Space Gray, Silver, and Gold colors are available. Read more
Apple, Yet Again, Is Missing An Ultraportable...
EDITORIAL: 07.19.19 Prior to the decision made by Apple earlier this month to retire the thin and light MacBook model with a 12-inch retina display, the Cupertino, California-based company offered,... Read more
Verizon is offering a 50% discount on iPhone...
Verizon is offering 50% discounts on Apple iPhone 8 and iPhone 8 Plus models though July 24th, plus save 50% on activation fees. New line required. The fine print: “New device payment & new... Read more
Get a new 21″ iMac for under $1000 today at t...
B&H Photo has new 21″ Apple iMacs on sale for up to $100 off MSRP with models available starting at $999. These are the same iMacs offered by Apple in their retail and online stores. Shipping is... Read more
Clearance 2017 15″ 2.8GHz Touch Bar MacBook P...
Apple has Certified Refurbished 2017 15″ 2.8GHz Space Gray Touch Bar MacBook Pros available for $1809. Apple’s refurbished price is currently the lowest available for a 15″ MacBook Pro. An standard... Read more
Clearance 12″ 1.2GHz MacBook on sale for $899...
Focus Camera has clearance 12″ 1.2GHz Space Gray MacBooks available for $899.99 shipped. That’s $400 off Apple’s original MSRP. Focus charges sales tax for NY & NJ residents only. Read more
Get a new 2019 13″ 2.4GHz 4-Core MacBook Pro...
B&H Photo has new 2019 13″ 2.4GHz MacBook Pros on sale for up to $150 off Apple’s MSRP. Overnight shipping is free to many addresses in the US: – 2019 13″ 2.4GHz/256GB 6-Core MacBook Pro Silver... Read more

Jobs Board

Best Buy *Apple* Computing Master - Best Bu...
**711346BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Store Associates **Location Number:** 001095-Chesterfield-Store **Job Description:** Read more
Best Buy *Apple* Computing Master - Best Bu...
**704899BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Sales **Location Number:** 000135-Pleasant Hill-Store **Job Description:** **What does Read more
Best Buy *Apple* Computing Master - Best Bu...
**707083BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Sales **Location Number:** 000045-Rockford-Store **Job Description:** **What does a Read more
Geek Squad *Apple* Master Consultation Agen...
**702908BR** **Job Title:** Geek Squad Apple Master Consultation Agent **Job Category:** Services/Installation/Repair **Location Number:** 000360-Williston-Store Read more
Best Buy *Apple* Computing Master - Best Bu...
**711023BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Sales **Location Number:** 000012-St Cloud-Store **Job Description:** **What does a Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.