TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

MacCleanse 9.0 - $29.95
MacCleanse is the product of thousands of hours of intense research and development. It meticulously scans all of the nooks and crannies of a computer for unnecessary junk that can take up huge... Read more
MacPilot 12.0 - $15.96
MacPilot gives you the power of UNIX and the simplicity of Macintosh, which means a phenomenal amount of untapped power in your hands! Use MacPilot to unlock over 1,200 features, and access them all... Read more
NetNewsWire 5.1 - RSS and Atom news read...
NetNewsWire is the best way to keep up with the sites and authors you read most regularly. Let NetNewsWire pull down the latest articles, and read them in a distraction-free and Mac-like way. Native... Read more
FontExplorer X Pro 7.1.3 - Font manageme...
FontExplorer X Pro is optimized for professional use; it's the solution that gives you the power you need to manage all your fonts. Now you can more easily manage, activate and organize your... Read more
DiskCatalogMaker 8.2.5 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
Skim 1.5.12 - PDF reader and note-taker...
Skim is a PDF reader and note-taker for OS X. It is designed to help you read and annotate scientific papers in PDF, but is also great for viewing any PDF file. Skim includes many features and has a... Read more
rekordbox 6.1.0.0030 - Professional DJ m...
rekordbox is the best way of preparing and managing your tracks, be it at home, in the studio, or even on the plane! It allows you to import music from other music-management software using the... Read more
iExplorer 4.4.0 - View and transfer file...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
OmniGraffle 7.17.5 - Create diagrams, fl...
OmniGraffle helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use Graffle to... Read more
Apple Configurator 2.13.1 - Configure an...
Apple Configurator makes it easy to deploy iPad, iPhone, iPod touch, and Apple TV devices in your school or business. Use Apple Configurator to quickly configure large numbers of devices connected to... Read more

Latest Forum Discussions

See All

Raziel: Dungeon Arena is a hack 'n...
Raziel: Dungeon Arena is available now on mobile and will appeal to fans of both comic books and old school dungeon crawlers. Not only will you hack 'n' slash your way through mobs of enemies but there's also fully-narrated animated comic to enjoy... | Read more »
Steam Link Spotlight - Hades
Steam Link Spotlight is a feature where we look at PC games that play exceptionally well using the Steam Link app. Our last entry was on Disco Elysium. Read about how it plays using Steam Link over here. | Read more »
Microsoft has acquired ZeniMax Media and...
In the latest of a series of blockbuster moves, Microsoft has now acquired Zenimax Media and its subsidiary, Bethesda Softworks, for $7.5 billion. [Read more] | Read more »
Infinity Mechs is an upcoming idle game...
Indie developer SkullStar studio has announced an upcoming idle mech game called Infinity Mechs. It draws inspiration from the mobile game Iron Saga and has been officially licensed by Game Duchy. It's set to launch for both iOS and Android on... | Read more »
PUBG Mobile Lite's latest update se...
PUBG Mobile Lite, the streamlined version of the popular battle royale that's designed to work on less powerful devices, sees the return of a popular game variant today, Survive Till Dawn mode. It arrives as part of the 0.19.0 content update. [... | Read more »
Matchy Catch, Jyamma Games’ new hyper-ca...
Matchy Catch is a new hyper-casual puzzler from Jyamma Games, the Italian studio behind the Pong-inspired puzzle-adventure Hi-Ball Rush. It’s only the developer’s second game for iOS and Android devices, but it promises to be every bit as fun and... | Read more »
Among Us! Imposter Guide - How to be a s...
Among Us! continues to be getting a lot of play in these parts, and since our first guide we've learned a thing or two about the game. This is especially true regarding the imposter role, as its a relatively rare opportunity that we've now put... | Read more »
Paladin's Story is an upcoming fant...
Paladin's Story is an upcoming fantasy RPG with an off-kilter sense of humour that's heading for iOS and Android. It will officially launch for both on September 16th though the game is already available on Google Play in Early Access. [Read more... | Read more »
Among Us! Guide - Tips for the uninitiat...
| Read more »
A Pretty Odd Bunny is a stealth-platform...
A Pretty Odd Bunny is a stealth-platformer from two-man team AJ Ordaz and René Rivera. It follows the story of a red-eyed rabbit who is allergic to carrots and instead has a penchant for devouring pigs. It's available now for Android devices. [... | Read more »

Price Scanner via MacPrices.net

The cheapest Macs are back in stock today at...
Apple has restocked clearance, previous-generation, Certified Refurbished Mac minis starting at only $599. Each mini comes with free shipping plus Apple’s standard one-year warranty. These are the... Read more
Sale! Amazon has 2020 13″ 2.0GHz MacBook Pros...
Amazon has 2020 13″ MacBook Pros with 10th generation Intel CPUs back in stock on sale again today for $150-$200 off Apple’s MSRP. Shipping is free. Be sure to purchase the MacBook Pro from Amazon,... Read more
Base 13″ 1.4GHz Apple MacBook Pros on sale fo...
Apple reseller Expercom is offering a $65-$75 discount on new 2020 13″ 1.4GHz MacBook Pros, depending on configuration. Shipping is free. Expercom estimates shipping in 3-5 days, as stock of Apple’s... Read more
Price drop! Get a 44mm Apple Watch Series 5 G...
Amazon has dropped their price on the 44mm Apple Watch Series 5 GPS + Cellular by $100 to $429 shipped. That’s $100 off Apple’s original MSRP for this model. For the latest prices and sales, see our... Read more
Verizon offers $200 discount on new Apple Wat...
Verizon will take up to $200 off the purchase of a new GPS + Cellular Apple Watch Series 6 or Apple Watch SE with select trade-in and the purchase of a new iPhone with service. The fine print: “Get... Read more
Verizon offers $250 discount on new 8th gener...
Verizon will take up to $250 off the price of an 8th generation 2020 Apple Cellular iPad with select trade-ins and a new iPhone purchase. Plus get Apple News+ free for 6 months. The fine print: “Save... Read more
Apple’s Implementation Of COVID-19 Exposure...
NEWS: 09.18.20 – The latest effort by Apple to embed exposure notifications for COVID-19 contact tracing right into its mobile operating system has some iPhone users weary of being exposed to... Read more
Here’s how to get a 16″ MacBook Pro for $300...
B&H Photo has new 16″ MacBook Pros on sale today for $250-$300 off Apple’s MSRP, starting at $2099. Expedited shipping is free to many addresses in the US: – 2019 16″ 2.6GHz 6-Core MacBook Pro... Read more
Apple has Certified Refurbished 16″ MacBook P...
Apple has Certified Refurbished 2019 16″ MacBook Pros available for up to $420 off the cost of new models, starting at $2039. Each model features a new outer case, shipping is free, and an Apple 1-... Read more
Price drops! Apple reseller B&H drops App...
B&H Photo has dropped prices on Apple Watch Series 5 models by $50-$70 off Apple’s original MSRP. Shipping is free. These are the same Apple Watch models sold by Apple in their retail and online... Read more

Jobs Board

Security Officer ($23.00/Hourly) - *Apple*...
**Security Officer \($23\.00/Hourly\) \- Apple Store** **Description** About NMS Built on a culture of safety and integrity, NMSdelivers award\-winning, integrated Read more
Security Officer ($23.00/Hourly) - *Apple*...
**Security Officer \($23\.00/Hourly\) \- Apple Store** **Description** About NMS Built on a culture of safety and integrity, NMSdelivers award\-winning, integrated Read more
Platform - Workplace Eng - *Apple* Enterpri...
MORE ABOUT THIS JOB We are looking for an Apple Platform Engineer who will bring a unique engineering skill set, support, clarity, organization and above all else, Read more
*Apple* Certified Repair Technician - Utah S...
…selected candidate will work in the USU Campus Store Tech Department as an Apple Certified Repair Technician and floor associate. This position is for both summer Read more
Senior Data Engineer - *Apple* - Theorem, L...
Job Summary Apple is seeking an experienced, detail-minded data engineeringconsultant to join our worldwide business development and strategy team. If you are Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.