
Optimizing for PPC
Volume Number: 12
Issue Number: 5
Column Tag: Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.
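To make the rowBytes question concrete, here is a rough sketch in C of the arithmetic involved. It is my own illustration, not something from the book. The 32K size and eight-way associativity come from the text above; the 64-byte line size is an assumption that should be checked against the 601 documentation, and every type and function name here is hypothetical. The sketch also ignores any difference between effective and physical addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a set-associative cache (names are mine). */
typedef struct {
    uint32_t total_bytes;  /* 32K on the 601, per the book */
    uint32_t ways;         /* eight-way, per the book */
    uint32_t line_bytes;   /* ASSUMPTION: 64 bytes; not stated in this review */
} CacheGeometry;

/* Number of sets = total size / (ways * line size). */
uint32_t cache_num_sets(CacheGeometry g)
{
    assert(g.ways > 0 && g.line_bytes > 0);
    return g.total_bytes / (g.ways * g.line_bytes);
}

/* Set that an address falls into: its line number modulo the number of sets. */
uint32_t cache_set_index(CacheGeometry g, uint32_t addr)
{
    return (addr / g.line_bytes) % cache_num_sets(g);
}

/* Distance between addresses that share a set: total size / ways.
   Note that the line size cancels out of this figure. */
uint32_t cache_alias_stride(CacheGeometry g)
{
    assert(g.ways > 0);
    return g.total_bytes / g.ways;
}

/* Count the distinct sets touched while walking down one pixel column. */
uint32_t sets_touched_by_column(CacheGeometry g, uint32_t base,
                                uint32_t row_bytes, uint32_t rows)
{
    uint8_t seen[1024] = {0};          /* large enough for this sketch */
    uint32_t count = 0;
    assert(cache_num_sets(g) <= sizeof seen);
    for (uint32_t r = 0; r < rows; r++) {
        uint32_t s = cache_set_index(g, base + r * row_bytes);
        if (!seen[s]) {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions the alias stride works out to 4096 bytes, so a rowBytes that is a multiple of 4096 sends every pixel of a column into the same set, and with only eight ways the ninth row starts evicting the first. Padding each row by one cache line spreads the same column across all the sets.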

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).
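The rule in that passage is easy to capture as a small cost model. The C sketch below is my own and only restates the quoted rule; it is not measured timing. It reads "the upper bits are all sign bits" as "rB fits in a signed 16-bit integer", so the exact boundary value is an assumption on my part, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: the short multiply applies when rB fits in a signed 16-bit value. */
int fits_in_signed_16(int32_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Cycles spent in the IE stage according to the quoted rule: 5 or 9. */
int mul_ie_cycles(int32_t rB)
{
    return fits_in_signed_16(rB) ? 5 : 9;
}

/* Order two operands so that the one of smaller magnitude lands in rB. */
void order_multiply_operands(int32_t a, int32_t b, int32_t *rA, int32_t *rB)
{
    assert(rA != NULL && rB != NULL);
    /* Widen before negating so that INT32_MIN does not overflow. */
    int64_t mag_a = a < 0 ? -(int64_t)a : (int64_t)a;
    int64_t mag_b = b < 0 ? -(int64_t)b : (int64_t)b;
    if (mag_a < mag_b) {
        *rA = b;
        *rB = a;
    } else {
        *rA = a;
        *rB = b;
    }
}
```

Multiplying 1,000,000 by 7, for example, costs nine cycles in this model with the large value in rB and five with the operands swapped.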

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system. For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.
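The nop padding mentioned at the end of that passage comes down to simple arithmetic. The C sketch below is mine, not the book's. PowerPC instructions are four bytes each; the line size is left as a parameter because the right value depends on the processor, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { PPC_INSTRUCTION_BYTES = 4 };   /* every PowerPC instruction is 4 bytes */

/* Number of nops needed before loop_addr so the loop starts on a line boundary. */
uint32_t nops_to_align(uint32_t loop_addr, uint32_t line_bytes)
{
    assert(line_bytes > 0 && line_bytes % PPC_INSTRUCTION_BYTES == 0);
    assert(loop_addr % PPC_INSTRUCTION_BYTES == 0);
    uint32_t offset = loop_addr % line_bytes;
    uint32_t pad = (line_bytes - offset) % line_bytes;
    return pad / PPC_INSTRUCTION_BYTES;
}

/* Nonzero if a loop of n_instructions starting at loop_addr stays in one line. */
int loop_fits_in_one_line(uint32_t loop_addr, uint32_t n_instructions,
                          uint32_t line_bytes)
{
    assert(n_instructions > 0 && line_bytes > 0);
    uint32_t first = loop_addr / line_bytes;
    uint32_t last = (loop_addr + n_instructions * PPC_INSTRUCTION_BYTES - 1)
                    / line_bytes;
    return first == last;
}
```

With a line size of 32 bytes (an example value only), a six-instruction loop at address 0x1014 straddles two lines, and three nops move it to 0x1020, where it fits in one. In practice an assembler's alignment directive would normally do this padding for you.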

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you had better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.
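For readers who have not seen the multiply-by-small-constant trick, here is a C sketch of one possible set of decompositions. These are my own and are not copied from the book's table on page 347, which may choose different sequences. Each case uses at most three shifts, adds and subtracts. Unsigned arithmetic is used so that the shifts are well defined in C, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by k (3 <= k <= 10) using only shifts, adds and subtracts. */
uint32_t times_small_constant(uint32_t x, uint32_t k)
{
    switch (k) {
    case 3:  return (x << 1) + x;          /* 2x + x         */
    case 4:  return x << 2;                /* 4x             */
    case 5:  return (x << 2) + x;          /* 4x + x         */
    case 6:  return ((x << 1) + x) << 1;   /* (2x + x) * 2   */
    case 7:  return (x << 3) - x;          /* 8x - x         */
    case 8:  return x << 3;                /* 8x             */
    case 9:  return (x << 3) + x;          /* 8x + x         */
    case 10: return ((x << 2) + x) << 1;   /* (4x + x) * 2   */
    default:
        assert(!"k must be between 3 and 10");
        return 0;
    }
}
```

Whether any of these beats a five-cycle multiply depends on what else the integer unit has to do at that point, which is exactly the kind of question the book's timing chapters help answer.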

My biggest complaint is that I want to see real-world code examples (i.e., sequences of more than five instructions) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.
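As a rough idea of what the smallest possible workbench might look like, here is a portable C sketch of my own. It uses the standard clock() function, which is far coarser than the high-resolution timer I am asking for; real PowerPC timing code would read a hardware counter instead. The names seconds_per_call, sample_work and work_sink are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*WorkFn)(void);

/* Written by the sample workload so the compiler cannot discard its loop. */
volatile unsigned long work_sink;

/* Average CPU seconds per call, measured over many iterations because
   clock() is too coarse to time a single short call. */
double seconds_per_call(WorkFn fn, long iterations)
{
    assert(fn != NULL && iterations > 0);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        fn();
    }
    clock_t end = clock();
    return (double)(end - start) / (double)CLOCKS_PER_SEC / (double)iterations;
}

/* Stand-in for the routine under test; replace with the real bottleneck. */
void sample_work(void)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000; i++) {
        sum += i * i;
    }
    work_sink = sum;
}
```

The idea is to call seconds_per_call on the original routine and on the rewritten one with the same iteration count and compare the two figures, repeating the runs a few times because a single measurement is noisy.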

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.