TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Skype 8.52.0.138 - Voice-over-internet p...
Skype allows you to talk to friends, family and co-workers across the Internet without the inconvenience of long distance telephone charges. Using peer-to-peer data transmission technology, Skype... Read more
Bookends 13.2.6 - Reference management a...
Bookends is a full-featured bibliography/reference and information-management system for students and professionals. Bookends uses the cloud to sync reference libraries on all the Macs you use.... Read more
BusyContacts 1.4.0 - Fast, efficient con...
BusyContacts is a contact manager for OS X that makes creating, finding, and managing contacts faster and more efficient. It brings to contact management the same power, flexibility, and sharing... Read more
Chromium 77.0.3865.75 - Fast and stable...
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all Internet users to experience the web. Version 77.0.3865.75: A list of changes is available... Read more
DiskCatalogMaker 7.5.5 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
Alfred 4.0.4 - Quick launcher for apps a...
Alfred is an award-winning productivity application for OS X. Alfred saves you time when you search for files online or on your Mac. Be more productive with hotkeys, keywords, and file actions at... Read more
A Better Finder Rename 10.45 - File, pho...
A Better Finder Rename is the most complete renaming solution available on the market today. That's why, since 1996, tens of thousands of hobbyists, professionals and businesses depend on A Better... Read more
iFinance 4.5.11 - Comprehensively manage...
iFinance allows you to keep track of your income and spending -- from your lunchbreak coffee to your new car -- in the most convenient and fastest way. Clearly arranged transaction lists of all your... Read more
OmniGraffle Pro 7.11.3 - Create diagrams...
OmniGraffle Pro helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use... Read more
BBEdit 12.6.7 - Powerful text and HTML e...
BBEdit is the leading professional HTML and text editor for the Mac. Specifically crafted in response to the needs of Web authors and software developers, this award-winning product provides a... Read more

Latest Forum Discussions

See All

Five Nights at Freddy's AR: Special...
Five Nights at Freddy's AR: Special Delivery is a terrifying new nightmare from developer Illumix. Last week, FNAF fans were sent into a frenzy by a short teaser for what we now know to be Special Delivery. Those in the comments were quick to... | Read more »
Rush Rally 3's new live events are...
Last week, Rush Rally 3 got updated with live events, and it’s one of the best things to happen to racing games on mobile. Prior to this update, the game already had multiplayer, but live events are more convenient in the sense that it’s somewhat... | Read more »
Why your free-to-play racer sucks
It’s been this way for a while now, but playing Hot Wheels Infinite Loop really highlights a big issue with free-to-play mobile racing games: They suck. It doesn’t matter if you’re trying going for realism, cart racing, or arcade nonsense, they’re... | Read more »
Steam Link Spotlight - The Banner Saga 3
Steam Link Spotlight is a new feature where we take a look at PC games that play exceptionally well using the Steam Link app. Our last entry talked about Terry Cavanaugh’s incredible Dicey Dungeons. Read about how it’s a great mobile experience... | Read more »
PSA: GRIS has some issues
You may or may not have seen that Devolver Digital just released GRIS on the App Store, but we wanted to do a quick public service announcement to say that you might not want to hop on buying it just yet. The puzzle platformer has come to small... | Read more »
Explore the world around you in new matc...
Got a hankering for a fresh-feeling Match-3 puzzle game that offers a unique twist? You might find exactly what you’re looking for with What a Wonderful World, a new spin on the classic mobile genre which merges entertaining puzzles with global... | Read more »
Combo Quest (Games)
Combo Quest 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: Combo Quest is an epic, time tap role-playing adventure. In this unique masterpiece, you are a knight on a heroic quest to retrieve... | Read more »
Hero Emblems (Games)
Hero Emblems 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: ** 25% OFF for a limited time to celebrate the release ** ** Note for iPhone 6 user: If it doesn't run fullscreen on your device... | Read more »
Puzzle Blitz (Games)
Puzzle Blitz 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Puzzle Blitz is a frantic puzzle solving race against the clock! Solve as many puzzles as you can, before time runs out! You have... | Read more »
Sky Patrol (Games)
Sky Patrol 1.0.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.1 (iTunes) Description: 'Strategic Twist On The Classic Shooter Genre' - Indie Game Mag... | Read more »

Price Scanner via MacPrices.net

Save $150-$250 on 10.2″ WiFi + Cellular iPads...
Verizon is offering $150-$250 discounts on Apple’s new 10.2″ WiFi + Cellular iPad with service. Buy the iPad itself and save $150. Save $250 on the purchase of an iPad along with an iPhone. The fine... Read more
Apple continues to offer 13″ 2.3GHz Dual-Core...
Apple has Certified Refurbished 2017 13″ 2.3GHz Dual-Core non-Touch Bar MacBook Pros available starting at $1019. An standard Apple one-year warranty is included with each model, outer cases are new... Read more
Apple restocks 2018 MacBook Airs, Certified R...
Apple has restocked Certified Refurbished 2018 13″ MacBook Airs starting at only $849. Each MacBook features a new outer case, comes with a standard Apple one-year warranty, and is shipped free. The... Read more
Sunday Sale! 2019 27″ 5K 6-Core iMacs for $20...
B&H Photo has the new 2019 27″ 5K 6-Core iMacs on stock today and on sale for up to $250 off Apple’s MSRP. Overnight shipping is free to many locations in the US. These are the same iMacs sold by... Read more
Weekend Sale! 2019 13″ MacBook Airs for $200...
Amazon has new 2019 13″ MacBook Airs on sale for $200 off Apple’s MSRP, with prices starting at $899, each including free shipping. Be sure to select Amazon as the seller during checkout, rather than... Read more
2019 15″ MacBook Pros now on sale for $350-$4...
B&H Photo has Apple’s 2019 15″ 6-Core and 8-Core MacBook Pros on sale today for $350-$400 off MSRP, starting at $2049, with free overnight shipping available to many addresses in the US: – 2019... Read more
Buy one Apple Watch Series 5 at Verizon, get...
Buy one Apple Watch Series 5 at Verizon, and get a second Watch for 50% off. Plus save $10 on your first month of service. The fine print: “Buy Apple Watch, get another up to 50% off on us. Plus $10... Read more
Sprint offers 64GB iPhone 11 for free to new...
Sprint will include the 64GB iPhone 11 for free for new customers with an eligible trade-in in of the iPhone 7 or newer through September 19, 2019. The fine print: “iPhone 11 64GB $0/mo. iPhone 11... Read more
Verizon offers new iPhone 11 models for up to...
Verizon is offering Apple’s new iPhone 11 models for $500 off MSRP to new customers with an eligible trade-in (see list below). Discount is applied via monthly bill credits over 24 months. Verizon is... Read more
AT&T offers free $300 reward card + free...
AT&T Wireless will include a second free 64GB iPhone 11 with the purchase of one eligible iPhone at full price. They will also include a free $300 rewards card. The fine print: “Buy an elig.... Read more

Jobs Board

Student Employment (Blue *Apple* Cafe) Spri...
Student Employment (Blue Apple Cafe) Spring 2019 Penn State University Campus/Location: Penn State Brandywine Campus City: Media, PA Date Announced: 12/20/2018 Date Read more
Best Buy *Apple* Computing Master - Best Bu...
**732359BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Store Associates **Location Number:** 000171-Winchester Road-Store **Job Description:** Read more
*Apple* Mobile Master - Best Buy (United Sta...
**732324BR** **Job Title:** Apple Mobile Master **Job Category:** Store Associates **Location Number:** 000013-Fargo-Store **Job Description:** **What does a Best Read more
Best Buy *Apple* Computing Master - Best Bu...
**732455BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Sales **Location Number:** 000449-Auburn Hills-Store **Job Description:** **What does a Read more
*Apple* Mobility Pro - Best Buy (United Stat...
**732490BR** **Job Title:** Apple Mobility Pro **Job Category:** Store Associates **Location Number:** 000449-Auburn Hills-Store **Job Description:** At Best Buy, Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.