TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Final Cut Pro 10.6.4 - Professional vide...
Redesigned from the ground up, Final Cut Pro combines revolutionary video editing with a powerful media organization and incredible performance to let you create at the speed of thought.... Read more
iMovie 10.3.4 - Edit personal videos and...
With a streamlined design and intuitive editing features, iMovie lets you create Hollywood-style trailers and beautiful movies like never before. Browse your video library, share favorite moments,... Read more
Motion 5.6.2 - Create and customize Fina...
Motion is designed for video editors, Motion 5 lets you customize Final Cut Pro titles, transitions, and effects. Or create your own dazzling animations in 2D or 3D space, with real-time feedback as... Read more
iMazing 2.15.8 - Complete iOS device man...
iMazing is the world’s favourite iOS device manager for Mac and PC. Millions of users every year leverage its powerful capabilities to make the most of their personal or business iPhone and iPad.... Read more
VueScan 9.7.90 - Scanner software with a...
VueScan is a scanning program that works with most high-quality flatbed and film scanners to produce scans that have excellent color fidelity and color balance. VueScan is easy to use, and has... Read more
Compressor 4.6.2 - Adds power and flexib...
Compressor adds power and flexibility to Final Cut Pro X export. Customize output settings, work faster with distributed encoding, and tap into a comprehensive set of delivery features. Features:... Read more
Capture One 15.3.2.11 - RAW workflow sof...
Capture One is a professional RAW converter offering you ultimate image quality with accurate colors and incredible detail from more than 400 high-end cameras - straight out of the box. It offers... Read more
Vivaldi 5.4.2753.28 - An advanced browse...
Vivaldi is a browser for our friends. We live in our browsers. Choose one that has the features you need, a style that fits and values you can stand by. From the look and feel, to how you interact... Read more
Parallels Desktop 18.0.0 - Run Windows a...
Parallels allows you to run Windows and Mac applications side by side. Choose your view to make Windows invisible while still using its applications, or keep the familiar Windows background and... Read more
TechTool Pro 16.0.1 - Hard drive and sys...
TechTool Pro has long been one of the foremost utilities for keeping your Mac running smoothly and efficiently. With the release of this version, it has become more proficient than ever. Main... Read more

Latest Forum Discussions

See All

Turn-Based RPG ‘Avatar: Generations’ Sof...
Square Enix London Mobile, Navigator Games, and Paramount Consumer Products just announced that the turn-based RPG Avatar: Generations based on Nickelodeon’s Avatar: The Last Airbender is soft launching this month for mobile. Avatar: Generations is... | Read more »
Tower of Fantasy launches today and brin...
Level Infinite and Hotta Studio have announced the release of their very ambitious looking shared open world MMORPG Tower of Fantasy. With its cross-platform functionality between PC and mobile, it looks to be one to roll the dice on and enjoy at... | Read more »
‘Genshin Impact’ Version 3.0 Gets a New...
After HoYoverse released Genshin Impact (Free) version 2.8 on all platforms, the company has slowly been teasing the major upcoming 3.0 update. This update features the Sumeru region with many characters. While details on the update including a... | Read more »
Out Now: ‘Tower of Fantasy’, ‘Tightrope...
Each and every day new mobile games are hitting the App Store, and so each week we put together a big old list of all the best new releases of the past seven days. Back in the day the App Store would showcase the same games for a week, and then... | Read more »
SwitchArcade Round-Up: ‘Book Quest’, ‘Cl...
Hello gentle readers, and welcome to the SwitchArcade Round-Up for August 10th, 2022. In today’s article, we’ve got a little news about an update to a game I really like, a few new releases to summarize, and some sales to look at. A bit of a quiet... | Read more »
‘Pine Tar Poker’ is an Otherworldly Poke...
Developer BJ Malicoat, who put out the well-received and former Apple Game of the Day pick Downwordly in June of last year, is back working on another mobile game project called Pine Tar Poker, and it has caught my attention. Why? Because it’s a... | Read more »
Darkness Rises celebrates four years of...
Four years of uptime for a mobile game is akin to eternity, and this is exactly the milestone that Darkness Rises has reached. It is important for developers to keep updating to keep the game fresh, and NEXON has announced a massive anniversary... | Read more »
Keep Your Smatphone’s Case On When Using...
The original Gamevice was born as a sort of offshoot of the weird Wikipad gaming tablet/controller/hybrid thing way back in 2014. Interestingly, the first Gamevice controller for iOS only supported the iPad mini and launched in 2015, with versions... | Read more »
SwitchArcade Round-Up: A ‘Splatoon 3’ Ni...
Hello gentle readers, and welcome to the SwitchArcade Round-Up for August 9th, 2022. In today’s article, we’ve got some news about a Splatoon 3 Nintendo Direct, a review of QUByte’s Thunderbolt Collection, a single new release summary, and the usual... | Read more »
Orangepixel’s Pacifist Survival Game ‘Re...
Back in June we learned that long-time mobile developer Orangepixel, who also makes games for PC and consoles (including the Atari VCS!), would be bringing the unique survival game Residual to mobile devices sometime this year. Originally launched... | Read more »

Price Scanner via MacPrices.net

Apple has 24-inch M1 iMacs available starting...
Apple has 24-inch M1 iMacs with M1 CPUs (8-core CPU/7-core GPU) available today in their Certified Refurbished store for $1099 shipped. Their price is $200 off standard MSRP. Each iMac is in like-new... Read more
13″ M1 MacBook Airs in stock today for $799,...
QuickShip Electronics has open-box return 13″ M1 MacBook Airs in stock and on sale for $200 off MSRP on their eBay store right now, each with free express delivery. According to QuickShip, “The item... Read more
In stock today: Mac Studio models for up to $...
Apple retailer Expercom has Mac Studio models in stock today and on sale for up to $400 off Apple’s MSRP, depending on configuration. Their prices are the lowest price available for a Mac Studio from... Read more
Mac mini with M1 CPU and 512GB of storage on...
Amazon has the M1 Mac mini with a 512GB SSD in stock today on sale for $749.99 including free shipping. Their price is $150 off Apple’s MSRP, and it’s the lowest price available for this... Read more
Need a Mac or iPad for school? Get a free App...
Apple’s Back to School promotion for 2022 continues to run through September 26, 2022. As part of this promotion, Apple will include a free $150 Apple Gift Card with the purchase of any MacBook Air,... Read more
Apple Watch SE on sale for $50 off MSRP
Amazon has Apple Watch SE GPS models on sale for $50 off MSRP for a limited time, each including free shipping. Their prices are the lowest currently available for SE Watches: – 40mm Apple Watch SE... Read more
Save $310 on a 14″ 24-core GPU M1 Max MacBook...
Save $310 on 14″ MacBook Pros with 24-core M1 Max processors at Apple (32GB RAM/1TB SSD) with these Certified Refurbished models in stock today for $2789 in Space Gray or Silver colors. Regular price... Read more
14″ M1 Pro MacBook Pros available today at Ap...
Apple has Certified Refurbished standard-configuration 14″ MacBook Pros with M1 Pro CPUs available today for up to $250 off original MSRP, starting at $1799. Each model features a new outer case,... Read more
13″ MacBook Air with M2 CPU, in Starlight, on...
Apple retailer Expercom has the new Starlight 13″ MacBook Air with an M2 CPU (8GB RAM/256GB SSD) on sale for $1135.05, shipped, through August 12, 2022. Their price is $64 off Apple’s MSRP, and it’s... Read more
14″ M1 Pro MacBook Pro with 1TB SSD on sale f...
Expercom is offering a $200 instant discount on the 14″ M1 Pro MacBook Pro with a 1TB SSD through August 12, 2022. Their discount reduces the price of this configuration to $1999 shipped — the lowest... Read more

Jobs Board

Solutions Engineering Manager - *Apple* - S...
…in our Hardware and Advanced Solutions group leading and developing our Apple technical practice to increase revenue and profitability. The ideal candidate would Read more
Operations Associate - *Apple* Blossom Mall...
Operations Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Cashier - *Apple* Blossom Mall - JCPenney (...
Cashier - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Blossom Mall Read more
Omnichannel Associate - *Apple* Blossom Mal...
Omnichannel Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Sephora Beauty Advisor - *Apple* Blossom Ma...
Sephora Beauty Advisor - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.