TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Adobe Acrobat DC 20.009.20074 - Powerful...
Acrobat DC is available only as a part of Adobe Creative Cloud, and can only be installed and/or updated through Adobe's Creative Cloud app. Adobe Acrobat DC with Adobe Document Cloud services is... Read more
beaTunes 5.2.10 - Organize your music co...
beaTunes is a full-featured music player and organizational tool for music collections. How well organized is your music library? Are your artists always spelled the same way? Any R.E.M. vs REM?... Read more
DiskCatalogMaker 8.1.5 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
Meteorologist 3.4.1 - Popular weather ap...
Meteorologist is a simple interface to weather provided by weather.com. It provides the ability to show the weather in the main menu bar, displaying more detail in a pop-up menu, whose contents are... Read more
NeoFinder 7.6 - Catalog your external me...
NeoFinder (formerly CDFinder) rapidly organizes your data, either on external or internal disks, or any other volumes. It catalogs and manages all your data, so you stay in control of your data... Read more
GarageSale 8.1.1 - Create outstanding eB...
GarageSale is a slick, full-featured client application for the eBay online auction system. Create and manage your auctions with ease. With GarageSale, you can create, edit, track, and manage... Read more
Firetask Pro 4.2.2 - Innovative task man...
Firetask Pro uniquely combines the advantages of classical priority-and-due-date-based task management with GTD. Stay focused and on top of your commitments - Firetask Pro's "Today" view shows all... Read more
Bookends 13.4.3 - Reference management a...
Bookends is a full-featured bibliography/reference and information-management system for students and professionals. Bookends uses the cloud to sync reference libraries on all the Macs you use.... Read more
LibreOffice 6.4.5.2 - Free, open-source...
LibreOffice is an office suite (word processor, spreadsheet, presentations, drawing tool) compatible with other major office suites. The Document Foundation is coordinating development and... Read more
Thunderbird 68.10.0 - Email client from...
As of July 2012, Thunderbird has transitioned to a new governance model, with new features being developed by the broader free software and open source community, and security fixes and improvements... Read more

Latest Forum Discussions

See All

Distract Yourself With These Great Mobil...
There’s a lot going on right now, and I don’t really feel like trying to write some kind of pithy intro for it. All I’ll say is lots of people have been coming together and helping each other in small ways, and I’m choosing to focus on that as I... | Read more »
Pokemon Go's July Community Day wil...
Pokemon Go developers have announced the details concerning the upcoming Gastly Community Day. This particular event was selected by the players of the game after the Gas Pokemon came in second place after a poll that decided which Pokemon would... | Read more »
Clash Royale: The Road to Legendary Aren...
Supercell recently celebrated its 10th anniversary and their best title, Clash Royale, is as good as it's ever been. Even for lapsed players, returning to the game is as easy as can be. If you want to join us in picking the game back up, we've put... | Read more »
Detective Di is a point-and-click murder...
Detective Di is a point-and-click murder mystery set in Tang Dynasty-era China. You'll take on the role of China's best-known investigator, Di Renjie, as he solves a series of grisly murders that will ultimately lead him on a collision course with... | Read more »
Dissidia Final Fantasy Opera Omnia is se...
Dissidia Final Fantasy Opera Omnia, one of Square Enix's many popular mobile RPGs, has announced a plethora of in-game events that are set to take place over the summer. This will include several rewards, Free Multi Draws and more. [Read more] | Read more »
Sphaze is a neat-looking puzzler where y...
Sphaze is a neat-looking puzzler where you'll work to guide robots through increasingly elaborate mazes. It's set in a visually distinct world that's equal parts fantasy and sci-fi, and it's finally launched today for iOS and Android devices. [... | Read more »
Apple Arcade is in trouble
Yesterday, Bloomberg reported that Apple is disappointed in the performance of Apple Arcade and will be shifting their approach to the service by focusing on games that can retain subscribers and canceling other upcoming releases that don't fit... | Read more »
Pixel Petz, an inventive platform for de...
Pixel Petz has built up a sizeable player base thanks to its layered, easy-to-understand creative tools and friendly social experience. It revolves around designing, trading, and playing with a unique collection of pixel art pets, and it's out now... | Read more »
The King of Fighters Allstar's late...
The King of Fighters ALLSTAR, Netmarble's popular action RPG, has once again been updated with a plethora of new content. This includes battle cards, events and 21 new fighters, which increases the already sizeable roster even more. [Read more] | Read more »
Romancing SaGa Re;univerSe, the mobile s...
Square Enix latest mobile spin-off Romancing SaGa Re;univerSe is available now globally for both iOS and Android. It initially launched in Japan back in 2018 where it's proven to be incredibly popular, so now folks in the West can finally see what... | Read more »

Price Scanner via MacPrices.net

$200 13″ MacBook Pro discounts are back at Am...
Amazon has 2020 13″ 2.0GHz MacBook Pros on sale again today for $150-$200 off Apple’s MSRP. Shipping is free. Be sure to purchase the MacBook Pro from Amazon, rather than a third-party seller, and... Read more
Deal Alert! Apple AirPods with Wireless Charg...
Sams Club has Apple AirPods with Wireless Charging Case on sale on their online store for only $149.98 from July 6, 2020 to July 9, 2020. Their price is $50 off Apple’s MSRP, and it’s the lowest... Read more
Xfinity Mobile promo: Apple iPhone XS models...
Take $300 off the purchase of any Apple iPhone XS model at Xfinity Mobile while supplies last. Service plan required: – 64GB iPhone XS: $599.99 save $300 – 256GB iPhone XS: $749.99 save $300 – 512GB... Read more
New July 2020 promo at US Cellular: Switch an...
US Cellular has introduced a new July 2020 deal offering free 64GB Apple iPhone 11 smartphones to customers opening a new line of service. No trade-in required, and discounts are applied via monthly... Read more
Apple offers up to $400 Education discount on...
Apple has launched their Back to School promotion for 2020. They will include one free pair Apple AirPods (with charging case) with the purchase of a MacBook Air, MacBook Pro, iMac, or iMac Pro (Mac... Read more
July 4th Sale: Woot offers wide range of Macs...
Amazon-owned Woot is blowing out a wide range of Apple Macs and iPads for July 4th staring at $279 and ranging up to just over $1000. Models vary from older iPads and 11″ MacBook Airs to some newer... Read more
Apple Pro Display XDR with Nano-Texture Glass...
Abt Electronics has Apple’s new 32″ Pro Display XDR model with the nano-texture glass in stock and on sale today for up to $144 off MSRP. Shipping is free: – Pro Display XDR (nano-texture glass): $... Read more
New 2020 Mac mini on sale for up to $100 off...
Amazon has Apple’s new 2020 Mac minis on sale today for $40-$100 off MSRP with prices starting at $759. Shipping is free: – 2020 4-Core Mac mini: $759 $40 off MSRP – 2020 6-Core Mac mini: $998.99 $... Read more
July 4th Sale: $100 off every 2020 13″ MacBoo...
Apple resellers have new 2020 13″ MacBook Airs on sale for $100 off Apple’s MSRP as part of their July 4th sales. Starting at $899, these are the cheapest new 2020 MacBooks for sale anywhere: (1) B... Read more
This hidden deal on Apple’s site can save you...
Are you a local, state, or federal government employee? If so, Apple offers special government pricing on their products, including AirPods, for you as well as immediate family members. Here’s how... Read more

Jobs Board

Operating Room Assistant, *Apple* Hill Surg...
Operating Room Assistant, Apple Hill Surgical Center - Full Time, Day Shift, Monday - Saturday availability required Tracking Code 62363 Job Description Operating Read more
Perioperative RN - ( *Apple* Hill Surgical C...
Perioperative RN - ( Apple Hill Surgical Center) Tracking Code 60593 Job Description Monday - Friday - Full Time Days Possible Saturdays General Summary: Under the Read more
Product Manager, *Apple* Commercial Sales -...
Product Manager, Apple Commercial Sales Austin, TX, US Requisition Number:77652 As an Apple Product Manager for the Commercial Sales team at Insight, you Read more
*Apple* Mac Product Engineer - Barclays (Uni...
Apple Mac EngineerWhippany, NJ Support the development and delivery of solutions, products, and capabilities into the Barclays environment working across technical Read more
Blue *Apple* Cafe Student Worker - Pennsylv...
…enhance your work experience. Student positions are available at the Blue Apple Cafe. Employee meal discount during working hours. Duties include food preparation, Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.