The PowerPC

Volume Number:		10
Issue Number:		2
Column Tag:		Powering Up

From CISC to RISC

By Richard Clark & Jordan Mattson, Apple Computer, Inc.

The Heart of the Next Generation

The forthcoming generation of Macintosh systems will be powered by the PowerPC family of RISC microprocessors. Apple did not make this change lightly. This month’s Powering Up examines the differences between CISC and RISC, takes a look at the PowerPC family of microprocessors, and closes with an overview of the architecture of the first PowerPC implementation, the PowerPC “601”. This should help explain why Apple is making such a dramatic change to the Macintosh product line.

A brief history of the (CISC) universe

The earliest microcomputers were designed to be easy to program in assembly language and to conserve memory, which was expensive and slow to access. (Their designs were also constrained by the limited manufacturing techniques of the day.) This led to chips that had:

• Very few registers - often only an “accumulator” and one or two general-purpose registers

• “Complex” instructions that allowed assembly language programmers to write programs using a small number of these instructions instead of a large number of simpler instructions (this also conserved memory)

• “Variable length” instructions where the instruction (often 1 byte long) would be followed by the information needed by that instruction

• Multiple styles of accessing memory, known as “addressing modes,” which allowed programmers to access individual locations directly, via a pointer, by an offset to a pointer, by combining pointers, and so on

These processors also executed instructions serially - each instruction had to complete before the next instruction could begin.

As microprocessors evolved from 8 bits to 16 bits to 32 bits, each new generation added more registers, more addressing modes, and new instructions; some chips even added a limited form of pipelining, the ability to overlap the execution of several instructions. But the basic design was still oriented toward conserving memory and serving the needs of the assembly-language programmer, often at the expense of speed.

Enter RISC

In the early 1980s, several designers noticed that microprocessor design hadn’t kept up with the rest of the system. Memory was faster and much less expensive, assembly language had largely been replaced by high-level languages such as C and Pascal, and existing designs were pushing the limits of what could be manufactured. So they went back to the drawing board and came up with simpler designs that were optimized for speed and for use with high-level languages. These new designs used instruction sets made up of many simple instructions, and so were dubbed “reduced” instruction set computers.

While the exact meaning of “RISC” is still a subject for debate, most RISC designs include:

• A large number of general purpose registers, and few special-purpose registers

• Instruction sets which are well matched to the needs of compilers, and which contain many “simple” instructions

• Instructions which fit completely in a single “word” (including any data encoded in the instruction itself), and which are encoded in an easy-to-process format

• A “load/store” architecture, where information has to be loaded into registers before it can be used

• A small number of memory addressing modes, often only one or two, which use a pointer in one of the registers

These features allow most RISC implementations to apply a few simple techniques to get maximum performance:

• Pipelining, so the processor can process multiple instructions simultaneously

• Memory caches, which provide faster access to instructions and data than system RAM or ROM

• Restrictions on data alignment, where the processor requires that all two-byte values be aligned on an even address, all four-byte values be aligned on an address that is a multiple of four, and so on

The PowerPC is a RISC design which has all of these “common” RISC features, except that it relaxes the rules for memory alignment.
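In C terms, the alignment rule says an access is “naturally aligned” when its address is a multiple of its size. The sketch below is illustrative only: the function names are hypothetical, and the four-byte word in words_touched is an assumption matching a 32-bit implementation. It also hints at why relaxing the rule has a cost: a misaligned four-byte value straddles two aligned words, so the hardware may need an extra memory access to fetch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes at `addr` is naturally aligned,
   i.e. addr is a multiple of size. size must be a power of two. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Number of aligned 4-byte words (an assumed 32-bit word) that an access
   of `size` bytes starting at `addr` touches. An aligned 4-byte access
   touches one word; a misaligned one straddles two. */
static size_t words_touched(uintptr_t addr, size_t size)
{
    assert(size != 0);
    return (size_t)((addr + size - 1) / 4 - addr / 4 + 1);
}
```

For example, a four-byte load from address 0x1002 is misaligned and touches two words, while the same load from 0x1000 touches one.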

The PowerPC - An Overview

The PowerPC architecture is a collaborative effort of Apple, IBM, and Motorola to create a new generation of high-performance microprocessors which can be used in everything from personal computers, workstations, servers, and multiprocessor systems to embedded microcontrollers.

The PowerPC is based on IBM’s highly successful POWER architecture. The POWER architecture was designed for scientific workstations, and has been optimized for both integer and floating-point math operations. The POWER architecture also incorporates a “branch processor” which attempts to minimize the impact of branch instructions on the processor’s performance.

When the Apple-IBM-Motorola consortium set out to design PowerPC, the members modified the POWER architecture to reduce manufacturing costs and make the design more suitable for desktop computers. They eliminated parts of the POWER instruction set that made the architecture more difficult to implement but had minimal impact on performance. While the architects were modifying the instruction set, they also removed dependencies between instructions and added features which simplified building multiprocessor systems.

The result of these changes is a low-cost, high-performance RISC architecture with:

• Fixed length, consistently encoded instructions

• A register-to-register (load/store) architecture, with support for aligned data accesses, misaligned data accesses, and both big-endian and little-endian data

• A “simple” instruction set, with instructions which may be tailored to the task at hand (for example, setting the condition codes at the end of an arithmetic operation is an option, not a requirement)

• Simple, yet powerful, addressing modes applied consistently across the instruction set

• A large register set which includes both general-purpose and floating-point registers

• Floating-point as a first-class data type. This means that floating-point is a standard part of the architecture and therefore is better integrated than it is in many other RISC architectures

Some of these features, notably the misaligned data support and the dual big-endian/little-endian support, are unusual in a RISC design, but were required to support past and future Macintosh designs.
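The difference between the two byte orders is easiest to see by storing the same 32-bit value both ways. The plain-C sketch below uses hypothetical helper names. The 68K is big-endian, and the Macintosh runs the PowerPC in big-endian mode; the PowerPC also provides byte-reversed load and store instructions (such as lwbrx and stwbrx) that do the work of swap32 as part of a memory access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into p[0..3] in big-endian order:
   most significant byte at the lowest address. */
static void store_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write v into p[0..3] in little-endian order:
   least significant byte at the lowest address. */
static void store_le32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Reverse the byte order of a 32-bit value (0x12345678 -> 0x78563412). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

Storing 0x12345678 with store_be32 puts 0x12 in the first byte; store_le32 puts 0x78 there instead.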

The PowerPC Family of Microprocessors

The PowerPC family currently has the following four members:

601 - The 601 is a fusion of the POWER architecture and the PowerPC architecture. It is designed to drive mainstream desktop systems. A Macintosh with a 601 will deliver integer performance three to five times, and floating-point performance around ten times, that of today’s high-end 68040-based Macintosh systems.

603 - The 603 is the first implementation of the pure PowerPC architecture. It is designed for low cost and low power consumption, and will be used in portable and low-cost desktop Macintosh with PowerPC systems. Over time, the 603 could in many ways become Apple’s replacement for the 68030.

604 - The 604 is designed for mainstream desktop personal computers. It will cost about as much as the 601, but will deliver higher performance.

620 - The 620, which is still in the design phase, is a high-performance microprocessor that Motorola and IBM believe will be well suited for very high-end personal computers, workstations, servers, and multiprocessor systems.

The PowerPC 601 in Context, and Why Apple Likes RISC

Many developers and customers have been asking how the 601 stacks up against Intel’s state-of-the-art CISC design, the “Pentium.” On the basis of price, performance, and power consumption, the PowerPC 601 compares quite favorably. As Table 1 shows, the 601 delivers integer performance close to the Pentium’s and floating-point performance that exceeds it, for about half the price. It also consumes about two-thirds the power of the Pentium (9 watts versus 14).

Table 1 - Pentium versus PowerPC 601

                 Pentium      PowerPC 601
Frequency        66 MHz       66 MHz
Die size         264 mm²      120 mm²
Cache            16K          32K
Power            14 W         9 W
SPECint92        64           60
SPECfp92         57           80
Price            $950         $450
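The comparison in Table 1 comes down to a few ratios. The C fragment below simply restates the table’s figures (the struct and function names are our own, for illustration) so the ratios can be computed rather than eyeballed.

```c
#include <assert.h>

/* Figures copied from Table 1 (both parts at 66 MHz). */
struct chip {
    double die_mm2;
    double watts;
    double specint92;
    double specfp92;
    double price_usd;
};

static const struct chip pentium = { 264.0, 14.0, 64.0, 57.0, 950.0 };
static const struct chip ppc601  = { 120.0,  9.0, 60.0, 80.0, 450.0 };

/* Ratio of one figure to another, e.g. 601 price / Pentium price. */
static double ratio(double a, double b)
{
    assert(b != 0.0);
    return a / b;
}

/* SPECfp92 points per dollar of processor price. */
static double specfp_per_dollar(const struct chip *c)
{
    return ratio(c->specfp92, c->price_usd);
}
```

Computed this way, the 601’s price is about 0.47 of the Pentium’s and its die area about 0.45, while its power draw is about 0.64; measured as SPECfp92 per dollar, the 601 comes out roughly three times ahead. The ratios are, of course, only as good as the table’s figures.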

This comparison should give you some idea why Apple is staking such a large part of its future on RISC. The PowerPC 601 is the first of its generation (though it does descend from previous RISC architectures), yet matches the performance of the latest CISC chips - and the next PowerPC implementation (603) is well under way. While CISC designers have to work increasingly hard to squeeze more performance out of their designs, at an ever-increasing manufacturing cost, RISC designs have considerable room for growth. The evolution of RISC designs has the potential to outstrip the evolution of CISC.

A Quick Tour of the 601

Every PowerPC design begins with the fundamental architecture shown in Figure 1, plus some chip-specific details. For example, the 601 incorporates a single 32K cache which holds both instructions and data, while other PowerPC models are likely to use separate instruction and data caches as shown. Also, future implementations may include multiple arithmetic logic units in both the fixed-point and floating-point units, allowing several arithmetic operations to proceed simultaneously.

Figure 1 - A General Diagram of the PowerPC Architecture

Each of these units has a specific purpose:

• The Branch Unit collects instructions from the Instruction Queue, then locates and removes any branches from the instruction stream before sending instructions to the Fixed-Point and Floating-Point units. Unconditional branches can simply be removed from the instruction stream, while conditional branches (e.g. those generated for an “if” statement or a loop) might require the branch unit to “predict” the outcome of the branch. In either case, the branch unit tries to provide an uninterrupted stream of instructions to the units downstream.

• The Fixed-Point unit holds the 32 General-Purpose registers (including one which is used as the Stack Pointer, and another which resembles register A5 in a 68K-based Macintosh). Each register is 32 bits wide on the 32-bit PowerPC implementations (601/603/604) and 64 bits wide on the 64-bit 620.

The Fixed-Point unit also holds the fixed-point arithmetic unit. This unit implements the standard addition, subtraction, multiplication, and division operations, as well as comparison, logical, and shift/rotate instructions.

On the 601, the Fixed-Point unit also manages the transfers of data between memory (the Data cache) and the internal registers. This function may be implemented in a separate functional unit on future PowerPC implementations. (Note that even though the Fixed-Point unit manages load and store operations, data cannot be transferred directly between the Fixed-Point and Floating-Point units - the transfer must go through memory.)

The Fixed-Point unit also serves to calculate addresses for use by the Branch Unit.

• The Floating-Point unit holds the 32 floating-point registers and the Floating-Point arithmetic unit. Each register is 64 bits wide (a “double precision” floating-point value), but can hold single-precision (32-bit) values as well.

The Floating-Point unit implements addition, subtraction, multiplication, and division, combining multiplication with addition or subtraction in a single “multiply and accumulate” (multiply-add) operation. This design fits well with most scientific computing needs, where a common operation involves multiplying two values and then adding the result to a running total.
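A dot product is the classic case of the running-total pattern described above. The sketch below is ordinary C with a hypothetical function name; whether each step of the loop becomes a single multiply-add instruction is up to the compiler, so treat that as an assumption rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x and y: each iteration multiplies two values and adds
   the product to a running total, the pattern a multiply-add unit
   handles in one operation. */
static double dot(const double *x, const double *y, size_t n)
{
    double total = 0.0;
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        total = x[i] * y[i] + total;   /* candidate for one multiply-add */
    return total;
}
```

For instance, dot of {1, 2, 3} and {4, 5, 6} is 1*4 + 2*5 + 3*6 = 32.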

Because the processor contains multiple functional units, each of which can execute instructions independently of the others, the 601 is a “multiple-issue” design, in which more than one instruction may be dispatched in a single clock. Under ideal conditions, the 601 can issue three instructions in a single clock: a branch instruction, a floating-point instruction, and a fixed-point instruction.

Optimizing Code for the PowerPC

One of the ways that a programmer can take advantage of the design of the PowerPC is instruction scheduling: arranging instructions so that each functional unit can keep running without stopping to wait for data or for another unit. The PowerPC compilers are designed to perform instruction scheduling automatically, so that applications run as fast as possible.

For example, a compiler has to implement an “if” statement using at least two operations: a test (which sets the appropriate condition codes) followed by a “conditional branch” instruction. Whenever possible, the compiler schedules other operations between the test and the branch, which gives the branch unit time to see the branch coming, read the condition codes once the test has set them, and so determine the outcome of the branch exactly instead of guessing.
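The branch unit’s decision can be modeled as a simple rule. The C sketch below is a hypothetical model, not a description of the 601’s circuitry: when the condition code is already available the outcome is simply known, and otherwise the unit falls back on a static guess. The guess shown (backward branches, usually the bottom of a loop, predicted taken; forward branches predicted not taken; a compiler-set hint bit reversing the default) follows the PowerPC architecture’s static prediction convention as we understand it, and the struct layout is our own simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a conditional branch. */
typedef struct {
    int32_t displacement;   /* branch offset in bytes; negative = backward */
    bool    reverse_hint;   /* compiler-supplied hint that flips the guess */
} cond_branch;

/* Static guess: backward taken, forward not taken, unless hinted. */
static bool static_guess(cond_branch b)
{
    assert(b.displacement % 4 == 0);   /* instructions are 4 bytes long */
    bool taken = b.displacement < 0;
    return b.reverse_hint ? !taken : taken;
}

/* If the test that sets the condition code has already completed, the
   outcome is known and no guess is needed; otherwise use the guess. */
static bool branch_direction(cond_branch b, bool cr_ready, bool cr_says_taken)
{
    return cr_ready ? cr_says_taken : static_guess(b);
}
```

Scheduling the test well ahead of the branch is what makes cr_ready true in this model, so the guess is never consulted.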

Another example involves loading registers well before they are actually needed, which gives the load operation time to complete (which may require several clock cycles if it has to go to RAM.)
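At the source level, loading early looks like the loop below, which fetches each array element one iteration before it is added. This is only an illustration with a hypothetical function name: the real scheduling is done by the compiler on machine instructions, and a modern compiler may reorder a simple loop like this one on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a[0..n-1], issuing each load one iteration before its value is
   used, so a slow load overlaps with the add of the previous element. */
static long sum_with_early_loads(const long *a, size_t n)
{
    assert(n == 0 || a != NULL);
    if (n == 0)
        return 0;
    long total = 0;
    long current = a[0];              /* first load issued before the loop */
    for (size_t i = 1; i < n; i++) {
        long next = a[i];             /* start the next load early ... */
        total += current;             /* ... while using the value already loaded */
        current = next;
    }
    return total + current;           /* add the last element */
}
```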

A final example involves allocating “scratch” registers within a function. The Runtime Architecture designates several registers as “volatile”, i.e. not saved across function calls. The compiler can examine a group of functions which are compiled together, determine which volatile registers a particular function never changes, and then use those registers in the calling function to hold values across calls to that function, without saving and restoring them.

All of these optimizations require an in-depth knowledge of the processor, and the ability to see the entire structure of a single compiled file. The compiler writers are able to build the myriad rules for instruction scheduling right into the compiler, and the compiler can keep track of the code it generates. Because of this, the compiler often generates better code than an assembly-language programmer will. In fact, Apple suggests that programmers move their entire program into a high-level language (preferably portable ANSI C or C++) and write in assembly language only those parts which absolutely cannot be expressed in a high-level language.

Next Month in Powering Up

The most frequently asked questions about Macintosh with PowerPC, after “When can I buy one?”, are “How can I program one?” and “What is the average user going to do with that much power?” In next month’s column, we’ll take a look at the development tools for PowerPC and some applications which show off the PowerPC’s performance to good advantage.