The PowerPC
Volume Number: 10
Issue Number: 2
Column Tag: Powering Up
The PowerPC
From CISC to RISC
By Richard Clark & Jordan Mattson, Apple Computer, Inc.
The Heart of the Next Generation
The forthcoming generation of Macintosh systems will be powered by the PowerPC family of RISC microprocessors. Apple's decision to make this change wasn't undertaken lightly. This month's Powering Up will examine the differences between CISC and RISC, take a look at the PowerPC family of microprocessors, and close with an overview of the architecture of the first PowerPC implementation - the PowerPC 601. This should help explain why Apple is making such a dramatic change to the Macintosh product line.
A brief history of the (CISC) universe
The earliest microprocessors were designed to be easy to program in assembly language and to conserve memory, which was expensive and slow to access. (They were also constrained by the limited manufacturing techniques available.) This led to chips that had:
Very few registers - often only an accumulator and one or two general-purpose registers
Complex instructions that allowed assembly language programmers to write programs using a small number of these instructions instead of a large number of simpler instructions (this also conserved memory)
Variable length instructions where the instruction (often 1 byte long) would be followed by the information needed by that instruction
Multiple styles of accessing memory, known as addressing modes, which allowed programmers to access individual locations directly, via a pointer, by an offset to a pointer, by combining pointers, and so on
These processors also executed instructions serially - each instruction had to complete before the next instruction could begin.
As microprocessors evolved from 8 bits to 16 bits to 32 bits, each new generation added more registers, more addressing modes, and new instructions; some chips even added a limited form of pipelining - the ability to work on several instructions at once by overlapping the stages of their execution. But the basic design was still oriented towards conserving memory and serving the needs of the assembly-language programmer, often at the expense of speed.
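To put rough numbers on the benefit of pipelining, here is a minimal sketch in C of an idealized timing model. It assumes every stage takes exactly one clock and that nothing ever stalls, which no real processor achieves; the function names are our own and purely illustrative.

```c
/* Idealized model: each instruction passes through `stages` steps,
   one clock per step, and the pipeline never stalls (an assumption). */

/* Serial execution: an instruction must finish before the next begins. */
unsigned long serial_cycles(unsigned long instructions, unsigned long stages)
{
    return instructions * stages;
}

/* Pipelined execution: after the first instruction fills the pipeline,
   one instruction completes on every clock. */
unsigned long pipelined_cycles(unsigned long instructions, unsigned long stages)
{
    if (instructions == 0) {
        return 0;
    }
    return stages + (instructions - 1);
}
```

Under these assumptions, 100 instructions on a four-stage design take 400 clocks when executed serially and 103 clocks when pipelined.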
Enter RISC
In the early 1980s, several designers noticed that microprocessor design hadn't kept up with the rest of the system. Memory was faster and much less expensive, assembly language had largely been replaced by such high-level languages as C and Pascal, and existing designs were pushing the limits of what could be manufactured. So they went back to the drawing board and came out with simpler designs that were optimized for speed and for use with high-level languages. These new designs used instruction sets made up of many simple instructions, and thus were dubbed "reduced instruction set computers" (RISC).
While the exact meaning of RISC is still a subject for debate, most RISC designs include:
A large number of general purpose registers, and few special-purpose registers
Instruction sets which are well matched to the needs of compilers, and which contain many simple instructions
Instructions which fit completely in a single word (including the data used by the instruction), and which are encoded in an easy-to-process format
A load/store architecture, where information has to be loaded into registers before it can be used
A small number of memory addressing modes, often only one or two, which use a pointer in one of the registers
These features allow most RISC implementations to apply a few simple techniques to get maximum performance:
Pipelining, so the processor can process multiple instructions simultaneously
Memory caches, which provide faster access to instructions and data than system RAM or ROM
Restrictions on data alignment, where the processor requires that all two-byte values be aligned on an even address, all four-byte values be aligned on a multiple of four, and so on
The PowerPC is a RISC design which has all of these common RISC features, except that it relaxes the rules for memory alignment.
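The alignment rule described above can be expressed in a few lines of C. This is only a sketch: the helper names are hypothetical, it assumes a C99 compiler (for `<stdint.h>`), and it assumes that converting a pointer to `uintptr_t` yields the plain numeric address, which is true on mainstream systems but not guaranteed by the language.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if addr is a multiple of size; size must be a power of two
   (2 for two-byte values, 4 for four-byte values, and so on). */
int is_aligned(const void *addr, size_t size)
{
    return ((uintptr_t)addr & (size - 1)) == 0;
}

/* Reads a 32-bit value from any address, aligned or not. Copying with
   memcpy leaves it to the compiler to pick an access the target allows,
   so the same source works on strict and relaxed processors alike. */
uint32_t load_u32(const void *addr)
{
    uint32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

On a processor that enforces alignment, code has to take the `load_u32` route (or something like it) for any address where `is_aligned(addr, 4)` is false; a processor that relaxes the rule can read the value directly, though possibly more slowly.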
The PowerPC - An Overview
The PowerPC architecture is a collaborative effort of Apple, IBM, and Motorola to create a new generation of high-performance microprocessors which can be used in everything from personal computers, workstations, servers, and multiprocessor systems to embedded microcontrollers.
The PowerPC is based on IBM's highly successful POWER architecture. The POWER architecture was designed for scientific workstations, and has been optimized for both integer and floating-point math operations. The POWER architecture also incorporates a branch processor which attempts to minimize the impact of branch instructions on the processor's performance.
When the Apple-IBM-Motorola consortium set out to design PowerPC, the members modified the POWER architecture to reduce manufacturing costs and make the design more suitable for desktop computers. They eliminated parts of the POWER instruction set that made the POWER architecture more difficult to implement but had a minimal impact on performance. While the architects were modifying the instruction set for the architecture, they also removed dependencies between instructions, and added features which simplified building multiprocessor systems.
The result of these changes is a low-cost, high-performance RISC architecture with:
Fixed length, consistently encoded instructions
A register-to-register (load/store) architecture, with support for aligned data accesses, misaligned data accesses, and both big-endian and little-endian data
A simple instruction set, with instructions which may be tailored to the task at hand (for example, setting the condition codes at the end of an arithmetic operation is an option, not a requirement)
Simple, yet powerful, addressing modes applied consistently across the instruction set
A large register set which includes both general-purpose and floating-point registers
Floating-point as a first-class data type. This means that floating-point is a standard part of the architecture and therefore is better integrated than it is in many other RISC architectures
Some of these features - notably the misaligned data support and the dual big-endian / little-endian support - are unusual in a RISC design, but were required to support past and future Macintosh designs.
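For readers who haven't run into byte ordering before, the difference between big-endian and little-endian data can be sketched in C as follows. The function names are our own, and the code assumes a C99 compiler for `<stdint.h>`; it shows the concept in portable source and is not how the processor itself switches modes.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant
   byte first (big-endian order, as on the 68K Macintosh). */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from four bytes stored least significant
   byte first (little-endian order, as on Intel processors). */
uint32_t read_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the machine running this code stores the most
   significant byte of a word at the lowest address. */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}
```

Given the bytes 0x12, 0x34, 0x56, 0x78 in memory, `read_be32` yields 0x12345678 while `read_le32` yields 0x78563412.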
The PowerPC Family of Microprocessors
The PowerPC family currently has the following four members:
601 - The 601 is a fusion of the POWER architecture and the PowerPC architecture. It is designed to drive mainstream desktop systems. A Macintosh with a 601 will deliver three to five times the integer performance, and around ten times the floating-point performance, of today's high-end 68040-based Macintosh systems.
603 - The 603 is the first PowerPC-only implementation of the PowerPC architecture. It is designed for low cost and low power consumption. The 603 will be used in portable and low-cost desktop Macintosh with PowerPC systems. In many ways, the 603 could over time become Apple's replacement for the 68030.
604 - The 604 is designed for mainstream desktop personal computers. It will cost about as much as the 601, but will deliver higher performance.
620 - The 620, which is still in the design phase, is a high-performance microprocessor that Motorola and IBM believe will be well-suited for very high-end personal computers, workstations, servers, and multiprocessor systems.
The PowerPC 601 in Context and Why Apple Likes RISC
Many developers and customers have been asking how the 601 stacks up against Intel's state-of-the-art CISC design, the Pentium. On the basis of price, performance, and power consumption, the PowerPC 601 compares quite favorably. As you can see from Table 1, the 601 delivers integer performance that nearly matches, and floating-point performance that exceeds, the Pentium's for about half the cost. In addition, it consumes roughly two-thirds of the power the Pentium does.
Table 1 - The Pentium and the PowerPC 601 Compared
              Pentium      PowerPC 601
Frequency     66 MHz       66 MHz
Die Size      264 mm²      120 mm²
Cache         16K          32K
Power         14 Watts     9 Watts
SPECint92     64           60
SPECfp92      57           80
Price         $950.00      $450.00
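The ratios behind this comparison are easy to work out from the figures in the table. The sketch below does so in C; the structure and function names are our own, and the only data used are the numbers listed above.

```c
/* Figures copied from the comparison table above. */
struct chip {
    const char *name;
    double watts;
    double specint92;
    double specfp92;
    double price_usd;
};

static const struct chip PENTIUM = { "Pentium",     14.0, 64.0, 57.0, 950.0 };
static const struct chip PPC_601 = { "PowerPC 601",  9.0, 60.0, 80.0, 450.0 };

/* How the 601 compares with the Pentium on one measurement
   (1.0 means the two chips are equal). */
double relative(double ppc_601_value, double pentium_value)
{
    return ppc_601_value / pentium_value;
}

/* SPECfp92 points delivered per dollar of processor price. */
double specfp_per_dollar(const struct chip *c)
{
    return c->specfp92 / c->price_usd;
}
```

Calling `relative` on each pair of figures gives about 0.47 for price, 0.64 for power, 0.94 for SPECint92, and 1.40 for SPECfp92. In floating-point performance per dollar, the 601 comes out nearly three times ahead.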
This comparison should give you some idea why Apple is staking such a large part of its future on RISC. The PowerPC 601 is the first of its generation (though it does descend from previous RISC architectures), yet matches the performance of the latest CISC chips - and the next PowerPC implementation (the 603) is well under way. While CISC designers have to work increasingly hard to squeeze more performance out of their designs, at an ever-increasing manufacturing cost, RISC designs have considerable room for growth. The evolution of RISC designs has the potential to outstrip the evolution of CISC.
A Quick Tour of the 601
Every PowerPC design begins with the fundamental architecture shown in Figure 1, with some chip-specific details. For example, the 601 incorporates a single 32K cache which holds both instructions and data, while other PowerPC models are likely to separate the two caches as shown. Also, future implementations may include multiple arithmetic logic units in both the fixed-point and floating-point units, allowing multiple arithmetic operations to proceed simultaneously.
Figure 1 - A General Diagram of the PowerPC Architecture
Each of these units has a specific purpose:
The Branch Unit collects instructions from the Instruction Queue, then locates and removes any branches from the instruction stream before sending instructions to the Fixed-Point and Floating-Point units. Unconditional branches can simply be removed from the instruction stream, while conditional branches (those that are part of an "if" statement or a loop) might require the branch unit to predict the outcome of the branch. In any case, the branch unit tries to provide an uninterrupted stream of instructions to the units downstream.
The Fixed-Point unit holds the 32 General-Purpose registers (including one which is used as the Stack Pointer, and another which resembles register A5 in a 68K-based Macintosh). Each register is one word wide, where a word is 32 bits on a 32-bit PowerPC (601/603/604) and 64 bits on the 620 implementation.
The fixed-point unit also holds the Fixed-Point arithmetic unit. This unit implements the standard addition, subtraction, multiplication, and division operations, as well as some comparison, logical, and shift/rotate instructions.
On the 601, the Fixed-Point unit also manages the transfers of data between memory (the Data cache) and the internal registers. This function may be implemented in a separate functional unit on future PowerPC implementations. (Note that even though the Fixed-Point unit manages load and store operations, data cannot be transferred directly between the Fixed-Point and Floating-Point units - the transfer must go through memory.)
The Fixed-Point unit also serves to calculate addresses for use by the Branch Unit.
The Floating-Point unit holds the 32 floating-point registers and the Floating-Point arithmetic unit. Each register is 64 bits wide (a double precision floating-point value), but can hold single-precision (32-bit) values as well.
The Floating-Point unit implements addition, subtraction, multiplication, and division, combining addition/subtraction and multiplication into a single multiply-and-accumulate unit. This design fits well with most scientific computing needs, where a common operation involves multiplying two values and then adding the result to a running total.
Since the processor contains multiple functional units, each one of which can execute an instruction independently of the others, this is a variety of multiple-issue design (where multiple instructions may be executed in a single clock). Under ideal conditions, the 601 can execute three instructions in a single clock - a branch instruction, a floating-point instruction, and a fixed-point instruction.
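The classic case of multiplying two values and adding the result to a running total is the dot product. A minimal sketch in C follows; the function name is our own, and whether the compiler actually turns the loop body into a single multiply-and-accumulate instruction depends on the compiler and its settings, so treat that as an assumption rather than a guarantee.

```c
#include <stddef.h>

/* Dot product of two vectors of length n. Each pass through the loop
   multiplies two values and adds the result to a running total, which
   is the pattern a multiply-and-accumulate unit is built for. */
double dot_product(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}
```

C99 also provides an `fma()` function in `<math.h>` for programmers who want to request a fused multiply-add explicitly instead of leaving the choice to the compiler.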
Optimizing Code for the PowerPC
One of the ways that a programmer can take advantage of the design of the PowerPC is by instruction scheduling - arranging instructions so that each functional unit can run without stopping to wait for information or another unit. The PowerPC compilers are designed to use instruction scheduling to create the smallest, quickest applications possible.
For example, a compiler has to implement an "if" statement using at least two operations - performing a test (which sets the appropriate condition codes) followed by a conditional branch instruction. Whenever possible, the compiler will schedule some operations to occur between the test and the branch instruction, which gives the branch unit time to foresee the branch, access the condition codes, and determine the outcome of the branch with certainty instead of guessing.
Another example involves loading registers well before they are actually needed, which gives the load operation time to complete (which may require several clock cycles if it has to go to RAM).
A final example involves allocating scratch registers within a function. The Runtime Architecture designates several registers as volatile, i.e. not saved across function calls. The compiler can look at a group of functions which are compiled together, determine which volatile registers are left unchanged by calls to a particular function, and then use those registers as scratch registers in the calling function.
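The idea of loading early can be shown at the source level, even though a compiler normally performs it on the generated instructions. The sketch below is illustrative only: the function names are our own, and a good optimizing compiler may well produce the same machine code for both versions.

```c
#include <stddef.h>

/* Straightforward version: each value is loaded and used right away,
   so the addition may have to wait for the load to finish. */
long sum_simple(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Hand-scheduled version: the load for the next element is started
   before the current element is added, giving the load time to
   complete while other work proceeds. */
long sum_scheduled(const long *data, size_t n)
{
    long total = 0;
    long current;

    if (n == 0) {
        return 0;
    }
    current = data[0];
    for (size_t i = 1; i < n; i++) {
        long next = data[i];   /* start the next load early */
        total += current;      /* use a value already in a register */
        current = next;
    }
    return total + current;
}
```

Both functions return the same result; the second simply spells out an ordering that an instruction scheduler would aim for.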
All of these optimizations require an in-depth knowledge of the processor, and the ability to see the entire structure of a single compiled file. The compiler writers are able to build the myriad rules for instruction scheduling right into the compiler, and the compiler can keep track of the code it generates. Because of this, the compiler often generates better code than an assembly-language programmer will. In fact, Apple suggests that programmers move their entire program into a high-level language (probably portable ANSI C or C++) and only move to assembler those parts which absolutely cannot be expressed in a high-level language.
Next Month in Powering Up
The second most frequently asked questions about Macintosh with PowerPC - after "When can I buy one?" - are "How can I program one?" and "What is the average user going to do with that much power?" In next month's column, we'll take a look at the development tools for PowerPC and some applications which show off the PowerPC's performance to good advantage.
Further reading
Space limitations have forced us to give you just the briefest of sketches of the evolution of RISC and the design features of the PowerPC 601. For more information on either of these topics, please consult the PowerPC RISC Microprocessor User's Manual, published by Motorola (part number MPC601UM/AD), which is included in the Macintosh with PowerPC Starter Kit available from APDA, and the Programmer's Introduction to RISC and PowerPC CD-ROM, also available from APDA.