TweetFollow Us on Twitter

December 93 - MAKING THE LEAP TO POWERPC

MAKING THE LEAP TO POWERPC

DAVE RADCLIFFE

[IMAGE Radcliffe_final_rev1.GIF]

Apple will soon be introducing the first Macintosh CPU architecture not based on a 68000-family microprocessor. The entirely new architecture is built around a new RISC CPU -- the PowerPC microprocessor jointly designed by IBM, Motorola, and Apple. Truly taking advantage of PowerPC technology will require an ongoing effort by both Apple and developers. Apple is making the first leap to this new platform; now it's up to developers to make the next leap and bring the performance made possible by PowerPC technology to their applications.


In 1984, Apple Computer offered a startling vision of the future of personal computing by introducing the Macintosh, which radically changed the desktop. Now, nearly ten years later, the computing world embraces graphical interfaces. Ten years is a lifetime in computing terms; at that age, many computing architectures are considered ancient. The Macintosh enters its second decade by looking to the future while remembering its past -- making the transition from the sturdy Motorola 68000 family to the sleek new PowerPC processor-based family without forsaking developers and users and their investment in the 680x0 architecture.

The PowerPC microprocessor is the most significant change to date in the Macintosh product line. This article introduces the new PowerPC architecture and discusses the ramifications for existing applications, as well as opportunities for new or revised applications to take full advantage of the power of the new chip. It contrasts the new architecture with the old and explains how this new architecture both acknowledges the past and prepares for the future.

COMPARING CISC AND RISC

Much has been written about the differences between a CISC (complex instruction set computer) architecture, used in Motorola's MC680x0 processors, and a RISC (reduced instruction set computer) architecture, used in the PowerPC microprocessor. The relative merits of the two architectures have also been widely debated. A detailed discussion of CISC and RISC is beyond the scope of this article, but some understanding of RISC principles is useful for understanding PowerPC architecture.

Two logical considerations motivated CISC development. The first was a desire to simplify assembly-language programming by enriching the functionality of the instruction set. CISC architectures did this by providing a greater variety of instructions, as well as a wide array of addressing modes, thereby reducing the number of steps required to perform a particular operation. Second, as writing compilers became easier, there was a desire to provide instructions more closely related to operations performed by high-level languages. CISC architectures were marvelously successful at satisfying this goal also. In the early 1980s, hardware designers began to run into the limitations inherent in CISC architectures, particularly in their ability to streamline the flow of instructions. At the same time, the software world was deemphasizing assembly-language programming in favor of high-level languages with sophisticated, optimizing compilers. This allowed hardware designers to simplify their architecture and shift much of the performance burden to compiler writers.

The classic equation for execution time is whereET is the total execution time,N is the number of instructions executed,CPI is the number of cycles per instruction, andCT is the cycle time. Both CISC and RISC architectures benefit from reduced cycle time. Faster clock rates translate directly to smaller cycle times, and hence shorter execution times. Where CISC and RISC architectures differ is in their approach toN and CPI. CISC tries to shorten execution times by minimizingN, while RISC tries to minimizeCPI.

PIPELINING
The four typical stages in executing an instruction are fetch, decode, execute, and write. In a simplistic architecture, these stages all happen in sequence, and the next instruction can't start until the previous instruction has finished, as shown in Figure 1. Designers realized that this need not be the case and that each of these stages can overlap. Once an instruction is fetched and passed to the decode stage, the next instruction can be fetched without waiting for the first instruction to complete. This technique, known aspipelining, is shown in Figure 2.

The example in Figure 2 executes the same two instructions, but in only nine cycles, compared to 12 cycles in the nonpipelined case. There's a curious thing about this example, though: the second instruction takes eight cycles to complete when pipelined, but only five when it's not. This is because the various stages take different amounts of time to complete. The overall result is better, but unnecessary delays can occur in instruction execution.

[IMAGE Radcliffe_final_rev2.GIF]

Figure 1 Nonpipelined Stages of Execution

[IMAGE Radcliffe_final_rev3.GIF]

Figure 2 Pipelined Stages of Execution





Variable numbers of cycles per stage is a characteristic of CISC architectures. Complex instructions may occupy multiple words, requiring multiple cycles to fetch. Multiple operands complicate the process of decoding. More complicated instructions take longer to execute than simpler instructions. In Figure 2, the execute stage of the second instruction is delayed two cycles while waiting for the first instruction to execute. This is known as a pipelinestall. Similarly, the write stage sits idle for one cycle between the first and second instructions while waiting for the execute stage of the second instruction to complete. This is known as a pipelinebubble. Both stalls and bubbles reduce the efficiency of the pipeline and increase the overall number of cycles per instruction.

INCREASING PIPELINE EFFICIENCY
RISC architectures work very hard to eliminate inefficiencies in the instruction pipeline and keep the pipeline jammed full. RISC architectures share most or all of the following common features:

  • Instructions are a uniform length. Variable-length instructions in CISC architectures mean that time must be spent just figuring out how long the instruction is and how many operands it uses. RISC architectures don't have that problem.
  • Simplified instructions, instruction formats, and addressing modes allow for fast instruction decoding and execution.
  • Relatively large numbers of registers and large amounts of fast-cache memory reduce cycles spent for access to slower, main memory and allow frequently used variables to be kept loaded.
  • Load/store architecture is used for access to memory. The only memory-to- register and register-to-memory operations are load and store instructions. All other operations are register only. Register-to-memory and memory-to-memory operations in CISC architectures require multiple cycles to complete.
  • Instructions are simple. In an ideal RISC machine, each stage requires one cycle to complete.
  • For improved performance, instructions can be implemented directly in hardware instead of being microprogrammed as in CISC processors.

Figure 3 shows an example of executing instructions on a nonpipelined RISC machine. When instructions are not pipelined, they complete serially, with two instructions completing in eight cycles. The optimal case for pipelining instructions is shown in Figure 4. Now you have the two instructions executing in just five cycles. If the pipeline is kept full like this, the number of cycles per instruction drops to just one. This is the goal of most RISC architectures.

[IMAGE Radcliffe_final_rev4.GIF]

Figure 3 RISC Nonpipelined Stages of Execution

[IMAGE Radcliffe_final_rev5.GIF]

Figure 4 RISC Pipelined Stages of Execution





One cycle per instruction is the ideal case for this example, but in reality, stalls and bubbles occur, even in the best architectures. This is where the compiler comes into play. The compiler has detailed knowledge of how the program should work. It need not perform operations in the order specified in the source code; it need only guarantee that the right result is obtained. If you build into the compiler some knowledge of how to make best use of the CPU, the compiler can make a huge difference in program performance.

Consider the following two C instructions:

b = *a + 5;
d = *c + 10;
The variablesa, b, c, and dare all long or pointer-to-long variables. The compiler might generate the following assembly instructions on the PowerPC microprocessor:

lwz     r5,0(r3)        ; Load value pointed to by r3 into r5
addi    r5,r5,0x0005    ; Add 5 to value in r5
lwz     r6,0(r4)        ; Load value pointed to by r4 into r6
addi    r6,r6,0x000a    ; Add 10 to value in r6

The lwz instruction (Load Word and Zero) loads a register from a source value. On a PowerPC processor, words are 32-bit values; 16-bit values are half words. Theaddiinstruction (Add Immediate) adds the immediate value and stores the result.

Figure 5 shows what happens when these instructions execute. Bothaddiinstructions stall in the decode stage because they can't enter the execute stage until the register is available from thelwzinstruction.

The compiler can prevent the stalls. Instead of following the flow of the original source code, you can rearrange the instructions as follows:

lwz     r5,0(r3)        ; Load value pointed to by r3 into r5
lwz     r6,0(r4)        ; Load value pointed to by r4 into r6
addi    r5,r5,0x0005    ; Add 5 to value in r5
addi    r6,r6,0x000a    ; Add 10 to value in r6

Now look at what happens to the instruction pipeline (Figure 6): there are no delays. By moving the add instructions to later in the instruction stream, you allow the load instructions they depend on to complete, so the add instructions can execute immediately.

[IMAGE Radcliffe_final_rev6.GIF]

Figure 5 Stalled Pipelined Execution

[IMAGE Radcliffe_final_rev7.GIF]

Figure 6 No-Delay Pipelined Execution

BRANCHING
All pipelined architectures face the problem of branches. Any time a conditional branch is encountered, the processor faces a dilemma because now two instruction streams are possible. It can't pipeline both possible paths. It can guess which path to take, but if it guesses wrong, the pipeline is disrupted.

One common approach to this problem is a technique calleddelayed branching. In delayed branching, the processoralwaysexecutes the instruction immediately following the branch instruction. While starting this instruction, the CPU can be figuring out the destination of the branch instruction and so can keep the pipeline flowing. Of course, it's important that the instruction after the branch not affect the branch. It's up to the compiler to find an instruction unrelated to the branch instruction to fill this delay slot. If it can't fill the delay slot, the compiler can always put in a no-op instruction, but this is inefficient. Some architectures allow the instruction in the delay slot to be ignored if the branch is taken. This avoids the need to fill the delay slot with a no-op instruction, but undermines the purpose of delayed branching. PowerPC architecture takes a unique approach to the branching problem, as discussed later in the section "Branch Processor."

SUPERSCALAR DESIGN
Another technique RISC designers use to increase performance is superscalar or multi-issue design. The simpler design of RISC architectures makes it possible to build in multiple processing units; this is superscalar design. In the same way that the compiler can juggle instructions to avoid resource constraints, the CPU can now reduce bottlenecks and achieve higher performance by feeding instructions to separate processing units operating in parallel. This allows average instruction cycle times to drop below one cycle per instruction. PowerPC microprocessors use this technique as discussed later in the section "Functional Units of the PowerPC Microprocessor."

RISC ADVANTAGES
One last point needs to be made before leaving a comparison of CISC and RISC. Many of the techniques used by RISC designers can and are used by CISC designers. Modern CISC chips such as the MC68040 and Intel 80486 make extensive use of instruction pipelining, parallel integer and floating-point units, fast cache architectures, and resource constraint reduction (such as delayed writes) to achieve the performance they do. But the sheer complexity of the designs means they're hard to implement (and implement correctly), which often results in long development cycles. The simplicity of RISC architecture helps avoid this problem.

Similarly, the compiler can aid CISC machine performance. But the complexity of CISC design means it's nearly impossible to determine instruction timing, so it's difficult for the compiler to choose the best instruction sequence. Instruction scheduling is also possible but more difficult. The finer granularity of the RISC instruction set gives the compiler much more flexibility and control over the resources provided by the CPU.

Simplified hardware and the influence of the compiler are really the ultimate advantages of RISC.

POWERPC CPU ARCHITECTURE

PowerPC architecture is a modern 64-bit, RISC architecture adhering to all the previously discussed design goals. It has 32 general-purpose and 32 floating-point registers. All instructions have a uniform 32-bit length. The first PowerPC microprocessor, the PowerPC 601, is a superscalar implementation of the 32-bit subset of this architecture.

POWERPC VERSUS POWER
The PowerPC microprocessor is a single-chip design descended from an earlier, multichip IBM RISC implementation known as POWER. It's worth mentioning the differences between the two architectures.

  • Misaligned data access. Most RISC architectures require all data access to be word (4-byte) aligned. POWER was ambiguous regarding data alignment. PowerPC architecture explicitly allows misaligned data access but with a possible performance penalty. The advantage is that it allows use of data structures aligned for 680x0 architecture.
  • Elimination of the MQ register. POWER has a special-purpose multiply/quotient (MQ) register for extended-precision integer arithmetic. But since there's only one register, it becomes a bottleneck that hinders superscalar implementations. The MQ register, and all instructions that depend on it, were eliminated from the PowerPC architecture.
  • Addition of single-precision floating point. POWER supports only double- precision floating point. PowerPC architecture supports single precision as well, which may be more appropriate for some applications. (There's no hardware support for 80- or 96-bit extended floating point, which 680x0 developers are familiar with. The consequences of this for developers are discussed in "Native PowerPC Numerics.")
  • 64-bit architecture. POWER is a 32-bit architecture. PowerPC architecture is fully 64 bit; however, the first implementations feature a 32-bit subset of the architecture. Code written for 32-bit processors will be fully supported on 64-bit implementations running in 32-bit mode.

PowerPC architecture uses big-endian byte order, just like 680x0 and POWER. As an added feature, it also supports a mode using little-endian byte ordering and provides instructions to allow access to little-endian data from big-endian mode and to big-endian data from little-endian mode.

For divisible integer quantities composed of separately addressable bytes -- for example, a 32-bit integer subdivided into four addressable bytes -- there are numerous ways to arrange the bytes. Only two arrangements make sense and are in use on computers today. Big-endian byte ordering means the most significant byte (the big end of the number) is assigned the lowest address. Little-endian byte ordering means the least significant byte is assigned the lowest address; it's used, for example, on Intel 80x86 CPUs. The terms originated in Jonathan Swift's Gulliver's Travels, where the controversy was over breaking an egg at the big end or the small end. *

FUNCTIONAL UNITS OF THE POWERPC MICROPROCESSOR

Figure 7 is a block diagram of the PowerPC 601 microprocessor, the first member of the PowerPC processor family. This microprocessor is a superscalar PowerPC implementation, with three separate execution units: the fixed-point and floating-point units and the branch processor. The branch processor initiates instruction execution by fetching instructions from the instruction cache (which is filled from memory if there are no instructions in it). The branch processor then feeds integer and floating-point instructions to the fixed-point and floating-point units respectively. These units operate on data in registers and in the data cache (which is filled from memory if there's no data in it). The fixed-point unit is also involved in address decode operations.

BRANCH PROCESSOR
The branch processor deserves special attention. As mentioned earlier, PowerPC architecture takes an original approach to the problem of branch penalties, and the branch processor is responsible for this. The branch processor contains within it everything needed to determine how to handle a branch instruction. This includes three special-purpose registers:

  • The condition register (CR) has flags set by certain operations and is used for conditional branching.
  • The link register (LR) can contain a destination address for a branch instruction and can also hold the return address after branch and link (subroutine) instructions.
  • The count register (CTR) is used for looping and indirect branches.

For unconditional branches, the branch processor knows unambiguously which path to take. For conditional branches, if a branch condition is set far enough before the actual branch instruction, the branch processor has the information necessary to determine which path to take. The design of the condition register uniquely aids the processing of conditional branches. Instead of a single set of condition codes, it contains eight 4-bit condition code fields, designated CR0, CR1 . . . CR7. Compare operations allow each field to be set independently. A compiler using these multiple, independent condition code fields has more flexibility in scheduling instructions to assist the branch processor. As an additional performance enhancement, instructions that might set condition codes (such asadd) do so only if a record bit is set in the instruction, so time isn't spent setting condition codes that would otherwise be ignored.

The branch processor also has knowledge of the count register, used in looping operations. This lets the branch processor know in advance when a loop will finish.

With this design the branch processor can preprocess the instruction stream and, in most cases, determine in advance the target of the branch operation. This allows it to "fold" the branch instruction out of the instruction stream, so the fixed-point and floating-point units see an unbroken stream of instructions and fewer branch penalties occur.

POWERPC RUNTIME ARCHITECTURE

An important goal in the development of Apple's PowerPC processor-based machines was to preserve user and developer investment in the 680x0 architecture. Another important goal was to port the existing 680x0 Toolbox and operating system to the new platform quickly. Both goals were met through the ability to emulate 680x0 instructions in software on the PowerPC microprocessor. So the first way to view a Macintosh on PowerPC, and indeed the way existing applications and system software view this machine, is as a 680x0-based Macintosh. In this section we approach this new beast through the 680x0 emulator and then peel away the layers to reveal the underlying PowerPC runtime architecture.

SOFTWARE EMULATOR
The software emulator understands and executes the instruction set of a Motorola MC68020 processor. You might wonder why Apple chose to emulate the MC68020 and not the latest and greatest processors such as MC68030 and MC68040.

  • The only advantage of the 68030 over the 68020, in terms of instruction set, is the integrated memory management unit (MMU). The MMU is really for use by the operating system for implementing features such as virtual memory. The PowerPC microprocessor MMU operation is very different from 680x0 MMU operation, and there's no need for applications to execute MMU instructions anyway. Applications needing control over virtual memory can still use the existing virtual memory interface; just the implementation will be different.
  • Similarly, the key advantage of the MC68040 over its predecessors is the integrated floating-point unit. The PowerPC microprocessor has its own floating- point implementation. Apple already provides a standard numeric interface for 680x0 applications, called SANE, and emulating floating-point instructions using native PowerPC code offers no real advantages over implementing SANE as native PowerPC code.

As a bonus feature, the emulator also supports certain advanced user-mode instructions such as the MOVE16 instruction from the MC68040. However, from a programmer's point of view, the emulator behaves as an MC68020 (for example, Gestalt reports an MC68020 is present) and developers are advised not to take advantage of any features outside the MC68020 architecture.

Once the emulator was up and working, the PowerPC processor-based machine almost immediately gained an operating system, since all the code in the ROM and the operating system was now executable. This also gave the machine a high degree of compatibility with older Macintosh models, because the same code, with all its idiosyncrasies, is being executed.

Had Apple stopped here, you'd have a machine that works great but is pretty boring. After all, who wants a machine that pretends to execute 680x0 code, but not necessarily as fast as the real thing? Why not get a real 680x0 machine instead? The answer, of course, lies in tapping into the power behind the emulator -- the PowerPC microprocessor itself.

TOOLBOX ACCELERATION
All Macintosh applications spend part of their time calling the Macintosh Toolbox. In turn, the Toolbox performs the requested service by executing Toolbox code on behalf of the application. You can think of the Toolbox as an extension of the application. The advantage of this during development of PowerPC processor-based machines is that selectively replacing portions of the Toolbox with equivalent PowerPC code greatly enhances the performance of those portions of the Toolbox. All applications that use those routines benefit from improved performance. No modification of the application is required to receive the benefit.

Ideally, of course, it would be best if the entire Toolbox executed as native code. But that requires a huge amount of work and would delay the first release of Macintosh on PowerPC. Analysis of application programs revealed that some portions of the Toolbox are used more heavily than others. All applications, for example, rely heavily on QuickDraw. Effort spent porting QuickDraw would benefit more applications than, say, porting the Dialog Manager. So the first release of Macintosh on PowerPC will target the portions of the Toolbox that will provide the greatest performance enhancement to the greatest number of applications.

As Apple releases new versions of the system, with more and more of the Toolbox as native PowerPC code, users will magically get a "faster" machine without adding new hardware. All they have to do is install the newer, accelerated Toolbox.

At the same time, the goal is not just to enhance the performance of the system, but to empower application software as well. The accelerated Toolbox is a start, but real PowerPC application performance comes from having native PowerPC applications, and the first release of Macintosh on PowerPC will include an entirely new runtime architecture in support of native applications.

WHY A NEW RUNTIME ARCHITECTURE?
The new runtime architecture addresses many of the following limitations of the 680x0 architecture:

  • The first Macintosh models were severely limited in the memory available to applications, so the runtime architecture was designed to squeeze the most out of the memory that was available. Today, the relative availability of cheap RAM removes this limitation.
  • Hard disks and memory management units required to support virtual memory were unavailable, so applications were required to load discrete blocks of code through the Segment Loader. With the relative availability of cheap RAM and support for virtual memory, most reasons for having the Segment Loader disappear.
  • The system now supports a wide variety of code types -- not just applications and system software, but standalone code blocks, such as INITs and MDEFs, and loadable code plug-ins, such as XCMDs and components. These code blocks strain the runtime architecture because it's difficult to manage global data for these blocks and to import and export functions between blocks.
  • There's a large amount of code duplication in the Macintosh. The Toolbox provides some code sharing between applications, but in general, most applications have built into them large amounts of redundant code. For example, library and glue code gets linked into every application. Having it built into the application increases demands on disk and memory resources because each instance of the application must have the duplicated code.

CODE FRAGMENT MANAGER
The centerpiece of the new architecture is the Code Fragment Manager. Each block of executable PowerPC code is a code fragment. A code fragment is autonomous, with its own static data. It can export both code and data references for use by other fragments and import code and data referencesfrom other fragments for its own use. Because such references are resolved at run time, code fragments are a form of dynamically linked, shared libraries. (See "Code Fragment Manager or Shared Library Manager?" for an explanation of the relationship between the two managers.)

From a native PowerPC application's point of view, access to the Macintosh Toolbox now occurs through a shared library maintained by the Code Fragment Manager. Applications no longer have segments -- they have one or more code fragments. The main code fragment is loaded at launch time and any external references to other shared libraries are resolved. An application neither knows nor cares whether a reference is internal or external; access is completely transparent.

In some cases applications may want to manage code fragments on their own. For example, standalone code resources can now be handled as code fragments. This makes code resources such as XCMDs much easier to develop. Not only does such a resource have its own static data, but function references within the resource are fully exportable. Complicated parameter blocks aren't needed for passing data or jumping into the beginning of a code resource. Furthermore, because the application code is itself a code fragment and can export its references, the standalone code has access to functions and data within the application itself. Complicated callback mechanisms are no longer necessary.

CODE FRAGMENT MANAGER OR SHARED LIBRARY MANAGER?

You may already be familiar with an implementation of shared libraries for the Macintosh known as the Shared Library Manager. The advantage of the Shared Library Manager is that it works with today's 680x0 runtime architecture. The Code Fragment Manager, on the other hand, lays the foundation for a new and more modern runtime architecture.

The first releases of these two managers will be mutually exclusive. The Shared Library Manager will be implemented only for 680x0 and the Code Fragment Manager will work only on the PowerPC microprocessor.

In the future, though, the Code Fragment Manager will be available on 680x0-based machines as well, and a future release of the Shared Library Manager (version 2.0) will be built on top of the Code Fragment Manager. This will provide Shared Library Manager support for Macintosh on PowerPC. Developers should code for whichever mechanism best suits their needs and target platform.

MIXED MODE MANAGER
There's one final piece to the PowerPC architecture puzzle. The Macintosh Toolbox makes wide use of pointers to functions. FilterProcs, I/O completion routines, A-trap vectors, QuickDraw bottlenecks, definition procedures (such as MDEFs, MBDFs, and CDEFs), and other types of standalone code (such as INITs and VBL tasks) are just a few examples of the wide variety of function pointers in use on the Macintosh.

On a 680x0-based Macintosh, life is easy because a function pointer is just the address of a 680x0 routine that can be called. On a PowerPC processor-based Macintosh, life is much more complicated; not only is the Toolbox a mixture of 680x0 and PowerPC code, but a function pointer could be a pointer to 680x0 code or PowerPC code and the caller should neither know nor care what kind of code it's calling.

To handle this situation, Apple is introducing the Mixed Mode Manager. One problem that this manager must solve is the mismatch between calling conventions for 680x0 and PowerPC code. PowerPC code follows C conventions, with parameters passed right to left. The 680x0 code uses a variety of calling conventions: some traps are register based while some are Pascal stack based with parameters passed left to right. The Mixed Mode Manager must make calls between disparate functions seamless. Furthermore, it must do it in a way that's compatible with existing 680x0 applications. Since existing binaries must work unmodified, the existence of the Mixed Mode Manager must be completely transparent to these applications. The Mixed Mode Manager's task is shown in Figure 8. Instead of passing a function pointer of type ProcPtr to the Toolbox, applications must now pass a function pointer of type UniversalProcPtr. UniversalProcPtr is a generic version of ProcPtr that lets the Mixed Mode Manager know how to route the call. Whenever 680x0 or PowerPC code calls a function through a UniversalProcPtr, the Mixed Mode Manager looks at the destination for the call. If a mode switch isn't necessary -- in other words, if both the caller and the callee are the same code type -- the Mixed Mode Manager does nothing and just passes the call to the caller.

[IMAGE Radcliffe_final_rev9.GIF]

Figure 8 Mixed Mode Manager

If a mode switch is necessary -- in other words, if a 680x0 caller is calling PowerPC code, or vice versa -- the Mixed Mode Manager performs a protocol conversion, rearranging the parameters, including moving them into or out of registers as necessary to ensure that the callee sees the parameters correctly. When the callee returns, the Mixed Mode Manager performs a protocol conversion in the other direction to ensure that return values are correctly passed back to the caller.

For 680x0 applications, the Mixed Mode Manager is completely transparent and these applications run without modification. PowerPC applications, however, must become aware of the Mixed Mode Manager. The basics of using the Mixed Mode Manager are covered along with UniversalProcPtrs later in the section "UniversalProcPtrs."

WRITING PORTABLE C CODE

The preferred development languages for PowerPC code are C and C++. Therefore, the first step in preparing for the PowerPC platform is to provide portable C and C++ code. The examples here use C, but the principles apply to C++ as well.

The compilers for PowerPC C code are stricter than either the MPW or the THINK C compiler, so the best way to prepare your code for the PowerPC platform is to be sure it follows the ANSI C standard. You should take full advantage of the stronger type checking and prototyping features an ANSI C compiler provides.

Consistent use of function prototypes is the best way to ensure portable code. ANSI C prototypes fully qualify the parameters to a function, as shown in this example:

void DoEvent (EventRecord *event);

It's usually permissible to mix the new-style function declaration with the old-style function definition:

void DoEvent (event)
EventRecord *event;{ . . .
}

However, mixing function declarations in this way typically defeats the purpose of having a function prototype in the first place. So the first step in writing portable code is to be sure you consistently use ANSI C function prototypes throughout.

INTEGERS AND BITFIELDS
Variations in the size of integers of type int always cause trouble when you're trying to port code. This is more of a problem for THINK C code, which allows 16-bit integers of type int. C purists may not agree, but my recommendation is never to use type int. Always use integers of types short and long (or an equivalent type). The Macintosh Toolbox itself is explicit about data sizes, and experience has shown that developers dependent on the THINK C 16-bit integers of type int have more difficulty porting to the PowerPC platform.

A similar caution applies to bitfields. Bitfields are useful for access to machine-dependent data structures and the like, but are inherently implementation defined and therefore nonportable.

DATA STRUCTURES
Some compilers allow incomplete arrays as the last member in a data structure:

struct QElem {
struct QElem*qLink;
short qType;
short qData[];
};

This isn't allowed by the ANSI C standard. Here's a more portable definition:

struct QElem {
struct QElem*qLink;
short qType;
short qData[1];
};

Similarly, some compilers allow comparison of data structures. Again, this isn't allowed by the ANSI C standard, so attempting to do something as simple as comparing two Rects will fail on the compilers for PowerPC code.

When using data structures, you need to be aware of data alignment. RISC machines prefer (and often require) that data be aligned on a 4-byte boundary. But on the 680x0, the default is to align data to a 2-byte boundary. PowerPC architecture specifically allows misaligned data access, but there can be a small performance penalty if multiple bus cycles are required for access to the data. This creates a dilemma: portability versus performance.

Because the Macintosh Toolbox relies on 680x0 data structures, data passed to the Toolbox must have 680x0 alignment. The same applies if you want to share data with 680x0 applications. To solve this, the compiler now allows you, through #pragma statements and compiler options, to align PowerPC code data structures just like 680x0 code data structures. But if the structure is only internal to your application, you probably want to use the natural PowerPC code alignment. Although it's likely to be painful to modify existing data structures for PowerPC code alignment, if you're designing new data structures, you can keep the alignment issue in mind and create structures that are optimal for both 680x0 and PowerPC processor-based machines.

COMPILER EXTENSIONS
In addition to supporting 680x0 data alignment, compilers for PowerPC code have been extended in several other ways to make porting easier. This involves supporting several of the MPW C compiler extensions and features:

  • The compiler understands "\p" at the start of a string for the generation of Pascal strings.
  • The pascal function keyword is allowed by the compiler, but ignored. A subtle consequence of this is discussed in the section "Pascal Functions."
  • The compiler won't complain if you use C++ style line-end comments (//).
  • MPW C packs enums into the smallest data type possible and the compilers for PowerPC code have been extended to support the feature.

How can you tell if your code is ANSI C compliant? You can eliminate many of the idiosyncrasies in your code by compiling it with multiple compilers. Code conditioned in this way is much more portable to the PowerPC platform than code dependent on a single compiler. So one of the best ways to prepare for the PowerPC platform is to make sure your code compiles and runs with both MPW C and THINK C.

WRITING CODE FOR POWERPC

Some changes to the programming model are necessary for the development of PowerPC code. However, Apple tried to limit changes so as to make the transition to the PowerPC platform easier for developers (see "Universal Interfaces" to understand how these changes affect development for 680x0 platforms).

COMPATIBILITY GUIDELINES
Everything ever written about compatibility guidelines for the Macintosh applies to the Macintosh on PowerPC in spades. Here are some of the key points:

  • The code must be 32-bit clean. Most applications now satisfy this requirement, thanks to System 7, but it deserves reiterating because 24-bit mode will no longer be an option.
  • For the first release of Macintosh on PowerPC, access to low memory is allowed exactly as before. Direct access to low memory applies for both 680x0 and native PowerPC applications; however, a procedural interface is provided as part of the new API, and developers are strongly urged to begin using it for future compatibility. For example, CurDirStore is a commonly used low-memory global, and two new functions are defined to provide access to it:
    long LMGetCurDirStore (void);
    void LMSetCurDirStore (long CurDirStoreValue);
    
  • Don't depend on undocumented data structures. Also, don't depend on alignment of data structures.
  • Don't write data into code. In the past, this was often necessary because of limitations of the runtime architecture. Many of the reasons for doing it no longer exist with PowerPC architecture, so avoid it.
  • Beware of dependencies on floating-point data types (see "Native PowerPC Numerics," earlier in this article).
  • Don't depend on the hardware. Not only is there no longer a 680x0 CPU present, but the I/O architecture can also change. Use programmatic interfaces to perform I/O.
  • Don't depend on the 680x0 runtime model, which is very idiosyncratic. Fortunately, many of those idiosyncrasies were eliminated in the PowerPC runtime architecture, making your life easier but complicating the move from the 680x0 to this new architecture.

    Some of these points are discussed in the following sections.

    UNIVERSAL INTERFACES

    BY DEAN YU

    As Apple takes the Macintosh experience to a new chip architecture, it becomes more important than ever to have portable source code. With that in mind, Apple has created a set of universal interface files, which are provided on this issue's CD. The same interface file -- for example, Windows.h -- can be used to compile any source file for a Macintosh on either a 680x0 or a PowerPC microprocessor. The main changes you'll find in the C universal interface files are described below.

    All system software routines declared extern. On the PowerPC platform, all routines can potentially be in a shared library, so all routines must be declared extern in order for the compiler to generate the correct code. Declaring routines extern is also compatible with MPW C.

    Inline code wrapped in macro definitions. Obviously, 680x0 inline code isn't very useful on a PowerPC platform. 680x0 inline code is isolated by macros such as THREEWORDINLINE, which are defined in ConditionalMacros.h. These macros expand to inline initializers when compiling for 680x0 on non-shared library based platforms, and do nothing when compiling for PowerPC or shared library-based platforms.

    UniversalProcPtrs. As discussed more fully in this article, the biggest change in the interface files is the introduction of the UniversalProcPtr data type used by the Mixed Mode Manager. In support of cross-platform code generation, the interface files define special "New" and "Call" macros (such as NewGrowZoneProc and CallGrowZoneProc) that hide the implementation details of using UniversalProcPtrs. For example, when you compile your application as 680x0 code, the Call macros jump to the routine pointed to by the UniversalProcPtr directly rather than invoke CallUniversalProc as they would for PowerPC compilation. Note that 680x0 versions of the Call macros are provided only for stack-based ProcPtrs.

    Low memory access. To isolate dependencies on low memory, the SysEqu.h file has been removed and replaced by LowMem.h, which defines accessor functions for low memory. Previously defined accessor functions, such as MemError, are still defined but call through to the new accessor functions when appropriate.

    Structure alignment. To maintain data structure compatibility, structs follow 680x0 word alignments when being compiled for the PowerPC microprocessor.

    Even if you don't plan on porting your application immediately to the PowerPC platform, you can begin using the universal interface files for 680x0 development and make a crucial step toward future PowerPC compatibility.

    REVISITING THE CODE FRAGMENT MANAGER
    As previously mentioned, the centerpiece of the PowerPC runtime architecture is the Code Fragment Manager. Rather than having a collection of code resources, a PowerPC application has a code fragment (generally one, but possibly more) that lives in the data fork of the application. When an application is launched, the Process Manager determines whether a native PowerPC code fragment is present by looking for a 'cfrg' resource. This resource provides the necessary information for the Code Fragment Manager to load the main code fragment and resolve any external code and data references. The Code Fragment Manager also sets up global data for the code fragment.

    The Code Fragment Manager eliminates the need for a segment loader. If virtual memory isn't present, the Code Fragment Manager loads the entire code fragment into memory; otherwise, it relies on virtual memory to page code directly in from the application when needed.

    A 680x0 application maintains a notion of an A5 world, an integral part of the 680x0 runtime environment. Register A5 provides access to four kinds of data:

    • application global data
    • application QuickDraw global variables
    • application jump table
    • application parameters

    Of these, only the QuickDraw global variables remain relevant. A wide variety of system and application code depends on using A5 to locate QuickDraw globals. Even though a native application has no use for a 680x0 register A5, the system still maintains an A5 world so that code that does depend on A5 has access to the right data. This means SetCurrentA5 and SetA5 will do the right thing with QuickDraw globals if you need to swap A5 worlds.

    The 680x0 Macintosh Toolbox uses a wide variety of calling conventions. The two most common ones are Pascal stack based and register based. Variations include passing a selector to dispatch to a variety of functions or passing a pointer to a parameter block in register A0 (for VBL tasks, notification tasks, and I/O completion routines) or register A1 (for Time Manager tasks). Two of my personal favorites are the TextEdit highHook and caretHook routines: when called they have a pointer to the edit record in A3 and, instead of a return address, a pointer to a rectangle on top of the stack. The point is that it's nearly impossible to write 680x0 Macintosh applications entirely in a high-level language. Some assembly-language programming is required just to move these weird parameters around.

    Life gets much easier on the PowerPC platform, which relies on uniform C calling conventions for everything. In almost all cases, 680x0 inline assembly and assembly wrapper routines can be rewritten in C for PowerPC code. For example, a 680x0 application can use the following assembly highHook routine to underline a selection:

    HighHookUnderline
    MOVE.L(SP),A0 ; Get the address of the rectangle
    MOVE bottom(A0),top(A0); Make the top coordinate equal to
    SUBQ #1,top(A0) ; the bottom coordinate minus 1
    _InverRect ; Invert the resulting rectangle
    RTS

    It's impossible to write this routine in C because of the weird calling conventions that supply the pointer to the Rect on top of the stack. For a native PowerPC application, the two parameters are simply specified as standard C parameters and the following routine suffices (the TEPtr parameter isn't used in this example):

    void HighHookUnderline (Rect *boundsRect, TEPtr pTE)
    { boundsRect->top = boundsRect->bottom - 1;
    InvertRect(boundsRect);
    return;
    }

    PASCAL FUNCTIONS
    Although the compilers for PowerPC C code were extended to accept thepascal keyword for source code compatibility with 680x0 Macintosh code, when the compiler encounters this keyword, it doesabsolutely nothing. Unlike MPW C, where the keyword alters parameter ordering and changes how some parameters are passed, the compilers for PowerPC code ignore thepascal keyword. In most cases this is not a problem, but there can be some subtle consequences. For example, consider the following Apple event handler:

    pascal OSErr DoAEAnswer (AppleEvent message, AppleEvent reply,
    long refCon);

    An Apple event record is larger than four bytes, so in Pascal it's automatically passed by reference. Because DoAEAnswer is declared as apascal function, MPW C handles the parameter in the same way. But the compilers for PowerPC code treat it as a standard C data structure and pass it by value. So if DoAEAnswer were called by the Apple Event Manager, bizarre things would happen. To be compatible with both types of compilers, you must explicitly make these parameters pointers, as follows:

    pascal OSErr DoAEAnswer (AppleEvent *message, AppleEvent *reply,
    long refCon);

    When in doubt, check the new interfaces; they now declare special function pointers of type ProcPtr that specify the correct parameters.

    typedef pascal OSErr (*EventHandlerProcPtr)(const AppleEvent
    *theAppleEvent, const AppleEvent *reply, long handleRefCon);

    Unfortunately, in most cases you'll now be coercing any special ProcPtrs (such as EventHandlerProcPtr) into normal ProcPtrs for calls to NewRoutineDescriptor (described in the next section), which means type checking will be lost. So double-check all your callback routines.

    UNIVERSALPROCPTRS
    Because of the introduction of the Mixed Mode Manager, the single biggest change you'll have to make to your code is converting function pointers of type ProcPtr to type UniversalProcPtr. Every place in the interfaces where a type of ProcPtr was declared, Apple added a similar declaration of type UniversalProcPtr.

    UniversalProcPtr is a generic function pointer. For 680x0 code, a UniversalProcPtr is just a 680x0 ProcPtr. For native PowerPC code, though, a UniversalProcPtr is a pointer to a data structure called a routine descriptor, which in addition to providing a function reference, supplies all the information the Mixed Mode Manager needs to transform parameters back and forth between 680x0 and PowerPC worlds. Because a UniversalProcPtr is no longer a simple function reference, there are issues of allocation and scope that make it more complicated to use than a simple ProcPtr. Fortunately, 680x0 interfaces are being changed to add UniversalProcPtr support, so changes you make for PowerPC code will also be compatible with 680x0 interfaces (see "Universal Interfaces" earlier in this article).

    Let's look at a simple example using a UniversalProcPtr. Suppose you have an action procedure for a vertical scroll bar, called VActionProc. Current code would call TrackControl with that action procedure as follows:

    TrackControl(ctlHit, mouseLoc, VActionProc);

    With PowerPC code, you must create a routine descriptor for VActionProc. Because there's usually a one-to-one correspondence between function pointers of type ProcPtr in your code and function pointers of type UniversalProcPtr required by the Mixed Mode Manager, it's simplest to allocate one UniversalProcPtr for each ProcPtr you use. The memory impact of this approach is small because a routine descriptor data structure typically uses only 32 bytes.

    One way to do this is to allocate the routine descriptor statically and have it initialized by the compiler. Macros are supplied in MixedMode.h for this purpose. For example, you can create a routine descriptor for VActionProc like this:

    RoutineDescriptor gVActionProcRD =
    BUILD_ROUTINE_DESCRIPTOR(uppControlActionProcInfo, VActionProc);

    Alternatively, you can allocate your routine descriptors on the heap. Again, because they seldom change, you'll generally want to allocate them at application startup:

    ControlActionUPPgVActionUPP;

    gVActionUPP = NewRoutineDescriptor((ProcPtr)VActionProc,
    uppControlActionProcInfo, GetCurrentISA());
    NewRoutineDescriptor is declared as follows:

    UniversalProcPtr NewRoutineDescriptor(ProcPtr theProc,
    ProcInfoType theProcInfo, ISAType theISA);

    NewRoutineDescriptor allocates nonrelocatable storage for the routine descriptor on the heap and returns it as a pointer to the routine descriptor in the form of a UniversalProcPtr. The theProc parameter is just the function pointer for the function you're referring to and theProcInfo is a 32-bit value that tells the Mixed Mode Manager how to convert parameters back and forth. Every UniversalProcPtr type has defined for it a corresponding ProcInfoType value. So the ProcInfoType value for ControlActionUPP is uppControlActionProcInfo. The third parameter, theISA, specifies the current instruction set architecture (ISA) in use. For portable code, simply call GetCurrentISA to get the appropriate ISA type. If you know you're dealing with a specific code type -- for example, a 680x0 code resource -- you can call NewRoutineDescriptor and specify the proper instruction set type -- for example, kM68kISA for 680x0 code.

    To simplify creation of function pointers of type UniversalProcPtr, the new interfaces also define macros that call NewRoutineDescriptor for you and automatically specify the ProcInfoType value:

    gVActionUPP = NewControlActionProc((ProcPtr) VActionProc);

    If you created the routine descriptor statically, you can pass the address of the structure to TrackControl:

    TrackControl(ctlHit, mouseLoc, (ControlActionUPP) &gVActionProcRD);

    If, instead, you created a UniversalProcPtr on the heap, you can use it directly in TrackControl:

    TrackControl(ctlHit, mouseLoc, gVActionUPP);

    If you allocate a UniversalProcPtr statically, you don't have to worry about deallocating it, because that will happen when the application quits. You could also allocate it locally, which you might want to do if the routine were unlikely to be called. In that case, you would have to explicitly deallocate the routine descriptor before leaving the function, as follows:

    DisposeRoutineDescriptor(gVActionUPP);

    A potential problem with disposing of routine descriptors is that you could dispose of them before they're used. For example, if you have a routine descriptor for an asynchronous I/O completion routine, disposing of the routine descriptor before the completion routine is called would be bad.

    An alternative for infrequently used routine descriptors is to allocate them globally but initialize them only when needed, as in this example:

    if (!gVActionUPP)
    gVActionUPP = NewControlActionProc((ProcPtr) VActionProc);
    TrackControl(ctlHit, mouseLoc, gVActionUPP);

    In most cases you won't need to call a UniversalProcPtr yourself; you'll simply pass it to the Toolbox. But should you need to call one from PowerPC code, you can't simply treat it as a function pointer. You must use CallUniversalProc to have the Mixed Mode Manager call the function for you. CallUniversalProc is declared as follows:

    long CallUniversalProc(UniversalProcPtr theProcPtr,
    ProcInfoType procInfo, ...);

    The first two parameters, the UniversalProcPtr and the 32-bit ProcInfoType value, are followed by all the additional parameters normally passed to the call. To simplify calling UniversalProcPtrs,special macros have been included in the interfaces for each UniversalProcPtr data type. For example, gVActionUPP above could be called using CallControlActionProc:

    CallControlActionProc(gVActionUPP, theControl, partCode);

    One special case of a UniversalProcPtr deserves mention because it can't be flagged by the compiler. A wonderful feature of the Dialog Manager is that for a userItem, the SetDItem call allows the item's procedure pointer to be set via the item parameter. Since you're explicitly casting a ProcPtr to a handle, the compiler assumes you know what you're doing and doesn't object. Of course, what you really need to pass is a UniversalProcPtr, but since the compiler doesn't catch this, strange things will surely happen if you don't catch it yourself.

    As another example of using function pointers of type UniversalProcPtr, let's look at a VBL task. A persistent VBL task (one that works when the application is in the background) is often implemented by copying the VBL task code into the system heap, an ugly solution and self-modifying code as well. A simpler solution for PowerPC code is to create the UniversalProcPtr itself in the system heap since the Process Manager views the UniversalProcPtr as code. The following code shows how to install such a VBL task:

    #define kVBLInterval 30

    OSErr InstallVBL (VBLTaskPtr theVBLTask, VBLProcPtr myVBLProc,
    Boolean isPersistent)
    { OSErr theError;
    THz  savedZone;

     /  *
    * For a VBL task that operates when the application is in the
    * background (i.e., that's persistent) we can simply create the 
    * UniversalProcPtr in the system heap. This causes the Process 
    * Manager to treat the code as though it were in the system heap
    * and the VBL will always get executed.
    * /

     if (isPersistent) {
    savedZone = GetZone();
    SetZone(SystemZone());
    }
    theVBLTask->vblAddr = NewRoutineDescriptor((ProcPtr) myVBLProc,
    uppVBLProcInfo, GetCurrentISA());
    theError = MemError();
    if (isPersistent)
    SetZone(savedZone); /* Restore the application zone. */
    if (theVBLTask->vblAddr != nil) {
    theVBLTask->qType = vType;
    theVBLTask->vblCount = kVBLInterval;
    theVBLTask->vblPhase = 0;
    theError = VInstall((QElemPtr) theVBLTask);
    }
    return (theError);
    }

    The isPersistent Boolean variable controls whether the VBL functions persistently. If it's persistent, you can control where the memory is allocated by first setting the zone to the system zone (because NewRoutineDescriptor calls the Memory Manager to allocate memory for the routine descriptor).

    Here's the code for the VBL task:

    long gCounter = 0;

    pascal void MyVBLProc (VBLTaskPtr theVBLTask)
    { theVBLTask->vblCount = kVBLInterval;
    gCounter++;
    return;
    }

    This very simple example alters only a global variable, but it illustrates two points. First, no complicated setup for global variables is required. For a 680x0 VBL task, messy saving and restoring of register A5 would be necessary for correct access to global variables. In the example, because the code resides in a code fragment, global variables are always accessible. Second, the procedure is called with a VBLTaskPtr parameter. For a 680x0 VBL task, a pointer to the VBLTask record resides in register A0 and requires special handling to get to the data from a high-level language. Because PowerPC code uses strict C calling conventions, the required data is passed as a standard parameter.

    Finally, of course, you have to remove the VBL task correctly:

    void RemoveVBL (VBLTaskPtr theVBLTask)
    { THzsavedZone;

     VRemove((QElemPtr) theVBLTask);
    if (theVBLTask->vblAddr) {
    savedZone = GetZone();
    /* Make sure we're in the right zone. */
    SetZone(PtrZone((Ptr) theVBLTask->vblAddr));
    DisposeRoutineDescriptor(theVBLTask->vblAddr);
    SetZone(savedZone);
    }
    return;
    }

    Although it may not be necessary to deallocate a VBL task created in the application heap, this code practices safe memory management by being sure the memory gets deallocated no matter where it is -- in other words, whether it's persistent or not.

    TRAP PATCHING
    Trap patching is fully supported on the PowerPC microprocessor; as always, however, it must be undertaken with due care and consideration. Not only is the compatibility risk higher (especially if you're dependent on 680x0 runtime features), but indiscriminate trap patching can severely affect the performance of the PowerPC processor-based machine.

    Trap patching is possible from both 680x0 code and PowerPC code, and you should use the NGetTrapAddress and NSetTrapAddress calls in both cases. From PowerPC code, the address returned by NGetTrapAddress must be treated as a UniversalProcPtr and you must pass a UniversalProcPtr to NSetTrapAddress as well.

    What complicates the issue is that the trap you patch could be written in either 680x0 code or PowerPC code. The Mixed Mode Manager, of course, handles both cases, but if you're patching native PowerPC code with 680x0 code, performance-sensitive code can suddenly run more slowly, not only because of your emulated code but because of overhead associated with mixed mode transitions. So you must think very carefully about the performance consequences of your patch.

    TAKING A RISC

    To ease the transformation of existing applications into native PowerPC applications, Apple has minimized changes to the API. Most ANSI C compliant code, with the exception of ProcPtrs, shouldrecompile without modification. Developers can exploit this opportunity to easily tap into the power of the PowerPC microprocessor.

    With PowerPC processor-based machines, Apple is laying the foundation for the future. The new levels of performance and new features such as the Code Fragment Manager give developers new worlds to explore and new opportunities for adding unique features to their applications.

    NATIVE POWERPC NUMERICS

    BY ALI SAZEGARI

    Developers dependent on floating point who port to the PowerPC platform will enjoy superior floating-point performance. However, some special consideration is needed, because the floating-point implementation on the PowerPC processor differs from that of the 680x0 processors.

    POWERPC ARCHITECTURE FEATURES
    The PowerPC microprocessor floating point is an IEEE 754-compliant single- and double-precision implementation offering fast, pipelined, nondestructive floating-point operations. These operations are add, subtract, multiply, divide, compare, convert to int, and a new class of multiply-add fused (MAF) instructions of the form

    frT ¨ (frA * frB) + frC

    where fr is a floating-point register. In MAF operations, all bits of the resultant multiply section are kept (106 bits in double) and participate in the final rounding, producing a more exact result. In other words, ( A * B) + C is a single operation with one rounding. The compilers on the PowerPC platform use MAF instructions wherever possible, unless expressly prohibited by the user.

    The PowerPC microprocessor has a rich set of floating-point register files: 32 floating-point double-precision data registers and a combined status and control register (unlike the MC6888x or MC68040).

    C PROGRAMMER'S MODEL
    The PowerPC microprocessor shared math library, MathLib, complies with the emerging Floating-Point C Extensions (FPCE X3J11.1/93-001) of the Numerical C Extensions Group (NCEG) specification. FPCE extends C to provide access to floating-point features generally and IEEE 754/854 specifically. FPCE provides a superset of math.h and sane.h functionality. The new required include files are fp.h and fenv.h.

    The FPCE fp.h file is a collection of mathematical functions. It defines all math.h and nonenvironmental sane.h functionality plus hyperbolic, inverse hyperbolic, max, min, positive difference, error, and gamma functions. Other functions round floating-point numbers to integral values or integral format. An extensive array of correctly rounded binary-to-decimal conversion functions is provided.

    The FPCE fenv.h file defines all the functions used to query or modify the floating-point environment (exception flags and rounding direction).

    The include file math.h is kept for ANSI C compliance, but developers are encouraged to use fp.h and fenv.h. The sane.h include file won't be supported. Be aware of function name and prototype differences

    [IMAGE Radcliffe_final_rev8.GIF]

    Figure 7 between SANE and FPCE-NCEG interfaces. For example, the functions copysign and scalb have reversed arguments in the new fp.h, and log1 is now called log1p.

    FP DATA TYPES
    Table 1 lists the available native data types on the PowerPC microprocessor. There's no hardware or compiler support for the 80- or 96-bit IEEE extended values commonly used by Macintosh programmers. Developers should use 64-bit double as their native data type and use rescaling techniques within their algorithms susceptible to numerical ill-conditioning. The 64-bit comp type, a floating-point data type available on the 680x0-based Macintosh, isn't supported. Use the data type long double judiciously and only when an algorithm requires the extra precision. SANE data types, which include extended and comp, are fully supported in emulation mode on PowerPC processor-based Macintosh systems.

    The transcendental long-double functions are not supported for the first release of MathLib on PowerPC processor-based Macintosh systems. A complete long-double library is planned for a later release.

    Table 1 Available Native Data Types on the PowerPC Microprocessor

    Native Data Type Description float IEEE single precision (32 bits with fast operations) double IEEE double precision (64 bits with fast operations) long double 128-bit structure of two doubles (head and tail), whose value is head + tail. Not an IEEE double-extended type! Provides additional precision within double range.

    Note: The long double data type isn't supported by the hardware, so operations are relatively slow. It should be used selectively.

    RECOMMENDED READING

    For more information on CISC and RISC architectures in general and POWER and PowerPC architectures in particular, consult the following sources:

    • Advanced Microprocessors by Daniel Tabak (McGraw-Hill, 1991).
    • Computer Architecture and Computer Architecture Case Studies by Robert J. Baron and Lee Higbie (Addison-Wesley, 1992).
    • Computer Architecture: A Quantitative Approach by David A. Patterson and John L. Hennessy (Morgan Kaufman Publishers, 1990).
    • "PowerPC Performs for Less," by Tom Thompson, Byte, August 1993.
    • "RISC Drives PowerPC," by Bob Ryan, Byte, August 1993.
    • PowerPC 601 RISC Microprocessor User's Manual (Motorola, 1993).


    DAVE RADCLIFFE is a five-year veteran of Apple's Developer Technical Support group and for the past year has been excited to be part of the PowerPC project. When he's not plumbing the depths of MacsBug, Dave enjoys relaxing with a mug of beer from one of the local microbreweries, watching old movies at the Stanford Theatre, or just being a couch potato in front of one of his laser disc videos. For his sabbatical, Dave is looking forward to a low-tech adventure river rafting through the Grand Canyon.*

    THANKS TO OUR TECHNICAL REVIEWERS C. K. Haun, Ron Hochsprung, Bruce Jones, Alan Lillich, Wayne Meretzky, Eric Traut*Thanks to Paul Finlayson and Stuart McDonald for their review of "Native PowerPC Numerics." *

  •  

    Community Search:
    MacTech Search:

    Software Updates via MacUpdate

    Latest Forum Discussions

    See All

    Tokkun Studio unveils alpha trailer for...
    We are back on the MMORPG news train, and this time it comes from the sort of international developers Tokkun Studio. They are based in France and Japan, so it counts. Anyway, semantics aside, they have released an alpha trailer for the upcoming... | Read more »
    Win a host of exclusive in-game Honor of...
    To celebrate its latest Jujutsu Kaisen crossover event, Honor of Kings is offering a bounty of login and achievement rewards kicking off the holiday season early. [Read more] | Read more »
    Miraibo GO comes out swinging hard as it...
    Having just launched what feels like yesterday, Dreamcube Studio is wasting no time adding events to their open-world survival Miraibo GO. Abyssal Souls arrives relatively in time for the spooky season and brings with it horrifying new partners to... | Read more »
    Ditch the heavy binders and high price t...
    As fun as the real-world equivalent and the very old Game Boy version are, the Pokemon Trading Card games have historically been received poorly on mobile. It is a very strange and confusing trend, but one that The Pokemon Company is determined to... | Read more »
    Peace amongst mobile gamers is now shatt...
    Some of the crazy folk tales from gaming have undoubtedly come from the EVE universe. Stories of spying, betrayal, and epic battles have entered history, and now the franchise expands as CCP Games launches EVE Galaxy Conquest, a free-to-play 4x... | Read more »
    Lord of Nazarick, the turn-based RPG bas...
    Crunchyroll and A PLUS JAPAN have just confirmed that Lord of Nazarick, their turn-based RPG based on the popular OVERLORD anime, is now available for iOS and Android. Starting today at 2PM CET, fans can download the game from Google Play and the... | Read more »
    Digital Extremes' recent Devstream...
    If you are anything like me you are impatiently waiting for Warframe: 1999 whilst simultaneously cursing the fact Excalibur Prime is permanently Vault locked. To keep us fed during our wait, Digital Extremes hosted a Double Devstream to dish out a... | Read more »
    The Frozen Canvas adds a splash of colou...
    It is time to grab your gloves and layer up, as Torchlight: Infinite is diving into the frozen tundra in its sixth season. The Frozen Canvas is a colourful new update that brings a stylish flair to the Netherrealm and puts creativity in the... | Read more »
    Back When AOL WAS the Internet – The Tou...
    In Episode 606 of The TouchArcade Show we kick things off talking about my plans for this weekend, which has resulted in this week’s show being a bit shorter than normal. We also go over some more updates on our Patreon situation, which has been... | Read more »
    Creative Assembly's latest mobile p...
    The Total War series has been slowly trickling onto mobile, which is a fantastic thing because most, if not all, of them are incredibly great fun. Creative Assembly's latest to get the Feral Interactive treatment into portable form is Total War:... | Read more »

    Price Scanner via MacPrices.net

    Early Black Friday Deal: Apple’s newly upgrad...
    Amazon has Apple 13″ MacBook Airs with M2 CPUs and 16GB of RAM on early Black Friday sale for $200 off MSRP, only $799. Their prices are the lowest currently available for these newly upgraded 13″ M2... Read more
    13-inch 8GB M2 MacBook Airs for $749, $250 of...
    Best Buy has Apple 13″ MacBook Airs with M2 CPUs and 8GB of RAM in stock and on sale on their online store for $250 off MSRP. Prices start at $749. Their prices are the lowest currently available for... Read more
    Amazon is offering an early Black Friday $100...
    Amazon is offering early Black Friday discounts on Apple’s new 2024 WiFi iPad minis ranging up to $100 off MSRP, each with free shipping. These are the lowest prices available for new minis anywhere... Read more
    Price Drop! Clearance 14-inch M3 MacBook Pros...
    Best Buy is offering a $500 discount on clearance 14″ M3 MacBook Pros on their online store this week with prices available starting at only $1099. Prices valid for online orders only, in-store... Read more
    Apple AirPods Pro with USB-C on early Black F...
    A couple of Apple retailers are offering $70 (28%) discounts on Apple’s AirPods Pro with USB-C (and hearing aid capabilities) this weekend. These are early AirPods Black Friday discounts if you’re... Read more
    Price drop! 13-inch M3 MacBook Airs now avail...
    With yesterday’s across-the-board MacBook Air upgrade to 16GB of RAM standard, Apple has dropped prices on clearance 13″ 8GB M3 MacBook Airs, Certified Refurbished, to a new low starting at only $829... Read more
    Price drop! Apple 15-inch M3 MacBook Airs now...
    With yesterday’s release of 15-inch M3 MacBook Airs with 16GB of RAM standard, Apple has dropped prices on clearance Certified Refurbished 15″ 8GB M3 MacBook Airs to a new low starting at only $999.... Read more
    Apple has clearance 15-inch M2 MacBook Airs a...
    Apple has clearance, Certified Refurbished, 15″ M2 MacBook Airs now available starting at $929 and ranging up to $410 off original MSRP. These are the cheapest 15″ MacBook Airs for sale today at... Read more
    Apple drops prices on 13-inch M2 MacBook Airs...
    Apple has dropped prices on 13″ M2 MacBook Airs to a new low of only $749 in their Certified Refurbished store. These are the cheapest M2-powered MacBooks for sale at Apple. Apple’s one-year warranty... Read more
    Clearance 13-inch M1 MacBook Airs available a...
    Apple has clearance 13″ M1 MacBook Airs, Certified Refurbished, now available for $679 for 8-Core CPU/7-Core GPU/256GB models. Apple’s one-year warranty is included, shipping is free, and each... Read more

    Jobs Board

    Seasonal Cashier - *Apple* Blossom Mall - J...
    Seasonal Cashier - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
    Seasonal Fine Jewelry Commission Associate -...
    …Fine Jewelry Commission Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) Read more
    Seasonal Operations Associate - *Apple* Blo...
    Seasonal Operations Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Read more
    Hair Stylist - *Apple* Blossom Mall - JCPen...
    Hair Stylist - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Blossom Read more
    Cashier - *Apple* Blossom Mall - JCPenney (...
    Cashier - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Blossom Mall Read more
    All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.