Code Optimization

Volume Number: 15 (1999)
Issue Number: 4
Column Tag: CodeWarrior Workshop

Code Optimization

by Andrew Downs

Code Optimization Options in CodeWarrior

Introduction

Code optimization involves the application of rules and algorithms to program code with the goal of making it faster, smaller, more efficient, and so on. Often these types of optimizations conflict with each other: for instance, faster code usually ends up larger, not smaller. Optimizations can be performed at several levels (e.g. source code, intermediate representations), and by various parties, such as the developer or the compiler/optimizer.

This article discusses and demonstrates the code optimization options available in CodeWarrior Pro. In spite of its ever-growing sophistication, CodeWarrior still makes it easy to select the right level of optimization for your program. Understanding how CodeWarrior applies optimizations will help when it is time to debug your code, especially when using a low-level debugger or when the source code is not available.

Optimization Terms

In order to understand the process, you need to speak the language. Here are definitions for the terms discussed in this article, plus a few extras. Keep in mind that the words replace, substitute, etc. do not imply physical changes to your original source code, but rather are reflected in the generated machine code, after your source code has been parsed and optimized.

Each term includes a brief description, including why it is important. Code examples illustrating some of these terms will be presented later in the article.

Common subexpression elimination: If the value resulting from the calculation of a subexpression is used multiple times, perform the calculation once and substitute the result for each individual calculation.
Constant propagation: Replace variables that rely on an unchanging value with the value itself.
Copy propagation: Replace multiple variables that use the same calculated value with one variable.
Dead code elimination: Code that never gets executed can be removed from the object file to reduce stored size and runtime footprint.
Dead store elimination: Variables and related store operations that are never used can be removed from the object file to reduce stored size and runtime footprint.
Global register allocation: Variables that do not overlap in scope may be placed in registers, rather than remaining in RAM. Accessing values stored in registers is faster than accessing values in RAM.
Global vs. peephole scope: Global optimization allows the compiler/optimizer to look at the overall program and determine how best to apply the desired optimization level. Peephole provides local optimizations, which do not account for patterns or conditions in the program as a whole. Local optimizations may include instruction substitutions.
Inline calls: A function that is fairly small can have its machine instructions substituted at the point of each call to the function, instead of generating an actual call. This trades space (the size of the function code) for speed (no function call overhead).
Instruction scheduling: Instructions for a specific processor may be generated, resulting in more efficient code for that processor but possible compatibility or efficiency problems on other processors. This optimization may be better applied to embedded systems, where the CPU type is known at build time. Most desktop software should use the Generic PowerPC option in CodeWarrior.
Lifetime analysis: A register can be reused for multiple variables, as long as those variables do not overlap in scope.
Loop counting using CTR: The count register (CTR) on the PowerPC is intended to be used for counting, such as in loop iterations. There are some branch instructions that specifically use the value in the CTR.
Loop invariant expressions (code motion): Values that do not change during execution of a loop can be moved out of the loop, speeding up loop execution.
Loop transformations: Some loop constructs can be rewritten to execute more quickly. For example, a for{} loop can be rewritten as a while{} loop, saving one or more instructions.
Loop unrolling: Statements within a loop that rely on sequential indices or accesses can be repeated more than once in the body of the loop. This results in checking the loop conditional less often.
Strength reduction: Certain operations and their corresponding machine code instructions require more time to execute than simpler, possibly less efficient counterparts.

CodeWarrior Optimization Dialog

Figure 1 shows the dialog box pane used to specify instruction scheduling and peephole optimization in CodeWarrior.

Figure 1. PPC processor pane.

Global optimizations are specified in the optimization pane, shown in Figure 2. Global optimization can be performed with the intent of making the resulting code faster or smaller, but not both.

Figure 2. Global optimization pane.

The code examples presented in this article use either peephole optimization or the faster execution speed settings. The smaller code size setting is briefly discussed later in the article. Profiled results for smaller code size are provided in the article archive, but are not illustrated in code examples.

Disassembling Code

The Disassemble menu item under the Project menu in CodeWarrior allows you to view the compiler's output resulting from application of any optimization choices. This is extremely useful in that it gives you a chance to save an assembly listing of your code for use during debugging sessions. That menu item was used to produce the listings presented in this article.

Just Enough Assembly Language

Listed below are the PowerPC instructions used in this article. Following each is the expanded mnemonic, and a short description of what the instruction does (using the specified operands). The instructions are presented in mnemonic alphabetical order, not the order in which they are encountered in the code. For a thorough treatment of Power PC assembly language, refer to the references listed at the end of this article.

add r30,r30,r0: Add: store the sum of the values stored in r0 and r30 in r30.
addi r30,r30,1: Add Immediate: add 1 to the value stored in r30, and store the result in r30.
b *+12: Branch: continue execution at the location 12 bytes (3 instructions) after this instruction.
bdnz *-80: Branch if Decremented Count Register is Not Zero: decrement the Count Register, and if its value is not zero, continue execution 80 bytes (20 instructions) before this instruction.
bl.whileTest__Fv: Branch and Link: branch to the location specified and continue execution, saving the address of the next instruction to be executed in this routine in the Link Register.
blt *-12: Branch if Less Than: if the result of the previous comparison resulted in the Condition Register LT bit being set, continue execution at the instruction 12 bytes (3 instructions) prior to this instruction.
bne *-28: Branch if Not Equal: if the result of the previous compare operation is not zero, as recorded in the Condition Register EQ bit, continue execution at the instruction 28 bytes (7 instructions) before this instruction.
cmplw r31,r29: Compare Logical Word: perform an unsigned comparison between r29 and r31, and update the Condition Register as appropriate.
cmplwir31,$0000: Compare Logical Word Immediate: compare the value in r31 against 0, and set the appropriate flags in the Condition Register.
li r30,0: Load Immediate: move the value 0 into register r30.
lwzr3,gLimit(RTOC): Load Word and Zero: move the value of the global variable gLimit into register r3. Global variables are referenced as offsets from the Table of Contents register (RTOC).
mr r31,r29: Move Register: copy the contents of r29 into r31.
mtctr r0: Move To Count Register: copy the value in r0 to the Count Register (typically used for loop iteration).
mullw r0,r29,r28: Multiply integer word, return Low Word: store the product of the values stored in r28 and r29 in r0.
stmw r26,-24(SP): Store Multiple Word: store registers r26-r31 beginning at the location -24 bytes off the current stack pointer location.
stw r31,-4(SP): Store Word: store register r31 at the location -4 bytes off the current stack pointer location.
subir31,r31,1: Subtract Immediate: subtract 1 from the value in r31, and store the result in r31.
subic.r31,r31,1: Subtract Immediate Carrying: similar to subi, but set the Exception Register CA bit with the result of the carry (whether it occurs or not).

Example Code

The archive for this article includes the .c and .dump files. Listing 1 contains the source code used to test the optimization options. For illustration purposes, some of the functions contain redundant or unused variables, unoptimized loops, and so on.

The profiler calls and header include statement are commented out. The sample project file in the archive includes the profiler library, so if you would like to use the profiler, simply uncomment the four lines in the source code, compile, and run.

Listing 1: test.c

test.c
Sample code with loops and variable usage.

#include <stdio.h>
#include <stdlib.h>
//#include <profiler.h>

unsigned long cseTest( void );
unsigned long doWhileTest( void );
unsigned long whileTest( void );
unsigned long forTest( void );

unsigned long gLimit = 1000L;

int main( void ) {
//ProfilerInit( collectSummary, bestTimeBase, 10, 3 );
//ProfilerSetStatus( 1 );
   unsigned long   theResult;
   unsigned long   j = 0L;

   j++;

   theResult = whileTest();
   
   theResult = doWhileTest();
   
   theResult = cseTest();

   theResult = forTest();
   
   j-;
//ProfilerDump( "\ptest.c.prof");
//ProfilerTerm();
   return 0;
}

unsigned long whileTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = gLimit;

   i = 0;
   
   while ( i < limit )
      i++;
   
   return 0;
}

unsigned long doWhileTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = 1000L;
   unsigned long dest[ 1000 ], index = 0;

   i = limit;
   
   do {
      dest[ index++ ] = i;
      i-;
   } while ( i > 0 );

   return 0;
}

unsigned long cseTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = 1000L;
   unsigned long   product = 0L;
   unsigned long   x = 4;
   unsigned long   y = 3;

   theResult = limit;
   
   i = limit;
   
   do {
      product += x * y;
   } while ( i- > 0 );

   return 0;
}

unsigned long forTest( void ) {
   unsigned long   i, limit = 1000, sum = 0;
   
   for ( i= 0; i < limit; i++ )
      sum += i;

   return sum;
}

Peephole Optimization

The peephole optimizer applies a set of pattern-matching rules to the code, with the goal of replacing inefficient sequences of instructions with more efficient sequences. A second goal is to perform some register usage optimizations based on variable lifetimes. Both benefits are shown in the following examples.

Note that function prolog and epilog code is only occasionally presented in the listings in this article. The full listings are included in the article archive.

In the first example, the non-optimized version of main() contains the following assembly code, where register r30 is used to hold the value of j:

int main( void ) {
//...

   // Initialize j.
   // unsigned long   j = 0L;
      li    r30,0

   // Increment j.
   // j++;
      addi   r30,r30,1

   // theResult = whileTest();
      bl    .whileTest__Fv

   // ...

Enabling the peephole optimizer results in no action being taken on j, since its value is never used:

int main( void ) {
//...

   // Initialize j.
   // unsigned long   j = 0L;

   // Increment j.
   // j++;
   // No instructions generated!

   // theResult = whileTest();
      bl    .whileTest__Fv

   // ...

The second example uses the function doWhileTest(). The variable limit is accessed twice, but its value never changes. In addition, there is a subtract/compare instruction sequence. First, the unoptimized version:

unsigned long doWhileTest( void ) {
// ...

   // unsigned long   i, limit = 1000L;
   // Initialize limit in r29.
      li    r29,1000

   // i = limit;
   // Copy limit into r31.
      mr    r31,r29

   // i-;
   // Subtract 1 from i (in r31).
      subi   r31,r31,1

   // } while ( i > 0 );
   // Compare to 0 and branch if not equal.
      cmplwi  r31,$0000
      bne   *-28

The peephole optimizer detects the redundant register usage for the variable limit, and the fact that the value of limit never changes. The peephole optimizer also finds the inefficient subtract/compare sequence. It transforms the function into:

unsigned long doWhileTest( void ) {
// ...

   // Initialize limit in r31.
   // This is the same register used for i, so no copy/move operation is required.
      li    r31,1000

   // i-;
   // Subtract with carry and set condition flags.
      subic.  r31,r31,1
      
   // Branch based on result of subtraction operation.
   // Comparison operation is implicit.
      bne   *-20

The profiler says:

The CodeWarrior profiler, which can be used to measure (among other things) the amount of time spent in various functions, showed the following execution times for doWhileTest():

Optimization Level	Function	Time (ms)	% of Total Time
None	doWhileTest()	0.131	39.3
Peephole	doWhileTest()	0.105	34.4

These results were obtained using the bestTimeBase setting. At runtime, this translated into the PowerPC timebase setting, which uses the Real-Time Clock (RTC) register on PPC 601-based machines, and the Time Base (TB) register on later implementations, for accurate, low-overhead timing.

The fourth column is the percentage of total program running time spent in this function. This is very useful in determining where to begin optimizing. To use it properly, you should compare the results for all of the functions at a given optimization level. (The profiler summary for all functions and all optimization levels is included in the article archive.) You need to be careful, because the time and % of total time columns do not necessarily mirror each other. You will see in one of the examples (Global Optimization - Level 4) that sometimes an inverse relationship occurs: less time spent in a function may be accompanied by a greater percentage of total time!

In summary, even though its scope is relatively local, the peephole optimizer may be able to provide some performance benefits, depending on patterns it detects in the code. In this case, it decreased execution time slightly (by approximately 20%).

Global Optimization - Level 1

This level of optimization provides global register allocation and dead code elimination.

Here is the non-optimized version of whileTest():

unsigned long whileTest( void ) {
      stw   r31,-4(SP)
      stw   r30,-8(SP)

   // unsigned long   theResult = 0L;
      li    r0,0
      stw   r0,-16(SP)

   // unsigned long   i, limit = gLimit;
      lwz   r3,gLimit(RTOC)
      lwz   r30,0(r3)

   // i = 0;
      li    r31,0

   // while ( i < limit )
      b    *+8
      i++;
   
      addi   r31,r31,1
      cmplw  r31,r30
      blt   *-8

The level 1 optimized version eliminates the non-volatile register saves, and uses volatile registers for local variables. Also, the assignment of theResult is eliminated, since its value is never used.

unsigned long whileTest( void ) {
   // unsigned long   theResult = 0L;
   // unsigned long   i, limit = gLimit;
      lwz   r3,gLimit(RTOC)
      lwz   r0,0(r3)

   // i = 0;
      li    r3,0

   // while ( i < limit )
      b    *+8

      // i++;
      addi   r3,r3,1
      cmplw  r3,r0
      blt   *-8

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	whileTest()	0.025	7.5
Global - Faster 1	whileTest()	0.025	7.5

In this case, the result is not faster, but it is more efficient in terms of number of instructions used (code size).

Global Optimization - Level 2

This level of optimization adds common subexpression elimination and copy propagation to those provided by level 1.

Here is the non-optimized version of cseTest():

unsigned long cseTest( void ) {
   // Preserve the non-volatile registers (r26-r31).
      stmw   r26,-24(SP)

   // unsigned long   i, limit = 1000L;
      li    r27,1000

   // unsigned long   product = 0L;
      li    r30,0

   // unsigned long   x = 4;
      li    r29,4

   // unsigned long   y = 3;
      li    r28,3

   // theResult = limit;
   
   // i = limit;
      mr    r31,r27

   // do {
      // product += x * y;
      // A multiply is an expensive (slow) operation.
      mullw  r0,r29,r28
      add   r30,r30,r0

   // } while ( i- > 0 );
      cmplwi  r31,$0000
      subi   r31,r31,1
      bne   *-16

Enabling level 2 optimization changes the multiply instruction to an add (using a constant value of 12). As with level 1, the use of volatile registers (r0, r3-r12) replaces usage of non-volatile registers (r13-r31). As a result, the store multiple instruction goes away.

unsigned long cseTest( void ) {
   // No store multiple register instruction.
   // unsigned long   theResult = 0L;
   // ...

   // theResult = limit;   
      li    r4,0
   
   // i = limit;
   // do {
      li    r3,1000

      // product += x * y;
      // Add the constant value 12 to product.
      addi   r4,r4,12

   // } while ( i- > 0 );
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-12

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	cseTest()	0.131	39.3
Global - Faster 2	cseTest()	0.047	18.7

The optimized version is approximately 64% faster than the original!

Global Optimization - Level 3

This level of optimization adds loop transformations, strength reduction, and loop-invariant code motion to those provided by level 2.

Here is the non-optimized version of forTest():

unsigned long forTest( void ) {
      stw   r31,-4(SP)
      stw   r30,-8(SP)
      stw   r29,-12(SP)

   // unsigned long   i, limit = 1000, sum = 0;
      li    r29,1000
      li    r30,0

   // for ( i= 0; i < limit; i++ )
      li    r31,0
      b    *+12

      // sum += i;
      add   r30,r30,r31
      addi   r31,r31,1
      cmplw  r31,r29
      blt   *-12

Applying the optimizations results in an unrolling of the loop. This code now contains the equivalent of ten loop iterations in the body of the loop. Notice that the iterator (limit) has been reduced to the value of 100 to compensate.

unsigned long forTest( void ) {
   // unsigned long   i, limit = 1000, sum = 0;
      li    r3,0
      li    r4,0
      b    *+4

   // for ( i= 0; i < limit; i++ )
      // Iterate over the loop 1000 times (total).
      // Use the count register to hold the adjusted value of limit (100).
      li    r0,100
      mtctr  r0
      b    *+4

      // sum += i;
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      bdnz   *-80

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	forTest()	0.046	13.9
Global - Faster 3	forTest()	0.031	14.9

The optimized version is approximately 33% faster. Notice the increased percentage of total time: the time savings obtained in this loop must have been outweighed by more significant savings elsewhere in the optimized program.

Global Optimization - Level 4

This level of optimization adds repeated loop-invariant code motion and common subexpression elimination. The optimized code becomes a candidate for further optimizations, until no more opportunities are found.

For this example, we will compare the level 3 and 4 versions of the same function. Here is the level 3 optimized version of cseTest():

unsigned long cseTest( void ) {
   // unsigned long   theResult = 0L;
   // unsigned long   i, limit = 1000L;
   // unsigned long   product = 0L;
   // unsigned long   x = 4;
   // unsigned long   y = 3;

   // theResult = limit;
   
   // i = limit;
   
   // do {
      li    r4,0
      li    r3,1000
      b    *+4

      // product += x * y;
   // } while ( i- > 0 );
      addi   r4,r4,12
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-12

Here is the level 4 version. Notice that the value of product is never calculated, probably because it is never used.

unsigned long cseTest( void ) {
   // unsigned long   theResult = 0L;
   // ...
   
   // do {
      li    r3,1000
      b    *+4
      b    *+4

      // product += x * y;
   // } while ( i- > 0 );
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-8

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
Global - Faster 3	cseTest()	0.046	22.4
Global - Faster 4	cseTest()	0.028	14.8

Again, much (39%) faster the second time around!

Faster vs. Smaller

Although I am not presenting examples using global optimizations for smaller code, the profiler results for those settings are included in the archive. For reference, here are the generated code sizes for faster vs. smaller, for each optimization level:

Faster 0 360 48 (constant) Faster 1 260 Faster 2 220 Faster 3 308 Faster 4 316 Smaller 0 312 Smaller 1 260 Smaller 2 220 Smaller 3 236 Smaller 4 244

Faster/Smaller	Level	Code Size (bytes)	Data Size (bytes)
Faster	0	360	48 (constant)
Faster	1	260
Faster	2	220
Faster	3	308
Faster	4	316
Smaller	0	312
Smaller	1	260
Smaller	2	220
Smaller	3	236
Smaller	4	244

The smaller option always generated code that was no larger, and usually smaller, than its faster counterpart at each optimization level. Applying levels 3 and 4 always resulted in increased code size (compared to level 2). However, the percentage increase from level 2 to level 4 for the smaller setting (10.9%) is much lower than for the faster setting (43.6%).

Be careful in trying to apply these results to another program. Each program has its own unique characteristics, which may respond favorably to some types of optimization but not to others. Plus, your needs may dictate which type of optimization to perform. For example, code destined for embedded systems may require the smaller option.

Doing it Yourself

Hand-optimized programs may still present the best performance. For example, you can optimize loops yourself, or include assembly functions mixed with high-level code. On the other hand, that approach is time-consuming, costly, and possibly less portable. Plus, not all developers have the necessary skills to hand-optimize. Relying on the development environment to perform optimizations is a good thing.

Writing a straightforward, correct program is key to generating an optimized program. That version becomes the starting point in the optimization process. Every subsequent adjustment or optimization, whether applied by the developer or the development environment, should only make the code faster or tighter.

Debugging efforts should focus on the unoptimized version, since the generated machine code most closely mirrors the original source. Once the program is bug-free (!), the desired type and level of optimization should be applied.

Conclusion

Optimization techniques can improve code size or speed. CodeWarrior provides several categories of optimization, including peephole, global, and instruction scheduling. You can apply any of these in combination via the project preferences dialog. It is worth reviewing your disassembled code to assess the impact the optimizations will have on subsequent debugging sessions. Debugging will be easier if performed on the unoptimized version of your code, since the optimized version of the code may not closely mirror the original source. Once the bugs are out, add the appropriate type of optimization.

References

Compiler Construction, Niklaus Wirth, Addison-Wesley, 1996.
Optimizing PowerPC Code, Gary Kacmarcik, Addison-Wesley, 1995.
PowerPC Microprocessor Developer's Guide, John Bunda et al, Sams Publishing, 1995.
Writing Compilers and Interpreters, Ronald Mak, Wiley Computer Publishing, 1996.

Credits

Thanks to Tom Thompson of Metrowerks for reviewing this article and providing insight into CodeWarrior's optimization process.

Archive URL

The article archive, containing a CodeWarrior project file, source code, and disassembler and profiler dump files, is available at <www.mactech.com/>.

Andrew Downs is a Technical Lead for Template Software in New Orleans, LA, designing and building enterprise apps. He also teaches Java programming at Tulane University College. Andrew wrote the Macintosh freeware program Recent Additions, and the Java application UDPing. You can reach him at andrew@downs.net.

Software Updates via MacUpdate

Latest Forum Discussions

Combo Quest (Games)

Combo Quest 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: Combo Quest is an epic, time tap role-playing adventure. In this unique masterpiece, you are a knight on a heroic quest to retrieve... | Read more »

Hero Emblems (Games)

Hero Emblems 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: ** 25% OFF for a limited time to celebrate the release ** ** Note for iPhone 6 user: If it doesn't run fullscreen on your device... | Read more »

Puzzle Blitz (Games)

Puzzle Blitz 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Puzzle Blitz is a frantic puzzle solving race against the clock! Solve as many puzzles as you can, before time runs out! You have... | Read more »

Sky Patrol (Games)

Sky Patrol 1.0.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.1 (iTunes) Description: 'Strategic Twist On The Classic Shooter Genre' - Indie Game Mag... | Read more »

The Princess Bride - The Official Game...

The Princess Bride - The Official Game 1.1 Device: iOS Universal Category: Games Price: $3.99, Version: 1.1 (iTunes) Description: An epic game based on the beloved classic movie? Inconceivable! Play the world of The Princess Bride... | Read more »

Frozen Synapse (Games)

Frozen Synapse 1.0 Device: iOS iPhone Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Frozen Synapse is a multi-award-winning tactical game. (Full cross-play with desktop and tablet versions) 9/10 Edge 9/10 Eurogamer... | Read more »

Space Marshals (Games)

Space Marshals 1.0.1 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.1 (iTunes) Description: ### IMPORTANT ### Please note that iPhone 4 is not supported. Space Marshals is a Sci-fi Wild West adventure taking place... | Read more »

Battle Slimes (Games)

Battle Slimes 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: BATTLE SLIMES is a fun local multiplayer game. Control speedy & bouncy slime blobs as you compete with friends and family.... | Read more »

Spectrum - 3D Avenue (Games)

Spectrum - 3D Avenue 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: "Spectrum is a pretty cool take on twitchy/reaction-based gameplay with enough complexity and style to stand out from the... | Read more »

Drop Wizard (Games)

Drop Wizard 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Bring back the joy of arcade games! Drop Wizard is an action arcade game where you play as Teo, a wizard on a quest to save his... | Read more »

Price Scanner via MacPrices.net

Our MacBook Price Trackers will show you the...

Our Apple award-winning MacBook Price Trackers are continually updated with the latest information on prices, bundles, and availability for 16″ and 14″ MacBook Pros along with 13″ and 15″ MacBook... Read more

Amazon is offering a 10% discount on Apple’s...

Don’t pay full price! Amazon has 16-inch M4 Pro MacBook Pros (Silver and Black colors) on sale today for 10% off Apple’s MSRP. Shipping is free. These are the lowest prices currently available for 16... Read more

13-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 13″ M4 MacBook Airs on sale for $150 off MSRP right now, starting at $849. Sale prices apply to most colors and configurations. Be sure to select Amazon as the seller, rather than a... Read more

15-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 15″ M4 MacBook Airs on sale for $150 off Apple’s MSRP, starting at $1049. Be sure to select Amazon as the seller, rather than a third-party: – 15″ M4 MacBook Air (16GB/256GB): $1049, $... Read more

Amazon is offering a $50 discount on Apple’s...

Amazon has Apple’s 11th-generation A16 iPads in stock on sale for $50 (or a little more) off MSRP this week. Shipping is free: – 11″ 11th-generation 128GB WiFi iPads: $299 $50 off MSRP – 11″ 11th-... Read more

Clearance 13-inch M1 MacBook Airs available f...

Walmart has clearance, but new, Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) available online for $649, $360 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBooks for... Read more

iPad minis on sale for $100 off Apple’s MSRP...

Amazon is offering $100 discounts (up to 20% off) on Apple’s newest 2024 WiFi iPad minis, each with free shipping. These are the lowest prices available for new minis among the Apple retailers we... Read more

AirPods Max headphones on sale for $479, $70...

Amazon has AirPods Max with USB-C on sale for $479.99 in all colors. Shipping is free. Their price is $70 off Apple’s MSRP, and it’s the lowest price available today for AirPods Max. Keep an eye on... Read more

14-inch M4 Pro/M4 Max MacBook Pros on sale th...

Don’t pay full price! Get a new 14″ MacBook Pro with an M4 Pro or M4 Max CPU for up to $320 off Apple’s MSRP this weekend at these retailers…they are the lowest prices available for these MacBook... Read more

Get a 15-inch M4 MacBook Air for $150 off App...

A couple of Apple retailers are offering $150 discounts on new 15″ M4 MacBook Airs this weekend. Prices at these retailers start at $1049: (1): Amazon has new 15″ M4 MacBook Airs on sale for $150 off... Read more

Jobs Board

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech

Code Optimization

Code Optimization Options in CodeWarrior

Introduction

Optimization Terms

CodeWarrior Optimization Dialog

Disassembling Code

Just Enough Assembly Language

Example Code

Listing 1: test.c

Peephole Optimization

The profiler says:

Global Optimization - Level 1

The profiler says:

Global Optimization - Level 2

The profiler says:

Global Optimization - Level 3

The profiler says:

Global Optimization - Level 4

The profiler says:

Faster vs. Smaller

Doing it Yourself

Conclusion

References

Credits

Archive URL

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board