Efficient 68030

Volume Number:		8
Issue Number:		5
Column Tag:		Efficient coding

Efficient 68030 Programming

Optimizing your code to run faster on the 68030

By Mike Scanlin, MacTutor Regular Contributing Author

So you want to optimize your code to run faster on the 68030? Well, good for you. This article will help. Most of these optimizations apply to running on the 68020/040, too, and in no case will any of these optimizations make your code run slower on a non-030 machine (like the 68000). These optimizations are not language specific, although most of the examples given here are in Think C and assembly.

The two most important things you need to know in order to make your program run fast on a 68030 are: (1) How to intelligently align your code and data and (2) How to use the data and instruction caches effectively. Once you understand a few basic attributes of the 030 you can do a few things to your code without much work to ensure that you have optimal cache use and alignment.

How much difference will it make, you ask? Using these techniques you can save about 15% on your subroutine calling overhead, 25% to 50% on some of your 32-bit memory accesses (reading or writing longs, Ptrs, Handles, etc), 60% on your NewHandleClear calls and more. Taken in sum these things are likely to make a noticeable difference in your code’s performance when running on an 030.

The single biggest gain you can realize in your programs when running on an 030 is to make sure your 4-byte variables (longs, Ptrs, Handles, etc) are on 4-byte memory boundaries (i.e. their address is evenly divisible by 4). To understand why this makes a difference you need to know a little bit about how the data cache works.

THE DATA CACHE

The 68030 has a 256 byte data cache where it stores copies of the most recently accessed memory locations and values. If your code reads the same memory location twice within a short period of time, it is very likely that the 68030 can get the value at that location from its cache (which is in the processor itself) rather than from actual memory. This reduces the time it takes to execute the instruction accessing the memory location because it eliminates the external bus cycle that the processor would normally use to go out into memory and fetch the value that the instruction needs.

Of course, it’s not quite as simple as that. There are a couple of other details to know. The 256 byte data cache is actually broken up into 16 separate caches of 16 bytes each. When the processor is trying to fetch the value at a given memory location it first checks if it is in one of its 16 data caches. If it is, then it uses the value and everything is cool. If it’s not, then it goes out into memory and fetches the value along with the 16 or 32 bytes surrounding the value. If x is the memory location requested, then the processor fetches all the bytes from location ((x >> 4) << 4) to ( (x + sizeof(fetch)) >> 4) << 4) + 15), where sizeof(fetch) is 1, 2 or 4 depending on if this is a byte, word or long access. Basically, it fetches enough 16 byte “buckets” to completely cover the request (and each bucket begins at an address divisible by 16). The reason why more than one 16-byte bucket might be necessary is because the first part of the requested value could be at the end of one bucket and the second part could be at the beginning of the next bucket. An example is the low memory global GrayRgn, whose address is $9EE - the first half (hi-word) exists in the bucket from $9E0 to $9EF and the second half (lo-word) exists in the bucket from $9F0 to $9FF. So, assuming GrayRgn isn’t in the data cache to begin with, the instruction:

Move.L  GrayRgn,A0

will cause 32 bytes to be read in from memory (4 of which the processor actually uses to process the Move.L instruction). At the end of the instruction, two of the 16 data caches are filled with the values from $9E0 to $9FF. The reason why the processor does this is because it is predicting that the code is going to want the bytes near the most recently requested memory location. Although the first access to a 16-byte bucket is expensive, subsequent accesses to the same bucket are free (in terms of time to fetch the data). This actually helps routines like BlockMove quite a bit (it’s 46% faster for non-overlapping moves because of it). And it’s not quite as sick as you might expect to read in 16 or 32 bytes from memory because the processor uses a special “data burst” mode to do it (which runs somewhat parallel with other stuff the processor is doing).

The other detail to know about the 030 data cache is that it’s a write-through cache. That means that if you change the value at a given location and that location is already in the data cache, then both the cached value and the real memory value are updated. This way you can never have stale data in the cache (i.e., the cache always agrees with actual memory and vice versa). (Note: That’s somewhat of a lie. It is actually possible to get stale data in an 030 data cache if you have a DMA card writing to main memory at interrupt time, but I assume that if you’re doing that then you already know what’s going on. Most people won’t ever have to worry about this detail. Check out TN #261: “Cache As Cache Can” for more info on this.)

The last little thing to know about alignment is that independent of the 16-byte buckets the data cache uses, it is almost always faster to access a 4-byte quantity on a 4-byte boundary than on a 2-byte boundary (the only time it’s not is if the entire 4 bytes are already in the data cache). How much difference does it make? In the case of reading a 4-byte value it is 34% faster if the value is long-aligned and in the case of writing a 4-byte value it is 25% faster if the value is long-aligned (and if the non-aligned value crosses a 16-byte bucket boundary then it’s 52% faster for reads and 25% faster for writes; notice there is no penalty, other than the normal non-aligned penalty, for writing to a 4-byte value that crosses a 16-byte boundary because write operations don’t cause 32-byte reads to happen).

So, the take home message is that you should always put your 4 byte quantities (longs, Ptrs, Handles, etc) at addresses that are divisible by 4 (or, to be more general, put your n-byte quantities on n-byte boundaries). At the time the Macintosh was originally conceived this was not a design goal because several of the low memory globals are 4-byte quantities at odd-word addresses. In fact, a few live at the worst possible location for a 4-byte quantity: those addresses ending in $E that will cause 32-byte reads when they are accessed: FCBSPtr ($34E), GrayRgn ($9EE), MinStack ($31E), ROMBase ($2AE) and WMgrPort ($9DE). There’s nothing we can do about those, but there is something we can do about your code.

ALIGNING YOUR VARIABLES AND DATA

The basic idea of aligning your variables is to keep all of your 4-byte variables/structs together and then have all of your 2-byte and 1-byte variables/structs follow. If you know that the first variable you allocate is long-word aligned and all the ones that follow it are multiples of 4-bytes in length, then you know that they are all going to be long-word aligned (likewise, if the first one is off then they are all going to be off).

Here is an example of some variables, some of which are guaranteed to not be aligned:

inti;
long  x;
char  c[2];
Ptrp;

Because i and c are 2-byte quantities, you know that one of x or p is going to be long-word aligned and the other one is not going to be long-word aligned, no matter where the compiler allocates them (assuming that they are all allocated next to each other in the order they’re declared).

Here’s a regrouping that is better:

/* 1 */

long  x;
Ptrp;
inti;
char  c[2];

Now, as long as you know that x is long-word aligned, you know that p is long-word aligned, too. Forcing long-word alignment of variables comes in three parts: globals, locals and data structures.

Before we get to the specific details, I should mention that the C language doesn’t specify the location or order of declared variables; it’s implementation dependent. Therefore, it’s more important that you understand what is desirable and why it is desirable. Then, you can consult your compiler manual or, more likely, write some test programs to see what it’s doing with your variables. In a perfect world you wouldn’t have to be concerned with alignment - it would be a compiler option. Until that time, read on.

ALIGNING GLOBAL VARIABLES

Because of the fact that the Mac memory manager always allocates memory on 4-byte boundaries (Ptrs and dereferenced handles always end in 0, 4, 8 or C) you know that the heap is going to start and end on a long-word boundary. Further, because you know that the jump table is a multiple of 8 bytes you can assume that register A5 will point to an address divisible by 4. This means that as long as you allocate your globals in chunks of 4 bytes you will have long-word alignment for them. The best way to do this is to allocate your biggest variables first, then your next biggest, and so on, on down to your smallest. Just make sure that you have an amount of each one that divides by four.

Good global allocation:

/* 2 */

/* First, the 4*n byte quantities */
 Rect r;

/* Now list the 4 byte quantities */
 Handle h;
 long x;
 char *p;

/* Now list the 2 bytes quantities */
 int  i, j, k;
 int  dummyInt1; /* need this for padding */

/* And finally the 1 byte quantities */
 char ch;
 Booleanb;
 int  dummyInt2; /* need this for padding */

Note that each section is a multiple of 4 bytes long. This is important because the Think C 5.0 compiler/linker allocates globals in backwards order from how they’re declared. If these are your only global variables:

long  x;
inti;

then x is at $FFFA(A5) and i is at $FFFE(A5). But if you put a dummyInt in there like this:

/* 3 */

long  x;
inti;
intdummyInt;

then x becomes long-word aligned at $FFF8(A5) (which is the goal).

The only gotcha with globals has to do with the QuickDraw (QD for short) globals which aren’t a multiple of 4 bytes long (they’re 206 bytes long). You need to know if your compiler allocates them before or after your own globals and maybe add a dummyInt either before or after all of your globals to compensate for the unaligned QD globals. Think C users don’t really have to worry about this problem because the Think compiler allocates the QD globals after the application globals (so the QD globals might be unaligned but there’s generally nothing you can do about that; at least they won’t affect your application globals’ alignment).

ALIGNING YOUR LOCAL VARIABLES

Aligning local variables is trickier because they are allocated on the stack. It requires that the stack be long-word aligned on entry to your routine, which you can’t always control (especially if you’re a callback proc). However, there is a little we can do to make this a more likely event.

First, we need to make sure the stack is aligned as soon as our code begins to execute. The easiest way to do this is by not having any local variables in main() and by adding a little code that makes sure the stack is aligned after launching by adjusting it by 2 if necessary. The following inline functions are used for this purpose:

/* 4 */

void AlignStack(void) = {
 0x200F,//Move.L SP,D0
 0x0240,0x0003,  //And  #0x0003,D0
 0x6604,//Bne.S  @1
 0x7004,//Moveq  #4,D0
 0x554F,//Subq #2,SP
 0x3F00 // @1  Move D0,-(SP)
};

void UnAlignStack(void) = {
 0xDED7,//Add  (SP),SP
};

If you need to, make main a shell proc that calls your “real” main, MyMain, so that you don’t have any local variables in main():

/* 5 */

void main() {
 AlignStack();
 MyMain();
 UnAlignStack();
}

If you do this, then you know for sure that MyMain will be called with an aligned stack. Now the goal is to make MyMain’s stack frame (as well as every other stack frame in the entire program) aligned - make sure you declare variables from largest to smallest and make sure that the total size of the stack frame (including parameters) is a multiple of 4 bytes (in addition, if you’re using Pascal then you need to include the size of the function result since it’s allocated on the stack). The following inline function is useful to put at the front of each function to make sure the stack is aligned on entry. It will cause a hang (branch-to-self instruction) if it detects an unaligned stack (in which case you know the caller of that function needs work):

/* 6 */

void CheckStackAlignment(void) = {
 0x2F00,//Move.L D0,-(SP)
 0x200F,//Move.L SP,D0
 0x0240,0x0003,  //And  #0x0003,D0
 0x66FE,// @1  Bne.S @1
 0x201F,//Move.L (SP)+,D0
};

Here’s an example program with all of these macros in use:

/* 7 */

void main() {
 AlignStack();
 MyMain();
 UnAlignStack();
}

void MyMain() {
 long x;
 int  i;
 int  d; // needed for alignment

#if DEBUG
 CheckStackAlignment();
#endif
 x = 5;
 i = MyProc(x);
}

int MyProc(long x) {
#if DEBUG
 CheckStackAlignment();


#endif
 return(x);
}

Not only are you ensuring that x is long-word aligned in this example but, because the stack is aligned the whole time you are getting another benefit: faster subroutine calls. The reason is because the processor can push and pop the return address (a 4-byte value) to an aligned location on the stack. The overall gain of an aligned stack on a Jsr/Bsr and Rts pair is 14% to 16% faster than the same Jsr/Bsr and Rts on a non-long-word aligned stack. Don’t forget, too, that if the routine does a Movem.L to save/restore the registers it uses on the stack that the Movem.L instructions will be much faster on a long-aligned stack. Note: This assumes that your compiler is not doing delayed stack clean up. If you are coding for speed and using these alignment techniques, then you need to turn off that compiler option. It is called “Defer & combine stack adjusts” in Think C (make sure it’s unchecked).

Note that you don’t want to put CheckStackAlignment in callback procs you pass to the ROMs (like filterProcs or ioCompletion routines) or in VBL tasks because you can’t control what the ROM or interrupt put on the stack before you are called. You’ll just have to live with whatever alignment you get at the time you’re called. You can, however, do something about the alignment of the routines called by your callback proc if you put an AlignStack/UnAlignStack in your callback proc. That way, only the callback proc’s parameters and local variables will be potentially unaligned - all procs that it calls will be aligned.

There is one more detail relative to functions: You should design your functions so that, in C, the parameters should add up to a multiple of 4 bytes and, in Pascal, the parameters plus the size of the function result should add up to a multiple of 4 bytes (because in Pascal the function result is on the stack and in C it’s in a register). Otherwise your carefully crafted stack frames won’t start at an aligned address. It is worth it to increase the size of an int parameter to long or even to add a dummy int parameter in order to maintain this multiple-of-4-bytes rule. If this is your original function:

int MyProc(int i) { return(i); }

then changing it to:

/* 8 */

int MyProc(long i) { return(i); }

will cause it to execute 17% faster when called from an aligned-stack caller. Changing 
it to:

/* 9 */

int MyProc(int i, int dummy) { return(i); }

with a calling sequence of:

/* 10 */

i = MyProc(i, 0);

will cause it to execute 9% faster when called from an aligned-stack caller.
There are three things to keep in mind when adding up the sizes of parameters: (1) 
One-byte parameters (Booleans, chars, bytes, etc) count as 2 bytes each because they are 
pushed on the stack as 16-bits each in order to maintain word-alignment on the stack (or 
else you’d get an address error on the 68000), (2) pointers to any size parameter (like 
char *) are 4 bytes each and (3) declare your parameters in the correct order. If you’re 
working in C, then declare them in order from smallest to largest (the opposite of what 
you do with global and local variables) - that way the 4-byte values (which are pushed 
first) will be long-aligned (assuming the stack is aligned at the time of the subroutine 
call). Usually, a C compiler will push parameters in order from right to left (because of 
functions like printf that take a variable number of parameters, the first parameter needs 
to be pushed last) so you want you’re biggest (4 byte) parameters at the end of the parameter 
list. Here are some examples of bad (unaligned) C prototypes:

void MyProc1(long x, int i);
int MyProc2(int i, long x, char c);
Handle MyProc3(Handle h, long x, int i);

Better (aligned) versions of these would be:

/* 11 */

void MyProc1(int i, long x);
int MyProc2(char c, int i, long x);
Handle MyProc3( int i, long x, Handle h);

Note that the size of the return result doesn’t matter in C. Also, you could do even better than these by making the sum of the sizes of the parameters a multiple of 4 bytes, as was mentioned above (make ints into longs or add dummy ints for filler; MyProc1 and MyProc3 need this). If you’re unsure which way your C compiler pushes the parameters or if you want to write portable code that compiles on two different C compilers that do it differently, then just make sure all of your non-4-byte parameters are in one part of the parameter list and that you have a multiple of 4 bytes worth of them. Then it won’t matter which way the compiler pushes them - the 4-byte ones (which are what we really care about) are guaranteed to be aligned (as is the stack when the routine that is called starts to execute).

In Pascal, it’s a little different. Parameters are pushed from left to right and only after space has been allocated for the function result (if there is one). So, you want to declare your parameters from largest to smallest in this case (the opposite of what you’d do in C). Here are the same prototypes in aligned Pascal:

{ 12 }

PROCEDURE MyProc1(x: LongInt; i: INTEGER);
FUNCTION MyProc2(i: INTEGER; x: LongInt; c: Char): INTEGER;
FUNCTION MyProc3(h: Handle; x: LongInt; i: INTEGER): Handle;

Notice that in Pascal the size of the return result matters. In MyProc2 we make the first parameter an int to make up for the function result INTEGER; that way x (the LongInt) will be aligned when it is pushed on the stack. If the return result is a 4-byte quantity (like the Handle in MyProc3), then you can forget about it and just list the parameters from biggest to smallest. Don’t forget that in Pascal the goal is to make the sum of all the parameters plus the function result be a multiple of 4 bytes (which is not true for any of the above). As in C, make INTEGERs into LongInts or add dummy INTEGERs if you need to.

ALIGNING DATA STRUCTURES

What about custom data structures? Where do they fit in? Well, the answer is simple: in order to have the least maintenance possible with all of this long-word alignment stuff, make sure all of your structs are multiples of 4 bytes long. Add a dummy field or two to the end if you need to, just make sure all structs are multiples of 4 bytes long. That way, if you declare them as a global or as a local, you will know they are multiples of 4 bytes long, and you can forget about them and put them in any section without ruining your alignment.

Bad (unaligned) data structure design:

typedef struct {
 int  i;
 long j;
} MyStruct;

Good (aligned) data structure design:

/* 13 */

typedef struct {
 long j;
 int  i;
 int  dummyInt; /* force the length to a multiple of 4 */
} MyStruct;

Some might say that the dummyInt is an unnecessary waste of space. I would argue that it is a good thing because it ensures that MyStruct is a multiple of 4 bytes, and it might come in useful some day when you need a couple extra bytes in MyStruct - think of dummyInt as unused and reserved for future expansion.

But what about system data structures? Yes, some of them are a problem. Take BitMap, for example:

struct BitMap {
    Ptr baseAddr;
    short rowBytes;
    Rectbounds;
};

There are two problems with it: (1) Bounds starts at an odd-word address (even though it’s really an array of 4 ints, someone might reference the first half as topLeft and it’ll be an unaligned 4-byte reference) and (2) The size is not a multiple of 4 bytes. Unfortunately, we can’t do anything about (1) but, we can do something about (2). Whenever you have a system data structure in your globals or locals and you’re not sure if it is aligned or no,t then you should add this filler variable to take up the extra space:

/* 14 */

BitMap  b;
char  filler1[4-(sizeof(BitMap)%4)];

The worst case is where the struct in question is already aligned. In that case, the filler will be 4 bytes long (when what you’d really like is to have it be zero bytes long). Still, if you’re coding for speed, then a few extra bytes are worth it (better yet, you could manually check to see if the size of the struct is aligned and then not put in the filler in that case).

INSTRUCTION CACHE

The 68030 has a 256 byte instruction cache where it stores instructions (code). It works much like the data cache in that there are 16 caches of 16 bytes each. When an instruction is fetched, the 16 or 32 bytes surrounding it are fetched into the instruction cache, too. Like the other caches, you can turn off the instruction cache if you want to. I haven’t mentioned this fact before because it is hardly ever a good idea. (If you are so motivated, though, you would use the Movec D0,CACR instruction to enable/disable/clear the caches; this is a priveledged instruction, though, so don’t ship any products that use it.).

The goal when writing small loops is to have everything contained within 242 bytes. You can’t assume all 256 bytes of the instruction cache unless your routine begins at an address divisible by 16; the worst case is that your code begins at an address that ends in $E, thus the 242 byte number (2 bytes from the first bucket plus 15 other complete buckets). The time difference between a fully cached function vs. the same function that isn’t fully cached is 24%. So, you can reduce the time of the entire loop by 24% if you can make it fit in 242 bytes or less (including the size of any routines called from within the loop, of course). Note that, in general, unrolling loops is a good way to speed them up (since a short branch not taken is faster than any size branch taken) but, make sure you don’t unroll them to be larger than 242 bytes.

If you have a IIci with a cache card or a IIfx, then you won’t notice this 24% performance gain; it only applies to the standard 030 machines: SE/30, Classic II, IIx, IIcx, IIsi, IIci (no card), PB 140 and PB 170. The IIci’s cache card and IIfx’s built in cache consist of 2048 16-byte caches (32K total) that can be used for data or instructions.

The 68020, by the way, only has a 64-byte instruction cache, which is 16 caches of 4 bytes each. If you want to make sure that your tight loops run well on that processor, then make sure they are 62 bytes or smaller. The 020 does not have a data cache.

The 68040 has 256 instruction caches of 16 bytes each, or 4K worth of instruction caches. It also has a similar 4K data cache.

INTERLEAVED WRITE OPERATIONS

Because memory write operations are write-through, the second of two successive write operations must wait until the first has finished before it can execute. Knowing this, you can use the delay

to your advantage to streamline the pipelining process. The idea is

to avoid two memory writes in a row in cases where you can.

Consider the code you might have at the top of a routine, where you save some registers and then set up some initial values:

Movem.L D0-D3,-(SP)
Moveq #x,D0
Moveq #y,D1
Moveq #w,D2
Moveq #z,D3

You can make this code execute 15% faster on a IIfx (it’s 19% faster on a IIci) if you interleave the memory writes with non-memory reads and writes:

;15

Move.L  D3,-(SP)
Moveq #z,D3
Move.L  D2,-(SP)
Moveq #w,D3
Move.L  D1,-(SP)
Moveq #y,D1
Move.L  D0,-(SP)
Moveq #x,D0

Of course, there is a tradeoff. This faster version takes 4 extra bytes. But if you’re 
coding for speed, it’s worth it.
SPECIAL INSTRUCTIONS
The 68030 supports some addressing modes and instructions that are not implemented 
on the 68000 (they do work on the 020/040, though). If you use these, then your program 
will not run on a 68000. If you have a tight loop that you’d like to run faster on 020s, 
030s and 040s, then consider modifying it to use these instructions. Be sure to do a run-time 
check for which CPU you’re running on (using Gestalt with the gestaltProcessorType selector) 
and only call routines that use these instructions if you’re running on the appropriate CPU 
(using a set of procPtrs that you install when you start up your app is a good idea). Rather 
than give you a dry listing of the new instructions, I’ll give a couple of examples of where 
the common ones might come into play when programming the Mac. If you really want to 
know more, then go get a book on the 030 (if you call Motorola they’ll probably send you 
a Programmer’s Guide for free: 408/749-0510).
Double dereferencing of handles can now be done in one instruction with a new addressing 
mode. If you want to put the first byte of a handle in A0 into D0, you would do this on the 
68000:

 Move.L (A0),A0
 Move.B (A0),D0

Now you can use this instruction instead (which is 35% faster):

;16

 Move.B ([A0]),D0

Of course, if you’re going to need the dereferenced handle for more than this one instruction, then you’re better off keeping the dereferenced handle in a register rather than using this addressing mode to repeatedly redereference it.

You can add an offset after the first dereference, too. If you want the 10th byte of the handle in A0, you used to do this:

 Move.L (A0),A0
 Move.B 9(A0),D0

On the 68020 (and 030/040) you can do this:

;17

 Move.B ([A0],9),D0

Or, suppose your handle points to a list of 8 byte elements, Rects, let’s say, and suppose you have a zero-based index of the Rect you want in D0 and suppose you want the botRight of that Rect in D1. On the 68000 you would do something like this:

 Move.L (A0),A0
 Asl  #3,D0
 Move.L botRight(A0,D0),D1

But now you can do this in a much more glamourous instruction:

;18

 Move.L ([A0],D0*8,botRight),D1

which is 22% faster and 2 bytes smaller than the old way (and, oh, so much more glamourous). There is another new addressing mode, too, which is nearly as glamourous, and has the general form of ([bd,An,Xn.SIZE*SCALE],od). Geez, I can see why people are moving towards RISC these days - these instructions are getting a bit complex

A new instruction, Extb.L, was added that allows you to sign extend a byte to a long. On the 68000 you have to use two instructions, Ext.W and Ext.L, to get this effect.

Bit fields were added, too. They’re really cool if you’re working with bitmapped graphics. It will be worth your time to go read about them if you do a lot of graphics work.

DON’T USE ‘CLEAR’

The NewPtrClear and NewHandleClear traps are a lot slower than they need to be. This is for two reasons: (1) because they use the Clr instruction (which does a memory read before setting the memory to zero; thus filling the data cache with useless data) and (2) because they only clear one byte at a time. The inner loop that they both use is something like this:

@1 Clr.B(A0)+
 Dbra D0,@1

By substituting this loop instead:

;19

 Moveq  #0,D1
@1 Move.B D1,(A0)+
 Dbra D0,@1

you get a 19% speed improvement. Furthermore, by clearing longs at a time (and dealing with the end as a special case) you can clear memory 65% faster than the Clr.B loop method. If you do a lot of NewPtrClears or NewHandleClears you probably want to write your own MyNewPtrClear and MyNewHandleClear that do this.

ASSEMBLY TRIVIA

Who knows what the slowest instruction on the 68030 is (other than Reset)? If you guessed something to do with division you’re wrong. If, on the other hand, you guessed:

;20

Movem.L  ([d32,An],Xn,d32),A0-A7/D0-D7

then you’re right. Weighing in at about 200 cycles you should avoid this instruction wherever possible (Quick! Go change your source!).

I would like to thank Chris B, Dolf S and John A for help in preparing this article. It’s always nice to have people around who can tell you precisely how confused you are. I would also like to thank Timo B for the use of his computer. It’s always nice to have someone around who has more hardware than you do.

Software Updates via MacUpdate

Latest Forum Discussions

Combo Quest (Games)

Combo Quest 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: Combo Quest is an epic, time tap role-playing adventure. In this unique masterpiece, you are a knight on a heroic quest to retrieve... | Read more »

Hero Emblems (Games)

Hero Emblems 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: ** 25% OFF for a limited time to celebrate the release ** ** Note for iPhone 6 user: If it doesn't run fullscreen on your device... | Read more »

Puzzle Blitz (Games)

Puzzle Blitz 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Puzzle Blitz is a frantic puzzle solving race against the clock! Solve as many puzzles as you can, before time runs out! You have... | Read more »

Sky Patrol (Games)

Sky Patrol 1.0.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.1 (iTunes) Description: 'Strategic Twist On The Classic Shooter Genre' - Indie Game Mag... | Read more »

The Princess Bride - The Official Game...

The Princess Bride - The Official Game 1.1 Device: iOS Universal Category: Games Price: $3.99, Version: 1.1 (iTunes) Description: An epic game based on the beloved classic movie? Inconceivable! Play the world of The Princess Bride... | Read more »

Frozen Synapse (Games)

Frozen Synapse 1.0 Device: iOS iPhone Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Frozen Synapse is a multi-award-winning tactical game. (Full cross-play with desktop and tablet versions) 9/10 Edge 9/10 Eurogamer... | Read more »

Space Marshals (Games)

Space Marshals 1.0.1 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.1 (iTunes) Description: ### IMPORTANT ### Please note that iPhone 4 is not supported. Space Marshals is a Sci-fi Wild West adventure taking place... | Read more »

Battle Slimes (Games)

Battle Slimes 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: BATTLE SLIMES is a fun local multiplayer game. Control speedy & bouncy slime blobs as you compete with friends and family.... | Read more »

Spectrum - 3D Avenue (Games)

Spectrum - 3D Avenue 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: "Spectrum is a pretty cool take on twitchy/reaction-based gameplay with enough complexity and style to stand out from the... | Read more »

Drop Wizard (Games)

Drop Wizard 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Bring back the joy of arcade games! Drop Wizard is an action arcade game where you play as Teo, a wizard on a quest to save his... | Read more »

Price Scanner via MacPrices.net

Our MacBook Price Trackers will show you the...

Our Apple award-winning MacBook Price Trackers are continually updated with the latest information on prices, bundles, and availability for 16″ and 14″ MacBook Pros along with 13″ and 15″ MacBook... Read more

Amazon is offering a 10% discount on Apple’s...

Don’t pay full price! Amazon has 16-inch M4 Pro MacBook Pros (Silver and Black colors) on sale today for 10% off Apple’s MSRP. Shipping is free. These are the lowest prices currently available for 16... Read more

13-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 13″ M4 MacBook Airs on sale for $150 off MSRP right now, starting at $849. Sale prices apply to most colors and configurations. Be sure to select Amazon as the seller, rather than a... Read more

15-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 15″ M4 MacBook Airs on sale for $150 off Apple’s MSRP, starting at $1049. Be sure to select Amazon as the seller, rather than a third-party: – 15″ M4 MacBook Air (16GB/256GB): $1049, $... Read more

Amazon is offering a $50 discount on Apple’s...

Amazon has Apple’s 11th-generation A16 iPads in stock on sale for $50 (or a little more) off MSRP this week. Shipping is free: – 11″ 11th-generation 128GB WiFi iPads: $299 $50 off MSRP – 11″ 11th-... Read more

Clearance 13-inch M1 MacBook Airs available f...

Walmart has clearance, but new, Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) available online for $649, $360 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBooks for... Read more

iPad minis on sale for $100 off Apple’s MSRP...

Amazon is offering $100 discounts (up to 20% off) on Apple’s newest 2024 WiFi iPad minis, each with free shipping. These are the lowest prices available for new minis among the Apple retailers we... Read more

AirPods Max headphones on sale for $479, $70...

Amazon has AirPods Max with USB-C on sale for $479.99 in all colors. Shipping is free. Their price is $70 off Apple’s MSRP, and it’s the lowest price available today for AirPods Max. Keep an eye on... Read more

14-inch M4 Pro/M4 Max MacBook Pros on sale th...

Don’t pay full price! Get a new 14″ MacBook Pro with an M4 Pro or M4 Max CPU for up to $320 off Apple’s MSRP this weekend at these retailers…they are the lowest prices available for these MacBook... Read more

Get a 15-inch M4 MacBook Air for $150 off App...

A couple of Apple retailers are offering $150 discounts on new 15″ M4 MacBook Airs this weekend. Prices at these retailers start at $1049: (1): Amazon has new 15″ M4 MacBook Airs on sale for $150 off... Read more

Jobs Board

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech