Efficient 68000
Volume Number: 8
Issue Number: 2
Column Tag: Assembly workshop

Efficient 68000 Programming

If a new CPU speeds up inefficient code, what do you think it will do to efficient code?

By Mike Scanlin, MacTutor Regular Contributing Author

The dew is cold. It is quiet. I hear nothing except for crackling sounds coming from the little fire burning two inches to the left of my keyboard. It wasn’t there a minute ago. Seems that Doo-Dah, the god of efficient programming, is upset with me for typing “Adda.W #10,A0” and just sent me a warning in the form of a lightning bolt. I hate it when he does that. You’d think that after three years in his service, researching which 68000 assembly language instructions are the most efficient ones for any given job, that he would lighten up a little. I guess that’s what makes him a god and me a mere mortal striving for enlightenment through the use of optimal instructions. As I extinguish the fire with a little Mountain Dew, I reflect upon the last three years.

My first lesson in the service of Doo-Dah was that proficiency in assembly language is a desirable skill in programmers so long as performance is a desirable attribute of software. The nay-sayers who depend upon faster and faster CPUs to make their sluggish software run at acceptable speeds don’t realize the underlying relativeness of the universe. If a new CPU will speed up a set of non-optimal instructions by 10%, then it will also speed up a set of optimal instructions by 10%. One should strive to be right on the edge of absolute maximum performance all the time. Users may not notice the difference in a 2K document but when they start working with 20MB documents they will soon be able to separate the optimal software from the non-optimal.

In the months following that lesson, I was given the task of compiling a list of instructions that should only very rarely appear in any program executing on a 68000 (and only then because you’re dealing with either self-modifying code or special hardware that depends on certain types of reads and writes from the processor). They are:

Don't Use                   Use                          Save

Move.B #0,Dx                Clr.B Dx                     4 cycles, 2 bytes
Move.W #0,Dx                Clr.W Dx                     4 cycles, 2 bytes
Clr.L Dx                    Moveq #0,Dx                  2 cycles
Move.L #0,Dx                Moveq #0,Dx                  8 cycles, 4 bytes
Move.L #0,Ax                Suba.L Ax,Ax                 4 cycles, 4 bytes
Move.L #[-128..127],Dx      Moveq #[-128..127],Dx        8 cycles, 4 bytes
Move.L #[-128..127],ea      Moveq #[-128..127],Dx        4 cycles, 2 bytes
                            Move.L Dx,ea
Move.L #[128..254],Dx       Moveq #[64..127],Dx          4 cycles, 2 bytes
                            Add Dx,Dx
Move.L #[-256..-130],Dx     Moveq #[-128..-65],Dx        0 cycles, 2 bytes
                            Add.L Dx,Dx
Lea [1..8](Ax),Ax           Addq #[1..8],Ax              0 cycles, 2 bytes
Add.W #[9..32767],Ax        Lea [9..32767](Ax),Ax        4 cycles
Lea [-8..-1](Ax),Ax         Subq #[1..8],Ax              0 cycles, 2 bytes
Sub.W #[9..32767],Ax        Lea [-32767..-9](Ax),Ax      4 cycles
Asl.W #1,Dx                 Add.W Dx,Dx                  4 cycles
Asl.L #1,Dx                 Add.L Dx,Dx                  2 cycles
Cmp.x #0,ea                 Tst.x ea                     4-10 cycles, 2 bytes
And.L #$0000FFFF,Dx         Swap Dx                      4 cycles
                            Clr.W Dx
                            Swap Dx
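
To make the two doubling rows concrete, here is a minimal sketch that loads 200 and -200. The constants and the choice of D0 and D1 are picked purely for illustration. Because the second instruction doubles the Moveq value, only even constants can be built this way.

```asm
; Load 200: Moveq + Add replaces Move.L #200,D0
        Moveq   #100,D0         ; D0.L = 100 (Moveq sign-extends to all 32 bits)
        Add     D0,D0           ; D0.L = 200 (a word add is enough, the upper word is zero)

; Load -200: Moveq + Add.L replaces Move.L #-200,D1
        Moveq   #-100,D1        ; D1.L = $FFFFFF9C
        Add.L   D1,D1           ; D1.L = $FFFFFF38 = -200
```

An odd constant such as 201 falls outside this trick and still needs the Move.L (or an extra Addq after the doubling).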

In addition, if you don’t care about the values of the condition codes then the following may be optimized:

Don't Use                   Use                          Save

Move.W #nnnn,-(SP)          Move.L #ppppnnnn,-(SP)       4 cycles, 2 bytes
Move.W #pppp,-(SP)
Move.L #$0000nnnn,-(SP)     Pea $nnnn                    4 cycles, 2 bytes
Move.B #255,Dx              St Dx                        2 cycles, 2 bytes
Move.L #$00nn0000,Dx        Moveq #[0..127],Dx           4 cycles, 2 bytes
                            Swap Dx
Movem (SP)+,Dx              Move (SP)+,Dx                4 cycles
                            Ext.L Dx
Movem.L Dx,-(SP)            Move.L Dx,-(SP)              4 cycles, 2 bytes
Movem.L (SP)+,Dx            Move.L (SP)+,Dx              8 cycles, 2 bytes
Movem.L (SP)+,<2 regs>      Move.L (SP)+,<reg 1>         4 cycles
                            Move.L (SP)+,<reg 2>

Note that pushing 2 regs or popping 3 with Movem.L is equivalent in cycles to doing it with multiple Move.L’s, but popping 3 regs with Move.L’s costs you two extra bytes. An easy rule to remember is to always use Movem.L whenever you’re dealing with 3 or more registers.
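
A minimal sketch of that rule, assuming a hypothetical routine MyProc that has to preserve D3-D5. The cycle counts in the comments come from the standard 68000 Movem.L formulas (8+8n for a push, 12+8n for a pop, with n registers), which agree with the single-register rows in the table above.

```asm
MyProc:
        Movem.L D3-D5,-(SP)     ; push 3 regs: 32 cycles, 4 bytes
                                ; (three Move.L Dx,-(SP) would be 36 cycles, 6 bytes)
        ;<blah>
        Movem.L (SP)+,D3-D5     ; pop 3 regs: 36 cycles, 4 bytes
                                ; (three Move.L (SP)+,Dx would be 36 cycles, 6 bytes)
        Rts
```

With three registers the Movem.L pair is never slower and is 4 bytes shorter in total, and the gap widens with every extra register.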

There are other optimizations you can make with minimal assumptions. For instance, if you are making room for a function result then don’t use Clr:

Don't Use                   Use                          Save

Clr.W -(SP)                 Subq #2,SP                   6 cycles
_Random                     _Random

Clr.L -(SP)                 Subq #4,SP                   14 cycles
_FrontWindow                _FrontWindow

If you’re trying to set, clear, or change one of the low 16 bits of a data register and you don’t need to test it first, then don’t use these:

Don't Use                   Use                          Save

Bset #n,Dx                  Or.W #mask,Dx                4 cycles
Bclr #n,Dx                  And.W #mask,Dx               4 cycles
Bchg #n,Dx                  Eor.W #mask,Dx               4 cycles
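
The table leaves "mask" undefined, so here is a sketch for one concrete case, bit 3 of D0 (an arbitrary choice). For Or.W and Eor.W the mask is a word with only bit n set; for And.W it is the complement of that word.

```asm
; Bit 3 of D0: 1<<3 = $0008, and its complement is $FFF7
        Or.W    #$0008,D0       ; leaves D0 as Bset #3,D0 would
        And.W   #$FFF7,D0       ; leaves D0 as Bclr #3,D0 would
        Eor.W   #$0008,D0       ; leaves D0 as Bchg #3,D0 would
```

The condition codes are not equivalent: the bit instructions set Z from the old value of the bit, while the logical instructions set N and Z from the whole word result. That is why the substitution only applies when you don't need to test the bit first.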

You should use registers wherever possible, not memory (because memory is much slower to access). If you need to test for a NIL handle or pointer, for instance, do this:

Don't Use                   Use                          Save

Move.L A0,-(SP)             Move.L A0,D0                 16 cycles, 2 bytes
Addq #4,SP                  Beq.S ItsNil
Beq.S ItsNil

Use the “quick” operations wherever you can. Many times you can reverse the order of two instructions to use a Moveq (since Moveq handles bigger numbers than Addq/Subq):

Don't Use                   Use                          Save

Move.L D0,D1                Moveq #10,D1                 6 cycles, 4 bytes
Add.L #10,D1                Add.L D0,D1

Also, use two Addq’s or Subq’s when dealing with longs in the range of 9..16:

Don't Use                   Use                          Save

Addi.L #10,D0               Addq.L #2,D0                 4 cycles, 2 bytes
                            Addq.L #8,D0

The following three optimizations will reduce the size of your program but at the expense of a few cycles. This is good for user interface code, but you probably don’t want to use these optimizations in tight loops where speed is important:

Don't Use                   Use                          Save

Move.B #0,-(SP)             Clr.B -(SP)                  -2 cycles, 2 bytes
Move.W #0,-(SP)             Clr.W -(SP)                  -2 cycles, 2 bytes
Move.L #0,-(SP)             Clr.L -(SP)                  -2 cycles, 4 bytes

Most of the optimizations from here onward are only applicable in some cases. Many times you can use a slightly different version of the exact code given here to get an optimization that works well for your particular set of circumstances. These optimizations don’t always have the same set of side effects or overflow/underflow conditions that the original code has, so use them with caution.

Shifting left by 2 bits (to multiply by 4) should be avoided if you’re coding for speed:

Don't Use                   Use                          Save

Asl.W #2,Dx                 Add.W Dx,Dx                  2 cycles, -2 bytes
                            Add.W Dx,Dx

Use bytes for booleans instead of bits. They’re faster to access (and less code in some cases). If you have many booleans, though, bits may be the way to go because of reduced memory requirements (of the data, that is, not the code).

Don't Use                   Use                          Save

Btst #1,myBools(A6)         Tst.B aBool(A6)              4 cycles, 2 bytes
Btst #1,D0                  Tst.B D0                     6 cycles, 2 bytes

Avoid the use of multiply and divide instructions like the plague. Use shifts and adds for immediate operands or loops of adds and subtracts for variable operands. For instance, to multiply by 14 you could do this:

Don't Use                   Use                          Save

Mulu #14,D0                 Add D0,D0                    many cycles, -4 bytes
                            Move D0,D1
                            Lsl #3,D0
                            Sub D1,D0

If you have a variable source operand, but you know that it is typically small (and positive, for this example), then use a loop instead of a multiply instruction. This works really well in the case of a call to FixMul if you know one of the operands is a small integer -- you can avoid the trap overhead and the routine itself by using a loop similar to this one (in fact, the FixMul routine itself checks if either parameter is 1.0 before doing any real work):

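Don't Use                   Use                          Save

Mulu D1,D0                     Move D0,D2                many cycles, -8 bytes
                               Neg D2
                            @1 Add D0,D2
                               Subq #1,D1
                               Bcc.S @1

The loop has to run D1+1 times to cancel the initial Neg, which is why it branches on carry clear rather than on not-equal; this also makes a zero in D1 come out as zero. The 16-bit product ends up in D2 (not D0) and D1 is destroyed.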
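
For the FixMul case mentioned above, the same idea can be sketched with long adds. The register assignments are assumptions made for this example: D0.L holds the Fixed (16.16) value, D1.W holds the small non-negative integer as a plain count (not as a Fixed), and the result is returned in D2.L.

```asm
; D2.L = D0.L * D1.W by repeated addition; D1 is destroyed
        Moveq   #0,D2           ; running sum starts at 0.0
        Bra.S   @2              ; enter at the test so a count of 0 yields 0
@1      Add.L   D0,D2           ; add the Fixed value once per count
@2      Dbra    D1,@1           ; falls through once D1 has counted down past 0
```

If the integer arrives as a Fixed such as 3.0 ($00030000), a Swap D1 beforehand moves its integer part into the low word. This sketch makes no attempt to match FixMul's rounding or overflow behavior, so check those against your own needs before substituting it.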

Likewise, for division, use a subtract loop if you know that the quotient isn’t going to be huge (and if the destination fits in 16 bits):

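Don't Use                   Use                          Save

Divu D1,D0                     Moveq #0,D2               many cycles, -10 bytes
                               Bra.S @2
                            @1 Addq #1,D2
                               Sub D1,D0
                            @2 Cmp D1,D0
                               Bcc.S @1

The Cmp has to sit inside the loop: testing the flags left by the Sub would make the loop run once too often whenever there is a remainder. The quotient ends up in D2 and the remainder in D0. Unlike Divu, which traps on a zero divisor, this loop never terminates if D1 is zero.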

Don’t use Bsr/Rts in tight loops where speed is important. Put the return address in an unused address register instead.

Don't Use                   Use                          Save

   Bsr MyProc                  Lea @1,A0                 8 cycles, -4 bytes
   ;<blah>                     Bra MyProc
                            @1 ;<blah>

MyProc:                     MyProc:
   ;<blah blah>                ;<blah blah>
   Rts                         Jmp (A0)

You can eliminate a complete Bsr/Rts pair (or equivalent above) if the Bsr is the last instruction before an Rts by changing the Bsr to a Bra:

Don't Use                   Use                          Save

Bsr MyProc                  Bra MyProc                   24 cycles, 2 bytes
Rts

Don’t use BlockMove for moves of 80 bytes or less where you know the source and destination don’t overlap. The trap overhead and preflighting that BlockMove does make it inefficient for such small moves. Use this loop instead (assuming Dx > 0 on entry):

Don't Use                   Use                          Save

_BlockMove                     Subq #1,Dx                many cycles, -6 bytes
                            @1 Move.B (A0)+,(A1)+
                               Dbra Dx,@1

I base this conclusion on time trials done on a Mac IIci with a cache card. The actual results were (for several thousand iterations):

Figure 1: How fast do blocks move?

I did the same tests on a Mac SE and found that it was only beneficial to call BlockMove on that machine for moves of 130 bytes or more. However, since you should optimize for the lowest common denominator across all machines, you should only use the Dbra loop for non-overlapping moves of 80 bytes or less.

Be warned, though: on the Quadras, BlockMove has been modified to flush the 040 caches because of the possibility that you (or the memory manager) are BlockMoving executable code. So don’t use the above loop for moving small amounts of code (like you might do in some INIT installation code). Apple did this for compatibility reasons with existing non-040 aware applications running in 040 copy-back mode (high performance mode). However, because of this, your non-code BlockMoves are unnecessarily clearing the caches, too. I don’t know if it’s worth it to write a dedicated BlockMove for non-code moves, but it seems like it’s worth doing and then timing to see if there’s a difference.

Unroll loops. At the expense of a few extra bytes you can make any tight loop run faster. This is because short branch instructions that are not taken are faster than those that are taken. Here’s an even faster version of the above loop:

    Subq #1,Dx
@1  Move.B (A0)+,(A1)+
    Subq #1,Dx
    Bcs.S @2
    Move.B (A0)+,(A1)+
    Subq #1,Dx
    Bcs.S @2
    Move.B (A0)+,(A1)+
    Dbra Dx,@1
@2

Beware when using the above trick, though, because it doesn’t work for long branches. In that case, a taken branch is faster than a branch not taken.

Preserving pointers into relocatable blocks across code that moves memory: If you need to lock a handle because you’re going to call a routine that moves memory but the handle (and the dereferenced handle) isn’t a parameter to that routine, then you can usually avoid locking the handle with a trick (which has the desirable side effect of reducing memory fragmentation). Assume the handle is in A3 and the pointer into the middle of the block is in A2. All you really have to do is save/restore the offset into the block; you don’t care if the block moves or not:

Don't Use                   Use                          Save

Move.L A3,A0                Sub.L (A3),A2                many cycles, 4 bytes
_HLock
;<move memory>              ;<move memory>
Move.L A3,A0                Add.L (A3),A2
_HUnlock

If the end of a routine is executing the same set of instructions two or more times, then you may be able to use this trick to save some bytes (at the expense of a few cycles). If the end of the routine looks like a subroutine, then have it Bsr to itself, like this (this example is drawing a BCD byte in D3):

Don't Use                   Use                          Save

Ror #4,D3                      Ror #4,D3                 many bytes
Move.B D3,D0                   Bsr @1
And #$000F,D0                  Rol #4,D3
Add #'0',D0
Move D0,-(SP)
_DrawChar
Rol #4,D3
Move.B D3,D0                @1 Move D3,D0
And #$000F,D0                  And #$000F,D0
Add #'0',D0                    Add #'0',D0
Move D0,-(SP)                  Move D0,-(SP)
_DrawChar                      _DrawChar
Rts                            Rts

Use multiple entry points to set common parameters. Suppose you have a routine that takes a boolean value in D0 as an input and suppose you call this routine 20 times with the value of True and 30 times with the value of False. It would save code if you made two entry points that each set D0, and then branched to common code. For instance:

Don't Use                   Use                          Save

   St D0                       Bsr MyProcTrue            many bytes
   Bsr MyProc
   Sf D0                       Bsr MyProcFalse
   Bsr MyProc

                            MyProcTrue:
                               St D0
                               Bra.S MyProc
                            MyProcFalse:
                               Sf D0
MyProc:                     MyProc:
   ;<blah>                     ;<blah>
   Rts                         Rts

Clean up the stack with Unlk. If your routine already has a stack frame and you create some temporary data on the stack (in addition to the stack frame) then you don’t always need to remove it when you’re done with it -- the Unlk will clean it up for you. For instance, suppose you make a temporary Rect on the stack. You would normally remove it with Addq #8,SP but if it’s near the end of a function that does an Unlk, then leave the Rect there; it’ll be gone when the Unlk executes.
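
A minimal sketch of that situation, assuming a hypothetical routine MyProc with an arbitrary 4 bytes of locals in its stack frame:

```asm
MyProc:
        Link    A6,#-4          ; build the stack frame (4 bytes of locals, size is arbitrary)
        ;<blah>
        Subq    #8,SP           ; make room for a temporary Rect (8 bytes)
        ;<fill in the Rect at (SP) and use it>
        ;<no Addq #8,SP here>
        Unlk    A6              ; SP is reloaded from A6, so the Rect and the locals go together
        Rts
```

This only holds if nothing else is popped off the stack between the temporary and the Unlk. If the routine restores saved registers with Movem.L (SP)+ just before the Unlk, the Rect still has to be removed first so that SP points at the saved registers.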

Well, hopefully Doo-Dah has many more learned disciples now. Don’t forget to sacrifice a copy of FullWrite in his honor at least once a year. That makes him happy.

P.S. If you want even more 68000 optimizations, there is an excellent article by Mike Morton in the September 1986 issue of Byte magazine called “68000 Tricks and Traps” (pgs. 163-172). It has more than half a dozen tricks that are not covered here (sorry for not listing them, but I didn’t want to get sued for plagiarism).

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.