TweetFollow Us on Twitter

Jun 94 Challenge
Volume Number:10
Issue Number:6
Column Tag:Programmers’ Challenge
!seealso: "May 94 Challenge" " Jul 94 Challenge"

Programmers’ Challenge

By Mike Scanlin, MacTech Magazine Regular Contributing Author

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.

The rules

Here’s how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners up). The prize for the best solution each month is $50 and a limited edition “The Winner! MacTech Magazine Programming Challenge” T-shirt (not to be found in stores).

In order to make fair comparisons between solutions, all solutions must be in ANSI compatible C (i.e., don’t use Think’s Object extensions). Only pure C code can be used. Any entries with any assembly in them will be disqualified (except for those challenges specifically stated to be in assembly). However, you may call any routine in the Macintosh toolbox you want (i.e., it doesn’t matter if you use NewPtr instead of malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with ANSI Settings plus “Honor ‘register’ first” and “Use Global Optimizer” turned on) so beware if you optimize for a different C compiler. All code should be limited to 60 characters wide. This will aid us in dealing with e-mail gateways and page layout.

The solution and winners for this month’s Programmers’ Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.

All solutions should be marked “Attn: Programmers’ Challenge Solution” and sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or preferably, e-mail - AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com, CompuServe: 71552,174 and America Online: MT PRGCHAL. If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on “How to Contact Xplain Corporation.”

MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month. Authors grant MacTech Magazine the non-exclusive right to publish entries without limitation upon submission of each entry. Copyrights for the code are retained by the author.

FACTORING

Being able to factor quickly is an important part of breaking secret codes, I mean, writing cool Mac games. This month’s challenge, therefore, is to factor a 64-bit number into the two primes that were multiplied together to produce it.

The prototype of the function you write is:


/* 1 */
void Factor64(lowHalf, highHalf
 prime1Ptr, prime2Ptr)
unsigned long lowHalf;
unsigned long highHalf;
unsigned long *prime1Ptr;
unsigned long *prime2Ptr;

highHalf and lowHalf are the 64-bit input number split into two pieces (bit zero of lowHalf is bit 0 of the input number and bit 31 of highHalf is bit 63 of the input number). The input number is guaranteed to be the product of two primes, each of which is 32 bits or less. Your routine will store one prime at *prime1Ptr and the other one at *prime2Ptr (in either order).

Remember, solutions must be in C to qualify for entry into the Challenge but assembly versions might get mentioned if they’re wicked fast. Also, if anyone has a nice routine for factoring even larger numbers (like, say, 256-bit numbers) into composite primes and wouldn’t mind sharing it with MacTech readers then send it on in. The best one might get published along with the winning solution.

TWO MONTHS AGO WINNER

The competition for the Swap Blocks challenge was unusually tough. There were several very high quality entries. Congratulations to Bill Karsh (Chicago, IL) for winning with the fastest entry. It was only last month that I declared Bob Boonstra (Westford, MA) the Programmer Challenge Champion for having the most number of first place showings but now he and Bill are tied for that elusive title (with three wins each). Jorg Brown (San Francisco, CA) deserves praise for his second place showing. His code size was just over half of Bill’s winning solution and was nearly as fast.

Here are the code sizes and times for two different tests. The first time test was for random size inputs (according to the distribution stated in the problem). The second time test was for blocks that were roughly, but not exactly, equal in size (again, with the given distributions but with both sizes coming from the same size category). Numbers in parens after a person’s name indicate how many times that person has finished in the top 5 places of all previous Programmer Challenges, not including this one:

Name time 1 time 2 code size

Bill Karsh (3) 170 219 642

Jorg Brown 174 242 366

Jim Lloyd 209 408 1642

Lorn Olsen 239 350 670

Ted Krovetz 243 247 88

Stepan Riha (6) 243 347 452

Bob Boonstra (8) 247 443 480

Jeffry Spain 248 397 234

Greg Landweber (1) 264 491 300

Martin Weiss 281 601 210

Christopher Suley 299 321 110

Dave Darrah 299 681 284

Ernst Munter 315 414 632

Xan Gregg 340 1260 484

Michael Anderson 359 942 156

Allen Stenger (5) 393 436 156

Michael Panchenko 409 465 82

Danny Stevenson 449 583 424

Eric Bennett 493 1478 284

Arnold Woodworth 595 729 206

Bob Boonstra 212 418 400

(assembly)

The SwapBytes problem is really a multi-byte rotate problem. Think about it this way: If you had a 32-bit register and you wanted to swap the low 7 bits with the upper 25 bits you could just rotate it 7 bit positions to the right. The rotate instruction is like a SwapBits operation where size1 + size2 always equals 32.

Almost everyone who entered used a variant of this observation. The fifth place entry by Ted Krovetz (Santa Cruz, CA) illustrates it nicely:


/* 2 */
void SwapBlocks (void *p1, void *p2,
 void *swapPtr, ulong size1,
 ulong size2, ulong swapSize)
{
 long *lp1 = (long *)p1;
 long *lp2 = (long *)p2;  
 ulong s1 = size1 >> 2;
 ulong s2 = size2 >> 2;
 ulong count;
 long temp, *tempp1, *tempp2;
 
 do {
 if (s1 < s2) {
 count = s1;
 tempp1 = lp1;
 s2 -= s1;
 tempp2 = lp2 + s2;
 }
 else {
 count = s2;
 tempp1 = lp1;
 tempp2 = lp2;
 lp1 += s2;
 s1 -= s2;
 }
 do {
 temp = *tempp1;
 *(tempp1++) = *tempp2;
 *(tempp2++) = temp;
 } while (--count);
 } while (s1);
}

Because Bill’s winning solution is so general purpose and macro-ized it is not the easiest code to read (although I commend his generality in making a useful piece of reusable and portable code). He has compile-time flags that let you build a large fast version (over 600 bytes, which was the version timed) or a small slower version (less than 100 bytes). And you can optionally change the 4 byte alignment assumption into a 2 byte or 1 byte alignment assumption (by redefining AtomSize).

I used Think C’s preprocessor command to see what all those #defines would boil down to. The core swap code for those cases where you can’t use the temporary swap space (cause it’s too small) ends up looking like this:


/* 3 */
switch( (short)q ) {
case 0:
 while( --nS ) {
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 7:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 6:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 5:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 4:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 3:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 2:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
case 1:
 q = *pL;
 *pL++ = *pR;
 *pR++ = q;
 } /* end while */
}; /* end switch */

This illustrates some interesting loop unrolling syntax that’s possible in C. As the code shows, it’s legal to spread a while statement over several case labels in a switch statement. Which nicely solves the problem of “How do you handle the remainder?” when you unroll a loop 8 times. In this example nS is the number of times to swap divided by 8 and q is numTimesToSwap mod 8. So if numTimesToSwap is 10 then q is 2 and nS is 1. When the switch statement is executed it will branch to case 2 which does 2 swaps and then loops back to the top of the while loop. It runs through one set of 8 swaps and then stops. Pretty cool syntax.

Here’s Bill’s winning solution:

SwapBlocks

Response to Apr 94 MacTech Programmer's Challenge.

by Bill Karsh

Object: Exchange contents of two adjacent memory blocks.

Redirection: This is an interesting problem, but what would make this guy really useful? As stated, the blocks for the challenge are 4i bytes long and start on 4j aligned addresses. These are special circumstances which apply to Memory Manager blocks, and then, only on 68020 or later cpu's. Memory blocks on the 68000 are merely even aligned and even length. Further, this could be a word processor tool for swapping runs of bytes, but we would have to relax the alignment and size restrictions even further to arbitrary address and length since we would almost always be pointing to characters interior to a handle.

I have written the routine to give its best performance, subject to a specified minimum enforced alignment and atom size (smallest unit to move). This is controlled at compile time by:


/* 4 */
typedef long  Atom, for len = 4i, addr = 4j,
typedef short Atom, for len = 2i, addr = 2j,
typedef Byte  Atom, for len = any, addr = any.

Note - due to an ancient law of portability, preprocessor directives are not allowed to compare enums, types, sizeof()s or anything else that has machine dependency hidden in it. This means you have to #define the AtomSize manually. This is needed to select the proper performance crossover points for that type.

But wait there’s more... You might not tolerate a 644 byte dedicated word swapper in your text editor, but a 96 byte one might fit. We handle that.

You can tailor the routine to your requirements for execution speed vs. code size by setting the JobMode constant according to this table:

JobMode Buffers MonsterCopies MonsterSwaps

Smallest No No No

Small No No Yes

Fast Yes No Yes

Fastest Yes Yes Yes

- billKarsh


/* 5 */
#pragma options( honor_register, !assign_registers )

#defines
#define Smallest                0
#define Small                   1
#define Fast                    2
#define Fastest                 3
User Selectable Parameters

/* 6 */
#define JobMode                 Fastest
#define Verify_p1_LowerThan_p2  0

Sorry, you must #define your chosen Atom’s size by hand. The preprocessor won’t accept sizeof operators. Yuck! The XOvers below vary according to this size, so we have to know it.


/* 7 */
typedef longAtom;
#define AtomSize 4


#if JobMode >= Fast
#define UseBuffer1
#endif
#if JobMode == Fastest
#define MonsterCopy1
#endif
#if JobMode >= Small
#define MonsterSwap1
#endif


#define Lo3B0x00ffffff


#if AtomSize == 4
#define FwdXOver            144
#define BckXOver            120
#define SwpXOver            44
#elif AtomSize == 2
#define FwdXOver            48
#define BckXOver            44
#define SwpXOver            32
#else
#define FwdXOver            24
#define BckXOver            20
#define SwpXOver            12
#endif

FwdOp
#define FwdOp                                        \
 *dst++ = *src++

BckOp
#define BckOp                                        \
 *--pR = *--pL

SwpOp
#define SwpOp                                        \
 q     = *pL;                                       \
 *pL++ = *pR;                                       \
 *pR++ = q

Cases3_1
#define Cases3_1( op )                               \
 case 3:     op;                                    \
 case 2:     op;                                    \
 case 1:     op

Cases7_1
#define Cases7_1( op )                               \
 case 7:     op;                                    \
 case 6:     op;                                    \
 case 5:     op;                                    \
 case 4:     op;                                    \
 Cases3_1( op )

CalcPasses
#define CalcPasses( bits )                           \
 nS /= sizeof(Atom);                                \
 q = nS & ((1 << bits) - 1);                        \
 nS >>= bits;                                       \
 ++nS

Monster
#define Monster( op, cases )                         \
 switch( (short)q ) {                               \
 case 0:                                          \
 while( --nS ) {                                \
 op;                                          \
 cases( op );                                 \
 }                                              \
 }

CopyInc
#if MonsterCopy == 1
#define CopyInc( dst, src, n )                     \
 nS = n;                                           \
 if( nS > FwdXOver ) {                             \
 _CopyInc(                                       \
  (Atom*)(dst), (Atom*)(src), nS );              \
 }                                                 \
 else {                                            \
 pL = (Atom*)(dst);                              \
 pR = (Atom*)(src);                              \
 do { *pL++ = *pR++; } while(nS-=sizeof(Atom));  \
 }
#else
#define CopyInc( dst, src, n )                      \
 nS = n;                                            \
 pL = (Atom*)(dst);                                 \
 pR = (Atom*)(src);                                 \
 do { *pL++ = *pR++; } while(nS-=sizeof(Atom))
#endif

CopyDec
#if MonsterCopy == 1
#define CopyDec( dst, src, n )                      \
 nS = n;                                            \
 pR = (Atom*)((Byte*)(dst) + nS);                   \
 pL = (Atom*)((Byte*)(src) + nS);                   \
 if( nS > BckXOver ) {                              \
 CalcPasses( 2 );                                 \
 Monster( BckOp, Cases3_1 );                      \
 }                                                  \
 else {                                             \
 do { BckOp; } while(nS-=sizeof(Atom));           \
 }
#else
#define CopyDec( dst, src, n )                      \
 nS = n;                                            \
 pR = (Atom*)((Byte*)(dst) + nS);                   \
 pL = (Atom*)((Byte*)(src) + nS);                   \
 do { BckOp; } while(nS-=sizeof(Atom))
#endif

Swap
#if MonsterSwap == 1
#define Swap                                        \
 if( nS > SwpXOver ) {                              \
 CalcPasses( 3 );                                 \
 Monster( SwpOp, Cases7_1 );                      \
 }                                                  \
 else {                                             \
 do { SwpOp; } while(nS-=sizeof(Atom));           \
 }
#else
#define Swap                                        \
 do { SwpOp; } while(nS-=sizeof(Atom))
#endif


#define MacroMania              true


#if JobMode == Fastest

_CopyInc
Copy specified number of Bytes from src to dst.  Addresses are incremented, 
so src and dst can overlap iff dst <= src.
 
static void _CopyInc(
 register Atom           *dst,
 register const Atom     *src,
 register unsigned long  nS )
{
 short  q, pad;
 
 CalcPasses( 3 );
 Monster( FwdOp, Cases7_1 );
}
#endif

SwapBlocks

void SwapBlocks(
 void           *p1,
 void           *p2,
 void           *swapPtr,
 unsigned long  size1,
 unsigned long  size2,
 unsigned long  swapSize )
{
 register Atom   *pL, *pR, *p0;
 register long   nL, nR, nS, q;
 Boolean         done;
 short           pad;
 
 if( !(nL = size1) || !(nR = size2) ) return;
 
 p0 = p1;

If you can safely assume that p1 is always lower or same as p2, define Verify_p1_LowerThan_p2 = 0 (the #if section is not necessary).

If the “1” and “2” in p1 and p2 are simply labels, indicating nothing about position in memory of the blocks, then you must order them by activating the #if section. Define Verify_p1_LowerThan_p2 = 1.

Ordering means comparing addresses, which treats them as 32-bit numbers, no matter the current cpu addressing mode. If GetMMUMode returns true, we are in 32-bit mode - all 32-bits are significant.

In 24-bit mode, when the cpu uses an address to load or store something, it totally ignores the high-byte of the address. The high-byte may be random garbage. In this mode we suppress any garbage before comparing by masking it to zero.


/* 8 */
#if Verify_p1_LowerThan_p2 == 1

 pR = p2;
 
 if( !GetMMUMode() ) {
 p0 = (Atom*)((long)p0 & Lo3B);
 pR = (Atom*)((long)pR & Lo3B);
 }
 
 if( pR < p0 ) {
 q  = (long)p0;
 p0 = pR;
 p2 = (Atom*)q;
 
 q  = nL;
 nL = nR;
 nR = q;
 }
#endif

First, make use of buffer if we can. This is faster in most cases. A notable exception is equal size case which is best done in situ (let drop through).

Compare only the smaller size with buffer. If left is smaller, we can use post-increment addressing which is the faster mode. If right is smaller, use pre-decrement mode. We omit seeing if right-smaller will work with post-increment mode (if left also fits buffer). Preflighting overhead swallows us up very quickly.


/* 9 */
Buffer?
#if UseBuffer == 1

 if( nL < nR ) {
 if( nL <= swapSize ) {
 CopyInc( swapPtr, p0, nL );
 CopyInc( p0, p2, nR );
 CopyInc( (Byte*)p0 + nR, swapPtr, nL );
 return;
 }
 }
 else if( nL > nR ) {
 if( nR <= swapSize ) {
 CopyInc( swapPtr, p2, nR );
 CopyDec( (Byte*)p0 + nR, p0, nL );
 CopyInc( p0, swapPtr, nR );
 return;
 }
 }
#endif

This algorithm always does the job, buffer or not.

Find the smaller block. Swap it immediately into its final place. Now the larger block is in two out-of-order, but contiguous pieces. Wait a minute, this is what we started with! The only differences are: now the sizes are {smaller, larger - smaller}, and the start addresses have to keep up with the new pieces.

We repeat until the two pieces were the same length. In other words, the final swap didn’t break anybody in two. This can end with sizes larger than Atom-Atom. It depends on whether the smaller evenly divides the larger.


/* 10 */
In Situ
 done = false;

 do {
 
 pL = p0;
 pR = p2;

 if( nL < nR ) {
 nR = nR - nL;
 pR = (Atom*)((Byte*)pR + nR);
 nS = nL;
 }
 else if( nL > nR ) {
 p0 = (Atom*)((Byte*)pL + nR);
 nL = nL - nR;
 nS = nR;
 }
 else {
 nS = nL;
 done = true;
 }
 
 Swap;
 
 } while( !done );
}
 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Latest Forum Discussions

See All

Avatar Legends: Realms Collide pre-regis...
Despite the staying power that the series has, Avatar: The Last Airbender surprisingly fumbled in the mobile gaming sphere, with Generations shuttering about a mere year after launch. Now, Tilting Point is giving it a go, with the upcoming Avatar... | Read more »
Return to the glory days of tactical RPG...
Back on the old Sega Mega Drive, or Genesis depending on where you are, there was a little series of tactical role-playing games called Shining Force, and I adored them. It started a love for this grid-based genre that persisted through the years... | Read more »
Take on the grandest beasts of all as th...
It has been hyped for a while but now it is finally here, the new Dancing in the Tempest season has arrived in Monster Hunter Now. Kicking off a banner summer for the game, it is time to strap on your boots and face the greatest threat yet, as the... | Read more »
The indie hit Vault of the Void will lan...
In a market where a lot of AAA games are starting to feel like reboots, remakes, or the same thing we’ve seen ten times with a different name slapped on it, indie developers are a boon. Especially ones who build a successful game singlehandedly,... | Read more »
Stumble Guys tries to catch up to the hy...
Do you remember when the Fallout TV series launched on Amazon and everyone nearly lost their minds? Well, it appears that Scopely missed that particular craze, and only now are we going to get a Fallout crossover, and by now, I mean in version 0.... | Read more »
Bid farewell to Penacony as Honkai: Star...
Penacony has been a story of twists, exciting new characters, and strong allies, and soon Honkai: Star Rail will be finishing it with a bang. Version 2.3, fittingly titled Farewell Penacony, will be launching June 19th and will feature updates to... | Read more »
HoYoverse roll out their plans for Anime...
For those who are looking to book a getaway in July, you might give some thought to Los Angeles between the 4th and 7th, which just so happens to coincide with the Anime Expo 2024. Amongst all the storied attendees is HoYoverse, who will be... | Read more »
The first rule of Brok the InvestiGator...
Mobile gamers were recently able to get their hands on BROK the InvestiGator, a point-and-click following the adventures of the titular reptile, a detective who can solve crimes through wit or brawn. If you were one that chose the latter then... | Read more »
Diablo Immortal celebrates second annive...
It has been two years since Diablo Immortal launched and despite some very valid criticism of its business model, it has done pretty well for itself. The Tempest class also gives it a lot of grace. To celebrate this anniversary, the March of the... | Read more »
Pokemon GO pulls on its jersey for a foo...
There have been a lot of jokes about this, some by me, but Pokemon Go has genuinely done a lot of good by getting people out and about.Pokemon GO Fest 2024: Madrid is fast approaching, and Niantic has set up a new area in a bit to get people to... | Read more »

Price Scanner via MacPrices.net

Apple Watch Ultra Watch 2 on sale for $719, s...
Amazon is offering an $80 discount on every Apple Watch Ultra 2 model this week. Their price is now $719. Shipping is free. For the latest prices & deals, keep an eye on our Apple Watch Price... Read more
New sale at Amazon: 16-inch M3 Pro and M3 Max...
Amazon is offering instant discounts on 16″ M3 Pro and 16″ M3 Max MacBook Pros ranging up to $350 off MSRP. Shipping is free. These are the lowest prices currently available for new 16″ Apple MacBook... Read more
Get a 13-inch M2 MacBook Air today at Apple f...
Apple has 13″ M2 MacBook Airs available for only $849 today in their Certified Refurbished store. These are the cheapest M2-powered MacBooks for sale at Apple. Apple’s one-year warranty is included,... Read more
Clearance Mac Studio with M1 Max CPU availabl...
Apple has clearance M1 Max Mac Studios available in their Certified Refurbished store for $270 off original MSRP. Each Mac Studio comes with Apple’s one-year warranty, and shipping is free: – Mac... Read more
Apple has 24-inch M3 iMacs on sale for $200-$...
Apple has a full line of 24-inch M3 iMacs available in their Certified Refurbished store starting at $1099 and ranging up to $260 off original MSRP. Each iMac is in like-new condition and comes with... Read more
24-inch M1 iMacs are available at Apple start...
Apple has clearance M1 iMacs available in their Certified Refurbished store starting at $1049 and ranging up to $300 off original MSRP. Each iMac is in like-new condition and comes with Apple’s... Read more
Back to School savings: Take $50-$100 off new...
Apple will take $50-$100 off new 11″ and 13″ M2 iPad Airs for all teachers, students, and staff of any educational institution with a .edu email address as part of their Apple Education discount,... Read more
Could A Smarter Siri Infused With AI (‘Apple...
FEATURE – The iPhone is already smart, but it’s about to become more intelligent. AI — short for artificial intelligence — is widely expected to be the main topic of discussion at this year’s WWDC (... Read more
Update: For WWDC, Amazon has lowered prices o...
Amazon has every configuration and color of Apple’s M3 MacBook Airs now on sale for $170-$210 off MSRP, starting at only $899 shipped, as Apple holds their annual WWDC conference this week. Their... Read more
Deal Alert! 2nd-generation Apple AirPods on s...
Amazon has 2nd generation Apple AirPods on sale right now for only $79.99 shipped. That’s $50 (38%) off Apple’s MSRP. Their price is the lowest currently available for a new set of AirPods from any... Read more

Jobs Board

Beauty Consultant - *Apple* Blossom Mall -...
Beauty Consultant - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Senior Software Engineer - *Apple* Fundamen...
…center of Microsoft's efforts to empower our users to do more. The Apple Fundamentals team focused on defining and improving the end-to-end developer experience in Read more
Sublease Associate Optometrist- *Apple* Val...
Sublease Associate Optometrist- Apple Valley, CA- Target Optical Date: Jun 17, 2024 Brand: Target Optical Location: Apple Valley, CA, US, 92307 **Requisition Read more
Rehabilitation Medicine Technician - *Apple*...
Rehabilitation Medicine Technician - Apple Hill (Outpatient Clinic) - Day/Evening Location: York Hospital, York, PA Schedule: Part Time Sign-On Bonus Eligible Read more
Operations Associate - *Apple* Blossom Mall...
Operations Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.