Nov 93 Challenge

Volume		9
Number		11
Column Tag		Programmers’ Challenge

Programmers’ Challenge

By Mike Scanlin, MacTech Magazine Regular Contributing Author

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.

WHO PLAYS WHO?

Thanks to Kevin Cutts (location unknown) for suggesting this month’s challenge. The goal is to match up teams for the annual MacTech Bean Counting contest where there are half as many playing areas as there are teams. Each team needs to play every other team exactly once. (And they don’t want to wait all day for their schedule to be generated!)

The input is the number of teams, a list of team names and a list of playing area names. The number of teams will be an even number less than 25 and the number of playing areas will be half of the number of teams. Your routine writes its output to an already-open file, describing who plays who on which playing area at what time. Each bean counting match takes 10 minutes to play, so you can schedule a match every 15 minutes on each playing area. The events don’t start until noon so that everyone involved has time to sleep in before their big day.

The prototype of the function you write is:

void ScheduleMatches(numTeams,
 teamNames, playingAreaNames,
 outputFile)
unsigned short numTeams;
Str255  teamNames[];
Str255  playingAreaNames[];
FILE    *outputFile;

The outputFile will be open and empty when your routine is called. You write to the file with the standard C fprintf routine, for example fprintf(outputFile, "Here is some output text.\n"). Do not close the file when your routine exits (the caller will close it since the caller opened it).

The format of the output is up to you. It should be intelligible, though. Don’t skimp on output readability to save a few cycles of time.

The input team names and playing area names are Pascal strings that take 256 bytes each (length byte included). These arrays are read-only; if you want to convert them to C strings then you’ll have to copy them somewhere first. Don’t worry about the special formatting requirements of long strings; I will be testing with fairly small strings.
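Since the arrays are read-only, any conversion has to copy into a buffer you own. The helper below is a minimal sketch of that copy; the name CopyPascalToC is hypothetical, and it assumes the destination buffer holds at least 256 bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy a Pascal string (length byte
   followed by the text) into a caller-supplied C buffer.
   Assumes cstr has room for at least 256 bytes. */
void CopyPascalToC(const unsigned char *pstr, char *cstr)
{
    size_t len = pstr[0];       /* length byte, 0..255 */

    assert(cstr != NULL);
    memcpy(cstr, pstr + 1, len);
    cstr[len] = '\0';           /* add the C terminator */
}
```

If all you need is to print a name, you can skip the copy and pass the length to fprintf directly, as in fprintf(outputFile, "%.*s", (int)name[0], (const char *)name + 1), where name is one of the input Str255 values.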

Here is some sample input:

numTeams = 4;
teamNames[0] = "\pCycleStealers";
teamNames[1] = "\pBeanies";
teamNames[2] = "\pRiscTakers";
teamNames[3] = "\pGiraffeButts";
playingAreaNames[0] = "\pField 1";
playingAreaNames[1] = "\pField 2";

and suggested output format:

12:00
Field 1: CycleStealers vs. Beanies
Field 2: RiscTakers vs. GiraffeButts

12:15
Field 1: CycleStealers vs. RiscTakers
Field 2: Beanies vs. GiraffeButts

12:30
Field 1: CycleStealers vs. GiraffeButts
Field 2: Beanies vs. RiscTakers
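For readers who want a starting point, one well-known way to build such a schedule is the round-robin "circle method": hold one team fixed and rotate the others, which gives numTeams - 1 rounds in which every pair meets exactly once. The sketch below is an illustration only, not a contest entry. It assumes C strings rather than the Pascal strings of the real prototype, and the names RoundPairs, WriteSchedule and ScheduleIsValid are hypothetical. The order of matches it produces will differ from the sample above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TEAMS 24

/* Fill a[] and b[] with the n/2 pairings of one round.
   Circle method: team n-1 stays put, the rest rotate. */
void RoundPairs(int n, int round, int a[], int b[])
{
    int m = n - 1, k;

    assert(n >= 2 && n <= MAX_TEAMS && n % 2 == 0);
    a[0] = m;
    b[0] = round;
    for (k = 1; k < n / 2; k++) {
        a[k] = (round + k) % m;
        b[k] = (round - k + m) % m;
    }
}

/* Write one round every 15 minutes, starting at noon. */
void WriteSchedule(FILE *out, int n,
    const char *teams[], const char *areas[])
{
    int a[MAX_TEAMS / 2], b[MAX_TEAMS / 2];
    int round, k;

    for (round = 0; round < n - 1; round++) {
        int t = 12 * 60 + 15 * round;   /* minutes past midnight */

        RoundPairs(n, round, a, b);
        fprintf(out, "%d:%02d\n", t / 60, t % 60);
        for (k = 0; k < n / 2; k++)
            fprintf(out, "%s: %s vs. %s\n",
                areas[k], teams[a[k]], teams[b[k]]);
        fprintf(out, "\n");
    }
}

/* Return 1 if every pair of teams meets exactly once. */
int ScheduleIsValid(int n)
{
    int met[MAX_TEAMS][MAX_TEAMS];
    int a[MAX_TEAMS / 2], b[MAX_TEAMS / 2];
    int round, k, i, j;

    memset(met, 0, sizeof met);
    for (round = 0; round < n - 1; round++) {
        RoundPairs(n, round, a, b);
        for (k = 0; k < n / 2; k++) {
            met[a[k]][b[k]]++;
            met[b[k]][a[k]]++;
        }
    }
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (met[i][j] != (i != j))
                return 0;
    return 1;
}
```

The times are printed on a 24-hour clock, so with 24 teams the last of the 23 rounds starts at 17:30.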

TWO MONTHS AGO WINNER

It would appear that the 10 or more people who wrote to me and requested an assembly language challenge were either (1) kidding, (2) all on vacation during the last month, or (3) unable to cope with moving bits in assembly language, because I only received 3 entries to the BlockMoveBits assembly language challenge. And only one of them gave correct results. Congratulations to Bob Boonstra (Westford, MA) for (1) entering, (2) having correct code and (3) winning. Bob’s code would have an excellent chance at winning even with more competition because it is very efficient indeed. Bob recently won the Where In The World? challenge, too, so this is his second win (the second two-time winner to date; there are no 3-time winners at this point). Well done!

Compliments to Kevin Cutts for having the guts to enter C code in an assembly language contest. Despite the fact that his code was 690 bytes and used over 400 bytes of static lookup table data (compared to Bob’s 166 bytes with no tables) his times were within a respectable 10% of Bob’s. Correctness, however, is key and Kevin’s routine gave occasional bogus results so I had to disqualify it (be sure to try all 64 combinations of source and destination bit offsets; each can range from 0 to 7).
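One way to run that kind of check is to compare a routine against a slow, one-bit-at-a-time reference for every offset pair. The sketch below is an illustration, not the test code used for the contest. It assumes that bit offset 7 names the most significant bit of a byte and that bits are moved from there toward the least significant bit. The names RefMoveBits, BitMover and CountOffsetMismatches are hypothetical, and the parameter types are simplified from the contest prototype.

```c
#include <assert.h>
#include <string.h>

typedef void (*BitMover)(const unsigned char *src,
    unsigned char *dst, unsigned srcBitOffset,
    unsigned dstBitOffset, unsigned bitCount);

/* Slow reference: move one bit at a time.
   Assumes offset 7 is the MSB of the first byte. */
void RefMoveBits(const unsigned char *src,
    unsigned char *dst, unsigned srcBitOffset,
    unsigned dstBitOffset, unsigned bitCount)
{
    unsigned long s = 7 - srcBitOffset; /* MSB-first index */
    unsigned long d = 7 - dstBitOffset;

    while (bitCount--) {
        unsigned bit = (src[s >> 3] >> (7 - (s & 7))) & 1;
        unsigned char mask = (unsigned char)(0x80 >> (d & 7));

        if (bit)
            dst[d >> 3] |= mask;
        else
            dst[d >> 3] &= (unsigned char)~mask;
        s++;
        d++;
    }
}

/* Count the offset pairs (out of 64) where the candidate
   leaves the destination different from the reference. */
int CountOffsetMismatches(BitMover candidate,
    unsigned bitCount)
{
    unsigned char src[64], expect[64], got[64];
    unsigned so, dof, i;
    int bad = 0;

    /* keep every access inside the 64-byte buffers */
    assert(bitCount <= 8 * (sizeof src - 1));
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 37 + 11);
    for (so = 0; so < 8; so++) {
        for (dof = 0; dof < 8; dof++) {
            memset(expect, 0xA5, sizeof expect);
            memset(got, 0xA5, sizeof got);
            RefMoveBits(src, expect, so, dof, bitCount);
            candidate(src, got, so, dof, bitCount);
            if (memcmp(expect, got, sizeof got) != 0)
                bad++;
        }
    }
    return bad;
}
```

Filling the destination with a known pattern first means the comparison also catches a routine that disturbs bits outside the range it was asked to move.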

MAIL BAG

Recently I received a letter from a MacTech reader which said, in part:

“I DO object to the programming contest though. It rewards convoluted, hard to maintain code at the expense of speed and size. In the real world the former is MUCH more important. Programs should be as small and as fast as they can be WITHOUT sacrificing understanding.”

While I agree with this sentiment to some extent, it is my personal opinion that a large number of today’s applications suffer from performance problems. And I don’t think it’s the hardware that is lacking. I think intelligently written apps that do things like pre-compute data, cache data, use smart data structures and algorithms, and take advantage of specific processor tricks are doing their users a favor. I know that my mom, who is not a sophisticated user at all, gets frustrated when simple things like changing the font or margins of her 20 page letter on her Mac Classic take longer than a few seconds (“I thought these computer thingies were supposed to be fast?”). There’s no reason why simple operations have to take so long. Optimizing data structures, algorithms and individual C statements is an important part of competing in the application market.

The purpose of this column is to help people see what kinds of tricks and speedups are possible for those places where you need them. You don’t have to write 100% totally, absolutely, perfectly efficient code all of the time (although some people do and my hat is off to them); you only have to do that in the roughly 25% of your application that does all the real work. Also, remember that this column is, after all, a game, and measuring cycles and bytes is much more objective and fair than something open to interpretation like a “code maintainability” criterion.

Having said that, we can take a look at another type of letter I received recently...

DIVIDE BY 15 TRICK

Frequent Challenge player Gerry Davis writes to me with a non-obvious trick to do a faster integer divide by 15:

This code:

long i;
// j must be unsigned to catch overflow
unsigned long j;

j = i/15;

is faster as:

j =((i+((i+((i+((i+((i+((i+((i+
 (i>>4)+1)>>4)+1)>>4)+1)>>4)+1)>>4)+
 1)>>4)+1)>>4)+1)>>4);

This is about 5.5 times faster on a 68000 and 1.2 times faster on a 68020. It adds about 50 bytes of code on the 020, but on the 68000 the code necessary for a long divide is a lot more than this. You can remove some of the iterations to do short integers as well.

Thanks, Gerry. I tested it on my Quadra 700 and found that your version is 48 bytes and is about 1.4 times faster on the 040 than the chip’s built-in long divide instruction.
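The trick rests on the series 1/15 = 1/16 + 1/256 + ..., with the +1 in each step correcting for truncation. Before relying on a trick like this, it is worth checking it against a real divide. The sketch below rewrites the nested expression as an equivalent seven-pass loop and counts disagreements over a range; the names Div15Shift and CountDiv15Mismatches are hypothetical. It uses unsigned values and deliberately limits the range, because the intermediate sum i + x + 1 can overflow 32 bits for values near the top of the range.

```c
#include <assert.h>

/* Shift-and-add version of i / 15, written as a loop.
   Each pass refines x using x = (i + x + 1) / 16. */
unsigned long Div15Shift(unsigned long i)
{
    unsigned long x = i >> 4;   /* first guess: i / 16 */
    int pass;

    for (pass = 0; pass < 7; pass++)
        x = (i + x + 1) >> 4;
    return x;
}

/* Hypothetical checker: count the values in [0, limit]
   where the shift version disagrees with a real divide. */
unsigned long CountDiv15Mismatches(unsigned long limit)
{
    unsigned long i, bad = 0;

    /* keep i + x + 1 well away from 32-bit overflow */
    assert(limit <= 0x0FFFFFFFUL);
    for (i = 0; i <= limit; i++)
        if (Div15Shift(i) != i / 15)
            bad++;
    return bad;
}
```

A call such as CountDiv15Mismatches(1000000UL) should return 0. Negative inputs and values near the overflow limit are outside what this sketch checks.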

Does anyone else have any similar special case optimizations they’d like to share? Send them in!

Here’s Bob’s winning solution:

/*
** BlockMoveBits by Bob Boonstra
**
** Solution strategy:
**   Use 68030 bit field manipulation instructions
**     rather than shifting and masking.
**   Accomplish move in three steps, where the first step
**     aligns destination to longword, second step uses
**     BFEXTU/MOVE.L combination instead of BFEXTU/BFINS to
**     move bulk of the bits, and third step cleans up.
**   Special case when srcBitOffset==destBitOffset,
**     allowing main loop to use MOVE.L (x)+,(y)+
**
** Relative execution times for various strategies:
** 100: Straightforward BFEXTU/BFINS in 32-bit chunks,
**  70: byte-align src and MOVE.L/BFINS in main loop,
**  58: byte-align dst and BFEXTU/MOVE.L in main loop,
**  50: long-aligned dst and BFEXTU/MOVE.L in main loop,
**  29: as above, if srcOffset==dstOffset use one MOVE.L
*/

/* some register definitions for readability */
#define bitCt     d2
#define srcOffset d6
#define dstOffset d7
#define srcPtr    a0
#define dstPtr    a1

void BlockMoveBits(char *srcBytePtr, char *destBytePtr, 
  unsigned char srcBitOffset, unsigned char destBitOffset, 
  unsigned short bitCount)
{
  asm 68030 {
  
; save registers

    MOVEM.L   d6-d7,-(a7)
    
; exit if no bits to move

    MOVEQ     #0,bitCt
    MOVE.W    bitCount,bitCt
    
; get params into registers

    MOVE.L    srcBytePtr,srcPtr
    MOVE.L    destBytePtr,dstPtr
    MOVE.B    srcBitOffset,d1
    MOVEQ     #0,d0
    MOVE.B    destBitOffset,d0
    
; calculate srcOffset and dstOffset in
;   bit field manipulation coordinates 
;   (bit 0 is MSB)

    MOVEQ     #7,srcOffset    
    SUB.B     d1,srcOffset
    MOVEQ     #7,dstOffset
    SUB.B     d0,dstOffset
    
; exit if <= 32 bits to move

    CMPI.L     #32,bitCt
    BLE       @lastbits
    
; convert dstOffset to initial bit count

    ADDQ.W    #1,d0

; STEP 1:  Move enough bits to longAlign destination
;          using bit field manipulation

; adjust bit count to longAlign destination

    MOVE.W    dstPtr,d1
    ANDI.B    #3,d1
    EORI.B    #3,d1
    LSL.B     #3,d1
    ADD.B     d1,d0
    
; move initial bits

    BFEXTU    (srcPtr){srcOffset:d0},d1
    BFINS     d1,(dstPtr){dstOffset:d0}
    
; decrement bits left to move

    SUB.L     d0,bitCt
    
; adjust source offset; this may make
; srcOffset >= 8, but BFEXTU does not care

    ADD.W     d0,srcOffset
    
; adjust dstPtr to account for alignment

    LSR.B     #3,d0
    ADDQ.B    #1,d0
    ADDA.W    d0,dstPtr
    MOVEQ     #0,dstOffset

; STEP 2:  Main loop, MOVE.L all 32-bit chunks

; set up d0 with number of longwords to move

    MOVE.W    bitCt,d0
    LSR.W     #5,d0
    BLE       @lastbits
    
; set up bitCt for final BFEXTU/BFINS

    ANDI.W    #31,bitCt
    
; decrement d0 for subsequent DBRA

    SUBQ.W    #1,d0
    
; move bits one longword at a time

    MOVE.B    srcOffset,d1
    ANDI.B    #7,d1
    BNE.S     @longloop
    
; special case, src is byte-aligned

    LSR.B     #3,srcOffset
    ADDA.L    srcOffset,srcPtr
    MOVEQ     #0,srcOffset
    
alignloop:

    MOVE.L    (srcPtr)+,(dstPtr)+
    DBRA      d0,@alignloop
    BRA.S     @lastbits
    
; normal case, src not byte-aligned

longloop:

    BFEXTU    (srcPtr){srcOffset:0},d1
    MOVE.L    d1,(dstPtr)+
    ADDQ.L    #4,srcPtr
    DBRA      d0,@longloop

; STEP 3:  Move remaining bits with bit field
;          manipulation

lastbits:

    TST.B     bitCt
    BEQ.S     @done
    
; move leftover bits

    BFEXTU    (srcPtr){srcOffset:bitCt},d1
    BFINS     d1,(dstPtr){dstOffset:bitCt}
    
done:

; restore registers

    MOVEM.L   (a7)+,d6-d7
  }
}

THE RULES

Here’s how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners up). The prize for the best solution each month is $50 and a limited edition “The Winner! MacTech Magazine Programming Challenge” T-shirt (not to be found in stores).

In order to make fair comparisons between solutions, all solutions must be in ANSI compatible C (i.e., don’t use Think’s Object extensions). Only pure C code can be used. Any entries with any assembly in them will be disqualified (except for those challenges specifically stated to be in assembly). However, you may call any routine in the Macintosh toolbox you want (i.e., it doesn’t matter if you use NewPtr instead of malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with ANSI Settings plus “Honor ‘register’ first” and “Use Global Optimizer” turned on) so beware if you optimize for a different C compiler. All code should be limited to 60 characters wide. This will aid us in dealing with e-mail gateways and page layout.

The solution and winners for this month’s Programmers’ Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.

All solutions should be marked “Attn: Programmers’ Challenge Solution” and sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or preferably, e-mail - AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com, CompuServe: 71552,174 and America Online: MT PRGCHAL. If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on “How to Contact Xplain Corporation.”

MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month and all entries are the property of MacTech Magazine upon submission. The submission falls under all the same conventions of an article submission.
