Nov 93 Challenge
Volume 9, Number 11
Column Tag: Programmer's Challenge
Programmer's Challenge
By Mike Scanlin, MacTech Magazine Regular Contributing Author
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
WHO PLAYS WHO?
Thanks to Kevin Cutts (location unknown) for suggesting this month's challenge. The goal is to match up teams for the annual MacTech Bean Counting contest, where there are half as many playing areas as there are teams. Each team needs to play every other team exactly once. (And they don't want to wait all day for their schedule to be generated!)
The input is the number of teams, a list of team names and a list of playing area names. The number of teams will be an even number less than 25, and the number of playing areas will be half the number of teams. The output goes to an existing file, where you describe who plays whom, on which playing area, at what time. Each bean counting match takes 10 minutes to play, so you can schedule a match every 15 minutes on each playing area. The events don't start until noon, so that everyone involved has time to sleep in before their big day.
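A quick count bounds the length of the day. Each team has n-1 opponents and can play only one match per 15-minute round, so no schedule can finish in fewer than n-1 rounds. Because the n(n-1)/2 matches exactly fill n-1 rounds of n/2 playing areas, a perfect schedule (which the classic round-robin construction achieves) uses exactly that many. The sketch below just does that arithmetic. `LastMatchEnd` is a hypothetical helper and is not part of the challenge interface.

```c
#include <assert.h>

/* Hypothetical helper: minutes after noon at which the last
   match ends, assuming no playing area is ever idle. */
static unsigned LastMatchEnd(unsigned numTeams)
{
    unsigned rounds;

    assert(numTeams >= 2 && numTeams % 2 == 0);
    /* n(n-1)/2 matches over n/2 areas fill n-1 rounds */
    rounds = numTeams - 1;
    /* rounds start every 15 minutes from noon; the last
       one starts 15*(rounds-1) minutes in and lasts 10 */
    return 15u * (rounds - 1) + 10u;
}
```

For the 4-team sample this gives 40 (12:40). For the maximum of 24 teams it gives 340, so the last match ends at 5:40.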
The prototype of the function you write is:
void ScheduleMatches(numTeams,
teamNames, playingAreaNames,
outputFile)
unsigned short numTeams;
Str255 teamNames[];
Str255 playingAreaNames[];
FILE *outputFile;
The outputFile will be open and empty when your routine is called. Write to it with standard C stream calls, for example fprintf(outputFile, "Here is some output text.\n"). Do not close the file before your routine returns; the caller opened it and will close it.
The format of the output is up to you. It should be intelligible, though. Don't skimp on output readability to save a few cycles.
The input team names and playing area names are Pascal strings that take 256 bytes each (length byte included). These arrays are read-only; if you want to convert them to C strings, you'll have to copy them somewhere first. Don't worry about the special formatting requirements of long strings; I will be testing with fairly small strings.
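Because the name arrays are read-only, any conversion has to go into a buffer of your own. Here is a minimal sketch. It assumes the usual Toolbox layout of `Str255` (a length byte followed by up to 255 characters), which is redeclared here so the code stands alone. `CopyPToCStr` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout, mirroring the Toolbox typedef: one length
   byte followed by up to 255 characters. */
typedef unsigned char Str255[256];

/* Hypothetical helper: copy the read-only Pascal string src
   into dst as a NUL-terminated C string. dst needs room for
   256 bytes (255 characters plus the terminator). */
static void CopyPToCStr(const unsigned char *src, char *dst)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = src[0];                   /* length byte, 0..255 */
    memcpy(dst, src + 1, len);      /* characters follow it */
    dst[len] = '\0';
}
```

An in-place converter would overwrite the length byte of the caller's array, which the rules forbid. That is why this sketch copies into a separate buffer.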
Here is some sample input:
numTeams = 4;
teamNames[0] = "\pCycleStealers";
teamNames[1] = "\pBeanies";
teamNames[2] = "\pRiscTakers";
teamNames[3] = "\pGiraffeButts";
playingAreaNames[0] = "\pField 1";
playingAreaNames[1] = "\pField 2";
and suggested output format:
12:00
Field 1: CycleStealers vs. Beanies
Field 2: RiscTakers vs. GiraffeButts
12:15
Field 1: CycleStealers vs. RiscTakers
Field 2: Beanies vs. GiraffeButts
12:30
Field 1: CycleStealers vs. GiraffeButts
Field 2: Beanies vs. RiscTakers
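One well-known way to generate such a schedule is the circle method. Fix team 0 in place, seat the other teams around a circle, and pair each seat with the one across from it. Then rotate everyone except team 0 by one seat per round. After n-1 rounds every pair has met exactly once, and every team plays in every round, so no playing area sits idle. The sketch below is our own illustration, not a submitted entry. It uses an ANSI version of the challenge prototype and redeclares `Str255` so it stands alone. It also prints Pascal strings in place with a `%.*s` format, so the read-only input never needs copying. `RoundPairings` and `PutPStr` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned char Str255[256];  /* as in the Mac headers */

/* Circle method for one round (0..n-2): team 0 keeps seat 0,
   teams 1..n-1 rotate one seat per round, and seat k plays
   seat n-1-k. Area k gets pairs[2k] vs. pairs[2k+1]. */
static void RoundPairings(unsigned n, unsigned round,
                          unsigned pairs[])
{
    unsigned seat[24];              /* rules: n <= 24 */
    unsigned k;

    assert(n >= 2 && n % 2 == 0 && n <= 24 && round < n - 1);
    seat[0] = 0;
    for (k = 1; k < n; k++)
        seat[k] = 1 + (k - 1 + round) % (n - 1);
    for (k = 0; k < n / 2; k++) {
        pairs[2 * k] = seat[k];
        pairs[2 * k + 1] = seat[n - 1 - k];
    }
}

/* Print a Pascal string in place: length byte, then text. */
static void PutPStr(FILE *f, const unsigned char *s)
{
    fprintf(f, "%.*s", (int)s[0], (const char *)s + 1);
}

static void ScheduleMatches(unsigned short numTeams,
                            Str255 teamNames[],
                            Str255 playingAreaNames[],
                            FILE *outputFile)
{
    unsigned pairs[24];
    unsigned r, k, minutes;

    for (r = 0; r + 1 < numTeams; r++) {
        minutes = 15 * r;           /* round r starts at noon + 15r */
        fprintf(outputFile, "%u:%02u\n",
                minutes < 60 ? 12u : minutes / 60, minutes % 60);
        RoundPairings(numTeams, r, pairs);
        for (k = 0; k < numTeams / 2u; k++) {
            PutPStr(outputFile, playingAreaNames[k]);
            fputs(": ", outputFile);
            PutPStr(outputFile, teamNames[pairs[2 * k]]);
            fputs(" vs. ", outputFile);
            PutPStr(outputFile, teamNames[pairs[2 * k + 1]]);
            fputc('\n', outputFile);
        }
    }
}
```

The pairings come out in a different order from the sample output above. For 4 teams, the first round is CycleStealers vs. GiraffeButts and Beanies vs. RiscTakers. The rules allow this, since the output format is up to you.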
TWO MONTHS AGO WINNER
It would appear that the 10 or more people who wrote to me and requested an assembly language challenge were either (1) kidding, (2) all on vacation during the last month, or (3) unable to cope with moving bits in assembly language, because I only received 3 entries to the BlockMoveBits assembly language challenge. And only one of them gave correct results. Congratulations to Bob Boonstra (Westford, MA) for (1) entering, (2) having correct code and (3) winning. Bob's code would have an excellent chance at winning even with more competition because it is very efficient indeed. Bob recently won the "Where In The World?" challenge, too, so this is his second win (the second two-time winner to date; there are no 3-time winners at this point). Well done!
Compliments to Kevin Cutts for having the guts to enter C code in an assembly language contest. His code was 690 bytes and used over 400 bytes of static lookup table data (compared to Bob's 166 bytes with no tables), yet his times were within a respectable 10% of Bob's. Correctness, however, is key, and Kevin's routine gave occasional bogus results, so I had to disqualify it (be sure to try all 64 combinations of source and destination bit offsets; each can range from 0 to 7).
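Kevin's disqualification is a good argument for exhaustive testing. Below is a minimal test-harness sketch with two parts: a slow, bit-at-a-time reference mover, and a checker. The checker runs a candidate over all 64 offset pairs and bit counts from 0 to 200, and also verifies that bits outside the destination run are left alone. The bit numbering is an assumption inferred from the winning entry's "7 - offset" conversion: bit 7 is a byte's most significant bit, and a run proceeds toward bit 0 and then into bit 7 of the next byte. All function names are hypothetical, and the parameters use `unsigned char *` rather than the contest's `char *`.

```c
#include <assert.h>

/* ASSUMED bit numbering (inferred from the winning entry): bit 7
   is a byte's MSB, and a run that starts at bit `off` continues
   toward bit 0, then into bit 7 of the next byte. Positions below
   are counted from the MSB of byte 0. */
static unsigned long BitPos(unsigned off, unsigned long i)
{
    return (7u - off) + i;          /* i-th bit of the run */
}

static int GetBit(const unsigned char *p, unsigned long pos)
{
    return (p[pos >> 3] >> (7 - (pos & 7))) & 1;
}

static void SetBit(unsigned char *p, unsigned long pos, int v)
{
    unsigned char mask = (unsigned char)(0x80u >> (pos & 7));

    if (v)
        p[pos >> 3] |= mask;
    else
        p[pos >> 3] &= (unsigned char)~mask;
}

/* Hypothetical reference mover: slow, but easy to trust. */
static void RefMoveBits(const unsigned char *src,
                        unsigned char *dst,
                        unsigned char srcOff,
                        unsigned char dstOff,
                        unsigned short count)
{
    unsigned long i;

    assert(srcOff < 8 && dstOff < 8);
    for (i = 0; i < count; i++)
        SetBit(dst, BitPos(dstOff, i),
               GetBit(src, BitPos(srcOff, i)));
}

typedef void (*MoveFn)(const unsigned char *, unsigned char *,
                       unsigned char, unsigned char,
                       unsigned short);

/* Run fn over all 64 offset pairs and bit counts 0..200 and
   return how many cases differ from the expected buffer,
   including any stray writes outside the destination run. */
static unsigned CheckMoveBits(MoveFn fn)
{
    enum { N = 32 };                /* bytes per buffer */
    unsigned char src[N], dst[N], want[N];
    unsigned so, dof, count, i, failures = 0;

    for (i = 0; i < N; i++)
        src[i] = (unsigned char)(i * 37 + 11);
    for (so = 0; so < 8; so++)
        for (dof = 0; dof < 8; dof++)
            for (count = 0; count <= 200; count++) {
                for (i = 0; i < N; i++)   /* same background */
                    dst[i] = want[i] = (unsigned char)(0xA5 ^ i);
                fn(src, dst, (unsigned char)so,
                   (unsigned char)dof, (unsigned short)count);
                for (i = 0; i < count; i++)
                    SetBit(want, BitPos(dof, i),
                           GetBit(src, BitPos(so, i)));
                for (i = 0; i < N; i++)
                    if (dst[i] != want[i]) {
                        failures++;
                        break;
                    }
            }
    return failures;
}
```

To check a real candidate, pass it to `CheckMoveBits` (wrapped to match `MoveFn` if necessary) and expect zero failures.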
MAIL BAG
Recently I received a letter from a MacTech reader which said, in part:
"I DO object to the programming contest though. It rewards convoluted, hard to maintain code at the expense of speed and size. In the real world the former is MUCH more important. Programs should be as small and as fast as they can be WITHOUT sacrificing understanding."
While I agree with this sentiment to some extent, it is my personal opinion that a large number of today's applications suffer from performance problems. And I don't think it's the hardware that is lacking. I think intelligently written apps that do things like pre-compute data, cache data, use smart data structures and algorithms, and take advantage of specific processor tricks are doing their users a favor. I know that my mom, who is not a sophisticated user at all, gets frustrated when simple things like changing the font or margins of her 20-page letter on her Mac Classic take longer than a few seconds ("I thought these computer thingies were supposed to be fast?"). There's no reason why simple operations have to take so long. Optimizing data structures, algorithms and individual C statements is an important part of competing in the application market.
The purpose of this column is to help people see what kind of tricks and speedups are possible for those places where you need them. You don't have to write 100% totally, absolutely, perfectly efficient code all of the time (although some people do, and my hat is off to them); you only have to do that in the 25% or so of your application that does all the real work. Also, remember that this column is, after all, a game, and measuring cycles and bytes is much more objective and fair than something open to interpretation, like a code maintainability criterion.
Having said that, we can take a look at another type of letter I received recently...
DIVIDE BY 15 TRICK
Frequent Challenge player Gerry Davis writes to me with a non-obvious trick to do a faster integer divide by 15:
This code:
long i;          // must be >= 0 for the shift version
// j must be unsigned to catch overflow
unsigned long j;
j = i/15;
is faster as:
j = ((i+((i+((i+((i+((i+((i+((i+
    (i>>4)+1)>>4)+1)>>4)+1)>>4)+1)>>4)+
    1)>>4)+1)>>4)+1)>>4);
This is about 5.5 times faster on a 68000 and 1.2 times faster on a 68020. It adds about 50 bytes of code on the 020, but on the 68000 the code necessary for a long divide is a lot more than this. You can remove some of the iterations to do short integers as well.
Thanks, Gerry. I tested it on my Quadra 700 and found that your version is 48 bytes and is about 1.4 times faster on the 040 than the chip's built-in long divide instruction.
Does anyone else have any similar special case optimizations they'd like to share? Send them in!
Here's Bob's winning solution:
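Why does the nested expression work? Write i = 15q + r with 0 <= r <= 14. Then (i + q + 1) >> 4 = (16q + r + 1) >> 4 = q, so the exact quotient q is a fixed point of x -> (i + x + 1) >> 4. Starting from x = i >> 4, each round shrinks the shortfall q - x by roughly a factor of 16 and never overshoots, so a fixed number of rounds lands exactly on q. The loop below is our own restatement of Gerry's expression, not his code. It uses `uint32_t` and limits i to below 2^31 so that i + x + 1 cannot overflow. It runs eight rounds; by our count, seven already suffice in that range.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form of the divide-by-15 recurrence (our restatement).
   x starts at i/16 and is refined by x = (i + x + 1) >> 4; the
   exact quotient is a fixed point, and each round cuts the
   remaining error by roughly 16x. */
static uint32_t Div15(uint32_t i)
{
    uint32_t x = i >> 4;
    int round;

    /* keep i + x + 1 below 2^32 */
    assert(i <= 0x7FFFFFFFu);
    for (round = 0; round < 8; round++)
        x = (i + x + 1) >> 4;
    return x;
}
```

Unrolling the loop gives back the nested form. Trimming rounds, as Gerry suggests for short integers, is safe only when the dividend range is correspondingly smaller.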
/*
** BlockMoveBits by Bob Boonstra
**
** Solution strategy:
** Use 68030 bit field manipulation instructions
** rather than shifting and masking.
** Accomplish move in three steps, where the first step
** aligns destination to longword, second step uses
** BFEXTU/MOVE.L combination instead of BFEXTU/BFINS to
** move bulk of the bits, and third step cleans up.
** Special case when srcBitOffset==destBitOffset,
** allowing main loop to use MOVE.L (x)+,(y)+
**
** Relative execution times for various strategies:
** 100: Straightforward BFEXTU/BFINS in 32-bit chunks,
** 70: byte-align src and MOVE.L/BFINS in main loop,
** 58: byte-align dst and BFEXTU/MOVE.L in main loop,
** 50: long-aligned dst and BFEXTU/MOVE.L in main loop,
** 29: as above, if srcOffset==dstOffset use one MOVE.L
*/
/* some register definitions for readability */
#define bitCt d2
#define srcOffset d6
#define dstOffset d7
#define srcPtr a0
#define dstPtr a1
void BlockMoveBits(char *srcBytePtr, char *destBytePtr,
unsigned char srcBitOffset, unsigned char destBitOffset,
unsigned short bitCount)
{
asm 68030 {
; save registers
MOVEM.L d6-d7,-(a7)
; exit if no bits to move
MOVEQ #0,bitCt
MOVE.W bitCount,bitCt
; get params into registers
MOVE.L srcBytePtr,srcPtr
MOVE.L destBytePtr,dstPtr
MOVE.B srcBitOffset,d1
MOVEQ #0,d0
MOVE.B destBitOffset,d0
; calculate srcOffset and dstOffset in
; bit field manipulation coordinates
; (bit 0 is MSB)
MOVEQ #7,srcOffset
SUB.B d1,srcOffset
MOVEQ #7,dstOffset
SUB.B d0,dstOffset
; exit if <= 32 bits to move
CMPI.L #32,bitCt
BLE @lastbits
; convert dstOffset to initial bit count
ADDQ.W #1,d0
; STEP 1: Move enough bits to longAlign destination
; using bit field manipulation
; adjust bit count to longAlign destination
MOVE.W dstPtr,d1
ANDI.B #3,d1
EORI.B #3,d1
LSL.B #3,d1
ADD.B d1,d0
; move initial bits
BFEXTU (srcPtr){srcOffset:d0},d1
BFINS d1,(dstPtr){dstOffset:d0}
; decrement bits left to move
SUB.L d0,bitCt
; adjust source offset; this may make
; srcOffset >= 8, but BFEXTU does not care
ADD.W d0,srcOffset
; adjust dstPtr to account for alignment
LSR.B #3,d0
ADDQ.B #1,d0
ADDA.W d0,dstPtr
MOVEQ #0,dstOffset
; STEP 2: Main loop, MOVE.L all 32-bit chunks
; set up d0 with number of longwords to move
MOVE.W bitCt,d0
LSR.W #5,d0
BLE @lastbits
; set up bitCt for final BFEXTU/BFINS
ANDI.W #31,bitCt
; decrement d0 for subsequent DBRA
SUBQ.W #1,d0
; move bits one longword at a time
MOVE.B srcOffset,d1
ANDI.B #7,d1
BNE.S @longloop
; special case, src is byte-aligned
LSR.B #3,srcOffset
ADDA.L srcOffset,srcPtr
MOVEQ #0,srcOffset
alignloop:
MOVE.L (srcPtr)+,(dstPtr)+
DBRA d0,@alignloop
BRA.S @lastbits
; normal case, src not byte-aligned
longloop:
BFEXTU (srcPtr){srcOffset:0},d1
MOVE.L d1,(dstPtr)+
ADDQ.L #4,srcPtr
DBRA d0,@longloop
; STEP 3: Move remaining bits with bit field
; manipulation
lastbits:
TST.B bitCt
BEQ.S @done
; move leftover bits
BFEXTU (srcPtr){srcOffset:bitCt},d1
BFINS d1,(dstPtr){dstOffset:bitCt}
done:
; restore registers
MOVEM.L (a7)+,d6-d7
}
}
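For readers who don't speak 68030, here is a rough C model of the two pieces of STEP 1 that do the heavy lifting. `BfExtU` mimics BFEXTU on a memory operand: it takes `width` bits starting `offset` bits past the most significant bit of the addressed byte, and returns them right-justified and zero-extended. `AlignBits` repeats the ANDI/EORI/LSL arithmetic that sizes the first move. That size is the bits left in the current destination byte, plus eight for each whole byte before the next longword boundary. Both names are hypothetical. Negative bit-field offsets (legal on the CPU) are not modeled, and the bit-at-a-time loop is written for clarity, not speed.

```c
#include <assert.h>
#include <stdint.h>

/* Rough C model of BFEXTU with a memory operand: extract `width`
   bits (1..32) starting `offset` bits past the MSB of base[0],
   right-justified and zero-extended. Negative offsets, which
   the CPU accepts, are not modeled. */
static uint32_t BfExtU(const unsigned char *base,
                       unsigned long offset, unsigned width)
{
    uint32_t result = 0;
    unsigned k;

    assert(width >= 1 && width <= 32);
    for (k = 0; k < width; k++) {
        unsigned long pos = offset + k;
        uint32_t bit = (base[pos >> 3] >> (7 - (pos & 7))) & 1u;

        result = (result << 1) | bit;
    }
    return result;
}

/* STEP 1 sizing: bits left in the current destination byte
   (destBitOffset + 1) plus 8 per whole byte up to the next
   longword boundary; (addr & 3) ^ 3 equals 3 - (addr & 3). */
static unsigned AlignBits(uintptr_t dstAddr,
                          unsigned char destBitOffset)
{
    unsigned inByte = destBitOffset + 1u;
    unsigned bytes = (unsigned)((dstAddr & 3u) ^ 3u);

    return inByte + 8u * bytes;
}
```

The result of `AlignBits` is always between 1 and 32, so the first move always fits in a single bit-field operation.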
The rules
Here's how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners-up). The prize for the best solution each month is $50 and a limited edition "The Winner! MacTech Magazine Programming Challenge" T-shirt (not to be found in stores).
In order to make fair comparisons between solutions, all solutions must be in ANSI-compatible C (i.e., don't use THINK's Object extensions). Only pure C code can be used. Any entries with any assembly in them will be disqualified (except for those challenges specifically stated to be in assembly). However, you may call any routine in the Macintosh toolbox you want (i.e., it doesn't matter if you use NewPtr instead of malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with ANSI Settings plus "Honor register first" and "Use Global Optimizer" turned on), so beware if you optimize for a different C compiler. All code should be limited to 60 characters wide. This will aid us in dealing with e-mail gateways and page layout.
The solution and winners for this month's Programmer's Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.
All solutions should be marked "Attn: Programmer's Challenge Solution" and sent to Xplain Corporation (the publishers of MacTech Magazine) via snail mail or, preferably, e-mail: AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com, CompuServe: 71552,174 and America Online: MT PRGCHAL. If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on "How to Contact Xplain Corporation."
MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month and all entries are the property of MacTech Magazine upon submission. The submission falls under all the same conventions of an article submission.