
Thread Performance Analysis

Volume Number: 13 (1997)
Issue Number: 1
Column Tag: Toolbox Techniques

Preempting the Mac

By Fabrizio Oddone, Torino, Italy

How to use preemptive threads and how well

This article focuses on little-known facts about the Thread Manager, an interesting and useful part of the MacOS. I assume that the reader is already familiar with the relevant Thread Manager documentation listed in the Bibliography.

Why Preemptive Threads?

One widely heard complaint against the MacOS is its supposed lack of preemptive multitasking, also dubbed "true" multitasking by the lovers of George Boole. The Thread Manager provides this "longed for" capability, but a number of gotchas, like the ones listed below, have steered developers away from adopting its most attractive features.

  • You cannot call most of the Toolbox from within a preemptive thread.
  • Preemptive threads use only half of the CPU power available to the application (the other half is reserved for cooperative threads).
  • Preemptive threads are not available on PowerMacs.
  • There is no system-supported semaphore API. (The library enclosed with the Apple SDK is 680x0 only.)

I am going to examine each of these gripes one by one, but before delving into the details, I would like to warn you against one very nasty Thread Manager bug. You must take this bug into account if you intend to use preemptive threads in your application.

One Word of Caution

Thread Manager versions earlier than 2.1 had a "feature" that made preemptive threads practically unusable [Bechtel, 1995]. Preemptive threads did not preempt after the first threaded application launched in Threads 2.0.1. This was fixed.

Apple, in their infinite wisdom, does not tell us how to detect the fundamental bug fix. The Universal Headers 2.1 lack the relevant information as well. (I cannot comment upon the newer 2.1.1 headers because Apple does not make them available for download.) We lucky Gestalt Selectors List dwellers (thanks to Rene G.A. Ros for maintaining the mail list) have figured out a tentative answer.

Listing 1: Gestalt.c

GestaltCheck
// Checks whether a reliable Thread Manager is installed. This routine should deliver 
// TRUE if you can safely call the Thread Manager API, FALSE otherwise.
Boolean GestaltCheck(void)
{
    enum {
        // preemptive scheduler fix present?
        gestaltSchedulerFix = 3
    };
    long    Gresp;
    Boolean pThreads = false;

    if (TrapAvailable(_Gestalt)) {
        if (Gestalt(gestaltThreadMgrAttr, &Gresp) == noErr) {
            pThreads = (Gresp & (1L << gestaltThreadMgrPresent)) &&
                       (Gresp & (1L << gestaltSchedulerFix));
        }
    }
    // If we are compiling for the Code Fragment Manager, check whether we can
    // successfully call the library. The gestaltThreadsLibraryPresent bit is not correctly set
    // sometimes, so we don't rely on it.
#if GENERATINGCFM
    if (pThreads)
        if (NewThread == (void *)kUnresolvedCFragSymbolAddress)
            pThreads = false;
#endif
    return pThreads;
}

Since preemptive threads do not work as expected under outdated Thread Managers and a fixed version is freely available, your best bet is requiring the bug fix at all times. This is not official Apple gospel, so you will have to take my word for it.
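The bit test at the heart of Listing 1 can be tried out away from a Mac. The following is a minimal sketch, not Apple's code: the two constants are local stand-ins (bit 0 mirrors gestaltThreadMgrPresent from Gestalt.h, bit 3 is the tentative, unofficial scheduler-fix bit used in Listing 1), and thread_mgr_is_reliable() is a hypothetical name introduced only for illustration.

```c
#include <stdbool.h>

/* Bit numbers within the gestaltThreadMgrAttr response. These are local
   stand-ins so the sketch compiles without the Universal Headers. */
enum {
    kThreadMgrPresentBit = 0,  /* mirrors gestaltThreadMgrPresent */
    kSchedulerFixBit     = 3   /* tentative, unofficial bit from Listing 1 */
};

/* Returns true only when BOTH bits are set in the Gestalt response.
   (Hypothetical helper, not part of any Apple API.) */
bool thread_mgr_is_reliable(long response)
{
    long required = (1L << kThreadMgrPresentBit) | (1L << kSchedulerFixBit);
    return (response & required) == required;
}
```

A response with only bit 0 set (a Thread Manager that is present but lacks the fix) is rejected, which is exactly the policy recommended above.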

Cannot Call the Toolbox Forever, I Guess

Many developers simply surrender, maybe hoping that Copland will improve matters. Unfortunately, as far as I know, Copland's preemptive processes remain subject to the very same limitation. The Toolbox and older applications will run cooperatively, sharing the same address space. Only specifically written applications not calling the Toolbox may be entitled to run preemptively in a separate, protected address space. The moral of the story, as always, is to keep the user interface code (using the Toolbox) clearly separated from the actual code (not relying on the Toolbox). If you manage to run preemptively under the Thread Manager today, chances are that you will run with little or no effort, preemptively and safely, under Copland. Also, keep the parts shared by the preemptive thread and the host application to a minimum and clearly documented. Currently, a thread has complete access to the application memory and globals, whereas a Copland process may not, due to protected memory.

Half of the CPU?

The Thread Manager 2.0 documentation states: "Preemptive threads are not required to make yield calls to cause a context switch, although they certainly may, and they share 50% of their CPU time with the currently executing cooperative thread. However, calling yield from a preemptive thread is desirable if that thread is not currently busy."

This paragraph has given birth to a catastrophic superstition - that any calculation will run at half the speed if assigned to a preemptive thread, all else being inactive. Thorough tests clearly demonstrate that the facts are different and somewhat surprising. I grabbed the Apple sample code implementing the Dhrystone benchmark and I modified it in order to use either preemptive threads or cooperative threads. (CW7 Gold Reference: MacOS System Extensions: Thread Manager 2.1: Sample Applications: 68k Examples: Traffic Threads.)

Our main event loop is structured as shown in Listing 2.

Listing 2: EventLoop.c

EventLoop
// This is our simple event loop. If you are calculating and need to use the CPU as much
// as possible, pass a zero sleep time. If the Mac is executing only your calculations and
// you pass X ticks, those X ticks are actually lost. Remember to reset the sleep
// parameter to a reasonable value (usually CaretTime if a blinking cursor is visible)
// when your application is idle again.
void EventLoop(void)
{
    EventRecord event;

    do {
        if (WaitNextEvent(everyEvent, &event, 0UL, nil)) {
            DoEvent(&event);

            // this is a "busy" cooperative loop, waiting for the preemptive thread to
            // finish; useful to evaluate whether the CPU is evenly scheduled between
            // the main cooperative thread and the preemptive thread
#if _TESTBALANCE
            while (gDhrystoneDone == false) ;
#endif
        }
        else {
            // the else clause is executed when no events are pending; since we want to give
            // preference to the calculating thread, we explicitly tell the scheduler; if you are
            // using many calculating threads keep them in a list, in order to yield to each
            (void) YieldToThread(gDhrystoneThreadID);
            if (gDhrystoneDone) {
                // stuff used to update the window and the log file removed
                gDhrystoneDone = false;
                (void) SetThreadState(gDhrystoneThreadID, kReadyThreadState, kNoThreadID);
            }
        }
    }
    while ( gQuit == false );
}

We want to evaluate how much time the Mac actively spends calculating, so we start by establishing a proper frame of reference. Since we are probing the Operating System's behavior, but we want our findings to be independent of the relative speed of each Mac model, we set each Mac's maximum performance level equal to 1.0 (one). By "maximum performance level" we mean the one obtained by executing the test calculation within a cooperative thread that never yields the CPU. The figures are best explained with an example. For a given method, a score of 1.25 means that the calculation takes one minute and fifteen seconds, compared to one minute for the same calculation taking place without yielding the CPU. Note that we nit-pickers are also very interested in evaluating possible behavioral changes when the application is kept in the background (as opposed to the foreground) and nothing else is running. Cooperative threads, when yielding, yield the CPU every 20 ticks (1/3 of a second). Preemptive threads do not need to yield and in fact never yield in our test.
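The normalization amounts to a single division. Here is a minimal sketch of it, assuming timings are collected in ticks (1/60 of a second); normalized_score() is a hypothetical helper name, not something taken from the test application.

```c
/* Normalized score: elapsed time of the method under test divided by the
   elapsed time of the non-yielding cooperative baseline on the same Mac.
   Both arguments are in ticks (1/60 s). 1.0 is the best possible score;
   the caller must pass a nonzero baseline. (Hypothetical helper.) */
double normalized_score(unsigned long method_ticks, unsigned long baseline_ticks)
{
    return (double)method_ticks / (double)baseline_ticks;
}
```

With the example from the text, a run of 4500 ticks (one minute and fifteen seconds) against a baseline of 3600 ticks (one minute) gives a score of 1.25.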

Listing 3: Yield.c

Dhrystone
// This shows my calculation routine that may be called by a cooperative or preemptive
// thread. The symbol _COOPYIELD must be set to 0 in the latter case.

void
Dhrystone(void)
{
    // other variables removed for clarity
    register UInt32 Run_Index;

    // the following gets compiled only when we don't use preemptive threads; this is very
    // important since you shall NEVER call TickCount() within a preemptive thread!
#if _COOPYIELD
    UInt32 base_Time = TickCount();
    UInt32 curr_Time;
#endif

    /* Initializations */
    // initialization stuff removed
    for (Run_Index = 1; Run_Index <= kNumber_Of_Runs; ++Run_Index) {
        if ((Run_Index & 0xFFF) == 0) YieldToAnyThread();

        // the above method was originally used in the Apple sample; decidedly unwise, since
        // slower Macs will yield too little (impairing responsiveness) and faster Macs will
        // yield too much (wasting precious CPU time)

        // calculating stuff removed

        // actual yielding code
#if _COOPYIELD
        curr_Time = TickCount();
        if (curr_Time > base_Time + 20UL) {
            YieldToAnyThread();
            base_Time = curr_Time;
        }
#endif
    } // loop "for Run_Index"
}
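To see why yielding by the clock beats yielding by iteration count, the two policies can be compared in a small simulation. This is only a sketch under stated assumptions: the clock is simulated (one tick every iterations_per_tick iterations stands in for TickCount()), and both function names are hypothetical.

```c
/* Yields performed by the iteration-count policy of the Apple sample:
   one yield every 0x1000 (4096) iterations, whatever the CPU speed. */
unsigned long yields_by_count(unsigned long iterations)
{
    unsigned long i, yields = 0;

    for (i = 1; i <= iterations; ++i)
        if ((i & 0xFFF) == 0)
            ++yields;
    return yields;
}

/* Yields performed by the tick-based policy of Listing 3. The simulated
   clock advances one tick every iterations_per_tick iterations, so a
   larger value models a faster Mac. */
unsigned long yields_by_time(unsigned long iterations,
                             unsigned long iterations_per_tick,
                             unsigned long interval_ticks)
{
    unsigned long i, base_time = 0, yields = 0;

    for (i = 1; i <= iterations; ++i) {
        unsigned long curr_time = i / iterations_per_tick; /* fake TickCount() */
        if (curr_time > base_time + interval_ticks) {
            ++yields;
            base_time = curr_time;
        }
    }
    return yields;
}
```

Over the same 600 simulated ticks, a "fast" Mac doing 1000 iterations per tick completes 600000 iterations and a "slow" one doing 10 per tick completes 6000. The count-based policy yields 146 times on the former and once on the latter, whereas the tick-based policy with a 20-tick interval yields 28 times on both.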

All Macs were tested when running under System 7.5.1 with extensions turned off (the System incorporates Thread Manager 2.1.1), except for the SE/30 under 32 bit mode, having MODE32 7.5 installed. If not specified, 32 bit mode is implied in any case. No other applications were running, and the mouse was left quiet.

Results              Cooperative fg   Cooperative bg   Preemptive fg   Preemptive bg
Classic              1.066            1.214            2.304           1.690
LC II                1.048            1.268            1.407           2.060
SE/30 24 bit         1.000            1.101            1.203           1.497
SE/30 32 bit         1.035            1.133            1.235           1.521
IIvx 24 bit          1.011            1.047            1.154           1.129
IIvx 32 bit          1.068            1.090            1.178           1.153
Quadra 700 24 bit    1.012            1.084            1.080           1.341
Quadra 700 32 bit    1.014            1.085            1.075           1.335
PB 540               1.007            1.032            1.078           1.144

Note that the displayed results are averaged over a reasonable number of runs (ten runs at most, sometimes fewer since the timings settle quickly). I always treated the first run as an outlier and discarded it (because of the window updates, major context switches, etc.).

Except for a couple of Mac models, the situation is much better than one would expect, in the light of the 50% passage I have previously quoted from the Thread Manager documentation. This probably happens because of the explicit YieldToThread() call at idle time. However, our desire is to observe a constant pattern across Mac models, since we normalized our data against the fastest result on each Mac. On the contrary, we cannot help but spot a significant and annoying variability in the measured behavior. A chart sharply supports our contention.

Figure 1. Thread methods compared.

I am completely at a loss here. The very same program under the very same Operating System version behaves differently, depending on the Mac model. Just when you thought that computers were deterministic devices... Let's now look at the same data from another perspective.

Figure 2. Macs compared.

A quick glance at the chart may fool the reader into thinking that Macs perform better under 24 bit mode. This is not true; remember the normalization trick. The absolute timings show that in the fastest calculation (the one that never yields), 32 bit mode always outperforms 24 bit mode. To understand this, we quote develop #9 (Winter, 1992) p. 87, "Turning on 32-bit addressing helps because it reduces interrupt handler overhead." Also, some parts of the Toolbox may run faster when 32 bit mode is on, notably QuickDraw. (I witnessed this myself on my Quadra 700, with an old SpeedoMeter version.) Rather unusually, when using threads, the opposite happens and calculations proceed at a better pace under 24 bit mode. Apple is not known for being quick at repartee; nonetheless, we are all ears waiting for a detailed explanation on this subject.

Even the worst case situation, though less pronounced, portrays a variable outcome. We obtain this chart by setting the _TESTBALANCE symbol to 1. (See Listing 2.) We conclude that the CPU is not evenly divided between the main cooperative thread and the preemptive thread. The former has a little, but significant, Mac-dependent scheduling advantage. The even point is obviously at abscissa 2.0.

Figure 3. Worst case situation.
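The "even point" follows from simple arithmetic. As a rough sketch, assuming the calculation time is inversely proportional to the share of the CPU the calculating thread receives (scheduler overhead is ignored), a normalized score converts to a CPU share as follows; cpu_share_from_score() is a hypothetical name.

```c
/* Fraction of the CPU the calculating thread received, inferred from its
   normalized score: 1.0 means the whole CPU, 2.0 means an even split.
   Assumes run time is inversely proportional to CPU share.
   (Hypothetical helper.) */
double cpu_share_from_score(double score)
{
    return 1.0 / score;
}
```

A score of exactly 2.0 corresponds to a share of 0.5; any score above 2.0 means the busy cooperative thread took more than half of the CPU.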

Although, with my data collection at hand, I cannot but reproach the slouching gait of preemptive threads, I still think that reengineering an existing application (or writing one from scratch) and letting the user choose between preemptive and cooperative threads involves no contradiction. The potential speed loss occurring in the preemptive case is counterbalanced by such a user interface responsiveness that you will wonder why in the world you waited so long to implement this feature.

One last remark for those who have been screaming since the start of this section, "If you don't like the default scheduler, write your own! The Thread Manager allows this!" My answer is simple: custom schedulers are intended (at least they should be) for unusual situations, not for fairly standard programming constructs. Remember that programmers, though superhuman to some extent, are mere mortals themselves. Therefore, Donald Norman's motto is still valid, "Activities that are easy to do tend to get done; those that are difficult tend not to get done."

Preemptive Threads Not Available on PowerMacs

This is not a good reason for leaving us poor 680x0 denizens (I don't own a PowerMac, yet) with sluggish applications. I have just shown that you can easily remodel a preemptive thread into a friendly, CPU-yielding, cooperative thread by conditionally compiling a short bunch of code. However, since we have advocated a preferences-based option, and we also want to take advantage of preemptive threads on PowerMacs automatically, as soon as Apple decides they are worth the effort, we have to modify the previous listing appropriately.

Listing 4: YieldRealWorld.c

DhrystoneR
// This shows a real-world calculation routine used either cooperatively or 
// preemptively, depending on a global setting.

void
DhrystoneR(void)
{
    // other variables removed for clarity
    register UInt32 Run_Index;

    UInt32 base_Time;
    UInt32 curr_Time;

    /* Initializations */
    if (gUseCooperative)
        base_Time = TickCount();
    // other initialization stuff removed
    for (Run_Index = 1; Run_Index <= kNumber_Of_Runs; ++Run_Index) {
        // calculating stuff removed
        // actual yielding code
        if (gUseCooperative) {
            curr_Time = TickCount();
            if (curr_Time > base_Time + 20UL) {
                YieldToAnyThread();
                base_Time = curr_Time;
            }
        }
    } // loop "for Run_Index"
}

Of course the application would check at initialization time whether preemptive threads are available or not (calling NewThread() and checking for paramErr), and gray out the relevant choice in the latter case. Speaking about user friendliness, I think that most users are neither aware of nor interested in the cooperative vs. preemptive issue, so we should label the two choices avoiding technical jargon.

As a last remark, while it is true that you cannot spawn preemptive threads in native mode, you can in emulation mode. My Disk Charmer application takes advantage of this.

Multiprocessing Trivia

Multiprocessing is the bleeding edge, especially now that the BeBox has been unveiled. It would be great if the Thread Manager could automatically allocate preemptive threads on multiple processors, but this does not emerge from the MP API [MP, May 1995] that DayStar and Apple developed. Instead, I infer that one has to call yet another API, rather than the Thread Manager's, in order to benefit from the added CPU horsepower. This is a less than elegant design decision, to say the least. At any rate, the rules an MP task must follow are the same as those pertaining to preemptive threads. So if you are preemptive, you are probably ready for multiprocessing.

No Semaphores

I think you will have to live without them, at least until they are explicitly supported (the Multiprocessing API supports semaphores and other synchronization constructs). However, there are reasons to avoid semaphores whenever possible. Let me try to clear the mist (or add to the confusion).

• Semaphores are good.

This is taken from Silberschatz-Galvin [1994], p. 186:

Although semaphores provide a convenient and effective mechanism for process synchronization, their incorrect use can still result in timing errors that are difficult to detect, since these errors happen only if some particular execution sequences take place, and these sequences do not always occur.

See also this brief excerpt from the Ada95 Rationale:

[...] avoided the methodological difficulties encountered by the use of low level primitives such as semaphores and signals. As is well known, such low-level primitives suffer from similar problems as gotos; it is obvious what they do and they are trivial to implement but in practice easy to misuse and can lead to programs that are difficult to maintain. [...]

For these reasons, the adoption of higher level constructs is advocated. (Interested readers may refer to the texts cited above.) The semaphore/goto comparison reminds me that, oddly enough, the man who first attacked gotos [Dijkstra, 1968] is the one who earlier introduced semaphores [Dijkstra, 1965].

• Semaphores are necessary.

Not strictly: you can boot a UN*X box with the semaphore facilities conveniently uninstalled.

• Semaphores are efficient.

Semaphores may be efficient, though it depends on the implementation. I recently had to write the customary "dining philosophers" program using the UN*X semaphore primitives. For the record, I used an asymmetric solution. Running under HP-UX 9 on a 68040 HP workstation produced disconcerting results. You run the program with some forty semaphores/processes, and as the CPU load skyrockets, the whole computer unbelievably slows down, crawls, and withers. The unlucky user at the console can barely move the mouse (if the dreaded X Window System is running). This is worse than the Mac while initializing a floppy. Responsiveness improves slightly, but not to a reasonable degree, if you set the friendliest priority with the 'nice' command. By the way, did you know that UN*X has two different 'nice' commands, one built into the C shell, and one as an external command, with two different command syntaxes?

So, my advice when it comes to synchronization primitives: if the programming language you are using supports tasking constructs (such as Ada95, but this is not available on the Mac as I am writing this), go for it. As an added advantage, you may easily port your tasking code to different platforms. If you are stuck with a mainstream language without tasking support (Pascal, C, C++), stay with the Thread Manager primitives.

On a related note, I have seen some semaphore implementations on the net that use the Enqueue() and Dequeue() system calls. Provided that you are only using threads and not other interrupt-level code, this method is overkill because the above mentioned system calls disable interrupts. The Thread Manager API is more desirable because the relevant critical region calls disable thread preemption only, leaving interrupts enabled [Anderson-Post, 1994].
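For those who still need a counter shared among threads, the critical region approach can be sketched as follows. This is an illustration under loud assumptions, not a tested Mac implementation: begin_critical() and end_critical() are stand-ins for the Thread Manager's ThreadBeginCritical() and ThreadEndCritical() so that the code compiles anywhere, and TMSemaphore and the tm_sem_ functions are hypothetical names.

```c
/* Stand-ins for ThreadBeginCritical() and ThreadEndCritical(); here they
   only track nesting. On a Mac you would call the real routines instead. */
int gCriticalDepth = 0;
void begin_critical(void) { ++gCriticalDepth; }
void end_critical(void)   { --gCriticalDepth; }

typedef struct {
    long count;   /* number of units currently available */
} TMSemaphore;    /* hypothetical type, not part of any Apple API */

/* Tries to take one unit; returns 1 on success, 0 if none is available.
   The test and the decrement happen inside one critical region, so no
   other thread can slip in between them. */
int tm_sem_try_acquire(TMSemaphore *sem)
{
    int acquired = 0;

    begin_critical();
    if (sem->count > 0) {
        --sem->count;
        acquired = 1;
    }
    end_critical();
    return acquired;
}

/* Gives one unit back. */
void tm_sem_release(TMSemaphore *sem)
{
    begin_critical();
    ++sem->count;
    end_critical();
}
```

A thread that gets 0 back from tm_sem_try_acquire() would typically yield and retry later rather than spin. Since the critical region only suspends thread preemption, interrupts stay enabled throughout.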

Wish List and Concluding Remarks

Is it too much to ask Apple for a dependable, levelheaded Thread Manager? What about multiprocessing support? Was Apple fast asleep while Gassée was hard at work? I can't believe it! <evil grin> While I am at it, what about a native Event Manager well before Copland is released?

That's all, folks. You have enough material to bash Apple for the next few weeks, and enough enthusiasm to dive head over heels into preemptive threads!

Bibliography and References

Anderson, Eric and Post, Brad. "Concurrent Programming with the Thread Manager". develop, The Apple Technical Journal, issue 17 (March 1994), pp. 73-98. Apple Computer's Developer Press.

Anderson, Eric & friends. "Thread Manager for Macintosh Applications". Final Draft, Revision 2.0 (January 24, 1994). [CW7 Gold Reference: MacOS System Extensions: Thread Manager 2.1: Thread Manager Documentation].

Bechtel, Brian. "System 7.5 Update 1.0". TechNote OS 07 (February 1995).

Dijkstra, E. W. "Cooperating sequential processes". Technical Report EWD-123, Technological University, Eindhoven, the Netherlands, (1965). Reprinted in [Genuys, 1968], p. 43-112.

Dijkstra, E. W. "GOTO statement considered harmful". Communications of the ACM, 11.3.147 (1968). ACM Press.

Genuys, F. (editor). Programming Languages (1968). Academic Press, London, England.

Silberschatz, Abraham and Galvin, Peter B. Operating System Concepts, Fourth Edition (1994). Addison-Wesley.

"Multiprocessor API Specification" prepared by Apple Computer and DayStar Digital, Inc. for the World Wide Developers Conference (May 1995).

Relevant Internet URLs

Multiprocessing sites:

http://www.daystar.com/

http://www.be.com/

Here you can download for free the Ada95 Rationale:

http://sw-eng.falls-church.va.us/

http://lglwww.epfl.ch/Ada/

The Gestalt Selectors List is here:

http://www.bio.vu.nl/home/rgaros/gestalt/

 
