TweetFollow Us on Twitter

Fast Square Root Calc

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Assembler Workshop

Fast Square Root Calculation

by Guillaume Bédard, Frédéric Leblanc, Yohan Plourde and Pierre Marchand, Québec, Canada

Optimizing PowerPC Assembler Code to beat the Toolbox

Introduction

The calculation of the square root of a floating-point number is a frequently encountered task. However, the PowerPC processors don't have a square root instruction. The implementation presented here performs the square root of a double-precision number over the full range of representation of the IEEE 754 standard for normalized numbers (from 2.22507385851E-308 to 1.79769313486E308) with an accuracy of 15 or more decimal digits. It is very fast, at least six times faster than the Toolbox ROM call.

Theory of Operation

A floating point number has three components: a sign, a mantissa and an exponential part. For example, the number +3.5 x 10^4 (35 000) has a plus sign, a mantissa of 3.5 and an exponential part of 104. The mantissa consists of an integer part and a fractional part f.

A double precision number in IEEE 754 format has the same components: a sign bit s, an 11-bit exponent e and a 52 bit fraction f. The exponential part is expressed in powers of 2 and the exponent is biased by adding 1023 to the value of e. The mantissa is normalized to be of the form 1.f. Since the integer part of the normalized mantissa is always 1, it doesn't have to be included in the representation. The number is thus represented as follows: (-1)s x 1.f x 2^e+1023.

For example, the number 5.0 can be expressed in binary as 101.0, which means 101.0 x 2^0, which in turn is equal to 1.010 x 2^2, obtained by dividing the mantissa by 4 and multiplying 2^0 by 4. Therefore, the normalized mantissa is 1.010 and the exponent 2. Fraction f is then .01000000.... The biased exponent is obtained by adding 1023 to e and is 1025, or 10000000001 in binary. The double precision IEEE representation of 5.0 is finally:

-----------------------------------------------------------------------------
|  0 | 10000000001 | 01000000000000000000000000000000000000000000000000000  |
-----------------------------------------------------------------------------

or 4014000000000000 in hexadecimal notation for short.

First Approximation

Given this representation, a first approximation to the square root of a number is obtained by dividing the exponent by 2. If the number is an even power of 2 such as 16 or 64, the exact root is obtained. If the number is an odd power of 2 such as 8 or 32, 1/SQRT(2) times the square root is obtained. In general, the result will be within a factor SQRT(2) of the true value.

Refining the Approximation

The Newton-Raphson method is often used to obtain a more accurate value for the root x of a function f(x) once an initial approximation x0 is given:

[1]

This becomes, in the case of the square root of n, = x2 - n:where f(x)

[2]

An excellent approximation to the square root starting with the initial approximation given above is obtained within 5 iterations using equation [2]. This algorithm is already pretty fast, but its speed is limited by the fact that each iteration requires a double-precision division which is the slowest PowerPC floating-point instruction with 32 cycles on the MPC601 (Motorola, 1993).

Eliminating Divisions

Another approach is to use equation [1] with the function.

In this case, equation [1] becomes:

[3]

There is still a division by n, but since n is constant (it's the original number whose root we want to find), it can be replaced by multiplying by 1/n, which can be calculated once before the beginning of the iteration process. The five 32-cycle divisions are thus replaced by this single division followed by 5 much faster multiplications (5 cycles each). This approach is approximately three times faster than the preceding one. However, care must be taken for large numbers since the term in x02 can cause the operation to overflow.

Use of a table

Finally, an approach that is even faster consists in using a table to obtain a more accurate first approximation. In order to do so, the range of possible values of fraction f (0 to ~1) is divided into 16 sub-ranges by using the first 4 bits of f as an index into a table which contains the first two coefficients of the Taylor expansion of the square root of the mantissa (1.0 to ~2) over that sub-range.

The Taylor expansion is given in general by:

[4]

the first two terms of which yield, in the case where f(x) = SQRT(x):

[5]

The square root of x is thus approximated by 16 straight-line segments. The table therefore contains the values of

A =

and B =

for each of the 16 sub-ranges as shown in Figure 1. This first approximation gives an accuracy of about 1.5 %.

Figure 1. Approximation by straight line segment.

To reach the desired accuracy of 15 digits, equation [2] is applied twice to the result of equation [5]. To avoid having to perform two divisions by repeating the iteration, the two iterations are folded together as follows, which contains only one division:

and

[6]

In order to perform these calculation, the exponent of x and n is reduced to -1 (1022 biased), so that floating-point operations apply only to the values of the mantissa and don't overflow if the exponent is very large. The value of these numbers will therefore be in the range 0.5 to 1.0 since the mantissa is in the range 1.0 to 2.0. If the original exponent was odd, the mantissa is multiplied by SQRT(2) before applying equation [6].

Finally, the original exponent divided by two is restored at the end.

The Code

The SQRoot function shown in Listing 1 has been implemented in CodeWarrior C/C++ version 10.

Listing 1: SQRoot.c

// On entry, fp1 contains a positive number between 2.22507385851E-308
// and 1.79769313486E308. On exit, the result is in fp1.

asm long double SQRoot(long double num);   // prototype

float Table[35] = {
0.353553390593, 0.707106781187, 0.364434493428, 0.685994340570,
0.375000000000, 0.666666666667, 0.385275875186, 0.648885684523,
0.395284707521, 0.632455532034, 0.405046293650, 0.617213399848,
0.414578098794, 0.603022689156, 0.423895623945, 0.589767824620,
0.433012701892, 0.577350269190, 0.441941738242, 0.565685424949,
0.450693909433, 0.554700196225, 0.459279326772, 0.544331053952,
0.467707173347, 0.534522483825, 0.475985819116, 0.525225731439,
0.484122918276, 0.516397779494, 0.492125492126, 0.508000508001,
1.414213562373, 0.000000000000, 0.000000000000 };

asm long double Sqrt(long double num) {

   lwz   r3,Table(rtoc)      // address of Table[]
   lhz   r4,24(sp)           // load
                             // Sign(1)+Exponent(11)+Mantissa(4)
   andi.   r5,r4,0xF         // keep only Mantissa(4)
   ori   r5,r5,0x3FE0        // exponent = -1+BIAS = 1022
   sth   r5,24(sp)           // save reduced number

   rlwinm   r5,r5,3,25,28    // take 8*Mantissa(4) as index
   lfd   fp1,24(sp)          // load reduced number
   lfsux   fp4,r5,r3         // load coefficient A
   lfs   fp5,4(r5)           // load coefficient B
   lfs   fp3,128(r3)         // load SQRT(2)
   fmr   fp2,fp1             // copy reduced number
   rlwinm.   r5,r4,31,18,28  // divide exponent by 2
   beq   @@2                 // if (exponent == 0) then done

   fmadd   fp2,fp2,fp5,fp4   // approximation SQRT(x) = A + B*x
   andi.   r4,r4,0x10        // check if exponent even
   beq   @@1                 // if (exponent even) do iteration
   fmul   fp2,fp2,fp3        // multiply reduced number by SQRT(2)
   fadd   fp1,fp1,fp1        // adjust exponent of original number

@@1:   fadd   fp3,fp2,fp2    // 2*x
   fmul   fp5,fp2,fp1        // x*n
   fadd   fp3,fp3,fp3        // 4*x
   fmadd   fp4,fp2,fp2,fp1   // x*x + n
   fmul   fp5,fp3,fp5        // 4*x*x*n
   fmul   fp6,fp2,fp4        // denominator = x*(x*x + n)
   fmadd   fp5,fp4,fp4,fp5   // numerator = (x*x + n)*(x*x + n) +
                             // 4*x*x*n
   fdiv   fp1,fp5,fp6        // double precision division
   andi.   r5,r5,0x7FF0      // mask exponent 
   addi   r5,r5,0x1FE0       // rectify new exponent

@@2:   sth   r5,132(r3)      // save constant C (power of 2) 
   lfd   fp2,132(r3)         // load constant C
   fmul   fp1,fp1,fp2        // multiply by C to replace exponent
   blr                       // done, the result is in fp1
}

Performance

The code presented above runs in less than 100 cycles, which means less than 1 microsecond on a 7200/75 Power Macintosh and is more than six times faster than the ROM code. The code could be modified to make use of the floating reciprocal square root estimate instruction (frsqrte) that is available on the MPC603 and MPC604 processors, and which has an accuracy of 5 bits. It is not available on the MPC601, however. The method used here could also be used to evaluate other transcendental functions.

Performance was measured by running the code a thousand times and calling a simple timing routine found in (Motorola, 1993), that we called myGetTime(). It uses the real-time clock of the MPC 601 processor (RTCU and RTCL registers) and is shown in Listing 2. The routine would have to be modified to run on MPC603 or MPC604 processors, since they don't have the same real-time clock mechanism.

The code doesn't support denormalized numbers (below 2.22507385851E-308). This could easily be implemented albeit at the cost of a slight reduction in performance.

Listing 2: myGetTime.c

asm long myGetTime()
   {
lp:    mfspr   r4,4           // RTCU
   mfspr   r3,5               // RTCL
   mfspr   r5,4               // RTCU again
    cmpw      r4,r5           // if RTCU has changed, try again
    bne      lp
    rlwinm   r3,r3,25,7,31    // shift right since bits 25-31 are
                              // not used
    blr                       // the result is in r3. 1 unit is
                              // worth 128 ns.
   }

To run the code, a very simple interface using the SIOUX library is provided in Listing 3.

Listing 3: main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fp.h>

void main()
   {
   long double   num, num2;
   long startTime, endTime, time;
   short i;


   do {
   printf("%2s","> ");           // caret
   scanf("%Lf",&num);           // read long double
   if (num < 0.0) num = 0.0;     // replace by 0.0 if negative
   startTime = myGetTime();
   for (i = 0; i < 1000; i++)    // repeat 1000 times
   num2 = SQRoot(num);              // call our function
   endTime = myGetTime();
   time = endTime - startTime;
   if (num > 1e-6 && num < 1e7)
      printf("%7s%Lf\n","root = ",num2);   // show result
   else
      printf("%7s%Le\n","root = ",num2);
   printf("%7s%d\n","time = ", time);      // show elapsed time
   }
   while (1);                              // repeat until Quit
   }

References

PowerPC 601 RISC Microprocessor User's Manual, Motorola MPC601UM/AD Rev 1, 1993.


The first three authors are undergraduate students in Computer Science at Université Laval in Québec, Canada. This work was done as an assignment in a course on Computer Architecture given by the fourth author.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Audio Hijack 3.7.3 - Record and enhance...
Audio Hijack (was Audio Hijack Pro) drastically changes the way you use audio on your computer, giving you the freedom to listen to audio when you want and how you want. Record and enhance any audio... Read more
CleanMyMac X 4.6.15 - Delete files that...
CleanMyMac makes space for the things you love. Sporting a range of ingenious new features, CleanMyMac lets you safely and intelligently scan and clean your entire system, delete large, unused files... Read more
Suitcase Fusion 21.2.1 - Font management...
Suitcase Fusion is the creative professional's font manager. Every professional font manager should deliver the basics: spectacular previews, powerful search tools, and efficient font organization.... Read more
Civilization VI 1.3.6 - Next iteration o...
Civilization® VI is the award-winning experience. Expand your empire across the map, advance your culture, and compete against history’s greatest leaders to build a civilization that will stand the... Read more
Dashlane 6.2042.0 - Password manager and...
Dashlane is an award-winning service that revolutionizes the online experience by replacing the drudgery of everyday transactional processes with convenient, automated simplicity - in other words,... Read more
Airfoil 5.9.2 - Send audio from any app...
Airfoil allows you to send any audio to AirPort Express units, Apple TVs, and even other Macs and PCs, all in sync! It's your audio - everywhere. With Airfoil you can take audio from any... Read more
VirtualBox 6.1.16 - x86 virtualization s...
VirtualBox is a family of powerful x86 virtualization products for enterprise as well as home use. Not only is VirtualBox an extremely feature rich, high performance product for enterprise customers... Read more
Xcode 12.1 - Integrated development envi...
Xcode includes everything developers need to create great applications for Mac, iPhone, iPad, and Apple Watch. Xcode provides developers a unified workflow for user interface design, coding, testing... Read more
FileZilla 3.51.0 - Fast and reliable FTP...
FileZilla (ported from Windows) is a fast and reliable FTP client and server with lots of useful features and an intuitive interface. Version 3.51.0: Bugfixes and minor changes: Fixed import of... Read more
KeyCue 9.8 - Displays all menu shortcut...
KeyCue has always been a handy tool for learning and remembering keyboard shortcuts. With a simple keystroke or click, KeyCue displays a table with all available keyboard shortcuts, system-wide... Read more

Latest Forum Discussions

See All

PUBG Mobile has provided yet another upd...
PUBG Mobile has been making a point of publicly mentioning all of their ongoing efforts to vanquish cheating from the popular battle royale. Today two teams within the company have provided updates on their progress. [Read more] | Read more »
Zombieland: AFK Survival is celebrating...
Zombieland: AFK Survival is currently celebrating its one-year anniversary. If you don't quite recognise the name that's because it initially launched as Zombieland: Double Tapper. Anyway, the game is celebrating turning one with two Halloween-... | Read more »
Distract Yourself With These Great Mobil...
There’s a lot going on right now, and I don’t really feel like trying to write some kind of pithy intro for it. All I’ll say is lots of people have been coming together and helping each other in small ways, and I’m choosing to focus on that as I... | Read more »
Genshin Impact Guide - Gacha Strategy: W...
If you're playing Genshin Impact without spending money, you'll always need to be looking for ways to optimize your play to maximize rewards without getting stuck in a position where you're tempted to spend. The most obvious trap here is the game'... | Read more »
Genshin Impact Adventurer's Guide
Hello and well met, fellow adventurers of Teyvat! Check out our all-in-one resource for all things Genshin Impact. We'll be sure to add more as we keep playing the game, so be sure to come back here to check for updates! [Read more] | Read more »
Genshin Impact Currency Guide - What...
Genshin Impact is great fun, but make no mistake: this is a gacha game. It is designed specifically to suck away time and money from you, and one of the ways the game does this is by offering a drip-feed of currencies you will feel compelled to... | Read more »
XCOM 2 Collection on iOS now available f...
The XCOM 2 Collection, which was recently announced to be coming to iOS in November, is now available to pre-order on the App Store. [Read more] | Read more »
Presidents Run has returned for the 2020...
IKIN's popular endless runner Presidents Run has returned to iOS and Android just in time for the 2020 election season. It will see players choosing their favourite candidate and guiding them on a literal run for presidency to gather as many votes... | Read more »
New update for Cookies Must Die adds new...
A new update for Rebel Twins’ platformer shooter Cookies Must Die is coming out this week. The update adds quite a bit to the game, including new levels and characters to play around with. [Read more] | Read more »
Genshin Impact Guide - How to Beat Pyro...
The end game of Genshin Impact largely revolves around spending resin to take on world bosses and clear domain challenges. These fights grant amazing rewards like rare artifacts and ascension materials for weapons and adventurers, but obviously... | Read more »

Price Scanner via MacPrices.net

Use our exclusive iPhone Price Trackers to fi...
Looking for a new Apple iPhone 12 or 12 Pro? Perhaps a deal on last year’s iPhone 11? Check out our iPhone Price Tracker here at MacPrices.net. We track new and clearance iPhone prices from Apple as... Read more
Weekend deal: $100 off 13″ MacBook Airs at Am...
Amazon has new 2020 13″ MacBook Airs on sale for $100 off Apple’s MSRP, starting at only $899. Their prices are the lowest available for new MacBooks from any Apple resellers. These are the same 13″... Read more
New 10.9″ 64GB Apple iPad Air on sale for $55...
Amazon has Apple’s new 2020 10.9″ 64GB WiFi iPad Air on sale today for $549.99 shipped. That’s $40 off MSRP. Pre-orders are available today at this discounted price, and Amazon states that the iPad... Read more
Get a clearance 2019 27″ 5K iMac for up to $5...
Apple has Certified Refurbished 2019 27″ 5K iMacs available starting at $1439 and up to $520 off their original MSRP. Apple’s one-year warranty is standard and shipping is free. The following... Read more
AT&T offers the Apple iPhone 11 for $10/m...
AT&T is offering Apple’s 64GB iPhone 11 for $10 per month, for customers opening a new line of service, no trade-in required. Discount is applied via monthly bill credits over a 30 month period.... Read more
Apple’s 2020 11″ iPad Pros on sale today for...
Apple reseller Expercom has new 2020 11″ Apple iPad Pros on sale for $50-$75 off MSRP, with prices starting at $749. These are the same iPad Pros sold by Apple in their retail and online stores: – 11... Read more
Did Apple Drop The Ball By Not Branding Its C...
EDITORIAL: 10.21.20 – In the branding game, your marketing strategy can either be a hit or a miss and the latter is the case for Apple when it missed out on an opportunity to brand its “SE” series of... Read more
27″ 6-core and 8-core iMacs on sale for up to...
Adorama has Apple’s 2020 27″ 6-core and 8-core iMacs on sale today for $50-$100 off MSRP, with prices starting at $1749. Shipping is free: – 27″ 3.1GHz 6-core iMac: $1749, save $50 – 27″ 3.3GHz 6-... Read more
Apple’s 16″ MacBook Pros are on sale for $300...
B&H Photo has 16″ MacBook Pros on sale today for $300-$350 off Apple’s MSRP, starting at $2099. Expedited shipping is free to many addresses in the US. Their prices are among the lowest available... Read more
Apple has 2020 13″ MacBook Airs available sta...
Apple has a full line of Certified Refurbished 2020 13″ MacBook Airs available starting at only $849 and up to $200 off the cost of new Airs. Each MacBook features a new outer case, comes with a... Read more

Jobs Board

Dental Receptionist - *Apple* Valley Clinic...
Dental Receptionist - Apple Valley Clinic + Job ID: 57314 + Department: Apple Valley Dental + City: Apple Valley, MN + Location: HP - Apple Valley Clinic Read more
*Apple* Mobility Specialist - Best Buy (Unit...
**788165BR** **Job Title:** Apple Mobility Specialist **Job Category:** Store Associates **Store Number or Department:** 001013-Virginia Commons-Store **Job Read more
Cub Foods - *Apple* Valley - Now Hiring Par...
Cub Foods - Apple Valley - Now Hiring Part Time! United States of America, Minnesota, Apple Valley Retail Post Date Oct 08, 2020 Requisition # 124800 Sign Up for Read more
*Apple* Mobility Specialist - Best Buy (Unit...
**784631BR** **Job Title:** Apple Mobility Specialist **Job Category:** Store Associates **Store Number or Department:** 000522-Baxter-Store **Job Description:** The Read more
Senior Data Engineer - *Apple* - Theorem, L...
Job Summary Apple is seeking an experienced, detail-minded data engineeringconsultant to join our worldwide business development and strategy team. If you are Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.