FORTRAN Benchmarks
Volume Number:7
Issue Number:1
Column Tag:Jörg's Folder

Absoft's MacFORTRAN II

By Jörg Langowski, MacTutor Editorial Board

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.

“Two FORTRANS for MPW”

You have read several columns on the FORTRAN compiler by Language Systems, which for a while was the only one to offer MPW support. Not for long, though; the creators of the first Fortran compiler for the Macintosh, Absoft Co., were not asleep and brought out their MPW FORTRAN, first MacFORTRAN/MPW and recently MacFortran II. A comparison between the two compilers was long overdue, and this month you will find those long-awaited benchmark figures, together with some insights into the code produced by the two compilers.

When I read the Whetstone benchmark figures in the Absoft ad, which were above those of Language Systems’ compiler by more than a factor of two, I was surprised, and curious to find out how they were able to do this. I asked for a sample copy, which they kindly provided, together with a corresponding T-Shirt, so that I can now run advertising for both Language Systems (odd days) and Absoft (even days).

The Absoft MacFORTRAN II package is very easy to set up and use. It comes with a well-written manual and an installation script for MPW 3.1. The manual is well structured and has an extensive index; the documentation for Absoft and LS Fortran is of comparable quality. An extensive set of examples is also provided, showing standard Fortran programming, toolbox calls, performance analysis and HyperCard interfacing.

Code optimization

The installation script sets up a menu that allows you to invoke the Commando interface to the compiler. That interface is almost indispensable; there are so many options that it is hard to remember them all, let alone type them in. Most of these options have to do with code optimization. Absoft lets you switch every available optimization on or off individually. The most important basic optimizations are turned on by the -O compiler switch. They include subexpression elimination, loop-invariant code removal, treating unchanged DATA-initialized variables as constants, inlining of intrinsic functions, and peephole optimization at the machine code level. These optimizations correspond approximately to what the Language Systems compiler does when you select -opt=3.

However, MacFortran II has more in store. Three more ways of optimizing the code are provided. First, subroutines may be ‘folded’ into the main code, eliminating call overhead (if they are defined in the same file). Second, loops may be expanded into sequences of instructions (‘loop unrolling’). This is approximately equivalent to expanding

do i=1,n
 a(i) = b(i) + c(i)
end do

into

do i=1,n,2
 a(i) = b(i) + c(i)
 a(i+1) = b(i+1) + c(i+1)
end do

assuming that n is even. This cuts down the number of loop index tests by a factor of two and allows for parallelization on machines that support it (on the Mac’s floating point processor, you can at least distribute the calculation over different FP registers).
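The unrolled loop above is only correct for even n. As a minimal sketch of how the odd case can be handled by hand, the routine below unrolls by two and then finishes any leftover element. The name vadd2 is made up for this illustration, and the code is not meant to describe what the Absoft compiler actually emits.

```fortran
      subroutine vadd2(a, b, c, n)
!     Hand-unrolled a(i) = b(i) + c(i), two elements per pass.
!     vadd2 is a hypothetical name used only for this sketch.
      implicit none
      integer n, i, nlast
      real a(n), b(n), c(n)
!     nlast is the largest even number not exceeding n
      nlast = n - mod(n, 2)
      do i = 1, nlast, 2
        a(i)   = b(i)   + c(i)
        a(i+1) = b(i+1) + c(i+1)
      end do
!     cleanup: exactly one element is left over when n is odd
      if (nlast .lt. n) a(n) = b(n) + c(n)
      return
      end
```

For n = 5 the main loop handles elements 1 to 4 in two passes and the cleanup statement handles element 5.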

The third optimization, called ‘strength reduction’, tries to replace integer multiplications in loops with integer additions where possible. For instance, the loop:

do i=1,n
 a(i) =i * m
end do

can also be written as

i1 = m
do i=1,n
 a(i) =i1
 i1 = i1 + m
end do

which will run faster if the addition takes less time than the multiplication.
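To check that the two forms really produce the same array, they can be wrapped in a pair of small routines, as in the sketch below. The names mulfill and addfill are invented for this illustration; the code only restates the two loops shown above.

```fortran
      subroutine mulfill(a, n, m)
!     Direct form: one integer multiplication per iteration.
!     mulfill is a hypothetical name used only for this sketch.
      implicit none
      integer n, m, i
      integer a(n)
      do i = 1, n
        a(i) = i * m
      end do
      return
      end

      subroutine addfill(a, n, m)
!     Strength-reduced form: the product i*m is carried along
!     in i1 and updated with one addition per iteration.
!     addfill is a hypothetical name used only for this sketch.
      implicit none
      integer n, m, i, i1
      integer a(n)
      i1 = m
      do i = 1, n
        a(i) = i1
        i1 = i1 + m
      end do
      return
      end
```

Calling both routines with the same n and m should fill the two arrays identically. Note that the reduced loop performs one addition whose result is never stored (the last update of i1).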

These last three optimization strategies are often applied already in the source code by Fortran programmers. As an example of loop unrolling, consider the following piece of code from a frequently-employed matrix/vector multiplication routine from the public domain Linpack library:

      jmin = j+16
      do 60 j = jmin, n2, 16
         do 50 i = 1, n1
            y(i) = ((((((((((((((( (y(i))
     $             + x(j-15)*m(i,j-15)) + x(j-14)*m(i,j-14))
     $             + x(j-13)*m(i,j-13)) + x(j-12)*m(i,j-12))
     $             + x(j-11)*m(i,j-11)) + x(j-10)*m(i,j-10))
     $             + x(j- 9)*m(i,j- 9)) + x(j- 8)*m(i,j- 8))
     $             + x(j- 7)*m(i,j- 7)) + x(j- 6)*m(i,j- 6))
     $             + x(j- 5)*m(i,j- 5)) + x(j- 4)*m(i,j- 4))
     $             + x(j- 3)*m(i,j- 3)) + x(j- 2)*m(i,j- 2))
     $             + x(j- 1)*m(i,j- 1)) + x(j)   *m(i,j)
   50    continue
   60 continue

Here, sixteen of the basic operations are performed in one pass of the loop. Of course, it is necessary to provide code that deals with array dimensions which are not an integer multiple of 16 (or 8, 4, 2). With the loop unrolling option of Absoft’s Fortran, the compiler does this for you.

The increase in execution speed that loop unrolling and subroutine folding provide depends very much on the type of code being operated upon. The Whetstone benchmark, for instance, is strongly affected by subroutine folding. This is not surprising, since the benchmark consists mainly of calls to small subroutines executed over and over again in loops. I have reprinted the Whetstone benchmark in Listing 1, so you can actually see what it is doing.
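What folding amounts to can be sketched by hand on the procedure-call module of the Whetstone program, which calls the small routine P3 in a loop. In the sketch below, callloop keeps the call and foldloop has the body of P3 written in its place. Both names are made up, and for brevity T and T2 are passed as arguments instead of living in a COMMON block as in the listing; this illustrates the idea only and is not the compiler's actual output.

```fortran
      subroutine p3(x, y, z, t, t2)
!     Body of the Whetstone routine P3, with t and t2 passed as
!     arguments (an assumption made to keep the sketch short).
      implicit none
      real x, y, z, t, t2, ax, ay
      ax = x
      ay = y
      ax = t*(ax + ay)
      ay = t*(ax + ay)
      z = (ax + ay)/t2
      return
      end

      subroutine callloop(n, t, t2, z)
!     One subroutine call per iteration (hypothetical name).
      implicit none
      integer n, i
      real t, t2, x, y, z
      x = 1.0
      y = 1.0
      z = 1.0
      do i = 1, n
        call p3(x, y, z, t, t2)
      end do
      return
      end

      subroutine foldloop(n, t, t2, z)
!     Same loop with the body of p3 folded in (hypothetical name).
      implicit none
      integer n, i
      real t, t2, x, y, z, ax, ay
      x = 1.0
      y = 1.0
      z = 1.0
      do i = 1, n
        ax = x
        ay = y
        ax = t*(ax + ay)
        ay = t*(ax + ay)
        z = (ax + ay)/t2
      end do
      return
      end
```

Both loops do the same arithmetic; the folded version simply never pays for pushing arguments, jumping to the routine and returning, which is why a benchmark dominated by such calls responds so strongly.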

If you activate loop unrolling and subroutine folding in addition to the basic optimizations, Absoft Fortran runs the Whetstone benchmark at 2466 KWhet/s on a Mac IIx. Language Systems, at its highest optimization level, reaches only 1035 KWhet/s (see Table 1). In real life, however, where benchmarks don’t necessarily apply, the speed increase is not quite so dramatic, although still substantial. Absoft Fortran with the basic optimizations still executes the Whetstone program 40% faster than LS Fortran.

The Linpack Benchmark, which mainly tests performance in operations on big matrices, executes about 30% faster under Absoft than under LS Fortran. Note that loop unrolling actually slows down execution here, because the time-consuming calculations are already unrolled in the source code of the Linpack package. This shows that you have to test this option carefully before using it, to see whether it really increases the speed.

Table 1: Language Systems (LSF) and Absoft (ABF) Fortran performance on different benchmarks

Compiler           Whetstone   Linpack    Matmult [s]
                   [KWhet/s]   [MFlops]   (setup / multiply)

LSF opt=0          929         0.077      0.1 / 4.80
LSF opt=3          1035        0.103      0.05 / 2.45
ABF no opt.        1265        0.121      0.07 / 3.93
ABF -O             1264        0.134      0.05 / 2.17
ABF -O -h2         1317        0.132      0.07 / 1.93
ABF -O -h4         1345        -          0.05 / 1.91
ABF -O -h8         1346        -          0.07 / 1.95
ABF -O -k          770 (!)     0.126      0.05 / 2.18
ABF -O -Z -G -h2   2466 (!)    -          0.05 / 2.16

Why is the code created by the Absoft compiler - even with only the basic optimizations - faster than the LS Fortran code? To understand this, let’s look at the assembler code generated for the matrix multiplication benchmark example from MacTutor Vol. 5, #8. Listing 2 shows the central loop of the multiplication routine for various optimization settings.

You can clearly see that both compilers throw away a lot of redundant code when the optimization is turned on (LSF, opt=3 vs. opt=0; ABF, -O vs. no options). The code generated by both compilers is very tight, but Absoft keeps many more variables in registers. Looking at the complete routine, which is not printed here for space reasons, reveals that Absoft uses all registers except A1, A5, and D1. Language Systems keeps most variables in a local stack frame and must access memory to get them; also, it does not fully exploit the available addressing modes of the 68020, even with the -68020 option turned on. This might be a compromise made to allow easier generation of 68000 and 68020 code by the same compiler. Since Absoft Fortran works only on 68020/30-based machines, it could probably be designed to exploit that CPU more efficiently. The floating point registers are used to their full extent by both compilers. Note that Absoft does the addition (intermediate calculation in the central loop) in extended precision before converting back to double precision for the final result.

The difference in A and D register use is mainly due to the fact that LS Fortran always generates code that is fully linker-compatible with the other MPW compilers. This means it follows the register-saving conventions which require a subroutine to leave the contents of all registers except A0, A1, D0, D1, and D2 untouched. As long as one just runs a Fortran program, it is not necessary to follow these conventions for each subroutine, but when Fortran routines are to be called from Pascal, it is. Absoft Fortran has an option (-k) that saves and restores A2-A5/D3-D7 at the beginning and the end of each routine, to keep full compatibility with the Macintosh calling conventions. For running plain Fortran code, one does not need this option. Including the register-saving code in Absoft Fortran decreases the Whetstone performance dramatically (as you’ve seen above, the Whetstone figures are very much influenced by the efficiency of subroutine calling). The other benchmarks are not much affected by this option, since subroutine calls make up only a small amount of the total execution time.

Summarizing the benchmark figures, with Linpack being the most significant for floating-point calculations, we can grant Absoft MacFortran II a fair advantage of about 25-35% in execution speed over Language Systems Fortran 2.1. This agrees with practical observations; other users of both compilers with whom I have discussed this generally observe speed increases of approximately 30% for pure Fortran code on the Absoft compared to the Language Systems compiler. These figures may be quite different for applications that do a lot of cross-language calls.

The loop unrolling options (-h2, -h4, -h8) do not seem to affect the execution speed too much; there is a 10% improvement for the matrix multiplication, and no improvement at all for the Linpack benchmark whose code is already unrolled. The Whetstone benchmark is improved by only about 5%. To show you what the loop unrolling is doing, I have also listed the assembly output for the double (-U) and quadruple (-h4) unrolling.

MacFortran II’s Macintosh interface

Compared to the excellent Macintosh interface that Language Systems Fortran offers, the Absoft MRWE (Macintosh Runtime Window Environment) seems a little rough. It provides you with a resizable terminal window, keyboard input, and the possibility to save the output to a file at the end of the program. The elegant menu setup that Language Systems has, where you can attach Fortran subroutines to user-defined menu items, is missing here.

The good news is that Absoft provides the full source of MRWE, which gives the user the opportunity to read and understand what the code is doing (all Fortran), and possibly modify it. What I also liked about MRWE is that the window setup, dialogs, etc. are kept in resources, where they should be. Language Systems takes the easy way out by hard-coding the window environment into its code. Although LSF’s runtime interface is very powerful, it would be even better if it were properly resource-based.

Both Fortrans offer optional inclusion of background-processing code under MultiFinder. The amount of backgrounding can be controlled by the user in both cases.

Absoft does toolbox calls in much the same way as LS Fortran, by simply calling the routine and adding VAL modifiers to the parameters where call by value is needed. It is not possible to use Pascal calling conventions (as it is in LS Fortran through a PEXTERNAL declaration). The usage of handles and pointers is less convenient than in LS Fortran, where you can use STRUCTURE declarations; that extension is missing in Absoft Fortran and has to be simulated by lots of EQUIVALENCE statements.
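To give an idea of what simulating a record with EQUIVALENCE looks like, here is a minimal sketch for a four-field rectangle of 16-bit integers laid out as top, left, bottom, right (the field order of a QuickDraw Rect). It uses only generic Fortran: the INTEGER*2 type is a common extension and is assumed to be available, and the routine name mkrect is made up. The declarations actually shipped with either compiler may look different.

```fortran
      subroutine mkrect(t, l, b, r, rect)
!     Fill a 4-word rectangle through named fields.
!     mkrect is a hypothetical helper, not a toolbox routine.
      implicit none
      integer t, l, b, r, i
      integer*2 rect(4)
      integer*2 buf(4), top, left, bottom, right
!     Overlay the named fields on consecutive words of buf,
!     in the order a record declaration would lay them out.
      equivalence (buf(1), top), (buf(2), left)
      equivalence (buf(3), bottom), (buf(4), right)
      top = t
      left = l
      bottom = b
      right = r
!     buf now holds the four fields in record order
      do i = 1, 4
        rect(i) = buf(i)
      end do
      return
      end
```

With a STRUCTURE declaration the four fields would be named once and accessed directly; here every record type needs its own set of overlay variables, which is where the ‘lots of EQUIVALENCE statements’ come from.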

For toolbox calling convenience and the Macintosh runtime environment, Language Systems clearly has the advantage.

Summary

Absoft’s MacFortran II is the fastest Fortran for the Macintosh currently available, being about 30% faster than the runner-up, LS Fortran 2.1. For pure number crunching this speed difference might be relevant. If you want easy addition of a Macintosh ‘look and feel’ to your Fortran program, you might prefer LSF with its capability to associate subroutines with menu items. This could in principle be done in Absoft Fortran, but only by modifying the MRWE code, which would be more difficult. If you do a lot of cross-language calling, LSF may have the advantage over Absoft. Another advantage of LSF is its full compatibility with VAX Fortran; for adapting VAX programs to Absoft, one still needs to make some changes now and then, e.g. in I/O statements.

The ideal setup would again be to compile computation-intensive parts in Absoft Fortran with the Pascal-calling option on (taking care not to have too many calls to small subroutines), and run the main program from the Language Systems runtime environment. Well, maybe the two products will move closer to each other in time, Language Systems optimizing its code even more and Absoft adding some convenience to its Mac interface.

Final words

Finally, I want to apologize to Steve Hawley (the author of SteveForth, which I reviewed a while ago). I did not see his letter before it was printed in this magazine. I had mistakenly assumed that SteveForth was freely available to anyone interested in testing it, because it was on a publicly accessible FTP host (it was actually removed soon after I had discovered it). Sorry for giving the impression that this is a finished product; SteveForth is not public domain or shareware, but Steve Hawley might make it available later when he gets around to doing some more work on it. I still think that his Forth implementation contains some very interesting ideas, and would be happy to see and review a later version.

At the last minute, I found some very useful information for those of you interested in downloading freely distributable Macintosh software from the Internet. On the Info-mac mailing list, the following table appeared:

Date: Sun, 11 Nov 90 13:03:56 CST

From: ST5845%SIUCVMB.BITNET@forsythe.stanford.edu

Subject: ftp sites

Here is the list of anonymous FTP sites I promised to share with the net a few weeks ago:

List of Anonymous FTP sites with Macintosh Archives:

------------------------------------------------------------

apple.com 130.43.2.2 /pub/dts/mac

arisia.xerox.com 13.1.100.206 sunfixes, mac, LispUsers, tcp/ip,

ba.excelan.com 130.57.8.6 misc. (looking for suggestions)

bnlux0.bnl.gov 130.199.128.1 looking for suggestions

boombox.micro.umn.edu 128.101.95.95 POP2 email(hypercard->unix host)

brownvm.brown.edu 128.148.128.40 mac

bu.edu 128.197.2.6 RFCs. mail utils, games source, etc.

cc.sfu.ca 128.189.32.250 msdos, mac

citi.umich.edu 35.1.128.16 pathalias, CITI macIP, webster

doc.cso.uiuc.edu 128.174.73.30 msdos (pcsig), mac

elbereth.rutgers.edu 128.6.4.61 sci-fic works, startrek guides,

f.ms.uky.edu 128.163.128.6 mac, msdos, unix-pc

funet.fi 128.214.1.1

genbank.bio.net 134.172.1.160 National Repository for Gene

grape.ecs.clarkson.edu 128.153.13.196 /f/gif

hubcap.clemson.edu 192.5.219.1 /pub/gif

indri.primate.wisc.edu 128.104.230.11 macintosh TransSkel TransDisplay

ix1.cc.utexas.edu 128.83.1.21 /pub/macintosh

merlin.cs.purdue.edu 128.10.2.3 ConcurrenC, Xinu, mac, GIF

net.bio.net 128.92.192.252 /pub/mac

net1.ucsd.edu 128.54.16.10 mac

nyssa.cs.orst.edu 128.193.32.17 GIF, games, misc.

oswego.oswego.edu 129.3.1.1 GNU, mac, kermit

p6xje.ldc.lu.se 130.235.133.7 NCSA telnet 2.2ds, PC networking

pine.circa.ufl.edu 128.227.128.55 this list, RFCs, Internet Worm

polyslo.calpoly.edu 129.65.17.1 Hitchers guide 2 INET:Email list

rascal.ics.utexas.edu 128.83.144.1 /mac

sally.cs.utexas.edu 128.83.1.21 /mac

ssyx.ucsc.edu 128.114.133.1 /pub/mac-misc /pub/startrek

sdres.isd.usgs.gov 130.11.1.2 U.S. Geological Survey public files

sumex-aim.stanford.edu 36.44.0.6 mac archives, Mycin (sun4), imap

sun.cnuce.cnr.it 192.12.192.4 atalk, ka9q, GNU

surya.waterloo.edu 129.97.129.72 /images

tank.uchicago.edu 128.135.4.27 mac

tolsun.oulu.fi 128.214.5.6 amiga, atari, c64, msdos, mac, irc

topaz.rutgers.edu 128.6.4.194 amiga, others, too much to list

trwind.trw.com 129.4.16.70 NNStat,mac, named, sun-utils

tut.fi 128.214.1.2 Images, lots of misc. unix

ucbvax.berkeley.edu 128.32.137.3 /pub/mac

umaxc.weeg.uiowa.edu 128.255.64.80 NCSA telnet, sendmail

umn-cs.cs.umn.edu 128.101.224.1 Sendmail, vectrex, mac, unix-pc,

utsun.s.u-tokyo.ac.jp 133.11.7.250 Japanese PD, msdos, mac, unix, etc.

uwasa.fi 128.214.12.3 mac, pc, suntools, unix, vms

uxa.cso.uiuc.edu 128.174.2.1 mac, msdos (pcsig)

uxe.cso.uiuc.edu 128.174.5.54 /mac/pc/gifs/pc/grape

vega.hut.fi 130.233.200.42 msdos, mac, Kermit, fusion docs,

watmath.waterloo.edu 129.97.128.1 lots of stuff

whitechapel.media.mit.edu 18.85.0.125 OBVIUS, macnh

wpi.wpi.edu 130.215.24.1 dspl, anime, fusion, mac, GNU, ash,

wsmr-simtel20.army.mil 26.2.0.74 msdos, unix, cpm, mac (tenex)

wuarchive.wustl.edu 128.252.135.4 GNU,X.11R3,GIF,info-mac, 4.3BSD

zaphod.ncsa.uiuc.edu 128.174.20.50 NCSA Telnet source, Mathematica

- also 128.174.25.50

Those of you with access to Internet will know how to download files from those sites. Those with Bitnet access can send mail to BITFTP@PUCC which is the FTP gateway to Internet from Bitnet. To find out how to use that service, send a HELP message to BITFTP.

Listing 1: Benchmarks
Matrix multiplication

      program matbench
      real*8 a(50,50),b(50,50),c(50,50)

      time = second(0.)
      do i=1,50
        do j=1,50
          a(i,j) = i + j*0.01
          b(i,j) = a(i,j)
        end do
      end do
      time = second(0.) - time
      write (*,*) "Time to set up matrices:",time," seconds"

      time = second(0.)
      call mat_mult(c,50,a,50,b,50,50,50,50)
      time = second(0.) - time
      write (*,*) "Time to multiply matrices:",time," seconds"

      pause
      end
   
      subroutine mat_mult(c,nc,a,na,b,nb,n1,n2,n3)
c      sets c=a.b; c must be different from a or b
c      na,nb,nc are first dimensions
c      n1 n2 n3 are problem dimensions
c      c is n1xn3
c      a    n1 n2
c      b    n2 n3
      real*8 c(nc,n3),a(na,n2),b(nb,n3)

      do k=1,n3
        do i = 1,n1
        c(i,k) = 0d0
        end do
        do j=1,n2
        do i=1,n1
        c(i,k) = c(i,k)+a(i,j)*b(j,k)
        end do
        end do
      end do
      return
      end

      FUNCTION SECOND(X)
      CALL UTILIZ(TIME)
      SECOND=TIME
      RETURN
      END

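      SUBROUTINE UTILIZ(TIME)
c      LONG(362) reads the long word at address 362 ($16A), the
c      Ticks low-memory global; one tick is 1/60 second
      TIME = LONG(362)/60.0
      END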
Whetstone benchmark

C
      PROGRAM WHETSTONE
      DIMENSION E1(4)
      COMMON /A/ E1,J,K,L
      COMMON /B/ T,T2
C     SYNTHETIC BENCHMARK  BY CURNOW/WICHMANN
C     ***************************************
C     INITIALIZE CONSTANTS
      T=0.499975
      T1=0.50025
      T2=2.0
C     READ VALUE OF I, CONTROLLING TOTAL WEIGHT:
C     IF I=10 THE TOTAL WEIGHT IS ONE MILLION WHETSTONE INSTRUCTIONS
C
      I=10
C
      CALL UTILIZ(TT1)
      CPU1=SECOND(Z)
      II=I
      N1=0
      N2=12*I
      N3=14*I
      N4=345*I
      N5=0
      N6=210*I
      N7=32*I
      N8=899*I
      N9=616*I
      N10=0
      N11=93*I
C     MODULE 1: SIMPLE IDENTIFIERS
      JJ=100
99999 CONTINUE
      X1=1.0
      X2=-1.0
      X3=-1.0
      X4=-1.0
      IF (N1) 12,15,12
12    CONTINUE
      DO 10 I=1,N1
      X1=( X1+X2+X3-X4)*T
      X2=( X1+X2-X3+X4)*T
      X3=( X1-X2+X3+X4)*T
      X4=(-X1+X2+X3+X4)*T
10    CONTINUE
X     CALL POUT(N1,N1,N1,X1,X2,X3,X4)
15    CONTINUE
C     MODULE 2: ARRAY ELEMENTS
      E1(1)=1.0
      E1(2)=-1.0
      E1(3)=-1.0
      E1(4)=-1.0
      DO 20 I=1,N2
      E1(1)=( E1(1)+E1(2)+E1(3)-E1(4))*T
      E1(2)=( E1(1)+E1(2)-E1(3)+E1(4))*T
      E1(3)=( E1(1)-E1(2)+E1(3)+E1(4))*T
      E1(4)=(-E1(1)+E1(2)+E1(3)+E1(4))*T
20    CONTINUE
X     CALL POUT(N2,N3,N2,E1(1),E1(2),E1(3),E1(4))
C     MODULE 3: ARRAY AS PARAMETER
      DO 30 I=1,N3
      CALL PA(E1)
30    CONTINUE
X     CALL POUT(N3,N2,N2,E1(1),E1(2),E1(3),E1(4))
C     MODULE 4: CONDITIONAL JUMPS
      J=1
      DO 40 I=1,N4
      IF(J.EQ.1) GOTO 42
41    J=2
      GO TO 43
42    J=3
43    IF(J.GT.2) GOTO 45
44    J=0
      GO TO 46
45    J=1
46    IF(J.LT.1) GOTO 48
47    J=1
      GO TO 40
48    J=0
40    CONTINUE
X     CALL POUT(N4,J,J,X1,X2,X3,X4)
C     MODULE 5: OMITTED
C     MODULE 6: INTEGER ARITHMETIC
      J=1
      K=2
      L=3
      DO 60 I=1,N6
      J=J*(K-J)*(L-K)
      K=L*K-(L-J)*K
      L=(L-K)*(K+J)
      E1(L-1)=J+K+L
      E1(K-1)=J*K*L
60    CONTINUE
X     CALL POUT(N6,J,K,E1(1),E1(2),E1(3),E1(4))
C     MODULE 7: TRIG. FUNCTIONS
      X=0.5
      Y=0.5
      DO 70 I=1,N7
      X=T*ATAN(T2*SIN(X)*COS(X)/(COS(X+Y)+COS(X-Y)-1.0))
      Y=T*ATAN(T2*SIN(Y)*COS(Y)/(COS(X+Y)+COS(X-Y)-1.0))
70    CONTINUE
X     CALL POUT(N7,J,K,X,X,Y,Y)
C     MODULE 8: PROCEDURE CALLS
      X=1.0
      Y=1.0
      Z=1.0
      DO 80 I=1,N8
      CALL P3(X,Y,Z)
80    CONTINUE
X     CALL POUT(N8,J,K,X,Y,Z,Z)
C     MODULE 9: ARRAY REFERENCES
      J=1
      K=2
      L=3
      E1(1)=1.0
      E1(2)=2.0
      E1(3)=3.0
      DO 90 I=1,N9
      CALL P0
90    CONTINUE
X     CALL POUT(N9,J,K,E1(1),E1(2),E1(3),E1(4))
C     MODULE 10: INTEGER ARITHMETIC
      J=2
      K=3
      IF (N10) 97,105,97
97    CONTINUE
      DO 100 I=1,N10
      J=J+K
      K=J+K
      J=K-J
      K=K-J-J
100   CONTINUE
X     CALL POUT(N10,J,K,X1,X2,X3,X4)
105   CONTINUE
C     MODULE 11: STANDARD FUNCTIONS
      X=0.75
      DO 110 I=1,N11
      X=SQRT(EXP(ALOG(X)/T1))
110   CONTINUE
      JJ=JJ-1
      IF (JJ.GT.0) GOTO 99999
X     CALL POUT(N11,J,K,X,X,X,X)
      CPU2=SECOND(Z)
      CPU2=1000000.0/FLOAT(II)/(CPU2-CPU1)
      WRITE(*,2) CPU2
2     FORMAT(/// " TOTAL WEIGHT:" ,F10.3," (IN THOUSANDS OF WHETSTONE",
     *       " INSTRUCTIONS)")
       CALL UTILIZ(TT2)
       TTT=TT2-TT1
       WRITE(*,7777)TTT
7777   FORMAT(' TIME TAKEN  ',F12.4)
      END
      SUBROUTINE PA(E)
      DIMENSION E(4)
      COMMON /B/ T,T2
      J=0
100   E(1)=( E(1)+E(2)+E(3)-E(4))*T
      E(2)=( E(1)+E(2)-E(3)+E(4))*T
      E(3)=( E(1)-E(2)+E(3)+E(4))*T
      E(4)=(-E(1)+E(2)+E(3)+E(4))/T2
      J=J+1
      IF (J-6) 100,105,105
105   CONTINUE
      RETURN
      END
      SUBROUTINE P0
      DIMENSION E1(4)
      COMMON /A/ E1,J,K,L
      E1(J)=E1(K)
      E1(K)=E1(L)
      E1(L)=E1(J)
      RETURN
      END
      SUBROUTINE P3(X,Y,Z)
      COMMON /B/ T,T2
      AX=X
      AY=Y
      AX=T*(AX+AY)
      AY=T*(AX+AY)
      Z=(AX+AY)/T2
      RETURN
      END
      SUBROUTINE POUT(N,J,K,X1,X2,X3,X4)
      WRITE(*,4) N
      WRITE(*,4) J
      WRITE(*,4) K
      WRITE(*,3) X1
      WRITE(*,3) X2
      WRITE(*,3) X3
      WRITE(*,3) X4
3     FORMAT(1X,E25.15)
4     FORMAT(1X,I10)
      RETURN
      END

      FUNCTION SECOND(X)
      CALL UTILIZ(TIME)
      SECOND=TIME
      RETURN
      END

      SUBROUTINE UTILIZ(TIME)
      TIME = LONG(362)/60.0
      END

Listing 2: Assembly listing of inner loops of matrix multiply routine


LS Fortran, no optimization

L10009  EQU *
 ;  File "matmult.f";  Line         16
 MOVE.L $FFFFFFF8(A6),D1
 SUB.L  $FFFFFFAC(A6),D1
 ASL.L  #3,D1
 MOVE.L $FFFFFFF4(A6),D2
 SUB.L  $FFFFFFB8(A6),D2
 MULS.L $FFFFFFB4(A6),D2
 ADD.L  D1,D2
 MOVE.L D2,$FFFFFF84(A6)
 MOVE.L $FFFFFFF8(A6),D1
 SUB.L  $FFFFFFAC(A6),D1
 ASL.L  #3,D1
 MOVE.L $FFFFFFF4(A6),D2
 SUB.L  $FFFFFFB8(A6),D2
 MULS.L $FFFFFFB4(A6),D2
 ADD.L  D1,D2
 MOVE.L D2,$FFFFFF68(A6)
 MOVE.L $FFFFFFF8(A6),D1
 SUB.L  $FFFFFFC4(A6),D1
 ASL.L  #3,D1
 MOVE.L $FFFFFFFC(A6),D2
 SUB.L  $FFFFFFD0(A6),D2
 MULS.L $FFFFFFCC(A6),D2
 ADD.L  D1,D2
 MOVE.L D2,$FFFFFF6C(A6)
 MOVE.L $FFFFFFFC(A6),D1
 SUB.L  $FFFFFFDC(A6),D1
 ASL.L  #3,D1
 MOVE.L $FFFFFFF4(A6),D2
 SUB.L  $FFFFFFE8(A6),D2
 MULS.L $FFFFFFE4(A6),D2
 ADD.L  D1,D2
 MOVE.L D2,$FFFFFF70(A6)
 MOVEA.L $0020(A6),A0
 ADDA.L $FFFFFF6C(A6),A0
 FMOVE.D (A0),FP7
 MOVEA.L $0018(A6),A1
 ADDA.L $FFFFFF70(A6),A1
 FMUL.D (A1),FP7
 MOVEA.L $0028(A6),A1
 ADDA.L $FFFFFF68(A6),A1
 FADD.D (A1),FP7
 MOVEA.L $0028(A6),A1
 ADDA.L $FFFFFF84(A6),A1
 FMOVE.D FP7,(A1)
 ;  File "matmult.f";  Line         17
 ADDQ.L #1,$FFFFFFF8(A6)
 SUBQ.L #1,D7
 BGT    L10009

LS Fortran, -opt=3 optimization

L10009  EQU *
 ;  File "matmult.f";  Line         16
 MOVE.L $FFFFFF5C(A6),$FFFFFF58(A6)
 MOVE.L $FFFFFF64(A6),$FFFFFF40(A6)
 MOVEA.L $0020(A6),A0
 ADDA.L $FFFFFF60(A6),A0
 FMOVE.D (A0),FP7
 MOVEA.L $0018(A6),A1
 ADDA.L $FFFFFF40(A6),A1
 FMUL.D (A1),FP7
 MOVEA.L $0028(A6),A1
 ADDA.L $FFFFFF58(A6),A1
 FADD.D (A1),FP7
 FMOVE.D FP7,(A1)
 ;  File "matmult.f";  Line         17
 MOVEQ  #$0008,D1
 ADD.L  D1,$FFFFFF5C(A6)
 ADD.L  D1,$FFFFFF60(A6)
 ADDQ.L #1,$FFFFFFF8(A6)
 SUBQ.L #1,D7
 BGT.S  L10009

Absoft Fortran, no optimization

L14:

;                c(i,k) = c(i,k)+a(i,j)*b(j,k)

 move.l d3,d2
 sub.l  #$0001,d2
 move.l (20,a7),d4
 move.l d4,d6
 sub.l  #$0001,d6
 move.l (a7),d0
 muls.l d0,d6
 add.l  d6,d2
 move.l d3,d6
 sub.l  #$0001,d6
 move.l (36,a7),d5
 move.l d5,d7
 sub.l  #$0001,d7
 muls.l (8,a7),d7
 add.l  d7,d6
 sub.l  #$0001,d5
 move.l d4,d7
 sub.l  #$0001,d7
 muls.l (16,a7),d7
 add.l  d7,d5
 move.l (200,a7),a2
 fmove.d (a2,d6.l*8),fp2
 move.l (208,a7),a3
 fmove.d (a3,d5.l*8),fp3
 fmul.x fp3,fp2
 move.l (192,a7),a4
 fmove.d (a4,d2.l*8),fp4
 fadd.x fp2,fp4
 move.l d3,d2
 sub.l  #$0001,d2
 move.l d4,d5
 sub.l  #$0001,d5
 muls.l d0,d5
 add.l  d5,d2
 fmove.d fp4,(a4,d2.l*8)
 add.l  #$0001,d3
 move.l (44,a7),d6
 move.l d6,d7
 sub.l  #$0001,d7

;            end do

 move.l d7,(44,a7)
;  loop bottom branch
 tst.l  d7
 bgt    L14

Absoft Fortran, -O optimization

L14:
 move.l d6,d7
 sub.l  #$0001,d7
 move.l d7,d2
 add.l  (236,a7),d2
 add.l  (240,a7),d7
 fmove.d (a2,d7.l*8),fp2
 fmul.d (248,a7),fp2
 fmove.d (a3,d2.l*8),fp3
 fadd.x fp2,fp3
 fmove.d fp3,(a3,d2.l*8)
 add.l  #$0001,d6
 sub.l  #$0001,d4

;            end do

 tst.l  d4
 bgt    L14

Absoft Fortran, -O -U optimization

L27:
 move.l d4,d2
 sub.l  #$0001,d2
 move.l d2,d7
 add.l  d3,d7
 add.l  d0,d2
 fmove.d (a2,d2.l*8),fp2
 fmul.x fp3,fp2
 fmove.d (a3,d7.l*8),fp4
 fadd.x fp2,fp4
 fmove.d fp4,(a3,d7.l*8)
 move.l d4,d2
 add.l  #$0001,d2
 move.l d2,d4
 sub.l  #$0001,d4
 move.l d4,d7
 add.l  d3,d7
 add.l  d0,d4
 fmove.d (a2,d4.l*8),fp5
 fmul.x fp3,fp5
 fmove.d (a3,d7.l*8),fp6
 fadd.x fp5,fp6
 fmove.d fp6,(a3,d7.l*8)
 move.l d2,d4
 add.l  #$0001,d4
 sub.l  #$0002,d6

;            end do

 tst.l  d6
 bgt    L27
 move.l d4,(28,a7)

Absoft Fortran, -O -h4 optimization

L27:
 move.l d4,d2
 sub.l  #$0001,d2
 move.l d2,d7
 add.l  d3,d7
 add.l  d0,d2
 fmove.d (a2,d2.l*8),fp2
 fmul.x fp3,fp2
 fmove.d (a3,d7.l*8),fp4
 fadd.x fp2,fp4
 fmove.d fp4,(a3,d7.l*8)
 move.l d4,d2
 add.l  #$0001,d2
 move.l d2,d4
 sub.l  #$0001,d4
 move.l d4,d7
 add.l  d3,d7
 add.l  d0,d4
 fmove.d (a2,d4.l*8),fp5
 fmul.x fp3,fp5
 fmove.d (a3,d7.l*8),fp6
 fadd.x fp5,fp6
 fmove.d fp6,(a3,d7.l*8)
 add.l  #$0001,d2
 move.l d2,d4
 sub.l  #$0001,d4
 move.l d4,d7
 add.l  d3,d7
 add.l  d0,d4
 fmove.d (a2,d4.l*8),fp7
 fmul.x fp3,fp7
 fmove.d (a3,d7.l*8),fp0
 fadd.x fp7,fp0
 fmove.d fp0,(a3,d7.l*8)
 add.l  #$0001,d2
 move.l d2,d4
 sub.l  #$0001,d4
 move.l d4,d7
 add.l  d3,d7
 add.l  d0,d4
 fmove.d (a2,d4.l*8),fp2
 fmul.x fp3,fp2
 fmove.d (a3,d7.l*8),fp1
 fadd.x fp2,fp1
 fmove.d fp1,(a3,d7.l*8)
 move.l d2,d4
 add.l  #$0001,d4
 sub.l  #$0004,d6

;            end do

 tst.l  d6
 bgt    L27

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

SpamSieve 2.9.37 - Robust spam filter fo...
SpamSieve is a robust spam filter for major email clients that uses powerful Bayesian spam filtering. SpamSieve understands what your spam looks like in order to block it all, but also learns what... Read more
Viber 11.3.1 - Send messages and make fr...
Viber lets you send free messages and make free calls to other Viber users, on any device and network, in any country! Viber syncs your contacts, messages and call history with your mobile device, so... Read more
Monosnap 3.6.1 - Versatile screenshot ut...
Monosnap lets you capture screenshots, share files, and record video and .gifs. Features Capture Capture full screen, just part of the screen, or a selected window Make your crop area pixel... Read more
WhatRoute 2.2.6 - Geographically trace o...
WhatRoute is designed to find the names of all the routers an IP packet passes through on its way from your Mac to a destination host. It also measures the round-trip time from your Mac to the router... Read more
MacFamilyTree 9.0.5 - Create and explore...
MacFamilyTree gives genealogy a facelift: modern, interactive, convenient and fast. Explore your family tree and your family history in a way generations of chroniclers before you would have loved.... Read more
WhatsApp 0.3.4375 - Desktop client for W...
WhatsApp is the desktop client for WhatsApp Messenger, a cross-platform mobile messaging app which allows you to exchange messages without having to pay for SMS. WhatsApp Messenger is available for... Read more
Mactracker 7.8.1 - Database of all Mac m...
Mactracker provides detailed information on every Mac computer ever made, including items such as processor speed, memory, optical drives, graphic cards, supported OS X versions, and expansion... Read more
Boom 3D 1.3.1 - 3D surround sound and ph...
Boom 3D is a revolutionary app with 3D Surround Sound and phenomenally rich and intense audio that is realistic and works on any headphones. Features 3D surround sound Built-in audio player... Read more
OmniGraffle 7.11.2 - Create diagrams, fl...
OmniGraffle helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use Graffle to... Read more
OmniGraffle Pro 7.11.2 - Create diagrams...
OmniGraffle Pro helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use... Read more

Latest Forum Discussions

See All

Steam Link Spotlight - Dicey Dungeons
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.