Benchmarks 2
Volume Number: 3
Issue Number: 9
Column Tag: Mac Cad
Benchmarks Re-visited
By Paul Zarchan, Cambridge, Mass
With the emergence of the Mac 2 and the growing base of useful, easy-to-use scientific software, the field of desktop engineering will surely grow this year. The purpose of this article is to compare, from an engineering user's point of view, the new Macs (using a Prodigy 4 as the equivalent of a Mac 2) with their counterparts in the IBM micro world, DEC mini world and IBM mainframe world. First the issue of compilation and linking will be addressed, and then standardized benchmarks will be used to compare various machines from both a cost and a performance point of view. Most of the non-Mac results were provided to me by A. Tetewsky and D. Feenberg. These results will soon be published in Ref. 1.
Compiling and Linking
When using a compiled language for programming, such as FORTRAN, the issue of compile and link times is extremely important. In engineering applications, excessive compile and link times may make it worthwhile to develop engineering software in an interpretive language such as BASIC, and then port it to a compiled language after initial debugging and algorithm development have been completed. If switching languages is not practical, it may be worthwhile to stay in FORTRAN but develop the engineering software on a computer with faster compilation times. After program development, the source code can easily be ported to the computer of interest for final compilation.
Let's consider an example: finding the complex roots of real polynomials. The 144 lines of program source code for this example can be found in Ref. 2. This example, like the Butterworth example in Ref. 3, uses single precision arithmetic but, unlike the Butterworth example, has virtually no input/output code. In this root finding example, a solution is found for a 30th order, well-behaved polynomial. The compile and link times for the 144 lines of code, using MS FORTRAN (both in the Apple and non-Apple world), are indicated in Table 1 for a variety of micros.
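To give a feel for what such a root finder does, here is a minimal sketch in Python. It is not the FORTRAN routine of Ref. 2; it swaps in the Durand-Kerner iteration, which refines all the roots of a polynomial simultaneously. The function names, the starting guesses, the tolerance and the sample cubic are all assumptions made for illustration.

```python
def poly_eval(coeffs, z):
    """Evaluate a polynomial at z by Horner's rule (highest power first)."""
    result = 0j
    for c in coeffs:
        result = result * z + c
    return result


def durand_kerner(coeffs, tol=1e-12, max_iter=500):
    """Approximate all complex roots of a polynomial with real coefficients.

    Illustrative Durand-Kerner iteration, not the method of Ref. 2.
    coeffs lists the coefficients from the highest power down.
    """
    degree = len(coeffs) - 1
    lead = coeffs[0]
    monic = [c / lead for c in coeffs]  # normalize the leading coefficient to 1

    # Assumed starting guesses: powers of a number that is neither real
    # nor on the unit circle, so the guesses are distinct.
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]

    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(degree):
            # Product of differences between this estimate and all the others.
            denom = 1 + 0j
            for j in range(degree):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = poly_eval(monic, roots[i]) / denom
            roots[i] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return roots


# Hypothetical test polynomial: x^3 - 3x^2 + 3x - 5
sample = [1.0, -3.0, 3.0, -5.0]
for root in durand_kerner(sample):
    print(root, abs(poly_eval(sample, root)))
```

Each printed residual should be very close to zero. A 30th order polynomial like the one timed in this article is a much harder numerical problem, and this sketch has not been checked at that size.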
In this example, compilation and linking were done using a hard disk for the IBM AT and Compaq 386, while in the Macintosh world, compilation and linking were done in RAM. In the IBM world, compiling in RAM is not significantly faster than compiling from the hard disk. This will always be the case since the operating system software, DOS, is written for 64K segmented 8086/8088 processors. Although an operating system developed for the 80386, or OS/2, should be better and improve compilation times, such a system will not be available for at least one year. If history is any guide, the wait may be significantly longer. In addition, due to memory segmentation and the lack of a FORTRAN editor (a word processor must be used), it may be difficult to fit all the necessary engineering tools into RAM. In the Macintosh world, memory is linear and easily expandable with third party upgrades. For example, a 512K Mac can be upgraded to 2 Megs for about $500. This permits the creation of a 1.5 Meg recoverable RAM disk, which is large enough to fit FORTRAN and many other useful tools into RAM. Therefore, compiling in RAM with a Mac is much faster than compiling from a hard disk.
In addition, in the IBM world one must compile and link before the code can be executed. The user must nurse the computer through the compiling, linking and execution process. In the Macintosh world, linking is dynamic and therefore automatic from a user's point of view. The user simply double-clicks on "compile and execute" and the source code compiles, links and runs.
The execution time for this complex root finding example for a variety of micros appears in Table 2. In this example, all the micros, with the exception of the Mac Plus, had math coprocessors.
Table 2 shows that, for this example, the Prodigy 4 is about 10 times faster than a Mac Plus, more than 5 times faster than an IBM AT and 2.5 times faster than a Compaq 386. In the IBM world, with the exception of the PC, the math coprocessor never seems to run at the same clock rate as the CPU. That is why, for this example, an AT and a PC (where the math coprocessor is matched to the CPU at 4.77 MHz) have similar execution times. The Compaq 386 is only twice as fast as the AT even though the Compaq has a 32-bit rather than a 16-bit processor and runs at 16 MHz rather than 6 MHz. In principle, when the IBM operating system software is written and a 16 MHz Intel 80387 math coprocessor becomes available, the Compaq 386 should be in the same speed class as the Prodigy 4. Interestingly enough, the Compaq 386 is rated at 3.5 MIPS while the Prodigy 4 is only rated at 2.0 MIPS. We can see that in numerical applications, MIPS ratings may not tell the whole story (see Ref. 4 for example).
Often the user may only be interested in the turnaround time, which is the sum of the compile, link and execution times. For this example we can see by comparing Tables 1 and 2 that the turnaround times are significantly better in the Macintosh world. Table 3 summarizes the results for the complex root example.
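The turnaround calculation is simple enough to sketch in a few lines of Python. The machine names and timings below are made-up placeholders, not values from Tables 1 to 3; they only illustrate how a machine with slower execution can still win on turnaround when its compile and link times are short.

```python
def turnaround_seconds(compile_s, link_s, run_s):
    """Turnaround time is the sum of compile, link and execution times."""
    return compile_s + link_s + run_s


# Placeholder timings in seconds (compile, link, execute); NOT measured data.
timings = {
    "machine A": (30.0, 20.0, 5.0),   # fast execution, slow compile and link
    "machine B": (10.0, 0.0, 12.0),   # slower execution, no separate link step
}

fastest = min(timings, key=lambda name: turnaround_seconds(*timings[name]))

for name, phases in timings.items():
    print(name, turnaround_seconds(*phases), "seconds")
print("best turnaround:", fastest)
```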
The sample problem only had 144 lines of FORTRAN code. If we consider a traveling salesman program using 1500 lines of FORTRAN code, the comparison of compile and link times is even more dramatic. Table 4 shows that the Macintosh and Prodigy 4 are considerably faster for larger programs than either the IBM AT or Compaq 386.
Whetstone Benchmarking
The Whetstone benchmark, devised in England by Curnow and Wichmann and published in the Feb. 1976 issue of the Computer Journal, is an attempt to cover a typical mix of all floating point operations. This benchmark contains linear arrays, and add, subtract, multiply, divide and transcendental operations. Whetstones were originally written in ALGOL, but were later translated to FORTRAN in 1979 by D. Frank. Since that time, many computer manufacturers have rated their machines in terms of thousands of Whetstones per second, or kw/sec. Higher Whetstone ratings mean more powerful machines. Table 5 presents single and double precision Whetstone ratings for a variety of micro, mini and mainframe computers. In addition, ratios referenced to Prodigy 4 speed are indicated in the Table. A ratio of 1.7 means that the computer is 1.7 times faster than the Prodigy 4. All computers, with the exception of the Mac Plus, have math coprocessors or floating point accelerators. The poor double precision Whetstone rating of the Mac Plus, relative to the IBM PC, may be one of the reasons there has been a scarcity of scientific software for the Mac. Of course, we can see from this Table that the Prodigy 4, and hence the new Mac 2, changes all that.
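The speed ratios in Table 5 are each machine's Whetstone rating divided by the reference machine's rating. A minimal Python sketch of that normalization follows; the function name and the two ratings are hypothetical, chosen only so that the result reproduces the ratio of 1.7 used as an example above.

```python
def speed_ratio(rating_kw_per_sec, reference_kw_per_sec):
    """Return how many times faster a machine is than the reference machine.

    Both ratings are in thousands of Whetstones per second (kw/sec).
    """
    if reference_kw_per_sec <= 0:
        raise ValueError("reference rating must be positive")
    return rating_kw_per_sec / reference_kw_per_sec


# Hypothetical ratings, not taken from Table 5.
reference_rating = 1000.0   # kw/sec for the reference machine
other_rating = 1700.0       # kw/sec for the machine being compared

print(speed_ratio(other_rating, reference_rating))
```

A ratio above 1 means the machine is faster than the reference, and a ratio below 1 means it is slower.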
The Whetstone results of Table 5 (with no I/O) can be compared to the Butterworth simulation results of Ref. 3 (with considerable I/O and more representative of a realistic engineering application). Figure 1 shows that all the benchmarks, whether they be Whetstones or Butterworth simulations, yield about the same relative machine performance. Only the Mac Plus seems to yield results which are significantly benchmark dependent. It yields worse performance on the Whetstones because of its lack of a math coprocessor.
Figure 1 - Relative Machine Performance is Approximately Independent of Benchmark
The performance comparison of Fig. 1 can be placed into proper perspective when the cost of the host computer is considered. For simplicity, computer cost can be considered to be the machine's purchase price only. This neglects the cost of the small army of technicians required to operate the larger machines and the cost of software leasing agreements. We can see from Fig. 2 that, generally, higher cost computers yield faster performance. However, the cost is not always commensurate with the performance. For example, a VAX 11/780 is only 1.5 times as fast as a Prodigy 4 and yet is 40 times more expensive. An IBM 3084Q is 11.7 times faster than a Prodigy 4 and is 500 times more expensive. On the micro side, an IBM RT is 2.5 times slower than a Prodigy 4 and yet costs twice as much.
Figure 2 - Micros are More Cost Effective Than Larger Machines
If we normalize the computer performance, as measured by double precision Whetstones per second, to the computer purchase price, we can generate bang for the buck information. More bang for the buck means that the computer yields a higher double precision Whetstone rating for less cost. Figure 3 presents this cost effectiveness information and shows that the Compaq 386, Prodigy 4 and Micro Vax 2 are very cost effective, with the Prodigy 4 yielding the most bang for the buck. The curve also indicates that if a micro can do the job, it is more cost effective from a performance point of view than a mainframe.
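As a rough illustration of this normalization, the Python sketch below uses only the speed and cost ratios quoted in the text for the VAX 11/780, IBM 3084Q and IBM RT, all relative to the Prodigy 4. It assumes that "40 times more expensive" means a cost ratio of 40 and that "2.5 times slower" means a speed ratio of 1/2.5. The function name is hypothetical, and the values plotted in Fig. 3 may differ since they come from absolute ratings and prices.

```python
def relative_bang_for_buck(speed_ratio, cost_ratio):
    """Performance per unit cost, relative to the reference machine (1.0)."""
    return speed_ratio / cost_ratio


# (speed ratio, cost ratio) relative to the Prodigy 4, as quoted in the text.
machines = {
    "VAX 11/780": (1.5, 40.0),
    "IBM 3084Q": (11.7, 500.0),
    "IBM RT": (1.0 / 2.5, 2.0),
}

scores = {name: relative_bang_for_buck(speed, cost)
          for name, (speed, cost) in machines.items()}

# List the machines from most to least cost effective.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.4f}")
```

Any score below 1.0 means the machine delivers less performance per dollar than the Prodigy 4 under these assumptions.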
Figure 3 - Prodigy 4 Outperforms Every Other Computer
Summary
The intent of this article was to show that FORTRAN runs very efficiently on the Prodigy 4 (and hence Mac 2) when compared to non-Apple micros. When compilation and linking times are taken into account, the comparison is even more dramatic. A relative performance curve is presented quantifying bang for the buck information for a variety of micros, minis and mainframes. As expected, the new Mac 2 appears to outperform every other computer.
Acknowledgements
I wish to thank Micro/Systems, Av Tetewsky and Dan Feenberg for permitting me to extract from Ref. 1 the benchmark timings on all the non-Apple machines and for providing the technical explanation for the features of the various DOS machines. In addition, I would like to thank Owen Deutsch for providing me with the traveling salesman FORTRAN code.
References
1) Tetewsky, A. and Feenberg, D., A Survey of 6 FORTRAN Compilers, to appear in the Sept. 1987 edition of Micro/Systems Journal.
2) Press, W. H. et al., Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, 1986.
3) Zarchan, P., New Mac Workstation Potential, MacTutor, Vol. 3, No. 3, March 1987, pp 15-21.
4) Boston Computer Society IBM PC Report, PC Technical Report: MIPs, MFlops, Benchmarks and Other Half-Truths, May-June 1987.
5) Marshall, T., Jones, C., and Kluger, S., Definicon 68020 Coprocessor, BYTE, July 1986, pp 120-144.