
Bringing Benchmarks up to SPEC


March 1996 / Core Technologies / Bringing Benchmarks up to SPEC

A suite of respected CPU benchmarks gets a face-lift

Tom Yager

Computer systems keep getting faster. That's the way we like it. Unfortunately, this also creates a challenge: As systems evolve, so must the benchmarks that we use to compare them.

The last time the Standard Performance Evaluation Corp. (SPEC) released a new version of its performance tests was in 1992. Since then, the SPECint92 (integer) and SPECfp92 (floating-point) benchmarks have become industry standards. However, SPEC's membership--which comprises system vendors, educational institutions, and consultants--has been busy finding fault with its own work. The result of all this soul-searching is a new benchmark, known as SPEC95. Thanks to new test programs and a new baseline machine, SPEC95 creates a more level playing field for comparing different systems and microprocessors.

Of course, you should view any benchmarks with a skeptical eye. Nobody has yet invented a canned benchmark test that precisely measures how your system will perform when running your software. For some users, the only worthwhile benchmarks are the ones that they create and run themselves using real applications. But standardized benchmarks such as SPEC95 are nevertheless useful for obtaining ballpark estimates of how different systems will perform under actual conditions. At the very least, you can use them as a broad, initial screen.

Rocketing Ratios

SPEC95 includes two test suites. One, written in C, measures integer performance; another, written in FORTRAN, measures floating-point performance. These programs deliver their results indexed to a standardized baseline system (a 40-MHz Sun SparcStation 10) scoring 1.0. In other words, if the SPECint95 result is 5.0, then the tested system is five times faster at integer tasks than the baseline system.
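
The indexing described above comes down to a division. Here is a minimal Python sketch, assuming a ratio is the baseline system's run time divided by the tested system's run time. The function names and timings are hypothetical, not SPEC data. The article does not say how per-program ratios are combined into one figure; the geometric mean used here is an assumption.

```python
import math

def spec_ratio(baseline_seconds, measured_seconds):
    """Return how many times faster the tested system is than the baseline."""
    if baseline_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / measured_seconds

def composite(ratios):
    """Combine per-program ratios with a geometric mean (assumed rule)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical timings: the baseline needs 2,000 s, the tested system 400 s.
print(spec_ratio(2000.0, 400.0))  # 5.0
```

By construction, the baseline machine measured against itself scores 1.0.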

SPEC92's baseline system was a Digital VAX-11/780. This system was also the baseline for another, more dubious, benchmark: Dhrystone MIPS. This benchmark fell into disrepute over the years, and the acronym MIPS is sometimes jokingly expanded as "meaningless indicator of performance" or "marketing's idea of performance."

Actually, MIPS and SPEC92 were tolerable benchmarks until some CPUs began scoring ratios in the hundreds. For example, Digital Equipment estimates that its Alpha 21164 processor scores greater than 500 SPECint92 and 750 SPECfp92. Digital predicts that a next-generation Alpha processor, which is scheduled for introduction in 1997, will attain 1000 SPECint92. Sky-high ratios like these obscure the differences between systems and might indicate that some tests ran too fast to permit accurate measurement.

To keep the test ratios from soaring out of control (at least for a while), SPEC95 adopted the new SparcStation 10 baseline system. As a result, Digital says that the same Alpha 21164 chip that was estimated at 500 SPECint92 and 750 SPECfp92 now achieves 11 SPECint95 and 17 SPECfp95.

Another problem SPEC tackled was optimized compilers. SPEC had to abandon one floating-point test entirely when a new FORTRAN compiler knocked performance ratios right out of the park. The combination of faster systems and smarter compilers creates a problem for all types of benchmark programs.

How do you write a program that's complex enough to test a system acceptably well under all conditions but that also runs in a reasonable amount of time? SPEC's latest answer is that you can't. That's why if you run the full suite of SPEC95 tests on the new baseline machine, you won't get your results for two days.

New Requirements

The SPEC95 benchmark suite consists of programs culled from various sources, primarily academic and scientific. SPEC's first alteration was to replace some of SPEC92's small programs with more demanding ones. The goal was not only to create longer run times but also to present a more accurate picture of true performance by using larger, more resource-intensive programs. The SPEC95 code is portable and runs on just about any flavor of Unix. Soon there will also be a version available for Windows NT.

Of course, compiled programs measure the efficiency of a compiler as much as they measure the performance of a system. SPEC answers this criticism in two ways. First, SPEC acknowledges that compilers and optimizers can have a significant impact on the results. Second, SPEC now requires vendors to run the benchmarks with limited optimizations--no more than four optimization flags. Vendors must use the same flags for all tests and report their optimizations.
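
The two constraints in this paragraph (at most four optimization flags, and the same flags for every test) are easy to check mechanically. The following Python sketch is only an illustration of that rule as the article states it. The function name, the data layout, and the sample flags are hypothetical, and SPEC's actual run rules contain more than this.

```python
MAX_FLAGS = 4  # limit stated in the article

def check_flags(flags_by_test):
    """Return a list of rule violations; an empty list means none were found.

    flags_by_test maps a test name to the list of optimization flags
    used to compile it (hypothetical layout).
    """
    problems = []
    for test, flags in flags_by_test.items():
        if len(flags) > MAX_FLAGS:
            problems.append(f"{test}: {len(flags)} flags exceeds the limit of {MAX_FLAGS}")
    flag_lists = list(flags_by_test.values())
    if any(flags != flag_lists[0] for flags in flag_lists[1:]):
        problems.append("flags differ between tests")
    return problems

# Illustrative flags only; the same list is used for both tests.
same = ["-O2", "-funroll-loops"]
print(check_flags({"126.gcc": same, "130.li": same}))  # []
```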

SPEC never intended for SPEC92 to measure I/O performance, but sometimes the larger tests overflowed a system's main memory, forcing it to use virtual memory. As a result, machines with faster disk I/O performed much better. To avoid this situation, SPEC95 requires the test system to have at least 64 MB of RAM (Windows NT systems, too).

By changing the test code and defining a new baseline, SPEC has made it almost impossible to devise a conversion formula that translates SPEC92 results into SPEC95 numbers. SPEC wisely discourages such conversions because the two tests are not comparable. Major elements of the SPEC92 suite don't exist in SPEC95. The new benchmarks place less emphasis on floating-point math, because integer operations are more typical in real-world applications. And some tests in the SPEC95 suite run for a given period of time rather than for a given number of iterations, making comparisons with SPEC92 still more difficult.

Therefore, to obtain SPEC95 results for older systems, you have to run the SPEC95 suite on those machines. Unfortunately, many of them can't meet the minimum RAM requirement of 64 MB. That's why you won't see SPEC benchmarks for all six generations of the Intel x86 architecture going back to 1978. For those kinds of historical comparisons, we're still stuck with MIPS.

Looking for Respect

One of the best things about SPEC is that it's unbiased. Even though vendors such as IBM and Intel help define the benchmarks, SPEC is a nonprofit organization that makes everybody play by the same rules.

Limiting the compiler optimizations is just one example. There's also a whole set of "run rules" that govern compilation, testing, and system configuration. Vendors must follow yet another set of rules when publishing their test results.

Official SPEC95 test reports will have at least two numbers. SPECint_base95 measures integer performance with minimal compiler optimizations; SPECfp_base95 does the same for floating-point performance. These are probably the most trustworthy numbers because they obey the most stringent rules. However, it's likely that you'll see two additional results: SPECint95 and SPECfp95. These tests allow maximum compiler optimizations, which brings the compiler's performance into the mix.

However, some system vendors prefer to report SPECint_rate95, which is an entirely different result; it measures throughput ratios. Instead of measuring a machine's performance while running a single program, SPECint_rate95 is based on repeated tests that count how many iterations a machine completes within a fixed amount of time. Here, factors such as cache efficiency make a difference.
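
To make the difference from a timed single run concrete, here is a rough Python sketch of the idea as described above: fix the time window and count completed iterations. It is not SPEC's harness. The function name and the stand-in workload are hypothetical, and the clock is passed in as a parameter so the behavior can be checked without real waiting.

```python
import time

def count_iterations(workload, window_seconds, clock=time.perf_counter):
    """Run workload repeatedly and count runs started before the window closes."""
    deadline = clock() + window_seconds
    completed = 0
    while clock() < deadline:
        workload()
        completed += 1
    return completed

def tiny_workload():
    # Stand-in for a benchmark program; real tests do far more work.
    sum(range(1000))

# The count depends on the machine, so no expected value is shown.
print(count_iterations(tiny_workload, 0.05))
```

A higher count in the same window means higher throughput, which is why effects such as cache efficiency show up in this kind of result.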

Vendors are free to use any compiler optimization that they want for the SPECint_rate95 test. If they decide to use minimal optimizations, then they can report the result as SPECint_rate_base95. The parallel floating-point equivalents for these two tests are SPECfp_rate95 and SPECfp_rate_base95.
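
The eight result names mentioned in the text follow one pattern: the suite, an optional "rate" marker, an optional "base" marker, and the year. This small Python sketch spells the pattern out. The helper function is hypothetical and exists only to show how the names are composed.

```python
def metric_name(suite, rate=False, base=False):
    """Build a SPEC95 result name from its three properties.

    suite: "int" or "fp"
    rate:  True for throughput results
    base:  True for results run with minimal optimizations
    """
    if suite not in ("int", "fp"):
        raise ValueError("suite must be 'int' or 'fp'")
    name = "SPEC" + suite
    if rate:
        name += "_rate"
    if base:
        name += "_base"
    return name + "95"

for suite in ("int", "fp"):
    for rate in (False, True):
        for base in (False, True):
            print(metric_name(suite, rate, base))
```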

How to Use SPEC95

Anyone can purchase the SPEC benchmark suite on CD-ROM for $600. It includes all the tools you need to compile and run the programs. Vendors can submit results to SPEC, which reviews them. If a vendor's results don't conform to all the run rules, SPEC won't publish them in its newsletter. Of course, SPEC can't stop anyone from publishing numbers elsewhere.

When interpreting SPEC results, it's important to keep a few things in mind. First, although SPEC has attempted to devise a suite that closely mimics system behavior when running real applications, these are still synthetic benchmarks. Your mileage may vary.

Second, remember that SPEC95 does not test I/O performance. If your application is I/O-intensive--on-line transaction processing, for instance--SPEC95 probably won't be as meaningful as a disk-I/O benchmark. If the responsiveness of a GUI is important to you, SPEC95 isn't the best choice for that, either.

Finally, don't attempt to map specific SPEC95 test programs to your real-world applications. SPEC95 is a collection of programs that lets you compare one system's basic performance to another's. There's still no substitute for running real programs on the system you're trying to evaluate.

If you're going to base a major purchasing decision on SPEC results, you might want to compile and run the tests yourself. At the very least, obtain the benchmark results directly from SPEC. If these results are markedly worse than the vendor's published numbers, demand an explanation from the vendor. SPEC's new rules should make it more difficult for vendors to rig the tests with their own benchmark-specific optimizations. SPEC has put the world on notice that if it uncovers any such optimizations, it will change the suite to close the loophole.

SPEC deserves praise for its dedication to providing reliable test data. SPEC95 is a definite improvement over SPEC92.


WHERE TO FIND


Standard Performance Evaluation Corp.

National Computer Graphics Association
Fairfax, VA
Phone:    (703) 698-9604
E-Mail:   spec-ncga@cup.portal.com




The SPEC95 Benchmark

It's based on a new suite of programs.


SPEC95 INTEGER TESTS


Game of Go                                              099.go
Motorola 88000 RISC CPU simulator                       124.m88ksim
GNU C compiler                                          126.gcc
File compression/decompression                          129.compress
LISP interpreter                                        130.li
JPEG compression/decompression                          132.ijpeg
String and integer manipulations in the Perl language   134.perl
Database                                                147.vortex


SPEC95 FLOATING-POINT TESTS


Mesh generator                                          101.tomcatv
Shallow-water model                                     102.swim
Quantum physics                                         103.su2cor
Astrophysics                                            104.hydro2d
Multigrid solver in 3-D potential field                 107.mgrid
Differential equations                                  110.applu
Simulated turbulence in a cube                          125.turb3d
Weather conditions and distribution of pollutants       141.apsi
Quantum chemistry                                       145.fpppp
Plasma physics                                          146.wave5



SPEC95: No Comparison to SPEC92

Because SPEC95 numbers are indexed to a different baseline system,
they can't be compared directly to SPEC92 values. Even the ratio of
integer to floating-point performance might vary from the old
benchmark, as this example for a 150-MHz Pentium Pro shows.

                        
                        SPEC95  SPEC92

Floating-point          5.41    220
Integer                 6.08    276.3
SPECint/SPECfp          1.12    1.26
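
A quick calculation shows why no single conversion factor works. The Python sketch below divides each SPEC92 figure quoted in this article by its SPEC95 counterpart. The Alpha 21164 numbers are Digital's estimates from the main text, taken at face value, and the function name is hypothetical.

```python
def implied_factor(spec92, spec95):
    """Return the SPEC92-to-SPEC95 divisor that a pair of results implies."""
    return spec92 / spec95

# (SPEC92, SPEC95) pairs quoted in the article.
results = {
    "Alpha 21164 integer": (500.0, 11.0),
    "Alpha 21164 floating-point": (750.0, 17.0),
    "Pentium Pro 150 integer": (276.3, 6.08),
    "Pentium Pro 150 floating-point": (220.0, 5.41),
}

for label, (spec92, spec95) in results.items():
    print(f"{label}: {implied_factor(spec92, spec95):.1f}")
# Alpha 21164 integer: 45.5
# Alpha 21164 floating-point: 44.1
# Pentium Pro 150 integer: 45.4
# Pentium Pro 150 floating-point: 40.7
```

The implied divisors range from roughly 41 to 45, so a factor fitted to one result would misstate the others.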



Tom Yager is a freelance writer and an evangelist for the Matrox Video Products Group. He works from his research lab in North Texas. You can reach him on the Internet at tyager@maxx.net or on BIX c/o "editors."

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service