
FPU Precision


January 1995 / Core Technologies / FPU Precision

Ndiff, a custom comparison program, helps programmers sort out how various hardware platforms calculate FPU operations differently

Oliver Sharp

Programmers who work with floating-point-intensive applications know that obtaining accurate numbers can be tricky. One particularly awkward problem is that floating-point applications do not produce identical results on different machines. In fact, the results can even change when switching from one compiler to another.

Traditionally, programmers have used a utility such as diff to compare two text files to see if two versions of a code are in agreement. But diff may not be sufficient for FPU-intensive applications. Two programs may both be working correctly even though every number they compute is different. Considering that scientific applications may produce thousands or millions of numbers as results, how can a programmer check whether the code is executing properly on different machines? To address this problem, I wrote a program I call ndiff.

Different Results

To understand how an application can behave differently on different hardware platforms, we need to look at the way computers handle non-integer values.

A floating-point value consists of a sign, an exponent, and a significand. In the IEEE standard, a single-precision floating-point value equals (-1)^S * (1 + significand) * 2^(exponent - 127). Some real numbers can be expressed precisely, but most do not have an exact representation. The listing "Demonstrating Floating-point Round-off" computes the square root of 2, squares that, and compares the final result to 2. On a SPARC workstation, the difference is 1.2e-7; on a Cray C90, it is 1.4e-14.
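The three fields described above can be pulled out of a value directly. The following is a minimal sketch, assuming that float is a 32-bit IEEE single-precision value on the host machine; the type float_fields and the function decode_float are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE single-precision value (hypothetical helper type). */
typedef struct {
    unsigned sign;      /* S: 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits; significand = fraction / 2^23 */
} float_fields;

/* Split a float into its sign, stored exponent, and fraction bits.
   Assumes that float is a 32-bit IEEE value. */
float_fields decode_float(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);   /* copy the raw bit pattern */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 1.0 this gives a sign of 0, a stored exponent of 127, and a fraction of 0, so the formula evaluates to exactly 1. For 0.1 the fraction bits are a repeating pattern that has been cut off and rounded, which is why 0.1 is one of the values with no exact representation.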

Programmers commonly do two things that affect the results of a floating-point application: They change the precision of the code and the order in which operations are executed.

In the first case, precision simply refers to the number of bits that are used to store floating-point values. Modern computers typically offer two sizes: 32 bits, known as single precision, and 64 bits, known as double precision. Although these sizes are specified by the IEEE/ANSI standard and have become widely accepted, they are not universal and are certain to change in the future as the length of the machine instruction word continues to increase. A given program won't always run on a machine with the same word size as the one on which it was originally written. Moving to an architecture with a different representation size will almost always change the results of floating-point programs.

One of the reasons that "Demonstrating Floating-point Round-off" behaves differently on the SPARC and Cray computers is that the latter uses 64 bits for single precision. Changing precision alters the rounding behavior and usually changes the results. (Language mechanisms are another way to change precision.)
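The effect of precision alone can be seen without changing machines. The sketch below is an illustration rather than part of ndiff: it adds 0.1 ten times, once in single and once in double precision. The function names are hypothetical, and the code assumes IEEE arithmetic with 32-bit float and 64-bit double.

```c
/* Add 0.1 ten times, rounding every partial sum to single precision. */
float sum_tenths_single(void)
{
    volatile float sum = 0.0f;   /* volatile: store each partial sum as a float */
    int i;

    for (i = 0; i < 10; i++)
        sum = sum + 0.1f;
    return sum;
}

/* The same computation with every partial sum held in double precision. */
double sum_tenths_double(void)
{
    volatile double sum = 0.0;
    int i;

    for (i = 0; i < 10; i++)
        sum = sum + 0.1;
    return sum;
}
```

Under those assumptions neither total is exactly 1. The single-precision total lands slightly above 1 and the double-precision total slightly below it, so the same source code yields two different answers depending only on the size of the representation.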

A second source of trouble relates to something most of us learned in grammar school. We were taught that arithmetic operations like multiplication and addition are commutative (i.e., A x B and B x A yield the same result) and associative ([A+B]+C is the same as A+[B+C]). However, associativity does not hold in floating-point arithmetic: each intermediate result is rounded, so regrouping the operands can change the answer.
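A small experiment shows how grouping changes a floating-point sum. This is a minimal sketch, assuming IEEE double precision; the function names sum_left and sum_right are hypothetical.

```c
/* Evaluate (a + b) + c: the first two operands are added first. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluate a + (b + c): the last two operands are added first. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e16, b = -1e16, and c = 1, the left-grouped sum is 1, because the two large values cancel before 1 is added. The right-grouped sum is 0, because 1 is too small to survive being added to -1e16 and is rounded away first.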

Unfortunately, a compiler must often rearrange the order of computations to improve program performance. "Reordering for Optimization" shows a simple example using a standard compiler optimization called loop-invariant code motion. This transformation moves a computation out of a loop when it does not need to be repeated inside it. In the example, the new code computes B+C once instead of redoing it 100 times. However, if addition is performed left to right in this language, the original code computed (result+B)+C. The new version uses result+(B+C) instead, which is not necessarily the same thing.
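The two loops from "Reordering for Optimization" can be written out in C to see the difference. This is a sketch with hypothetical function names; it assumes that double expressions are evaluated in IEEE double precision with no extra intermediate precision.

```c
/* Original loop: each iteration evaluates (result + b) + c, left to right. */
double accumulate_original(double result, double b, double c, int n)
{
    int i;

    for (i = 0; i < n; i++)
        result = result + b + c;
    return result;
}

/* After loop-invariant code motion: b + c is computed once, outside the loop. */
double accumulate_hoisted(double result, double b, double c, int n)
{
    double bplusc = b + c;
    int i;

    for (i = 0; i < n; i++)
        result = result + bplusc;
    return result;
}
```

Starting from result = 1 with B = 1e16 and C = -1e16, the original loop returns 0, because the 1 is rounded away as soon as 1e16 is added to it. The hoisted loop returns 1, because B+C is exactly 0. For well-scaled inputs the two versions usually agree or differ only in the last few bits.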

While changing the order is often a significant improvement on single-processor machines, the benefit can be much more dramatic on a parallel architecture. For example, suppose that we have an array of one million elements and wish to compute the sum. At first glance, this seems to be the perfect problem to solve on a parallel machine. If there are 100 processors, we can use each one to add 10,000 of the elements together. Then we do a final pass, adding the partial sums, and the computation will be just under 100 times as fast (depending on communication requirements and how that final pass is implemented).

However, the new and efficient parallel algorithm adds the numbers differently than the original did. Instead of going from one end of the array to the other, the pieces of the array that are allocated to each processor are summed first. Because addition is not associative, the modification of the order will change the final results. Such changes often make it difficult to know when a parallel program is debugged. Even when the program is working correctly, all of its results are slightly different from the ones produced by the sequential version. Although such changes are often unavoidable, ndiff can help to determine how large the differences are.
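The change in summation order can be imitated on a single processor. The sketch below runs serially and only mimics the order in which a parallel machine would add the elements; the function names and the way the array is divided into contiguous chunks are assumptions made for illustration.

```c
#include <stddef.h>

/* Add the elements from one end of the array to the other. */
double sum_sequential(const double *x, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += x[i];
    return total;
}

/* Sum the array in contiguous chunks, as separate processors would,
   then add the partial sums in a final pass. chunks must be nonzero. */
double sum_chunked(const double *x, size_t n, size_t chunks)
{
    double total = 0.0;
    size_t c;

    for (c = 0; c < chunks; c++) {
        size_t begin = c * n / chunks;
        size_t end = (c + 1) * n / chunks;
        double partial = 0.0;
        size_t i;

        for (i = begin; i < end; i++)
            partial += x[i];
        total += partial;   /* final pass over the partial sums */
    }
    return total;
}
```

For the four values 1e16, 1, -1e16, 1 in IEEE double precision, the sequential sum is 1 and the two-chunk sum is 0, while the exact answer is 2. Neither routine has a bug; they simply round at different points.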

The Program

Ndiff works by going through two files in lockstep to compare each line. It scans each pair of lines twice. The first pass looks at everything but numbers to verify that the letters and symbols in the two lines are identical. If there are any differences, ndiff prints out the two lines together, with a line-number prefix. Because the idea behind ndiff is to compare the output of the same program on different architectures, there will generally be few or no differences that aren't numerical. However, the program might do something like print out the current date and time, so ndiff can't simply give up when the files don't match.
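The first pass can be sketched as a comparison that steps over numbers and checks everything else character by character. This is not ndiff's actual source; the functions skip_number and same_text are hypothetical, and the rule for what counts as the start of a number (a digit, optionally preceded by a sign or a decimal point) is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>

/* Return a pointer just past the number that starts at s,
   or s itself if no number starts there. */
static const char *skip_number(const char *s)
{
    const char *p = s;
    char *end;

    if (*p == '+' || *p == '-')
        p++;
    if (*p == '.')
        p++;
    if (!isdigit((unsigned char)*p))
        return s;
    strtod(s, &end);   /* value ignored; only the end position matters */
    return end;
}

/* Return 1 if the two lines match once numbers are ignored, 0 otherwise. */
int same_text(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        const char *na = skip_number(a);
        const char *nb = skip_number(b);

        if (na != a && nb != b) {   /* a number on both sides: skip both */
            a = na;
            b = nb;
            continue;
        }
        if (*a != *b)               /* differing text, or one line ended */
            return 0;
        a++;
        b++;
    }
    return 1;
}
```

Two lines that differ only in their numerical values, such as the second lines of the sample files in "An Ndiff Report," pass this check. Lines that carry a different month name or label do not, and would be printed as a textual difference.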

If the lines are identical aside from their numerical values, ndiff rescans them looking for pairs of numbers. It reads in the numbers as double-precision floating-point values and compares them. Note that ndiff depends on the library routine sscanf() to parse the numbers; a poorly written implementation of that routine will affect the results that ndiff generates.
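The second pass can be sketched with sscanf(), the routine named above. The function read_numbers, its signature, and the rule for where a number may start are assumptions for illustration, not ndiff's own code.

```c
#include <ctype.h>
#include <stdio.h>

/* Collect up to max numbers from line into out; return how many were found. */
size_t read_numbers(const char *line, double *out, size_t max)
{
    size_t count = 0;

    while (*line != '\0' && count < max) {
        double value;
        int used = 0;
        int may_start = isdigit((unsigned char)*line) ||
                        *line == '+' || *line == '-' || *line == '.';

        /* %lf reads a double; %n records how many characters were consumed */
        if (may_start && sscanf(line, "%lf%n", &value, &used) == 1) {
            out[count++] = value;
            line += used;
        } else {
            line++;   /* not a number here: move on one character */
        }
    }
    return count;
}
```

Calling this routine on corresponding lines of the two files yields two arrays whose entries can be compared position by position. Because the parsing is delegated to sscanf(), the values inherit whatever rounding that library routine performs.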

Once ndiff has loaded a pair of numbers, it checks whether they are identical. If they are, ndiff discards them and continues. If not, it computes a number of statistics that describe the way they differ, including the absolute value of the difference and the percentage. Once ndiff completes its work, it prints out a report describing the relationship of the two files.

"An Ndiff Report" shows two sample input files and ndiff's analysis of them. Running diff on the files would only reveal that the first lines are identical. Ndiff is more helpful because it provides a statistical summary of the differences to help reveal whether they are the results of architectural and compiler effects.

Ndiff uses a variety of statistics, because any single one can be misleading. One strategy for comparing numbers is to consider only the magnitude of the difference and require that it be small. For example, 1 and 1.000001 only differ by 10^-6 and are probably close enough for practical purposes. But absolute magnitudes can't be interpreted without some information about the application that generated them. If the program is computing the distance between two galaxies in meters, a difference of a few thousand is probably negligible. When computing the number of microns between atoms, however, a difference of one would be unacceptable. Therefore, ndiff computes both the magnitude of the difference and the percentage difference. If the difference is a tiny percentage, it can probably be ignored.
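The two measures can be sketched as follows. The function names are hypothetical, and the choice to express the percentage relative to the larger of the two magnitudes is an assumption; it is consistent with the 50.000% and 0.711% figures in "An Ndiff Report," but the article does not state the formula.

```c
/* Absolute value, written out so the sketch does not need the math library. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Magnitude of the difference between two values. */
double abs_difference(double a, double b)
{
    return magnitude(a - b);
}

/* Difference as a percentage of the larger magnitude (assumed convention).
   Two zeros are treated as identical. */
double percent_difference(double a, double b)
{
    double ma = magnitude(a);
    double mb = magnitude(b);
    double larger = ma > mb ? ma : mb;

    if (larger == 0.0)
        return 0.0;
    return 100.0 * abs_difference(a, b) / larger;
}
```

For the pair (1, 1.000001) the difference is about 10^-6 and the percentage about 0.0001. For the pair (0.02, 0.01) the difference is only 0.01, yet the percentage is 50, which shows why neither measure is sufficient alone.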

After comparing the two files, ndiff reports the largest percentage and the largest difference it encountered for any number pair. In general, if the maximum percentage is small, the two files are essentially identical, and you can ignore the rest of the report.

Even if the maximum percentage is large, all may still be well, but the programmer must be careful, because large maximums can conceal problems. In "An Ndiff Report," some of the numbers differed by a large percentage and some by a large magnitude. The first half of the report suggests that the files are almost identical, but it could be deceptive. Suppose that the files contained the corresponding pair (500,1000)--a pairing that should certainly be a major cause for concern. That pair wouldn't change the maximums, though, because ndiff already finds a larger difference and a larger percentage in the file as it is now.

The second half of ndiff's report reveals these hidden pairings. It is a set of threshold rules computed for several different percentages. A threshold rule consists of two values, P and M, and means that every pair of corresponding numbers in the files is either within P percent of one another or has a difference less than M. In the "Thresholds" portion of "An Ndiff Report," the 1.000% line reveals that differences larger than 1.0 percent are all small (less than .01). The pair (500,1000) would completely change the threshold rules: for both 1 percent and 10 percent, M would be 500. The programmer can quickly tell from the thresholds when a potentially troublesome pairing is concealed by the maximum values.
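One way such a rule could be computed is to take, for a given P, the largest difference among the pairs that are not within P percent of one another. The sketch below does that; the function names and the percentage convention (relative to the larger magnitude) are assumptions, chosen because they reproduce the M values in the sample report.

```c
#include <stddef.h>

/* Absolute value, written out so the sketch does not need the math library. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Difference as a percentage of the larger magnitude (assumed convention). */
static double percent_of_larger(double a, double b)
{
    double ma = magnitude(a);
    double mb = magnitude(b);
    double larger = ma > mb ? ma : mb;

    return larger == 0.0 ? 0.0 : 100.0 * magnitude(a - b) / larger;
}

/* For threshold P (in percent), return M: the largest difference among
   the pairs that differ by more than P percent. Returns 0 if none do. */
double threshold_magnitude(const double *a, const double *b, size_t n,
                           double percent)
{
    double m = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double diff = magnitude(a[i] - b[i]);

        if (percent_of_larger(a[i], b[i]) > percent && diff > m)
            m = diff;
    }
    return m;
}
```

Applied to the five number pairs in the sample files, this gives an M of about 7348.78 for P = 0.1 and an M of about 0.01 for P = 1 and P = 10. Adding the pair (500,1000) raises M to 500 for both of the last two thresholds, which is the warning sign described above.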

Scientific Tool

Ndiff is not a perfect solution to the problem of variation in floating-point behavior, but it is a useful tool for programmers who must work on the same code in different environments. I've relied on it often when working with and parallelizing scientific applications using a diversity of architectures and compilers. Ndiff is available electronically.


Demonstrating Floating-point Round-off

/*  sqrt.c - the problems of approximation  */
#include <stdio.h>
#include <math.h>

int main(void)
{
  float a;
  a = sqrt(2.0);   /* the result is rounded to single precision here */
  a = a*a;
  if (a == 2.0)
    printf("a is 2\n");
  else
    printf("a isn't 2; the difference is %e\n", a-2.0);
  return 0;
}



An Ndiff Report

file1:
**
This file contains several numbers for ndiff to work on.
0.02 1026060.33 343.8599 4444444.33454 1000
**
file2:
**
This file contains several numbers for ndiff to work on.
0.01 1033409.11 343.8667 4444493.22983 1000
**
output:
**
4 numbers differ between the two files out of 5 comparisons.
 Average of the differences: 1.849423e+03
 Largest difference:          7.348780e+03 on line 3 (was 0.711%)
 Average percent difference: 12.500%
 Largest percent difference: 50.000% on line 3 (was 1.000000e-02)
Thresholds:
    Differences were either below           or less than
                                0.001%            7.348780e+03
                                0.010%            7.348780e+03
                                0.100%            7.348780e+03
                                1.000%            1.000000e-02
                               10.000%            1.000000e-02



Reordering for Optimization

original code:                       optimized code:
for i = 1, 100                     bplusc = B+C;
  result = result+B+C;             for i = 1, 100
                                     result = result+bplusc


Oliver Sharp is a doctoral candidate at the University of California--Berkeley. His research area is compilation for parallel architectures. You can contact him on the Internet at Oliver@cs.berkeley.edu, or on BIX c/o "editors."
