Rick Grehan
A problem with the BYTEmark benchmarks has been located and corrected. Specifically, the lower-upper (LU) decomposition test -- a component of the FPU benchmark portion of BYTEmark -- behaved erratically under certain OSes. One unfortunate result was that BYTE published low benchmark numbers for Intel P6 processors.
The BYTEmark's component tests are all run multiple times by the benchmark, and the program passes the results through statistical calculations to yield the final indexes. In the case of the erratic LU decomposition test, the resulting scores for the P6 were sometimes low (which yielded an index of about 1.7) and sometimes high (yielding an index of about 3.6). The test showed its worst behavior under Windows NT.
The problem concerned data alignment. The LU decomposition algorithm solves linear equations, which are represented by coefficients stored as doubles (an 8-byte floating-point data type) in a 2-D array. As the LU decomposition algorithm does its work, it quickly processes data in the array while making numerous 8-byte fetches.
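To make the access pattern concrete, here is a minimal sketch of an LU factorization over a 2-D array of doubles. It is not the BYTEmark source: the function name `lu_decompose`, the flat row-major layout, and the choice of the Doolittle method without pivoting are all assumptions made for illustration. What it shows is that the inner loop reads and writes one 8-byte double after another.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative in-place LU factorization (Doolittle, no pivoting) of an
   n x n matrix stored row-major in a flat array of doubles.
   On success the strict lower triangle holds L (unit diagonal implied)
   and the upper triangle holds U.
   Returns 0 on success, -1 if a zero pivot is encountered. */
static int lu_decompose(double *a, size_t n)
{
    size_t i, j, k;

    assert(a != NULL);

    for (k = 0; k < n; k++) {
        double pivot = a[k * n + k];
        if (pivot == 0.0)
            return -1;                  /* would need row pivoting */

        for (i = k + 1; i < n; i++) {
            double factor = a[i * n + k] / pivot;
            a[i * n + k] = factor;      /* store the multiplier as L[i][k] */

            /* Each iteration here is an 8-byte load and an 8-byte store. */
            for (j = k + 1; j < n; j++)
                a[i * n + j] -= factor * a[k * n + j];
        }
    }
    return 0;
}
```

A production routine would normally add partial pivoting for numerical stability; it is left out here to keep the memory traffic easy to see.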
Because the BYTEmark is self-adjusting (i.e., each test component makes proportionally more or less work for itself, depending on the power of the system under test), the array is not statically allocated. The LU decomposition test calls the library routine malloc() to allocate space for the array.
Under the Windows NT compilers we tested -- Visual C++ and Watcom C++, the latter being the compiler used to generate the release version of the BYTEmark -- malloc() always returns data that's aligned to 4-byte boundaries. (This makes perfect sense, since NT is a 32-bit OS.) However, it doesn't always return data aligned to 8-byte boundaries.
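One way to observe this is to test the low bits of the pointer malloc() returns. The sketch below assumes a C99 compiler that provides `uintptr_t`; the helper name `is_aligned` is hypothetical and does not come from the BYTEmark source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on an `alignment`-byte boundary, 0 otherwise.
   `alignment` must be a nonzero power of two (8 for a double). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* For a power-of-two alignment, the low bits of an aligned
       address are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling `is_aligned(ptr, 8)` on the result of each allocation would show whether a given run received an 8-byte-aligned array or only a 4-byte-aligned one.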
Nonaligned memory accesses on Intel processors are always slower than aligned accesses. Consequently, whenever malloc() returned a non-8-byte-aligned array to LU decomposition, the algorithm proceeded much more slowly than when it received an aligned array.
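This article does not describe how the updated BYTEmark obtains aligned memory. One common technique, sketched here under that caveat, is to request a few extra bytes from malloc() and round the start address up to the next 8-byte boundary. The helper name `alloc_doubles_aligned8` and its `raw` out-parameter are hypothetical, and the code assumes a C99 compiler with `uintptr_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` doubles beginning on an
   8-byte boundary. The unadjusted malloc() pointer is stored in *raw;
   the caller must pass that pointer (not the return value) to free().
   Returns NULL on failure. */
static double *alloc_doubles_aligned8(size_t count, void **raw)
{
    char *base;
    size_t pad;

    assert(raw != NULL);
    *raw = NULL;

    /* Guard against overflow in the size computation below. */
    if (count > (SIZE_MAX - 7) / sizeof(double))
        return NULL;

    /* Up to 7 bytes of slack are enough to reach the next 8-byte boundary. */
    base = malloc(count * sizeof(double) + 7);
    if (base == NULL)
        return NULL;
    *raw = base;

    /* Bytes to skip so the address becomes a multiple of 8 (0..7). */
    pad = (8 - ((uintptr_t)base & 7)) & 7;
    return (double *)(base + pad);
}
```

Returning the raw pointer separately keeps the sketch portable: free() must always receive the exact address that malloc() produced.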
A modified version of the benchmarks run on an Intergraph 150-MHz P6 machine scored 2.1 on the integer test and 2.6 on the floating-point test. (This was a dual-processor machine, but the current BYTEmark tests are single-threaded only.)
By the time you read this, an update to the BYTEmark will be on the BYTE World Wide Web page. In addition, for Intel P6 processors, we'll be reporting the proper numbers as returned by the aligned accesses.
We apologize for the confusion this has caused. We would like to thank the people at Geodesic Systems, Intel, Watcom, and -- in particular -- Rob Barris of Quicksilver Software for their help in tracking down and correcting this problem.