Benchmarks run on an early reference system confirm that the P6 is not the best chip for running 16-bit software. AMD and Cyrix say they won't have the same problem.
The Byte Staff
The mediocre performance of the next-generation P6 processor on Windows and 16-bit DOS code creates an opportunity that Intel's competitors are eager to exploit. At least two of those competitors, AMD and Cyrix, say their respective next-generation x86-compatible processors, the K5 and the M1, will not suffer from the P6's defects when running Windows 3.1 or Windows 95 software.
As reported in BYTE last month (see "P6 Weakness Revealed," September BYTE), preliminary benchmarks run by Intel indicated the P6 performs best when running 32-bit code. Older 16-bit and current mixed 16/32-bit code (as found in Windows 3.1 and Windows 95, respectively) that makes use of segment writes, partial register operations, unaligned data accesses, and instruction-prefix bytes stymies the P6. This is because when Intel started designing the P6 about five years ago, the company thought most code running on today's desktops would be 32-bit. Thus, the P6 was not optimized for 16-bit performance. BYTE recently confirmed the P6's poky 16-bit performance by running a variety of benchmarks, including a special 16-bit version of the cross-platform BYTEmark CPU and FPU benchmarks (see the figure).
AMD says its K5 has extra tag fields and comparators in the reorder buffer to handle partial register accesses more smoothly than the P6. Also, unlike the P6, the K5 can execute segment changes speculatively, a technique that avoids significant performance penalties.
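To illustrate what a partial register access is, here is a minimal Python sketch of how the 8-bit and 16-bit registers AL, AH, and AX alias parts of the 32-bit EAX register. The class `Eax` and its method names are hypothetical and exist only for this illustration. The sketch models the architectural aliasing alone, not the reorder-buffer logic of the K5 or the P6. The point is that a full-width read after a narrow write must merge old and new bits, which is the dependency an out-of-order core has to track.

```python
class Eax:
    """Toy model of the x86 EAX register and its AX/AH/AL sub-registers."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    def write_al(self, byte):
        # Replace bits 0-7; bits 8-31 keep their previous contents.
        self.value = (self.value & 0xFFFFFF00) | (byte & 0xFF)

    def write_ah(self, byte):
        # Replace bits 8-15; all other bits keep their previous contents.
        self.value = (self.value & 0xFFFF00FF) | ((byte & 0xFF) << 8)

    def write_ax(self, word):
        # Replace bits 0-15; the upper 16 bits keep their previous contents.
        self.value = (self.value & 0xFFFF0000) | (word & 0xFFFF)

    def read_eax(self):
        # A full-width read sees the merged result of all earlier writes.
        return self.value


reg = Eax(0x11223344)
reg.write_al(0xAA)       # partial write: only the low byte changes
merged = reg.read_eax()  # full read combines the old upper bits with the new AL
```

In 16-bit code, narrow writes followed by wider reads of the same register are common, so a processor that handles this merge slowly pays the cost often.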
Cyrix's M1 has special circuitry that handles segment-register writes, partial register updates, and instruction-prefix bytes better than the P6 does when running 16-bit and mixed 16-/32-bit Windows code. Cyrix officials say the M1 will offer better 16-bit performance than the P6 and equivalent 32-bit performance.
Tests performed on an early P6 reference system produced by Intel highlight the P6's 16-bit/32-bit performance gap. We ran a variety of benchmarks on a 150-MHz P6 reference system. The system had a 60-MHz I/O bus, 64 MB of two-way interleaved RAM, a Diamond Stealth Pro video card, and the P6's integrated, 256-KB secondary cache. Our tests indicate that for running 16-bit applications, you'll almost always get equal or better performance for less money if you buy a PC based on a 90-MHz or faster Pentium instead of one based on a first-generation P6 processor.
When running a 16-bit version of BYTE's cross-platform BYTEmark CPU/FPU benchmarks, our baseline 90-MHz Dell Pentium outperformed the 150-MHz P6 on all tests except the Fourier test. The P6 won here because every test except the Fourier test operates in a source/destination fashion.
With source/destination operation, the tests process a quantity of data (source) and output another quantity of data (destination). For example, the IDEA test reads a large array of text and encrypts it into a destination array.
All the source/destination-style tests must call the segment-offset calculation routine repeatedly. On the P6, this results in a performance penalty, because that routine involves a segment-register load.
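To show why such a routine touches the segment registers so often, here is a minimal Python sketch of segment:offset arithmetic. It is an illustration under stated assumptions, not the BYTEmark source: the function names `linear_address`, `normalize`, and `count_segment_loads` are hypothetical, and the model uses real-mode addressing (linear address = segment x 16 + offset) as a simplification, whereas Windows 3.1 applications run in protected mode, where segment values are selectors.

```python
from typing import Tuple


def linear_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit linear address."""
    return ((segment << 4) + offset) & 0xFFFFF


def normalize(segment: int, offset: int) -> Tuple[int, int]:
    """Return an equivalent segment:offset pair whose offset is below 16."""
    linear = linear_address(segment, offset)
    return linear >> 4, linear & 0xF


def count_segment_loads(base_segment: int, length: int, stride: int) -> int:
    """Count how many times the segment value changes while stepping through
    `length` bytes in steps of `stride`, using normalized pointers.
    Each change stands for one segment-register load in this model."""
    loads = 0
    current = None
    for i in range(0, length, stride):
        linear = (base_segment << 4) + i
        seg = (linear >> 4) & 0xFFFF
        if seg != current:  # a different segment value must be loaded
            loads += 1
            current = seg
    return loads
```

With normalized pointers, the segment value changes every 16 bytes, so a test that streams through large source and destination arrays reloads a segment register constantly. A real 16-bit compiler may reload the segment register on every far-pointer dereference, so this count is, if anything, a lower bound for that access pattern.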
If Cyrix and AMD can deliver processors that offer better 16-bit performance than Intel's first P6 and comparable 32-bit performance, then both companies will likely sell more chips to users who want to maintain their investment in legacy code. But the window of opportunity is small. Microsoft's release of Windows 95 should push the market toward 32-bit software. And when Intel pumps up the P6's clock speed to 200 MHz, which is expected to happen later in 1996, the P6 should outperform the Pentium no matter what software it's running.
BYTE's cross-platform BYTEmark CPU and FPU benchmarks confirm that a 90-MHz Pentium outperforms a 150-MHz P6-based system when running 16-bit code. The performance of the P6 improves over that of the Pentium when running 32-bit code, as you would expect.
The P6's difficulty with 16-bit code is less pronounced at the application level, but it's still noticeable. Practically any 100-MHz Pentium-based machine will outrace a 150-MHz P6-based computer when running Windows 3.1 applications.