Why Legacy Code Snags the P6


September 1995 / News & Views
Rick Grehan

The Pentium currently outperforms the P6 when running 16-bit programs under Windows 3.1 due to a combination of factors. They include the design of the P6 and the hangover of legacy DOS and Windows code.

As described in "Intel's P6" (April BYTE), instructions passed to the P6 are converted into equivalent microoperations that are loaded into a 40-element circular buffer. From the buffer, they pass to the execution units, which process between three and five of them simultaneously, provided the data each one needs is available.
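To make the buffer idea concrete, here is a toy model of a 40-slot circular buffer of micro-ops. It is a sketch under stated assumptions, not a description of Intel's circuitry: the class and method names are hypothetical, each micro-op is assumed to finish in the same step it is dispatched, the dispatch width of 5 is taken from the top of the article's three-to-five range, and the in-order `retire` step (slots are freed only from the oldest end) is added to show why a circular layout suits this job.

```python
class MicroOpRing:
    """Toy circular buffer of micro-ops; the 40-slot default follows the article."""

    def __init__(self, capacity=40):
        self.slots = [None] * capacity
        self.head = 0   # index of the oldest occupied slot
        self.count = 0  # number of occupied slots

    def push(self, name, ready):
        """Append a micro-op at the tail; return False (front end stalls) if full."""
        if self.count == len(self.slots):
            return False
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = {"name": name, "ready": ready, "done": False}
        self.count += 1
        return True

    def dispatch(self, width=5):
        """Send up to `width` micro-ops whose data is ready, scanning oldest first."""
        picked = []
        for i in range(self.count):
            entry = self.slots[(self.head + i) % len(self.slots)]
            if entry["ready"] and not entry["done"]:
                entry["done"] = True  # assumption: finishes within the same step
                picked.append(entry["name"])
                if len(picked) == width:
                    break
        return picked

    def retire(self):
        """Free finished slots from the head, strictly in program order."""
        retired = []
        while self.count and self.slots[self.head]["done"]:
            retired.append(self.slots[self.head]["name"])
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return retired


ring = MicroOpRing()
for name, ready in [("load", True), ("add", False), ("mov", True)]:
    ring.push(name, ready)
print(ring.dispatch())  # ['load', 'mov']
print(ring.retire())    # ['load']  ('add' is still waiting, so 'mov' stays put)
```

Note that "mov" executes ahead of "add" because its data is ready, which is exactly the out-of-order behavior the P6 depends on.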

If instruction B references a particular register, and instruction A, which precedes B in program flow, also writes to that register, B must wait for A to complete. Therefore, the fewer the dependencies, the faster the instructions can be delivered to the execution units.
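The waiting rule can be sketched as a small dependency pass over a program. This is a hypothetical illustration, not Intel's logic: it treats "references" as "reads" (with shadowed registers, a later write does not have to wait), ignores flags and memory operands, and the `Instr` and `producers` names are invented for the example.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    reads: FrozenSet[str]   # registers the instruction reads
    writes: FrozenSet[str]  # registers the instruction writes


def producers(program: List[Instr]) -> List[List[int]]:
    """For each instruction, list the indices of earlier instructions it must wait for.

    Only the most recent writer of each source register matters; flags and
    memory operands are ignored in this sketch.
    """
    last_writer = {}  # register name -> index of the latest instruction that wrote it
    result = []
    for i, instr in enumerate(program):
        result.append(sorted({last_writer[r] for r in instr.reads if r in last_writer}))
        for r in instr.writes:
            last_writer[r] = i
    return result


program = [
    Instr("MOV EAX, 1", frozenset(), frozenset({"EAX"})),
    Instr("MOV EBX, 2", frozenset(), frozenset({"EBX"})),
    Instr("ADD EAX, EBX", frozenset({"EAX", "EBX"}), frozenset({"EAX"})),
    Instr("MOV ECX, 3", frozenset(), frozenset({"ECX"})),
]
print(producers(program))  # [[], [], [0, 1], []]
```

Only the ADD has to wait; the final MOV ECX,3 depends on nothing and can be sent to an execution unit right away.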

To conserve the P6's transistor count, Intel decided to shadow (i.e., allow multiple independent instances of) the "true" registers only as full 32-bit entities. As a result, any instruction that alters any part of a register will hold up a following instruction that uses any part of the same register, even if the two instructions are logically independent. For example, an ADD AL,6 holds up a following MOV BX,AX.
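A minimal sketch of the rule as stated here, contrasting whole-register shadowing with true byte-level dependency. It models the article's description, not a cycle-accurate view of real P6 silicon, and the table and function names are hypothetical. The article's ADD AL,6 / MOV BX,AX pair does share the low byte of EAX; the second pair, ADD AL,6 followed by MOV BL,AH, is an invented example of two instructions that share a parent register but no bytes, which is the case the 32-bit-only shadowing penalizes needlessly.

```python
def _register_tables():
    """Map each x86 register name to its 32-bit parent and to the bytes it covers."""
    parent, covers = {}, {}
    for r in "ABCD":
        full = f"E{r}X"
        for name, span in ((f"{r}L", {0}), (f"{r}H", {1}), (f"{r}X", {0, 1}), (full, {0, 1, 2, 3})):
            parent[name], covers[name] = full, frozenset(span)
    for r in ("SI", "DI", "BP", "SP"):
        full = "E" + r
        parent[r], covers[r] = full, frozenset({0, 1})
        parent[full], covers[full] = full, frozenset({0, 1, 2, 3})
    return parent, covers


PARENT, COVERS = _register_tables()


def held_up(written, read):
    """Rule as described: only whole 32-bit registers are shadowed, so touching
    any part of the same parent register forces the later instruction to wait."""
    return PARENT[written] == PARENT[read]


def truly_dependent(written, read):
    """Logical dependency: the read actually uses bytes that the write changed."""
    return held_up(written, read) and bool(COVERS[written] & COVERS[read])


# The article's pair: ADD AL,6 followed by MOV BX,AX
print(held_up("AL", "AX"), truly_dependent("AL", "AX"))  # True True
# Hypothetical independent pair: ADD AL,6 followed by MOV BL,AH
print(held_up("AL", "AH"), truly_dependent("AL", "AH"))  # True False
```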

If this were a completely 32-bit world (as Intel's engineers had hoped it would be by now), any instruction referencing a register would be held up by, at most, one preceding instruction, and the P6 would "fire on all cylinders." Similarly, if all programs manipulated the CPU registers only 16 bits at a time, the P6 would perform well. Unfortunately, a great deal of code, especially in the DOS and Windows world, manipulates registers as 8-bit entities here, 16-bit entities there, and sometimes 32-bit entities. This "mixing" of data sizes bogs the P6 down, because it has to spend so much time "piecing" the 32-bit registers together from 8- and 16-bit subunits.
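The cost of mixing sizes can be approximated by counting the places where a read is wider than the most recent write to the same 32-bit register, since those are the points where a wide value must be pieced together from narrower parts. This is a hypothetical proxy with invented names, not the P6's internal bookkeeping; it simply shows why pure 32-bit and pure 16-bit traces come out clean while a mixed trace does not.

```python
def width(reg):
    """Operand size in bits, from the x86 register name."""
    if reg.startswith("E"):
        return 32
    if reg.endswith(("L", "H")):
        return 8
    return 16


def parent(reg):
    """The 32-bit register that contains `reg`."""
    if reg.startswith("E"):
        return reg
    if width(reg) == 8:
        return "E" + reg[0] + "X"
    return "E" + reg


def count_merges(trace):
    """Count reads that are wider than the last write to the same 32-bit register.

    `trace` is a list of ("r" | "w", register) pairs in program order.
    """
    last_width = {}  # parent register -> width of its most recent write
    merges = 0
    for kind, reg in trace:
        p, w = parent(reg), width(reg)
        if kind == "w":
            last_width[p] = w
        elif p in last_width and w > last_width[p]:
            merges += 1  # wide value must be pieced together from narrower parts
    return merges


pure_32 = [("w", "EAX"), ("r", "EAX"), ("w", "EBX"), ("r", "EAX")]
pure_16 = [("w", "AX"), ("r", "AX"), ("w", "SI"), ("r", "SI")]
mixed = [("w", "AL"), ("r", "AX"), ("w", "AX"), ("r", "EAX"), ("w", "EBX"), ("r", "EBX")]
print(count_merges(pure_32), count_merges(pure_16), count_merges(mixed))  # 0 0 2
```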

Another source of friction for the P6 arises from the ever-dreaded segment registers often manipulated in 16-bit DOS and Windows programs. Again, to skirt what would have been a tremendous multiplication of complexity, the P6 engineers elected not to virtualize the segment registers. So, whereas general CPU registers can be shadowed, only one global instance exists for each segment register. The result is that the arrival of a segment register load instruction "serializes" the CPU: No other instructions can proceed until the load completes.
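The serializing effect can be illustrated with a toy cycle count. The assumptions are loud ones: every ordinary instruction is independent and takes one cycle, three issue per cycle (the low end of the article's range), and only MOV or POP into a segment register counts as a segment load. The function names are hypothetical. Under these assumptions, a single segment load turns two cycles of work into three or four.

```python
import math

SEGMENT_REGS = {"CS", "DS", "ES", "FS", "GS", "SS"}


def is_segment_load(instr):
    """Simplified check: MOV or POP whose destination is a segment register."""
    op, _, operands = instr.partition(" ")
    dest = operands.split(",")[0].strip().upper()
    return op.upper() in ("MOV", "POP") and dest in SEGMENT_REGS


def issue_cycles(program, width=3):
    """Toy cycle count. Ordinary instructions are assumed independent and issue
    `width` per cycle; a segment load waits for everything older to drain, then
    occupies a cycle by itself before anything younger may start."""
    total, run = 0, 0  # run = ordinary instructions since the last segment load
    for instr in program:
        if is_segment_load(instr):
            total += math.ceil(run / width) + 1
            run = 0
        else:
            run += 1
    return total + math.ceil(run / width)


independent = ["INC EAX", "INC EBX", "INC ECX", "INC EDX", "INC ESI", "INC EDI"]
print(issue_cycles(independent))                                          # 2
print(issue_cycles(independent[:3] + ["MOV DS, AX"] + independent[3:]))  # 3
print(issue_cycles(independent[:1] + ["POP ES"] + independent[1:]))      # 4
```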

Furthermore, any instructions that had already been started but appear in the program flow after the segment register load instruction must be dumped and restarted. This "tear it up and start from scratch" tactic is necessary because the source of all instructions and data following the segment load is in question.
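The dump-and-restart step amounts to splitting in-flight work at the segment load's position in program order. The sketch below assumes each started instruction is tagged with its program index; the function name and data layout are hypothetical. In the example, MOV ECX,[BX] must be redone because its memory operand goes through DS, the very register being reloaded.

```python
def flush_after_segment_load(in_flight, load_index):
    """Split already-started work around a segment-register load.

    in_flight: (program_index, text) pairs for instructions already under way.
    Returns (kept, restart): the load and everything older in program order
    keep going; everything younger is discarded and must be fetched again,
    oldest first.
    """
    kept = [e for e in in_flight if e[0] <= load_index]
    restart = sorted((e for e in in_flight if e[0] > load_index), key=lambda e: e[0])
    return kept, restart


in_flight = [(5, "MOV ECX, [BX]"), (2, "ADD EAX, 1"), (3, "MOV DS, AX"), (4, "INC EBX")]
kept, restart = flush_after_segment_load(in_flight, load_index=3)
print(kept)     # [(2, 'ADD EAX, 1'), (3, 'MOV DS, AX')]
print(restart)  # [(4, 'INC EBX'), (5, 'MOV ECX, [BX]')]
```

Every entry in `restart` represents work the P6 already did and must now throw away, which is why frequent segment loads are so costly.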

Ironically, none of this would be of any significance if the designers of the P6 hadn't made a few excusable miscalculations. In one of the larger mispredicted branches we've ever seen, the P6 engineers in 1990 estimated that most code today would be 32 bits, and that the standard for chip technology, including the Pentium, would be at 0.6 micron running at around 100 MHz. However, hardware again outpaced software. Today's typical PC runs a mixture of 16-bit code on 32-bit OSes. Meanwhile, the latest Pentium is produced on a 0.35-micron process and soon will run at 150 MHz.

The first P6 will not be manufactured on a 0.35-micron process, however. Instead, Intel says it will make the first P6 chips on a more conservative 0.6-micron process. Once it has worked the bugs out at 0.6 microns, Intel says it will move to a more aggressive 0.35-micron process. The company estimates there will be an eight-month period when a similarly clocked Pentium will outpace the P6 in the special circumstances we've described. But once Intel moves to 0.35-micron manufacturing, the P6 will race ahead.


Copyright © 2005 CMP Media LLC