which is of use in multimedia, digital-video, content-creation, and games applications. This article examines the differences between the Pentium Pro and the Pentium II.
Architectural Differences
When the Pentium Pro first became available, it sported a number of significant enhancements over its predecessor, the Pentium. For example, the Pentium Pro featured a concept called the Dual Independent Bus (DIB) architecture, which addressed existing system-bandwidth limitations. It did this with two buses: a processor-to-main-memory bus and a processor-to-L2-cache bus. The processor could access both buses simultaneously for better throughput.
The Pentium Pro can execute up to three instructions per clock cycle. It also features dynamic execution, which incorporates the concepts of out-of-order and speculative execution. The Pentium Pro has a 12-stage superpipeline, as compared to the Pentium's five-stage counterpart (or six stages for a Pentium with multimedia extensions [MMX] technology).
It employs multiple branch prediction based on both past history and knowledge of how each opcode is typically used. While the chip remains compatible with existing x86 applications, this branch-prediction logic improves the Pentium Pro's performance over the Pentium's.
The Pentium II inherits the Pentium Pro's superpipeline and DIB architecture. The biggest change is in its internal logic: the Pentium II has larger L1 caches and supports Intel's MMX instructions. These 57 new instructions enable 64-bit data words to be treated as two 32-bit, four 16-bit, or eight 8-bit chunks. This permits the same operation to be performed on each chunk simultaneously, thereby facilitating features such as full-screen video. (See "x86 Enters the Multimedia Era," July 1996 BYTE, for more information.)
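The packed-data idea described above can be modeled in a few lines of Python. This is only a software sketch of a lane-wise wraparound add, in the spirit of MMX's packed-add instructions; the helper names (`split_lanes`, `join_lanes`, `packed_add`) are hypothetical, and the loop handles lanes one after another, whereas the hardware processes them all at once.

```python
def split_lanes(word, lane_bits):
    """Split a 64-bit integer into lanes of lane_bits each, lowest lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> shift) & lane_mask for shift in range(0, 64, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Reassemble lanes (lowest lane first) into one 64-bit integer."""
    lane_mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & lane_mask) << (index * lane_bits)
    return word


def packed_add(a, b, lane_bits):
    """Add two 64-bit words lane by lane; a carry never crosses a lane boundary."""
    lane_mask = (1 << lane_bits) - 1
    sums = [
        (x + y) & lane_mask  # wrap around inside the lane
        for x, y in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))
    ]
    return join_lanes(sums, lane_bits)


a = 0x00FF00FF00FF00FF
b = 0x0001000100010001

# The same 64-bit inputs give different answers depending on the lane width.
as_bytes = packed_add(a, b, 8)    # eight 8-bit chunks
as_words = packed_add(a, b, 16)   # four 16-bit chunks
print(f"{as_bytes:016X}")
print(f"{as_words:016X}")
```

With 8-bit lanes, each 0xFF + 0x01 wraps to zero inside its own byte; with 16-bit lanes, the same bits add up to 0x0100 in each chunk, because the carry is allowed to travel within the wider lane.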
Also, unlike the Pentium Pro, which operates at 3.3 V, the Pentium II operates at 2.8 V, thereby allowing Intel to run it at higher frequencies without unduly increasing its power requirements. While a 200-MHz Pentium Pro with a 512-KB cache consumes about 37.9 W, a 266-MHz Pentium II with a 512-KB cache burns 37.0 W.
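A rough back-of-the-envelope check shows why the lower voltage matters. The sketch below applies the standard first-order rule that dynamic CMOS power scales with the square of the supply voltage times the clock frequency. It assumes, purely for illustration, that the two parts switch the same effective capacitance and that the whole quoted figure scales this way (the L2 cache parts are ignored); the function name `scaled_power` is hypothetical.

```python
def scaled_power(p_ref_watts, v_ref, f_ref_mhz, v_new, f_new_mhz):
    """First-order estimate: power scales with (V_new / V_ref)^2 * (f_new / f_ref).

    Assumes identical switched capacitance in both parts, which is an
    illustrative simplification rather than a published figure.
    """
    return p_ref_watts * (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


# Figures quoted in the text: 200-MHz Pentium Pro at 3.3 V and 37.9 W.
estimate = scaled_power(37.9, 3.3, 200, 2.8, 266)

# Same clock increase with no voltage reduction, for comparison.
no_voltage_drop = scaled_power(37.9, 3.3, 200, 3.3, 266)

print(f"266 MHz at 2.8 V: about {estimate:.1f} W")
print(f"266 MHz at 3.3 V: about {no_voltage_drop:.1f} W")
```

Under these assumptions the estimate comes out at roughly 36 W, close to the 37.0 W quoted for the Pentium II, whereas staying at 3.3 V would push the figure to about 50 W.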
Processor Packaging
The Pentium Pro consists of a multichip module containing two dies: the processor core and the L2 cache. This module is supplied in a pin-grid array (PGA) package, which is inserted into a zero-insertion-force (ZIF) socket on the circuit board known as Socket 8; the pin attributes were defined by Intel.
At first glance, the Pentium II appears to be radically different from the Pentium Pro, but it is conceptually very similar. From the outside the Pentium II appears to be huge because it's packaged in what Intel refers to as a single-edge-connect (SEC) cartridge. It plugs into a connector called Slot 1 on the motherboard.
In fact, the Pentium II is a cross between a multichip module and a hybrid using an FR4 (printed circuit board) substrate, as shown in the photo. Intel doubled the processor's on-chip L1 cache (two separate 16-KB caches for data and instructions, for a 32-KB total). The company also separated the processor core from the L2 cache. The result is six individually packaged devices on the SEC cartridge substrate. These devices consist of the processor, four industry-standard (i.e., low-cost) burst-static-cache RAMs, and one tag RAM, which was previously integrated on the L2 cache die. The L2 cache chips and the tag RAM are presented in conventional quad flat packages (QFPs), while the core processor is packaged as a leadless grid array (LGA).
The SEC cartridge conveys further design advantages. The Pentium Pro's PGA package requires 387 pins, while the SEC cartridge uses only 242. This reduction of more than one-third in the pin count is possible because the SEC cartridge contains discrete components, such as termination resistors and capacitors. These items provide signal decoupling, which means that far fewer power pins are required.
Furthermore, laying down the system-bus traces between multiple Pentium Pro processors using the Socket 8 style is extremely arduous and typically forces board designers to increase a board's layer count. The SEC cartridge's in-line pin arrangement dramatically improves circuit routing, which lets designers employ less-expensive four-layer boards.
Supporting Cast
As the figure "Dual-Processor System" shows, the Pentium II employs a Gunning-transceiver-logic (GTL+) host bus that offers glueless support for two processors. This provides a cost-effective, minimalist two-processor design that allows symmetric multiprocessing (SMP).
The two-processor limitation is not imposed by the Pentium II; rather, it's dictated by the supporting chip set. However, initially limiting the chip set to a dual-processor configuration allows Intel and workstation vendors to offer dual-processor systems in a timely and economical manner. Power users demanding the ultimate in performance can expect a quad-processor version of the Pentium II chip set to appear in the future.
In the figure "Dual-Processor System," note the PMC and DBX chips, which are collectively referred to as the 440FX chip set. The 450KX (low-end) and 450GX (high-end) chip sets used by the Pentium Pro support only fast-page-mode (FPM) memory devices, so these chip sets are obliged to provide memory interleaving to reduce memory cycles. The problem with interleaving is that all the memory slots have to be occupied, which increases the cost to the user for memory upgrades. The 440FX chip set doesn't offer memory interleaving, but it does support extended data out (EDO) DRAM, which improves memory performance by reducing clock latencies.
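To see why interleaving ties up every memory slot, consider a generic low-order interleaving scheme in which consecutive chunks of the address space alternate between banks. The sketch below is illustrative only: the two-bank layout, the 8-byte stride, and the function name `bank_for_address` are assumptions, not documented 450KX or 450GX behavior.

```python
def bank_for_address(addr, n_banks=2, stride_bytes=8):
    """Return the bank that holds the given byte address.

    Generic low-order interleaving: successive stride-sized chunks rotate
    through the banks. Bank count and stride are illustrative assumptions.
    """
    return (addr // stride_bytes) % n_banks


# Walk through 64 consecutive bytes, one 8-byte chunk at a time.
banks = [bank_for_address(addr) for addr in range(0, 64, 8)]
print(banks)
```

Because a sequential read touches the banks in rotation, one bank can start its next access while another is still delivering data. The same rotation means that an empty bank would leave holes in the address space, which is why every bank has to be populated.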
System Issues and Performance
Many Pentium II-based systems offer only X-3-3-3 memory timing. This means that, when a block of data is read from memory, the first access requires X clock cycles to set up the initial access. X for the 440FX chip set equals seven, nine, or 12 cycles, depending on whether the system gets a page hit, page hit/row miss, or page miss, respectively. Subsequent accesses take only three clocks per access, as implied by the 3-3-3 moniker. Access in this context refers to reading a 64-bit chunk of data, known as a quadword. But vendors who pay attention to signal-integrity issues and employ high-quality components, such as buffers and terminators, can actually wring a faster X-2-2-2 timing out of the processor. Intergraph's 440FX-based TD and TDZ Pentium II systems, for example, are built to deliver this memory timing.
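The arithmetic behind these timing figures is easy to tabulate. The sketch below totals the clocks for a four-quadword burst (32 bytes) under X-3-3-3 and X-2-2-2 timing, using the three lead-off values given above for the 440FX chip set; the function name `burst_cycles` and the dictionary layout are hypothetical.

```python
def burst_cycles(lead_off, per_access, accesses=4):
    """Total clocks for a burst: lead-off cycles for the first access,
    then per_access cycles for each of the remaining accesses."""
    return lead_off + per_access * (accesses - 1)


# Lead-off values (X) quoted in the text for the 440FX chip set.
LEAD_OFF = {"page hit": 7, "page hit/row miss": 9, "page miss": 12}

for case, x in LEAD_OFF.items():
    slow = burst_cycles(x, 3)  # X-3-3-3
    fast = burst_cycles(x, 2)  # X-2-2-2
    saving = 100 * (slow - fast) / slow
    print(f"{case}: {slow} vs. {fast} clocks ({saving:.0f} percent fewer)")
```

On a page hit, for example, the burst drops from 16 clocks to 13. The three-clock saving is the same in every case, so it matters most when the lead-off is short.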
Also, Intergraph's designers have built an SEC look-alike "gender-bender" cartridge that sports a Pentium Pro. Thus, users can buy a Pentium Pro-based system and then upgrade the gender-bender cartridge(s) to full-fledged Pentium II SEC cartridges in the future. The only requirement is to exchange a couple of jumpers that modify the frequency and voltage levels of the system clock.
At 266 MHz, the Pentium II delivers a SPECint95 of 10.8 and a SPECfp95 of 6.89. For those who don't run exotic benchmarks for a living, how does this realistically measure up? Intergraph's internal evaluations reveal that a 266-MHz Pentium II runs real-world applications anywhere between 5 percent and 30 percent faster than a 200-MHz Pentium Pro (both with 512-KB caches). For a large number of these applications, the performance improvement falls in the 20 percent to 25 percent range.
Considering this performance improvement and the potential of system vendors passing the Pentium II's cost savings on to their customers, I'd say that Intel has done us proud.
Illustration: Support for two processors and EDO DRAM reduces system costs.

Photo: The Pentium II uses industry-standard L2-cache parts.
Clive Maxfield is on the technical staff at Intergraph Computer Systems (Huntsville, AL). He is the coauthor of Bebop BYTEs Back (An Unconventional Guide to Computers) (Doone Publications, 1997). You can contact him at http://ro.com/~bebopbb.