


December 1994 / Core Technologies / Pentium Chip's Dual Personality

The latest processors offer built-in support for dual-processor systems

Roeland van Krieken

Early in 1994, Intel announced two new versions of the Pentium processor: one that runs at 90 MHz and another that runs at 100 MHz. Collectively code-named the P54C, these CPUs use a 0.6-micron BiCMOS process that reduces their die size by 45 percent. They deliver higher performance, thanks to higher core frequencies, and lower power dissipation, because they operate at 3.3 V. Although the low-power, high-performance features of these processors have garnered much attention, this column focuses on another important feature of the Pentium 90-MHz (or 735\90) and Pentium 100-MHz (or 815\100) design: Both possess architectural enhancements that support efficient dual-processor systems.

Some background on Pentium system designs that use two or more processors is in order. One design, which I term multiprocessor, uses a dedicated cache for each Pentium processor. The advantage of a multiprocessor design is efficient bus utilization, because each processor communicates freely with its own cache. This bus efficiency yields high performance that scales effectively to two or more processors: Adding a second processor to a multiprocessor system boosts its performance by more than 90 percent. The disadvantage of this design is its cost and complexity. Each dedicated cache requires an additional cache controller and SRAMs, as well as data-path, memory-bus, and interrupt-control circuitry. Because the performance, scalability, and cost of a multiprocessor system are all greater than those of a single-processor system, this design is ideally suited for OLTP (on-line transaction processing) applications.

Another design, called dual-processor, uses two Pentium processors that share a single secondary cache. The advantage of the dual-processor design is that it is simpler and less costly, because only one cache controller and its SRAMs are needed to implement it. However, because the two processors share a single bus to the secondary cache, bus efficiency is limited. Installing a second processor in a dual-processor system typically improves performance by 50 percent to 80 percent. The price/performance of this implementation makes it suitable for high-end desktop systems and workstations.

Dual Details

The 90- and 100-MHz versions of the Pentium processor provide three key hardware features to support a dual-processor design: cache coherency, multiprocessing interrupt control, and bus arbitration. The Pentium's bus interface implements the MESI (modified, exclusive, shared, invalid) protocol, which keeps the processors' caches consistent. The bus interface also integrates multiprocessor interrupt-control logic and bus-arbitration logic. The interrupt-control logic is based on the APIC (advanced programmable interrupt controller) architecture, which supports the redirection of interrupts to multiple processors. The bus-arbitration logic lets the two processors arbitrate for access to the common bus that leads to the shared cache.
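To make the four MESI states concrete, here is a minimal textbook sketch of two private caches that snoop each other's bus cycles. It is an illustration under stated assumptions, not Intel's implementation: the class and method names are hypothetical, caches are modeled as simple address-to-state maps, and Pentium-specific details of the on-chip caches and bus signals are not modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Cache:
    """One processor's private cache: maps a line address to its MESI state."""

    def __init__(self):
        self.lines = {}
        self.writebacks = 0  # dirty lines this cache had to write back

    def state(self, addr):
        return self.lines.get(addr, State.INVALID)


class TwoCacheMesi:
    """Hypothetical textbook MESI model: two caches snooping one shared bus."""

    def __init__(self):
        self.caches = [Cache(), Cache()]

    def read(self, cpu, addr):
        me, other = self.caches[cpu], self.caches[1 - cpu]
        if me.state(addr) is not State.INVALID:
            return  # read hit: no bus cycle, no state change
        # Read miss: the other cache snoops the bus read.
        if other.state(addr) is State.MODIFIED:
            other.writebacks += 1  # dirty data is written back before sharing
        if other.state(addr) is State.INVALID:
            me.lines[addr] = State.EXCLUSIVE  # this cache holds the only copy
        else:
            other.lines[addr] = State.SHARED
            me.lines[addr] = State.SHARED

    def write(self, cpu, addr):
        me, other = self.caches[cpu], self.caches[1 - cpu]
        if me.state(addr) in (State.INVALID, State.SHARED):
            # Gaining ownership invalidates the other copy (after a write-back if dirty).
            if other.state(addr) is State.MODIFIED:
                other.writebacks += 1
            other.lines[addr] = State.INVALID
        # An EXCLUSIVE line becomes MODIFIED silently, with no bus cycle.
        me.lines[addr] = State.MODIFIED


mesi = TwoCacheMesi()
mesi.read(0, 0x100)   # CPU 0 gets the only copy: E
mesi.read(1, 0x100)   # both now hold it: S, S
mesi.write(1, 0x100)  # CPU 1 takes ownership: I, M
mesi.read(0, 0x100)   # CPU 1 writes back, and both drop to S
```

The EXCLUSIVE state is what makes the protocol cheap for private data: a processor that reads a line nobody else holds can later write it without any bus traffic.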

With these on-chip logic blocks, a system designer can develop a ``glueless'' interface for a dual-processor system. Such an interface simplifies the overall system design, but some support logic is still required to flesh out the implementation. The support hardware consists of an external I/O APIC, which receives the system interrupts and distributes them to the appropriate processor, and some data-path control logic to optimize access to the host bus.

The Pentium dual-processor design uses a private APIC bus to maintain cache coherency and to coordinate the operations of the two processors. The APIC bus consists of three lines, as shown in the figure ``Pentium Dual-Processor System.'' The first line is the APIC enable. It signals the presence of a dual-processor setup and enables the on-chip (local) APIC logic. This allows another processor to be inserted into a second socket without special changes to the system hardware or software. The second line selects the processor: The external I/O APIC uses it to select the processor that is to receive a system interrupt. The third line is the APIC bus clock, which operates the APIC bus independently of the processor bus.

As mentioned earlier, both the primary and the secondary processor contain integrated local APIC modules. This APIC logic handles directed interrupts and interprocessor interrupts. As interrupts arrive from the system, they are routed through the external I/O APIC logic. This I/O APIC is similar to the original 8259 interrupt controller found in all PCs today. However, the I/O APIC captures all system interrupts and directs them to the individual processors through various programmable distribution schemes. The local APIC logic in each processor receives interrupts from the I/O APIC via the three-wire private APIC bus, locally via its local interrupt pins, or from the other processor via the APIC bus.
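The routing role of the I/O APIC can be sketched as a redirection table that maps each interrupt to a delivery rule. The sketch below assumes two rules modeled loosely on the APIC architecture's "fixed" (always the named processor) and "lowest priority" (the processor whose task priority is lowest) delivery modes; the class names, the string CPU names, and the tie-breaking rule are hypothetical simplifications, and the APIC bus itself is reduced to a method call.

```python
from dataclasses import dataclass, field


@dataclass
class LocalApic:
    cpu: str
    task_priority: int = 0  # higher value = more important work running
    received: list = field(default_factory=list)


class IoApic:
    """Hypothetical toy I/O APIC: a redirection table maps IRQs to delivery rules."""

    def __init__(self, local_apics):
        self.local_apics = {apic.cpu: apic for apic in local_apics}
        self.redirection = {}  # irq -> (mode, destination cpu or None)

    def program(self, irq, mode, cpu=None):
        if mode not in ("fixed", "lowest"):
            raise ValueError(f"unsupported delivery mode: {mode}")
        if mode == "fixed" and cpu not in self.local_apics:
            raise ValueError("fixed delivery needs a valid destination CPU")
        self.redirection[irq] = (mode, cpu)

    def raise_irq(self, irq):
        """Deliver an interrupt to one local APIC and return the target CPU name."""
        mode, cpu = self.redirection[irq]
        if mode == "fixed":
            target = self.local_apics[cpu]
        else:
            # Lowest priority: send it to the CPU doing the least important work.
            # min() breaks ties by insertion order, a simplification.
            target = min(self.local_apics.values(), key=lambda a: a.task_priority)
        target.received.append(irq)
        return target.cpu


primary = LocalApic("primary", task_priority=5)
secondary = LocalApic("secondary", task_priority=1)
io_apic = IoApic([primary, secondary])
io_apic.program(1, "fixed", "primary")  # IRQ 1 always goes to the primary
io_apic.program(14, "lowest")           # IRQ 14 goes to the less busy CPU
```

Here `io_apic.raise_irq(1)` returns `"primary"`, while `io_apic.raise_irq(14)` returns `"secondary"` because the secondary's task priority is lower.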

The Pentium processor incorporates a private arbitration mechanism that allows the primary and secondary processors to arbitrate for the shared processor bus without assistance from a bus controller. The arbitration architecture is structured so that the dual-processor pair appears to the system as a single processor. The arbitration logic uses a fair arbitration scheme, and the arbitration state machine was designed to use the processor bus bandwidth efficiently.

The arbitration mechanism requires that the Pentium check the second socket for a processor every time it is reset. The voltage on a processor-type pin indicates whether a Pentium is the primary or the secondary processor in a dual-processor design. The primary processor always comes out of reset as the MRM (most recently used master) and the secondary processor as the LRM (least recently used master). The MRM controls the bus. The LRM requests use of the host bus through the host bus's control signals, and the MRM grants it control as soon as any pending bus transactions are completed. The LRM then becomes the new MRM until it yields the bus to the other processor.

The MRM grants the bus to the LRM immediately if that CPU has a pipelined cycle to issue. During this inter-CPU pipelining, the current MRM may either drive one more cycle onto the bus or grant the address and control buses to the LRM. The MRM hands the bus over early only if the LRM's bus cycle can be pipelined onto the current bus cycle. As a result, arbitration for the bus introduces no dead clocks between bus transactions.
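The MRM/LRM rules in the last three paragraphs can be captured in a small state machine. This is a hypothetical sketch that tracks only bus ownership, not clocks or pin-level signals: `can_pipeline` stands for "the LRM's cycle can be pipelined onto the current bus cycle," and the method names are illustrative.

```python
class PrivateArbiter:
    """Hypothetical toy model of MRM/LRM arbitration between two CPUs."""

    def __init__(self):
        # Out of reset, the primary owns the bus (MRM); the secondary is LRM.
        self.mrm, self.lrm = "primary", "secondary"
        self.cycles_in_flight = 0  # bus cycles not yet completed
        self.request_pending = False

    def start_cycle(self):
        """The current MRM drives a new cycle onto the bus."""
        self.cycles_in_flight += 1

    def lrm_request(self, can_pipeline):
        """The LRM asks for the bus.

        If the bus is idle, or the LRM's cycle can be pipelined onto the
        current one, ownership changes hands at once (no dead clocks).
        """
        self.request_pending = True
        if self.cycles_in_flight == 0 or can_pipeline:
            self._hand_off()

    def cycle_complete(self):
        if self.cycles_in_flight == 0:
            raise RuntimeError("no bus cycle in flight")
        self.cycles_in_flight -= 1
        # Without pipelining, the handoff waits until pending cycles finish.
        if self.request_pending and self.cycles_in_flight == 0:
            self._hand_off()

    def _hand_off(self):
        # The requester becomes the new MRM until it yields in turn,
        # so ownership alternates fairly between the two processors.
        self.mrm, self.lrm = self.lrm, self.mrm
        self.request_pending = False


arb = PrivateArbiter()
arb.start_cycle()
arb.lrm_request(can_pipeline=False)  # must wait: primary is still MRM
arb.cycle_complete()                 # pending cycle done: secondary becomes MRM
```

Calling `lrm_request(can_pipeline=True)` instead would swap the roles immediately, which is the pipelined handoff the article describes.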

To use the host bus bandwidth efficiently, dual-processor systems must include an integrated data-path controller. Memory writes go from the host bus into a FIFO (first-in/first-out) write buffer, and the data then flows through the FIFO to DRAM. As a result, memory writes require significantly fewer host bus cycles during write-intensive applications.
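The benefit of the write FIFO is that a host bus write can complete as soon as the data is buffered, rather than waiting for DRAM. The sketch below is a hypothetical model of such a posted-write buffer; the depth, the class name, and the stall-on-full behavior are assumptions, and reads that would need to check the FIFO for pending data are not modeled.

```python
from collections import deque


class PostedWriteBuffer:
    """Hypothetical toy model of a data-path FIFO that posts host-bus writes to DRAM."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
        self.dram = {}

    def host_write(self, addr, value):
        """Accept a write from the host bus; return False if the CPU must stall."""
        if len(self.fifo) >= self.depth:
            return False  # FIFO full: the host bus must wait for a drain
        self.fifo.append((addr, value))
        return True  # the bus cycle ends now, without waiting for DRAM

    def drain_one(self):
        """Retire the oldest buffered write to DRAM (first in, first out)."""
        if self.fifo:
            addr, value = self.fifo.popleft()
            self.dram[addr] = value


buf = PostedWriteBuffer(depth=2)
buf.host_write(0x10, 1)
buf.host_write(0x10, 2)  # a later write to the same address
buf.host_write(0x20, 3)  # rejected: the FIFO is full
buf.drain_one()
buf.drain_one()          # FIFO order leaves the newest value in DRAM
```

Draining in FIFO order matters: two writes to the same address reach DRAM in program order, so memory ends up holding the later value.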

It's important to note that the dual-processor design provides for future growth through its support of OverDrive processors. This support relies on the CPUID instruction, with which system software can determine the type of processor in the primary and secondary sockets and the features each supports. CPUID returns a value in bits 12 and 13 of the EAX register that indicates the processor type. For upgradability with future Pentium OverDrive processors, system software must accept every processor-type value that CPUID can return in EAX rather than assume a single value.
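Executing CPUID itself requires assembly or a compiler intrinsic, but decoding the processor-type field is plain bit arithmetic. This sketch assumes EAX was obtained from CPUID with EAX=1 on input and uses the type encodings from Intel's CPUID documentation (00 original OEM processor, 01 OverDrive, 10 dual processor, 11 reserved); the function name and the sample EAX values are hypothetical.

```python
PROCESSOR_TYPES = {
    0b00: "original OEM processor",
    0b01: "OverDrive processor",
    0b10: "dual processor",  # reported by the secondary CPU of a dual pair
    0b11: "reserved",
}


def processor_type(eax: int) -> str:
    """Decode the processor-type field, bits 13:12 of EAX after CPUID (EAX=1)."""
    field = (eax >> 12) & 0b11
    # The table covers all four values, so no type value can make this fail.
    return PROCESSOR_TYPES[field]


# Hypothetical EAX values that differ only in bits 13:12.
for sample in (0x0525, 0x1525, 0x2525):
    print(hex(sample), processor_type(sample))
```

Because the lookup covers all four encodings, software written this way keeps working when an OverDrive or secondary processor reports a nonzero type.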

Software Issues

Adding a second processor to a computer doesn't make the system run any faster if the operating system fails to make use of the second processor. A smart multitasking system allots certain tasks to each processor, distributing the workload. Furthermore, application code can be written so that it's subdivided into threads. A thread is a portion of application code that runs in parallel with, or independent of, other parts of the application. For example, a spreadsheet application might have a print thread generating the output for a chart while an interface thread accepts new data from the user.
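The spreadsheet example can be sketched with Python's standard `threading` and `queue` modules. The function names and data are hypothetical. Note that in CPython the global interpreter lock lets these pure-Python threads interleave rather than execute truly in parallel; the point here is the structure, which a multiprocessing operating system can schedule onto separate processors.

```python
import queue
import threading


def print_thread(jobs, printed):
    """Hypothetical print thread: renders chart jobs until it sees None."""
    while True:
        chart = jobs.get()
        if chart is None:
            break
        printed.append(f"rendered {chart}")


def interface_thread(inputs, cells):
    """Hypothetical interface thread: accepts new cell values from the user."""
    for name, value in inputs:
        cells[name] = value


def run_spreadsheet(charts, inputs):
    jobs, printed, cells = queue.Queue(), [], {}
    printer = threading.Thread(target=print_thread, args=(jobs, printed))
    ui = threading.Thread(target=interface_thread, args=(inputs, cells))
    printer.start()
    ui.start()
    for chart in charts:
        jobs.put(chart)
    jobs.put(None)  # sentinel: no more print jobs
    printer.join()
    ui.join()
    return printed, cells


printed, cells = run_spreadsheet(["Q1", "Q2"], [("A1", 10), ("B2", 3)])
```

Each thread touches only its own data, and the queue is the one point of communication, which is what lets the two run independently.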

In a multiprocessing system, an application's threads can run on different processors. Thus, a threaded application can use the capabilities of a dual-processor system more efficiently than a nonthreaded application can. For example, the table ``Single-Processor and Dual-Processor Mode Performance'' shows that a threaded version of Adobe Photoshop achieved better performance by running its filtering operations on the secondary processor of a dual-processor Pentium system. However, nonthreaded applications can still benefit from a dual-processor system: The multiprocessing operating system can run on one processor while the application runs on the other.
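Image filters parallelize well because many of them compute each output row independently. The sketch below hands rows to a pool of two workers to mimic a filter split across two processors. It is not Photoshop's algorithm: the 3-tap median filter and the function names are hypothetical stand-ins. Because of CPython's interpreter lock, real speedup would need `ProcessPoolExecutor` or native code; threads are used here only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def median3_row(row):
    """3-tap median filter on one row; edge pixels are copied unchanged."""
    if len(row) < 3:
        return list(row)
    out = [row[0]]
    out += [median(row[i - 1:i + 2]) for i in range(1, len(row) - 1)]
    out.append(row[-1])
    return out


def filter_image(image, workers=2):
    """Filter rows on a pool of workers; rows are independent, so order is kept."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(median3_row, image))


image = [[1, 9, 2, 3], [5, 5, 0, 5]]
result = filter_image(image)
```

`pool.map` returns results in input order, so the parallel result is identical to filtering the rows one after another.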

Other software factors must also be considered when determining whether a dual-processor design is the best solution for a job. In particular, an application's memory-usage pattern can affect its performance. Database applications (OLTP workloads) tend to access large areas of memory at random. These applications perform best on machines with large cache memories and, thus, work best on multiprocessor machines. Workstation applications are more calculation-intensive and tend to execute tight loops that fit in smaller caches. These applications can benefit from a dual-processor system, with its smaller cache and the additional compute power supplied by the secondary processor. For example, the Adobe Photoshop filters in the table ran roughly 50 percent to 80 percent faster. Your mileage will vary.
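The contrast between random, database-style access and tight loops can be seen with a simple cache simulation. This sketch assumes an LRU replacement policy and treats each address as one cache line; the cache size, the address ranges, and the seed are arbitrary illustrative choices, not measurements of any Pentium system.

```python
import random
from collections import OrderedDict


def lru_hit_rate(addresses, cache_lines):
    """Fraction of accesses that hit in an LRU cache holding cache_lines lines."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


# A tight loop reuses 64 lines; the OLTP-like stream scatters over 100,000.
tight_loop = list(range(64)) * 100
rng = random.Random(1994)
oltp_like = [rng.randrange(100_000) for _ in range(6_400)]

small_cache = 256
loop_rate = lru_hit_rate(tight_loop, small_cache)
oltp_rate = lru_hit_rate(oltp_like, small_cache)
```

With the same small cache, the loop misses only on its first 64 accesses (a 99 percent hit rate), while the random stream almost never hits, which is why such workloads want much larger caches.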

Because of its simple design and low cost, the dual-processor system design is well-suited to advanced desktop systems. Although its reduced bus efficiency means it achieves lower performance than multiprocessor systems for some applications, the added cost to accommodate the second processor is minimal.


Single-Processor and Dual-Processor Mode Performance



Intel filtered multiple image files with Adobe Photoshop 3.0 running under Daytona (the code name for Windows NT 3.5) on a dual-processor system built with 90-MHz Pentiums. The data shows the improved performance of the multithreaded functions.


FILTER              DUAL-PROCESSOR/UNIPROCESSOR RATIO
Despeckle                        1.64
Dust & Scratch                   1.51
Find edges                       1.70
HighPass                         1.68
Median                           1.64
Radial blur                      1.73
Unsharp mask                     1.79
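The ratios in the table translate directly into the "50 percent to 80 percent" figure cited in the text. This short calculation assumes each ratio is a speedup (dual-processor performance divided by uniprocessor performance) and defines parallel efficiency as the speedup divided by the number of processors; the names are illustrative.

```python
from statistics import mean

# Dual-processor/uniprocessor ratios from the table (Photoshop 3.0, 90-MHz Pentiums).
RATIOS = {
    "Despeckle": 1.64,
    "Dust & Scratch": 1.51,
    "Find edges": 1.70,
    "HighPass": 1.68,
    "Median": 1.64,
    "Radial blur": 1.73,
    "Unsharp mask": 1.79,
}


def percent_faster(ratio):
    """A ratio of 1.64 means the dual-processor run was 64 percent faster."""
    return (ratio - 1.0) * 100.0


def efficiency(ratio, processors=2):
    """Fraction of ideal linear scaling achieved (1.0 would be a perfect 2x)."""
    return ratio / processors


for name, ratio in RATIOS.items():
    print(f"{name:15} {percent_faster(ratio):5.1f}% faster, "
          f"{efficiency(ratio):.0%} of ideal")

print(f"mean ratio: {mean(RATIOS.values()):.2f}")
```

The filters ran 51 percent (Dust & Scratch) to 79 percent (Unsharp mask) faster, with a mean ratio of 1.67, or roughly 76 to 90 percent of ideal two-processor scaling.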


Figure: PENTIUM DUAL-PROCESSOR SYSTEM
Roeland van Krieken is a program manager in the Microprocessor Group at Intel. He was an engineering manager for the 386 and 486 processors. He can be reached on the Internet or BIX at editors@bix.com.
