this strategy in mind while designing the Pentium Pro processor: Its bus has built-in support for a four-processor system.
Implementing a four-way multiprocessing environment isn't easy. For multiple processors to work in concert and share resources effectively, you must resolve many issues (e.g., how they interact during system reset, system initialization, and the OS boot). The Pentium Pro mechanism uses a combination of embedded hardware, processor-resident microcode, and firmware to produce a reliable yet extensible multiprocessor building block.
Bus Organization
To achieve this goal, Intel bused together all four processors' signals (as shown in the figure "A Multiprocessor System Bus"). This design uses two of the five buses: the arbitration bus and the advanced programmable interrupt controller (APIC) bus. (The other three are the control, data, and address buses.) The reset operation makes heavy use of both these buses. We'll show how they assist in establishing the multiprocessing environment.
During reset, some power-on circuitry pulls one of the arbitration lines low. The board's hardware for the arbitration bus implements a rotating bit pattern on these bus lines, which creates a unique configuration for each processor. This configuration defines a processor's ID, which is used for all subsequent bus transactions. During normal (i.e., nonreset) processor operation, the processors use the arbitration lines to control access to the control, data, and address buses.
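The rotating pattern can be pictured with a small model. This is only a sketch: the four-line bus, the pin numbering, the routing rule, and the names `seen_pattern` and `agent_id` are assumptions made for illustration, not Intel's actual pin assignment.

```python
NUM_AGENTS = 4  # the four-processor bus described in the article


def seen_pattern(socket, asserted_line=0):
    """Return the levels a processor in `socket` sees on its own pins.

    Assumption: board line L is routed to pin (L + socket) % NUM_AGENTS,
    so the single line pulled low at reset lands on a different pin in
    every socket. True means "low" (asserted).
    """
    pins = [False] * NUM_AGENTS
    pins[(asserted_line + socket) % NUM_AGENTS] = True
    return tuple(pins)


def agent_id(pattern):
    """Derive a processor ID from the position of the asserted pin."""
    return pattern.index(True)


# Every socket sees a different pattern, so every processor gets a unique ID.
ids = [agent_id(seen_pattern(socket)) for socket in range(NUM_AGENTS)]
```

Because the wiring, not the silicon, differs from socket to socket, identical processors can still end up with distinct IDs.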
The APIC bus supports delivery of targeted or broadcast interrupt messages in a multiprocessor-system environment. During a reset operation, the processors send interprocessor interrupts (IPIs) to each other using the APIC bus. I/O devices or processors can place IPI messages onto this bus to be received by one or more processors. System software sets up the interrupt priorities for these messages, and the OS can use various delivery schemes for them. All APIC devices communicate using a three-wire bus.
This APIC bus differs slightly from the two-processor Pentium design described in the article "Pentium Chip's Dual Personality" (December 1994 BYTE). There, one of the lines served as an APIC enable, another acted as a chip select, and the third handled a clock signal. Here, two of the wires are wired-OR data lines, and the third wire is a common clock signal.
Dueling Processors
All processors must be connected to the APIC bus. The systems designer also provides an APIC clock signal. This bus is required for a hardware reset of the multiprocessor environment, even if it's not used after reset. (Intel strongly recommends that a multiprocessor system use the APIC interrupt scheme.)
A processor first checks that the APIC bus is not busy before initiating a data transfer; it then drives the APIC data lines low during a common clock phase to initiate the transfer. If two or more processors try to initiate an IPI message during the same clock, the processors negotiate by driving their unique arbitration ID (derived from the processor ID) onto the data lines.
The processor with the highest-priority ID wins the arbitration, and the losing processors back off and wait for the APIC bus to fall idle. All devices now increment their arbitration ID. This puts the winner at the end of the priority queue for the next arbitration cycle. This round-robin scheduling algorithm guarantees that one--and only one--device sends IPI messages on the APIC bus at any time. It also ensures that each device has equal access to the bus bandwidth.
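The rotation is easy to see in a toy model. The sketch below is not the APIC's bit-level protocol: the device names, the 0-to-n-1 ID range, and the function `arbitrate` are assumptions. It also generalizes the "increment" step slightly. When the winner holds the top ID, every ID moves up by one and the winner wraps to 0, as described above. When a lower-ranked device wins because higher ones stayed silent, the sketch rotates far enough that the winner still lands at 0.

```python
def arbitrate(arb_ids, requesters):
    """Grant the bus to the requester with the highest arbitration ID,
    then rotate every ID so the winner drops to the lowest priority (0).

    Assumes the IDs are distinct values in the range 0..n-1.
    """
    winner = max(requesters, key=lambda dev: arb_ids[dev])
    n = len(arb_ids)
    shift = n - arb_ids[winner]  # equals 1 when the winner held the top ID
    for dev in arb_ids:
        arb_ids[dev] = (arb_ids[dev] + shift) % n
    return winner


# Four hypothetical devices that all want the bus on every cycle.
arb_ids = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
everyone = set(arb_ids)
order = [arbitrate(arb_ids, everyone) for _ in range(4)]
```

With every device contending, each one wins exactly once in four cycles, which is the equal-access property the scheme is meant to provide.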
Following this arbitration sequence, an APIC device drives more serial bits onto the two data-bus lines, so that all the other devices on the APIC bus receive this IPI message. The APIC bus supports four categories of messages, as determined by the serial bits. Each message also has multiple subtypes to match the needs of various priority schemes. During reset, BOOT IPI messages are used, and WAKEUP and INIT IPI messages may be used.
Once a RESET signal is recognized, all the processors execute identical microcode (as shown in the figure "Which Processor Takes Control?"). Each processor checks its INIT pin. If low (which is recommended), the processor executes a built-in self test (BIST). A processor executing a BIST drives the reset-not-complete pin active, which prevents other processors from moving to the next phase until all the processors have completed BISTs.
The final parts of the reset stage load the processor's CS register (selector 0F000h, with a base address of 0FFFF0000h) and set the EIP register to 0FFF0h. This forces the first code fetch from the RESTART vector at 0FFFFFFF0h, 16 bytes below 4 GB. The systems designer can instead arrange for the Pentium Pro processor to start execution at 0FFFF0h, 16 bytes below 1 MB. Intel provides this 286-compatible alternate scheme so that systems with more than 4 GB of memory need not have a "hole" in the address space to accommodate the RESTART vector. The microcode also clears the bootstrap processor (BSP) register. As its name implies, the BSP register is a machine-specific register that identifies the bootstrap processor.
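The arithmetic behind the two RESTART vectors can be checked in a few lines. The sketch treats the reset values as a code-segment base plus an instruction-pointer offset. The 1-MB base of 0F0000h is an assumption (the usual real-mode reading of a 0F000h selector), and the constant and function names are ours.

```python
FOUR_GB = 1 << 32
ONE_MB = 1 << 20

CS_BASE_4G = 0xFFFF0000   # segment base for the default vector
CS_BASE_1M = 0x000F0000   # assumed base for the alternate, below-1-MB vector
EIP_AT_RESET = 0xFFF0     # instruction pointer loaded by the reset microcode


def first_fetch(cs_base, eip=EIP_AT_RESET):
    """Physical address of the first instruction fetched after reset."""
    return cs_base + eip


default_vector = first_fetch(CS_BASE_4G)    # 0xFFFFFFF0
alternate_vector = first_fetch(CS_BASE_1M)  # 0x000FFFF0
```

Either way, the processor has only 16 bytes before the boundary, which is why the code at the vector is normally just a jump into the firmware proper.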
The next stage of initialization involves selecting a bootstrap processor from the available processors. Every processor is eligible to become the single bootstrap processor; the design does not dictate that the processor with, say, an ID of 0 takes the role. This eliminates a single point of failure, where a system boot sequence stalls because that particular processor fails to operate. The processors continue to execute from microcode and implement a multiprocessor boot protocol.
Each processor broadcasts a BOOT IPI onto the APIC bus--note that the APIC bus serializes these requests--and each processor receives n BOOT IPIs. Each processor checks these incoming APIC IPIs. If the first one received has the same ID as the processor itself, this processor becomes the bootstrap processor.
Simply put, the fastest processor wins this arbitration round, and it sets the BSP register to 1. If the first ID doesn't match, that processor executes a wait loop in microcode. This essentially puts the losing processors to sleep because they don't perform external bus accesses. The bootstrap processor fetches code pointed to by the RESTART vector and starts executing the system firmware. This code is typically the system BIOS.
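The election can be mimicked in software. In this sketch the shuffled list stands in for whatever order the APIC bus happens to serialize the BOOT IPIs; the function name `elect_bsp`, the seeded random source, and the dictionary of flags are illustrative assumptions, since the real protocol runs in microcode.

```python
import random


def elect_bsp(processor_ids, rng):
    """Model the BOOT IPI race among identical processors.

    Every processor sees the same serialized stream of BOOT IPIs and
    compares the sender ID of the first one with its own ID.
    """
    bus_order = list(processor_ids)
    rng.shuffle(bus_order)          # stand-in for bus arbitration timing
    first_seen = bus_order[0]       # identical for every listener
    bsp_flags = {pid: 1 if pid == first_seen else 0 for pid in processor_ids}
    return bus_order, bsp_flags


bus_order, bsp_flags = elect_bsp([0, 1, 2, 3], random.Random(1995))
```

Because all processors observe the same first message, exactly one of them finds a match, whichever processor that turns out to be.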
Design Issues
There might be a hardware reason why a systems designer would want a specific processor to serve as the bootstrap processor, rather than one randomly chosen by the bootstrap algorithm. DOS-compatible hardware, for example, might be connected only to a particular processor. In this case, the current bootstrap processor, if it isn't handling the compatibility signals, sends a WAKEUP IPI to the required processor. It also sends an INIT IPI to itself.
The bootstrap processor enters a wait-loop microcode sequence, effectively putting itself to sleep. The processor that receives the WAKEUP IPI extracts an embedded RESET vector from this IPI message and starts executing firmware code. This new RESET vector lets the awakened processor execute different firmware from the bootstrap processor. The original bootstrap processor clears its BSP flag, and the awakened processor sets its BSP flag to 1. This sequence transfers the responsibility of booting the OS to the newly anointed bootstrap processor.
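The handoff amounts to a small state change on two processors. The sketch below models it with a hypothetical `Cpu` record; the field names, the function `hand_off_bsp`, and the example vector value are assumptions for illustration, and the two IPIs are reduced to plain assignments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cpu:
    cpu_id: int
    bsp_flag: int = 0
    running: bool = False                 # False = parked in the wait loop
    start_vector: Optional[int] = None    # vector carried by a WAKEUP IPI


def hand_off_bsp(current, target, reset_vector):
    """Transfer the bootstrap role from `current` to `target`."""
    # WAKEUP IPI: the target extracts the embedded vector and starts running.
    target.start_vector = reset_vector
    target.running = True
    target.bsp_flag = 1
    # INIT IPI to self: the old bootstrap processor goes back to sleep.
    current.running = False
    current.bsp_flag = 0


# Processor 1 won the election, but the board needs processor 0 to boot.
cpus = [Cpu(0), Cpu(1, bsp_flag=1, running=True), Cpu(2), Cpu(3)]
hand_off_bsp(current=cpus[1], target=cpus[0], reset_vector=0x000F0000)
```

After the call, exactly one processor is running and that processor carries the BSP flag.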
The BIOS typically executes a system self test, and the other processors may be turned on for testing purposes using WAKEUP IPIs. The initiating processor can remain active to perform multiprocessor testing or can turn itself off by sending itself an INIT IPI. Following the successful completion of the power-on self tests (POSTs), the systems programmer should switch off all the processors except one. He or she must take care while switching processors on or off: The last processor left on must have its BSP flag set (indicating that it's the bootstrap processor).
Each processor in a multiprocessor system must be initialized consistently. They must, for example, have a common view of the system memory map that defines which areas are cacheable, noncacheable, I/O, and so forth. Other multiprocessor initialization, such as system management mode and machine check architecture, should be completed at this stage.
The bootstrap processor interrogates the system hardware and builds a table that describes the hardware configuration. This standardized table contains information about each processor, expansion buses, I/O APIC descriptions, I/O interrupt assignments, and local interrupt mappings. The OS may use this resource list to support plug and play. Full details of this table and its parameter passing are described in The Multiprocessor Specification, which is available from Intel's World Wide Web site (http://www.intel.com) or by contacting the Intel Literature Center at (800) 548-4725 and requesting packet #242016-004.
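As a rough mental model of that table, the sketch below groups the five kinds of entries into one record. It is not the binary layout defined in the specification: every class, field name, and sample value here is hypothetical and chosen only to mirror the categories listed above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProcessorEntry:
    apic_id: int
    is_bootstrap: bool


@dataclass
class ConfigTable:
    processors: List[ProcessorEntry] = field(default_factory=list)
    buses: List[str] = field(default_factory=list)
    io_apics: List[int] = field(default_factory=list)
    # (source bus, source IRQ, destination I/O APIC, destination pin)
    io_interrupts: List[Tuple[str, int, int, int]] = field(default_factory=list)
    # (source description, local APIC interrupt input)
    local_interrupts: List[Tuple[str, int]] = field(default_factory=list)


def bootstrap_entries(table):
    """Return the processor entries marked as the bootstrap processor."""
    return [p for p in table.processors if p.is_bootstrap]


# A made-up four-processor board, as the OS might see it after boot.
table = ConfigTable(
    processors=[ProcessorEntry(i, is_bootstrap=(i == 0)) for i in range(4)],
    buses=["ISA", "PCI"],
    io_apics=[8],
    io_interrupts=[("ISA", 1, 8, 1)],
    local_interrupts=[("NMI", 1)],
)
```

An OS walking such a table would expect to find exactly one entry flagged as the bootstrap processor.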