Next generation: PA-8000
DOSSIER:
Hewlett-Packard was an early entrant into the RISC market, launching its first 32-bit PA-RISC processor in 1986. Practically all PA-RISC chips go into HP's 9000 series workstations. HP shipped enough of these machines from 1991 to 1993 (prior to the introduction of PowerPC systems) to make PA-RISCs the biggest-selling RISC chips in dollar volume.
HP set up the Precision RISC Organization (PRO) to promote the use of its chips by other manufacturers. Then, in 1994, HP dropped the bombshell that it was teaming up with Intel to create a new architecture. That announcement casts doubt on the future of the PRO.
The PA-8000 is a 64-bit, four-way superscalar design with a radical out-of-order execution scheme. The chip has 10 function units: two integer ALUs, two integer shift/merge units, two floating-point multiply/accumulate (MAC) units, two floating-point divide/square-root units, and two load/store units. The MAC units have a three-cycle latency and are fully pipelined for single-precision processing to deliver up to four floating-point operations per cycle. The divide units have a 17-cycle latency and are not pipelined, but they can run concurrently with the MACs.
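The four-operations-per-cycle figure follows from simple arithmetic, sketched below in Python. The sketch assumes that each MAC operation counts as two floating-point operations (one multiply and one add) and that both MAC units start a new operation every cycle; the 200-MHz clock is the target speed listed in this dossier. The function names are illustrative only.

```python
# Back-of-the-envelope peak throughput from the figures in the text.
MAC_UNITS = 2            # floating-point multiply/accumulate units
FLOPS_PER_MAC = 2        # assumption: one multiply plus one add per MAC
TARGET_CLOCK_MHZ = 200   # target clock speed from the dossier


def peak_flops_per_cycle(units=MAC_UNITS, flops_per_op=FLOPS_PER_MAC):
    """Floating-point operations completed per cycle with all MACs busy."""
    return units * flops_per_op


def peak_mflops(clock_mhz=TARGET_CLOCK_MHZ):
    """Theoretical peak in millions of operations per second."""
    return peak_flops_per_cycle() * clock_mhz


print(peak_flops_per_cycle())  # 4
print(peak_mflops())           # 800
```

The 800-MFLOPS result is a theoretical ceiling under those assumptions, not a measured or vendor-quoted number.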
The PA-8000 uses a 56-instruction-deep instruction reorder buffer (IRB), which looks ahead to the next 56 instructions in the stream to find four that can execute in parallel. The IRB actually consists of two 28-slot buffers: The ALU buffer holds instructions for the integer unit and FPU, and the memory buffer holds load/store instructions.
Once an instruction has been stored into an IRB slot, the hardware watches all instructions being dispatched to the function units to see whether any of them supplies one of the operands for the instruction in the slot. The instruction in the slot runs only after the last instruction for which it is waiting has been dispatched. Each of the IRB's two buffers can dispatch two instructions per cycle, and in all cases the oldest ready instructions in the buffer are issued first. Since the PA-8000 uses register renaming and retires instructions from the IRB in program order, it maintains a precise exception model.
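The wake-up and dispatch behavior described above can be modeled with a toy simulation. This is a minimal sketch, not HP's implementation: the class and function names are hypothetical, function-unit latencies are ignored (an operand is assumed to be available the cycle after its producer dispatches), registers are assumed to be already renamed, and slots are freed at dispatch rather than at in-order retirement.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

ISSUE_WIDTH = 2    # instructions each buffer can dispatch per cycle (from the text)
BUFFER_SLOTS = 28  # slots in each of the two IRB buffers (from the text)


@dataclass
class Slot:
    age: int                   # program order; lower is older
    name: str
    dest: Optional[str]        # register produced, if any
    waiting_on: Set[str] = field(default_factory=set)  # operands not yet produced


class ReorderBuffer:
    def __init__(self, slots=BUFFER_SLOTS):
        self.capacity = slots
        self.slots: List[Slot] = []

    def insert(self, slot):
        if len(self.slots) >= self.capacity:
            raise RuntimeError("buffer full")
        self.slots.append(slot)

    def dispatch(self):
        """Remove and return up to ISSUE_WIDTH ready instructions, oldest first."""
        ready = sorted((s for s in self.slots if not s.waiting_on),
                       key=lambda s: s.age)
        chosen = ready[:ISSUE_WIDTH]
        for s in chosen:
            self.slots.remove(s)
        return chosen

    def wake_up(self, produced):
        """Clear operands supplied by instructions dispatched this cycle."""
        for s in self.slots:
            s.waiting_on -= produced


def run(alu, mem):
    """Dispatch from both buffers until empty; return instruction names per cycle."""
    log = []
    while alu.slots or mem.slots:
        issued = alu.dispatch() + mem.dispatch()
        if not issued:
            raise RuntimeError("no instruction can make progress")
        produced = {s.dest for s in issued if s.dest}
        alu.wake_up(produced)
        mem.wake_up(produced)
        log.append([s.name for s in issued])
    return log


def demo():
    alu, mem = ReorderBuffer(), ReorderBuffer()
    mem.insert(Slot(0, "ld", "r1"))                  # load r1
    alu.insert(Slot(1, "add_a", "r2", {"r1"}))       # needs the load
    alu.insert(Slot(2, "add_b", "r3"))               # independent
    alu.insert(Slot(3, "sub", "r6", {"r2", "r3"}))   # needs both adds
    mem.insert(Slot(4, "st", None, {"r6"}))          # store r6
    return run(alu, mem)


print(demo())
```

In this example the independent add_b dispatches in the first cycle alongside the load, ahead of the older add_a, which must wait for the load's result.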
HP designed the PA-8000 especially for commercial data processing and complex computational applications, such as genetic engineering, where data sets tend to be too big to fit into any conceivable on-chip cache. Thus, the PA-8000 employs external primary data and instruction caches. The slots in a third 28-slot buffer, called the address-reorder buffer (ARB), are associated one-to-one with the slots of the IRB's memory buffer. The ARB keeps the virtual and physical addresses of all dispatched load/store instructions. In addition, the ARB permits loads and stores to execute out of order while maintaining coherency, and it effectively hides the long latency associated with the addressing of off-chip caches.
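The text does not say how the ARB detects ordering problems. One plausible approach, assumed here purely for illustration, is to compare the address of each executing store against younger loads that have already executed; any match marks a load that read stale data and must be redone. The names below are hypothetical, and only a single physical address per entry is tracked.

```python
from dataclasses import dataclass
from typing import List

ARB_SLOTS = 28  # one ARB slot per memory-buffer slot (from the text)


@dataclass
class ArbEntry:
    age: int        # program order; lower is older
    is_store: bool
    address: int    # physical address recorded when the access executes


class AddressReorderBuffer:
    def __init__(self, slots=ARB_SLOTS):
        self.capacity = slots
        self.executed: List[ArbEntry] = []

    def record(self, entry):
        """Record an executed access; return younger loads it invalidates.

        Assumption: a store conflicts with any already-executed load that is
        younger in program order and touched the same address.
        """
        if len(self.executed) >= self.capacity:
            raise RuntimeError("ARB full")
        conflicts = []
        if entry.is_store:
            conflicts = [e for e in self.executed
                         if not e.is_store
                         and e.age > entry.age
                         and e.address == entry.address]
        self.executed.append(entry)
        return conflicts


arb = AddressReorderBuffer()
arb.record(ArbEntry(age=5, is_store=False, address=0x100))  # load runs early
stale = arb.record(ArbEntry(age=2, is_store=True, address=0x100))
print([e.age for e in stale])  # the load with age 5 must be re-executed
```

A store to a different address, or a store younger than the load, produces no conflict in this model, which is what lets unrelated memory operations proceed out of order.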
OFFICIAL INTRODUCTION DATE:
First Quarter of 1996
CURRENT STATUS:
Sampling
LIKELIHOOD INTRODUCTION DATE WILL BE MET:
Good
TARGET CLOCK SPEED:
200 MHz
ESTIMATED PERFORMANCE:
Greater than 360 SPECint92; greater than 550 SPECfp92
FABRICATION PROCESS/FEATURE SIZE:
CMOS/0.5-micron
TECHNOLOGICAL ADVANTAGES:
HP is the only RISC vendor to leave its primary instruction and data caches off-chip, where they can be made to be several megabytes in size. (RISC cores designed solely for maximum processing speed often perform badly on data sets that are too big to fit in the cache.)
TECHNOLOGICAL DISADVANTAGES:
The off-chip caches run at full CPU speed and so need to be created with ultrafast SRAM, which makes them expensive to build.
PRIMARY MARKET:
Commercial data processing; engineering and scientific workstations.
WHERE TO FIND:
Hewlett-Packard
Cupertino, CA
(408) 447-4747
fax (408) 447-7983
Hewlett-Packard PA-8000

Off-chip instruction and data caches make HP's design unique.