Next generation: R10000
DOSSIER:
The R10000 inherits its superscalar design from the R8000, which was created for the scientific supercomputing market. But the R10000 is intended for more mainstream applications. The R10000's adoption of dynamic instruction scheduling, which reduces the need to recompile software that's written for older-generation processors, is of particular benefit to Mips partner Silicon Graphics, which has a back-catalog of large and complex graphics applications.
The R10000 is the first single-chip superscalar processor from Mips. It features dynamic branch prediction to minimize pipeline stalls, with up to four levels of speculative execution, using register renaming to ensure that no results are committed to the real registers until the branch is resolved. The chip maintains a "shadow map" of its rename register mappings. In the event of a mispredicted branch, it just restores this map rather than having to clear registers and flush buffers, which reduces the penalty to one cycle.
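The shadow-map idea can be illustrated with a toy model. The sketch below is not the R10000's actual logic: the class name, the register counts, and the simplification that branches resolve oldest-first are all assumptions made for illustration. It only shows why restoring a saved copy of the rename map is enough to discard speculative results.

```python
class RenameMap:
    """Toy logical-to-physical register map with branch checkpoints.

    Hypothetical model: sizes and method names are illustrative only.
    """

    def __init__(self, num_logical=4, num_physical=8, max_branches=4):
        # Initially each logical register maps to the same-numbered physical one.
        self.mapping = {r: r for r in range(num_logical)}
        self.free = list(range(num_logical, num_physical))
        self.shadows = []                 # one saved copy per unresolved branch
        self.max_branches = max_branches  # four levels of speculation, per the text

    def rename_dest(self, logical):
        """Give a destination register a fresh physical register."""
        phys = self.free.pop(0)
        self.mapping[logical] = phys
        return phys

    def predict_branch(self):
        """Save a shadow copy of the map before speculating past a branch."""
        if len(self.shadows) >= self.max_branches:
            raise RuntimeError("too many unresolved branches")
        self.shadows.append((dict(self.mapping), list(self.free)))

    def resolve_oldest_branch(self, mispredicted):
        """Drop the oldest checkpoint, restoring it if the guess was wrong."""
        mapping, free = self.shadows.pop(0)
        if mispredicted:
            # Restoring the map discards every speculative rename at once;
            # checkpoints taken after this branch are now meaningless.
            self.mapping, self.free = mapping, free
            self.shadows.clear()


rm = RenameMap()
rm.rename_dest(1)            # non-speculative write to r1
rm.predict_branch()          # checkpoint taken here
rm.rename_dest(2)            # speculative write to r2
rm.resolve_oldest_branch(mispredicted=True)
print(rm.mapping)
```

After the misprediction, r2 points at its old physical register again while the earlier, non-speculative rename of r1 survives. No register contents had to be cleared.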
The R10000 also features a radical out-of-order execution scheme. Instructions remain in program order through the first three pipeline stages, but after that they divert into one of three queues (which wait on the integer ALUs, the FPUs, or the load/store unit). Each queue is served in whatever order its resources become free.
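A rough way to picture the three queues is the toy simulation below. It is a sketch under loud assumptions: each queue issues at most one instruction per cycle, an instruction is described only by the cycle at which its operands become ready, and the function and queue names are invented for this example. The real chip's issue rules are more involved.

```python
def issue_order(program):
    """Return (cycle, name) pairs in the order instructions issue.

    program: list of (name, queue, ready_cycle) in program order, where
    queue is "integer", "float", or "address" (hypothetical labels).
    """
    queues = {"integer": [], "float": [], "address": []}
    for name, kind, ready in program:
        queues[kind].append((name, ready))

    issued = []
    cycle = 0
    while any(queues.values()):
        cycle += 1
        for kind in ("integer", "float", "address"):
            # Pick the oldest entry whose operands are ready; younger
            # instructions may pass an older one that is still waiting.
            for entry in queues[kind]:
                if entry[1] <= cycle:
                    queues[kind].remove(entry)
                    issued.append((cycle, entry[0]))
                    break
    return issued


program = [
    ("i0", "integer", 3),   # oldest, but waits on a late operand
    ("i1", "integer", 1),
    ("f0", "float", 1),
    ("l0", "address", 2),
]
order = issue_order(program)
print(order)
```

Here i1 issues two cycles ahead of the older i0, which is the kind of reordering the queues make possible.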
Program order is eventually restored by always "graduating" (which is Mips jargon for retiring) the oldest instruction first. This also ensures precise exception reporting. This hardware-assisted instruction reordering offers a great advantage to end users, because code that's written for Mips's older, scalar CPUs (e.g., the R4000) will gain almost the entire speed benefit without needing to be recompiled.
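Oldest-first graduation can be sketched in a few lines. The function below is an illustrative model only: its name and inputs are assumptions, and the default width of four reflects the retire limit this dossier gives for the chip. Given the cycle in which each instruction finishes executing, it reports the cycle in which each one graduates.

```python
def graduate_in_order(finish_cycle, width=4):
    """Return the graduation cycle of each instruction, in program order.

    finish_cycle[i] is the cycle in which instruction i completes execution.
    At most `width` instructions graduate per cycle, oldest first.
    """
    graduated = []
    head = 0          # index of the oldest instruction not yet graduated
    cycle = 0
    while head < len(finish_cycle):
        cycle += 1
        retired = 0
        # Younger instructions that finished early must wait behind the head.
        while (head < len(finish_cycle)
               and retired < width
               and finish_cycle[head] <= cycle):
            graduated.append(cycle)
            head += 1
            retired += 1
    return graduated


result = graduate_in_order([3, 1, 2, 1, 1, 1])
print(result)
```

In this run the first instruction finishes last, so nothing graduates until cycle 3. Four instructions then graduate together and the remaining two follow a cycle later. Because results become architecturally visible strictly in program order, an exception can always be attributed to one precise instruction.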
Although it has the potential to issue five instructions to its execution units per cycle, the Mips R10000 can fetch and retire only four, so a fifth cannot complete on the same cycle. Nevertheless, this excess of dispatch bandwidth offers more flexible opportunities for instruction scheduling, which means that the R10000 can complete four instructions per cycle for more of the time.
One of the twin double-precision FPUs handles addition; the other processes multiply/divide and square-root functions. The latter is further split into two subunits that can execute a division operation and a square-root operation in parallel. An on-chip processor-bus interface permits up to four R10000s to be clustered in a multiprocessor configuration without requiring any glue logic.
OFFICIAL INTRODUCTION DATE:
Fourth Quarter of 1995
CURRENT STATUS:
Sampling
LIKELIHOOD INTRODUCTION DATE WILL BE MET:
Good
TARGET CLOCK SPEED:
200 MHz
ESTIMATED PERFORMANCE:
300 SPECint92; 600 SPECfp92
FABRICATION PROCESS/FEATURE SIZE:
CMOS/0.35-micron
TECHNOLOGICAL ADVANTAGES:
The 64-bit R10000 has five functional pipelines, so it can potentially execute up to five instructions in any given cycle. With two double-precision FPUs, the R10000 is optimized for high floating-point performance.
TECHNOLOGICAL DISADVANTAGES:
For best performance, the off-chip secondary cache needs to be built from expensive SRAM.
PRIMARY MARKET:
Graphics workstations, symmetric multiprocessing (SMP) servers, and supercomputers.
WHERE TO FIND:
Mips
Mountain View, CA
(415) 933-6477
fax (800) 446-6477
http://www.mips.com
Mips R10000
Twin FPUs in the R10000 processor mean exceptional SPECfp92 speeds.