Digital's trio of processors offers different design possibilities
Bruce Faust
In a world where speed is king, not all RISC PCs are created equal. Currently, there is a marketing battle over which of the industry titans' RISC PCs are the fastest. However, there's one unmistakable truth concerning RISC PCs: If you have ever used one and run a "native" Windows NT application (an application compiled for the RISC processor, not something running in an x86 emulator), you'll never want to go back to an x86-based system.
Consider, for example, Digital Equipment's Alpha AXP family of microprocessors. Digital's Semiconductor Operation (Hudson, MA) has developed CPUs for many years, and the Alpha descends from the MicroVAX family of CPUs. However, the Alpha is unique among RISC processors. It was designed from the beginning as a 64-bit processor, whereas other 64-bit RISC processors evolved from 32-bit implementations. It has 64-bit address and data lines, pipelined both in and out of the processor. Furthermore, the Alpha is not only superpipelined but also superscalar. A superscalar CPU issues more than one instruction per clock tick. Digital's newest Alpha design, the 21164, issues four instructions per clock tick. This four-way superscalar approach, coupled with a 300-MHz clock speed, yields a mind-bending peak of 1.2 billion instructions per second.
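The 1.2-billion figure is simply the issue width multiplied by the clock rate. The following is a minimal sketch of that arithmetic, assuming every issue slot is filled on every cycle, which real programs rarely sustain. The function name `peak_instructions_per_second` is hypothetical, chosen for illustration.

```python
def peak_instructions_per_second(issue_width: int, clock_mhz: int) -> int:
    """Theoretical peak: instructions issued per cycle times cycles per second.

    Assumes every issue slot is filled on every clock tick; dependencies,
    branches, and cache misses keep real code well below this ceiling.
    """
    return issue_width * clock_mhz * 1_000_000


# 21164: four instructions per clock tick at 300 MHz
peak = peak_instructions_per_second(issue_width=4, clock_mhz=300)
print(f"{peak / 1e9:.1f} billion instructions per second")  # 1.2 billion
```

The same function makes clear why the figure is a ceiling: halving the average number of instructions actually issued per tick halves the delivered rate, whatever the clock speed.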
There are essentially three types of Alpha CPUs: the 21066, the 21064, and the 21164, with the first two each available in two varieties. The table on page 240 compares the Alpha family of processors. As the table shows, the processors share a quite similar architecture. However, they differ in internal cache sizes, clock speeds, and the external glue logic they require. With that in mind, let's start with the first member of the Alpha family.
The 21066 (PCA, PC Alpha)
The 21066's strength lies in its ease of integration into a PC system, hence the moniker PC Alpha. That's because the 21066 generates all the cache, DRAM, and PCI (Peripheral Component Interconnect) control signals on-chip. Its PCI bus interface is 32 bits wide and offers transfer rates of up to 132 MBps. Put another way, the designer doesn't have to design external glue logic for the cache, main memory, or a PCI interface. A computer architect can lay out the motherboard and then attach the multiplexed address and data lines for the write-back cache and main memory. Digital also added an on-board PLL (phase-locked loop) that further simplifies the PCI implementation. You supply the 21066 with an external 33-MHz clock signal, and the PLL multiplies it internally to give the processor a clock speed of either 166 MHz or 233 MHz. Meanwhile, the external hardware, such as the PCI bus, cache, and memory, continues to operate at 33 MHz, which simplifies the design and lowers component costs.
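The PLL arrangement amounts to a fixed ratio between the external bus clock and the core clock. The sketch below is a simplified model, not Digital's actual circuit. It assumes integer multipliers of 5 and 7 (an illustrative assumption) and uses PCI's nominal clock of 100/3 MHz, the value usually quoted as "33 MHz". Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Nominal PCI clock of 100/3 MHz, usually quoted as "33 MHz".
BUS_CLOCK_MHZ = Fraction(100, 3)


def core_clock_mhz(multiplier: int, bus_clock_mhz: Fraction = BUS_CLOCK_MHZ) -> Fraction:
    """Core clock produced by a PLL that multiplies the external bus clock.

    The integer multipliers used below are illustrative assumptions.
    """
    return multiplier * bus_clock_mhz


for m in (5, 7):
    # The core runs m times faster; bus, cache, and DRAM stay at the bus clock.
    print(f"x{m}: core {float(core_clock_mhz(m)):.1f} MHz, "
          f"bus {float(BUS_CLOCK_MHZ):.1f} MHz")
```

Under these assumptions the two multipliers land on roughly 166.7 and 233.3 MHz, while every off-chip component sees only the 33-MHz bus clock.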
In some tests, the 21066 can outperform the faster 21064 family of CPUs in PCI I/O, simply because the 21066's integrated PCI interface is more efficient. The cache and DRAM bus is 64 bits wide, giving the processor up to 264 MBps of bandwidth. However, because the cache and DRAM interface is time-multiplexed, the 21066 takes a performance hit relative to the 21064 and 21164 processors on memory accesses.
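Both peak figures follow from bus width multiplied by clock rate. This sketch assumes one transfer per 33-MHz clock with no wait states or multiplexing overhead. It therefore gives ceilings rather than sustained rates, and the time-multiplexed cache/DRAM bus in particular will deliver less. The helper name is hypothetical.

```python
def peak_bandwidth_mbps(bus_width_bits: int, clock_mhz: int) -> int:
    """Peak transfer rate in MBps, assuming one transfer per clock and no wait states."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


print(peak_bandwidth_mbps(32, 33))  # PCI bus: 132 MBps
print(peak_bandwidth_mbps(64, 33))  # cache/DRAM bus: 264 MBps
```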
While the 21066's integer performance is bested by Intel's Pentium (94 SPECint92 at 233 MHz versus 112 SPECint92 for a 100-MHz Pentium), the Alpha's floating-point performance is quite impressive (110 SPECfp92 versus the Pentium's 82 SPECfp92). Floating-point computation is extremely important for rendering, animation, CAD, and other scientific applications. The 21066's strengths make it a good fit for low-cost 64-bit RISC systems. If you want good floating-point performance as well as good I/O performance in a low-cost workstation, a 21066-based workstation is for you. Users of 21066-based systems enjoy about 25 percent better floating-point performance than Pentium 100 users. Base prices for 21066-based machines are under $4000.
The 21064
The 21064 was the first Alpha processor to arrive on the market, originally running at 150 MHz. Now the chip ticks along at 275 MHz. However, the 21064 requires external glue logic to interface the cache, DRAM, and PCI. The 21064 uses separate (nonmultiplexed) address and data lines, so its memory accesses are more efficient than the 21066's. This bus arrangement also allows such enhancements as doubling the data path from 64 bits to 128 bits, an effective way to minimize wait states and maximize cache efficiency. However, designing a cache that minimizes wait states between the CPU and cache memory is somewhat difficult, because a 21064 running at 275 MHz has a cycle time of under 4 ns (nanoseconds), about 3.6 ns. As a result, even the currently available 15-ns, 1-Mb static RAMs yield four wait states per memory access at best.
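The wait-state arithmetic can be checked with a simplified model. A memory access occupies as many whole CPU cycles as the SRAM needs to respond, and every cycle beyond the first counts as a wait state. This model is an assumption made for illustration. It ignores address setup, board delays, and bus turnaround, all of which add cycles in a real design. Integer math keeps exact cycle boundaries exact.

```python
def cycles_for_access(access_ns: int, clock_mhz: int) -> int:
    """Whole CPU cycles needed to cover an SRAM access time (ceiling division).

    Cycle time in ns is 1000 / clock_mhz, so cycles = access_ns * clock_mhz / 1000,
    rounded up.
    """
    return -(-(access_ns * clock_mhz) // 1000)


def wait_states(access_ns: int, clock_mhz: int) -> int:
    """Cycles beyond the first that the CPU must stall (simplified model)."""
    return max(cycles_for_access(access_ns, clock_mhz) - 1, 0)


cycle_ns = 1000 / 275
print(f"cycle time at 275 MHz: {cycle_ns:.2f} ns")       # 3.64 ns
print("wait states, 15-ns SRAM:", wait_states(15, 275))  # 4
```

A 15-ns part spans 4.125 cycles at 275 MHz. That rounds up to five cycles, and the model therefore arrives at the four wait states cited above.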
Cache techniques such as two-way set associativity and synchronous static RAMs greatly improve cache performance. For sequential data applications, it is sometimes better to use a smaller yet faster cache, such as a single 512-KB cache built from 10-ns parts. In applications where data is accessed randomly, a larger yet slower cache offers better performance.
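Two-way set associativity lets each set hold two lines. As a result, two addresses that map to the same set no longer evict each other on every access. The toy simulator below uses LRU replacement to illustrate the effect. Its line size, set counts, and access pattern are hypothetical parameters chosen for clarity, not Alpha cache parameters. Both caches hold the same 4 KB; only the organization differs.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model: counts hits and misses with LRU replacement per set."""

    def __init__(self, num_sets: int, ways: int, line_bytes: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


# Two addresses 4 KB apart map to the same set in both caches below.
pattern = [0x0000, 0x1000] * 8

direct = SetAssociativeCache(num_sets=128, ways=1, line_bytes=32)  # 4 KB
two_way = SetAssociativeCache(num_sets=64, ways=2, line_bytes=32)  # 4 KB
for addr in pattern:
    direct.access(addr)
    two_way.access(addr)

print("direct-mapped hits:", direct.hits)  # 0: the two lines keep evicting each other
print("two-way hits:", two_way.hits)       # 14: both lines fit in one set
```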
Newer 21064-based designs that offer cache SIMMs are on the way. These cache SIMMs are densely populated and can use fast 10-ns, 256-KB or 1-MB parts. Cache built from these modules can be expanded from 2 MB up to 8 MB, giving a 21064-based system the best of both worlds: fast 10-ns access time for sequential applications and a deep cache for random-access applications. However, this makes a 21064 design more complex than a 21066-based machine.
Although the 21064 might be more of a design challenge for engineers, users who want maximum computing power love this class of machine. Running native Windows NT applications, a 275-MHz 21064 machine is about twice as fast as a Pentium 100 system, and its floating-point performance is roughly four times that of the Pentium. Emulated 16-bit x86 applications on the Alpha, however, run at about the speed of a 50-MHz 486DX2. So, if you run many 16-bit applications, you might want a Pentium system instead.
The 21164
The 21164 is the newest in a series of Alpha CPUs from Digital, and this one really screams. At 300 MHz, it posts 330 SPECint92 and 500 SPECfp92. The keys to this blazing performance are an on-chip level 2 cache and the ability to issue four instructions per clock cycle. Because the level 2 cache runs at the speed of the microprocessor, it offers zero wait states. The exception, of course, is when the next set of data is in neither the level 1 nor the level 2 cache and must be fetched from an off-chip cache or from main memory. With cycle times now under 4 ns, and with cache module SIMMs rated at 10 ns, the 21164 will probably incur at least four wait states on such accesses. However, the support silicon that glues this chip to a level 3 cache, DRAM, and a PCI interface is not yet available. Such chip sets are expected to be released later this summer. Also, the planned PCI interface is expected to be expanded to 64 bits, adding to the complexity of the ASIC design of the glue interface. Early systems based on this chip will be expensive: they will have complex designs and will require costly high-speed parts to keep the 21164 running at full speed. The 21164 alone has a price tag of $3000, more than the price of some complete PC systems.
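You can estimate how much the off-chip levels matter with the standard average-memory-access-time recurrence. Each level's cost is its hit time plus its miss rate multiplied by the cost of the next level down. The hit times and miss rates below are hypothetical placeholders, not measured 21164 figures. Only the structure comes from the article: on-chip level 1 and level 2 caches, then an off-chip level 3 cache and DRAM.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str
    hit_cycles: float  # cycles to return data on a hit at this level
    miss_rate: float   # fraction of accesses reaching this level that miss


def average_access_cycles(levels: list[Level], memory_cycles: float) -> float:
    """AMAT = hit time + miss rate * (AMAT of the next level), evaluated bottom-up."""
    cost = memory_cycles
    for level in reversed(levels):
        cost = level.hit_cycles + level.miss_rate * cost
    return cost


# Hypothetical numbers for illustration only.
hierarchy = [
    Level("L1 (on-chip)", hit_cycles=1, miss_rate=0.10),
    Level("L2 (on-chip)", hit_cycles=2, miss_rate=0.20),
    Level("L3 (off-chip SRAM)", hit_cycles=5, miss_rate=0.25),
]
print(f"{average_access_cycles(hierarchy, memory_cycles=40):.2f} cycles")
```

Changing the level 3 hit time in this model shows how strongly good level 1 and level 2 hit rates dampen the cost of slow off-chip parts. That effect is why an on-chip level 2 cache pays off even when the off-chip cache needs several wait states.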
Market Outlook
Despite the Alpha's lead in the clock and performance race, Digital clearly has a number of obstacles to overcome to make the processor pervasive. First and foremost, more software vendors need to embrace the Alpha. To date, applications from over 1500 vendors, including Word and Excel, have been ported to the processor. However, it's the ports of such software as Microstation, Pro Engineer, and NewTek's Lightwave 3D that have fueled a boom in the Alpha-based workstation market. These applications are heavily floating-point intensive, and their native versions run circles around the Pentium and even other competing RISC architectures.
Another obstacle is support silicon. Glue logic chip sets are crucial for system designers to develop hardware that can harness the Alpha technology. Without them, I doubt many designers would be interested in building motherboards from programmable array logic. However, DeskStation (Lenexa, KS) has recently developed one chip set for the 21064 and another for the 21164. These should be available early this summer. Other vendors should follow suit.
Finally, pricing for Alpha chip technology must entice users to switch from Intel or its clones to Alpha. Time will tell if Digital has made the right gamble.
ALPHA FAMILY FEATURES
Each processor's feature set targets a specific price/performance market.

                                   21066         21066A        21064         21064A        21164
Total on-chip cache                16 KB         32 KB         16 KB         32 KB         112 KB
On-chip secondary cache*           n/a           n/a           n/a           n/a           96 KB
Die technology (microns)           0.68          0.50          0.75          0.50          0.50 to 0.40
Clock speed (MHz)                  166           233           150/166/200   233/275       266/300/300+
Transistor count (millions)        2.2           2.4           1.6           1.8           9.6
External data bus (bits)           64            64            128           128           256
External level 2 cache             256 KB-1 MB   256 KB-1 MB   128 KB-16 MB  128 KB-16 MB  n/a
External level 3 cache             n/a           n/a           n/a           n/a           2-64 MB
Built-in DRAM interface            Yes           Yes           No            No            No
Built-in cache RAM interface       Yes           Yes           No            No            No
Built-in PCI interface (32 bit)    Yes           Yes           No            No            No
External chip set required?        No            No            Yes           Yes           Yes
PCI interface width (bits)         32            32            32            32            64
Pin grid array (pins)              287           287           431           431           499
Power dissipation (W)              22.5          27            27            32            43.5

* Unified data and instruction cache.
n/a = not applicable
Bruce Faust holds a graduate degree and is founder of Carrera Computers and NekoTech. Both NekoTech and Carrera manufacture RISC PCs based on Alpha technology.