The G3, as disclosed, differs somewhat from the PowerPC road map released by IBM and Motorola late last year. In that plan, the G3 would use a 0.35-micron process technology and operate at an initial clock speed of 200 MHz, and certain variations of the design would have nearly 30 million transistors. The announced G3 is a 32-bit processor that leapfrogs the plan and starts at 250 MHz. It uses a 0.25-micron five-metal-layer static CMOS process. This process allows the G3 to pack 6.35 million transistors (the 604e has 5.6 million) onto a 67-mm² die, making it smaller than the 603e's 81-mm² die. This is certainly not one of the 30-million-transistor behemoths alluded to by the AIM alliance, but the specifications are still impressive.
In designing the G3, the Somerset engineers first did extensive code traces of PowerPC applications. Monitoring the streams of instructions and data this way allowed them to identify bottlenecks in the second-generation PowerPC designs that could be eliminated from the G3. As a result of this approach, the G3 benefits from the best technologies found in the existing 6xx product line. The G3 is an amalgam of the enhanced versions of the 603e's power management, the 604e's dynamic branch prediction logic, and the 620's integrated L2 controller and dedicated cache interface.
As shown in the figure, the G3's RISC core with five execution units resembles the 603e's. Both designs have a floating-point unit, a branch unit, and a load/store unit. Where they differ is that the G3 has two single-cycle integer units, while the 603e has only one integer unit and a system unit that was counted as an execution unit. The G3's newly designed load/store unit can process loads and stores to the cache in one clock cycle. The FPU has a three-stage pipeline to boost math computations. As mentioned, the G3's branch unit has been beefed up with the 604e's dynamic prediction logic. It can process one branch instruction per cycle, with one speculative stream in execution and an additional speculative stream in fetch.
The G3 uses a four-stage pipeline that consists of fetch, decode-dispatch, execute, and complete-writeback stages. The fetch unit retrieves four instructions per clock. When an instruction gets loaded into the cache, a predecode operation creates a completely new 36-bit opcode. This data assists the processor's dispatch logic in issuing instructions to the proper execution units. The G3 performs only a two-instruction dispatch (like the 603e) because using the 604e's four-instruction dispatch mechanism while maintaining a high clock speed would require a complete redesign. The G3 can, however, sustain a peak execution rate of three instructions per clock.
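The interplay between the four-instruction fetch and the two-instruction dispatch can be illustrated with a toy queue model. This is only a rough sketch, not a description of the actual hardware: the fetch and dispatch widths come from the text, but the six-entry queue depth, the dispatch-before-fetch ordering, and the function name `simulate` are assumptions made purely for illustration.

```python
from collections import deque

FETCH_WIDTH = 4      # instructions fetched per clock (from the article)
DISPATCH_WIDTH = 2   # instructions dispatched per clock (from the article)
QUEUE_DEPTH = 6      # ASSUMPTION: illustrative instruction-queue depth

def simulate(total_instructions, cycles):
    """Return the cumulative number of instructions dispatched after each cycle."""
    queue = deque()
    next_instr = 0       # index of the next instruction to fetch
    dispatched = 0
    history = []
    for _ in range(cycles):
        # Dispatch: at most DISPATCH_WIDTH instructions leave the queue per clock.
        for _ in range(min(DISPATCH_WIDTH, len(queue))):
            queue.popleft()
            dispatched += 1
        # Fetch: fill free queue slots, at most FETCH_WIDTH per clock.
        room = QUEUE_DEPTH - len(queue)
        to_fetch = min(FETCH_WIDTH, room, total_instructions - next_instr)
        for _ in range(to_fetch):
            queue.append(next_instr)
            next_instr += 1
        history.append(dispatched)
    return history

if __name__ == "__main__":
    print(simulate(20, 6))
```

In this model the wider fetch simply keeps the queue full, and steady-state throughput is capped by the dispatch width. The model makes no attempt to capture the three-per-clock peak execution rate quoted above.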
Cache Considerations
The G3 has the same amount of on-chip cache as the 604e: two 32-KB caches (one for instructions, one for data), each supported by its own memory management unit. The caches are eight-way set associative, using 128 sets. The cache size and set count were fixed by the requirement that the G3 be pin-compatible with the 603e and the 604e; one variant of this first G3 uses the same 255-pin ball grid array (BGA) connector. A high-performance variant provides on-chip support for an L2 cache. The built-in L2 cache controller has 4 KB of tag entries that can be configured to manage a two-way set-associative L2 cache set for sizes of 256 KB, 512 KB, or 1 MB. The L2 interface supports direct connections to several types of SRAM. A divider circuit runs the L2 cache memory at ratios of 1:1 to 1:3 in half-clock increments. Of course, the L2 interface requires extra signal pins, and so this G3 variant uses a 360-pin BGA. Both the on-chip and the L2 cache logic support copyback and write-through modes with bus snooping.
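The cache figures above imply a line size and an address layout that the text does not spell out. The sketch below works them out from the stated 32 KB, eight ways, and 128 sets, and lists the L2 clock speeds the divider ratios would yield. The 32-bit address width and all function names are assumptions for illustration; the derived line size follows from the arithmetic, not from a data sheet.

```python
def cache_geometry(size_bytes, ways, sets, address_bits=32):
    """Derive line size and address-field widths for a set-associative cache.

    ASSUMPTION: addresses are 32 bits wide and all sizes are powers of two.
    """
    line_bytes = size_bytes // (ways * sets)
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return line_bytes, offset_bits, index_bits, tag_bits

def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def l2_clock_options(core_mhz):
    """L2 clock for each core:L2 ratio from 1:1 to 1:3 in half-clock steps."""
    ratios = [1.0, 1.5, 2.0, 2.5, 3.0]
    return {ratio: core_mhz / ratio for ratio in ratios}

if __name__ == "__main__":
    line, off, idx, tag = cache_geometry(32 * 1024, ways=8, sets=128)
    print(line, off, idx, tag)                 # line size and field widths
    print(split_address(0x12345678, off, idx)) # example address breakdown
    print(l2_clock_options(250))               # L2 speeds for a 250-MHz core
```

With these inputs, each of the 128 sets holds eight 32-byte lines, and a half-speed L2 on a 250-MHz part would run at 125 MHz.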
Power and Performance
The G3's purpose is to provide high-performance computing power for both mobile applications and desktop systems. The direct L2 cache interface helps meet the performance goals of desktops. For mobile systems, the G3 relies on a proven set of power conservation features lifted from the 603e. The G3 uses a 2.5-V core to reduce power consumption, while the bus interface still operates at 3.3 V for compatibility with existing designs. At 250 MHz, the G3 dissipates 5 W when running at full bore. This is slightly more than a 166-MHz 603e, which dissipates 3 W at peak speed.
The G3 provides a variety of clock multipliers, starting at 2:1 and climbing to 8:1, with half-clock frequency multiples. This allows a desktop design to use, say, a lower-cost 50-MHz system bus while the processor races at 250 MHz. Conversely, portable designs can employ a different multiplier so that the G3 runs at 250 MHz while using a low-power, slower system bus. The dynamic power management hardware monitors the instruction stream and selectively disables the clock to an execution unit that falls idle. A thermal assist unit has an on-die thermometer that allows an OS to monitor the processor's temperature and take action before it overheats. The OS can either switch the G3 into one of the low-power doze, nap, or sleep modes or throttle the instruction cache so that the processor effectively slows down. Once the chip's temperature falls to a preset level, the OS has the G3 resume normal operation.
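Two ideas in this paragraph lend themselves to a short sketch: the range of bus speeds the multipliers allow, and the throttle-then-resume behavior an OS might build on the thermal assist unit. The multiplier range comes from the text; the temperature thresholds, the two-state policy, and every function name are hypothetical, since the article gives no trip points and does not describe the OS interface.

```python
def multipliers(low=2.0, high=8.0, step=0.5):
    """Clock multipliers from 2:1 to 8:1 in half steps (range from the article)."""
    count = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(count)]

def bus_options(core_mhz):
    """Bus frequency implied by each multiplier for a given core clock."""
    return {m: core_mhz / m for m in multipliers()}

def thermal_policy(temps_c, trip_c, resume_c):
    """Return 'throttle' or 'normal' for each temperature reading.

    ASSUMPTION: a simple hysteresis policy. Throttling starts at trip_c and
    normal operation resumes only once the reading falls to resume_c.
    """
    throttled = False
    states = []
    for temp in temps_c:
        if not throttled and temp >= trip_c:
            throttled = True
        elif throttled and temp <= resume_c:
            throttled = False
        states.append("throttle" if throttled else "normal")
    return states

if __name__ == "__main__":
    print(bus_options(250)[5.0])   # bus speed at a 5:1 multiplier
    # Hypothetical readings and thresholds, in degrees Celsius.
    print(thermal_policy([60, 85, 80, 70, 65], trip_c=85, resume_c=70))
```

Using separate trip and resume thresholds keeps the processor from flipping between states when the temperature hovers near a single limit, which matches the article's description of resuming only after the chip cools to a preset level.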
Motorola and IBM peg the G3's performance -- when running at 250 MHz and using a half-speed 1-MB L2 cache and a 50-MHz system bus -- at an estimated 10 SPECint95. The G3 thus delivers roughly the same throughput as a 604e but consumes substantially less power (a 166-MHz 604e dissipates 10 W). Therefore, the G3 straddles the capabilities of 603e- and 604e-based designs. The 603e will continue to be used in low-end, low-cost designs. Because the G3's bus implements only the MEI (modified, exclusive, invalid) protocol, like the 603e, it can be used only in single- or dual-processor designs. It will handle high-performance mobile systems or desktop systems. Ultra-high-performance multiprocessor systems will continue to use the 604e. The G3 has been sampling for several months, and because it's pin-compatible with both the 603e and the 604e, you can expect to see it in shipping PowerPC-based computers this summer.
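The power figures quoted in this section can be put on a common footing by dividing peak dissipation by clock speed. The sketch below does that arithmetic with the numbers given in the text. The metric itself (milliwatts per megahertz) is a convenience chosen here, not one the vendors quote, and it says nothing about the work done per clock.

```python
# Peak power and clock speed as quoted in the article.
CHIPS = {
    "G3":   {"mhz": 250, "watts": 5.0},
    "603e": {"mhz": 166, "watts": 3.0},
    "604e": {"mhz": 166, "watts": 10.0},
}

def milliwatts_per_mhz(mhz, watts):
    """Peak dissipation normalized by clock speed."""
    return watts * 1000.0 / mhz

def power_ratio(name_a, name_b):
    """How many times more peak power chip A dissipates than chip B."""
    return CHIPS[name_a]["watts"] / CHIPS[name_b]["watts"]

if __name__ == "__main__":
    for name, spec in CHIPS.items():
        print(name, round(milliwatts_per_mhz(spec["mhz"], spec["watts"]), 1))
    print(power_ratio("604e", "G3"))
```

On these figures the G3 lands close to the 603e per megahertz and at half the 604e's peak power, which is consistent with the article's claim that it straddles the two designs.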
Illustration: The G3 uses enhanced technologies of existing PowerPC processors to deliver high performance while consuming little power.
Photo: The G3 packs more transistors than a PowerPC 604e on a die that's smaller than a PowerPC 603e.
Tom Thompson is a BYTE senior technical editor at large. You can contact him via e-mail at tom_thompson@bix.com.