r's throughput is ideally suited for the computing demands of today's 32-bit computer peripherals.
The PowerPC's foray into the embedded market isn't new. IBM introduced the PowerPC 403, a microcontroller version of the processor, in April 1994. One of the latest variants of this design, the 403GB, features a PowerPC core that dispatches two instructions at a time. However, its very sophistication, combined with the physical footprint required to support separate address and data buses, makes the 403 a poor fit for cost-sensitive consumer electronics or hand-held devices.
Enter the PowerPC 401GF. While it retains many high-performance features of its predecessor, it does so with a slimmed-down RISC core. A multiplexed bus interface reduces the chip's footprint; in addition, at 2.5 V, it consumes only 40 milliwatts at 25 MHz. This makes the 401 ideal for custom embedded-processor applications in the low-cost consumer-electronics market.
The Small yet Smart Core
The 401GF gets both its smarts and its processing brawn from a 32-bit RISC processor core that's code-compliant with the PowerPC 60x family. Called the 401 core, it sports a surprising number of 60x processor features, as shown in the figure "The PowerPC 401 RISC Core."
It has 32 32-bit general-purpose registers (GPRs), several special-purpose registers (SPRs), an ALU, multiply and divide hardware for 32-bit integers, a barrel shifter, and control logic that supervises data flow and code execution in the core. The 401GF also has static-branch-prediction logic, similar to that used by the PowerPC 601 and 603, to improve code performance.
The 401 core has a three-stage pipeline (instruction fetch, instruction decode, and instruction execute) that boosts code throughput. These pipeline stages also expose some of the decoded instructions for use by other function units, such as the 401GF's memory management unit (MMU).
But certain design compromises were made to reduce the 401 core's transistor count, thereby reducing its power consumption. For example, the 401 core does not have an FPU.
Moreover, while the 60x family has multiple execution units that can operate on two or more instructions concurrently, the 401 core executes a single instruction at a time. While this trade-off constrains the 401 core's performance, it also eliminates the large number of transistors required for buffers, reservation stations, and other logic needed to support the concurrent operation of the execution units.
The end result is a RISC core that comprises only 85,000 transistors and is physically very small. Using 0.5-micron, triple-level-metal CMOS fabrication technology, the 401 core occupies only 4.5 square millimeters on a die.
IBM also offers custom solutions, in which the 401 core and optional function units can be fabricated into an ASIC that is targeted for a specific design. The 401GF represents such an implementation, with its data and instruction caches, MMU, several timers, and peripheral interface.
Embedded Power
The heart of the 401GF consists of three tightly coupled function units: the 401 CPU core, the data-cache unit (DCU), and the instruction-cache unit (ICU), as shown in the figure "The PowerPC 401GF Architecture." The 401 core, in tandem with the bus-interface logic, can field misaligned data on load/store instructions. This capability allows programmers to tightly pack data or code and conserve RAM, which is a high-cost item for embedded applications.
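To make the cost of a misaligned access concrete, here is a minimal sketch of how a load that straddles a word boundary could be broken into aligned bus accesses. The text does not describe how the 401GF actually sequences such transfers, so the function name `aligned_accesses`, the 4-byte bus width, and the byte-lane bookkeeping are all illustrative assumptions.

```python
def aligned_accesses(addr, size=4, bus_bytes=4):
    """Split an access of `size` bytes at `addr` into aligned bus accesses.

    Returns a list of (aligned_address, byte_lanes) pairs. This is a
    hypothetical model, not the 401GF's documented behavior.
    """
    accesses = []
    end = addr + size
    while addr < end:
        base = addr - (addr % bus_bytes)        # aligned address holding this byte
        chunk_end = min(end, base + bus_bytes)  # stop at the next alignment boundary
        lanes = list(range(addr - base, chunk_end - base))
        accesses.append((base, lanes))
        addr = chunk_end
    return accesses


# An aligned word needs one access; a word at offset 2 needs two.
print(aligned_accesses(0x1000))
print(aligned_accesses(0x1002))
```

In this model the packed layout saves memory at the price of an extra bus cycle whenever a datum crosses an alignment boundary.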
Both of the 401GF's cache units have data arrays, tags, and control logic for addressing and cache management. The 401GF uses a Harvard architecture and has a 1-KB data cache and a 2-KB instruction cache. (Custom 401GF implementations can be fabricated with caches up to 16 KB in size.) The small size of these caches is offset by the performance gained through their two-way set-associative organization.
The DCU uses a copy-back strategy during cache operations to reduce bus traffic. This means that writes to main memory occur only for those data items that get modified in the cache. These updates take place when the altered data must be purged from the DCU in order to make room for new data. Control bits in the data-cache-control register can disable the cache for specific sections of memory.
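The copy-back behavior can be sketched with a toy model of a two-way set-associative cache. The class name `CopyBackCache`, the 16-byte line size, the number of sets, and the least-recently-used replacement policy are assumptions made for illustration; the text does not give the DCU's line size or replacement scheme.

```python
class CopyBackCache:
    """Toy two-way set-associative cache with a copy-back policy (illustrative)."""

    WAYS = 2

    def __init__(self, memory, num_sets=4, line_size=16):
        self.memory = memory              # backing store, a bytearray
        self.num_sets = num_sets          # assumed geometry, not from the text
        self.line_size = line_size
        # Each set lists its lines from least to most recently used.
        self.sets = [[] for _ in range(num_sets)]
        self.memory_writes = 0            # line write-backs performed so far

    def _lookup(self, addr):
        block = addr // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        for i, line in enumerate(ways):
            if line["tag"] == tag:        # hit: move line to most-recent slot
                ways.append(ways.pop(i))
                return line
        if len(ways) == self.WAYS:        # miss in a full set: evict the LRU line
            victim = ways.pop(0)
            if victim["dirty"]:           # only modified lines go back to memory
                base = (victim["tag"] * self.num_sets + index) * self.line_size
                self.memory[base:base + self.line_size] = victim["data"]
                self.memory_writes += 1
        base = block * self.line_size
        line = {"tag": tag, "dirty": False,
                "data": bytearray(self.memory[base:base + self.line_size])}
        ways.append(line)
        return line

    def read(self, addr):
        return self._lookup(addr)["data"][addr % self.line_size]

    def write(self, addr, value):
        line = self._lookup(addr)
        line["data"][addr % self.line_size] = value
        line["dirty"] = True              # memory is updated later, on eviction


ram = bytearray(256)
cache = CopyBackCache(ram)
cache.write(0, 0xAA)     # modifies the cached copy only
cache.read(64)           # same set, second way
cache.read(128)          # forces eviction of the dirty line at address 0
print(ram[0], cache.memory_writes)
```

A write touches only the cached copy; main memory sees the new value when the dirty line is evicted, and clean lines are dropped without any bus write.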
Managing Memory
The 401GF has a sophisticated bus-control unit (BCU) that handles transfers among the external bus, the caches, and the registers within the processor core. The BCU can be programmed to handle a mix of 8-, 16-, or 32-bit devices. For example, in a set-top box, the BCU might use 8-bit accesses to fetch instructions from inexpensive 8-bit ROMs while performing 32-bit accesses to update a bank of DRAM that acts as a frame buffer.
The BCU has a programmable read/write burst mechanism that allows it to work with burst-mode ROMs and with page-mode DRAM to expedite cache-fill and flush operations. The BCU handles big-endian or little-endian byte ordering and supports transfers between it and external bus masters. This combination of programmable bus width, endian addressing modes, and read/write functions allows the 401GF to easily integrate with any type of peripheral or memory device, a plus for any design.
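As a rough illustration of what programmable bus width and byte ordering mean for a single 32-bit fetch, the sketch below assembles one word from a device that is 8, 16, or 32 bits wide and counts the bus accesses required. The function `fetch_word` and its parameters are hypothetical; the BCU's real programming interface is not described in the text.

```python
def fetch_word(device, addr, bus_bits, big_endian=True):
    """Assemble a 32-bit word from a device of the given bus width.

    `device` is any bytes-like object standing in for a ROM or RAM.
    Returns (word, number_of_bus_accesses). Illustrative model only.
    """
    if bus_bits not in (8, 16, 32):
        raise ValueError("bus width must be 8, 16, or 32 bits")
    unit = bus_bits // 8
    raw = bytearray()
    accesses = 0
    for offset in range(0, 4, unit):      # one bus access per unit
        raw += device[addr + offset:addr + offset + unit]
        accesses += 1
    order = "big" if big_endian else "little"
    return int.from_bytes(raw, order), accesses


rom = bytes([0x12, 0x34, 0x56, 0x78])
print(fetch_word(rom, 0, 8))                     # four accesses on an 8-bit ROM
print(fetch_word(rom, 0, 32, big_endian=False))  # one access, little-endian view
```

The same four bytes cost four accesses on an 8-bit ROM and one on a 32-bit device, which is why burst support and the instruction cache matter when code lives in narrow, inexpensive memory.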
The 401GF has a real-mode MMU. That is, the MMU handles memory protection and assigns access attributes to sections of memory, but it doesn't perform memory-address translation. (A virtual-mode MMU that handles address translation and memory paging is available for special purposes.)
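The distinction between protection and translation can be sketched as follows. The `Region` fields, the `RealModeMMU` class, and the specific attributes (writable, cacheable) are assumptions chosen for illustration, not the 401GF's actual register layout. The point is that the address comes back unchanged; only the access is checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A section of memory with access attributes (hypothetical fields)."""
    start: int
    size: int
    writable: bool
    cacheable: bool


class RealModeMMU:
    """Checks protection and reports attributes, without translating addresses."""

    def __init__(self, regions):
        self.regions = list(regions)

    def check(self, addr, write=False):
        for region in self.regions:
            if region.start <= addr < region.start + region.size:
                if write and not region.writable:
                    raise PermissionError(f"write to protected address {addr:#x}")
                # Real mode: the physical address equals the effective address.
                return addr, region.cacheable
        raise PermissionError(f"no region covers address {addr:#x}")


mmu = RealModeMMU([
    Region(0x0000, 0x1000, writable=False, cacheable=True),   # e.g., ROM code
    Region(0x1000, 0x1000, writable=True, cacheable=False),   # e.g., device registers
])
print(mmu.check(0x0010))
print(mmu.check(0x1004, write=True))
```

A virtual-mode MMU would differ by returning a different physical address from the lookup, typically through page tables.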
Conserving Power
The 401GF uses a fully static CMOS design. This allows the processor clock to be switched off, putting the processor to sleep for power savings, yet the contents of its registers and its internal state are preserved. The 401GF can resume where it left off when an external event switches on the processor clock.
The 401GF has a programmable clock multiplier with ratios of 1:1, 2:1, 3:1, and 4:1. This lets the 401GF operate internally at higher clock rates for better performance while the system runs at a lower, less power-consuming frequency.
Like the PowerPC 603, the 401GF can selectively disable the clock to idle function units, which can further reduce power consumption. Also, you can issue commands to the SPRs so that the processor enters one of three power-conservation modes (nap, doze, or sleep).
At 25 MHz, the 401GF typically consumes 40 mW; it can be as little as 0.015 mW when it's in sleep mode. At this frequency, the 401GF can execute 44,000 Dhrystones per second, or 26 Dhrystone MIPS (DMIPS). Also, 50-, 75-, and 100-MHz versions of the 401GF will be available later this year as 3.3-V parts.
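To see what these figures imply for a battery-powered design, the following back-of-the-envelope sketch estimates average power for a processor that is active only part of the time and asleep otherwise. The two power figures come from the text; the function name and the simple two-state model, which ignores wake-up overhead and the intermediate nap and doze modes, are assumptions.

```python
ACTIVE_MW = 40.0    # typical consumption at 25 MHz (from the text)
SLEEP_MW = 0.015    # sleep-mode consumption (from the text)


def average_power_mw(active_fraction):
    """Average power for a given fraction of time spent active.

    Simplified two-state model: the processor is either fully active or
    asleep, and transitions between the two cost nothing.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * ACTIVE_MW + (1.0 - active_fraction) * SLEEP_MW


for fraction in (1.0, 0.5, 0.1, 0.01):
    print(f"{fraction:>5.0%} active: {average_power_mw(fraction):.4f} mW")
```

Under this model, sleep-mode draw is negligible next to active draw, so average power scales almost linearly with the fraction of time the processor is awake.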
IBM has a library of various function units (e.g., a virtual-mode MMU, a digital signal processor or floating-point coprocessor, and a serial I/O interface) that you can add along with a 401 core to create a custom ASIC for a specific design. You can even add your own specialized logic, thus producing a high-performance 401-based ASIC for unique embedded applications.
Where to Find
IBM Microelectronics Division
Hopewell Junction, NY
Fax: (415) 855-4121
Internet: http://www.chips.ibm.com