Motorola's latest RISC processor is small, yet it delivers big computing power.
Bill Moyer and John Arends
Battery life has become a major competitive force in hand-held computing products, perhaps as important as a low weight and a small form factor. Motorola's newest 32-bit offering, the MCore RISC processor, targets cost-sensitive embedded control applications that demand high performance yet must consume little power. MCore processors minimize power consumption by combining a fully static CMOS design with low-voltage operation. Initial versions of the chip operate at 1.8 volts, feature several low-power operating modes, and provide dynamic power management. These capabilities make the MCore ideal for battery-operated portable products.
MCore's first production version is implemented in a 0.36-micron, triple-level-metal static CMOS process. The result is an 80,000-transistor CPU that occupies just 2.2 mm² of die area. MCore products come in low-cost plastic ball grid array (PBGA) and thin quad flat pack (TQFP) versions that sport 100 to 200 pins, depending upon the application. The low pin count eliminates additional signal lines from a product's design, thereby reducing its size and cost.
Processor Architecture
The MCore processor core is divided into a data-path section and a control section. The data-path section consists of 50,000 transistors, while the control section uses the remaining 30,000 transistors for control circuitry and clocking. The data-path section consists of a program counter unit, an execution unit, a register file unit, a memory interface unit, and a hardware accelerator interface (HAI) unit, as shown in the figure "MCore Microarchitecture."
The control section manages the overall sequencing and coordination of the execution units and interfaces. Additional logic in the data-path section minimizes power consumption by automatically powering down unused internal function units on a clock-by-clock basis. Doze, Wait, and Stop power-conservation modes provide comprehensive system power management.
The execution unit contains a 32-bit arithmetic logic unit (ALU), a 32-bit single-cycle barrel shifter, a multiply/divide unit, a find-first-one unit (a priority encoder), and result-feed-forward hardware. All arithmetic and logic operations execute in a single cycle, with the exception of the multiply and divide operations. The multiply instructions use a modified Booth's algorithm with an early-out capability that reduces execution time when the multiplier value is small. The divide instructions are likewise designed to minimize execution time.
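To make the find-first-one and early-out ideas concrete, here is a minimal software sketch in C. It is an illustration only, not a description of MCore's circuitry: the function names are hypothetical, scanning from bit 31 (and returning 32 for a zero input) is an assumption, and the early-out rule shown (stop once the remaining multiplier bits are pure sign extension) is one common way to realize early-out in a radix-4 Booth multiplier. The step count it reports is not a claim about MCore cycle counts.

```c
#include <stdint.h>

/* Priority-encoder model: count of zero bits above the most significant 1.
   Scanning from bit 31 and returning 32 for v == 0 are assumptions. */
int find_first_one(uint32_t v)
{
    int n = 0;
    if (v == 0u) {
        return 32;
    }
    while ((v & 0x80000000u) == 0u) {
        v <<= 1;
        n++;
    }
    return n;
}

typedef struct {
    uint32_t product; /* low 32 bits of x * y */
    int steps;        /* radix-4 Booth steps actually performed (max 16) */
} booth_result;

/* Radix-4 (modified) Booth multiply with early-out.
   Each step consumes two multiplier bits; the loop stops as soon as the
   remaining bits are all sign extension, because every later Booth digit
   would be zero. */
booth_result booth_multiply(uint32_t x, uint32_t y)
{
    booth_result r = { 0u, 0 };
    uint32_t m = y;                                         /* bits not yet consumed */
    uint32_t sign_fill = (y & 0x80000000u) ? 0xC0000000u : 0u;
    uint32_t prev = 0u;                                     /* bit just below the current pair */
    uint32_t addend = x;                                    /* x * 4^step, modulo 2^32 */

    while (r.steps < 16) {
        if ((m == 0u && prev == 0u) || (m == 0xFFFFFFFFu && prev == 1u)) {
            break;                                          /* early out */
        }
        uint32_t b0 = m & 1u;
        uint32_t b1 = (m >> 1) & 1u;
        int digit = (int)b0 + (int)prev - 2 * (int)b1;      /* Booth digit in -2..2 */

        switch (digit) {
        case 1:  r.product += addend;      break;
        case 2:  r.product += addend << 1; break;
        case -1: r.product -= addend;      break;
        case -2: r.product -= addend << 1; break;
        default: break;                                     /* digit 0: nothing to add */
        }

        prev = b1;
        m = (m >> 2) | sign_fill;                           /* arithmetic shift by two */
        addend <<= 2;
        r.steps++;
    }
    return r;
}
```

In this model, multiplying 7 by 6 finishes after two steps instead of the full sixteen, which is the kind of saving the early-out capability aims for when the multiplier is small.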
The program counter unit has a dedicated program-counter incrementer and a dedicated branch address adder that minimize the execution time required to deal with a change in program flow. Branch target addresses are calculated in parallel with the branch instruction decode and branch condition checking. Thus, a conditional branch that is taken executes in only two clock cycles, while a branch that is not taken executes in one.
Memory load and store operations execute in two clock cycles, where one cycle adds a scaled displacement to a base address pointer value and the second cycle performs the memory access. Load and Store Multiple Register instructions allow low-overhead register file save and restore operations. These instructions execute in (N+1) clock cycles, where N is the number of registers to transfer. The memory interface unit provides a full 32-bit address bus and a 32-bit data bus, along with access attribute indicators for transfer of instructions and data operands. The memory interface unit monitors these attributes along with the logical address to provide memory protection.
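The cycle counts quoted in this section can be collected into a small C model, shown here as a rough sketch. The enum and function names are hypothetical, the model ignores memory wait states and multiply/divide timing, and the way the displacement is scaled (shifted left by the log2 of the access size) is an assumption about what "scaled displacement" means.

```c
#include <stdint.h>

/* Instruction classes whose timing is stated in the text. */
typedef enum {
    OP_ALU,                  /* single-cycle arithmetic or logic operation */
    OP_BRANCH_TAKEN,         /* conditional branch, taken */
    OP_BRANCH_NOT_TAKEN,     /* conditional branch, not taken */
    OP_LOAD_STORE,           /* single load or store */
    OP_LOAD_STORE_MULTIPLE   /* load/store multiple, nregs registers */
} op_class;

/* Cycles per operation, assuming zero-wait-state memory.
   nregs is only used for OP_LOAD_STORE_MULTIPLE. */
unsigned op_cycles(op_class op, unsigned nregs)
{
    switch (op) {
    case OP_ALU:                 return 1u;
    case OP_BRANCH_TAKEN:        return 2u;
    case OP_BRANCH_NOT_TAKEN:    return 1u;
    case OP_LOAD_STORE:          return 2u;          /* address add + access */
    case OP_LOAD_STORE_MULTIPLE: return nregs + 1u;  /* (N+1) cycles */
    }
    return 0u; /* unreachable for valid op_class values */
}

/* First cycle of a load/store: base + scaled displacement.
   size_log2 is 0 for bytes, 1 for halfwords, 2 for words (assumed scaling). */
uint32_t effective_address(uint32_t base, uint32_t disp, unsigned size_log2)
{
    return base + (disp << size_log2);
}
```

Under these assumptions, saving four registers with a store-multiple costs five cycles, and a word access with base 0x1000 and displacement 3 targets address 0x100C.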
MCore has sixteen 32-bit general-purpose registers. Programs operating in the chip's supervisor mode have access to a second set of sixteen 32-bit registers, which normally serves as an alternate register file. The register file unit contains both the 16-entry general register file and the alternate register file, plus 13 status/control registers available to supervisor software.
Throughput Optimization
System cost and power consumption are strongly affected by an application's memory requirements. While MCore is a 32-bit load/store RISC architecture, it adopts a compact 16-bit fixed-length instruction format. Benchmark results on a variety of application tasks indicate that the code density of MCore programs is higher than that of many CISC designs, in spite of the fixed-length instructions. The high code density lowers an embedded product's cost, since memory is among the most expensive parts of a design. The 16-bit instructions also reduce the amount of fetch traffic on an external bus, further reducing power consumption. Finally, the instruction width keeps system performance high even when a design uses 16-bit memory to minimize costs.
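The effect of instruction width on fetch traffic can be estimated with simple arithmetic. The C sketch below is a deliberately idealized model: the function name is hypothetical, and it assumes strictly sequential, perfectly packed instruction fetches with no branches, caches, or prefetch buffers.

```c
#include <stdint.h>

/* Bus transactions needed to fetch instr_count sequential instructions
   of instr_bits each over a bus that is bus_bits wide (bus_bits > 0). */
uint64_t fetch_transactions(uint64_t instr_count,
                            unsigned instr_bits,
                            unsigned bus_bits)
{
    uint64_t total_bits = instr_count * instr_bits;
    return (total_bits + bus_bits - 1u) / bus_bits;  /* round up */
}
```

In this model, 1,000 16-bit instructions need 1,000 transactions on a 16-bit bus, while 1,000 32-bit instructions would need 2,000. That is why a 16-bit instruction format pairs well with inexpensive 16-bit memory.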
For embedded applications that require real-time processing, MCore provides an exception mechanism that is both flexible and fast. Exception processing uses an exception vector table (a table of 32-bit pointers) and a set of internal shadow registers to transfer control to an exception handler. MCore uses a relocatable vector table that contains 128 exception vectors. For external devices that don't provide an interrupt vector, an autovector (default vector) capability is provided.
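A vector table of this kind reduces exception dispatch to an indexed lookup. The following C sketch shows the address arithmetic only. The names are hypothetical, and locating the table through a single base value is an assumption based on the table being described as relocatable; the sketch does not model the shadow registers or any processor-specific encoding of the table entries.

```c
#include <assert.h>
#include <stdint.h>

enum {
    VECTOR_COUNT = 128,  /* exception vectors in the table */
    VECTOR_SIZE  = 4     /* each entry is a 32-bit pointer */
};

/* Total size of the vector table in bytes. */
uint32_t vector_table_bytes(void)
{
    return (uint32_t)VECTOR_COUNT * VECTOR_SIZE;
}

/* Address of the table entry for a given vector number.
   table_base is wherever software has placed the relocatable table. */
uint32_t vector_entry_address(uint32_t table_base, unsigned vector)
{
    assert(vector < VECTOR_COUNT);
    return table_base + (uint32_t)vector * VECTOR_SIZE;
}
```

With 128 entries of 4 bytes each, the table occupies 512 bytes. For a table placed at the illustrative address 0x20000000, the last entry sits at 0x200001FC.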
MCore processors support two independent interrupt requests: a normal interrupt and a higher-priority fast interrupt. The fast interrupt request uses a dedicated set of shadow registers that eliminates having to preserve the processor's context on the stack before the interrupt handler executes. Software can reserve the alternate register file for exclusive use by interrupt handlers. This enables support of extremely low-latency interrupts, and it makes real-time processing possible.
MCore's hardware accelerator interface supports tightly coupled hardware function blocks that extend the MCore architecture. For flexibility, the interface is generic in nature and makes few assumptions about the actual processing being accelerated. The HAI operates independently of the memory and peripheral interfaces to allow overlapped execution. A base set of instruction primitives allows the explicit transfer of operands and instructions to and from external function blocks. Hardware handshaking can control the rate of the instruction and data transfers. The function blocks are tailored to boost processing for application-specific purposes. For example, such a block might act as a DSP arithmetic unit or a graphics accelerator; another block might handle speech processing or handwriting recognition.
Small Die, Big Performance
Initial MCore processors use supply voltages ranging from 1.8 to 3.6 volts. The chips operate at 50 MHz. The "sedate" clock rate dramatically lowers power consumption, which is critical for a hand-held device. At 50 MHz, MCore delivers 48 Dhrystone 2.1 MIPS yet consumes only 20.5 milliwatts. The inexpensive packaging, support for 16-bit memory devices, and low power consumption, combined with high performance, make the MCore processors attractive for the cost-sensitive consumer and embedded-control markets. They also provide a migration path for existing 8-bit and 16-bit controller applications.
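The quoted figures translate directly into efficiency ratios. The short C sketch below does the arithmetic; the function names are hypothetical, and because Dhrystone MIPS is a benchmark-relative rating, the "per instruction" energy is per Dhrystone-equivalent instruction rather than per native MCore instruction.

```c
/* Performance per clock: Dhrystone MIPS divided by clock rate in MHz. */
double mips_per_mhz(double mips, double mhz)
{
    return mips / mhz;
}

/* Performance per unit power: Dhrystone MIPS divided by milliwatts. */
double mips_per_milliwatt(double mips, double milliwatts)
{
    return mips / milliwatts;
}

/* Energy per Dhrystone-equivalent instruction, in nanojoules.
   mW / MIPS = (1e-3 J/s) / (1e6 instr/s) = 1e-9 J per instruction. */
double nanojoules_per_instruction(double mips, double milliwatts)
{
    return milliwatts / mips;
}
```

For 48 MIPS at 50 MHz and 20.5 milliwatts, these work out to 0.96 MIPS per MHz, roughly 2.3 MIPS per milliwatt, and roughly 0.43 nanojoules per Dhrystone-equivalent instruction.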
This processor offers performance features such as hardware multiply/divide and registers for low-latency interrupt handling.
The alternate register file provides for low-overhead context switching in real-time processing.
Bill Moyer (billm@sandbox.sps.mot.com) is a principal architect and systems designer for Motorola. John Arends (john_arends@email.sps.mot.com) has worked on Motorola's RISC processor designs and implementations.