RISC Gets Small


February 1998 / Core Technologies / RISC Gets Small

Motorola's latest RISC processor is small, yet it delivers big computing power.

Bill Moyer and John Arends

Battery life has become a major competitive force in hand-held computing products, perhaps as important as low weight and a small form factor. Motorola's newest 32-bit offering, the MCore RISC processor, targets cost-sensitive embedded control applications that demand high performance yet must consume little power. MCore processors minimize power consumption by combining a fully static CMOS design with low-voltage operation. Initial versions of the chip operate at 1.8 volts, feature several low-power operating modes, and provide dynamic power management. These capabilities make the MCore ideal for battery-operated portable products.

MCore's first production version is implemented in a 0.36-micron, triple-level-metal static CMOS process. The result is an 80,000-transistor CPU that occupies just 2.2 mm² of die area. MCore products come in low-cost plastic ball grid array (PBGA) and thin quad flat pack (TQFP) versions that sport 100 to 200 pins, depending upon the application. The low pin count eliminates additional signal lines from a product's design, thereby reducing its size and cost.

Processor Architecture

The MCore processor core is divided into a data-path section and a control section. The data-path section consists of 50,000 transistors, while the control section uses the remaining 30,000 transistors for control circuitry and clocking. The data-path section consists of a program counter unit, an execution unit, a register file unit, a memory interface unit, and a hardware accelerator interface (HAI) unit, as shown in the figure "MCore Microarchitecture." The control section manages the overall sequencing and coordination of the execution units and interfaces. Additional logic in the data-path section minimizes power consumption by automatically powering down unused internal function units on a clock-by-clock basis. Doze, Wait, and Stop power-conservation modes provide comprehensive system power management.

The execution unit contains a 32-bit arithmetic logic unit (ALU), a 32-bit single-cycle barrel shifter, a multiply/divide unit, a find-first-one unit (a priority encoder), and result-feed-forward hardware. All arithmetic and logic operations execute in a single cycle, with the exception of the multiply and divide operations. The multiply instructions use a modified Booth's algorithm with early-out capability that reduces execution time for operations with small multiplier values. The divide instructions are likewise designed to minimize execution time.
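The article does not describe the multiplier hardware in detail, so the following is only a software sketch of the general idea, assuming the common radix-4 form of modified Booth recoding. The function names and the `steps` counter are illustrative; the step count is a property of this model and is not a claim about MCore cycle counts.

```python
def to_signed(value, width=32):
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_multiply(x, y, width=32):
    """Radix-4 modified Booth multiply with early-out.

    Returns (product truncated to `width` bits, number of recoding steps).
    Illustrative model only, not the MCore hardware.
    """
    x = to_signed(x, width)
    remaining = to_signed(y, width)  # multiplier bits not yet consumed
    prev = 0                         # implicit bit to the right of bit 0
    product = 0
    shift = 0
    steps = 0
    while shift < width:
        # Early-out: if the remaining bits are pure sign extension of the
        # last bit examined, every further Booth digit would be zero.
        if (remaining == 0 and prev == 0) or (remaining == -1 and prev == 1):
            break
        b0 = remaining & 1
        b1 = (remaining >> 1) & 1
        digit = b0 + prev - 2 * b1   # Booth digit in {-2, -1, 0, 1, 2}
        product += (digit * x) << shift
        prev = b1
        remaining >>= 2              # arithmetic shift preserves the sign
        shift += 2
        steps += 1
    return to_signed(product, width), steps


small = booth_multiply(3, 5)               # (15, 2)
large = booth_multiply(3, 0x12345678)      # needs many more steps
```

In this model a small multiplier such as 5 finishes in two steps, while a multiplier that uses most of its 32 bits needs up to sixteen, which is the effect the early-out capability is meant to exploit.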

The program counter unit has a dedicated program-counter incrementer and a dedicated branch address adder that minimize the execution time required to deal with a change in program flow. Branch target addresses are calculated in parallel with the branch instruction decode and branch condition checking. Thus, a taken conditional branch executes in only two clock cycles, while a branch that is not taken executes in one.
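As a quick worked example of what these timings imply, the sketch below computes the expected cost of a conditional branch from the two cycle counts given above. The function name and the 60 percent taken rate are hypothetical and chosen only for illustration.

```python
TAKEN_CYCLES = 2       # taken conditional branch, per the article
NOT_TAKEN_CYCLES = 1   # branch not taken, per the article


def average_branch_cycles(taken_fraction):
    """Expected cycles per conditional branch for a given taken fraction."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    return (TAKEN_CYCLES * taken_fraction
            + NOT_TAKEN_CYCLES * (1.0 - taken_fraction))


# Hypothetical workload in which 60% of conditional branches are taken.
example = average_branch_cycles(0.6)
```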

Memory load and store operations execute in two clock cycles: one cycle adds a scaled displacement to a base address pointer value, and the second cycle performs the memory access. Load and Store Multiple Register instructions allow low-overhead register file save and restore operations. These instructions execute in (N+1) clock cycles, where N is the number of registers to transfer. The memory interface unit provides a full 32-bit address bus and a 32-bit data bus, along with access attribute indicators for transfer of instructions and data operands. The memory interface unit monitors these attributes along with the logical address to provide memory protection.
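The address calculation and cycle counts described above can be sketched as follows. This is a minimal model: it assumes the displacement is scaled by the access size in bytes, which the article does not state explicitly, and all function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, displacement, access_size):
    """Base pointer plus scaled displacement, wrapped to 32 bits.

    Assumption: the displacement is scaled by the access size in bytes
    (1, 2, or 4). The article says only that it is "scaled".
    """
    if access_size not in (1, 2, 4):
        raise ValueError("access_size must be 1, 2, or 4 bytes")
    return (base + displacement * access_size) & MASK32


def single_transfer_cycles(register_count):
    """Cycles to move registers one at a time (2 cycles per load/store)."""
    return 2 * register_count


def multiple_transfer_cycles(register_count):
    """Cycles for a Load/Store Multiple of N registers: N + 1."""
    return register_count + 1


# Saving 8 registers: 16 cycles one at a time versus 9 with Store Multiple.
saved = single_transfer_cycles(8) - multiple_transfer_cycles(8)
```

The gap widens with the number of registers, which is why the multiple-register instructions suit context save and restore sequences.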

MCore has sixteen 32-bit general-purpose registers. Programs operating in the chip's supervisor mode have access to a second set of sixteen 32-bit registers, which normally serves as an alternate register file. The register file unit contains both the 16-entry general register file and the alternate register file, plus 13 status/control registers available to supervisor software.

Throughput Optimization

System cost and power consumption are strongly affected by an application's memory requirements. While MCore is a 32-bit load/store RISC architecture, it adopts a compact 16-bit fixed-length instruction format. Benchmark results on a variety of application tasks indicate that the code density of MCore programs is higher than that of many CISC designs, in spite of the fixed-length instructions. The high code density lowers an embedded product's cost, since memory is the most expensive part of a design. The 16-bit instructions also reduce the amount of fetch traffic on an external bus, further reducing power consumption. Finally, the instruction width keeps system performance high even when a design uses 16-bit memory to minimize costs.
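To see why instruction width matters on a narrow memory bus, the sketch below counts the bus accesses needed to fetch a stream of fixed-length instructions. It is a deliberately simplified model: it assumes no instruction cache or prefetch buffer, and it compares equal instruction counts, although a 16-bit encoding may need more instructions than a 32-bit one for the same task. The function name and the instruction count of 1,000 are hypothetical.

```python
def fetch_bus_accesses(instruction_count, instruction_bits, bus_bits):
    """Bus accesses needed to fetch fixed-length instructions.

    Simplified model: no cache, no prefetch, every instruction fetched once.
    """
    accesses_per_instruction = -(-instruction_bits // bus_bits)  # ceiling
    return instruction_count * accesses_per_instruction


# 1,000 instructions fetched over a 16-bit external memory bus.
with_16_bit_instructions = fetch_bus_accesses(1000, 16, 16)
with_32_bit_instructions = fetch_bus_accesses(1000, 32, 16)
```

Under these assumptions the 32-bit encoding needs twice as many accesses on a 16-bit bus, while the 16-bit encoding still delivers one instruction per access.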

For embedded applications that require real-time processing, MCore provides an exception mechanism that is both flexible and fast. Exception processing uses an exception vector table (a table of 32-bit pointers) and a set of internal shadow registers to transfer control to an exception handler. MCore uses a relocatable vector table that contains 128 exception vectors. For external devices that don't provide an interrupt vector, an autovector (default vector) capability is provided.
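The vector lookup described above amounts to simple address arithmetic, sketched here. The table size (128 entries of 32 bits each) comes from the article; the names `table_base` and `resolve_vector`, the example base address, and the autovector number 10 are hypothetical.

```python
VECTOR_COUNT = 128        # exception vectors in the table, per the article
VECTOR_SIZE_BYTES = 4     # each entry is a 32-bit pointer
TABLE_SIZE_BYTES = VECTOR_COUNT * VECTOR_SIZE_BYTES  # 512 bytes


def vector_entry_address(table_base, vector_number):
    """Address of the table entry holding the handler pointer."""
    if not 0 <= vector_number < VECTOR_COUNT:
        raise ValueError("vector_number must be in 0..127")
    return (table_base + vector_number * VECTOR_SIZE_BYTES) & 0xFFFFFFFF


def resolve_vector(device_vector, autovector):
    """Use the device-supplied vector, or the default if none is supplied."""
    return autovector if device_vector is None else device_vector


# Hypothetical values: table relocated to 0x20000000, autovector number 10.
entry = vector_entry_address(0x20000000, resolve_vector(None, 10))
```

Because the table base is a parameter, relocating the table changes only the base value and leaves the vector numbers untouched.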

MCore processors support two independent interrupt requests: a normal interrupt and a higher-priority fast interrupt. The fast interrupt request uses a dedicated set of shadow registers, which eliminates the need to preserve the processor's context on the stack before the interrupt handler executes. Software can reserve the alternate register file for exclusive use by interrupt handlers. This enables support of extremely low-latency interrupts, and it makes real-time processing possible.
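The benefit of a second register bank can be illustrated with a toy model. The class below is not a description of the MCore hardware: the article does not say how the active bank is selected, so the `select` method and the bank numbering are assumptions made for illustration.

```python
class RegisterFile:
    """Toy model of a general register file plus an alternate file."""

    GENERAL = 0
    ALTERNATE = 1

    def __init__(self):
        # Two banks of sixteen 32-bit registers each.
        self.banks = [[0] * 16, [0] * 16]
        self.active = self.GENERAL

    def select(self, bank):
        """Hypothetical bank switch; the real mechanism is not described."""
        if bank not in (self.GENERAL, self.ALTERNATE):
            raise ValueError("unknown register bank")
        self.active = bank

    def read(self, index):
        return self.banks[self.active][index]

    def write(self, index, value):
        self.banks[self.active][index] = value & 0xFFFFFFFF


regs = RegisterFile()
regs.write(1, 111)                    # interrupted program's r1

regs.select(RegisterFile.ALTERNATE)   # handler entry: no registers saved
regs.write(1, 999)                    # handler uses its own r1
regs.select(RegisterFile.GENERAL)     # handler exit: nothing to restore

preserved = regs.read(1)              # still the interrupted program's value
```

The handler never touches the stack to save or restore registers, which is where the latency saving comes from.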

MCore's hardware accelerator interface supports tightly coupled hardware function blocks that extend the MCore architecture. For flexibility, the interface is generic in nature and makes few assumptions about the actual processing being accelerated. The HAI operates independently of the memory and peripheral interfaces to allow overlapped execution. A base set of instruction primitives allows the explicit transfer of operands and instructions to and from external function blocks. Hardware handshaking can control the rate of the instruction and data transfers. The function blocks are tailored to boost processing for application-specific purposes. For example, such a block might act as a DSP arithmetic unit or a graphics accelerator; another block might handle speech processing or handwriting recognition.

Small Die, Big Performance

Initial MCore processors use supply voltages ranging from 1.8 to 3.6 volts. The chips operate at 50 MHz. The "sedate" clock rate dramatically lowers power consumption, which is critical for a hand-held device. At 50 MHz, MCore delivers 48 Dhrystone 2.1 MIPS yet consumes only 20.5 milliwatts. The inexpensive packaging, support for 16-bit memory devices, and low power consumption, combined with high performance, make the MCore processors attractive for the cost-sensitive consumer and embedded-control markets. They also provide a migration path for existing 8-bit and 16-bit controller applications.
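The figures quoted above can be turned into a few efficiency ratios. The sketch below uses only the three numbers from the article (50 MHz, 48 Dhrystone 2.1 MIPS, 20.5 milliwatts); the derived metric names are illustrative, and the energy figure treats one Dhrystone MIPS as one million instructions per second.

```python
CLOCK_MHZ = 50.0        # operating frequency
DHRYSTONE_MIPS = 48.0   # Dhrystone 2.1 MIPS at 50 MHz
POWER_MW = 20.5         # power consumption in milliwatts

mips_per_mhz = DHRYSTONE_MIPS / CLOCK_MHZ   # 0.96
mw_per_mhz = POWER_MW / CLOCK_MHZ           # 0.41
mips_per_mw = DHRYSTONE_MIPS / POWER_MW     # about 2.34

# Energy per instruction: (mW * 1e-3 W) / (MIPS * 1e6 per s), in nanojoules.
nanojoules_per_instruction = (POWER_MW * 1e-3) / (DHRYSTONE_MIPS * 1e6) * 1e9

print(f"{mips_per_mhz:.2f} MIPS/MHz, {mw_per_mhz:.2f} mW/MHz")
print(f"{mips_per_mw:.2f} MIPS/mW, "
      f"{nanojoules_per_instruction:.3f} nJ per instruction")
```

In other words, the processor completes close to one Dhrystone instruction per clock while spending well under half a nanojoule on each.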


MCore Microarchitecture


This processor offers performance features such as hardware multiply/divide and registers for low-latency interrupt handling.


MCore Registers


The alternate register file provides for low-overhead context switching in real-time processing.


Bill Moyer (billm@sandbox.sps.mot.com) is a principal architect and systems designer for Motorola. John Arends (john_arends@email.sps.mot.com) has worked on Motorola's RISC processor designs and implementations.

Copyright © 2005 CMP Media LLC