This new implementation renews a proven embedded architecture
Joe Circello
Motorola's 68000 family of microprocessors has served both the computer and the embedded markets well. Now the PowerPC has created an opportunity for the 68000 family to refocus entirely on embedded systems, making it possible to redefine the architecture to achieve dramatic improvements in both cost and performance relative to the older 68000-family designs. A new architecture, called ColdFire, is the result of such a refocus and represents an approach targeted at the emerging needs of advanced consumer-electronics applications.
ColdFire's designers set several requirements for the architecture they would create for this new class of cost-sensitive embedded applications. First, they demanded a low-cost architecture, which they achieved largely by keeping the core small. A small die also let them integrate on-chip memories, system modules, and peripherals cost-effectively. ColdFire also offers abundant processing power, so it can tackle compute-intensive jobs while consuming relatively little electrical power.
Finally, ColdFire employs a high-density ISA (instruction-set architecture), especially important because in many embedded systems the memory subsystem's cost far exceeds the processor's cost. A high-density ISA minimizes the application's storage requirements, which thus reduces overall system cost. The original 68000 processor ISA provided the starting point for ColdFire's ISA. Like the 68000 ISA, ColdFire defines a variable-length ISA to achieve optimum code density. This is accomplished in a RISC-based implementation that provides a very efficient silicon design.
Changes in Instruction
Other important changes were made to the 68000 instruction set while still maintaining the original programming model. Certain operations were either given reduced support or eliminated, which makes for a simpler and smaller core. Examples of the changes to the instruction set are reduced support for byte- and word-size operands, reduced support for RMW (read-modify-write) instructions, and removal of instructions used primarily by desktop applications, such as trap on overflow and BCD (binary-coded decimal) arithmetic.
Let's look at these changes in more detail. For byte- and word-size operands, the instructions supporting arithmetic and logical operations were removed. However, ColdFire keeps those op codes performing simple assignments (e.g., move) and the Test and Clear functions. While support for RMW operations was reduced, ColdFire retains the op codes performing arithmetic and logical functions using a program-visible register and memory. A number of instructions, including those involving BCD operands, rotate op codes, and integer divides, were simply deleted. The Divide instructions were eliminated because the transistor count needed to support these op codes could not be justified. A software Divide routine has been developed that actually uses fewer machine cycles than the 68000 does for most operands.
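The article does not describe the software Divide routine itself. As a rough illustration of how division can be built from shifts, compares, and subtracts alone, here is a minimal sketch of a generic restoring (shift-and-subtract) divider. The name udiv32 is hypothetical, this is not Motorola's routine, and it says nothing about that routine's cycle counts.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit division using only shift, compare, and subtract.

    Generic restoring-division sketch; NOT the routine the article refers to.
    Returns (quotient, remainder).
    """
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Shift the next dividend bit (most significant first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            # The divisor fits: subtract it and record a 1 in the quotient.
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

For example, udiv32(100, 7) yields a quotient of 14 and a remainder of 2.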
A number of extensions were made to the original 68000 ISA when the 68020 microprocessor was introduced. The ColdFire architecture implements several important instructions from these additions, including a 32- by 32-bit integer multiply that produces a 32-bit result, a complete set of register sign-extension instructions, scale factors (x1, x2, x4) for indexed addressing modes, and multiple-word NOP (No Operation) instructions. Compilers use the latter to remove branch instructions.
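The three 68020-derived features named above can be modeled in a few lines. This is only a behavioral sketch: the function names are hypothetical, and the displacement parameter is an assumption added for illustration (the article mentions only the scale factors).

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits          # negative in two's complement
    return value & MASK32


def indexed_address(base, index, scale, displacement=0):
    """Effective address for a scaled indexed mode: base + index*scale + disp.

    Only the x1, x2, and x4 scale factors listed in the article are accepted.
    The displacement term is an assumption of this sketch.
    """
    if scale not in (1, 2, 4):
        raise ValueError("scale must be 1, 2, or 4")
    return (base + index * scale + displacement) & MASK32


def mul32(a, b):
    """32- by 32-bit multiply that keeps only the low 32 bits of the product."""
    return (a * b) & MASK32
```

With a scale factor of 4, indexed_address(0x1000, 3, 4) addresses the fourth 32-bit element of an array at 0x1000 without a separate shift instruction.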
Taking Exception
In addition to these areas of instruction-set simplification, the ColdFire exception processing model is streamlined. The architecture defines a single 8-byte frame created for all exception types on a self-aligning system stack (i.e., the stack pointer automatically compensates for misaligned data before creating an exception frame). After ColdFire creates the stack frame, it fetches an exception vector from a 1024-byte table that defines the location of the first instruction of the service routine. Thus, the processing of system calls and external interrupts remains exactly compatible with previous 68000-family designs. As a result of these simplifications, exception processing times are very fast. For most exceptions, the time from the faulting instruction until the first instruction in the service routine is a mere 12 cycles.
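The address arithmetic implied by this description can be sketched as follows. Two details are assumptions of the sketch rather than statements from the article: that each vector occupies 4 bytes (as in earlier 68000-family parts, which gives 256 vectors in a 1024-byte table) and that self-alignment rounds the stack pointer down to a 4-byte boundary. The names enter_exception and table_base are hypothetical.

```python
FRAME_BYTES = 8            # single exception frame size given in the article
VECTOR_TABLE_BYTES = 1024  # vector table size given in the article
VECTOR_ENTRY_BYTES = 4     # ASSUMPTION: one 32-bit address per vector
VECTOR_COUNT = VECTOR_TABLE_BYTES // VECTOR_ENTRY_BYTES


def enter_exception(sp, vector_number, table_base=0):
    """Return (new stack pointer, address of the vector to fetch).

    Models only the address arithmetic; the frame contents are not modeled.
    """
    if not 0 <= vector_number < VECTOR_COUNT:
        raise ValueError("vector number out of range")
    aligned_sp = sp & ~0x3                 # ASSUMPTION: align down to 4 bytes
    frame_sp = aligned_sp - FRAME_BYTES    # room for the 8-byte frame
    vector_address = table_base + vector_number * VECTOR_ENTRY_BYTES
    return frame_sp, vector_address
```

Under these assumptions, a misaligned stack pointer of 0x2003 is first rounded down to 0x2000, and the frame is then built at 0x1FF8.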
The resulting ColdFire ISA thus represents a balance between core size and code expansion, while retaining the 68000-family programming model with its powerful set of basic addressing modes. The static size of embedded applications in the ColdFire ISA is typically 20 percent to 40 percent smaller than with fixed-length instruction sets. In relation to its predecessor, the ColdFire ISA produces object images that are considerably smaller than 68000 object images, but not as compact as objects targeted for the 68040.
A Tale of Two Pipelines
The hardware implementation of the ColdFire architecture uses a synthesis-driven, tools-based design philosophy. This allows the addition of optional hardware modules that provide custom functions and tune the processor's performance. It also provides design independence across different process technologies that target a range of operating frequencies and voltages. Finally, this approach also produces quick design cycles.
Two decoupled pipelines implement the ColdFire processor core: an IFP (Instruction Fetch Pipeline) and an OEP (Operand Execution Pipeline). A 12-byte FIFO (first-in/first-out) instruction buffer decouples the two pipelines (see the figure "ColdFire Processor Block Diagram"). Note that the core features a non-Harvard implementation to minimize die size and bus complexity. Studies indicate a full Harvard architecture provides only a minimal improvement in performance.
As the figure shows, the IFP itself consists of two stages, an IAG (Instruction Address Generation) stage and an IC (Instruction Fetch Cycle) stage. The OEP also consists of two stages, each of which can perform multiple functions, depending on the instruction type. The first stage of the OEP is the DSOC (Decode and Select/Operand Fetch Cycle), and the second stage is the operand AGEX (Address Generation/Execute Cycle).
The IFP calculates the next instruction address and then fetches 32 bits of instruction data using the single-cycle processor/memory bus. Typically, the processor is connected to an on-chip memory, either in the form of a RAM/ROM structure or a unified cache. As the fetched instruction enters the processor, it is loaded into the FIFO instruction buffer. If the OEP is waiting for instruction data, the prefetched instruction is also gated directly into its instruction registers. The connection between the two pipelines is a 48-bit interface, a ColdFire instruction's maximum size. The ColdFire architecture's variable-length instructions include a 16-bit op code, an optional 16-bit extension word 1, and an optional 16-bit extension word 2. The IFP connected to the FIFO instruction buffer provides a very efficient mechanism for loading the variable-length ColdFire instructions into the OEP with a minimum of idle cycles.
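The sizes quoted above fit together with simple arithmetic, sketched here. The function names are hypothetical, and the fetch count is an idealized lower bound that ignores branches, alignment, and memory stalls.

```python
WORD_BYTES = 2    # 16-bit op code or extension word
FIFO_BYTES = 12   # instruction buffer size given in the article
FETCH_BYTES = 4   # the IFP fetches 32 bits at a time


def instruction_bytes(extension_words):
    """Length of one instruction: op code plus 0, 1, or 2 extension words."""
    if extension_words not in (0, 1, 2):
        raise ValueError("an instruction has 0, 1, or 2 extension words")
    return WORD_BYTES * (1 + extension_words)


def min_fetches(stream):
    """Idealized count of 32-bit fetches for a straight-line instruction stream.

    `stream` is a list of extension-word counts, one entry per instruction.
    """
    total = sum(instruction_bytes(n) for n in stream)
    return -(-total // FETCH_BYTES)   # ceiling division
```

A maximum-size instruction is 6 bytes (48 bits), so the 12-byte buffer holds two of them, or six 16-bit instructions.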
The OEP is based on the traditional RISC compute engine structure with a dual read-ported register file feeding an ALU. Register-to-register instructions are executed in a single pipeline cycle: the operands are fetched during the OC (Operand Fetch Cycle) phase of the OEP pipeline, and the actual execution is performed in the EX (execute) phase of the OEP pipeline.
The ColdFire ISA is not a pure load/store architecture, so there are numerous compound instructions that combine a load operation with some type of arithmetic or logical operation. These "embedded-load" instructions essentially pass through the OEP twice. This type of instruction begins by selecting the components needed to form the operand address in the DS (decode and select) phase of the OEP's first stage. Next, the ALU sums the components to form the operand address during the AG (address generation) phase in the pipeline's second stage. During the third cycle, memory is read and the desired operand returned to the core. At the same time, any required register operand is fetched during the OC phase in the OEP pipeline. Finally, the instruction is actually executed in the ALU during the EX phase in the OEP pipeline. Register store operations perform both functions (DS + OC, and AG + EX) simultaneously in each stage of the OEP to execute the instruction in a single cycle.
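The stage usage described above can be written down as a small lookup table. This is only a reading aid, not a cycle-accurate model: the instruction-class names are hypothetical, and each entry pairs an OEP stage (DSOC or AGEX) with the phase performed in it.

```python
# Each trace lists (stage, phase) in the order the instruction occupies the OEP.
OEP_TRACES = {
    # Register-to-register: one pass, operands fetched, then executed.
    "register": [("DSOC", "OC"), ("AGEX", "EX")],
    # Embedded load: first pass forms the address, second pass (with the
    # memory read returning the operand) fetches operands and executes.
    "embedded_load": [("DSOC", "DS"), ("AGEX", "AG"),
                      ("DSOC", "OC"), ("AGEX", "EX")],
    # Register store: both functions performed at once in each stage.
    "register_store": [("DSOC", "DS+OC"), ("AGEX", "AG+EX")],
}


def oep_passes(kind):
    """Number of times an instruction class enters the first OEP stage."""
    return sum(1 for stage, _ in OEP_TRACES[kind] if stage == "DSOC")
```

In this model, oep_passes("embedded_load") is 2, matching the statement that such instructions pass through the OEP twice, while the other two classes make a single pass.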
The results of the ColdFire design can be seen in the table "Comparing ColdFire to Other 68000 Designs." It compares today's 68000 design (the 68EC000), the latest 68040 design, and a possible ColdFire implementation. The ColdFire architecture provides 68040 levels of performance at a given frequency in a core size smaller than the original 68000 design. The RISC-based implementation approach provides higher operating frequencies while still maintaining the advantages of a variable-length ISA. For cost-driven embedded systems, this variable-length ISA can provide substantial benefits over a fixed-length approach. Additionally, this new architecture maintains compatibility with the substantial 68000-family embedded development tool sets and preserves the knowledge base of engineers and programmers.
COMPARING COLDFIRE TO OTHER 68000 DESIGNS

                              68EC000      68040V                        COLDFIRE
Process technology (microns)  0.8          0.5                           0.5
                              3.3 V, DLM   3.3 V, TLM                    3.3 V, TLM
Core size (sq mm)             11.8         18.4                          4.4
Frequency (MHz)               16.67        25                            50
On-chip cache                 None         4 KB instruction, 4 KB data   4 KB unified
External bus (bits)           16           32                            32
Performance
  Embedded code               1.0x         11.6x                         20.2x
  Dhrystone MIPS              2.1          24.6                          44.3
  MIPS/watt                   42           36                            197
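Two ratios that the table does not list can be derived from its rows: Dhrystone MIPS per MHz and Dhrystone MIPS per square millimeter of core. The sketch below only divides the published figures; the dictionary layout and function name are hypothetical, and the ratios are not measurements from the article.

```python
# Figures copied from the table "Comparing ColdFire to Other 68000 Designs".
CHIPS = {
    "68EC000":  {"core_mm2": 11.8, "mhz": 16.67, "dmips": 2.1},
    "68040V":   {"core_mm2": 18.4, "mhz": 25.0,  "dmips": 24.6},
    "ColdFire": {"core_mm2": 4.4,  "mhz": 50.0,  "dmips": 44.3},
}


def derived(chip):
    """Return Dhrystone MIPS per MHz and per sq mm of core for one design."""
    d = CHIPS[chip]
    return {
        "dmips_per_mhz": d["dmips"] / d["mhz"],
        "dmips_per_mm2": d["dmips"] / d["core_mm2"],
    }
```

Per MHz, the ColdFire figure (about 0.89) is close to the 68040V figure (about 0.98). Per square millimeter of core, ColdFire delivers roughly 10 Dhrystone MIPS against about 1.3 for the 68040V.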
COLDFIRE PROCESSOR BLOCK DIAGRAM
To improve performance, the ColdFire processor uses two pipelines. Note that the OEP's output is routed in such a way that compound instructions can pass through this pipeline twice.
Joe Circello is an advanced microprocessor architect for Motorola's High Performance Embedded Systems Division. You can reach him on the Internet at circello@oakhill.sps.mot.com or on BIX c/o "editors."