stream requires lots of processing power. Until recently, you needed costly multiprocessor arrays or custom hardware to achieve MPEG-2 encoding and editing in real time.
A low-cost processor from C-Cube Microsystems, the DVx, changes the situation. It is a 0.35-micron, 3.3-V part that contains 5.4 million transistors, packaged in a 352-pin ball-grid array (BGA). While the DVx operates at a modest 100 MHz, it performs professional-quality, real-time MPEG-2 encoding using only one-fourth the data rate of today's DV and M-JPEG video formats.
This lets a PC capture, encode, and store digital video on its standard hard disk or a recordable DVD, rather than use a dedicated disk array. Because the DVx combines MPEG-2 encoding/decoding and video-effects functions on a single chip, it makes MPEG-based, frame-accurate video editing available to the serious consumer for the first time.
DVx Architecture
The DVx architecture is based on the experience obtained from three previous encoder generations. Internally, the DVx consists of several semi-independent units, as shown in the figure "The DVx Microarchitecture."
A SPARC RISC core performs high-level processing, complemented by motion estimation (ME) and video digital signal processor (DSP) units that handle compute-intensive, low-level processing. All these parts work concurrently to perform the operations required for real-time encoding.
The core acts as a microcontroller and operates at 80 MIPS. This software-based architecture lets you add new features or correct bugs without changing the hardware. A 16-KB instruction cache ensures that no cache misses occur in major processing loops. The DVx has an on-chip 8-KB data memory that's managed by overlapped, software-controlled DMA transfers. This memory replaces the traditional data cache to guarantee real-time performance.
The video DSP is a high-level coprocessor extending the SPARC instruction set to include image-processing and encoding operations. Its nearly autonomous operation lets the DVx use a less complex and smaller single-scalar core. The video DSP coprocessor consists of a DMA unit and a DSP unit, each connected to a double-buffered working memory composed of two banks of 4 KB each. At any given moment, the DMA unit is both loading new operands into one memory bank and storing prior results from it, while the DSP unit processes data in the other bank.
When the DMA and DSP units complete their tasks, the roles of the two banks are reversed. This lets video DSP operations overlap with the synchronous DRAM (SDRAM) data transfers necessary to sustain their throughput.
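The bank swapping described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `run_double_buffered`, the tuple tags, and the step-by-step scheduling are hypothetical, and the two units, which run concurrently in hardware, are simulated here one after the other within each step.

```python
def run_double_buffered(blocks, process):
    """Toy model of a two-bank (ping-pong) working memory.

    Each step, the "DMA unit" stores prior results from its bank and loads
    new operands into it, while the "DSP unit" processes the other bank.
    The banks then trade roles. All names here are illustrative.
    """
    banks = [None, None]      # two working-memory banks
    stored = []               # results written back to external memory
    dma, dsp = 0, 1           # which bank each unit currently owns

    # Two extra steps drain the last operands through the pipeline.
    for step in range(len(blocks) + 2):
        # DMA unit: store prior results from its bank, then load new operands.
        if banks[dma] is not None and banks[dma][0] == "result":
            stored.append(banks[dma][1])
        banks[dma] = ("operand", blocks[step]) if step < len(blocks) else None

        # DSP unit: process the data held in the other bank.
        if banks[dsp] is not None and banks[dsp][0] == "operand":
            banks[dsp] = ("result", process(banks[dsp][1]))

        # Both units are done: reverse the roles of the two banks.
        dma, dsp = dsp, dma

    return stored


print(run_double_buffered([1, 2, 3], lambda x: x * 2))  # prints [2, 4, 6]
```

Results leave the pipeline two steps after their operands enter it, which is the latency cost of overlapping transfers with computation.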
DMA-unit instructions load and store rectangular subsections (i.e., strips) of an image between working memory and the external SDRAMs. One strip-load instruction implements the various flavors of motion compensation defined in the MPEG standard. The DMA unit converts motion vectors generated by the ME unit into image-strip addresses, while the SDRAM controller performs alignment and subpixel interpolation on the reference data.
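To make the idea of a motion-compensated strip load concrete, here is a minimal sketch that copies a rectangular strip out of a frame at a position displaced by a motion vector. The function `load_strip` and its parameters are hypothetical, the frame is a plain list of rows, and only whole-pixel displacements are handled; the subpixel interpolation that the article assigns to the SDRAM controller is left out.

```python
def load_strip(frame, top, left, height, width, mv=(0, 0)):
    """Return a height-by-width strip of `frame`, displaced by motion vector `mv`.

    `mv` is (dy, dx) in whole pixels. This is an illustrative model only,
    not the DVx instruction itself.
    """
    dy, dx = mv
    y0, x0 = top + dy, left + dx          # motion vector -> strip address
    if y0 < 0 or x0 < 0 or y0 + height > len(frame) or x0 + width > len(frame[0]):
        raise ValueError("strip falls outside the frame")
    return [row[x0:x0 + width] for row in frame[y0:y0 + height]]


# A 4x4 test frame whose pixel value encodes its position (row * 10 + column).
frame = [[10 * r + c for c in range(4)] for r in range(4)]
print(load_strip(frame, 1, 1, 2, 2))               # prints [[11, 12], [21, 22]]
print(load_strip(frame, 1, 1, 2, 2, mv=(1, -1)))   # prints [[20, 21], [30, 31]]
```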
Image Encoding and Performance
MPEG-2 encoding works by examining a succession of images (or frames) and removing redundant information from them (e.g., the blank wall in a scene's background can be stored once and reused in subsequent frames until the scene's point of view changes). This requires the DVx to have a high-throughput, robust ME mechanism to determine what image information has changed between frames.
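A motion-estimation search can be sketched as block matching: slide a target block over a window of the reference frame and keep the displacement with the lowest mismatch. The code below is a sketch under assumptions, not the DVx algorithm. It uses an exhaustive search and the sum of absolute differences (SAD), a common matching cost; the article does not say which cost or search pattern the chip uses, and the function names are hypothetical.

```python
def sad(target, ref, top, left):
    """Sum of absolute differences between `target` and the same-sized
    block of `ref` whose upper-left corner is at (top, left)."""
    return sum(
        abs(pixel - ref[top + r][left + c])
        for r, row in enumerate(target)
        for c, pixel in enumerate(row)
    )


def full_search(target, ref, top, left, radius):
    """Try every whole-pixel displacement within +/- radius of (top, left).

    Returns (best_cost, (dy, dx)), or None if no candidate fits in `ref`.
    """
    h, w = len(target), len(target[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue                      # candidate leaves the frame
            cost = sad(target, ref, y, x)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Reference frame with unique pixel values; the target is the 2x2 block at (2, 3).
ref = [[10 * r + c for c in range(6)] for r in range(6)]
target = [row[3:5] for row in ref[2:4]]
print(full_search(target, ref, top=1, left=2, radius=2))  # prints (0, (1, 1))
```

An exhaustive search costs one SAD per candidate position, which is why real-time encoders lean on dedicated search hardware and hierarchical strategies.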
A list of commands, generated by the core and stored in SDRAM, controls the programmable ME search engine. The engine fetches search commands from memory and writes the results back into it. As each command executes, the ME unit loads the appropriate target and reference image data from SDRAM into its on-chip target and reference window memories. These memories are double-buffered to allow the next target's SDRAM accesses to overlap with the search for the current target.
After all the search commands have been processed, an interrupt notifies the core. The core might use the command results to generate more search commands for the next level in a hierarchical search, or to perform motion compensation in the video DSP. Although the ME unit off-loads much of the burden from the core, microcode retains full control of the critical search parameters. This gives the flexibility of a CPU-controlled search engine, but with the performance of a hard-wired engine.
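The division of labor between the core and the search engine can be pictured with a toy coarse-to-fine search. Everything in this sketch is an assumption made for illustration: the command format `(center, step)`, the one-dimensional search space, the three-candidate pattern, and the function names. A function return stands in for the completion interrupt.

```python
def me_engine(commands, cost):
    """Stand-in for the search engine: execute every command in the list
    and return one result per command. Each command is (center, step)."""
    results = []
    for center, step in commands:
        candidates = (center - step, center, center + step)
        results.append(min(candidates, key=cost))
    return results


def core_hierarchical_search(cost, start=0, first_step=4):
    """Stand-in for the core: build a command list, wait for the engine,
    then turn the results into finer commands for the next level."""
    commands = [(start, first_step)]
    step = first_step
    while True:
        results = me_engine(commands, cost)   # returning models the interrupt
        step //= 2
        if step == 0:
            return results[0]
        commands = [(best, step) for best in results]


# Hypothetical cost: distance from a "true" displacement of 5.
print(core_hierarchical_search(lambda p: abs(p - 5)))  # prints 5
```

The core decides the search parameters at every level, while the inner comparison loop stays in the engine, which mirrors the split the article describes.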
To encode high-resolution formats such as HDTV, multiple DVx processors can operate in parallel to divide the processing task. Previously, video-processing chips were interconnected by a globally shared bus. However, as the number of chips on the shared bus increases, their combined traffic reaches the limit of the bus's bandwidth. This prevents further scaling of performance.
Instead, the DVx uses a point-to-point architecture that scales directly with the number of chips. The DVx chip's interprocess communications (IPC) channels can be interconnected to build multiprocessor arrays, as shown in the figure "Point-to-Point Communications." Through the IPC ports, multiple DVx chips coordinate processing operations to encode all proposed digital HDTV formats. Two DVx chips can encode the 525P format, and eight to 10 chips are necessary to encode an HDTV 1080I format. (It takes only two DVx chips to decode all HDTV video formats.)
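The scaling argument can be put in back-of-the-envelope form. In the sketch below, the function names and all bandwidth figures are hypothetical and use arbitrary units; the article gives no bus or link bandwidths. The model simply assumes that a shared bus caps total traffic at one fixed bandwidth, while each point-to-point link brings its own.

```python
def shared_bus_throughput(n_chips, per_chip, bus_bandwidth):
    """Aggregate traffic on one shared bus: capped by the bus itself."""
    return min(n_chips * per_chip, bus_bandwidth)


def point_to_point_throughput(n_chips, per_chip, link_bandwidth):
    """Aggregate traffic when every chip has its own link: each chip is
    limited only by its own link, so the total grows with the chip count."""
    return n_chips * min(per_chip, link_bandwidth)


# Illustrative numbers only: each chip wants 10 units of traffic.
for n in (2, 4, 8):
    print(n,
          shared_bus_throughput(n, per_chip=10, bus_bandwidth=40),
          point_to_point_throughput(n, per_chip=10, link_bandwidth=20))
# prints:
# 2 20 20
# 4 40 40
# 8 40 80
```

With these made-up figures the shared bus saturates at four chips, while the point-to-point arrangement keeps growing linearly.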
System Configuration
The DVx provides glueless interfaces to several of the PC's subsystems. It has a 32-bit PCI host bus interface (revision 2.1-compliant), a programmable CCIR-656 (parallel D1) video interface, and an eight-channel I2S-compatible audio I/O interface. The DVx uses 8 MB of SDRAM, made up of four 16-Mb parts. Because you don't need external first-in/first-out (FIFO) buffers and other logic, the DVx further reduces the cost of adding the chip to a PC. To make systems capable of encoding and manipulating HDTV video formats, you simply add the DVx chips you need, depending on the target audience. With recordable DVD, a DVx PC offers professional-quality MPEG-2 video recording and authoring at entry-level prices.
Figure, "The DVx Microarchitecture": The various units operate concurrently to capture and encode digital video on the fly.
Figure, "Point-to-Point Communications": A specialized bus lets processors work in parallel to encode HDTV video formats.
Les Kohn (editors@bix.com) is chief architect of C-Cube Microsystems' DVx family of processors. Greg Efland (editors@bix.com) is the chief architect of the DVx.