nes. Thus, you can now create high-quality MPEG videos for presentations, training videos, and communications on the Internet.
CLM4111 Architecture
The CLM4111 performs all the processing required to turn uncompressed digital video from a video-capture device into an MPEG-1 video data stream. To provide this capability at low cost, C-Cube took a programmable approach, enhancing a 32-bit RISC core, the VideoRISC CPU, with special instructions and hardware coprocessors, as shown in the figure "The CLM4111 Microarchitecture."
The VideoRISC CPU operates at 80 MHz with single-cycle instruction execution. Integrated 1-KB instruction and data caches provide single-cycle access to data for both the VideoRISC CPU and Motion Estimator (ME) coprocessor to reduce stalls.
Besides the standard operations, the VideoRISC instruction set includes single instruction/multiple data (SIMD) instructions. These SIMD instructions perform operations on multiple pixels, and C-Cube widened the CPU's internal data paths and registers to 36 bits to support processing four 8-bit pixels in parallel. The SIMD instructions execute on a dedicated Video DSP (digital signal processor) ALU. Video DSP instructions accelerate filtering, discrete-cosine-transform (DCT) calculations, and the image analysis required for MPEG compression.
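To make the idea of operating on four pixels at once concrete, here is a minimal software sketch that packs four 8-bit pixels into one word and averages two such words lane by lane. It is not the VideoRISC instruction set, which this article does not document; the function names are hypothetical, and the sketch uses a plain 32-bit word rather than the chip's 36-bit data path.

```python
LANE_MASK = 0xFEFEFEFE  # clears the low bit of every 8-bit lane


def pack(pixels):
    """Pack four 8-bit pixels into one 32-bit word (first pixel in the low byte)."""
    assert len(pixels) == 4 and all(0 <= p <= 255 for p in pixels)
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (8 * i)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit pixels."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def average4(a, b):
    """Average four pixel pairs at once, rounding down.

    For each lane, x + y == 2 * (x & y) + (x ^ y), so the floor of the
    average is (x & y) + ((x ^ y) >> 1). Masking before the shift keeps a
    bit from one lane from sliding into its neighbor.
    """
    return (a & b) + (((a ^ b) & LANE_MASK) >> 1)
```

For example, `unpack(average4(pack([0, 255, 10, 200]), pack([0, 255, 20, 100])))` yields `[0, 255, 15, 150]`: one pass over the word does the work of four separate pixel operations.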
Coprocessors and I/O
The CLM4111 has several on-chip coprocessors that boost the data-encoding rate. Operating on 8- by 8-byte data arrays, the ME coprocessor performs the repeated block matches required to find the best motion vector when coding MPEG B and P frames.
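The repeated block matching described above can be sketched in a few lines. This is an illustrative full search using the sum of absolute differences (SAD) as the match cost; the article does not say which cost function or search pattern the ME coprocessor uses, so both are assumptions, as are the function names and the search range.

```python
def sad(cur, ref, cx, cy, rx, ry, n=8):
    """Sum of absolute differences between an n-by-n block of the current
    frame at (cx, cy) and a block of the reference frame at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def best_motion_vector(cur, ref, cx, cy, search=4, n=8):
    """Try every displacement within +/- search pixels and return
    (dx, dy, cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

Even this small search evaluates 81 candidate positions of 64 pixels each for a single block, which suggests why the work is handed to a dedicated coprocessor.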
The variable-length-coder (VLC) coprocessor performs the final lossless compression stage of the MPEG algorithm. The VLC does a zigzag scan of each data block to order the data for maximum compressibility, followed by run-length and Huffman encoding. The VLC uses ROM-based lookup tables to implement MPEG-1, MPEG-2, and H.261 encoding schemes. The variable-length-decoder (VLD) coprocessor performs the reverse processing necessary to decode an MPEG data stream.
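The zigzag scan and run-length steps are easy to illustrate in software. The sketch below generates the conventional 8-by-8 zigzag order and turns a block of quantized coefficients into (run, level) pairs. The Huffman stage is left out because it depends on the standard's code tables, which the chip holds in ROM and which are not reproduced here. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zigzag order: walk the anti-diagonals,
    going down-left on odd diagonals and up-right on even ones."""
    def key(rc):
        r, c = rc
        diagonal = r + c
        return (diagonal, r if diagonal % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(block):
    """Return the DC coefficient and a list of (run, level) pairs for the
    AC coefficients, where run counts the zeros preceding each nonzero level.
    Trailing zeros are dropped, as an end-of-block code would imply them."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs = []
    run = 0
    for level in scan[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scan[0], pairs
```

Because quantization leaves most high-frequency coefficients at zero, the zigzag order tends to gather those zeros into long runs at the end of the scan, which is what makes the run-length step effective.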
To keep the large amounts of video data flowing through the processor, the CLM4111 has four I/O interfaces. Two 8-/16-bit video interfaces handle digital video data, and each one of these has a 32- by 32-bit first-in/first-out (FIFO) buffer to hold the data. The video input port can interface directly to CCIR-601 signal-compatible video decoders. The data output port can transfer data to the host application asynchronously.
The third interface, a 16-/32-bit host port, is used to initialize the processor and download microcode. The CLM4111 has a message-based API that controls the MPEG encoder and its compression parameters. The host system issues the commands through this interface.
The last interface is a 32-bit DRAM interface that connects to 2 MB of fast page-mode DRAM. The CLM4111 uses this DRAM as a scratchpad to store temporary data during the encoding process. This interface generates the signals necessary to drive the DRAM bank, thus reducing parts in a design. A seven-channel on-chip DMA controller prevents congestion by managing data transfers.
Making MPEG
A description of how the CLM4111 encodes video data shows the complex processing required to achieve real-time MPEG encoding. In the capture stage, the processor receives uncompressed video frames through the video input port. The input interface's hardware performs subsample filtering and 2-to-1 horizontal scaling of the data. The DMA controller transfers the processed data from the input FIFO buffer to a buffer in DRAM.
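As a rough illustration of filtering followed by 2-to-1 horizontal scaling, the sketch below smooths each kept pixel with its neighbors and then drops every other sample. The 1-2-1 filter is an assumption chosen for simplicity; the article does not give the taps the input hardware actually uses, and the function name is hypothetical.

```python
def halve_row(row):
    """Return a row with half as many pixels.

    Each output pixel is a 1-2-1 weighted average centered on an
    even-numbered input pixel, rounded to nearest. Edge pixels reuse
    themselves in place of the missing neighbor.
    """
    out = []
    last = len(row) - 1
    for x in range(0, len(row), 2):
        left = row[max(x - 1, 0)]
        right = row[min(x + 1, last)]
        out.append((left + 2 * row[x] + right + 2) // 4)
    return out
```

Filtering before discarding samples reduces the aliasing that would result from simply keeping every second pixel.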
In the preprocessing stage, the DMA controller transfers frame data to the VideoRISC data cache. The VideoRISC CPU and Video DSP perform additional filtering, scaling to Quarter Source Input Format (QSIF) resolution, and image analysis to determine rate-control settings (i.e., how to encode the current frame with the highest quality). The resulting data is transferred back to DRAM for use in the subsequent encoding stage.
Target and search data are also transferred to the data cache for processing by the ME coprocessor. The resulting best-match information is stored by the VideoRISC CPU in DRAM.
The filtered video data, the ME's motion vectors, and the preprocessing stage's coding instructions are now available for the MPEG-encoding stage. The Video DSP performs the motion compensation, DCT, and quantization on each 8- by 8-byte data block. These results are transferred to the VLC. The VLC unit performs the run-length and Huffman coding of the data, with the output again transferred by the DMA controller to DRAM. In the output stage, the CLM4111 transfers the compressed data frames to the host through the video output port, again under DMA control.
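The DCT and quantization steps named above can be sketched as follows. This is a direct, unoptimized 8-by-8 DCT written for clarity, not the algorithm the Video DSP runs, and the single uniform step size is a simplifying assumption: MPEG-1 quantization uses a matrix of per-coefficient weights together with a quantizer scale.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N-by-N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, step=16):
    """Divide every coefficient by one step size and round to an integer.
    (Hypothetical uniform quantizer, used here only for illustration.)"""
    return [[int(round(c / step)) for c in row] for row in coeffs]
```

A flat block shows the effect: for 64 pixels of value 100, all the energy lands in the single DC coefficient and every other quantized coefficient is zero, which is exactly the kind of data the VLC compresses well.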
Microcode Machinations
The CLM4111's microcode manages the overall operation of the processor and implements C-Cube's MPEG compression algorithm. The microcode provides the flexibility to handle different video-frame sizes (SIF and QSIF resolution) and rates (NTSC and PAL). The microcode implements the MPEG algorithm's "smarts," making on-the-fly decisions about how to best encode each frame. The microcode is also responsible for managing the data flow through the processor, such as scheduling DMA transfers and coprocessor execution.
The CLM4111's internal architecture lets the microcode implement a software pipeline. Because most of the MPEG-encoding stages use different on-chip resources, the microcode arranges them to execute in parallel, boosting throughput. Finally, the microcode can program the CLM4111 to perform different functions, depending on the targeted market.
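A small model shows what such a software pipeline buys. The stage names below follow the encoding walk-through in this article, but the one-frame-per-step granularity and the function name are assumptions; the article does not describe the schedule the microcode actually uses.

```python
STAGES = ["capture", "preprocess", "motion-estimate", "encode", "output"]


def pipeline_schedule(num_frames, stages=STAGES):
    """Return, for each time step, a dict mapping stage name to the frame
    number that stage is working on. Frame f enters stage s at step f + s,
    so once the pipeline fills, every stage is busy with a different frame."""
    steps = []
    for t in range(num_frames + len(stages) - 1):
        active = {}
        for s, name in enumerate(stages):
            frame = t - s
            if 0 <= frame < num_frames:
                active[name] = frame
        steps.append(active)
    return steps
```

In this model three frames finish in seven steps rather than the fifteen that strictly sequential processing would take, and the gap widens as more frames flow through.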
CLM4111 System Design
The CLM4111 is a 3.3-V part, using a three-metal-layer, 0.5-micron process. It's housed in a 208-pin Plastic Quad Flat Package (PQFP). It dissipates only 1 W at 80 MHz. To help OEMs incorporate the CLM4111 into their products, C-Cube provides a reference design, as shown in the figure "The CLM4111 Reference Schematic."
Because it has a generic host interface and outputs fully compressed MPEG video data, you can connect the CLM4111 to ISA or PCI buses or integrate it on a graphics accelerator board.
The CLM4111 costs $75 in production quantities (i.e., lots of 5000). The reference schematic's estimated bill of materials is $175. These costs are comparable to those of a quality graphics card. The CLM4111's capabilities and price point make it attractive as an add-on for vendors wishing to differentiate their systems. Because the CLM4111 lets you capture, edit, store, and communicate with digital video, MPEG becomes an active data type and useful to everyone. The PC finally becomes an active -- not passive -- tool for video work.
Where to Find
C-Cube Microsystems
Milpitas, CA
Phone: (408) 944-6300
Fax: (408) 944-6314
Internet: http://www.c-cubed.com/