Thanks to new hardware and software technologies, accelerated motion-video playback is no longer a premium, and MPEG is on the move
Stanford Diehl and Greg Loveria
A very short time ago, accelerated playback of digital-video files was a value-added feature that differentiated the commodity market for Windows graphics cards. A number of technological and market developments promise to drive motion-video playback to the mass market. Given the power of today's mainstream hardware, video bandwidth can now be negotiated across a local bus instead of the slower system bus, and high-end CPUs can crunch more sophisticated decompression algorithms. Video-enhanced titles for training, reference, education, and entertainment are in high demand. And virtually every graphics-chip vendor, enabled by Microsoft Windows' DCI (Display Control Interface), has announced a graphics architecture to support full-motion playback of digital video.
Digital-Video Playback
A number of factors affect video playback quality under Windows. The first is frame rate, measured in frames per second. To ensure quality, the video must be captured at an acceptable frame rate. The standard for TV-quality, full-motion video is 30 fps.
Video for Windows will drop frames to match the capability of the playback hardware, producing a fluid or jerky motion depending on the system it's played back on. The number of colors the sequence was captured at also affects quality because more data flows across the video adapter's data bus. If the video sequence was captured at 24-bit color depth, you have three times more data to move across the display bus than with an 8-bit (256 colors) video clip.
An uncompressed 24-bit video file, recorded at 640- by 480-pixel resolution and at 30 fps, would require a throughput rate of over 26 MBps. Clearly, the video data must be compressed. Compression not only allows more video to be stored on your computer's hard drive, it also lowers the bandwidth requirements for video playback.
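The throughput figure above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that 1 MB means 2**20 bytes; the function name is ours and is used only for illustration.

```python
def uncompressed_rate_bytes(width, height, bits_per_pixel, fps):
    """Bytes per second needed to move uncompressed video."""
    return width * height * (bits_per_pixel // 8) * fps

# The 640- by 480-pixel, 30-fps clip from the text, at two color depths.
rate_24 = uncompressed_rate_bytes(640, 480, 24, 30)
rate_8 = uncompressed_rate_bytes(640, 480, 8, 30)

print(rate_24)                  # bytes per second at 24-bit color
print(rate_24 / (1024 * 1024))  # the same rate in MBps (1 MB = 2**20 bytes)
print(rate_24 // rate_8)        # how many times more data than 8-bit color
```

The 24-bit clip works out to 27,648,000 bytes per second, or a little over 26 MBps, and it carries three times the data of the same clip at 8-bit color.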
While compression algorithms significantly reduce bandwidth requirements, they demand intensive computational resources. Dedicated hardware has been required to decompress the video data at an acceptable rate while the host CPU took care of other chores, such as color space conversion (converting the video data from the compressed YUV format used for motion video to the RGB format necessary for display on computer monitors) and video scaling (scaling algorithms help maintain video quality when the video window is stretched beyond the captured size).
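To give a sense of the per-pixel work involved in color space conversion, here is a minimal sketch of a YUV-to-RGB conversion for one pixel. The article does not say which YUV variant or coefficients the products use; the sketch assumes 8-bit, full-range values and the widely used ITU-R BT.601 coefficients, and the function names are illustrative.

```python
def clamp8(x):
    """Round to the nearest integer and limit the result to 0..255."""
    return max(0, min(255, int(round(x))))

def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to RGB.

    Assumes full-range values with U and V centered on 128 and
    BT.601 coefficients; real hardware may use other variants.
    """
    d = u - 128
    e = v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)

# With neutral chroma (U = V = 128) the pixel is a shade of gray.
print(yuv_to_rgb(128, 128, 128))
print(yuv_to_rgb(255, 128, 128))
```

Each pixel costs several multiplications and additions, and a 320- by 240-pixel clip at 30 fps needs this done more than two million times per second, which is why off-loading it from the host CPU matters.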
Video Playback's First Pass
Digital-video boards for Windows have been available for quite some time, but they have been expensive and difficult to install and use. Sigma Designs was the first company to successfully bring digital-video playback to a mainstream audience with its RealMagic board (dubbed ReelMagic at the time). RealMagic provided hardware-based decompression of MPEG video files. Sigma employed C-Cube's CL450 video processor along with its own proprietary video-acceleration chip (called Piccolo) to perform all pixel interpolation, line doubling, smoothing, and scaling algorithms.
RealMagic proved that there was a market for MPEG decompression boards, even at a time when few MPEG titles were available, but the board suffered from some limitations. RealMagic had no on-board VGA. Digital-video boards have typically relied on VGA pass-through, routing the VGA signal across a feature connector. Given the bandwidth limitations of a standard VGA feature connector, the graphics subsystem is confined to a maximum 640- by 480-pixel screen resolution. The feature-connector architecture has also been plagued by performance and compatibility problems.
Color shifts and shimmering problems afflict some models of VL-Bus cards when connected to RealMagic through the VGA feature connector. According to Sigma, this problem is caused by the way the VESA (Video Electronics Standards Association) pass-through specification delivers MPEG 1 video's 15-bit color depth (the same color depth as NTSC TV) to a VL-Bus display adapter working in higher color modes.
Sigma recently announced RealMagic Rave, its MPEG playback adapter with an on-board graphics accelerator. It will continue to offer RealMagic Lite as well, but if you opt for the feature-connector solution, you should call Sigma first and make sure your VL-Bus adapter works correctly with RealMagic. Despite the limitations, RealMagic continues to be a driving force in pushing MPEG 1 as a major digital-video standard. With a compatible graphics adapter, the quality of video is outstanding. However, RealMagic does not accelerate more common software codecs such as Microsoft's Video 1, Intel Indeo, and SuperMac Cinepak.
VideoLogic's AVI Accelerator
One of the first single-board solutions for graphics and video acceleration is VideoLogic's 928Movie. The card uses S3's 86C928 graphics accelerator with 32-bit memory interleaving. The primary purpose of the 928Movie is to accelerate motion-video playback of Indeo, Cinepak, and Microsoft Video 1 digital-video files.
The 928Movie uses VideoLogic's custom PowerPlay32 Digital Movie Accelerator ASIC (Application-Specific IC) and SmoothScale algorithm for YUV-to-RGB color space conversion and video scaling. The result is excellent full-motion playback even for video stretched beyond a 320- by 240-pixel window. Subjectively, though, we found that the AVI clips--even with the 928Movie's help--did not approach the quality of MPEG digital video. The motion is smooth, but the picture quality is somewhat blocky because of codec limitations. An announced upgrade to the Indeo codec may help.
The 928Movie also hosts a VMC (VESA Media Channel) architecture. The VMC provides an optimized data path for passing video data to other video components, such as capture cards, codec accelerators, or scan converters. By using the VMC, video components avoid passing data across the host system bus. Unfortunately, the general market has not embraced the VMC.
One of the VMC options VideoLogic offers is a hardware MPEG decoder. The $349 MPEG Player occupies a second slot and, like Sigma's RealMagic, uses C-Cube's CL450 acceleration chip. VideoLogic also employs its own Powerstream ASIC, a video-acceleration chip that works in conjunction with the CL450. Color palette shifts didn't affect video data passed across the VMC, even when we used the same graphics accelerator that caused problems with the RealMagic adapter.
We found the quality of VideoLogic's 928Movie, coupled with the MPEG Player adapter, simply superb. A 928Movie matched with the MPEG Player delivers a unified, expandable solution for accelerating AVI (Audio Video Interleave) and MPEG digital video. VideoLogic is now shipping a PCI (Peripheral Component Interconnect) adapter, the PCIMovie, with the PowerPlay32 video accelerator, a Weitek P9100 graphics accelerator, and the VMC architecture.
The DCI Interface
Before Intel and Microsoft released the software DCI layer, video accelerators such as the PowerPlay were very limited in what they were able to do. This limitation was not inherent to the chips themselves; video-playback software that could not take advantage of the specialized hardware imposed it.
Before DCI, Video for Windows would use the host CPU for decompression and YUV-to-RGB conversion and then pass the RGB data to the video subsystem. Under this scenario, a specialized motion-video chip would get the data only after it was converted to RGB format. The only video function left for it to accelerate was video scaling.
DCI is a low-level interface that allows the video-playback software direct access to hardware-specific capabilities of the video subsystem. DCI coordinates with the Windows GDI (Graphics Device Interface), allowing the GDI to be bypassed for video playback when appropriate. DCI-compliant applications can check for the presence of specialized video hardware through the hardware's DCI driver. The DCI driver can then directly access the video frame buffer to dramatically improve throughput. With DCI, the video accelerator's driver can instruct the playback software to pass YUV data to it, allowing the video chip to perform color space conversion instead of the host CPU (see the figure ``Hardware Video Acceleration'').
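The division of labor before and after DCI can be modeled in a few lines. This is only an illustrative sketch: the class, field, and function names are hypothetical and do not correspond to the actual DCI driver interface, which the article does not describe at that level.

```python
from dataclasses import dataclass

@dataclass
class VideoHardware:
    """Hypothetical capability report; not the real DCI driver structure."""
    accepts_yuv: bool = False         # can take unconverted YUV data
    scales_in_hardware: bool = False  # can stretch the video window itself

def plan_playback(hw):
    """Return (cpu_tasks, hardware_tasks) for one playback session."""
    cpu_tasks = ["decompress"]        # the codec always runs on the host here
    hardware_tasks = []
    if hw is not None and hw.accepts_yuv:
        hardware_tasks.append("yuv_to_rgb")
    else:
        cpu_tasks.append("yuv_to_rgb")
    if hw is not None and hw.scales_in_hardware:
        hardware_tasks.append("scale")
    else:
        cpu_tasks.append("scale")
    return cpu_tasks, hardware_tasks

# Before DCI: the accelerator sees only RGB data, so it can only scale.
print(plan_playback(VideoHardware(accepts_yuv=False, scales_in_hardware=True)))
# With DCI: the driver asks for YUV data and handles conversion as well.
print(plan_playback(VideoHardware(accepts_yuv=True, scales_in_hardware=True)))
```

In the second case the host CPU is left with decompression alone, which is the shift the rest of this article builds on.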
Windows Accelerators Do the Video Thing
The DCI design enables a device-independent way for digital-video codecs to access specialized hardware features. DCI promises to drive innovation from both the software end and the hardware end. The graphics subsystem can now request raw YUV data and then process the video data totally within the confines of the graphics architecture. The graphics-chip vendors have responded with a flurry of announcements, heralding optimized motion video and graphics acceleration within a coordinated architecture. Some architectures are already in place but will benefit greatly from the DCI initiative.
Weitek Corp. (Sunnyvale, CA, (408) 738-8400) uses a dedicated chip--the Video Power coprocessor--for video scaling, color space conversion, and dithering (to emulate high-color video in 256-color mode). And yet Weitek also integrates video and graphics acceleration into a single architecture. Unlike earlier feature-connector solutions, the Weitek Power 9100 graphics controller and the Video Power coprocessor share a single video-memory frame buffer (see the figure ``Video Architectures''). The shared frame buffer not only reduces the cost by requiring less video memory, it also enhances performance by passing video data along the frame-buffer bus instead of the system bus.
The Tseng Labs (Newtown, PA, (215) 968-0502) architecture relies on a single frame buffer. A shared frame buffer requires two memory controllers that must negotiate for access to the video memory. Instead of arbitrating the frame buffer between the video processor and graphics accelerator, the Tseng Labs W32p graphics chip uses a ``multiport cache'' design. A fast cache sits on the front end of the W32p. YUV data flows to a Viper entry port of the VGA (Viper is the Tseng Labs video-acceleration chip). The Viper accepts the data, converts and scales it, and then loads it into the multiport cache. All the display data--video and graphics--is then stored in the frame buffer. The single-frame-buffer design avoids any latency caused by arbitration between two controllers for buffer access.
Tseng Labs' latest video accelerator, the Viper f/x, supports screen resolutions of up to 1024 by 768 pixels by 24-bit color. Tseng Labs has also announced a single-chip solution for graphics and video acceleration. The company will continue to market a dedicated video processor as well, claiming that a dedicated video accelerator can support a wider range of video formats and functionality. A dedicated processor does not have to make as many size and cost trade-offs as a single-chip architecture, so it can support a wider range of YUV conversions, for instance.
By the same logic, a dedicated processor could support more sophisticated interpolation algorithms than is possible with a single-chip architecture. To scale video beyond native size, video chips must add pixels to enlarge the video window. These pixels can be created by replication (simply replicating an adjacent pixel) or by interpolation (using an algorithm to determine the optimum characteristics of the pixel). Clearly, interpolation is the preferred method, but interpolation algorithms vary widely. At the most basic level, the chip could simply average the color values of two adjacent pixels and create the new pixel with the resulting color value. Very little memory would be required to process this logical operation. But as more sophisticated algorithms are employed for pixel interpolation, more memory and chip complexity are required as well. Again, these requirements may exceed the size and cost limitations of a single-chip solution.
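The difference between replication and the most basic form of interpolation is easy to see on a single scan line. The sketch below doubles the width of a line of pixel values both ways; it treats each pixel as one number (a single color channel), and the function names are ours. For RGB data the same operation would be applied to each channel.

```python
def scale2x_replicate(line):
    """Double the width by repeating each pixel."""
    out = []
    for pixel in line:
        out.extend([pixel, pixel])
    return out

def scale2x_average(line):
    """Double the width by inserting the average of adjacent pixels.

    The last pixel has no right-hand neighbor, so it is repeated.
    """
    out = []
    for i, pixel in enumerate(line):
        neighbor = line[i + 1] if i + 1 < len(line) else pixel
        out.append(pixel)
        out.append((pixel + neighbor) // 2)
    return out

line = [0, 100, 200]
print(scale2x_replicate(line))  # [0, 0, 100, 100, 200, 200]
print(scale2x_average(line))    # [0, 50, 100, 150, 200, 200]
```

Replication produces visible steps, while averaging fills in intermediate values. The averaging version needs only the current pixel and its neighbor, which is why it takes so little memory; algorithms that look at more neighbors or at adjacent lines need correspondingly more buffering.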
Jazz Multimedia's Jakarta board uses the Tseng Labs chip combination--the Viper video accelerator and the ET4000/W32P graphics accelerator--as a base platform to build a modular video solution. The standard Jakarta board delivers video-playback acceleration, including hardware MPEG decompression and graphics acceleration. Snap-on modules add a TV tuner and NTSC/PAL output. The Jakarta represents a strategy many graphics vendors will adopt: Deliver a standard video-playback solution on the graphics card and add higher-end functionality through modular components. The latest version of MGA Impression Plus starts with a 64-bit graphics accelerator and DCI driver; a snap-on module includes the new 64-bit PowerPlay64 and a VMC connector to support any other VMC-compatible video hardware.
On-the-Fly Video
Alliance Semiconductor (San Jose, CA, (408) 383-4900) takes a similar approach to the Tseng Labs single-buffer design, but the Alliance ProMotion-3210 chip performs scaling and color space conversion as the video data shifts out of memory and to the screen. As the screen is being refreshed, the chip can switch color depth on-the-fly as it scans across the screen, sending 256 colors to the graphical desktop and 24-bit color to a video window. The single-chip solution supports full-motion, 24-bit video acceleration along with 1024- by 768-pixel by 256-color graphics acceleration within a single megabyte of DRAM. Alliance claims that its chip can enable motion-video acceleration for an additional cost of less than $10 per system.
Once again, DCI is the key to this technology. DCI creates a surface in video memory that can be on-screen or off-screen. This surface is an area that the video codec can write to directly. Different vendors take advantage of this capability in different ways. Currently, many implementations perform video scaling and color space conversion before sending the processed data to the frame buffer. Scaling and conversion in real time requires high-speed circuitry that can match the refresh rate of the computer monitor, but overall cost is lower because a small amount of DRAM can be used effectively. In addition, the Alliance chip delivers true high-color video, instead of resorting to dithering to simulate high color in the video window.
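A rough memory budget shows why converting and scaling at refresh time lets a single megabyte go so far. The sketch below is illustrative only: the 320- by 240-pixel clip size and the 16 bits per pixel for stored YUV data are our assumptions, not figures from Alliance.

```python
MB = 1024 * 1024

def surface_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one surface."""
    return width * height * bits_per_pixel // 8

# The desktop from the text: 1024 by 768 pixels at 256 colors (8 bits).
desktop = surface_bytes(1024, 768, 8)
free_after_desktop = MB - desktop

# Assumed video surface: 320 by 240 pixels stored as 16-bit YUV.
video_yuv = surface_bytes(320, 240, 16)

# For comparison, the whole screen held at 24-bit color.
full_screen_rgb = surface_bytes(1024, 768, 24)

print(desktop, free_after_desktop)  # 786432 262144
print(video_yuv)                    # 153600
print(full_screen_rgb)              # 2359296
```

Under these assumptions the unconverted video surface fits in the memory left over by the 256-color desktop, whereas holding the whole screen at 24-bit color would need more than 2 MB.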
All-in-One Chips
The trend is clearly toward integrating all video- and graphics-acceleration components onto a single slab of silicon. S3 (Santa Clara, CA, (408) 980-5400) has introduced the Vision868 (DRAM-based) and Vision968 (video-memory-based) Multimedia accelerators. The Vision series integrates a 64-bit graphics engine, color space conversion, scaling, and dithering on a single chip. The latest version of Diamond MultiMedia's Stealth 64 VRAM series will soon offer an extensible architecture featuring the Vision968. The baseline adapter comes with graphics and video acceleration; add-on modules enrich the architecture with MPEG playback and video capture.
Similarly, Cirrus Logic (Fremont, CA, (510) 623-8300) has announced its MotionVideo Architecture. Cirrus goes even further than S3: The company not only integrates a graphics engine and video accelerator into its CL-GD5440 chip but also packs in a 24-bit DAC (D/A converter) for good measure. Internally, the chip uses a single frame buffer that supports different color depths between video and graphics. The company has also announced an 800- by 600-pixel LCD VGA controller with integrated video acceleration.
Perhaps the most ambitious new architecture is Brooktree's MediaStream (see ``Packetized Multimedia''). MediaStream sends multimedia packets to a specialized DAC that decodes the packets on-the-fly. Brooktree, along with other chip vendors, is already shipping a video-enabled DAC that performs on-the-fly color conversion and scaling. These video DACs are pin-compatible with existing DACs; theoretically, a board maker could simply plug in the video DAC to video-enable an existing graphics adapter. In operation, though, the graphics accelerator must be able to let the video DAC know that YUV data is being passed to the DAC for conversion and scaling. Not all graphics chips deliver the signaling requirements to support a pin-compatible video DAC.
And the Winner is . . .
After all these new digital-video solutions come to market, the big winner could be the MPEG video codec. MPEG is generally accepted as a higher-quality codec than Indeo or Cinepak. In fact, MPEG 1 was specifically designed for high-quality playback from a single-speed CD-ROM (150 KBps). Unfortunately, the high compression ratios supported by MPEG (up to 200 to 1) require sophisticated algorithms and, hence, intensive computational resources. The demands of MPEG decompression created the market (that RealMagic currently owns) for MPEG boards.
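The compression ratios quoted above follow from the gap between raw video rates and the CD-ROM transfer rate. The sketch below works out the ratio needed to fit uncompressed 24-bit, 30-fps video into 150 KBps. It assumes 1 KB is 1024 bytes and that the whole channel carries video (in practice some of it goes to audio); the function name is illustrative.

```python
CD_ROM_RATE = 150 * 1024  # single-speed CD-ROM, bytes per second

def required_ratio(width, height, bits_per_pixel, fps, channel=CD_ROM_RATE):
    """Compression ratio needed to fit raw video into the channel."""
    raw = width * height * (bits_per_pixel // 8) * fps
    return raw / channel

print(required_ratio(640, 480, 24, 30))  # 180.0
print(required_ratio(320, 240, 24, 30))  # 45.0
```

Squeezing the 640- by 480-pixel example from earlier in the article through a single-speed drive would take a ratio of about 180 to 1, close to the upper end of what MPEG supports. A quarter-size 320- by 240-pixel clip needs about 45 to 1.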
But these new video-playback architectures present a clear threat to hardware MPEG decompression boards. With other processor-intensive tasks such as color space conversion and video scaling being off-loaded to mainstream graphics adapters, high-end host CPUs can now handle real-time MPEG decompression. In fact, many of the graphics chip makers plan to ship a software-based MPEG player from Xing Technology (Arroyo Grande, CA, (805) 473-0145) with the new video-enabled graphics accelerators. Consumers will then be able to play MPEG-1 CD-ROM titles without dedicated MPEG hardware.
Commodity Video
By midyear, graphics adapters with accelerated playback of digital video will be in the same commodity market as today's Windows accelerators. This is great news if you appreciate the latest applications and multimedia titles that feature motion-video clips. But it will require more understanding of video technologies. A vendor's claim of ``video accelerated'' will not necessarily translate into high-quality, full-motion video playback. The video tag will be somewhat like the claims of ``all natural'' on supermarket shelves. More than ever, you'll need to do your homework to make sure you're getting what you think you are.
About the Products
RealMagic Lite $349
RealMagic Controller with audio playback $449
RealMagic CD-ROM Upgrade Kit $799
RealMagic Rave (with graphics accelerator) $489
Sigma Designs, Inc.
46501 Landing Pkwy.
Fremont, CA 94538
(800) 845-8086
fax: (510) 770-2640
928Movie $349
2MB VRAM without audio $449
2MB VRAM with audio $549
1MB VRAM Upgrade Kit $100
MPEG Player $299
PCIMovie $399
VideoLogic, Inc.
245 First Street, Suite 1403
Cambridge, MA 02142
(617) 494-0530
fax: (800) 203-8587
Jazz Jakarta $499
Jazz Multimedia
1040 Richard Ave.
Santa Clara, CA 95050
(408) 727-8900
fax: (408) 727-9092
Stealth 64 VRAM $399
4MB VRAM $599
Diamond MultiMedia
1130 East Arques Ave.
Sunnyvale, CA 94086
(408) 736-2000
fax: (408) 730-5750
Figure: Hardware Video Acceleration
Before the release of DCI, a specialized video accelerator could only provide scaling services for digital-video clips. Video for Windows required the CPU to perform decompression and color space conversion, passing RGB data on to the graphics subsystem. A DCI-compliant video codec can check for the presence of video hardware and, if a video accelerator is present, can pass unconverted YUV data directly to the video subsystem for color space conversion and video scaling. With more control over video playback, graphics-chip vendors have devised innovative architectures for efficient video acceleration within Windows.
Figure: Video Architectures
a) Dual Frame Buffer
In a dual-frame-buffer architecture, the video-acceleration board plugs into the host system's I/O bus and connects to an existing graphics adapter via the feature connector. Each accelerator uses its own video memory and DAC (D/A converter). The feature connector limits screen resolution to 640 by 480 pixels and suffers from incompatibilities with some board combinations.
b) Shared Frame Buffer
With a shared-frame-buffer interface, the graphics accelerator and video processor share one video-memory buffer, lowering memory requirements. Both accelerators feed the buffer, and each requires its own controller to arbitrate access to video memory.
c) Single Frame Buffer
A single-frame buffer routes converted video data through the graphics controller. All the display data--video and graphics--is then stored in the frame buffer. No buffer arbitration is needed because the graphics controller alone feeds the buffer. The single-frame-buffer architecture also requires only a single communications port to video memory, so inexpensive DRAM can be used instead of dual-ported video memory.
Both the Diamond Stealth 64 VRAM and Jazz Jakarta start from a baseline graphics engine with integrated motion-video acceleration, and both offer snap-on upgrade modules. The Jakarta includes a Tseng Labs graphics engine, Viper f/x video accelerator, and hardware MPEG decompression on-board. Upgrade modules add a cable tuner and NTSC/PAL output.
The Stealth 64 VRAM includes S3's new single-chip graphics/video accelerator. Add-on modules support MPEG decompression and video capture.
Two early harbingers of the coming wave of low-cost PC-based video accelerators: Sigma Designs' RealMagic MPEG decoder (below) and VideoLogic's 928Movie (above), one of the first cards dedicated to accelerated playback of AVI files. The 928Movie was also first to implement the VESA Media Channel.
Stanford Diehl is director of BYTE reviews. You can reach him on the Internet or BIX at sdiehl@bix.com. You can reach Greg Loveria at loveria@bix.com.