Videoconferencing systems from different vendors can now talk to each other, thanks to a standard called H.32x
Andrew W. Davis
At the 1964 New York World's Fair, AT&T showed the videophone, which delivered voice and video over standard telephone lines. Thirty years later, the newest videoconferencing products from AT&T Microelectronics still use standard telephone lines. But there's a difference.
Videoconferencing has finally shed its proprietary shackles. In 1964, AT&T was the last company to lose sleep over interoperability concerns: The original videophone had to communicate only with other AT&T products. But today, deregulation is the rule and interoperability is a serious issue. AT&T's latest-generation conferencing system can communicate with systems from other vendors. The AT&T system and competitors from Intel and PictureTel are an outgrowth of the sometimes acrimonious interoperability battles of the past two years.
But the battles are ending and the standards war is nearly won: By the end of the year, the International Telecommunication Union's H.324 standard should be formalized. H.324 defines how videoconferencing should work over plain old telephone service (POTS) lines and complements H.320, which does the same for ISDN. These standards could bring videoconferencing to general business applications.
Person to Person
Today's desktop videoconferencing (DVC) systems link the computers of collaborating workers (see the sidebars "In Your Face" and "Face Off"). This is a fundamental difference from group videoconferencing, where participants gather in rooms that are connected to other videoconferencing centers. DVC systems keep collaborators in touch with important data that's accessible from their desktop systems.
According to a 1994 survey by Forward Concepts, a market researcher based in Tempe, Arizona, DVC most commonly connects workers within the same company. To date, most collaborative users have been in aircraft, automotive, and other industries with intensive engineering efforts that require visual communication among numerous workgroups. Intercompany videoconferencing requires standards to solve interoperability and networking issues.
Visions of ubiquitous videoconferencing existed even before the videophone turned heads at the World's Fair. The breakup of the standards logjam in recent months is cause for optimism over the technology's commercial potential. The ITU ratified H.320 in 1990 to cover switched digital networks. H.320 covers a wide range of network bandwidths. Audio signals can range from 16 to 64 Kbps. The video specification spans from one 64-Kbps ISDN link to 30 inverse-multiplexed 64-Kbps ISDN links.
But many vendors resisted H.320 and claimed their own algorithms offered better video quality. H.320 gained widespread acceptance last spring once the Intel-led Personal Conferencing Working Group backed off from its competing proposal based on Intel's Indeo compression family.
However, H.320 is tied to ISDN, which is excellent technology for sending video but remains mired in rollout problems (see "Implementing ISDN," April BYTE). Attention now is moving to the other networks that connect PCs and workstations. H.324 is significant because it will eventually use a newer compression algorithm than that of H.320 and because it leverages the latest price and performance advances in silicon to promise high levels of audio and video quality. Like H.320, H.324 covers audio, video, and call-control procedures (see the figure "What's Under the H.32x Umbrella").
Both H.320 and H.324 are umbrella standards; they don't define technologies in themselves but specify the collection of ITU standards for digital and analog networks. In addition to POTS, the ITU is developing new recommendations for guaranteed-bandwidth packet-switched networks, such as IsoEthernet, and non-guaranteed packet-switched networks, such as ordinary Ethernet.
The Big Squeeze
The essential element in both standards is the H.261 video codec specification, a video-compression algorithm designed specifically for videoconferencing. Like MPEG, H.261 compresses images using the discrete cosine transform (DCT). H.261 allows systems to fully encode certain key frames and encode only the differences among other frames. The main elements of the H.261 source coder are prediction, block transformation (spatial to frequency domain translation), quantization, and entropy coding. Here's how it works.
H.261 divides images into 8- by 8-pixel blocks and into macroblocks consisting of four luminance blocks and two corresponding chrominance blocks. The H.261 encoder starts by compressing and quantizing data to form intramode blocks; an intramode macroblock must be transmitted at least once every 132 frames in a process known as forced updating. The encoder also decodes results and subtracts the resulting image block (which is what the receiver sees) from the input video. If the differences are small, the block is not transmitted.
If there has been sufficient change, the encoder transforms, quantizes, entropy-encodes, and transmits the differences. The definition of "sufficient change" can vary from block to block. Change typically results from motion, but transmission noise can cause the video system to falsely infer change. As a result, commercial H.261 products can offer video-signal noise filters to differentiate themselves from the competition, although these filters are outside the scope of the standard.
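The skip-or-send decision and the transform step can be sketched in a few lines of Python. This is an illustrative sketch only, not the H.261 bitstream: the names `encode_block`, `threshold`, and `step` are hypothetical, the "sufficient change" test is assumed to be a simple sum of absolute differences, and the uniform quantizer stands in for the standard's actual quantizer. Entropy coding is omitted.

```python
import math

N = 8  # H.261 transforms 8- by 8-pixel blocks


def dct_2d(block):
    """Naive orthonormal 8x8 forward DCT (slow, but easy to follow)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantizer (an assumption; a simplification of the real one)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def encode_block(current, reconstructed_prev, threshold=64, step=16):
    """Return None when the block is skipped, else quantized DCT
    coefficients of the difference from what the receiver already has."""
    diff = [[current[y][x] - reconstructed_prev[y][x] for x in range(N)]
            for y in range(N)]
    change = sum(abs(d) for row in diff for d in row)
    if change < threshold:      # hypothetical "sufficient change" test
        return None             # differences are small: do not transmit
    return quantize(dct_2d(diff), step)


# A flat block that does not change is skipped; one that brightens
# uniformly by 16 levels produces a single nonzero (DC) coefficient.
prev = [[100] * N for _ in range(N)]
unchanged = encode_block([[100] * N for _ in range(N)], prev)
brighter = encode_block([[116] * N for _ in range(N)], prev)
print(unchanged)
print(brighter[0][0])
```

Raising `threshold` makes the encoder ignore more noise at the risk of missing small real changes, which is the kind of tuning the standard leaves to vendors.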
Macroblocks carry a flag to indicate whether they are predicted or intraframe macroblocks and a second flag to indicate whether the data should be transmitted or not. The criteria for choice of mode and for transmitting a block are not detailed by the recommendation and may be varied dynamically as part of the control strategy.
At a minimum, the standard requires H.261 devices only to encode the difference between a frame and the previous frame. Vendors can provide higher video quality and faster transmissions with optional motion-compensation and loop-filtering capabilities. Motion compensation analyzes macroblocks to identify a group in the previous frame that best matches a group in the current frame. The system then codes the difference along with a vector that describes the offset.
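One way to picture the matching step is an exhaustive block search, sketched below in Python. The standard does not prescribe a search method, so everything here is an assumption for illustration: the function names, the sum-of-absolute-differences cost, and the small block and search window used in the demo (real macroblocks cover 16 by 16 luminance pixels).

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in the current frame
    at (cx, cy) and a candidate block in the previous frame at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(cur, ref, cx, cy, size=16, search=15):
    """Try every offset within +/-search pixels and keep the best match.
    Returns ((dx, dy), cost); the encoder would then code the residual
    difference together with this vector."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the picture
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Demo: a synthetic 24x24 frame whose content moves right 2 and down 1.
SIZE = 24
ref_frame = [[(7 * x + 13 * y) % 256 for x in range(SIZE)] for y in range(SIZE)]
cur_frame = [[ref_frame[y - 1][x - 2] if y >= 1 and x >= 2 else 0
              for x in range(SIZE)] for y in range(SIZE)]

vector, cost = best_motion_vector(cur_frame, ref_frame, 8, 8, size=8, search=4)
print(vector, cost)
```

The vector points from the block's position in the current frame back to where the same content sat in the previous frame, so content that moved right and down yields a negative offset.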
Unlike JPEG and MPEG, which are resolution- and image-size independent, H.261 specifies two image sizes. Common interchange format (CIF) is 352 pixels by 288 pixels. Quarter CIF (QCIF) is 176 pixels by 144 pixels. Like MPEG, H.261 uses prediction and motion estimation to reduce temporal redundancy, but it takes a different approach. MPEG maintains picture quality with maximum compression; H.261 minimizes encoding and decoding delay while achieving a fixed data rate.
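The two picture sizes translate directly into macroblock counts and raw data rates, which shows how much squeezing the codec has to do. The Python sketch below is a back-of-the-envelope calculation under stated assumptions: 8 bits per sample, chrominance stored at half resolution in each direction (which follows from four luminance blocks sharing two chrominance blocks), a round 30 frames per second, and a hypothetical case in which an entire 64-Kbps link carries video.

```python
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def macroblocks(width, height, mb=16):
    """A macroblock covers 16x16 luminance pixels (four 8x8 blocks)."""
    return (width // mb) * (height // mb)


def raw_bits_per_second(width, height, fps=30, bits_per_sample=8):
    """Uncompressed rate: one luminance plane plus two chrominance planes
    at half resolution horizontally and vertically."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample * fps


LINK_BPS = 64_000  # one 64-Kbps ISDN link, assumed to carry only video

for name, (w, h) in FORMATS.items():
    raw = raw_bits_per_second(w, h)
    print(f"{name}: {macroblocks(w, h)} macroblocks, "
          f"{raw / 1e6:.1f} Mbps raw, about {raw / LINK_BPS:.0f}:1 to fit one link")
```

In practice part of the link is used for audio and control, and the codec usually lowers the frame rate as well, so the ratio printed here is only an indication of scale.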
What does this mean to videoconferencing users? H.261 makes a trade-off between frame rate and picture quality. As the motion content of the images increases, the codec has to do more computations and usually has to give up on image quality to maintain frame rate (or vice versa).
For an H.261 subsystem to meet the top end of the standard -- 30 frames per second with full motion estimation and loop filtering -- it must execute approximately 8 billion operations per second. Most of this is for the optional motion estimation. However, designers can reduce operations at the expense of picture quality. For example, by applying a less exhaustive motion-estimation algorithm, codec designers can bring the system requirements down to about 1.5 billion operations per second (and thus let users do their videoconferencing on less-expensive hardware). Designers could also limit the video to 15 fps, a ceiling that ISDN bandwidths may impose anyway.
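A rough operation count shows why motion estimation dominates. The Python sketch below is not how the figures above were derived; it is an order-of-magnitude estimate that assumes a CIF picture, an exhaustive search over every offset within 15 pixels in each direction, luminance only, and three operations (subtract, absolute value, accumulate) per pixel comparison. The function name and parameters are hypothetical.

```python
def full_search_ops_per_second(width=352, height=288, fps=30,
                               mb=16, search=15, ops_per_pixel=3):
    """Estimate operations per second for exhaustive block matching."""
    blocks = (width // mb) * (height // mb)   # macroblocks per frame
    candidates = (2 * search + 1) ** 2        # offsets tried per macroblock
    pixels = mb * mb                          # comparisons per candidate
    return blocks * fps * candidates * pixels * ops_per_pixel


full = full_search_ops_per_second()
narrow = full_search_ops_per_second(search=7)   # smaller search window
half_rate = full_search_ops_per_second(fps=15)  # 15 fps instead of 30

for label, ops in (("full search", full),
                   ("search +/-7", narrow),
                   ("15 fps", half_rate)):
    print(f"{label}: {ops / 1e9:.1f} billion operations per second")
```

Shrinking the search window or halving the frame rate cuts the count sharply, which is the kind of lever the designers described above can pull, at some cost in how well fast motion is tracked.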
Dedicated H.261 processors have reduced the complexity, development cycle, and costs of videoconferencing systems while providing better quality. The newer devices, such as AT&T's AVP III, support not only H.261 but an enhanced version (called H.263) and MPEG on the same chip.
Motion Slickness
H.263, which is backward-compatible with H.261, offers improved picture quality by using a new half-pixel motion-estimation scheme rather than H.261's integer-pixel approach. The half-pixel technique is noticeably better at predicting changes to low-resolution images than H.261's motion-estimation technique is. Also, the Huffman coding table used in H.263 is optimized for low-bit-rate transmissions and provides superior imagery at 28.8 Kbps, the speed of high-end modems. The ITU is considering incorporating H.263 support into the H.320 standard.
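Half-pixel estimation means the encoder can match against samples that lie between the pixels of the previous frame. A common way to form those in-between samples is to average the neighboring pixels, as in the Python sketch below. The function name, the half-pixel coordinate convention, and the exact rounding are assumptions made for illustration rather than text taken from the recommendation.

```python
def half_pel_sample(ref, x2, y2):
    """Sample the previous frame at (x2 / 2, y2 / 2), where x2 and y2
    are coordinates expressed in half-pixel units."""
    x, y = x2 // 2, y2 // 2
    frac_x, frac_y = x2 % 2, y2 % 2
    a = ref[y][x]
    if not frac_x and not frac_y:
        return a                                  # exact integer position
    if frac_x and not frac_y:
        return (a + ref[y][x + 1] + 1) // 2       # halfway horizontally
    if frac_y and not frac_x:
        return (a + ref[y + 1][x] + 1) // 2       # halfway vertically
    # Halfway in both directions: average the four surrounding pixels.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4


patch = [[10, 20],
         [30, 40]]
samples = [half_pel_sample(patch, 0, 0),
           half_pel_sample(patch, 1, 0),
           half_pel_sample(patch, 0, 1),
           half_pel_sample(patch, 1, 1)]
print(samples)
```

A motion search would then evaluate candidate offsets in half-pixel steps around the best integer match, which helps most on small, low-resolution pictures where a one-pixel step is relatively coarse.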
H.263 allows (but does not require) implementation of predictive frames as well as the I (DCT-coded) frames in the codec. This is similar to MPEG's approach. While predictive frames add to the computational load and increase the frame delay, they also add quality to the video stream by raising the frame rate. These hooks give the standards room to grow. Developers can maintain compatibility and interoperability, while the quality of audio and video can improve as silicon prices fall and performance rises.
The rise of videoconferencing standards means vendors must find ways of differentiating themselves without offering proprietary technologies. Opportunities to add value come not from designing proprietary codecs but from applying pre-filters and post-filters to the video stream to make the codec more efficient and raise the quality of the decoded picture. Vendors can also add echo cancellation and sound mixing to the audio streams, or they can create API-smart software.
Now that vendors don't have to fight interoperability battles, they can concentrate on price and performance. We're already seeing positive signs. Prices of H.320-compliant videoconferencing kits have fallen from $6000 to less than $2000 in the past two years, while audio and video quality have improved noticeably. These kits benefit from a new generation of low-cost codec chips with higher processing power and support for multiple audio- and video-compression algorithms. Some computer makers now sell systems that come with conferencing hardware, so users can buy configured systems and avoid installation hassles.
Visual Support
Standards and new chips may help DVC evolve into an essential tool for business. Some vendors envision videoconferencing as an embedded technology. Already, one manufacturer of printing presses is showing a videoconferencing component that links printshop workers with technical staff, so they can get help if the press goes down. When talking heads venture from the Future Pavilion of the World's Fair to the nuts-and-bolts world of a shop floor, videoconferencing will be ubiquitous.
ACKNOWLEDGEMENT
Richard Schaphorst, president of Delta Information Systems (Horsham, PA) and an ITU official for Very Low Bitrate Visual Telephony, provided technical assistance for this article.
WHERE TO FIND
Apple Computer
Cupertino, CA
(408) 996-1010
AT&T Global Information Systems
Dayton, OH
(513) 445-5000
AT&T Microelectronics
Allentown, PA
(610) 712-6011
fax: (610) 712-5514
Creative Labs
Milpitas, CA
(408) 428-6600
fax: (408) 428-6611
Crosswise
Santa Cruz, CA
(408) 459-9060
fax: (408) 426-3859
DataBeam
Lexington, KY
(606) 245-3500
fax: (606) 245-3528
Datapoint
San Antonio, TX
(210) 593-7866
fax: (210) 593-7518
FutureLabs
Los Altos, CA
(415) 254-9000
fax: (415) 254-9010
IBM
Armonk, NY
(800) 426-4968
fax: (800) 426-4329
InSoft
Mechanicsburg, PA
(717) 730-9501
fax: (717) 730-9504
Intel
Hillsboro, OR
(503) 264-7354
fax: (503) 264-6835
Intelligence at Large
Philadelphia, PA
(215) 387-6002
fax: (215) 387-9215
InVision Systems
Tulsa, OK
(918) 584-7772
fax: (918) 584-7775
PictureTel
Danvers, MA
(508) 762-5000
fax: (508) 762-5245
Sagem USA
Cupertino, CA
(408) 446-8690
Target Technologies
Wilmington, NC
(910) 395-6100
fax: (910) 395-6108
VCC
Aliso Viejo, CA
(714) 452-0800
fax: (714) 581-9271
Viewpoint Systems
Dallas, TX
(214) 488-7100
fax: (214) 243-0635
Vivo Software
Waltham, MA
(617) 899-8900
fax: (617) 899-1400
Zydacron
Manchester, NH
(603) 647-1000
fax: (603) 647-9470

The H.320 and H.324 recommendations for videoconferencing terminals (center area) specify the means for audio and video coding, call control, and interfacing to data equipment. The standards include base-level requirements as well as optional specifications. Value-added opportunities (on the left) such as sound mixing, noise filtering, and video smoothing may be implemented by vendors to improve quality, but they are outside the ITU recommendations.
Andrew W. Davis is president of the Wainhouse Consulting Group (Southborough, MA), which provides research, planning, and marketing services. He can be reached at andrewwd@wainhouse.ultranet.com.