Speech Enables the Common Desktop PC


December 1997 / Core Technologies / Speech Enables the Common Desktop PC

Powerful CPUs and cheap memory let PCs do speech synthesis and voice recognition using software alone.

Joseph J. Lazzaro

Thirteen years ago, I wrote a review of several speech synthesizers for BYTE ("The Search for Speech," December 1984). At that time, the speech market was a very different place. Prices were steep: You could fork over as much as $4000 for a high-end text-to-speech synthesizer, or as much as $10,000 for a turnkey voice recognition package. Vendors could command such prices because, at that time, speech generation and recognition required expensive, custom hardware. Few speech-technology standards existed.

Now, I'm happy to report that times have definitely changed. Thanks to the standardization of sound hardware and to more powerful computer platforms, software-based speech synthesis and recognition have moved squarely into the mainstream. People can now get powerful speech synthesis and recognition technology at bargain-basement prices. (For example, IBM's Simply Speaking Gold voice recognition and text-to-speech package costs only $99.) More important, people with visual or learning disabilities can now purchase a computer from a superstore and use it immediately, without purchasing additional hardware.

From Toy to Tool

How did this change come about? For starters, processors got a lot faster over the past decade. A PC with a 133-MHz Pentium, or even a 200- or 233-MHz CPU, is not uncommon. These chips deliver sufficient computing power so that speech processing can be handled by software rather than by hardware. A possible show-stopper to using software for speech operations is that both the speech generation and the speech recognition algorithms require ample memory to store and process waveforms. However, today's low DRAM prices have helped the situation: Many out-of-the-box PCs are tricked out with a basic 32 MB of RAM. It costs only about $200 to double that capacity to 64 MB.
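To get a feel for why waveform storage matters, here is a rough back-of-the-envelope calculation. The sample rates and bit depth are assumptions chosen for illustration; the article does not say what any particular engine uses. Uncompressed PCM audio needs sample rate x bytes per sample x channels bytes for every second of sound.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample=16, channels=1):
    """Return the bytes needed to hold uncompressed PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# Assumed, illustrative settings: 16-bit mono at two common sample rates.
for rate in (11_025, 22_050):
    size = pcm_bytes(60, rate)
    print(f"{rate} Hz, 16-bit mono, 60 s: {size / 2**20:.2f} MB")
```

Under these assumptions, a minute of raw audio takes a megabyte or more, before counting the engine's own models and working buffers.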

Another crucial change is that a de facto hardware standard has emerged for sound generation and capture on PCs, eliminating the installation and support problems created by a morass of different hardware configurations and drivers. Creative Labs' Sound Blaster card has become the recognized speech and audio standard for the Windows platform. The company claims it has 20 million cards installed worldwide. You can't purchase a PC today without a sound card inside, and chances are it will be a Sound Blaster, or Sound Blaster-compatible. The Sound Blaster comes bundled with its own native text-to-speech and recognition engines in the form of TextAssist and VoiceAssist, respectively. With a high-speed Pentium processor and 32 megabytes of RAM, running speech synthesis or recognition engines concurrently is no longer a daunting task. Furthermore, there's more than enough memory and processing power left over for the OS and applications to run smoothly. Because of this, major speech-technology developers like IBM, Kurzweil, AT&T, and Dragon Systems have migrated from proprietary speech cards toward the ever-present Sound Blaster hardware.

In the API arena, a set of solid standards that support speech for a wide variety of applications is emerging. The Microsoft Speech Application Programming Interface (SAPI) is a standard programming interface for speech technologies on the Windows platform. SAPI provides support for both voice synthesis and speech recognition. By writing SAPI-compliant code, developers gain the ability to mix and match technologies from any of the vendors that provide SAPI-compliant speech engines. SAPI is based on the Component Object Model (COM), so it can be accessed from a number of languages and development environments, including Visual C++, Visual Basic, and Visual J++, as well as development environments from other vendors that support COM. The point to remember is that SAPI lets you choose the speech engine or product that is most useful for your needs.
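The mix-and-match idea can be sketched independently of SAPI itself. The Python below is not the SAPI interface: the `SpeechEngine` protocol, the two engine classes, and `read_aloud` are hypothetical names invented for illustration. It shows only the design principle, which is that application code written against a common interface can swap the engine behind it without changing.

```python
from typing import Protocol


class SpeechEngine(Protocol):
    """Hypothetical common interface; not the actual SAPI API."""

    def speak(self, text: str) -> str: ...


class VendorAEngine:
    """Stand-in for one vendor's engine (hypothetical)."""

    def speak(self, text: str) -> str:
        return f"[vendor A voice] {text}"


class VendorBEngine:
    """Stand-in for another vendor's engine (hypothetical)."""

    def speak(self, text: str) -> str:
        return f"[vendor B voice] {text}"


def read_aloud(engine: SpeechEngine, lines: list[str]) -> list[str]:
    """Application code: depends only on the interface, not the vendor."""
    return [engine.speak(line) for line in lines]


if __name__ == "__main__":
    for engine in (VendorAEngine(), VendorBEngine()):
        print(read_aloud(engine, ["Hello, world."]))
```

The stand-in engines return strings rather than producing sound so that the sketch runs anywhere; a real engine would hand audio to the sound hardware.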

Speak to Me

Text-to-speech synthesis engines convert text into the spoken word in real time. Speech engines can take notice of punctuation, capitalization, numbers, even international conventions for time, currency, and date. Numerous speech synthesis engines and products are on the market, with one suited to almost every requirement. Here are some representative examples.

Digital Equipment Corporation has long been a major player in the speech business. DEC offers speech products for different markets and applications. The company supports its proprietary DECtalk hardware technology, as well as software solutions. DECtalk Software is a text-to-speech engine that features nine voices and has an unlimited vocabulary. DECtalk can also generate DTMF tones for telephony applications. DECtalk Access 32 is under development for the adaptive technology market; it will be used to produce speech aids for users who are blind or visually impaired. DECtalk Software runs on Alpha or Intel systems running Windows NT, Alpha systems running Digital Unix, or Intel systems running Windows 95. Many speech synthesis products work with the DECtalk Software engine.
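For readers unfamiliar with DTMF (the touch-tone signaling mentioned above), each key is signaled by two sine waves played together: one frequency for the key's keypad row and one for its column. The sketch below illustrates that principle in Python. It says nothing about how DECtalk generates its tones; the function names, the 8000-Hz sample rate, and the 0.1-second duration are assumptions for illustration.

```python
import math

LOW_HZ = (697, 770, 852, 941)        # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)   # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair in Hz for a keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col != -1:
            return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate_hz=8000):
    """Return float samples in [-1, 1] for one key: the sum of two sines."""
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate_hz)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate_hz)
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"))
    print(len(dtmf_samples("5")), "samples")
```

Each sine is scaled by 0.5 so the sum stays within the range a sound card expects; converting the floats to 16-bit integers would be the next step before playback.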

The Productivity Works offers pwWebSpeak, a talking Web browser. The program reads Web pages in an understandable robotic voice, speaking links automatically as a page is read. The software supports voice synthesizers that use a Sound Blaster-compatible card. It requires 8 MB of RAM and runs under Windows 3.1 and Windows 95.
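The attention to numbers and currency described at the start of this section is usually called text normalization: written forms are turned into speakable words before synthesis. The sketch below is a minimal illustration of the idea, not any vendor's algorithm. The function names are hypothetical, and it handles only whole numbers from 0 to 999,999 and simple US dollar amounts; anything larger raises an error.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()


def number_to_words(n):
    """Spell out an integer from 0 to 999,999 in English words."""
    if not 0 <= n <= 999_999:
        raise ValueError("supported range is 0..999999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def normalize(text):
    """Replace '$N' amounts and bare integers with speakable words."""
    def dollars(match):
        amount = int(match.group(1).replace(",", ""))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    def plain(match):
        return number_to_words(int(match.group(0)))

    # Dollar amounts first, so their digits are not consumed as bare integers.
    text = re.sub(r"\$(\d{1,3}(?:,\d{3})+|\d+)", dollars, text)
    return re.sub(r"\d+", plain, text)


if __name__ == "__main__":
    print(normalize("The package costs only $99."))
```

A production engine would also handle dates, times, abbreviations, and locale-specific conventions, which is where most of the real complexity lies.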

Voice Recognition Products

Voice recognition engines process the spoken word, converting verbal commands into computer commands. Many voice recognition products are available for the PC platform. You can use voice recognition to control the Windows desktop, dictate documents, or both.

NaturallySpeaking, from Dragon Systems, is a voice dictation system that performs continuous speech recognition. You can dictate documents into your computer, then cut and paste the text into your word processor. You do not need to deliberately pause between words, so data entry is faster. NaturallySpeaking requires a PC equipped with a 166-MHz Pentium processor; it runs faster on MMX machines. The software needs 32 MB under Windows 95, 48 MB under Windows NT 3.51 and 4.0, and 60 MB of free hard disk space. NaturallySpeaking also requires an industry standard 16-bit Sound Blaster-compatible card or, on portables, a built-in sound system. It comes bundled with a headset-style microphone. NaturallySpeaking has a 30,000-word memory-resident active vocabulary and a disk-based 200,000-word backup dictionary.

Kurzweil Applied Intelligence is one of the major players in the speech arena. The company offers several voice recognition products for PCs. VoicePad is a voice dictation system with a 20,000-word active vocabulary and a disk-based vocabulary of 200,000 words. Under Windows 3.1, the software requires a 75-MHz 486 processor or faster. Under Windows 95, it requires a Pentium processor. The system also needs 8 MB of RAM for the voice application and 20 MB of disk space. The program requires a 16-bit Sound Blaster-compatible card.

IBM has long been one of the leading developers of voice technology. Simply Speaking Gold is a combination voice recognition and text-to-speech engine from IBM for Windows 95 and Windows NT 4.0. The package combines voice command-and-control functions with voice dictation. It also includes VoiceType Connection for Netscape, which enables voice-directed Web browsing using Navigator 4.0. Simply Speaking Gold requires a 100-MHz Pentium system, 16 MB of RAM for Windows 95 (32 MB for Windows NT 4.0), 46 MB of disk space, and a Sound Blaster card.

Famous Last Words

Since the personal computer was born, speech technology has made tremendous strides, slowly working its way from games to the office desktop, with prices dropping all the while. Faster Pentium-class computers and 32 megabytes of memory provide a solid platform for running speech applications. The Sound Blaster card and its clones provide the audio component at a cost-efficient price. Clearly, the goal of many speech developers is to make their products available to the consumer market, which means you'll start seeing more speech-enabled applications in the computer stores. Many of these products will be in the $100 to $200 range. Speech has put itself squarely in the mainstream, and the technology has taken a giant step toward replacing the keyboard.


Where to Find


Speech Processing


DECtalk Software

Digital Equipment Corp.
Littleton, MA
Phone:    800-344-4825
Fax:      800-234-2298
Internet: http://www.digital.com/oem/products/dectalk/dectalk.htm


NaturallySpeaking

Dragon Systems, Inc.
Newton, MA
Phone:    617-965-5200
Fax:      617-527-0372
Internet: http://www.dragonsys.com/


pwWebSpeak

The Productivity Works, Inc.
Trenton, NJ
Phone:    609-984-8044
Fax:      609-984-8048
Internet: http://www.prodworks.com


Simply Speaking Gold

IBM Direct
Atlanta, GA
Phone:    800-426-2255
Fax:      800-242-6329
Internet: http://www.software.ibm.com/


VoicePad

Kurzweil Applied Intelligence, Inc.
Waltham, MA
Phone:    781-893-5151
Internet: http://www.lhs.com/kurzweil


Screen Readers


Automated Screen Access Program (ASAP) for Windows

MicroTalk
Texarkana, TX
Phone:    903-792-2570
Fax:      903-792-5140
Internet: http://www.screenaccess.com


JAWS for Windows

Henter-Joyce, Inc.
Phone:    813-803-8000
Fax:      813-803-8001
Internet: http://www.hj.com/


Slimware Window Bridge

Syntha-Voice Computers, Inc.
Stoney Creek, Ontario, Canada
Phone:    905-662-0565
Fax:      905-662-0568
Internet: http://www.synthavoice.on.ca/


Window Eyes

GW Micro
Fort Wayne, IN
Phone:    219-489-3671
Fax:      219-489-2608
Internet: http://www.gwmicro.com


WinVision

Artic Technologies
Troy, MI
Phone:    248-588-7370
Fax:      248-588-2650
Internet: http://www.artictech.com/




Reading the Screen


Screen readers are programs used by people who are blind or visually impaired to operate a computer. These programs examine data going to the screen buffer and present any text strings on a braille display or use one of the voice synthesis programs to speak the text aloud. Many of these screen readers support the Sound Blaster card and software-based text-to-speech synthesizers, such as Digital Equipment's DECtalk.
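The simplest version of this idea can be sketched for a DOS-style text-mode display, where the screen buffer is a grid of 80 x 25 cells and each cell holds a character byte followed by an attribute (color) byte. The Python below simulates such a buffer and pulls the readable rows out of it. It is an illustration only: the function names are hypothetical, the buffer is simulated rather than read from video memory, and a Windows screen reader needs far more machinery than this to track graphical text.

```python
COLS, ROWS = 80, 25  # classic PC text-mode dimensions


def make_buffer(lines, attribute=0x07):
    """Build a simulated text-mode buffer of (character, attribute) byte pairs."""
    buf = bytearray()
    for row in range(ROWS):
        text = lines[row] if row < len(lines) else ""
        for ch in text.ljust(COLS)[:COLS]:
            buf += bytes((ord(ch) & 0xFF, attribute))
    return bytes(buf)


def read_rows(buf):
    """Extract the text of each non-blank row, ignoring attribute bytes."""
    rows = []
    for row in range(ROWS):
        start = row * COLS * 2
        chars = buf[start:start + COLS * 2:2]  # every other byte is a character
        text = chars.decode("cp437").rstrip()
        if text:
            rows.append(text)
    return rows


if __name__ == "__main__":
    screen = make_buffer(["C:\\> dir", "", "README.TXT"])
    for line in read_rows(screen):
        print(line)  # a screen reader would send this to the synthesizer
```

Each extracted row is the kind of text string a screen reader would pass to a speech synthesizer or a braille display.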


Joseph J. Lazzaro ( lazzaro@world.std.com ) is the author of Adapting PCs for Disabilities (Addison-Wesley, 1996). He is also project director of the Adaptive Technology Program housed at the Massachusetts Commission for The Blind, in Boston.
