o-speech package costs only $99.) More important, people with visual or learning disabilities can now purchase a computer from a superstore and use it immediately, without purchasing additional hardware.
From Toy to Tool
How did this change come about? For starters, processors got a lot faster over the past decade. A PC with a 133-MHz Pentium, or even a 200- or 233-MHz CPU, is not uncommon. These chips deliver sufficient computing power so that speech processing can be handled by software rather than by hardware. A possible show-stopper to using software for speech operations is that both the speech generation and the speech recognition algorithms require ample memory to store and process wave forms. However, today's low DRAM prices have helped the situation: Many out-of-the-box PCs are tricked out with a basic 32 MB of RAM. It only costs about $200 to double that capacity to 64 MB.
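To give a rough feel for why waveform storage eats memory, here is a minimal sketch of the arithmetic for uncompressed PCM audio. The sampling rate and sample size used in the example (22.05 kHz, 16-bit, mono) are illustrative assumptions, not figures from any product discussed here, and the function name is hypothetical.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample=16, channels=1):
    """Return the bytes needed to hold uncompressed PCM audio.

    All parameter values are assumptions chosen by the caller; real
    speech engines may use other rates, sample sizes, or compression.
    """
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# One minute of 16-bit mono audio at an assumed 22,050 samples per second.
one_minute = pcm_bytes(60, 22050)
print(one_minute, "bytes, or about", round(one_minute / (1024 * 1024), 2), "MB")
```

Under these assumptions, a single minute of raw audio takes roughly 2.5 MB, which suggests why a 32-MB machine is a more comfortable home for software speech processing than an 8-MB one.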
Another crucial change is that a de facto hardware standard has emerged for sound generation and capture on PCs, eliminating the installation and support problems created by a morass of different hardware configurations and drivers. Creative Labs' Sound Blaster card has become the recognized speech and audio standard for the Windows platform. The company claims it has 20 million cards installed worldwide. You can't purchase a PC today without a sound card inside, and chances are it will be a Sound Blaster, or Sound Blaster-compatible. The Sound Blaster comes bundled with its own native text-to-speech and recognition engines in the form of TextAssist and VoiceAssist, respectively. With a high-speed Pentium processor and 32 megabytes of RAM, running speech synthesis or recognition engines concurrently is no longer a daunting task. Furthermore, there's more than enough memory and processing power left over for the OS and applications to run smoothly. Because of this, major speech-technology developers like IBM, Kurzweil, AT&T, and Dragon Systems have migrated from proprietary speech cards toward the ever-present Sound Blaster hardware.
In the API arena, a set of solid standards that support speech for a wide variety of applications is emerging. The Microsoft Speech Application Programmers Interface (SAPI) is a standard programming interface for speech technologies on the Windows platform. SAPI provides support for both voice synthesis and speech recognition. By writing SAPI-compliant code, developers gain the ability to mix and match technologies from any of the vendors that provide SAPI-compliant speech engines. SAPI is based on the Component Object Model (COM), so it can be accessed from a number of languages and development environments, including Visual C++, Visual Basic, Visual J++, as well as development environments from other vendors that support COM. The point to remember is that SAPI lets you choose the speech engine or product that is most useful for your needs.
Speak to Me
Text-to-speech synthesis engines convert text into the spoken word in real time. Speech engines can take notice of punctuation, capitalization, numbers, even international conventions for time, currency, and date. Numerous speech synthesis engines and products are on the market, with one suited to almost every requirement. Here are some representative examples.
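As a rough illustration of the kind of preprocessing described above, the following sketch expands digits and dollar amounts into words before the text would be handed to a synthesizer. It is an assumption-laden toy, not the method of any engine named in this article: the function names are hypothetical, only whole numbers below one million are handled, and only U.S. dollar notation is recognized.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out a whole number (intended for values below one million)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def normalize(text):
    """Replace dollar amounts and bare digit runs with spoken-form words."""
    def dollars(match):
        amount = int(match.group(1))
        unit = " dollar" if amount == 1 else " dollars"
        return number_to_words(amount) + unit

    # Handle currency first so "$99" is read as an amount, not "$" + number.
    text = re.sub(r"\$(\d+)", dollars, text)
    # Then spell out any remaining digit runs.
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)


print(normalize("The package costs $99 and needs 32 MB of RAM."))
```

A production engine would also have to deal with dates, times, abbreviations, and locale-specific formats, which is where much of the real effort goes.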
Digital Equipment Corporation has long been a major player in the speech business. DEC offers speech products for different markets and applications. The company supports its proprietary DECtalk hardware technology, as well as software solutions. DECtalk Software is a text-to-speech engine that features nine voices and has an unlimited vocabulary. DECtalk can also generate DTMF tones for telephony applications. DECtalk Access 32 is under development for the adaptive technology market; it will be used to produce speech aids for users who are blind or visually impaired. DECtalk Software runs on Alpha or Intel systems running Windows NT, Alpha systems running Digital Unix, or Intel systems running Windows 95. Many speech synthesis products work with the DECtalk Software engine.
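For readers unfamiliar with DTMF, each telephone key is signaled by summing two sine waves, one from a low-frequency group and one from a high-frequency group. The sketch below generates such a tone in software using the standard DTMF frequency pairs. It is a generic illustration, not DECtalk's implementation or API; the function names, the 8,000-Hz sampling rate, and the 0.1-second duration are assumptions made for the example.

```python
import math

# Standard DTMF frequency groups, in hertz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for one keypad symbol."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_FREQS[row_index], COL_FREQS[col_index]
    raise ValueError("not a DTMF key: %r" % (key,))


def dtmf_samples(key, duration_s=0.1, sample_rate_hz=8000):
    """Return float samples in [-1.0, 1.0] for the tone of one key.

    The duration and sampling rate defaults are illustrative assumptions.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate_hz)
    step = 2 * math.pi / sample_rate_hz
    # Each sine is scaled by 0.5 so the sum never exceeds full scale.
    return [0.5 * (math.sin(low * step * i) + math.sin(high * step * i))
            for i in range(count)]


tone = dtmf_samples("5")
print(dtmf_frequencies("5"), len(tone))
```

The resulting sample list could be scaled to 16-bit integers and written to a sound device or file; that step is left out to keep the sketch short.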
The Productivity Works offers pwWebSpeak, a talking Web browser. The program reads Web pages in an understandable robotic voice, speaking links automatically as a page is read. The software supports voice synthesizers that use a Sound Blaster-compatible card. It requires 8 MB of RAM and runs under Windows 3.1 and Windows 95.
Voice Recognition Products
Voice recognition engines process the spoken word, converting verbal commands into computer commands. Many voice recognition products are available for the PC platform. You can use voice recognition to control the Windows desktop, dictate documents, or both.
NaturallySpeaking, from Dragon Systems, is a voice dictation system that performs continuous speech recognition. You can dictate documents into your computer, then cut and paste the text into your word processor. You do not need to deliberately pause between words, so data entry is faster. NaturallySpeaking requires a PC equipped with a 166-MHz Pentium processor; it runs faster on MMX machines. The software needs 32 MB of RAM under Windows 95, 48 MB under Windows NT 3.51 and 4.0, and 60 MB of free hard disk space. NaturallySpeaking also requires an industry-standard 16-bit Sound Blaster-compatible card or, on portables, a built-in sound system. It comes bundled with a headset-style microphone. NaturallySpeaking has a 30,000-word memory-resident active vocabulary and a disk-based 200,000-word backup dictionary.
Kurzweil Applied Intelligence is one of the major players in the speech arena. The company offers several voice recognition products for PCs. VoicePad is a voice dictation system with a 20,000-word active vocabulary and a disk-based vocabulary of 200,000 words. Under Windows 3.1, the software requires a 75-MHz 486 processor or faster. Under Windows 95, it requires a Pentium processor. The system also needs 8 MB of RAM for the voice application and 20 MB of disk space. The program requires a 16-bit Sound Blaster-compatible card.
IBM has long been one of the leading developers of voice technology. Simply Speaking Gold is a combination voice recognition and text-to-speech engine from IBM for Windows 95 and Windows NT 4.0. The package combines voice command-and-control functions with voice dictation. It also includes VoiceType Connection for Netscape, which enables voice-directed Web browsing using Navigator 4.0. Simply Speaking Gold requires a 100-MHz Pentium system, 16 MB of RAM for Windows 95 (32 MB for Windows NT 4.0), 46 MB of disk space, and a Sound Blaster card.
Famous Last Words
Since the personal computer was born, speech technology has made tremendous strides, slowly working its way from games to the office desktop, with prices dropping all the while. Faster Pentium-class computers and 32 megabytes of memory provide a solid platform for running speech applications. The Sound Blaster card and its clones provide the audio component at a cost-efficient price. Clearly, the goal of many speech developers is to make their products available to the consumer market, which means you'll start seeing more speech-enabled applications in the computer stores. Many of these products will be in the $100 to $200 range. Speech has put itself squarely in the mainstream, and the technology has taken a giant step toward replacing the keyboard.
Where to Find
Speech Processing
DECtalk Software
Digital Equipment Corp.
Littleton, MA
Phone: 800-344-4825
Fax: 800-234-2298
Internet: http://www.digital.com/oem/products/dectalk/dectalk.htm