The IBM Personal Dictation System delivers a voice-controlled computer interface and sophisticated speech-to-text software--on a 486-based PC
Stanford Diehl
The IBM Personal Dictation System, or IPDS, brings computer-based dictation services to a mainstream corporate audience. The system combines a voice-controlled application interface with a sophisticated dictation system. Less than two years ago, this type of system required the horsepower of an RS/6000, but the system I evaluated ran on a 486-based OS/2 desktop. IBM is currently working on a Windows version.
The Technology of Speech
The IPDS requires a single adapter, along with OS/2 2.1 software. The adapter provides audio input and output and also includes a DSP (digital signal processor) that handles the computationally intensive dictation algorithms. The system must be able to immediately access the acoustic models of up to 32,000 words as well as the parameters stored for the speaker's voice.
Naturally, this requires some memory. IBM recommends 16 MB of RAM--8 MB for IPDS, the rest for OS/2. IPDS occupies 32 MB of hard disk space and consumes an additional, recoverable, 30 MB during training. After training, 2 MB or less should hold your voice parameters, but optional dictionaries add 10 to 15 MB each to the hard disk requirements. During a dictation session, the system stores data for the correction phase, including audio data (for playback) and possible alternative words. So you'll need lots of space during a dictation session--over half a megabyte per minute of speech. At the end of a session, those resources are recovered.
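To make those disk figures concrete, here is a rough budgeting sketch in Python. The numbers come from the paragraph above; the function and constant names are my own, the session rate is only a lower bound (the text says "over half a megabyte per minute"), and I am assuming the 2 MB of voice parameters sits on top of the 32-MB installation.

```python
# Disk figures quoted in the review, in megabytes.
BASE_INSTALL_MB = 32           # IPDS itself
TRAINING_SCRATCH_MB = 30       # recoverable space used during training
VOICE_PARAMS_MB = 2            # upper bound for stored voice parameters (assumed additive)
DICTIONARY_MB_RANGE = (10, 15)  # each optional dictionary
SESSION_MB_PER_MINUTE = 0.5    # lower bound; the review says "over half a megabyte"


def session_scratch_mb(minutes):
    """Minimum temporary space for a dictation session of the given length."""
    return minutes * SESSION_MB_PER_MINUTE


def installed_mb(optional_dictionaries=0, worst_case=True):
    """Permanent footprint after training, with optional dictionaries."""
    low, high = DICTIONARY_MB_RANGE
    per_dictionary = high if worst_case else low
    return BASE_INSTALL_MB + VOICE_PARAMS_MB + optional_dictionaries * per_dictionary


def peak_training_mb():
    """Space in use while training, before the scratch space is recovered."""
    return BASE_INSTALL_MB + TRAINING_SCRATCH_MB
```

By this estimate, a half-hour dictation session needs at least 15 MB of free space on top of the installed software, and two optional dictionaries could push the permanent footprint to roughly 64 MB.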
Discrete-speech systems support large vocabularies. IBM's dictation system ships with a 20,000-word office correspondence dictionary. Optional specialized dictionaries range from 16,000 to 30,000 words; you can add 2000 words to each dictionary. You must pause discretely between each word you speak, and you have to train the system to understand your voice. Combining a command interface with dictation technology enables you to create and save documents in a completely "hands-free" environment: You can dictate and enter system commands with your voice.
Basic Training
To train the system, you must recite, one sentence at a time, a script that appears on your screen. My training session took well over an hour. The process can get a bit tedious, but you can pause the session at any time and resume training later. Once you have completed the training session, the system requires another 2 hours to process the data.
The system builds an icon for you on the desktop. Double-click on it, and the system loads the IPDS. Clicking on the microphone button at the bottom right corner of the screen turns the microphone on and off. Say "dictation window," and the dictation application starts up. You are presented with a window that looks much like a blank word processing document. Say "start dictation," and the system will begin translating your speech into text. Once I got the hang of speaking with a pause between words, I dictated fairly quickly, up to 70 words per minute.
I dictated a number of different types of documents into the system: press releases, magazine articles, excerpts from popular novels, technical manuals, office memos, business letters, and even some poetry. In each case, the system improved as I read additional documents into it. The adaptive language model does its job well. It entered new words into the dictionary so that the system understood words I commonly use, including special formatting (e.g., capitalizing all the letters in BYTE). But it also updated data on my word-usage patterns; in effect, it learned the frame of reference for a particular set of documents. For instance, the more that I read press releases into the system, the better it got at translating press releases.
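IBM does not document how its adaptive language model works, so the following Python sketch is only a toy illustration of the general idea: count which words follow which in the documents you dictate, then use those counts to choose between candidates that sound alike. The class and method names are hypothetical and are not part of the IPDS.

```python
from collections import defaultdict


class AdaptiveBigramModel:
    """Toy word-pair model that adapts to the documents read into it."""

    def __init__(self):
        # counts[previous_word][word] = how often `word` followed `previous_word`
        self.counts = defaultdict(lambda: defaultdict(int))

    def adapt(self, text):
        """Update word-pair counts from a dictated (and corrected) document."""
        words = text.lower().split()
        for previous, word in zip(words, words[1:]):
            self.counts[previous][word] += 1

    def rank(self, previous_word, candidates):
        """Order candidate words by how often they followed previous_word.

        The sort is stable, so candidates the model has never seen in this
        context keep the order they were given in.
        """
        seen = self.counts.get(previous_word.lower(), {})
        return sorted(candidates, key=lambda word: -seen.get(word.lower(), 0))
```

Before any adaptation, the sketch leaves a pair of candidates such as "police" and "release" in the order it received them. After it has seen a press release, "release" moves ahead of "police" following the word "press," which mirrors the behavior I observed: the more documents of one type the system reads, the better it predicts the words in the next one.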
I found that the system works much better for documents (e.g., legal papers and technical manuals) that abide by a consistent language structure; with such documents, the system can better predict what words will be used. It is much less accurate on more free-form prose, such as a novel, but in general, the system is very accurate--considerably better than other computer-based dictation systems I've used.
Hands-Free, Eyes-Free
When you first start using the system, you have to correct quite a few words. Luckily, you can complete a dictation session without watching the screen to check for any errors that are being made. This makes the system "eyes-free" as well as "hands-free." When you go back to correct the mistakes, you select the offending word, and the system plays back your pronunciation of the word. So even if the system really mangles the translation, you can always go back and hear what you said. The system also lists possible alternatives for an incorrect word. Often, the correct word is on this list, and you simply select it (see the screen). If the word is not on the list, you type it in. New words are added to the dictionary in this way.
Over a few weeks, not only did the system adapt to me, but I adapted to the system. I spoke more rapidly and rarely ran words together. I also learned how to correct words quickly. I transferred documents to a word processor and completed any final edits there. Voice macros were simple to create and extremely convenient. For instance, I could say "open letter," and the system would print my name and address, the current date, and a general salutation. You can generate often-used phrases or paragraphs by simply saying a single word. The system can be frustrating at first, but it gets more accurate and much easier to work with as you go along.
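Conceptually, a voice macro is a lookup from a spoken phrase to a block of text. The Python sketch below shows that idea under my own assumptions: the table, the function name, and the letterhead layout are hypothetical, and the IPDS's actual macro format is not shown here.

```python
from datetime import date

# Hypothetical macro table: spoken phrase -> text template.
MACROS = {
    "open letter": "{name}\n{address}\n\n{today}\n\nDear Sir or Madam:\n",
}


def expand(spoken, name, address, today=None):
    """Return the macro text for a spoken phrase, or the phrase itself."""
    template = MACROS.get(spoken.lower().strip())
    if template is None:
        # Not a macro: pass the recognized words through unchanged.
        return spoken
    today = today or date.today()
    return template.format(name=name, address=address, today=today.isoformat())
```

With this table, saying "open letter" would yield the name, address, current date, and a general salutation, while any phrase that is not a macro is passed through as ordinary dictation.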
The IPDS should appeal to markets where voice recognition has traditionally done well. The ideal environment for the IPDS is a "hands-free/eyes-free" one, such as a hospital where a nurse could enter patient data while taking a blood sample. The alternative is a manual procedure (e.g., a pen and clipboard) that requires the use of your hands. Legal applications are also well suited for IPDS. IBM sells supplemental dictionaries for journalism ($499), emergency medical ($499), and radiology ($599) applications, and more are in the works. IBM is also porting RS/6000 European language versions to the PC.
Breaking Tradition
Beyond the traditional markets, IBM is targeting IPDS for general business correspondence. During my evaluation, I found that most of my correspondence shares consistent terminology and phraseology. The system became quite accurate at creating memos and business letters.
However, the corporate environment is not as amenable to speech recognition as traditional voice applications are. The range of documents is more diverse, and the physical environment might be unsuitable. Although the training process accounts for steady background noise, the system will still pick up any loud stray noises. In a shared-office or cubicled arrangement, you give up confidentiality when reciting your documents, and your coworkers might grow weary of listening to your dictation sessions.
If you are accustomed to regular typing, you will generate correspondence more quickly from your keyboard. If you currently use a stenographer, you must consider the trade-offs. The IPDS involves more work (i.e., training the system and correcting mistakes), but it costs much less than a stenographer, is always available when you need it, and requires no health insurance. If you don't type well or don't feel comfortable working with a computer, the IPDS should appeal to you. It's easy to use and employs the most natural interface of all: Just talk to it.
Voice recognition is becoming viable. IBM is on the right track, and the future looks exciting. The company showed me a prototype system running on a ThinkPad with a PCMCIA adapter, promising that speech recognition for mobile applications will be available soon. And IBM believes that the PowerPC processor has the horsepower required to support the IPDS without the need for additional DSP hardware. A PowerPC-based personal digital assistant may then adopt a voice-activated interface. Voice-controlled computers are no longer relegated to the realm of science fiction or even to specialized niche markets; viable speech recognition has arrived on the desktop.
The Facts
IBM Personal Dictation System
Software and microphone headset $499
Micro Channel adapter $579
ISA adapter $499
IBM Corp.
Speech Recognition Support Center
Mail Stop 2236, Route 100
Somers, NY 10589
(914) 766-9251
fax: (914) 766-2788
Illustration: The IPDS includes voice control of the OS/2 desktop and a sophisticated dictation application. When you correct a word, the system offers a list of possible alternatives. Note the button for turning the microphone on and off and the history of voice commands.
Stanford Diehl is director of the BYTE Lab. You can reach him on the Internet or BIX at sdiehl@bix.com.