[This one's a winner -- Trevor Smith, Editor, OS/2 e-Zine!]
VoiceType - by Dr. Dirk Terrell

Arguably, the most significant addition to Warp in version 4 is VoiceType. This remarkable new tool brings a very powerful voice recognition system to OS/2 users. Voice recognition was once the domain of only the most powerful computers, but VoiceType makes it possible on a relatively modest PC with a 16-bit sound card. By integrating VoiceType with the operating system, IBM has once again led the way in new operating system technology. While Windows users toiled with Program Manager for years, OS/2 users were reaping the productivity benefits of the powerful Workplace Shell. Now we are making another leap forward with VoiceType.

A noise-cancelling microphone is included with Warp 4. It's one of those lightweight headset microphones, and it is very adjustable and comfortable. The cord connecting the microphone to the computer is 7 feet (about 2 meters) long, and for most people that will probably be long enough. I suspect some people will seek out wireless solutions, though. I have, on several occasions, forgotten I was wearing the microphone and walked away from the computer.

VoiceType operates in two modes: Navigation and Dictation. Navigation is the simpler of the two and requires fewer resources. With Navigation, you use voice commands to perform operating system functions such as opening folders, starting applications, and moving windows. For example, I usually start the day by saying "Jump to PMMail" and PMMail starts. I then read and reply to e-mail, again using Voice Navigation commands.

When replying to e-mail, I use Dictation, which requires a bit more than Navigation in terms of processor power and memory. IBM's stated requirements for Navigation are 16 megabytes of RAM and a 75 MHz Pentium. The minimum for Dictation is 24 megabytes of RAM and a 100 MHz Pentium. I tested Dictation on a 90 MHz Pentium with 32 megabytes of RAM, slightly below the stated processor minimum, and it worked just fine. I also ran it on my 200 MHz Pentium Pro with 32 megabytes of RAM, and it recognized words as fast as I could say them. Like many people, I am no great typist, and with a little practice I can now dictate faster than I can type.
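For readers who want to check a machine against these figures, here is a minimal sketch in Python. It only encodes the RAM and processor minimums quoted above. The names `Requirement`, `REQUIREMENTS` and `supported_modes` are made up for this illustration and are not part of VoiceType or OS/2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    ram_mb: int
    cpu_mhz: int


# IBM's stated minimums, as quoted in this article.
REQUIREMENTS = {
    "Navigation": Requirement(ram_mb=16, cpu_mhz=75),
    "Dictation": Requirement(ram_mb=24, cpu_mhz=100),
}


def supported_modes(ram_mb, cpu_mhz):
    """Return the modes whose stated minimums this machine meets."""
    return [
        mode
        for mode, req in REQUIREMENTS.items()
        if ram_mb >= req.ram_mb and cpu_mhz >= req.cpu_mhz
    ]


if __name__ == "__main__":
    print(supported_modes(32, 200))  # the 200 MHz Pentium Pro
    print(supported_modes(32, 90))   # the 90 MHz Pentium
```

By the stated numbers, the 90 MHz machine qualifies only for Navigation, yet Dictation ran fine on it in my testing, so the published minimums appear to be somewhat conservative.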

Of course, people speak with different accents, so you must train the computer to recognize your voice. This training process is referred to as "enrollment". The enrollment for Navigation takes just a few minutes, and after you are done, you'll find yourself cruising all around your OS/2 system. You may actually find it kind of eerie at first. You say "Scroll Down" and a window starts scrolling, and "Go Right" causes the window to start moving across the screen. I find that Navigation is very close to 100% accurate. Occasionally, I slip into my South Carolina drawl, and it gets a little confused, but for the most part, Navigation works flawlessly.

Enrollment for Dictation takes a bit longer because there are many more sounds that the computer must learn to recognize. If my experience is typical, it will take 2-3 hours to complete the enrollment process. The nice thing is that you can do it in parts, and I found that Dictation was working pretty well even when I was only about 25% into the enrollment process. (I didn't do any rigorous statistics on it, but I would imagine that it was hitting 80-90% when I was only a quarter of the way through the enrollment.)

It takes a little practice to get Dictation working efficiently because you have to pause between words. It doesn't take much of a pause, though, and I have learned to speak just as rapidly with discrete words as I do in normal continuous speech. Another thing you have to get used to is the intelligence of the recognition engine. Many words have only subtle pronunciation differences, and you'll often see the wrong word pop up on the screen as you are dictating. The temptation is to stop the dictation and correct the word, but if you keep dictating, you'll find that the recognition engine will change its interpretation on the fly, based on the context of the sentence. This is clearly some powerful software.
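To give a feel for how a word on the screen can change after you have already said it, here is a toy sketch in Python. It is not IBM's algorithm, which I have not seen. It assumes a small invented table of sound-alike words (`CONFUSABLE`) and invented word-pair scores (`BIGRAM`), and it simply re-scores the whole sentence each time a new word arrives.

```python
from itertools import product

# Invented for illustration: the first entry is the "acoustic best guess",
# the rest are sound-alike alternatives.
CONFUSABLE = {
    "to": ["to", "two", "too"],
    "there": ["there", "their", "they're"],
}

# Invented word-pair scores; higher means more plausible. Unlisted pairs score 0.
BIGRAM = {
    ("i", "have"): 1,
    ("have", "to"): 2,
    ("have", "two"): 1,
    ("two", "cats"): 3,
    ("their", "house"): 2,
}


def best_transcript(heard):
    """Pick the most plausible spelling of everything heard so far.

    Brute force over all combinations of sound-alikes, which is fine for
    a toy example but far too slow for real dictation.
    """
    options = [CONFUSABLE.get(word, [word]) for word in heard]

    def score(seq):
        return sum(BIGRAM.get(pair, 0) for pair in zip(seq, seq[1:]))

    # On ties, max() keeps the first candidate, i.e. the acoustic best guess.
    return list(max(product(*options), key=score))


if __name__ == "__main__":
    heard = ["i", "have", "to", "cats"]
    for n in range(1, len(heard) + 1):
        print(" ".join(best_transcript(heard[:n])))
    # i
    # i have
    # i have to
    # i have two cats
```

The third word first appears as "to" and is revised to "two" once "cats" is heard, which is the behavior you see on screen if you resist the urge to stop and correct.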

Of course, using spoken commands may not be the most efficient way to do everything. Just as I use the WPS to do certain things and a command line for others, I use voice commands for the tasks they make more efficient. For example, I have several Web sites set up with voice commands, so that if I want to go to a particular site, all I have to do is say "Jump to..." whatever site. Need to search for something on Yahoo? "Jump to Yahoo" brings up the Yahoo site, no matter what you happen to be doing. No more clicking on the WebExplorer icon and then finding Yahoo in your Quicklist. To paraphrase the Nike slogan, "Just Say It."

You're cruising along with Voice Navigation, and you come across something that you want to do, but you don't know what the voice command is. Now what? Just ask the computer, of course! "What can I say?" pops up a window that shows you the voice commands that are appropriate to the situation. "Where can I go?" shows you the applications you can start with voice commands. "Jump to VoiceType User's Guide" brings up the help file for VoiceType.
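The idea of commands that depend on the situation can be sketched as a simple lookup table in Python. This is only an illustration of the concept, not how VoiceType is built. The spoken phrases come from this article, while the context name "folder", the function names and the returned strings are assumptions made up for the example.

```python
# Phrases that work everywhere (taken from the article).
ALWAYS = ["What can I say?", "Where can I go?"]

# Targets for "Jump to ..." (names taken from the article).
JUMP_TARGETS = ["PMMail", "Yahoo", "VoiceType User's Guide"]

# Commands valid only in a given context; the context name is hypothetical.
WINDOW_COMMANDS = {
    "folder": ["Scroll Down", "Go Right"],
}


def what_can_i_say(context):
    """List the commands that make sense in the current context."""
    return sorted(ALWAYS + WINDOW_COMMANDS.get(context, []))


def where_can_i_go():
    """List everything that can be started by voice."""
    return ["Jump to " + target for target in JUMP_TARGETS]


def handle(utterance, context):
    """Return a list describing what the system would do or show."""
    if utterance == "What can I say?":
        return what_can_i_say(context)
    if utterance == "Where can I go?":
        return where_can_i_go()
    if utterance.startswith("Jump to "):
        target = utterance[len("Jump to "):]
        if target in JUMP_TARGETS:
            return ["start " + target]
    if utterance in WINDOW_COMMANDS.get(context, []):
        return ["run " + utterance]
    return []  # not recognized in this context


if __name__ == "__main__":
    print(handle("What can I say?", "folder"))
    print(handle("Jump to Yahoo", "folder"))
```

Note that "Jump to" works from any context in this sketch, matching the way "Jump to Yahoo" brings up the site no matter what you happen to be doing.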

The possibilities for applications of VoiceType are numerous. I can imagine all sorts of situations where voice control would be far superior to the mouse or keyboard. For example, you're in the heat of battle in MechWarrior getting pounded by an enemy that has seemingly come out of nowhere. Instead of hitting Ctrl-F1,4,1 to get your buddies to help out, "Attack my target" would be much easier. Another example: I'm at the telescope centering a star in the photometer diaphragm. Rather than having to move around in the dark and find the keyboard to start a measurement, I could just say "10 second integration," "switch to B filter", etc.

Although we haven't yet arrived in the era of the computers seen in Star Trek, VoiceType is certainly a giant leap in that direction. Voice control is the next big step in the human-computer interface. I am reminded, as I sit here and dictate this article, of the scene in Star Trek IV where Scotty picks up the mouse of a Macintosh as if it were a microphone and says "Computer", only to be told to use the keyboard. His response is "How quaint." I can imagine an amusing commercial for Warp 4 with an OS/2 user saying the same thing to a Windows 95 user.


Dr. Dirk Terrell is an astronomer at the University of Florida specializing in interacting binary stars. His hobbies include cave diving, martial arts, painting and writing OS/2 software such as HTML Wizard.




Copyright © 1996 - Falcon Networking