Windows XP Speech Recognition

by Kam-Hung Soh (kamhung dot soh at gmail dot com) 2007/07/28 15:35:04


This article describes how to turn on Windows XP's Speech Recognition service and reports some experiments I did with it. I wrote this article because I couldn't find an obvious way to enable or use the service.

Enabling Speech Recognition

If you have not done so, enable your Language Bar:

  1. Open Windows Control Panel.
  2. Select the Regional and Language Options applet. (The following command opens the same applet: intl.cpl.)
  3. The Regional and Language Options dialog should open.
  4. Select the Languages tab.
  5. Press the Details … button.
    1. The Text Services and Input Languages dialog should open.
    2. Select the Settings tab.
    3. In the Installed services field set, check that <Your language> / Speech Recognition is listed. On my notebook, I have English (United States) / Speech Recognition.
    4. Select the Speech Recognition item, then press the Properties … button.
      1. The Speech input settings dialog should open.
      2. In the Mode Keys field set, select the Assign mode keys check box.
      3. Press the Settings … button.
      4. The Mode button configuration dialog should open. In this dialog, choose the key or mouse action that toggles dictation mode and the one that toggles command mode (used for operations such as editing). I use the default F11 and F12 keys, respectively.
      5. Press the OK button to close the Mode button configuration dialog.
    5. In the Preferences field set, press the Language Bar … button.
      1. The Language Bar Settings dialog should open.
      2. Select the Show the Language bar on the desktop check box.
      3. I also selected the Show the Language bar as transparent when inactive check box to make the Language Bar a little less intrusive on the desktop.
      4. Press the OK button to close the Language Bar Settings dialog.
    6. Select the Advanced tab.
    7. In the Compatibility Configuration field set, select the Extend support of advanced text services to all programs check box. If you select this check box, then the speech recognition service will work for many programs, not just WordPad and Microsoft Office.
    8. Press the OK button to close the Text Services and Input Languages dialog.
  6. Press the OK button to close the Regional and Language Options dialog.
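
Step 2 above mentions that the intl.cpl command opens the Regional and Language Options applet directly. If you open it often, the same thing can be done from a script. The following is only a minimal sketch in Python: the function names are my own invention, and it assumes that control.exe is on the PATH, as it is on a standard Windows installation. It only opens the applet; the remaining steps still have to be done by hand.

```python
import subprocess
import sys


def applet_command(cpl_name="intl.cpl"):
    """Build the command line that asks Control Panel to open an applet."""
    # control.exe accepts the name of a .cpl file as its argument.
    return ["control", cpl_name]


def open_applet(cpl_name="intl.cpl"):
    """Open the applet on Windows; do nothing on other platforms."""
    if not sys.platform.startswith("win"):
        return False
    # Popen returns immediately, so the script does not wait for the dialog.
    subprocess.Popen(applet_command(cpl_name))
    return True


if __name__ == "__main__":
    open_applet()
```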

Language Bar

The Language Bar should be visible on your desktop. Press the Microphone button and you should see two additional buttons: Dictation and Voice Command. The Language Bar should also display a status balloon like Begin Dictation … or Listening ….

Testing and Training

Once you have your Language Bar configured, you're ready to use the speech service. Start an editor such as Notepad and say something like "OpenOffice can support dictation period". You might get howlers like these: "Open off this can support the station." or "Whole offers and some work to Haitian."

This is not surprising, because the speech recognition service has to be trained to recognize your voice:

  1. Open Windows Control Panel.
  2. Select the Speech applet (there's no .cpl file available).
  3. The Speech Properties dialog should open.
  4. Select the Speech Recognition tab.
  5. In the Language field set, check that Microsoft English Recognizer v5.1 appears in the drop-down list.
  6. In the Microphone field set, press the Configure Microphone … button.
  7. Use the Microphone Wizard to ensure that your computer can hear you speak. This wizard takes about a minute to run.
  8. In the Recognition Profiles field set, select the Default Speech Profile, then press the Train Profile … button.
  9. Use the Voice Training wizard to train the speech recognition system. Each training session takes about 10 minutes and you have to complete a session before your speech profile is updated.

Using Speech Recognition

WordPad provides the best support for the Speech Recognition service. With the Extend support … option enabled, you can dictate text in other programs such as OpenOffice Writer and Notepad, but these programs only support a limited number of commands and put punctuation marks one space after the end of the last word. GVim crashed whenever the speech recognition service was enabled in the Language Bar.
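
The misplaced punctuation can be cleaned up after dictation. The following is a minimal sketch in Python, not part of the speech service: the function name and the set of punctuation marks are my own assumptions, and it simply removes any whitespace that sits directly before a punctuation mark. It would also alter text where such a space is intentional, so treat it as a starting point.

```python
import re

# Whitespace followed by a punctuation mark that should attach to the
# preceding word. The set of marks is an assumption for illustration.
_SPACE_BEFORE_PUNCTUATION = re.compile(r"\s+([.,;:!?])")


def tidy_punctuation(text):
    """Remove whitespace that appears directly before a punctuation mark."""
    return _SPACE_BEFORE_PUNCTUATION.sub(r"\1", text)


print(tidy_punctuation("OpenOffice can support dictation ."))
# prints: OpenOffice can support dictation.
```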

There are three groups of interactive commands: Dictation, Voice and Correction Window. The Dictation and Voice commands are fairly similar; they transcribe some spoken text or do some editing operation. For example, Scratch That removes the last phrase added, Go To Top moves the cursor to the start of the document and Select … Through … selects a block of text. Say Correct That to show the Correction Window, which displays a list of alternative phrases, then say Select <number> to choose the required phrase.

Conclusion

The Windows Speech Recognition service seems to be aimed exclusively at spoken text entry for Microsoft Office products. It isn't a general input service because applications have to be specially written to support it. The compatibility option allows more applications to use speech recognition but can also crash programs. The service supports a very limited number of user interface verbs compared to other input devices such as the keyboard or mouse. For instance, there's no way to start an application, to open a file or to use a menu item. There's also no user interface to improve the speech recognition accuracy, other than by re-reading the training text.
