Vista Speech

Robert Robinson

General

Microsoft speech recognition software is much improved compared with
previous versions. It still fails, however, to meet the minimum
requirements for professional dictation, including adequate speech
recognition accuracy and the availability of specialized vocabularies.
The following is a specific critique of Vista Speech. It is understood
that this is still beta software; however, this beta (build 5384) is
supposed to contain the basic feature set of the production speech software.

Audio Hardware

Weaknesses

1. NT/Windows Server 2003 device drivers are typically incompatible with
Vista. Very few sound converter Vista drivers are available at this time.


Audio Input Window

Strengths

1. Ability to select sound adapter.
2. Ability to select input type: microphone, line in, or digital.
3. Option for setting audio level manually.

Weaknesses

1. It is unclear from the VU (volume unit) display where the optimum
audio setting should be; for example, at the middle or upper limit of
the green display.
2. What would appear to be the optimum audio setting on this display
doesn't agree with the microphone level display found under Control
Panel, Advanced Speech Options, Speech Recognition.
3. The low end sensitivity of the VU is inadequate to show ambient
electrical and acoustic noise levels.
4. There is apparently no automatic volume control.
5. There is no frequency spectrum display to provide relative
indications of signal/noise amplitudes.
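
To illustrate weakness 3 above, here is a minimal sketch of a level
reading on a logarithmic (dBFS) scale, which is one way a meter can keep
low-level noise visible. It is not how Vista's VU display works; the
function name rms_dbfs, the floor_db parameter and the -96 dB floor are
assumptions made up for this example, and samples are assumed to be
floats between -1.0 and 1.0.

```python
import math


def rms_dbfs(samples, floor_db=-96.0):
    """Return the RMS level of samples in dB relative to full scale.

    samples: sequence of floats in [-1.0, 1.0] (assumed format).
    floor_db: hypothetical lower limit reported for silence.
    """
    if not samples:
        return floor_db
    # Mean of the squared samples (signal power).
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db
    # 10*log10(power) equals 20*log10(RMS amplitude).
    return max(floor_db, 10.0 * math.log10(mean_square))
```

On a scale like this, noise at one thousandth of full scale reads about
-60 dB, well above the floor, whereas on a linear meter it would be
indistinguishable from zero.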

Recognition Engine (Microsoft Speech Recognizer 8.0)

Strengths

1. Recognition accuracy is significantly improved compared with previous
Microsoft speech recognition engines.

Weaknesses

1. Recognition accuracy is still far behind that of the current leading
speech recognition software, NaturallySpeaking.
2. Speech recognition processing is slow, even on a high performance
computer system.
3. Recognition accuracy is unusually sensitive to audio volume settings.
4. New words that are trained or existing words that are re-trained are
frequently not then recognized correctly. This is one test of
recognition accuracy. The Microsoft Speech Recognizer does poorly in
this test compared with NaturallySpeaking.

Speech Recognition Training Window

Weaknesses

1. Text to be dictated is displayed in short sentences or sentence
fragments rather than paragraphs. This is very disruptive to the normal
pacing of dictation.
2. There is no indication of progression of the dictation; for example,
highlighting or graying out of text as successive words are recognized.
3. There is no indication of the success of the dictation. You can
dictate phrases that are completely different from the displayed text
and the program proceeds to the next display without any indication of
there having been a recognition problem.
4. There is no ability to back up, repeat or skip mis-recognized text.
5. There is no VU display in the training window.
6. There is no user selectable list of choices for additional training
after the introductory training.
7. There are no user selectable specialized training texts; for example,
business letters or medical reports.

Dictation

Strengths

1. Full-capability dictation into many application programs.
2. Ability to pop-up the correction window by speaking "correct"
followed by the phrase to be corrected or by highlighting it and
commanding "correct that", meaning correct the highlighted text.
3. "Scratch that" command, which deletes the most recently dictated
phrase.
4. Various commands for navigating through text.

Weaknesses

1. Dictation into some "Windows standard" textbox controls is limited.
2. There is no control key selection of command or dictation modes.
3. No microphone on/off by control key press.
4. No control key press for selection of post dictation spelling and
grammar checks.
5. No option for vocabulary switching.
6. No user selectable, context sensitive control of abbreviations and
number formatting.

Correction Window

Strengths

1. Correction window can be displayed by highlighting or voice selecting
the text to be corrected.
2. Errant phrases are numbered if more than one instance appears in the
text, facilitating selection of a specific phrase to be corrected.
3. The lists of alternate phrases contain generally appropriate
possibilities.
4. Additional alternates can be displayed by re-dictating the errant phrase.
5. Voice spelling of a new term is well designed and usually works properly.

Weaknesses

1. No user option to re-train a mis-recognized word that is in the main
vocabulary.
2. No way to type in a new word - it must be voice spelled.
3. No way to re-check the accuracy of a mis-recognized word that has
just been re-trained.
4. No way to train a phrase (as opposed to a single word).
5. No way to re-train both the corrected word or phrase and the original
mis-recognized phrase.
6. No way to specify and train both actual spelling and "spoken as"
representations of words or phrases.

Vocabulary

Weaknesses

1. Entries are limited to single words.
2. No way to specify and train both actual spelling and "spoken as"
representations of words or phrases (as above).
3. No capability to display, search, sort, edit, add, delete and train
any word or phrase in the main vocabularies. There is limited editing
capability for the user vocabulary only.
4. No current availability of specialized vocabularies; for example,
legal or medical.
5. No user option for adding specialized vocabularies.

Utilities

Weaknesses

1. No option for backup and restore of user files (training, options,
etc.) and vocabulary files.
2. No option to add, delete, edit and execute user developed macros.
3. The option for processing "typical" user documents is limited to
files stored in My Documents. The user has no control over the choice of
directories or the specific files to be screened.
4. Typical document screening lacks the important functions of
identifying and listing by frequency of occurrence the new words that
are located in the documents. There are no user options for adding and
training the new words.
5. Testing of the document screening from a custom SDK function did not
result in any improvement in recognition accuracy.
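
The missing function described in weakness 4 above can be sketched
roughly as follows. This is only an illustration of the idea and not any
part of Vista Speech or SAPI; the function name new_words_by_frequency,
the known_words argument and the simple word pattern are all assumptions
made for the example.

```python
import re
from collections import Counter


def new_words_by_frequency(text, known_words):
    """List words in text that are not in known_words, most frequent first.

    text: contents of a user document (plain text assumed).
    known_words: iterable standing in for the existing vocabulary.
    Returns a list of (word, count) pairs.
    """
    # Simplified tokenizer: letters, apostrophes and hyphens only.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text.lower())
    known = {w.lower() for w in known_words}
    counts = Counter(w for w in words if w not in known)
    # Sort by descending count, then alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A list like this, presented to the user with options to add and train
each entry, is the kind of screening output the critique is asking for.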

SDK/SAPI 5.3

Strengths

1. Extensive set of APIs.

Weaknesses

1. No backward compatibility with SAPI 4.
2. SAPI 5.3 is only available for the Vista platform. Microsoft has
decided, at least as of this date, not to supply an NT/Windows Server
2003 SAPI 5.3 based SDK.
3. The automation versions of many of the SAPI 5.3 functions are still
not available. Some may never be implemented.
4. There are still significant bugs in multiple Microsoft provided SAPI
5.3 based automation functions and utilities.
 
