Review: Nuance Dragon for Windows offers strong voice recognition

05.01.2016

We're all getting more comfortable talking to devices these days, whether it means talking to Cortana, Google Now or Siri to check the weather forecast, asking Amazon Alexa which room your keys are in or telling Xbox to pause the video you're watching. But there's a voice dictation and control application that's been available for many years that is considerably more advanced.

Nuance's latest Dragon voice recognition for Windows now comes in several packages. Dragon 13 Home ($100) is for simple personal use; Dragon 13 Premium ($200) adds email, to-dos and other document-related features; Dragon Professional Individual ($300) is for business users who need features such as transcription; and Dragon Professional Group adds IT admin options for deployment and tracking. For this review, I worked with Dragon Professional Individual.

(There is also a version available for the Mac, which was reviewed in a previous article.)

If you're not familiar with Dragon, it is an application that lets you use your voice both for dictation and control; for example, you can tell Windows to open Word and then dictate your document. It works directly with familiar applications such as Word, Excel, Outlook, WordPerfect and Notepad, and popular browsers such as Chrome, Firefox and Internet Explorer; you can also control some popular websites like Bing and Gmail using spoken shortcuts.

When you start dictating in applications that are not directly supported, a Dictation Box pops up automatically to recognize your text and let you transfer it into the application.

Getting started with Dragon Professional is much less work than in older versions of the software. Once upon a time, you needed to read an entire chapter from a book into voice recognition software to get it to understand anything you were saying. Those days are gone. Setup and initial training took me less than 20 minutes, after which the software recognized my voice reasonably well.

You do need to pick both your region and your accent; each region offers a different set of accents. For the UK that includes Australian, Indian and Southeast Asian as well as a "standard" British accent, whereas the U.S. and Canadian regions include not only "standard" English but also Southern U.S., British, Pakistani, Spanish and teen English (because younger voices need a different speech model).

Cleverly, the text you read to set Dragon up is made up of tips about using the software, such as keeping a consistent distance from the microphone, speaking at the same volume and keeping your natural tone of voice. (Nuance's acoustic models for voice recognition are based on recordings of people speaking normally rather than in the artificial tone of voice some people adopt when speaking to a computer. They also use samples of users' voices; if you don't want to upload your own speech and recognition data to Nuance anonymously, you can opt out during setup.)

Once installed, Dragon puts a floating window that it calls the DragonBar at the top of the screen to indicate that the voice recognition software is running.

Most of the time, the bar collapses to an icon that shows only whether the microphone is on and what it's listening for; hover your cursor over it to show the full controls. You can use your voice to open menus and choose commands on the DragonBar to change options in Dragon. You can also turn the microphone off with your voice, or put it to sleep (but of course, once the mic is off you can't turn it back on with a voice command). The DragonBar will also show tips; for example, it will warn you if the application you're using doesn't allow dictation.

Once the DragonBar is up, you can start using commands like "Start menu," "Open Microsoft Excel," "Post to Twitter" or "Scroll down" to control your computer, or start dictating text within an application.

Whether you're dictating or controlling your computer, you can use a voice command at any point to ask Dragon what you can say; you can get a list of commands to say for navigation, formatting and punctuation as well as correction, and making the most of the software is mostly a question of getting into the habit of using those rather than switching back to keyboard or mouse.

One of the major drawbacks with Dragon is that not all software lets you dictate into it automatically.

You can open a new Word or Notepad document, start talking and have your words appear directly in your document. But if you prefer to work in an app like OneNote, then you have to dictate into the Dictation Box, which is a floating window that automatically appears when you talk at any application Dragon can't insert text into directly. What you say is recognized and shows up in the Dictation Box, but it's much less convenient than dictating straight into an application like Word or Outlook, because once you've finished speaking you need to remember to move what you've said into your application, using the Transfer button in the dialog.

In testing, that worked well with some apps -- I was able to dictate tweets even into Windows apps like Tweetium, although I couldn't control the app to post a tweet with a voice command.

But far too often, the same process didn't work with OneNote. Clicking the Transfer button in the Dictation Box dialog with the mouse correctly transferred the text into my OneNote document every time. But saying "Click Transfer" to do the same thing -- without going back to using mouse and keyboard to control the PC -- would often lose the text I had dictated. On one occasion I found the text in a different OneNote window that was open in the background, but other times it vanished completely. Having a voice command not only fail, but fail and delete dictated text, is less than impressive.

As mentioned before, Dragon works with most common browsers (but not Edge); you'll be prompted to install the Dragon extensions for Chrome, Firefox or Internet Explorer the first time you open the browser after installing Dragon. (I was surprised when Dragon repeatedly mis-recognized Bing as "being.")

While you can open a browser and navigate the interface with voice commands, you can also tell Dragon directly to search the Web for specific keywords. You can also use spoken searches for news, maps, photos, video or even specific sites such as eBay, MSN, YouTube, Facebook, Twitter and Wikipedia. That opens a dialog box where you can check that it recognized the key words correctly (to avoid potentially embarrassing results), but again I found that I sometimes had to manually click using the mouse rather than say "Select" in the dialog box to get the search going.

You can also control Web apps like WordPress or Facebook Messenger -- although I had variable success with these. Outlook.com was particularly difficult to drive with voice commands; I could dictate an email message, including the subject, and select the recipient from the address book, but no matter how many times I said "New" on the Outlook home screen I couldn't actually create a new email with voice commands. I could sometimes delete email messages, but other times -- as with trying to create a new email -- Dragon would show numbers overlaid on the Web page corresponding to possible commands, but no matter how many times I spoke the number corresponding to the Delete command, I couldn't get Dragon to actually send the command.

Controlling the Outlook desktop app was considerably more successful; I was able to reply to messages and even accept meeting requests using voice commands, although I could not switch to different folders. I was also able to navigate around Windows, including opening the Start menu and choosing applications to launch, although oddly the Start menu sometimes remained open even after the application launched.

Controlling Excel or Word with voice commands worked well when using the Ribbon (I could easily insert smart art or a chart -- in fact, I occasionally did it by accident), and there are handy voice shortcuts to insert the total of a group of numbers into a table or file a message in a folder. Confusingly, though, you need to use a completely different voice command to trigger the File menu ("open File tab" rather than "open Layout") using speech in the Office applications.

Dragon lets you move seamlessly between controlling an application and dictating documents when you work in an application like Word.

While dictating text, I found a few short words would occasionally get left out, and from time to time a word would be recognized correctly, then inserted twice. Quite often, Dragon would tell me that it needed me to repeat a phrase and then would immediately insert it correctly anyway (which was another way I ended up with duplicate words).

Some very similar-sounding words were recognized incorrectly, like "sync" and "sink" or "dot" and "dock" (which Dragon initially recognized as "dork"). More annoyingly, I would sometimes get the singular form of a word like "suggest" when I had said "suggests." On the other hand, if Dragon mis-recognized, say, "accept" as "except," then the correct word would almost always be listed as an alternate when I told it to correct the mistake.

When you notice a word or phrase that's been recognized wrong, you can say "Undo that" or "Delete that." If you say "Correct that" Dragon opens a Correction menu that shows a numbered list of alternatives; you can say the number to choose the one you want, or say "Spell that" if you don't see the correct word on the list.

If you need to correct something you didn't just enter, you can say "Select" and then the word or phrase that's wrong; if that word appears in your document more than once, Dragon shows numbers next to each instance so you can pick the one you want to correct.

As with the rest of Dragon, you can control the Correction menu with voice commands, including adding new words to Dragon's vocabulary.

It's also easy to do some simple formatting as you dictate, by selecting the words you want to format (with the "Select" command). You can create a numbered or bulleted list, put words into bold or italics or underline them, change the capitalization of words or put a phrase into quotes.

Generally, I found that the recognition quality was good. I was able to dictate large portions of this review into Microsoft Word reasonably quickly and without being slowed down much by recognition errors; there were only three or four instances of words that were so badly wrong that I later had problems working out what I might have originally said. (If you're stumped, the Correction menu has an option for playing back what you dictated, although that doesn't save as much information when you're using Web apps as when you dictate into a desktop app.)

I didn't need to pause frequently when speaking, although you will probably find that it takes some time for you to be completely comfortable composing out loud rather than on a keyboard.

Eventually, I found that I could dictate most of a sentence without a break on my Intel Core i5 laptop, and Dragon would catch up with me soon after I reached the end of the sentence and stopped talking to think about what to say next. This is close enough to real time that most users should be able to talk in phrases and sentences rather than a word at a time, and still keep an eye on how accurate the recognition is.

You do need to minimize background noise though. If there is music playing or people talking elsewhere in the room, or if a pet is making noise, you're likely to get far more errors. And if you accidentally leave the microphone on while you're having a conversation, what you get is a particularly abstract form of poetry.

The most disconcerting thing is likely to be getting used to talking to your computer (and hearing your own voice) instead of typing on a keyboard. The times when spoken corrections went wrong occasionally left me in a loop where the commands I used to try and correct the mistake were recognized as words instead. It was sometimes easier to drop back to the keyboard briefly just to fix the problem -- but I ran into this far less often than I did in earlier generations of the software.

Dragon has some built-in rules for how it presents what you dictate. Numbers are usually written out as words, unless they're in a list or part of a date or measurement. If you correct text that has been formatted by one of these automatic rules, you'll get a large pop-up explaining this and telling you how to change the rule; for example, if you always want numbers shown as digits rather than words, or want to choose your preferred spelling. And again, you can make those changes with voice commands as well.

A convenient feature is the ability to save boilerplate text like your name and address for signatures, or terms and conditions you often add to an email, so you can insert it by saying a single word. You can make this much more powerful by putting variables into the text which you can fill out as if you were dictating fields into a form (by saying "next field" to jump to the next field); you can use this for mailing labels, reports and other things that need to use a specific template.

However, I had to choose the shortcuts to trigger these Auto-text entries carefully -- otherwise Dragon would just show the phrase I had said instead of inserting the Auto-text.

It is also confusing that when you create one of these shortcuts, it's called Auto-text, but when you want to edit it, you have to look for Custom commands. And if you want to add steps that cover multiple programs or need keyboard shortcuts, like creating an email and attaching a file, the feature is called MyCommands (even though it's part of the advanced section of the same dialog).

You can also control what Dragon recognizes by adding words and phrases to its vocabulary. This ought to make Dragon more accurate when you're using names, addresses and product names -- but I found this didn't always work well. You can add words using the commands on the DragonBar, or you can open Dragon's Vocabulary Editor (also from the DragonBar) if you want to see what's already in the vocabulary and then add missing phrases. You speak the word, correct what Dragon recognizes it as and, if it's not getting it right, train the software by speaking the word several times.

However, when I tried to add OneNote to the Vocabulary Editor (it's a product name that Dragon didn't recognize) and train it specifically to recognize OneNote as a single word rather than two separate words, things didn't go as planned.

After I finished training it by saying "OneNote" half a dozen times, I tried to use it in my document. Dragon could find the instance of the word "OneNote" that I had previously corrected, but not any of the instances that I had entered into the document after it was trained. I removed the word from its vocabulary and tried again, and this time Dragon could recognize all the instances of "OneNote" in the document, but it still didn't suggest the correct capitalization for those words, and it didn't recognize the word correctly when I was dictating either. I had no more success by entering OneNote as an Auto-text expansion.

Finally, the third time I added it as a custom word, Dragon began to recognize OneNote correctly at least some of the time. So you may need to invest more time than you expect in training the system for your custom words.

Dragon learns your voice profile and should continue to improve slightly as you use it, although I didn't see a noticeable difference over the approximately three weeks of testing. You can also use it to transcribe audio files recorded with your voice or from another speaker, and you can have Dragon watch a specific folder and automatically transcribe files you drop into it.

However, it can cope only with a single speaker per file, and you need to create an audio profile for each speaker by recording at least a minute of speech and correcting any recognition errors by hand (making it less useful for meetings with other people).

I found recognition of recorded files significantly less accurate than the real-time recognition, even on the same device. However, Dragon can synchronize playing back the audio file as you edit your transcription in Word, which does make correcting less painful.

With this release, Dragon also promises to sync your voice profile and Auto-text shortcuts across multiple devices, so that if you use several computers, you should immediately get better recognition on the second machine you set up. In January, Nuance is also planning to release Dragon Anywhere, mobile voice recognition apps for iOS and Android that will share your voice profile and your custom phrases from the desktop software to get more accurate recognition on your phone.

This release of Dragon continues to improve the accuracy of the product and tweaks the interface to be less intrusive.

Its most interesting features are custom words and audio shortcuts (although the interface for these features is confusing), and synchronizing your voice profile and those custom shortcuts with the smartphone versions of the software, which are not yet available. Controlling Web apps is useful but doesn't work as consistently as does controlling desktop software.

I was particularly disappointed by the (small) number of occasions when I lost text whilst dictating. You will also need a reasonably powerful PC; I found the software would sometimes become unstable when system resources were low.

This version of Dragon shows that voice recognition software has become good enough to be really useful, although it still isn't completely reliable in every situation. Dictating text or controlling your computer with your voice can still be a little strange, even when it's very convenient. If you devote some time to creating your own shortcuts as custom commands rather than just using voice to control an operating system designed for keyboard and mouse, you can be very productive -- but Nuance needs to improve the interface for doing that, so that it's much clearer what you can do and how.

(www.computerworld.com)

Mary Branscombe