2008-09-18

I Talk It Types

An update on voice-recognition training

It's now been about three weeks since I started using Dragon NaturallySpeaking (version 9) voice-recognition software. Herewith an update.

For a blog post like this, the accuracy of the program is excellent: I would guess that it runs well over 90%. The other thing I use the program for is dictating from handwritten drafts of old stories, so that I don't have to type them in but still get a digital copy. Since my writing involves strange words and archaic sentence structure and usage, the accuracy of the program is a good deal lower with this material – I guess it runs about 75 to 80%.

I find that I have definitely changed the way I speak (at least when I dictate to the microphone), and as a result my throat gets a little sore if I dictate too much in a day. I'm not sure if this is because I'm straining when I talk – trying to speak and enunciate more clearly than I ought to – or whether it's just that I'm not used to speaking so much, for so long, without a break. I keep meaning to have some liquid nearby so that I can take a sip now and then, every minute or so. So far I haven't remembered to do so, but I would advise anyone trying voice-recognition software that this is probably a good idea.

Another bit of advice for anyone who wants to try the program is this: use it every day. Use it for at least half an hour every day. At first, you'll find that it takes longer to do any given task with voice recognition (which includes time spent training and correcting mistakes) than it would to do the task your usual way. I expect it will take at least two weeks to gain enough proficiency that using the program is just as fast as, if not faster than, doing the task in your normal manner.

One problem with the software is that the manufacturer doesn't allow you any trial period to test the software and see if it really will work for you. I believe that Nuance, the company that makes Dragon NaturallySpeaking, has an online demo on its website, which you can play with to get some preliminary idea of how well the program might work for you. But I don't know how extensive that test would be – it certainly can't compare to using the real program on your own computer for half an hour each day for two weeks.

The program is also fairly expensive, so you really have to have a good reason to want to invest the money and the time to find out if it's good for you or not.

Here are some things that would make it more worthwhile for someone to use the program. First, anyone with repetitive strain injury (RSI) or carpal tunnel syndrome would find that the voice-recognition software gives his hands and fingers a good break, and allows him to continue to be productive without further risk of injury. Second, there are other people who find typing difficult. When the winter comes, I keep my house so chilly that my fingers get quite cold when I'm typing for long periods. So I look forward this winter to being able to keep my hands tucked away, nice and snug and warm, while I dictate to the voice-recognition software and it does the typing for me.

Today I boosted the amount of RAM in my computer, doubling it to 2 GB. I had been running into a problem where the voice-recognition software would tell me that it was running out of memory, so I hope that this is going to fix that. So far today, I have not noticed any appreciable boost in the program's speed. (For the record, I'm dictating to a Toshiba laptop running Windows XP with an Intel Core Duo CPU clocked at 1.6 GHz.) One thing I'm not sure about is how much raw clock speed in a multicore CPU helps the program work better. I believe the requirements for version 9 of Dragon NaturallySpeaking, which is the version I have, specify a 1 GHz CPU as the minimum and recommend a 2.4 GHz clock speed. 1.6 GHz falls roughly halfway between 1 GHz and 2.4 GHz, but this is a mobile chip, so it might not be as powerful as the corresponding desktop chip. If we take the two cores and multiply 1.6 times two, we get 3.2 GHz, but I have no doubt that neither Windows XP nor Dragon NaturallySpeaking can 'see' both cores and use them to their maximum. I doubt the two cores running at 1.6 GHz do much better than a single 2.0 GHz core – but I'm no expert in this.
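For what it's worth, the 'roughly halfway' estimate can be checked with a line of arithmetic. The sketch below uses only the figures I quoted above (1 GHz minimum, 2.4 GHz recommended, 1.6 GHz for this laptop), which are themselves from memory. The function name is made up for illustration, and the calculation says nothing about how much the second core actually helps.

```python
def position_between(value, low, high):
    """Return how far `value` sits between `low` and `high`, as a fraction (0 = low, 1 = high)."""
    return (value - low) / (high - low)


# Figures as quoted in the post; the requirement numbers are recalled, not verified.
MIN_GHZ = 1.0
RECOMMENDED_GHZ = 2.4
LAPTOP_GHZ = 1.6

fraction = position_between(LAPTOP_GHZ, MIN_GHZ, RECOMMENDED_GHZ)
print(f"{fraction:.0%} of the way from minimum to recommended")
```

That works out to about 43%, so a little short of halfway.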

As another one of my 'blind' experiments, I'm dictating this blog post without looking at the screen. I find that the speaking goes faster this way, because I don't have to bother with checking to see if the program made any mistakes. On the other hand, whatever mistakes it's making will be twice as troublesome to correct, because I'm going to have to go back over the entire blog post and reconstruct what I meant in the cases where the software atrociously misinterpreted my mumbles.

One other thing to consider is how fast I can compose while speaking aloud, compared to writing, whether that means writing out in longhand or typing on the keyboard. I'm not really used to telling tales out loud. Moreover, I find I have to put part of my attention into the simple act of speaking clearly, always with part of my mind considering how well the program might be interpreting what I'm saying. There is another difficulty as well: you have to speak the punctuation aloud, which is different from the way we talk normally. I have to name every punctuation mark out loud, and that does throw my concentration off a little bit. I expect – or rather at least I hope – that in time I'll get used to this, and it will become second nature. (The program does feature a mode of operation in which it will interpret punctuation and insert it based upon the way you speak, so that you don't have to name each punctuation mark. I expect it looks for a rising inflection at the end of a sentence – or at least what the program interprets as a sentence – and then interprets it as a question and ends it with a question mark. A short pause would be marked with a comma, and a longer pause would be marked with a period. (I had to type out the words 'comma' and 'period', because I don't know how to force the program to spell out the words rather than typing the punctuation marks.) Of course the program has to know something about grammar in order to make probable guesses as to the words you mean, so analyzing grammar would also give the program a better idea of what constitutes a full sentence.)
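To make that guess concrete, here is a minimal sketch of the pause-and-inflection idea in Python. It is only an illustration of my speculation, not how Dragon NaturallySpeaking actually works: the `Phrase` record, the `punctuate` function, and both pause thresholds are hypothetical, and a real recognizer would have to measure pauses and pitch from the audio itself.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
COMMA_PAUSE_SECONDS = 0.3
PERIOD_PAUSE_SECONDS = 0.8


@dataclass
class Phrase:
    words: str                  # the recognized words, without punctuation
    pause_after: float          # silence following the phrase, in seconds
    rising_pitch: bool = False  # True if the voice rose at the end


def punctuate(phrases):
    """Join phrases, guessing punctuation from pause length and inflection."""
    pieces = []
    capitalize_next = True
    for phrase in phrases:
        words = phrase.words
        if capitalize_next:
            words = words[:1].upper() + words[1:]
        if phrase.pause_after >= PERIOD_PAUSE_SECONDS:
            # A long pause ends the sentence; rising pitch makes it a question.
            mark = "?" if phrase.rising_pitch else "."
        elif phrase.pause_after >= COMMA_PAUSE_SECONDS:
            mark = ","
        else:
            mark = ""
        pieces.append(words + mark)
        capitalize_next = mark in (".", "?")
    return " ".join(pieces)


dictated = [
    Phrase("when the winter comes", 0.4),
    Phrase("my fingers get cold", 1.0),
    Phrase("does dictation help", 1.2, rising_pitch=True),
]
print(punctuate(dictated))
# prints: When the winter comes, my fingers get cold. Does dictation help?
```

Note that nothing in this sketch looks at grammar; it relies on timing and pitch alone, which is presumably why a real program would need grammatical analysis as well.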

I would say that using the program for normal tasks – especially business-related tasks – would be a much simpler, smoother, and quicker transition than using the program to dictate fiction or detailed scientific treatises. I believe there are more expensive special editions of Dragon NaturallySpeaking aimed at medical or legal professionals, which contain prepackaged vocabulary lists geared to those professions.

(Composed by dictation Tuesday 17 September 2008)