I begin to train the voice-recognition software
I’ve always been interested in voice recognition software. Now, finally, it seems I have a powerful enough computer to be able to actually use it. Because the manner in which a talesman tells his tale influences both the style and content of the tale, I realize that dictation represents for me a new challenge – something that I have to learn, practice with, and gradually master (that is, if I ever do master it).
And because I was always interested in this voice recognition software as a writing tool, I expect that others are also interested in the process – so I thought that in my first days of training the program, I would do my blog posts using Dragon NaturallySpeaking exclusively. (For the record, I’m using Dragon NaturallySpeaking standard version 9.) As a result, the following few blog posts will probably be of very little interest to most of you, and quite interesting indeed to a small number of you.
My apologies in advance go to those of you who are bored.
Very well, let’s begin, shall we?
The first thing that I notice about this process is how slow it is – not that the program is slow, but rather that I am slow in adapting to the program. This is not one of those cases where the program adapts to the user; this is, instead, an interactive dance between the program and the user. The program learns from the user but the user must also learn from the program. I have to learn how loud I should talk, how clearly I have to enunciate, and how fast I can go. My suspicion right now is that I can go a lot faster than I imagine, and that really it’s only my own hesitation that’s slowing everything down. I find, indeed, that if I speak faster the program, after a bit of hesitation, will catch up to me and will keep up with me. On the other hand, there is always the problem which is inherent in these programs – that is, that the program will misinterpret what you say, but it will not misspell anything. All the words will be correctly spelled – they’ll just be the wrong words! And that means that it’s a lot trickier to proofread text that has been dictated through a voice recognition program than it is to proofread text you type out. Microsoft Word, WordPerfect, and OpenOffice.org will all check your spelling as you type, and underline words that the program doesn’t recognize. But when Dragon NaturallySpeaking (or, I imagine, any other voice recognition software) doesn’t understand what you said, what it types out will still be spelled correctly – and that means no wavy red lines under any words to serve as signposts marking what you got wrong (or, in this case, what the program misunderstood).
As a result, I find that I take each phrase and give a long pause after it in order to give the program enough time to type out the phrase – then I look at the phrase as the program understood what I said, and double check. This introduces many long pauses into the whole process of dictating into the program.
Just for an experiment, I’m going to try to speak without even checking the recognition for the next paragraph:
All right: right now I’m not even looking at the screen while I picked it. I’m speaking at pretty much a regular pace (a little slower than I would speak normally, actually). My voice is getting a little hoarse – and this is a problem with speaking not in the normal pull envoys, which is something we makers of the program warned us against. All the same, I find it almost impossible not to try to speak a little more clearly and precisely than I normally do. Also, I don’t normally talk for long periods of time – and so my vocal chords and what other other apparatus that I have for speaking isn’t really used to this kind of talking and talking and talking.
I must confess that I have glanced at the screen a couple of times and also that I put in a couple of pauses. But my thoughts are not flowing perfectly smoothly which is probably a result of my lack of experience in speaking extend for a UNIX way. So sometimes I have to pause just in order to put my thoughts together. When I write longhand or when I typed out what I’m writing on a keyboard, the process of putting the words down is generally slower than my own imagination in composing the lines. This might be because I’m more custom derided in that way, or it might just be the typing or writing longhand are both slower than speaking – and so I’m not used to composing quite so quickly as I am right now.
The previous two paragraphs were spoken just about as fast as I can compose this kind of stuff.
I see some boners already. For example, I said “tone of voice” but what Dragon NaturallySpeaking typed out in response was “pull envoys,” and when I said “the makers of the program” the program thought that I said “we makers of the program.” You will also notice that I said “and what other other apparatus” – and that is in fact what I said, so don’t blame everything on the program. The next misunderstanding is so weird that I don’t even know what I meant to say – that’s where it typed out “extend for a UNIX way.” I probably said something like “extended time” or “extended period.” Next up: “because I’m more custom derided in that way” – which is how the program understood me when I said “because I’m more accustomed to writing that way.”
The number of errors is not great, but because it’s so difficult to proofread this kind of mistake, it really does slow you down, at least until you get to the point where the program understands you, and you understand the program – and also where you trust the program.
This is an especially big problem for any fantasy writers, because we are constantly inventing new words, new terms, new names – none of which exist in any dictionary in the world, and as a result we can’t expect a program like Dragon NaturallySpeaking to understand what we are saying without a lot of training. And it is understandable that this period of training could take so long that most of us, incredibly frustrated, would give up before we ever reached the point of true proficiency – and, of course, we don’t know where that point is. There’s no way of telling just how good we can get with this program until we get there – in the meantime we have to trust that it will get good enough that we can rely on it without constantly checking how it understands what we are saying. This is much less of a problem for the typical user, who does things like e-mail or standard business writing.
(Composed by voice Monday 1 September 2008)