One of the holy grails of computer science and medicine has been accurate speech recognition. It seems to be a technology that is always five years away. Anandtech has a review of Dragon NaturallySpeaking and the speech recognition built into Microsoft Office 2003. The short version of Anandtech’s review: Dragon is better than Microsoft’s offering, but both have accuracy issues. There is also the free and open source Sphinx speech recognition engine, which the Anandtech article does not review.
The accuracy issues pointed out in the Anandtech article appear to be the same ones that have plagued this technology for years. An accuracy of 95-99% sounds good until you realize that it means 1 to 5 mistakes per 100 words.
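To make that arithmetic concrete, here is a rough sketch of how a per-word accuracy figure turns into errors per document. The function name and the 500-word note length are my own assumptions for illustration, and treating accuracy as a flat per-word rate is a simplification.

```python
def expected_errors(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words, assuming a flat per-word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


# Compare both ends of the 95-99% range.
# The 500-word note is a hypothetical document length, not a figure from the review.
for accuracy in (0.95, 0.99):
    per_100 = expected_errors(100, accuracy)
    per_note = expected_errors(500, accuracy)
    print(f"{accuracy:.0%} accurate: ~{per_100:.0f} errors per 100 words, "
          f"~{per_note:.0f} in a 500-word note")
```

Even at the high end of the range, every document of any length still needs proofreading.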
I first saw Dragon demoed on an Intel 386 at a medical trade show 15 years ago. It seems that not much has changed since then. I also tried to get Microsoft’s Office 2003 speech recognition working on my home machine. I trained it and began working, only to have it consistently crash my Windows XP machine badly enough to require power cycling. The short version on Sphinx? I am still trying to install it, but stay tuned for a review if it does anything useful.
When I give talks on health IT to non-technical audiences, I usually start off the question and answer session with: “Are any of you wondering why I didn’t discuss speech recognition?” The response from the audience is inevitably “yes”. The answer I give is that this is a particularly difficult computer science problem because computers in general lack semantic intelligence. For example, they struggle to disentangle words like too, to, two and tutu. It will probably take a connection to a project like Cyc or OpenCyc to increase accuracy to usable levels.
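The homophone problem can be illustrated with a toy sketch. This is not how Dragon, Microsoft, or Sphinx work internally; it is a deliberately naive approach that picks among to, too and two by looking only at the previous word. The counts in the table are invented for illustration and do not come from any real corpus.

```python
# Hypothetical counts of (previous word, candidate) pairs; made up for this example.
BIGRAM_COUNTS = {
    ("the", "two"): 40,
    ("want", "to"): 90,
    ("me", "too"): 30,
    ("me", "to"): 25,
    ("me", "two"): 20,
}

# Words that sound alike, so the audio alone cannot separate them.
HOMOPHONES = ("to", "too", "two")


def pick_homophone(previous_word: str) -> str:
    """Return the candidate seen most often after previous_word in the toy table."""
    return max(HOMOPHONES, key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))


print(pick_homophone("want"))  # context is decisive here
print(pick_homophone("me"))    # context is nearly a coin flip here
```

After "want" the choice is easy, but after "me" the counts are close, and the guess will be wrong whenever the speaker says something like "give me two tablets". Getting that case right requires knowing what the sentence means, which is the semantic gap described above.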