Well, the Python lab seemed to go well last night, and I had a lot of fun doing it again, even if the turnout was… well, not what I had hoped. I think the people who did attend enjoyed it enough. Most of them came back to talk to me about it today. One person asked if I would be doing it again tonight, but alas, no.
I saw some of the most fantastic talks today. I could talk about those, but that is not what I will do.
I was stressing out today as I had a lightning talk to give. Lightning talks are normally no big deal, except mine was on speech recognition and why Nuance is a sponsor. I love speech recognition, but doing live demos of it is next to impossible. I ‘have’ the equipment to do it right, but everything fell apart over the past two days. My very expensive noise-canceling microphone headset, which can deal with audio feedback from monitors, is back in Boston. The laptop I was supposed to use never showed up. The USB stick holding my mobile profile and slides was destroyed by the swag. The install CDs I had were destroyed at some point, thanks to a spilled Coke I think. The headset alone is actually a deal breaker. A normal human being would have given up at this point, but as anyone will tell you, I am anything but ‘normal’.
What I did have was a hard drive with an early gold candidate of Dragon NaturallySpeaking 9.5 (totally by accident). On top of that, Catherine Devlin graciously allowed me to use her laptop, and Carl Karston and the AV crew <insert forgotten name here> set me up with a Shure USB microphone. This microphone was the shiznitz! So I scrapped the entire planned talk and decided to work with what I had. I think it worked out in the end.
I should explain a bit about doing live demonstrations of speech recognition. They work quite well in demonstration booths if you have an enrolled profile (the user has trained with the system) and a high-quality $$$ mic. But there are some rules to doing these demonstrations. You do not do them with a bad microphone. You do not do them on unknown laptop hardware. You do not do them with a speaker-independent, un-enrolled profile. You do not do them without a prepared script that you have tested and know inside out and backwards. And you never, ever do them in front of 1000 people in a very noisy, echoey ballroom under external amplification. If you break these rules, you end up with a demonstration like the one Microsoft had: it ends up on YouTube (out of professional respect I will not link to their demo), and it perpetuates the myth that speech recognition doesn’t work. And it doesn’t work under those conditions, since even humans can’t understand what you are saying half the time.
My original talk was going to be fun and interesting, poking fun at speech recognition demonstrations in a way people would get, while still showing off how good it is even in bad conditions. The software can handle it; I have tested it enough to be sure of this. Well, under the actual conditions I was faced with, I knew it would be Microsoft all over again. I had four options:
- Follow the script and make Nuance’s products look like total garbage.
- Ditch the live demonstration and give a dry, boring talk on why Nuance was sponsoring PyCon (and have it forever forgotten).
- Ditch the talk completely, and leave Nuance as just a footnote on the website.
- Have fun, and if I get very lucky, people will remember Nuance in a positive light.
I am not 100% sure I got very lucky, but I did get lucky.
Only at PyCon can I show up with literally nothing and have people bend over backwards to help me. I simply asked Catherine to use her machine because I had noticed at the Python lab that it was a nice, fast Dell XPS. I had zero expectations, and yet she had absolutely no problem giving me her machine and password, with full access to it, for an indeterminate amount of time. Carl and the AV guys were running around like mad looking for anything that would work. What they came up with was beyond my expectations and set the entire tone for the talk. When I saw the mic I had a flash of inspiration, and somehow it seems to have worked. I actually did a 3-minute enrollment with the mic in the atrium, in a deafening din of noise, during the break just before the talk, much to the enjoyment of Mary, who was almost in tears with laughter. (I was using our David Barry enrollment script and almost yelling it.) I can’t wait to see the video.
Another surprise was that there were many Dragon users in the audience. Many people came up to me afterwards, while I was doing the demonstration in the (now quiet) atrium, with success stories and offers of professional headsets. One person (Lisa) is a storm chaser and uses Dragon while driving toward and away from tornadoes! Others came up with very good questions. The level of sophistication of the average PyCon-goer is quite high, and everyone had great respect for the problems of speech and language in general. There were none of the ‘Star Trek’ questions I often get.
Later I ran into someone who had had less than stellar experiences. We talked for quite a while about what the source of the problems could be, and I am very pleased that they are willing to try again and provide feedback. For some people, speech recognition using the standard speaker-independent base models just does not work well. There are so many variables determining why this is the case on a person-by-person basis that it is next to impossible to tell what the real problem is just by talking. These are the users we want to hear about: those who try it but, for whatever reason, can’t get it to work for them soon enough, and those who get only ~95% accuracy or worse and quickly give up (or even not so quickly). Thanks to the activation/registration that the product does, we have decent numbers on how many people are affected by this. The numbers are quite low, and it is not a problem from a ‘marketing’ or ‘business’ perspective. It is a problem from a research and understanding perspective. We want it to work for everyone.
Other important notes:

I think you did a great job on your talk. I was close enough to the screen to see the tiny text that the software produced when you just went for it.
I was blown away that it worked so well. Sure, it missed a word or two, but it typed better than I do.