Dougma (dŭg·mə) n.

  1. An authoritative principle, belief, or statement of ideas or opinion, especially one considered to be absolutely true by Doug, who is often wrong.
  2. A specific tenet or dougtrine authoritatively laid down, as by Doug.
  3. A system of principles or tenets, for Doug.
November 21st, 2007

I HATE TAGS

I was not going to do any posts about the PyCon proposal review process until it was done, but I am sitting here with a head cold, hopped up on meds, and my pet-peeve button has been pressed one too many times (so we will go with that excuse, versus me just being a disagreeable, argumentative, melodramatic person ;-) ). This does not directly relate to the proposal system, but it is a very useful data point. Obviously I do not hate tags to the exclusion of their general use. This post and all my others have tags. I have not always been so averse to tags, and I once saw semantic tagging as the first step toward a more useful web experience. The real issue is that everyone has their own concept of what a tag is and how it is used.

This topic came up on the organizers list a few weeks ago. Laura Creighton made some very good observations on how we were working at cross purposes with the tags on proposals, using them for reviewers, speakers, and attendees. She notes that the tags which are useful to reviewers are not those that are useful for people wishing to attend a talk (software issues kept me from implementing this). Herein lies the first problem with tags. Each person looking at the tags will be trying to use them for a different purpose, and you cannot tell that person not to use them in that way. Some people want them to filter multiple objects of some type, some want to find related information, some are more interested in the occurrence of specific tags, while others find information in the tags themselves. The holy grail is to have one set of tags achieve all goals, but it just cannot be done, as the review process has shown.

To me, the best example of tags just not working is the tag cloud. When I first saw a tag cloud I was in shock. “Why didn’t I think of that?” It seemed so obvious that the tag cloud, with size and color, conveyed a depth and immediacy of information; it looked like one of those revolutionary ideas. A revolutionary idea is one which is very simple, easy to use, and yet extremely powerful. Then I started trying to use sites which implemented tag clouds. I now despise the tag cloud, and dismiss sites which use them. Why? Because they do not provide an interface into the information I, the user, care about. A python based blog has a huge ‘Python’ tag and a number of other medium sized tags all related to python, and then there are a few tiny, impossible to read tags, one of them being ‘svn’ say, and that one post contains the nugget of information I care about. If there were a way to make the cloud size tags by how important I deem them to be at that time, then it would be a useful interface. As it is, it is an annoyance, and completely useless as a means to find information, which is presumably the purpose of the interface. The one place where I thought it would be useful, news, where the size is representative of the number of news articles of the day or week which have that tag, falls flat in my experience as a user. I rarely care about the most talked about subjects of the day. I care about the little things, the oddities, the things that matter to me. I guess what is important to me is not what is important to the majority of other people. Maybe if I cared about what everyone else seems to care about (if these clouds are in fact representative of that), then they would be useful. But are they even a good measure? How would one determine that?
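To make the complaint concrete, here is a minimal sketch of the difference between sizing a cloud by post count and sizing it by what the reader cares about. It is not taken from any real site: the function names, the 10 to 32 pixel range, the linear scaling, and the `interest` mapping are all assumptions for illustration.

```python
# Illustrative sketch only: names, scaling, and the pixel range are assumptions.

def cloud_sizes(weights, min_px=10, max_px=32):
    """Map each tag's weight linearly onto a font size in pixels."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all weights match
    return {
        tag: round(min_px + (w - lo) * (max_px - min_px) / span)
        for tag, w in weights.items()
    }


def reader_weights(post_counts, interest):
    """Weight tags by the reader's stated interest instead of post count.

    Tags the reader did not mention get weight 0.
    """
    return {tag: interest.get(tag, 0) for tag in post_counts}


# Hypothetical blog: lots of python posts, a single svn post.
post_counts = {"python": 120, "django": 40, "pycon": 25, "svn": 1}

# Sized the usual way, 'svn' ends up as the smallest tag on the page.
by_author = cloud_sizes(post_counts)

# Sized by a reader who is hunting for the svn post, it becomes the largest.
by_reader = cloud_sizes(reader_weights(post_counts, {"svn": 5, "django": 1}))
```

The data is the same in both cases; only the weighting changes, which is the whole point: the first cloud answers the author's question, the second answers mine.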

This is the general problem with tags at the end of the day. They are representative of what the person assigning the tags feels is important, not what the person using the tags feels is important. So why do I use tags on my site? They are for me. I really do not expect them to be useful to anyone else. The tags help me find stuff I have posted previously, and let me generate some tag specific feeds. This post will not show up on the unofficial planet python, as it is not python related. In that regard the tag is a content filter for that aggregator, which again is more a use for me than anything else.

So what does this have to do with the PyCon proposal system? Well, last year when I first wrote the proposal system I included a free form text field for people to add categories. No one could agree on how this field would be used, so I just left it like that and figured the program committee chair would figure out how it would be used. Keep it simple, and let procedure determine the rest, not code. It’s a meme I use often, and it provides for great flexibility with little (usually no) code work. The proposal system was in a lot of flux at the time, and we never did come up with instructions for that field. The end result was a haphazard representation of like categories with little overlap. There was ‘web’, ‘web service’, ‘web-interface’, ‘web framework’, etc. Jeff Rush spent hours going over the list and standardizing it to what we ended up with. It is a fairly long list, but not too long for the number of talks, and it fits on one page on all but the most restrictive of resolutions. We did not want a repeat of this, and when I asked for feedback on the proposal system, the only specific feedback I received was on the categories. What people wanted was:

  • Have example categories to choose from.
  • Allow for people to add their own.
  • Have those added categories in the selection.
  • Allow for spaces. (At first I liked this, but now I regret it)
  • Limit people to 10 categories. (I wanted 3 and should have stuck to this point)
  • Do something better than the usual multi-select as the control-click thing can be a pain.

I would have ignored that last one, except 5 out of 8 pieces of feedback mentioned it. So I cheated and used the django multi-select widget, and seeded the tags with ones from the previous year. There are some public examples where you can see the end result. You will also see all the tags people have added over the review process. The hope was that people would only add a few tags as needed and would reuse the ones already present. The nice ‘search’ feature on the widget makes things even easier with limited form real estate. The end result is we have 113 unique tags being used for 141 talks! We have 6 tags which start with ‘web’, for which ‘web’ would be perfectly fine. We had 5 talks which used the tag ‘develop 3x faster than others’?!?! The truly humorous part is that two of those talks were each claiming to be faster/better than the other. I have removed that tag, and will be going through the others as well. One person repeated the title of the talk as a category. There are others who have created lengthy categories, resulting in a category list which is longer than their talk summary! The only thing I can think of to explain all this is that the form interface still sucks, and the process is still not described clearly.

I will be going through all the proposals and doing a category house cleaning, to try to get a handle on things. I think I will limit things to about 20 categories, and next year those will be the only categories we will have. We will also limit the number of categories per proposal to 4. Hopefully this will make categories have some actual use. Currently they are nothing more than an annoyance, and have little value.
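The house cleaning described above could be sketched roughly like this. It is not the code in the proposal system: the `CANONICAL` values, the `normalize` and `clean_categories` names, and the prefix-matching rule are assumptions made for illustration, and a real cleanup would still need a human to review what gets collapsed or dropped.

```python
# Illustrative sketch only: not the PyCon proposal system's code.
# CANONICAL stands in for the ~20 approved categories (hypothetical values).
CANONICAL = ["web", "testing", "education", "science", "community"]
MAX_PER_PROPOSAL = 4


def normalize(tag):
    """Collapse a free-form tag onto a canonical category, or return None."""
    cleaned = tag.strip().lower().replace("-", " ")
    for category in CANONICAL:
        # 'web service', 'web-interface', 'web framework' all become 'web'.
        if cleaned == category or cleaned.startswith(category + " "):
            return category
    return None  # unknown tags are dropped rather than added to the list


def clean_categories(tags):
    """Normalize, de-duplicate (keeping order), and cap the category list."""
    seen = []
    for tag in tags:
        category = normalize(tag)
        if category is not None and category not in seen:
            seen.append(category)
    return seen[:MAX_PER_PROPOSAL]
```

For example, `clean_categories(["Web", "web service", "web-interface", "develop 3x faster than others"])` collapses to a single `web` category, and the marketing slogan disappears because it matches nothing on the approved list.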

I should note that I am not a web designer, user interface specialist, or web anything. I am a C++ developer with extensive python C/API and integration experience. This PyCon-Tech stuff is a lark, and 100% out of my comfort zone. The things I want to work on (survey data mining using NLP, interest statistical analysis, group theory for Google Maps), I have had to put off as more important core features are needed. Think I am completely off base? Think you can do better? See something about the site you don’t like and think it sucks? You are most likely correct, and no, I do not know how to fix it. So PLEASE step forward and help out!

November 19th, 2007

Here there be Dragons!

Well, it has been a 10+ year wait for some and a ~6 year wait for me, but there is finally a Dragon Naturally Speaking TV Commercial! It’s just being shown on some basic Cable channels (CNN, Discovery, History, Fox ***). There are not that many spots purchased, but it’s something! I remember being told back in ’98 that we would have a commercial for VoiceXpress, never to see it happen. I was more than a little jaded when it was announced internally a year ago that it was finally going to happen. It is a little surreal to see something that I have worked on (that is not Open Source) hawked on TV. Now I am just hoping we can get a GOOD commercial for the product; I would settle for merely not bad at this point.

My god, I wish I had something good to say about this piece of advertisement. Let’s decompose.

1. The product

They somehow took a product that is fun to watch people use, and has a very cool, minimal, embedded interface, and made it a chore to watch. Worse yet, they misrepresented its capabilities. It works better, faster, and easier than is portrayed. It is a continuous speech recognizer, so why is that kid talking discretely (one word at a time, and slowly)? It looked like he was using Kurzweil Voice or Dragon Dictate from 1995! Even the executive is talking as if he is reading a cue card (which is most likely the truth of it). This software is used by stenographers in court rooms. The fastest speaker can be recognized. I thought the most precious thing in a commercial was time? Why is there so much filler?

They never actually show the software clearly. The dragon bar is just a yellow blurry blob on the screen. Granted you can hide the bar, but the end result is that the system appears to be very slow and unresponsive. Even the cute typing text in the ad is a misrepresentation. We do not send one letter at a time to the screen; that would be annoying after 2 seconds, not to mention slow. At least the woman’s use of the product is a little closer to reality. Oh, and yes, you have full command and control over your PC à la Star Trek, but by saying ‘listen to me’ and ‘stop listening’ instead of ‘computer’; though you would never get that impression.

2. The script

Wow. I mean wow. When it comes to emerging trends, technology, and what the ‘hip’ kids are doing, our upper management is fairly on top of things. They also understand the products they sell at a level I have not seen at any other company. I guess the same cannot be said of our hired marketing firm (or whoever produced this thing)! The things wrong with the script (from both a technical and substantive perspective):

  • The woman writing an e-mail: Even my Mom knows what a signature is and how to set her mail program to include it automatically. Who in their right mind would use a voice macro to do that? There are much better ways to show off macros. And form filling would be better for multiple signatures. That is a killer feature!
  • The history report. I know adults think teenage boys are stupid and cannot write a decent report, but this poor actor must have been choking back bile when he saw this script.
  • The IM interaction. This is just abysmal: ‘Are you in’? Every IM client from hour 0 has had the ‘away’ feature, so yes, he is ‘in’, you know that already. I guess ‘in’ is the hip new lingo all the kids are using these days.
  • The executive. This is how a teenage boy pretending to be an executive talks.

3. The acting

All I can say is I really hope we got a good deal here. In some cases I think it is the material. There is not much you can do to make it palatable, so why waste the effort? It is a cut rate commercial for a few basic cable channels after all. What is interesting is we have many commercials for our vendors and resellers, and the acting there is top notch. You can find most of those on YouTube, or on our website (though this is a demo, not a commercial).

4. The Direction and Cinematography

OK, it’s a commercial. The sets were believable, the lighting spot on, the sound perfect, and the basic camera work was good. But please show the product. You never got a good look at what was going on, on the machine. The product is supposed to be the star of the commercial. Here it appears to be a character off screen being talked about.

What would I have done?

Well, I would do a rip off of the HP ‘talking hands’ commercials. This was actually my idea back in 1997 when I first started working with speech. Show the apps swapping around like you do in real life, only by voice. Show the real speed of the product. Show the people, but not their heads. Show their feet propped up on the desk in front of them. Put the product first, and make it interesting. Have them standing there, and the transparent screen in front doing all the cool things they do in the HP commercials, only they have their hands behind their back. Show people really using the product, not selling it. I guess that is why I am an engineer and not in marketing.

Additional Notes:

No, you do not need to speak punctuation. There is auto-punctuation, but before it is accurate the system needs to have been used for a number of hours to collect enough statistics on how you use punctuation and inflection. Having DNS inspect past e-mails and documents can help, but gathering the inflection data is the key part. For that reason it is turned off by default. More research and data will fix that.

No, you do not need to spend hours or even minutes training the system. You can use it right out of the box, and it will get better with time. To be honest the learning curve is more on the user end than on the software end. Talking to a computer is just like any new means of input, like cell phone SMS tapping, or pen tablet. It takes time to become comfortable with it, but not as much time as other input devices.

This blog post dictated with Dragon Naturally Speaking 9.5 Professional.

November 18th, 2007

Spotlight PyCon-Tech: CacheMgr

Ever wonder what the heck is going on in your django cache? Ever wish you could clear an entry or two or all? Well I have, and CacheMgr is my answer:

CacheMgr Screenshot

This does not negate the need for proper cache use, but it can be a life saver when you want to update your site’s templates, but don’t want to restart the server because people are, say, using it to submit proposals before a deadline. This app is not feature complete. Currently only the following cache backends are supported:

  • dummy
  • simple
  • localmem

The remaining backends need to be implemented. The database and file based backends are simple to do, but memcached will take a bit more work. The system is extensible, so you can make it work with your own cache backends. All you have to do is extend the base class, the same way you would with the django cache system:

class BaseCacheMgr(object):
    """No description has been set.
    """
    # The class docstring doubles as the 'Description' reported by info().
    # Subclasses that implement cull() set has_cull to True.
    has_cull = False
    scheme = 'unknown'

    def __init__(self, cache, host, params):
        # host and params are unused here; the base class only keeps the cache.
        self.cache = cache

    def __iter__(self):
        """Should be overridden to iterate over the cache and return
        a dict with the form {'key': key, 'short_key': short_key,
                                     'repr': cache_value,
                                     'expires': datetime_expires, 'expired': False}
        """
        raise NotImplementedError

    def info(self):
        # Not every cache object exposes default_timeout, so fall back to 'None'.
        default = (str(self.cache.default_timeout)
                   if hasattr(self.cache, 'default_timeout') else 'None')
        return [{'name': 'Scheme', 'value': self.scheme},
                {'name': 'Description', 'value': self.__doc__},
                {'name': 'Default Timeout', 'value': default},
                {'name': 'Has Cull', 'value': repr(self.has_cull)}]

    def clear(self):
        raise NotImplementedError

    def cull(self):
        # A no-op unless the subclass declares has_cull and overrides this.
        if not self.has_cull:
            return
        raise NotImplementedError

    def delete(self, key):
        self.cache.delete(key)

Fairly self-explanatory when you look at the image above. Take a look at the simple backend implementation for more details.
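As a rough illustration of what writing a backend manager involves, here is a self-contained sketch. It is not the real simple backend: `ToyCache` is a hypothetical stand-in for a cache object, the trimmed `BaseCacheMgr` only repeats the parts of the base class needed to run the example on its own, the 20 character `short_key` is an arbitrary choice, and treating `cull` as "drop expired entries" is my assumption for this toy.

```python
import datetime


class ToyCache(object):
    """Hypothetical in-memory cache: key -> (value, expiry datetime)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, expires):
        self._data[key] = (value, expires)

    def delete(self, key):
        self._data.pop(key, None)


class BaseCacheMgr(object):
    """Trimmed copy of the base class, so this sketch runs alone."""
    has_cull = False
    scheme = 'unknown'

    def __init__(self, cache, host, params):
        self.cache = cache

    def delete(self, key):
        self.cache.delete(key)


class ToyCacheMgr(BaseCacheMgr):
    """Manager for the hypothetical ToyCache."""
    has_cull = True
    scheme = 'toy'

    def __iter__(self):
        # Yield one dict per entry, in the shape the base class documents.
        now = datetime.datetime.now()
        for key in sorted(self.cache._data):
            value, expires = self.cache._data[key]
            yield {'key': key,
                   'short_key': key[:20],  # arbitrary display length
                   'repr': repr(value),
                   'expires': expires,
                   'expired': expires <= now}

    def clear(self):
        self.cache._data.clear()

    def cull(self):
        # Assumption for this toy: culling means dropping expired entries.
        # Collect the keys first so the dict is not changed mid-iteration.
        expired = [row['key'] for row in self if row['expired']]
        for key in expired:
            self.delete(key)
```

A manager is created with something like `ToyCacheMgr(cache, None, {})`, after which iterating over it gives the rows for the table, and `cull()` or `clear()` back the corresponding buttons.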

Other features that need to be implemented:

  • Clear Expired Button
  • Paged View (like the admin)
  • Sort Table heading links (like the admin)
  • Search Fields (like the admin)
  • Move the repr/short_key/expires helper code into the base class.

November 11th, 2007

PyCon 2008 Needs YOU!

There is less than a week left to submit proposals for PyCon 2008 Chicago. This is a community conference, and we need the community to step forward and make this conference a success. After the submission deadline (Friday November 16th), there will be a further week where the Program Committee will work with authors to get the proposals finalized. So don’t feel you need to have a talk fully fleshed out to submit. We have some sample proposals so you can see exactly what a proposal entails. We are particularly interested in Panel Discussions, which were quite popular last year.

Don’t think you’re qualified? Think again!

If you think you do not have enough python experience, or are not part of some core community or project, you are exactly the type of person we want to hear from! You are the core of the python community: those who use python to solve problems, no matter how small, or just to have fun. Don’t feel you need to give the presentation alone! Find a friend or two and do the talk together. Still don’t feel comfortable enough? Consider joining us and holding an Open Space talk at the conference. These are unstructured free form talks by people interested in a common topic or issue. Also consider giving a 5min Lightning Talk.

Talks can be about anything, even things unrelated to Python. We have had talks on patents and licensing, talks on writing your first python module, and even talks on human brain modeling. Many attendees are new to python, so talks on basic python are highly prized. The majority of talks are 30min long, with 5min setup, and 5min or more for questions is recommended. There are 45min slots for more detailed discussions. PyCon is at its heart an unconference.

The power of an Unconference. 

At many technical conferences, it is hard to decide which talks to attend, which ones will be of greatest interest, or worse, you want to see everything. With an hour long (or longer) presentation you are guaranteed to ‘lose’ parts of the audience, as not everyone will be equally interested in all aspects of a presentation. This is just a fact of presenting. There is also little chance for follow up between the speakers and the attendees. PyCon addresses these issues on many levels.

PyCon keeps the talks short, which helps to keep things interesting and keeps the attendees hungry for more information. Speakers are encouraged to hold follow-up Open Space talks to dive into details that attendees are interested in. Only those truly engaged in the discussion will attend the Open Space follow up, and a more meaningful interchange of ideas can ensue, in a more personal environment. Even if the speaker is otherwise engaged, attendees can hold the Open Space talk on the spot if they wish. Presentations are not to be seen as a one way dissemination of information, but as a spring board to deep sharing of knowledge and experiences.

And I haven’t even mentioned the tutorials, parties, BoFs, labs or sprints yet!
