After finally being done with the talk proposals on Sunday, I decided to take a break and read some of those Python blog thingies to see what the clean people are up to. I was hoping to find a few topics of minutiae I could lose myself in. What I found was a ton of posts on the new Google Chart API. Now, I have been looking for a good chart solution for quite some time. I even discussed certain options over at Marty Alchin's blog. I have looked at DojoX Charts, Open Flash Chart, and many others. Dojo has the best charts, but the API is a PITA to figure out: the docs are mostly auto-generated from the code, and the samples are next to useless. I know I can do great things with it, but I recently spent 3 hours on it and got nowhere. Open Flash Chart got me up and running in no time, and has some nice Python bindings, but it's a Flash-based solution. So when I saw the Google API, I dove straight into its docs to see what was up, and forgot all about the blogs. Let's see what happened next, shall we?…
Within five minutes I had broken out Wing IDE and was coding frantically. It was a sheer joy to think about. The API is simple and robust, and I knew exactly the chart I wanted to make: not a simple dinky chart, but a full-fledged stacked area line chart with a color for each of multiple data plots and a ton of data for each plot.
One of the additions to the proposal system is a complete change history on the proposals. This is implemented with the Django admin log, so that even changes made in the admin are captured (though with less detail). You can perform one of three actions: add, change, and delete. We have four primary objects in the proposal system: proposals, reviews, comments, and attachments. New attachments count as proposal edits for the graph, and you cannot edit or remove comments or attachments. You cannot delete anything. So this broke down into five data plots: new proposal, edit proposal, new review, edit review, and comment. I wanted to show each of these changes on a per-day basis, with the stacked area accounting for the total number of edits that had occurred up to that point in time. In short, I dove into one of the harder graphs.
In truth, the real work was generating the data I wanted to plot. The Google Chart API is a generic plotting API, so you need to scale and convert your data to match the type of graph you want to plot. Not a big deal; anyone who has done any real work with plotting packages can do this blindfolded, and the Google API is so simple it makes it fun.
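To give a feel for that data-generation step, here is a minimal sketch (not the actual proposal-system code) that turns per-day counts into stacked running totals, so that each plot sits on top of the ones before it and the top line is the total number of changes so far. The `SERIES` names and the `stacked_cumulative` helper are hypothetical; the real counts came out of the Django admin log.

```python
from itertools import accumulate

# Hypothetical names for the five plots described above.
SERIES = ["new proposal", "edit proposal", "new review", "edit review", "comment"]

def stacked_cumulative(daily_counts):
    """Turn per-day counts into stacked running totals.

    daily_counts maps each name in SERIES to a list of per-day counts
    (all lists the same length). Returns name -> list of y values, where
    each series is stacked on top of the series listed before it.
    """
    # Running total per series: changes made up to and including each day.
    running = {name: list(accumulate(daily_counts[name])) for name in SERIES}
    stacked = {}
    below = [0] * len(running[SERIES[0]])
    for name in SERIES:
        # Build a new list each time so every series keeps its own values.
        below = [b + r for b, r in zip(below, running[name])]
        stacked[name] = below
    return stacked
```

These lists are still raw counts; they need to be scaled into one of the encoding ranges before they go anywhere near a URL.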
With that done, I decided to change my code a bit to add the ability to restrict the time frame, so I could see only the post-deadline changes.
The code to generate the graphs is now part of the project, and I will integrate it somehow for next year. In total I spent about an hour, and it was quite an enjoyable hour. With that done, I showed off my work to a few people, and then went back to the blogs to see what people were saying about the API.
What I read shocked me. There were a number of people who really liked the API, but I could not find anyone who had actually bothered to fully read the documentation or use it! There were complaints about negative numbers and hidden or undocumented features, and other garbage that is not really worth discussing. There were a few people who ‘got it’, and in general people thought it was cool and interesting. There were a few rants that are not worth mentioning.
So I think I will tackle the biggest misconception I have seen thus far. The Google Chart API is NOT for plotting your raw data points! It does not deal with dates. It does not deal with scales. It does not deal with negative numbers, log scales, or fancy data; but it CAN plot them! Why? Because it is just a basic plot package, and it has to live with the restrictions placed on URLs, since the URL is the data transmission layer.
Here is what I mean. Say you have data ranging from 0 to 10, and you send Google that data to plot using the text encoding (0.0 to 100.0). It will plot it all right, but only in the bottom 10% of the graph. You need to scale the data up to one of three plot ranges. The docs go out of their way to highlight the distinction between actual and plotted data. These ranges are determined by the limitations of the URL.
The first is the 0-61 scale supported by the 'simple' encoding. This is the best way to get a lot of data points sent to Google, because it only requires a single character per data point. If you don't mind the simple resolution of only 62 discernible points on the Y axis, this is for you. The second is the 'text' encoding: a 0-1000 range using essentially percentage notation with one decimal place. The docs say 0.0 to 100.0, but 1K by any other name is still 1K. As this requires a whopping 5 characters per data point on average (you have to count the separator), I see no reason to ever use it, but I understand that it is good for people who want to be able to read and hand-type the chart data. The last is the 'extended' encoding, a two-character base-64 encoding that allows a range of 0-4095. Depending on the encoding you use, you must rescale your data to match. So if you have negative values in your data, you must shift and rescale your data, and draw a new X axis line on the chart where your scaled 0 value is. Why not have the API support math conversions? Because there are so many of them, and because of the URL encoding.
You have a limited number of characters in a URL, and you need to optimize for that.
Let's dive deeper into the 'extended' encoding. This is most likely the only encoding I will ever use. Unfortunately, the JavaScript sample code it comes with is complete garbage and, at first blush, looks broken, limiting people to only 3844 unique values. I fear it will turn people off what is a very simple encoding. Let's walk through an evolution of encoders in Python [NOTE: all code on this page is in the public domain]:
```python
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')
# num must already be an int in 0..4095; // keeps the first index an integer
gc_extended = lambda num: GC_EXTENDED_MAP[num // 64] + GC_EXTENDED_MAP[num % 64]
```
Here we have a very simple number-to-two-character Google extended encoding. This assumes that the number has already been scaled to between 0 and 4095. Yup, that's it, folks. So now what we want is something that deals with the scaling for us.
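A quick way to convince yourself that the two-character scheme really covers the full 0-4095 range (64 × 64 values, not the 3844 the sample code suggests) is to decode every code again. The `gc_extended_decode` helper is hypothetical, written only for this check.

```python
GC_EXTENDED_MAP = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-."
)

def gc_extended(num):
    """Encode an already-scaled integer 0..4095 as two characters."""
    return GC_EXTENDED_MAP[num // 64] + GC_EXTENDED_MAP[num % 64]

def gc_extended_decode(pair):
    """Hypothetical inverse: two characters back to an integer 0..4095."""
    hi, lo = pair
    return GC_EXTENDED_MAP.index(hi) * 64 + GC_EXTENDED_MAP.index(lo)

# Every value from 0 to 4095 gets its own code and decodes back to itself.
codes = [gc_extended(n) for n in range(4096)]
assert len(set(codes)) == 4096
assert all(gc_extended_decode(c) == n for n, c in enumerate(codes))
```

The endpoints are `gc_extended(0) == 'AA'` and `gc_extended(4095) == '..'`. Scaling comes next.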
```python
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')

def gc_extended(num, max=4095):
    # scale raw 0..max into 0..4095, truncating to an int
    scaled = int((num * 4095) / max)
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]
```
This is better, but there are still some issues. The first one I will tackle is rounding. The sample JavaScript code that Google provides uses round. I personally do not like JavaScript's round implementation, as I have found it to be very slow on IE 6; I have no clue why. I also rarely use round in Python when dealing with integers. This is mainly due to laziness, but more on that in a bit. Unless you have a raw data range of around 6K or greater, you do NOT need to use round. Because you are scaling up, a difference of 1 is most likely going to be well within your error margin, or below your precision anyway. But let's say you do want to deal with rounding, and handle floating-point data at the same time:
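```python
# add 0.5, then truncate: rounds half up, which is fine since scaled is never negative
scaled = int(((num * 4095.0) / max) + 0.5)
```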
Done. Cheap and sleazy rounding, and no 'import math' needed. This is also faster in JavaScript on IE 6; MUCH faster (no clue why; maybe their round is a wrapper for fround?). OK, back to some real issues. Many times you will want to plot negative numbers. You could shift your data and then scale, or you could manage that as part of the encoding. Managing it as part of the encoding is slower in some respects (as you are repeating math), but it does simplify the code a bit.
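For comparison, the shift-then-scale route is a separate pass over the data before encoding. This is a minimal sketch with hypothetical helper names (`shift_and_scale`, `zero_line_fraction`); the second helper works out where raw 0 lands as a fraction of the plot height, which is where the new X axis line would go. How you actually draw that line on the chart is left out here.

```python
def shift_and_scale(values, vmin, vmax, top=4095):
    """Shift raw values so vmin maps to 0, then scale so vmax maps to top.

    Uses the same add-0.5-and-truncate rounding as above; assumes
    vmin <= v <= vmax for every value, so results stay within 0..top.
    """
    span = vmax - vmin
    return [int((v - vmin) * top / span + 0.5) for v in values]

def zero_line_fraction(vmin, vmax):
    """Fraction of the plot height (0 = bottom) where raw 0 ends up."""
    return (0 - vmin) / (vmax - vmin)

print(shift_and_scale([-10, 0, 10], -10, 10))  # -10, 0, 10 -> bottom, middle, top
print(zero_line_fraction(-5, 15))              # raw 0 sits a quarter of the way up
```

The encoding-time version folds the same shift into `gc_extended` itself, at the cost of redoing the subtraction for every point.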
```python
def gc_extended(num, min=0, max=4095):
    # shift so that min maps to 0, then scale the range into 0..4095
    scaled = int(((num - min) * 4095) / (max - min))
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]
```
Now we are getting somewhere. But there is one last issue. We may not know the full range of values, and instead want to ceil/floor errant values. For example, when dealing with signal processing, I know the signal will normally be within a given range (±10 dB), but I also know that plugging in or unplugging equipment can sometimes cause measurement spikes in the data at 100 dB. These values are 'real' in that they happened, but they would skew the graph. We want to peg those to the local min/max:
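```python
def gc_extended(num, min=0, max=4095, floor=None, ceil=None):
    # floor/ceil are raw (unscaled) values; they default to the plot range
    # so that out-of-range spikes get pegged instead of overflowing the map
    floor = min if floor is None else floor
    ceil = max if ceil is None else ceil
    if num < floor: num = floor
    if num > ceil: num = ceil
    scaled = int(((num - min) * 4095) / (max - min))
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]
```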
There; that simplifies things, and with floor and ceil given as pre-scaled (raw) values, the API is simpler (if a bit slower). One last thing to deal with: many times you need to handle missing data points. Say you are plotting two lines whose data was collected at different points on the X axis. You have three options: you can give Google the data for both X and Y in pairs; you can give the data for just Y and fill in the missing points with the 'missing' data marker '__' (the simple and text encodings also support missing data); or you can use a combination of both. Picking the right one can drastically reduce the length of the URL: supplying pairs wins when the series share few X values (so Y-only data would need lots of missing markers), while Y-only data with missing markers wins when most X values are shared.
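Here is a rough sketch of that trade-off, assuming the values are already scaled to 0..4095. `align_on_union` lines the series up on the union of their X values, with `None` wherever a series has no sample, and `encoded_lengths` compares the Y-only approach against explicit (x, y) pairs. All names are hypothetical, and the character counts ignore separators and every other URL parameter.

```python
GC_EXTENDED_MAP = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-."
)

def encode_point(scaled):
    """Two-character extended code, or '__' for a missing point (None)."""
    if scaled is None:
        return "__"
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]

def align_on_union(series):
    """Align a list of {x: y} dicts onto the sorted union of their x values."""
    xs = sorted(set().union(*series))
    rows = [[s.get(x) for x in xs] for s in series]
    return xs, rows

def encoded_lengths(series):
    """Data characters needed: (Y-only with '__' markers, explicit x/y pairs)."""
    xs, rows = align_on_union(series)
    y_only = sum(len("".join(encode_point(v) for v in row)) for row in rows)
    pairs = sum(4 * len(s) for s in series)  # 2 chars for x + 2 chars for y
    return y_only, pairs

shared = [{0: 100, 1: 200, 2: 300}, {0: 50, 2: 150}]
print(encoded_lengths(shared))  # mostly shared X values: Y-only is shorter
disjoint = [{x: 1 for x in range(i, 15, 3)} for i in range(3)]
print(encoded_lengths(disjoint))  # no shared X values: pairs are shorter
```

A real encoder could compute both numbers per chart and pick the smaller one.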
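```python
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')

def gc_extended(num, min=0, max=4095, floor=None, ceil=None, missing=None):
    if num == missing:
        return '__'  # extended-encoding marker for a missing point
    # floor/ceil are raw values; default to the plot range so spikes get pegged
    floor = min if floor is None else floor
    ceil = max if ceil is None else ceil
    if num < floor: num = floor
    if num > ceil: num = ceil
    scaled = int(((num - min) * 4095) / (max - min))
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]
```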
There; done. That is all the Python code you need to encode any range of numbers into scaled Google Chart extended-encoding values. You can have your 'missing' data point be -1 and it will still work just fine. You can ceil and floor, and deal with all the rest. If you use the Django curry utility, you can have even MORE fun! Let's take a look:
```python
import statistics
from functools import partial  # stdlib equivalent of Django's curry

def encode_data(raw_data):
    """Encode the data for plotting 5 standard deviations around the average,
    with missing data (-1) treated as 0.0 for the compute
    (common bell curve compute).
    """
    values = [0.0 if x == -1 else x for x in raw_data]
    avg = sum(values) / len(values)
    dev = statistics.pstdev(values) * 5 or 1.0  # avoid a zero-width range
    encoder = partial(gc_extended, min=avg - dev, max=avg + dev, missing=-1)
    return ''.join(encoder(num) for num in raw_data)
```
I love Python. Now let's bring this full circle, back to that blog post Marty made on data visualization in Django. Without too much difficulty we could construct some standard data-based graphs for the Django DataBrowse contrib app. These would be simple but extensible graphs. A Django view would generate the Google Chart API URL and then return a redirect to that URL! No charting packages or JavaScript to write; just some simple Python which, unlike other charting packages, could be checked into Django contrib, as it does not require any other packages. Another cool use is having dynamic, up-to-date charts in your S5 presentation! This is just too cool.
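Here is a minimal sketch of the URL-building half of such a view. It assumes the 2007-era endpoint `http://chart.apis.google.com/chart` and the documented `cht` (chart type), `chs` (size) and `chd` (data, with the `e:` prefix for extended encoding and commas between series) parameters; `chart_url` and its defaults are hypothetical, and Google has since deprecated the Chart API, so treat this as an illustration rather than a working integration.

```python
from urllib.parse import urlencode

# Assumption: the original Chart API endpoint (the service is deprecated).
CHART_BASE = "http://chart.apis.google.com/chart"

def chart_url(encoded_series, size="400x200", chart_type="lc"):
    """Build a chart URL from already extended-encoded series strings."""
    params = {
        "cht": chart_type,                        # 'lc' = line chart
        "chs": size,                              # width x height in pixels
        "chd": "e:" + ",".join(encoded_series),   # extended-encoded data
    }
    # Leave ':' and ',' unescaped so the data stays readable.
    return CHART_BASE + "?" + urlencode(params, safe=":,")

print(chart_url(["AAgA.."]))
```

Inside a Django view, the redirect would then be something like `return HttpResponseRedirect(chart_url([encode_data(values)]))`.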

You are such a nerd, Doug. =)
That being said, I was excited when Google announced this as well.
Infinitely useful; someone should put this in one of the many Google Chart APIs for Python (pygooglechart, for example).