Well, this was going to be three or four posts, but thanks to an interesting announcement from Google, it all sort of runs together.
It still will be, I think. I will most likely try to rewrite things to give an overview and go into detail on specifics later. Things are getting interesting at work, so we will see how much time I have to pull that off.
Files
Ivan beat me to the punch on the main gist post. While at PyCon I had the opportunity to chat with Mike Fletcher, another OLPC volunteer whose name I forget, Phil Hassey, Richard Jones, Jeff Rush, and about 5 other people who wandered in and out of the small sprint room we were all half passed out in. People came and went during the discussion (I believe Richard and Phil went off to play a board game at one point as well), which ranged from modern Sci-Fi offerings to games to global warming being a net win for Canada to the history of the world (not the movie). I should have gone to bed well before the discussion started.
The discussion turned to the object store on the OLPC platform. Jeff, coming from a ZODB background, was quite pro object store systems replacing ‘file systems’. This is a hot button topic with me. It has come up at every professional job I have had, going all the way back to when I was a co-op at Motorola as a ‘Document Administrator’ (secretary). In fact the only two topics which are more hot button for me are ‘common application UI frameworks’ and ‘security after the fact’.
I first started thinking about this subject back in ’93 when I first started working on MUDs (Warcraft, only 100% text, for you youngins). The world was editable online (like a Lisp MUSH) but also had revision history (via RCS initially). We were dealing with ‘serialization’ and how objects were managed. I fell in love with the idea that everything could be described as having a set of attributes (tags), and that really you wanted to store and manage these things by those attributes. Permissions were nothing more than attributes. Actions were nothing more than attributes. Metadata by definition was just attributes. We struggled with systems for this, but I came away convinced that we needed a new paradigm in object storage, and that this ‘file’ stuff was running on borrowed time.
It came up again at Motorola for document management. It came up again at OpenVision (later Veritas) for backup and security compliance. It came up again with ClearCase and Derived Objects. It came up again with ‘dictionaries’ and data management for VoiceXpress. And the code base I currently work on has something called ‘DFiles’, which I cannot discuss except to mention the name (DRAT!)
Storage
Back to the discussion at PyCon. I wish I had a transcript of the discussion (no I don’t… I was not as coherent as I think I was). The idea that everything is just blobs in a cloud of data where the tags determine the meta-structure is nice, but there are some problems.
The first and most obvious problem is that it does not integrate well with existing technology and libraries. Decades of software have been written around the concept of files. You can try a fake mapping, and try to integrate things, but it does not work well.
Then there is the concept of ‘sub-blobs’. That is, each piece of data could have sub-parts. This maps well to your document, which might have a chart or spreadsheet as part of it, for instance. This can greatly simplify serialization, and you get all those nice blob store things. Your in-memory structure is your serialization structure. But in reality we already have this. They are called files and directories. It is simply (*cough*) an implementation detail dealing with the storage mapping. OK, there is nothing simple about it, but we will come back to this.
The argument then turned to the claim that you can’t have a blob show up in more than one directory. False. Those are called symlinks, but again that is an implementation detail.
One of the biggest benefits of an object-store-as-filesystem is the ability to find and manage things without a rigid tree structure, which does not scale well in the average human brain (where did I put my (ssh) keys again?). But in practice it is just replacing one confusing arbitrary structure with another on some level, as its usefulness is measured by the quality of the tags, attributes, and indelibility of the data. If you had those things well defined in a directory tree structure, then it works just as well (as Google Desktop Search proves).
A more subtle problem is that not all tags/attributes are created equal. It took a long time for my betters and practical experience to prove this to me. Many attributes are only useful to programs. These programmatic tags are for relating data, validation, encoding, and the like. Most of the time these are auto-generated or involve mathematical computations. They are never intended for human interpretation, but are nonetheless crucial for data management. You can try to predetermine the different types of these meta attributes, or just lump them together, but neither of these approaches is really tractable. Spend some time deep diving into the abuses of the Windows registry and you begin to get an idea of the issues.
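To make the symlink point a bit more concrete, here is a minimal sketch of one blob showing up in more than one ‘directory’. It assumes a POSIX-style filesystem where symlinks are allowed, and the names `tag_file`, `demo`, `store`, and `tags` are made up for illustration; nothing here comes from the OLPC object store itself.

```python
import os
import tempfile


def tag_file(tag_root, path, tags):
    """Expose `path` under tag_root/<tag>/ for every tag, using symlinks."""
    links = []
    for tag in tags:
        tag_dir = os.path.join(tag_root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(path))
        # lexists() is True even for a dangling link, so we never clobber one.
        if not os.path.lexists(link):
            os.symlink(path, link)
        links.append(link)
    return links


def demo():
    """Create one real file and make it visible under two tag directories."""
    root = tempfile.mkdtemp()
    store = os.path.join(root, "store")
    os.makedirs(store)
    doc = os.path.join(store, "notes.txt")
    with open(doc, "w") as handle:
        handle.write("sprint room notes")
    links = tag_file(os.path.join(root, "tags"), doc, ["python", "people"])
    return doc, links
```

Calling `demo()` leaves a single copy of the data on disk, reachable through both `tags/python/notes.txt` and `tags/people/notes.txt`. Keeping those links in sync with the tags is the hard part, which is exactly the ‘implementation detail’ being waved at.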
I know I am glossing over all the details, and not really giving any points the attention they deserve. I am not even properly quantifying the points. Issues of language are being skipped over completely (try describing what a ‘word’ is in your application; try again when that application deals with speech and natural language… how does that abstract into meaningful tags?). Oh well. The point is there must be a happy medium. We should be able to have something which has a file system programmatic interface, as well as a generic data store interface. The browsing of the data should be an abstraction. Whether this is implemented with a classic journaling file system or in a database should be an implementation detail at the filesystem level. Why invent a new abstraction layer which everyone must now implement against when we have a perfectly good one that everyone already does? A file by any other name still contains data. If this is such a good model, think about extending it to namespaces. The problems in software code management (which is just data on a very real level) for which namespaces were invented exist on the filesystem as well. Chew on that while you code with Matrix.Optimizer and Optimizer.Matrix.
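As a rough illustration of that happy medium, here is a toy in-memory store that answers both file-style and tag-style questions about the same data. The class `TaggedStore` and its methods are hypothetical names invented for this sketch; a real version would sit on top of a journaling file system or a database, and that backing choice would stay hidden behind the same interface.

```python
import posixpath


class TaggedStore:
    """Toy store: one set of blobs, two ways of looking at it."""

    def __init__(self):
        self._blobs = {}  # path -> bytes
        self._tags = {}   # path -> set of tags

    # --- file-system style interface ---
    def write(self, path, data, tags=()):
        path = posixpath.normpath(path)
        self._blobs[path] = data
        self._tags[path] = set(tags)

    def read(self, path):
        return self._blobs[posixpath.normpath(path)]

    def listdir(self, directory):
        """Return the immediate children of `directory`, sorted."""
        directory = posixpath.normpath(directory)
        prefix = "/" if directory == "/" else directory + "/"
        names = set()
        for path in self._blobs:
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)

    # --- generic data-store style interface ---
    def find(self, *tags):
        """Return every path that carries all of the given tags, sorted."""
        wanted = set(tags)
        return sorted(p for p, t in self._tags.items() if wanted <= t)
```

For example, after `store.write("/docs/talk.txt", b"slides", tags=["python", "pycon"])`, the same blob can be reached with `store.read("/docs/talk.txt")`, discovered with `store.listdir("/docs")`, or queried with `store.find("python")`. Old code keeps using paths, new code can ask by attribute.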
Google has an interesting take on all of this. All of their services (news, documents, reader, calendar, mail, blogger, etc.) have a file-like data storage for the objects represented. They use folders/directories (really tags). The only restriction is that the folders are only one level deep. I do not care for this myself. I would love to be able to have a ‘people’ folder under my ‘python’ folder and have only those items tagged with both ‘python’ and ‘people’ under that ‘folder’. Maybe that is just me. I would not want these sub-folder relations to be automatic. I would want control over the layout, but have the population automatic. But that is the only extension to their system I would like to see. Beyond that it just works. It works with both the object store model and the file/directory model. If only Google would open up their APIs a bit more to include this system. Oh wait, they just did. You know, if I had hit ‘publish’ on this post last evening when I first wrote most of this, I would have been ‘prophetic’ or at least ‘first post!’.
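The nested-folder wish is small enough to sketch. Assuming each item is just a name with a set of tags (the `items` dictionary and the function `folder_contents` below are hypothetical, not anything Google exposes), a folder path chosen by the user acts as ‘must carry every tag on this path’, and the contents fill themselves in:

```python
def folder_contents(items, folder_path):
    """Return the names of items that carry every tag named in folder_path.

    `items` maps an item name to an iterable of tags. `folder_path` is a
    user-chosen layout such as "python/people"; each path component is a tag.
    """
    required = {part for part in folder_path.split("/") if part}
    return sorted(name for name, tags in items.items() if required <= set(tags))


items = {
    "pycon-notes": {"python", "people"},
    "django-howto": {"python"},
    "holiday-card-list": {"people"},
}
```

Here `folder_contents(items, "python/people")` picks out only `pycon-notes`, while `folder_contents(items, "python")` also includes `django-howto`. The layout (which paths exist) stays under my control; only the population is automatic.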
It’s not all hearts and ponies and sparkle (even if it is Python and an abstraction layer on top of Django to boot!!!). I have been holding off on posting this err… post until I could formulate a non-reactionary opinion on the entire Google Apps thing. I now have an opinion, and it is much along the same lines as Duncan McGreggor’s. My issues are similar to his and yet distinct, and I will post on them separately.
Python, and Conference Software
This post is already too long, and my laptop battery is dying (no, the charger is at work). Those of you that I talked with at PyCon about UnConference hosting know what this is all about, and I told you so ;-). The last piece just fell into place. With that, good night.
