Well I think I disproved #8 but proved #9 with my last post. Writing something even if it is not ‘perfect’ is not always the best option, when what you do write is drivel.
I started to write a response to Eric’s comment and realized it was really a separate blog post, so here it is.
Eric gets a bit confused about my take on ‘tags’ and ‘directories’ and how I would like to have sub-folders on google apps where I have django/people as an example. This is because my explanation of what I was trying to describe was utter crap. I had a concrete example from working on the Memorize activity on the OLPC, but I left that part out for reasons I won’t go into. I think a concrete example is needed however, as these meta concepts are too vague unless you nail them down. So with that said, I will use the google RSS reader as the application in question which requires an object store.
I have been using the google reader app for about a week now, and I must say it annoys me a little. So you have your RSS feeds and you can stick them into ‘folders’, which are more like tags. A feed can be in multiple ‘folders’ at the same time. When you select a folder you see all the entries from all the feeds in that folder. You can also select the individual feeds. The low level objects are the individual posts, and life at first is good. This system maps quite nicely onto the concept of an object store. Really the ‘folders’ are nothing but a semantic term for a tag or meta-grouping of the feeds and the objects in those feeds. The feed names are nothing more than meta-information on the posts as well. You could imagine this as the ideal example of using an object store over using a file system. But as we will see there are problems.
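To nail the model down a bit, here is a minimal sketch of ‘folders’ as tags on feeds. This is only my guess at the shape of the thing, not how the reader actually stores anything; the `Feed` class, the `posts_in_folder` helper, and the feed names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    name: str
    folders: set[str] = field(default_factory=set)  # 'folders' are really just tags
    posts: list[str] = field(default_factory=list)  # post titles, the low level objects


def posts_in_folder(feeds, folder):
    """Return (feed name, post) pairs for every feed tagged with `folder`."""
    return [(f.name, p) for f in feeds if folder in f.folders for p in f.posts]


# Hypothetical feeds; one of them lives in two 'folders' at once.
feeds = [
    Feed("planet-python", {"python"}, ["PEP news"]),
    Feed("some-dev-blog", {"python", "people"}, ["My week", "Django tips"]),
]

for folder in ("python", "people"):
    print(folder, posts_in_folder(feeds, folder))
```

Selecting a ‘folder’ is nothing more than a filter over the feed tags, which is why a feed can show up in several places without being copied.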
No Sub Folders
There are no sub-folders. Why would I want sub-folders? (Yes the below are all the same problem to some extent, but manifested as different symptoms; but I will connect them up later)
1. Only a single level of organization imposed on me
Many systems limit the top level tags shown to those with more than N items in them, and then you cross link/relate from there. Others use tag clouds. Neither really works for me as those are automated systems imposing their order on me, not the other way around.
2. Single level organization becomes unmanageable
Let’s look at this blog as an example of this. Look at the left column at the tags on my posts. I rarely post, and yet look at that list! Imagine managing a full machine, OS, data blobs, etc like that! The truth is I already have 20 folders in GMail, and 30 in google reader. I have feeds and mails being sent to multiple folders (tags) and after a while I can’t find anything!
3. No means of customizing the organization
I really want to have a sub folder python/people. This would only contain the entries that are tagged both python and people. The top level ‘folder’ people would contain all the feeds tagged as ‘people’, but would not necessarily contain a sub-folder ‘people’. I want to be the one to determine the breakdown. I want to impose the order I want on the system, not the other way around. This organization is for my benefit, and I want control over its customization. The drill down of a folder tree makes perfect logical sense. This does not preclude a system for selecting unions, exclusion, and the like in a more advanced (complicated) interface, but you need something which scales. A flat single level does not scale.
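A rough sketch of what I mean by python/people, assuming a sub-folder path is read as ‘every tag along the path must be present’. The `resolve` function, the feed names, and the `my_folders` list are hypothetical; the point is only that the user declares which folders exist and the system fills them in.

```python
def resolve(path, tags_by_feed):
    """Return the feeds that carry every tag named in a slash-separated path."""
    wanted = {part for part in path.split("/") if part}
    return sorted(name for name, tags in tags_by_feed.items() if wanted <= tags)


# Hypothetical feeds and their tags.
tags_by_feed = {
    "core-dev-blog": {"python", "people"},
    "planet-python": {"python"},
    "friend-blog": {"people"},
}

# The tree is chosen by the user, not derived by the system:
# there is a python/people folder, but no people/python one.
my_folders = ["python", "python/people", "people"]

for folder in my_folders:
    print(folder, resolve(folder, tags_by_feed))
```

Unions and exclusions could be layered on top of the same tag sets later; the drill down is just the simplest case, set intersection.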
Limited Modality
One of the great promises of the meta-tagged object stores is the ability to magically use this data for cool new representations and visualizations of that data. The google reader has some of that. You can select to only see unread entries, allow for list, preview, or full view. There are even statistics! You can see ‘starred’ entries. Sort by date etc. But the grander promise is still unfulfilled.
1. No calendar integration
These things have dates and times on them. google has fantastic calendar widgets. It should be automatic, right? There should be no code to write, just something to plug in and have it ‘just work’. So where is it?
2. No custom meta data
We have starring, and marking for ‘sharing’, but those things and folders are meta-data which is forced on me. I want to add personal notes to myself on posts or feeds. I want to add my own flag types like ‘I commented on this’, and ‘needs followup’; sometimes a star is nothing more than a piece of twine on your finger and just does not cut it. Object stores should just provide these things and somehow understand them for cool new visualizations if they are of a predefined meta type…. New interesting statistics should be possible on my personal meta data and the graphs should be automatic.
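Something along these lines is what I have in mind, sketched under the assumption that posts have stable ids. The `Annotations` class and its method names are invented for this example and do not correspond to anything the reader exposes.

```python
from collections import Counter, defaultdict


class Annotations:
    """User-defined flags and notes, kept separate from the posts themselves."""

    def __init__(self):
        self._flags = defaultdict(set)   # post id -> set of flag names
        self._notes = defaultdict(list)  # post id -> list of free-text notes

    def flag(self, post_id, name):
        self._flags[post_id].add(name)

    def note(self, post_id, text):
        self._notes[post_id].append(text)

    def notes_for(self, post_id):
        return list(self._notes.get(post_id, []))

    def with_flag(self, name):
        """Post ids carrying the given flag, sorted for stable output."""
        return sorted(pid for pid, flags in self._flags.items() if name in flags)

    def flag_counts(self):
        """The 'automatic statistics': how often each flag type is used."""
        return Counter(name for flags in self._flags.values() for name in flags)


ann = Annotations()
ann.flag("post-1", "commented")
ann.flag("post-1", "needs-followup")
ann.flag("post-2", "needs-followup")
ann.note("post-2", "reply once the patch lands")

print(ann.flag_counts())
print(ann.with_flag("needs-followup"))
```

Because the flag names are plain data, a new flag type needs no new code, and the counts could feed whatever graph widget happens to be lying around.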
3. No meta-discovery
We are bringing in this data from external sources and it is already tagged with plenty of meta data. I should have access to that as well.
I am being unfair
The google reader is an incredible system, and you really do not want it to be this huge complex system. It should be simple and do one thing well: allow you to read feed entries you care about. But really I am using it (unfairly) as a prototype for a theoretical object store system. So let’s talk about that for a bit. With that said I would love to have a few months working on the reader app to extend its functionality and do it in a way that maps directly onto GMail and the other offerings (not gonna happen).
What does the object store look like?
What is the object store behind the reader?
1. Is it a file system with a directory per feed, a file for the meta-information on that feed, a file for the feed entries, and a file for the search index? I know of one reader that does it that way.
2. Is it a true dynamic object store with the meta information stuck to each rss entry where the feed name and information are just meta-tags? Doubtful.
3. Is it a database back end which is shared by multiple reader users and broken into complex table relations? Most likely, but it doesn’t have to be.
4. In the end does it really even matter? No it doesn’t.
Herein lies the biggest problem with most attempts at object store systems. They are attempts to capture the visualization and map it directly to the storage. The memory layout, visualization, storage, indexing, and security are all munged together into one tangled mess.
Design for storage, not for representation
The object store layout should not be a 1:1 mapping to a visual display. That is what we have with current file browsers, directories, and files. The file browser represents the file system implementation in all its grueling detail. No wonder you can’t find your files. On most Linux systems file names which differ only in case are different names. Some love it, and some hate it. Does it really matter if there is a good visualization abstraction? No, the abstraction hides the implementation details that the user should not care about. There also needs to be an abstraction layer for the interface to the storage. Something for the developers to use which is simple and does not expose the implementation details of said storage.
Most people who start to tackle implementing an object store replacement for a file system take the reverse approach and attempt to make the object store implementation represent the particular visualization or visualizations they are thinking of. They design the interfaces for storing the data to expose the details of the implementation, exposing the meta-data as a core component. For instance, you need to first save to the temp storage, then push to the long term storage. They expect the storage to glean some important information from the meta-data (image dimensions, sampling rate, author, history). At some level this is needed (re: security), but most systems I have seen take this too far. What you want to do instead is figure out what the real issues are, and design the storage system for storage management, not for the meta-systems. The meta systems should be used by higher level abstractions for viewing that storage in interesting ways. This is extremely hard to do properly. When done properly, it does not matter if the data and meta-data are coming from a database, a file system, or a cloud. That becomes an implementation detail. Some systems will work better for different types of data and meta-data. That is how it should be. There are no silver bullets.
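As a small illustration of the separation, here is a sketch of a storage interface with two interchangeable back ends and a ‘view’ that cannot tell them apart. Everything here (`ObjectStore`, `MemoryStore`, `DirectoryStore`, `titles`, the file suffixes) is hypothetical and far simpler than a real system would need; in particular it ignores security, indexing, and anything resembling a key that is not a plain file name.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """The only thing higher layers are allowed to know about storage."""

    @abstractmethod
    def put(self, key, data, meta):
        """Store bytes plus a dict of meta-data under a key."""

    @abstractmethod
    def get(self, key):
        """Return (data, meta) for a key."""


class MemoryStore(ObjectStore):
    def __init__(self):
        self._objects = {}

    def put(self, key, data, meta):
        self._objects[key] = (data, dict(meta))

    def get(self, key):
        data, meta = self._objects[key]
        return data, dict(meta)


class DirectoryStore(ObjectStore):
    """One data file and one JSON meta file per key (keys assumed file-name safe)."""

    def __init__(self, root):
        self._root = root

    def _path(self, key, suffix):
        return os.path.join(self._root, key + suffix)

    def put(self, key, data, meta):
        with open(self._path(key, ".data"), "wb") as f:
            f.write(data)
        with open(self._path(key, ".meta.json"), "w", encoding="utf-8") as f:
            json.dump(meta, f)

    def get(self, key):
        with open(self._path(key, ".data"), "rb") as f:
            data = f.read()
        with open(self._path(key, ".meta.json"), encoding="utf-8") as f:
            meta = json.load(f)
        return data, meta


def titles(store, keys):
    """A 'view': it talks to the interface and never to the back end."""
    return [store.get(k)[1]["title"] for k in keys]


for store in (MemoryStore(), DirectoryStore(tempfile.mkdtemp())):
    store.put("post-1", b"<p>hello</p>", {"title": "Hello", "feed": "example"})
    print(type(store).__name__, titles(store, ["post-1"]))
```

The view code is identical for both stores; swapping in a database or a cloud back end would mean writing another subclass, not touching the visualization.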
At some point I would love to go over how I think all this can be achieved. Where the levels of abstraction are, and what the meta-management systems would look like. The concepts are actually quite simple. Security First, Storage, Meta-Meta, Meta, Index, Visualization. Configuration management systems are the closest to a complete implementation I have seen. I just do not have time for all these ‘pet projects’ which are years of work….
