
Steve Holden is participating in the 5K Race for Hope. He is looking for people to sponsor his run. Let’s show the other groups the generosity of the Python Community! (Sorry Team Hopkins, Steve got to me first.)
If Steve is willing to go the full 5K distance, we should be able to support him with some cash. With the exception of this past year, it has been his fundraising efforts which have kept PyCon so cheap. Let’s put some of those saved pennies towards a great cause!
Hmm…. I know I should be more concerned about what this says about my internet usage:
You have exceeded your 1027.63 gigabyte download limit
You have downloaded a total of 1027.67 gigabytes.
I was going to write a reaction to Ray Ozzie’s comments on Microsoft and Open Source. First off this is a blog post which just quotes some of his comments. One must be careful when that happens, and I would prefer to read an entire transcript. Still there is enough context there to work with. I have been waiting for over a year to hear from this titan. It saddens me to read these excerpts.
I never write a post all in one go, and while doing other things I saw another blog post about this subject. It is much better than anything I could have ever written and sums up every point I wanted to make; then continues on to a level I am incapable of. So go read that.
I was having a decent enough day until this came across my rss reader:
DreamWorks Acquires Rights for Ghost in the Shell
And another of my favorite franchises gets mutated, warped and ruined. “But it’s DreamWorks and Spielberg”, you say. Exactly. Not only do I get to see something I love destroyed, I get to watch DreamWorks produce a turkey. To be honest I do hope they can pull it off, but I really do not expect it to happen. Even the franchise creator Masamune Shirow botched it (GitS2: Innocence was utter crap). They plan on making ‘it’ a 3-D live action movie (I think I just threw up a little in my mouth), but are very vague on what exactly ‘it’ is. The original film from 1995 was revolutionary. It brought cyberpunk to a new level which has not been matched since; not even by the rest of the franchise.
When I first heard that a TV series, GitS: Standalone Complex, was going to be produced I was concerned, having just been betrayed by Innocence. I was dismayed when I saw the opening sequence. You see, in anime the opening and ending sequences say a lot about the show in an indirect manner. The opening is usually orders of magnitude better than the actual show. The GitS:SC opening, rendered on a Playstation2, was so bad that I wrote it off (don’t get me wrong, the music is top notch). That is until I heard from friends what I was missing. It turned out to be a good solid anime, shot through with moments of brilliance. Then came the follow up 2nd season, appropriately named GitS:SC 2nd Gig. The entire season was phenomenal. The opening actually lived up to the series (without surpassing the actual content of the show). The sub plots were all tied together superbly. There were a total of 2 ‘filler’ episodes and even those managed to move the other plots along (which again were part of a seamless whole). GitS:SC Solid State Society actually made up for Innocence, wrapping up the story and ending the franchise on a high note.
And therein lies the problem. The story is done. The only way to do anything more with it is to take it to a different country where the majority of people know nothing about it and re-do it in some botched way (à la ‘The Ring’, ‘The Visitors’, ‘Dark Water’, I could go on for pages…). Go see the originals of those films, especially ‘Dark Water’. No comparison. That movie gave me nightmares; I turned off the US remake because I was laughing.
What really scares me is that another of Masamune Shirow’s works (and another of my favorite manga+anime’s), AppleSeed, was recently redone in ‘live action based 3-D’, and… well… I could not watch more than 15min of it. I wish I had never seen those 15min as they have laid a taint on my memory. It was bad. I mean really really BAD. It’s Phantom Menace all over again (or at least Bubblegum Crisis 2040… why? The series was done… no need to ruin it!) [Yes I have the Priss Hurricane Video in original Japanese and Live Concert on import PAL VHS]
Postscript: What did we do before wikipedia and youtube!?!!?!?!?
I have been holding off on writing this post as I prefer to fully form an opinion.
At the writing of this there are almost 600 blog posts about the new google hosted application. Most seem to me to be flailing around the actual core of what this new little beastie is. Some are comparing it to Amazon offerings, some as a threat to commodity hosting, and some as the dawn of a new computing revolution. A few highly respected people see this as bringing application development to the people the way that html/aol/myspace brought web development to the people. Many see this as a validation that python is an enterprise level platform. While I believe python is just that, I do not yet see this as a validation. The validation comes with Google App Engine’s success. Not that the language needs this added validation. As for the revolution, time will tell.
Google is touting this as making the web a platform. A platform for development, essentially replacing the desktop as where applications get developed and deployed. They do all the busy work of setting up the hardware, configuring systems for monitoring traffic, setting up the database, setting up the source control system, bug tracking, and all the rest, and let you focus on writing the application. Also you get the power of Google’s massive data centers with literal warehouses of machines and disks and their custom database. They bill this as being a platform for building your web based applications, a user base, business, and revenue stream. Of course one revenue stream will be AdSense, further promoting the Google advertising juggernaut. This is all fantastic, but there are limitations (as there must be). I put the limitations at the end.
What is Google really up to?
Google is not releasing the App Engine in a void. There are many other services that google has been rolling out over time (and many quite recently) which need to be looked at in order to get a proper view of what is happening. First let’s step back a bit and look at Google’s past. In the past when google released a feature with an API, people would rush out and start building mashups. Mashups which combined parts of Google, and parts of other systems. Systems google often had little or no control over. Early on Google revoked some keys when things got out of hand. Very early on there was some bad PR. Some mashups went away, some came back, some just died out. There was a wealth of data, information, and potential revenue behind those mashups, but it was out of Google’s hands by and large.
In the mean time Facebook came along and changed what it meant to write a web app. No longer were applications monolithic disconnected things. They were widgets which plugged into a page. They were cool, integrated, socially networked, and shared. They were things people paid real money for. They were things people were using to generate adwords revenue! Google created their own apps. They created them for all the other social sites and the desktop. They did not care who was the hot new trend, as long as they had a share. But they have to play by other people’s rules and APIs.
So google has their search, mail, maps, and documents, online bookmarks, and calendar. They have an rss reader. Others are making mashups, and now many of those are occurring on FaceBook as 3rd party apps. Google releases some extensions for form filling on the docs, and integrating charts. They release a data API. They release OpenSocial as an attempt to standardize all these social networks and the core of what their apps provide; the social connections. They release custom site hosting (without announcing it except in a blog posting). They now have all these great applications and pieces of applications. They have a means of creating, editing, and hosting static html and data. What they lack is a framework to integrate everything. Something where they host the mashups. Something that they can do the deep data mining on. At least that was the case until App Engine came along.
Google claims in the very opening of their announcement that the App Engine is all about the developers. It is all about the people out there who develop neat and interesting things and the feedback loop that creates. The creativity of the masses. That has always been the key to Google’s overall success. They provide the tools, and others create all those great mashups, sites, and apps. This is not about you creating your cool new app. This is about you creating your cool new Google mashup app utilizing all the other google APIs. They are not all there yet, but they will be. The crucial one, the user backend, is already there. All the other offerings do not require their python API. All the other offerings already have javascript and IFrame, and other means of integrating which were developed for integrating with your blog or MySpace or FaceBook. But make no mistake, they are coming to GAE.
In short this is about taking all those Google pieces and parts and creating the ‘next big thing’ and using the developers out on the internet to do it, as they are the ones who will do it anyway and now they can do it for Google. Google gets their precious data, their ad revenue, and at some point people get to pay for the privilege of developing apps for them (either via ads or real money for removing those quotas).
Now comes the really cool part. The SDK includes everything you need for running locally. They have the Google Gears framework for making your apps work both online and offline on your desktop. Integrate all that fully and there really is no difference between your online web based apps and your desktop apps. There is still a long row to hoe before it gets to that point, but the pieces are falling into place.
Why Python?
There are a number of theories about the real reasons for choosing python. Most believe it’s because python is one of Google’s 4 primary languages. I do not believe that exactly. If this could have been done in Java, they would have done that. PHP is the only other ‘language’ that could approach what they want to do. As what they want is a platform for developing mashups with their existing technologies on a massively distributed scale by unknown random people, here is a short list of requirements:
I know of no other language which meets these requirements. PHP comes close, but would require a partial lobotomy (where python just has some modules removed or limited). Also PHP is not one of the languages that there are API clients for. I know that people are clamoring for other languages. All I can say is don’t hold your breath. I just do not see it happening any time soon.
[UPDATE: as a commenter points out, google is quite dedicated to python and has many core programmers on staff including the language creator. This is a great help for getting things done and adding validity to the project. Read the comments to hear my thoughts on Ruby.]
Growing Pains
App Engine is in its infancy. As with all their Beta projects there are problems. The main problem is how they are dealing with the problems. In short they are overwhelmed. People are asking for PHP, and their favorite python projects to be supported. They made the mistake of claiming that most python frameworks will run on it without putting up the proper CAUTION signs. It is possible to get Zope to run on it with some work. All that is missing is the hook to use the google database instead of the ZODB as the backend, a few minor tweaks, and use of the WSGI adapter. Twisted is just out due to the signals, and the threads, and crucially, the tcp connections. One of the problems is that people expect that XYZ module should just work, and it’s the App Engine team’s job to do that. The team seems to feel that they provide the framework and others should do the porting. There are also reports of bugs not being responded to in a timely manner. This is a bit laughable given the sheer number of bugs currently reported and the 15 or so engineers they have dealing with all the App Engine deployment issues. I am sure that no one expected to have to deal with flamewars in the bug tracker. Or that thousands of people would post +1 comments in the bug tracker making it next to unusable (some people just can’t read instructions). I would not expect all the current bugs to be triaged until late next week or the week after.
Most of the complaints seem to be about the limitations put in place. I can understand that, but I can also understand why hell will freeze over before most of them are lifted. When it comes to an initial deployment it seems quite generous and unrestricted. Insanely so. If you think about what it takes to deploy something like this, at this level, things start to click into place. How would you do it? How would you manage the issues, security risks, vectors of abuse? It is great to say you want to create a thread to accept a certified https connection, but if you are making that request, then you have no clue about the technical aspects behind that request or the technical aspects behind the App Engine.
Current Limitations
1. No long running processes
These are run once executions, and there is a time limit of a few seconds. Think of this the same way you would think about a PHP page.
2. No reliable state between runs
There is potential state from one run to the next, but you should not rely on it for large deployments. All state and persistent data should be stored in the database (or via some neat hacks). NOTE: this is more from my reading between the lines and knowledge of load balanced grid deployments. I.e. I do not trust their ‘cache’ system as something that can be relied upon. Why? Because we are talking about nodes and sandboxes.
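A minimal sketch of what limitations 1 and 2 mean for application code, assuming nothing about Google’s actual API. The handler is a plain WSGI callable exercised through the standard library’s `wsgiref`, and `FakeStore` is a hypothetical stand-in for whatever database the platform provides. The only point is that a counter kept in a module global may vanish when the next request lands on a fresh node, while the one written to the store survives.

```python
from wsgiref.util import setup_testing_defaults


class FakeStore:
    """Hypothetical stand-in for the hosted database; not a Google API."""

    def __init__(self):
        self._rows = {}

    def get(self, key, default=None):
        return self._rows.get(key, default)

    def put(self, key, value):
        self._rows[key] = value


STORE = FakeStore()

# Module-level state: it may survive between requests on one node, but a
# load balancer can send the next request to a node that never saw it.
_local_hits = 0


def app(environ, start_response):
    """Handle one short request and return; nothing keeps running after."""
    global _local_hits
    _local_hits += 1                     # unreliable across nodes
    hits = STORE.get("hits", 0) + 1      # reliable: lives in the store
    STORE.put("hits", hits)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hits=%d" % hits).encode("ascii")]


def simulate_request():
    """Call the WSGI app once with a default test environ."""
    environ = {}
    setup_testing_defaults(environ)
    return b"".join(app(environ, lambda status, headers: None))
```

Treat each call to `app` like a PHP page: it starts, reads what it needs from the store, writes back, and ends.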
3. No incoming TCP connections
No binding to sockets, etc. These are Google’s servers. Even they do not know which node your http request which starts the app will be run on; no way of knowing which IP it really will be. Only apps are running on these nodes. This means no mixing of non-app and app requests. No twisted or zope admin instances. For google to provide a proper balanced network (with proper dispatching), it has to be that way (well at this phase in the game at least).
4. Limited connections out
Google has a url API for making http and https requests out to other servers, a connection to a database and a mail API. Those are the only outgoing network connections, and all are bound in APIs. If you were allowing anyone to run programs on your servers would you want them to be part of botnets?
5. No https
This is not static IP hosting, no cert for you! There are some things that can be done, but there would be cert warnings, etc. Granted this does not stop you from integrating with PayPal, or Google Checkout, where the https checkout is handled by a different site (insanely weak). [UPDATE: yes I know static IP's are not required for certs, but they are required if you would rather people not get the cert warning or have IE7 mark the site as 'insecure'. And google will not pay for a cert per app, nor will it get a single cert for all apps, some of which they are not really sure of their authenticity (a phishing app based on Adrian's dynamic html->template tool for instance.) I do expect them to support something in the future, but that is a ways off and will not be for free.]
6. No spawning new processes (or signal overriding)
Well no big surprise here. Starting new processes could be very dangerous for all, and signal overriding… well that could make it hard for google to safely stop a rogue app (among other things).
7. No creating new threads!
Ok, this is a bit strange at first blush, but if you have ever dealt with grid deployments, or taught a 102 CS course (where you start covering semaphores and mutexes, and IPC) you have had the experience of a rogue multi-thread app taking down a machine. Part of the problem is that creating a new thread is very much like starting a new process. One of the interesting things about starting a new process is that the operation does not adhere to the nice protocol. It gets the CPU to do that start no matter what, and at the kernel priority (which is not very nice). New threads behave the same way, and are a PITA to deal with when trying to take down a process which has gone rogue. I hated that lab. I have other theories behind why they do not allow this (but that is for another post)
8. No ‘real’ filesystem
Well there is no real access to the file system. Not the ‘real’ file system. As such certain things like tmpfile are not present (as there is no /tmp directory).
9. Crippled import/bytecode
Well that is an overstatement. Google has written their own import replacement, and modified the bytecode (I think) from standard python. I have some theories on why for a later post, but the deal is, forget about using marshal, imp, or even some of the package __import__ hooks, and cPickle is just pickle. Part of the reason is because of the lack of a ‘real’ file system. The python path and import control is special as only packages from google, and those in your current app are available, and they are specially managed. This should not affect anyone unless you play funky package import tricks that you should not be doing anyway. Extending __path__ in packages does still work, but using __import__ directly to import a package using a computed abs path does not work (might be a bug).
10. Quotas and App shutdown
If your app gets too popular and goes over quotas, then it is disabled. Once it gets too popular, you need to buy more computes, etc. None of the quotas are set in stone yet, and of course if you use google analytics and/or AdSense, then the quotas are less restrictive or removed. The details are still in flux. For the beta period you can request a larger quota for free (but each request is reviewed for merit). You can also report app abuse if you find that someone’s app is not being nice.
[UPDATE: Here is the link to the current quota system.]
11. Only 3 Apps and no deleting.
For the beta period, each developer can have just 3 apps, and you can not delete an app.
12. Only pure python
No c extension modules. This is again because of the sandbox system, and all the other stuff above. You can’t prevent process or thread spawning in a c extension. You can not stop a c extension from corrupting things in very bad ways. You can’t stop it from attempting to connect out or bind a tcp port. And it would be a PITA to distribute the binaries to all the nodes like they do for the apps themselves (via custom import hooks + caching).
[UPDATE: fixing numbering and adding some other restrictions and errors people have pointed out]
13. 1MB per file upload limit and 500MB total storage limit.
The 500 MB limit is part of the current Quota system, but I was unaware of the 1MB file upload limit that a commenter pointed out.
14. 1000 files in an app limit
This is a huge problem for people trying to deploy pylons, TG, or Django trunk based applications. One potential solution (which is not currently supported) would be for google to allow for python zip imports and have things bundled.
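For reference, here is roughly what the zip-import idea looks like on a standard Python interpreter. This is a sketch only: it says nothing about what App Engine supports, the package name `demo_bundled_pkg` is made up for illustration, and it uses a temp directory that the sandbox described above would not have. The point is that many source files collapse into a single file on disk.

```python
import os
import sys
import tempfile
import zipfile


def bundle_and_import():
    """Pack a tiny package into one zip file and import it from there."""
    workdir = tempfile.mkdtemp()
    bundle = os.path.join(workdir, "bundle.zip")

    # Two source files inside the archive count as ONE file on disk.
    with zipfile.ZipFile(bundle, "w") as zf:
        zf.writestr("demo_bundled_pkg/__init__.py", "")
        zf.writestr(
            "demo_bundled_pkg/hello.py",
            "def greet():\n    return 'hello from the zip'\n",
        )

    # Standard Python can import from zip archives placed on sys.path.
    sys.path.insert(0, bundle)
    from demo_bundled_pkg import hello
    return hello.greet()
```

A framework bundled this way would count as one file against the limit instead of hundreds, which is why the idea is attractive.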
15. The Google DataStore has limitations over a classic RDBMS
Ben Bangert has a great write up on this, so go read that.
It’s like someone peered into my 3am brain and drew what they saw.
Randall: you forgot the farside chicken perched on the tree, but besides that, spot on! Spot On!!
Because all the cool kids are doing it:
history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}' |sort -rn|head
New laptop:
44 ls
37 cd
18 svn
16 netstat
13 ping
9 ssh
6 nslookup
4 python
4 mv
4 man
PyCon Server:
314 cd
82 ps
78 ls
68 tail
67 svn
59 exit
52 grep
40 uptime
38 touch
32 python2.5
Work (this can’t be right, my .cshrc must be set to keep everything forever or 100K entries or something):
1034 pmrec
932 python
720 cd
621 gmake
612 ls
501 grep
314 tail
280 cut
210 perl
207 head
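For anyone who would rather not decode the awk, a rough Python equivalent of the one-liner is sketched below. It assumes each history line looks like `  123  ls -la`, so the second whitespace-separated field is the command; the sample lines are made up.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the command name on each `history` line, like the awk one-liner.

    awk's $2 is the second whitespace-separated field, which is the
    command itself when the first field is the history number.
    """
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    # most_common sorts by count, descending, much like `sort -rn | head`.
    return counts.most_common(n)


# Made-up sample input, for illustration only.
sample = ["  1  ls -l", "  2  cd /tmp", "  3  ls", "  4  svn up", "  5  ls"]
for command, count in top_commands(sample):
    print(count, command)
```

Feeding it the real thing is a matter of piping `history` into a script that passes `sys.stdin` to `top_commands`.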
Well I think I disproved #8 but proved #9 with my last post. Writing something even if it is not ‘perfect’ is not always the best option, when what you do write is drivel.
I started to write a response to Eric’s comment and realized it was really a separate blog post, so here it is.
Eric gets a bit confused about my take on ‘tags’ and ‘directories’ and how I would like to have sub-folders on google apps where I have django/people as an example. This is because my explanation of what I was trying to describe was utter crap. I had a concrete example from working on the Memorize activity on the OLPC, but I left that part out for reasons I won’t go into. I think a concrete example is needed however, as these meta concepts are too vague unless you nail them down. So with that said, I will use the google RSS reader as the application in question which requires an object store.
I have been using the google reader app for about a week now, and I must say it annoys me a little. So you have your RSS feeds and you can stick them into ‘folders’, which are more like tags. A feed can be in multiple ‘folders’ at the same time. When you select a folder you see all the entries from all the feeds in that folder. You can also select the individual feeds. The low level objects are the individual posts, and life at first is good. This system maps quite nicely onto the concept of an object store. Really the ‘folders’ are nothing but a semantic term for a tag or meta-grouping of the feeds and the objects in those feeds. The feed names are nothing more than meta-information on the posts as well. You could imagine this as the ideal example of using an object store over using a file system. But as we will see there are problems.
No Sub Folders
There are no sub-folders. Why would I want sub-folders? (Yes the below are all the same problem to some extent, but manifested as different symptoms; but I will connect them up later)
1. Only a single level of organization imposed on me
Many systems limit the top level tags shown to those with more than N items in them, and then you cross link/relate from there. Others use tag clouds. Neither really works for me as those are automated systems imposing their order on me, not the other way around.
2. Single level organization becomes unmanageable.
Let’s look at this blog as an example of this. Look at the left column at the tags on my posts. I rarely post, and yet look at that list! Imagine managing a full machine, OS, data blobs, etc like that! The truth is I already have 20 folders in GMail, and 30 in google reader. I have feeds and mails being sent to multiple folders (tags) and after a while I can’t find anything!
3. No means of customizing the organization
I really want to have a sub folder python/people. This would only contain the entries that are tagged both python and people. The top level ‘folder’ people would contain all the feeds tagged as ‘people’, but would not necessarily contain a sub-folder ‘people’. I want to be the one to determine the breakdown. I want to impose the order I want on the system, not the other way around. This organization is for my benefit, and I want control over its customization. The drill down of a folder tree makes perfect logical sense. This does not preclude a system for selecting unions, exclusion, and the like in a more advanced (complicated) interface, but you need something which scales. A flat single level does not scale.
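The python/people idea above boils down to a tag intersection. Here is a small sketch of it; the feed names and tags are hypothetical, and this is not how google reader stores anything. The user still decides which folder paths exist; the function only populates them.

```python
def folder_contents(feeds, path):
    """Return the feeds carrying every tag named in a folder path.

    `feeds` maps a feed name to its set of tags; `path` is a string such
    as 'python/people', where each path segment is one required tag.
    """
    required = set(path.split("/"))
    return sorted(name for name, tags in feeds.items() if required <= tags)


# Hypothetical feeds and tags, for illustration only.
feeds = {
    "core-dev-blog": {"python", "people"},
    "release-news": {"python"},
    "friend-blog": {"people"},
}

print(folder_contents(feeds, "python/people"))
print(folder_contents(feeds, "people"))
```

The top level ‘people’ folder still lists every feed tagged ‘people’, while ‘python/people’ narrows to the feeds carrying both tags.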
Limited Modality
One of the great promises of the meta-tagged object stores is the ability to magically use this data for cool new representations and visualizations of that data. The google reader has some of that. You can select to only see unread entries, allow for list, preview, or full view. There are even statistics! You can see ‘starred’ entries. Sort by date etc. But the grander promise is still unfulfilled.
1. No calendar integration
These things have dates and times on them. Google has fantastic calendar widgets. It should be automatic, right? There should be no code to write, just something to plug in and have it ‘just work’. So where is it?
2. No custom meta data
We have starring, and marking for ‘sharing’, but those things and folders are meta-data which is forced on me. I want to add personal notes to myself on posts or feeds. I want to add my own flag types like ‘I commented on this’ and ‘needs followup’; sometimes a star is nothing more than a piece of twine on your finger and just does not cut it. Object stores should just provide these things and somehow understand them for cool new visualizations if they are of a predefined meta type…. New interesting statistics should be possible on my personal meta data and the graphs should be automatic.
3. No meta-discovery
We are bringing in this data from external sources and it is already tagged with plenty of meta data. I should have access to that as well.
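To make the calendar and custom meta-data wishes a little more concrete, here is a sketch of how little code the basics need once entries carry a date and a free-form set of personal flags. The entries, dates, and flag names are invented for illustration; nothing here reflects the reader’s real data model.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical entries: (title, published date, user-defined flags).
entries = [
    ("Post A", date(2008, 4, 8), {"needs followup"}),
    ("Post B", date(2008, 4, 8), {"I commented on this"}),
    ("Post C", date(2008, 4, 9), {"needs followup", "I commented on this"}),
]


def by_day(items):
    """Group entry titles by publication date, ready for a calendar view."""
    days = defaultdict(list)
    for title, published, _flags in items:
        days[published].append(title)
    return dict(days)


def flag_counts(items):
    """Count how often each personal flag is used across all entries."""
    return Counter(
        flag for _title, _published, flags in items for flag in flags
    )
```

`by_day` is the raw material for a calendar widget and `flag_counts` for a statistics graph; neither needs to know in advance which flags a user invents.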
I am being unfair
The google reader is an incredible system, and you really do not want it to be this huge complex system. It should be simple and do one thing well: allow you to read feed entries you care about. But really I am using it (unfairly) as a prototype for a theoretical object store system. So let’s talk about that for a bit. With that said I would love to have a few months working on the reader app to extend its functionality and do it in a way that directly maps onto GMail and the other offerings (not gonna happen).
What does the object store look like?
What is the object store behind the reader?
1. Is it a file system with a directory per feed, a file for the meta-information on that feed, a file for the feed entries, and a file for the search index? I know of one reader that does it that way.
2. Is it a true dynamic object store with the meta information stuck to each rss entry where the feed name and information are just meta-tags? doubtful.
3. Is it a database back end which is shared by multiple reader users and broken into complex table relations? Most likely, but it doesn’t have to be.
4. In the end does it really even matter? No it doesn’t.
Herein lies the biggest problem with most attempts at object store systems. They are attempts to capture the visualization and map it directly to the storage. The memory layout, visualization, storage, indexing, and security are all munged together into one tangled mess.
Design for storage, not for representation
The object store layout should not be a 1:1 mapping to a visual display. That is what we have with current file browsers, directories, and files. The file browser represents the file system implementation in all its grueling detail. No wonder you can’t find your files. On most linux systems file names which differ only in case are different names. Some love it, and some hate it. Does it really matter if there is a good visualization abstraction? No, the abstraction hides the implementation details that the user should not care about. There also needs to be an abstraction layer for the interface to the storage. Something for the developers to use which is simple and does not expose the implementation details of said storage.
Most people who start to tackle implementing an object store replacement for a file system take the reverse approach and attempt to make the object store implementation represent the particular visualization or visualizations they are thinking of. They design the interfaces for storing the data to expose the details of the implementation. Exposing the meta-data as a core component. You need to first save to the temp storage, then push to the long term storage (for instance). They expect the storage to glean some important information from the meta-data (image dimensions, sampling rate, author, history). At some level this is needed (re: security), but most systems I have seen take this too far. What you want to do instead is figure out what the real issues are, and design the storage system for storage management; not for the meta-systems. The meta systems should be used by higher level abstractions for viewing that storage in interesting ways. This is extremely hard to do properly. When done properly, it does not matter if the data and meta-data are coming from a database, a file system, or a cloud. That becomes an implementation detail. Some systems will work better for different types of data and meta-data. That is how it should be. There are no silver bullets.
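A toy sketch of the separation argued for above, under heavy assumptions: every class name here is hypothetical, keys are assumed to be safe file names, and security is ignored entirely. The storage interface knows only how to put and get blobs; the tag index is a separate meta layer that works unchanged over either backend.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Storage-only interface: callers never learn how blobs are kept."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under `key`."""


class MemoryStore(BlobStore):
    """Backend that keeps blobs in a dictionary."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class DirectoryStore(BlobStore):
    """Same interface, backed by one file per key in a directory."""

    def __init__(self, root):
        self._root = root

    def put(self, key, data):
        with open(os.path.join(self._root, key), "wb") as fh:
            fh.write(data)

    def get(self, key):
        with open(os.path.join(self._root, key), "rb") as fh:
            return fh.read()


class TagIndex:
    """Meta layer kept apart from storage: it maps tags to keys only."""

    def __init__(self, store):
        self._store = store
        self._tags = {}

    def add(self, key, data, tags):
        self._store.put(key, data)
        for tag in tags:
            self._tags.setdefault(tag, set()).add(key)

    def find(self, tag):
        keys = sorted(self._tags.get(tag, ()))
        return {key: self._store.get(key) for key in keys}
```

Swapping `MemoryStore()` for `DirectoryStore(tempfile.mkdtemp())` changes nothing for code that talks to `TagIndex`, which is the sense in which the backend becomes an implementation detail.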
At some point I would love to go over how I think all this can be achieved. Where the levels of abstraction are, and what the meta-management systems would look like. The concepts are actually quite simple. Security First, Storage, Meta-Meta, Meta, Index, Visualization. Configuration management systems are the closest to a complete implementation I have seen. I just do not have time for all these ‘pet projects’ which are years of work….
Well, this was going to be three or four posts, but thanks to an interesting announcement from google, it all sort of runs together.
It still will be I think. I will most likely try to rewrite things to give an overview and go into detail on specifics later. Things are getting interesting at work so we will see how much time I have to pull that off.
Files
Ivan beat me to the punch on the main gist post. While at PyCon I had the opportunity to chat with Mike Fletcher, another OLPC volunteer whose name I forget, Phil Hassey, Richard Jones, Jeff Rush, and about 5 other people who wandered in and out of the small sprint room we were all half passed out in. People came and went during the discussion (I believe Richard and Phil went off to play a board game at one point as well) which ranged from modern Sci-Fi offerings to games to global warming being a net win for Canada to the history of the world (not the movie). I should have gone to bed well before the discussion started. The discussion turned to the object store on the OLPC platform. Jeff, coming from a ZODB background, was quite pro object store systems replacing ‘file systems’.
This is a hot button topic with me. This topic has come up at every professional job I have had going all the way back to when I was a CO-OP at Motorola as a ‘Document Administrator’ (secretary). In fact the only two topics which are more hot button for me are ‘common application UI frameworks’ and ‘security after the fact’. I first started thinking about this subject back in 93 when I first started working on MUDs (warcraft, only 100% text for you youngins). The world was editable online (like a lisp MUSH) but also had revision history (via RCS initially). We were dealing with ‘serialization’ and how objects were managed. I fell in love with the idea that everything could be described as having a set of attributes (tags) and really you wanted to store and manage these things by those attributes. Permissions were nothing more than attributes. Actions were nothing more than attributes. Meta data by definition were just attributes. We struggled with systems for this, but I came away convinced that we needed a new paradigm in object storage, and this ‘file’ stuff was running on borrowed time.
It came up again at Motorola for document management. It came up again at OpenVision (later Veritas) for backup and security compliance. It came up again with ClearCase and Derived Objects. It came up again with ‘dictionaries’ and data management for VoiceXpress. And the code base I currently work on has something called ‘DFiles’ which I can not discuss except to mention the name (DRAT!)
Storage
Back to the discussion at PyCon. I wish I had a transcript of the discussion (no I don’t… I was not as coherent as I think I was). The Idea that everything is just blobs in a cloud of data where the tags determine the meta-structured is nice, but there are some problems. The first and most obvious problem is that it does not integrate well with existing technology and libraries. Decades of software has been written with the concept of files. You can try a fake mapping, and try to integrate things, but it does not work well. Then there is the concept of ’sub-blobs’. That is each of the pieces of data could have sub parts. This maps well to your document which might have a chart or spreadsheet as part of it for instance. This can greatly simplify serialization, and you get all those nice blob store things. Your in-memory structure is your serialization structure. But in reality we already have this. They are called files and directories. It is simply (*cough*) an implementation detail dealing with the storage mapping. Ok, there is nothing simple about it, but we will come back to this. The argument then turned to the fact that you can’t have a blob show up in more than one directory. False. Those are called symlinks, but again that is an implementation detail. One of the biggest benefits of an object-store-as-filesystem is the ability to find and manage things not in a ridged tree structure which does not scale well in the average human brain (where did I put my (ssh) keys again?) But in practice it is just replacing one confusing arbitrary structure with another on some level as it’s usefulness is measured by the quality of the tags, attributes, and indelibility of the data.If you had those things well defined in a directory tree structure, then it works just as well (as google desktop search proves). A more subtile problem is that not all tags/attributes are created equal. It took a long time for my betters and practical experience to prove this to me. 
Many attributes are only useful to programs. These programmatic tags are for relating data, validation, encoding, and the like. Most of the time they are auto-generated or involve mathematical computations. They are never intended for human interpretation, but are nonetheless crucial for data management. You can try to predetermine the different types of these meta attributes, or just lump them together, but neither of these approaches is really tractable. Spend some time deep diving into the abuses of the Windows registry and you begin to get an idea of the issues.
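To make the symlink point concrete, here is a minimal sketch of one ‘blob’ showing up in two directories on a plain filesystem. The directory names (`blobs`, `python`, `people`), the file name, and the helper `link_into` are made up for illustration, and it assumes a platform where creating symlinks is allowed.

```python
import os
import tempfile
from pathlib import Path


def link_into(directory: Path, target: Path) -> Path:
    """Make `target` appear inside `directory` without copying it."""
    directory.mkdir(parents=True, exist_ok=True)
    link = directory / target.name
    link.symlink_to(target.resolve())
    return link


def demo() -> dict:
    """Put one file in two directories and report what the filesystem sees."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # The single real copy of the data (hypothetical name and content).
        blob = root / "blobs" / "pycon_notes.txt"
        blob.parent.mkdir()
        blob.write_text("hallway discussion about storage\n")

        # The same blob, reachable from two "tag" directories.
        in_python = link_into(root / "python", blob)
        in_people = link_into(root / "people", blob)

        return {
            "same_file": os.path.samefile(in_python, in_people),
            "content_via_people": in_people.read_text(),
            "listing": sorted(p.name for p in (root / "python").iterdir()),
        }


if __name__ == "__main__":
    print(demo())
```

Both directory entries resolve to the same underlying file, so existing file-based tools keep working. What this does not solve is the harder part discussed above: deciding which tags exist and keeping them meaningful.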
I know I am glossing over all the details, and not really giving any points the attention they deserve. I am not even properly quantifying the points. Issues of language are being skipped over completely (try describing what a ‘word’ is in your application; try again when that application deals with speech and natural language… how does that abstract into meaningful tags?). Oh well. The point is there must be a happy medium. We should be able to have something which has a file system programmatic interface, as well as a generic data store interface. The browsing of the data should be an abstraction. Whether this is implemented with a classic journaling file system or in a database should be an implementation detail at the filesystem level. Why invent a new abstraction layer which everyone must now implement against when we have a perfectly good one that everyone already uses? A file by any other name still contains data. If this is such a good model, think about extending it to namespaces. The problems in software code management (which is just data on a very real level) for which namespaces were invented exist on the filesystem as well. Chew on that while you code with Matrix.Optimizer and Optimizer.Matrix.
Google has an interesting take on all of this. All of their services (news, documents, reader, calendar, mail, blogger, etc.) have a file-like data storage for the objects represented. They use folders/directories (really tags). The only restriction is that the folders are only one level deep. I do not care for this myself. I would love to be able to have a ‘people’ folder under my ‘python’ folder and have only those items tagged with both ‘python’ and ‘people’ under that ‘folder’. Maybe that is just me. I would not want these sub-folder relations to be automatic. I would want control over the layout, but have the population be automatic. But that is the only extension to their system I would like to see. Beyond that it just works. It works with both the object store model and the file/directory model. If only Google would open up their APIs a bit more to include this system. Oh wait, they just did. You know, if I had hit ‘publish’ on this post last evening when I first wrote most of this, I would have been ‘prophetic’ or at least ‘first post!’.
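For what it is worth, the nested ‘folder’ I am wishing for is not much code. What follows is a rough sketch of the idea, not anything Google actually exposes: the `TagStore` class, its methods, and the item names are all hypothetical. The layout (the path) is chosen by the user, and the population of each folder is computed from the tags.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory store where a folder path is read as a set of required tags."""

    def __init__(self):
        # item name -> set of tags attached to that item
        self._tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        """Attach one or more tags to an item."""
        self._tags_by_item[item].update(tags)

    def folder(self, path):
        """List the items carrying every tag named along `path`.

        'python/people' means: tagged 'python' AND tagged 'people'.
        """
        wanted = {part for part in path.split("/") if part}
        return sorted(
            item
            for item, tags in self._tags_by_item.items()
            if wanted <= tags  # subset test: the item has all wanted tags
        )


store = TagStore()
store.tag("pycon-hallway-notes", "python", "people")
store.tag("django-tutorial", "python")
store.tag("family-photos", "people")

print(store.folder("python/people"))  # only the item with both tags
print(store.folder("python"))         # everything tagged 'python'
```

Because a path is just a set of tags, ‘python/people’ and ‘people/python’ list the same items. Which of the two actually exists as a folder stays under my control, which is the part I care about.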
It’s not all hearts and ponies and sparkle (even if it is Python and an abstraction layer on top of Django to boot!!!). I have been holding off on posting this err… post until I could formulate a non-reactionary opinion on the entire Google Apps thing. I now have an opinion, and it is much along the same lines as Duncan McGreggor’s. My issues are similar to his, yet distinct, and I will post on them separately.
Python, and Conference Software
This post is already too long, and my laptop battery is dying (no, the charger is at work). Those of you that I talked with at PyCon about UnConference hosting know what this is all about, and I told you so ;-). The last piece just fell into place. With that, good night.
Written works are licensed under a Creative Commons Attribution-Share Alike 3.0 License, unless otherwise noted with specificity.