Dougma (dŭg·mə) n.

  1. An authoritative principle, belief, or statement of ideas or opinion, especially one considered to be absolutely true by Doug, who is often wrong.
  2. A specific tenet or dougtrine authoritatively laid down, as by Doug.
  3. A system of principles or tenets, for Doug.
April 9th, 2008

Files, Storage, Google, Python, and UnConference Software

Well, this was going to be three or four posts, but thanks to an interesting announcement from Google, it all sort of runs together.

It still will be, I think. I will most likely rewrite things to give an overview and go into detail on specifics later. Things are getting interesting at work, so we will see how much time I have to pull that off.

Files

Ivan beat me to the punch on the main gist post. While at PyCon I had the opportunity to chat with Mike Fletcher, another OLPC volunteer whose name I forget, Phil Hassey, Richard Jones, Jeff Rush, and about 5 other people who wandered in and out of the small sprint room we were all half passed out in. People came and went during the discussion (I believe Richard and Phil went off to play a board game at one point as well), which ranged from modern Sci-Fi offerings to games to global warming being a net win for Canada to the history of the world (not the movie). I should have gone to bed well before the discussion started.

The discussion turned to the object store on the OLPC platform. Jeff, coming from a ZODB background, was quite pro object store systems replacing ‘file systems’. This is a hot button topic with me. It has come up at every professional job I have had, going all the way back to when I was a co-op at Motorola as a ‘Document Administrator’ (secretary). In fact, the only two topics which are more hot button for me are ‘common application UI frameworks’ and ‘security after the fact’.

I first started thinking about this subject back in ’93 when I first started working on MUDs (Warcraft, only 100% text, for you youngins). The world was editable online (like a Lisp MUSH) but also had revision history (via RCS initially). We were dealing with ‘serialization’ and how objects were managed. I fell in love with the idea that everything could be described as having a set of attributes (tags), and that really you wanted to store and manage these things by those attributes. Permissions were nothing more than attributes. Actions were nothing more than attributes. Metadata, by definition, was just attributes. We struggled with systems for this, but I came away convinced that we needed a new paradigm in object storage, and that this ‘file’ stuff was running on borrowed time.

It came up again at Motorola for document management. It came up again at OpenVision (later Veritas) for backup and security compliance. It came up again with ClearCase and Derived Objects. It came up again with ‘dictionaries’ and data management for VoiceXpress. And the code base I currently work on has something called ‘DFiles’, which I cannot discuss except to mention the name (DRAT!).

Storage

Back to the discussion at PyCon. I wish I had a transcript of the discussion (no I don’t… I was not as coherent as I think I was). The idea that everything is just blobs in a cloud of data, where the tags determine the meta-structure, is nice, but there are some problems. The first and most obvious problem is that it does not integrate well with existing technology and libraries. Decades of software have been written with the concept of files. You can try a fake mapping, and try to integrate things, but it does not work well.

Then there is the concept of ‘sub-blobs’. That is, each of the pieces of data could have sub parts. This maps well to your document, which might have a chart or spreadsheet as part of it, for instance. This can greatly simplify serialization, and you get all those nice blob store things. Your in-memory structure is your serialization structure. But in reality we already have this. They are called files and directories. It is simply (*cough*) an implementation detail dealing with the storage mapping. Ok, there is nothing simple about it, but we will come back to this.

The argument then turned to the claim that you can’t have a blob show up in more than one directory. False. Those are called symlinks, but again that is an implementation detail. One of the biggest benefits of an object-store-as-filesystem is the ability to find and manage things without a rigid tree structure, which does not scale well in the average human brain (where did I put my (ssh) keys again?). But in practice it is, on some level, just replacing one confusing arbitrary structure with another, as its usefulness is measured by the quality of the tags, attributes, and indelibility of the data. If you had those things well defined in a directory tree structure, then it works just as well (as Google Desktop Search proves).

A more subtle problem is that not all tags/attributes are created equal. It took a long time for my betters and practical experience to prove this to me. Many attributes are only useful to programs. These programmatic tags are for relating data, validation, encoding, and the like. Most of the time these are auto-generated or involve mathematical computations. They are never intended for human interpretation, but are nonetheless crucial for data management. You can try to predetermine the different types of these meta attributes, or just lump them together, but neither of these approaches is really tractable. Spend some time deep diving into the abuses of the Windows registry and you begin to get an idea of the issues.
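To make the symlink point concrete, here is a minimal sketch of a tag view layered on an ordinary directory tree. The `by-tag` directory name and the `tag_file` helper are made up for illustration, and it assumes a platform where unprivileged symlinks work (Linux, OS X). The file lives in exactly one place, yet it shows up under every tag directory.

```python
import os
import tempfile


def tag_file(root, target, tags):
    """Link `target` into one directory per tag under `root/by-tag`.

    Hypothetical helper: the layout is an assumption, not an existing tool.
    """
    links = []
    for tag in tags:
        tag_dir = os.path.join(root, "by-tag", tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(target))
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append(link)
    return links


def demo():
    """Create one real file, tag it twice, and check where the links point."""
    root = tempfile.mkdtemp()
    target = os.path.join(root, "notes.txt")
    with open(target, "w") as handle:
        handle.write("pycon sprint notes")
    links = tag_file(root, target, ["python", "people"])
    # Every link should resolve back to the one real file.
    real = os.path.realpath(target)
    return [os.path.realpath(link) == real for link in links]
```

Calling `demo()` builds the tree in a temporary directory. Whether the quality of those tags is any better than the quality of your directory names is, of course, the real question.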

I know I am glossing over all the details, and not really giving any points the attention they deserve. I am not even properly quantifying the points. Issues of language are being skipped over completely (try describing what a ‘word’ is in your application; try again when that application deals with speech and natural language… how does that abstract into meaningful tags?). Oh well. The point is there must be a happy medium. We should be able to have something which has a file system programmatic interface, as well as a generic data store interface. The browsing of the data should be an abstraction. Whether this is implemented with a classic journaling file system or in a database should be an implementation detail at the filesystem level. Why invent a new abstraction layer which everyone must now implement against when we have a perfectly good one that everyone already does? A file by any other name still contains data. If this is such a good model, think about extending it to namespaces. The problems in software code management (which is just data on a very real level) for which namespaces were invented exist on the filesystem as well. Chew on that while you code with Matrix.Optimizer and Optimizer.Matrix.
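As a rough sketch of that happy medium, here is one store with two faces: a file-style interface (`write`, `read`, `listdir`) and a generic tag query (`find`). Everything here is hypothetical, including the `BlobStore` name and the two-table layout. SQLite stands in for "a database" only because it ships with Python; the callers never see which backend is underneath.

```python
import sqlite3


class BlobStore:
    """One store, two views: file-style paths and tag queries (illustrative only)."""

    def __init__(self):
        # The backend is an implementation detail; an in-memory SQLite
        # database is used here purely as an assumption for the sketch.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE blobs (path TEXT PRIMARY KEY, data BLOB)")
        self.db.execute("CREATE TABLE tags (path TEXT, name TEXT, value TEXT)")

    # File-system-style interface.
    def write(self, path, data, **tags):
        self.db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (path, data))
        self.db.execute("DELETE FROM tags WHERE path = ?", (path,))
        self.db.executemany(
            "INSERT INTO tags VALUES (?, ?, ?)",
            [(path, name, str(value)) for name, value in tags.items()],
        )

    def read(self, path):
        row = self.db.execute(
            "SELECT data FROM blobs WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    def listdir(self, prefix):
        prefix = prefix.rstrip("/") + "/"
        rows = self.db.execute("SELECT path FROM blobs ORDER BY path").fetchall()
        return [path for (path,) in rows if path.startswith(prefix)]

    # Generic data-store interface: paths matching every given tag.
    def find(self, **tags):
        paths = None
        for name, value in tags.items():
            rows = self.db.execute(
                "SELECT path FROM tags WHERE name = ? AND value = ?",
                (name, str(value)),
            ).fetchall()
            found = {path for (path,) in rows}
            paths = found if paths is None else paths & found
        return sorted(paths or [])
```

Old code that only knows about paths keeps calling `read` and `listdir`; newer code can call `find(owner="doug", kind="report")` against the same data.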

Google

Google has an interesting take on all of this. All of their services (news, documents, reader, calendar, mail, blogger, etc.) have a file-like data storage for the objects represented. They use folders/directories (really tags). The only restriction is that the folders are only one level deep. I do not care for this myself. I would love to be able to have a ‘people’ folder under my ‘python’ folder and have only those items tagged with both ‘python’ and ‘people’ under that ‘folder’. Maybe that is just me. I would not want these sub folder relations to be automatic. I would want control over the layout, but have the population automatic. But that is the only extension to their system I would like to see. Beyond that it just works. It works with both the object store model and the file/directory model. If only Google would open up their APIs a bit more to include this system. Oh wait, they just did. You know, if I had hit ‘publish’ on this post last evening when I first wrote most of this, I would have been ‘prophetic’ or at least ‘first post!’.
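Here is a small sketch of what I mean by controlling the layout while the population stays automatic. None of this is Google's API; `populate`, the nested-dict layout, and the sample items are all invented for illustration. A folder holds every item that carries its own tag plus the tags of all the folders above it.

```python
def populate(layout, items, inherited=frozenset()):
    """Fill a hand-arranged folder layout from item tags.

    `layout` maps a folder name (which is also a tag) to its sub-layout.
    `items` maps an item name to the set of tags on that item.
    """
    view = {}
    for tag, children in layout.items():
        required = inherited | {tag}
        view[tag] = {
            # An item lands here only if it has this tag and every ancestor tag.
            "items": sorted(
                name for name, tags in items.items() if required <= tags
            ),
            "folders": populate(children, items, required),
        }
    return view


# Hypothetical data: I choose the layout, the contents fill themselves in.
items = {
    "sprint-notes": {"python", "people"},
    "pep8": {"python"},
    "party": {"people"},
}
layout = {"python": {"people": {}}}
view = populate(layout, items)
```

With this layout, `view["python"]["folders"]["people"]["items"]` holds only the items tagged with both ‘python’ and ‘people’, and nothing creates a sub folder unless I put it in the layout myself.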

It’s not all hearts and ponies and sparkle (even if it is Python and an abstraction layer on top of Django to boot!!!). I have been holding off on posting this err… post until I could formulate a non-reactionary opinion on the entire Google Apps thing. I now have an opinion, and it is much along the same lines as Duncan McGreggor’s. The issues I have are similar to his yet distinct, and I will post on them separately.

Python, and Conference Software

This post is already too long, and my laptop battery is dying (no, the charger is at work :-( ). Those of you that I talked with at PyCon about UnConference hosting know what this is all about, and I told you so ;-). The last piece just fell into place. With that, good night ;-)

June 22nd, 2007

June Cambridge Python Meetup

Peter did another fantastic job putting together this month’s meetup. We decided to stick with Wednesdays so we would not collide with the Plone meetup, which is on Thursdays, but um… oh well…

There were two guest speakers:

1: George Lambert, Goldenware Technology
2: Mike Pittaro, SnapLogic, an open source data integration project implemented in 100% Python

I decided to try something new and record the event on my little Sansa MP3 player. The audio is bad at best, but it is mostly audible. We were in Somerville, so at some points an airplane goes overhead. If these prove useful to people I will bring better recording equipment next time. My A/V production equipment is tied up on another project, so all I did was split the audio into multiple tracks and do a lame re-encode. I tried to keep the files under 25Meg while splitting based on topics. Unfortunately meetup.com has a 10Meg per file limit and a 100Meg per group max, so hosting them there was out of the question. The first file is under 8Meg, so please check that out first, and only if you can withstand the audio quality, check out the others. I and my bandwidth will appreciate it.

NOTE: The audio is extremely soft at points and at the beginning, so you will need to crank the volume up.

  • Introductions and Django.June recap (mp3, ogg)
  • Mass TLC recap, and an extensive discussion on GPLv3, Licensing, Patents, and Python (mp3, ogg)
  • Lightning rounds with George Lambert and Mike Pittaro. (mp3, ogg)
  • Open Discussion (mp3, ogg)

The software George Lambert mentions, which is used to view changes in the GPLv3 draft, is Plone! Though there is talk of converting the FSF web site over to a Django-based one. I sent an e-mail to the lists giving better information on OLPC for those interested as well. Noah Kantrowitz responded offering to help anyone in the group get started with development.

June 22nd, 2007

OLPC @ MassTLC OSS (part 2)

Without further ado, here is Ivan.

I have long wondered how they get so much work done in such a short period of time. ‘When do they sleep?’ I have often thought. Now I know: they don’t sleep. Ivan gave this talk on no sleep. Seriously, he had not slept the night before and was somewhere on hour 38!

Dan Bricklin recorded the video, and also has full audio recordings (podcasts) of the event. Dan has some other fantastic podcasts, including one with Antonio from Tabblo, so please check those out at your leisure.

Read the rest of this entry »

June 18th, 2007

OLPC Keynote at MassTLC OSS

Just a reminder that the MassTLC Open Source Summit is tomorrow morning! It is $40 online or at the door (and $20 for MassTLC members). A pastry cart, coffee and a bag lunch will be provided (though I do not have the exact details on this). Detailed information and directions are provided below.

One of the trademarks of this event is the level of audience interaction. The talks are not passive events where attendees absorb what is spoken up on some shielded dais. As in the past the event is broken down into three parts (legal, business, and community), plus a keynote. The plan is to have the event recorded, both audio and video, for release in multiple mediums.

Legal

The legal discussion will be on the GPLv3, both the road traveled so far and what the future holds. The process has been extremely open with the community participating in its drafting on an unprecedented level. This panel discussion is a continuation of that effort as well as an examination of that effort.

Business

The corporate panel discussion is not what one would normally expect. There is no real summary description for this panel due to the nature of it. How do you describe an open discussion? I will try. Listening to marketing folks from large companies describe why their open source strategy is the winner is not that interesting or rewarding. The focus here is on the audience and learning from the successes and failures of local companies large and small. Businesses who rely on Open Source are really relying on the communities they foster. The audience is made up of those communities and budding open source based ventures. Here is a chance to discuss the strategies of the day and get a greater understanding for this complex and thriving ecosystem.

Community

For the community section this year we are trying something new. Mark Withington of the Boston PHP group is running a Lightning Talk session. These are ~5min presentations by local community members. These can be very exciting. One problem with any conference is that not all topics will interest all people. You also want to have a good range of topics. The purpose of the event is to foster Open Source in Massachusetts. This is done by building relationships and helping connect people. This can be hard for a small event such as ours. Lightning talks offer a great opportunity for this. The general audience gets information-dense overviews of topics and events they are interested in. If you are not interested in a topic, just like the local weather, just wait 5min. Presenters, limited by time, are forced to communicate only the core information they need to get across. The idea is to engage the audience and get them interested in what you have to say. The point is not to answer all the questions an attendee might have for you or your project, but to just get them interested and hungry for more information. After all the talks are completed there will be a break before the keynote and space for attendees to gather and talk to the Lightning Talk presenters. Here is where connections are made: presenters can connect with those who are really interested in what they have to say, and can focus on exactly what people are interested in.

OLPC Keynote

As I have mentioned here and elsewhere, I feel that the software effort behind the OLPC project has been playing second fiddle to the hardware. There has been much attention given to the ‘laptop’, the innovative hardware technology, and some to the ‘child friendly’ interface. Very little has been discussed about the revolutionary new operating system being developed. Did you know the firmware is hardware independent? Did you know that while the kernel is based on Red Hat Fedora, the higher level operating system, including the file system, is written in a dynamic interpreted language? Did you know that all you have to do is press a button on the laptop and you get to modify the code for whatever application is currently running? Did you know this is all done securely and using a revolutionary process management system where each process gets its own VM? This is not Linux. This is something else entirely, and yes, it is 100% open source. I will be handing out the Sugar SDK Live CD, which includes a full Linux development environment for developing applications for this revolutionary new system.

The Details

Read the rest of this entry »

June 13th, 2007

Dual Core OLPC!

The Nerd Core group Dual Core just released their new album (do they still call them that?) Zero One. Just bought it and it’s fantastic! Check out the sample of track 12, ‘The Children’s Machine’.

June 4th, 2007

MassTLC 2007 Open Source Summit!

Tuesday, June 19, 2007; 7:30 am registration; 8:00 am to 12:00 pm
Microsoft, 201 Jones Road, Waltham, MA (map)

Join us for this half-day summit as technology leaders discuss the current state of open source and the implementation of collaborative development models. The program will spotlight innovative open source companies in a rapid-fire lightning round session. The summit will culminate with a keynote presentation on OLPC (One Laptop Per Child), an initiative led by Nicholas Negroponte, as recently featured on 60 Minutes.

8:00 am — Opening Remarks

8:15 am — The Year in Review and the Years to Come: GPL3 and what it tells us about the current and future prospects of free and open source software. A member of each of the four GPL 3 Committees will review the new license, the process by which it was promulgated, and what it all tells us about the current state of free and open source software.

  • Karen Copenhaver, Partner, Choate, Hall & Stewart
  • Ira Heffan, Associate, Goodwin Procter
  • Scott K. Peterson, Senior Counsel, Intellectual Property, Hewlett-Packard Company / Andover, MA
  • David Rickerby, Partner, Choate, Hall & Stewart

9:00 am — Open Source Strategies

  • Larry Alston, VP of Corporate Strategy, Iona
  • Robert Sutor, VP Standards & Open Source, IBM
  • Don Fisher, VP of Online Services, Red Hat
  • Justin Steinman, Director of Linux Marketing, Novell

10:00 am — Break

10:15 am — Lightning Rounds

  • Andromeda
  • BlackDuck
  • DevZuz (Simula Labs)
  • Drupal
  • EnterpriseDB
  • Please contact the Open Source Cluster Advisory Board at tom@masstlc.org if you are interested in presenting during the lightning round session.

11:15 am — OLPC Keynote. One Laptop per Child (OLPC) is a non-profit organization aiming to redefine learning and education for the world’s children by providing each child with a specially-developed, innovative, and low-cost laptop. We will introduce the initiative and then dive into the challenging engineering behind the OLPC software platform, covering everything from its unusual firmware to its new child-friendly GUI.

  • Ivan Krstić, Director of Security Architecture, OLPC (One Laptop Per Child)

Cost: Members/Non-Members, $20/$40

Sponsors: Choate, Hall & Stewart; IBM

Register Online : http://function.masstlc.org/programs_new/event_single.cfm?eventid=787

May 21st, 2007

OLPC on 60 Minutes

I caught the OLPC piece on 60 Minutes (video) last night and am equally delighted and disappointed. Other news groups are picking up the story, centered around the Intel competition. Slashdot even has an article on it. Watching and reading these stories, I can’t help but notice the implicit editing reporters do to make a story more salacious than it is. I am disappointed that once again the focus is on the hardware alone. I am disappointed that the ‘problems’ described with the project were not examined in any real depth. There are problems, and they deserve more than ‘[Wayan Vota] is concerned about the additional cost of the project for infrastructure such as satellite dishes.’ While I understand that not everything can be covered in detail, there are some oversights which need clarification.

Read the rest of this entry »
