Information Patterns: series introduction

Every time a new data format spec hits my inbox, I get a little twinge of dread.

Such documents are often enormous. They’re written in standardese (often badly). They’re usually written by committees. They go through a maze of twisty little revisions, all different.

But worst of all, they often bury their novelty in a sea of details that resemble those in the last spec I reviewed.

I’d like to do for data formats and other information representations what the Gang of Four book does for programs: call out and label the patterns that come up over and over again so that I can classify details into bigger chunks for mental processing.

You can expect to see several different kinds of post in this series:

  • Case studies. I have to look at lots of actual data formats in order to discern the patterns!
  • Data format patterns. Most posts will be about patterns I find in data formats…
  • Information usage patterns. …but some posts will be about how information is generated, stored, and used.
  • Other. I’ll probably think of some other topics as well.

I expect to look at simple cases, such as comma-separated values, as well as fiendishly complex cases, such as PDF. Programming-language syntaxes are fair game; database index disk structures are right out. In between, I’ll draw the boundary as interest dictates.

This series will be open-ended as long as people keep inventing data formats faster than I can look at them.

Posted in Information Patterns, Information Philosophy

Quick addendum about the Chandler Repository

If my brief article about the Chandler Repository caught your interest, you might want to check out Andi’s blog, in which he discusses some of the design and implementation issues.

Accounts of software design from the time of implementation are incredibly valuable. Maybe someday we’ll have a search tool that lets you enter your implementation ideas and gives you back the accounts of people who have tried them before.

Posted in Storing data

The Chandler Repository

I spent a few hours yesterday in the company of Andi Vajda, lead developer of the data repository component of the Open Source Applications Foundation’s Chandler project. We talked about the technical details of the repository.

The Chandler repository is an object database with some interesting design features:

  • All links between objects are bidirectional. Andi’s main motivation for this choice was to be able to guarantee to clients that references aren’t dangling without having to implement a garbage collector.
  • All objects have universally unique identifiers (UUIDs) to make equality testing trivial.
  • Repositories are versioned. Aside from the usual rollback capabilities this enables, client code can inspect a particular object as it was in any extant version.
  • In addition to a conventional notion of class inheritance, the Chandler repository supports a notion of “cloud inheritance” for merging data schemas when copying objects from one database to another.
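To make the first two features concrete, here is a minimal sketch in Python. It is not Chandler's actual API: the `Item` class and its `link`, `unlink`, `delete`, and `linked` methods are hypothetical names I made up for illustration. The assumption is that because every link is recorded at both ends, deleting an item can clear every reference to it, so nothing is left dangling and no garbage collector is needed. Equality is decided purely by UUID.

```python
import uuid


class Item:
    """Hypothetical repository item; illustration only, not Chandler's API."""

    def __init__(self, name):
        self.uuid = uuid.uuid4()   # universally unique identity
        self.name = name
        self._links = set()        # the other ends of this item's links

    def __eq__(self, other):
        # Equality testing is trivial: compare UUIDs, nothing else.
        return isinstance(other, Item) and self.uuid == other.uuid

    def __hash__(self):
        return hash(self.uuid)

    def link(self, other):
        # Record the link at both ends so each item knows who refers to it.
        self._links.add(other)
        other._links.add(self)

    def unlink(self, other):
        self._links.discard(other)
        other._links.discard(self)

    def delete(self):
        # Every referrer is known, so all references can be cleared here.
        for other in list(self._links):
            self.unlink(other)

    def linked(self):
        return set(self._links)


task = Item("write report")
event = Item("team meeting")
task.link(event)
linked_before = task in event.linked()   # the link is visible from the other end

task.delete()
linked_after = task in event.linked()    # no dangling reference remains
```

The cost of this design is that every link update touches two objects; the benefit is that an item can always be removed without searching the whole store for references to it.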

The repository layer is cleanly separated from the layer of Chandler that knows the semantics of calendar items, tasks, and so forth. That layer is responsible for mapping its items to repository items. Interchange of personal information is carried out at this higher layer, not by the repository. There’s an interesting bit of Python metaprogramming by Philip Eby that hides some of the mapping complexity from the application-level programmer.

I’m going to have a look at the source to see how this works. There are popular application-building tools that carry out object/relational mappings (moving between the objects that are convenient to use in code and the representations that are convenient in the usual relational database), but this is the first time I’ve come face-to-face with an object/object mapping tool. I’ll report back when I’m better informed.

[Thanks to Andi for fact-checking this post!]

Posted in Storing data

Scott Rosenberg’s Dreaming in Code [book pointer]

Scott Rosenberg’s Dreaming in Code is the best journalistic portrayal of software development that I’ve ever read.

The romantic cliché of the lone introverted genius shaping masterpieces through many midnights of unfathomable incantations is mercifully absent. Rosenberg follows the Open Source Applications Foundation’s Chandler project through several years of development, from initial impetus to its milestone 0.6 release. We see the process as it actually is: as a highly social undertaking in which people pass through the project, and the project passes through people’s lives. The developers have families, pets, outside interests; they also have passions (often conflicting) about technology and the process of creation.

Dreaming in Code is much more than a simple chronicle. Rosenberg delves deeply into the history of software development and the frustration it causes its participants and customers: the results never seem to improve, even as the underlying hardware undergoes the most rapid progress of any technology ever.

Issues of data representation, storage, and synchronization are front and center in Dreaming in Code, all carefully explained by the author in terms that make sense to the non-practitioner while remaining recognizable to us professionals (he’s really, really good at this).

I might give this book to my mom to read.

[Disclosure: I've known Andi Vajda, one of the developers portrayed in the book, for about twenty years, and count him as a friend.]

Posted in Books, Resources

Kent’s Data and Reality [book pointer]

Let me kick off this blog by pointing to William Kent’s classic book Data and Reality.

Lots of books will teach you how to process data with particular technologies, but Kent’s book goes deeper. He shows in chapter after chapter how database practice fails to match the way humans actually use information.

Data and Reality is almost thirty years old, but the issues haven’t really changed: if anything, they’re much more in our collective faces.

This book may be for you if:

  • you feel strongly that you and your Social Security number (U.S. tax identification) are not the same thing,
  • you wonder whether Mark Twain and Samuel Clemens were the same person for all purposes,
  • you don’t know what to put down for Homer’s year of birth in that author/title cataloguing app you downloaded,
  • you wonder about people who think that something doesn’t exist if it’s not in the expected database (or if it’s not on the Web).

Information philosophy

If these issues sound a lot like the first ten minutes of a college philosophy course, that’s intended. Philosophy is all about seeking answers to questions we don’t often pause to ask.

Pause.

Pause.

We run into questions like these all the time in building software, especially now that we’ve woven the Web and woven ourselves and our lives into it.

On Information in Rotation I’m going to call this category “Information Philosophy”; I think it will get woven in with the more orthodox techy blog-fodder as we go along.

Anyhow, I strongly recommend Data and Reality, which is available as print-on-demand or (inexpensive) eBook from the publisher at the link I gave above (as of 2006-12-28).

[The paperback has ISBN 9781585009701 and the eBook 9781420898880. Does this make them different books?]
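The ISBN question can be put in code. The sketch below is only an illustration: the `Edition` class and its `work` field are hypothetical, and the string I use to identify the abstract work is an arbitrary key of my own choosing. It shows that whether the two records are "the same book" depends entirely on which attribute we decide to treat as identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    isbn: str     # identifies one published form of the book
    work: str     # hypothetical key for the abstract work
    medium: str


paperback = Edition("9781585009701", "Kent, Data and Reality", "paperback")
ebook = Edition("9781420898880", "Kent, Data and Reality", "eBook")

# Identity by ISBN: two different books.
same_by_isbn = paperback.isbn == ebook.isbn

# Identity by work: one book in two forms.
same_by_work = paperback.work == ebook.work
```

Neither answer is wrong; a bookseller's inventory probably wants the first and a reading list probably wants the second.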

Posted in Information Philosophy, Resources

Welcome!

Welcome to Information in Rotation!

As I write this note in December 2006, there are a couple of snappy phrases that loom large in describing what we expect from the use of data on the World Wide Web:

  • The Semantic Web vision looks for a “web of data” that is deeper and richer in connections between facts than the established web of documents.
  • The Web 2.0 concept sees the Web as a platform for personal data, layered services, and participatory social use of information.

This blog is for technology-oriented people who want to dive in and help these trends along in useful ways.

I’ll be driving my analyses, reviews, how-to pieces, and occasional heretical manifestos from some concrete data-centric problems that we all probably share:

  • organizing our digital music and other media,
  • synchronizing personal contact and calendar information among multiple devices,
  • avoiding data loss,
  • sharing information with others.

I expect to delve into many data formats and lots of software, sites, and services as we go along.

I’ll try to keep the entries in the main feed short, with summaries and pointers to the longer pieces. I doubt I’ll have any breaking news, and I’ll try to consolidate single-link posts into a week-in-review item.

My personal site, danrabin.com, is about me; I hope that Applied Rotation will be about us.

Posted in About the blog