Every time a new data format spec hits my inbox, I get a little twinge of dread.
Such documents are often enormous. They’re written in standardese (often badly). They’re usually written by committees. They go through a maze of twisty little revisions, all different.
But worst of all, they often bury their novelty in a sea of details that resemble those in the last spec I reviewed.
I’d like to do for data formats and other information representations what the Gang of Four’s Design Patterns book does for programs: call out and label the patterns that come up over and over again, so that I can group details into bigger chunks for mental processing.
You can expect to see several different kinds of post in this series:
- Case studies. I have to look at lots of actual data formats in order to discern the patterns!
- Data format patterns. Most posts will be about patterns I find in data formats…
- Information usage patterns. …but some posts will be about how information is generated, stored, and used.
- Other. I’ll probably think of some other topics as well.
I expect to look at simple cases, such as comma-separated values, as well as fiendishly complex cases, such as PDF. Programming-language syntaxes are fair game; database index disk structures are right out. In between, I’ll draw the boundary as interest dictates.
This series will stay open-ended for as long as people keep inventing data formats faster than I can look at them.