Mark Dominus has an interesting post in which he does some serious software archaeology trying to discover how and when a piece of Unix filesystem metadata called “ctime” changed from being “creation time” to representing “change time”.
Mark’s post got my attention because it gives a detailed look at a case of what I term metadata drift: the tendency of metadata properties to change their meaning under the pressure of either
- clients using the field that will give them the results they need (even if the semantic fit is poor), or
- implementers following their convenience rather than the defined intent of the item.
I see this all the time in the music metadata that iTunes pulls from CDDB. I do it myself for classical recordings, where I want to record both the composer's name and the performer's. I want classical music indexed primarily by composer, but rock and jazz by performer. The iTunes software makes you pick one or the other for the artist field, so I abuse the album field to capture the performer's name.
CDDB contributors make different choices under this pressure, so I spend a fair amount of time when ripping CDs editing metadata. This is just an inconvenience for me, but the ctime issue that Mark Dominus investigates can have serious consequences if, say, a revision-control system makes the wrong assumption about the semantics of “ctime” on a particular file system.
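To make the ambiguity concrete, here is a minimal sketch of how a program could check what "ctime" means on the system it is running on. It relies on Python's `os.stat`, whose `st_ctime` is documented as the time of the last metadata change on Unix and as the creation time on Windows. The function name `observe_ctime_after_chmod` and the `delay` parameter are my own inventions for illustration, and the result depends on the timestamp granularity of the file system, so treat this as a probe rather than a guarantee.

```python
import os
import tempfile
import time


def observe_ctime_after_chmod(delay=0.05):
    """Create a scratch file, change only its permissions, and report
    (ctime_before, ctime_after, mtime_before, mtime_after) in nanoseconds.

    Hypothetical helper for illustration; not taken from the original post.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = os.stat(path)
        time.sleep(delay)       # give the clock a chance to advance
        os.chmod(path, 0o400)   # metadata-only change; file contents untouched
        after = os.stat(path)
    finally:
        os.chmod(path, 0o600)   # restore write permission so cleanup works everywhere
        os.remove(path)
    return (before.st_ctime_ns, after.st_ctime_ns,
            before.st_mtime_ns, after.st_mtime_ns)


if __name__ == "__main__":
    c_before, c_after, m_before, m_after = observe_ctime_after_chmod()
    if c_after > c_before:
        print("ctime moved after chmod: it behaves as 'change time' here")
    else:
        print("ctime did not move: creation-time semantics, or coarse timestamps")
    print("mtime moved:", m_after != m_before)
```

A tool that cares about the distinction could run a probe like this once and record the answer, instead of assuming that every file system agrees on what the field means.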