In “Who’s got the tag? Database truth versus file truth, part 3”, Jon Udell contrasts the Microsoft Vista and Mac OS X ways of associating metadata tags with image files: Vista tends to store them in the image files, and Mac OS X tends to leave the files untouched and use a separate database to store the tags (or at least Jon was under this impression).
There’s a great discussion about the relative advantages of the two approaches on the blog. Basically, storing the tags in the file makes the association harder to lose as you move the file around, and storing the tags separately avoids modifying the user’s data file. Neither one is obviously in accord with the user’s intention in all cases.
I think the issue has whole extra layers of subtlety. We perceive metadata that is stored within a data file as being what Jon Udell calls “file truth”. Since there’s only one set of metadata stored in the file, it becomes the One True Metadata. On the other hand, metadata stored in a separate database reads as the opinion of the maintainer of the database. This is exactly what social bookmarking systems such as del.icio.us do: each attribution of a tag to a URL is also associated with a user making that attribution.
A pluralistic society requires a separate metadatabase!
This isn’t just another engineering tradeoff, though. The truth about “file truth” is that it’s still an opinion—the opinion of the last agent to modify the metadata within the file. When there’s One True Metadata, we can only represent disagreements by obliterating the last guy’s assertion.
Imagine trying to tag a scan of a photo taken at your parents’ wedding of someone you don’t recognize. You think it’s Dad’s college roommate, but your sister thinks it’s Mom’s second cousin. You have one “person depicted” slot: do you fight over it? Do you leave it blank and explain the situation in a semantically bland catch-all description field? Or do you each tag it as you will in your respective databases?
Not only is it unrealistic to allow for only one true description of a file, it’s also time we stopped regarding metadata as lost forever just because it’s not stored in the file. We could set up a distributed database that works like Gracenote’s CD identification database, but for all files instead of just music files. As with CDs, the lookup key for a file can be generated by anyone who possesses the file (by applying a secure hash), but the particular metadata obtained depends on which tagger’s part of the repository is consulted. It’s all doable, and it would eliminate blogstorms about how evil application X erases user metadata.
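A minimal sketch of the idea, in Python. Everything named here (`file_key`, `TagRepository`, the `person_depicted` field, the tagger names) is made up for illustration, SHA-256 is just one possible choice of secure hash, and an in-memory dictionary stands in for what would really be a distributed repository.

```python
import hashlib
from collections import defaultdict


def file_key(data: bytes) -> str:
    """Derive the lookup key from the file's bytes alone.

    Assumption: SHA-256 as the secure hash; any collision-resistant
    hash would serve the same purpose.
    """
    return hashlib.sha256(data).hexdigest()


class TagRepository:
    """Hypothetical in-memory stand-in for the distributed metadata store."""

    def __init__(self):
        # file key -> tagger -> field -> value
        self._claims = defaultdict(dict)

    def assert_tag(self, tagger: str, data: bytes, field: str, value: str) -> None:
        """Record one tagger's claim without touching anyone else's."""
        self._claims[file_key(data)].setdefault(tagger, {})[field] = value

    def lookup(self, data: bytes, tagger: str) -> dict:
        """Return the metadata as seen through one tagger's part of the repository."""
        return dict(self._claims.get(file_key(data), {}).get(tagger, {}))

    def opinions(self, data: bytes, field: str) -> dict:
        """Return every tagger's value for a field, keyed by tagger."""
        by_tagger = self._claims.get(file_key(data), {})
        return {t: fields[field] for t, fields in by_tagger.items() if field in fields}


# The wedding photo: two people, two opinions, one untouched file.
scan = b"placeholder bytes standing in for the scanned photo"
repo = TagRepository()
repo.assert_tag("me", scan, "person_depicted", "Dad's college roommate")
repo.assert_tag("sister", scan, "person_depicted", "Mom's second cousin")

my_view = repo.lookup(scan, "me")
all_views = repo.opinions(scan, "person_depicted")
```

Neither claim overwrites the other, and the file itself is never modified. One consequence of keying on the bytes is worth keeping in mind: any application that rewrites the file, even just to embed a tag, produces a different key.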
hi,
inside the tags, that is the database. AND – it only rides piggyback with the object data. it’s not “in” the data as you philosophically state.
so there already is a separate metadatabase from the content – semantically speaking
it’s just that the file format, like jpeg or whatever, restricts the space and type of database that it can be. so in your analogy, yes, they do have to fight over what to put into a limited space – but only because the format restricts that space. you only need a new format, you don’t need a third abstraction
if the format simply allowed the metadata wrapper to be extensible, and to create child nodes – then it would be fine, and everybody could add whatever they want between their own tags.
AND you could still keep the existing standards too
the database wrapper is traversed like dom
just have an **agreed upon tag** for the start of the actual bona fide “content” (like… or something…), and apps could put whatever they want above that.
isn’t that already allowed in some formats? am i missing your point, perhaps…
so do we just need a new format, in which the tag wrapper can be of any size, and can be appended to by apps…tagging inside their own unique tagspace, like:
camera info
filtered using…
.
.
.
and then, the file contents, the actual “DATA” would reside in a secure tagspace
– and it would live with the file. no second data location farther away
maybe i ate too much salsa?