<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information in Rotation &#187; Information Philosophy</title>
	<atom:link href="http://appliedrotation.com/Techblog/?cat=6&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://appliedrotation.com/Techblog</link>
	<description>Dan Rabin writes on metadata, data, the information they represent and how.</description>
	<lastBuildDate>Sun, 01 Nov 2015 20:21:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Geoff Nunberg on Google Books metadata</title>
		<link>http://appliedrotation.com/Techblog/?p=64</link>
		<comments>http://appliedrotation.com/Techblog/?p=64#comments</comments>
		<pubDate>Thu, 03 Sep 2009 16:39:30 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Information usage patterns]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=64</guid>
		<description><![CDATA[Linguist Geoff Nunberg comments on the poor general quality of metadata in Google Books, and why that&#8217;s a problem. It&#8217;s a tough problem: if you do things (like scanning entire libraries) at Google-scale, you just can&#8217;t pay attention to the &#8230; <a href="http://appliedrotation.com/Techblog/?p=64">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Linguist Geoff Nunberg <a href="http://languagelog.ldc.upenn.edu/nll/?p=1701#more-1701">comments on the poor general quality of metadata</a> in Google Books, and why that&#8217;s a problem.</p>
<p>It&#8217;s a tough problem: if you do things (like scanning entire libraries) at Google-scale, you just can&#8217;t pay attention to the details.  One partial way out (which Geoff mentions) is to allow users to submit corrections, as Google Maps does for positions of placemarks.</p>
<p>The article addresses a number of important points about the provenance and usefulness of metadata, and Google employees provide some great comments and discussion.</p>
<p>(Via <a href="http://delong.typepad.com/sdj/2009/09/links-for-2009-09-03.html">Brad DeLong</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=64</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The two cultures</title>
		<link>http://appliedrotation.com/Techblog/?p=31</link>
		<comments>http://appliedrotation.com/Techblog/?p=31#comments</comments>
		<pubDate>Wed, 24 Jun 2009 21:25:18 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Information usage patterns]]></category>

		<guid isPermaLink="false">http://127.0.0.1/Techblog/?p=31</guid>
		<description><![CDATA[Jon Stokes has an excellent description of the two contrasting philosophies of information management in his comparison of the Palm Pre and the iPhone. He names the two approaches &#8220;structure-and-browse&#8221; and &#8220;collect-and-query&#8221;. I feel like I&#8217;ve been groping for these &#8230; <a href="http://appliedrotation.com/Techblog/?p=31">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Jon Stokes has an excellent description of the two contrasting philosophies of information management in his <a href="http://arstechnica.com/gadgets/reviews/2009/06/ars-palm-pre-review.ars" target="_self">comparison of the Palm Pre and the iPhone</a>.  </p>
<p>He names the two approaches &#8220;structure-and-browse&#8221; and &#8220;collect-and-query&#8221;.  I feel like I&#8217;ve been groping for these terse descriptions for years!</p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=31</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chris Anderson: One size metadata doesn&#8217;t fit all</title>
		<link>http://appliedrotation.com/Techblog/?p=30</link>
		<comments>http://appliedrotation.com/Techblog/?p=30#comments</comments>
		<pubDate>Wed, 21 Mar 2007 17:04:23 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Areas of application]]></category>
		<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Information usage patterns]]></category>
		<category><![CDATA[Metadata]]></category>
		<category><![CDATA[Musical]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=30</guid>
		<description><![CDATA[Misfits of Metadata Chris Anderson of The Long Tail has an important post about how the metadata used in some music-listening applications doesn&#8217;t satisfy the listeners&#8217; needs: [...] classical is a genre that the one-size-fits-all music aggregators such as iTunes &#8230; <a href="http://appliedrotation.com/Techblog/?p=30">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h3>Misfits of Metadata</h3>
<p>Chris Anderson of <a href="http://www.longtail.com/the_long_tail/" title="The Long Tail">The Long Tail</a> has an <a href="http://www.longtail.com/the_long_tail/2007/03/one_size_aggreg.html" title="One size metadata doesn't fit all">important post</a> about how the metadata used in some music-listening applications doesn&#8217;t satisfy the listeners&#8217; needs:</p>
<blockquote><p>[...] classical is a genre that the one-size-fits-all music aggregators such as iTunes don&#8217;t handle particularly well. They&#8217;re oriented around pop music, with its artist, album, track data format. Meanwhile classical music organizes around composer, conductor, performer, soloist</p></blockquote>
<p>He also voices my exact peeve about how jazz is treated:</p>
<blockquote><p>However, neither of them does a very good job with Jazz, where the individual musicians are often more meaningful than the band.</p></blockquote>
<p>Yup.  No reasonable cataloguer of jazz recordings separates &#8220;Thelonious Monk Trio&#8221; from &#8220;Thelonious Monk Quartet&#8221; from &#8220;Thelonious Monk&#8221;.  At the same time, it&#8217;s important to be able to locate all appearances of Thelonious Monk, regardless of whether he was the leader of the session (note that &#8220;leader&#8221; and &#8220;session&#8221; are appropriate terms in jazz discography, but not for pop or classical).
</p>
<h3>When your only tool is a hammer&#8230;</h3>
<p>I can&#8217;t help but wonder if the problems Chris calls out in iTunes come from the poor selection of data tools in most applications programmers&#8217; toolkits.  Relational databases, the current orthodox storage technique, favor using one or more tables, each consisting of records having the same selection of attributes.  There are hacks you can use to simulate having, say, jazz tracks and pop tracks in the same Tracks relation, but hacks and simulations tend to twist one&#8217;s code, so most programmers resist going there.
</p>
<h3>An XML database in every toolbox!</h3>
<p>
We don&#8217;t really have to live this way anymore.  With the popularity of <a href="http://www.w3.org/XML/">XML</a> for data interchange, the tools ecology has given us a <a href="http://www.rpbourret.com/xml/XMLDatabaseProds.htm">variety</a> of <a href="http://www.rpbourret.com/index.htm">XML database systems</a>.  The XML data model has the flexibility to represent varying record structures: in fact, it has much more flexibility than we need for the purpose!</p>
<p>Heretical as it may seem to put the cart of an interchange format before the horse of data abstraction, the XML situation is very useful in practice, at least for databases of moderate size.  The <a href="http://www.w3.org/%20">W3C</a> has come up with the <a href="http://www.w3.org/Style/XSL/">XPath</a> and <a href="http://www.w3.org/XML/Query/">XML Query</a> specifications that provide excellent query mechanisms for data represented in the XML model.  XML Query in particular is designed to look somewhat familiar to the hardened <a href="http://www.jcc.com/sql.htm">SQL</a> user.  There&#8217;s data typing taken from the <a href="http://www.w3.org/XML/Schema">XML Schema</a> <a href="http://www.w3.org/TR/xmlschema-2/">datatype recommendation</a> as well.</p>
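<p>To make the varying-record idea concrete, here is a minimal sketch in Python using the standard library&#8217;s <code>xml.etree.ElementTree</code>, which implements only a small subset of XPath (a real XML database would offer full XPath or XML Query). The element names, the <code>add_track</code> and <code>titles_featuring</code> helpers, and all of the track data are invented for illustration.</p>

```python
import xml.etree.ElementTree as ET


def add_track(parent, kind, fields):
    """Append a track whose child elements are whatever this kind of music needs."""
    track = ET.SubElement(parent, "track", {"kind": kind})
    for tag, text in fields:
        ET.SubElement(track, tag).text = text
    return track


def titles_featuring(root, name):
    """Titles of every track that lists `name` as a performer, leader or not.

    Assumes `name` contains no single quote, since it is pasted into the path.
    """
    matches = root.findall(f".//track[performer='{name}']")
    return [track.findtext("title") for track in matches]


# One collection, three different record shapes (all data is hypothetical).
catalog = ET.Element("tracks")
add_track(catalog, "pop", [
    ("title", "Hypothetical Pop Song"),
    ("artist", "Hypothetical Band"),
    ("album", "Hypothetical Album"),
])
add_track(catalog, "jazz", [
    ("title", "Hypothetical Trio Tune"),
    ("leader", "Thelonious Monk"),
    ("performer", "Thelonious Monk"),
    ("performer", "Hypothetical Bassist"),
])
add_track(catalog, "jazz", [
    ("title", "Hypothetical Quartet Tune"),
    ("leader", "Hypothetical Saxophonist"),
    ("performer", "Hypothetical Saxophonist"),
    ("performer", "Thelonious Monk"),
])
add_track(catalog, "classical", [
    ("title", "Hypothetical Symphony"),
    ("composer", "Hypothetical Composer"),
    ("conductor", "Hypothetical Conductor"),
    ("soloist", "Hypothetical Soloist"),
])

# Every appearance of Monk, whether or not he led the session.
monk_titles = titles_featuring(catalog, "Thelonious Monk")
# Only the sessions he led.
monk_led = catalog.findall(".//track[leader='Thelonious Monk']")
```

<p>Only the jazz tracks carry <code>leader</code> and <code>performer</code> elements, yet they sit in the same collection as the pop and classical tracks, and a single path expression finds both of Monk&#8217;s appearances.</p>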
<h3>Better nails</h3>
<p>Anyhow, let&#8217;s learn to design with a more flexible hammer, and maybe we&#8217;ll be able to hit a wider class of nails, rather than our users&#8217; thumbs!</p>
<p><em>March is International Runaway Metaphor Month.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=30</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenStreetMap constructs maps from GPS tracks!</title>
		<link>http://appliedrotation.com/Techblog/?p=29</link>
		<comments>http://appliedrotation.com/Techblog/?p=29#comments</comments>
		<pubDate>Wed, 21 Feb 2007 17:06:04 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Areas of application]]></category>
		<category><![CDATA[Geographic]]></category>
		<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Information usage patterns]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=29</guid>
		<description><![CDATA[Sources and uses of digital information are in-scope for this blog, and a great example just showed up in my RSS reader today. OpenStreetMap is a wiki-like project to build a world map using contributed GPS tracks [OpenGeoData pointed me &#8230; <a href="http://appliedrotation.com/Techblog/?p=29">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Sources and uses of digital information are in-scope for this blog, and a great example just showed up in my RSS reader today.</p>
<p><a href="http://wiki.openstreetmap.org/index.php/Main_Page" title="OpenStreetMap home page">OpenStreetMap</a> is a wiki-like project to build a world map using contributed GPS tracks [<a href="http://www.opengeodata.org/?p=167">OpenGeoData</a> pointed me there].  Their map of Baghdad is <a href="http://wiki.openstreetmap.org/index.php/Image:Baghdad.png">here</a>.  </p>
<p>This project is truly a product of the early 21st century: it requires GPS satellites, cheap but accurate GPS receivers, the World Wide Web, inexpensive computers with fast color graphics, and so forth.</p>
<p>And like all modern geographic applications, it also exploits a special property of GPS&#8217;s information domain: everyone agrees on the meaning of geographical location; only dates and times have a similar level of standardization.  In relational-database terminology, this means that any table with a date or location column has a meaningful join with any other table that has one.  </p>
<p>This doesn&#8217;t work with most data.  I&#8217;ve had driver&#8217;s licenses in four U.S. states, but you can&#8217;t aggregate my driving record from the state records because they all use different ID numbering schemes (nice for my privacy in this case).</p>
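<p>As a rough illustration of the join on a shared date column, here is a small sketch using Python&#8217;s built-in <code>sqlite3</code> module. The two tables, their column names, and every row in them are hypothetical; the only point is that two tables built by unrelated parties can still be joined because both record a calendar date in the same form.</p>

```python
import sqlite3

# In-memory database holding two tables that were never designed together.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gps_tracks (day TEXT, street TEXT);
    CREATE TABLE weather_log (day TEXT, conditions TEXT);

    INSERT INTO gps_tracks VALUES ('2007-02-20', 'Hypothetical Ave');
    INSERT INTO gps_tracks VALUES ('2007-02-21', 'Example St');
    INSERT INTO weather_log VALUES ('2007-02-21', 'overcast');
    """
)

# The join is meaningful because both sides agree on what a date means.
rows = conn.execute(
    "SELECT day, street, conditions "
    "FROM gps_tracks JOIN weather_log USING (day) "
    "ORDER BY day"
).fetchall()
conn.close()
```

<p>No comparable query could stitch together my four driving records, because no column in those tables means the same thing from one state to the next.</p>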
<p>Also noteworthy is the fact that GPS information can be used to put a time dimension into maps, since we can tell <em>when</em> the street is used as well as <em>where</em> it is.  There are some very pretty examples at <a href="http://cabspotting.org/timelapse.html">Cabspotting</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=29</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Should metadata be stored in the file it describes?  Jon Udell wonders&#8230;</title>
		<link>http://appliedrotation.com/Techblog/?p=28</link>
		<comments>http://appliedrotation.com/Techblog/?p=28#comments</comments>
		<pubDate>Wed, 21 Feb 2007 04:20:09 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=28</guid>
		<description><![CDATA[In &#8220;Who&#8217;s got the tag? Database truth versus file truth, part 3&#8221;, Jon Udell contrasts the Microsoft Vista and Mac OS X ways of associating metadata tags with image files: Vista tends to store them into the image files, and &#8230; <a href="http://appliedrotation.com/Techblog/?p=28">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://blog.jonudell.net/2007/02/20/whos-got-the-tag-database-truth-versus-file-truth-part-3/">&#8220;Who&#8217;s got the tag? Database truth versus file truth, part 3&#8221;</a>, <a href="http://blog.jonudell.net/">Jon Udell</a> contrasts the Microsoft Vista and Mac OS X ways of associating metadata tags with image files: Vista tends to store them into the image files, and Mac OS X tends to leave the files untouched and use a separate database to store the tags (or at least Jon was under this impression).</p>
<p>There&#8217;s a great discussion about the relative advantages of the two approaches on the blog.  Basically, storing the tags in the file makes the association harder to lose as you move the file around, and storing the tags separately avoids modifying the user&#8217;s data file.  Neither one is obviously in accord with the user&#8217;s intention in all cases.</p>
<p>I think the issue has whole extra layers of subtlety.  We perceive metadata that is stored within a data file as being what Jon Udell calls &#8220;file truth&#8221;.  Since there&#8217;s only one set of metadata stored in the file, it becomes the One True Metadata.  On the other hand, metadata stored in a separate database reads as the opinion of the maintainer of the database.  This is exactly what social bookmarking systems such as <a href="http://del.icio.us">del.icio.us</a> do: each attribution of a tag to a URL is also associated with a user making that attribution.</p>
<h4>A pluralistic society <em>requires</em> a separate metadatabase!</h4>
<p>This isn&#8217;t just another engineering tradeoff, though.  The truth about &#8220;file truth&#8221; is that it&#8217;s still an opinion: the opinion of the last agent to modify the metadata within the file.  When there&#8217;s One True Metadata, we can only represent disagreements by obliterating the last guy&#8217;s assertion.</p>
<p>Imagine trying to tag a scan of a photo taken at your parents&#8217; wedding of someone you don&#8217;t recognize.  You think it&#8217;s Dad&#8217;s college roommate, but your sister thinks it&#8217;s Mom&#8217;s second cousin.  You have one &#8220;person depicted&#8221; slot: do you fight over it?  Do you leave it blank and explain the situation in a semantically bland catch-all description field?  Or do you each tag it as you will in your respective databases?</p>
<p>Not only is it unrealistic to allow for only one true description of a file, it&#8217;s also time we stopped regarding metadata as lost forever just because it&#8217;s not stored in the file.  We could set up a distributed database that works like <a href="http://www.gracenote.com">Gracenote</a>&#8217;s CD identification database, but for all files instead of just music files.  As with CDs, the lookup key for a file can be generated by anyone who possesses the file (by applying a secure hash), but the particular metadata obtained depends on which tagger&#8217;s part of the repository is consulted.  It&#8217;s all doable, and it would eliminate blogstorms about how evil application X erases user metadata.</p>
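<p>Here is a minimal single-process sketch of that idea in Python, assuming SHA-256 as the secure hash. The <code>TagRepository</code> class, its method names, and the sample photo bytes are all hypothetical, and a real system would be distributed rather than one in-memory dictionary. It only shows the shape of the lookup: the key comes from the file&#8217;s bytes, and the answer depends on whose tags you ask for.</p>

```python
import hashlib
from collections import defaultdict


def file_key(data):
    """Anyone holding the same bytes derives the same lookup key."""
    return hashlib.sha256(data).hexdigest()


class TagRepository:
    """Stores each tagger's assertions about a file separately."""

    def __init__(self):
        # file key, then tagger name, then that tagger's field/value pairs
        self._tags = defaultdict(dict)

    def assert_tag(self, tagger, data, field, value):
        """Record one tagger's opinion without touching anyone else's."""
        self._tags[file_key(data)].setdefault(tagger, {})[field] = value

    def lookup(self, data, tagger):
        """Return a copy of the metadata as seen by one particular tagger."""
        return dict(self._tags.get(file_key(data), {}).get(tagger, {}))

    def taggers(self, data):
        """Names of everyone who has said something about this file."""
        return sorted(self._tags.get(file_key(data), {}))


# The wedding photo from above, with both opinions kept side by side.
photo = b"hypothetical bytes of the scanned wedding photo"
repo = TagRepository()
repo.assert_tag("you", photo, "person depicted", "Dad's college roommate")
repo.assert_tag("sister", photo, "person depicted", "Mom's second cousin")

your_view = repo.lookup(photo, "you")
sister_view = repo.lookup(photo, "sister")
```

<p>Neither assertion overwrites the other, and the photo file itself is never modified.</p>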
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=28</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Information Patterns: series introduction</title>
		<link>http://appliedrotation.com/Techblog/?p=27</link>
		<comments>http://appliedrotation.com/Techblog/?p=27#comments</comments>
		<pubDate>Fri, 02 Feb 2007 18:25:25 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Information Patterns]]></category>
		<category><![CDATA[Information Philosophy]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=27</guid>
		<description><![CDATA[Every time a new data format spec hits my inbox, I get a little twinge of dread. Such documents are often enormous. They&#8217;re written in standardese (often badly). They&#8217;re usually written by committees. They go through a maze of twisty &#8230; <a href="http://appliedrotation.com/Techblog/?p=27">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Every time a new data format spec hits my inbox, I get a little twinge of dread.  </p>
<p>Such documents are often <em>enormous</em>.  They&#8217;re written in <em>standardese</em> (often badly).  They&#8217;re usually written by <em>committees</em>.  They go through a maze of twisty little <em>revisions</em>, all different.</p>
<p>But worst of all, they often bury their novelty in a sea of details that resemble those in the last spec I reviewed.</p>
<p>I&#8217;d like to do for data formats and other information representations what the <a href="http://www.amazon.com/Design-Patterns-Object-Oriented-Addison-Wesley-Professional/dp/0201633612">Gang of Four book</a> does for programs:  call out and label the patterns that come up over and over again so that I can classify details into bigger chunks for mental processing.  </p>
<p>You can expect to see several different kinds of post in this series:</p>
<ul>
<li><strong>Case studies.</strong> I have to look at lots of actual data formats in order to discern the patterns!</li>
<li><strong>Data format patterns.</strong> Most posts will be about patterns I find in data formats&#8230;</li>
<li><strong>Information usage patterns.</strong> &#8230;but some posts will be about how information is generated, stored, and used.</li>
<li><strong>Other.</strong> I&#8217;ll probably think of some other topics as well.</li>
</ul>
<p>I expect to look at simple cases, such as comma-separated values, as well as fiendishly complex cases, such as PDF.  Programming-language syntaxes are fair game; database index disk structures are right out.  In between, I&#8217;ll draw the boundary as interest dictates.</p>
<p>This series will be open-ended as long as people keep inventing data formats faster than I can look at them.</p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=27</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kent&#8217;s Data and Reality [book pointer]</title>
		<link>http://appliedrotation.com/Techblog/?p=8</link>
		<comments>http://appliedrotation.com/Techblog/?p=8#comments</comments>
		<pubDate>Fri, 29 Dec 2006 05:45:02 +0000</pubDate>
		<dc:creator>Dan Rabin</dc:creator>
				<category><![CDATA[Information Philosophy]]></category>
		<category><![CDATA[Resources]]></category>

		<guid isPermaLink="false">http://appliedrotation.com/Techblog/?p=8</guid>
		<description><![CDATA[Let me kick off this blog by pointing to William Kent&#8217;s classic book Data and Reality. Lots of books will teach you how to process data with particular technologies, but Kent&#8217;s book goes deeper. He shows in chapter after chapter &#8230; <a href="http://appliedrotation.com/Techblog/?p=8">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Let me kick off this blog by pointing to William Kent&#8217;s classic book <em><a href="http://www.authorhouse.com/Bookstore/ItemDetail~bookid~2713.aspx">Data and Reality</a></em>.</p>
<p>Lots of books will teach you how to process data with particular technologies, but Kent&#8217;s book goes deeper.  He shows in chapter after chapter how database practice fails to match the way humans actually use information.  </p>
<p><em>Data and Reality</em> is almost thirty years old, but the issues haven&#8217;t really changed: if anything, they&#8217;re much more in our collective faces. </p>
<p>This book may be for you if:</p>
<ul>
<li>you feel strongly that you and your Social Security number (U.S. tax identification) are not the same thing,</li>
<li>you wonder whether Mark Twain and Samuel Clemens were the same person for all purposes,</li>
<li>you don&#8217;t know what to put down for Homer&#8217;s year of birth in that author/title cataloguing app you downloaded,</li>
<li>you wonder about people who think that something doesn&#8217;t exist if it&#8217;s not in the expected database (or if it&#8217;s not on the Web).</li>
</ul>
<h3>Information philosophy</h3>
<p>If these issues sound a lot like the first ten minutes of a college philosophy course, that&#8217;s intended.  Philosophy is all about seeking answers to questions we don&#8217;t often pause to ask.  </p>
<p>Pause.</p>
<p>Pause.</p>
<p>We run into questions like these all the time in building software, especially now that we&#8217;ve woven the Web and woven ourselves and our lives into it.  </p>
<p>On <a href="http://appliedrotation.com/Techblog/"><em>Information in Rotation</em></a> I&#8217;m going to call this category &#8220;Information Philosophy&#8221;; I think it will get woven in with the more orthodox techy blog-fodder as we go along.</p>
<p>Anyhow, I strongly recommend <em>Data and Reality</em>, which is available as print-on-demand or  (inexpensive) eBook from the publisher at the link I gave above (as of 2006-12-28).  </p>
<p>[The paperback has ISBN 9781585009701 and the eBook 9781420898880.  Does this make them different books?]</p>
]]></content:encoded>
			<wfw:commentRss>http://appliedrotation.com/Techblog/?feed=rss2&#038;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
