How Data Became Metadata

Is it just me, or is the news that the NSA is getting off the hook on its surveillance of us because it’s just “metadata” more than a bit duplicitous? Somehow the general public is being sold this idea that if the NSA is not looking at the content of our phone calls or email, then maybe it’s alright.  I’m not sure where this definition of metadata came from, but  unfortunately it’s one of the first that the general public has had and it’s in danger of sticking. Our industry has not done ourselves any favors by touting cute definitions of metadata such as “data about data.” Not terribly helpful.  Those of us who have been in the industry longer than we want to admit, generally refer to metadata as being equivalent to schema.  So the metadata for an email system might be, something like:

  • sender (email address)
  • to (email addresses)
  • cc (email addresses
  • bcc (email addresses)
  • sent (datetime)
  • received (datetime)
  • read (boolean)
  • subject (text)
  • content (text)
  • attachments (various mimetypes)

If we had built an email system in a relational database, these would be the column headings (we’d have to make a few extra tables to allow for the multi-value fields like “to”).  If we were designing in XML these would be the elements. But that’s it.  That’s the metadata. Ten pieces of metadata. The NSA is suggesting that the values of the first seven fields are also “metadata” and that only the values of the last three constitute “data.”  Seems like an odd distinction to me.  And if calling the first seven, presumably harmless “metadata” and thereby giving themselves a free pass doesn’t creep you out, then check out this: . Some researchers at MIT have created a site where you can look at your own email graph in much the same way that the NSA can. Here is my home email graph (that dense cloud above is the co-housing community I belong to and source of most of my home email traffic).  Everybody’s name is on their nodes.  All patterns of interaction are revealed.  And this is just one view.  There are potentially views over time, views with whether an email was read, responded to, how soon etc., etc. Imagine this data for everyone in the country, coupled with the same data for phone calls. If that doesn’t raise the hackles of your civil libertarianism, then nothing will. ‘Course, it’s only metadata.   by Dave McComb

Scroll to top
Skip to content