So, what is metadata?
During the recent kerfuffle about how much spying the NSA and GCHQ are carrying out on absolutely everyone with a life in cyberspace, the word “metadata” kept cropping up. In this context, the word was used mainly with reference to emails. Email messages have metadata associated with them and the apologists for the government spies claimed that when it came to reading our emails, only the metadata is examined and not the content of the messages themselves. Oh, so that’s alright, then (not).
It’s not just emails that have metadata associated with them. Digital images have metadata attached to them as well. For that matter, so do modern word processing documents, spreadsheets, web pages, and many other types of data files. Metadata can allow one computer program to understand the contents of a file created by a different computer program so that it can perform useful work on that file (e.g. it may thereby know how to display the file, how to change its data, or how to print it).
A (possibly simplistic) definition of metadata is that it is “data about data”. A more useful way of expressing it is that it is data that describes the structure and context of the data it is attached to. So, a web page may contain metadata that tells you what language the page is written in, what program was used to write the page and so on. The metadata of a digital image may include the date, time, and even location of the image’s capture, as well as the make of camera, the lens, exposure settings, file size, file type and so on. The metadata of an email message will include the identities of the sender and receiver, the date and time of sending, the character encoding used, and so on.
Aforesaid apologists would try to convince us that reading an email message’s metadata is equivalent to noting the information on the outside of a sealed envelope. To describe this analogy as disingenuous would be stretching the use of that word a bit far. Metadata is far more important than that. Metadata is attached to files in a very highly structured way. This means that it is easily readable electronically. That is the point of it. A machine can understand the contents of metadata without worrying about all of the vagaries and nuances of computer coding and language, meaning, context, culture, etc. that come into play when analysing the actual content of a file. In an email, for instance, the “reading machine” would know exactly where to find the sender’s and receiver’s names, the date and time of sending, etc., without having to do any more than read the metadata to be found in the same place every time.
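To give a feel for how little effort this takes, here is a minimal sketch using Python's standard `email` module. The message, the names and the addresses are all invented for illustration, and the `read_metadata` function is my own hypothetical helper, not anything a particular agency or product is known to use. Note that the body of the message is never looked at.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# An invented message: every name and address here is hypothetical.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Date: Mon, 01 Jul 2013 09:30:00 +0000
Subject: Lunch?
Content-Type: text/plain; charset="utf-8"

This body text is never inspected by the code below.
"""


def read_metadata(raw):
    """Pull the structured header fields out of a raw email message."""
    msg = message_from_string(raw)
    # parseaddr splits "Name <address>" into a (name, address) pair.
    _, sender_addr = parseaddr(msg["From"])
    _, recipient_addr = parseaddr(msg["To"])
    return {
        "sender": sender_addr,
        "recipient": recipient_addr,
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "charset": msg.get_content_charset(),
    }


metadata = read_metadata(RAW_MESSAGE)
print(metadata)
```

The same few lines would work on any well-formed message, because the headers sit in the same place and follow the same layout every time.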
Not only is metadata very quick and easy for machines to read, but the sheer amount of data that is relentlessly hoovered up enables relationships and patterns to be discerned that wouldn’t be apparent to anyone who just had the resources to note the data on the outside of a sealed envelope or two.
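As a rough sketch of how patterns fall out of volume, the snippet below counts how often one account exchanges messages with each correspondent, using nothing but sender and recipient addresses. The log entries and the `count_contacts` helper are invented for illustration; a real analysis would be working from vastly more records.

```python
from collections import Counter

# Invented (sender, recipient) pairs taken from headers only.
HEADER_LOG = [
    ("me@example.com", "client-a@example.com"),
    ("client-a@example.com", "me@example.com"),
    ("me@example.com", "client-b@example.com"),
    ("client-a@example.com", "me@example.com"),
    ("newsletter@example.com", "me@example.com"),
]


def count_contacts(log, owner):
    """Count messages exchanged between `owner` and each other address."""
    counts = Counter()
    for sender, recipient in log:
        if sender == owner:
            counts[recipient] += 1
        elif recipient == owner:
            counts[sender] += 1
    return counts


contacts = count_contacts(HEADER_LOG, "me@example.com")
for address, total in contacts.most_common():
    print(address, total)
```

The counts say who is talked to most often. They say nothing at all about why.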
Let me offer an example. There is a program – called Immersion – that will analyse your Gmail for you and show you just what metadata it can reveal.
As a single person, I don’t need to fear my spouse analysing my Gmail traffic, but suppose that someone else was looking at my Gmail to see if I was having a clandestine relationship. As the image shows, “Patricia” is a clear candidate according to the metadata of my Gmail traffic. Reading the metadata of my Gmails does not reveal why “Patricia” figures so prominently and anyone analysing this metadata could very easily start coming to erroneous conclusions. What if these analysts are tracking Patricia as well as me and see someone in her analysis who knows someone who knows someone who is of interest to them from a terrorism perspective? If we believe the theory about “six degrees of separation” then this is, in fact, almost bound to be the case. So, by reading the metadata of my Gmails, the spooks can now conclude that I may be a “person of interest” to them, thereby “justifying” further intrusions into my life.
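The “someone who knows someone who knows someone” chain is, in computing terms, a shortest-path search over a graph of contacts. The sketch below shows one way it could be done, assuming the contact graph has already been built from email metadata. The graph, the names other than Patricia’s, and the `degrees_of_separation` function are all hypothetical.

```python
from collections import deque

# A hypothetical contact graph: each person maps to the people they email.
CONTACTS = {
    "me": {"patricia"},
    "patricia": {"me", "quentin"},
    "quentin": {"patricia", "rowan"},
    "rowan": {"quentin"},
}


def degrees_of_separation(graph, start, target):
    """Return the number of hops from start to target, or None if unlinked."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search visits nearer contacts before more distant ones,
    # so the first time the target is reached is via a shortest chain.
    while queue:
        person, distance = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


hops = degrees_of_separation(CONTACTS, "me", "rowan")
print(hops)
```

Here “me” ends up three hops from “rowan” without ever having exchanged a single message with them, which is exactly the sort of link that looks meaningful on a chart and may mean nothing in real life.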
The real story between Patricia and me? Very simple and prosaic, actually. I accidentally sent out an email to all my clients using my Gmail account instead of my normal email account and my Gmail address seems to have insinuated itself into Patricia’s email contact list. If she sends me an email now, it’s always my Gmail address that is selected. If I then reply to that message it will go from my Gmail account unless I manually change it to my normal address. So, according to my Gmail metadata, there appears to be a potentially significant relationship that is, in fact, nothing more than a normal client/supplier relationship. It may not be too difficult to explain that to a (non-existent) spouse, but just multiply this one small example by all the people and all the links and all the metadata in all the emails in the world, and I think the result is an infinite amount of scope for mischief, misinterpretation, and abuse of power.
Just in case you think I may be paranoid, a useful place to start learning a bit more is this link to a Guardian web page on the subject of metadata.