Wired on Friday: Data, for all its high-tech connotations, is nothing new. From the time of cuneiform inscriptions we've recorded it: facts, pictures, words, records, memories.
And for just as long, we've recorded data about that data: the dates it was recorded, who it was recorded by, categories that it was filed under.
Archivists and record-keepers call that data about data 'metadata'. We use metadata to navigate around our knowledge. It's what you use to find books in a library or a particular person in your address book.
Now, on the net, the supply of data we can gather dwarfs all the data we have collected before, and the collection is growing by the second. The metadata we have to navigate through this storm is, by contrast, a tiny fraction of it.
That's because crafting good metadata is hard. What makes sense as a category is culturally dependent and highly subjective. Jorge Luis Borges highlighted the problem of incorrect categorisation by talking about the Celestial Emporium of Benevolent Knowledge, which divided animals into these categories: (a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, and so on.
It's difficult, as one files, to know what metadata is arbitrary and what is not, and what will benefit future searchers and what is irrelevant.
Then there is the problem of deliberate abuse. Categorisation is often a politically charged act, as the signs that once read 'No Dogs or Irish' showed.
Given these problems, it's not surprising that metadata is a specialised business, the purview of postgraduate-trained professionals. But increasingly online, as users are faced with a torrent of data and a paucity of experts, a new and fast-growing source of metadata is emerging: you.
Who wants - or has the skill - to be a volunteer librarian? At some small level we all do it, at least for our own collections of data. When you pop a CD into a computer, the chances are the computer will recognise it - knowing the album title, the artist, and the track names.
If it doesn't, CD-playing programs will generally let you type in your own metadata for your obscure new purchase. What's less obvious is what happens after you've typed in your song names. They're sent back up to the central database. Thereafter, anyone else who inserts that CD into their computer will have your metadata presented to them.
That's how the database works - those CDs you previously inserted were recognised because some other user typed in the metadata, just as you did. You are a tiny part of a giant, distributed metadata stewardship.
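The round trip described above can be sketched in a few lines of Python. This is only an illustration, not how any particular CD service is built: the in-memory dictionary, the function names and the idea of using track lengths as the disc's key are all assumptions made for the example.

```python
# Hypothetical stand-in for the central CD database: a plain dictionary.
# Assumption: a disc is identified by its track lengths in seconds.
shared_db = {}

def disc_id(track_lengths):
    """Derive a lookup key from the disc's track lengths."""
    return tuple(track_lengths)

def lookup(track_lengths):
    """Return metadata another user has already contributed, or None."""
    return shared_db.get(disc_id(track_lengths))

def submit(track_lengths, album, artist, tracks):
    """Store typed-in metadata so that later users of the same disc see it."""
    shared_db[disc_id(track_lengths)] = {
        "album": album,
        "artist": artist,
        "tracks": list(tracks),
    }

obscure_cd = [201, 187, 243]
first_try = lookup(obscure_cd)    # nobody has entered this disc yet
submit(obscure_cd, "Example Album", "Example Artist", ["One", "Two", "Three"])
second_try = lookup(obscure_cd)   # someone else inserting the same disc
```

The first lookup comes back empty; after one person types the details in, the second lookup returns them to everyone else.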
More web sites are now offering 'tagging' capabilities for their users' data. Tagging is adding single words or small phrases of personally-significant metadata to applications on the web. It's fast and simple to do. The data, and its tiny dots of metadata, are then shared with others, who can edit and search them at will.
Del.icio.us is a communal bookmarker that grew out of programmer Joshua Schachter's efforts to organise his bookmarks. "When I started working on 'delicious', [it] seemed like an obvious feature," he says.
After all, he tagged his own bookmark collection with one-word category descriptions.
Now, at del.icio.us, thousands of users bookmark anything on the web they want, and give it metadata tags according to their own interests. The contributed collection - with its metadata - is made available for free online.
What makes del.icio.us fascinating is that you can see and search the tags that others have contributed. So walk through 'productivity' or 'darfur' and you'll quickly see a growing collection of items tagged for exploration. Adding your own tags is as easy as browsing those of others. Like organising your CD collection, creating metadata becomes easy and, for the more obsessive-compulsive amongst us, rather fun.
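For the curious, shared tagging of this kind can be modelled as little more than an index from each tag to the pages that carry it. The sketch below is an assumption-laden toy, not del.icio.us's actual design: the names `bookmark` and `browse`, the example users and the example.org addresses are all invented for illustration.

```python
from collections import defaultdict

# tag -> {url -> set of users who applied that tag to it}
tag_index = defaultdict(lambda: defaultdict(set))

def bookmark(user, url, tags):
    """Record one user's bookmark under each of their chosen tags."""
    for tag in tags:
        tag_index[tag.strip().lower()][url].add(user)

def browse(tag):
    """URLs carrying this tag, most-tagged first, ties in alphabetical order."""
    urls = tag_index.get(tag.strip().lower(), {})
    return sorted(urls, key=lambda u: (-len(urls[u]), u))

bookmark("alice", "http://example.org/gtd", ["productivity", "lists"])
bookmark("bob", "http://example.org/gtd", ["Productivity"])
bookmark("bob", "http://example.com/inbox-zero", ["productivity"])

results = browse("productivity")
```

Here `results` lists the page that two people tagged ahead of the page that only one did, which is roughly what makes other people's tags worth walking through.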
Other sites that have picked up on the idea of quick-and-dirty metadata are blogging programs like Movable Type and photography sites like Flickr.
All add much-needed metadata to the web. The clues left by users sorting their own collections are now being picked up by search engines like Technorati, which are incorporating such user-contributed metadata into their knowledge of where to search online.
But with metadata creation converted from an intense profession to a casual side-effect, Borges' problem persists, and is perhaps amplified. As Joshua Schachter puts it, "I think [tags] rely too much on a given person's internal context. My 'cat' is computer-aided translation, and your 'cat' is about felines. How do you resolve this?"
Prejudice plays a part too. Tags on 'cancer' show medical sites, and also sites that the taggers view as a metaphorical 'cancer' on the web.
User-generated metadata suffers from many malicious and ignorant entries, but unlike metadata acquired by professional or technical means, user-editable results are rechecked and corrected by other users.
It's the difference between classifying your own private collection, and having your classification peer-reviewed by many others. "I think systems that let multiple people tag the same content work well," says Schachter.
As long as corrections can be made as easily as the bad metadata can be added, the system tends towards good stewardship of the metadata.
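One way to picture that peer review is a simple majority filter: keep a tag on an item only if enough of the people who tagged it agree. This is a sketch of one possible rule, not a description of how del.icio.us or any other site resolves disagreements; the `consensus_tags` function, the 50% threshold and the example users are assumptions.

```python
from collections import Counter

def consensus_tags(taggings, min_share=0.5):
    """taggings maps each user to the set of tags they gave one item.

    Returns the tags used by at least min_share of those users.
    """
    counts = Counter(tag for tags in taggings.values() for tag in set(tags))
    threshold = min_share * len(taggings)
    return sorted(tag for tag, n in counts.items() if n >= threshold)

# Four people tag the same medical site; one of them is being unhelpful.
oncology_site = {
    "alice": {"cancer", "medicine"},
    "bob": {"cancer", "health"},
    "carol": {"cancer", "medicine"},
    "mallory": {"spam"},
}
kept = consensus_tags(oncology_site)
```

With four taggers and a 50% threshold, only tags chosen by at least two people survive, so the lone 'spam' entry drops out while 'cancer' and 'medicine' remain. A rule like this does nothing for Schachter's two meanings of 'cat', which is part of why the problem is hard.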
Perhaps that's what will save this collective bank of knowledge about knowledge from collapse.
Quick and dirty tagging may have professional archivists smacking their foreheads with frustration, but perhaps what the web needs is exactly what it has always benefited from: a messy, just-good-enough approach that rewards participation as much as professionalism.