Creating a smarter Web for a lazier human race


Wired on Friday One of the curious aspects of the technology march is how predetermined it looks in the past, and how conditional it seems in the present. The car, the plane, the Web, all look to us now as artifacts that would have cropped up no matter what we did. If Daimler and Benz, the Wright Brothers, or Tim Berners-Lee had never existed, we feel that someone else would have come up with something similar.

Yet we are always being cautioned about the great responsibility we have to choose the correct technologies for the future. Genetically modified foods? Supersonic passenger jets? The future of the Web? We are always at the crossroads.

For the past few years, in the academic birthplaces of the Web, much time has been spent discussing its future. Its creator, Tim Berners-Lee, has a vision, and he and others have been exhorting the rest of the Net to join him in implementing it. We are, it is said, at the crossroads.

Berners-Lee's path leads to what he calls "the Semantic Web". As with anything connected with semantics, it is a little tricky to define.


It's about dragging the Web back to what it should always have been: a library of facts and opinion that is not only readable by millions of human users, but also comprehensible to the machines it passes through.

Think of the Semantic Web as a parallel Web written not in the (overly) familiar language of our Web pages, but in the careful code-like logic of computers.

On the current Web, I write about my cat, Dyson, on my homepage, and I put up a picture of him. On the Semantic Web, a translation of this will exist: something like "There exists a thing, X, which is called 'Dyson', which is a thing in the class of things called 'domestic cats', and which has the relationship 'is-a-pet-of' to thing 'Danny O'Brien'. And this is a 'picture' of 'Dyson'."

Computers may never have petted a 'domestic cat', but with the help of a Semantic Web they can discover that it has "fur", goes "meow", there's one called "Dyson", and here is a picture of it. With such a Semantic Web, and with their own capabilities in logical deduction, the claim goes, computers will be able to make smart decisions on how best to obey our commands.
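The facts above can be sketched as subject-predicate-object triples, the basic shape of the Semantic Web's RDF data model. Here is a minimal toy sketch in Python - the names and the triple store are invented for illustration, not real RDF syntax or any real Semantic Web API:

```python
# Each fact is a (subject, predicate, object) triple - the basic shape
# of RDF, the Semantic Web's data model. All names here are illustrative.
triples = [
    ("Dyson", "is-a", "domestic cat"),
    ("domestic cat", "has", "fur"),
    ("domestic cat", "says", "meow"),
    ("Dyson", "is-a-pet-of", "Danny O'Brien"),
    ("dyson.jpg", "is-a-picture-of", "Dyson"),
]

def objects(subject, predicate):
    """Return everything the triple store asserts about (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# A machine that has never petted a cat can still "discover" these facts:
print(objects("Dyson", "is-a"))        # ['domestic cat']
print(objects("domestic cat", "has"))  # ['fur']
```

The point of the careful, code-like encoding is exactly this: once the facts are in triple form, a dumb lookup suffices where, on the ordinary Web, a machine would have to parse human prose.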

So, for instance, if I book a holiday, the part of my PC responsible for the booking will be able to use the Semantic Web to determine that I have a pet, and to arrange a stay in a cattery for "Dyson". It will warn me if a friend with cat allergies is coming to stay (because my friend will, on the Semantic Web, have the attribute "allergy" set to "domestic cats").
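That allergy warning is a small exercise in chaining facts together: the machine deduces that Dyson is my pet, that Dyson is a domestic cat, and that the visitor is allergic to domestic cats. A hedged sketch of that deduction, with an invented fact store and invented names:

```python
# Hypothetical fact store for the holiday-booking example; all names
# and the matching rule are invented for illustration.
facts = {
    ("Dyson", "is-a"): "domestic cat",
    ("Dyson", "is-a-pet-of"): "Danny O'Brien",
    ("friend", "allergy"): "domestic cats",
}

def allergy_warning(host, visitor):
    """Deduce whether any of the host's pets matches the visitor's allergy."""
    allergy = facts.get((visitor, "allergy"))
    for (subject, predicate), obj in facts.items():
        if predicate == "is-a-pet-of" and obj == host:
            species = facts.get((subject, "is-a"))
            # Crude singular/plural match: 'domestic cats' vs 'domestic cat'.
            if allergy and species and allergy.startswith(species):
                return f"Warning: {visitor} is allergic to {subject} ({species})"
    return None

print(allergy_warning("Danny O'Brien", "friend"))
```

Each step is trivial on its own; the Semantic Web's bet is that enough trivial steps, chained over enough encoded facts, add up to useful behaviour.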

It's an exciting vision, and one that you might imagine most technologists - especially internet technologists - would share. But Berners-Lee has had an uphill struggle selling his new Web - especially to those who have grown up with the old.

Even though Berners-Lee's first Web was supposed to be a bit machine-readable, it turned out very messy. Many feel the pure Semantic Web will end the same way. They feel the solution is not to invent a new Web more friendly to machines (a doomed exercise in a messy world), but to force machines to understand more from the current untidy Web.

The ultimate example of this is the search engine Google. Google's developers are notoriously smart, in an academic sense. But they're also notoriously sceptical of the advantages of a parallel Web of knowledge. For instance, at the start of this piece, I needed to know who invented the car. So I typed "who invented the car?" into Google.

Google has no understanding of that question. It does not, as the Semantic Web would, need to parse that into verb and nouns, and have a healthy understanding of the associations of invention and automobile. In fact, it ignores at least half of what I've asked it - Google doesn't search for common words like "who" or "the". It looks for pages that are associated with the phrase "invented car", either because the page contains them or because others have linked to that page using that phrase.

And it works. The first 10 hits give me nearly as many different answers to who invented the car - Charles Kettering, Daimler, Benz, Nicolas-Joseph Cugnot, Langen and Otto, Henry Ford. For my needs, I picked the most familiar - Daimler/Benz. Google and I have fumbled our way to an answer.
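Stripped to its essentials, the mechanism described above - drop the common words, then rank pages by how well they match what remains - fits in a few lines of Python. The pages, the stopword list and the scoring here are invented toy stand-ins; real Google ranking is vastly more elaborate, famously weighing links from other pages as well as the words on the page itself:

```python
# Toy keyword search in the Google style: ignore common words, then
# rank pages by how many of the remaining terms they contain.
STOPWORDS = {"who", "the", "a", "an", "of", "is", "was", "by"}

# Invented stand-in pages for illustration.
pages = {
    "history.html": "the car was invented by karl benz and gottlieb daimler",
    "cats.html": "dyson is a domestic cat with fur that goes meow",
}

def search(query):
    """Return matching page names, best score first; stopwords are ignored."""
    terms = [w for w in query.lower().strip("?").split() if w not in STOPWORDS]
    scores = {
        url: sum(term in text.split() for term in terms)
        for url, text in pages.items()
    }
    return sorted((u for u, s in scores.items() if s > 0),
                  key=lambda u: -scores[u])

print(search("who invented the car?"))  # ['history.html']
```

No parsing, no deduction, no understanding - just statistics over messy text, which is precisely the sceptics' point.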

For the Semantic Web to do the same, I'd have to ask a far more specific question. I - or a very smart program - would have to convert my question, with all its ambiguities (petrol-driven cars or steam automobiles? Inventor, or populariser? Patent-holder or first to market?), into a piece of machine-readable code. Only then could it search the Semantic Web for answers - looking through millions of facts, making deductions, to come to a final conclusion.

The Semantic Web will be neater. The problem is, as Berners-Lee says: "Instead of asking machines to understand people's language, [the Semantic Web] involves asking people to make the extra effort."

Crafting a machine-readable Semantic Web translation is a specialist activity, requiring a fair amount of deep thought and forward planning. The billion-webpage question is: will a smaller, more precise Semantic Web ever be able to compete with a much larger, fuzzier Web?

Berners-Lee and his supporters are putting their money on the Semantic Web, exhorting the world to move toward it by encoding their knowledge. Companies like Google are putting their money on the laziness of the millions providing the sloppy knowledge of the old Web, the cumulative value of that knowledge, and the ability of computers to dumbly, but speedily, plough through the mulch. Nobody knows who is right. But perhaps, like all those other crossroads in our history, both paths will lead to the same destination: smarter computers helping lazier (but more effective) humans.