The new smarter web

 

INTERNET TECHNOLOGY: Coming soon to an internet connection near you: the semantic web, writes KARLIN LILLINGTON

IF THE term is news to you, don't feel too bad. Until fairly recently, the semantic web - also known as Web 3.0 - has mostly been the subject of research work, trials and theorising. But it is starting, tentatively, to become a slow-growing reality, with potential legal implications for web users, especially organisations that manage large amounts of data.

That's because the core idea of the semantic web is making information available to and understandable by other computers on the web, which in turn makes it more easily and accurately searched, archived, used and reused, manipulated, mined and accessed.

Or put another way, the semantic web creates "smart data" and links it together in more productive ways for both the machines that handle it and the humans who want to use it.

How is data made "smart"? By marking it up: encoding information about the data into the data itself, so that it is tagged intelligently and can explain to another machine what it means. The general idea is similar to what many people already do with their own data - say, a blog post or a picture uploaded to the web. Descriptive tags - words or phrases that help identify the image or the piece of writing - are modest and minimal building blocks for the semantic web.

The real change will happen with encoding on a larger scale, when organisations start tagging the large volumes of data they generate and store, using the agreed standards for doing so - called OWL (Web Ontology Language) and RDF (Resource Description Framework).
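As a rough illustration of what such markup amounts to, RDF describes data as statements of the form subject, predicate, object. The sketch below mimics that shape with plain Python tuples. It is not real RDF syntax, and every identifier in it (the `ex:` names and the `objects` helper) is made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple, the shape RDF uses.
# All identifiers here are hypothetical, invented for this example.
triples = [
    ("ex:photo42", "ex:depicts", "ex:GalwayBay"),
    ("ex:photo42", "ex:takenBy", "ex:alice"),
    ("ex:GalwayBay", "ex:type", "ex:Place"),
]


def objects(subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# A program can now ask what the photo shows without parsing free text.
print(objects("ex:photo42", "ex:depicts"))
```

Because each fact is stored as a separate statement, other statements about the same subject can be added later without changing the ones already there.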

"The semantic web is going to build an ontology - a taxonomy with rules. Once there's that structure, then a machine can start doing things," says Liam O'Morain, a business consultant to DERI (Digital Enterprise Research Institute) at University College Galway, which carries out internationally recognised work in the area of the semantic web.

He gives the example of a search using the word "china". Offhand, a machine or a search engine will not know whether the searcher wants information related to ceramic dishes, or to the country. If properly tagged on the semantic web, he says, the data would identify itself as being relevant to China the country, or china plates. For the searcher, the returned data is more likely to be relevant and less time is spent searching.
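The "china" example can be sketched in a few lines. This is a toy under stated assumptions, not how any real search engine works: the documents, the `about` tag and the concept names such as `ex:Porcelain` are all hypothetical.

```python
# Hypothetical documents; the "about" tag names the concept each one concerns.
documents = [
    {"title": "Caring for bone china", "about": "ex:Porcelain"},
    {"title": "China trade figures", "about": "ex:ChinaCountry"},
    {"title": "Travelling in China", "about": "ex:ChinaCountry"},
]


def keyword_search(word):
    """Plain keyword match: cannot tell the dishes from the country."""
    return [d["title"] for d in documents if word.lower() in d["title"].lower()]


def concept_search(word, concept):
    """Keyword match narrowed to documents tagged with the wanted concept."""
    return [
        d["title"]
        for d in documents
        if word.lower() in d["title"].lower() and d["about"] == concept
    ]


print(keyword_search("china"))                  # all three titles match
print(concept_search("china", "ex:Porcelain"))  # only the ceramics article
```

The keyword search returns every document, while the tagged search returns only those about the meaning the searcher intended.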

Why are these moves towards a new version of the web happening now? O'Morain says the web and the way we use it have changed significantly in recent years, but the technology underlying it has not.

"The technology hasn't changed a lot from http and html but the user has gone from publish and push, to participation, and there's the whole phenomenon of open source, software as a service [ service-oriented architecture or SOA], crowd sourcing, etc. We've done all this wonderful stuff but under the hood it's still really ugly," he says. For the web to be able to do what we increasingly demand from it, the underlying structure needs to improve.

Leading the charge for the changeover to the semantic web is none other than the creator of the World Wide Web, Tim Berners-Lee (who is also on the advisory board for DERI Galway).

The idea is that the changeover will happen gradually - not in a "big bang" but piecemeal, in chunks that will gradually be interlinked and further annotated as time goes by. "The different approaches over time will overlap and converge. As Tim Berners-Lee says, we won't notice the change when it happens, but we won't be able to imagine going back. It's going to put structure where there's no structure. The balancing act is not to make it so rigid that we lose all of the web's flexibility," says O'Morain.

He says areas of industry that already need to classify information will likely be the first to move towards marking up data - life sciences, for example, and perhaps healthcare. Sectors like financial services will move more slowly because they have to start from a more basic level, working on taxonomies for classifying their data.

This will lead to new business possibilities, he says. "In terms of opportunities, it's a perfect storm," with technological advances driving the need to start classifying data to build the semantic web. "The [internet] pipes have got very big, with cheaper broadband; [computers are] very strong, there's a lot of AI [artificial intelligence] and natural language processing that has been worked on for the past 10 years. There will be opportunities for new specialist business that can classify data for companies," he says.

Companies need to consider some of the challenges that will come with such detailed classification of data and start planning for this shift.

Philip Nolan, a lawyer with an interest in internet-related law and head of the commercial department at Mason Hayes & Curran, says the semantic web will almost certainly introduce data protection and control issues for organisations, and could potentially allow for defamation by machine.

For organisations collecting personal data on third parties, he says, "if data is going to be reconfigured and made smarter in this very searchable way, the user needs to know that. It has to be informed consent." He gives the example of a publisher that has collected names and addresses for a magazine subscription, and would need to show caution in tagging the data and making it available for some other purpose.

Nolan also notes potential difficulties with the existing Friend of a Friend (FOAF) approach to tagging personal information on the semantic web. FOAF enables a user's personal information to be made available within the vast FOAF network, and also enables the data of that user's network of friends to be made available to others.

FOAF is essentially a data store of personal information and thus makes an interesting case, says Nolan. He argues that any system that makes such data on others available will also need to have the ability to regulate who sees what.
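To see why linked personal data raises this concern, consider a small sketch of how "knows" links can be followed from one person to the next. The people and addresses are invented, and the dictionary layout is an assumption for illustration rather than actual FOAF syntax; the keys merely echo the FOAF terms `knows` and `mbox`.

```python
# Hypothetical profiles; "knows" and "mbox" echo the FOAF terms of those names.
profiles = {
    "alice": {"mbox": "alice@example.org", "knows": ["bob"]},
    "bob": {"mbox": "bob@example.org", "knows": ["carol"]},
    "carol": {"mbox": "carol@example.org", "knows": []},
}


def reachable(start, max_hops):
    """Return everyone reachable from `start` within `max_hops` 'knows' links."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for person in frontier:
            for friend in profiles[person]["knows"]:
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return seen - {start}


# One hop reaches Alice's friends; two hops reach friends of friends as well.
print(sorted(reachable("alice", 1)))
print(sorted(reachable("alice", 2)))
```

Carol never linked to Alice, yet her details are two hops away from anyone who starts at Alice's profile, which is the kind of exposure a "who sees what" rule would need to govern.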

Perhaps the semantic web will have these possibilities structured in and will thus help resolve some of the existing data-protection issues, he says. "You could encode data to go to a specific user group, so the semantic web might provide a way of managing that data. You could build in an expiration date after which it could no longer be viewed, for example, too."
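What Nolan describes could look something like the following sketch, in which a record carries its own audience and expiry date. The field names, the group names and the `can_view` check are hypothetical; no existing semantic web standard is being quoted here.

```python
from datetime import date

# A hypothetical personal-data record that carries its own access rules.
record = {
    "name": "A. Subscriber",
    "audience": {"subscriptions-team"},  # groups allowed to see the record
    "expires": date(2030, 1, 1),         # last day the record may be viewed
}


def can_view(record, group, today):
    """Allow viewing only for a listed group and only up to the expiry date."""
    return group in record["audience"] and today <= record["expires"]


print(can_view(record, "subscriptions-team", date(2029, 6, 1)))  # allowed
print(can_view(record, "marketing", date(2029, 6, 1)))           # wrong group
print(can_view(record, "subscriptions-team", date(2030, 1, 2)))  # expired
```

The point of the design is that the rule travels with the data, so any system that receives the record also receives the limits on its use.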

Another interesting issue is "the risk of inference", says Nolan. Case law has already established that defamation law applies to electronically published information - but what if machines generate information that is taken as fact, and repeated as fact, by making incorrect inferences from data?

When humans read a document, they understand the context for the information contained in it, but a machine does not, and may summarise "facts" that are defamatory.

"The nuances which might be built into the document - say an encyclopedia entry - would be lost by a machine," he says.

Nolan gives the example of a person searching for information on John F Kennedy. A semantic web result for such a search might note JFK was killed by Lee Harvey Oswald, without the context of the Warren Commission investigation, subsequent conspiracy theories and other subtleties. Such a summary made by a machine and then repeated could be defamatory to an individual.
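The loss of nuance Nolan describes can be shown with a small sketch that uses his own example. The record layout and both summary functions are assumptions made for illustration; the point is only that a machine which keeps the bare statement and drops the surrounding fields produces a flat assertion.

```python
# One statement plus the context a human reader would see around it.
# The field names are hypothetical; the wording follows Nolan's example.
statement = {
    "subject": "John F Kennedy",
    "predicate": "was killed by",
    "object": "Lee Harvey Oswald",
    "source": "the Warren Commission",
    "caveat": "later disputed by conspiracy theories",
}


def bare_summary(s):
    """Keep only the core claim, as a context-blind machine might."""
    return f"{s['subject']} {s['predicate']} {s['object']}."


def attributed_summary(s):
    """Keep the attribution and the caveat alongside the claim."""
    return (
        f"According to {s['source']}, {s['subject']} {s['predicate']} "
        f"{s['object']} ({s['caveat']})."
    )


print(bare_summary(statement))
print(attributed_summary(statement))
```

Both summaries come from the same data; the difference lies in whether the attribution and caveat survive the trip.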

All of these possibilities, he stresses, are as yet only considerations. But as O'Morain puts it: "Businesses need to start thinking about some of these issues."

"If you look at the current law, and these very nuanced technologies, there is a challenge," says Nolan.