Devising ways to exploit the web in a more efficient fashion

INNOVATION PROFILE Digital Enterprise Research Institute (DERI), NUI Galway

The scale of the worldwide web and the amount of information it contains almost defies description and human comprehension, and its growth quite simply takes the breath away.

According to global data specialist EMC, there is slightly more than one exabyte of digital data in existence today, an exabyte being one billion gigabytes. That figure is forecast to grow to 40 exabytes by 2020.

This estimate is borne out by IBM research which indicates that more than 90 per cent of the world’s data has been created in the last two years alone. Indeed, just two months ago an internet search for the word “Ireland” would have returned 265 million results. That has increased to 324 million today.

The challenge posed by this enormous volume of data and information is how to make use of it and that is the main thrust of the research being carried out by the Science Foundation Ireland-funded Digital Enterprise Research Institute (DERI) at NUI Galway.

DERI’s work is aimed at enabling and supporting people, organisations and systems to collaborate and interoperate on a global scale using semantic web technologies.

The semantic web is a term coined by worldwide web inventor and DERI advisory board member Tim Berners-Lee to describe the “web of data” that enables machines to understand the semantics, or meaning, of information on the web.

Berners-Lee defines the semantic web as “a web of data that can be processed directly and indirectly by machines”. It involves the insertion of machine-readable metadata into web pages to give information on how they are related to each other, enabling automated agents to access the web more intelligently and perform tasks on behalf of users. DERI’s goal is to use the semantic web to make information on the web as accessible as information in the human brain.

“There is a lot of rubbish on the web; there is just too much information,” explains DERI director Prof Dr Stefan Decker. “A person looking for information has to read through a lot of results and build a mental model of how all the different information relates to each other. But if you are doing a search on a topic like Ireland the information you are looking for might be on page 10 million and you are never going to get there. What we are doing at DERI is bringing all the information together and showing how it relates to each other with the computer doing it for you.”

He cites the example of climate change as a subject where the sheer volume of information available on the web is potentially causing its own problems.

“If you are interested in climate change and want to find out more about it you would have to read through thousands of pages just to understand it and even then you wouldn’t know which paper was controversial or not. Not many people go to this trouble and maybe that is why it is so easy to spread confusion about the subject. If the computer can help you understand the domain it will cut out a lot of this work and make complex subjects far easier to understand.”

This is the point of the semantic web. “The information on it gets transformed into a worldwide network of knowledge, not just data,” he points out. “And this is not just an idea being worked on by a nutty professor and his colleagues in the west of Ireland. DERI isn’t the only institute working on this. DERI just happens to be the biggest in the world.”

The institute now has 140 members from 30 nations including lecturers, research associates, postdoctoral researchers, and PhD and MSc students. It also boasts an impressive list of external partners including Avaya, Cisco, Ericsson, Alcatel-Lucent, Celtrak, Openlink, Storm Technologies, and FBK – the European Centre for Theoretical Studies in Nuclear Physics and Related Areas.

Interestingly, while the huge and rapidly increasing amount of data on the internet is creating difficulties it is also generating opportunities. “The more data we have the greater our desire is to exploit it,” says Decker.

This is due to something called the “network effect” which, according to Metcalfe’s Law, is defined as: “The value of a network is proportional to the square of the number of connected members.” In other words, a network with eight members is actually 16 times more valuable than a network with two members.

Decker explains it in terms of fax machines. “One machine is pretty useless if it is not connected to anything and it is not very valuable if it is only connected to one or two others. But if there are lots of fax machines all connected to each other then they become very valuable. The same is true for data. An individual piece of data has a certain value on its own but this value is multiplied if data are connected to each other.”
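The arithmetic behind the Metcalfe's Law comparison can be checked with a short sketch. It assumes a proportionality constant of 1, so the figures are relative values rather than real valuations, and the function names are illustrative only.

```python
def metcalfe_value(members: int) -> int:
    """Relative value of a network under Metcalfe's Law (members squared).

    Assumption: the proportionality constant is taken as 1, so only
    ratios between networks are meaningful.
    """
    if members < 0:
        raise ValueError("members must be non-negative")
    return members ** 2


def value_ratio(larger: int, smaller: int) -> float:
    """How many times more valuable one network is than another."""
    return metcalfe_value(larger) / metcalfe_value(smaller)


# Eight members versus two members: 64 / 4
print(value_ratio(8, 2))  # prints 16.0
```

Because the value grows with the square of the membership, quadrupling the number of members multiplies the value by 16, which matches the figure quoted for networks of eight and two members.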

Part of the way DERI is helping to transform this data into knowledge is by effectively creating a new language to pose the questions. “We are developing the grammar – what we call the resource description framework, and the vocabulary – the ontologies, which will allow us to organise knowledge in a machine-comprehensible way and give an exploitable meaning to the data. This is not natural language processing which machines still can’t understand but it’s about halfway there.”
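One way to picture the "grammar" Decker describes is as a set of subject-predicate-object statements, the basic shape of data in the resource description framework. The sketch below is a simplified illustration in plain Python, not DERI's software and not a real RDF library. The `ex:` names are a hypothetical vocabulary invented for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical vocabulary ("ex:" names are made up for this sketch);
# the facts themselves are taken from the article.
triples: List[Triple] = [
    ("ex:DERI", "ex:locatedIn", "ex:Galway"),
    ("ex:Galway", "ex:locatedIn", "ex:Ireland"),
    ("ex:DERI", "ex:researches", "ex:SemanticWeb"),
    ("ex:StefanDecker", "ex:directs", "ex:DERI"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def located_in(store: List[Triple], thing: str, region: str) -> bool:
    """Follow chains of ex:locatedIn links from thing towards region."""
    seen = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for _, _, target in match(store, subject=current,
                                  predicate="ex:locatedIn"):
            if target == region:
                return True
            frontier.append(target)
    return False


# No single statement says DERI is in Ireland; the link is derived
# by connecting two separate statements.
print(located_in(triples, "ex:DERI", "ex:Ireland"))  # prints True
```

The point of the example is that the answer comes from connecting separate statements, which is the sense in which linked data becomes more useful than isolated facts.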

The real social and commercial potential of DERI’s work possibly lies in its ability to bring data together from multiple domains in an understandable way. To return to the earlier example of Ireland, a researcher may be looking for information on how the Irish state supports industry.

A vast number of private and public sector sites would be trawled through by the computer and only relevant information from documents would be displayed – with no need to go all the way to the 10,000,000th page.

The DERI application which performs this task is known as SIOC and is already in use in US government departments. Indeed, it was used by the Obama administration to help keep track of where its trillion-dollar economic stimulus package was going and what impact it was having.

Another area of application is in healthcare and not just in the obvious realm of patient data records. Personalised medicine is an emerging field which will see different groups of patients suffering from the same conditions being prescribed different medicines in accordance with their likely efficacy.

However, figuring out which patients will react best to which treatments will be a matter of genetics among other things.

“Genomics will be hugely relevant to the pharmaceutical industry in the development of personalised medicine and our solutions can help with this,” says Decker.

Another area of application is cyber security and privacy. “An enormous number of potential threats have to be monitored and analysed and you need a very flexible way of looking at linked data to do this properly. We are talking to Georgia Tech about doing some work together on this at the moment.”

DERI’s work may also impinge on the way we live our lives by assisting in the creation of what is known as “liquid democracy”. This is a highly participative form of democracy which allows citizens to make their views known on policy making and have these taken into account on a constant basis.

“It is not a simple yes or no question,” Decker points out. “Our solutions will help governments weigh up the arguments being made by citizens before making trade-offs and informed decisions on policy.”

These are just a few aspects of DERI’s work which may ultimately change the way we access information and help in the creation of the digital economy of the future by offering new and more efficient ways to exploit networked knowledge.