Web spiders sent out to map the pattern of Net links

The Internet may not be a chaotic collection of billions of websites, links and quirky creations but may, like many other things, have a definable structure.

UCD scientists are embarking on a three-year study to try to make sense of the myriad structures and links that have developed organically as the Internet has grown.

Prof Mark Keane and Dr Barry Smyth believe that information, like the subject matter of physics, chemistry or biology, is governed by basic natural or scientific laws.

To identify and define the laws behind the behaviour and structure of information on the Internet, the UCD scientists will send a number of web spiders imbued with artificial intelligence out to explore the Web.

The spiders will map the pattern of links between the millions of Web addresses around the world.

If they succeed in finding a definite structure or pattern in the way websites link to each other, they can use models based on this data to build search engines that are far more effective than existing ones at finding information on the Web.

Finding an underlying structure in the mayhem of a medium that doubles in size every six months is not going to be easy. To trawl the Net, the team will use web spiders, the tools search engines use to search it, which will work on their task 24 hours a day, seven days a week.

Dr Smyth said the team believed certain pages on the Internet were "authorities", in that they were linked to more pages than any others, and the team would investigate why these pages, rather than others, acted as hubs.

When people look at the Internet, they see a mass of sites and a vast, tangled web of links, but up to now search engines, and the web spiders they use to gather information on new sites, have been quite unintelligent, according to Dr Smyth.

Most search engines have so far been text-based, with a keyword as the central focus of a query, he said. Discovering why some websites link to one another more than others may provide the basis for more intelligent navigation of the Web.

Dr Smyth said the recent tie-up between Google, a simple link-based search engine, and Yahoo is a sign that portals are realising the importance of such methods.

The team hopes to build a Web connectivity database of the pattern of linkages, not just between one website and another, but also the sequence of linkages connecting one site to another through a number of others.

By using artificial intelligence and data mining tools to collate the information in the database, the scientists expect to uncover patterns that can then be used in any number of ways.

The research will examine both narrow and broad sections of the Web, focusing on different areas of activity.

The way in which financial or business pages link to each other may be inherently different to the way people navigate travel or sports sites and may give an insight into different patterns of behaviour.

The commercial applications of such information could be invaluable both to those seeking to set up websites and to those already online who are looking for more effective ways of reaching people amid the vast quantity of content that now exists.

Dr Smyth said one of the simplest benefits to come out of the research would be that the resulting statistical information would give surfers the ability to see the most-used links on websites.

The project is similar to a voyage of discovery into uncharted waters, but Dr Smyth believes such research is essential.

He said the shift from text-based searches to link-based ones would probably lead to more intelligent and accurate search engines, hopefully making navigating the Web a less frustrating experience.

Finding a structure to the apparent chaos of the Internet may be akin to trying to find a yeti in the Himalayas, but the UCD project team is hopeful of providing a better understanding of this confused mass.