Net indexers plan to cover entire Web

Stung by criticism that search engines have fallen hopelessly behind in indexing the 800 million pages of the World Wide Web, several search companies have embarked on a Herculean effort to scan and review the entire expanse of cyberspace.

Excite, which operates the third most popular search engine, www.excite.com, has announced plans to index the Web in its entirety using new technology to be deployed in the next few weeks. Excite has so far indexed only about 50 million pages of the Web.

But some critics suggest that all this effort may be a massive waste of resources: essentially a marketing scheme that will mean little to the average user and may even prove counterproductive by vastly expanding the number of irrelevant results returned for a search request.

"What does it mean to have another 100,000 or 200,000 links show up in a search?" asked Jakob Nielsen, of the Web usability consultancy Nielsen Norman Group. It is 100 per cent irrelevant."

Still, the push to become the biggest search engine in cyberspace has already begun to gain momentum, driving a variety of companies into the fray. "The whole idea of bigger is better is back with a vengeance," said Danny Sullivan, editor of www.searchenginewatch.com, a London-based online magazine dedicated to the online-search industry.

Norwegian search-engine company Fast (www.alltheweb.com) announced last week that it, too, plans to catalogue all of the Web within the next year. The company claims to be the current index champion, at more than 200 million web pages.

Inktomi, which produces one of the most widely used search engines on the Internet, said that it too has begun to feel the pressure to keep up. "We've seen a resurgence of the idea: big, big, big," said Kevin Brown, director of marketing for Inktomi. "Relevance of results is still the leading issue, but we intend to grow our index substantially too."

While part of the movement may be just an effort to gain bragging rights in a highly competitive industry, the current arms race between search-engine companies touches on a Holy Grail of the Internet: cataloguing the entirety of humanity's online knowledge.

So far, the search engines have performed miserably at the task. A study by scientists at the NEC Research Institute found that even the best search engines today have indexed no more than 16 per cent of all web pages. The study, published in July in the journal Science, raised the question of whether the Internet could actually lead to a step backward in the distribution of knowledge, with more information being lost than gained because of the search engines' inability to keep up.

The scientists found that most search engines index less than 10 per cent of the Web. Even with the efforts of all the search engines combined, only 42 per cent of the Web had been indexed.
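For a sense of scale, those percentages translate into raw page counts as follows. This is a back-of-the-envelope calculation in Python, using the article's own 800-million-page estimate of the Web's size:

    WEB_PAGES = 800_000_000            # the article's estimate of the Web's size
    best_single = 0.16 * WEB_PAGES     # best single engine: about 128 million pages
    all_combined = 0.42 * WEB_PAGES    # every engine combined: about 336 million pages
    unindexed = WEB_PAGES - all_combined
    print(best_single, all_combined, unindexed)

In other words, at the time of the study, roughly 464 million pages were reachable through no search engine at all.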

Kris Carpenter, director of search products and services for Excite, said she believes that most consumers still do not want all 800 million pages of the Web, a large percentage of which consists of vanity sites or extremely obscure data. But she added that it has become more important to at least scan the entire Web so the search engines can make better decisions about what is important.

Excite now uses fewer than 10 spiders (automatic programs that fetch web pages and follow their links) to cover the Internet, but with its new technology it will begin deploying dozens, each capable of covering up to 35 million pages a day.
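To illustrate what a spider does, here is a minimal sketch in Python: fetch a page, extract its links, and queue the new ones for later visits. This is only the principle at work, not Excite's actual technology; a production spider adds politeness rules, duplicate detection at scale, and distributed storage, and every name and limit below is an illustrative assumption.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        # Collects the href targets of every anchor tag on a page.
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=100):
        # Breadth-first traversal of the link graph from a single seed URL.
        queue, seen = deque([seed]), {seed}
        while queue and len(seen) <= max_pages:
            url = queue.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except OSError:
                continue  # unreachable or broken pages are simply skipped
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)  # resolve relative links against the page
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen

The index race described above is, at bottom, a race to run this loop faster, in parallel, and over more of the link graph than anyone else.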