Evolving an answer to the missing link

Most writing about the Web is done from a very short-term perspective. Authors create websites quickly and abandon them almost as quickly. Either the sites are pulled down (leaving everyone who linked to them irritated), or they are left up as dead sites, unmaintained and full of broken links. In either case, the authors rarely thought about what role their site might play in the long-term unfolding of the Web.

I am in the unusual position of knowing something of what it is like to maintain a website over a long period. This is my sixth year of maintaining a several-hundred-page site on my research and other interests, and I have learned a number of surprising lessons over those years.

Lesson 1: Links are free ads

Mine is a classic "amateur" site, with no commercial purpose other than to promote my research work in computer science and my other interests (mainly history). It gets 30,000 hits a month (the circulation, I suppose, of a small magazine) and would probably get more if I promoted it or registered it properly with the search engines.

Because it is non-commercial, I link to other relevant websites throughout my texts. Linking is the fundamental innovation of the Web - the ability to directly reference other works from within your text - yet few commercial sites outside search engines take any advantage of it. They prefer to reinvent the wheel in an (absurd) attempt to force users to stay within their site.

What are the long-term implications of a site which links? It turns out that in what is effectively a small magazine, I currently offer over 500 free ads for Yahoo Corporation, without being asked to, and without expecting anything from it in return. The question is - Why? And why did Yahoo get these free ads and not anybody else?

Lesson 2: In the future, all links will be to Yahoo

As an example of the evolution of a typical page over the last few hectic years, let us take a page referring to Shakespeare. It would be pointless for me to start maintaining my own (inadequate) biography or collection of data on Shakespeare at this point. The logical thing to do is to point to someone who specialises in Shakespeare. Hyperlinks should be (but rarely are) used like this, to strip pages down to their original content and devolve everything else to remote specialised pages.

In the early days of the Web, this would have involved a link to some Californian computer science student's home page, something like: www.stanford.edu/cs94joeshmo/shakespeare.html, which would have been the only Shakespeare page on the Web at that time.

As the Web developed, of course, cs94joeshmo would graduate and vanish, and I would have to change my page to link to some other soon-to-vanish site. After a few years of this, you either go crazy or develop strategies to future-proof your pages. First, I started linking to more heavy-duty, dedicated pages, something like: www.shakespeare.org, which at least looked as if it would still exist the following year. But the Web kept exploding, and it was soon apparent that shakespeare.org was only one of dozens of dedicated Shakespeare sites. It did not make sense to restrict my readers to one in particular, so I started linking to things such as: www.shakespeare.org/shakespeare-sites.html. But still the Web kept changing, and I began to wonder if this was the best list of sites to link to. What I really wanted to find was the definitive place on the Web that the word "Shakespeare" should link to. The answer, I argue, and the final resting place of all my ceaselessly changing links, is a Yahoo category, something along the lines of: www.yahoo.com/Literature/Shakespeare/

Yahoo offers the confidence that this link will work forever, and that the page it refers to will be maintained forever - gradually expanding with sub-categories and sub-sub-categories as the amount of information increases. Over the last half-decade hundreds of my links have slowly migrated to Yahoo, and linking to specific pages has been slowly replaced by linking to Yahoo categories, as they are introduced. Now that Yahoo is embedded in my pages, any new web directory is going to have to be a lot better to make it worthwhile to go and change those 509 links.

In fact, the other web directories, search engines and portals don't even seem to realise what is going on. Catering almost entirely for the transient "surfer", they provide chaotic keyword-ranked lists of information, on pages that cannot be linked to or are not designed to be. Try "Shakespeare" on any of them to see what I mean. They use frames, so it is difficult to find an address to link to. The address is a mess of keywords and search parameters (CGI arguments) or temporary identification numbers (cookies) - anything, it seems, to make sure you do not link to their site. Only Infoseek seems to provide an actual page that you can link to: www.infoseek.com/Books/Shakespeare/ - and it is a far inferior cousin to Yahoo's complex page.
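The difference between an address you can link to and one you cannot can be put as a rough rule of thumb: a linkable address is a plain path, with no search parameters tacked on. The sketch below is only an illustration of that rule, not a real test of whether a page will last. The function name looks_linkable and the www.example.com search address are made up for the example; the Infoseek address is the one quoted above.

```python
from urllib.parse import urlsplit


def looks_linkable(url: str) -> bool:
    """Rough heuristic: treat an address as linkable if it has no query string.

    This says nothing about whether the page will still exist next year;
    it only separates plain paths from search-result style addresses.
    """
    # Addresses in the text are written without "http://", so add "//"
    # to make urlsplit treat the first part as the host name.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    return parts.query == ""


# A category page with a plain path (address quoted in the text).
category_page = "www.infoseek.com/Books/Shakespeare/"

# A hypothetical search-result address carrying CGI arguments.
search_result = "www.example.com/search?q=Shakespeare&sid=12345"

print(looks_linkable(category_page))
print(looks_linkable(search_result))
```

A check like this would accept the Yahoo and Infoseek category addresses and reject any address that carries keywords or identification numbers after a question mark.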

Lesson 3: On the Web, more people will read the archive than the current issue

One more strange fact is that my pages provide 115 free ads for The Irish Times, and basically no ads at all for the Irish Independent or the Guardian. Why is this? The answer is the online archive. Newspapers and magazines on the Web have developed online archives almost by accident. When the day's news is over, some simply replace their front page with the next day's news, thereby guaranteeing that nobody will ever link to one of their articles. But others leave the old pages online somewhere, thereby providing hundreds of useful hooks for web authors to link their pages to.

For instance, one of my pages refers to Captain Percival Lea-Wilson, an RIC man killed in 1920. It happens that there was an exchange of letters and an article by Neil Jordan about him in The Irish Times in October 1996. By the marvels of online archiving, this material is not buried in the National Library but is alive on the Web to be linked to. Links can even be directed to an individual reader's letter. Now every time I read a useful article I note the URL as a potential link. If the newspaper has no online archive - well, easy come, easy go.

So online archives (or the lack of them) are another example of sites failing to think long-term. In fact, web design expert Jakob Nielsen has found that an article gets the majority of its hits when it is in the archive rather than when it is the current issue. This is another of those surprising facts about the Internet that have only become clear as the years have gone by. Eventually, advertising space in the archive should become more expensive than space in the current issue itself.

Few people think about the Web long-term, but I would argue that what people link to and why will eventually be dominated by long-term thinking. "Surfing" is only a fad - in the long run, I imagine the Web becoming the definitive organiser of humanity's information. We will be able to rationally analyse what is the best place on the planet for a particular concept to link to. The reward for the winners in this competition will clearly be immense.

Dr Mark Humphrys is a lecturer in the School of Computer Applications at Dublin City University. His pages (with working versions of the sample links above) are at: www.compapp.dcu.ie/humphrys