The internet is deciding what to forget

The internet is so vast and all-consuming that it’s easy to forget how fragile it can be.

Do something embarrassing online and there’s a good chance it will live there forever, shared without your consent. But not everything that’s posted is permanent. The last big study of web pages found that more than a third of those available in 2013 were no longer accessible – leaving a trail of “link rot” in their wake.

Maybe you think this is a good thing. If you’ve ever scrolled back far enough to see your very first Facebook status update, you’ll probably wish that link was broken. Right now there’s a trend for AI-generated videos of Love Island starring cartoon fruit that regularly get millions of views. Do digital bananas in Hawaiian shirts chatting up pineapples need to be saved for posterity? Probably not.

But disentangling what will and will not matter to our collective cultural memory is proving difficult. Efforts to save absolutely everything haven’t gone very well. There’s too much and a lot of it is nonsense.

In 2010, the Library of Congress took the view that Twitter was a crucial source of modern history and decided to archive every single tweet. It “may prove to be one of this generation’s most significant legacies to future generations”, the library wrote.

That “may” seems over-optimistic. To most people, the repository is both unwieldy and uninteresting. Since 2017, the library has seemed to agree: it now opts to save just a few select posts.

The risk in being selective, of course, is missing something important. Dutch consultant Maurice de Kunder has been following the number of web pages indexed by search engines for more than a decade and found it had fallen from 4.7 billion to 3.98 billion.

Some deletions are more deliberate than others.

Last year, Elon Musk’s “department of government efficiency” launched a project to eliminate up to 20 per cent of US federal websites. Particular words, such as climate change, also evaporated. A couple of months later, large companies began rewriting their own sites to also remove references to climate change.

We know this only because third parties were keeping track – the organisations themselves did not flag the changes.

Because online content is regularly overwritten, what the historian Abby Smith Rumsey calls modern memory technology has a significantly shorter lifespan than pre-digital versions. There is neither a single record of everything posted online nor an agreed-upon way to save it.

This has become more noticeable with the death of digital publications. You can see newspaper editions printed in 1665, the year the Great Plague of London began, but you can no longer visit a modern news site such as Wales’s The National, which launched in 2021 and was then taken offline. Some sites, such as Gawker, have been archived while others have disappeared into 404 errors (the status code that indicates a server can’t find a webpage).

A few have entered into a strange afterlife. When cult site The Hairpin was shut down in 2018, its domain was purchased by a Serbian entrepreneur called Nebojša Vujinović, who specialises in buying old news sites and filling them with AI-generated clickbait. Now it just redirects readers to an online gambling site.

Despite relying heavily on digital data, we have left its preservation to a mishmash of individual efforts. The best known is the Wayback Machine, an initiative from the American non-profit Internet Archive. This takes snapshots of websites (it has preserved more than one trillion so far) but it doesn’t have everything. Copyright owners can seek content removal and some sites have begun to blacklist the Wayback Machine, suspecting that AI companies are using it as a way to scrape content without permission. A report by the Nieman Lab found that the volume of snapshots dipped in the second half of 2025.

A second popular option is archive.today, a mysterious site operating under multiple domain names. How long it will last is anyone’s guess. Last year, the FBI subpoenaed the unknown registrar behind it and Wikipedia recently asked editors to stop linking to it “due to concerns about botnets, linkspamming and how the site is run”.

There is, of course, a sort of immortality in the fact that much of what exists online has been used to train AI models. But this isn’t much help if you want to trace something’s original form. Even online snapshots of web pages may prove less durable than physical archives.

We treat the internet as if it is limitless and permanent, but transience is inbuilt. If you see something online worth saving, you’d better do it yourself. – Copyright The Financial Times Limited 2026