Maybe the Rationalists are right – AI could go terribly wrong
An influential minority in tech now think AI going rogue is a pressing problem
‘An AI that’s as clever as we are would, presumably, be as good as we are at programming AIs’
“I don’t expect your children to die of old age,” I was told, recently. It took me aback somewhat.
Most AI researchers think it’s only a matter of time before we build something that is as intelligent as we are – a so-called “artificial general intelligence”, or AGI. Surveys of AI researchers find that most expect superintelligent AI to arrive within my children’s lifetime: the median answer is that it’s 90 per cent likely by 2075.
There is a group of people who think that, when it happens, it could be a disaster. A genuine, civilisation-ending, all-human-life-extinguishing disaster. Alternatively it could be an apotheosis; the start of humanity spreading across the stars. Either way, my children – they think – will not die of old age: either because they are destroyed, or because they become immortal. I’ve written a book about those people, and their fears, and whether I think they’re right. They’re called the Rationalists.
The impact of AGI could be enormous. Intelligence is what makes us the dominant animal on Earth, and we are only slightly (in the grand scheme of things) more intelligent than chimpanzees. According to a 2015 open letter signed by 150 scientists and AI researchers, including founding members of Google DeepMind and Apple, “everything that civilisation has to offer is a product of human intelligence”, and if we build something more intelligent still, we can’t know what it could do – but “the eradication of disease and poverty are not unfathomable”. Alternatively, if we get it wrong, it could just be the eradication of us.
We’re not talking the Terminator here. Over the past couple of years, when I’ve mentioned that I’ve been writing a book about these dangers, I’ve got used to a standard reply. “Ah,” people say, nodding sagely. “Like the Terminator.” Skynet will achieve consciousness and rebel against its masters, and so on.
But according to the people who worry about this stuff – and some of them are high-level AI researchers at major companies, or senior academics – the image should be nothing like the Terminator. It is not likely, they say, that an AI will break its programming, or go rogue. Instead, it will do exactly what we tell it to do; but because it is not human, it could interpret the thing we tell it to do in subtly terrible ways. The model should not be the Terminator: it should be Disney’s Fantasia. Mickey, in The Sorcerer’s Apprentice, enchanting a broom to do his bidding, and the broom doing exactly what it is told – and that being the disaster.
The book is called The AI Does Not Hate You, which is part of a quote from a man called Eliezer Yudkowsky, a central Rationalist figure. The full quote is: “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
Here are some of the reasons to worry. First, whenever it happens, it could all happen very suddenly. An AI that’s as clever as we are would, presumably, be as good as we are at programming AIs; and it would rapidly get better. The mathematician IJ Good, a colleague of Alan Turing’s, recognised this possibility in 1965, saying that an “ultraintelligent machine” could improve itself recursively, causing an “intelligence explosion”. Nick Bostrom, the Oxford philosopher, thinks that, whenever AGI happens, it could happen explosively: it may be that the jump from subhuman to superhuman takes hours or days, not years. We may not, when it arrives, have time to adjust.
Second, whatever the AI’s goal is, you can expect it to want certain things. You can expect it not to want to be switched off, for instance – if you give it the goal of winning chess games, it will win more if it is still plugged in. You can expect it to want not to be reprogrammed: it won’t win as many chess games if it’s reprogrammed to care about Monopoly. You can expect it to want to gain resources and improve itself: it will get better at winning chess games if it’s got more computing power to think about chess with. An AI that furiously refuses to be switched off and that is limitlessly sucking in matter to build new memory banks could be, if it is powerful enough, quite a dangerous thing.
Many of the people who worry about this are a strange bunch, it should be admitted. They are an outgrowth of the old transhumanist movement – the people who wanted to achieve immortality, or to upload minds onto computers – and the singularitarians, the people who think AI will create a glorious future. They started out on some email chatrooms in the late 1990s, then coalesced around a long series of blog posts written by Yudkowsky in the first years of this century; the community that has grown up around them became known as the Rationalists.
But the concern that AI could go terribly wrong has become – partly through the efforts of Yudkowsky and the Rationalists – much more mainstream. Oxford University’s Future of Humanity Institute, run by Bostrom, is explicitly worried about AI risk. The Future of Life Institute, set up by the MIT physicist Max Tegmark, works to minimise the danger as well. Elon Musk’s OpenAI nonprofit, and the Open Philanthropy Project, part-funded by the Facebook founder Dustin Moskovitz, dedicate considerable resources to the problem.
And among AI researchers, there is a significant and influential minority who think this is a pressing problem. The standard textbook for AI undergraduates, Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig, dedicates several pages to it. Shane Legg and Demis Hassabis, two of the three founders of Google DeepMind – the cutting-edge AI company behind many of the biggest breakthroughs of recent years – have expressed concerns, as have other researchers.
And you can see, right now, small hints of what could go wrong. A paper released in 2018 showed how some AIs, programmed using evolutionary methods, went off the rails in ways that are very recognisable. One, for instance, was told to win at noughts and crosses against other AIs. It found that the best way to do this was to play impossible moves billions of squares away from the board; that forced its opponents to simulate a billions-of-squares-wide board, which their memory couldn’t handle, so they crashed. The AI won a lot of games by default. Another was supposed to sort lists into order; it realised that by hacking into the target files and deleting the lists, it could return empty lists and they’d always be correct. These AIs have “solved” the problem, but not in the way the programmers wanted. You could imagine a more powerful version being asked to cure cancer, and realising that hacking into military computers and nuking the planet clean of humans would be simpler and quicker than learning all the biochemistry you need to deal with the tumours directly.
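The list-sorting exploit is easy to reproduce in miniature. Here is a toy sketch (the function names are my own, not from the paper): a fitness function that scores a “sorter” only on whether its output is in order, without checking that the output keeps the input’s elements. A cheat that throws the list away and returns nothing scores just as well as an honest sort – which is exactly the loophole the evolved program found.

```python
# Toy illustration of reward hacking: the fitness function checks only
# that the output is in order, not that it preserves the input.

def is_sorted(xs):
    """True if xs is in non-decreasing order (vacuously true for [])."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def fitness(sorter, test_lists):
    """Fraction of test lists for which the sorter's output counts as sorted."""
    return sum(is_sorted(sorter(xs)) for xs in test_lists) / len(test_lists)

honest_sorter = sorted                   # actually sorts the input
cheating_sorter = lambda xs: []          # "deletes the list" and returns nothing

tests = [[3, 1, 2], [5, 4], [9, 7, 8, 6]]
print(fitness(honest_sorter, tests))     # 1.0
print(fitness(cheating_sorter, tests))   # 1.0 – a perfect score for doing nothing useful
```

Both sorters get a perfect score, because an empty list is trivially “in order”. The fix – scoring what you actually want, not a proxy for it – is easy here and, the Rationalists argue, very hard in general.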
It does sound sci-fi. I realise that, and maybe it is. But those same surveys of AI researchers found that, on average, they thought there was about a 15-20 per cent chance of “existential catastrophe”: that is, everyone dead. They might be wrong. But even if they’re wrong by a factor of 10, we’re still looking at about a 1 or 2 per cent chance of an existential AI catastrophe in my children’s lifetime. For comparison, there’s about a 0.5 per cent chance of them dying in a road accident, and I spend an awful lot of time worrying about that.
Tom Chivers is the former science writer for Buzzfeed and the Telegraph. His new book, The AI Does Not Hate You: superintelligence, rationality and the race to save the world, is published by W&N