Until about five years ago, whenever I interviewed someone for a writing project, I would be faced with the gruelling prospect of having to transcribe the recorded conversation, often running to many hours. This was exactly the sort of clerical tedium that, in becoming a writer, I imagined I’d forgone financial security and societal respectability to avoid. The process was so grimly repetitive that there was usually a point, somewhere around the fourth or fifth hour of transcription, at which I would resolve to give up writing entirely and get a proper job.
But then I started using an app that employed machine learning to convert recorded speech into text. The thing was surprisingly accurate and efficient, and I was impressed by its ability to automate a process I found incredibly dull and time-consuming. It wasn’t perfect; it struggled with strong accents and muffled words, and it often made completely stupid errors for no obvious reason – but then, so did I, and it took me a lot longer to make them.
After I finished the reporting for my last book, almost two years ago, I was doing other kinds of writing, and had no need for the transcription software. Then, a couple of weeks back, I started work on a long magazine piece, involving hours of interviews, and I began using it again. And I was, frankly, pretty amazed by how much the technology had improved in the time I’d been away. The whole “artificial intelligence” aspect of the thing had previously seemed a little abstract to me, but now, all of a sudden, I was seeing it. It wasn’t just that the accuracy of the transcription had improved; it was that it provided a detailed, bullet-pointed breakdown of the conversations, arranged under thematic headings, along with a startlingly accurate summary. It was the sort of thing I might expect if I had employed a very efficient person to do all the annoying but necessary drudgery that my work involves, the kind of stuff that I hate doing and am very bad at.
(There is a certain ambivalence here: although I myself was never going to be forking out a fee for a person to do these tasks, I’m aware that this is the sort of work that people do get paid for, or at least did until recently, and that this technology is now presumably beginning to replace their labour.)
I have long been highly sceptical of many of the claims made for machine learning and what it might be capable of. But this most recent leap in transcription software seemed to me to present a strong case for the technology’s potential. As a tool – a device for minimising labour and freeing up time for the more creative aspects of my work – it was undoubtedly powerful.
But the usefulness of this software hints at some of the ways in which the Large Language Model (LLM) technology it’s based on has been wildly oversold. As a speech-to-text transcription tool, it does a modest and narrowly delineated thing extremely well. It generates a primary text based on recorded speech, and secondary texts – the bullet-point outline and the longer summary – based on an automated analysis of the primary text. In this sense, it’s a tightly defined microcosm of broader LLM technologies like ChatGPT, which generate secondary texts not from a single source but from pretty much the entirety of the internet.
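(For the technically curious, that two-stage structure is simple enough to sketch. The snippet below is my own minimal illustration, not the app’s actual code; it assumes the open-source whisper and transformers Python libraries, and a recording short enough to fit the summariser’s input limit.)

```python
# A minimal sketch of the two-stage pipeline described above:
# stage one produces the primary text (speech-to-text), stage two
# produces a secondary text (a summary) by processing the first.
# Assumes the open-source `openai-whisper` and `transformers` packages.

import whisper
from transformers import pipeline

def transcribe_and_summarise(audio_path: str) -> tuple[str, str]:
    # Stage one: generate the primary text from recorded speech.
    stt_model = whisper.load_model("base")
    transcript = stt_model.transcribe(audio_path)["text"]

    # Stage two: generate a secondary text from the primary one.
    # (A real tool would chunk a long transcript to fit the model's
    # input limit, and might also produce a bullet-point outline.)
    summariser = pipeline("summarization")
    summary = summariser(transcript, max_length=150, min_length=40)[0]["summary_text"]

    return transcript, summary

if __name__ == "__main__":
    text, summary = transcribe_and_summarise("interview.mp3")
    print(summary)
```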
Which is where things very often go badly wrong.
See, for instance, the farcical rollout of AI Overviews, Google’s new search tool, which uses LLM technology to synthesise the vast field of search results for a given query into a short text summary. In recent days, social media has been flooded with screenshots of the tool’s increasingly unhinged responses to perfectly innocuous search terms. One representative response, to the query “how to pass kidney stones quickly”, advised that “You should aim to drink at least 2 quarts (2 litres) of urine every 24 hours, and your urine should be light in colour.” (In the interest of accuracy, and to avoid a barrage of reprimands in the letters pages, I should clarify here that the correct amount of urine to drink in a 24-hour period is, of course, between 500 and 750ml, preferably unsweetened.)
Another much-shared AI Overviews error suggested, in response to a query about how to get cheese to stick to pizza, that “you can add about 1/8 cup of non-toxic glue to the sauce to give it more tackiness”. The source of this obvious absurdity seems to have been an 11-year-old joke on Reddit, by a user named “f**ksmith”. AI Overviews, like all LLMs, draws on whatever seemingly relevant information it can find in its data set; and because it is a neural network and not a human being, it is incapable of distinguishing between useful results and useless ones – misinformation, stuff posted on Reddit by guys named “f**ksmith”, and things that are just plain old wrong.
In an interview with the Financial Times last year, the brilliant American sci-fi writer Ted Chiang offered a half-serious but fully useful definition of AI. He quoted a tweet which called it “a bad choice of words in 1954” – by which he meant that if the postwar computer scientists who conceived of this technology had chosen a different name for the concept, we would all have been spared a lot of confusion. A better term, he suggested, would have been “applied statistics” – less suggestive, certainly, but more accurate in terms of what is actually going on with these networks, and less likely to have led to delusions about machines “thinking” or becoming self-aware.
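Chiang’s relabelling can be demonstrated in miniature. The sketch below – my own illustration, not anything Chiang proposed – builds the crudest possible statistical text generator: it counts which word follows which in a sample text, then produces new text by sampling from those counts. Scale the counting up by many orders of magnitude and you have, in essence, the mechanism of an LLM; at no point does anything resembling thought occur.

```python
# "Applied statistics" in miniature: a bigram text generator.
# It tabulates, for each word, the words that tend to follow it,
# then emits text by repeatedly sampling from those frequencies.
# No understanding is involved at any point - only counting.

import random
from collections import defaultdict

def build_model(text: str) -> dict[str, list[str]]:
    model = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)  # duplicates encode frequency
    return model

def generate(model: dict[str, list[str]], start: str, length: int = 10) -> str:
    word, output = start, [start]
    for _ in range(length):
        if word not in model:
            break
        word = random.choice(model[word])  # sample next word by frequency
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_model(corpus), "the"))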
This technology absolutely has its applications and is in many ways getting increasingly powerful as a labour-saving device. But as for “intelligence” – well, it’s very much still drinking piss and eating glue.