A research team in IBM's Dublin campus had a reason to celebrate when LanguageWare, a technology at the cutting edge of artificial intelligence, got the upper hand over human contestants in the US game show 'Jeopardy!', writes KARLIN LILLINGTON
WHEN IBM’S extraordinary computer Watson made artificial intelligence (AI) history by beating two human players last year at the US game show Jeopardy!, a research team in IBM’s Dublin campus had a little celebration themselves.
A core language-analysis technology that enabled Watson to understand the game show’s questions, search through millions of documents and other resources contained in its memory, and outdo the humans in producing the correct answer came from the Irish lab.
And surprisingly, the LanguageWare technology, which has been honed for more than a decade by the Mulhuddart research group and is at the very cutting edge of AI, began life as a lowly spellchecker.
“It’s been a long road,” says DJ McCloskey, the head of the LanguageWare group at IBM Ireland. “That piece of technology was actually developed here in February 2001.”
McCloskey, who has a background in physics and mathematics and gradually developed an interest in the mathematics of information and textual analysis, says the linguistic pyrotechnics Watson displayed on Jeopardy! grew out of the spellchecker from Lotus Notes.
“A very skilled group of linguists had developed that program 20 years before – that was a key leg-up,” he says. In looking through existing IBM technologies that had further development potential, a researcher had suggested the Dublin group explore building on the textual recognition capabilities in the spellchecker.
That initial work has expanded into the 10-person LanguageWare team that comprises pure linguists, computational linguists, mathematicians, engineers and other specialists. “Here in Dublin, we have a team with world-class expertise in textual analysis now. It’s a multi-disciplinary team that doesn’t look like any other team in the organisation.”
The team has already put LanguageWare technologies to interesting uses. IBM Ireland was involved in a project to digitise and add semantic tags to the 1641 Depositions, eight volumes of 1,559 personal accounts of one of the most violent moments in Irish history, the 1641 Rebellion.
The manuscripts, held by Trinity College and now publicly available on its website, can be searched and analysed in ways that would have been impossible without the benefit of LanguageWare.
The key problem for the researchers in approaching a 17th-century text or a question from a game show host is the same: how do you take unstructured information – a series of words that make sense to a human, but are nothing more than a chain of letters and punctuation to a machine – and enable a computer to parse it and convert it into structured information?
McCloskey says LanguageWare was part of Watson from its initial development in the US and, as Watson emerged from the pure research division, the Irish team was confident that LanguageWare would be a valuable part of developing it further into a commercial application. “We never really had any question that we’d be there,” he says.
To “think” and then respond to a query, Watson applies an analysis engine to an incoming string of text. The engine is an algorithm that can process a piece of text and annotate it so that the meaning and role of individual elements are understood.
Step by step, a series of annotators then delve deeper into the text to try to determine its precise meaning. Then, Watson’s vast memory is searched to bring back the most likely correct answer. McCloskey stresses that Watson works on the basis of probabilities that an answer is correct. “In Watson, everything has a confidence measure. Nothing is certain.”
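The step-by-step annotation described above can be illustrated with a minimal sketch. It is not LanguageWare or Watson code: the `Annotation` and `Document` classes, the two toy annotators and the 0.6 confidence value are all hypothetical, chosen only to show how one annotator can build on the marks left by the one before it.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int               # character offset where the span begins
    end: int                 # character offset just past the span
    label: str               # role assigned by an annotator
    confidence: float = 1.0  # how sure the annotator is (hypothetical scale 0..1)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def tokenize(doc):
    """First pass: mark every whitespace-separated word as a TOKEN."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotations.append(Annotation(start, end, "TOKEN"))
        pos = end
    return doc


def tag_capitalised(doc):
    """Second pass: guess that capitalised tokens after the first are names."""
    tokens = [a for a in doc.annotations if a.label == "TOKEN"]
    for tok in tokens[1:]:
        if doc.text[tok.start].isupper():
            # An illustrative, made-up confidence: this is only a guess.
            doc.annotations.append(
                Annotation(tok.start, tok.end, "NAME_CANDIDATE", 0.6)
            )
    return doc


def run_pipeline(text, annotators):
    """Apply each annotator in turn, each one seeing earlier annotations."""
    doc = Document(text)
    for annotate in annotators:
        doc = annotate(doc)
    return doc


doc = run_pipeline("This rebellion began in Ireland in 1641",
                   [tokenize, tag_capitalised])
names = [doc.text[a.start:a.end]
         for a in doc.annotations if a.label == "NAME_CANDIDATE"]
```

Here the second annotator relies on the tokens produced by the first, which is the sense in which later passes "delve deeper" into text that earlier passes have already marked up.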
Over time, Watson learns, first from initial training and then from the feedback of researchers (and now customers) on whether its answers are correct. Wrong answers get stored as wrong, he notes.
“One of the key reasons it won at Jeopardy! is it knew when it didn’t know,” he says. So Watson didn’t risk being penalised with incorrect guesses. And Watson is smarter now than it was when it took on its human challengers in the game show.
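The idea of knowing when it does not know can be sketched as a simple confidence threshold. This is an assumption-laden illustration rather than Watson's actual method: the function name `choose_answer`, the candidate scores and the 0.5 cut-off are all hypothetical.

```python
def choose_answer(candidates, threshold=0.5):
    """Return (answer, confidence) for the best candidate, or None to abstain.

    `candidates` maps each possible answer to a confidence between 0 and 1.
    The threshold is a made-up value for illustration only.
    """
    if not candidates:
        return None
    # Rank candidates from most to least confident.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    best_answer, best_conf = ranked[0]
    if best_conf < threshold:
        return None  # not confident enough: better to stay silent
    return best_answer, best_conf


confident = choose_answer({"Dublin": 0.92, "Cork": 0.05})
unsure = choose_answer({"Dublin": 0.31, "Cork": 0.29})
```

In the first call the top candidate clears the threshold and is returned with its confidence; in the second no candidate does, so the function abstains instead of guessing.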
“Machine learning gets better over time. Input sources are deeper than just websites or links – for example, it analyses scholarly papers and journals – and that’s how it matches, and gives a very strong match. We’re not just returning documents here, we’re trying to return the right answer. The answer is a set of concepts linked in a very nuanced way,” says McCloskey.
For Jeopardy!, Watson’s goal was to produce the answer to a specific question. But the show was primarily a demonstration of a highly sophisticated ability to understand text, analyse millions of resources, and return a response in seconds – a computing ability for big data analysis that has many commercial applications.
IBM is offering Watson as a service to industries such as medicine and financial services. A cancer specialist can use Watson as a decision support tool, for example, using Watson to analyse a massive database of medical articles to determine which next step in treatment is likely to be most successful for a patient. In such a case, Watson doesn’t need to have a single answer, but would be most useful in providing a ranked set of answers that could be further explored by the oncologist.
Much of the effort in Dublin now is in further advancing Watson’s language processing ability, and developing its user interface. For Jeopardy!, there was little need for a user interface because single questions were posed in the expectation of a single answer. But with a sophisticated user interface that allowed a back-and-forth discussion, Watson could, for example, produce a more refined answer by engaging in further questions and answers with a doctor, to help narrow its search for the most appropriate answer.
“The utility of that alone is transformative. Watson is working on things that change lives. And that motivates people here to do their best.”
INSIDE THE MIND OF WATSON 'THE HUMBLE GENIUS WHO KNOWS IT'S ALWAYS RIGHT':
ACCORDING TO Stephen Gold, IBM Watson Solutions director, about every 10 years the company takes on what it calls a "grand challenge", a very large-scale research project to achieve a dramatic computing goal.
A previous such challenge was the building of Deep Blue, the computer that beat Garry Kasparov at chess in 1997.
The latest is Watson, a computer named for IBM's first president Thomas J Watson, which demonstrated its artificial intelligence and natural language-processing abilities by defeating two human opponents on the US game show Jeopardy! in February last year.
To win, Watson was able to return correct answers in an average of three seconds – astonishingly fast for a system that had to parse a question, search its four-terabyte memory of 200 million pages of information (including encyclopedias, dictionaries, Wikipedia, journals and literary works), deduce possible answers, then find the one it was confident would be correct.
Such projects start as "a formidable challenge to bring together a number of technologies in a way that will have a demonstrable effect and real impact", Gold says.
As for Watson, he says with a laugh, "it began as most things do – in a pub".
The result is a computer that is actually a conglomeration, "a system of 41 subsystems" that can execute thousands of language analysis algorithms in a fraction of a second.
He describes Watson as "the humble genius who knows it's always right" and says that, as a commercial application, it can therefore "be a helpful assistant to a doctor, a lawyer, a banker. What Watson is capable of doing is putting content in context. What Google is to search, Watson is becoming to discovery."
The system has three underpinnings – natural language analysis, hypothesis generation and answer generation.
IBM's Dublin LanguageWare team was critical in supplying Watson's natural language piece, Gold says.
"They were instrumental in this area, providing Watson's deep QA activity. And this will be pivotal in the course of bringing Watson to market [as a service]."