As the population ages, the idea of robots as carers is gaining currency, writes Karlin Lillington
What does a robot need to understand about gesture and conversation to chat effectively with a human?
That's a challenge Dr Candace Sidner has taken on with her research partner, a furry penguin robot named Mel.
"Why do I have a penguin robot? Because someone in my lab built him, and I thought, 'Wow, I could do some pretty cool things with this'," the Mitsubishi Electric Research Laboratory (Merl) researcher and user interface expert says.
Mel, based at Merl's Cambridge, Massachusetts lab, can do a few cool things, too. He can talk. He can "see" people - enough to tell if they are looking at him, or nodding as they follow his conversation. He can point with his beak or his flipper - helpful when he is explaining an experiment involving a glass of water, which is one of his research tasks. And he is fully mobile, though that wasn't always the case.
"He has a Pioneer II robotic base now that he moves around on," says Sidner. A camera on his head counts the ceiling tiles so that he can navigate his way around the lab.
Sidner has a particular interest in computational language behaviours and artificial intelligence, the field in which she took her PhD at the Massachusetts Institute of Technology.
She spoke recently at NUI Maynooth at a multidisciplinary international seminar entitled "Constraints in Discourse", whose participants came mainly from language and linguistics backgrounds.
She presented an unusual paper on what she calls "engagement decisions for robots" - teaching a robot to know how to start and continue a conversation, how to use and interpret gestures, and when to conclude. That means first studying what humans do.
Sidner notes a few things that we all do unconsciously: head gestures and body stance, direction of gaze, facial expressions, turn-taking during conversation. But given the limitations of working with a robot, and the complexity of interactions between conversationalists, she needed to strip the rules of engagement back to basics - so that a robot could learn and practise them.
To learn what was most essential, she filmed people having a simple type of conversational exchange known as "hosting", where one person is explaining something to another. In this case, speakers explained an experiment.
One video was particularly interesting to her - as one person explained the experiment, the listener kept his eyes fixed on the floor almost the entire time, which might be construed as an attempt to end a conversation. Yet it was clear he was following the talk - he would nod - so perhaps this was just a shy individual, she says.
Next, she took all of this information and winnowed it down into a structure of actions and responses for Mel.
"For the robot, we did a couple of things. He would track the visitor when the visitor speaks. The robot would look at the visitor when the robot speaks, but Mel would look at an object when he mentions it, then look back at the visitor again.
"Mel would expect a visitor to look at the object when he is pointing at it; if the visitor doesn't, then Mel explains where it is," she says. "And of course, we taught Mel that it is acceptable for the human to look at the floor!"
How does Mel work? "He uses algorithms for face detection and face tracking. He uses a speech recognition program from IBM. Movement and sound - all that information needs to be fused and provided to his 'brain'."
Then Mel was ready to go - he had an experiment with a glass and pitcher of water to explain. Visitors were filmed listening to Mel's talk.
"First, we wanted to know how appropriate his gestures were. How important were they? Was it just important that he talked, or that he also gestured?" Thirty-two conversations were filmed, some where he gestures, some where he is "wooden", she says. "People preferred the moving Mel. We found people very closely tracked him when he gestures."
Nods were hard to program for, not least because existing tracking software assumes a large degree of movement in a nod; during conversation, she found, a nod might involve only a 3 per cent inclination of the head. It took almost two years to pull together the data needed to get Mel to follow nods, she says.
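Detecting such a subtle movement comes down to watching for a small dip and recovery in the estimated head pitch. The sketch below is illustrative only; the threshold and the sample sequences are invented for the example and are not the values used in Sidner's work.

```python
# Illustrative only: detecting a small conversational nod from a stream of
# head-pitch estimates. The 2-degree threshold is a made-up value for the
# sketch, not a figure from Sidner's system.

from typing import List

def detect_nod(pitch_degrees: List[float], threshold: float = 2.0) -> bool:
    """Return True if the head pitch dips and mostly recovers within the window."""
    if len(pitch_degrees) < 3:
        return False
    baseline = pitch_degrees[0]
    lowest = min(pitch_degrees)
    final = pitch_degrees[-1]
    dipped = (baseline - lowest) >= threshold          # head tilted forward...
    recovered = (final - lowest) >= threshold * 0.5    # ...and came most of the way back
    return dipped and recovered

# A subtle nod: the head dips by only a few degrees and returns.
print(detect_nod([0.0, -1.0, -2.5, -1.0, 0.2]))   # True
print(detect_nod([0.0, 0.1, -0.2, 0.0]))          # False
```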
She describes the behaviour she programs Mel to produce as "very, very primitive" in contrast to the complexity of even the briefest human interactions.
Yet Mel is oddly compelling. Her videos of people interacting with Mel show that, within a few moments, they are holding a conversation with him using human responses such as nods and gestures.
"Basically, we're language machines," she says, talking after her lecture. "If you create gestures, we respond. It's part of our hardwire. Gesture is very basic to humans."
But why teach a robot to nod, point and follow a conversation? Sidner says it's because we will be communicating with robots - and they with us - probably a lot sooner than most people think. She envisions "robotic companions for elderly people" as populations age and there aren't enough caregivers. Robots could also show people around - in a museum, for example.
While some experts, futurologist Ray Kurzweil among them, believe we will have such robots within two decades, she says she is sceptical that robotics can advance that quickly.
"I think the problems are harder than that. The language problems are harder, for example. And one of the underlying problems is you need to build a very large knowledge base of what people know. You can't just dump that in; it needs to be very well integrated," she says.
Sidner thinks that, far from fearing robots, people tend to be fascinated by them, whether they be the little Roomba robot vacuum cleaners now on the market or Sony's Aibo robot dogs.
Undoubtedly that's one reason Mel works well with people - a robotic penguin is charming, not scary. Then again, the grannies of the future may not care to have their daily medication brought in by a furry penguin on wheels.
More information: http://www.merl.com/projects/hosting/