‘Secret sauce’ for web image searches
Searching for images or video could be about to get a lot easier
Anyone who has ever gone looking for an image on a search engine such as Bing or Google knows how hit-and-miss such an endeavour can be.
Type in a search term like “Ferrari Formula One” and up will come lots of images of the cars, but even the top returns can vary greatly in quality. And alongside the car images will be plenty of pictures that seemingly have nothing to do with cars at all.
That’s because searching for images or video on the internet remains a difficult challenge: search engines rely on people tagging images to identify their content, or on textual elements of the page that might indicate what an image shows. Beyond that, search engines don’t do much with images at all.
“One reason why the web is so compelling is that we get the gist of a web page mostly from the images,” says Lorenzo Torresani, assistant professor of computer science and head of the Visual Learning Group at Dartmouth College in the United States. “We look at the title, but if we have a few pictures at the top, that’s what we look at.
“Yet it’s remarkable that search engines do the opposite. They actually ignore, they strip away, that kind of information.”
Torresani thinks he has a better solution, which addresses one of the key problems preventing more productive searches: the fact that popular search engines base results on an analysis of the text on a page alone. Information about digital images is encoded in a different language of visual descriptors that mean little to a text-based search engine.
Thanks to huge advances in image recognition in the past few years, Torresani says that high-level information about an image can now be deduced automatically by computer programs, giving strong clues about its actual contents, such as whether a given pixel is likely to represent a steel, glass or grass surface. But that information isn’t read by search engines.
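The kind of automatic deduction described above can be illustrated with a toy sketch. The labels, scores and threshold below are invented for illustration and are not taken from Torresani’s system: the idea is simply that an image classifier emits a confidence score per visual concept, and high-confidence concepts become clues about what the image contains.

```python
# Toy illustration (hypothetical labels and scores, not Torresani's system):
# an image classifier outputs a confidence score per visual concept, and
# the high-confidence concepts serve as clues about the image's contents.

def concepts_above(scores, threshold=0.5):
    """Return concept labels whose classifier confidence exceeds the threshold."""
    return sorted(label for label, s in scores.items() if s > threshold)

# Invented classifier output for a single image of a race car.
scores = {"car": 0.92, "grass": 0.61, "steel": 0.55, "glass": 0.20}

print(concepts_above(scores))  # only the confident concepts survive
```

A real recognition system would of course produce these scores from pixel data with a trained model; the point here is only that its output can be reduced to a short, human-readable description.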
People working on text-based information retrieval tend to think the problem is simply too difficult to resolve: searches would slow significantly if, say, Google were to try to evaluate billions of individual images alongside text-based documents.
Cracking the problem
In Dublin this week to present his group’s work at the 36th annual SIGIR (Special Interest Group on Information Retrieval) conference, Torresani set out to see whether there might be a way that the contents of an image could also be understood by search engines, improving search results for both documents and images.
In a collaboration between Microsoft’s UK research lab in Cambridge, where he worked for two years, and Dartmouth, where he has been based for half a decade, Torresani and his team think they have cracked the problem.
They have developed an approach that effectively creates a translator between the “languages” of text and image, bringing together the worlds of information retrieval and image recognition.