Camera-phone photos add human dimension to machine vision


May 1st, 2006

Imagine doing a Web search by placing a photograph taken with your camera phone in the search query instead of typing in a keyword or a name. An international team of computer science researchers appears to be developing a capability that might eventually make something like that possible. Their research, described in a technical presentation in January at the 2006 SPIE Electronic Imaging meeting in San Jose, CA, aims to give a machine-vision system context, which is what enables humans to understand what they see.

The paper was presented by one of the collaborators, Adetokunbo Bamidele, a research engineer and Ph.D. student at University College London, who unveiled what he described as “possibilities of a new context-aware paradigm for image analysis.” Bamidele reported accuracies of 60% for automated recognition of people and 67% for automated recognition of places depicted in casually taken camera-phone photographs.

The results may seem surprising at first. Camera phones have low resolution and slow shutter speeds, which cause motion blur, and they produce grainy photos in poor lighting. And unlike the standard frontal, head-and-shoulders composition of “mug shot” photos taken specifically for identification, camera-phone photos are often spontaneous, shot at odd and widely varying angles under every conceivable lighting condition, and frequently include multiple subjects (see figure).


Automated place-recognition accuracy of 67% has been achieved by analyzing imagery along with context information available through camera phones.

The inability to navigate varying image contexts has frustrated machine-vision researchers for decades, according to Bamidele and colleagues, who portray the plight of the typical machine-vision system in human terms using the following humorous story:

“You go out drinking with your friends. You get drunk . . . really drunk. You get hit over the head and pass out. You are flown to a city in a country you’ve never been to with a language you don’t understand and an alphabet you can’t read. You wake up face down in a gutter with a terrible hangover . . . you have no idea where you are or how you got there.” [1]

Little or no memory, severely limited fields of sensory perception, and, in particular, a lack of context essentially prevent machine-vision systems from “understanding” what they are viewing the way humans do. Nevertheless, the collaborators cut recognition error rates from as much as 70% to as low as 33%. They did so by using the ability of camera phones to correlate the time and location of a photograph, obtained from cellular-network data, with image features in the photograph.

A camera phone can record the exact time a photograph is taken using the time server on the cellular network. It can also record the camera's geographic location at the moment of capture. It does this using the Cell ID from the cellular network or, in phones equipped with a GPS receiver, GPS coordinates. Bluetooth data can reveal the identity of the photographer, the people he or she was with at the time, and with whom the photograph might later have been shared. Analyzing only the contextual information provided by the camera phone actually yielded more accurate person and place recognition than analyzing the image itself, Bamidele said. The combination of context and image analysis outperformed either one alone.
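The article does not describe how the researchers combined context with image evidence. The Python sketch below is an illustration only, built on a generic approach called weighted late fusion. Each candidate person receives an image-similarity score and a context score, and the two scores are blended. The `CaptureContext` record, the hypothetical `device_owner` mapping from Bluetooth IDs to people, the 0.5 weight and all function names are assumptions. This is not the team's method.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class CaptureContext:
    """Hypothetical metadata recorded when a camera-phone photo is taken."""
    timestamp: float                               # seconds since epoch (network time)
    cell_id: str                                   # serving cell from the cellular network
    gps: Optional[Tuple[float, float]] = None      # (lat, lon), if the phone has GPS
    nearby_devices: FrozenSet[str] = frozenset()   # Bluetooth IDs seen at capture


def context_scores(ctx: CaptureContext, device_owner: Dict[str, str]) -> Dict[str, float]:
    """Give 1.0 to each known person whose Bluetooth device was detected nearby."""
    return {device_owner[d]: 1.0 for d in ctx.nearby_devices if d in device_owner}


def fuse_scores(image_scores: Dict[str, float],
                ctx_scores: Dict[str, float],
                weight: float = 0.5) -> str:
    """Return the label with the highest weighted sum of context and image scores."""
    # Sorting makes tie-breaking deterministic (alphabetical).
    candidates = sorted(set(image_scores) | set(ctx_scores))

    def combined(label: str) -> float:
        return (weight * ctx_scores.get(label, 0.0)
                + (1.0 - weight) * image_scores.get(label, 0.0))

    return max(candidates, key=combined)


if __name__ == "__main__":
    ctx = CaptureContext(timestamp=1137369600.0, cell_id="cell-17",
                         nearby_devices=frozenset({"bt:aa"}))
    owners = {"bt:aa": "alice", "bt:bb": "bob"}
    image = {"alice": 0.4, "bob": 0.6}  # the face matcher alone slightly prefers bob
    print(fuse_scores(image, context_scores(ctx, owners)))  # prints: alice
```

On image scores alone, the toy example would pick "bob". The Bluetooth context tips the decision to "alice", which shows in miniature how context can correct a weak image match.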

Image analysis alone provided 43% accuracy for facial recognition, compared with 50% for context analysis and 60% for the two methods combined. For place recognition, image analysis achieved 30% accuracy, context analysis 55%, and the combination 67%. According to Bamidele, camera phones could make significant inroads into automated image recognition because they combine image capture, context sensing, programmable computation and networking in a single mobile platform. Another major aspect of that potential is that camera phones are becoming ubiquitous. Bamidele quoted a market estimate that five out of every six imaging devices sold in 2005 would be camera phones.
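The error-rate figures quoted earlier follow directly from these accuracies, because the error rate is 100% minus the accuracy. The short Python check below reproduces them from the reported numbers. The relative-reduction figure it prints is derived here for illustration and was not reported by the researchers.

```python
# Recognition accuracies (%) as reported in the article.
REPORTED = {
    "person": {"image": 43, "context": 50, "combined": 60},
    "place": {"image": 30, "context": 55, "combined": 67},
}


def error_rates(accuracies: dict) -> dict:
    """Convert percentage accuracies to percentage error rates."""
    return {method: 100 - acc for method, acc in accuracies.items()}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error that was removed."""
    return (before - after) / before


if __name__ == "__main__":
    for task, acc in REPORTED.items():
        err = error_rates(acc)
        cut = relative_reduction(err["image"], err["combined"])
        print(f"{task}: error {err['image']}% -> {err['combined']}% "
              f"({cut:.0%} relative reduction)")
    # person: error 57% -> 40% (30% relative reduction)
    # place: error 70% -> 33% (53% relative reduction)
```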

Additional collaborators on the project include Marc Davis (University of California, Berkeley, and founding director of Yahoo Research Berkeley), Michael Smith (France Telecom), Fred Stentiford (University College London), John Canny, Nathan Good and Simon King (University of California, Berkeley), and Rajkumar Janakiraman (National University of Singapore). Next steps include incorporating additional analysis techniques to further improve recognition accuracy. The team will also investigate torso matching, in addition to face matching, to help recognize individuals in photos with multiple subjects.

At the Electronic Imaging conference last winter, Bamidele also gave a live demonstration of an automated, Web-based color-correction service. The service is built on pattern-recognition and image-processing work at University College London. Like the face- and place-recognition research, the color-correction work is based on automating aspects of the human approach to image interpretation.

The automated color correction rests on the idea that color constancy in human vision arises from comparison rather than from an absolute definition of correct color. The service lets users submit images and request a change in illumination to remove an unwanted color cast. For instance, a user can change the apparent illumination from tungsten to sunlight.
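The article does not say which algorithm the UCL service uses. To illustrate comparison-based color constancy, the sketch below uses a standard textbook technique instead. It relies on the gray-world assumption, which treats the average color of a scene as neutral, and combines it with von Kries-style per-channel scaling. The function compares each channel's average with the others to remove a warm, tungsten-like cast, without consulting any absolute color reference. The function name and the pixel format are assumptions made for illustration. The pixel format is a list of RGB tuples with values from 0 to 1.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b), each in [0, 1]


def gray_world_correct(pixels: List[Pixel]) -> List[Pixel]:
    """Remove a global color cast using the gray-world assumption.

    The illuminant is estimated from the per-channel means. Each channel is
    then scaled (a von Kries-style diagonal correction) so that all three
    channel means equal their common average, i.e. the scene averages to gray.
    """
    if not pixels:
        return []
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # A channel with zero mean carries no information; leave it unscaled.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Clip to the valid range, since large gains can push values above 1.
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


if __name__ == "__main__":
    # Two pixels with a strong orange (tungsten-like) cast.
    warm = [(0.8, 0.5, 0.2), (0.4, 0.25, 0.1)]
    print(gray_world_correct(warm))  # both pixels come out close to neutral gray
```

Gray-world correction fails on scenes dominated by a single color, such as a close-up of grass. It is shown here only because it is the simplest example of judging color by comparison within the scene.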

Earlier this year, the Center for Scientific Enterprise (London) twice recognized the market potential of the color-correction work and the face- and place-recognition work, Bamidele said. In January the combined work earned a scholarship award. In March the color-correction service received a “Provost’s Prize” for demonstrating “how fundamental research in electronic imaging (color correction) can rapidly reach the marketplace and generate revenue.”

REFERENCE

[1] M. Davis, M. Smith, F. Stentiford, A. Bamidele, J. Canny, N. Good, S. King, and R. Janakiraman, Internet Imaging VII, SPIE Conf., San Jose, CA (Jan. 15-19, 2006).
