Automated face detection improves with CMU 'tiny faces' algorithm
Carnegie Mellon University researchers have demonstrated a significant advance in detecting tiny faces in a crowd.
IMAGE: An automated face detection method developed at Carnegie Mellon University enables computers to recognize faces in images at a variety of scales, including tiny faces composed of just a handful of pixels. (Image credit: CMU)
The trick to finding tiny objects, say researchers at Carnegie Mellon University (CMU; Pittsburgh, PA), is to look for larger things associated with them. An improved method for coding that crucial context from an image has enabled researchers Deva Ramanan, associate professor of robotics, and Peiyun Hu, PhD student in robotics, to demonstrate a significant advance in detecting tiny faces.
When applied to benchmark datasets of faces, their method reduced error by a factor of two, and 81% of the faces found using their method proved to be actual faces, compared with 29% to 64% for prior methods. "It's like spotting a toothpick in someone's hand," Ramanan said. "The toothpick is easier to see when you have hints that someone might be using a toothpick. For that, the orientation of the fingers and the motion and position of the hand are major clues."
Similarly, to find a face that may be only a few pixels in size, it helps to first look for a body within the larger image, or to recognize that an image contains a crowd of people. Spotting tiny faces could have applications such as headcounts to estimate crowd size. Detecting small items in general will become increasingly important as self-driving cars move at faster speeds and must monitor and evaluate traffic conditions in the distance.
The research paper describes how "foveal descriptors" encode context in a way similar to how human vision is structured. Just as the center of the human field of vision is focused on the retina's fovea, where visual acuity is highest, the foveal descriptor provides sharp detail for a small patch of the image, with the surrounding area shown as more of a blur.
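The sharp-center/blurred-periphery idea can be illustrated with a minimal sketch: keep a small patch at full resolution and append a block-averaged (blurred) view of a larger surrounding window. The function name, patch sizes, and the use of block averaging are assumptions for illustration, not the CMU authors' implementation, which builds these descriptors from convolutional network features.

```python
import numpy as np

def foveal_descriptor(image, cy, cx, patch=8, context=32):
    # Illustrative sketch only -- names and sizes are assumptions,
    # not the CMU implementation.
    # Sharp "fovea": a small patch at full resolution around (cy, cx).
    center = image[cy - patch // 2: cy + patch // 2,
                   cx - patch // 2: cx + patch // 2]
    # Blurry "periphery": a larger window, block-averaged down to the
    # same patch size, so distant pixels contribute only coarse context.
    ctx = image[cy - context // 2: cy + context // 2,
                cx - context // 2: cx + context // 2]
    k = context // patch
    periphery = ctx.reshape(patch, k, patch, k).mean(axis=(1, 3))
    # The descriptor concatenates sharp detail with blurred context.
    return np.concatenate([center.ravel(), periphery.ravel()])

img = np.random.rand(64, 64)
d = foveal_descriptor(img, 32, 32)
print(d.shape)  # (128,): 8*8 sharp values + 8*8 blurred context values
```

The key design point is that the periphery covers a window four times wider than the sharp patch yet adds no more values to the descriptor, so far-away pixels inform the decision without overwhelming it.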
By blurring the peripheral image, the foveal descriptor provides enough context to be helpful in understanding the patch shown in high focus, but not so much that the computer becomes overwhelmed. This allows Hu and Ramanan's system to make use of pixels that are relatively far away from the patch when deciding if it contains a tiny face. In addition to contextual reasoning, Ramanan and Hu improved the ability to detect tiny objects by training separate detectors for different scales of objects. A detector that is looking for a face just a few pixels high will be baffled if it encounters a nose several times that size, they noted.
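One common way to pair scale-specific detectors with objects of the right size is to build an image pyramid, so each detector only ever sees faces near the size it was trained for. The sketch below, with an assumed helper name and block-averaging downsampling, illustrates that idea; it is not the authors' CNN-based system.

```python
import numpy as np

def make_pyramid(image, num_levels=3, factor=2):
    # Hypothetical helper: repeatedly downsample by block-averaging.
    # A detector trained for, say, 8-pixel faces can then be run on
    # each level, effectively covering 8-, 16-, and 32-pixel faces.
    levels = [image]
    for _ in range(num_levels - 1):
        h, w = levels[-1].shape
        # Crop so the dimensions divide evenly, then average 2x2 blocks.
        img = levels[-1][:h - h % factor, :w - w % factor]
        levels.append(img.reshape(h // factor, factor,
                                  w // factor, factor).mean(axis=(1, 3)))
    return levels

pyramid = make_pyramid(np.random.rand(64, 64))
print([lvl.shape for lvl in pyramid])  # [(64, 64), (32, 32), (16, 16)]
```

Separate detectors per scale avoid the mismatch the researchers describe: a template tuned to a few-pixel face never has to explain away a nose larger than the whole face it expects.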
The Intelligence Advanced Research Projects Agency supported this research. The work is part of CMU's BrainHub initiative to study how the structure and activity of the brain give rise to complex behaviors, and to develop new technologies that build upon those insights.
SOURCE: Carnegie Mellon University; http://www.cmu.edu/news/stories/archives/2017/march/faces-in-crowd.html