Ambient speech is recovered from video of a bag of chips

Aug. 5, 2014
In a technical paper to be presented at SIGGRAPH 2014 (10-14 Aug. 2014; Vancouver, BC, Canada), researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (MIT CSAIL; Cambridge, MA), Microsoft Research (Redmond, WA), and Adobe Research (San Jose, CA) will describe how they recovered ambient sound from video, shot from a distance, of an object such as a houseplant or a bag of chips.

The one caveat is that most of their experiments used a high-speed video system running at 2 to 20 kHz (2,000 to 20,000 frames per second); however, they also describe and test a method that works with ordinary CMOS cameras operating at the standard 60 Hz video rate.


High frame rates

The experimental setup includes the object, a loudspeaker (placed on a stand separate from the object), the video camera, and photography lamps. The upper cutoff frequency of the recovered audio is set by the video frame rate: each frame supplies one sample of the object's motion, so the highest recoverable frequency is at most half the frame rate, and higher frame rates give higher cutoffs.
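A minimal Python sketch of the frame-rate-to-cutoff arithmetic, using the Nyquist limit of half the sampling rate. The function name is illustrative, and this is only a theoretical ceiling; noise in a real recovery may keep the usable bandwidth below it.

```python
def audio_cutoff_hz(frame_rate_fps: float) -> float:
    """Highest frequency uniform sampling at frame_rate_fps can represent (Nyquist limit)."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    return frame_rate_fps / 2.0


# Frame rates mentioned in the article: ordinary 60 FPS video, the 2 kHz and
# 20 kHz ends of the high-speed range, and 2200 FPS for the bag-of-chips tests.
for fps in (60, 2000, 2200, 20000):
    print(f"{fps:>6} FPS -> at most {audio_cutoff_hz(fps):,.0f} Hz")
```

At 2200 FPS the ceiling is 1,100 Hz, above the fundamental frequencies of typical adult voices, whereas 60 FPS sampled frame by frame would cap recovery at 30 Hz.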

Captured resolutions ranged from 192 x 192 to 700 x 700 pixels. Sound levels ranged from 80 dB (an actor's stage voice) to 110 dB (comparable to a jet engine running 100 m away). To process the videos, the researchers used publicly available code that was 14 years old.
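To put the 80 to 110 dB range in perspective, the standard sound-pressure-level definitions give the ratios directly: a level difference of ΔL dB corresponds to a pressure-amplitude ratio of 10^(ΔL/20) and an intensity ratio of 10^(ΔL/10). A short Python sketch (function names are illustrative):

```python
def pressure_ratio(db_difference: float) -> float:
    """Sound-pressure amplitude ratio for a difference in sound pressure level (dB)."""
    return 10 ** (db_difference / 20)


def intensity_ratio(db_difference: float) -> float:
    """Acoustic intensity (power) ratio for a difference in sound pressure level (dB)."""
    return 10 ** (db_difference / 10)


quiet_db, loud_db = 80, 110  # range of test levels reported in the article
delta = loud_db - quiet_db
print(f"{delta} dB span: {pressure_ratio(delta):.1f}x pressure, "
      f"{intensity_ratio(delta):.0f}x intensity")
```

The 30 dB span is roughly a 31.6x difference in sound pressure. Assuming the object responds linearly, its vibration amplitude at the quiet end would be about 30 times smaller than at the loud end.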

In addition to ramp signals for characterization, the researchers tested the setup on human voices, including a live speaker reciting the nursery rhyme "Mary had a little lamb." Most of the experiments used the bag of chips, filmed at 2200 frames per second (FPS).

Speech recovery was successful, with results comparable to those obtained with a laser Doppler vibrometer aimed at retroreflective tape. A major advantage of the video approach is that it needs neither active illumination of the target (such as the vibrometer's laser) nor retroreflective tape.

Low frame rates

Even more interesting is how the researchers exploited what is normally considered a drawback of inexpensive CMOS imagers (such as those in phones and DSLR cameras). A typical CMOS sensor has a "rolling shutter," in which individual rows are read out sequentially to build up an image. If each row of a 60 FPS video is treated as a separate exposure, effective "frame" rates of up to about 2000 Hz can be achieved.
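The rolling-shutter idea can be pictured with a simple timing model: within one frame, row r is read out a fixed line delay after row r - 1, so each row samples the scene at its own instant, and no rows are captured during the idle gap before the next frame begins. A minimal Python sketch of that model follows; the row count and the 20 µs line delay are hypothetical values, not parameters of the cameras the researchers used.

```python
def row_sample_times(num_frames: int, rows_per_frame: int,
                     fps: float, line_delay_s: float) -> list[float]:
    """Capture time of every row under a simple rolling-shutter model.

    Hypothetical model: row r of frame k is read at k / fps + r * line_delay_s.
    Readout of all rows must fit within one frame period; the rest of the
    period is an idle gap in which no row is sampled.
    """
    frame_period = 1.0 / fps
    if rows_per_frame * line_delay_s > frame_period:
        raise ValueError("row readout cannot exceed the frame period")
    return [k * frame_period + r * line_delay_s
            for k in range(num_frames)
            for r in range(rows_per_frame)]


# Hypothetical sensor: 720 rows read 20 microseconds apart at 60 FPS.
times = row_sample_times(num_frames=2, rows_per_frame=720, fps=60, line_delay_s=20e-6)
gap = times[720] - times[719]  # first row of frame 1 minus last row of frame 0
print(f"{len(times)} row samples in 2 frames; row spacing 20 us, "
      f"inter-frame gap {gap * 1e3:.2f} ms")
```

Because the samples are unevenly spaced (dense within a frame, missing across the gap), turning per-row motion measurements into audio also requires dealing with those gaps; this sketch models only the timing.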

With a loudspeaker playing speech and the bag of chips as the object, the researchers obtained a processed, "denoised" audio signal; the resulting audio was to be made available as sound clips.

The technology is patent pending.

MIT CSAIL's page on the SIGGRAPH paper: http://people.csail.mit.edu/mrub/VisualMic/

About the Author

John Wallace | Senior Technical Editor (1998-2022)

John Wallace was with Laser Focus World for nearly 25 years, retiring in late June 2022. He obtained a bachelor's degree in mechanical engineering and physics at Rutgers University and a master's in optical engineering at the University of Rochester. Before becoming an editor, John worked as an engineer at RCA, Exxon, Eastman Kodak, and GCA Corporation.
