Ambient speech is recovered from video of a bag of chips

Aug. 5, 2014

In a technical paper to be presented at SIGGRAPH 2014 (10-14 Aug. 2014; Vancouver, BC, Canada), researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (MIT CSAIL; Cambridge, MA), Microsoft Research (Redmond, WA), and Adobe Research (San Jose, CA) will describe their successful efforts to recover ambient sound from video, taken at a distance, of an object such as a houseplant or a bag of chips.

The one caveat is that most of the experiments relied on a high-speed video system (frame rates of 2 to 20 kHz); however, the researchers also describe and test a method that uses CMOS cameras operating at the normal 60 Hz video rate.


High frame rates

The experimental setup includes an object, a loudspeaker (placed on a separate stand from the object), the video camera, and photography lamps. The high-frequency cutoff of the recovered audio is tied to the video frame rate: because each frame acts as one sample of the object's motion, the recoverable audio cannot exceed half the frame rate (the Nyquist limit), so higher frame rates lead to higher cutoff frequencies.
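
As a rough illustration of that relationship, the short Python sketch below applies the Nyquist limit to the frame rates mentioned in this article. It assumes each video frame counts as one uniformly spaced sample of the object's motion; the function name is hypothetical, and the usable cutoff in practice may well be lower because of noise and the object's own response.

```python
def max_recoverable_frequency_hz(frame_rate_hz: float) -> float:
    """Return the Nyquist limit (half the sampling rate) for a given frame rate.

    Assumption: one video frame equals one uniformly spaced sample.
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return frame_rate_hz / 2.0


if __name__ == "__main__":
    # Frame rates quoted in the article: normal video, the chip-bag tests,
    # and the top of the high-speed range.
    for rate in (60, 2200, 20000):
        limit = max_recoverable_frequency_hz(rate)
        print(f"{rate} FPS -> at most {limit:.0f} Hz of audio")
```

Under this idealized assumption, ordinary 60 Hz video could capture only tones below 30 Hz, which is why the high-speed camera matters for intelligible speech.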

Captured resolutions ranged from 192 x 192 to 700 x 700 pixels. Sound volumes ranged from 80 dB (comparable to an actor's stage voice) to 110 dB (comparable to a jet engine running 100 m away). The researchers used publicly available 14-year-old code to process the videos.

In addition to ramp signals for characterization, the researchers tested the setup on human voices, including a live speaker reciting the poem "Mary Had a Little Lamb." The majority of experiments focused on the bag of chips, filmed at 2200 frames per second (FPS).

Speech recovery was successful, with results comparable to those obtained using a laser Doppler vibrometer combined with retroreflective tape. A major advantage of the video approach is that it needs neither active lighting nor retroreflective tape.

Low frame rates

Even more interesting was how the researchers took advantage of what is normally considered a disadvantage of inexpensive CMOS imagers (such as those in phones and DSLR cameras). A typical CMOS device has a "rolling shutter," in which individual lines are read out sequentially to create an image. If each line of a 60 FPS video is treated as a separate exposure, effective "frame" rates of up to about 2000 Hz can be achieved.
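
The timing idea behind the rolling-shutter trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' method: it models an idealized sensor in which every row of a frame is read at a constant line delay, and the function names and the line-delay value are hypothetical. Real sensors typically pause between frames, which leaves gaps in the sample stream that this sketch does not fill.

```python
import math


def rows_needed(target_rate_hz: float, frame_rate_hz: float) -> int:
    """Row samples per frame needed to reach a target sampling rate.

    Assumption: rows are spread uniformly over the whole frame period.
    """
    return math.ceil(target_rate_hz / frame_rate_hz)


def row_sample_times(num_frames: int, rows_per_frame: int,
                     frame_rate_hz: float, line_delay_s: float) -> list[float]:
    """Timestamp of every row exposure: t = frame / fps + row * line_delay."""
    frame_period_s = 1.0 / frame_rate_hz
    # Small tolerance so an exactly full frame period is not rejected
    # because of floating-point rounding.
    if rows_per_frame * line_delay_s > frame_period_s * (1 + 1e-9):
        raise ValueError("row readout does not fit inside one frame period")
    return [frame * frame_period_s + row * line_delay_s
            for frame in range(num_frames)
            for row in range(rows_per_frame)]


if __name__ == "__main__":
    # Figures from the article: 60 FPS video, roughly 2000 Hz effective rate.
    print(rows_needed(2000, 60), "row samples per frame")
    # Hypothetical 1 ms line delay, three rows per frame, two frames.
    print(row_sample_times(2, 3, 60.0, 0.001))
```

Treating each row as its own time sample turns a single slow frame into a short burst of faster samples, which is what lifts the effective rate above the nominal 60 Hz.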

With a loudspeaker playing speech and the bag of chips as the object, a processed and "denoised" signal was obtained; the resulting audio is to be made available online as clips.

The technology is patent pending.

MIT CSAIL's page on the SIGGRAPH paper: http://people.csail.mit.edu/mrub/VisualMic/

About the Author

John Wallace | Senior Technical Editor (1998-2022)

John Wallace was with Laser Focus World for nearly 25 years, retiring in late June 2022. He obtained a bachelor's degree in mechanical engineering and physics at Rutgers University and a master's in optical engineering at the University of Rochester. Before becoming an editor, John worked as an engineer at RCA, Exxon, Eastman Kodak, and GCA Corporation.
