"As part of the calibration, the speed of sound is also a parameter which is optimized to obtain the best model of the system, which allows this whole procedure to act as a ridiculously overengineered thermometer."
Reminds me of the electronics adage: "all sensors are temperature sensors, some measure other things as well."
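The relation being exploited can be inverted directly. A toy sketch using the standard dry-air approximation (this is not the project's code, just the textbook formula):

```python
import math

def air_temperature_c(speed_of_sound):
    """Invert the dry-air approximation c = 331.3 * sqrt(1 + T/273.15) m/s
    to recover air temperature T in degrees Celsius."""
    return 273.15 * ((speed_of_sound / 331.3) ** 2 - 1.0)

# A calibrated speed of sound around 343 m/s implies roughly room temperature.
print(round(air_temperature_c(343.0), 1))  # ~19.6 °C
```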
dllu
I once did a project to do multilateration of bats (the flying mammal) using an array of 4 microphones arranged in a big Y shape on the ground. Using the time difference of arrival at the four microphones, we could find the positions of each bat that flew over the array, as well as identify the species. It was used for an environmental study to determine the impact of installing wind turbines. Fun times.
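For anyone curious, the core of TDOA multilateration fits in a few lines. A 2-D toy sketch (the array geometry and search bounds here are made up, and a real system would solve by least squares rather than grid search, in 3-D):

```python
import math

# Hypothetical Y-shaped ground array: one mic at the centre, three on arms.
MICS = [(0.0, 0.0), (0.0, 2.0), (1.732, -1.0), (-1.732, -1.0)]  # metres
C = 343.0  # speed of sound, m/s

def tdoas_for(src):
    """Time differences of arrival relative to the first (reference) mic."""
    t = [math.dist(src, m) / C for m in MICS]
    return [ti - t[0] for ti in t[1:]]

def locate(measured, half=6.0, step=0.1):
    """Coarse grid search for the 2-D position that best explains the
    measured TDOAs (a real system would refine with least squares)."""
    best, best_err = None, float("inf")
    n = int(2 * half / step)
    for i in range(n + 1):
        for j in range(n + 1):
            p = (-half + i * step, -half + j * step)
            err = sum((a - b) ** 2 for a, b in zip(tdoas_for(p), measured))
            if err < best_err:
                best, best_err = p, err
    return best

true_src = (4.0, 3.0)
est = locate(tdoas_for(true_src))
print(est)  # close to (4.0, 3.0)
```

With four non-collinear mics you get three independent TDOAs, which pins down a unique 2-D position; tracking height as well needs mics out of the plane.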
dchichkov
I'm curious, why did you use PDM microphones for your array instead of TDM I2S ones?
I understand that the ICS-52000 is relatively low cost (~$2 at qty 100), and there are even breakout boards available with 4 microphones, which can be chained to 8 or 16, like https://www.cdiweb.com/datasheets/notwired/ds-nw-aud-ics5200...
Then you can take a Jetson (or any I2S-capable hardware with a DSP or GPU on it) and chain 16 microphones per I2S port. It would seem a lot easier to assemble and program, compared to an FPGA setup.
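As a sidenote on what the TDM capture side looks like: each TDM frame carries one sample per mic back-to-back, so deinterleaving a capture buffer into per-mic channels is just striding. A minimal sketch with dummy data (the 16-mic chain length is the assumption from above):

```python
NUM_MICS = 16
raw = list(range(64))  # 4 TDM frames of 16 interleaved samples (dummy data)

# Channel k is every NUM_MICS-th sample, starting at offset k.
channels = [raw[ch::NUM_MICS] for ch in range(NUM_MICS)]
print(channels[0])  # first mic's samples: [0, 16, 32, 48]
```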
jcims
Look up acoustic cameras on YouTube, there are some pretty impressive demonstrations of their capability. This is one of the companies I've been watching for a while, but it looks like FLIR and some other big names are getting into it: https://www.youtube.com/@gfaitechgmbh
The one use case that is both creepy and interesting to me is recording a public space and then after the fact 'zooming in' to conversations between individuals.
brunosan
Armchair comment. I would LOVE to be a grad student again and try to pair it with ultrasound speaker arrays, for medical applications. Essentially a super HIFU (High-Intensity Focused Ultrasound) with live feedback. https://en.wikipedia.org/wiki/Focused_ultrasound
adamcharnock
I would love to see this come to our various mobile devices in a nicely packaged form. I think part of what is holding back assistants, universal-translators, etc, is poor audio. Both reducing noise and being able to detect direction has a huge potential to help (I want to live-translate a group conversation around a dining table, for example).
Firstly it would be great if my phone + headphones could combine the microphones to this end. But what if all phones in the immediate vicinity could cooperate to provide high quality directional audio? (Assuming privacy issues could be addressed).
hinkley
Boeing ginned up a spherical version of these and used it on 787 prototypes to identify candidates for sound deadening material.
Apparently in loud situations like airplanes, audio illusions can make a sound appear to come from a different spot than it really is. And when you have a weight budget for sound dampening material it matters if you hit the 80/20 sweet spot or not.
Salmonfisher11
If somebody wants to play around with the Zynq 7010, have a look at the EBAZ4205 board. They can be bought on AliExpress (20-30€); they are former Bitcoin mining controllers.
Some people have reverse engineered the entire thing; the results can be found on GitHub. There's also an adapter plate available for getting at the GPIOs.
For a less complex entry point there are also Chinese FPGA boards ("Sipeed" boards, which use a GoWin FPGA). They are quite capable and the IDE is free.
kindiana
OP here, cool to see so many people are interested in this project! Happy to answer any questions (and I'll go around to reply to any questions already here)
gravypod
I was just doing research and landed on this exact page last night! I was wondering if anyone knows how someone could mic a room and record audio from only a specific area. For my use case I want to record a couch so I can watch TV with my friends online and remove their speech + show noise from the audio. Setting up some array of mics and using them for beam steering would probably work but there's not a lot of examples I could find on GitHub with code that works in real time.
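A minimal delay-and-sum beamformer is a reasonable starting point for exactly this. A sketch (mic positions, sample rate, and the integer-delay shortcut are all illustrative; real-time use would want fractional delays and a proper audio pipeline):

```python
import math

FS = 48_000  # sample rate in Hz (assumption)
C = 343.0    # speed of sound, m/s

def delay_and_sum(channels, mic_positions, focus_point):
    """Steer the array at focus_point: delay each channel so sound from
    that point lines up across mics, then average the aligned channels."""
    dists = [math.dist(m, focus_point) for m in mic_positions]
    ref = min(dists)
    # Integer-sample delays relative to the closest mic; real-time code
    # would use fractional delays (interpolation or FFT phase shifts).
    delays = [round((d - ref) / C * FS) for d in dists]
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            if i + d < n:
                acc += ch[i + d]
        out.append(acc / len(channels))
    return out

# Two mics, source straight ahead (equidistant), so delays are zero and
# the beamformed output equals the (identical) inputs.
mics = [(-0.5, 0.0), (0.5, 0.0)]
print(delay_and_sum([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], mics, (0.0, 10.0)))
```

Off-focus sources stay misaligned across channels and average down, which is what would suppress the TV audio relative to the couch.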
cushychicken
This is more or less the same principle behind how Amazon Echo devices work, but on steroids.
Very neat. I would be surprised if you aren't seeing diminishing marginal returns from all those extra mics, but I guess you're trying to capture azimuth deltas that Echo devices don't really care about.
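On diminishing returns: for ideal uncorrelated noise, delay-and-sum SNR grows as 10·log10(N), so each doubling of mic count only buys about 3 dB (the extra mics still help angular resolution via aperture). A quick illustration with arbitrary mic counts, not the project's:

```python
import math

def array_gain_db(n_mics):
    """Ideal white-noise SNR gain of an n-mic delay-and-sum beamformer."""
    return 10 * math.log10(n_mics)

for n in (8, 64, 256):
    print(n, round(array_gain_db(n), 1))  # going 64 -> 256 adds only ~6 dB
```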
crote
I'm a bit surprised by those long "arm" PCBs. They are already doing calibration to account for some relatively large offsets: why not place each sensor on its own PCB, mount them to some carrier structure, and let calibration deal with the rest?
beambot
Starting to see more & more of this with drones. In some cases, it's for the military to detect nearby drones. In others, it's being used by drone delivery companies to detect other aircraft in the sky in a way that is cheaper, works in low visibility, and doesn't have the same power requirements as radar.
amelius
Nice. It would be cool if this project could cleanly separate sources based on location.
That would be a bit like a lightfield camera, where you can edit the focusing parameters after the image has already been taken, but now with sound. https://en.wikipedia.org/wiki/Light_field_camera
What is the most practical application for this technology? Could you use it to pinpoint sounds coming from a car, like a squeak?
gizajob
What about a soundfield microphone? Does about the same thing and the electronics can be done in the analogue domain.
jsharf
Wow, you can refocus the direction after the audio is recorded!
This would be cool to mix with VR, so you could hear different conversations as you move around a virtual room
sfelicio
Very very cool.
killjoywashere
I wonder how well this would work with laser microphones on a pane of glass. Can you infer keystrokes with near infrared laser? That is, can you identify the heatmap of keystroke events to infer which keyboard they're using, then replay the tape to identify the strings of characters being typed? Can you localize the turning of pages with UV?
pftburger
I wonder if there is a meaningful limit to number of listening zones. I’m imagining a 3d grid of virtual mics in a space, each with an AI behind it
Heck, train the model on the raw sensor data and you get the most awesome conference mics
djmips
This has been on my to-do list since forever! Nice work Ben Wang.
jensenbox
Why a radial pattern and not a grid?
peter_retief
Is this what is used to find the direction of a sound, or where a sound is coming from?
Like gunshots?
jojobas
Using crab rave for demo is top notch.
cma
Could this be combined with a smaller number of high-quality mics, using machine learning or something else to incorporate them and boost the overall audio quality while keeping all the other features?
damn, this is so cool