Voice activity detector circuit?

MikeMl · Sep 20, 2015

I'm looking for a simple voice activity detector (VAD) circuit I can put between the audio output of a VHF receiver and an Arduino. I already have a positive-logic signal from the receiver to the Arduino that says the receiver is receiving a carrier (Carrier Operated Squelch = COS).

I would like to have a second detector whose output says that there is a strong likelihood that the carrier the receiver is detecting is modulated with human voice. I don't need to recognize what is being said; just that the receiver is detecting a voice-modulated transmission.

AnalogKid · Sep 20, 2015

The primary vocal cord frequency runs around 100 to 150 Hz or so, so a fairly narrow bandpass filter could pick off vowel energy to be amplitude tested against a threshold. This is for demodulated audio, so what is the modulation scheme? Also, how fast a decision do you need? Are you trying to unmute something fast enough to catch the first word, or something like that? Or were you thinking of something that extracts this information directly from the modulated carrier?

ak

MikeMl · Sep 20, 2015

AnalogKid said:
The primary vocal cord frequency runs around 100 to 150 Hz or so, so a fairly narrow bandpass filter could pick off vowel energy to be amplitude tested against a threshold.

Are you saying that there is a lot of energy near ~150Hz independent of the speaker?

This is for demodulated audio, so what is the modulation scheme?

As it is for demodulated audio, it shouldn't matter. As it happens, it is an AM receiver.

Also, how fast a decision do you need? Are you trying to unmute something fast enough to catch the first word, or something like that?

No, I would like to validate a 3 sec transmission after it is over, with a yes/no indication of whether the transmission carried human voice.

Or were you thinking of something that extracts this information directly from the modulated carrier?

ak

No, the input to the VAD would come from the receiver's audio output.

AnalogKid · Sep 20, 2015

Typo: 100 Hz to 250 Hz-ish. That is the fundamental for speech vowel sounds, before the mouth shapes the harmonics. Singing is different; when you sing a high A, the vocal cords vibrate at 440 Hz.

Your responses cleared up a lot. Something I forgot to ask - what else might the transmission be that is not voice? It could affect what the detector detects.

ak
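For anyone who wants to try the band-energy idea in software first, here is a minimal sketch that measures how much of a frame's energy falls between 100 Hz and 250 Hz. It is only an illustration: the function name `band_energy_fraction`, the 8 kHz sample rate and the 100 ms frame length used below are assumptions, not anything specified in the thread, and it uses a plain DFT over the band rather than an analog bandpass filter.

```python
import math

def band_energy_fraction(samples, fs, f_lo=100.0, f_hi=250.0):
    """Fraction of the frame's energy that lies between f_lo and f_hi (Hz)."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0.0:
        return 0.0
    # DFT bins that fall inside the band (bin spacing is fs / n).
    k_lo = max(math.ceil(f_lo * n / fs), 1)
    k_hi = min(math.floor(f_hi * n / fs), (n - 1) // 2)
    band = 0.0
    for k in range(k_lo, k_hi + 1):
        w = 2.0 * math.pi * k / n
        re = sum(x * math.cos(w * i) for i, x in enumerate(samples))
        im = sum(x * math.sin(w * i) for i, x in enumerate(samples))
        band += re * re + im * im
    # Parseval: each positive-frequency bin has a mirror-image negative bin.
    return 2.0 * band / (n * total)
```

One possible use: split the 3 sec recording into 100 ms frames (800 samples at 8 kHz), compute the fraction for each frame, and compare it against a threshold found by experiment.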

MikeMl · Sep 20, 2015

The whole 3 sec of transmission could be carrier only (very little background noise).
The 3 sec of transmission could be loud background noise (possibly a similar spectrum to voice, but no temporal variation).

I suspect that it is going to be the temporal cadence that makes it possible to say it is voice.
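The three cases above (carrier only, steady loud noise, voice with a cadence) can be told apart in a rough way from the short-term level alone. The sketch below is one assumed approach, not a tested design: it computes RMS over 20 ms frames and looks at the mean level and at how much the level varies relative to that mean. The names `frame_rms` and `classify` and both threshold values are hypothetical placeholders that would need tuning against real receiver audio.

```python
import math

def frame_rms(samples, frame_len):
    """RMS level of each consecutive, non-overlapping frame."""
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out

def classify(samples, fs, level_floor=0.01, min_variation=0.5):
    """Label a recording by mean level and relative level variation.

    level_floor and min_variation are assumed values, to be set by experiment.
    """
    env = frame_rms(samples, max(1, round(0.02 * fs)))  # 20 ms frames
    if not env:
        return "carrier only"
    mean = sum(env) / len(env)
    if mean < level_floor:
        return "carrier only"          # almost no audio at all
    var = sum((e - mean) ** 2 for e in env) / len(env)
    variation = math.sqrt(var) / mean  # coefficient of variation
    return "possible voice" if variation >= min_variation else "steady noise"
```

A steady level gives a variation near zero, while a level that keeps switching on and off at a syllable-like rate gives a variation near one, which is why the relative measure is used instead of the absolute spread.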

AnalogKid · Sep 21, 2015

Years ago I did a syllable counter based on bandpass filters, but that was for office dictation. Still, if you say that there must be a minimum number of syllables in a 3 second span, I think you can get there without a DSP.

ak
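A software stand-in for the "minimum number of syllables in a 3 second span" test might look like the sketch below. It is not AnalogKid's bandpass-filter circuit; it simply counts bursts in an already-computed level envelope, using two thresholds for hysteresis so that a level hovering near one threshold is not counted many times. The names `count_bursts` and `looks_like_speech`, the two levels and the minimum count are all assumptions for illustration.

```python
def count_bursts(envelope, on_level, off_level):
    """Count rises above on_level, re-arming only after a fall below off_level."""
    count = 0
    active = False
    for e in envelope:
        if not active and e >= on_level:
            active = True
            count += 1
        elif active and e <= off_level:
            active = False
    return count

def looks_like_speech(envelope, on_level=0.1, off_level=0.05, min_syllables=4):
    """True if the envelope holds at least min_syllables bursts (assumed value)."""
    return count_bursts(envelope, on_level, off_level) >= min_syllables
```

Here `envelope` would be the level values covering the whole 3 sec transmission, for example one value every 10 or 20 ms. Steady loud noise counts as a single burst, and carrier only counts as none.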

ccurtis · Sep 21, 2015

Speech runs about 3 syllables per second, so I'm thinking a 4 Hz low-pass filter first, followed by a differentiator (rate-of-change) circuit, followed by a full-wave rectifier (absolute value) circuit, with the resulting signal compared to a DC threshold set by the noise floor (the output without voice). After that, a retriggerable one-shot to prevent possible output chatter. Obviously, experimentation is in order. Never tried it myself. Just an idea.
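The chain described above (4 Hz low-pass, differentiator, absolute value, comparator, retriggerable one-shot) can be modeled in a few lines before building anything. This is a rough sketch under some loud assumptions: the input is taken to be the audio level envelope (already rectified and averaged, sampled at `rate_hz`), since the post does not say how the envelope is obtained; the low-pass is a simple one-pole filter; and the `threshold` and `hold_s` defaults are placeholders. The function name `cadence_detector` is hypothetical.

```python
import math

def cadence_detector(envelope, rate_hz, cutoff_hz=4.0, threshold=0.5, hold_s=0.5):
    """Return one True/False per envelope sample: True while the one-shot is on."""
    if not envelope:
        return []
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate_hz)
    hold_samples = int(hold_s * rate_hz)
    lp = envelope[0]      # start settled, so there is no power-up transient
    hold = 0
    out = []
    for e in envelope:
        prev = lp
        lp += alpha * (e - lp)            # one-pole low-pass at cutoff_hz
        slope = (lp - prev) * rate_hz     # differentiator, in units per second
        if abs(slope) > threshold:        # absolute value + comparator
            hold = hold_samples           # retrigger the one-shot
        elif hold > 0:
            hold -= 1
        out.append(hold > 0)
    return out
```

A constant envelope never trips the comparator, while an envelope that switches on and off a few times per second keeps retriggering the one-shot, so the output stays high for as long as the cadence continues.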

AnalogKid · Sep 21, 2015

That pretty much describes some of what I developed. The filter bandwidth needs to be wider to capture the harmonics of the envelope. This gets you faster detection.

ak

Mikebits · Sep 21, 2015

Not sure if this will do what you want. Analog Devices has an app note: AN-934, 60 dB Wide Dynamic Range, Low Frequency AGC Circuit Using a Single VGA (Rev. 0).
Maybe you could use the detector voltage with an ADC and use that for determining valid audio.
https://www.analog.com/media/en/technical-documentation/application-notes/AN_934.pdf

ccurtis · Sep 21, 2015

AnalogKid said:
That pretty much describes some of what I developed. The filter bandwidth needs to be wider to capture the harmonics of the envelope. This gets you faster detection.

ak

The idea behind the differentiator is to exaggerate (speed up) the response of such a slow low-pass filter. I'm thinking that keeping the cutoff at 4 Hz will be more selective to a voice waveform envelope than a wider filter would be.
