Welcome to our site!

Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

Voice activity detector circuit?

Status
Not open for further replies.

MikeMl

Well-Known Member
Most Helpful Member
I'm looking for a simple voice activity detector (VAD) circuit I can put between the audio output of a VHF receiver and an Arduino. I already have a positive-logic signal from the receiver's Carrier Operated Squelch (COS) detector to the Arduino that says the receiver is receiving a carrier.

I would like to have a second detector whose output says that there is a strong likelihood that the carrier the receiver is detecting is modulated with human voice. I don't need to recognize what is being said, just that the receiver is detecting a voice-modulated transmission.
 
The primary vocal cord frequency runs around 100 to 150 Hz or so, so a fairly narrow bandpass filter could pick off vowel energy to be amplitude tested against a threshold. This is for demodulated audio, so what is the modulation scheme? Also, how fast a decision do you need? Are you trying to unmute something fast enough to catch the first word, or something like that? Or were you thinking of something that extracts this information directly from the modulated carrier?

ak
 
The primary vocal cord frequency runs around 100 to 150 Hz or so, so a fairly narrow bandpass filter could pick off vowel energy to be amplitude tested against a threshold.
Are you saying that there is a lot of energy near ~150Hz independent of the speaker?

This is for demodulated audio, so what is the modulation scheme?
As it is for demodulated audio, it shouldn't matter. As it happens, it is an AM receiver.

Also, how fast a decision do you need? Are you trying to unmute something fast enough to catch the first word, or something like that?
No, I would like to validate a 3 sec transmission after it is over, with a yes/no indication of whether the transmission carried human voice.

Or were you thinking of something that extracts this information directly from the modulated carrier?

ak
No, the input to the VAD would come from the receiver's audio output.
 
Typo: 100 Hz to 250 Hz, ish. That is the fundamental for speech vowel sounds, before the mouth shapes the harmonics. Singing is different; when you sing a high A, the vocal cords vibrate at 440 Hz.
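For anyone who wants to try the band-energy idea on recorded audio before building hardware, here is a minimal offline sketch. It is only an illustration under stated assumptions: samples normalised to ±1, a single biquad band-pass (the widely used RBJ "Audio EQ Cookbook" form) spanning roughly 100 to 250 Hz, and a threshold of 0.3 that is a placeholder to be tuned. The function names are made up for this example.

```python
import math

def bandpass_coeffs(fs, f_lo=100.0, f_hi=250.0):
    """RBJ-cookbook biquad band-pass (0 dB peak gain), roughly f_lo..f_hi."""
    f0 = math.sqrt(f_lo * f_hi)        # geometric centre, about 158 Hz
    q = f0 / (f_hi - f_lo)             # approximate Q from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                     # feed-forward
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)     # feedback
    return b, a

def band_rms(samples, fs):
    """RMS of the signal after the band-pass (direct form I)."""
    b, a = bandpass_coeffs(fs)
    x1 = x2 = y1 = y2 = 0.0
    acc = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        acc += y * y
    return math.sqrt(acc / len(samples))

def has_vowel_energy(samples, fs, threshold=0.3):
    """True if in-band RMS exceeds the (assumed, untuned) threshold."""
    return band_rms(samples, fs) > threshold
```

A 150 Hz test tone should pass this check and a 2 kHz tone should not. Band energy alone cannot tell voice from noise with a similar spectrum.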

Your responses cleared up a lot. Something I forgot to ask - what else might the transmission be that is not voice? It could affect what the detector detects.

ak
 
The whole 3 sec of transmission could be carrier only (very little background noise).
The 3 sec of transmission could be loud background noise (possibly a similar spectrum to voice, but no temporal variation).

I suspect that it is going to be the temporal cadence that makes it possible to say it is voice.
 
Years ago I did a syllable counter based on bandpass filters, but that was for office dictation. Still, if you say that there must be a minimum number of syllables in a 3 second span, I think you can get there without a DSP.
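As a rough illustration of the "minimum number of syllables in a 3 second span" test, the sketch below counts envelope bursts in a buffer of samples. It is not the syllable counter mentioned above. The 20 ms frame, the hysteresis levels (0.3 on, 0.15 off, for samples normalised to ±1) and the minimum of 4 bursts are all assumptions picked for the example, and the function names are hypothetical.

```python
def count_bursts(samples, fs, frame_s=0.02, on_level=0.3, off_level=0.15):
    """Count how many times the frame-averaged level rises through on_level.

    Hysteresis (on_level > off_level) keeps one burst from counting twice.
    """
    frame_len = int(round(fs * frame_s))
    count = 0
    active = False
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        level = sum(abs(x) for x in frame) / frame_len   # mean rectified level
        if not active and level > on_level:
            active = True
            count += 1
        elif active and level < off_level:
            active = False
    return count

def looks_like_voice(samples, fs, min_bursts=4):
    """Yes/no decision over the whole buffer (e.g. one 3 sec transmission)."""
    return count_bursts(samples, fs) >= min_bursts
```

With these settings a carrier-only transmission gives zero bursts and steady loud noise gives one, so both fail the test, while anything that switches on and off a few times per second passes. Real speech will need the levels set relative to the receiver's actual noise floor.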

ak
 
Speech runs about 3 syllables per second, so I'm thinking a 4 Hz low-pass filter first, followed by a differentiator (rate-of-change) circuit, followed by a full-wave rectifier (absolute value) circuit, with the resulting signal compared to a DC threshold set by the noise floor (the output without voice). After that, a retriggerable one-shot to prevent possible output chatter. Obviously, experimentation is in order. Never tried it myself. Just an idea.
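The chain described above can be tried numerically before committing to op-amps. The sketch below is one possible reading of it, with assumptions clearly beyond the post: the audio is first rectified and averaged over 10 ms frames to get an envelope (the post does not say how the envelope is obtained), the 4 Hz low-pass is a single pole, and the threshold (1.0 envelope units per second) and one-shot hold time (0.3 s) are untuned placeholders. All names are hypothetical.

```python
import math

def envelope_vad(samples, fs, frame_s=0.01, cutoff_hz=4.0,
                 threshold=1.0, hold_s=0.3):
    """Return one True/False per frame: one-shot output of the chain."""
    frame_len = int(round(fs * frame_s))
    frame_rate = 1.0 / frame_s
    # One-pole low-pass coefficient for the chosen cutoff at the frame rate.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate)
    hold_frames = int(round(hold_s * frame_rate))
    env = prev_env = 0.0
    hold = 0
    out = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        level = sum(abs(x) for x in frame) / frame_len   # assumed envelope stage
        env += alpha * (level - env)                     # 4 Hz low-pass
        slope = (env - prev_env) * frame_rate            # differentiator
        prev_env = env
        if abs(slope) > threshold:                       # rectifier + comparator
            hold = hold_frames                           # retrigger the one-shot
        elif hold > 0:
            hold -= 1
        out.append(hold > 0)
    return out
```

Fed a steady tone, the output fires once at the onset and then drops out when the one-shot times out. Fed a tone switched on and off three times per second, the one-shot keeps getting retriggered and the output stays high. Whether real speech over a noisy AM channel behaves as cleanly is exactly the experimentation the post calls for.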
 
That pretty much describes some of what I developed. The filter bandwidth needs to be wider to capture the harmonics of the envelope. This gets you faster detection.

ak
 
That pretty much describes some of what I developed. The filter bandwidth needs to be wider to capture the harmonics of the envelope. This gets you faster detection.

ak

The idea behind the differentiator is to exaggerate (speed up) the response of such a slow low-pass filter. I'm thinking that keeping the cutoff at 4 Hz will be more selective to a voice waveform envelope than a wider filter would be.
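A short worked version of that argument, assuming an ideal single-pole low-pass and an ideal differentiator (the thread does not specify the filter order, so this is only a first-order illustration). For an envelope step of height A at t = 0:

```latex
H_{LP}(s) = \frac{1}{1 + s\tau}, \qquad
\tau = \frac{1}{2\pi f_c} = \frac{1}{2\pi \cdot 4\,\mathrm{Hz}} \approx 39.8\,\mathrm{ms}

v_{LP}(t) = A\left(1 - e^{-t/\tau}\right), \qquad
\frac{d v_{LP}}{dt} = \frac{A}{\tau}\, e^{-t/\tau}
```

The low-pass output alone needs about one time constant (roughly 40 ms) to reach 63% of A, but its derivative is largest at t = 0, with a peak of A/τ (about 25·A per second). A comparator on the differentiated signal can therefore trip right at the envelope edge even though the filter itself is slow.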
 