Welcome to our site!

Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

Speech Recognition (theory)

Status
Not open for further replies.

Smartie

Member
Hi guys,

I'm currently hacking an old Furby I got second-hand, and I wanted to know whether it's possible to implement some sort of speech recognition.

The idea is to have this Furby connected to the computer by serial or Bluetooth, and it would tell me if I have any new email messages or news updates, or possibly the weather as well.

I've looked around for speech recognition, and one site seems to make chips that can identify commands, but I can't find any available to buy anywhere. I came across an old thread on this site talking about it, and it was suggested to work with a dsPIC, but I don't have one and I'm not sure whether to go with them or not.

What if I were to sample the sound with my PIC, send the data straight to the PC via RS-232, and have the computer analyze the data?

I'm just after ideas at this stage on how this could be done.

Cheers
Roman
 
Sounds like you just need a text-to-speech program. Or do you want to give a verbal command to the computer, like "Weather", and have it read you the weather report?
Windows 7 has speech recognition built in, so all you'd need is a mic and speaker in the Furby connected to your sound card.
 
Yeah, I'm after verbal commands. That's not a bad idea, but I'm hoping to have this on Bluetooth in the end, so I'm not sure if this will work all the way through.

I've found an example of how to use Windows voice recognition in C#, which is good, so some ideas on how to get the sound from the Furby's mic to the PC would be good.
 
Speech recognition is a difficult thing. The last time I checked, connected speech (the way we run words together when we talk) was still a topic of research in the AI community.

I suggest you put a directional mic on a stand and use steppers to track the robot. Use the PC to do the recognition.
 
What if I were to do something like this?

The PIC sends raw data from the mic to the PC, the PC feeds that into a virtual audio device (a virtual microphone), and then Windows Speech Recognition can understand the commands. I can set up a C# app that responds to specific commands like "Hello Furby".
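As a rough illustration of the PC-side step only, here is a minimal Python sketch that wraps raw sample bytes in a WAV container using the standard library's `wave` module. The 8 kHz rate, the 16-bit signed little-endian format, and the name `raw_to_wav_bytes` are assumptions made for the example, and the bytes are faked rather than read from a real serial port. Feeding a virtual microphone would need a separate virtual audio driver, which is not shown.

```python
import io
import struct
import wave

SAMPLE_RATE_HZ = 8000    # assumed sample rate on the PIC side
SAMPLE_WIDTH_BYTES = 2   # assumed 16-bit signed little-endian samples


def raw_to_wav_bytes(raw: bytes) -> bytes:
    """Wrap raw mono PCM sample bytes in a WAV container (hypothetical helper)."""
    # Drop a trailing partial sample so only whole samples are written.
    usable = len(raw) - (len(raw) % SAMPLE_WIDTH_BYTES)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(raw[:usable])
    return buffer.getvalue()


# Stand-in for bytes received over the serial link: four samples.
fake_serial_data = struct.pack("<4h", 0, 1000, -1000, 0)
wav_bytes = raw_to_wav_bytes(fake_serial_data)
```

Saving the result to a file and playing it back is a quick way to check that the byte order and sample rate are what the PC expects before trying any recognition software.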

What do you guys think?
 
It would be easier just to send the audio from the furby to the computer's sound card via an analog radio link. It could be a privacy issue if you have marital relations in the same room as the furby. :D
 
And even easier still to connect Furby by a little coax cable from his mic to the PC soundcard input. That's the easiest way to get the PC software happening and see if it will be worth the effort.

And it gives you a bit more privacy... ;)
 
Well, mono speech at 16 bits and an 8 kHz sample rate is 128 kbit/s, not counting the start, stop and/or parity bits. That's a pretty hefty link.
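To put numbers on that, here is a small Python sketch of the arithmetic. It assumes the common 8N1 framing (one start bit and one stop bit around each 8-bit byte, no parity); the function names and the list of baud rates are illustrative choices, not something from this thread.

```python
def audio_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw payload bit rate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def required_baud(payload_bits_per_s: int, data_bits: int = 8, framing_bits: int = 2) -> float:
    """Line rate once each data byte is wrapped in start/stop (and parity) bits."""
    return payload_bits_per_s * (data_bits + framing_bits) / data_bits


payload = audio_bit_rate(8000, 16)   # 8 kHz, 16-bit mono
line_rate = required_baud(payload)   # assumes 8N1: 10 bits on the wire per byte

# A few commonly used UART rates (assumed list, for illustration only).
COMMON_BAUDS = [9600, 19200, 38400, 57600, 115200, 230400]
smallest_fit = next(b for b in COMMON_BAUDS if b >= line_rate)

print(payload, line_rate, smallest_fit)
```

Under these assumptions the link has to carry more than 115200 baud can offer, which is why the raw-streaming approach is a tight fit for a plain serial port.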

I would recommend the dsPIC you heard about. It's "self contained".

ALL speech recognition suffers significantly if it's a room-based omni mike, as opposed to a mike fixed right in front of you.
 
Well, the input could be clipped to reduce the number of bits. Not sure if this is a good idea, though.
 
Reducing the bit width reduces the sound quality, which degrades the performance of the recognition. So does reducing the sample rate, which limits the bandwidth. A 4 kHz bandwidth (8 kHz sample rate) covers telephone-quality speech, but not singing.

I don't know how much it would be degraded by using 15 or 14 bits... there'd be a lot of math in fitting 14- or 15-bit samples into the serial data bytes. 8 bits would probably be a huge problem.
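For a feel of what that bit fiddling looks like, here is a hedged Python sketch that packs four unsigned 14-bit samples into seven bytes and unpacks them again. The grouping of four, the big-endian layout, and the function names are all assumptions chosen for the example; a PIC would do the same shifts and masks in C or assembly.

```python
def pack_14bit(samples):
    """Pack unsigned 14-bit samples, four at a time, into 7-byte groups."""
    if len(samples) % 4:
        raise ValueError("sample count must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(samples), 4):
        word = 0
        for s in samples[i:i + 4]:
            if not 0 <= s < (1 << 14):
                raise ValueError("sample out of 14-bit range")
            word = (word << 14) | s          # append 14 bits to the 56-bit word
        out += word.to_bytes(7, "big")
    return bytes(out)


def unpack_14bit(data):
    """Inverse of pack_14bit: recover four samples from every 7 bytes."""
    if len(data) % 7:
        raise ValueError("byte count must be a multiple of 7")
    samples = []
    for i in range(0, len(data), 7):
        word = int.from_bytes(data[i:i + 7], "big")
        for shift in (42, 28, 14, 0):
            samples.append((word >> shift) & 0x3FFF)
    return samples


original = [0, 1, 16383, 8192]
packed = pack_14bit(original)
restored = unpack_14bit(packed)
```

Seven bytes instead of eight is only a 12.5% saving on the link, so the extra work buys fairly little.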

The resolution you get with a particular bit width is related to the dynamic range used. For example, if your ADC is configured so that the max and min 16-bit codes occur at 1 Vpp, but the microphone is amplified so that speech only produces a maximum of 250 mVpp, then you get the same resolution as 14-bit sound that uses the entire code range. But it has the headroom to accommodate louder speech if needed, and clipping, which happens when the sound is louder than the ADC has a code for, is bad. Getting just the right amplification is actually a bit tricky. The problem is MUCH worse with open mikes, because the sound level varies a lot with distance.
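The 16-bit versus 14-bit comparison above can be checked with a couple of lines of Python. This is just the arithmetic, under the assumption of an ideal linear ADC; the function names are made up for the example.

```python
import math


def effective_bits(adc_bits: int, full_scale_vpp: float, signal_vpp: float) -> float:
    """Bits of the code range actually exercised by a signal smaller than full scale."""
    # Each halving of the signal swing relative to full scale costs one bit.
    return adc_bits + math.log2(signal_vpp / full_scale_vpp)


def headroom_db(full_scale_vpp: float, signal_vpp: float) -> float:
    """How much louder the signal can get before the ADC clips, in dB."""
    return 20 * math.log10(full_scale_vpp / signal_vpp)


bits_used = effective_bits(16, 1.0, 0.25)   # 250 mVpp speech into a 1 Vpp ADC
spare_db = headroom_db(1.0, 0.25)

print(bits_used, round(spare_db, 1))
```

A quarter of full scale costs two bits of resolution and leaves about 12 dB of headroom for louder speech.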
 
You completely skipped signal to noise ratio...
 
This is going to be a lot harder than I thought... but at least it's worth a try.

Wouldn't Windows Speech Recognition be able to work with the noise?
 
Smartie, why do you think we don't all have computers and houses that light up when we calmly say 'lights' while walking into a room? Or why do we see people shout 'call Steve' into a phone three times, then dial manually and never use the voice features again? =O

NOTHING works with noise. You have to have a good signal to work with. Structured background sound that we would call 'noise' is in fact information to a PC.

The best possible voice recognition method I could think of would use a throat mic. They basically pick up vibrations directly from the voice box but lose all of the information the mouth adds, so it would require a tightly restricted dictionary of easily distinguishable words.
 
Ah, I get ya. I'll start Googling methods for reducing noise.
 
You'll never find what you're looking for.
As I stated, what YOU consider noise (background vibration, other people talking, etc.) is information to the PC. You can't eliminate it without also eliminating the signal you want. Active noise cancellation is a possibility, but that still requires isolating the 'data' source from the 'noise' source. So you need a mic at the mouth that has high distance attenuation, and one more distant mic with a more general pickup, so you can selectively add them together to get the best results.
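One common way to combine two mics like that is an LMS adaptive filter, sketched below in plain Python. This is a swapped-in textbook technique, not necessarily what was meant above. It relies on the isolation just described: the distant mic is assumed to pick up only the background noise, and the close mic the voice plus a filtered copy of that noise. The signals here are synthetic, and the tap count, step size `mu`, and room path coefficients are made-up values for the demonstration.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.05):
    """Subtract an adaptively filtered copy of `reference` from `primary`."""
    weights = [0.0] * taps
    history = [0.0] * taps                 # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate         # the error is the cleaned output sample
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_square(values):
    return sum(v * v for v in values) / len(values)


random.seed(0)
N = 4000
# Synthetic "voice": a 200 Hz tone sampled at 8 kHz.
speech = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(N)]
# Background noise as heard by the distant (reference) mic.
noise = [random.uniform(-1.0, 1.0) for _ in range(N)]
# Assumed room path from the noise source to the close mic.
leak = [0.6 * noise[n] + 0.3 * (noise[n - 1] if n else 0.0) for n in range(N)]
close_mic = [s + l for s, l in zip(speech, leak)]

cleaned = lms_cancel(close_mic, noise)

# Compare residual noise power over the last 1000 samples, after adaptation.
tail = slice(N - 1000, N)
noise_before = mean_square([c - s for c, s in zip(close_mic[tail], speech[tail])])
noise_after = mean_square([c - s for c, s in zip(cleaned[tail], speech[tail])])
print(noise_before, noise_after)
```

If the distant mic also picks up the voice, the filter starts cancelling the voice too, which is the isolation problem described above.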
 