Speech Recognition (theory)

Smartie · Dec 27, 2009

Hi guys,

I'm currently hacking an old Furby I got second-hand, and I wanted to know whether it's possible to implement some sort of speech recognition.

The idea is to have this Furby connected to the computer by serial or Bluetooth, and it would tell me if I have any new email messages, news updates, or possibly the weather as well.

I've looked around for speech recognition, and one site seems to make chips that can identify commands, but I can't find any available to buy anywhere. I came across an old thread on this site that discussed it, and someone suggested working with a dsPIC, but I don't have one and I'm not sure whether to go that way or not.

What if I were to sample the sound with my PIC, send the data straight to the PC via RS-232, and have the computer analyze it?
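Here is a minimal PC-side sketch of that idea. It assumes hypothetical firmware that streams each sample as two little-endian bytes at 8 kHz, with no framing. The script turns the raw serial bytes into signed 16-bit samples and saves them as a mono WAV file, so you can hear what the PIC actually captured and feed a recognizer from a file. Reading the port itself (for example with a serial library such as pySerial) is left out. `decode_samples`, `write_wav` and `wav_bytes` are illustrative names, not an existing API.

```python
import io
import struct
import wave

SAMPLE_RATE = 8000  # assumed sample rate of the PIC firmware
SAMPLE_WIDTH = 2    # assumed: 16-bit signed samples, little-endian


def decode_samples(raw: bytes) -> list:
    """Turn the raw byte stream from the PIC into signed 16-bit samples.

    A trailing odd byte (half a sample) is ignored.
    """
    usable = len(raw) - (len(raw) % SAMPLE_WIDTH)
    count = usable // SAMPLE_WIDTH
    return list(struct.unpack("<%dh" % count, raw[:usable]))


def write_wav(samples: list, target) -> None:
    """Write mono 16-bit samples to a WAV file (path or file-like object)."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


def wav_bytes(samples: list) -> bytes:
    """Return a complete in-memory WAV file for the given samples."""
    buf = io.BytesIO()
    write_wav(samples, buf)
    return buf.getvalue()
```

In practice you would append each chunk read from the serial port to a buffer, call `decode_samples` on it, and write the result out once you have a few seconds of audio.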

I'm just after ideas at this stage on how this could be done.

Cheers
Roman

kchriste · Dec 27, 2009

smartie_on_computer said:
The idea is to have this Furby connected to the computer by serial or Bluetooth, and it would tell me if I have any new email messages, news updates, or possibly the weather as well.

Sounds like you just need a text-to-speech program. Or do you want to give a verbal command to the computer, like "Weather", and have it read you the weather report?
Windows 7 has speech recognition built in, so all you'd need is a mic and speaker in the Furby connected to your sound card.

Smartie · Dec 27, 2009

kchriste said:
Sounds like you just need a text-to-speech program. Or do you want to give a verbal command to the computer, like "Weather", and have it read you the weather report?
Windows 7 has speech recognition built in, so all you'd need is a mic and speaker in the Furby connected to your sound card.

Yeah, I'm after verbal commands. That's not a bad idea, but I'm hoping to have this on Bluetooth in the end, so I'm not sure it will work all the way through.

I've found an example of how to use Windows speech recognition in C#, which is good, so some ideas on how to get the sound from the Furby's mic to the PC would be helpful.

Smartie · Dec 28, 2009

So, no more ideas?

3v0 · Dec 28, 2009

Speech recognition is a difficult thing. The last time I checked, connected speech (the way we run words together when we talk) was still a topic of research in the AI community.

I suggest you put a directional mic on a stand and use steppers to keep it aimed at the robot. Use the PC to do the recognition.

Smartie · Dec 29, 2009

What if I were to do something like this?

The PIC sends raw data from the mic to the PC, the PC feeds that to a virtual audio device (a microphone), and Windows Speech Recognition can then understand the commands. I can set up a C# app that responds to specific commands like "Hello Furby".
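Whatever recognizer ends up producing the text, the "respond to specific commands" part is easy to sketch. Below is a hypothetical dispatcher, written in Python rather than the C# app mentioned above. It normalizes the recognized phrase (case, punctuation, extra spaces) and looks it up in a small command table. The handler bodies are placeholders, not real email or weather lookups.

```python
import string
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def make_dispatcher(commands: Dict[str, Callable[[], str]]):
    """Build a function mapping recognized text to the matching handler's reply."""
    table = {normalize(phrase): handler for phrase, handler in commands.items()}

    def dispatch(recognized: str) -> Optional[str]:
        handler = table.get(normalize(recognized))
        return handler() if handler else None  # None = not a known command

    return dispatch


# Hypothetical handlers standing in for real email/news/weather code.
COMMANDS = {
    "Hello Furby": lambda: "Hello!",
    "Email": lambda: "Checking email...",
    "Weather": lambda: "Fetching the weather report...",
}
```

Keeping the vocabulary to a few short, distinct phrases like these also makes the recognizer's job easier.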

What do you guys think?

kchriste · Dec 29, 2009

It would be easier just to send the audio from the Furby to the computer's sound card via an analog radio link. It could be a privacy issue, though, if you have marital relations in the same room as the Furby.

Mr RB · Dec 29, 2009

And it's even easier to connect the Furby with a little coax cable from its mic to the PC's soundcard input. That's the easiest way to get the PC software going and see whether it will be worth the effort.

And it gives you a bit more privacy...

Oznog · Dec 30, 2009

Well, mono speech at 16 bits and 8 kHz sampling is 128 kbit/s (16 × 8000), not counting the start, stop and/or parity bits. That's a pretty hefty link.
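Here is that arithmetic as a small sketch, including the UART framing overhead. It assumes a standard asynchronous serial frame: one start bit, the data bits, optional parity, and stop bits. The function names are illustrative.

```python
def payload_bps(bits_per_sample: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw audio data rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


def uart_line_rate(payload: int, data_bits: int = 8,
                   parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Bits on the wire once every data byte gets a start bit,
    optional parity bit and stop bit(s)."""
    frame = 1 + data_bits + parity_bits + stop_bits
    return payload * frame / data_bits
```

With 8N1 framing, `uart_line_rate(payload_bps(16, 8000))` gives 160000: the 128 kbit/s of audio needs 160 kbit/s on the wire, which is more than the common 115200 baud rate. Sending 8-bit samples at 8 kHz (64 kbit/s of audio, 80 kbit/s on the wire) would fit.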

I would recommend the dsPIC you heard about. It's "self contained".

ALL speech recognition suffers significantly with a room-based omni mike, as opposed to a mike fixed right in front of you.
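To put a number on the distance problem, here is a sketch that assumes free-field (inverse-distance) propagation. Under that assumption, sound pressure level falls by about 6 dB every time the distance doubles. In a real room, reflections make the drop at larger distances smaller than this estimate.

```python
import math


def level_drop_db(near_m: float, far_m: float) -> float:
    """Free-field drop in sound pressure level (dB) when the talker
    moves from near_m to far_m metres from the mic."""
    return 20 * math.log10(far_m / near_m)
```

Going from 5 cm (close-talking) to 2 m across the room, `level_drop_db(0.05, 2.0)`, costs roughly 32 dB. At about 6 dB per bit, that is over 5 bits of ADC resolution.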

Smartie · Dec 30, 2009

Oznog said:
Well, mono speech at 16 bits and 8 kHz sampling is 128 kbit/s (16 × 8000), not counting the start, stop and/or parity bits. That's a pretty hefty link.

I would recommend the dsPIC you heard about. It's "self contained".

ALL speech recognition suffers significantly with a room-based omni mike, as opposed to a mike fixed right in front of you.

Well, the input could be clipped to fewer bits to cut the data rate; not sure if that's a good idea, though.
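This sketch reads "clipping to fewer bits" as dropping the least significant bits (truncation). That is different from audio clipping, which means saturating at full scale. It assumes 16-bit signed samples. An arithmetic right shift keeps the top `bits` bits, and shifting back shows the coarser values the recognizer would actually receive. Both function names are hypothetical.

```python
def requantize(samples: list, bits: int) -> list:
    """Keep only the top `bits` bits of each 16-bit signed sample."""
    drop = 16 - bits
    # In Python, >> on a negative int is an arithmetic (floor) shift,
    # so negative samples round toward minus infinity.
    return [(s >> drop) << drop for s in samples]


def payload_saving(bits: int) -> float:
    """Fraction of the 16-bit data rate saved by sending `bits`-bit samples."""
    return 1 - bits / 16
```

Each dropped bit doubles the quantization step, so the error grows exponentially while the saving in data rate is only linear. Going to 8 bits halves the link rate, but the step grows from 1 to 256 codes.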


Oznog · Dec 30, 2009

Reducing the bit width reduces the sound quality, which degrades recognition performance. So does reducing the sample rate, because that limits the bandwidth. A 4 kHz bandwidth (8 kHz sample rate) covers the essential range of speech, which is why telephone audio uses it, but not singing.

I don't know how much it'll be degraded by using 15 or 14 bits... there'd be a lot of math in packing the serial data bytes into 14 or 15 bits. 8 bits would probably be a huge problem.
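Here is a sketch of that packing math. It assumes samples have already been reduced to `bits`-bit two's-complement values and are sent MSB first as one continuous bit stream, with the last byte zero-padded. The receiver must know the sample count (or use some framing) so it can ignore the padding. `pack_samples` and `unpack_samples` are hypothetical names.

```python
def pack_samples(samples: list, bits: int) -> bytes:
    """Pack signed `bits`-bit samples into a continuous big-endian bit stream."""
    mask = (1 << bits) - 1
    acc, nbits, out = 0, 0, bytearray()
    for s in samples:
        acc = (acc << bits) | (s & mask)      # two's-complement N-bit code
        nbits += bits
        while nbits >= 8:                     # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1               # keep only the unsent bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def unpack_samples(data: bytes, bits: int, count: int) -> list:
    """Inverse of pack_samples; `count` tells it where the padding starts."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    acc, nbits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits and len(out) < count:
            nbits -= bits
            code = (acc >> nbits) & mask
            out.append(code - (1 << bits) if code & sign else code)  # sign-extend
        acc &= (1 << nbits) - 1
    return out
```

Four 14-bit samples fit in 7 bytes instead of 8, a 12.5% saving, at the cost of this extra shifting on both ends of the link.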

The resolution you get with a particular bit width depends on how much of the dynamic range is used. For example, suppose your ADC is configured so that the maximum and minimum 16-bit codes occur at 1 Vpp, but the microphone is amplified so that speech only produces a maximum of 250 mVpp. Then that's the same resolution as 14-bit sound that spans the entire code range (1 Vpp / 250 mV = 4 = 2², so two bits go unused). But it has the headroom to accommodate louder speech if needed, and clipping (sound louder than the ADC has a code for) is bad. Getting just the right amplification is actually a bit tricky. The problem is MUCH worse with open mikes, because the sound level varies a lot with distance.
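The effective-resolution and gain trade-off can be sketched in a few lines. The function names and the headroom figure are illustrative assumptions, not a standard API.

```python
import math


def effective_bits(adc_bits: int, full_scale_vpp: float, signal_vpp: float) -> float:
    """Bits of resolution actually exercised when the signal spans
    only part of the ADC's input range."""
    return adc_bits - math.log2(full_scale_vpp / signal_vpp)


def gain_for_headroom(full_scale_vpp: float, loudest_vpp: float,
                      headroom_db: float) -> float:
    """Voltage gain that puts the loudest expected speech `headroom_db`
    below the ADC's full scale."""
    target_vpp = full_scale_vpp / 10 ** (headroom_db / 20)
    return target_vpp / loudest_vpp
```

`effective_bits(16, 1.0, 0.25)` reproduces the 14-bit figure above. Each 6 dB of headroom (a factor of 2 in amplitude) costs one bit. For example, if the loudest speech were 0.125 Vpp (a made-up figure), keeping 6 dB of headroom would need a gain of about 4, leaving peaks at 0.5 Vpp and 15 effective bits.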

Sceadwian · Dec 30, 2009

You completely skipped signal-to-noise ratio...
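Here is signal-to-noise ratio in numbers. This sketch measures SNR from RMS levels and compares it with the textbook ceiling for an ideal N-bit ADC driven by a full-scale sine, about 6.02·N + 1.76 dB. That ceiling only covers quantization. Acoustic noise picked up by the mic adds on top of it, so the SNR you actually measure can be far below the ADC's ideal figure.

```python
import math


def rms(samples: list) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(signal: list, noise: list) -> float:
    """SNR in dB from separate recordings of signal and noise."""
    return 20 * math.log10(rms(signal) / rms(noise))


def ideal_quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit ADC for a full-scale sine wave."""
    return 6.02 * bits + 1.76
```

An ideal 16-bit converter tops out near 98 dB and a 14-bit one near 86 dB. If background noise in the room is only 20 dB below the speech, those extra bits buy very little.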

Smartie · Dec 30, 2009

This is going to be a lot harder than I thought... but at least it's worth a try.

Wouldn't Windows Speech Recognition be able to cope with the noise?

Sceadwian · Dec 30, 2009

Smartie, why do you think we don't all have computers and houses that light up when we calmly say 'lights' while walking into a room? Or why do you see people shouting 'call Steve' into a phone three times, then dialing manually and never using the voice features again? =O

NOTHING works with noise. You have to have a good signal to work with; structured background information that we would call 'noise' is, in fact, information to a PC.

The best possible voice recognition method I can think of would use a throat mic. These basically pick up vibrations directly from the voice box, but they lose all of the information the mouth adds, so it'd require a very tightly defined dictionary of easily distinguishable words.

Smartie · Dec 30, 2009

Sceadwian said:
Smartie, why do you think we don't all have computers and houses that light up when we calmly say 'lights' while walking into a room? Or why do you see people shouting 'call Steve' into a phone three times, then dialing manually and never using the voice features again? =O

Ah, I get ya. I'll start Googling methods to reduce noise.

Sceadwian · Dec 30, 2009

You'll never find what you're looking for.
As I stated, what YOU consider noise (background vibration, other people talking, etc.) is information to the PC, and you can't eliminate it without also eliminating the original signal. Active noise cancellation is a possibility, but it still requires isolating the 'data' source from the 'noise' source. So you need a mic at the mouth with strong attenuation of distant sound, and another one farther away with more general pickup, so you can selectively combine the two to get the best results.
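That two-mic arrangement is the classic adaptive noise canceller. Here is a minimal sketch using the normalized LMS (NLMS) algorithm. It assumes the distant "reference" mic hears the noise but almost none of the speech. The filter learns how the noise reaches the mouth mic and subtracts its estimate, and what is left over is the speech. The tap count, step size and synthetic test signals are illustrative assumptions, not tuned values.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.05, eps=1e-3):
    """Normalized LMS adaptive noise canceller.

    primary:   mouth mic = speech + noise that leaked into it
    reference: distant mic = noise, correlated with the leaked noise
    Returns the error signal, i.e. the cleaned-up speech estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps                      # recent reference samples, newest first
    output = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        error = p - noise_estimate              # the part the noise model can't explain
        power = sum(x * x for x in history) + eps
        step = mu * error / power               # normalized step size
        weights = [w + step * x for w, x in zip(weights, history)]
        output.append(error)
    return output


def make_test_signals(n=5000, seed=1):
    """Synthetic data: a tone standing in for speech, plus white noise
    reaching the mouth mic through an assumed two-tap acoustic path."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    leak = [0.6 * noise[k] + 0.3 * (noise[k - 1] if k else 0.0) for k in range(n)]
    primary = [s + l for s, l in zip(speech, leak)]
    return speech, primary, noise
```

The catch is exactly the one described above: if the reference mic also picks up the speech, the filter will happily cancel the speech along with the noise.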
