It all depends on what range of words/sounds you want to recognise and whether you need a real-time system or not.
The HM2007 is limited to recognising 40 words, each up to about 1 second long (or 20 words of up to about 2 seconds each).
If you only need to distinguish between, say, "go" and "stop", the processing can be relatively simple and could be handled by an 8051. Handling a very large vocabulary, especially in real time, would probably be beyond the capabilities of the 8051.
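
To give a feel for how simple the two-word case can be, here is a rough sketch in C of one possible approach: counting zero crossings at the start of the utterance. This is my own illustration, not a tested recogniser. It assumes the first part of the word has already been captured as signed 8-bit samples, and it relies on the idea that the "s" in "stop" is noisy (many zero crossings) while the start of "go" is voiced (few). The names `count_zero_crossings` and `classify_onset` and the threshold value are hypothetical; a real threshold would have to be tuned for your microphone, sample rate and speaker.

```c
#include <stddef.h>
#include <stdint.h>

enum word { WORD_GO, WORD_STOP };

/* Hypothetical threshold: zero crossings per 100 samples.
 * Assumed value for illustration only; tune it on real recordings. */
#define ZCR_THRESHOLD_PER_100 25u

/* Count sign changes between consecutive samples. */
unsigned count_zero_crossings(const int8_t *samples, size_t n)
{
    unsigned count = 0;
    size_t i;

    for (i = 1; i < n; i++) {
        if ((samples[i - 1] < 0) != (samples[i] < 0))
            count++;
    }
    return count;
}

/* Classify the onset of an utterance as "stop" (high zero-crossing
 * rate) or "go" (low rate). Integer arithmetic only, no division,
 * which suits a small 8-bit part. */
enum word classify_onset(const int8_t *onset, size_t n)
{
    unsigned long zc = count_zero_crossings(onset, n);

    /* Compare zc / n against THRESHOLD / 100 by cross-multiplying. */
    if (zc * 100ul > (unsigned long)ZCR_THRESHOLD_PER_100 * n)
        return WORD_STOP;
    return WORD_GO;
}
```

Something along these lines needs only a few bytes of state and one pass over the samples, which is why a two-word vocabulary is plausible on an 8051, whereas matching against many stored word templates in real time is a much heavier job.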