It's not the speed, it's the instruction set. The F4 uses a Cortex-M4 core, which has an extended DSP instruction set.
I don't use STM32, but NXP offers some DSP libraries for its LPC Cortex-M3 devices. You might want to look for something similar for the STM32 Cortex-M3.
Even assuming you can find the libraries, I have no DSP experience, so I can't say whether the F3 would be fast enough, with the libraries alone, for what you want to do.
I know that the F4 has a single-cycle multiply-accumulate (MAC) operation, which is very important for DSP, and that the F3 doesn't have it. On the other hand, I only need to do a simple audio project with digital filters.
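To show where the MAC operation matters in a "simple audio project with digital filters": the inner loop of an FIR filter is one multiply-accumulate per tap, per sample. Below is a minimal sketch, assuming Q15 fixed-point samples (my assumption: a Cortex-M3 has no FPU, so fixed point is usually a better fit than float there). The names fir_q15_t, fir_q15_init, fir_q15_step and NUM_TAPS are hypothetical and not from ST's or Keil's libraries.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 4  /* hypothetical filter length */

typedef struct {
    const int16_t *coeffs;      /* Q15 coefficients; coeffs[0] applies to the newest sample */
    int16_t history[NUM_TAPS];  /* delay line; history[0] is the newest input sample */
} fir_q15_t;

static void fir_q15_init(fir_q15_t *f, const int16_t *coeffs)
{
    f->coeffs = coeffs;
    for (size_t k = 0; k < NUM_TAPS; k++)
        f->history[k] = 0;
}

/* Filters one input sample and returns one output sample (both Q15). */
static int16_t fir_q15_step(fir_q15_t *f, int16_t x)
{
    /* Shift the delay line by one (simple but not the fastest; a circular buffer avoids the copy). */
    for (size_t k = NUM_TAPS - 1; k > 0; k--)
        f->history[k] = f->history[k - 1];
    f->history[0] = x;

    int64_t acc = 0;  /* wide accumulator so the sum of Q30 products cannot overflow */
    for (size_t k = 0; k < NUM_TAPS; k++)
        acc += (int32_t)f->coeffs[k] * f->history[k];  /* one multiply-accumulate per tap */

    acc >>= 15;  /* Q30 sum back to Q15 */
    if (acc > INT16_MAX) acc = INT16_MAX;  /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

You'd call fir_q15_step once per incoming audio sample. For example, four taps of 8192 (0.25 in Q15) give a 4-point moving average, which is a crude low-pass filter. The same C compiles for either core; the difference is how many cycles each tap costs, and on the M4 the DSP extension can also process two 16-bit taps per instruction.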
See the STM32 DSP library from ST and the CMSIS DSP Software Library from Keil:
Processor Support
The library is completely written in C and is fully CMSIS compliant. High performance is achieved through maximum use of Cortex-M4 intrinsics.
The supplied library source code also builds and runs on the Cortex-M3 and Cortex-M0 processor, with the DSP intrinsics being emulated through software.
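To get a feel for what "DSP intrinsics being emulated through software" costs on a Cortex-M3, here is a sketch of a plain-C stand-in for the Cortex-M4 SMLAD instruction (signed multiply-accumulate dual: two 16x16 multiplies added to a 32-bit accumulator in one instruction). This is my own illustration, not the actual CMSIS code; smlad_soft, pack_q15 and dot_q15 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software stand-in for SMLAD:
 * acc + x[15:0]*y[15:0] + x[31:16]*y[31:16], with signed 16-bit lanes. */
static int32_t smlad_soft(uint32_t x, uint32_t y, int32_t acc)
{
    /* Split each 32-bit word into two signed 16-bit lanes. */
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);

    /* Two separate 16x16 multiplies; each product fits in 32 bits. */
    int32_t p_lo = (int32_t)x_lo * y_lo;
    int32_t p_hi = (int32_t)x_hi * y_hi;

    /* Add in unsigned arithmetic so an overflow wraps instead of being
     * undefined behaviour (the final cast back wraps on GCC and Clang). */
    return (int32_t)((uint32_t)acc + (uint32_t)p_lo + (uint32_t)p_hi);
}

/* Packs two Q15 samples into one word: lo in bits 15..0, hi in bits 31..16. */
static uint32_t pack_q15(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Dot product of two Q15 vectors, two taps per smlad_soft call.
 * n should be even; a trailing odd element is ignored. */
static int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        acc = smlad_soft(pack_q15(a[i], a[i + 1]), pack_q15(b[i], b[i + 1]), acc);
    return acc;
}
```

On an M4 each smlad_soft call could be a single instruction; on an M3 it becomes lane extraction, two multiplies and two adds. That overhead is the price of running the same library source on a core without the DSP extension, so whether the M3 is fast enough depends on your sample rate and filter length.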