You may already know this, but...
Digitize the audio with a sample clock whose frequency is more than twice the bandwidth of your audio, and use that same clock to drive a counter with enough bits to address as much memory as you need (or have). Think of the memory address space as a circle, like a clock face. Think of the counter output as a rotating pointer (the write pointer) which zooms around and around the circle, storing each audio sample as it goes. Generate a read pointer that rotates at the same rate but trails the write pointer by a variable offset, using adders or a second counter. The read pointer chases the write pointer around the circle, and the delay is set by the write-to-read offset: delay = offset (in samples) / sample rate.
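If it helps to see the pointer-chasing idea in software first, here is a rough Python sketch of it. This is only a model of the scheme described above, not a hardware design: the names DelayLine, process, and delay_seconds are made up for illustration, and the memory is just a list standing in for the RAM.

```python
class DelayLine:
    """Software model of a circular-memory audio delay (illustrative only)."""

    def __init__(self, size, offset):
        # size: number of memory locations (the "circle")
        # offset: how many samples the read pointer trails the write pointer
        if not 0 <= offset < size:
            raise ValueError("offset must satisfy 0 <= offset < size")
        self.mem = [0] * size      # stands in for the RAM, initially silent
        self.size = size
        self.offset = offset
        self.write_ptr = 0         # stands in for the address counter

    def process(self, sample):
        """Store one input sample and return the delayed output sample."""
        self.mem[self.write_ptr] = sample
        # Read pointer = write pointer minus offset, wrapped around the circle.
        read_ptr = (self.write_ptr - self.offset) % self.size
        out = self.mem[read_ptr]
        # Advance the write pointer; a hardware counter wraps on its own.
        self.write_ptr = (self.write_ptr + 1) % self.size
        return out


def delay_seconds(offset, sample_rate_hz):
    """Delay produced by a given pointer offset at a given sample rate."""
    return offset / sample_rate_hz


if __name__ == "__main__":
    line = DelayLine(size=8, offset=3)
    print([line.process(x) for x in [1, 2, 3, 4, 5, 6]])
    print(delay_seconds(4800, 48000))
```

With size=8 and offset=3, the first three outputs are the initial zeros in memory and the input then comes back out three samples late. Changing line.offset between calls changes the delay, which is the job the adder setting (or a microcontroller) would do in hardware.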
You will probably want to control the pointer offset (delay) with a microcontroller, although you could do it with standard logic parts. Static RAM would probably be a lot simpler to control than DRAM (computer memory), because DRAM imposes a minimum clock frequency and needs periodic refresh. Some of the other trolls here may be able to refute this - it's not my strong suit.