Welcome, Odyssey25!
I was tasked, a few years back, with developing a saltwater marsh display for a US Forest Service public facility. It featured eleven examples of wildlife, for each of which 6th graders had recorded a short verbal explanation (not to exceed 20 seconds). Each recording had to be individually activated, and the display had to be "people proof" and stand-alone (no external power).
While this, once constructed, might not suit your unit dimension specs, it might at least give you something to think about.
Essentially, I took 11 separate voice recorder/playback modules, very much like this one:
**broken link removed**
A simple unit-selection switch applied power to only the selected unit. Pressing a second push-button switch (common to ALL the units) then triggered playback on whichever unit was powered, and its output went to a common speaker (appropriately matched).
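If it helps to see that switching logic spelled out, here is a rough Python model of it. This is only a simulation of the wiring idea, not anything from the actual build: the `MarshDisplay` class, its method names, and the clip labels are all made up for illustration.

```python
class MarshDisplay:
    """Toy model: a selector powers one module; a shared button plays it.

    All names here are hypothetical and only illustrate the wiring logic.
    """

    def __init__(self, clips):
        self.clips = list(clips)  # one recorded clip per playback module
        self.selected = None      # index of the powered module (None = all off)

    def select(self, index):
        """Turn the selector switch: power reaches exactly one module."""
        if not 0 <= index < len(self.clips):
            raise ValueError("no module at that selector position")
        self.selected = index

    def press_play(self):
        """Press the shared play button.

        The button is wired to every module, but only the powered
        module can respond, so only its clip reaches the speaker.
        """
        if self.selected is None:
            return None  # nothing powered, so nothing plays
        return self.clips[self.selected]


# Eleven modules, one per wildlife recording (labels are placeholders).
display = MarshDisplay([f"clip {n}" for n in range(1, 12)])
display.select(3)            # selector positions count from 0 here
print(display.press_play())  # plays the clip held by the powered module
```

The point of the arrangement is that the play button never needs to know which unit is selected. Since unpowered modules simply can't respond, one button and one speaker can serve all eleven.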
The units I used were 9VDC units, while those noted above will work off 5VDC (check their current-draw specs).
Not, perhaps, as sophisticated as an Arduino rig, but considerably easier to construct, and the display I mentioned is still in use with the same 9VDC battery I installed 5 years ago.
<EDIT> Just had the thought of using the recordable gift card units you can find at Hallmark, drug stores, etc. They operate off small, watch-type batteries, with similar activation.