Leftyretro - I did consider doing what you suggest, but my IC (ATmega32) can only source 40mA per pin, so it's useless for directly driving any kind of large array. The parts count isn't too bad - you just need 64 high-side transistors, 8 low-side ones, a resistor for each base, and 64 more resistors for the columns. In this prototype version, I also have 9 decoder chips, which select the row and column. With serial-to-parallel outputs, that could drop to 4 chips, using four 16-bit LED drivers / SIPOs.
Multiplexing is something of a necessity with these things, I think - assuming you want to address every 'voxel' (3D pixel, apparently) of the cube individually, you can't light all the LEDs at once. If you look at the enclosed pic, the most LEDs that can be lit at any time is n, where the size of the cube is n^3. If you tried to drive several at once, lighting both LED (0,0,0) and LED (7,7,7) would also light (0,0,7) and (7,7,0). See the enclosed pic for how it is wired to see what I mean...
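To make the ghosting concrete, here is a rough Python sketch (a simulation only, not firmware). It assumes the wiring described above: every (x, y) column shares one high-side line and every z layer shares one low-side line, so an LED lights whenever both its column and its layer are switched on. The name `lit_voxels` is made up for illustration.

```python
from itertools import product

def lit_voxels(requested):
    """Return every voxel that lights when all requested voxels are driven at once.

    Assumption: each (x, y) column shares one high-side line and each z layer
    shares one low-side line, so an LED is on whenever its column AND its
    layer are both active.
    """
    active_columns = {(x, y) for x, y, z in requested}
    active_layers = {z for x, y, z in requested}
    # Every active column crosses every active layer, wanted or not.
    return {(x, y, z) for (x, y), z in product(active_columns, active_layers)}

wanted = {(0, 0, 0), (7, 7, 7)}
ghosts = sorted(lit_voxels(wanted) - wanted)
print(ghosts)  # [(0, 0, 7), (7, 7, 0)]
```

Driving one layer at a time and cycling through the layers quickly avoids the unwanted pair, which is why the multiplexing is needed.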
Charlieplexing is also possible - my original plan was to use 'googleplexing' - but I figure it adds another layer of complexity to the software side. As far as I can see, it doesn't allow you to light any more LEDs, and is only really of use if you want to drive really huge arrays. As for pin count being a limiting factor: with serial-to-parallel registers, you only need 1 pin for every 16 columns in the cube. That is, you could easily light a 16x16x16 cube with 20 pins: 16 for the serial-input registers (256 columns / 16), and 4 for a 4-to-16 decoder/demultiplexer.
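The pin arithmetic can be checked with a few lines of Python. This is only a back-of-envelope sketch under my reading of the numbers above: one data pin per 16-bit register (no daisy-chaining), binary select lines into a decoder for the layers, and shared clock/latch lines not counted. `pins_needed` is a hypothetical helper name.

```python
def pins_needed(n, bits_per_register=16):
    """Estimate control pins for an n x n x n cube.

    Assumptions: one data pin per serial-to-parallel register covering the
    n*n columns, plus binary select lines into a decoder for the n layers.
    Clock and latch lines are left out, as in the 20-pin figure.
    """
    columns = n * n
    column_pins = -(-columns // bits_per_register)  # ceiling division
    layer_select_pins = (n - 1).bit_length()        # bits needed to address n layers
    return column_pins + layer_select_pins

print(pins_needed(16))  # 20
print(pins_needed(8))   # 7
```

For the 8x8x8 case that works out to 4 register pins plus 3 select lines.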
If you do make a cube, I suggest you construct the thing using a jig and tinned copper wire as a frame for the layers. Just using the LED cathodes to join the ground planes together makes the thing too fragile, and probably looks worse as well. Also, if you do go through with it, feel free to ask me any questions, as I dare say I've hit every design hurdle on the way...!!