This should be no problem using a serial-in / parallel-out shift register with a latch.
You need only 3 of the 8 lines you have right now (data, clock and latch).
You can add as many 74HC595s as you like and still control them using only three lines. One chip gives you 8 outputs, so for 40 you would need five chips.
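To make the three-line idea concrete, here is a rough sketch of the bit-banging sequence for five chained 74HC595s. It is only an illustration: `set_pin` is a hypothetical callback standing in for whatever GPIO or port write your platform provides, and the pin names `"data"`, `"clock"` and `"latch"` are made up for this example. Here the pin writes are just recorded in a list so the sequence can be inspected without hardware.

```python
from typing import Callable, List, Tuple

NUM_CHIPS = 5  # five 74HC595s give 5 * 8 = 40 outputs


def shift_out(outputs: List[int], set_pin: Callable[[str, int], None]) -> None:
    """Clock one bit per output into the chain, then pulse the latch."""
    # The first bit shifted in ends up on the last output of the last chip,
    # so the highest-numbered output is sent first.
    for bit in reversed(outputs):
        set_pin("data", bit)
        set_pin("clock", 1)  # the 74HC595 samples data on the rising clock edge
        set_pin("clock", 0)
    set_pin("latch", 1)  # rising edge copies the shift register to the output pins
    set_pin("latch", 0)


# Stand-in for real GPIO (hypothetical): record every pin write.
log: List[Tuple[str, int]] = []


def record(pin: str, value: int) -> None:
    log.append((pin, value))


pattern = [0] * (NUM_CHIPS * 8)
pattern[0] = 1  # turn on only the first output of the first chip
shift_out(pattern, record)
```

The outputs only change when the latch line is pulsed, so the 40 bits can be shifted in at any speed without the outputs flickering through intermediate states.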
There is a bargraph LED driver (serial in, 36 outputs) that could also be used. I can't recall the name offhand, but a search should find it.
Each line could then control 36 outputs.
This will do up to 128 outputs. Use five '573s for 40 outputs.
Start with C0 high and C1 low (levels at the pin, not the register bits)
Put the address of the '573 on the data bus
Toggle C1 high then low
Put data on data bus
Toggle C0 low then high
Not yet tested, parts on order
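The write sequence above could be sketched roughly like this. This is untested against the actual circuit and makes assumptions: `set_bus` and `set_pin` are hypothetical callbacks for driving the data bus and the C0/C1 control lines, and the levels passed to `set_pin` are meant as pin levels, so if your port hardware inverts a control line the callback has to account for that. The calls are recorded in a list here instead of touching real hardware.

```python
from typing import Callable, List, Tuple


def write_latch(
    address: int,
    data: int,
    set_bus: Callable[[int], None],
    set_pin: Callable[[str, int], None],
) -> None:
    """Write one byte to the '573 selected by `address`, following the posted steps."""
    # Idle state: C0 high, C1 low (levels at the pin).
    set_pin("C0", 1)
    set_pin("C1", 0)
    # Put the address of the '573 on the data bus, then toggle C1 high then low.
    set_bus(address)
    set_pin("C1", 1)
    set_pin("C1", 0)
    # Put the data on the data bus, then toggle C0 low then high.
    set_bus(data)
    set_pin("C0", 0)
    set_pin("C0", 1)


# Stand-in for real port writes (hypothetical): record each action in order.
events: List[Tuple[str, int]] = []

write_latch(
    3,
    0xA5,
    lambda value: events.append(("bus", value)),
    lambda pin, level: events.append((pin, level)),
)
```

For 40 outputs you would call `write_latch` once per '573, five calls in total, each with its own address and data byte.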