Hi,
You might start by looking up the "Butterfly Algorithm". That might help get you started.
To be of any more help i'd have to review this myself as the last time i did a DCT was for decoding a Jpg image file, and that was probably 15 years ago. By using these methods you can reduce the calculations greatly. I was able to reduce the Jpg decoder DCT to use just five constants and i think two of them could be calculated from the other three. The number of calculations are thus reduced.
If i remember right there is a transformation you can do to convert from FFT to DCT, but again i'd have to review that myself to be of more help.
BTW, "Null multiplicative complexity" would refer to the fact that the problem can be solved by using math operations that are of a lower level than multiplication, such as addition and maybe bitwise operations. Put more simply, no multiplications are needed when you only have 1's and -1's and 0's.
In the past, algorithms used to be classified as to the number of multiplications they required ignoring lower level math operations because multiplications took so much longer than additions and subtractions. Today that's not true for light engineering but for more complicated problems more precision is often needed so the standard floating point units can no longer do it all. Every time the precision is doubled, the direct multiplication operations for a single full width multiplication goes up by four times and the full width additions for that one multiplication goes up by two times which is really the same as four direct additions. So the number of multiplications can still be an important dominating factor in algorithm execution time. If you dont have to do any multiplications you're very happy with your algorithm.
The number of stored constants used to be a factor also, but i bet that's not too much of an issue for most regular problems anymore due to the way computer memory progressed over the years.