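typically, multiplexing only decreases pin count once you get above 5 switches. you wire the switches as a grid, and the IO count is rows + columns. each side is about the square root of the number of switches (rounded up, of course), so the total is roughly twice the square root.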
with 3 switches, you have a 3x1 matrix, i.e. not even really multiplexed: that's 4 IO, one more than wiring each switch to its own pin. at 4 switches, you would go with a 2x2, which is 4 IO. at 5 switches, you use a 3x2, and use 5 IO. at 6 switches, you still use a 3x2, so you still only need 5 IO, and that's the first point where you actually save a pin. from then on the savings keep growing: 16 switches fit in a 4x4 (8 IO instead of 16) and 25 in a 5x5 (10 IO instead of 25).
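if you want to check the numbers for any switch count, here's a quick python sketch that brute-forces the smallest rows x cols grid. it assumes direct wiring costs one IO per switch (with a shared ground), and `matrix_io` is just a made-up helper name for illustration.

```python
import math


def matrix_io(n_switches: int) -> tuple[int, int, int]:
    """Return (rows, cols, io) for the rows x cols grid that holds
    n_switches using the fewest IO, where io = rows + cols."""
    if n_switches < 1:
        raise ValueError("need at least one switch")
    best = None
    for rows in range(1, n_switches + 1):
        # fewest columns that still fit every switch (integer ceiling division)
        cols = (n_switches + rows - 1) // rows
        io = rows + cols
        if best is None or io < best[2]:
            best = (rows, cols, io)
    return best


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 16, 25):
        rows, cols, io = matrix_io(n)
        saved = n - io  # assumption: direct wiring uses one IO per switch
        print(f"{n:2} switches: {rows}x{cols} grid, {io} IO, saves {saved}")
```

for 6 switches it picks 2x3 rather than 3x2, which is the same grid turned sideways. the minimum always works out to ceil(2*sqrt(n)), which is where the "roughly twice the square root" comes from.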
*edit* i've never looked at these diode methods, perhaps it's possible with those somehow.