That link you posted is right but it's probably confusing.
Instead of using a formula, understand how it works. An 8051 machine cycle is 12 clock cycles, and the timer increments once per machine cycle.
To keep it simple, first assume a 12MHz crystal, so that a machine cycle takes exactly 1us. If you load the 16-bit timer with zero, it counts from 0 to 65535 and overflows on the next count, which takes 65536us. That's the max delay.
So to get a delay of 5ms, you subtract 5000 from 65536 and load the timer with the result, 60536 (0xEC78). That's all.
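If it helps to see the arithmetic written out, here is a minimal sketch in plain C. The function name timer_reload and its parameters are made up for illustration, and it assumes the timer is in 16-bit mode and the requested delay fits in one run of the timer (1 to 65536 ticks).

```c
#include <stdint.h>

/* Hypothetical helper: reload value for a 16-bit 8051 timer.
 * crystal_hz is the crystal frequency, delay_us the wanted delay.
 * Assumes the delay works out to between 1 and 65536 timer ticks. */
uint16_t timer_reload(uint32_t crystal_hz, uint32_t delay_us)
{
    /* The timer ticks once per machine cycle (12 clock cycles),
     * so ticks = delay_us * (crystal_hz / 12) / 1000000. */
    uint32_t ticks = (uint32_t)(((uint64_t)delay_us * crystal_hz) / 12000000ULL);

    /* The timer overflows when it rolls over past 65535,
     * so start it 'ticks' counts short of 65536. */
    return (uint16_t)(65536UL - ticks);
}
```

With a 12MHz crystal, timer_reload(12000000, 5000) gives 60536, so the high byte to load is 0xEC and the low byte is 0x78. With an 11.0592MHz crystal the same 5ms works out to 4608 ticks, so the reload value is 60928 (0xEE00).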