Are you referring to what happens when you first turn on a television set? If so, the audio circuits power up almost instantly. The video, on the other hand, needs the high voltage to build up before the picture tube can form its electron "beam", and that takes a bit longer than the audio circuits. This applies to CRT televisions and some plasma sets.
Both the audio and the video are modulated onto a radio signal that travels at about 300,000,000 m/s, so they arrive at the set together. What you see and hear depends on how the signals are processed inside the TV. At normal viewing distances, the difference between the acoustic delay (speaker to the viewer's ears) and the optical delay (screen to the viewer's eyes) is imperceptible to humans.
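To put numbers on that last point, here is a quick back-of-the-envelope calculation. It assumes sound travels at about 343 m/s (air at roughly 20 °C) and uses the vacuum speed of light for the picture; the viewing distances are just illustrative.

```python
# Rough comparison of sound vs. light travel time from a TV to a viewer.
SPEED_OF_SOUND = 343.0        # m/s, air at about 20 °C (assumed)
SPEED_OF_LIGHT = 299_792_458  # m/s in vacuum; air is close enough here

def delays_ms(distance_m: float) -> tuple[float, float]:
    """Return (acoustic_delay_ms, optical_delay_ms) over distance_m."""
    acoustic = distance_m / SPEED_OF_SOUND * 1000.0
    optical = distance_m / SPEED_OF_LIGHT * 1000.0
    return acoustic, optical

# Illustrative living-room distances.
for d in (2.0, 3.0, 5.0):
    a, o = delays_ms(d)
    print(f"{d:.0f} m: sound {a:.1f} ms, light {o:.6f} ms, lag {a - o:.1f} ms")
```

At 3 m the sound arrives roughly 9 ms after the light, while the light delay itself is about 10 nanoseconds. Commonly cited thresholds for noticing audio/video misalignment are on the order of tens of milliseconds, which is why the gap goes unnoticed at normal viewing distances.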
FYI: TV systems typically have facilities for synchronizing sound and picture. This is needed not because the two propagate differently, but because they go through different processing delays. In modern digital cable transmission, sound and picture are sent as separate packets and synchronized using timestamps embedded in the packets. It doesn't always work; sometimes you can see the sync drift off.
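Here is a simplified model of how timestamp-based sync absorbs different processing delays, and how it can fail. It assumes MPEG-2-style presentation time stamps (PTS) counted on a 90 kHz clock. The `Unit` class, the per-unit `decode_ms` values, and the fixed `buffer_ms` presentation delay are hypothetical simplifications. Real receivers also recover the broadcaster's clock (the PCR in MPEG-2 transport streams) and manage their buffers dynamically.

```python
from dataclasses import dataclass

PTS_CLOCK_HZ = 90_000  # MPEG-2 system timestamps tick at 90 kHz

@dataclass
class Unit:
    kind: str         # "audio" or "video"
    pts: int          # presentation time stamp, in 90 kHz ticks
    decode_ms: float  # hypothetical processing delay inside the TV

def present_times_ms(units, buffer_ms):
    """Return {(kind, pts): actual presentation time in ms}.

    Each unit is scheduled for its PTS time plus a fixed buffer_ms.
    If decoding takes longer than the buffer, the unit is shown late,
    as soon as decoding finishes. Simplification: each unit arrives
    exactly at its own PTS time.
    """
    out = {}
    for u in units:
        arrival_ms = u.pts * 1000.0 / PTS_CLOCK_HZ
        scheduled = arrival_ms + buffer_ms
        ready = arrival_ms + u.decode_ms
        out[(u.kind, u.pts)] = max(scheduled, ready)
    return out

def av_skew_ms(units, buffer_ms):
    """Audio minus video presentation time for each shared PTS.

    Negative values mean the sound plays before the matching picture.
    """
    t = present_times_ms(units, buffer_ms)
    return [t[("audio", p)] - t[("video", p)]
            for (k, p) in t if k == "video" and ("audio", p) in t]

# Audio decodes quickly; video takes much longer (illustrative values).
units = [
    Unit("audio", 0, 5.0), Unit("video", 0, 40.0),
    Unit("audio", 3000, 5.0), Unit("video", 3000, 40.0),  # +33.3 ms
]
print(av_skew_ms(units, buffer_ms=50.0))  # buffer covers both delays
print(av_skew_ms(units, buffer_ms=20.0))  # video decode overruns buffer
```

With a 50 ms buffer both streams are presented exactly at their timestamps, so the skew is zero despite the different decode times. With a 20 ms buffer the video frames miss their scheduled time and the sound leads the picture by about 20 ms, which is the kind of drift you occasionally see on screen.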