I don't think the VHDL code is concerned with how the bits get to the input of your chip. It is only concerned with what happens after they arrive. The reference to 24 bits was about building one colored pixel from three 8-bit values, each of which could be written to a digital-to-analog converter controlling the red, green, or blue gun aimed at the screen.
On the other hand, if you have a monochrome screen (aka black and white), then a single 8-bit byte could represent 256 shades of gray.
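To make the 24-bit idea concrete, here is a rough sketch in Python (not VHDL, just to show the bit layout). The function names are made up for illustration, and putting red in the top byte is my assumption; your hardware may want a different order.

```python
def pack_rgb(red, green, blue):
    """Pack three 8-bit values into one 24-bit pixel.

    Assumed layout: red in bits 23-16, green in bits 15-8, blue in bits 7-0.
    """
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each color value must fit in 8 bits")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Split a 24-bit pixel back into its (red, green, blue) bytes."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def gray_to_rgb(level):
    """Show one 8-bit gray level on a color screen by driving all three guns equally."""
    return pack_rgb(level, level, level)
```

For example, `hex(pack_rgb(255, 128, 0))` gives `'0xff8000'`, and an 8-bit gray level can take any of the 256 values from 0 to 255.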
It might help if you did some research into various video formats so that you can evaluate the alternatives. It's pretty hard to write code before you define your objective and the requirements.
Back-of-the-envelope calculation:
Code:
Assume a screen is 640 pixels across and 480 pixels tall
This is 307,200 pixels for the whole screen.
Using 24 bits per pixel gives 7,372,800 bits in a single image
At 30 frames per second this is 221,184,000 bits per second
That's about 4.5 nanoseconds per bit, which is smokin' along
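If you want to try other screen sizes or bit depths, the same arithmetic is easy to script. This is only a sketch in Python; the function name and the returned keys are my own, and it assumes the bits go out one at a time with no blanking intervals or other overhead.

```python
def video_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Repeat the back-of-the-envelope numbers for any video format.

    Assumes raw, uncompressed pixels sent serially with no blanking time.
    """
    pixels_per_frame = width * height
    bits_per_frame = pixels_per_frame * bits_per_pixel
    bits_per_second = bits_per_frame * frames_per_second
    return {
        "pixels_per_frame": pixels_per_frame,
        "bits_per_frame": bits_per_frame,
        "bits_per_second": bits_per_second,
        "ns_per_bit": 1e9 / bits_per_second,  # time available for each bit
    }


if __name__ == "__main__":
    color = video_bit_rate(640, 480, 24, 30)      # the case worked out above
    mono = video_bit_rate(640, 480, 8, 30)        # 8-bit grayscale alternative
    for name, result in (("24-bit color", color), ("8-bit gray", mono)):
        print(f"{name}: {result['bits_per_second']:,} bits/s, "
              f"{result['ns_per_bit']:.2f} ns per bit")
```

Dropping to 8-bit grayscale cuts the bit rate to a third, which is the kind of trade-off worth settling before writing any VHDL.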