There are two basic ways to interface to a graphic LCD, especially the larger multi-colour ones. Some have a processor friendly interface such as I2C, SPI or parallel, and you feed them simple commands. These need inbuilt memory so they can hold a copy of what should be on the screen, and the LCD board carries a controller chip that refreshes the display from that memory many times a second.
Alternatively some screens have a raw pixel interface which you must feed with a simple video signal at a fast rate. This makes it possible to do video but is much harder to work with and your processor needs enough RAM to keep a copy of the screen at all times. A lot of processor cycles will be taken up keeping the screen refreshed unless you have dedicated circuitry in the processor (such as the inbuilt display driver on the 24F series PICs, which is designed to work with these).
The first is common for small to medium screens, often in monochrome but sometimes with colour. The second is more common on medium to large pixel count displays such as STN/TFT screens. Your screen is definitely the first type and therefore it will definitely have memory.
This means that you do not need to keep an entire copy of the screen at once in your chip. Happy news! However you still need space to store any image data which you will send to the screen.
With regard to storing this data I suggested 3 bytes per pixel as this keeps the code simple. If you pack your 5,6,5 data you need two bytes (not one, since 16 bits will not fit in 8), and you end up with an arrangement like RRRRRGGG GGGBBBBB. I have not checked your datasheet in detail but it usually makes sense to store it in the same way the GLCD expects to receive it. However what if you want to modify a pixel? You either have to unpack the three values with shifts and masks, change them and pack them again, or modify them in place by masking out the bits of one colour, and note that green straddles both bytes.
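To show what that unpacking involves, here is a minimal sketch in C. It assumes the RRRRRGGG GGGBBBBB layout described above held in a uint16_t; the function names are made up for illustration and are not from any library or datasheet.

```c
#include <stdint.h>

/* Pack 5 bit red, 6 bit green and 5 bit blue into RRRRRGGG GGGBBBBB.
   Inputs are masked so out-of-range values cannot spill into a neighbour. */
static uint16_t rgb565_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((uint16_t)(r & 0x1F) << 11) |
                      ((uint16_t)(g & 0x3F) << 5)  |
                       (uint16_t)(b & 0x1F));
}

/* Unpack each channel again with a shift and a mask. */
static uint8_t rgb565_red(uint16_t p)   { return (uint8_t)((p >> 11) & 0x1F); }
static uint8_t rgb565_green(uint16_t p) { return (uint8_t)((p >> 5) & 0x3F); }
static uint8_t rgb565_blue(uint16_t p)  { return (uint8_t)(p & 0x1F); }

/* Modify in place: clear the six green bits (mask 0x07E0), then OR in
   the new value. Red and blue are left untouched. */
static uint16_t rgb565_set_green(uint16_t p, uint8_t g)
{
    return (uint16_t)((p & ~0x07E0u) | ((uint16_t)(g & 0x3F) << 5));
}
```

On an 8 bit processor each of those 16 bit shifts costs several instructions, which is the overhead you pay every time you touch a packed pixel.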
Packing like that is less code efficient than simply storing NNNRRRRR NNGGGGGG NNNBBBBB in three bytes, where N is an unused bit. You then have the raw value in each byte and can operate on it directly using the mathematical opcodes your processor has. Many 8 bit uCs have addition and subtraction in a single cycle, some even have multiplication and division. The three byte layout is more code efficient, but it needs 50% more memory than the packed layout (three bytes per pixel instead of two) for whatever image data you store.
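As a rough sketch of the three byte approach in C: each channel sits in its own byte, so changing it is plain 8 bit arithmetic, and the packing only happens at the moment you send the pixel. The type and function names here are hypothetical, and sending the RRRRRGGG byte first is an assumption, so check the byte order against your datasheet.

```c
#include <stdint.h>

/* One pixel stored unpacked: NNNRRRRR NNGGGGGG NNNBBBBB. */
typedef struct {
    uint8_t r;  /* 0..31 */
    uint8_t g;  /* 0..63 */
    uint8_t b;  /* 0..31 */
} pixel3_t;

/* Add to one channel and clamp at its maximum instead of wrapping.
   The (sum < value) check catches overflow of the 8 bit addition. */
static uint8_t channel_add(uint8_t value, uint8_t amount, uint8_t max)
{
    uint8_t sum = (uint8_t)(value + amount);
    return (sum > max || sum < value) ? max : sum;
}

/* Build the two bytes to send, assuming the display wants
   RRRRRGGG first and GGGBBBBB second. */
static void pixel3_to_bytes(const pixel3_t *p, uint8_t out[2])
{
    out[0] = (uint8_t)(((p->r & 0x1F) << 3) | ((p->g & 0x3F) >> 3));
    out[1] = (uint8_t)(((p->g & 0x07) << 5) | (p->b & 0x1F));
}
```

Brightening red, for example, is then just p.r = channel_add(p.r, 4, 31); with no shifting or masking until the pixel goes out to the display.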
Hope this helps.