There are two kinds of data write in the driver: page-aligned and non-page-aligned. In both cases I pump out one byte at a time directly from the array.
Page-aligned writes are simple: I iterate over the bytes of the letters (see note) and output them. Each byte from the array is sent directly to the LCD with no masks and no shifts.
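To make the page-aligned case concrete, here is a minimal sketch in C. It is not the driver's actual code: lcd_write_data and write_page_aligned are made-up names, the port write is replaced by a small capture buffer so it can run on a PC, and for simplicity it walks the bytes in order, which the real array layout does not do (see note).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real data port write, which is hardware specific.
 * Here it only records the bytes so the sketch can run on a PC. */
static uint8_t sent[16];
static size_t sent_count;

static void lcd_write_data(uint8_t value)
{
    if (sent_count < sizeof sent)
        sent[sent_count++] = value;
}

/* Page-aligned write: each array byte already holds one 8-pixel column
 * of a page, so it goes out unchanged (no mask, no shift). */
static void write_page_aligned(const uint8_t *bytes, size_t count)
{
    for (size_t i = 0; i < count; i++)
        lcd_write_data(bytes[i]);
}
```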
Non-page-aligned writes (for example, writing text starting at vertical pixel 3, as opposed to pixels 0, 8 or 16, which are page-aligned) require me to combine two bytes from the array into one. For each physical LCD page, I must take part of the higher byte and part of the lower byte. As far as I can tell, there is no way to avoid this. Perhaps this is what you saw.
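The byte combining described above could look roughly like this. It is only a sketch under assumptions: bit 0 is taken to be the topmost pixel of a page (common on page-organised controllers, but check yours), and combine_pages and shift_column are illustrative names, not functions from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the byte for one physical LCD page from two vertically adjacent
 * glyph bytes. Assumes bit 0 is the top pixel and 1 <= shift <= 7. */
static uint8_t combine_pages(uint8_t above, uint8_t current, unsigned shift)
{
    /* The low rows of the page come from the bottom of the byte above,
     * the remaining rows come from the top of the current byte. */
    return (uint8_t)((above >> (8u - shift)) | (current << shift));
}

/* glyph_col: `pages` bytes of one pixel column, top page first.
 * out: pages + 1 bytes, one per physical page the column touches.
 * shift: vertical start pixel modulo 8 (3 in the example above). */
static void shift_column(const uint8_t *glyph_col, size_t pages,
                         unsigned shift, uint8_t *out)
{
    for (size_t p = 0; p <= pages; p++) {
        uint8_t above = (p > 0) ? glyph_col[p - 1] : 0;
        uint8_t current = (p < pages) ? glyph_col[p] : 0;
        out[p] = (shift == 0) ? current
                              : combine_pages(above, current, shift);
    }
}
```

With a single all-ones byte and a start at vertical pixel 3, the column is split into 0xF8 for the first physical page and 0x07 for the next one. Note that this sketch fills the unused bits of the first and last page with zeros; a real driver may need to preserve what is already on the display there.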
Note: In both cases the bytes are not read sequentially (byte[0], byte[1], byte[2], ..., byte[n]), and calculating the next byte's offset is indeed overhead. This is required so that the bitmap can be stored in a visual layout that is easy to edit by hand. I plan to add an option that tells the bitmap-to-text converter to generate the data with the bytes in sequential order. Such data will not be easy to edit manually, but it can be output to the LCD faster.
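To illustrate where the offset overhead comes from, here is a sketch comparing the two layouts. Both layouts are guesses made for illustration: the "visual" one assumes all glyphs sit side by side with one array row per LCD page, and the sequential one assumes each glyph's bytes are stored back to back in output order. The function and parameter names are hypothetical too.

```c
#include <stddef.h>

/* Assumed "visual" layout: glyphs side by side, one array row per LCD
 * page, `stride` bytes per row. Every byte needs a multiply and two adds. */
static size_t visual_offset(size_t stride, size_t glyph_x,
                            size_t page, size_t col)
{
    return page * stride + glyph_x + col;
}

/* Assumed sequential layout: a glyph's bytes are contiguous, so after the
 * start offset is computed once, the driver can simply increment. */
static size_t sequential_offset(size_t glyph_index, size_t glyph_bytes,
                                size_t i)
{
    return glyph_index * glyph_bytes + i;
}
```

In the sequential case the start offset only has to be computed once per glyph, after which a plain pointer increment is enough, which is where the speed-up would come from.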