It sounds like the data gathered in the experiment as described will actually be quite good if the people who might use it are interested in how long a flashlight will work. If I were a consumer, I'd appreciate data in a form that means something to me: a test that represents what I might do. On the other hand, data from a constant-current draw or other refinements might be more useful for other purposes.
I do not know whether the light output of the bulb will change significantly as it ages. That might be something to watch for, to the extent that you can. If I had six different batteries to test (label them A, B, C, D, E and F), I'd purchase three of each, then test one each of A through F, then another of A through F, then the final group of A through F. Keeping that order might help to reveal aging of the bulb: if the bulb is changing, each battery's result would tend to drift in the same direction from one round to the next.
It is worth noting that the resistance of the bulb will change with the voltage applied. Note also that the light spectrum changes with voltage, and your photocell may have different sensitivities at different wavelengths. These things are good to know but may not affect your experiment to a great degree.