Wow, lots of questions!
To Jpanhalt; You are correct that the accuracy of the second oscillator is not that critical. The chaos component exists in the shape of each mains cycle, and that chaos will not be removed by any small additional chaos in the timing of the measurement. Basically entropy + pattern = entropy, entropy + slight entropy = entropy, etc.
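As a rough illustration of "entropy + pattern = entropy", here is a small Python sketch. It is not the PIC firmware: the `secrets` module just stands in for the mains-derived bits, XOR is my assumed way of "adding" the two, and the function names and the pattern are made up for the demo. Because XOR with a known pattern can be undone exactly, the pattern cannot have removed any of the randomness.

```python
import secrets

def xor_with_pattern(random_bits, pattern):
    """XOR each random bit with a repeating, fully predictable pattern."""
    return [b ^ pattern[i % len(pattern)] for i, b in enumerate(random_bits)]

def ones_fraction(bits):
    """Fraction of bits that are 1; close to 0.5 for unbiased data."""
    return sum(bits) / len(bits)

# Stand-in entropy source (assumption: replaces the mains capture data).
random_bits = [secrets.randbits(1) for _ in range(100_000)]

# Hypothetical deterministic pattern, deliberately lopsided.
pattern = [1, 1, 0, 1, 0, 0, 0, 1]

mixed = xor_with_pattern(random_bits, pattern)

# Applying the same pattern again recovers the original bits exactly,
# so mixing in the pattern lost nothing.
recovered = xor_with_pattern(mixed, pattern)

if __name__ == "__main__":
    print("ones fraction before:", ones_fraction(random_bits))
    print("ones fraction after: ", ones_fraction(mixed))
    print("recovered == original:", recovered == random_bits)
```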
The reason I originally specified a 20MHz xtal was to run the cheap PIC at its max speed so it gave a 5MHz counter (the timer is clocked at Fosc/4). In the hardware unit I used a 20MHz ceramic resonator, which is still accurate enough for good serial comms but is less accurate than a proper xtal. If you used a better PIC that had an internal oscillator with PLL and could still give you a 5MHz timer, that would be ok too, but I wanted to stick with the cheap and very popular 16F628A that most people have and that I have used in other projects on my web page.
To Edeca; I'm familiar with the diehard and dieharder test suites, but they are tailored for very large sample sizes. I used ENT.EXE, which is good enough for smallish data sizes. The results are well within what you would expect for true random data. In fact the 3 samples that I made, roughly 100kb each, all measured good and all were significantly different to each other, which are all things to be expected from real world entropy. One of the test results is shown below;
**broken link removed**
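For anyone who wants a quick look at their own capture file without ENT, here is a minimal Python sketch of three of the figures ENT reports (entropy in bits per byte, arithmetic mean, and chi-square against a uniform byte distribution). This is my own re-implementation of the standard formulas, not ENT's code, and the function name `byte_stats` is made up for the example.

```python
import math
from collections import Counter

def byte_stats(data: bytes):
    """Return (entropy_bits_per_byte, arithmetic_mean, chi_square) for data.

    Ideal random bytes give entropy near 8.0 and a mean near 127.5.
    Chi-square is computed against a uniform distribution over 256 values.
    """
    n = len(data)
    if n == 0:
        raise ValueError("need at least one byte")
    counts = Counter(data)
    # Shannon entropy over the observed byte frequencies.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean = sum(data) / n
    expected = n / 256
    chi_square = sum(
        (counts.get(value, 0) - expected) ** 2 / expected
        for value in range(256)
    )
    return entropy, mean, chi_square

if __name__ == "__main__":
    # Perfectly uniform (but obviously not random) test data.
    sample = bytes(range(256)) * 4
    print(byte_stats(sample))
```

Note that the uniform test data in the usage example scores perfectly while being completely predictable, so these figures can only flag bad data, they cannot prove data is random.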
To Cachehiker; Actually I didn't find any repeating data, although repeating patterns do exist in the mains chaos if you look for the components (other frequencies etc). Using only the least significant bits of the data eliminates the larger patterns, in the same way you could measure the spacing of bricks in a brick wall down to the millionth of an inch, then just keep the last digit.
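The "keep the last digit" idea can be sketched in a few lines of Python. The capture counts below are invented numbers for illustration only (not real measurements), and `low_bits` is a hypothetical helper name. The point is that the large, slowly varying part of each count is thrown away and only the fine-grained remainder is kept.

```python
def low_bits(capture_count: int, n_bits: int = 3) -> int:
    """Keep only the n_bits least significant bits of a timer capture."""
    return capture_count & ((1 << n_bits) - 1)

# Hypothetical capture counts: all close to the same large value,
# differing only by a small amount from cycle to cycle.
captures = [100003, 99998, 100010, 100001]

kept = [low_bits(c) for c in captures]

if __name__ == "__main__":
    print(kept)
```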
To Alec_t; That's a great question! On my first test I almost canned the project. I was using a gentle RC filter of about 47k to 0.1uF (I think) on the PIC capture input pin, and was seeing significant biasing where the capture module was firing on odd capture counts most of the time, with count 1 (out of 256) having a rather high bias. It took me half an hour to realise that the timer1 CCP1 counter was drawing a Vdd current spike on clocking from 255 to 0, and this was dragging down Vdd by some microvolts, which was pushing the PIC CCP1 Schmitt trigger pin to clock at that instant in time.
To fix it I used the PIC comparator to detect the input voltage, dropped the mains source impedance to a few hundred ohms and removed the input cap. Then I added a larger tantalum cap to Vdd, and another on the comparator Vref. The result was an honest triggering at the exact voltage of the mains input regardless of tiny Vdd changes. The comparator output is a very fast digital output that now clocks the CCP1 capture pin instantly, not depending on any factors of the capture module. I then ran about 100kb of data and recorded the 3 least significant bits, and there was no significant bias I could detect.
Ideally all other factors will "average out in the wash", especially since the design now XORs three different and independent bits from different points in time. Ultimately this RNG data would be used by seeding it into a software pseudo RNG anyway if you want the very best data, as real RNG data can trend (for instance it could make the same byte 10 times in a row, which happens in nature, but could never happen in most pseudo RNGs).
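To show why XORing three independent bits helps, and how the result could seed a software pseudo RNG, here is a hedged Python sketch. It is not the PIC code: I am assuming the three bits are the LSBs of three successive captures, and all function names are made up for the example. If each bit is a 1 with probability 0.5 + e, the XOR of three such independent bits is a 1 with probability 0.5 + 4e^3, so a 10% bias shrinks to 0.4%.

```python
import random

def xor3_one_probability(p: float) -> float:
    """P(XOR of three independent bits is 1) when each bit is 1 with prob p."""
    q = 1.0 - p
    # The XOR is 1 when exactly one or exactly three of the bits are 1.
    return 3 * p * q * q + p ** 3

def bits_from_captures(captures):
    """One output bit per group of three captures (assumed grouping):
    the XOR of the least significant bit of each capture."""
    out = []
    for i in range(0, len(captures) - 2, 3):
        out.append((captures[i] ^ captures[i + 1] ^ captures[i + 2]) & 1)
    return out

def pack_bits(bits) -> int:
    """Pack a list of bits (most significant first) into one integer."""
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    return value

def seeded_prng(bits) -> random.Random:
    """Seed Python's software pseudo RNG from hardware-derived bits."""
    return random.Random(pack_bits(bits))

if __name__ == "__main__":
    print(xor3_one_probability(0.6))  # bias of 0.1 per input bit
    demo_bits = bits_from_captures([3, 6, 2, 1, 1, 0])  # invented captures
    prng = seeded_prng(demo_bits)
    print(demo_bits, prng.getrandbits(8))
```

In practice you would collect far more than a couple of bits before seeding, since the seed can never contain more entropy than the bits that went into it.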