Mr RB's RNG and RNGs in general

Mr RB · Nov 28, 2010

MisterT-

If you use 17 cylinders and the possible cylinder values are the 18 numbers: {1,2,3,1,2,3,4,5,7,1,2,3,6,4,1,3,11,9}

Number of unique cylinder permutations is:

18! / (4! * 3! * 4! * 2!) = 926 269 344 000

You must divide by the factorials because you have number 1 four times in the possible values, number 2 three times etc.

Excellent catch! I completely missed that and just used perms per number of cylinders.

You cannot include the pointer position in the calculations,

Good, I thought that /8000000 was funky.

If you have a million bit cache, you have 926 269 344 000 * 2^1000000 * 2^8 unique states. The last 2^8 comes from the exchanged bit history.

Another good catch! I forgot the 8bit mask, which of course includes the "held" bit. That now looks to be correct for calculating the number of states. Much appreciated.

And after each state the sequence of random numbers is always the same. So, if one state comes up twice, the generated numbers (states) start repeating.

That is correct, assuming the cache value itself is used as the RNG output with no further processes.

Above calculations assume that each state is equally possible and independent.

Of course, the states are not independent in the Black RNG. Black RNG defines an ordered set of states (like basically all RNG algorithms do).

Yes and no. The testing on the cache contents has shown them to be be very high quality entropy, and the cache is used to determine the state of the cylinders and mask value (ie the overall state). So based on the entropy results we have available you can assume for statistical purposes the odds of any state occuring *approaches* that of being "equally possible" as it would with any other source of high quality entropy.

The maximum period you can theoretically get is the number of unique states, but the odds of this happening is very small.

Agreed, stupidly small. So for a ball park period figure where a global repeat starts to become expected we could use (say) period / 3?

Mr RB · Nov 28, 2010

No luck on the testing I'm sorry. I tried your test;
seed = 0
cache = (200*32)
mask = 01011001

Then run 5 million loops to generate 100000 32bit samples, and the engine held. I tried some other cache size variations, also with no failure found. These tests take 5 minutes for 5 million loops so I only tested 3 or 4 variations.

I increased to 10 million loops, (200000 samples and 10 minutes to test!) with your values above and it still held. I have posted screenshots below, the first shows the 800 byte cache looks perfect with histogram and scattergraph as good as 800 bytes gets.

The last 7500 32bit samples were saved as a file, and run past ENT.exe for a second opinion (see second screenshot). Data still looks very good, Chi square a little low but still well in the bounds of natural random for a small sample size of 30000 bytes.

Third screenshot is those 7500 32bit samples as histogram and scatter, again all looks perfect.

I think this is a credit to the RNG that given a seed of ZERO and a very poor 6.6 engine shuffles per loop(!) that it still is outputting entropy close to, possibly as good as organic quality even after 10 million loops and approx 17180000000 total bits generated.

The engine in the above tests is severely compromised, especially by the small value of 6.6 engine shuffles per loop. Ideally with any small cache I would like to see a number of shuffles per loop far greater than cylinders squared, or >324.
6.6 is a joke.

Mr RB · Nov 28, 2010

As I was horrified with that figure of only 6.6 engine shuffles per loop seen inthe last engine, I tested an example closer to how I would prefer the engine to be used if a cache size of (200*32) or 6400 bits was around the largest size allowed.

seed = text file
cache = (200*32)+1 or 6401 bits
mask 10011 (5 bit mask produces more frequent engine shuffles)

Again I ran 10 million loops and extracted 200000 samples of the first 32bits of the cache. This time I saved the last 10% of the generated samples to provide 80000 bytes of data for more accurate entropy testing.

The number of engine shuffles per loop has improved from 6.6 to 52.9, better, but still much lower than I would like.

Data is shown in the screenshots below, and like a properly setup Black RNG the data sample is well within the same entropy specs as natural RNG data like radioactive decay etc.

But if I really needed to implement an RNG with a small cache like this (due to ram limitations?), I would use more cylinders and aim for much higher number of engine shuffles per loop, possibly through the use of multiple 5bit masks.

I have a PIC C version with 256 bit cache that I played with some months ago, but has only had limited refinement and I'm still not totally happy with it even though it is fast and compact. When I get some time I'm going to explore multiple masks and even nested loops of engine cylinders (instead of just one loop of engine cylinders).

misterT · Nov 29, 2010

Mr RB said:
No luck on the testing I'm sorry. I tried your test;
seed = 0
cache = (200*32)
mask = 01011001

I changed my code to reflect your code, not the illustrations, and I couldn't reproduce the results I got earlier.. It is quite strange that this made such a difference. Maybe I fixed some bug also when I changed the code. I tried many other tests also. I'll say it's now looking much better for Black RNG. It would be nice to have some mathematical background (proof) on the entropy part though before I'm happy with all the hype.

Mr RB said:
MisterT-
Yes and no. The testing on the cache contents has shown them to be be very high quality entropy, and the cache is used to determine the state of the cylinders and mask value (ie the overall state). So based on the entropy results we have available you can assume for statistical purposes the odds of any state occuring *approaches* that of being "equally possible" as it would with any other source of high quality entropy.

The big question here is: From all the states that the generator has already "used", which one is the one that eventually comes up twice? If it the state that was generated only 10000 steps ago, then the generator starts repeating the last 10000 states. You can easily generate millions of deviates before this happens, but when the generator starts repeating, the repeating cycle can be short.

But, maybe I'll just run the NIST test suite and if it passes, I hope that you can edit your web page for less hype and more facts and results.

Mr RB · Nov 29, 2010

MisterT-

I changed my code to reflect your code, not the illustrations, and I couldn't reproduce the results I got earlier.. I tried many other tests also. I'll say it's now looking much better for Black RNG.

Thanks! That is good to hear. I apologise for my initial mistrust of your competence, which was based on you posting some rather negative results from the RNG that just didn't match anything I had seen from my own tests. Especially since my faulty diagram was likely a big cause of the mistaken results. During the whole process you have helped me understand my RNG better and prompted me to release the Windows software that may be of help to others.

It would be nice to have some mathematical background (proof) on the entropy part though before I'm happy with all the hype.

It's extremely hard to evaluate the Black RNGs entropy using math, as it chaotically scrambles itself, and because unlike other RNGs that perform the same scrambling on the entire seed in one pass my RNG is designed to only modify about 27% of the seed in any pass. That means any cache bit used in the scrambling may be transposed in time, from an unknown time (unknown previous pass) where the engine would have been using different cylinder values hence performing a different type of scrambling. So even though during *one* pass bits are scrambled in a sequence that is math-able, all of those bits have previously been scrambled by totally different methods over varying number of scrambles and varying time. That is the reason it produces great entropy even from a seed of zero.

I think Wiki defines self-scrambling RNGs as "impractical" to analyse using math. And I will look at reducing the "hype" on the web page.

The big question here is: From all the states that the generator has already "used", which one is the one that eventually comes up twice? If it the state that was generated only 10000 steps ago, then the generator starts repeating the last 10000 states. You can easily generate millions of deviates before this happens, but when the generator starts repeating, the repeating cycle can be short.

I think given that the RNG has a similar good cache entropy at(near) the start and also at the point later when a global repeat occurs, the statistics should be similar to calculate either.

My statistics knowlege is poor, but I remember something about calculating probability over iterations, like if you have a 100 sided dice you can roll it X times before it becomes 99.9% likely that a number will occur twice? I think we could use that to calculate a probable number of iterations where you could expect a global repeat, and even more important; a "safe" value of iterations that could be used that would be incredibly improbable to get a repeat. That would allow some confidence using the RNG to generate a finite length string free from global repeat. I believe that number is very large with a 8000000 bit cache and 18 cylinders total but I'm not confident to calculate it myself.

On another Wiki note, the RNG seems to satisfy one of the 2 requirements of a CSPRNG (Cryptographically Secure Psuedo RNG) which is that the next number (or bit) generated can't be predicted from knowing the previous numbers.

But it does totally fail the second requirement; that if its state becomes known, you can predict *all* its past generated values simply by running it in reverse. That was never part of my original design spec, but I think it can be totally solved with no speed or RAM cost simply by deleting a small amount of the cache as it progresses. It generates entropy so well from a faulty cache that deleting (say) 1 bit every time a mask is detected (every 256 bits generated on average) would have zero negative effect on its entropy but means that it is impossible to run in reverse as state data was constantly removed over time.

I'm open to suggestions on that point, but it looks sound at a first glance and if it tests ok I think it will become Black RNG v2.

misterT · Nov 29, 2010

Mr RB said:
My statistics knowlege is poor, but I remember something about calculating probability over iterations, like if you have a 100 sided dice you can roll it X times before it becomes 99.9% likely that a number will occur twice? I think we could use that to calculate a probable number of iterations where you could expect a global repeat.

I think I gave those equations in my post #40

Mr RB said:
But it does totally fail the second requirement; that if its state becomes known, you can predict *all* its past generated values simply by running it in reverse.

You would have to know all the history of the exchanged bits and that history is deleted, since only fixed number of bits (8) is kept in the history.

Mr RB · Nov 29, 2010

MisterT-

I think I gave those equations in my post #40

Thanks MisterT but I'm still not sure I would get it right on the first go.

I think I need to do a couple hours reading up on statistics math.

You would have to know all the history of the exchanged bits and that history is deleted, since only fixed number of bits (8) is kept in the history.

Yes you are right! To properly run the engine in reverse it needs to run the 8bit mask <<1 operation in reverse. So to brute force it they would have to guess what each bit may have been. That still bothers me a little as the mask has the least possible number of states of any of the 3 parts of the RNG and if they have some chunks of generated data history it may be possible to brute force it in reverse at least for a short while until the odds start to get unfeasably high...

Which means my first suggestion of deleting one cache bit doesn't fully solve that problem either, it just doubles the number of bits that need to be guessed as they need to guess the deleted mask bit and the deleted cache bit.

I wonder if it's even possible to delete part of the state as it progresses in a way that cannot be recovered or if it will always just decrease the odds of brute forcing?

RMMM · Nov 29, 2010

An interesting thing I was looking at, while playing with a MSP430 launchpad, was using the on-board temperature sensor as part of the chaos.

Depending on the resolution of the sensor, you could grab the temp LSB and re-seed or partially re-seed.

This allows true real-world variables to be used in the RNG.

By using a blank chip, no one would know there was a temp sensor, so they would not try to fool it. Same with PI.

Both together could add a level of complexity to the algorithm.

Mathematically, the algorithm has no idea what the temperature is going to be no matter what.

So if you ran it forward or reverse, the re-seeding of the temp will always be different.

As you hack away and get more and more mad, it increases the change in the temp sensor

misterT · Nov 29, 2010

Mr RB said:
Thanks MisterT but I'm still not sure I would get it right on the first go. I think I need to do a couple hours reading up on statistics math.

This is easier to read (than my "ascii" equations): Binomial distribution - Wikipedia, the free encyclopedia

Mr RB · Nov 30, 2010

Thanks RMMM for the clever idea of using a microcontroller with an inbuilt sensor to re-seed the RNG!

That should work very well and might get around some of the issues of using this RNG on a very limited platform (like a micro) because a very small cache causes the issue of needing frequent re-seeding to avoid settling into a repeat.

An obvious other implementation would be to use the watchdog timer which is in a PIC to generate some limited entropy to re-seed the RNG, that may allow an easy self-contained PIC implementation with very good quality output. I need to do some more testing with my small RNG models.

MisterT- Thanks for the link! I do need to do some reading there.

I did one calculation today for the maximum number of states of the RNG in my free Windows software;
cylinders = 17+1 {1,2,3,1,2,3,4,5,7,1,2,3,6,4,1,3,11,9}
cache size = 8000000 bits
mask = 8 bits

max_states = 4.7 * 10^101

I had to repair my TI scientific calculator (don't ask) to do the 2^8000000 calc, that shows how often I use it.

I also started a re-write on my web page but there is a limit to how much time I can spend on this hobby project as I have a couple of commercial projects going at the moment and it might be a couple of days before the new web page is up.

misterT · Nov 30, 2010

Mr RB said:
I did one calculation today for the maximum number of states of the RNG in my free Windows software;
cylinders = 17+1 {1,2,3,1,2,3,4,5,7,1,2,3,6,4,1,3,11,9}
cache size = 8000000 bits
mask = 8 bits

I had to repair my TI scientific calculator (don't ask) to do the 2^8000000 calc, that shows how often I use it.

You can't calculate 2^8000000 with a TI calculator. The answer has over 250000 digits in it. Wolfram Mathematica gives the number of states for above situation as:

3618239625 * 2^(16+x)

where x is the number of bits in the cache. The answer for 8000000 bits is attached in the text file.

Edit: Couldn't attach the text file because it was too big (2.39 MB).. anyway that alone should give you a good idea how big the number is. Over 2.3 million digits. The answer you got (4.7 * 10^101) is nowhere near the right answer. A simple calculator can't handle the big numbers. Even Matlab fails to calculate that number. You can either try manipulating the numbers with pencil and paper or use a software like Mathematica.

RMMM · Nov 30, 2010

Mr RB said:
Thanks RMMM for the clever idea of using a microcontroller with an inbuilt sensor to re-seed the RNG!

Sure. I would like to see the idea implemented, and I have not the time to take on such a task.

The variability was too high to pass up and it just seemed to make sense.

Using a number extracted from the sensor to use as a multiplier for a timer based number would lead to a pretty chaotic seed-sewing.

Im sure you can think of better ways to use the sensor data in an RNG algorithm, as you have much more of a grasp on the weight of the variability than I do.

3v0 · Nov 30, 2010

Observations and Doubts:

I have a nagging feeling about attempts to make a RNG more random by reseeding it. It just introduces another level of complexity to the existing patterns. But maybe that is "good enough".

Seeding with the LSB of a sensor input makes increasing sense as that number source approaches random. But if it was random we could just use it. If it is not a good RNG in itself how is it a good random seed?

RMMM · Nov 30, 2010

3v0 said:
Observations and Doubts:

I have a nagging feeling about attempts to make a RNG more random by reseeding it. It just introduces another level of complexity to the existing patterns. But maybe that is "good enough".

Seeding with the LSB of a sensor input makes increasing sense as that number source approaches random. But if it was random we could just use it. If it is not a good RNG in itself how is it a good random seed?

I guess thats where the psychology of RNG comes in to play.

If the seed is random, .... then your done!

misterT · Nov 30, 2010

3v0 said:
Observations and Doubts:

I have a nagging feeling about attempts to make a RNG more random by reseeding it. It just introduces another level of complexity to the existing patterns. But maybe that is "good enough".

Seeding with the LSB of a sensor input makes increasing sense as that number source approaches random. But if it was random we could just use it. If it is not a good RNG in itself how is it a good random seed?

Your nagging feeling is quite correct. Good random number generators combine the output of two or more (simpler) generators. The important thing there is that these two (or more) generators have to be independent of each other. So, it is not the best way to use another random number source to interfere with the other.. It is better to use them in a way that they do not interfere (with each others stages).

nsaspook · Nov 30, 2010

RMMM said:
Sure. I would like to see the idea implemented, and I have not the time to take on such a task.

The variability was too high to pass up and it just seemed to make sense.

Using a number extracted from the sensor to use as a multiplier for a timer based number would lead to a pretty chaotic seed-sewing.

Im sure you can think of better ways to use the sensor data in an RNG algorithm, as you have much more of a grasp on the weight of the variability than I do.

A temp sensor digital output would be a very poor randomizing element, it can be controlled externally with high precision and changes in temperature are not chaotic. A simple 3 axis Accelerometer chip (think digital dice) has been used to generate random keys or to sync encoding devices.
https://www.electro-tech-online.com/custompdfs/2010/11/TR-2007-123.pdf
ADXL335 | Small, Low Power, 3-Axis ±3 g Accelerometer | Inertial Sensors | Sensors | Analog Devices

RMMM · Nov 30, 2010

Only if it was know.

As I mentioned, the MSP430 I was working with had an internal temp sensor.

Hiding that fact would lead to an extra layer.

Mr RB · Nov 30, 2010

3v0-

Seeding with the LSB of a sensor input makes increasing sense as that number source approaches random. But if it was random we could just use it. If it is not a good RNG in itself how is it a good random seed?

Because you have a situation where the RNG generates extremely good entropy from practically any seed, and it's main failure mode would be if it eventually loops around into a global repeat. Using a poor entropy (but naturally random) source to constantly *partially* re-seed it produces a hybrid RNG that satisfies all the needs.

But if you used the poor entropy source to *fully* re-seed the RNG every loop the RNG output is only as good as the natural source.

MisterT-

You can't calculate 2^8000000 with a TI calculator. The answer has over 250000 digits in it. Wolfram Mathematica gives the number of states for above situation as:

3618239625 * 2^(16+x)

Thanks again!

Looking at it logically,
2^8 is 256 (3 decimal digits)
2^16 is 65536 (5 decimal digits)
2^32 is 4.9billion (10 decimal digits)
So there's an obvious relationship where there's around 3 times less decimal digits than binary digits. So 2^8000000 is going to be something like 2.5 million decimal digits or 10^2500000 (sounds exactly like your 2.9Mb text file).

For the record here's the mistake I made (which is embarassingly obvious looking at it a second time);
2^8000000 = (2^80 * 2^100 * 2^100 * 2^10)
= 2^80 * 2^100 * 2^100 * 2^10
= 1.208*10^24 * 1.268*10^30 * 1.268*10^30 * 1.024*10^3
= 1.208 * 1.268 * 1.268 * 1.024) * 10^87
= 1.989 * 10^87
Therefore; 2^8000000 = 1.989*10^87 (ouch)

So to get this number right shouldn't be too hard?
What about 2^8000000 = (2^200 * itself 40000 times?)
2^200 = 1.606938*10^60 (biggest number my TI can handle)

so it would be; (1.606938^400)^100 * 10^2400000
= (4.192*10^82)^100 * 10^2400000
= (4.192^100) * 10^8200 * 10^2400000
= 1.746 * 10^62 * 10^8200 *10^2400000
= 1.746 * 10^2408262
(phew!)

gabeNC · Dec 1, 2010

3v0 said:
Observations and Doubts:

I have a nagging feeling about attempts to make a RNG more random by reseeding it. It just introduces another level of complexity to the existing patterns. But maybe that is "good enough".

Seeding with the LSB of a sensor input makes increasing sense as that number source approaches random. But if it was random we could just use it. If it is not a good RNG in itself how is it a good random seed?

I remember reading an article about using a sound card input tuned to a AM radio station for a good random seed. Anybody have thoughts on that?

nsaspook · Dec 1, 2010

RMMM said:
Only if it was know.

As I mentioned, the MSP430 I was working with had an internal temp sensor.

Hiding that fact would lead to an extra layer.

Hiding what? You have to assume that whatever device you make will get taken apart. Security through obscurity? Security through obscurity - Wikipedia, the free encyclopedia

Welcome to our site!

Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.

Mr RB's RNG and RNGs in general

Well-Known Member

Well-Known Member

Attachments

Well-Known Member

Attachments

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

New Member

Well-Known Member

Well-Known Member

Well-Known Member

New Member

Coop Build Coordinator

New Member

Well-Known Member

Well-Known Member

New Member

Well-Known Member

Member

Well-Known Member

Similar threads

New Articles From Microcontroller Tips