How do you know something is random?
Last week I gave my grad school thesis:
I hope readers are not too surprised that it was based around the research I was publicly writing about over the last few months. The defense in general was alright. I got asked a lot of questions and the whole thing wasn’t as fun as I expected. Thankfully no swords were involved.
But one question lingered afterwards: how do I know this is random?
Measuring Randomness
You can’t just look at a bunch of numbers and conclude whether or not they are random. There are actually standards and math that can come in and answer these questions.
NIST, the National Institute of Standards and Technology, has created a number of tests in order to measure randomness. This is essential for anyone proposing a new RNG algorithm.
One of these is the frequency test. In it, you take a batch of seemingly random numbers, do some mathematical inspections, and then it generates a statistic which can be compared against a baseline.
#define MEM_DDR4 1
#define MEM_MTJ 2
#if MEM_KIND == MEM_MTJ
uint32_t truerandom(uint32_t* rng, uint32_t seed) {
uint32_t p1 = seed;
uint32_t p2 = 0;
asm volatile("trng.reset %0, %1, %2\n":"=r"(*rng):"r"(p1),"r"(p2):);
asm volatile("trng.enable %0, %1, %2\n":"=r"(*rng):"r"(p1),"r"(p2):);
asm volatile("trng.read %0, %1, %2\n":"=r"(*rng):"r"(p1),"r"(p2):);
}
#endif
#define ITERATIONS 1024
int main(void)
{
for (uint32_t i = 0; i < ITERATIONS; i++) {
#if MEM_KIND == MEM_DDR4
srand(i * 100000 + i);
uint32_t rng = rand();
#endif
#if MEM_KIND == MEM_MTJ
uint32_t rng;
truerandom(&rng, i * 100000 + i);
#endif
printf("%i\n",rng);
}
return 0;
}
Going back to my code, I ran through 1024 iterations of random numbers based on the standard CPU approach and another using my novel hardware accelerator. Graphs for each are shown below.
Sure, both look fairly random, but how do I actually know?
Preprocessing
In this frequency test, you are counting the number of ones and zeros in each random number after it’s converted to binary.
I threw together a quick JS script to handle this:
// Import all numbers in a giant array for each
const cpuToBin = cpu.map(num => num.toString(2).padStart(32, '0'))
const mtjToBin = mtj.map(num => num.toString(2).padStart(32, '0'))
const countVal = (str, match) => {
return str.split('').filter(char => char === match).length
}
console.log('CPU:')
const cpuOnes = cpuToBin.map(b => countVal(b, '1'))
const cpuZeros = cpuToBin.map(b => countVal(b, '0'))
console.log('> 1:')
cpuOnes.forEach(n => console.log(n))
console.log('> 0:')
cpuZeros.forEach(n => console.log(n))
console.log('MTJ:')
const mtjOnes = mtjToBin.map(b => countVal(b, '1'))
const mtjZeros = mtjToBin.map(b => countVal(b, '0'))
console.log('> 1:')
mtjOnes.forEach(n => console.log(n))
console.log('> 0:')
mtjZeros.forEach(n => console.log(n))
This spit out a ton of data, which I then put into a spreadsheet. I can calculate the ratio of 1s and 0s by dividing all of the cells in the “Ones” column by 32. I can then calculate the average. If perfectly distributed, the average should be 0.5.
Okay so both of these look reasonably close to that ideal average. But now we can start applying our statistics.
We can calculate a value called S_n by summing all the number of ones and subtracting the number of zeros. Then we divide that by the square root of N, which here is 1024. (And the square root thusly is 32).
So now we have S_obs for each, but what does that mean?
We also need to check this inequality, where alpha is set to 0.01. Doing this in a spreadsheet, I can see that the value to get below is 0.840. As shown above, I achieve this for the STT-MRAM case but interestingly not in the CPU case.
Why is this? I’ve noticed before that Gem5 is very deterministic. If I ran my quick program more times, I would always get the same values. So running the STT-MRAM case in Gem5 may not provide the ideal real-world results, but it still is shown to be about 23% more random.
Expanding these checks
Okay, maybe Gem5 isn’t the best for checking random things. I decided to check a few other examples by generating 1024 values in JavaScript using Node.js. Then I took the same C program and ran it directly on my Windows machine rather than through the Gem5 interface on a Virtual Machine.
You can see that running outside of Gem5, the amount of randomness increases as S_obs decreases. Both Node.js and Windows have values below our phi-inverse metric.
Writing a thesis
Alright I did want to get this update written down in a sort of informal blog structure before migrating the same material into a new section of my thesis. I’ll be sure to clean up my graphs and make the tables more table-y.
But maybe that will be a task for tomorrow.