In [2]:
ioc("the quick brown fox jumps over the lazy dog") * 26
ioc(os.urandom(2 ** 15)) * 256
Out [2]:
1.7543084
Out [2]:
0.9962496
Index of coincidence is a metric that can be used to measure how evenly distributed (or “random”) the letters of a given text are.
Algorithm
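The metric is the probability that two characters picked at random from the text are the same. An easy way to estimate it is to pick two random characters many times and count the fraction of picks in which both are identical. Multiplying that probability by the alphabet size (26 for letters, 256 for bytes) normalizes it, so that uniformly random data scores about 1.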
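The same probability can also be computed exactly from character counts instead of by sampling. The following is a minimal sketch, not the `ioc` function used in this notebook: the name `coincidence_probability` and its `with_replacement` flag are assumptions made for illustration. With replacement it matches the two independent picks described above; without replacement it gives the classical formula, in which the two picks must come from different positions.

```python
from collections import Counter


def coincidence_probability(text, with_replacement=True):
    """Probability that two randomly picked symbols of `text` are equal.

    Hypothetical helper for illustration only.
    """
    counts = Counter(text)
    n = len(text)
    if with_replacement:
        # Two independent picks: sum of squared relative frequencies.
        return sum(c * c for c in counts.values()) / (n * n)
    # Two distinct positions: the classical index-of-coincidence formula.
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


sample = "abab"
print(coincidence_probability(sample))                          # 0.5
print(coincidence_probability(sample, with_replacement=False))  # 0.3333333333333333
```

For long texts the two variants are nearly equal, since the chance of picking the same position twice becomes negligible.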
Example
In [3]:
lipsum = """ Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla tempus convallis accumsan. Suspendisse ac euismod lectus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam sit amet urna accumsan, porttitor sem id, aliquam sapien. Morbi ullamcorper erat eget auctor viverra. Nam rutrum eget justo id fermentum. Proin accumsan condimentum dolor, non bibendum lorem placerat quis. Sed ultrices, nisl non varius feugiat, purus eros faucibus quam, vel laoreet quam turpis laoreet nibh. Etiam tincidunt massa volutpat ligula pharetra faucibus. Curabitur malesuada erat orci, ac aliquet nibh vestibulum tincidunt. Nulla gravida erat neque, tristique aliquet erat egestas at. Vivamus tristique tristique nisl, convallis faucibus nisi dignissim sed. Cras id erat sed sapien rutrum imperdiet. Maecenas vestibulum mi libero, non iaculis nunc viverra sed. Donec massa felis, tincidunt at ligula ut, ultrices pulvinar libero. Aenean lectus ipsum, porta sit amet sapien quis, fermentum non.""".replace("\n", "")
In [4]:
same = 0
total = 0
for _ in range(5_000_000):
    c1 = random.choice(lipsum)
    c2 = random.choice(lipsum)
    if c1 == c2:
        same += 1
    total += 1
In [5]:
same / total
Out [5]:
0.0641662
In [6]:
same / total * 26
Out [6]:
1.6683212000000003
In [7]:
rand_bytes = list(os.urandom(2 ** 14))
rand_bytes[:10]
Out [7]:
[250, 198, 255, 162, 148, 14, 127, 254, 85, 202]
In [8]:
same = 0
total = 0
for _ in range(5_000_000):
    c1 = random.choice(rand_bytes)
    c2 = random.choice(rand_bytes)
    if c1 == c2:
        same += 1
    total += 1
In [9]:
same / total
Out [9]:
0.0039766
In [10]:
same / total * 256
Out [10]:
1.0180096
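The two sampling loops above differ only in their input and in the alphabet size used for normalization, so they could be folded into one helper. This is a rough sketch under stated assumptions: `estimate_ioc`, its parameters, and the fixed seed are hypothetical and are not part of the notebook's own `ioc` function.

```python
import random


def estimate_ioc(symbols, alphabet_size, trials=200_000, seed=0):
    """Monte Carlo estimate of the normalized index of coincidence.

    Hypothetical helper: `symbols` is any non-empty sequence (a string or
    a list of byte values), `alphabet_size` is the normalization factor.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    same = 0
    for _ in range(trials):
        # Two independent picks, exactly as in the cells above.
        if rng.choice(symbols) == rng.choice(symbols):
            same += 1
    return same / trials * alphabet_size


# Every byte value exactly once: a perfectly even distribution.
uniform = list(range(256))
print(estimate_ioc(uniform, 256))
```

For an evenly distributed input like `uniform`, the estimate should land close to 1, in line with the random-bytes result above; natural-language text scores noticeably higher because some letters are far more common than others.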