Get Firefox! "my blog doesn't just deal with my life, it deals with some important stuff too"

27th of July 2006

You are not a unique and special snowflake

You may very well be sharing your DNA with someone else

...But first, all about Rainbow Tables

I've been thinking a lot recently about big numbers. I'm not sure why, it's all sort of an introspective thing I suppose. I've always been fascinated by AI (and still hope to get a job involving me in AI at some point in the future) and, of course, part of AI is working out how to deal with massive amounts of data. The other day I started looking at passwords and cryptography. I got wondering "why 8 characters?" After-all, the general advice for a strong password is 8 characters minimum, but why 8? So I worked out an algorithm that could determine the approximate time needed to crack a password based on various parameters. It turns out that a government could probably crack a 7 character password in under a month, where as an 8 character password would take over a year (if they've got a lot of power available).

This led me to start looking at Rainbow Tables. Rainbow Tables are a form of time-memory trade-off hacking. The idea is like this:

A 1 way hash can turn a password into a hashed value from which it is not possible to determine the original starting password. However, if we take every single possible password, feed it through the hash and store the value then we can look the password up from the hash. The problem is that the amount of storage needed for this is vast.

Rainbow Tables can massively reduce the amount of storage needed. This is done by storing the passwords and hashes as chains. To do so, a second function known as a reduction function is generated which can take a given hash and generate a password from it. It's not a reverse of the original hashing function (if we had such a thing then we wouldn't need Rainbow Tables now would we?). This reduction function can then be used to generate the chains. This is now probably best explained with an example.

Say we start with the password ABCD. We apply the hashing function to it to get a hash of 1234.

ABCD --HASH--> 1234

Now we feed the hash into the reduction function to generate a new random password.

1234 --REDUCE--> M8UK

This new password is fed back through the hash function, and the resulting hash is fed back through the reduction function.

M8UK --HASH--> 6682

6682 --REDUCE--> YGA9

This is repeated some predefined number of times. From what I've read, the standard is 10,000 times.

YGA9 --HASH--> 9102

9102 --REDUCE--> A00C

A00C --HASH--> 0176

0176 --REDUCE--> HHUA

...[many iterations later]...

KPUH --HASH--> 9091

9091 --REDUCE--> FRIN

FRIN --HASH--> 7682

7682 --REDUCE--> BVCR

So we now have a chain which starts with the password ABCD and ends with the password BVCR. All the passwords and hashes generated by the algorithm to get this new password are discarded, and the start and end are stored in the database. How is this useful? Well we are storing only 1 entry in the database per 10,000 passwords. So we have massively reduced the need for storage. This might seem a little pointless at first sight as at no point are we storing the actual hashes of any of these passwords, but the point is we have the means to recreate those intermediate passwords and hashes.

As an example, let's say that we want to crack the hash 0176. This is fed through the algorithm that generates the chains and each password is tested against the rainbow tables. If a match is found then we can find the hash:

0176 --REDUCE--> HHUA [not in our table]

...[many iterations later]...

KPUH --HASH--> 9091

9091 --REDUCE--> FRIN [not in our table]

FRIN --HASH--> 7682

7682 --REDUCE--> BVCR [found in the table]

Having found BVCR in the table, we look up the start of the chain. Remember we stored ABCD alongside BVCR. Now we begin reconstructing the chain that led to BVCR until we find the password that, when hashed, provides us with the hash value we are trying to crack.

ABCD --HASH--> 1234 [not the hash we want]

1234 --REDUCE--> M8UK

M8UK --HASH--> 6682 [not the hash we want]

6682 --REDUCE--> YGA9

YGA9 --HASH--> 9102 [not the hash we want]

9102 --REDUCE--> A00C

A00C --HASH--> 0176 [matches the hash!]

And thus the password is cracked. To do this for the entire 8 character "strong password" space would take approximately 3 months if distributed over 40,000 dedicated PCs. That's a lot of passwords to hash.

As a part of all of this I also got thinking about something that occurred to me back in secondary school when I was doing Biology. DNA can be configured in any of 3 billion possible ways. That's 3,000,000,000 for those of you who can't imagine that in numerical form. Nine zeros; that's a big number. Obviously not every configuration is a good configuration, but there are certainly a lot nevertheless.

I remember when we were studying DNA, and this fact was mentioned, I thought to myself "3 billion is a very big number, it's no wonder DNA matching is used to catch criminals". And then something else occurred to me. There are approximately 6.5 billion people in the world. 3 billion possible configurations of DNA, 6.5 billion people. Now I'm no mathematical genious, but even I can see that this means there have to be duplicates out there. In fact it's very likely, from a statistics point of view, that there is at least one other person in the world who shares my DNA. There's potential for "the perfect crime" in there somewhere.

And this brings me all the way back to the title of this blog. "You are not a unique and special snowflake." And it's so very true, you're not. There's a very real chance that someone, somewhere, has the same DNA as you.

Blog #604, posted at 13:38 (GMT)