Why secure systems require random numbers
If you’ve been following recent news about technical spying by the US
National Security Agency and the UK’s Government Communications
Headquarters you may have come across a claim that the NSA was
involved in weakening a random number generator. The obvious question
to ask is… why mess with random number generation?
The answer is rather simple: good random numbers are fundamental to
almost all secure computer systems. Without them everything from
Second World War ciphers like Lorenz to the SSL your browser uses to
secure web traffic is in serious trouble.
To understand why, and the threat that bad random numbers pose, it’s
necessary to understand a little about random numbers themselves (such
as “what is a good random number anyway?”) and how they are used in
secure systems.
A Hacker News Hack
As an example of how random numbers go wrong I’ll begin with a hack of
the popular programming and technology web site Hacker
News.
Four years ago I
mentioned on the site
that its random number generator was vulnerable to being used to
attack the site. Not long after, and entirely independently, another
contributor to the site actually carried out the
attack with the
permission of the site owner.
Here’s how it worked. When you log into a web site you are typically
assigned a unique ID for that session (the period you are logged
in). That unique ID needs to be unique to you and not guessable by
someone else. If someone else can guess it they can impersonate you.
In the case of Hacker News, the unique ID is a string of random
characters such as lBGn0tWMcx7380gZyrUO9B. Each logged in user has a
different string and the strings should be very, very difficult to
guess or figure out.
Pseudo-randomness
The IDs are generated internally using a pseudo-random number
generator. That’s a mathematical function that can be called
repeatedly to get apparently random numbers. I say apparently because,
as the great mathematician John von Neumann said: “Anyone who
considers arithmetical methods of producing random digits is, of
course, in a state of sin.” The computer scientist Donald Knuth tells
a story of inventing a pseudo-random number generator himself only to
be shocked at how poor it was.
Although pseudo-random number generators can generate a sequence of
apparently random numbers they have weaknesses.
von Neumann used a simple pseudo-random number generator called the
middle square
that works as follows. You start with some number (called a seed) and
square it. You take the four middle digits as your random number and
square them to get the next random number, and so on.
For example, if you chose 4181 as a seed the sequence 4807, 1072,
1491, 2230, 9729, … would be generated as follows (a square with fewer
than eight digits is padded with leading zeros before the middle
digits are taken):
Random number   Its square   Middle digits
4181            17480761     4807
4807            23107249     1072
1072            1149184      1491
1491            2223081      2230
2230            4972900      9729
9729            94653441     6534
and so on
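The procedure is short enough to write out in code. The following is a minimal sketch in Python; the function name `middle_square` and its parameters are chosen here for illustration and are not taken from any library.

```python
def middle_square(seed, count, digits=4):
    """Return `count` numbers produced by the middle-square method."""
    value = seed
    results = []
    for _ in range(count):
        # Square the current value and left-pad with zeros so the
        # string always has 2 * digits characters.
        squared = str(value * value).zfill(2 * digits)
        # Keep the middle `digits` characters as the next number.
        start = (len(squared) - digits) // 2
        value = int(squared[start:start + digits])
        results.append(value)
    return results


print(middle_square(4181, 6))  # [4807, 1072, 1491, 2230, 9729, 6534]
```

Because each output is fed straight back in as the next input, anyone who sees one number can compute every number that follows.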
This particular pseudo-random number generator has long since been
replaced by better ones such as the Mersenne Twister whose output
is harder to predict. The middle square method is trivial to predict:
the next number it generates is entirely determined by the number it
last produced. The Mersenne Twister on the other hand is much harder
to predict because it has a large internal state, and each number it
produces reveals only a small part of that state.
In the world of cryptography there are cryptographically secure
pseudo-random number generators
which are designed to be unpredictable no matter how many random
numbers you ask them to generate. (The Mersenne Twister isn’t
cryptographically secure because it can be predicted if enough of the
random numbers it generates are observed.)
For secure systems it’s vital that the random number generator be
unpredictable.
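To make the distinction concrete, here is a small Python sketch. Python’s `random` module happens to be built on the Mersenne Twister, while its `secrets` module draws from the operating system’s cryptographically secure generator. The seed value 12345 is an arbitrary choice for illustration.

```python
import random
import secrets

# Two Mersenne Twister generators given the same seed produce exactly
# the same "random" sequence: the output is a pure function of the seed.
a = random.Random(12345)
b = random.Random(12345)
seq_a = [a.getrandbits(32) for _ in range(5)]
seq_b = [b.getrandbits(32) for _ in range(5)]
print(seq_a == seq_b)  # True

# The secrets module takes no seed at all; it asks the operating
# system for unpredictable bytes.
token = secrets.token_hex(16)
print(len(token))  # 32 hexadecimal characters (16 bytes)
```

The first half is what an attacker exploits: know the seed (or recover the state) and the whole sequence follows.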
Starting With A Seed
And all pseudo-random number generators need to start somewhere; they
need to be seeded and that’s where Hacker News failed. The random
number generator was seeded with the time in milliseconds when the
Hacker News software was last started. By some careful work, the
attacker managed to make Hacker News crash and could then predict when
it restarted within a window of about one minute. From that he was able
to predict the unique IDs assigned to users as they logged in and
could, therefore, impersonate them. (Similar random number problems
enabled one group of people to cheat at online
poker.)
The full details of how the Hacker News Hack worked are
here. The attack worked
because once Hacker News crashed the attacker would wait for it to
start and note the current time. Amusingly, the Hacker News server was
willing to give out that information. The attacker then had 60 seconds’ worth
of possible seeds (60,000 seeds since the seed was in milliseconds).
So, the attacker would log in and look at their own unique ID. It had
been generated by random numbers inside Hacker News’s software. He
then tried out each of the 60,000 seeds and ran the random number
generation algorithm used by Hacker News until he found a match with
his own unique ID. That told him which seed had been used, and it let
him keep generating further unique IDs by generating the same sequence
of random numbers that Hacker News was using. From that he could
predict the unique IDs given out to users as they logged in and he
could then impersonate them.
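The shape of that search can be sketched in a few lines of Python. This is a simulation under stated assumptions, not the Hacker News code: the ID format, the `make_id` and `recover_seed` helpers, the timestamp, and the use of Python’s `random` module are all hypothetical stand-ins, and the attacker’s ID is assumed to be the first one generated after the restart.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def make_id(rng, length=22):
    """Build a session ID from the given generator (hypothetical format)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def recover_seed(observed_id, window_start_ms, window_ms=60_000):
    """Try every millisecond in the window until one reproduces the ID."""
    for seed in range(window_start_ms, window_start_ms + window_ms):
        if make_id(random.Random(seed)) == observed_id:
            return seed
    return None


# Simulated server: seeded with its (secret) start time in milliseconds.
window_start_ms = 1_700_000_000_000   # attacker's estimate of the restart time
true_seed = window_start_ms + 4_321   # actual start time, unknown to the attacker
server_rng = random.Random(true_seed)
attacker_id = make_id(server_rng)     # ID handed to the attacker
next_user_id = make_id(server_rng)    # ID handed to the next user to log in

# Attacker: recover the seed from their own ID, then replay the sequence.
seed = recover_seed(attacker_id, window_start_ms)
clone = random.Random(seed)
make_id(clone)                        # skip the attacker's own ID
predicted = make_id(clone)
print(predicted == next_user_id)      # True
```

Sixty thousand candidates is a tiny search space; the loop finishes in about a second on ordinary hardware.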
The Hacker News code was changed to use the Linux /dev/urandom source
of random numbers which means that today unique IDs are generated with
a good random number generator and without the weak seed previously
used.
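As a rough illustration of the fixed approach, the sketch below builds a session ID from operating-system randomness in Python. `os.urandom` reads from the kernel’s random source (the same pool behind /dev/urandom on Linux). The helper name `new_session_id` and the choice of 16 bytes with URL-safe base64 encoding are assumptions made for this example, not a description of the Hacker News implementation.

```python
import base64
import os


def new_session_id(num_bytes=16):
    """Return an unguessable session ID built from OS-provided randomness."""
    raw = os.urandom(num_bytes)
    # URL-safe base64 keeps the ID printable; strip the '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


sid = new_session_id()
print(len(sid))  # 22 characters for 16 random bytes
```

There is no seed for an attacker to guess here, and seeing one ID says nothing about the next.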
So, there are two ways in which pseudo-random number generation can
fail: the seed could be bad or the algorithm itself could be weak and
predictable.
Random Numbers Everywhere
The Hacker News example isn’t about cryptography itself, but random
numbers are vital to cryptographic schemes. For example, any HTTPS
session starts as follows:
- The web browser sends information to the server about which version
  of SSL it wants to use and other information.
- The web server replies with similar information about SSL versions
  and its SSL certificate.
- The web browser checks that the certificate is valid. If it is, it
  generates a random ‘pre-master secret’ that will be used to secure the
  connection.
After that further exchanges occur all based on the randomly chosen
pre-master secret. It needs to be unpredictable for the connection to
be secure.
Here’s part of how a computer using WiFi establishes a secure
connection to an access point using the popular
WPA2 protocol:
- The access point generates a random nonce and sends it to the
  computer.
- The computer generates a random nonce and sends it to the access
  point.
The access point and the computer continue on from there using those
random nonce values to secure the connection.
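The general idea of two parties each contributing a random nonce can be sketched in Python as follows. This is a deliberately simplified illustration and not the WPA2 key derivation, which uses its own function and extra inputs such as the devices’ MAC addresses. The `derive_session_key` helper and the shared secret value are hypothetical.

```python
import hashlib
import hmac
import os


def derive_session_key(shared_secret, nonce_a, nonce_b):
    """Mix two nonces with a shared secret (simplified, not WPA2's PRF)."""
    # Sort the nonces so both sides feed them in the same order.
    first, second = sorted([nonce_a, nonce_b])
    return hmac.new(shared_secret, first + second, hashlib.sha256).digest()


shared_secret = b"hypothetical pre-shared key"
ap_nonce = os.urandom(32)      # generated by the access point
client_nonce = os.urandom(32)  # generated by the computer

# Each side computes the key from its own nonce and the one it received.
ap_key = derive_session_key(shared_secret, ap_nonce, client_nonce)
client_key = derive_session_key(shared_secret, client_nonce, ap_nonce)
print(ap_key == client_key)  # True
```

If either nonce were predictable, an attacker would have less work to do, which is why both must come from a good random source.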
Similarly, random numbers turn up when logging into web sites (and
other systems), creating secure connections to servers using SSH,
holding Skype video chats, sending encrypted email and more.
And the Achilles’ Heel of the only completely secure cryptosystem, the
one-time pad, is that the
pad itself must be completely randomly generated. Any predictability
or non-uniformity in the random numbers used can lead to breaking of a
one-time pad. (The other problem with one-time pads is reuse: they
must be used only
once.)
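Both points can be seen in a short Python sketch. The messages and the `xor_bytes` helper are made up for illustration; the pad comes from `os.urandom`, standing in for a truly random source.

```python
import os


def xor_bytes(a, b):
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))


msg1 = b"ATTACK AT DAWN"
msg2 = b"RETREAT AT TEN"
pad = os.urandom(len(msg1))   # the pad must be as long as the message

# Encrypting and decrypting are the same operation: XOR with the pad.
ct1 = xor_bytes(msg1, pad)
print(xor_bytes(ct1, pad) == msg1)  # True

# Reusing the pad is fatal: XORing the two ciphertexts cancels the pad
# and leaves the XOR of the two plaintexts, with no key involved.
ct2 = xor_bytes(msg2, pad)
print(xor_bytes(ct1, ct2) == xor_bytes(msg1, msg2))  # True
```

The same cancellation explains why the pad must be random: any pattern in the pad shows through in the ciphertext.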
CloudFlare’s Random Number Source
At CloudFlare we need lots of random numbers for cryptographic
purposes: we need them to secure SSL connections and Railgun, to
generate public/private key pairs, and for authentication systems. They
are an important part of forward
secrecy
which we’ve rolled out for all our customers.
We currently obtain most of our random numbers from either OpenSSL’s
random number generation system or from the Linux kernel. Both seed
their random number generators from a variety of sources to make them
as unpredictable as possible. Sources include things like network
data, or the seek time of disks. But we think we can improve on them
by adding some truly random data into the system, and, as a result,
improve security for our customers.
We’ve embarked on a project to further improve our random numbers by
providing a source of truly random numbers that don’t come from a
mathematical process. That can be done using things like radioactive
decay, the motion of
fluids, atmospheric
noise, or other
chaos.
We’ll be posting details of the new system when it’s online.