How Random Number Generation Works in Computing

How Random Number Generation Works in Computing

Computers are deterministic machines. Given the same inputs and the same state, they produce the same outputs every time. That's what makes randomness hard — a program running on predictable hardware can't spontaneously produce unpredictable numbers without some external input. Grasping this tension is the key to using random numbers correctly.

Why "Random" Is Harder Than It Sounds

A deterministic processor executing a fixed sequence of instructions will always produce the same result. There's no quantum uncertainty in a for-loop. Any function that returns "random" numbers is either using an algorithm that mimics randomness (pseudo-random), or it's reading entropy from some genuinely unpredictable physical source (truly random).

The distinction matters enormously. For statistical simulations and shuffling a playlist, pseudo-random is fine. For generating a password or a session token, you need the real thing — and using a pseudo-random generator in a security context has caused serious vulnerabilities.

Pseudo-Random Number Generators (PRNGs)

A PRNG is an algorithm. It takes a seed (an initial value) and produces a sequence of numbers that pass statistical tests for randomness — they look random, but they're entirely determined by the seed. Give it the same seed twice, and you get the exact same sequence.

Linear Congruential Generators

The simplest PRNG is a linear congruential generator (LCG). The formula is:

X(n+1) = (a × X(n) + c) mod m

Where X(0) is the seed, a is the multiplier, c is the increment, and m is the modulus. The output cycles through at most m values before repeating. With well-chosen constants, the cycle length equals m.

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    state = seed
    while True:
        state = (a * state + c) % m
        yield state / m  # normalize to [0, 1)

LCGs are fast and simple, but they have low-dimensional structure. If you plot pairs (X(n), X(n+1)) they fall on parallel lines. This periodicity is obvious to statistical tests — and to attackers. C's old rand() function used an LCG. Don't use LCGs for anything beyond toy examples.

Mersenne Twister

The Mersenne Twister (MT19937) is the default PRNG in Python (random.random()), Ruby, PHP, and many other languages. It was designed in 1997 and has a period of 2^19937 − 1 — far longer than you'll ever need. (Wikipedia's Mersenne Twister entry covers the algorithm and known weaknesses.)

MT19937 passes most statistical tests and is fast. But it's not cryptographically secure. After observing 624 consecutive 32-bit outputs, you can reconstruct the full internal state and predict all future outputs. This is a known attack.

import random
random.seed(42)
print(random.random())  # deterministic: 0.6394267984578837

For games, simulations, and procedural generation, Mersenne Twister is excellent. For anything security-related, it's dangerously wrong.

Cryptographically Secure PRNGs (CSPRNGs)

A CSPRNG is a PRNG with two additional properties: the output is computationally indistinguishable from true randomness (you can't tell the difference even with a powerful computer), and knowing the past output doesn't help you predict future output. The reference standard is NIST SP 800-90A Rev. 1, which specifies the deterministic CSPRNG constructions approved for federal cryptographic modules.

The source of security is entropy — genuine unpredictability harvested from the physical environment.

OS Entropy Pool

Modern operating systems maintain an entropy pool built from:

  • Hardware interrupt timing (keyboard, mouse, network packets arriving at irregular intervals)
  • CPU hardware RNG instructions (Intel's RDRAND, ARM's TRNG)
  • Disk I/O timing variations
  • Network packet arrival times

On Linux, this pool is accessible through two interfaces: /dev/random and /dev/urandom. The historic difference was that /dev/random blocked when the entropy estimate was low; /dev/urandom never blocked. Modern Linux (kernel 5.6+) has eliminated this distinction — both are now equivalent once the pool has been seeded at boot. The recommendation is to use /dev/urandom (or the getrandom() syscall, which is preferred in new code). The Linux `random(7)` man page documents the current behavior and the rationale.

#include <sys/random.h>

// Preferred on Linux 3.17+
unsigned char buf[32];
getrandom(buf, sizeof(buf), 0);  // fills buf with secure random bytes

On macOS, /dev/random is backed by a CSPRNG seeded from hardware entropy sources including the Secure Enclave (T2 chip on Intel Macs, Secure Enclave on Apple Silicon). On Windows, BCryptGenRandom is the system CSPRNG.

Hardware RNGs

CPUs since around 2012 include a true hardware random number generator. Intel's RDRAND instruction samples thermal noise from on-chip circuitry to produce random bits. This isn't a PRNG — it's sampling physical randomness directly.

Operating systems use RDRAND as one input to their entropy pool, but security-conscious implementations don't rely on it exclusively. An implementation flaw in RDRAND (or a malicious hardware design) would be undetectable without independent entropy sources. Defense in depth means mixing multiple sources.

Math.random() vs crypto.getRandomValues()

This is the JavaScript version of the PRNG vs CSPRNG choice, and it's one of the most important things to get right in web applications.

// DO NOT use for security purposes
Math.random()            // PRNG, implementation defined, not secure

// Use for security purposes
crypto.getRandomValues(new Uint32Array(1))[0]  // CSPRNG, uses OS entropy

// Or for convenience in modern environments
const { randomBytes, randomUUID } = require('crypto')  // Node.js
randomBytes(32).toString('hex')  // 64-char hex string, cryptographically secure
randomUUID()  // 'f47ac10b-58cc-4372-a567-0e02b2c3d479'

Math.random() is appropriate for: game mechanics, test data, procedural generation, random visual effects, randomly ordering UI elements. It is never appropriate for: session tokens, password reset links, CSRF tokens, cryptographic keys, or any value an attacker would benefit from predicting.

The V8 engine (used by Chrome and Node.js) implements Math.random() using xorshift128+ — a good statistical PRNG that runs fast, but with a known internal state that can theoretically be recovered from output.

Seeds and Why Predictable Seeds Are Dangerous

Every PRNG is initialized from a seed. If you control the seed, you control all the output. This is why predictable seeds are a vulnerability.

A classic mistake is seeding with the current time:

import random, time
random.seed(int(time.time()))  # predictable within a second
token = random.randint(0, 1000000)  # attacker can brute-force the seed

An attacker who knows approximately when your server generated a token can try all seed values in a small window (say, ±30 seconds = 60 attempts). If the token space is small, they find the token trivially.

The correct approach: don't manage seeds for security-sensitive code. Use the system CSPRNG, which seeds itself from the entropy pool automatically.

For generating random test data or picking sample items, the Random Line Picker works well for everyday use cases where statistical randomness is all you need. When your use case involves cryptographic operations, compare what we've covered here with Encoding vs Encryption vs Hashing to understand where random number generation fits in the broader security picture.

Quick Decision Guide

Use Math.random() / Mersenne Twister when:

  • You need speed and statistical uniformity
  • The application is a game, simulation, or visual effect
  • Predictability wouldn't give an attacker any advantage
  • You need reproducible results from a known seed (unit tests, procedural generation)

Use a CSPRNG when:

  • Generating any token, key, or identifier that must be secret
  • Building password reset flows, session management, or OAuth state parameters
  • Implementing any cryptographic primitive
  • When an attacker correctly guessing the output would cause harm

The CSPRNG is slightly slower — but on modern hardware, crypto.getRandomValues() generates millions of bytes per second. The performance difference only matters if you're generating gigabytes of random data, which is unusual. When in doubt, use the CSPRNG. The cost is negligible. The security benefit is real.

For more on how randomness connects to hashing and security — particularly why hash functions need to be unpredictable — see Public Key Cryptography Explained.

FAQ

Is `Math.random()` actually unsafe in 2026?

Yes — for any use case involving authentication, sessions, tokens, or identifiers. V8's xorshift128+ implementation produces statistically uniform output but the internal state can be recovered from observed output, so an attacker who sees a few generated values can predict subsequent ones. The 2018 V8 fix changed the algorithm but didn't change the safety profile. Use crypto.getRandomValues() for any value that an attacker would benefit from predicting.

What's the difference between `crypto.randomUUID()` and `crypto.randomBytes(16)`?

randomUUID() returns a Version 4 UUID string with hyphen formatting (f47ac10b-58cc-4372-a567-0e02b2c3d479) — 122 random bits with 6 bits used for version/variant markers. randomBytes(16) returns 16 bytes (128 bits) of pure randomness with no formatting. UUIDs are convenient as identifiers; raw bytes are right for tokens that don't need to be UUID-shaped. Both use the same CSPRNG underneath.

Should I worry about RDRAND being backdoored?

The Linux kernel community decided not to rely on RDRAND alone after Snowden-era concerns about hardware-level backdoors — the kernel mixes RDRAND output with other entropy sources, so a compromised RDRAND can't fully control the random pool. For application code, you don't interact with RDRAND directly; you use the OS CSPRNG, which already does defense-in-depth mixing. So no, you don't need to worry — but the kernel devs do.

How much entropy do I need for a secure token?

128 bits is the minimum for security-relevant tokens — that's the threshold below which brute force becomes feasible. Most production systems use 256 bits (32 bytes) for session tokens, API keys, and CSRF tokens. UUIDv4 has 122 bits, which is just below the 128-bit threshold but still considered secure for identifier-style use cases. For cryptographic keys, follow the algorithm's spec (typically 256 bits for symmetric keys).

Can I use the time as a seed for non-security randomness?

For test data, simulation reproducibility, or game replays, yes — using time() or a fixed integer as a seed for Mersenne Twister is fine and gives you reproducibility. For any security-related randomness, never seed manually; use the OS CSPRNG which seeds itself from real entropy. The danger is mixing the two: a random.seed(time.time()) followed by random.randint(...) for a "secure token" is one of the classic vulnerability patterns.

What's the performance cost of CSPRNG vs PRNG?

For typical workloads, negligible. crypto.getRandomValues() generates millions of bytes per second on modern CPUs (often 100MB/s+). Mersenne Twister is faster (often 500MB/s+), but the difference only matters when generating gigabytes — most apps generate kilobytes per request at most. The exception is high-throughput simulations and procedural generation, where the speed difference is noticeable.

How do I generate random integers in a specific range without bias?

The naive Math.floor(Math.random() * range) introduces modulo bias when the range doesn't divide evenly into the source space. For uniform integers in [0, max), use rejection sampling: generate a random number, reject and retry if it falls in the "extra" portion. crypto.getRandomValues() + rejection sampling is correct. Or use crypto.randomInt(min, max) (Node.js) which handles this correctly. The bias is small (often imperceptible for practical use) but real for cryptographic protocols.

Is true randomness from quantum sources actually different?

Yes, philosophically — quantum randomness is genuinely non-deterministic per current physics, while classical noise (thermal, atmospheric) is deterministic but practically unpredictable. For cryptography, this distinction is irrelevant: both are mixed into entropy pools that mask any individual source's properties. Quantum random number generators (sold as USB devices for ~$300+) provide one extra entropy input but don't make a CSPRNG more secure — modern CSPRNGs are already secure with classical entropy sources.