How QR Codes Work: From Data to Scannable Image and Back

QR codes are everywhere — restaurant tables, product packages, conference badges, bus stops. They look like noise, but there's a precise structure underneath that lets any smartphone camera read them reliably, even when they're torn, dirty, or at an angle. Here's how they actually work.

A Brief Origin

QR code stands for "Quick Response." Denso Wave, a subsidiary of the Toyota Group company Denso, invented them in 1994 for tracking car parts during manufacturing. The goal was a code that could store more data than a barcode and be read at high speed from any direction. Denso Wave holds the patent rights but chose not to exercise them, which is why QR codes spread without licensing friction. The standard is now ISO/IEC 18004.

The Anatomy of a QR Code

A QR code is a square grid of black and white modules (the individual squares). It's not random — every region has a defined purpose.

Finder patterns: The three large squares in the corners (top-left, top-right, bottom-left) are finder patterns. Each is a 7×7-module pattern: a 3×3 black square, surrounded by a one-module white ring, surrounded by a one-module black border. Any straight line through the center crosses black/white/black/white/black runs in a 1:1:3:1:1 ratio, and that ratio holds at any rotation, so scanners search for it to locate the code. The corner without a finder pattern then reveals the orientation. Wikipedia's QR code overview has helpful diagrams showing every region of a typical code.

Timing patterns: A single row and a single column of alternating black and white modules running between the finder patterns. They tell the scanner how large the modules are and help it align to the grid.

Alignment patterns: Smaller squares that appear in larger QR codes (Version 2+). They help scanners handle distortion when the QR code is printed on a curved surface or scanned at an angle.

Format information: Stored near the finder patterns, this encodes the error correction level and mask pattern used — information the scanner needs before it can decode anything else.

Data modules: Everything else. This is where your actual data lives.
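
To make the finder-pattern ratio concrete, here is a minimal Python sketch. It builds the 7×7 pattern as a grid of 1s (dark) and 0s (light), measures the run lengths along a line through the center, and checks them against 1:1:3:1:1 with some slack. The function names and the tolerance value are illustrative assumptions, not taken from any scanner library.

```python
def finder_pattern():
    """Return the 7x7 finder pattern: 1 = dark module, 0 = light module."""
    grid = []
    for r in range(7):
        row = []
        for c in range(7):
            ring = max(abs(r - 3), abs(c - 3))  # distance from the center module
            # ring 3 = outer dark border, ring 2 = light ring, rings 0-1 = dark 3x3 core
            row.append(0 if ring == 2 else 1)
        grid.append(row)
    return grid


def run_lengths(line):
    """Collapse a line of modules or pixels into lengths of same-colored runs."""
    if not line:
        return []
    runs = []
    count = 1
    for prev, cur in zip(line, line[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def looks_like_finder(runs, tolerance=0.5):
    """Check five consecutive runs against 1:1:3:1:1, allowing some slack."""
    if len(runs) != 5:
        return False
    unit = sum(runs) / 7  # the pattern is 7 modules wide
    expected = [1, 1, 3, 1, 1]
    return all(abs(run - e * unit) <= tolerance * unit for run, e in zip(runs, expected))


center_row = finder_pattern()[3]
scaled_row = [m for m in center_row for _ in range(4)]  # same row at 4 pixels per module
print(run_lengths(center_row), looks_like_finder(run_lengths(scaled_row)))
```

Because the check works on ratios rather than absolute widths, the same test passes whether a module covers one pixel or forty. A real scanner would also have to cope with blur and perspective, which this sketch ignores.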

Data Encoding Modes

QR codes have four encoding modes, and the encoder picks the most efficient one for your content (the scanner reads a mode indicator at the start of the data to learn which one was used):

  • Numeric: Only digits (0–9). Most compact — encodes 3 digits per 10 bits.
  • Alphanumeric: Digits, uppercase A–Z, and a small set of symbols (space, $, %, *, +, -, ., /, :). Encodes 2 characters per 11 bits.
  • Byte: Any 8-bit data, typically UTF-8 text. 8 bits per byte, so a non-ASCII character encoded as UTF-8 takes 16 bits or more.
  • Kanji: Double-byte characters from the Shift JIS encoding. 13 bits per character.

A URL like HTTPS://UTILITYKIT.TOOLS can be encoded in alphanumeric mode because it uses only uppercase letters and symbols from the allowed set. A URL with lowercase letters or query-string characters such as ? and = needs byte mode. Some encoders uppercase the scheme and host automatically (both are case-insensitive), a small optimization that reduces the code's visual density.
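
The mode choice and its cost can be sketched in a few lines of Python. This is a simplified illustration: it ignores Kanji mode, the mode indicator, and the character-count field, and it assumes the whole payload uses a single mode (real encoders can switch modes mid-payload). The names `pick_mode` and `payload_bits` are made up for this example.

```python
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def pick_mode(text):
    """Choose the most compact of the numeric, alphanumeric, and byte modes."""
    if text and all(ch in "0123456789" for ch in text):
        return "numeric"
    if text and all(ch in ALPHANUMERIC for ch in text):
        return "alphanumeric"
    return "byte"


def payload_bits(text):
    """Bits used by the character data alone (no mode indicator or length field)."""
    mode = pick_mode(text)
    n = len(text)
    if mode == "numeric":
        # 10 bits per group of 3 digits; a leftover pair takes 7 bits, a single digit 4
        return 10 * (n // 3) + {0: 0, 1: 4, 2: 7}[n % 3]
    if mode == "alphanumeric":
        # 11 bits per pair of characters; a leftover single character takes 6
        return 11 * (n // 2) + 6 * (n % 2)
    # byte mode: 8 bits per UTF-8 byte, not per character
    return 8 * len(text.encode("utf-8"))


for url in ("HTTPS://UTILITYKIT.TOOLS", "https://utilitykit.tools"):
    print(url, pick_mode(url), payload_bits(url))
```

For the 24-character URL above, alphanumeric mode needs 132 bits where byte mode needs 192, which is the saving that uppercasing buys.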

Reed-Solomon Error Correction

Reed-Solomon error correction is why damaged QR codes still scan. It adds redundant data so the scanner can reconstruct the original even if part of the code is obscured or damaged.

QR codes offer four error correction levels:

Level   Data recovery capacity
L       ~7%
M       ~15%
Q       ~25%
H       ~30%

Higher error correction means a larger QR code (more modules needed to store the redundant data), but more resilience to damage. This is why logos can be placed in the center of QR codes: level H can recover from the loss of roughly 30% of the codewords, and careful placement keeps the logo away from the finder, timing, and format regions, which Reed-Solomon does not protect.
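
The "redundant data" is a set of error correction codewords computed from the data codewords. Below is a minimal sketch of that computation in Python, assuming the field parameters commonly cited for QR codes: GF(256) with the reducing polynomial 0x11D and generator roots α^0 through α^(n-1). It only encodes; actual error correction on the scanner side is more involved and is not shown. The sample data bytes are arbitrary illustration values.

```python
# Log/antilog tables for GF(256) with reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D)
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n_ec):
    """Product of (x + alpha^i) for i in 0..n_ec-1, highest-degree coefficient first."""
    g = [1]
    for i in range(n_ec):
        nxt = [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= coef                      # coef * x
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # coef * alpha^i
        g = nxt
    return g


def rs_ec_codewords(data, n_ec):
    """Remainder of data * x^n_ec divided by the generator polynomial."""
    gen = generator_poly(n_ec)
    rem = list(data) + [0] * n_ec
    for i in range(len(data)):
        coef = rem[i]
        if coef:
            for j, g in enumerate(gen):
                rem[i + j] ^= gf_mul(g, coef)
    return rem[len(data):]


def poly_eval(poly, x):
    """Evaluate a polynomial (highest-degree first) at x using Horner's rule."""
    y = 0
    for coef in poly:
        y = gf_mul(y, x) ^ coef
    return y


data = [32, 91, 11, 120, 209, 114, 220, 77, 67, 64, 236, 17, 236, 17, 236, 17]
ec = rs_ec_codewords(data, 10)
syndromes = [poly_eval(data + ec, EXP[i]) for i in range(10)]
print(ec, syndromes)
```

For an undamaged block every syndrome is zero. A scanner computes the same syndromes on what it read, and nonzero values tell it that errors are present and, with further algebra, where they are.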

Reed-Solomon coding is also used in CDs, DVDs, Blu-rays, and space communication — it's a mature error correction algorithm originally developed by Irving Reed and Gustave Solomon in 1960.

QR Code Versions

QR codes come in 40 versions. Version 1 is a 21×21 grid. Each subsequent version adds 4 modules per side: Version 2 is 25×25, Version 3 is 29×29, up to Version 40 at 177×177.
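
The size progression follows a simple formula, shown here as a small Python helper (the function name is just for illustration):

```python
def modules_per_side(version):
    """Side length in modules for a QR code version (1 through 40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions run from 1 to 40")
    return 17 + 4 * version


for v in (1, 2, 3, 40):
    print(v, modules_per_side(v))
```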

Larger versions store more data. At error correction level L, Version 1 holds up to 41 numeric characters, 25 alphanumeric characters, or 17 bytes, while Version 40 holds up to 7,089 numeric characters or 4,296 alphanumeric characters. For a typical URL, Version 3–5 is usually sufficient.

When you use a URL shortener before generating a QR code, you're reducing the payload size and allowing the encoder to pick a smaller version with fewer modules — which makes the code easier to scan at small sizes.
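
A rough way to see the effect of payload length is to look up the smallest version that fits. The sketch below assumes byte mode at error correction level L and uses the byte capacities usually quoted for Versions 1 to 10. Treat the table as an assumption to verify against ISO/IEC 18004 before relying on it; the example URLs are hypothetical.

```python
# Byte-mode capacity at error correction level L for Versions 1..10 (assumed values)
BYTE_CAPACITY_LEVEL_L = [17, 32, 53, 78, 106, 134, 154, 192, 230, 271]


def smallest_version(payload, capacities=BYTE_CAPACITY_LEVEL_L):
    """Return the smallest version whose byte capacity fits the payload, or None."""
    size = len(payload.encode("utf-8"))
    for version, capacity in enumerate(capacities, start=1):
        if size <= capacity:
            return version
    return None  # larger than this table covers


long_url = "https://example.com/products/summer-sale?utm_source=flyer&utm_campaign=launch"
short_url = "https://example.com/s/abc12"
for url in (long_url, short_url):
    version = smallest_version(url)
    print(len(url), "bytes -> Version", version, "->", 17 + 4 * version, "modules per side")
```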

How Scanners Decode

When your phone scans a QR code, the camera software is doing this:

  1. Locate the finder patterns — find the three 1:1:3:1:1 ratio markers.
  2. Determine orientation and perspective — compute the transformation needed to "flatten" the code if it's at an angle.
  3. Read format information — determine error correction level and mask pattern.
  4. Remove the mask: the encoder XORs the data with a mask pattern to prevent large areas of solid color that could confuse the scanner, so the scanner XORs the same pattern again to undo it.
  5. Read data modules — extract the bit sequence from the data region.
  6. Apply error correction — use Reed-Solomon to fix any errors.
  7. Decode — interpret the bit stream according to the encoding mode.

The whole process typically takes under a second on modern hardware.
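
Step 3 is worth a closer look because the format information carries its own error protection. As commonly described, it is 5 data bits (2 for the error correction level, 3 for the mask pattern) extended to 15 bits with a BCH code and then XORed with a fixed pattern. The Python sketch below follows that description; the constants are the widely published ones, but treat them as assumptions to check against the standard. Decoding here is a brute-force nearest match, chosen for clarity rather than speed.

```python
FORMAT_GENERATOR = 0b10100110111     # generator polynomial of the BCH(15,5) code
FORMAT_XOR_MASK = 0b101010000010010  # fixed pattern XORed onto the 15 format bits
EC_LEVEL_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}


def encode_format(ec_level, mask_pattern):
    """Build the 15-bit format word for an error correction level and mask (0-7)."""
    data = (EC_LEVEL_BITS[ec_level] << 3) | mask_pattern
    remainder = data << 10
    # Polynomial long division over GF(2): cancel bits 14 down to 10
    for shift in range(4, -1, -1):
        if remainder & (1 << (shift + 10)):
            remainder ^= FORMAT_GENERATOR << shift
    return ((data << 10) | remainder) ^ FORMAT_XOR_MASK


def decode_format(read_bits):
    """Return (ec_level, mask_pattern) for the closest valid format word, or None."""
    best = None
    for level in EC_LEVEL_BITS:
        for mask in range(8):
            distance = bin(read_bits ^ encode_format(level, mask)).count("1")
            if best is None or distance < best[0]:
                best = (distance, level, mask)
    distance, level, mask = best
    # Valid words differ in at least 7 bits, so up to 3 bit errors are unambiguous
    return (level, mask) if distance <= 3 else None


word = encode_format("L", 0)
damaged = word ^ 0b000010000100001  # flip three bits
print(format(word, "015b"), decode_format(damaged))
```

With only 32 valid format words, comparing against all of them is cheap, and it lets the scanner recover the level and mask even when a few of the format modules are misread.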

URL QR Codes vs. Other Data Types

QR codes are just encoded data — the content could be anything. Common uses:

  • URLs: The most common. Most phones automatically offer to open the link.
  • Plain text: Just text, no action triggered.
  • vCard/meCard: Contact information. Apps can offer to add to contacts.
  • WiFi credentials: WIFI:T:WPA;S:NetworkName;P:password;; — phones can connect automatically (the format is documented on the Wi-Fi Alliance reference).
  • Email/SMS: Pre-filled email addresses or messages.
  • Calendar events (vCal): Scannable event invites.

The "type" isn't encoded separately — apps infer it from the content format. A QR code starting with https:// is a URL; one starting with BEGIN:VCARD is a contact. There's no QR code standard for "this is a URL" — the URL format itself is the signal.
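
A scanner app's content sniffing can be approximated with a prefix table. The sketch below is a simplified Python illustration; the prefix list is an assumption that covers the types mentioned above, and real apps recognize more formats and apply stricter parsing.

```python
# (prefix, kind) pairs checked in order; prefixes are compared case-insensitively
PREFIXES = [
    ("HTTP://", "url"),
    ("HTTPS://", "url"),
    ("BEGIN:VCARD", "contact"),
    ("MECARD:", "contact"),
    ("WIFI:", "wifi"),
    ("MAILTO:", "email"),
    ("SMSTO:", "sms"),
    ("BEGIN:VCALENDAR", "calendar"),
    ("BEGIN:VEVENT", "calendar"),
]


def classify_payload(text):
    """Guess what kind of content a decoded QR payload holds."""
    head = text.lstrip().upper()
    for prefix, kind in PREFIXES:
        if head.startswith(prefix):
            return kind
    return "text"  # nothing matched: show it as plain text


for sample in ("https://example.com", "BEGIN:VCARD\nVERSION:3.0", "Table 12"):
    print(repr(sample), "->", classify_payload(sample))
```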

Optimization Tips

If you're generating QR codes for print materials or small displays:

  • Use a URL shortener for long URLs — fewer characters means a smaller version and less visual complexity.
  • Use error correction level L or M unless the code will be in a rough environment. Higher error correction = larger code.
  • Add a quiet zone, the white border around the code, at least 4 modules wide. Without it, scanners can't locate the finder patterns reliably.
  • Maintain contrast — black on white is ideal. Colored codes work but increase scan failure rates. Avoid light-on-dark.
  • Test at the intended print size. A Version 5 code is 37 modules wide, so printed at 1 cm each module is under 0.3 mm, which is practically unscannable.
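
To check a planned print size, divide the printed width by the module count, including the quiet zone. The helper below is a small Python sketch with made-up names. It does not claim a minimum workable module size, since that depends on the printer, the camera, and the scanning distance.

```python
QUIET_ZONE_MODULES = 4  # minimum quiet zone on each side


def module_size_mm(version, printed_width_mm, include_quiet_zone=True):
    """Width of one module in millimetres for a code printed at the given width."""
    modules = 17 + 4 * version
    if include_quiet_zone:
        modules += 2 * QUIET_ZONE_MODULES
    return printed_width_mm / modules


for width_mm in (10, 25, 50):
    print(f"Version 5 at {width_mm} mm: {module_size_mm(5, width_mm):.2f} mm per module")
```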

For more on encoding and how data transforms between representations, Encoding vs. Encryption vs. Hashing is a good companion read. The Number Systems Explained post also covers the binary and base representations underlying data encoding.

The QR Code Generator tool generates codes for any text or URL and lets you download them as PNG. The URL Encoder is useful for cleaning up URLs before encoding — percent-encoding special characters ensures the URL inside a QR code is valid.


QR codes are a rare example of a 1990s industrial barcode format that found mass consumer adoption 25 years later. The engineering decisions that made them robust enough for factory use — error correction, orientation-independent scanning, version scalability — turned out to be exactly what makes them practical on restaurant tables.

FAQ

What's the maximum amount of data a QR code can hold?

Version 40 (the largest) at error correction level L can hold up to 7,089 numeric digits, 4,296 alphanumeric characters, 2,953 bytes, or 1,817 Kanji characters. In practice, a payload beyond a few hundred characters produces a code with so many modules that it becomes hard to scan at typical print sizes. Best practice: keep payloads under 300 characters and use a URL shortener for long links.

Why are some QR codes colorful or have logos in the middle?

The high error correction levels (Q at 25%, H at 30%) let you obscure or replace up to 30% of the modules without breaking decode. Logo-in-center QR codes typically use level H and place the logo in a region that doesn't overlap finder or alignment patterns. Colored codes need maintained contrast — black-on-white is most reliable, dark-on-light works, but light-on-dark and busy patterns frequently fail to scan.

How does my phone scan a QR code so quickly?

Modern phones run a continuous detection algorithm on the camera preview, typically at around 30 fps. Each frame is searched for the 1:1:3:1:1 dark/light ratio of the finder patterns; once all three are located, the code can be sampled and decoded within milliseconds. Platform libraries such as Apple's Vision framework and Google's ML Kit provide built-in barcode detection that copes with rotated or partly damaged codes. The whole process from "code in viewfinder" to "URL offered" usually takes well under a second.

Are QR codes a security risk?

Yes. A QR code that carries a URL is only as safe as that URL, and a malicious URL is still a malicious URL. "Quishing" (QR phishing) attacks use printed codes pasted over legitimate ones in public places, redirecting users to fake login pages. Modern phone scanners show the URL before opening, but users frequently dismiss this. For sensitive contexts (parking meters, payment terminals), prefer typed URLs or app-specific scanners that validate the destination.

What's the difference between QR Code and Aztec, Data Matrix, MaxiCode?

QR Code is the most common consumer-facing 2D barcode. Data Matrix is smaller and used heavily in industrial labeling and small-package marking (medication, electronics components). Aztec is used in transit (airline boarding passes, train tickets) because it doesn't need a quiet zone. MaxiCode is UPS-specific. They all use similar finder-pattern + error-correction approaches; QR won the consumer space because of mobile camera ubiquity.

Why do WiFi QR codes use the format `WIFI:T:WPA;S:network;P:password;;`?

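That format is a de facto convention rather than part of the QR standard: ISO/IEC 18004 says nothing about WiFi credentials. It originated in the ZXing barcode library used by early Android scanner apps and was later adopted by the built-in camera apps on iOS 11+ (2017) and Android 10+ (2019). The fields are: T = security type (WPA/WEP/nopass), S = SSID, P = password. Each field ends with a semicolon, and one more semicolon closes the whole string, which is why it ends in ;;. Special characters inside the SSID or password (backslash, semicolon, comma, colon, double quote) need a backslash escape. Always test before printing because some older devices reject malformed strings.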
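
Building the string by hand is error-prone once the SSID or password contains punctuation. The Python sketch below assembles the payload and backslash-escapes the characters the ZXing documentation lists as special. The function names are hypothetical, and it covers only the T, S, and P fields discussed above.

```python
SPECIAL_CHARS = '\\;,:"'  # characters that need a backslash escape inside a field


def escape_wifi_field(value):
    """Backslash-escape special characters in an SSID or password."""
    out = []
    for ch in value:
        if ch in SPECIAL_CHARS:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def wifi_payload(ssid, password="", security="WPA"):
    """Build a WIFI: payload string for a QR code."""
    if security not in ("WPA", "WEP", "nopass"):
        raise ValueError("security must be WPA, WEP, or nopass")
    fields = [f"T:{security}", f"S:{escape_wifi_field(ssid)}"]
    if security != "nopass":
        fields.append(f"P:{escape_wifi_field(password)}")
    # every field is terminated by ';', and a final ';' closes the string
    return "WIFI:" + ";".join(fields) + ";;"


print(wifi_payload("NetworkName", "password"))
print(wifi_payload("Cafe; Guest", "p:ss,word"))
```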

Should I use error correction level H for outdoor signage?

Yes if the surface might get damaged, dirty, or weathered — H tolerates ~30% module loss. The downside is a larger code (more modules for the redundant data), so you need more printed area to maintain scannability. For indoor laminated signage that won't be damaged, level M (15%) is plenty and produces smaller codes. Level L (7%) is fine for digital displays where the code is always pristine.
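
The trade-off is easy to quantify for Version 1, which has 26 codewords in total. The Python sketch below uses the per-level figures as they are usually tabulated for Version 1 (total codewords, data codewords, correctable codewords). These are reproduced from memory of the standard's table, so treat them as assumptions and verify them against ISO/IEC 18004.

```python
# Version 1: (total codewords, data codewords, correctable codewords) per level (assumed)
VERSION_1_BLOCKS = {
    "L": (26, 19, 2),
    "M": (26, 16, 4),
    "Q": (26, 13, 6),
    "H": (26, 9, 8),
}


def tradeoff(level):
    """Share of the code that carries data vs. share that can be damaged."""
    total, data, correctable = VERSION_1_BLOCKS[level]
    return {
        "data_share": data / total,
        "recoverable_share": correctable / total,
    }


for level in "LMQH":
    t = tradeoff(level)
    print(level, f"data {t['data_share']:.0%}", f"recoverable {t['recoverable_share']:.1%}")
```

Moving from L to H in the same version roughly halves the room for data, which is why a fixed payload often needs a higher version, and therefore more printed area, at level H.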

How does the masking pattern work?

After the error correction codewords are computed and all codewords are placed in the grid, the encoder applies one of 8 mask patterns (XOR overlays) to the data and error correction modules; finder, timing, alignment, and format regions are left untouched. The encoder tries each pattern, scores the result with penalty rules that punish large solid blocks, long same-color runs, and finder-like sequences, and keeps the lowest-scoring pattern. The chosen pattern is recorded in the format information area, and scanners XOR the same pattern again to undo it before decoding. This is why two QR codes with identical content can look different when encoders pick different masks.
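
The eight patterns are simple functions of a module's row and column. Here is a minimal Python sketch of applying one, using the formulas as they are commonly published (worth checking against the standard). The `is_data` callback is a hypothetical stand-in for the real rule that excludes function patterns, and the penalty scoring used to choose a mask is not shown.

```python
# Each function returns True where the module at (row, col) should be flipped
MASKS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(grid, pattern, is_data=lambda r, c: True):
    """XOR a mask pattern onto the data modules of a grid of 0/1 values."""
    flip = MASKS[pattern]
    return [
        [bit ^ 1 if is_data(r, c) and flip(r, c) else bit for c, bit in enumerate(row)]
        for r, row in enumerate(grid)
    ]


blank = [[0] * 6 for _ in range(6)]  # a worst case: one large solid block
masked = apply_mask(blank, 0)
for row in masked:
    print("".join("#" if bit else "." for bit in row))
print(apply_mask(masked, 0) == blank)
```

Because XOR is its own inverse, applying the same pattern a second time restores the original modules, which is all the scanner has to do.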