What YAML Is and Where You'll Find It
YAML (YAML Ain't Markup Language — yes, it's a recursive acronym) started as a data serialization format designed for human readability. The first spec was published in 2001 and the official YAML site hosts the canonical specs. It was initially pitched as a general-purpose alternative to XML, but it found its real home in configuration files.
You've almost certainly written YAML if you've used any of these:
- Docker Compose: docker-compose.yml defines services, volumes, and networks
- Kubernetes: every manifest is YAML, including Deployments, Services, and ConfigMaps (see the Kubernetes API docs)
- GitHub Actions: .github/workflows/*.yml defines CI/CD pipelines, documented at GitHub Actions
- Ansible: playbooks and inventory files
- Ruby on Rails: database.yml, config/application.yml
- Jest / ESLint / many Node.js tools: accept YAML config files
The pitch: config files are written by humans and should be readable by humans. YAML delivers on that in simple cases. It's when things get complex that the quirks start to bite.
The Indentation Rules
YAML uses indentation to represent structure. There are no braces or brackets in block style — the nesting is entirely determined by whitespace. This rule is absolute:
Tabs are forbidden. YAML only allows spaces for indentation. Mixing tabs and spaces causes a parse error. Most editors configured for YAML convert tabs to spaces automatically, but if you're copying from somewhere that uses tabs, expect cryptic errors.
The indentation level must be consistent within a block, but can change between blocks. Two spaces per level is the convention.
database:
  host: localhost
  port: 5432
  credentials:
    username: app_user
    password: secret
This is a mapping (YAML's term for key-value dictionary) nested three levels deep. Each level adds two spaces. The same structure in JSON would be:
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "credentials": {
      "username": "app_user",
      "password": "secret"
    }
  }
}
Sequences (arrays) use a hyphen-space prefix:
services:
  - web
  - worker
  - scheduler
And you can mix them:
servers:
  - host: web1.example.com
    port: 80
    tags:
      - primary
      - loadbalanced
  - host: web2.example.com
    port: 80
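For orientation, here is roughly what an application receives after parsing the mixed example above, written out by hand as a plain Python structure. No YAML library is involved, and the variable names (`config`, `first_tags`) are only illustrative.

```python
import json

# Hand-written equivalent of the parsed YAML: a mapping whose "servers" key
# holds a sequence of mappings, one of which contains a nested sequence.
config = {
    "servers": [
        {"host": "web1.example.com", "port": 80, "tags": ["primary", "loadbalanced"]},
        {"host": "web2.example.com", "port": 80},
    ]
}

# Sequences become lists and mappings become dicts, so normal indexing applies.
first_tags = config["servers"][0]["tags"]

# The second server simply has no "tags" key; nothing is filled in for it.
second_has_tags = "tags" in config["servers"][1]

# The same data serialized as JSON, for comparison with the block-style YAML.
as_json = json.dumps(config, indent=2)
```

The hyphen starts a new list item, and every key indented under it belongs to that item's mapping.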
Data Types and the "Norway Problem"
YAML infers types automatically. Convenient, and also responsible for some legendarily painful bugs.
The basic types work as you'd expect:
name: Alice # string
age: 30 # integer
score: 9.5 # float
active: true # boolean
nothing: null # null (also ~)
The Norway problem (also called the "yes" problem) comes from YAML 1.1's over-eager boolean coercion — see the Wikipedia entry on the YAML "Norway problem" for the full backstory. In YAML 1.1, yes, no, on, off, true, false (and their uppercase variants) are all booleans. This means:
country_codes:
  NO: Norway   # YAML 1.1 parses NO as false!
  YES: Yemen   # Parses as true
This caused real bugs in applications with country codes, feature flags, and config keys named things like on or off. YAML 1.2 fixed this: only spellings of true and false are booleans. But many parsers still implement 1.1 behavior, PyYAML among them.
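The difference between the two versions can be sketched as a lookup table. This is a simplified illustration of the resolution rule, not the code of any real parser; `resolve_plain_scalar` and the table names are hypothetical.

```python
# Spellings the YAML 1.1 bool type accepts for plain (unquoted) scalars.
# The 1.1 type definition also lists single-letter y/n forms, left out of this sketch.
YAML_1_1_BOOLS = {
    **dict.fromkeys(["yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"], True),
    **dict.fromkeys(["no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"], False),
}

# YAML 1.2 core schema: only spellings of true and false.
YAML_1_2_BOOLS = {
    **dict.fromkeys(["true", "True", "TRUE"], True),
    **dict.fromkeys(["false", "False", "FALSE"], False),
}


def resolve_plain_scalar(text, version="1.1"):
    """Return a bool if the unquoted scalar is a boolean spelling, else the text unchanged."""
    table = YAML_1_1_BOOLS if version == "1.1" else YAML_1_2_BOOLS
    return table.get(text, text)


country_codes = ["NO", "SE", "DK"]
parsed_1_1 = [resolve_plain_scalar(code, "1.1") for code in country_codes]
parsed_1_2 = [resolve_plain_scalar(code, "1.2") for code in country_codes]
```

Under the 1.1 table the first entry comes back as the boolean false while its neighbours stay strings, which is why the bug tends to surface only for Norway.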
The safe defense: quote anything that could be misinterpreted.
status: "on"
country: "NO"
feature_enabled: "true" # or just use actual true/false for booleans
Similarly, watch out for bare strings that look like other types:
version: 1.0 # float: 1.0
version: "1.0" # string: "1.0" — be explicit if you need the string
port: 8080 # integer
zip_code: 01234 # integer, not a string: 1234 in YAML 1.2, 668 in YAML 1.1 (leading zero means octal). Quote it.
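What happens to a leading zero depends on the YAML version the parser follows, and a float cannot remember trailing zeros. The sketch below mimics both effects with Python's built-in conversions; `leading_zero_int` is a hypothetical helper, not a parser API.

```python
def leading_zero_int(text, version):
    """Interpret a scalar made of digits 0-7 that starts with a zero.

    YAML 1.1 reads the leading zero as an octal prefix;
    the YAML 1.2 core schema reads the digits as plain decimal.
    """
    if version == "1.1":
        return int(text, 8)
    return int(text, 10)


zip_1_1 = leading_zero_int("01234", "1.1")   # octal interpretation
zip_1_2 = leading_zero_int("01234", "1.2")   # decimal interpretation

# A version written as 1.10 resolves to a float, and the float prints as 1.1.
version_round_trip = str(float("1.10"))
```

Either way the original text is gone by the time the application sees the value, so quoting is the only fix that works across parsers.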
Block vs Flow Style
YAML supports two styles: block (the indented multi-line form) and flow (the compact inline form). Flow style uses JSON-like syntax and is valid YAML.
# Block style
colors:
  - red
  - green
  - blue

# Flow style (valid YAML)
colors: [red, green, blue]

# Block mapping vs flow mapping
person:
  name: Alice
  age: 30

person: {name: Alice, age: 30}
Flow style is useful for short lists and mappings that would waste vertical space in block style. Most style guides prefer block style for readability in config files. Some emitters also fall back to flow style for the innermost collections when dumping nested data.
Multiline Strings
YAML has two multiline string operators, and getting them confused causes subtle bugs.
The literal block scalar (|) preserves newlines exactly:
script: |
  #!/bin/bash
  set -e
  npm install
  npm test
This produces the string "#!/bin/bash\nset -e\nnpm install\nnpm test\n". A single trailing newline is kept by default. Use |- to strip it.
The folded block scalar (>) replaces newlines with spaces (except blank lines, which become newlines):
description: >
  This is a long description that
  wraps across multiple lines for
  readability in the source file.

  A blank line creates a paragraph break.
This produces "This is a long description that wraps across multiple lines for readability in the source file.\nA blank line creates a paragraph break.\n". Use >- to strip the final newline.
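The two operators can be approximated in a few lines of Python. This is a simplified sketch of the rules described above, assuming default chomping and ignoring the special handling YAML gives to more-indented lines; `literal`, `folded`, and `strip_chomp` are hypothetical names.

```python
import textwrap


def literal(block):
    """Sketch of `|`: keep line breaks and end with exactly one newline."""
    return textwrap.dedent(block).strip("\n") + "\n"


def folded(block):
    """Sketch of `>`: join adjacent lines with spaces, turn blank lines into newlines."""
    pieces = []
    blank_run = 0
    for line in textwrap.dedent(block).strip("\n").split("\n"):
        if not line.strip():
            blank_run += 1  # remember blank lines until the next text line
            continue
        if pieces:
            # One newline per blank line seen, or a single space if there were none.
            pieces.append("\n" * blank_run if blank_run else " ")
        pieces.append(line)
        blank_run = 0
    return "".join(pieces) + "\n"


def strip_chomp(value):
    """Sketch of the `-` indicator in `|-` and `>-`: drop the trailing newline."""
    return value.rstrip("\n")


SCRIPT = """
    #!/bin/bash
    set -e
    npm install
    npm test
"""

DESCRIPTION = """
    This is a long description that
    wraps across multiple lines for
    readability in the source file.

    A blank line creates a paragraph break.
"""

script_value = literal(SCRIPT)
description_value = folded(DESCRIPTION)
```

Feeding a shell script through `folded` would collapse it onto one line, which is the usual symptom of picking `>` where `|` was meant.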
In GitHub Actions, the | operator is ubiquitous for inline shell scripts:
- name: Run tests
  run: |
    npm ci
    npm test
    npm run lint
Anchors and Aliases
YAML lets you define a value once and reuse it elsewhere in the same file with anchors (&) and aliases (*). The merge key (<<:) extends this to merge mappings.
# Define an anchor
defaults: &defaults
  timeout: 30
  retries: 3
  environment: production

# Use the anchor in other keys
api_service:
  <<: *defaults
  port: 8080

worker_service:
  <<: *defaults
  port: 8081
  timeout: 120 # override just this field
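This is particularly useful in Docker Compose for sharing common service configuration, and in GitHub Actions for reusing step definitions. The <<: key merges all keys from the referenced mapping; keys set explicitly in the same mapping override the merged values, wherever they appear relative to the merge line. The merge key is an optional YAML 1.1 extension rather than part of the YAML 1.2 core schema, so support varies by parser.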
Note that anchors are a YAML-level feature — they're resolved during parsing, so what you get in your application is the fully merged result. The anchor names don't appear in the parsed data structure.
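The merge can be pictured as a dictionary update. The sketch below reproduces the example by hand in Python rather than calling a YAML parser; `merge_mapping` is a hypothetical helper that only illustrates the precedence rule.

```python
def merge_mapping(anchored, explicit):
    """Sketch of `<<:`: start from the anchored mapping, then let explicit keys win."""
    merged = dict(anchored)   # copy, so the anchored mapping itself is not modified
    merged.update(explicit)   # explicit keys override merged ones
    return merged


defaults = {"timeout": 30, "retries": 3, "environment": "production"}

api_service = merge_mapping(defaults, {"port": 8080})
worker_service = merge_mapping(defaults, {"port": 8081, "timeout": 120})
```

The results are ordinary mappings: `worker_service` carries its own timeout of 120, and neither mapping contains any trace of the anchor name.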
Common YAML Gotchas
Tabs causing parse errors. Most common in files copy-pasted from a source that uses tab indentation. The error message is usually something like found character that cannot start any token. Check your editor's "show whitespace" feature.
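When the editor is not at hand, a few lines of Python can point at the offending lines. This is a minimal sketch using only the standard library; `find_tab_indented_lines` is a hypothetical helper and the sample text is made up for the demonstration.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# The third line is indented with a tab instead of spaces.
sample = "database:\n  host: localhost\n\tport: 5432\n"
tab_lines = find_tab_indented_lines(sample)
```

Only leading whitespace is checked, so a tab inside a quoted value is not reported.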
Unquoted special characters. A colon followed by a space (: ) is the key-value separator, so it must not appear unquoted inside a value. A URL like http://example.com/path is technically safe because :// has no space after the colon — but any value containing : (colon-space) anywhere will break parsing. The safest habit is to quote any value that contains a colon.
# Dangerous — colon-space inside the value breaks parsing
redirect: http://example.com/path?foo: bar
# Safe — always quote values containing colons
url: "http://example.com/path"
redirect: "http://example.com/path?foo: bar"
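If YAML is being generated by a script, the same habit can be automated. The sketch below covers only the colon-space case discussed here and is not a complete quoting rule; `quote_if_needed` is a hypothetical helper, and it relies on the fact that a JSON string literal is also a valid YAML double-quoted string.

```python
import json


def quote_if_needed(value):
    """Wrap the value in double quotes when it contains a colon followed by a space."""
    if ": " in value:
        # json.dumps adds the quotes and escapes any embedded quotes or backslashes.
        return json.dumps(value)
    return value


plain_url = quote_if_needed("http://example.com/path")
risky_url = quote_if_needed("http://example.com/path?foo: bar")
```

The first URL passes through untouched because its only colon is followed by a slash, while the second gains quotes.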
Accidental type coercion. Already covered — version strings, zip codes, country codes, on/off values.
Indentation-based scope bugs. A single extra space at the beginning of a line puts that key in the wrong parent object. Easy to introduce by accident, sometimes hard to spot visually.
YAML in GitHub Actions is YAML 1.1. The actions/runner uses a YAML 1.1 parser. Boolean coercion gotchas apply. Always quote values like on, off, yes, no in Actions workflows.
YAML vs JSON Equivalence
Any JSON document is valid YAML 1.2 (flow style covers all of JSON's syntax). But not all YAML is valid JSON — block style, comments, anchors, and the richer type system are all YAML-only.
The JSON to YAML tool converts between the formats in your browser. It's handy when you have a JSON config and need to migrate it to YAML, or when a Kubernetes manifest needs to round-trip to JSON for a tool that requires it. For validating the resulting JSON structure, JSON Formatter highlights syntax errors and lets you explore the tree.
For a broader look at data format tradeoffs, JSON Basics and Syntax covers the fundamentals of JSON, and XML vs JSON compares the two most common structured formats.
The official YAML specification is the authoritative reference, and the YAML 1.2 changelog documents the specific differences from 1.1 — worth reading if you're debugging type coercion issues.
FAQ
Why does YAML treat `NO` as false?
That's the Norway problem — YAML 1.1 treats yes, no, on, off, true, and false (in any case) as booleans. So a country code NO for Norway silently becomes false. YAML 1.2 fixed this — only true and false are booleans. But many parsers still implement 1.1, including GitHub Actions' actions/runner. Always quote strings that could be misinterpreted: country: "NO".
Can I use tabs for indentation in YAML?
No — YAML explicitly forbids tabs for indentation. Only spaces are allowed. Mixing tabs and spaces causes a parse error with cryptic messages like "found character that cannot start any token." Configure your editor to convert tabs to spaces for .yml and .yaml files. The convention is 2 spaces per indentation level, though any consistent number works within a block.
Should I use YAML or TOML for config files?
For deeply nested configs (Kubernetes, Ansible, GitHub Actions), YAML's compact indentation wins. For shallow configs with explicit types (CLI tools, Rust/Python projects), TOML is safer — explicit types, no Norway problem, real spec compliance. The depth of nesting is the deciding factor: 1-2 levels favor TOML, 4+ levels favor YAML.
Is YAML a superset of JSON?
YAML 1.2 is — every valid JSON document is valid YAML 1.2 (using flow style). YAML 1.1 isn't quite, due to subtle differences in number parsing and unquoted strings. The reverse isn't true: YAML's block style, comments, anchors, and richer types have no JSON equivalent. Most YAML parsers can output to JSON and vice versa, but you lose YAML-specific features in the conversion.
What's the difference between `|` and `>` in YAML strings?
| (literal) preserves newlines exactly — useful for shell scripts or anything where line breaks matter. > (folded) collapses newlines into spaces, with blank lines becoming paragraph breaks — useful for prose that wraps in the source for readability. Use |- or >- to strip the trailing newline. GitHub Actions uses | constantly for inline shell scripts.
Why does my zip code `01234` become `1234`?
YAML resolves unquoted numeric-looking values to numbers, so the leading zero does not survive. A YAML 1.2 parser reads 01234 as the decimal integer 1234, while a YAML 1.1 parser treats the leading zero as an octal prefix and returns 668. The same problem affects version strings (1.0 becomes float 1.0, not string "1.0"), phone numbers, and IDs that happen to be all digits. Always quote values where the string representation matters: zip: "01234", version: "1.0". If in doubt, quote it.
How do I share configuration between multiple YAML keys?
Use anchors (&name) and aliases (*name), with merge keys (<<:) for mappings. Define a block once with &defaults, then reference it with *defaults or merge it into another block with <<: *defaults. Particularly useful in Docker Compose for shared service configuration. Anchors are resolved during parsing, so the application sees the fully merged result — anchor names don't appear in the parsed data structure.
Why do GitHub Actions workflows have so many quoted strings?
Because GitHub Actions uses a YAML 1.1 parser, which is aggressive about boolean coercion. Workflow names, job names, step IDs, and matrix values often need quoting if they could be misread as booleans (on, off, yes, no) or numbers. The actions documentation specifically recommends quoting strings to avoid surprises. It's verbose but reliable.