If you've worked with log files, ML training datasets, or data pipelines, you've probably encountered JSONL without knowing its name. Each line is a valid JSON object, and the format solves a set of problems that regular JSON arrays handle poorly at scale.
What JSONL Actually Is
JSONL stands for JSON Lines. Each line in the file is a complete, self-contained JSON value — usually an object — built on top of the JSON data interchange format defined in RFC 8259. Lines are separated by newlines (\n).
{"id": 1, "event": "click", "ts": "2024-01-15T10:00:00Z", "user": "alice"}
{"id": 2, "event": "view", "ts": "2024-01-15T10:00:03Z", "user": "bob"}
{"id": 3, "event": "purchase", "ts": "2024-01-15T10:01:12Z", "user": "alice", "amount": 49.99}
That's it. No wrapping array, no commas between objects, no outer brackets. Each line is independently parseable.
The format has a few aliases you'll see in the wild: NDJSON (Newline Delimited JSON), JSON Lines, and occasionally LDJSON (Line Delimited JSON). They all describe the same thing. The official JSON Lines site uses the .jsonl extension, and that's become the de facto standard file extension.
Why It Exists: The Problems with JSON Arrays
A regular JSON array looks like this:
[
  {"id": 1, "event": "click"},
  {"id": 2, "event": "view"},
  {"id": 3, "event": "purchase"}
]
Fine for small datasets. But at scale, it has real problems:
You can't stream it without parsing the whole thing. A JSON parser typically needs the complete document. To read record 500,000 from a 10GB JSON array, you have to load (or at least scan) all preceding bytes.
You can't append to it. Adding a new record means rewriting the file — at minimum, removing the closing ], adding a comma, adding the new object, and adding ] back. Atomic appends are impossible.
Tools like grep, wc, and sort can't work on individual records. Because a JSON array has structure spanning multiple lines, line-oriented Unix tools become useless for inspection.
JSONL eliminates all three. Each line is complete. You can read line by line, append with a simple file write, and grep '"event": "purchase"' events.jsonl just works.
Reading and Writing JSONL in Python
Python makes JSONL trivial:
import json

# Writing JSONL
records = [
    {"id": 1, "event": "click", "user": "alice"},
    {"id": 2, "event": "view", "user": "bob"},
]
with open("events.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading JSONL
with open("events.jsonl", "r") as f:
    for line in f:
        line = line.strip()
        if line:  # skip empty lines
            record = json.loads(line)
            print(record)

# Appending a new record
with open("events.jsonl", "a") as f:
    f.write(json.dumps({"id": 3, "event": "purchase"}) + "\n")
The if line guard is important — trailing newlines or blank separator lines in some JSONL files will cause json.loads("") to raise an exception.
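If you read JSONL in more than one place, it's worth wrapping that loop in a small generator so the blank-line guard lives in one spot. A minimal sketch (the iter_jsonl name is just for illustration):

import json

def iter_jsonl(path):
    """Yield one parsed record per non-empty line, keeping memory use flat."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines and the trailing newline
                yield json.loads(line)

for record in iter_jsonl("events.jsonl"):
    print(record["event"])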
Reading and Writing JSONL in Node.js
const fs = require('fs');
const readline = require('readline');

// Reading JSONL (streaming, line by line)
async function readJsonl(filePath) {
  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({ input: fileStream });
  for await (const line of rl) {
    if (line.trim()) {
      const record = JSON.parse(line);
      console.log(record);
    }
  }
}

// Writing JSONL
function writeJsonl(filePath, records) {
  const lines = records.map(r => JSON.stringify(r)).join('\n') + '\n';
  fs.writeFileSync(filePath, lines, 'utf8');
}

// Appending a single record
function appendJsonl(filePath, record) {
  fs.appendFileSync(filePath, JSON.stringify(record) + '\n', 'utf8');
}
The readline interface reads the file as a stream, so memory usage stays constant regardless of file size. Loading a 10GB JSONL file in one shot with fs.readFileSync(...) will exhaust memory before you even get to parse it (and JSON.parse on the whole file would fail anyway, since a JSONL file is not a single JSON document). With readline, you process one line at a time.
Where JSONL Is Actually Used
Log files and event streams. Application logs that need structure beyond plain text are often JSONL. Each log entry is one line, parseable independently. Tools like Fluentd, Logstash, and Vector can process JSONL streams natively — see the Fluentd JSON formatter docs for an example of native JSONL output.
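You don't need a log shipper to produce logs in this shape; "one json.dumps per event" is the whole trick. A minimal sketch (the log_event helper and its field names are illustrative, not a standard):

import json
import time

def log_event(f, level, message, **fields):
    # One self-contained JSON object per log line.
    entry = {"ts": time.time(), "level": level, "message": message, **fields}
    f.write(json.dumps(entry) + "\n")
    f.flush()  # a flushed line is immediately visible to tail/grep

with open("app.log.jsonl", "a") as f:
    log_event(f, "info", "user signed in", user="alice")
    log_event(f, "error", "payment failed", user="bob", amount=49.99)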
Machine learning training data. OpenAI's fine-tuning API accepts training data as JSONL. Hugging Face datasets are often distributed as JSONL. The format handles millions of examples with constant memory overhead, which matters when you're processing training sets that won't fit in RAM.
Data pipelines and ETL. When you're moving data between systems, JSONL is a natural intermediate format. It's easy to produce, easy to consume, and each record can have different fields without breaking the parser.
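A transform step in such a pipeline can read, modify, and write records one at a time, so memory stays flat however large the input gets. A sketch with hypothetical filenames and field names:

import json

with open("raw_events.jsonl") as src, open("clean_events.jsonl", "w") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record["source"] = "web"          # enrich with a new field
        record.pop("internal_id", None)   # drop one we don't want downstream
        dst.write(json.dumps(record) + "\n")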
Database exports. MongoDB's `mongoexport` produces JSONL — one document per line. (mongodump produces binary BSON files, not JSONL.) JSONL is also a natural output format for tools like jq in pipeline mode.
Comparing JSONL to a JSON Array
| Property | JSONL | JSON Array |
|---|---|---|
| Streaming-friendly | Yes | No |
| Appendable | Yes | No |
| grep-friendly | Yes | No |
| Full-file parse required | No | Yes |
| Valid JSON | Each line is | The whole file is |
| Human readability | Good | Good |
| Size | Slightly smaller (no outer brackets) | Slightly larger |
For data interchange between two APIs, a JSON array is often fine — the payload is small and you want the whole thing at once. For anything large, append-heavy, or streaming, JSONL wins cleanly.
Gotchas to Watch For
Trailing newlines. Most well-formed JSONL files end with a newline after the last record. Some don't. Always guard against empty lines when reading, as shown in the examples above.
No standard MIME type. JSONL doesn't have an officially registered MIME type. application/x-ndjson and application/jsonl are both used in practice. Pick one and document it.
Not the same as pretty-printed JSON. A multi-line pretty-printed JSON object is not JSONL — each complete record must be on a single line. If your records span multiple lines, they're not JSONL-parseable.
Mixed types per line. JSONL doesn't require all lines to have the same schema. In practice most JSONL files are homogeneous, but there's nothing in the spec that requires it.
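The practical consequence is that reading code should not assume every key exists on every line; a quick sketch:

import json

lines = [
    '{"id": 1, "event": "click"}',
    '{"id": 3, "event": "purchase", "amount": 49.99}',
]

for line in lines:
    record = json.loads(line)
    # Both lines parse fine; optional fields need an explicit default.
    print(record["event"], record.get("amount", 0.0))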
For inspecting the structure of JSONL records, paste a single line into our JSON Formatter to explore and validate it. If you need to convert between JSONL and tabular formats, JSON to CSV and CSV to JSON handle the conversion.
For the basics of JSON itself, see JSON Basics and Syntax. And if you're working with tabular data formats, CSV and TSV: The Universal Data Format covers the other side of the equation.
JSONL is a simple idea that solves real problems. If you're building anything that involves logs, datasets, or streaming data, it belongs in your toolkit.
FAQ
What's the difference between JSONL, NDJSON, and LDJSON?
They're three names for the same thing — newline-delimited JSON where each line is a complete JSON value. JSONL (JSON Lines) is the most common name in 2026, used by OpenAI, Hugging Face, and most data tooling. NDJSON is older terminology, still common in streaming APIs. LDJSON appears occasionally in older specs. Pick .jsonl as the file extension; everyone recognizes it.
Can I convert a regular JSON array to JSONL?
Yes, in one line: jq -c '.[]' input.json > output.jsonl. The -c flag forces compact (single-line) output, and .[] iterates over the array. In Python: for record in json.load(f): print(json.dumps(record)). The conversion is lossless as long as your records were already independent — JSONL doesn't support a top-level wrapper object.
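As a standalone script, the Python version looks like this (assuming input.json holds a single top-level array):

import json

# Convert a JSON array file to JSONL: one compact object per line.
with open("input.json") as src, open("output.jsonl", "w") as dst:
    for record in json.load(src):
        dst.write(json.dumps(record) + "\n")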
Is JSONL valid JSON?
The whole file is not valid JSON, but each line is. That's the entire point — you parse line-by-line, not as a single document. A JSON parser given the whole JSONL file as input will fail because it expects exactly one top-level value. Tools that handle JSONL specifically (jq with -c, OpenAI's fine-tuning API, MongoDB's mongoexport) know to read line-by-line.
Why does OpenAI's fine-tuning API require JSONL?
Because training datasets are huge, and JSONL streams cleanly — you can write each example as a separate line without rewriting the whole file. It also makes appending new examples atomic (just write a new line) and lets the API process records in parallel without parsing the entire dataset upfront. The format is also grep-friendly for spot-checking specific examples.
Can I use JSONL for HTTP streaming responses?
Yes — set the Content-Type to application/x-ndjson (the de facto standard, though not officially registered) and stream one JSON object per line. The client reads the response stream line-by-line, parsing each line as it arrives. This is how OpenAI's streaming completions and many real-time APIs deliver incremental data without buffering the whole response.
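On the client side the pattern is the same line-at-a-time loop, just over a network stream. A minimal sketch with the requests library (the URL is hypothetical):

import json
import requests

with requests.get("https://api.example.com/events", stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # iter_lines also yields empty keep-alive lines
            record = json.loads(line)
            print(record)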
What's the right way to handle errors in JSONL processing?
Always wrap each json.loads() call in a try-except (or equivalent) so a single malformed line doesn't kill the whole pipeline. Common errors: trailing newlines (skip empty lines), partial writes (line cut off mid-write), encoding issues (decode errors on non-UTF-8 input). Log bad lines with line numbers but continue processing. JSONL's resilience to partial corruption is one of its biggest advantages over JSON arrays.
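Concretely, that pattern looks like this (logging and skipping is one reasonable policy; you might prefer to fail fast or write bad lines to a quarantine file):

import json

good, bad = 0, 0
with open("events.jsonl", encoding="utf-8", errors="replace") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # blank line or trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            bad += 1
            print(f"skipping line {lineno}: {exc}")
            continue
        good += 1
        # ... process record ...

print(f"parsed {good} records, skipped {bad} malformed lines")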
How does JSONL compare to Apache Parquet for data storage?
Different tools for different jobs. JSONL is row-oriented, human-readable, append-friendly, and works with line-based tools — great for logs and small-to-medium datasets. Parquet is column-oriented, binary, and optimized for analytical queries on large datasets — 5-10× smaller and 10-100× faster for column-projection queries. Use JSONL for ingestion and ETL; convert to Parquet for long-term storage and analytics.
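The conversion itself is a couple of lines with pandas, which reads JSONL natively via lines=True (this loads the whole file into a DataFrame, so it assumes the file fits in memory, and writing Parquet needs pyarrow or fastparquet installed):

import pandas as pd

# Row-oriented JSONL in, column-oriented Parquet out.
df = pd.read_json("events.jsonl", lines=True)
df.to_parquet("events.parquet", index=False)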
Can I compress JSONL files efficiently?
Yes — JSONL compresses very well because adjacent records often share structure and repeated keys. gzip on a typical JSONL log file achieves 70-90% size reduction. zstd is faster and better. Most modern data pipelines store JSONL as .jsonl.gz or .jsonl.zst, and tools like zcat, gzcat, and zstdcat let you stream the decompressed content into line-by-line processing without ever materializing the full file.
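In Python, the same line-by-line loop works directly on the compressed file, because gzip.open can hand back a text stream:

import gzip
import json

# Decompress and parse one record at a time; the full file never sits in memory.
with gzip.open("events.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            record = json.loads(line)
            print(record["event"])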