UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

HTML Table to JSON

Convert HTML table markup into JSON data

About HTML Table to JSON

Some of the most useful data on the web lives inside HTML tables, on Wikipedia pages or government portals, with no API to fetch it. HTML Table to JSON lets you paste raw HTML markup and get back a structured JSON array, with no BeautifulSoup or Python required. Data scrapers, developers converting legacy HTML reports, and journalists pulling numbers all use it to skip writing parsing boilerplate. The tool takes JSON keys from th elements, or from the cells of the first tr row when there are none. Nested tags like a, span, and strong are stripped down to their inner text, which is exactly what you want for Wikipedia tables full of citation links. When the HTML contains several table elements, you can pick one by index or extract them all. Colspan and rowspan cells are expanded so every row has a consistent shape. Optional type coercion turns strings like "1,234" into actual numbers, or you can keep them as strings.

Why use HTML Table to JSON

Smart Header Row Detection

Automatically uses th elements, or the first tr row when there are none, as the source of JSON keys. Header text is trimmed and cleaned before it becomes an object key, so the output is usable immediately without renaming fields by hand.
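The key-selection rule can be sketched in Python using only the standard library. This is a minimal illustration under assumptions, not the tool's own code (which runs on DOMParser in the browser): rows arrive already parsed as (tag, text) pairs, a row counts as a header only if every cell is a th, the bottom-most of several leading header rows wins (as the Tips describe), and pick_header, to_records and the first_row_is_header flag are hypothetical names.

```python
from typing import Dict, List, Tuple

Row = List[Tuple[str, str]]  # one parsed <tr>: [(tag, text), ...], tag is "th" or "td"


def pick_header(rows: List[Row], first_row_is_header: bool = True) -> Tuple[List[str], List[Row]]:
    """Split rows into (keys, data_rows) using the header rules described above."""
    # Count leading rows made only of <th> cells; the bottom-most one supplies the keys.
    n_th = 0
    while n_th < len(rows) and rows[n_th] and all(tag == "th" for tag, _ in rows[n_th]):
        n_th += 1
    if n_th:
        return [text.strip() for _, text in rows[n_th - 1]], rows[n_th:]
    if first_row_is_header and rows:
        # No <th> row: fall back to the cells of the first <tr>.
        return [text.strip() for _, text in rows[0]], rows[1:]
    # Treat every row as data and generate col0, col1, ... as keys.
    width = max((len(r) for r in rows), default=0)
    return [f"col{i}" for i in range(width)], rows


def to_records(rows: List[Row], first_row_is_header: bool = True) -> List[Dict[str, str]]:
    keys, data = pick_header(rows, first_row_is_header)
    # zip() stops at the shorter side, so surplus cells in a long row are dropped.
    return [{key: text.strip() for key, (_, text) in zip(keys, row)} for row in data]
```

Passing first_row_is_header=False mirrors the FAQ's auto-generated column names for tables that have no th cells at all.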

Handles Nested Tags

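Strips a, span, strong, em, and other inline wrappers inside cells and keeps only the inner text. Wikipedia tables are full of anchor links and citation superscripts; the extractor drops the markup and keeps only the text, so link labels come through cleanly. Text inside superscripts, such as citation markers like [1], is kept as well (see Tips for removing it).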
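Inner-text extraction falls out naturally from an event-based parser: only text events are recorded, and every start or end tag is ignored. Here is a minimal sketch with Python's built-in html.parser module, assuming a single cell's HTML is available as a string; inner_text is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Records text nodes only; tags such as <a>, <strong> or <sup> are skipped."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def inner_text(cell_html: str) -> str:
    parser = _TextCollector()
    parser.feed(cell_html)
    parser.close()  # flushes any buffered trailing text
    # Collapse runs of whitespace left behind by line breaks in the markup.
    return " ".join("".join(parser.parts).split())
```

Because html.parser decodes character references by default, entities such as &amp; come through as plain characters, and superscript text like [1] stays attached to the cell value.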

Multiple Tables in One Paste

If your HTML contains several table elements, pick the one you need by zero-based index or extract all of them as a JSON array of arrays. No need to manually isolate a single table from a complex page.
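One way to group cells by table and honour a zero-based index is sketched below with Python's html.parser. Assumptions: select_tables is a hypothetical helper, index=None stands for extract-all mode, nested tables are not supported, and an opening td, th, tr or table tag closes any cell left open (HTML allows authors to omit </td>).

```python
from html.parser import HTMLParser
from typing import List, Optional, Union

Table = List[List[str]]  # rows of cell inner text


class _TableSplitter(HTMLParser):
    """Groups cell text into tables -> rows -> cells (nested tables unsupported)."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[Table] = []
        self._cell: Optional[List[str]] = None  # text parts of the currently open cell

    def _close_cell(self) -> None:
        if self._cell is not None:
            self.tables[-1][-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("table", "tr", "td", "th"):
            self._close_cell()  # tolerate omitted </td> and </th>
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("table", "tr", "td", "th"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def select_tables(html: str, index: Optional[int] = None) -> Union[Table, List[Table]]:
    """index=None extracts every table; otherwise return the table at that zero-based index."""
    parser = _TableSplitter()
    parser.feed(html)
    parser.close()
    if index is None:
        return parser.tables
    return parser.tables[index]  # raises IndexError when the index is out of range
```

Text outside any cell (paragraphs, captions, whitespace between tags) is ignored, so the page around the tables does not need to be trimmed first.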

Colspan and Rowspan Aware

Merged cells using colspan and rowspan are expanded so each output row has a consistent shape with the same set of keys. No missing fields and no index misalignment from spanning cells.
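Span expansion is a grid-filling pass: each cell is copied across colspan columns, and a cell with rowspan greater than 1 reserves its columns in the following rows. The sketch below assumes cells are already parsed into (text, colspan, rowspan) tuples; expand_spans is a hypothetical name, and the special HTML meaning of rowspan="0" (span to the end of the section) is not handled.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int, int]  # (text, colspan, rowspan), already parsed


def expand_spans(rows: List[List[Cell]]) -> List[List[str]]:
    """Return a rectangular grid with spanning cells copied into every slot they cover."""
    grid: List[List[str]] = []
    # pending[col] = (text, remaining_rows) for cells reaching down from earlier rows
    pending: Dict[int, Tuple[str, int]] = {}
    for row in rows:
        out: List[str] = []
        cells = iter(row)
        col = 0
        while True:
            if col in pending:
                text, remaining = pending.pop(col)
                out.append(text)
                if remaining > 1:
                    pending[col] = (text, remaining - 1)
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                if any(c > col for c in pending):
                    out.append("")  # short row: leave a gap before a later rowspan
                    col += 1
                    continue
                break
            text, colspan, rowspan = cell
            for _ in range(max(colspan, 1)):
                out.append(text)
                if rowspan > 1:
                    pending[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
    # Pad short rows so every row has the same number of columns.
    width = max((len(r) for r in grid), default=0)
    return [r + [""] * (width - len(r)) for r in grid]
```

Run on the colspan example further down this page, the second row becomes ["$10k", "$10k", "$5k"], which zips against the Q1, Q2, Q3 keys to give the documented output.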

Type Coercion Toggle

Numeric strings like "1,234" or "$19.99" can be parsed to the numbers 1234 and 19.99, or kept as their original strings for exact reproduction of the source. Toggle based on whether you plan to compute or just store the values.
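A possible coercion rule, following the FAQ's description (currency symbols, thousands separators and percent signs are removed before parsing), is sketched below. Assumptions: commas are always thousands separators (so European decimal commas would be misread), only the symbols $, €, £ and ¥ are stripped, and anything that still is not a plain number is returned unchanged, which is why "$10k" in the colspan example stays a string. The name coerce is hypothetical.

```python
import re
from typing import Union

# After cleaning, a value must be an optional minus sign, digits, and an optional decimal part.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def coerce(value: str) -> Union[int, float, str]:
    """Parse "1,234" -> 1234 and "$19.99" -> 19.99; leave non-numeric text as-is."""
    cleaned = re.sub(r"[$€£¥,%\s]", "", value)
    if not _NUMBER.fullmatch(cleaned):
        return value
    return float(cleaned) if "." in cleaned else int(cleaned)
```

Note that "42%" becomes 42 rather than 0.42, matching the FAQ; divide by 100 yourself if you need a fraction.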

Pure In-Browser DOM Parse

Uses the browser's built-in DOMParser to process the HTML entirely locally. Your scraped HTML, which may contain authentication tokens, internal links, or confidential data, never leaves your browser: parsing it makes no network request.

How to use HTML Table to JSON

  1. Paste raw HTML containing one or more table elements into the input panel
  2. Select which table to extract if the HTML contains multiple tables: choose by index (0, 1, 2...) or extract all tables
  3. Toggle type coercion to control whether numbers like "1,234" become 1234 or stay as the string "1,234"
  4. Decide how to treat the first row: as a header row (keys) or as a regular data row
  5. Click Extract and review the JSON array in the output panel
  6. Copy the JSON or download it as a .json file
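The steps above map onto a small pipeline. Here is a hedged end-to-end sketch in Python using only the standard library; html_table_to_json and its index, coerce and first_row_is_header parameters are hypothetical names that mirror steps 2 to 4, and for brevity it ignores colspan/rowspan and assumes every cell has a closing tag. It is not the tool's own implementation.

```python
import json
import re
from html.parser import HTMLParser
from typing import List, Optional


class _Cells(HTMLParser):
    """Minimal parser: tables -> rows -> (tag, text) cells. Assumes closing tags are present."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[list] = []
        self._cell: Optional[list] = None  # text parts of the open cell
        self._tag = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell, self._tag = [], tag
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "table":
            self.tables.append([])

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # inner text, whitespace collapsed
            self.tables[-1][-1].append((self._tag, text))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def _coerce(value: str):
    cleaned = re.sub(r"[$,%\s]", "", value)
    if re.fullmatch(r"-?\d+(?:\.\d+)?", cleaned):
        return float(cleaned) if "." in cleaned else int(cleaned)
    return value


def html_table_to_json(html: str, index: Optional[int] = 0, coerce: bool = True,
                       first_row_is_header: bool = True) -> str:
    parser = _Cells()
    parser.feed(html)
    parser.close()
    # Step 2: one table by zero-based index, or all of them when index is None.
    tables = parser.tables if index is None else [parser.tables[index]]
    results = []
    for rows in tables:
        # Step 4: a leading <th> row always supplies keys; otherwise honour the option.
        has_th = bool(rows) and all(tag == "th" for tag, _ in rows[0])
        if rows and (has_th or first_row_is_header):
            keys, data = [text for _, text in rows[0]], rows[1:]
        else:
            width = max((len(r) for r in rows), default=0)
            keys, data = [f"col{i}" for i in range(width)], rows
        # Step 3: optional type coercion of every cell value.
        results.append([{k: (_coerce(t) if coerce else t) for k, (_, t) in zip(keys, row)}
                        for row in data])
    # Steps 5 and 6: serialise; ensure_ascii=False keeps names such as "São Paulo" readable.
    return json.dumps(results if index is None else results[0], ensure_ascii=False, indent=2)
```

Fed the Wikipedia-style example below, this returns the same two records shown there. Writing the returned string to a file with a .json extension covers step 6.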

When to use HTML Table to JSON

  • Extracting a statistics table from a Wikipedia article into JSON for use in a data visualisation or analysis script
  • Converting a legacy HTML report from an internal tool into structured JSON fixtures for a new API
  • Pulling tabular data from a government portal or financial disclosure page without writing a scraper
  • Quickly testing the shape of a scraped table before writing a full BeautifulSoup or Playwright extraction script
  • Converting an HTML email or newsletter table into JSON for archival or analysis purposes
  • Extracting a comparison table from a competitor's documentation page for analysis

Examples

Simple table from a docs page

Input: <table> <tr><th>Name</th><th>Role</th></tr> <tr><td>Ada</td><td>Engineer</td></tr> <tr><td>Alan</td><td>Researcher</td></tr> </table>

Output: [ {"Name":"Ada","Role":"Engineer"}, {"Name":"Alan","Role":"Researcher"} ]

Wikipedia-style with anchor tags (type coercion on)

Input: <table> <tr><th>City</th><th>Population</th></tr> <tr><td><a href="/wiki/Tokyo">Tokyo</a></td><td>13,960,000</td></tr> <tr><td><a href="/wiki/Delhi">Delhi</a></td><td>16,787,941</td></tr> </table>

Output: [ {"City":"Tokyo","Population":13960000}, {"City":"Delhi","Population":16787941} ]

Colspan handled

Input: <table> <tr><th>Q1</th><th>Q2</th><th>Q3</th></tr> <tr><td colspan="2">$10k</td><td>$5k</td></tr> </table>

Output: [ {"Q1":"$10k","Q2":"$10k","Q3":"$5k"} ]

Tips

  • To grab a table from a live site, open DevTools, right-click the table element, choose Copy → Copy outerHTML, and paste it here.
  • If the output has unexpected trailing rows, such as empty objects or totals, check whether the source table has blank spacer rows or a tfoot with summary rows, and remove them from the HTML (or from the JSON) if you do not need them.
  • Currency and comma-grouped numbers ($1,234.50) become 1234.5 only when type coercion is on; disable it for strings that must be reproduced exactly.
  • Wikipedia tables often have superscript citation numbers like [1] inside cells — strip these with a quick find-replace after extraction if you do not need them.
  • If your table has merged headers across multiple thead rows, the converter uses the bottom-most header row as the key source.
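The citation and blank-row tips can also be handled in one post-processing pass over the extracted records. Here is a sketch in Python, assuming the JSON has already been loaded into a list of dicts; clean_records is a hypothetical helper, and the pattern only covers common Wikipedia marker forms ([1], [a], [note 3], [citation needed]).

```python
import re
from typing import Dict, List

# Common Wikipedia markers: [1], [23], [a], [note 3], [citation needed]
_CITATION = re.compile(r"\[(?:\d+|[a-z]|note \d+|citation needed)\]")


def clean_records(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Strip citation markers from keys and string values, then drop blank rows."""
    cleaned = []
    for record in records:
        fixed = {
            _CITATION.sub("", key).strip(): (
                _CITATION.sub("", value).strip() if isinstance(value, str) else value
            )
            for key, value in record.items()
        }
        # Keep the row only if at least one value is non-empty (numbers count as non-empty).
        if any(value != "" for value in fixed.values()):
            cleaned.append(fixed)
    return cleaned
```

Header cells are cleaned too, so a key like "Population[3]" becomes "Population".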

Frequently Asked Questions

Can I extract a table from a live website?
Not directly — this tool works with pasted HTML markup. To get the HTML from a live page, open your browser's DevTools, right-click the table element in the Elements panel, and choose Copy → Copy outerHTML. Then paste that HTML here.
What if the HTML has multiple <table> elements?
If multiple table elements are detected, a selector lets you pick by zero-based index (0 for the first table, 1 for the second, etc.) or extract all tables as a JSON array where each element corresponds to one table.
How are nested links and bold text inside cells handled?
Inline tags like a, strong, em, span, and sup are stripped and only their inner text is kept. This means a cell containing <a href="/wiki/Ada">Ada Lovelace</a> extracts as the string "Ada Lovelace", not the anchor markup.
Will it correctly handle merged cells (colspan/rowspan)?
Yes. Cells with colspan or rowspan are expanded into their logical positions in the grid. A cell spanning two columns produces two JSON fields with the same value, and a cell spanning two rows produces the value in both corresponding output rows.
Can numbers like "1,234" be parsed as actual numbers?
Yes, when type coercion is enabled. The converter strips currency symbols, thousands separators, and percent signs before parsing — so "$1,234.50" becomes 1234.5 and "42%" becomes 42. Disable coercion to keep the original string.
What if there's no <th> header row — just <td>?
When no th elements are found, the tool uses the first tr row as the header source by default. You can also opt into auto-generated column names (col0, col1, col2...) if you want to treat all rows as data rows.
Does it preserve cell formatting like bold or colour?
No. The extractor strips all markup and returns plain-text values only. Formatting is presentational and has no equivalent in JSON. If you need formatting metadata, keep the original HTML alongside the JSON; converting to CSV will not help, since CSV carries no formatting either.
Is it safe to paste HTML from internal tools?
Yes. The browser's DOMParser processes the HTML entirely in your tab with no network requests. HTML from internal dashboards, authenticated reports, or private wikis stays in your local browser memory.


Glossary

DOM tree
The in-memory hierarchical structure that a browser builds from HTML markup. Each element, attribute, and text node becomes a node in the tree, allowing JavaScript to traverse and extract table data programmatically.
Header row (<th>)
A table row using th elements instead of td. Browsers render th cells as bold and centred by default, and extractors use them as the source of JSON object keys when converting a table.
Colspan / Rowspan
HTML attributes that cause a single table cell to span multiple columns (colspan) or multiple rows (rowspan). Extractors must expand these merged cells to ensure every logical row has a consistent and complete set of fields.
Inner text
The plain text content of an HTML element after stripping all child tags. For a cell like <td><a href="...">Ada</a></td>, the inner text is simply Ada.
Nested anchor
A link tag (a element) placed inside a table cell, common in Wikipedia and documentation tables where cell text links to a related article. Extractors strip the anchor but preserve its inner text.
Selector
In the context of this tool, a zero-based index (0, 1, 2...) used to identify which table to extract when an HTML paste contains multiple table elements.