UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

Hidden Character Detector

Detect invisible characters like NBSP, zero-width spaces, BOM, tabs, CR/LF, and control chars, then clean safely.

About Hidden Character Detector

Not all characters are visible. Unicode includes dozens of invisible or near-invisible characters that can cause serious problems in text processing, programming, and data management. Zero-width spaces, soft hyphens, bidirectional text marks, byte order marks (BOM), and control characters render as nothing, and non-breaking spaces render as an ordinary space, yet all of them behave differently from regular spaces or empty strings. They can break string equality checks, cause unexpected word wrapping, break JSON parsing, produce PDF rendering artifacts, and confuse copy-paste operations. Hidden Character Detector scans your text for a broad set of invisible and problematic Unicode characters, highlights their positions, counts each type found, and optionally removes them or replaces them with visible markers or standard equivalents.
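A minimal Python sketch of such a scan, assuming a simple category-based rule rather than the tool's actual list: it flags format characters (Unicode category Cf, which covers zero-width characters, soft hyphens, the BOM, and bidi controls), control characters (Cc) other than the line feed, and space separators (Zs) other than the ASCII space, such as U+00A0. The function names is_hidden and scan are hypothetical.

```python
import unicodedata
from collections import Counter


def is_hidden(ch: str) -> bool:
    """Return True for characters this sketch treats as hidden or problematic."""
    cat = unicodedata.category(ch)
    if cat == "Cf":
        # Format characters: ZWSP, ZWNJ/ZWJ, soft hyphen, BOM, bidi controls, ...
        return True
    if cat == "Cc":
        # C0/C1 controls (tab, CR, NUL, ESC, ...); line feeds are kept so that
        # ordinary multi-line text is not flooded with findings.
        return ch != "\n"
    # Space separators other than U+0020, e.g. U+00A0 NO-BREAK SPACE
    return cat == "Zs" and ch != " "


def scan(text: str):
    """Return (findings, counts); each finding is (index, 'U+XXXX', name)."""
    findings = []
    for i, ch in enumerate(text):
        if is_hidden(ch):
            # Control characters have no name in unicodedata, so use a label.
            name = unicodedata.name(ch, "<control>")
            findings.append((i, f"U+{ord(ch):04X}", name))
    counts = Counter(name for _, _, name in findings)
    return findings, counts


findings, counts = scan("hello\u200bworld")
print(findings)  # [(5, 'U+200B', 'ZERO WIDTH SPACE')]
```

Category Cf is broad: it also matches the zero-width joiner inside emoji sequences, which a real detector may prefer to report separately rather than treat as debris.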

Why use Hidden Character Detector

Detects Zero-Width & Invisible Characters

Finds zero-width spaces (U+200B), zero-width non-joiners, soft hyphens, and other completely invisible characters.

Non-Breaking Space Detection

Identifies non-breaking spaces (U+00A0, written &nbsp; in HTML) that look identical to regular spaces but break word splitting and string equality.

BOM Detection

Catches byte order marks (U+FEFF), which are invisible at the start of files and can cause JSON parsers and text processors to fail.

Bidirectional Control Characters

Detects right-to-left override (U+202E) and other bidi control characters used in filename spoofing attacks.

Safe Removal Mode

Removes only confirmed hidden characters — regular visible text is preserved exactly.

Visible Marker Replacement

Replace hidden characters with visible symbols for review before deciding whether to remove them.
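The marker and removal modes can be sketched in Python with a single regular expression. The character set below is an assumption chosen for illustration, not the tool's exact list: the soft hyphen, U+200B–U+200F, the bidi embeddings and overrides U+202A–U+202E, U+2060–U+2064, the bidi isolates U+2066–U+2069, and U+FEFF. The names to_markers and remove_hidden are hypothetical. The non-breaking space is deliberately left out, because deleting it would join the surrounding words; replacing it with a regular space is the safer fix.

```python
import re

# Assumed character set for this sketch (not the tool's exact list). The
# string is not raw, so Python turns each \uXXXX escape into the character
# itself before the regex engine sees the class.
HIDDEN = re.compile(
    "[\u00ad\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]"
)


def to_markers(text: str) -> str:
    """Replace each hidden character with a visible <U+XXXX> marker."""
    return HIDDEN.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)


def remove_hidden(text: str) -> str:
    """Delete hidden characters; every other character is left unchanged."""
    return HIDDEN.sub("", text)


print(to_markers("hello\u200bworld"))           # hello<U+200B>world
print(remove_hidden("\ufeffhello\u200bworld"))  # helloworld
```

Reviewing the marker output first matters because some of these characters carry meaning: deleting U+200D (zero-width joiner) from an emoji sequence splits it into separate glyphs, and U+200C (zero-width non-joiner) controls letter joining in scripts such as Persian.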

How to use Hidden Character Detector

Paste your text into the input area.
The detector immediately scans and highlights any hidden characters found.
Review the legend showing which character types were detected and how many of each.
Click 'Remove all hidden characters' to strip all detected invisible characters.
Or click 'Replace with markers' to replace each hidden character with a visible symbol for manual review.
Copy the cleaned or annotated output using the copy button.

When to use Hidden Character Detector

When debugging a string comparison that fails despite the strings appearing identical on screen.
When cleaning user-submitted text before storing it in a database to prevent hidden character injection.
When inspecting text pasted from a web page or PDF that may contain non-breaking spaces or zero-width characters.
When preparing text for JSON, CSV, or XML where invisible characters in string values cause parsing errors.
When investigating a suspicious file name that may contain bidirectional control characters used for spoofing.
When cleaning text exported from a word processor that inserts soft hyphens and non-breaking spaces automatically.

Examples

Zero-width space in a word

Input: helloworld (zero-width space between hello and world)

Output: Found: 1× Zero-Width Space (U+200B) at position 5. Clean output: helloworld

Non-breaking spaces from web copy

Input: price: $10.00 each (non-breaking space between 10.00 and each)

Output: Found: 1× Non-Breaking Space (U+00A0) at position 13. Clean output: price: $10.00 each

BOM at string start

Input: This text has a BOM prefix (U+FEFF before "This")

Output: Found: 1× BOM (U+FEFF) at position 0. Clean output: This text has a BOM prefix

Tips

Use the detector on samples of user-submitted form input to see which hidden characters appear, then strip them in your input handling before storing the text in a database, especially for fields used in search or comparison.
Use 'Replace with markers' mode before 'Remove' mode to first understand what was found and where, then decide if removal is safe.
When debugging a failed string comparison between two visually identical strings, paste both into the detector and check if one contains hidden characters.
Text pasted from Microsoft Word can contain soft hyphens (U+00AD) where optional hyphens were inserted, as well as non-breaking spaces, so run it through the detector after pasting.
For security-critical applications, also check for bidirectional control characters (U+202E, U+202D) that can be used in spoofing attacks.
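To compare two visually identical strings, as the tips above suggest, a short Python sketch can report the first index where they differ and name the characters involved. The helpers first_difference and describe are hypothetical, and indices are 0-based.

```python
import unicodedata


def first_difference(a: str, b: str):
    """Return (index, char_a, char_b) at the first mismatch, or None if equal.

    When one string is shorter, its missing character is reported as None.
    """
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else None
        cb = b[i] if i < len(b) else None
        if ca != cb:
            return i, ca, cb
    return None


def describe(ch) -> str:
    """Format a character as 'U+XXXX NAME', or note the end of the string."""
    if ch is None:
        return "end of string"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


a = "price: $10.00 each"
b = "price: $10.00\u00a0each"  # same text with a non-breaking space
i, ca, cb = first_difference(a, b)
print(i, describe(ca), "vs", describe(cb))
# 13 U+0020 SPACE vs U+00A0 NO-BREAK SPACE
```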

Frequently Asked Questions

What is a zero-width space and why is it a problem?

A zero-width space (U+200B) takes no visual space in rendered text but is a real character in the string. It causes string equality to fail, breaks word splitting algorithms, and can corrupt search matching.
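A short Python illustration of these effects. Python's str.split() does not treat U+200B as whitespace, so the two visible words stay a single token.

```python
plain = "helloworld"
with_zwsp = "hello\u200bworld"  # U+200B between "hello" and "world"

print(plain == with_zwsp)          # False: the strings differ by one code point
print(len(plain), len(with_zwsp))  # 10 11
print("hello world".split())       # ['hello', 'world']
print(with_zwsp.split())           # ['hello\u200bworld']: no split at U+200B
print("helloworld" in "say hello\u200bworld")  # False: substring search misses it
```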

What is a non-breaking space and how is it different from a regular space?

A non-breaking space (U+00A0) looks identical to a regular space when rendered. The differences: it prevents line breaks at that position, it is a different Unicode code point so string equality fails, and it can cause unexpected behavior in text processing.
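A Python illustration of the difference. It also shows that the impact depends on the tool: Python's str.split() treats U+00A0 as whitespace, while equality checks and searches for a literal space do not. Unicode NFKC normalization is one standard way to fold it into a regular space.

```python
import unicodedata

regular = "10.00 each"
nbsp = "10.00\u00a0each"  # U+00A0 NO-BREAK SPACE instead of U+0020 SPACE

print(regular == nbsp)                  # False: different code points
print(" " in nbsp)                      # False: a search for U+0020 finds nothing
print(nbsp.split() == regular.split())  # True: str.split() counts U+00A0 as whitespace

# NFKC maps U+00A0 to U+0020 through its compatibility decomposition.
print(unicodedata.normalize("NFKC", nbsp) == regular)  # True
```

NFKC also rewrites other compatibility characters (for example ligatures and full-width letters), so a targeted replace("\u00a0", " ") is safer when only spaces should change.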

What is a BOM and why should I remove it?

A byte order mark (U+FEFF) is a Unicode code point used to indicate byte order at the start of a file. When present in the middle of text or at the start of a string value in JSON or CSV, it causes parsing errors in many tools because parsers do not expect it there.
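Python's standard json module shows the problem directly: a string that starts with U+FEFF is rejected, while decoding the raw bytes with the utf-8-sig codec strips the BOM first. The sample payload is made up for illustration.

```python
import json

raw = b'\xef\xbb\xbf{"name": "example"}'  # UTF-8 bytes with a leading BOM

text = raw.decode("utf-8")  # plain UTF-8 decoding keeps the BOM as U+FEFF
print(repr(text[0]))        # '\ufeff'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("parse failed:", exc.msg)  # Unexpected UTF-8 BOM (decode using utf-8-sig)

# The utf-8-sig codec removes a leading BOM if present and is harmless if not.
data = json.loads(raw.decode("utf-8-sig"))
print(data)  # {'name': 'example'}
```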

What are bidirectional control characters?

Characters like the right-to-left override (U+202E) reverse the display direction of the text that follows them. They have been used in filename spoofing attacks to disguise a file's real extension: a file actually named photo[U+202E]gnp.exe is displayed as photoexe.png, hiding the .exe extension.
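A minimal Python check for this kind of filename, assuming a hand-picked list of bidi formatting characters: the marks U+200E, U+200F, and U+061C, the embeddings and overrides U+202A–U+202E, and the isolates U+2066–U+2069. The helper bidi_positions is hypothetical. The real extension is whatever follows the last dot in the stored name, not in the displayed one.

```python
import os

# Assumed watch list of explicit bidi formatting characters.
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}


def bidi_positions(name: str) -> list:
    """Return the indices of bidi control characters in a string."""
    return [i for i, ch in enumerate(name) if ch in BIDI_CONTROLS]


filename = "photo\u202egnp.exe"       # displayed as "photoexe.png" in bidi-aware UIs
print(bidi_positions(filename))       # [5]
print(os.path.splitext(filename)[1])  # .exe (the real extension)
```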

Will removing hidden characters change any visible text?

No. The tool only removes characters in the invisible or control character categories. All printable, visible characters including regular spaces are preserved exactly.

Why does text copied from a website sometimes contain hidden characters?

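Websites may use zero-width spaces to allow line breaks inside long words or URLs, soft hyphens to suggest hyphenation points, and non-breaking spaces to keep words together. These characters serve a purpose in HTML layout but are usually unwanted in plain text, where they should be removed or, in the case of non-breaking spaces, replaced with regular spaces.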
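For plain-text use, a Python str.translate table can apply both fixes in one pass. The mapping below is an assumption for illustration, not the tool's own list: mapping a code point to a string replaces it, and mapping it to None deletes it.

```python
# Assumed cleanup table for web-copied text (illustrative, not exhaustive).
PLAIN_TEXT_MAP = {
    0x00A0: " ",   # NO-BREAK SPACE -> SPACE
    0x202F: " ",   # NARROW NO-BREAK SPACE -> SPACE
    0x00AD: None,  # SOFT HYPHEN (deleted)
    0x200B: None,  # ZERO WIDTH SPACE (deleted)
    0x2060: None,  # WORD JOINER (deleted)
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE / BOM (deleted)
}


def to_plain_text(text: str) -> str:
    """Normalize spaces and drop zero-width characters using the table above."""
    return text.translate(PLAIN_TEXT_MAP)


print(to_plain_text("co\u00adoper\u00adate at\u00a0noon: example.com/very\u200blong"))
# cooperate at noon: example.com/verylong
```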

Can this detect all types of invisible characters?

The tool detects a comprehensive set including zero-width characters, control characters (C0/C1), BOM, bidirectional controls, and formatting characters. Some exotic Unicode categories may not be flagged.

Does it detect invisible characters inside HTML content?

Yes, the detector operates on the raw text input regardless of whether it contains HTML markup. Hidden characters inside HTML attributes or text nodes are detected in the raw string.


Glossary

Zero-width space (ZWSP): Unicode character U+200B. Invisible in rendered text but present as a real character, commonly used for line-break hints in non-spaced scripts. Causes string comparison failures.
Non-breaking space (NBSP): Unicode character U+00A0 (HTML: &nbsp;). Looks like a regular space but prevents line breaks and differs from a regular space in string comparison.
Soft hyphen: Unicode character U+00AD. Invisible unless the word breaks at a line boundary, where it renders as a visible hyphen. Inserted by word processors at potential break points.
BOM (Byte Order Mark): Unicode character U+FEFF, used to indicate byte order at the start of UTF-16 files. When present in UTF-8 text, it is an invisible artifact that often breaks parsers.
RTL override (U+202E): A Unicode control character that reverses the text direction of following characters, making left-to-right text appear right-to-left. Used in filename spoofing attacks.
Control character: Non-printable characters in the ranges U+0000–U+001F (C0 controls), U+007F (delete), and U+0080–U+009F (C1 controls). They include null (U+0000), bell (U+0007), and escape (U+001B), as well as tab, line feed, and carriage return.
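To see how the glossary entries are classified, Python's unicodedata module can print each code point's general category and name. The category column shows why a detector cannot rely on a single category: U+00A0 is a space separator (Zs), the other named characters are format characters (Cf), and the C0 controls are Cc, for which unicodedata.name() returns no name.

```python
import unicodedata

# Code points named in the glossary above.
GLOSSARY = ["\u200b", "\u00a0", "\u00ad", "\ufeff", "\u202e", "\x00", "\x07", "\x1b"]

for ch in GLOSSARY:
    # unicodedata.name() has no name for control characters, so pass a default.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}")

# U+200B  Cf  ZERO WIDTH SPACE
# U+00A0  Zs  NO-BREAK SPACE
# U+00AD  Cf  SOFT HYPHEN
# U+FEFF  Cf  ZERO WIDTH NO-BREAK SPACE
# U+202E  Cf  RIGHT-TO-LEFT OVERRIDE
# U+0000  Cc  <control>
# U+0007  Cc  <control>
# U+001B  Cc  <control>
```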