UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

XML Sitemap Validator

Parse pasted sitemap XML for structural issues.

About XML Sitemap Validator

An invalid XML sitemap can be partly or entirely rejected by Google and Bing, which means pages you expect to be discovered through it may never enter the crawl queue. The XML Sitemap Validator parses your sitemap XML against the Sitemap Protocol 0.9 specification, checking document structure, required elements, namespace declarations, URL format, date format compliance, and changefreq and priority values. It catches common errors like missing <loc> elements, malformed lastmod timestamps, URLs of 2,048 characters or more, sitemaps with more than 50,000 URLs or larger than 50MB uncompressed, and incorrect XML namespace declarations that cause parsers to reject the document. The tool also validates sitemap index files, checking that child sitemap references use absolute URLs and that the index structure is well-formed. After validation, you get a clear error list with line references and inline fixes.
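
A structural check like this usually starts with two questions: is the document well-formed XML, and is its root one of the two elements Sitemap Protocol 0.9 defines? The Python sketch below answers both with the standard library's xml.etree.ElementTree. The function name classify_sitemap is hypothetical, and this is an illustration of the idea, not the tool's actual implementation.

```python
import xml.etree.ElementTree as ET


def classify_sitemap(xml_text: str) -> str:
    """Return 'urlset' or 'sitemapindex', or raise ValueError.

    Hypothetical helper: a first-pass check that the input parses and that
    its root element is one of the two defined by Sitemap Protocol 0.9.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # (line, column) as reported by the parser
        raise ValueError(f"not well-formed XML at line {line}, column {column}") from exc

    # ElementTree writes namespaced tags as "{namespace}localname";
    # keep only the local name here (the namespace is a separate check).
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name not in ("urlset", "sitemapindex"):
        raise ValueError(f"unexpected root element <{local_name}>")
    return local_name
```

For example, classify_sitemap('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>') returns 'urlset', while an unclosed <urlset> raises ValueError with the parser's line and column.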

Why use XML Sitemap Validator

Sitemap Protocol 0.9 Specification Compliance

Validates against the official sitemaps.org protocol, the same spec Google and Bing parsers use. Catches namespace errors, missing required elements, and invalid element values (such as an out-of-range priority or an unknown changefreq) that can cause crawlers to reject individual entries or the whole file.

50,000 URL and 50MB Limit Enforcement

Detects when a single sitemap file exceeds the protocol's limits of 50,000 URLs or 50MB (52,428,800 bytes) uncompressed, which Google enforces as hard limits. Oversized sitemaps cause partial or complete indexing failures that are hard to diagnose from Search Console alone.
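
Both limits are cheap to check before any deeper validation. A minimal Python sketch, assuming the file is already in memory as uncompressed bytes; check_limits is a hypothetical name, and the byte figure is the 50MB (52,428,800-byte) limit published on sitemaps.org, which applies to sitemap index files as well.

```python
import xml.etree.ElementTree as ET

MAX_ENTRIES = 50_000
MAX_BYTES = 52_428_800  # 50MB, uncompressed


def check_limits(xml_bytes: bytes) -> list[str]:
    """Return limit violations for one uncompressed sitemap or sitemap index."""
    errors = []
    if len(xml_bytes) > MAX_BYTES:
        errors.append(f"file is {len(xml_bytes):,} bytes; limit is {MAX_BYTES:,}")
    root = ET.fromstring(xml_bytes)
    # Count <url> (urlset) or <sitemap> (sitemapindex) children, ignoring namespace.
    entries = sum(
        1 for child in root if child.tag.rsplit("}", 1)[-1] in ("url", "sitemap")
    )
    if entries > MAX_ENTRIES:
        errors.append(f"{entries:,} entries; limit is {MAX_ENTRIES:,}")
    return errors
```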

Sitemap Index File Validation

Validates sitemap index files in addition to standard URL sitemaps, checking that child <sitemap> entries reference absolute URLs and include valid lastmod dates, and that the index itself is well-formed XML.
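
The absolute-URL rule for index entries can be sketched in a few lines of Python. This is a hypothetical helper, not the tool's code; it treats a <loc> as absolute only when it has an http or https scheme and a host, which is an assumption about how strict to be.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_index_entries(xml_text: str) -> list[str]:
    """Flag <sitemap> entries whose <loc> is missing or not absolute."""
    root = ET.fromstring(xml_text)
    errors = []
    for n, entry in enumerate(root.findall("sm:sitemap", NS), start=1):
        loc = entry.find("sm:loc", NS)
        url = (loc.text or "").strip() if loc is not None else ""
        if not url:
            errors.append(f"<sitemap> #{n}: missing <loc>")
            continue
        parts = urlsplit(url)
        # A relative path such as "/sitemap-blog.xml" has no scheme and no host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors.append(f"<sitemap> #{n}: <loc> is not an absolute URL: {url!r}")
    return errors
```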

ISO 8601 Date Format Verification

The lastmod field must use W3C Datetime format (a profile of ISO 8601). The validator catches common mistakes like US-format dates (MM/DD/YYYY), timestamps that include a time but no timezone designator, and invalid day or month values that parsers reject.
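
A lastmod check can combine a regular expression for the shape with a calendar check for the values. This Python sketch accepts a full YYYY-MM-DD date, optionally followed by a time that must end in Z or ±hh:mm; it deliberately rejects the year-only and year-month forms that W3C Datetime also permits, which is an assumption about how strict you want to be. The names are hypothetical.

```python
import re
from datetime import date

# YYYY-MM-DD, optionally followed by Thh:mm[:ss[.fraction]] plus a required TZD.
W3C_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def is_valid_lastmod(value: str) -> bool:
    m = W3C_DATETIME.fullmatch(value.strip())
    if not m:
        return False  # e.g. "06/15/2024", or a time with no timezone designator
    year, month, day, hh, mm, ss, _tzd = m.groups()
    try:
        date(int(year), int(month), int(day))  # rejects month 13, Feb 30, etc.
    except ValueError:
        return False
    if hh is not None and (int(hh) > 23 or int(mm) > 59):
        return False
    if ss is not None and int(ss) > 59:
        return False
    return True
```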

URL Format and Character Validation

Checks each <loc> URL for absolute URL format with an http:// or https:// protocol prefix, correct XML entity encoding of ampersands and other special characters, and a length under 2,048 characters, flagging any that would cause parser errors.
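
A per-URL check might look like the Python sketch below. Because ElementTree has already decoded entities by the time you read loc.text, an unencoded ampersand surfaces as a parse error rather than here; this hypothetical check_loc only covers the scheme, host, whitespace, and length rules.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # sitemaps.org: URLs must be shorter than 2,048 characters


def check_loc(url: str) -> list[str]:
    """Return problems with one decoded <loc> value (empty list if none)."""
    problems = []
    if len(url) >= MAX_URL_LENGTH:
        problems.append(f"URL is {len(url)} characters; must be under {MAX_URL_LENGTH}")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// protocol prefix")
    if not parts.netloc:
        problems.append("not an absolute URL (no host)")
    if any(ch.isspace() for ch in url):
        problems.append("contains unencoded whitespace")
    return problems
```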

Pre-Submission CI/CD Integration

Use the tool as a manual check before deploying a new site or as a reference validator when reviewing sitemap generator output in your build pipeline, catching issues before they reach Search Console.
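
If you want a similar gate inside a build pipeline, a small script that returns a non-zero exit code on failure is enough for most CI systems. The Python sketch below is hypothetical: the two checks it runs (well-formedness and the root element's namespace) are an assumed minimum, not this tool's full rule set, and check_sitemaps.py is an invented file name.

```python
import sys
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ALLOWED_ROOTS = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}


def check_document(data: bytes, label: str) -> list[str]:
    """Return problems found in one sitemap; an empty list means it passed."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"{label}: XML parse error ({exc})"]
    if root.tag not in ALLOWED_ROOTS:
        return [f"{label}: unexpected root element {root.tag!r}"]
    return []


def main(paths: list[str]) -> int:
    """Check every file in `paths`; return a process exit code."""
    failures = 0
    for path in paths:
        with open(path, "rb") as fh:
            problems = check_document(fh.read(), path)
        for problem in problems:
            print(problem, file=sys.stderr)
        failures += len(problems)
    return 1 if failures else 0


# Entry point for a CI step, e.g. at the bottom of check_sitemaps.py:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Run it after the sitemap generator step with the generated files as arguments; any reported problem fails the job, and you can then paste the failing file here for the full check.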

How to use XML Sitemap Validator

  1. Paste your sitemap XML content directly into the input field, or enter the sitemap URL to fetch it
  2. Select whether you are validating a standard URL sitemap or a sitemap index file
  3. Click Validate to run the full specification check
  4. Review the error list — each entry includes the affected element, line reference, and a plain-English description of the issue
  5. Fix the reported errors in your sitemap source (CMS settings, sitemap plugin configuration, or generator script)
  6. Re-run validation until the result shows zero errors before submitting to Google Search Console or Bing Webmaster Tools

When to use XML Sitemap Validator

  • Before submitting a new sitemap to Google Search Console or Bing Webmaster Tools for the first time
  • After migrating a CMS or switching sitemap plugins to verify the output format is still valid
  • When Search Console reports sitemap errors and you need to identify the specific invalid entries
  • When adding a new content type or section to an existing sitemap and want to verify the additions are valid
  • When building a custom sitemap generator script and need to validate its output against the specification
  • When auditing a client site to check sitemap health as part of a technical SEO review

Examples

Valid minimal sitemap

Input: <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url><loc>https://example.com/page</loc><lastmod>2024-06-15</lastmod></url> </urlset>

Output: Valid — 1 URL, namespace correct, lastmod ISO 8601 compliant. No errors.

Unencoded ampersand in URL

Input: <loc>https://example.com/search?q=foo&sort=asc</loc>

Output: Error line 1: Unencoded ampersand in <loc> URL. Replace & with &amp; → https://example.com/search?q=foo&amp;sort=asc

Invalid lastmod format

Input: <lastmod>06/15/2024</lastmod>

Output: Error: lastmod value '06/15/2024' is not valid W3C Datetime. Use ISO 8601 format: 2024-06-15

Tips

  • Only include canonical URLs in your sitemap — if a page has a rel=canonical pointing elsewhere, including the non-canonical version can confuse crawlers
  • Update lastmod only when the page content genuinely changes — setting it to today's date on every build teaches Google to distrust your dates
  • Split large catalogs into themed child sitemaps (e.g., sitemap-products.xml, sitemap-blog.xml) rather than one monolithic file to make crawl diagnosis easier in Search Console
  • Validate your sitemap in both Google Search Console and this tool — Search Console may catch live fetch issues (redirects, authentication) that a structural validator cannot
  • Ensure your sitemap URL is referenced in robots.txt with a Sitemap: directive so crawlers can discover it even without a Search Console submission
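
To confirm the Sitemap: directive is actually picked up, Python's standard urllib.robotparser can list the sitemap URLs it finds (RobotFileParser.site_maps(), available since Python 3.8). The robots.txt content and URL below are placeholders, and declared_sitemaps is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


example_robots = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
"""
```

Here declared_sitemaps(example_robots) yields the single index URL; the directive is independent of User-agent groups, so its position in the file does not matter.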

Frequently Asked Questions

What is the maximum number of URLs allowed in a single sitemap?
The Sitemap Protocol 0.9 specification limits a single sitemap file to 50,000 URLs and 50MB uncompressed. Sites with larger catalogs must use a sitemap index file that references multiple child sitemaps, each within these limits.
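
Splitting a large URL list is mostly bookkeeping: chunk the list into groups of at most 50,000, write each group as its own urlset, and list the resulting files in a sitemapindex. A minimal Python sketch; the helper names and the example.com file naming scheme are hypothetical, and it splits by count only, so very long URLs could still push a chunk past 50MB.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls: list[str], size: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split urls into consecutive groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_urlset(urls: list[str]) -> str:
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'


def build_index(sitemap_urls: list[str]) -> str:
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'


def split_sitemap(urls: list[str], base: str = "https://example.com/sitemap-") -> dict[str, str]:
    """Return {absolute file URL: XML content} for each child sitemap plus the index."""
    files = {}
    for n, group in enumerate(chunk(urls), start=1):
        files[f"{base}{n}.xml"] = build_urlset(group)
    # The right-hand side runs first, so the index lists only the child files.
    files[f"{base}index.xml"] = build_index(list(files))
    return files
```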
Does the lastmod date actually influence crawl frequency?
Google uses lastmod as a hint, not a guarantee. If lastmod values are accurate and updated when content changes, Google may crawl those URLs more frequently. However, setting lastmod to today's date on every page regardless of changes trains crawlers to ignore it.
Is changefreq required, and does Google use it?
changefreq is optional and Google has stated publicly that it largely ignores it in favour of its own crawl-frequency signals. However, it must use one of the valid values (always, hourly, daily, weekly, monthly, yearly, never) if present — invalid values cause validation errors.
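
Both optional elements have fixed value sets, so the checks are short. The Python sketch below covers changefreq and the 0.0 to 1.0 priority range mentioned in the overview; the function names are hypothetical, and treating changefreq as case-sensitive is an assumption.

```python
VALID_CHANGEFREQ = frozenset(
    {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
)


def is_valid_changefreq(value: str) -> bool:
    # Exact match after trimming whitespace (case-sensitive by assumption).
    return value.strip() in VALID_CHANGEFREQ


def is_valid_priority(value: str) -> bool:
    """priority must be a number from 0.0 to 1.0 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    # Comparisons with NaN are always False, so "nan" is rejected too.
    return 0.0 <= number <= 1.0
```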
Why must URLs in sitemaps use XML entity encoding?
Sitemaps are XML documents, so characters like &, ', ", <, and > must be encoded as &amp;, &apos;, &quot;, &lt;, and &gt; respectively. Unencoded ampersands in URLs with query parameters are among the most common sitemap errors.
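
In Python, the standard library's xml.sax.saxutils.escape handles &, < and > by default and accepts an extra mapping for the quote characters. A minimal sketch; escape_loc is a hypothetical wrapper name.

```python
from xml.sax.saxutils import escape


def escape_loc(url: str) -> str:
    # escape() replaces & first, so the added entities are not double-escaped.
    return escape(url, {"'": "&apos;", '"': "&quot;"})
```

For example, escape_loc("https://example.com/search?q=foo&sort=asc") returns "https://example.com/search?q=foo&amp;sort=asc".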
What XML namespace declaration is required?
The root <urlset> element must include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9". Missing or incorrect namespace declarations cause XML parsers and Google's sitemap parser to reject the document.
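
A namespace check only needs the root tag that ElementTree reports, which looks like "{namespace}localname" when a default namespace is declared. This Python sketch is a hypothetical helper; it catches the two mistakes that come up most often, a missing xmlns and a near-miss URI such as an https:// variant.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def namespace_errors(xml_text: str) -> list[str]:
    """Return problems with the root element's namespace (empty list if none)."""
    root = ET.fromstring(xml_text)
    if not root.tag.startswith("{"):
        return [f'root <{root.tag}> has no namespace; add xmlns="{SITEMAP_NS}"']
    ns, local = root.tag[1:].split("}", 1)
    if ns != SITEMAP_NS:
        return [f"root <{local}> uses namespace {ns!r}, expected {SITEMAP_NS!r}"]
    return []
```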
Can I include image or video URLs in my sitemap?
Yes, using the image (xmlns:image) and video (xmlns:video) namespace extensions. These require additional namespace declarations on the root element and follow separate sub-element schemas. This validator checks the core Sitemap 0.9 structure and flags unrecognised extensions.
Should sitemap URLs include trailing slashes?
Use whatever canonical form your server returns — with or without trailing slash — and be consistent. The canonical URL in your sitemap should match the canonical in your page's <link rel="canonical"> tag, otherwise Google may see a mismatch and prefer one version over the other.
How do I fix a sitemap that Search Console reports as 'could not be read'?
Start by pasting the sitemap content into this validator. The most common causes are XML parse errors (unclosed tags, unencoded ampersands), an incorrect or missing namespace declaration, or a BOM (byte order mark) at the start of the file. The error list will identify the specific issue and line.
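
The first two checks in that list can be reproduced locally. This Python sketch (diagnose_unreadable is a hypothetical name) reports a UTF-8 byte order mark separately and strips it, since Python's parser accepts it but other tools may not, then reports the line and column of the first XML parse error, if any.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"


def diagnose_unreadable(data: bytes) -> list[str]:
    """List likely reasons a sitemap 'could not be read' (empty list if none)."""
    findings = []
    if data.startswith(UTF8_BOM):
        findings.append("file starts with a UTF-8 byte order mark (BOM)")
        data = data[len(UTF8_BOM):]
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        line, column = exc.position
        findings.append(f"XML parse error at line {line}, column {column}: {exc}")
    return findings
```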


Glossary

Sitemap Protocol 0.9
The XML-based protocol defined at sitemaps.org and adopted by Google, Bing, and Yahoo for communicating URL lists and metadata to search engine crawlers.
Sitemap Index
An XML file that lists multiple child sitemap files rather than individual URLs, used when a site's URLs exceed the single-sitemap limits of 50,000 URLs or 50MB uncompressed.
lastmod
An optional <lastmod> element within a sitemap URL entry indicating the date the page was last significantly modified, in W3C Datetime (ISO 8601) format.
changefreq
An optional sitemap element suggesting how frequently a URL's content changes. Valid values are always, hourly, daily, weekly, monthly, yearly, and never.
XML Entity Encoding
The substitution of special XML characters with their entity references (&amp; for &, &lt; for <, etc.) to prevent XML parsers from misinterpreting them as markup.
W3C Datetime
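A profile of ISO 8601 used by the Sitemap Protocol for lastmod values. The profile allows granularities from YYYY up to YYYY-MM-DDThh:mm:ss.sTZD, but for sitemaps a full YYYY-MM-DD date is the practical minimum, and any value that includes a time must also include a timezone designator (Z or ±hh:mm).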