UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

Robots.txt Generator

Generate a valid robots.txt with user-agent, allow/disallow, sitemap, and host directives.

About Robots.txt Generator

A robots.txt file is the first thing search engine crawlers check before visiting any page on your site. Get it wrong and you may accidentally block Google from crawling your entire site, or fail to block scrapers from hammering your API endpoints. The rules follow a simple but finicky syntax: User-agent lines select which bots to address, Disallow paths block crawling, Allow paths create exceptions within a blocked directory, and Sitemap lines help crawlers discover your content. A single typo — a missing slash at the end of a directory path, a wrong user-agent spelling, or a misplaced blank line — can have outsized consequences. This tool generates syntactically correct robots.txt files using a visual rule builder. Add rules for multiple user-agents, specify disallow and allow paths, insert sitemap URLs, and optionally set crawl-delay directives — all without memorizing the exact syntax.
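
Put together, a minimal robots.txt using each of these directives might look like the following; the paths and sitemap URL are placeholders, not recommendations:

  User-agent: *
  Disallow: /private/
  Allow: /private/press-kit/
  Sitemap: https://example.com/sitemap.xml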

Why use Robots.txt Generator

Visual Rule Builder

Add allow and disallow rules line by line through a form interface rather than hand-editing raw text. The correct syntax — trailing slashes, proper line breaks, blank lines between agent blocks — is applied automatically.
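
To make "applied automatically" concrete, here is a minimal TypeScript sketch of how rule objects could be serialized into robots.txt text. The RuleBlock type and buildRobotsTxt function are hypothetical illustrations, not the tool's actual implementation:

  // Hypothetical shape of one user-agent block from the rule builder.
  interface RuleBlock {
    userAgent: string;   // "*" or a named crawler such as "Googlebot"
    disallow: string[];  // paths to block, e.g. ["/admin/"]
    allow: string[];     // exceptions inside blocked paths
  }

  // Serialize rule blocks and sitemap URLs into robots.txt text:
  // one directive per line, a blank line between blocks.
  function buildRobotsTxt(blocks: RuleBlock[], sitemaps: string[] = []): string {
    const sections = blocks.map((block) =>
      [
        `User-agent: ${block.userAgent}`,
        ...block.disallow.map((path) => `Disallow: ${path}`),
        ...block.allow.map((path) => `Allow: ${path}`),
      ].join("\n")
    );
    const sitemapLines = sitemaps.map((url) => `Sitemap: ${url}`);
    return [...sections, ...sitemapLines].join("\n\n") + "\n";
  }

  // Example: one wildcard block plus a sitemap.
  console.log(
    buildRobotsTxt(
      [{ userAgent: "*", disallow: ["/admin/"], allow: ["/admin/public/"] }],
      ["https://example.com/sitemap.xml"]
    )
  );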

Multiple User-Agent Blocks

Set different crawling rules for different bots in a single file. Grant Googlebot full access while blocking AhrefsBot or SemrushBot from your entire site with separate user-agent sections.

Sitemap Inclusion

Embed one or more sitemap URLs at the bottom using the Sitemap: directive. This helps crawlers such as Googlebot and Bingbot discover your content more reliably when they process the robots.txt file.

Syntax Validation

The generator validates rules for common mistakes: paths without a leading slash, invalid characters in user-agent names, duplicate rules, and an empty Disallow value (which allows everything rather than blocking it).
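
As a rough TypeScript sketch of the kinds of checks described above (the validateBlock function and its exact rules are illustrative, not the generator's real validation logic):

  // Illustrative validation for one user-agent block; returns human-readable warnings.
  function validateBlock(userAgent: string, disallow: string[], allow: string[]): string[] {
    const warnings: string[] = [];

    // User-agent names: letters, digits, hyphens, underscores, or the "*" wildcard.
    if (!/^(\*|[A-Za-z0-9_-]+)$/.test(userAgent)) {
      warnings.push(`Invalid characters in user-agent "${userAgent}"`);
    }

    // Non-empty paths should start with a leading slash.
    for (const path of [...disallow, ...allow]) {
      if (path !== "" && !path.startsWith("/")) {
        warnings.push(`Path "${path}" is missing a leading slash`);
      }
    }

    // Duplicate Disallow rules add noise without changing behavior.
    const seen = new Set<string>();
    for (const path of disallow) {
      if (seen.has(path)) warnings.push(`Duplicate Disallow rule for "${path}"`);
      seen.add(path);
    }

    // "Disallow:" with an empty value allows everything; flag it in case
    // the intent was "Disallow: /" (block the whole site).
    if (disallow.includes("")) {
      warnings.push('Empty Disallow value allows everything; use "Disallow: /" to block the site');
    }

    return warnings;
  }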

Wildcard and Parameter Blocking

Support for wildcard patterns like /*?sort= to block parameter-based URL variants without blocking the base URL, keeping crawl budget focused on canonical pages rather than infinite query-string combinations.
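
To see why /*?sort= blocks only the parameter variants, here is a small TypeScript sketch of robots.txt-style matching, where "*" matches any run of characters and a trailing "$" anchors the end of the URL. The patternToRegExp helper is a simplified illustration, not a complete matcher:

  // Convert a robots.txt path pattern into a RegExp: "*" becomes ".*",
  // a trailing "$" anchors the match at the end of the URL.
  function patternToRegExp(pattern: string): RegExp {
    const anchored = pattern.endsWith("$");
    const body = (anchored ? pattern.slice(0, -1) : pattern)
      .split("*")
      .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
      .join(".*");
    return new RegExp("^" + body + (anchored ? "$" : ""));
  }

  const blockSort = patternToRegExp("/*?sort=");
  console.log(blockSort.test("/products?sort=price")); // true  -> blocked
  console.log(blockSort.test("/products"));            // false -> still crawlable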

Plain-Text Output

The output is correctly formatted plain-text robots.txt content, ready to drop at your domain root. No build step, configuration, or server restart required — save as robots.txt and upload.

How to use Robots.txt Generator

  1. Select the User-agent — use * for all bots or enter a specific crawler name like Googlebot
  2. Add Disallow paths for directories and pages you want to block from crawling
  3. Add Allow paths to create explicit exceptions within a blocked directory
  4. Insert one or more Sitemap URLs at the bottom of the file
  5. Optionally set a Crawl-delay value in seconds for bots that respect it
  6. Copy the generated robots.txt content and upload it to your site's domain root as /robots.txt
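
Following those six steps for a typical small site might produce a file like this; every path and the sitemap URL are placeholders:

  User-agent: *
  Disallow: /admin/
  Disallow: /api/
  Allow: /api/docs/
  Crawl-delay: 10

  Sitemap: https://example.com/sitemap.xml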

When to use Robots.txt Generator

  • You are launching a new site and need to create a robots.txt that allows indexing of public pages and blocks admin and API paths
  • You are migrating a site and need to ensure robots.txt does not accidentally block your new URL structure
  • You want to block specific scrapers or SEO tools from accessing your site without blocking legitimate search engines
  • You need to add your sitemap URL to robots.txt for better crawl discovery by search engines
  • You want to block URL parameter variants from crawling to save crawl budget for canonical pages
  • You are setting up a staging environment and need a robots.txt that blocks all crawlers from indexing it
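
For the staging-environment case in the last item, the entire file can be just two lines; remember to replace it before the site goes live:

  User-agent: *
  Disallow: /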

Examples

Block admin, allow rest

Input: User-agent: * | Disallow: /admin/ | Allow: / | Sitemap: https://example.com/sitemap.xml

Output:

  User-agent: *
  Disallow: /admin/
  Allow: /
  Sitemap: https://example.com/sitemap.xml

Different rules per bot

Input: Googlebot: allow all | AhrefsBot: block all

Output:

  User-agent: Googlebot
  Allow: /

  User-agent: AhrefsBot
  Disallow: /

  Sitemap: https://example.com/sitemap.xml

Block parameter URL variants

Input: User-agent: * | Disallow: /*?sort= | Disallow: /*?filter=

Output:

  User-agent: *
  Disallow: /*?sort=
  Disallow: /*?filter=

Tips

  • Disallow blocks crawling, not indexing — to prevent a page from appearing in search results, add a noindex meta robots tag in addition to the disallow rule.
  • Always include a Sitemap: directive at the bottom of your robots.txt — it is one of the most reliable ways to ensure new content is discovered by crawlers promptly.
  • Test your robots.txt with Google Search Console's robots.txt Tester before deploying to verify the rules have the intended effect on specific URLs.
  • Do not block CSS and JavaScript files: Google needs them to render your pages correctly, and blocking them can hurt how those pages are evaluated and ranked.
  • Put specific User-agent rules above the wildcard * block for readability. A bot that matches a named block follows only that block and ignores the * rules, as shown in the example below.
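
For example, with the two blocks below, Googlebot matches its named block, ignores the * block entirely (blocks are not merged), and may therefore still crawl /private/, while every other bot is blocked from it:

  User-agent: Googlebot
  Disallow: /drafts/

  User-agent: *
  Disallow: /private/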

Frequently Asked Questions

Where does robots.txt go on my site?
robots.txt must be placed at the root of your domain at exactly https://yourdomain.com/robots.txt. It cannot be in a subdirectory. A robots.txt at /blog/robots.txt controls nothing — crawlers only check the root location.
Does Disallow guarantee a page will not appear in search results?
No. Disallow blocks crawling but not indexing. A page can still appear in search results if it has inbound links, even if it is disallowed. To prevent indexing, use the noindex meta robots tag or X-Robots-Tag HTTP header.
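
For reference, the two mechanisms mentioned above are written as follows: the meta tag goes in the page's <head>, and the X-Robots-Tag header is set by the server (useful for non-HTML resources such as PDFs). Note that a crawler can only see either signal if it is allowed to fetch the page, so do not also disallow a URL you are trying to de-index:

  <meta name="robots" content="noindex">

  X-Robots-Tag: noindex
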
What is the difference between robots.txt and the meta robots tag?
robots.txt controls crawling at the URL level — blocked URLs are not fetched. The meta robots tag controls what happens after the page is crawled: noindex prevents indexing, nofollow prevents link equity passing. Use robots.txt to save crawl budget; use noindex to prevent indexing.
Can I block specific bots like AhrefsBot or SemrushBot?
Yes. Add a separate User-agent: AhrefsBot block followed by Disallow: /. Well-behaved bots respect robots.txt, but malicious scrapers typically do not — robots.txt is not a security measure.
What does User-agent: * mean?
The wildcard * matches all robots and crawlers that have not been given a specific named user-agent block. Rules in the * block apply to any bot that is not addressed by its own named block elsewhere in the file.
Should I add my sitemap URL to robots.txt?
Yes. The Sitemap: directive is one of the most reliable ways to ensure all major crawlers discover your sitemap. Google, Bing, and other compliant crawlers read this directive when processing robots.txt.
Does case matter in robots.txt paths?
Yes. Path matching in robots.txt is case-sensitive: Disallow: /Admin does not block /admin. Always match the exact case of your actual URL paths.
Why is Google still indexing my disallowed page?
Google may have indexed the page before you added the disallow rule, or it may have discovered the URL from an external link and indexed it from that signal. Remove indexed pages using Google Search Console's URL removal tool, and add a noindex meta tag for belt-and-suspenders protection.

Glossary

robots.txt
A plain-text file placed at the root of a domain that instructs web crawlers which pages and directories they may or may not access, following the Robots Exclusion Protocol.
User-agent
A robots.txt directive that specifies which crawler the following rules apply to. Use * for all crawlers, or a specific name like Googlebot, Bingbot, or AhrefsBot for targeted rules.
Disallow
A robots.txt directive that instructs matching crawlers not to fetch URLs matching the specified path. An empty Disallow: value means the crawler is allowed everywhere on the site.
Allow
A robots.txt directive that explicitly permits crawling of a path, used to create exceptions within a broader Disallow rule. When an Allow and a Disallow rule both match the same URL, the longer (more specific) path wins, and Google resolves ties in favor of Allow.
Sitemap
A robots.txt directive providing the absolute URL of an XML sitemap file, helping crawlers discover all indexable pages on a site. Multiple Sitemap: lines are allowed in a single robots.txt.
Crawl-delay
An optional robots.txt directive requesting a minimum wait time in seconds between requests from a crawler. Supported by some bots like Bingbot but ignored by Googlebot, which manages its crawl rate automatically.