Robots.txt is a text file placed in your website's root directory that provides instructions to search engine and AI crawlers about which pages or sections of your site they can or cannot access.
Robots.txt syntax:
- User-agent: Specifies which crawler (or "*" for all crawlers) the following group of rules applies to
- Disallow: Blocks access to the specified path and everything beneath it
- Allow: Permits access to a path inside an otherwise disallowed section (the more specific, i.e. longer, rule wins)
- Sitemap: Points to the full URL of your XML sitemap
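A minimal file combining these directives might look like the following sketch (the paths and sitemap URL are placeholders for illustration):

```
# Rules for all crawlers
User-agent: *
Disallow: /private/
# Longer, more specific rule, so it overrides the Disallow above
Allow: /private/press-kit.html

# Full URL of the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```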
Common use cases:
- Block admin and login areas from being crawled
- Keep staging or development sites out of crawlers' reach
- Manage crawl budget by excluding low-value URLs
- Block specific bots by user agent
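As a sketch, a site covering several of these cases might use rules like the following (the directory names and the bot name are hypothetical):

```
# Keep all crawlers out of the admin area and low-value internal search URLs
User-agent: *
Disallow: /admin/
Disallow: /search?

# Block one specific bot from the entire site
User-agent: ExampleBot
Disallow: /
```

For a staging host, the usual pattern is a blanket `Disallow: /` for all user agents on that host, though password protection is a more reliable safeguard because robots.txt is only advisory.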
Robots.txt and AI crawlers:
- GPTBot (OpenAI)
- Google-Extended (Gemini)
- ClaudeBot (Anthropic)
- PerplexityBot (Perplexity)
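To opt a site out of these crawlers, each user agent is addressed by name with its own rule group; a sketch blocking all four across the entire site:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Note that Google-Extended is a control token honored by Google's existing crawlers rather than a separate bot, and blocking it does not affect how your pages are crawled or ranked in Google Search.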
Important considerations:
- Robots.txt is advisory: compliant crawlers honor it voluntarily, and it provides no actual access control
- Blocking crawling doesn't remove content that has already been indexed
- Blocking AI crawlers may reduce AI visibility
- Balance crawl management against discoverability in search and AI tools
Carefully consider the implications of blocking AI crawlers, as this may reduce your content's presence in AI responses.