
Robots.txt

A file that instructs search engine and AI crawlers which pages they can or cannot access on your website.

Robots.txt is a plain-text file served from your site's root (for example, https://example.com/robots.txt) that implements the Robots Exclusion Protocol (RFC 9309). It tells search engine and AI crawlers which pages or sections of your site they may or may not crawl.

Robots.txt syntax (combined in the example after this list):

  • User-agent: Names the crawler the following rules apply to (* matches all crawlers)
  • Disallow: Blocks crawling of the specified path prefix
  • Allow: Re-permits a path within a disallowed section (the more specific rule wins)
  • Sitemap: Gives the absolute URL of your XML sitemap
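
A minimal file using all four directives might look like this (the domain and paths are placeholders):

    User-agent: *
    Disallow: /private/
    Allow: /private/annual-report.html

    Sitemap: https://www.example.com/sitemap.xml

Here every crawler is kept out of /private/ except the one page explicitly re-allowed; the Sitemap line can sit anywhere in the file, since it is not tied to a user-agent group.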

Common use cases (combined in the sketch after this list):

  • Block admin or login areas from crawling
  • Keep staging or development sites from being crawled
  • Manage crawl budget by steering bots away from low-value URLs (filters, internal search results)
  • Block specific bots by their user-agent token
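
A sketch covering these cases; the paths and the BadBot token are placeholders, and the staging rule belongs in the staging host's own robots.txt:

    # Keep all crawlers out of the admin area and internal search URLs
    User-agent: *
    Disallow: /admin/
    Disallow: /search/

    # Shut out one misbehaving bot entirely
    User-agent: BadBot
    Disallow: /

    # On the staging host only, block everything:
    # User-agent: *
    # Disallow: /

Robots.txt supports # comments, so notes like these can live in the file itself.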

Robots.txt and AI crawlers:

AI companies publish user-agent tokens that you can target with the same directives. The most common are listed here and blocked together in the example after this list:

  • GPTBot (OpenAI)
  • Google-Extended (Google; controls use of content for Gemini, not Googlebot's search crawling)
  • ClaudeBot (Anthropic)
  • PerplexityBot (Perplexity)
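
A sketch that opts out of all four; drop the group for any crawler you want to keep:

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /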

Important considerations:

  • Robots.txt is advisory: compliant crawlers honor it, but nothing technically enforces it
  • Blocking crawling doesn't remove content that is already indexed (use noindex or removal tools for that)
  • Blocking AI crawlers may reduce your visibility in AI-generated answers
  • Balance crawl management against discovery: overly broad Disallow rules can hide pages you want found

Carefully consider the implications of blocking AI crawlers, as this may reduce your content's presence in AI responses.
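
One balanced sketch, assuming you want to remain visible in Google Search while opting out of Gemini training: block only Google-Extended and leave everything else open (an empty Disallow means the matching crawlers may fetch everything):

    User-agent: Google-Extended
    Disallow: /

    User-agent: *
    Disallow: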
