Robots.txt is a text file placed in your website's root directory that provides instructions to search engine and AI crawlers about which pages or sections of your site they can or cannot access.
Robots.txt syntax:
- User-agent: Specifies which crawler (or "*" for all crawlers) the following group of rules applies to
- Disallow: Blocks access to the specified path and everything beneath it
- Allow: Permits access to a path inside an otherwise disallowed section (the more specific, i.e. longer, rule wins)
- Sitemap: Points to the full URL of your XML sitemap
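A minimal file combining these directives might look like the following sketch (the paths and sitemap URL are placeholders for illustration):

```
# Rules for all crawlers
User-agent: *
Disallow: /private/
# Longer, more specific rule, so it overrides the Disallow above
Allow: /private/press-kit.html

# Full URL of the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```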
Common use cases:
- Block admin and login areas from being crawled
- Keep staging or development sites out of crawlers' reach
- Manage crawl budget by excluding low-value URLs
- Block specific bots by user agent
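As a sketch, a site covering several of these cases might use rules like the following (the directory names and the bot name are hypothetical):

```
# Keep all crawlers out of the admin area and low-value internal search URLs
User-agent: *
Disallow: /admin/
Disallow: /search?

# Block one specific bot from the entire site
User-agent: ExampleBot
Disallow: /
```

For a staging host, the usual pattern is a blanket `Disallow: /` for all user agents on that host, though password protection is a more reliable safeguard because robots.txt is only advisory.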
Robots.txt and AI crawlers:
- GPTBot (OpenAI)
- Google-Extended (Gemini)
- ClaudeBot (Anthropic)
- PerplexityBot (Perplexity)
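To opt a site out of these crawlers, each user agent is addressed by name with its own rule group; a sketch blocking all four across the entire site:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Note that Google-Extended is a control token honored by Google's existing crawlers rather than a separate bot, and blocking it does not affect how your pages are crawled or ranked in Google Search.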
Important considerations:
- Robots.txt is advisory: compliant crawlers honor it voluntarily, and it provides no actual access control
- Blocking crawling doesn't remove content that has already been indexed
- Blocking AI crawlers may reduce AI visibility
- Balance crawl management against discoverability in search and AI tools
Carefully consider the implications of blocking AI crawlers, as this may reduce your content's presence in AI responses.