robots.txt for AI Crawlers

The right robots posture separates public discovery from private surfaces without accidentally blocking useful assistant referrals.

Direct answer: Your robots policy should allow legitimate search and assistant fetch bots on public content while blocking admin, preview, staging, and unpublished draft paths.

Machine read

  • Primary entity: AI crawler policy
  • Extractable answer: High
  • Citation potential: Medium
  • Main issue: Teams block or allow bots too broadly because they do not distinguish fetch bots from training bots.

Human read

Good robots policy is operational hygiene, not ideology. You want visibility on public pages and restraint on everything else.

What to change

  1. Explicitly disallow admin, preview, staging, and draft paths.
  2. Document separate decisions for search bots, assistant fetch bots, and training-oriented bots.
  3. Track crawler behavior in Cloudflare so policy decisions can be based on evidence.

Hidden failure mode: One blanket block kills high-value assistant referrals along with low-value bot traffic.

Noise check: robots.txt is not a substitute for fixing weak information architecture or poor page quality.
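
The changes above can be sketched as a robots.txt file. The bot names and paths here are illustrative assumptions; verify current user-agent tokens against each vendor's documentation before shipping:

```txt
# Search and assistant fetch bots: public pages open, private surfaces blocked
User-agent: Googlebot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /admin/
Disallow: /preview/
Disallow: /staging/
Disallow: /drafts/

# Training-oriented bots: make an explicit, documented decision per bot
User-agent: GPTBot
Disallow: /

# Everyone else: same private-surface blocks as above
User-agent: *
Disallow: /admin/
Disallow: /preview/
Disallow: /staging/
Disallow: /drafts/
```

Grouping several User-agent lines over one rule set is valid under the Robots Exclusion Protocol (RFC 9309) and keeps the fetch-bot policy in one place.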

The playbook

  • Owner: Platform operations
  • Effort: Half a sprint
  • Expected outcome: Clear crawler access rules with fewer accidental visibility losses.
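
The evidence-gathering step can start as a simple user-agent tally over request logs. A minimal sketch, assuming JSON-lines records with a `ClientRequestUserAgent` field in the style of Cloudflare Logpush exports (the field name and sample records are assumptions for illustration):

```python
import json
from collections import Counter

# Illustrative JSON-lines log records (stand-ins for a real log export)
log_lines = [
    '{"ClientRequestUserAgent": "GPTBot/1.0", "ClientRequestPath": "/blog/post"}',
    '{"ClientRequestUserAgent": "GPTBot/1.0", "ClientRequestPath": "/admin/"}',
    '{"ClientRequestUserAgent": "Mozilla/5.0", "ClientRequestPath": "/"}',
]

# Tally requests per user agent so allow/block decisions rest on evidence
counts = Counter(
    json.loads(line)["ClientRequestUserAgent"] for line in log_lines
)
print(counts.most_common())
```

A real pipeline would segment by path as well, so a bot that respects public pages but probes private surfaces stands out.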

FAQ

Should every AI-related bot be blocked?

No. Search and assistant fetch bots can provide referral and citation value on public pages.

What must stay blocked?

Admin routes, draft paths, preview URLs, and staging environments should stay out of public crawl surfaces.
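
Whether a given path is actually blocked can be verified with Python's standard-library robots parser; the user agent and paths below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt policy directly, with no network fetch
policy = """
User-agent: *
Disallow: /admin/
Disallow: /preview/
Disallow: /staging/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(policy.splitlines())

# Private surfaces stay blocked for any crawler...
print(parser.can_fetch("ChatGPT-User", "/admin/settings"))    # False
# ...while public pages remain fetchable.
print(parser.can_fetch("ChatGPT-User", "/blog/robots-guide")) # True
```

Running this check in CI against the deployed robots.txt catches the quiet regressions the closing warning describes.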

The wrong robots file can quietly erase distribution. This is one of the few settings where a small mistake can undo a lot of editorial work.