TL;DR
- OAI-SearchBot fetches live data for ChatGPT answers and should be allowed.
- GPTBot scrapes web data for training future AI models and can be blocked.
- ChatGPT-User is a third bot that acts on behalf of a user browsing the web.
- Use a specific User-agent rule in robots.txt to block GPTBot while allowing the others.
Blocking OpenAI's web crawlers is a popular topic, but most guides get it wrong. They advise a blanket ban that hurts your visibility in ChatGPT answers. The correct approach is surgical. You need to block the training bot while giving full access to the bot that generates live search results.
This guide provides the exact robots.txt configuration to allow OAI-SearchBot while blocking GPTBot. This lets you prevent your content from being used for model training without sacrificing your ability to be cited in real-time ChatGPT responses.
The Exact robots.txt Configuration to Allow OAI-SearchBot (While Blocking GPTBot)
Here is the precise code. Place it in the robots.txt file at the root of your domain (yourdomain.com/robots.txt). Each User-agent line starts its own rule group, and a crawler obeys the group that matches its name most specifically, so the two groups below do not conflict.
```
# Block OpenAI's training bot
User-agent: GPTBot
Disallow: /

# Allow OpenAI's live search bot
User-agent: OAI-SearchBot
Allow: /
```
How This Configuration Works
The robots.txt file is read as a set of rule groups. Each crawler obeys only the group whose User-agent value matches it and ignores the rest.
- User-agent: GPTBot: When the GPTBot crawler visits your site, it matches this group. The Disallow: / directive tells it not to crawl any page on your site, so it leaves without fetching content.
- User-agent: OAI-SearchBot: When the OAI-SearchBot crawler visits, it skips the GPTBot group because the user-agent does not match. It obeys its own group, where the Allow: / directive gives it permission to crawl everything.
This setup achieves the goal perfectly. You opt out of training data collection but remain eligible for inclusion in live ChatGPT answers that use web search.
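You can sanity-check these rules before deploying them. The sketch below uses Python's standard urllib.robotparser to evaluate the exact configuration above against a sample URL (example.com stands in for your own domain):

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the configuration above.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "https://example.com/any-page"
print(parser.can_fetch("GPTBot", url))         # False: the training bot is blocked
print(parser.can_fetch("OAI-SearchBot", url))  # True: the search bot may crawl
print(parser.can_fetch("ChatGPT-User", url))   # True: unlisted bots are implicitly allowed
```

The third check also demonstrates the implicit-allow behavior: a user-agent with no matching group is permitted by default.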
Why You Must Differentiate Between OpenAI Bots
Failing to distinguish between OpenAI's crawlers is a critical AEO mistake in 2026. They serve completely different functions. Treating them the same means you either give away your content for free model training or you disappear from ChatGPT's search results entirely.
Understanding OpenAI's User-Agents
OpenAI uses at least three distinct crawlers, each with a specific job. A blanket Disallow for all of them is a mistake.
| User-Agent | Primary Function | Should You Allow It? |
|---|---|---|
| GPTBot | Data collection for training future AI models. | Optional (block it to opt out of training) |
| OAI-SearchBot | Real-time web retrieval to answer user prompts in ChatGPT. | Yes (Critical) |
| ChatGPT-User | Acts on behalf of a user browsing a specific page via a GPT. | Yes (Critical) |
- GPTBot: This is the bot that scrapes the web to feed OpenAI's training datasets. Blocking this bot prevents your content from being used to build future versions of their models. There is no direct, immediate traffic or citation benefit from allowing it. For a deeper look, you can read a complete guide to OpenAI web crawlers.
- OAI-SearchBot: This is the bot that matters for visibility. When a ChatGPT user asks a question that requires current information, this bot performs a live web search. If your site is blocked to this bot, you cannot be cited as a source in the answer.
- ChatGPT-User: This bot is triggered when a user in ChatGPT clicks a link or asks a GPT to visit a specific URL. Blocking this bot breaks the user experience and prevents them from accessing your content through the AI interface.
The distinction is clear. GPTBot is for OpenAI's benefit. OAI-SearchBot and ChatGPT-User are for the user's benefit, which in turn benefits you through citations and traffic.
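Putting the table together, a robots.txt that addresses all three crawlers explicitly might look like this. The explicit Allow lines for OAI-SearchBot and ChatGPT-User are technically redundant, since unlisted bots are allowed by default, but they document your intent:

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Keep live search and user browsing open
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```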
How to Implement and Verify in WordPress
You can edit your robots.txt file directly, but using a plugin is safer and avoids syntax errors.
Manual Method
If you use an FTP client, you can find the robots.txt file in the root directory of your WordPress installation. If it doesn't exist, you can create a new plain text file named robots.txt and upload it. Add the rules exactly as shown above.
Some traditional SEO plugins like Yoast or Rank Math also provide a built-in editor for the robots.txt file. You can find this in their settings or tools section.
Automated Method with AEO God Mode
A dedicated AEO plugin handles this automatically. The AEO God Mode plugin includes an AI Crawler Allowlist module. It identifies 18 different AI crawlers, including all three from OpenAI.
You can simply toggle GPTBot to "off" while leaving OAI-SearchBot and ChatGPT-User "on". The plugin will generate the correct, optimized robots.txt file for you without any manual editing. This removes the risk of a typo accidentally blocking every crawler from your site.
Verifying Your Configuration
After you've updated your robots.txt file, you need to confirm it's working.
- Google Search Console: The robots.txt report in Search Console shows whether your file can be fetched and flags syntax errors, but it only covers Google's own crawlers. To test rules against GPTBot or OAI-SearchBot, use a third-party robots.txt testing tool or check the rules yourself with a parser.
- Server Logs: The most reliable method is to check your server logs for AI bot traffic. After a few days, you should see entries for OAI-SearchBot but none for GPTBot. Some plugins, like the AEO God Mode crawler log, provide a clean dashboard view of this data inside WordPress.
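As a lightweight alternative to a dashboard, a few lines of Python can tally crawler hits from an access log. This is a sketch: the bot names are the documented OpenAI user-agent tokens, but the sample log lines are simplified stand-ins for real entries (in practice you would read lines from your server's log file):

```python
from collections import Counter

OPENAI_BOTS = ("OAI-SearchBot", "ChatGPT-User", "GPTBot")

def count_bot_hits(log_lines):
    """Tally requests per OpenAI crawler by matching the user-agent token."""
    counts = Counter({bot: 0 for bot in OPENAI_BOTS})
    for line in log_lines:
        for bot in OPENAI_BOTS:
            if bot in line:
                counts[bot] += 1
    return counts

# Simplified sample lines; real entries also carry timestamps, paths, etc.
sample_log = [
    '203.0.113.7 - - "GET /post HTTP/1.1" 200 "OAI-SearchBot/1.0"',
    '203.0.113.9 - - "GET /post HTTP/1.1" 200 "ChatGPT-User/1.0"',
]

for bot, hits in count_bot_hits(sample_log).items():
    print(f"{bot}: {hits}")  # GPTBot should stay at zero once the block is live
```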
The Broader Context: robots.txt vs. llms.txt
Your robots.txt file is just one tool for managing AI crawlers. It's a simple allow or disallow instruction. An emerging, more detailed method is the llms.txt file.
While robots.txt says "enter" or "do not enter," llms.txt provides a detailed roadmap for crawlers that are allowed in. It can suggest which content is most important, define usage policies, and provide contact information. The two files work together. You can learn more about the differences between llms.txt vs. robots.txt for managing AI.
For now, robots.txt remains the universally respected standard for controlling crawler access. Getting these rules right is a foundational step for any serious Answer Engine Optimization strategy.
Frequently Asked Questions
Do I need an explicit Allow rule for every bot I want to permit?
No. robots.txt files have an implicit "allow" for any user-agent not specifically mentioned. You only need an explicit Allow rule when a broader Disallow might otherwise block that bot.