TL;DR
- The “Not Crawled by AI” error means Google-Extended, the crawler for Gemini, is blocked or skipping your pages.
- The most common cause is a robots.txt file that does not grant the Google-Extended user-agent access to your pages.
- This error can also signal content quality issues, like a lack of E-E-A-T signals or poor semantic structure.
- Fixing it requires both a technical check of robots.txt and a content audit to ensure your pages are valuable for AI training.
One day your site is a trusted source for AI; the next, Google Search Console flags half your content with a new, worrying status: "Not Crawled by AI". This isn't a typical indexing problem. It’s a sign that your content is invisible to the AI models powering Google's future, like Gemini and AI Overviews. Successfully fixing "Not Crawled by AI" errors in Google Search Console is about more than just tweaking a file. It’s about proving your content is worth the crawl.
This error tells you one of two things. Either you have a direct technical block, or Google's AI systems have reviewed your page and decided it lacks the value needed for training their models. We will cover how to diagnose and fix both scenarios.
What "Not Crawled by AI" Actually Means in 2026
This status is new and specific. It relates only to the Google-Extended user-agent. This crawler is used to collect training data for Google's AI models, including Gemini.
Crucially, this error does not affect your traditional Google Search rankings. A page can rank perfectly fine in the classic blue links while still being ignored by Google's AI crawlers. The block only impacts your visibility within AI-generated answers and your content's inclusion in future model training.
The error falls into two categories:
- Hard Block: Your `robots.txt` file explicitly or implicitly disallows the `Google-Extended` crawler. This is a direct instruction that Google will obey.
- Soft Block (De-prioritization): Your `robots.txt` allows access, but Google chooses not to crawl the page. This is a quality signal. The crawler has determined the content is not valuable enough to spend resources on for AI training purposes.
The First Step: Fixing "Not Crawled by AI" Errors in Google Search Console with robots.txt
Before you touch your content, you must rule out the simple technical block. The Google-Extended user-agent is separate from the standard Googlebot and needs its own rule group in your robots.txt file that explicitly grants it access.
Many website owners assume allowing Googlebot is enough. It is not. Add the following lines to your robots.txt file, located at `yourdomain.com/robots.txt`:
```
User-agent: Google-Extended
Disallow:
```
Leaving the `Disallow:` field blank for the `Google-Extended` user-agent grants it full access to your site. This single change resolves the majority of "Not Crawled by AI" errors. If you want to learn more about this specific bot, you can read a full guide on what Google-Extended is and why it matters.
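You can sanity-check your rules before deploying them with Python's standard `urllib.robotparser` module. This is a minimal sketch; the robots.txt contents and sample paths below are placeholders, not your live file.

```python
# Verify that a robots.txt ruleset grants Google-Extended access.
# Uses only the Python standard library; the rules and paths
# below are illustrative placeholders.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Google-Extended
Disallow:

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# An empty Disallow for Google-Extended grants it full access,
# even to paths blocked for other crawlers.
print(parser.can_fetch("Google-Extended", "/blog/my-post/"))  # True
print(parser.can_fetch("Google-Extended", "/private/page"))   # True
print(parser.can_fetch("SomeOtherBot", "/private/page"))      # False
```

Running this against your own robots.txt contents before you publish a change is a fast way to catch an accidental hard block.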
The AEO God Mode plugin's AI Crawler Allowlist module handles this automatically. It detects 18 different AI crawlers and correctly formats your robots.txt file to grant access without manual editing.
Beyond robots.txt: When the Block Isn't Technical
If your robots.txt is correct but you still see errors, the problem lies with your content's perceived value. AI crawlers operate on a "value budget," not just a crawl budget. They are sent to find content with strong signals of Experience, Expertise, Authoritativeness, and Trust (E-E-A-T).
Data shows that 96% of citations in Google AI Overviews come from sources with high E-E-A-T. If your pages lack these signals, Google-Extended will pass them over.
Key areas to audit include:
- Author Attribution: Is the content written by a named author with a clear bio?
- Factual Density: Are your claims backed by data and links to authoritative sources?
- Originality: Does the page contain original research, data, or insights not found elsewhere?
Improving your E-E-A-T often starts with clear authorship. A complete guide on setting up author schema in WordPress for 2026 provides the technical steps to signal expertise directly to crawlers.
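The author markup that guide covers boils down to a `Person` object embedded as JSON-LD. Here is a minimal sketch generated with Python's `json` module; every name, URL, and credential below is a placeholder to swap for your real author details.

```python
# Minimal Person (author) schema sketch. All values are
# placeholders -- replace them with your real author details.
import json

author_schema = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",                      # a named author, not "admin"
    "url": "https://example.com/author/jane-doe/",
    "jobTitle": "Technical SEO Consultant",
    "sameAs": [                              # profiles that corroborate expertise
        "https://www.linkedin.com/in/janedoe",
    ],
}

# Embed the result in the page head as JSON-LD.
json_ld = f'<script type="application/ld+json">{json.dumps(author_schema)}</script>'
print(json_ld)
```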
The Multi-Modal Content Requirement
AI models in 2026 are multi-modal. They process text, images, and video together. The single strongest predictor of AI citation is Multi-Modal Content Integration, with a correlation score of r=0.92.
Pages that combine text with original images and short explainer videos are up to 317% more likely to be crawled and cited. Google-Extended actively seeks this content to train its models. A page with only text is seen as incomplete and is a prime candidate for a "Not Crawled by AI" soft block.
To fix this, review your most important pages.
- Add original images with descriptive alt text. Avoid generic stock photos.
- Embed a short (60-90 second) video that summarizes the key concept.
- Ensure all visual assets are marked up with `ImageObject` and `VideoObject` schema.
This strategy is fundamental to how you can appear in Google AI Overviews.
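The `ImageObject` and `VideoObject` markup mentioned above can be sketched like this; all URLs, dates, and durations are placeholder values for illustration.

```python
# Sketch of VideoObject and ImageObject markup for a multi-modal
# page. URLs, dates, and durations are placeholders.
import json

video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Fixing 'Not Crawled by AI' in 90 Seconds",
    "description": "A short explainer summarizing the key concept.",
    "thumbnailUrl": "https://example.com/thumb.jpg",
    "contentUrl": "https://example.com/explainer.mp4",
    "uploadDate": "2026-01-15",
    "duration": "PT1M30S",  # ISO 8601 duration: 90 seconds
}

image = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    "contentUrl": "https://example.com/original-diagram.png",
    "caption": "Original diagram, not a stock photo",
}

# Emit each object as its own JSON-LD script tag.
for block in (video, image):
    print(f'<script type="application/ld+json">{json.dumps(block)}</script>')
```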
Semantic Structure and Schema: Your AI Roadmap
An AI crawler needs a clear map to understand your content's structure and key takeaways. Vague, unstructured content is difficult to parse and is often skipped. You provide this map through clean HTML structure and detailed schema markup.
- Structural Clarity: Use H2 and H3 headings for all main ideas. Use lists and HTML tables to structure comparative data. Listicles and tables account for 50% of all top AI citations.
- Semantic Completeness: Ensure your content fully answers a query without ambiguity. AI prefers "Answer Islands," or self-contained passages of 130-167 words.
- Schema Coverage: JSON-LD schema is the most direct way to explain your content to a machine. An automated schema engine can ensure every page has the correct markup, from `Article` to `FAQPage`.
A clear structure makes your content predictable and valuable.
| Signal | Attracts AI Crawlers | Repels AI Crawlers |
|---|---|---|
| Content Format | Text, original images, and video combined | Text-only articles |
| Schema | FAQPage, VideoObject, Person schema | Missing or invalid structured data |
| E-E-A-T | Named author with bio, outbound citations | Anonymous content, unsupported claims |
| Structure | Clear H2/H3s, HTML tables, lists | Large, unbroken blocks of text |
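One rough way to audit your passages against the 130-167 word "Answer Island" range mentioned above is a simple per-paragraph word count. This sketch splits on whitespace only; real tokenization and the exact range an AI model prefers will vary.

```python
# Rough audit of paragraph length against the 130-167 word
# "Answer Island" range cited above. Plain whitespace splitting,
# not a real tokenizer.

def answer_island_check(paragraph: str, low: int = 130, high: int = 167) -> str:
    words = len(paragraph.split())
    if words < low:
        return f"{words} words: too short to stand alone"
    if words > high:
        return f"{words} words: consider splitting"
    return f"{words} words: within the target range"

# Example: a 150-word passage falls inside the range.
print(answer_island_check("word " * 150))
```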
The Challenge of JavaScript-Heavy Websites
Many modern websites rely heavily on JavaScript to render content. This creates a rich user experience but can make a site completely invisible to AI crawlers.
- ✓ Modern user experience
- ✓ Dynamic content loading
- ✓ Rich interactive elements
- ✗ 69% of AI crawlers cannot render JavaScript
- ✗ Key content may be invisible to Google-Extended
- ✗ Causes “Not Crawled by AI” due to empty pre-rendered HTML
- ✗ Requires server-side rendering (SSR) to fix
If your site uses a JavaScript framework like React or Vue, you must implement server-side rendering (SSR). SSR sends a fully-rendered HTML page to the crawler, ensuring all content is visible on the first pass. Without it, Google-Extended may see a blank page and report it as "Not Crawled by AI" because there was nothing of value to crawl.
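You can approximate what a non-JavaScript crawler sees by stripping tags from the raw server response and measuring the visible text that remains. This is a crude standard-library sketch, and the 200-character threshold is an arbitrary illustration, not a documented crawler limit.

```python
# Approximate what a crawler that cannot run JavaScript sees:
# strip tags from raw HTML and measure the remaining visible text.
# The 200-character threshold is an arbitrary illustration.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data.strip())

def visible_text(raw_html: str) -> str:
    parser = TextExtractor()
    parser.feed(raw_html)
    return " ".join(c for c in parser.chunks if c)

# A client-rendered shell: almost no visible text before JS runs.
spa_shell = '<html><body><div id="root"></div><script>/* app */</script></body></html>'
print(len(visible_text(spa_shell)) < 200)  # True -- likely invisible to AI crawlers
```

Run this against the raw HTML your server actually returns (for example, the output of `curl`) rather than the DOM you see in a browser, which has already executed JavaScript.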
Auditing Your Fixes: How to Verify AI Crawlers Are Visiting
After you've updated your robots.txt file and improved your content, you need to verify that Google-Extended is visiting your site. While Google Search Console data will eventually update, it can lag by days or weeks.
The fastest method is to check your server's raw access logs. You or your hosting provider can access these logs and search for visits from the Google-Extended user-agent string. Seeing new entries after you've made your changes is a positive confirmation that the fix is working. You can also use a plugin to check which AI bots are crawling your site traffic directly from your WordPress dashboard.
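Filtering your access logs for that user-agent string can be done with a few lines of Python. The log lines below are illustrative samples in combined log format; the exact user-agent string Google sends may differ, so match loosely on "Google-Extended".

```python
# Count Google-Extended hits in a combined-format access log.
# The sample lines are illustrative, not real log data.

sample_log = """\
203.0.113.5 - - [10/Jan/2026:10:12:01 +0000] "GET /blog/post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Google-Extended)"
198.51.100.7 - - [10/Jan/2026:10:12:09 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (regular browser)"
203.0.113.5 - - [10/Jan/2026:10:15:44 +0000] "GET /guides/ HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (compatible; Google-Extended)"
"""

hits = [line for line in sample_log.splitlines() if "Google-Extended" in line]
print(f"Google-Extended requests: {len(hits)}")
for line in hits:
    # The request path is the second field inside the first quoted section.
    path = line.split('"')[1].split()[1]
    print("crawled:", path)
```

In practice you would read the real log file (for example, `/var/log/nginx/access.log`) instead of the sample string, and watch for new entries appearing after your fixes go live.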