Free tool · Crawler-eye view

See your product page the way AI crawlers see it.

Your PDP looks great in a browser. GPTBot, ClaudeBot, PerplexityBot, and Googlebot do not start there: their first fetch is the raw HTML, with no JavaScript and no pixels. The simulator runs that same fetch and reports a 10-bot robots.txt verdict for the URL's path, the plain-text view a crawler extracts, and live Product JSON-LD and JS-shell checks.

Free · No signup · ~20-second result

Report manifest

10-bot roster verdict: allowed or blocked for this exact path, per crawler · free

The plain-text view: first 1,200 characters of what a non-JS crawler extracts · free

Top findings: schema, JS-shell, and readability verdicts ranked by severity · free

Full plain-text extraction (up to 12,000 characters), the complete crawler-eye view · email

Per-bot detail: the exact robots.txt rule behind each verdict, plus the full fix list · email

Rate limit: 1 run per 30 seconds per IP. Every check is live and deterministic — the simulator fetches your page and robots.txt at run time; nothing is canned, and no AI model is involved.

What an AI crawler actually sees

Three things separate a crawler's view of your PDP from a shopper's. First, access: every crawler announces a user-agent and obeys (or claims to obey) robots.txt, so a single Disallow line can remove a product from an engine's answers without anyone on the team noticing. Second, rendering: most AI crawlers read the served HTML and stop — a price that appears after a JavaScript hydration pass, a size chart shipped as a JPEG, or a returns policy inside a modal simply does not exist in their copy of the page. Third, structure: engines resolve what the page sells from Product JSON-LD first and prose second, so a schema block missing offers or sku degrades the product into guesswork.
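The structure point can be illustrated with a minimal Python sketch that pulls Product JSON-LD out of served HTML and reports which fields are absent. It is not the simulator's implementation: the function names and the required-field list (offers and sku, the two fields named above) are assumptions chosen for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_nodes(value):
    """Yield every JSON object in a parsed JSON-LD value, including @graph members."""
    if isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)
    elif isinstance(value, dict):
        yield value
        yield from iter_nodes(value.get("@graph", []))


def is_product(node):
    node_type = node.get("@type")
    types = node_type if isinstance(node_type, list) else [node_type]
    return "Product" in types


def missing_product_fields(html, required=("offers", "sku")):
    """Return None when no Product node exists, else the list of missing fields.

    The `required` tuple is an illustrative assumption, not an official list.
    """
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a malformed block is unusable for a crawler as well
        for node in iter_nodes(data):
            if is_product(node):
                return [field for field in required if not node.get(field)]
    return None
```

An empty list means a Product node was found with both fields present; None means the served HTML carries no Product JSON-LD at all, which is the case where a crawler falls back to prose.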

The simulator makes all three visible in one run. It fetches the URL the way a non-rendering crawler does, evaluates robots.txt for ten user-agents against that exact path (a store often admits GPTBot at / but blocks /products/ by accident), then reduces the page to its extractable plain text — title, headings, visible prose, bullets, in document order. If the fact a buyer needs is not in that text or in the schema, no AI answer built on this page will contain it.
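The per-path robots.txt evaluation described above can be approximated with Python's standard library. This is a sketch under stated assumptions, not the simulator's code: the `ROSTER` list mirrors the ten crawlers named on this page, the sample robots.txt and the `shop.example` URLs are made up, and `urllib.robotparser` applies the first matching rule in a group rather than the longest-match rule that RFC 9309 specifies, so results can differ on files that mix overlapping Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# The ten user-agent tokens named on this page.
ROSTER = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Googlebot",
    "Google-Extended", "GoogleOther", "Bingbot", "CCBot", "Bytespider",
]


def roster_verdicts(robots_txt, url):
    """Map each crawler token to True (allowed) or False (blocked) for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in ROSTER}


# Hypothetical robots.txt: GPTBot is admitted at / but blocked under /products/.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /products/

User-agent: *
Disallow: /cart
"""

pdp_verdicts = roster_verdicts(SAMPLE_ROBOTS, "https://shop.example/products/trail-shoe")
home_verdicts = roster_verdicts(SAMPLE_ROBOTS, "https://shop.example/")
```

With this sample file, `pdp_verdicts["GPTBot"]` is False while `home_verdicts["GPTBot"]` is True, which is the accidental block described above: the homepage check passes and the product path quietly fails.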

More than a Googlebot simulator: ten crawlers, one fetch

A classic Googlebot simulator answers one question: can Google fetch and read this page? That question still matters, and the simulator answers it: Googlebot gets its own per-path verdict, evaluated against the live robots.txt the way Google's crawler documentation specifies. But product discovery no longer runs through one bot. The same fetch now has to satisfy OpenAI's GPTBot and OAI-SearchBot, Anthropic's ClaudeBot, PerplexityBot, Google-Extended and GoogleOther on the Gemini side, Bingbot feeding Copilot, CCBot feeding most training sets, and ByteDance's Bytespider. Each is addressed by its own user-agent token in robots.txt; each can be blocked independently; and bot-protection apps frequently block several of them by default. Every verdict row links the fix:

Finding	What it means for your products	Fix guide
Crawler blocked	The engine never sees the page. Schema and content work cannot compensate for a Disallow — access gates everything downstream.	Fix robots.txt
Product JSON-LD missing or partial	Crawlers infer the product from prose — price, availability, and rating are the first casualties, and shopping answers render the product as untrusted.	Schema for AI search
JavaScript shell detected	The served HTML is a frame waiting for a script. Non-rendering crawlers get the frame, not the product.	Headless storefronts
Thin plain-text extraction	The page is reachable but says almost nothing in extractable form — facts live in images, accordions, and widgets agents parse badly.	Agent-readable PDPs

Access is the gate, not the win. Once the roster reads green, the next questions are readiness and results: the AEO Grader scores the same URL on the structural signals answer engines reward, and the ChatGPT Product Visibility Checker runs live buyer prompts to see whether the page actually earns a citation. If the crawl-surface finding is a missing llms.txt, the llms.txt generator drafts one in about 30 seconds.

Related tools

Protocols · UCP/ACP

Agentic Readiness Grader

The simulator shows what crawlers see; the grader scores whether an agent could draft the product into a cart.

Generator

Product schema generator

If the JSON-LD check fails, start here: complete Product schema, copy-ready.

Frequently asked questions

Is this a Googlebot simulator?

For the access layer, yes. It does what a classic Googlebot simulator does: fetch the page without executing JavaScript and evaluate robots.txt rules against the exact URL path. Then it repeats the same evaluation for nine more crawlers, because in 2026 Googlebot is one reader among many: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, GoogleOther, Bingbot, CCBot, and Bytespider are each addressed by their own user-agent token in robots.txt and can be blocked independently.

Which AI crawlers does the simulator check?

Ten, ordered by relevance to product discovery: GPTBot and OAI-SearchBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Googlebot, Google-Extended and GoogleOther (Google), Bingbot (Microsoft Copilot answers from the Bing index), CCBot (Common Crawl, which feeds most LLM training sets), and Bytespider (ByteDance). Each gets a per-path allowed-or-blocked verdict with the exact robots.txt rule that decided it.

Why does the simulator not execute JavaScript?

Because most AI crawlers do not. GPTBot, ClaudeBot, PerplexityBot, and CCBot read the served HTML; client-rendered content largely does not exist for them. Googlebot can render JavaScript, but rendering is queued and budgeted — the raw HTML is still what every crawler sees first. The simulator deliberately shows that first-fetch view, because it is the floor your product data has to stand on.
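A first-fetch view of this kind can be sketched with Python's built-in HTML parser: keep text in document order, drop script and style bodies, and execute nothing. The class name, the tag sets, and the default 1,200-character limit (borrowed from the free preview length above) are illustrative assumptions, not the simulator's actual extraction rules.

```python
from html.parser import HTMLParser

SKIPPED = {"script", "style"}  # never executed or rendered by a non-JS crawler
BLOCK_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6",
              "p", "li", "div", "br", "tr", "section"}


class PlainTextExtractor(HTMLParser):
    """Keeps text in document order, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def crawler_eye_view(html, limit=1200):
    """Return the first `limit` characters of the extractable plain text."""
    extractor = PlainTextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()[:limit]
```

A price written into the page by a script never reaches `handle_data` as visible text, so it is absent from the result, exactly as it is absent from a non-rendering crawler's copy.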

My page looks fine in a browser but the plain-text view is almost empty. What happened?

The storefront is rendering product data client-side. This is common on headless React or Vue builds, and on Next.js builds that load product data in the browser instead of on the server. The browser executes the JavaScript and paints a full page; a non-rendering crawler gets the empty shell the simulator showed you. The fix is server-side rendering or pre-rendering for PDPs, and at minimum injecting Product JSON-LD server-side. The headless solutions page covers the patterns.
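One rough way to spot such a shell is to compare how much visible text the served HTML carries against whether it ships scripts at all. The sketch below is a heuristic under stated assumptions, not the simulator's detection logic: the `ShellProbe` class and the 200-character threshold are hypothetical and would need tuning against real storefronts.

```python
from html.parser import HTMLParser


class ShellProbe(HTMLParser):
    """Counts visible non-whitespace characters and <script> tags in served HTML."""

    def __init__(self):
        super().__init__()
        self._hidden = 0
        self.text_chars = 0
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.text_chars += len("".join(data.split()))


def looks_like_js_shell(html, min_text_chars=200):
    """True when the page ships scripts but almost no extractable text.

    `min_text_chars` is an arbitrary illustrative threshold.
    """
    probe = ShellProbe()
    probe.feed(html)
    probe.close()
    return probe.script_count > 0 and probe.text_chars < min_text_chars
```

A page consisting of an empty root element plus a script bundle trips the check; the same page with the product copy rendered on the server does not.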

Does a robots.txt allow guarantee my product gets cited?

No. Crawler access is the gate, not the win — an admitted crawler still needs complete Product JSON-LD, extractable prose, and answer coverage before an engine cites the page. Run the AEO Grader on the same URL for the readiness score, and the ChatGPT Product Visibility Checker for a live citation test. The simulator tells you whether the engines can read the page; those tools tell you what they do with it.

One page simulated free. Every PDP, watched.

The simulator answers "what do crawlers see today?" eCommerce Insights re-asks it for every product on every refresh — and ships the fix when the answer changes.

Run the simulator · Start free trial