
How to verify your llms.txt.

Publishing an llms.txt is easy. Publishing one that is correct, complete, and actually being fetched is the part that takes a checklist.

eCommerce Insights Team · Updated 2026-05-19 · 8 min read

Quick answer

Open yourdomain.com/llms.txt in a browser. Confirm it serves with a 200 status, follows the markdown spec (H1, single-sentence description, then grouped link sections), and includes every high-revenue page in your sitemap. Check server logs for fetches from OAI-SearchBot, PerplexityBot, ClaudeBot, and Googlebot (Google-Extended is only a robots.txt token and never appears as a user agent). The free llms.txt generator validates compliance and outputs a fixed version.

What llms.txt is

llms.txt is a markdown file served at the root of a domain that gives AI engines a curated entry point to the site. It was proposed by Jeremy Howard in late 2024 and has been adopted in some form by ChatGPT, Perplexity, and Claude as of Q1 2026. The format is intentionally simple: an H1 with the site name, a one-sentence description, then markdown sections grouping high-value URLs with short context blurbs. The full proposal lives at llmstxt.org. Treat the file as a low-cost positive signal; it is not a guarantee of citation and does not replace robots.txt or sitemap.xml.

Step 1: Confirm the file is served

Open yourdomain.com/llms.txt in a browser. Three checks:

HTTP status is 200, not a 301 redirect or a 404.
Content-Type header is text/plain or text/markdown, not HTML.
The file renders as plain text in the browser, not styled HTML.
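
The three checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names and the `llms-txt-check/0.1` user agent string are illustrative assumptions, not part of any official tooling.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

ALLOWED_TYPES = {"text/plain", "text/markdown"}

def evaluate_response(status, content_type, body):
    """Return a list of problems for the three checks (empty list means pass)."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"Content-Type is {media_type or 'missing'}")
    head = body.lstrip()[:200].lower()
    if head.startswith("<!doctype") or head.startswith("<html"):
        problems.append("body looks like an HTML page, not plain text")
    return problems

def check_llms_txt(domain):
    """Fetch https://<domain>/llms.txt and run the checks against the response."""
    url = f"https://{domain}/llms.txt"
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})  # hypothetical UA
    try:
        with urlopen(request, timeout=10) as response:
            status = response.status
            content_type = response.headers.get("Content-Type")
            body = response.read().decode("utf-8", errors="replace")
            final_url = response.url
    except HTTPError as error:
        return [f"status is {error.code}, expected 200"]
    problems = evaluate_response(status, content_type, body)
    # urlopen follows redirects silently, so compare the final URL to detect them.
    if final_url != url:
        problems.append(f"redirected to {final_url}")
    return problems
```

Calling `check_llms_txt("yourdomain.com")` returns an empty list when all checks pass. The HTML detection is a rough heuristic and would miss a wrapped file that begins with other markup.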

On Shopify, serving a true text/plain file at the root requires an app, a redirect, or a custom-page template hack. If you use a redirect, the root URL itself answers with a 3xx, so confirm that the redirect target returns 200 with the right content type. The eCommerce Insights free llms.txt generator outputs a Shopify-compatible deployment with the right content type. See "Generate llms.txt for my Shopify store" for the generation step.

Step 2: Validate spec compliance

A spec-compliant llms.txt has:

An H1 at the top with the site or brand name.
A short blockquote or paragraph immediately after, describing the site in one or two sentences.
One or more H2 sections grouping links. Conventional sections include "Products," "Collections," "Policies," "FAQ," and "About."
Link lists in markdown bullet syntax: - [Link text](https://...): brief context.
An optional H2 titled "Optional" at the end, for less critical pages an engine may skip.

Common gotchas: HTML inside the file (engines strip it inconsistently); links to staging URLs (they will resolve to 404 in production); duplicate sections; links wrapped in additional markdown decoration. Run the file through the generator's compliance check.
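
As a rough illustration of what a compliance check might look for, here is a minimal sketch of a structural linter. It is not the generator's actual implementation; the function name and the regular expressions are assumptions, and they cover only the requirements and gotchas listed above.

```python
import re

# A link bullet: "- [text](https://...)" with an optional ": context" suffix.
LINK_LINE = re.compile(r"^- \[[^\]]+\]\((https?://[^)\s]+)\)(: .+)?$")
# An HTML tag such as <div class="x"> or </p>; markdown autolinks like <https://...> do not match.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>")

def validate_structure(text):
    """Return a list of structural problems found in an llms.txt body."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line.strip()]
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line is not an H1")
    if len(non_empty) < 2 or non_empty[1].startswith(("#", "- ")):
        problems.append("no description line after the H1")
    headings = [line[3:].strip() for line in lines if line.startswith("## ")]
    if not headings:
        problems.append("no H2 sections")
    for name in sorted({h for h in headings if headings.count(h) > 1}):
        problems.append(f"duplicate section: {name}")
    for number, line in enumerate(lines, start=1):
        # Bullets wrapped in extra decoration (bold, nested brackets) fail this match.
        if line.startswith("- ") and not LINK_LINE.match(line):
            problems.append(f"line {number}: malformed link bullet")
        if HTML_TAG.search(line):
            problems.append(f"line {number}: contains HTML")
    return problems
```

A file with an H1, a blockquote description, one H2, and clean link bullets yields an empty list. Detecting staging URLs is left out because hostnames for staging environments vary by site.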

Step 3: Audit sitemap parity

Compare URLs in llms.txt against sitemap.xml. The two should not be identical (llms.txt is curated, sitemap.xml is exhaustive), but every high-revenue product, every active collection, and every key policy page in the sitemap should appear in llms.txt. Conversely, llms.txt should not contain URLs that are not in the sitemap. Drift is common: a brand publishes llms.txt once and forgets it as the catalog changes. Audit parity at least quarterly.
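
The parity audit reduces to two set differences. The sketch below assumes you already have both files as strings and a hand-maintained `must_include` list of high-revenue URLs; that list and the function names are hypothetical, since the article does not define how "high-revenue" is determined.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml):
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}

def llms_urls(llms_text):
    """Collect every URL that appears as a markdown link target."""
    return set(re.findall(r"\]\((https?://[^)\s]+)\)", llms_text))

def parity_report(llms_text, sitemap_xml, must_include):
    """Report key URLs missing from llms.txt and llms.txt URLs absent from the sitemap."""
    in_llms = llms_urls(llms_text)
    in_sitemap = sitemap_urls(sitemap_xml)
    return {
        "missing_from_llms": sorted(set(must_include) - in_llms),
        "not_in_sitemap": sorted(in_llms - in_sitemap),
    }
```

If the sitemap is an index that points to child sitemaps, each child would need to be fetched and passed through `sitemap_urls` first; this sketch handles a single flat file only.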

Step 4: Check crawler fetches

Open server logs or a Cloudflare access log. Filter for requests to /llms.txt in the last 30 days. The user agents to look for:

OAI-SearchBot: ChatGPT search crawler.
PerplexityBot: Perplexity's crawler.
ClaudeBot: Anthropic's crawler.
Googlebot: Google's crawler. Google-Extended is a robots.txt control token with no user agent string of its own, so it never appears in logs.
GPTBot: OpenAI's training crawler.

If none of these have fetched llms.txt in 30 days, the file is unlikely to be influencing AI citation. Common reasons: the file is not actually being served, robots.txt blocks the crawler, or the domain is too small for the crawlers' fetch cadence. The first two are fixable; the third is solved by being patient and growing.
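
A minimal sketch of the log filter, assuming access logs in the common "combined" format where the user agent is the last quoted field. The function name is hypothetical, and the log lines are assumed to be already limited to the 30-day window. Google-Extended is left out of the token list because it is a robots.txt token rather than a user agent string; Googlebot and bingbot stand in for Google and Microsoft.

```python
import re
from collections import Counter

BOT_TOKENS = ["OAI-SearchBot", "PerplexityBot", "ClaudeBot", "GPTBot", "Googlebot", "bingbot"]

# Request line, status code, then the final quoted field (the user agent).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)

def count_llms_fetches(log_lines):
    """Count requests to /llms.txt per known AI crawler token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue
        # Ignore query strings when comparing the path.
        if match.group("path").split("?")[0] != "/llms.txt":
            continue
        agent = match.group("agent").lower()
        for token in BOT_TOKENS:
            if token.lower() in agent:
                counts[token] += 1
                break
    return counts
```

An empty counter after a 30-day window corresponds to the "no fetches" case described above. User agent strings can be spoofed, so a production check would also verify source IP ranges where the crawler operator publishes them.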

Step 5: Schedule a refresh cadence

llms.txt should refresh whenever the catalog changes meaningfully: new product launches, discontinued SKUs, repositioned collections. Weekly for active D2C catalogs, monthly for stable ones. Manual maintenance is brittle; the eCommerce Insights paid product re-generates and pushes llms.txt weekly.
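
The cadence rule can be encoded as a simple staleness check. This is a sketch only: the `active` and `stable` labels and the 7-day and 30-day thresholds are assumptions that map "weekly" and "monthly" to day counts.

```python
from datetime import date, timedelta

# Assumed mapping of the cadences above to a maximum file age in days.
CADENCE_DAYS = {"active": 7, "stable": 30}

def refresh_due(last_generated, catalog_type, today=None):
    """Return True when llms.txt is older than the cadence for this catalog type."""
    today = today or date.today()
    max_age = timedelta(days=CADENCE_DAYS[catalog_type])
    return today - last_generated > max_age
```

A scheduled job could call `refresh_due` with the file's last generation date and trigger regeneration when it returns True.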

Crawler adoption status

llms.txt adoption is still uneven across engines as of Q1 2026. A practical read of where each major engine stands:

ChatGPT (OpenAI). Confirmed fetching llms.txt via OAI-SearchBot. The influence on citation patterns is presumed positive but has not been quantified.
Perplexity. Documented support; PerplexityBot fetches llms.txt and uses it as a source-discovery hint.
Claude (Anthropic). ClaudeBot fetches it; Anthropic has acknowledged the convention in public materials.
Google (Gemini, AI Overviews). No formal commitment as of Q1 2026. Google-Extended is a robots.txt control token rather than a crawler, so any fetches appear under Googlebot's user agent; Google has not stated whether the file influences AI Overviews.
Copilot (Microsoft). Bingbot fetches it; Microsoft has not formally documented usage.

Treat llms.txt as a low-cost, positive-expected-value signal. Ship the file, monitor fetches, and pair it with the structural improvements (schema, PDP rewrites, review grounding) that drive the bulk of citation lift.

Common mistakes

HTML wrapping. Some apps render llms.txt inside the storefront HTML layout. AI engines may strip or skip the wrapped version.
Stale URLs. Discontinued products in llms.txt produce broken citations. Remove on a refresh cycle.
Over-listing. llms.txt is not a sitemap. Listing 5,000 SKU URLs dilutes the signal.
No description blurbs. The short context after each link helps the engine understand what it is fetching.
Missing policies. Shipping and return policy pages are frequently cited; omit them and you miss citations.
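
Three of these mistakes (over-listing, missing blurbs, missing policies) can be flagged mechanically. The sketch below is illustrative; the `max_links` threshold of 200 and the policy keywords are assumptions chosen for the example, not values from the llms.txt proposal.

```python
import re

# Captures link text, URL, and whatever follows the closing parenthesis.
LINK = re.compile(r"^- \[([^\]]+)\]\((https?://[^)\s]+)\)(.*)$")

def lint_common_mistakes(llms_text, max_links=200, policy_keywords=("shipping", "return")):
    """Return warnings for over-listing, missing blurbs, and missing policy links."""
    matches = (LINK.match(line.strip()) for line in llms_text.splitlines())
    links = [m for m in matches if m]
    warnings = []
    if len(links) > max_links:
        warnings.append(f"over-listing: {len(links)} links (limit {max_links})")
    for m in links:
        # A blurb is any non-empty text after the optional ": " separator.
        if not m.group(3).lstrip(": ").strip():
            warnings.append(f"no description blurb: {m.group(2)}")
    haystack = " ".join(f"{m.group(1)} {m.group(2)}".lower() for m in links)
    for keyword in policy_keywords:
        if keyword not in haystack:
            warnings.append(f"no link mentioning '{keyword}' policy")
    return warnings
```

Stale URLs are not covered here because detecting them requires requesting each link and checking its status.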

An llms.txt that no crawler has fetched in 30 days is a file you wrote for yourself. Useful as a habit. Not yet useful as a signal.

Frequently asked questions

What is llms.txt and which engines actually use it?

llms.txt is a markdown-formatted file at the root of a domain that gives AI engines a curated list of pages. It was proposed by Jeremy Howard in late 2024. As of Q1 2026, ChatGPT, Perplexity, and Claude have published or confirmed they fetch it; Google has not formally committed.

Where does the file live on Shopify?

It must be served at the root: yourdomain.com/llms.txt. Shopify does not expose root file uploads through the admin; the workable patterns are an app that proxies the path, a redirect rule, or a theme template tied to a custom page.

What goes in llms.txt for a Shopify catalog?

An H1 with the brand name, a one-sentence description, then markdown sections grouping product collections, top SKUs, key policy pages, the FAQ, and the contact page. Skip pagination, filters, and admin pages.

How is llms.txt different from sitemap.xml or robots.txt?

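sitemap.xml is exhaustive and machine-parseable; it tells search engines what exists. robots.txt is a directive; it tells crawlers what is allowed. llms.txt is curated and prose-friendly; it tells AI engines what matters and gives them context. The three coexist. A complete AI-readability stance has all three.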

How often does eCommerce Insights refresh llms.txt?

On the paid plan eCommerce Insights re-generates llms.txt weekly and pushes the refreshed version to Shopify automatically. On the free generator the brand exports the file and deploys on their own cadence.

Can I block specific crawlers from llms.txt?

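Crawler blocking is robots.txt's job, not llms.txt's. The two interact: a crawler that honors robots.txt and is blocked there will not fetch llms.txt either. If a brand wants to share data with some engines and not others, the robots.txt user-agent rules are the lever.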
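
To see which crawlers your current robots.txt would let through to /llms.txt, a minimal sketch with the standard library's `urllib.robotparser` follows. The function name and the `example.com` default are placeholders, and the standard library's rule matching may differ in edge cases from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens for the crawlers discussed in this article.
CRAWLER_TOKENS = ["OAI-SearchBot", "PerplexityBot", "ClaudeBot", "GPTBot", "Google-Extended"]

def llms_txt_access(robots_txt, site="https://example.com"):
    """Map each crawler token to whether robots.txt allows it to fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        token: parser.can_fetch(token, f"{site}/llms.txt")
        for token in CRAWLER_TOKENS
    }
```

For example, a robots.txt that disallows everything for `GPTBot` and allows everything for `*` yields False for GPTBot and True for the other tokens.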
