Ask ChatGPT for the best running shoe released this spring and it answers — with this spring's shoes, current prices, sometimes a stock note. The model behind that answer stopped learning in 2024. The gap between those two facts is the most misunderstood thing about AI search, and it changes what an ecommerce team should actually worry about. Brands keep asking "is my product in the training data?" when the question that decides revenue is "can the engine retrieve and read my product page right now?"
What a knowledge cutoff is
A knowledge cutoff is the date a model's training data ends. Anything published after it — your product launch, your price change, your rebrand — does not exist inside the model's weights. Ask a model with no web access about events past its cutoff and it either admits ignorance or guesses, which is one common source of hallucinated product details.
But almost no consumer AI engine runs without web access anymore. Every major AI answer engine pairs its model with live retrieval: when a query needs current information, the engine searches, fetches pages, and composes the answer from what it finds. The cutoff bounds what the model knows by heart; retrieval bounds what it can look up. For shopping queries, the look-up dominates.
Cutoff dates by engine, as of mid-2026
| Engine (model family) | Knowledge cutoff | Searches the live web? |
|---|---|---|
| ChatGPT (GPT-5 family) | Sept 30, 2024 (flagship) | Yes — ChatGPT search; OAI-SearchBot index plus live fetches |
| Perplexity (routes across several frontier models) | Varies by model | Always — retrieval-first by design; every answer is search-backed |
| Google AI Overviews / AI Mode | n/a in practice | Always — Gemini models composed over the live Search index |
| Gemini (2.5 generation) | Jan 2025 | Yes — Google Search grounding for current topics |
| Claude (Sonnet 4.5 / Opus 4.1) | Jan–Mar 2025 (varies by model) | Yes — web search available, on by default in many plans |
| Copilot (OpenAI models) | Inherits OpenAI cutoffs | Yes — grounded in the Bing index |
Cutoffs as published by each vendor for the listed model family, as of mid-2026 — see OpenAI's model documentation, Google's Gemini model docs, and Anthropic's model overview for current values. Vendors ship new models and revise documentation frequently; this table is a snapshot the research team updates periodically, not a permanent reference. Models' own self-reported cutoffs are unreliable — trust the docs, not the chatbot.
Why the retrieval column matters more than the cutoff column
Read the table by columns and the story changes. The cutoff column varies by months; the retrieval column reads yes, always, always, yes, yes, yes. For product queries — the ones with revenue attached — every engine that matters composes from live sources. When a shopper asks "best merino base layer under $100," the engine does not consult its 2024 memories of your catalog; it fans the prompt out into retrieval queries (query fan-out), fetches PDPs, roundups, and review threads, and recommends from what it could fetch and parse.
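To make "fetch and parse" concrete, here is a minimal sketch of the parsing half: pulling price and availability out of a page's Product JSON-LD using only Python's standard library. It is an illustration under assumptions, not how any particular engine works internally. The sample pages, the product name, and the helper names `JsonLdExtractor` and `read_offer` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def read_offer(html):
    """Return (price, currency, availability) from the first Product block, else None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed markup is skipped, never guessed at
        if isinstance(data, dict) and data.get("@type") == "Product":
            offer = data.get("offers")
            if isinstance(offer, dict) and "price" in offer:
                return offer["price"], offer.get("priceCurrency"), offer.get("availability")
    return None


# Hypothetical pages: one with structured data, one that is an empty app shell.
PARSEABLE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Merino Base Layer",
 "offers": {"@type": "Offer", "price": "89.00", "priceCurrency": "USD",
            "availability": "https://schema.org/InStock"}}
</script></head><body></body></html>"""
OPAQUE = "<html><body><div id='app'></div></body></html>"

parsed = read_offer(PARSEABLE)
skipped = read_offer(OPAQUE)
```

The first page yields a price, a currency, and a stock status. The second yields `None`, which is the position a retrieval pipeline is in when a page exposes nothing machine-readable.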
Three consequences for an ecommerce brand. First, recency is not your moat or your excuse: a SKU launched last week can be recommended today if its page is retrievable, and a SKU the model "knows" from training can be absent because retrieval surfaced competitors. Second, the controllable surface is the live one — crawler admittance in robots.txt, complete Product JSON-LD, machine-readable price and availability. Price and stock change too fast for any training corpus; engines must read them from your page, and a page they can't parse gets skipped, not guessed at. Third, the cutoff still matters at the brand-entity level: what the model knows by heart shapes how it frames your brand when it writes around the retrieved facts. That baked-in prior moves slowly and only changes when models retrain — one reason brand framing in answers drifts on the vendors' schedule, not yours.
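The controllable surface described above can be spot-checked in a few lines. The sketch below uses Python's standard `urllib.robotparser` to test which crawlers a robots.txt admits, then builds a Product JSON-LD payload with schema.org's `price`, `priceCurrency`, and `availability` fields. The robots.txt content, the shop URL, the SKU, and the helper names are hypothetical. `OAI-SearchBot` comes from the table above; the other user-agent tokens are assumptions to confirm against each vendor's crawler documentation.

```python
import json
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admits one AI crawler, blocks another,
# and applies a default rule to everyone else.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /cart/
"""


def crawler_access(robots_txt, agents, url):
    """Map each user-agent token to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def product_jsonld(name, sku, price, currency, in_stock, url):
    """Build a schema.org Product with a single Offer carrying price and stock."""
    status = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "url": url,
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{status}",
        },
    }


PDP_URL = "https://shop.example/products/merino-base-layer"  # hypothetical
access = crawler_access(ROBOTS_TXT, ["OAI-SearchBot", "PerplexityBot", "ClaudeBot"], PDP_URL)
payload = product_jsonld("Example Merino Base Layer", "MBL-001", 89, "USD", True, PDP_URL)
script_body = json.dumps(payload, indent=2)  # goes inside <script type="application/ld+json">
```

With this sample file, `access` shows the product page open to `OAI-SearchBot`, closed to `PerplexityBot`, and open to `ClaudeBot` through the wildcard rule. Running the same check against a real robots.txt is a quick way to find an engine that has been locked out by accident.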
When was GPT-4 released? Model release dates, for the record
Cutoff questions arrive bundled with release-date questions (when was GPT-4 released, when did GPT-4 come out, when was GPT-3 released), so here is the companion table. Two clarifications save most of the confusion. First, ChatGPT and GPT are different things that version separately: ChatGPT is the product (launched November 30, 2022), and the GPT models are the families behind it, from GPT-3.5 at launch through the GPT-5 family; GPT-2 and GPT-3 predate the product. So "ChatGPT 4" really means "ChatGPT running GPT-4." Second, a release date is not a cutoff: GPT-5 shipped in August 2025 carrying a September 2024 cutoff, so the model is newer than its knowledge.
| Model | Released | Knowledge cutoff (as documented) |
|---|---|---|
| GPT-2 | Feb 2019 (full model Nov 2019) | ~Dec 2017 (WebText scrape) |
| GPT-3 | June 2020 (via API) | ~Oct 2019 |
| GPT-3.5 / ChatGPT launch | Nov 30, 2022 | ~Sept 2021 |
| GPT-4 | March 14, 2023 | ~Sept 2021 at launch |
| GPT-4o | May 13, 2024 | ~Oct 2023 |
| GPT-5 family | Aug 2025 | Sept 30, 2024 (flagship) |
Release dates per OpenAI's public announcements; cutoffs per OpenAI's model documentation at the time, as of mid-2026. Older models' cutoffs are approximate — OpenAI's early documentation was less precise than today's model cards.
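The "newer than its knowledge" point is simple arithmetic on the table. As a rough sketch, the snippet below computes how many months old each model's knowledge already was on release day, using month-level dates copied from the GPT-3 through GPT-5 rows. The approximate cutoffs are treated as exact months, so the results are only as precise as the table, and the function names are illustrative.

```python
# (release, cutoff) as (year, month), copied from the table above.
MODELS = {
    "GPT-3": ((2020, 6), (2019, 10)),
    "GPT-3.5": ((2022, 11), (2021, 9)),
    "GPT-4": ((2023, 3), (2021, 9)),
    "GPT-4o": ((2024, 5), (2023, 10)),
    "GPT-5": ((2025, 8), (2024, 9)),
}


def months_between(later, earlier):
    """Whole calendar months from `earlier` to `later`, both (year, month)."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])


def knowledge_age_at_release(models):
    """Months between each model's cutoff and its release date."""
    return {
        name: months_between(release, cutoff)
        for name, (release, cutoff) in models.items()
    }


ages = knowledge_age_at_release(MODELS)
# ages == {"GPT-3": 8, "GPT-3.5": 14, "GPT-4": 18, "GPT-4o": 7, "GPT-5": 11}
```

Every model in the list launched with knowledge at least seven months stale, and the gap keeps growing for as long as the model stays in service.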
The trap of cutoff-era thinking
Teams that anchor on cutoffs draw the wrong operational conclusions: "the model was trained before our launch, so we can't show up" (false: retrieval finds you) or "we were in the training data, so we're covered" (also false: retrieval replaces memory for shopping answers). The cutoff-era mental model treats AI visibility as a one-time fact about a frozen model. The retrieval reality makes it a live, per-product, per-engine measurement that moves week to week, which is why it has to be tracked, not assumed. The companion piece "How AI engines pick which products to cite" walks through the full mechanism.
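For a sense of what a per-product, per-engine, week-to-week measurement looks like as data, here is one possible shape. Everything in it is a hypothetical sketch: the `Observation` record, the `citation_rate` helper, the sample prompts, and the week labels are illustrative and not a description of any specific tracking tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    week: str     # ISO week label, e.g. "2026-W23"
    engine: str   # which answer engine was sampled
    sku: str      # the product being tracked
    prompt: str   # the shopper-style prompt that was run
    cited: bool   # did the answer cite or recommend the product?


def citation_rate(observations):
    """Share of sampled prompts citing each SKU, keyed by (week, engine, sku)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for obs in observations:
        key = (obs.week, obs.engine, obs.sku)
        totals[key] += 1
        hits[key] += int(obs.cited)
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical sample: two prompts per week on one engine for one SKU.
SAMPLE = [
    Observation("2026-W23", "ChatGPT", "MBL-001", "best merino base layer under $100", True),
    Observation("2026-W23", "ChatGPT", "MBL-001", "warm base layer for winter running", False),
    Observation("2026-W24", "ChatGPT", "MBL-001", "best merino base layer under $100", True),
    Observation("2026-W24", "ChatGPT", "MBL-001", "warm base layer for winter running", True),
]
rates = citation_rate(SAMPLE)
```

In this sample the rate for the same SKU on the same engine moves from 0.5 to 1.0 in a week, the kind of shift a one-time check against a frozen model would never show.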