Glossary entry

AI visibility metrics

The measurement family for AI answers — what each metric tells you, what it hides, and which grain supports decisions.

Last updated June 2026

The metric family

Five measurements cover most of what teams report, and each answers a different question.

  • Citation rate — of the prompts in a defined set, in what share of answers does the entity appear? The base rate of visibility; per product, it is the foundation of the citation score.
  • Share of model — of the answers in a category, what share features you versus competitors? The relative read; see share of model and its synonym AI share of voice.
  • Answer position — when cited, are you the first recommendation or the seventh? Composed answers concentrate attention at the top even more than results pages did.
  • Sentiment — how does the engine frame you when it mentions you? A citation that says "popular but frequently reported to pill" is not a win; see AI sentiment analysis.
  • Readiness scores — can engines and agents retrieve and parse the page at all: schema completeness, crawler admittance, machine-readable price and availability. The agent-readability score is this measurement per product page.
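
The three countable reads above (citation rate, share of model, answer position) can be sketched from a log of answers. This is a minimal illustration, not the scoring method of any particular tool: the Answer record, the entity names, and the prompts are hypothetical, and share of model is assumed here to mean the entity's citations as a share of all citations across the answer set, which is only one reasonable reading of the definition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Answer:
    """One engine answer to one prompt (hypothetical record shape)."""
    prompt: str
    engine: str
    cited: tuple[str, ...]  # entities in the order the answer recommends them


def citation_rate(answers: list[Answer], entity: str) -> float:
    """Share of answers in which the entity appears at all."""
    if not answers:
        return 0.0
    return sum(entity in a.cited for a in answers) / len(answers)


def share_of_model(answers: list[Answer], entity: str) -> float:
    """Entity's citations as a share of all citations (assumed definition)."""
    total = sum(len(a.cited) for a in answers)
    if total == 0:
        return 0.0
    return sum(a.cited.count(entity) for a in answers) / total


def mean_position(answers: list[Answer], entity: str) -> Optional[float]:
    """Average 1-based rank across answers that cite the entity, else None."""
    ranks = [a.cited.index(entity) + 1 for a in answers if entity in a.cited]
    return sum(ranks) / len(ranks) if ranks else None


# Illustrative data only: four prompts, one engine, three made-up brands.
answers = [
    Answer("best merino sweater", "engine-a", ("Acme", "Bolt", "Cord")),
    Answer("sweater that does not pill", "engine-a", ("Bolt", "Acme")),
    Answer("warm sweater under $100", "engine-a", ("Bolt",)),
    Answer("machine-washable wool sweater", "engine-a", ()),
]

print(citation_rate(answers, "Acme"))   # 0.5 (cited in 2 of 4 answers)
print(share_of_model(answers, "Acme"))  # 2 of 6 total citations
print(mean_position(answers, "Acme"))   # 1.5 (ranks 1 and 2)
```

Sentiment and readiness are left out on purpose: neither reduces to counting, so each needs its own instrumentation.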

Leading vs lagging, and why both are needed

Citation rate, share of model, position, and sentiment are lagging metrics: they report outcomes the engines have already produced, and they move on the engines' schedule, not yours. Readiness scores are leading: they measure the controllable inputs that make future citations possible. A dashboard with only lagging metrics tells you that you lost without saying what to fix; a dashboard with only leading metrics is homework nobody has marked. The pairing is the point: fix what readiness flags, then watch over several weeks whether citations follow, using a held-constant prompt set so the comparison is honest. The instrumentation underneath is prompt tracking. Because some week-to-week drift is normal, trends matter and single readings do not.
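
One way to make "watch whether citations follow" concrete is to compare the average citation rate before and after a readiness fix, and to call it a change only when the shift exceeds normal drift. The sketch below assumes weekly rates measured on the same prompt set; the function name, the two-week minimum, and the 3-point drift tolerance are illustrative assumptions, not published thresholds.

```python
from statistics import mean


def classify_change(weekly_rates, fix_week, drift_tolerance=0.03):
    """Compare mean citation rate before and after a fix.

    weekly_rates: citation rates (0..1), one per week, same prompt set.
    fix_week: index of the first week measured after the fix shipped.
    drift_tolerance: assumed size of normal week-to-week noise.
    """
    before, after = weekly_rates[:fix_week], weekly_rates[fix_week:]
    if len(before) < 2 or len(after) < 2:
        return "insufficient history"  # a single reading is not a trend
    delta = mean(after) - mean(before)
    if delta > drift_tolerance:
        return "rose"
    if delta < -drift_tolerance:
        return "fell"
    return "within drift"


# Illustrative series: four weeks before a fix, four weeks after.
improved = [0.20, 0.22, 0.19, 0.21, 0.27, 0.30, 0.31, 0.32]
noisy = [0.40, 0.42, 0.39, 0.41, 0.40, 0.38, 0.42, 0.40]

print(classify_change(improved, fix_week=4))  # rose
print(classify_change(noisy, fix_week=4))     # within drift
```

Averaging over windows rather than comparing two single weeks is the design choice that matters: it keeps one unusual week from being read as an outcome.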

The grain problem: brand-level numbers hide the revenue

Every metric in the family can be computed at brand level or at product level, and the choice decides whether the number supports action. A brand-level citation rate of 40% sounds healthy and can coexist with the three products that drive half of revenue never appearing in an answer — a brand mention doesn't tell you which SKU won. Product-level metrics resolve the question a merchandiser can act on: which products, which engines, which intents, which PDP fields. The argument in full: measuring AI visibility — what actually matters.
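
The gap between the two grains is easy to reproduce with toy data. In this sketch every SKU identifier, revenue share, and answer record is made up for illustration, and the 10% revenue cutoff is an arbitrary assumption; the point is only that the same answer log yields a healthy brand-level rate and a list of revenue-driving products with zero citations.

```python
def brand_rate(answers):
    """Share of answers that mention the brand at all."""
    return sum(a["brand_mentioned"] for a in answers) / len(answers)


def sku_rates(answers, catalog):
    """Citation rate per SKU over the same answer set."""
    return {
        sku: sum(sku in a["skus"] for a in answers) / len(answers)
        for sku in catalog
    }


def uncited_revenue_drivers(rates, revenue_share, min_share=0.10):
    """SKUs with zero citations and at least min_share of revenue."""
    return sorted(
        sku for sku, rate in rates.items()
        if rate == 0 and revenue_share[sku] >= min_share
    )


# Hypothetical catalog: share of revenue per SKU.
revenue_share = {
    "SKU-001": 0.20, "SKU-002": 0.18, "SKU-003": 0.12,
    "SKU-104": 0.05, "SKU-220": 0.04,
}

# Ten answers: the brand appears in four, but only minor SKUs are named.
answers = (
    [{"brand_mentioned": True, "skus": {"SKU-104"}}] * 2
    + [{"brand_mentioned": True, "skus": {"SKU-220"}}]
    + [{"brand_mentioned": True, "skus": set()}]
    + [{"brand_mentioned": False, "skus": set()}] * 6
)

rates = sku_rates(answers, revenue_share)
print(brand_rate(answers))                            # 0.4
print(uncited_revenue_drivers(rates, revenue_share))  # ['SKU-001', 'SKU-002', 'SKU-003']
```

The three uncited SKUs account for half of revenue in this toy catalog, which is exactly the case a brand-level 40% conceals.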

How eCommerce Insights instruments the family

eCommerce Insights measures per SKU and per engine, on held-constant buyer-prompt sets across ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude, and Copilot. Each product carries two composite scores: the citation score (lagging: is it recommended?) and the agent-readability score (leading: can an agent parse it?). Underneath sit the family's raw reads (citations, position, sentiment, share against named competitors), with history so that multi-week trends can be separated from drift. Methodology for the composites is documented in the PDP score reference. Engines change behavior on their own schedules; vendor documentation such as OpenAI's bot documentation defines what is controllable, and the metrics exist to measure exactly that controllable surface.

Frequently asked questions

Which AI visibility metric should I track first?
Citation rate at the product level — of a fixed set of buying-intent prompts, which of your products appear in the answers, per engine. It is the base rate everything else qualifies, and it immediately surfaces the expensive case: revenue-driving products with zero citations. Add share of model once you need the competitive read.
What is a good citation rate?
There is no published benchmark as of mid-2026, and rates vary widely by category, engine, and prompt set — so cross-brand comparisons mislead. The usable standard is your own trend on a held-constant prompt set: flat or rising on the products that matter, with no high-revenue product stuck at zero. Treat any vendor quoting a universal "good" number with suspicion.
Why do my AI visibility metrics fluctuate week to week?
Because the engines are the moving part: models retrain, retrieval pipelines re-weight, and answer composition is probabilistic. A few points of weekly drift is normal behavior, not a signal. The pattern worth acting on is a multi-week decline on a specific product and engine — which is why metrics need history and drift context, not single readings. See LLM visibility for the framing.
Are brand-level AI visibility metrics useless?
Not useless — they are a fine comms and awareness read, and brand trackers compute them well. They just can't direct catalog work: a healthy brand citation rate says nothing about which products are absent from buying-intent answers or which PDP fields are responsible. If a metric can't resolve to a product, it can't produce a fix list.
How many prompts does a reliable metric need?
Enough that one answer changing doesn't swing the number — in practice, dozens of prompts per category rather than a handful, sampled repeatedly, because engines vary their answers across runs. Held-constant sets are the other half: change the prompts and you've changed the metric. Prompt tracking covers the instrumentation discipline.
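
The "one answer shouldn't swing the number" test from the last question can be checked with simple arithmetic. The sketch below is a rough guide under a loud assumption: it treats every sampled answer as an independent draw, which repeated runs of the same prompt only approximate. The function names are illustrative.

```python
import math


def one_answer_swing(n_prompts, runs=1):
    """How far the rate moves when a single sampled answer flips."""
    return 1 / (n_prompts * runs)


def standard_error(rate, n_prompts, runs=1):
    """Binomial standard error of a rate, assuming independent answers."""
    n = n_prompts * runs
    return math.sqrt(rate * (1 - rate) / n)


# A handful of prompts, sampled once: one answer moves the rate 20 points.
print(one_answer_swing(5))       # 0.2

# Dozens of prompts, sampled three times: one answer moves it under 1 point.
print(one_answer_swing(40, 3))

# Sampling noise around a 30% rate at each size.
print(standard_error(0.30, 5))
print(standard_error(0.30, 40, 3))
```

If the swing from one answer is larger than the weekly drift you expect, the prompt set is too small to separate the two.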
