How to benchmark your catalog against competitors.

Your audit says 62. Is that good? Without the category norm per engine, a score is a number in a vacuum — it can't tell leadership whether to celebrate, invest, or panic. Scores without context don't drive decisions; benchmarks are the context.

Quick answer

eCommerce Insights shows the category benchmark (the median for stores in your vertical, per engine and per factor) beside every score, plus a competitor watchlist for direct head-to-head comparison. This job is part of SKU-level tracking; the aggregate share number is covered in "Compare my brand's AI share of voice."

The slow way: benchmark by anecdote

The manual benchmark is built from fragments. You grade three competitor PDPs by hand through a free tool and infer the category from a sample of three. You read a vendor's "state of AI search" report whose category definitions don't match yours. You ask a peer at another brand what their audit scored, over drinks, and remember the number selectively. From these fragments, a narrative forms — "we're probably about average" — that no one can defend when the CFO pushes on it.

The deeper problem is that ad-hoc benchmarks blend engines and factors into one impression. Your real position is almost never uniform: above norm on schema, below on review signal, fine on ChatGPT, weak on Perplexity. The blended impression hides exactly the contrast that would tell you what to do next — which is the entire purpose of benchmarking.


The eCommerce Insights way

  1. Score your own catalog first. Run the full scan: every SKU, both scores (citation score and agent-readability). A benchmark against an incomplete baseline flatters or frightens at random. Start with the catalog audit.
  2. Read the benchmark per engine. Beside every score sits the category median from eCommerce Insights audit data, per engine and per factor — labeled illustrative where the vertical's sample is small. Above norm on ChatGPT and below on Perplexity is a finding, not a wash.
  3. Build the watchlist for head-to-head. Medians answer "are we behind the market"; the competitor watchlist answers "are we behind the rival who takes our slots." Both questions matter; they have different answers surprisingly often.
  4. Locate the gap precisely. The factor-level comparison turns "we're behind" into "we're behind on review signal in cited SKUs, at parity on everything else" — one workstream, one owner, one quarter.
  5. Re-benchmark quarterly. The norm drifts upward as the category optimizes; what beat the median last year is the median now. Quarterly re-reads keep the leadership narrative calibrated while the weekly work runs against your own trend.
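
The per-engine, per-factor reading in steps 2 to 4 can be sketched in a few lines. This is a minimal illustration, not the eCommerce Insights implementation: the dictionary layout, the engine and factor names, and every number below are hypothetical stand-ins for whatever your audit export contains.

```python
from typing import Dict, List, Tuple

# Hypothetical data: (engine, factor) -> score. None of these numbers come
# from a real audit; they only show the shape of the comparison.
Scores = Dict[Tuple[str, str], float]

own_scores: Scores = {
    ("ChatGPT", "schema"): 71.0,
    ("ChatGPT", "review_signal"): 60.0,
    ("Perplexity", "schema"): 68.0,
    ("Perplexity", "review_signal"): 44.0,
}

category_median: Scores = {
    ("ChatGPT", "schema"): 64.0,
    ("ChatGPT", "review_signal"): 58.0,
    ("Perplexity", "schema"): 63.0,
    ("Perplexity", "review_signal"): 57.0,
}


def gaps_vs_median(own: Scores, median: Scores) -> List[Tuple[str, str, float]]:
    """Return (engine, factor, gap) rows, largest shortfall first.

    gap = own score - category median, so a negative gap means below norm.
    Cells missing from either input are skipped rather than guessed.
    """
    rows = [
        (engine, factor, own[(engine, factor)] - median[(engine, factor)])
        for (engine, factor) in own
        if (engine, factor) in median
    ]
    return sorted(rows, key=lambda row: row[2])


gaps = gaps_vs_median(own_scores, category_median)
for engine, factor, gap in gaps:
    status = "below norm" if gap < 0 else "at or above norm"
    print(f"{engine:<11} {factor:<14} {gap:+6.1f}  {status}")
```

With this made-up data, only one cell is below the norm (review signal on Perplexity), which is the "one workstream, one owner" shape step 4 describes. Averaging the four gaps would give a small positive number and hide it.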

What "good" looks like

Revenue-weighted SKUs above category median: yes
Engines where you trail the norm, with a named cause: diagnosed
Direct rivals under watchlist comparison: 3–5
Benchmark refresh cadence: quarterly
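
One way to check the first target, revenue-weighted SKUs above the category median, is to compare the share of SKUs above the median with the share of revenue above it. The sketch below assumes a single engine's score per SKU and a known median; the `Sku` record, the function names, and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sku:
    sku_id: str
    revenue: float  # trailing revenue, any consistent currency
    score: float    # audit score on one engine


def revenue_share_above(skus: List[Sku], median: float) -> float:
    """Fraction of total revenue carried by SKUs scoring above the median."""
    total = sum(s.revenue for s in skus)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum(s.revenue for s in skus if s.score > median) / total


def count_share_above(skus: List[Sku], median: float) -> float:
    """Unweighted fraction of SKUs scoring above the median."""
    if not skus:
        raise ValueError("need at least one SKU")
    return sum(1 for s in skus if s.score > median) / len(skus)


# Hypothetical catalog: two hero SKUs below the median, long tail above it.
catalog = [
    Sku("hero-a", 50_000, 55),
    Sku("hero-b", 30_000, 56),
    Sku("tail-c", 5_000, 70),
    Sku("tail-d", 5_000, 72),
    Sku("tail-e", 10_000, 66),
]
CATEGORY_MEDIAN = 58.0  # assumed value for illustration

by_count = count_share_above(catalog, CATEGORY_MEDIAN)
by_revenue = revenue_share_above(catalog, CATEGORY_MEDIAN)
print(f"SKUs above median:    {by_count:.0%}")
print(f"Revenue above median: {by_revenue:.0%}")
```

In this invented catalog, three of five SKUs beat the median but they carry only a fifth of revenue, so the unweighted count flatters the position. The revenue-weighted figure is the one the target refers to.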

The output that proves the job worked is a sentence leadership can repeat: "We're above category norm on four of six engines; the Perplexity gap is third-party grounding and it's this quarter's content-partnerships project." Context, cause, plan — one line.

Frequently asked questions

Where do the category benchmarks come from?
From eCommerce Insights audit data: the median scores across scanned stores in your vertical, shown per engine and per factor. In categories where the sample is still small, the benchmark is labeled illustrative — a stated limitation rather than a hidden one.
What's a good score relative to the benchmark?
Above the category median on your revenue-weighted SKUs is the bar that matters; being above median on the long tail while below on heroes is a losing shape. Most mid-market catalogs start slightly below median on citation score and well below on agent-readability — the second gap usually closes faster.
Benchmarks or watchlist — which do I need?
Both, for different questions. The benchmark answers "are we behind the market" and calibrates leadership expectations. The watchlist answers "are we behind the three rivals who actually take our slots" and drives the fix queue. A brand can beat the category norm and still lose every head-to-head that matters.
Why benchmark per engine instead of overall?
Because engines disagree, and the disagreement is diagnostic. Above norm on ChatGPT but below on Perplexity usually means weak third-party grounding (Perplexity weights it more heavily). Below norm on Google AI Overviews alone often traces to feed or schema specifics. One blended number hides exactly the signal that names your next move.
How often do benchmarks change?
The category norm drifts upward as more stores optimize — what was a 70th-percentile schema completeness in 2025 is closer to median as of mid-2026. Quarterly re-benchmarking is enough to track the drift; the weekly work should run against your own trend and your named competitors.
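
The drift described in the last answer can be illustrated by ranking one unchanged score against two snapshots of the category. This is a sketch under stated assumptions: the `percentile_rank` helper and both score lists are hypothetical, chosen only to show how a fixed score slides toward the median as the category improves.

```python
from typing import Sequence


def percentile_rank(score: float, population: Sequence[float]) -> float:
    """Percent of population scores strictly below `score` (0 to 100)."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for s in population if s < score)
    return 100.0 * below / len(population)


# Hypothetical category snapshots for one factor, one engine.
last_year = [40, 45, 50, 55, 58, 60, 62, 70, 75, 80]
this_year = [50, 55, 58, 60, 62, 64, 68, 72, 78, 85]

own_score = 63  # unchanged between the two snapshots

rank_then = percentile_rank(own_score, last_year)
rank_now = percentile_rank(own_score, this_year)
print(f"Percentile last year: {rank_then:.0f}")
print(f"Percentile this year: {rank_now:.0f}")
```

Here the same score sits at the 70th percentile in the first snapshot and at the 50th in the second, without the store changing anything. That is why the quarterly re-read is measured against a refreshed benchmark rather than last year's.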

A 62 means nothing. A 62 against a 58 median is a plan.

Category benchmarks per engine, beside every score. 14-day trial.