Measurement has always been half the SEO discipline. AI search is no different, except the layer is newer and vendors have not converged on definitions. This post is opinionated about which metrics are load-bearing, which mislead, and what an ecommerce team should put on the wall. The short version: measure per SKU, per engine, per query intent — then roll up. Everything else is a derivative.
The metric primitive
Start with the atomic question: for this product, on this engine, when buyers ask with this intent, how often is the product named, cited, and characterized accurately? That is the primitive. Every useful rollup (category visibility, catalog score, trend lines) derives from it. Tools that do not store data at this granularity can only produce aggregates, and aggregates are the metrics most likely to mislead. The AI visibility score definition shows how the primitive composes into one number per SKU; in eCommerce Insights it splits further into a citation score (is the engine recommending the product?) and an agent-readability score (can a shopping agent parse the product detail page?), because the two fail independently and get fixed by different teams.
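One way to picture the primitive is as a record per SKU, engine, intent, and prompt run, with rollups computed from those records rather than stored in their place. The sketch below is illustrative only: the `Observation` fields, the `primitive_rates` and `rollup_by_sku` helpers, and the run-weighted average are assumptions made for this example, not the AI visibility score definition or how eCommerce Insights computes its scores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One prompt run for one SKU on one engine (hypothetical schema)."""
    sku: str
    engine: str
    intent: str
    named: bool      # the answer mentioned the product
    cited: bool      # the answer linked to or recommended the product
    accurate: bool   # the answer described the product correctly


def primitive_rates(observations):
    """Group runs by (sku, engine, intent) and compute the three rates."""
    buckets = defaultdict(list)
    for obs in observations:
        buckets[(obs.sku, obs.engine, obs.intent)].append(obs)

    rates = {}
    for key, group in buckets.items():
        n = len(group)
        rates[key] = {
            "runs": n,
            "named": sum(o.named for o in group) / n,
            "cited": sum(o.cited for o in group) / n,
            "accurate": sum(o.accurate for o in group) / n,
        }
    return rates


def rollup_by_sku(rates, field="cited"):
    """Roll the primitive up to one number per SKU, weighted by run count.

    The weighting is an assumption; a real score may weight engines or
    intents differently.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for (sku, _engine, _intent), cell in rates.items():
        totals[sku][0] += cell[field] * cell["runs"]
        totals[sku][1] += cell["runs"]
    return {sku: hits / runs for sku, (hits, runs) in totals.items()}


# Made-up sample data: the SKU does well on one engine and is absent on the other.
observations = [
    Observation("SKU-1", "engine-a", "comparison", True, True, True),
    Observation("SKU-1", "engine-a", "comparison", True, False, True),
    Observation("SKU-1", "engine-b", "comparison", False, False, False),
    Observation("SKU-1", "engine-b", "comparison", False, False, False),
]

rates = primitive_rates(observations)
sku_scores = rollup_by_sku(rates, field="cited")
```

With this sample, the per-SKU rollup lands on a single middling number, while the per-engine cells show that the product is cited half the time on one engine and never on the other. That per-cell view is only available because the rollup is derived from the primitive instead of replacing it.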
Why share of voice misleads
Share of voice (AI) sums a brand's citations across a prompt set and expresses the total relative to competitors. That is useful for comms. For ecommerce it can move in ways that do not predict revenue: a long-tail SKU gaining mentions offsets a franchise SKU losing them, and the rollup hides the swap. This is the same blind spot covered in "Brand-level tracking is missing your revenue": any metric aggregated above the SKU loses the information an ecommerce team acts on.
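The swap is easy to reproduce with toy numbers. In the sketch below, the SKU names, citation counts, and the `share_of_voice` helper are all invented for illustration, and the formula (brand citations divided by brand plus competitor citations) is one common way to express share of voice, assumed here rather than taken from any vendor's definition.

```python
def share_of_voice(brand_citations_by_sku, competitor_citations):
    """Brand citations as a fraction of all citations in the prompt set."""
    brand_total = sum(brand_citations_by_sku.values())
    return brand_total / (brand_total + competitor_citations)


# Hypothetical citation counts for two measurement periods.
before = {"franchise-sku": 40, "long-tail-sku": 5}
after = {"franchise-sku": 20, "long-tail-sku": 25}
competitors = 55  # held constant to isolate the swap

sov_before = share_of_voice(before, competitors)
sov_after = share_of_voice(after, competitors)

# The per-SKU view shows what the brand-level number hides.
sku_deltas = {sku: after[sku] - before[sku] for sku in before}

print(f"share of voice: {sov_before:.2f} -> {sov_after:.2f}")
print(f"per-SKU change: {sku_deltas}")
```

Share of voice is identical in both periods, yet the franchise SKU lost half its citations. A dashboard showing only the brand-level number would report a flat line.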