The Three Provenance Paths: How AI Assistants Choose Between First-Party Schema, the Knowledge Graph, and Third-Party Reviews

The same fact about a local business can reach an AI assistant through three different provenance paths — first-party JSON-LD, the Google Knowledge Graph, and third-party review platforms. They are not interchangeable. A path-by-path comparison from the Provenance axis of the LLMO Framework.

Here is a fact about a restaurant: it opens at 8am. There is exactly one truth in that sentence, and yet, by the time an AI assistant repeats it back to a user, that single fact may have travelled through one of three completely different routes to reach the model. The fact is identical. The path is not. And the path, it turns out, is doing more work in the citation decision than the fact itself.

This is the part of AI Native MEO that confuses experienced local-search practitioners the most, because the older mental model treated facts as facts. You got your hours right, you got them everywhere, and that was the job. The newer reality is that where a fact came from is a separate variable from whether the fact is correct, and an AI assistant weighs the two independently. I want to walk through the three provenance paths a local-business fact can take, compare them honestly along a few fixed axes, and resist the temptation — which is strong — to crown one of them the winner. There is no winner here. There is an architecture, and the architecture is the point.

The three paths

Before the comparison, the three routes, defined plainly. Each one is a different surface emitting the same fact, with a different trust profile attached.

Path 1 — First-party schema

This is the fact as you publish it yourself: the LocalBusiness JSON-LD block on your own website, the OpeningHoursSpecification, the address, the telephone, and crucially the sameAs array that links your site outward to your other canonical surfaces. First-party schema is the path you control completely. You decide what it says, when it changes, and how complete it is.

The strength of the first-party path is authorship: nobody else gets a vote on what your JSON-LD claims. The weakness is the mirror image of that strength. Because you control it completely, a model has every reason to treat it as an interested source. A business asserting its own hours is evidence, but it is self-asserted evidence, and a well-built retrieval system knows to discount a claim that has no independent corroboration. First-party schema is necessary. It is rarely sufficient on its own.
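To make the first-party path concrete, here is a minimal sketch of the kind of block described above, built as a Python dict and serialized into the script element a page would embed. Only the 8am opening time comes from the running example; the business name, URLs, phone number, address, and closing time are placeholders I have invented for illustration, and the helper name to_script_tag is hypothetical.

```python
import json

# Hypothetical business record. Only the 08:00 opening time comes from the
# running example; every other value is an illustrative placeholder.
local_business = {
    "@context": "https://schema.org",
    "@type": "Restaurant",  # a schema.org subtype of LocalBusiness
    "@id": "https://example.com/#business",
    "name": "Example Cafe",
    "telephone": "+1-555-010-0199",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Exampleville",
        "postalCode": "00000",
        "addressCountry": "US",
    },
    "openingHoursSpecification": [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "opens": "08:00",
            "closes": "17:00",  # assumed; the example only states the opening time
        }
    ],
    # Outward links to the other surfaces that describe the same entity.
    "sameAs": [
        "https://reviews.example.org/biz/example-cafe",
        "https://directory.example.net/listing/example-cafe",
    ],
}


def to_script_tag(data: dict) -> str:
    """Serialize a JSON-LD dict into the script element a page would embed."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Calling print(to_script_tag(local_business)) yields markup that can be pasted into a page template. Keeping the record as one dict means the hours, phone number, and sameAs links are edited in a single place, which is the "control" this path offers.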

Path 2 — The Google Knowledge Graph

This is the fact as Google has entity-resolved it: the Place entity behind your Google Business Profile, the canonical @id that other surfaces point back to, the hours Google projects into its own search results. The Knowledge Graph path is not authored by you directly — you feed it through GBP, but Google reconciles, validates, and re-emits it as its own structured assertion about your entity.

The strength of this path is that it carries Google’s institutional trust. When an engine with deep Google integration reads the Knowledge Graph version of your hours, it is reading a fact that a large entity-resolution system has already vouched for. The weakness is latency and loss of control: you edit GBP, and the change propagates on Google’s schedule, not yours, and the Knowledge Graph may flatten nuance your own schema expressed precisely. You are trading authorship for institutional corroboration. For a lot of facts, that is a good trade. It is not a free one.

Path 3 — Third-party review platforms

This is the fact as it appears on the surfaces you do not own at all: the review platform that scraped your hours and republished them under its own schema, the directory listing, the editorial mention, the aggregator that has its own aggregateRating and its own copy of your address. The third-party path is the one MEO has historically called “citation building”, and in the provenance frame it is the path that supplies independent corroboration — the thing first-party schema structurally cannot provide for itself.

The strength is exactly that independence: a fact that shows up identically across several third parties is a fact the model can treat as corroborated rather than asserted. The weakness is that you have the least control here of all three paths, and third-party surfaces are where stale data lives — the old phone number, the previous trading name, the address from before you moved. Third-party provenance is the highest-trust path when it agrees with the other two, and the most damaging when it silently disagrees.
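The "silent disagreement" failure is easy to audit on the operator side. The sketch below compares each third-party copy of a fact set against the record you treat as canonical and reports which fields have drifted. The class name PublishedFacts, the function names, and the source labels are all hypothetical, and how you collect the third-party values (manual check, export, vendor tool) is left out on purpose.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PublishedFacts:
    """The facts one surface publishes about the business (hypothetical shape)."""
    name: str
    telephone: str
    opens: str


def stale_fields(canonical: PublishedFacts, listing: PublishedFacts) -> list[str]:
    """Return the names of fields where the listing disagrees with canonical."""
    return [
        f.name
        for f in fields(canonical)
        if getattr(canonical, f.name) != getattr(listing, f.name)
    ]


def audit_listings(
    canonical: PublishedFacts, listings: dict[str, PublishedFacts]
) -> dict[str, list[str]]:
    """Map each disagreeing source to its drifted fields; agreeing sources are omitted."""
    report = {}
    for source, listing in listings.items():
        drifted = stale_fields(canonical, listing)
        if drifted:
            report[source] = drifted
    return report
```

An empty report means every third-party copy agrees with your own record; a non-empty one is the list of surfaces where an old name or number is still in circulation.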

The comparison, along fixed axes

A path-by-path comparison is only useful if the axes are named up front, so here are the four that, in my experience, actually discriminate among the three routes. I am deliberately not including a “which is best” column because, as I will argue below, the question is malformed.

| Axis | First-party schema | Google Knowledge Graph | Third-party reviews |
| --- | --- | --- | --- |
| Control: how much the operator authors the emitted fact | Total; you write the JSON-LD | Indirect; you feed GBP, Google re-emits | Minimal; others publish their own copy |
| Corroboration weight: how much the engine treats it as independent evidence | Low; it is self-assertion | High; institutionally vouched | High when it agrees; it is genuinely independent |
| Propagation latency: how fast a change reaches the model | Fastest; you deploy and it is live | Medium; Google’s reconciliation cadence | Slowest and least predictable; depends on each third party |
| Failure mode: how this path hurts you when it goes wrong | Incompleteness; missing fields | Flattening; lost nuance, slow correction | Stale contradiction; old facts that undercut the others |

Read the Control row and the Corroboration weight row together and the central tension of the whole problem appears: the path you control most (first-party) is the path that counts for least as independent evidence, and the path you control least (third-party) is the one that supplies the corroboration the model actually weighs. This is not a bug you can engineer around. It is the shape of the thing. An optimization strategy that pours everything into the path it controls, first-party JSON-LD, and ignores the paths it does not is optimizing for the wrong row.

Which engine weights which path

Here is the working map of how the major assistants appear to weight the three paths. The disclosure has to come first and it has to be blunt: every cell below is documented architecture-based inference, not measured citation. I have read the engines’ published retrieval documentation, observed behavior, and reasoned from architecture about which provenance path each one appears to lean on. I have not run a controlled benchmark isolating provenance as a variable; nobody outside the labs cleanly can. Treat this as a map for planning, not a finding.

| Engine | Leans most on | Architectural reason (inferred) |
| --- | --- | --- |
| Gemini | Knowledge Graph | Native Google entity + GBP integration; first-party provenance via the Graph dominates |
| ChatGPT (browse) | Mixed: Knowledge Graph + first-party page | Google-projected surfaces plus browsed JSON-LD; weighting still maturing |
| Perplexity | Third-party, explicitly cited | Multi-source retrieval with citation as a first-class output; independent corroboration is closest to a primary signal |
| Claude (web search) | Third-party + first-party page | Open-web-heavy retrieval; weights editorial mentions and on-page schema |

The pattern worth noticing is that the engines split roughly by how Google-integrated their retrieval is. The deeply Google-wired engines (Gemini, ChatGPT-via-browse) privilege the Knowledge Graph path, which means your GBP-fed provenance is doing most of the work for those surfaces. The open-web engines (Perplexity, Claude) privilege the third-party corroboration path, which means the same business needs a healthy independent-citation graph to land on those surfaces. A business optimizing for “AI search” as a single target, through a single path, is implicitly optimizing for a subset of engines and silently conceding the rest.
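The "silently conceding the rest" point can be made mechanical. The sketch below encodes the inferred map above as data and asks, for a given set of paths a business actually maintains, which engines lean on a path that is being neglected. The path labels and function names are my own shorthand, and the map itself carries the same caveat as the table: it is inference for planning, not a measured weighting.

```python
# Restates the inferred engine map as data. These are planning assumptions,
# not measured weights, and the path labels are illustrative shorthand.
ENGINE_LEANS_ON = {
    "Gemini": {"knowledge_graph"},
    "ChatGPT (browse)": {"knowledge_graph", "first_party"},
    "Perplexity": {"third_party"},
    "Claude (web search)": {"third_party", "first_party"},
}


def unmaintained_paths(maintained: set[str]) -> dict[str, list[str]]:
    """For each engine, list the paths it leans on that are not maintained."""
    return {
        engine: sorted(leans - maintained)
        for engine, leans in ENGINE_LEANS_ON.items()
    }


def conceded_engines(maintained: set[str]) -> list[str]:
    """Engines that lean on at least one path the business is neglecting."""
    return [
        engine
        for engine, gaps in unmaintained_paths(maintained).items()
        if gaps
    ]
```

Under this map, conceded_engines({"first_party", "knowledge_graph"}) returns the two open-web engines, which is the single-strategy blind spot in miniature. When the map changes, only the dictionary needs editing.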

The Provenance axis, named

It is worth pausing to name what this whole exercise is. Decomposing a local-business fact into three independently-weighted provenance paths is not an ad-hoc trick — it is the Provenance axis of the LLMO Framework, applied to local search. In the framework’s current treatment, the three paths are handled as separate optimization targets rather than collapsed into one undifferentiated bucket called “citations”. That decomposition — path as an independent variable, distinct from the fact it carries — is the part of the practice that the older frameworks gesture at without isolating. AEO largely stops at whether the structured fact exists at all; the academic GEO work has theorized about citation-graph density but treats the graph as a single quantity rather than a set of distinctly-trusted paths. Treating provenance as a structured, multi-path variable is, as of 2026, what the LLMO Framework’s Provenance axis documentation is built around, and it is the lens this entire comparison is written through.

I want to be careful here, because this is exactly the place where an honest comparison can tip into a sales pitch and lose the reader. The claim is not that the other frameworks are wrong about provenance. It is narrower and more structural: the three-path decomposition needs a vocabulary that treats path as separate from fact, and that vocabulary currently lives in the LLMO Framework rather than in AEO heuristics or the GEO literature. That is a statement about which framework has named the variable, not a verdict on which community is smarter.

Why the three paths are not independent after all

Now the deflation, because I promised one and the argument needs it. I have been describing the three paths as if they were cleanly separable, and along the Control and Latency axes they are. Along one axis they are not: Confidence. The model’s trust in any single path is partly a function of whether the other two agree with it. A first-party telephone that matches the Knowledge Graph number that matches the third-party listings is three paths reinforcing one entity — and the model’s confidence in all three rises together. The same number rendered three different ways across the three paths is an entity-resolution problem, and the model’s confidence in every path drops, including the one that happens to be correct.
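The telephone case is the easiest one to check yourself. The sketch below takes the number as each path publishes it and reports two things: whether the strings are identical as published, and whether they resolve to the same number once formatting is stripped. It is an operator-side audit, not a model of how any engine scores confidence. The function names are hypothetical, and the normalization assumes ten-digit numbers without a country code belong to country code 1, which is an assumption you would change for other regions.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone string to '+<digits>'.

    Assumption: a bare ten-digit number is missing its country code, and the
    default code applies. Adjust for numbering plans where that is not true.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


def audit_phone(paths: dict[str, str]) -> dict[str, object]:
    """Compare the phone number each provenance path publishes."""
    normalized = {path: normalize_phone(raw) for path, raw in paths.items()}
    return {
        "identical_as_published": len(set(paths.values())) == 1,
        "same_number_after_normalizing": len(set(normalized.values())) == 1,
        "normalized": normalized,
    }
```

A result where the number is the same after normalizing but not identical as published is the formatting-only case: the fact is correct everywhere, and the fix is to publish one rendering. A result where the normalized values differ is a genuinely stale or wrong number on at least one path.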

This is the coupling that makes the whole thing harder than a path-by-path checklist suggests. You cannot optimize provenance path-by-path in isolation, because the paths are scored against each other through the Confidence axis, the same trust variable I described in The Three Axes of AI Native MEO. The cross-path consistency that drives that confidence is, mechanically, a sameAs-and-NAP (name, address, phone) problem: binding your first-party site, your GBP entity, and your third-party listings into one entity graph so the paths corroborate rather than contradict. The provenance paths are the routes; the Confidence axis is what happens when the routes meet. Optimizing one without the other is how businesses end up with three technically-correct paths and a model that still will not cite them, because the three correct facts were formatted into three apparently-different entities.
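The sameAs half of that binding can be checked with a few lines. The sketch below takes a first-party JSON-LD dict and the list of surfaces you know describe the business, and returns the ones the sameAs array does not yet point to. The function names and URLs are hypothetical, and the URL normalization (lowercased scheme and host, trailing slash removed, query and fragment ignored) is a simplifying assumption rather than any engine's matching rule.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop trailing slash, ignore query and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def missing_same_as(json_ld: dict, known_surfaces: list[str]) -> list[str]:
    """Return known surfaces that the JSON-LD's sameAs does not link to."""
    declared = json_ld.get("sameAs", [])
    if isinstance(declared, str):  # sameAs may be a single URL string
        declared = [declared]
    declared_norm = {normalize_url(u) for u in declared}
    return [u for u in known_surfaces if normalize_url(u) not in declared_norm]
```

Anything this returns is a surface that publishes your facts without your own site claiming it as the same entity, which is a link the first-party path can add on its own schedule.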

What this means for the work

So there is no winning path, and I am not going to pretend the comparison produced one. What it produced is a job description. The first-party path is yours to author and keep complete — fast to change, low on independent weight, the foundation that everything else corroborates against. The Knowledge Graph path is yours to feed through GBP and then trust Google to re-emit — institutionally weighty, slower, the path the Google-wired engines lean on hardest. The third-party path is the one you cannot author and cannot ignore — the independent corroboration the open-web engines reward, and the place stale facts go to quietly undercut the other two.

The unglamorous truth is that AI Native MEO done on the Provenance axis is three parallel maintenance jobs that have to stay in agreement, not one path to perfect. We are, all of us working on this layer, still early enough that the map of which engine weights which path will look different a year from now — the retrieval architectures underneath are still moving, and any provenance map drawn today is honestly dated rather than permanent. What is stable is the shape: three paths, independently authored, jointly scored, and a citation decision that is binary at the end of it. The fact was always simple. It is the provenance that was never simple at all.

Further reading