Engineering 2026-06-05

Wiring Your Business into the Knowledge Graph: sameAs, @id, and Entity Linking for AI Citation

An engineer's guide to entity linking for local business: how @id gives your business a stable identity, how sameAs connects it to authoritative URIs, and why explicit graph declaration beats the string-matching layer that NAP consistency lives on.

Plate I. An independent storefront: one real-world entity that has to be resolved to exactly one node in the graph. Photograph: Noralí Nayla · Unsplash

Plate II. A network of nodes and edges: the graph an AI assistant reconciles your business against before it cites you. Photograph: Conny Schneider · Unsplash

Here is a problem that looks trivial and is not. Your business appears on the open web as at least three strings: “Blue Bottle Coffee Aoyama” on your own site, “Blue Bottle Coffee — Aoyama Cafe” in a directory, and “ブルーボトルコーヒー青山店” on a Japanese review aggregator. A human reads all three and knows they are one place. An AI assistant, asked “where should I get coffee near Aoyama?”, has to decide the same thing before it can decide whether to cite you, and it has no human intuition to fall back on. It has either evidence that the three strings denote the same entity, or it has a guess.

Most local-business optimization tries to win this by making the three strings identical. That is the NAP-consistency layer: name, address, and phone matched character-for-character so the model’s string comparison has an easy time. It works, up to a point. The point where it stops working is exactly where this piece begins, because string matching is inference (the model concludes the entities are probably the same), and there is a layer above it where you stop letting the model infer and start telling it outright. That layer is entity linking, and its two instruments are @id and sameAs.

Inference versus declaration

The distinction is worth making sharp, because it is the whole argument.

When two schema fragments both say "name": "Blue Bottle Coffee Aoyama" and "telephone": "+81-3-...", a model resolving entities does string and attribute comparison and assigns a probability that they co-refer. Clean NAP raises that probability. But the model is still reasoning “these look like the same business.” Change the name string on one surface — a new branch suffix, a romaji-versus-kana switch, a rebrand — and the probability drops, because the evidence you gave it was the strings agreeing.

@id and sameAs change the kind of evidence. Instead of “these strings look alike,” you publish “this fragment and that fragment carry the same identifier.” The model is no longer estimating co-reference from surface features; it is reading a declaration. Names can drift and the identity holds, because identity was never riding on the name in the first place.
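A minimal sketch of that contrast in Python, assuming two hypothetical fragments published on different pages of the same site, one with the romaji name and one with the kana name. The helper `name_similarity` is a stand-in for whatever fuzzy comparison a resolver might run; it illustrates the idea and does not describe any engine's actual matcher.

```python
from difflib import SequenceMatcher

# Two hypothetical fragments about the same cafe; only the name string differs.
english_page = {"@id": "https://bluebottle.example/#aoyama",
                "name": "Blue Bottle Coffee Aoyama"}
japanese_page = {"@id": "https://bluebottle.example/#aoyama",
                 "name": "ブルーボトルコーヒー青山店"}

def name_similarity(a: dict, b: dict) -> float:
    """Inference: a score in [0, 1] derived from surface strings."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def same_declared_identity(a: dict, b: dict) -> bool:
    """Declaration: both fragments carry the same non-empty identifier."""
    return a.get("@id") is not None and a.get("@id") == b.get("@id")

# The string score collapses when the script changes; the identifier check does not.
print(name_similarity(english_page, japanese_page))
print(same_declared_identity(english_page, japanese_page))
```

The first function can only ever return a degree of belief. The second returns a yes or no that does not depend on how the name happens to be spelled.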

This matters more every quarter, because the frameworks that describe AI citation are converging on it. LLMO, currently the most precise of them, treats entity identity not as something the model should probabilistically reconstruct from your NAP fields but as something you declare with resolvable URIs: the Knowledge Clarity component is, in effect, the requirement that your entity be unambiguously addressable rather than merely consistently spelled. That is a different optimization target from “make the strings match,” and it is the one that survives a rebrand.

`@id`: giving your business a primary key

@id is the stable URI you assign to your business entity. It is the primary key. Once you have one, every schema fragment that is about that business, across every page, in every @type, can point at it, and the model can collapse them into a single node.

{
  "@context": "https://schema.org",
  "@type": "Cafe",
  "@id": "https://bluebottle.example/#aoyama",
  "name": "Blue Bottle Coffee Aoyama",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "3-13-14 Minamiaoyama",
    "addressLocality": "Minato",
    "addressRegion": "Tokyo",
    "postalCode": "107-0062",
    "addressCountry": "JP"
  },
  "telephone": "+81-3-xxxx-xxxx"
}

The @id here is a URI you control: a fragment identifier (#aoyama) anchored to a canonical page on your own domain. It does not have to resolve to a live document — though it is better if it does — but it does have to be stable. The cardinal sin is letting the @id change when the page URL changes, because the moment it changes you have told the model that the old entity ceased to exist and a new one appeared.
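One way to keep the identifier from drifting is to store it separately from page URLs, so that a site restructure cannot touch it. The sketch below assumes a hypothetical `ENTITY_IDS` registry and hypothetical page paths; none of these names come from a real system.

```python
# Hypothetical registry: identifiers are assigned once and never derived from page URLs.
ENTITY_IDS = {"aoyama": "https://bluebottle.example/#aoyama"}

def cafe_fragment(location_key: str, page_url: str) -> dict:
    """Build a JSON-LD fragment whose @id comes from the registry, not the page."""
    return {
        "@context": "https://schema.org",
        "@type": "Cafe",
        "@id": ENTITY_IDS[location_key],  # stable primary key
        "url": page_url,                  # free to change with the site structure
    }

# The page moves (both paths are illustrative); the identity does not.
before = cafe_fragment("aoyama", "https://bluebottle.example/locations/aoyama")
after = cafe_fragment("aoyama", "https://bluebottle.example/tokyo/aoyama-cafe")
print(before["@id"] == after["@id"])
```

The design choice is simply that `url` and `@id` are different fields with different lifetimes, and only one of them is allowed to change.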

The payoff comes when you reuse it. Your menu page emits an Offer whose offeredBy points at {"@id": "https://bluebottle.example/#aoyama"}. Your reviews page emits an aggregateRating attached to the same @id. Your contact page emits the full LocalBusiness. Three pages, three @type contexts, one identity — and a model crawling your site reconciles them without guessing.

{
  "@context": "https://schema.org",
  "@type": "Menu",
  "@id": "https://bluebottle.example/menu#menu",
  "name": "Blue Bottle Aoyama Menu",
  "hasMenuSection": {
    "@type": "MenuSection",
    "name": "Coffee",
    "offers": {
      "@type": "Offer",
      "offeredBy": { "@id": "https://bluebottle.example/#aoyama" }
    }
  }
}

That offeredBy reference is the wire. It is the difference between a menu floating in space and a menu the model knows belongs to this cafe.
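To make the reconciliation step concrete, here is a rough Python sketch of how a consumer could collapse fragments from several pages into nodes keyed by @id. The function `merge_by_id` is hypothetical and deliberately naive (later fragments overwrite earlier values for the same key); real JSON-LD processors follow the expansion and flattening algorithms in the JSON-LD specification.

```python
from collections import defaultdict

# Fragments as they might be collected from three pages of the same site.
fragments = [
    {"@type": "Cafe", "@id": "https://bluebottle.example/#aoyama",
     "name": "Blue Bottle Coffee Aoyama"},
    {"@id": "https://bluebottle.example/#aoyama",
     "telephone": "+81-3-xxxx-xxxx"},
    {"@type": "Menu", "@id": "https://bluebottle.example/menu#menu",
     "name": "Blue Bottle Aoyama Menu"},
]

def merge_by_id(frags: list[dict]) -> dict[str, dict]:
    """Collapse fragments that share an @id into a single node per identifier."""
    nodes: dict[str, dict] = defaultdict(dict)
    for frag in frags:
        node_id = frag.get("@id")
        if node_id is None:
            continue  # no primary key: nothing to merge this fragment into
        nodes[node_id].update(frag)  # naive: the last value seen for a key wins
    return dict(nodes)

graph = merge_by_id(fragments)
print(len(graph))
print(sorted(graph["https://bluebottle.example/#aoyama"]))
```

Three fragments go in and two nodes come out: the cafe, now carrying both its name and its telephone, and the menu.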

`sameAs`: connecting to the authority graph

@id resolves identity within your own surfaces. sameAs resolves it against the rest of the world: it declares that your entity is the same as a node in an authoritative graph the model already trusts.

{
  "@context": "https://schema.org",
  "@type": "Cafe",
  "@id": "https://bluebottle.example/#aoyama",
  "name": "Blue Bottle Coffee Aoyama",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q4928902",
    "https://www.instagram.com/bluebottlejapan/",
    "https://www.google.com/maps/place/?q=place_id:ChIJ...",
    "https://bluebottlecoffee.jp/"
  ]
}

Each URI in that array is a different kind of corroboration. The Wikidata QID anchors you to the structured public knowledge graph that sits underneath a great deal of model pre-training. The Google Place ID URL links you to the entity Google’s own surfaces project. The official site and verified social profiles are the open-web nodes a retrieval pass is most likely to reach. Together they say: the entity at this @id is the same entity you already know under all these other names.

How heavily any given engine weights a given sameAs target is something I infer from documented architecture, not something I have measured in citation behavior: I can read the published descriptions of how these systems do entity resolution and reason about which anchors are load-bearing, but I cannot watch a model assign weights. With that caveat stated plainly, here is the map I work from:

`sameAs` target	What it anchors to	Why it carries weight
Wikidata QID	The public structured knowledge graph	Frequently present in pre-training; a canonical disambiguator across languages
Google Place ID URL	Google’s Knowledge Graph projection	Aligns your entity with the surface Google-integrated engines read first
Official website (`@id` host)	Your first-party canonical	Closes the loop: the authority graph points back at the identity you declared
Verified social profiles	Open-web corroboration	Reachable on a live retrieval pass; reinforces freshness
Local citation directories	Third-party entity records	Corroboration density, but lower trust than Wikidata or a verified profile

The ordering is the actionable part. A single Wikidata link does more disambiguating work than ten directory links, because it resolves you against a graph the model already holds rather than against more strings it has to trust on faith. If your business is notable enough to have, or to earn, a Wikidata item, that one URI is the highest-leverage entry in the array.
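If you maintain the array by hand, a small helper can keep it in that order. This is a sketch under assumptions: the rank values mirror the table above, `SOCIAL_HOSTS` is an illustrative list rather than a complete one, and nothing here reflects how an engine actually scores the targets.

```python
from urllib.parse import urlparse

# Illustrative assumption: extend this set with the profiles you actually verify.
SOCIAL_HOSTS = {"www.instagram.com", "instagram.com"}

def anchor_rank(uri: str, official_hosts: set[str]) -> int:
    """Map a sameAs URI to its position in the table's ordering (0 sorts first)."""
    parsed = urlparse(uri)
    host = parsed.netloc.lower()
    if host == "www.wikidata.org":
        return 0
    if host == "www.google.com" and parsed.path.startswith("/maps"):
        return 1
    if host in official_hosts:
        return 2
    if host in SOCIAL_HOSTS:
        return 3
    return 4  # anything unrecognized is treated as a directory-style citation

def order_same_as(uris: list[str], official_hosts: set[str]) -> list[str]:
    """Return the URIs sorted by rank; ties keep their original order."""
    return sorted(uris, key=lambda u: anchor_rank(u, official_hosts))

same_as = [
    "https://www.instagram.com/bluebottlejapan/",
    "https://bluebottlecoffee.jp/",
    "https://www.google.com/maps/place/?q=place_id:ChIJ...",
    "https://www.wikidata.org/wiki/Q4928902",
]
ordered = order_same_as(same_as, {"bluebottlecoffee.jp"})
print(ordered)
```

JSON-LD arrays are unordered by default, so the order is best read as a convention for the humans maintaining the markup rather than a guaranteed signal to a parser.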

Where the layers stack

It helps to see the whole stack at once, because each layer does a job the one below it cannot.

Layer	Instrument	What it tells the model	Failure mode it removes
String consistency (NAP)	matching `name` / `address` / `telephone`	“these probably co-refer”	random spelling drift between surfaces
Internal identity	`@id` reused across pages	“all of these fragments are one entity”	menu / reviews / hours read as separate businesses
External identity	`sameAs` to authority URIs	“that one entity is this known node”	the model failing to connect you to what it already knows

NAP consistency is the floor, not the ceiling. It is necessary — contradictory phone numbers will sink you regardless of how good your sameAs array is — but it is doing the weakest kind of work, and it is the layer most exposed to the ordinary entropy of a business that changes over time. @id and sameAs are the layers that hold when the strings move. (There is an honest irony here: the cleaner your NAP, the less your sameAs array has to rescue, and the messier your reality of branches and rebrands and bilingual names, the more it does. The businesses that need entity linking most are exactly the ones too tangled to keep their strings tidy.)
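The three layers can be checked independently, which is easier to see in code. The sketch below assumes you have already collected the fragments that describe one business; `audit_layers` is a hypothetical helper, and for brevity it compares only `name` and `telephone`, not the address.

```python
def audit_layers(fragments: list[dict]) -> dict[str, bool]:
    """Report which of the three layers a set of fragments about one business satisfies."""
    def distinct(key: str) -> set:
        return {f[key] for f in fragments if key in f}

    return {
        # NAP: no field appears with two different values (address omitted here).
        "string_consistency": all(len(distinct(k)) <= 1 for k in ("name", "telephone")),
        # Internal identity: every fragment carries an @id, and it is the same one.
        "internal_identity": len(distinct("@id")) == 1
                             and all("@id" in f for f in fragments),
        # External identity: at least one fragment declares sameAs targets.
        "external_identity": any(f.get("sameAs") for f in fragments),
    }

# Two pages whose name strings have drifted apart but whose @id has not.
pages = [
    {"@id": "https://bluebottle.example/#aoyama",
     "name": "Blue Bottle Coffee Aoyama", "telephone": "+81-3-xxxx-xxxx"},
    {"@id": "https://bluebottle.example/#aoyama",
     "name": "ブルーボトルコーヒー青山店"},
]
report = audit_layers(pages)
print(report)
```

In this example the string layer fails while the internal-identity layer holds, which is the situation the paragraph above describes: the strings moved and the identifier did not.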

If you want the layer directly below this one in detail, the companion piece on reading a Google Business Profile (GBP) as JSON-LD covers how Google’s own projection of your business already carries an @id, and why your job is to agree with it rather than contradict it.

A note on what this is and is not

This is structural optimization, and structural optimization is not the same as the content-and-phrasing tactics that GEO (generative engine optimization) and AEO (answer engine optimization) concentrate on. Those frameworks largely optimize the text that lands in an answer: how a passage is worded so a model will lift it. Entity linking optimizes something earlier and lower: whether the model can resolve who you are before it gets to what to say about you. The two are complementary, but they are not interchangeable, and conflating them is how businesses end up with beautifully worded pages the model cannot attribute to anyone. If the terminology around all this is still fuzzy, the LLMO-versus-SEO-AEO-GEO guide draws the boundaries more carefully than I can in a paragraph.

The one thing to do today

Pick your most important location and give it a real @id (a stable URI on your own domain), then add a sameAs array with, at minimum, your verified Google Place ID URL, your official site, and your primary verified social profile. If you can find or create a Wikidata item, put its QID first.

Then verify the entity actually resolves:

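curl -sL https://your-domain.example/ | python3 -c '
import sys, re, json
# Find every JSON-LD block, including multi-line ones and tags with extra attributes.
pattern = r"<script[^>]*application/ld\+json[^>]*>(.*?)</script>"
for raw in re.findall(pattern, sys.stdin.read(), re.S | re.I):
    d = json.loads(raw)
    for node in (d if isinstance(d, list) else d.get("@graph", [d])):
        print("@id:", node.get("@id")); print("sameAs:", node.get("sameAs"))'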

If @id comes back None, your business has no primary key and every fragment about it is floating independently. If sameAs comes back None, you are asking the model to connect you to the authority graph by inference alone. Both are fixable in an afternoon, and both are cheaper than they look, because you are not adding new facts about your business — you are only telling the model that the facts it already has all belong to one entity.

The closing caveat is the same one that applies to everything in this layer: the way engines weight authority URIs today is a snapshot, and the snapshot moves. But the underlying instruction — here is my identifier, and here is what it is the same as — is about as durable a thing as you can say in structured data. Strings drift. A well-chosen URI does not.
