For thirty years we built the web for a reader who skims. Headlines, hierarchy, whitespace, the merciful blue link — all of it is choreography for a human eye deciding in half a second whether to stay. That reader is still here. But a second reader has arrived, and it does not skim.
An agent doing research doesn't land on your page and scroll. It retrieves the three sentences that answer its question, drops the rest, and moves on. It may never load your stylesheet. It will, however, want to know one thing the human reader rarely asks out loud: says who?
The human reader trusts a page because it looks credible. The machine reader trusts a claim only if it can trace it.
Today, provenance lives in the presentation layer. A footnote here, a "sources" section there, a link styled to look authoritative. This is fine for humans, who infer trust from a hundred soft signals. It is nearly useless for a model, which has already discarded the page chrome and is holding a naked passage with no idea where it came from.
The result is the failure mode everyone now recognises: an agent confidently asserts something, and when you ask for the source, it reconstructs a plausible-looking one. The content existed; the link between content and origin did not survive retrieval. Provenance was a feature of the page, and the page is gone.
Treating provenance as a primitive means the smallest unit of meaning — a claim — carries its own origin, and that origin survives being pulled out of context. Concretely, three things travel together:
None of this is exotic. It's JSON-LD you already could be emitting, plus a discipline about granularity. The shift is conceptual: you stop thinking of a citation as something you show and start thinking of it as something the content is.
There's a tidy incentive here, which is the only kind that survives contact with a roadmap. Retrieval systems are under enormous pressure to reduce hallucination, and the cheapest way to do that is to prefer sources whose claims come pre-attributed. A page that hands a model a verifiable source for every assertion is a page that model can safely cite. Over time, those pages get cited more — not because they game a ranking, but because they lower the consuming system's risk.
This is the whole thesis of the board this note comes from: the web's second reader rewards legibility, and provenance is the most valuable form of legibility there is. We'd rather be the kind of source a careful agent reaches for than the kind it has to fact-check.
We're drafting a minimal spec on the board. It will be wrong in interesting ways. Come break it.