A field guide to the agent-legible web

A field guide implies you've been somewhere. We have been somewhere for about a year, mostly watching how retrieval systems actually read the sites they cite, and the punchline is small enough to fit on the back of a card: **crawlers don't read pages, they read passages, and the passages they trust have structure.**

The agent-legible web is the web written for that fact. Not a new web — the same one, with four habits that turn out to compound:

- Answer-first sections. Claim before context.
- Inline structure (JSON-LD, schema.org) generated from the same data the human page renders.
- A curated /llms.txt that points an agent at the right things first.
- Provenance attached to claims, not to pages.
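
As a minimal sketch of the second habit, the snippet below renders both the human-facing HTML and a schema.org JSON-LD block from one record, so the two cannot drift apart. The record fields, the `example.org` URL, the date, and the function names are all illustrative assumptions, not part of any spec from this board.

```python
import html
import json

# Hypothetical page record: the single source both renderers read from.
ARTICLE = {
    "headline": "A field guide to the agent-legible web",
    "description": "Crawlers read passages, and the passages they trust have structure.",
    "url": "https://example.org/field-guide",  # placeholder URL (assumption)
    "date_published": "2025-01-01",  # placeholder date (assumption)
}


def render_html(record: dict) -> str:
    """Render the human-facing fragment, claim first."""
    return (
        "<article>\n"
        f"  <h1>{html.escape(record['headline'])}</h1>\n"
        f"  <p>{html.escape(record['description'])}</p>\n"
        "</article>"
    )


def render_jsonld(record: dict) -> str:
    """Render a schema.org Article as a JSON-LD script tag from the same record."""
    payload = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": record["headline"],
        "description": record["description"],
        "url": record["url"],
        "datePublished": record["date_published"],
    }
    # Escape "</" so the JSON cannot close the script element early.
    body = json.dumps(payload, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(render_html(ARTICLE))
    print(render_jsonld(ARTICLE))
```

Because both functions take the same record, correcting a headline or description in one place updates what the human reads and what the agent extracts.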

## Why now

Three years ago this would have been a vanity exercise. The crawlers were either too dumb to use structured data well or too rapacious to be welcomed. Today both sides have moved: retrieval systems are very good at extracting structured records, and they would prefer to cite sources they can verify. Sites that publish for that audience get cited more. Sites that don't publish for it fall back into invisibility, where the model paraphrases them without attribution.

## What this isn't

It isn't SEO with extra steps. SEO optimises for a ranking algorithm; agent legibility optimises for a retrieval system's evidence pipeline. The difference is whether you're trying to be found or trying to be trusted. They're related but not the same, and the techniques diverge fast.

It also isn't a brand-safety play. This is research infrastructure. The point is that the web's second reader, whatever it ends up being called this quarter, is now part of the audience your work has to function for. You write for it the same way you write for a human reader: clearly, with structure, and without lying.

## What ships from this board

Four artefacts, in rough order: a provenance primitive (see the dedicated note), an llms.txt benchmarking method, a passage-friendly markup convention, and a reference emitter. The goal is small specs that an unaffiliated site can adopt in an afternoon, not yet another standard nobody implements.
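
To give a sense of how small such an artefact can be, here is a rough sketch of generating a curated /llms.txt. It is not the board's reference emitter. It assumes the layout used by the public llms.txt proposal (a title heading, a one-line blockquote summary, then sections of annotated links), and the function name, section names, and URLs are hypothetical.

```python
def render_llms_txt(title: str, summary: str, sections: list) -> str:
    """Build an llms.txt body: title, blockquote summary, then link sections.

    `sections` is a list of (heading, links) pairs, where each link is a
    (name, url, note) tuple. Order is preserved, so the curator decides
    what an agent sees first.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections:
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content (assumption): the URLs and notes are illustrative.
    print(
        render_llms_txt(
            "The agent-legible web",
            "Notes on writing sites that retrieval systems can read and verify.",
            [
                (
                    "Start here",
                    [
                        (
                            "Field guide",
                            "https://example.org/field-guide",
                            "the four habits in one page",
                        ),
                    ],
                ),
            ],
        )
    )
```

The file is deliberately hand-ordered rather than generated from a sitemap: the point of curation is to choose what the agent reads first.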

From the agent-legible web board. Replications, counter-arguments, and "you reinvented X" corrections all welcome in the thread.