A design conversation on AI, package managers, and the future of software intent capture
Date: April 2026
Format: Exploratory dialogue — human and Claude (Anthropic)
Status: Living document — conversation ongoing
Abstract
This document captures an exploratory conversation about a fundamental problem in AI-assisted software development: the gap between intent and artifact. Starting from the observation that programming languages and package managers exist to solve human coordination problems, the conversation traces a path to a proposed system — an intent capture and versioning architecture — that would treat the reasoning behind software as a first-class artifact alongside the code itself.
The core thesis: the conversation between a human and an AI during development is the spec. We are throwing it away.
1. The Starting Insight — Languages and Package Managers as Human Solutions
Programming languages and package managers are not fundamentally technical systems. They are knowledge transmission systems.
A package manager isn’t about code — it’s about encoding the answer to “how did someone else solve this before me?” and making that answer reusable. The code is the artifact. The real cargo is crystallized human understanding.
Programming languages are the same thing one level up: a negotiated contract between humans about how to express intent in a way other humans (and machines) can interpret.
Package managers as a knowledge system:
- A namespace — shared agreement on what things are called
- A trust system — who vouched for this knowledge?
- A dependency graph — what other knowledge does this knowledge assume?
- A versioning protocol — how has this understanding evolved?
These are solutions to human coordination problems, not computational ones.
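The four coordination functions above can be sketched as data. This is a toy model, not any real registry's API; all names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PackageRecord:
    """One registry entry, viewed as a knowledge artifact."""
    name: str        # namespace: shared agreement on what it's called
    version: str     # versioning protocol: how the understanding evolved
    publisher: str   # trust system: who vouched for this knowledge
    dependencies: dict = field(default_factory=dict)  # what it assumes

registry: dict = {}

def publish(record: PackageRecord) -> None:
    """Register a solved problem so others can retrieve it by name."""
    registry[(record.name, record.version)] = record

def resolve(name: str, version: str) -> PackageRecord:
    """Answer: 'how did someone else solve this before me?'"""
    return registry[(name, version)]

publish(PackageRecord("left-pad", "1.3.0", "someone-you-trust"))
print(resolve("left-pad", "1.3.0").publisher)
```

The code is trivial on purpose: everything interesting is in the field comments, which name the human coordination problem each field solves.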
2. How AI Breaks the Package Manager Model
AI doesn’t need a package manager because it doesn’t need the externalized knowledge store. The reason packages exist is that a human can’t hold all prior solutions in their head, so we built a system to retrieve them on demand. AI already did that retrieval — at training time. It internalized the knowledge graph that npm points to.
The evolution of the model:
| Era | Model | Where knowledge lives |
|---|---|---|
| 1 — Libraries | You write the wheel, I import it | Files |
| 2 — Package Managers | Formalized discovery, versioning, trust | Indexes |
| 3 — LLMs as code generators | The pattern, not the file | Model weights |
| 4 — Emerging | Agents that negotiate solutions in context, synthesize and discard | Ephemeral |
What AI breaks:
- Reproducibility — synthesized code has no package-lock.json. Two runs of the same prompt can produce different implementations.
- Trust — who audited the synthesized code? The package ecosystem has imperfect but real social trust signals. AI-generated code has the vibe of the prompt.
The next human problem to solve is provenance and auditability of synthesized knowledge.
3. The Reverse Problem — Validating Against a Spec
Validation tooling mirrors the same archaeology problem. A spec is crystallized human knowledge. A test suite, type system, formal contract, or OpenAPI schema — all are humans encoding intent in a retrievable, machine-checkable form.
But the symmetry breaks where it matters most.
Generation and validation are not the same problem wearing different hats:
- Generation is creative. There are infinite valid implementations of a spec.
- Validation is oracular. You’re asking whether a solution satisfies intent. This is harder, for one brutal reason:
Specs are written by humans, which means they are incomplete by definition. The spec doesn’t capture what the human meant. It captures what the human said.
This is the Goodhart’s Law trap: once you validate against the spec, the spec becomes the target, not the goal. AI is exceptionally good at satisfying the letter of a spec while missing the spirit entirely.
The Closed Epistemic Loop
If AI generates the code, validates against the spec, and helped write the spec — you have a closed epistemic loop. The system is checking its own work using criteria it influenced. There is no external ground truth. The human is no longer in the verification chain.
This is already happening: AI-generated tests validating AI-generated code. The tests pass. The software is wrong.
The implication: Generation and validation need to be adversarially separated — different agents, different prompts, genuinely different priors. Not one AI wearing two hats. This maps to how humans solved it: the person who writes the code shouldn’t be the only person who reviews it. Code review exists to introduce a second interpretation of the spec.
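A toy sketch of why the separation matters. Both functions below are hypothetical stand-ins for independently prompted models; the "different prior" is reduced to a literal keyword heuristic so the example runs:

```python
# Generation and validation as separate agents. The validator sees only
# the spec and the artifact -- never the generator's reasoning -- so it
# forms a second, independent interpretation of intent.

def generation_agent(spec: str) -> dict:
    """Stand-in for an LLM that produces code plus its own self-check."""
    return {"code": "def max_retries(): return 3", "self_check": "passes"}

def validation_agent(spec: str, artifact: str) -> list[str]:
    """Stand-in for a second, differently prompted reviewer."""
    concerns = []
    if "configurable" in spec and "return 3" in artifact:
        concerns.append("spec asks for configurability; value is hard-coded")
    return concerns

spec = "retry count should be configurable per environment"
out = generation_agent(spec)
# The generator's self-check passes; the independent reviewer disagrees.
print(out["self_check"], validation_agent(spec, out["code"]))
```

The closed loop (generator validating itself) reports success; the adversarially separated reviewer catches the letter-versus-spirit gap.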
4. The Package Manager Equivalent for Validation
The real artifact needed is not a better linter. It is a spec versioning and intent-capture system — something that records not just what the spec says, but why it was written that way and what edge cases were consciously excluded.
This is the knowledge artifact AI actually needs to validate against. It almost never exists.
Why Function-Level Specs Are a Local Lie
Traditional specs — contracts, types, tests — are stateless and local. They answer “does this function behave correctly in isolation?” But applications are a coherent set of decisions made over time. The correctness of any one part depends on understanding the whole arc.
A function that returns the right value can still be deeply wrong if:
- It solves a problem the application abandoned six months ago
- It optimizes for a constraint that was relaxed after a business pivot
- It contradicts an architectural decision made in a different part of the codebase
No package-level spec catches that. That knowledge lived in the heads of the people who were in the room.
What Application-Level Intent Actually Is
An application’s real spec is three things layered on top of each other:
Invariants — things that must always be true regardless of what changes. Not “this function returns a positive integer” but “a user can never see another user’s private data, under any circumstances, including edge cases we haven’t imagined yet.”
The decision ledger — not just what was built, but why this shape and not another. Why is this a microservice? Why is this synchronous? Why does this module own this data? These decisions exist but are buried in Slack threads, PR comments, and the memories of people who may have left.
The evolution of intent — applications change purpose. What was a B2C product becomes a B2B platform. What was a prototype constraint becomes load-bearing. The spec is a temporal artifact, and we have almost no tooling that treats it that way.
The Shape of the System
- A semantic graph, not a document. A structured, queryable record of decisions and their relationships. Not a README. Not a wiki.
- Temporally versioned like git, but for reasoning. Not just what the code looked like at commit X, but what the intent was at that moment.
- Linked to the code it governs. Decisions anchored to the artifacts they produced, so when you change the artifact, the system surfaces the decision connected to it.
- Adversarially legible to AI. Structured so an AI agent can reason against it — not just retrieve it, but query it: “Does this proposed change violate any recorded invariant? Does it solve a problem we intentionally decided not to solve?”
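A minimal sketch of such a graph as plain queryable data, with decisions anchored to the artifacts they govern. All identifiers and paths are illustrative:

```python
# An intent graph an agent can query, not just retrieve.
intent_graph = {
    "D-001": {
        "decision": "All user data access goes through the policy layer",
        "invariant": True,            # must always hold
        "governs": ["src/policy/"],   # code artifacts this decision anchors to
        "related": [],
    },
    "D-002": {
        "decision": "Single-tenant only; multi-tenancy deliberately out of scope",
        "invariant": False,
        "governs": ["src/db/schema.py"],
        "related": ["D-001"],
    },
}

def decisions_governing(path: str) -> list:
    """Surface every recorded decision anchored to an artifact being changed."""
    return [
        (did, rec) for did, rec in intent_graph.items()
        if any(path.startswith(prefix) for prefix in rec["governs"])
    ]

# Changing a policy file surfaces the invariant anchored to it.
hits = decisions_governing("src/policy/access.py")
print([did for did, _ in hits])  # ["D-001"]
```

The point of the structure is the `governs` link: when an artifact changes, the system can surface the decision connected to it without anyone remembering to look.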
5. Greenfield Projects — The Unique Opportunity
The Greenfield Illusion
Greenfield feels clean. No legacy. No inherited decisions. But greenfield projects are not decision-free — they are decision-dense. Every choice is being made for the first time, carrying maximum consequence and minimum validation.
In brownfield, bad decisions have been stress-tested by reality. In greenfield, decisions accumulate at the fastest possible rate with no feedback loop yet. AI accelerates that accumulation dramatically. The debt of uncaptured intent compounds faster than ever before.
What’s Actually Happening in AI-Assisted Greenfield Builds
The human holds a vision loosely in their head. They prompt the AI iteratively. The AI generates coherent local solutions. The codebase grows.
Created: Code, structure, tests, interfaces, data models.
Not created: Any record of why the conversation went the way it did. Why that data model and not another. What the AI proposed that got rejected, and why. What constraints the human was implicitly carrying that were never spoken aloud.
The human’s intent is the invisible hand shaping everything, and it is completely uncaptured. The AI is a highly capable executor with no persistent understanding of the vision it serves.
The Advantage That Has Never Existed Before
For the first time, the AI was present at the moment of every decision. In a traditional project, intent evaporates because humans held it in conversations, Slack threads, and their own heads. In an AI-assisted greenfield build, the reasoning process is happening in the conversation. It’s already being generated as language. It just isn’t being kept.
Every prompt-response cycle is a decision moment. The human is expressing intent. The AI is interpreting it. That exchange is the most explicit a design decision ever gets — and right now it’s being thrown away.
The conversation is the spec. We are treating it as exhaust.
6. The Proposed Architecture — Intent as Infrastructure
The Epoch
Whether brownfield or greenfield, establishing an epoch is the same operation:
Capture the best available understanding of intent as it exists right now, ratify it, and commit it as a versioned artifact.
For greenfield, the epoch is day one. For brownfield, it is an inference pass over existing artifacts — imperfect, but explicit. Imperfect and explicit beats perfect and implicit every time. From the epoch forward, both projects run the same process.
The Intent Record Schema
Each record in the intent graph carries:
| Field | Contents |
|---|---|
| Decision | What was chosen and what it governs |
| Constraints | What was explicitly non-negotiable at the time |
| Alternatives rejected | What was considered and set aside, and why |
| Conditions | What would have to change for this decision to be revisited |
| Links | To code artifacts produced, to related or conflicting decisions, to the conversation that generated it |
The schema does not need to be exhaustive on day one. It needs to be consistent and committable.
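One way the table above might look as a committable data structure. This is a sketch, not a finalized schema; every field value here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class IntentRecord:
    decision: str                           # what was chosen and what it governs
    constraints: list[str]                  # explicitly non-negotiable at the time
    alternatives_rejected: dict[str, str]   # alternative -> why it was set aside
    conditions: list[str]                   # what would reopen this decision
    links: dict[str, list[str]] = field(default_factory=dict)  # code, decisions, conversation

record = IntentRecord(
    decision="Use PostgreSQL as the single system of record",
    constraints=["strong consistency for billing data"],
    alternatives_rejected={"DynamoDB": "eventual consistency unacceptable for billing"},
    conditions=["billing moves to a third-party provider"],
    links={"code": ["src/db/"], "conversation": ["session-2026-04-02"]},
)
print(record.decision)
```

Note how small it is: five fields, each answering one question a future maintainer (or validation agent) will actually ask.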
The Capture Workflow
Critical design constraint: this cannot be a separate step. The moment capture becomes a task done after development, it becomes the task that doesn’t get done.
A second agent — the intent recorder — runs alongside the primary generation agent. It is not writing code. It watches the conversation and synthesizes decision records from what it observes:
- When the primary agent proposes an approach and the human accepts, the recorder generates a candidate record.
- When the human rejects something, the recorder captures the rejection and infers the constraint that caused it.
- When the human asks for a change, the recorder notes what shifted and why.
At natural commit points, the recorder surfaces candidate records for the human to ratify. Quick review, quick correction, committed alongside the code.
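The recorder's observation loop might look like the sketch below. The event shapes are assumptions; in practice they would come from the conversation transcript:

```python
# The recorder watches accept/reject moments and drafts candidate records.
candidate_records: list[dict] = []

def observe(event: dict) -> None:
    """Turn one prompt-response decision moment into a draft record."""
    if event["kind"] == "accepted":
        candidate_records.append({
            "decision": event["proposal"],
            "status": "draft",   # awaiting human ratification
        })
    elif event["kind"] == "rejected":
        candidate_records.append({
            "decision": f"Do NOT {event['proposal']}",
            "constraints": [event.get("inferred_constraint", "unstated")],
            "status": "draft",
        })

def surface_for_ratification() -> list[dict]:
    """At a commit point, hand all drafts to the human for quick review."""
    return [r for r in candidate_records if r["status"] == "draft"]

observe({"kind": "accepted", "proposal": "store sessions in Redis"})
observe({"kind": "rejected", "proposal": "use sticky sessions",
         "inferred_constraint": "horizontal scaling must stay trivial"})
print(len(surface_for_ratification()))  # 2
```

Rejections are captured as records too, because "what we decided not to do, and why" is exactly the knowledge that evaporates first.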
The key inversion: Documentation fails because humans must generate it cold, after the fact, when intent has already faded. This works because the AI generates the draft in real time from the conversation, and the human only edits a draft already produced from their own words.
The Validation Loop
The validation agent — separate from the generation agent — does three things:
Invariant check — Does this change violate any recorded constraint marked non-negotiable? Surface it before the human sees the code, as a question not an error: “This change appears to conflict with a decision recorded here. Is that decision still operative?”
Coherence check — Does this change make sense given the trajectory of decisions that preceded it? Is it solving a problem the application decided not to solve?
Condition trigger — Does this change represent one of the conditions under which a prior decision should be revisited? If the decision record said “we’d reconsider this if we needed multi-tenancy” and this change is adding multi-tenancy — that’s not a violation. It’s a flag that an old decision needs to be consciously revisited and the record updated.
This is not automated gatekeeping. It surfaces what a senior engineer who remembered everything would surface in a code review. The human still decides — but with full context rather than partial memory.
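The three checks can be sketched over records shaped like the schema in section 6. The matching logic here is a deliberately naive placeholder for semantic reasoning; the change and record contents are invented:

```python
def validate(change: dict, records: list[dict]) -> list[str]:
    """Return questions to surface to the human -- never hard errors."""
    questions = []
    for rec in records:
        # 1. Invariant check: does the change touch a non-negotiable constraint?
        for c in rec.get("constraints", []):
            if c in change.get("touches", []):
                questions.append(
                    f"Conflicts with constraint '{c}'. Is that decision still operative?")
        # 2. Coherence check: is it solving a problem we decided not to solve?
        if change.get("solves") in rec.get("out_of_scope", []):
            questions.append(
                f"'{change['solves']}' was deliberately left unsolved. Revisit?")
        # 3. Condition trigger: does it match a recorded revisit condition?
        if change.get("adds") in rec.get("conditions", []):
            questions.append(
                f"'{change['adds']}' was a recorded revisit condition. Update the record?")
    return questions

records = [{"constraints": ["single-region deployment"],
            "out_of_scope": ["offline mode"],
            "conditions": ["multi-tenancy"]}]
change = {"touches": [], "solves": None, "adds": "multi-tenancy"}
print(validate(change, records))
```

Here the change trips only the condition trigger: adding multi-tenancy is not a violation, it is a prompt to consciously revisit the old decision, which is exactly the distinction the loop exists to preserve.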
The Brownfield Bootstrapping Pass
- Feed the AI the full available artifact set — codebase, commit history, PRs, tickets, architecture docs.
- Ask it to generate a candidate intent graph: what decisions appear to have been made, what constraints appear to be operative, what invariants does the code seem to be protecting?
- Conduct a ratification session — engineers who know the system walk through the candidate graph and correct the record.
- Commit the result as the epoch.
From that point forward, the project runs the same forward process as greenfield.
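The four bootstrapping steps can be sketched as a pipeline. The inference step is a stub standing in for an AI pass over real artifacts, and all names are illustrative:

```python
def infer_candidate_graph(artifacts: dict) -> list[dict]:
    """Step 2: stand-in for an AI pass that proposes decisions from evidence."""
    return [{"decision": f"inferred from {name}", "status": "candidate"}
            for name in artifacts]

def ratify(candidates: list[dict], corrections: dict) -> list[dict]:
    """Step 3: engineers correct the record; uncorrected entries pass through."""
    return [corrections.get(i, c) | {"status": "ratified"}
            for i, c in enumerate(candidates)]

# Steps 1 and 4: gather the artifact set, then commit the result as the epoch.
artifacts = {"commit_history": "...", "architecture_docs": "..."}
epoch = ratify(infer_candidate_graph(artifacts), corrections={})
print(all(r["status"] == "ratified" for r in epoch))  # True
```

The important property is that nothing enters the epoch without passing through the ratification step, even when engineers make no corrections.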
7. What Needs to Be Built
| Component | Function |
|---|---|
| Conversation store | Versions alongside git; treats the development conversation as a first-class artifact |
| Intent recorder agent | Synthesizes decision records from conversation in real time |
| Intent record schema | Lightweight enough to actually use; consistent and committable |
| Validation agent | Reasons against the accumulated intent graph; adversarially separated from the generation agent |
| Ratification UX | Makes human review fast enough to actually happen |
None of these are unsolved problems individually. They have not been assembled into a coherent system because the problem has not been framed this way:
The conversation is the spec. The epoch is the starting point. Everything after it should be infrastructure.
8. Core Propositions
- Programming languages and package managers are human coordination systems, not technical ones. AI internalizes what they point to, making their retrieval function obsolete — but creating new unsolved problems around trust and reproducibility.
- Validating AI-generated code against a spec is not the symmetric inverse of generating it. Specs are incomplete by definition. Closed epistemic loops — where AI generates, validates, and influenced the spec — produce code that passes tests and misses intent.
- Application-level intent is not captured by function-level specs. The real spec is a temporal artifact: invariants, a decision ledger, and the evolution of intent over time.
- Greenfield projects built with AI are accumulating decisions at unprecedented speed while capturing intent at unprecedentedly low rates. This creates tomorrow’s brownfield faster than ever before.
- The development conversation between human and AI is the most explicit a design decision ever gets. It is currently treated as exhaust. Treating it as infrastructure — versioned, structured, linked to the artifacts it produced — is the unlock.
- The system required is not a better linter or test framework. It is a spec versioning and intent-capture architecture: a semantic graph of decisions, temporally versioned, adversarially legible to AI, with a human ratification loop that keeps intent honest.
Continuation Prompt
For continuing this conversation in another agent, share this document and use the following prompt:
This document captures a design conversation about intent capture in AI-assisted software development. The key thesis is that the conversation between a human and an AI during development is the spec, and we are currently treating it as exhaust rather than infrastructure. We have proposed an architecture consisting of: an epoch (a ratified starting-point intent graph), a real-time intent recorder agent, a versioned intent record schema, and an adversarially separated validation agent.
We were about to move into implementation design — specifically the schema for intent records, the workflow integration with existing git-based development, and the question of how an intent recorder agent should handle ambiguous or implicit decisions that the human never explicitly articulated.
Please continue from here.
Document generated: April 2026
Conversation participants: Human, Claude Sonnet (Anthropic)