# Backwards Compatible for Life
Chris Sawyer wrote RollerCoaster Tycoon almost entirely in x86 assembly. The original shipped in 1999, the sequel in 2002, both handcrafted at the machine level. The games captured something that resonated deeply enough that, starting in 2014, a community of volunteers began reverse-engineering RollerCoaster Tycoon 2 and rebuilding it for modern platforms. Over eleven years and nearly 30,000 commits later, the project, OpenRCT2, is still active.<sup><a href="#cite-1" id="ref-1">[1]</a></sup>

Thirty thousand commits to move one game across one platform shift. And RollerCoaster Tycoon is the exception. It earned that effort through nostalgia and devotion. Most software from that era is simply gone. Not because it was bad, but because the cost of preservation does not scale. Human passion is real, but it is finite. It cannot rescue every program locked into a specific implementation at a specific moment in time.
Now think about programs written in Python today. Or TypeScript. Or Rust. They are readable, expressive, well-designed. But at some point, they will feel like the wrong level of abstraction, the way COBOL feels today. Not unusable, but no longer where you would start a new project. Not because the languages got worse, but because the assumptions they were built on stopped being true.
We have seen this pattern before. The question is what survives the transition.
## The Pattern
There is a recurring pattern in computing that we recognize in retrospect but rarely see in advance: every time we move up an abstraction layer, the layer below becomes an implementation detail.
Machine code was the program. Then assembly arrived, and machine code became an artifact the assembler produced. Assembly was the program. Then C arrived, and assembly became an artifact the compiler produced. Each transition felt radical at the time and obvious afterward.
I have [written before](/just-give-us-the-prompt.md) about the latest extension of this chain. The compilation stack keeps growing:
```
intent → prompt → source code → executable
```
Each step is a kind of compilation. Lossy, interpretive, but increasingly automated. And each time, the abstraction above becomes the one that matters most. When someone shares a GitHub repo of AI-generated code, the first reply is often not "how does it work" but "what was the prompt." The prompt is more reusable than the code. The code solves one problem. The prompt solves a class of problems.
Source code is currently where we live. But if AI agents are writing the code, and if those agents keep improving, then source code is heading for the same fate as assembly. Not gone. Not useless. But no longer the artifact that matters.
So what is the artifact that matters?
## Two Artifacts
I think a program's intent can be approximated well enough for practical regeneration by two things.
The first is comprehensive tests. Tests encode behavior. They say: given this input, produce this output. Given this edge case, handle it this way. When the user does this, the system does that. The right kinds of tests (integration tests, behavioral specs, end-to-end tests against a running system) describe what a program does without caring how it does it. They are implementation-independent in a way that unit tests, which tend to mirror the code's internal structure, are not. A test suite built for regeneration looks different from a test suite built for refactoring.
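To make that concrete, here is a minimal sketch of a regeneration-oriented test in Python. The program under test is treated strictly as a black box behind a command line; `sort` stands in for it here only so the sketch actually runs. Nothing in the assertions touches the implementation, so the same suite would verify a regenerated version in any language:

```python
import subprocess

def behavior(argv, stdin):
    """Exercise the program under test as a black box: only its
    command-line contract is visible, never its internals."""
    r = subprocess.run(argv, capture_output=True, text=True, input=stdin)
    return r.stdout, r.returncode

# Given this input, produce this output.
out, code = behavior(["sort"], "banana\napple\ncherry\n")
assert code == 0
assert out == "apple\nbanana\ncherry\n"

# Given this edge case, handle it this way: empty input is not an error.
out, code = behavior(["sort"], "")
assert code == 0
assert out == ""
```

Because the suite never imports the implementation, regenerating the program in a different language or on a different platform changes nothing about the tests. They either pass or they do not.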
The second is a supporting prompt. Some intent is hard to encode in tests. Architectural preferences. Performance characteristics. Subtle UX decisions. "Use a service-oriented architecture." "Optimize for cold start latency." "The hover animation should feel liquid, not mechanical." A prompt captures the intent that tests leave ambiguous. It fills in the how and the why that tests cannot express.
Together, these two things form a practical intent specification. Not a perfect one. Tests are finite samples of a program's behavior, not a complete description. Prompts are natural language, inherently ambiguous. Neither is airtight. But together they capture enough that a sufficiently capable AI can regenerate the codebase from scratch. Not edit it. Not patch it. Regenerate it. The tests verify the result. The prompt guides the generation. The code is the compiled output.
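As a purely hypothetical sketch, a repository organized around this idea might look like the following; the names are invented for illustration, not drawn from any existing tool:

```
spec/
  prompt.md           # architectural, performance, and UX intent
  tests/
    test_behavior.py  # black-box assertions: inputs, outputs, edge cases
src/                  # compiled output: regenerated, never hand-edited
```

Here `spec/` is the source and `src/` is a build artifact, the same relationship an executable has to its C source today.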
The formal methods community has been working on this problem for decades: specify what a program should do, then derive or verify the implementation. TLA+, Alloy, Coq. The tools are powerful and the results are rigorous, but adoption never scaled because the cost of writing a formal specification often rivaled the cost of writing the program. What AI changes is not the idea but the economics. The specification does not need to be mathematically complete. It needs to be good enough that regeneration is cheaper than preservation. That is a lower bar, and it keeps getting easier to clear as models improve.
Code is the work. Code is the craft. The idea that it becomes ephemeral feels like a dismissal of the discipline. But people valued hand-tuned assembly too. They were right to value it, and they were right to eventually move past it. The craft did not disappear. It moved up a layer.
## What This Gets You
**Target independence.** Want to switch from Python to Rust? macOS to Linux? ARM to RISC-V? Regenerate from the same tests and prompt. The tests still pass. The prompt still applies. You have the same program on a new target, not because someone ported it line by line, but because the AI compiled from intent to a different output. The way a C program can target x86 or ARM without rewriting the source. The intent specification does not know or care about the destination.
**Riding the improvement curve.** This is the implication that surprised me most. The same prompts that generate a codebase today [generate a better one next year](/just-give-us-the-prompt.md) as models improve. But extend that further: the same tests and prompt will eventually produce code that takes advantage of language features, frameworks, and paradigms that do not exist yet. Every other program falls behind the curve. A regenerable program rides it. You do not modernize. You regenerate, and modernization is a side effect.
**Surviving eras.** An x86 assembly program is trapped in the late 1990s. A Python program written today will feel trapped in the 2020s eventually. Every implementation is a snapshot of the tools and conventions of its moment. But a program defined by its tests and prompt regenerates into whatever the current era demands. The intent outlasts the implementation.
This is what I mean by backwards compatible for life. Not compatibility through preservation. Compatibility through regeneration.
## Thirty Thousand Commits
Go back to RollerCoaster Tycoon.

Sawyer's assembly code was a feat of engineering. The OpenRCT2 team's port was equally heroic. Thirty thousand commits to reverse-engineer the game's intent from its implementation, piece by piece, instruction by instruction, and rewrite it in C++. They had to excavate the what from the how. Every function they ported was an act of archaeology, inferring intent from raw instructions that were never designed to communicate intent to anyone.
That is the cost when the implementation is the only artifact. When intent is encoded nowhere except in the code itself, recovering that intent is brutally expensive.
Most software does not get that effort. Most software from the assembly era is gone. The utilities, the business tools, the games that were not beloved enough. Their intent died with their implementation because the two were inseparable.
I see the same pattern forming with AI-generated code. Claude generates a codebase for you. It is coherent, it works, it is well-structured. But six months later, you want to change the architecture. The AI has to understand thousands of lines of existing code, work around decisions made in a different context, patch instead of design. The generated code has crystallized. It resists change the same way Sawyer's assembly resisted change. Not because it is bad code, but because every implementation accumulates hardcoded assumptions, structural decisions, and framework-specific patterns that make it harder to modify than to rewrite.
The difference now: rewriting is cheap. AI agents have unlimited patience. They will regenerate a codebase as many times as you ask. The cost of generation is approaching zero. The bottleneck was always [verification](/verification-is-the-bottleneck.md), making sure the regenerated code does what you intended. And that is exactly what the tests provide.
Writing a comprehensive test suite for a game like RollerCoaster Tycoon would itself be an enormous effort. The feel of a coaster swooping through a loop, the emergent behavior of park guests, the way the camera moves. These are not easy to pin down in assertions. A test suite for a complex system approaches the complexity of the system.
But here is the thing: the OpenRCT2 team had to do that work anyway. They had to reverse-engineer every mechanic, every edge case, every implicit behavior. They just had to do it from raw assembly with no specification to guide them. If that effort had been invested in tests and a design document instead, they would have ended up with an artifact that could regenerate the game on any future platform, not just the one they happened to port to.
The difficulty does not disappear. It relocates. The question is whether you invest that difficulty in an artifact that works once or an artifact that works forever.
## Why the Implementation Will Keep Changing
The implementation layer itself is unstable.
I [run multiple AI agents in parallel](/one-repo-many-agents.md) on my codebases daily. Watching them work, you notice the friction. Agents diverge on style because languages offer too many equivalent ways to do the same thing. They fight sequential assumptions because they see dependency graphs but the language forces step-by-step execution. They scatter error handling across try-catch blocks and then struggle to reason about the combined behavior when they revisit the code. They are working around the languages, not with them.
These are things I see in diffs every day. And they suggest the languages agents write in will change. When they do, every codebase written in today's languages will face the same question that assembly codebases faced when C arrived. Do you port, or do you start fresh?
If your program is defined by tests and a prompt, the answer is obvious. You regenerate. The language changed. The intent did not.
We have already gone from [writing code to directing its generation](/code-archaeologists.md). The next shift is from directing generation to specifying intent and letting generation happen automatically, repeatedly, across changing platforms and changing languages.
## Why Now
At Anthropic, about 90% of Claude Code's own codebase is now written by Claude Code. The tool builds itself.<sup><a href="#cite-3" id="ref-3">[3]</a></sup> At Microsoft, AI writes roughly 30% of all code. At Google, over 25%.<sup><a href="#cite-4" id="ref-4">[4]</a></sup> These are not research demos. These are production codebases at the largest software companies in the world.
The benchmarks tell the same story. When SWE-bench was introduced in October 2023, the best AI approach resolved 2% of real-world GitHub issues. By February 2026, frontier models resolve over 80%, and not just one model: Claude, GPT-5, Gemini, MiniMax, and several open-source systems all cluster above 75%.<sup><a href="#cite-2" id="ref-2">[2]</a></sup> A 40x improvement in 27 months, across the entire field. SWE-bench measures isolated bug fixes, not full codebase generation. But the gap between "fix a bug in an existing repo" and "generate a working system from a specification" is closing fast.
None of this means regeneration is ready today. Statefulness alone may be a decade-hard problem. A program is not just its behavior. It is its database schema, its migration history, its integration contracts with external services. Years of schema migrations, data format decisions that downstream systems depend on, implicit contracts with third-party APIs that break if the interface changes. Regenerating the codebase does not regenerate any of that. The intent specification for a stateful system would need to encode not just what the program does today, but every evolutionary step that brought it here. Most real-world software is stateful, which means the framework described here applies cleanly to a minority of programs today.
But that minority is where the proof of concept will come from. And even for the easier cases, the tooling barely exists. We have test frameworks. We have prompt engineering. What we do not have is a unified system that treats tests and prompts together as the canonical source, with code as the compiled output.
Think about what version control did for code. Before git, coordinating changes across a team was a manual process full of lost work and conflicting edits. Git made code collaboration tractable by treating source files as the versioned artifact. We need the equivalent for intent specifications: a system that versions tests and prompts together, diffs changes to intent the way git diffs changes to code, and makes "regenerate from spec" as routine as "build from source." The first team to build this will prove the concept or expose where it breaks.
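Sketched as a hypothetical workflow (the `regen` command is invented for illustration; building something like it is exactly the gap this section describes):

```
$EDITOR spec/prompt.md spec/tests/   # change the intent, not the code
git commit -am "Intent: retry flaky uploads up to three times"

regen build                # hypothetical: regenerate src/ from spec/
pytest spec/tests/         # verify the regenerated code against the suite
git diff HEAD~1 -- spec/   # review the change as a diff to intent
```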
This is the infrastructure that needs to be built. Not a better programming language. Not a better IDE. A system that makes intent the first-class artifact, code the ephemeral one, and regeneration the default workflow.
Every previous abstraction layer (assembly to C, C to Python, Python to prompts) reshaped what was possible to build and who could build it. The next one will too.
---
It took 30,000 commits to port one game across one platform shift. Imagine the next shift, and the one after that. The programs we write today will face the same question: does someone care enough to port this?
Or it regenerates.
Backwards compatible for life.
## Citations
<p id="cite-1">[1] <a href="https://github.com/OpenRCT2/OpenRCT2" target="_blank" rel="noopener noreferrer">OpenRCT2</a> — An open-source re-implementation of RollerCoaster Tycoon 2 <a href="#ref-1">↩</a></p>
<p id="cite-2">[2] <a href="https://epoch.ai/benchmarks/swe-bench-verified" target="_blank" rel="noopener noreferrer">SWE-bench Verified Leaderboard</a> — Epoch AI, February 2026 <a href="#ref-2">↩</a></p>
<p id="cite-3">[3] <a href="https://fortune.com/2026/01/29/100-percent-of-code-at-anthropic-and-openai-is-now-ai-written-boris-cherny-roon/" target="_blank" rel="noopener noreferrer">Top engineers at Anthropic, OpenAI say AI now writes 100% of their code</a> — Fortune, January 2026 <a href="#ref-3">↩</a></p>
<p id="cite-4">[4] <a href="https://www.technologyreview.com/2026/01/12/1130027/generative-coding-ai-software-2026-breakthrough-technology/" target="_blank" rel="noopener noreferrer">Generative coding — 10 Breakthrough Technologies 2026</a> — MIT Technology Review, January 2026 <a href="#ref-4">↩</a></p>