# 🤓 Verification Is the Bottleneck
I'm working on a feature that prevents PlayStation controllers from going to sleep. To test it, I have to wait 15 minutes—that's how long a DualSense takes to auto-sleep by default. If my fix doesn't work, I tweak the code and wait another 15 minutes.
The code itself took 30 seconds to write. Verifying it works takes hours.
This is the new bottleneck. AI has made writing code almost instant. But verifying that code actually works? That's still bounded by real-world time.
## The Inversion
There's a concept in computer science called NP, where problems are hard to solve but easy to verify. Given a proposed solution, you can quickly check if it's correct—even if finding that solution would take exponential time. The classic example: factoring large numbers is hard, but multiplying two factors together to verify is trivial.
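To make the asymmetry concrete, here's a toy sketch (illustrative Swift, nothing more): checking a claimed factorization is a single multiplication, while finding one means searching.

```swift
// Verifying a claimed factorization is one multiplication.
func verify(_ n: Int, _ p: Int, _ q: Int) -> Bool {
    p > 1 && q > 1 && p * q == n
}

// Finding the factors means searching. Trial division is the naive
// version; for cryptographic-sized numbers the search is infeasible.
func factor(_ n: Int) -> (Int, Int)? {
    var p = 2
    while p * p <= n {
        if n % p == 0 { return (p, n / p) }
        p += 1
    }
    return nil  // n is prime
}

print(verify(15_485_867 * 15_485_863, 15_485_867, 15_485_863))  // true, instantly
```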
We're living through an inversion of this pattern.
With AI, generating solutions is now trivial. You describe what you want, and working code appears. The hard part has become verification—confirming the code actually does what it's supposed to do in the real world.
And unlike computational verification, real-world verification has time constants that can't be optimized away:
- Testing a sleep-prevention feature requires waiting for the sleep timeout
- Verifying a bug fix requires reproducing the bug's trigger conditions
- Confirming a race condition is fixed requires running the code enough times (see the sketch below for what "enough" works out to)
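For the probabilistic case you can at least budget the waiting. A back-of-the-envelope sketch (my own arithmetic, not any particular framework's): if the race fires with probability p per run, then after n clean runs the chance an unfixed bug went unnoticed is (1 - p)^n.

```swift
import Foundation

// After n failure-free runs, the probability that an unfixed bug firing
// with per-run probability p went unseen is (1 - p)^n.
// Solve (1 - p)^n <= alpha for the number of runs needed.
func runsNeeded(failureRate p: Double, residualRisk alpha: Double) -> Int {
    Int(ceil(log(alpha) / log(1 - p)))
}

// A race that fired in ~1 of 20 runs needs 59 clean runs before the
// chance it slipped past you drops below 5%.
print(runsNeeded(failureRate: 0.05, residualRisk: 0.05))  // 59
```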
## The CodexBar Problem
There's an app called [CodexBar](https://github.com/steipete/CodexBar) that shows your token usage across AI platforms—Claude, OpenAI, Gemini. It's genuinely useful, and I have it running in my menu bar constantly.
It has one minor issue: every few hours, it prompts you to re-enter your keychain password. I've tried fixing this bug twice as a contribution back to the project.
Here's the problem: to verify any fix, you have to wait those few hours. You make a change, rebuild, wait three hours, and see if the prompt appears. If it does, you try something else and wait another three hours.
The code change takes minutes. Verification takes most of a workday. And you can't easily parallelize it—you need to see if this specific change fixed this specific bug.
## You Are the Oracle
In complexity theory, an "oracle" is a black box that can instantly answer certain questions. Algorithms can query the oracle and get immediate yes/no responses. It's a theoretical construct for reasoning about computational limits.
Now think about the AI coding workflow:
1. You describe what you want
2. AI generates code instantly
3. You verify if it works
4. Repeat until correct
In this loop, you are the oracle. The AI is the generator—fast, cheap, unlimited. You are the verifier—slow, expensive, the bottleneck.
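Stripped to a skeleton, the loop looks something like this (schematic Swift; `generate` and `verify` are stand-ins for the AI and for you, not a real API):

```swift
// Each iteration pays near-zero generation cost and full verification
// cost, so total time is dominated by the verifier: you.
func buildFeature(_ spec: String,
                  generate: (String) -> String,
                  verify: (String) async -> Bool) async -> String {
    while true {
        let candidate = generate(spec)   // seconds, and parallelizable
        if await verify(candidate) {     // minutes to hours, and serial
            return candidate
        }
    }
}
```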
Except unlike theoretical oracles, you can't answer instantly. Your verification process is bounded by:
- Physical time (waiting for timeouts, user behavior, external events)
- Human attention (actually testing the feature, noticing edge cases)
- Environment constraints (needing specific hardware, network conditions, user accounts)
I often have multiple terminal tabs open, each waiting on a different verification. The AI generates solutions faster than I can verify them.
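The best you can do is let independent verifications wait concurrently, like those terminal tabs. A schematic sketch:

```swift
// Independent verifications can at least wait at the same time.
// Each closure is one experiment; results come back in input order.
func runVerifications(_ checks: [@Sendable () async -> Bool]) async -> [Bool] {
    await withTaskGroup(of: (Int, Bool).self) { group in
        for (i, check) in checks.enumerated() {
            group.addTask { (i, await check()) }
        }
        var results = [Bool](repeating: false, count: checks.count)
        for await (i, ok) in group { results[i] = ok }
        return results
    }
}
```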
## Where This Matters Most
The verification bottleneck is most painful when:
**The feedback loop is long.** Sleep timeouts, session expirations, rate limit resets. Anything measured in minutes or hours rather than milliseconds.

**The trigger is probabilistic.** Race conditions, intermittent failures, bugs that only appear under specific load patterns. You might need dozens of runs to see the failure.

**The environment is hard to simulate.** Hardware-specific behavior, third-party API quirks, platform-specific bugs. You can't mock your way to confidence.

**The state is hard to reproduce.** Bugs that only appear after hours of use, with specific data patterns, in accounts that have evolved over time.
## Implications
If verification is the bottleneck, a few things follow:
**Test automation becomes even more valuable.** Not because it catches bugs faster—because it verifies fixes faster. The ROI on good tests increases when generation costs drop to zero.
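One concrete version, sketched with hypothetical types (not the real controller code): make the slow constant injectable, so a test can exercise the 15-minute timeout logic without waiting 15 minutes.

```swift
import Foundation

// Hypothetical sketch: inject the time source so tests can fast-forward
// through the idle window instead of waiting it out in real time.
protocol TimeSource {
    var now: Date { get }
}

struct SystemTime: TimeSource {
    var now: Date { Date() }
}

final class FakeTime: TimeSource {
    var now = Date()
    func advance(by seconds: TimeInterval) {
        now = now.addingTimeInterval(seconds)
    }
}

struct SleepGuard {
    let time: any TimeSource
    let timeout: TimeInterval   // 15 * 60 in production
    let lastActivity: Date

    // True once the idle window has elapsed and a keep-alive is due.
    func keepAliveDue() -> Bool {
        time.now.timeIntervalSince(lastActivity) >= timeout
    }
}

// The test verifies the timeout logic in milliseconds, not 15 minutes.
let time = FakeTime()
let sleepGuard = SleepGuard(time: time, timeout: 15 * 60, lastActivity: time.now)
time.advance(by: 14 * 60)
assert(!sleepGuard.keepAliveDue())   // not yet
time.advance(by: 61)
assert(sleepGuard.keepAliveDue())    // past the window: nudge the pad
```

This only compresses the part of verification you own; the console's real timer still gets the final say, which is exactly the part that stays slow.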
**Observability matters more than debugging.** When you can regenerate code instantly but can't speed up verification, you want to maximize the information gained from each verification cycle. Better logs, better metrics, better error messages.
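For the keychain prompt, that might look like this (a sketch with made-up names, using Apple's os.Logger): record every suspect variable the moment the prompt fires, so one three-hour cycle can rule several hypotheses in or out.

```swift
import Foundation
import os

// Hypothetical sketch: when the unwanted prompt appears, log everything
// that might explain it, so a single occurrence tests many hypotheses.
let keychainLog = Logger(subsystem: "dev.example.codexbar", category: "keychain")

func recordKeychainPrompt(status: OSStatus,
                          secondsSinceLaunch: TimeInterval,
                          secondsSinceLastRead: TimeInterval) {
    keychainLog.error("""
        keychain prompt fired: status=\(status) \
        sinceLaunch=\(secondsSinceLaunch)s sinceLastRead=\(secondsSinceLastRead)s
        """)
}
```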
**Reproducer quality is crucial.** A bug report that says "the app crashes sometimes" doesn't help much. A bug report with exact steps—"open a 4K video, scrub to 2:30, then undo twice"—lets you verify a fix in seconds instead of hours of random testing.
## The Asymmetry
We've built tools that can generate code at superhuman speed. But we haven't built tools that can verify code at superhuman speed—at least not for behaviors that depend on time, environment, and human interaction.
Until we do, the human remains the bottleneck. The oracle that can't answer any faster than reality allows.
---
The irony isn't lost on me: I wrote this post while waiting for my controller sleep-prevention code to be verified. Fifteen minutes until I know if it works.
Update: I never figured it out. If you know how to prevent a DualSense controller from auto-sleeping, [let me know on X](https://x.com/_KevinTang).