# 10 Bits Per Second


I bought a [DJI Mic Mini](https://www.amazon.com/dp/B0FQJH54TR?tag=kevindotmd-20) last week. It's a wireless lavalier microphone—the kind YouTubers clip to their shirts. It has a tiny transmitter that weighs 10 grams, a receiver with a USB-C adapter, and 400 meters of wireless range. It was designed for content creators filming vlogs.
I'm using it to write code.
If you're not already familiar with the shift: since late 2025, AI coding agents have fundamentally changed how software gets built. Tools like Anthropic's [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and OpenAI's [Codex](https://openai.com/index/introducing-codex/) live in your terminal, read your codebase, write code, run tests, and commit to git—autonomously.
Andrej Karpathy coined the term "vibe coding" in February 2025<sup><a href="#cite-4" id="ref-4">[4]</a></sup>, describing a workflow where you speak your intent and the AI handles the implementation. By the end of the year, "vibe coding" was Collins Dictionary's Word of the Year, Claude Code alone was responsible for 4% of all public GitHub commits and growing fast<sup><a href="#cite-5" id="ref-5">[5]</a></sup>, and three companies—GitHub Copilot, Claude Code, and Cursor—had each crossed $1 billion in annual revenue.

I got the [two-pack](https://www.amazon.com/dp/B0FQJH54TR?tag=kevindotmd-20) (2 transmitters + 1 receiver) for $59. It comes with windscreens, magnetic clips for attaching to your shirt, a charging dock, a USB-C splitter cable so you can charge both mics at the same time, and a carrying pouch. The receiver is tiny—just a small USB-C dongle that plugs directly into my laptop. Having two transmitters means I can swap one in while the other charges, though with 11.5 hours of battery life per transmitter I haven't actually needed to swap yet. When I'm not coding, the same USB-C receiver plugs into my phone for recording content.
## The Setup

I pair it once, and then I just talk. It doesn't matter if I stand up to stretch, walk to the kitchen to get water, or pace around while thinking through an architecture problem. The mic picks up my voice clearly from anywhere because it's physically attached to me.
I use [VoiceInk](https://tryvoiceink.com/), an open source speech-to-text app, for transcription. Here are my stats after a few months:

That's 609,430 keystrokes I didn't have to type.
Before the DJI Mic Mini, I was using my MacBook's built-in microphone. It worked fine if I was sitting right in front of my laptop, enunciating clearly, speaking in its direction. But as soon as I stood up and walked across the room, the transcription quality fell off a cliff.
Another issue was volume. If you're talking to AI all day, you don't want to be projecting your voice toward a laptop across the room. That's a recipe for vocal strain. You'd think the solution is to just speak quietly, but there's a floor to how quiet you can go before the laptop mic can't pick you up. And going all the way down to a whisper is actually worse for you—otolaryngologists have found that whispering can cause *more* stress on your vocal cords than normal speech, because many people tighten the muscles around the voice box to compensate<sup><a href="#cite-3" id="ref-3">[3]</a></sup>.
A lavalier mic solves both problems. The vocal strain one is obvious—it's inches from your mouth, so I can talk at a quiet indoor voice without projecting. But the accuracy improvement is just as important. Transcription models like Whisper and Parakeet can make errors, but those models are only as good as their input audio. A laptop mic across the room picks up fan noise, room reverb, and a quieter voice signal. A mic clipped to your chest picks up a clean, close-range signal every time. Better source data means fewer transcription errors, which means less friction, which means you actually stay in the voice workflow instead of getting frustrated and reaching for the keyboard.
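The "cleaner signal" argument can be made concrete with a little level arithmetic. This is an illustrative sketch with made-up numbers, not measurements from my setup: signal-to-noise ratio is the voice level over the noise level in decibels, and moving the capsule close to your mouth raises the voice term without touching the room's noise floor.

```python
import math

def rms(samples: list[float]) -> float:
    """Root-mean-square level of a waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(voice_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 20 * math.log10(voice_rms / noise_rms)

# Illustrative levels (normalized, full scale = 1.0), not measurements:
noise = 0.01          # room noise floor picked up by either mic
lav_voice = 0.30      # voice a few inches from a chest-mounted capsule
laptop_voice = 0.03   # the same voice from across the room

print(f"lav mic SNR:    {snr_db(lav_voice, noise):.1f} dB")    # ~29.5 dB
print(f"laptop mic SNR: {snr_db(laptop_voice, noise):.1f} dB")  # ~9.5 dB
```

Every extra decibel of SNR is headroom the transcription model doesn't have to spend separating your voice from the room.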
I'm running this through a speech-to-text tool that feeds directly into Claude Code. I describe what I want, the agent builds it, I look at the result on my phone, I describe what's wrong, and the agent fixes it. The entire feedback loop is voice-driven. My hands never touch the keyboard—not even for approving tool calls. I handle those with [ControllerKeys](https://kevintang.xyz/apps/controller-keys/), an app I built that lets me control macOS entirely with an Xbox controller. I wrote about it in [Every Shortcut Within Reach](/every-shortcut-within-reach.md). Between the mic and the controller, my keyboard is essentially a decoration.
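My actual pipeline is VoiceInk feeding Claude Code, but the shape of the loop is simple enough to sketch. Here's a hypothetical minimal version, assuming the open-source `whisper` package for transcription and a coding-agent CLI that accepts a one-shot prompt (the command name and `-p` flag are placeholders for whatever agent you run, not my exact configuration):

```python
# Hypothetical sketch of a voice -> transcription -> coding-agent loop.
import subprocess
from typing import Callable

def transcribe_with_whisper(audio_path: str) -> str:
    """Transcribe a clip with the open-source whisper package (assumed installed)."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model("base")  # small, fast model
    return model.transcribe(audio_path)["text"].strip()

def dictate_to_agent(
    audio_path: str,
    transcribe: Callable[[str], str] = transcribe_with_whisper,
    agent_cmd: str = "claude",
) -> str:
    """Turn a recorded clip into a one-shot prompt for a coding-agent CLI."""
    prompt = transcribe(audio_path)
    # Hand the transcript to the agent as a single non-interactive prompt.
    result = subprocess.run(
        [agent_cmd, "-p", prompt], capture_output=True, text=True
    )
    return result.stdout

# dictate_to_agent("clip.wav")  # e.g. "fix the sidebar overlap on mobile"
```

The transcriber is injectable so you can swap in VoiceInk, Parakeet, or anything else that turns audio into text; the agent end is just a subprocess call.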
## 10 Bits Per Second
In 2024, Caltech researchers published a paper called "The Unbearable Slowness of Being"<sup><a href="#cite-1" id="ref-1">[1]</a></sup> that quantified something neuroscientists had suspected for decades: conscious human thought runs at approximately 10 bits per second.
Ten. Bits. Per second.
Your eyes take in about a billion bits per second. Your ears, your skin, your proprioceptive system—all of it operates at enormous bandwidth. But the conscious part of your brain, the part that makes decisions and forms intentions and chooses what to do next, operates at a rate that would embarrass a 1970s modem.
This isn't a metaphor. The researchers surveyed decades of studies across wildly different tasks—typing, speaking, solving Rubik's Cubes, playing video games, reading—and found the same speed limit everywhere. About 10 bits per second of behavioral output, regardless of the task. Our sensory systems take in billions of bits per second, but somehow, at the level of conscious thought, we bottleneck to a trickle.
The implication is startling: **no input device can ever be faster than you can think.**
## The Bandwidth Hierarchy
Here's how our current input methods stack up:
| Method | Words Per Minute |
|:-------------------------------|----------------:|
| Eye tracking | ~20 WPM |
| Neuralink BCI (current best) | ~40 WPM |
| Typing (average developer) | ~54 WPM |
| Typing (fast developer) | ~100 WPM |
| Natural speech | ~100–150 WPM |
Voice wins. Not by a little—by a lot compared to average typing, and roughly tied with the fastest typists.
But here's the thing that surprised me: Neuralink, the brain-computer interface that is supposed to be the future of human-computer interaction, currently tops out at 40 words per minute<sup><a href="#cite-2" id="ref-2">[2]</a></sup>. That's slower than the average developer types. Their ambitious goal with the VOICE clinical trial is 140 WPM—which is just... normal talking speed.
The sci-fi dream of thinking commands directly into a computer runs straight into the Caltech wall. Even if you could read neural signals perfectly, the conscious thoughts generating those signals only produce 10 bits per second. The bottleneck was never the interface. It was always the brain.
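The WPM figures above can be sanity-checked against the 10 bits/s number with back-of-envelope information theory. Assuming Shannon's classic estimate of roughly 1 bit of entropy per character of English and an average word length of about 5 characters (both approximations):

```python
# Convert words-per-minute rates into rough information rates.
# Assumptions: ~1 bit of entropy per English character (Shannon's
# classic estimate) and ~5 characters per average word.
BITS_PER_CHAR = 1.0
CHARS_PER_WORD = 5.0

def bits_per_second(wpm: float) -> float:
    """Approximate information rate of an input method, in bits/s."""
    return wpm * CHARS_PER_WORD * BITS_PER_CHAR / 60.0

for name, wpm in [
    ("Eye tracking", 20),
    ("Neuralink BCI", 40),
    ("Typing (average)", 54),
    ("Typing (fast)", 100),
    ("Natural speech", 150),
]:
    print(f"{name:<17} {wpm:>3} WPM  ~{bits_per_second(wpm):.1f} bits/s")
```

Natural speech at 150 WPM works out to about 12.5 bits/s—right around the ceiling the Caltech paper measures. Speech runs at roughly the rate conscious thought can emit anyway.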
## Why Voice Beats Typing for Programming
When I say "voice is faster," I don't just mean words per minute. I mean the full loop.
**You can talk while looking at something else.** Say you're working on a web app. You navigate to the page in your browser, start talking: "the sidebar is overlapping the main content, and the nav links are wrapping to two lines." Then you pick up your phone, pull up the same page on mobile, and keep going: "on mobile the hamburger menu isn't opening, and the hero image is way too tall." You put your phone down, switch back to your terminal, end the transcription, and it's all there as one prompt. You stayed in the same flow the entire time—looking at the thing, talking about the thing, never breaking out of that observational mode to sit down and translate your thoughts into typed text.
**You can talk while thinking.** This sounds obvious but it's genuinely different from typing. When I type, I think first, then type. There's a serialization step—I formulate the thought, then I encode it into keystrokes. When I speak, the thought and the expression happen almost simultaneously. The bandwidth of speech is close enough to the bandwidth of thought that there's barely any buffering.
**Errors don't matter.** I wrote about this in [Misheard Lyrics for Robots](/misheard-lyrics-for-robots.md)—I said "run make install build from source" and my transcription software heard "Ryan Lacon stall book for source." Claude Code ran `make install BUILD_FROM_SOURCE=1` anyway. When your listener is an LLM, transcription errors are just noise that gets filtered out. The error tolerance of natural language is orders of magnitude higher than the error tolerance of a keyboard.
**You can move.** The DJI Mic Mini on my shirt handles voice input. The DualSense controller in my hand handles everything else—approving tool calls, switching windows, scrolling, navigating—via [ControllerKeys](https://kevintang.xyz/apps/controller-keys/). Together, they make me completely untethered. I live in a small unit and I haven't had a desk in about three years. My back was starting to hurt from hunching over a laptop on the couch, which is part of why I built ControllerKeys in the first place—I needed a way to work that didn't chain me to one position. Now I can stand up, walk to the kitchen, pace around while thinking through a problem, and keep working the entire time. The only limit I've found is wireless range—thick panes of glass can cut the signal short, but barring that, I can work from anywhere in my apartment.
## The Real Unlock: Compression of Intent
Here's the argument that goes beyond raw words-per-minute.
AI coding agents have compressed programming from writing code to describing intent. Instead of typing 47 lines of Swift to implement a camera animation, you say "make the camera do a cinematic swoop into the photo when you tap it" and the agent writes the Bézier math for you. You could type that sentence too — but once programming becomes a series of short, conversational exchanges, voice is the natural medium. You're not writing code anymore. You're just talking. And talking is what voice was literally designed for.
The advantages compound from there. You talk while staring at the bug on your phone. You talk while pacing through an architecture problem. You talk while the agent is still finishing its last task, queueing up your next thought. There's no context-switch to the keyboard, no breaking out of the flow to sit down and type. The conversation just keeps going.
## A Content Creator Accessory Is Now a Programming Tool
The DJI Mic Mini costs $59. It was built for people who make YouTube videos and TikToks. The product page shows influencers filming themselves cooking and traveling.
I'm using it to debug RealityKit coordinate transforms.
There's something funny about the fact that the highest-bandwidth programming peripheral you can buy in 2026 isn't a mechanical keyboard or an ergonomic split board—it's a lavalier microphone originally designed for vloggers. The tool categories are converging. Content creation and software engineering now share the same input device because they share the same upstream constraint: getting human intent into a computer as fast as possible.
I think five years from now — assuming on-device mics don't get significantly better — a wireless mic will be as standard in a developer's kit as a second monitor. Not because everyone will be recording themselves—but because talking to your AI agent is faster than typing to it, and a good mic is the difference between "works okay" and "works every time."
## The 10-Bit Ceiling
The Caltech paper ends with a question that nobody has answered: *why* is human conscious thought so slow? We have 86 billion neurons, each capable of transmitting hundreds of bits per second, yet we think one thought at a time at 10 bits per second. The researchers suggest we're limited not by hardware but by some deep architectural constraint—perhaps the brain can only maintain one coherent "thread" of consciousness at a time.
If that's true, then the optimizations we should be chasing aren't about faster interfaces. They're about richer compression—making each of those 10 bits count for more. And that's exactly what AI does. It takes a low-bandwidth, noisy, sometimes garbled human signal and reconstructs the high-bandwidth intent behind it.
Voice input is already close to saturating our conscious output bandwidth. The next frontier isn't a faster pipe from brain to computer. It's a smarter decoder on the other end.
## Citations
<p id="cite-1">[1] <a href="https://arxiv.org/html/2408.10234v2" target="_blank" rel="noopener noreferrer">The Unbearable Slowness of Being: Why do we live at 10 bits/s?</a> â Zheng & Meister, Neuron (2024) <a href="#ref-1">â©</a></p>
<p id="cite-2">[2] <a href="https://teslanorth.com/2026/01/28/from-paralysis-to-neuroscience-how-21-people-are-using-neuralink-in-2026/" target="_blank" rel="noopener noreferrer">From Paralysis to Neuroscience: How 21 People are Using Neuralink in 2026</a> â TeslaNorth <a href="#ref-2">â©</a></p>
<p id="cite-3">[3] <a href="https://pubmed.ncbi.nlm.nih.gov/16503476/" target="_blank" rel="noopener noreferrer">Laryngeal hyperfunction during whispering: reality or myth?</a> â Journal of Voice (2006) <a href="#ref-3">â©</a></p>
<p id="cite-4">[4] <a href="https://x.com/karpathy/status/1886192184808149383" target="_blank" rel="noopener noreferrer">Andrej Karpathy on "vibe coding"</a> â X/Twitter (2025) <a href="#ref-4">â©</a></p>
<p id="cite-5">[5] <a href="https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point" target="_blank" rel="noopener noreferrer">Claude Code is the Inflection Point</a> â SemiAnalysis (2026) <a href="#ref-5">â©</a></p>