The CEO Gap — Steadman

20th June 2026

Every answer an AI gives you sits some distance from the ‘correct’ one. Close that distance and you can trust the work. Leave it open and you can't. The CEO principle (Check, Edit, Own) is how we cross that gap by hand today. This is a picture of the gap: how big it is, what it costs to close, and why the checking it needs keeps shrinking. The interesting question is whether it ever reaches zero, and for which tasks.

The four bands

Not good enough. A wrong or off-target answer. Rare now on a good tool: usually a vague prompt, not a model mistake. A product defect to engineer away.

Good first draft. In the right area, but it absolutely needs the full CEO. Cheap to get, expensive to trust as-is.

Good enough to send. Might carry the odd human-level mistake, the kind you'd make yourself. Light CEO, or just press send.

Verified. Mechanically known to be right: a calculation run in code, a quote checked against the source. No reason left to check.

Closer and closer to the answer

The AI's response climbs toward the correct answer as you put in more effort and spend. The space left above the line is the CEO gap: how much checking, editing and owning the work still needs. Read left to right as a journey: the gap is wide while the answer is not good enough, narrower once it's a good first draft, and by the verified band the curve meets the line and the gap is gone.

A better AI (model and harness) lifts the whole curve: the same effort carries you further along the bands. Where you choose to stop is set by what the task is worth, not by where the curve could reach. The curve flattens fast, and only in the verified band, where the answer can be mechanically checked, does it actually meet the line. Short of that, a last sliver of doubt remains: the hardest, and often the most expensive, to remove.

Aim for the right band, on purpose

Which band you're aiming for is a decision you make before you start, and it changes from task to task.

Good first draft. For some work this is all you want. Take it, then stop and pour a lot of human judgement into the editing and the rewrite. That pause is where the value is.

Good enough to send. For other work, aim here, do a quick check, and press go.

Verified. For a narrow but growing set, aim high enough that the answer can go out on its own, automatically, without even a light check.

Not good enough. Never the aim. It's wasted effort, and with a little skill and the right care up front, it's the one band you can always avoid.

The cost of getting there

Take one real task: read four transcripts and six documents (about 70,000 tokens), then draft a proposal with two revision passes. Here is what it costs, two ways. The model price is the easy part. The expensive input is rework: the redo, the second look, the "not quite, try again".

The numbers come from a verified cost model using June 2026 list prices, built on the May 2026 firm usage data. The full spread from the cheapest model to the dearest on this task is $3.51, about 51 seconds of a senior manager's time. So the moment the cheap model costs you even a minute of redo, the dear one was the cheaper choice. Spend up, and buy the rework out of existence.
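The break-even arithmetic behind that claim can be sketched in a few lines. The $3.51 spread and the 51 seconds are taken from the text; the hourly rate is not stated anywhere and is only what those two figures imply, and the function names are illustrative.

```python
# Figures taken from the text above.
PRICE_SPREAD_USD = 3.51   # cheapest model vs dearest model on this task
EQUIVALENT_SECONDS = 51   # senior manager time said to be worth the same


def implied_hourly_rate(spread_usd: float, seconds: float) -> float:
    """Hourly rate implied by 'spread_usd is worth `seconds` of time'.

    Assumption: this rate is derived, not quoted in the article.
    """
    return spread_usd / seconds * 3600


def rework_cost(seconds_of_rework: float, hourly_rate: float) -> float:
    """Cost of human time spent on redo, second looks and retries."""
    return seconds_of_rework / 3600 * hourly_rate


def dearer_model_is_cheaper(
    seconds_of_rework_saved: float, spread_usd: float, hourly_rate: float
) -> bool:
    """True when the rework avoided is worth more than the price spread."""
    return rework_cost(seconds_of_rework_saved, hourly_rate) > spread_usd


rate = implied_hourly_rate(PRICE_SPREAD_USD, EQUIVALENT_SECONDS)
print(f"Implied rate: ${rate:.2f} per hour")
print(f"One minute of rework: ${rework_cost(60, rate):.2f}")
print("Dearer model wins:", dearer_model_is_cheaper(60, PRICE_SPREAD_USD, rate))
```

On these figures, a single minute of avoided rework is worth more than the whole price spread, while thirty seconds is not. The comparison only covers the rework the dearer model actually saves.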

Where the work lands, over time

The same picture at three moments in time. As tools improve, the mass of everyday tasks moves out of the left-hand bands and into good enough to send and beyond. The verified band on the right, all but absent at the start, slowly opens up.

Late 2022: when ChatGPT launched.

Today: a good model, used well.

Six months on? If the trend holds.

Illustrative, not measured. Each bar reads left to right in the same band order as the chart above. The shape is the point: the bands don't change, but the work shifts rightwards through them over time, and the checking each task needs falls away with it.

Three things to hold onto

The gap is shrinking, but you still own the result.

More work landing in the higher bands doesn't move accountability onto the machine. You direct it, you check where it matters, you own what goes out, exactly as you would for anything a colleague handed you. The further the AI reaches on your behalf, the more it's your judgement, not your absence, that makes the work good.

Stakes decide the checking, not just the gap.

Two things set how much CEO a task needs: how close the answer is to correct, and how much a mistake would cost. A narrow gap on a board paper or a due-diligence number still earns a full check. A wide gap on a throwaway note might just get sent. An answer can be probably right and still worth checking, because the one time it isn't might be the time that matters.

Does the top band really exist? Sometimes, and it's growing.

The honest answer is yes, for a narrow but growing class of tasks. A calculation run in code, a quote checked against its source: there the answer is mechanically known to be right and there's nothing left to check. The open questions are how big that band gets, how fast, and where we're willing to draw the line that says "this part is verified, trust it."

The thing to build: every task should tell you what to check.

If all of this is right, then each piece of work an AI system hands back should come with its own label: roughly which band it landed in, and what kind of checking it needs. Not "here's your answer," but "here's your answer, it's a good first draft, check the figures." That turns the CEO principle from a rule people have to remember into a property of the tool itself, and it's how AI helps us work better, quicker, and happier without quietly handing over the things only a person should own.
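One way to picture such a label is as a small record attached to each piece of work. This is a minimal sketch, not a description of any existing tool: the names `Band`, `Check` and `WorkLabel` are hypothetical, the default check per band is read off the band descriptions above, and treating stakes as a simple yes/no flag is a simplifying assumption.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Band(IntEnum):
    NOT_GOOD_ENOUGH = 0
    GOOD_FIRST_DRAFT = 1
    GOOD_ENOUGH_TO_SEND = 2
    VERIFIED = 3


class Check(IntEnum):
    NONE = 0
    LIGHT = 1
    FULL = 2


# Default amount of CEO per band, following the band descriptions.
# Assumption: "not good enough" maps to a full check, since the text
# only says it is never the aim.
DEFAULT_CHECK = {
    Band.NOT_GOOD_ENOUGH: Check.FULL,
    Band.GOOD_FIRST_DRAFT: Check.FULL,
    Band.GOOD_ENOUGH_TO_SEND: Check.LIGHT,
    Band.VERIFIED: Check.NONE,
}


@dataclass
class WorkLabel:
    band: Band
    high_stakes: bool = False  # e.g. a board paper or due-diligence number
    focus: list[str] = field(default_factory=list)  # what to look at first

    def check_needed(self) -> Check:
        # Verified work is mechanically known to be right.
        if self.band is Band.VERIFIED:
            return Check.NONE
        # Stakes can raise the check even when the gap is narrow.
        if self.high_stakes:
            return Check.FULL
        return DEFAULT_CHECK[self.band]

    def summary(self) -> str:
        band_name = self.band.name.replace("_", " ").lower()
        check_name = self.check_needed().name.lower()
        focus = ", ".join(self.focus) if self.focus else "nothing specific"
        return f"band: {band_name}; check: {check_name}; focus: {focus}"


label = WorkLabel(Band.GOOD_FIRST_DRAFT, focus=["the figures"])
print(label.summary())
# prints "band: good first draft; check: full; focus: the figures"
```

Keeping the stakes on the label alongside the band reflects the point that both set how much checking a task needs. How a tool would decide which band an answer landed in is left open here.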

Part of a connected set. See where tasks sit on the AI Usage Spectrum, and what you're managing at each level in Three Generations of AI.