---
title: The CEO Gap
source: https://steadman.ai/newsletters/david/the-ceo-gap.html
published: 2026-06-20
summary: Every AI answer sits some distance from the correct one. This is a picture of that gap: how big it is, what it costs to close, and why the checking it needs keeps shrinking.
---

# The CEO Gap

*Check, Edit, Own. And the day you might not have to.*

Status: work in progress.

Every answer an AI gives you sits some distance from the 'correct' one. Close that distance and you can trust the work. Leave it open and you can't. The **CEO principle (Check, Edit, Own)** is how we cross that gap by hand today. This is a picture of the gap: how big it is, what it costs to close, and why the checking it needs keeps shrinking. The interesting question is whether it ever reaches zero, and for which tasks.

## The four bands

Every answer lands in one of four quality bands.

- **Not good enough** — A wrong or off-target answer. Rare now on a good tool: usually a vague prompt, not a model mistake. A product defect to engineer away.
- **Good first draft** — In the right area, but it absolutely needs the full CEO. Cheap to get, expensive to trust as-is.
- **Good enough to send** — Might carry the odd human-level mistake, the kind you'd make yourself. Light CEO, or just press send.
- **Verified** — Mechanically known to be right: a calculation run in code, a quote checked against the source. No reason left to check.

## Chart 1: Closer and closer to the answer

The AI's response climbs toward the correct answer as you put in more effort and spend (the horizontal axis is effort and cost: better prep, better prompt, better process). The space left above the line is **the CEO gap**: how much checking, editing and owning the work still needs. Read left to right as a journey: the gap is **wide** while the answer is not good enough, **small** once it's a good first draft, and by the verified band the curve meets the line and the gap is **gone**.

The four bands sit left to right along the effort-and-cost axis, each narrower than the last: "not good enough" is the widest, then "good first draft", then "good enough to send", then a narrow "verified" band at the right. The curve rises steeply, then flattens, and meets the correct-answer line inside the verified band.

- The not-good-enough band (widest, on the left) shows a **wide gap** between the AI's response and the correct answer.
- The good-first-draft band shows only a **small gap**.
- By the verified band the curve has met the line: **no gap**.

A better AI (model and harness) lifts the whole curve: the same effort carries you further along the bands. Where you choose to stop is set by what the task is worth, not by where the curve could reach. The curve flattens fast, and only in the verified band, where the answer can be mechanically checked, does it actually meet the line. Short of that, a last sliver of doubt remains: the hardest, and often the most expensive, to remove.

## Aim for the right band, on purpose

Which band you're aiming for is a decision you make before you start, and it changes from task to task.

- **Good first draft.** For some work this is all you want. Take it, then stop and pour a lot of human judgement into the editing and the rewrite. That pause is where the value is.
- **Good enough to send.** For other work, aim here, do a quick check, and press go.
- **Verified.** For a narrow but growing set, aim high enough that the answer can go out on its own, automatically, without even a light check.
- **Not good enough.** Never the aim. It's wasted effort, and with a little skill and the right care up front, it's the one band you can always avoid.

## Chart 2: The cost of getting there

Take one real task: read four transcripts and six documents (about 70,000 tokens), then draft a proposal with two revision passes. Here is what it costs two ways with AI, with doing it by hand as the yardstick. The model price is the easy part. **The expensive input is rework: the redo, the second look, the "not quite, try again".**

- **Best model, right first time:** Fable 5, $4.56, no rework.
- **Cheapest model, then one fix:** Sonnet 4.6 at $1.05 plus one 10-minute rework pass (about $40 of time at ~$4/min), for about $41 total, almost all of it the rework.
- **Doing it by hand:** 4 to 8 hours, roughly $1,000 to $2,000. That is off the top of the chart: 25 to 50 times the roughly $41 cheapest-model bar.

Numbers from a verified cost model on June 2026 list prices, built on May 2026 firm usage data. The full spread from the cheapest model to the dearest on this task is $3.51, about 51 seconds of a senior manager's time. So the moment the cheap model costs you even a minute of redo, the dear one was the cheaper choice. Spend up, and buy the rework out of existence.
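
The arithmetic behind that comparison can be sketched in a few lines. This is a minimal sketch, not the cost model itself: the function names are hypothetical, and the rate is the rounded ~$4/min quoted above, so the results are approximate.

```python
# Illustrative only: prices come from the bullets above, the rate is the
# rounded ~$4/min figure from the text (an approximation, not a measured value).
RATE_PER_MIN = 4.00

def total_cost(model_price: float, rework_minutes: float = 0.0) -> float:
    """Model spend plus the cost of the human time spent on rework."""
    return model_price + rework_minutes * RATE_PER_MIN

def break_even_seconds(cheap_price: float, dear_price: float) -> float:
    """Seconds of rework at which the cheaper model stops being cheaper."""
    return (dear_price - cheap_price) / RATE_PER_MIN * 60

best = total_cost(4.56)                       # best model, right first time
cheap = total_cost(1.05, rework_minutes=10)   # cheapest model, one 10-minute fix

print(f"best model:            ${best:.2f}")
print(f"cheap model + rework:  ${cheap:.2f}")
print(f"break-even rework:     {break_even_seconds(1.05, 4.56):.0f} s")
```

At the rounded rate the break-even comes out a little under a minute, in line with the roughly 51 seconds above. The exact figure depends on the hourly rate assumed.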

## Chart 3: Where the work lands, over time

The same picture at three moments in time. As tools improve, the mass of everyday tasks moves out of the left-hand bands and into **good enough to send** and beyond. The **verified** band on the right, all but absent at the start, slowly opens up.

- **Late 2022 (when ChatGPT launched):** not good enough 35%, good first draft 45%, good enough to send 18%, verified 2%.
- **Today (a good model, used well):** not good enough 5%, good first draft 25%, good enough to send 50%, verified 20%.
- **Six months on? (if the trend holds):** not good enough 2%, good first draft 12%, good enough to send 46%, verified 40%.

Illustrative, not measured. Each bar reads left to right in the same band order as the chart above. The shape is the point: the bands don't change, but the work shifts rightwards through them over time, and the checking each task needs falls away with it.
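
To make the rightward shift concrete, the shares from the three bars can be tallied in a short sketch. The numbers are the illustrative ones listed above. The variable names, and the grouping of the two right-hand bands as "light check or none", are assumptions made for the example.

```python
BANDS = ("not good enough", "good first draft", "good enough to send", "verified")

# Illustrative shares (percent) from the three bars, in band order.
SNAPSHOTS = {
    "late 2022": (35, 45, 18, 2),
    "today": (5, 25, 50, 20),
    "six months on?": (2, 12, 46, 40),
}

def light_or_no_check(shares):
    """Share of tasks landing in the two right-hand bands."""
    by_band = dict(zip(BANDS, shares))
    return by_band["good enough to send"] + by_band["verified"]

for label, shares in SNAPSHOTS.items():
    assert sum(shares) == 100  # each bar should account for all tasks
    print(f"{label}: {light_or_no_check(shares)}% need a light check or none")
```

On these illustrative figures, the share in the two right-hand bands goes from 20% to 70% to 86%.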

## Three things to hold onto

**The gap is shrinking, but you still own the result.** More work landing in the higher bands doesn't move accountability onto the machine. You direct it, you check where it matters, you own what goes out, exactly as you would for anything a colleague handed you. The further the AI reaches on your behalf, the more it's your judgement, not your absence, that makes the work good.

**Stakes decide the checking, not just the gap.** Two things set how much CEO a task needs: how close the answer is to correct, and how much a mistake would cost. A narrow gap on a board paper or a due-diligence number still earns a full check. A wide gap on a throwaway note might just get sent. An answer can be probably right and still worth checking, because the one time it isn't might be the time that matters.

**Does the top band really exist? Sometimes, and it's growing.** The honest answer is yes, for a narrow but growing class of tasks. A calculation run in code, a quote checked against its source: there the answer is mechanically known to be right and there's nothing left to check. The open questions are how big that band gets, how fast, and where we're willing to draw the line that says "this part is verified, trust it."

## The thing to build: every task should tell you what to check

If all of this is right, then each piece of work an AI system hands back should come with its own label: roughly which band it landed in, and what kind of checking it needs. Not "here's your answer," but "here's your answer, it's a good first draft, check the figures." That turns the CEO principle from a rule people have to remember into a property of the tool itself, and it's how AI helps us work better, quicker, and happier without quietly handing over the things only a person should own.
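
One way to picture such a label is as a small structured record returned alongside the answer. The sketch below is hypothetical: `Band`, `TaskLabel`, and the field names are assumptions for illustration, not an existing tool's API.

```python
from dataclasses import dataclass, field
from enum import Enum

class Band(Enum):
    """The four quality bands described earlier."""
    NOT_GOOD_ENOUGH = "not good enough"
    GOOD_FIRST_DRAFT = "good first draft"
    GOOD_ENOUGH_TO_SEND = "good enough to send"
    VERIFIED = "verified"

@dataclass
class TaskLabel:
    """Hypothetical label an AI system could attach to each piece of work."""
    band: Band
    checks: list[str] = field(default_factory=list)  # what a person should look at

    def summary(self) -> str:
        if not self.checks:
            return f"Band: {self.band.value}. Nothing left to check."
        return f"Band: {self.band.value}. Check: {', '.join(self.checks)}."

label = TaskLabel(Band.GOOD_FIRST_DRAFT, checks=["the figures"])
print(label.summary())
```

Keeping the band and the checks as separate fields means the tool can report them consistently, while the person reading the label still decides how much checking the stakes deserve.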

---

Part of a connected set. See where tasks sit on the [AI Usage Spectrum](https://steadman.ai/newsletters/david/ai-usage-spectrum.html), and what you're managing at each level in [Three Generations of AI](https://steadman.ai/newsletters/david/three-generations.html).
