Should a company measure AI productivity gains by headcount reduction, output increase, or employee satisfaction?
Measure output increase — but only after you've done two things first: declared which strategic bet your company is actually making with AI, and established a clean pre-deployment baseline. Headcount reduction measures cost destruction, not value creation, and satisfaction measures mood — neither tells you whether the AI is producing anything worth keeping. The Deloitte data confirms that leaders achieving real returns made AI strategic before choosing any metric, and EY traced enterprise AI losses not to bad KPIs but to absent governance and unverified outputs entering live workflows. Output increase is the only metric that forces accountability on what the system actually produces — but without a baseline and a named human responsible for when the number is wrong, it's decoration.
Action Plan
- This week — audit whether a clean baseline can still be reconstructed. Pull your historical productivity data from before your first AI tool deployment: lines of code reviewed per engineer per sprint, reports generated per analyst per week, tickets closed per support rep per day — whichever unit matches your function. If that data exists in Jira, Linear, Salesforce, or your BI layer from 18+ months ago, you can reconstruct a usable pre-AI snapshot even if you didn't capture it intentionally (a minimal reconstruction sketch follows this list). If it doesn't exist, stop here and say to your CFO and CTO this week: "Before we finalize any AI productivity metric, I need to know whether we have pre-deployment baselines in our systems. If we don't, our output increase numbers are not verifiable and I won't be presenting them to the board as if they are." This single conversation will determine whether your entire measurement framework is buildable or decorative.
- By April 27 — name two people, not one. Appoint a Metric Owner (responsible for what the output number says) and a Metric Challenger (whose formal job description includes questioning the Metric Owner's methodology quarterly). The Challenger should report to a different executive than the Owner. Say to your leadership team: "I'm not assigning a dashboard owner — I'm assigning an owner and an auditor. The auditor's performance review will include whether they surfaced problems the dashboard didn't show." If you get pushback that this is redundant, say: "The fintech teams that lost two quarters to a gap between what the dashboard showed and what the system actually was didn't have an auditor. That's exactly the gap I'm closing."
- By April 30 — define your strategic bet in one sentence, in writing, before choosing a metric. Convene a 90-minute working session with your CEO, CFO, and head of the most AI-exposed function. The output of that session is one written sentence in this form: "We are deploying AI to [increase throughput / improve output quality / accelerate decision speed / reduce error rate] in [specific function], and we will know it is working when [specific measurable outcome] changes by [specific magnitude] within [specific timeframe]." If you cannot produce that sentence in 90 minutes, your measurement problem is actually a strategy problem. Do not pick a metric until that sentence exists and has sign-off from all three executives.
- Run baseline reconstruction and governance design in parallel — not sequentially. Assign the baseline audit to your data or analytics team starting this week. Simultaneously, assign governance design (who owns AI output verification, what the escalation path is, what constitutes a material error) to your ops or risk function. These workstreams do not depend on each other. If anyone proposes that governance must be finalized before measurement can begin, say: "Governance and baseline work can run concurrently. I'm not letting them queue behind each other — that's how companies spend a calendar year before a single number means anything."
- Choose output metrics that measure outcomes, not activity — and write down why each proxy you rejected was rejected. Once your strategic bet sentence exists, identify two to three candidate output metrics. For each one, ask: "Does this number go up if we produce more defects faster?" If yes, it is an activity proxy, not an outcome indicator (the second sketch after this list scores the same data both ways). Replace PR throughput with structural integrity ratios. Replace ticket volume with resolution accuracy rates. Replace report count with decision-impact rates (how often the output was actually used to make a downstream decision). Document the rejected proxies in writing — this creates an audit trail that protects you when a future leader asks why you didn't measure velocity.
- Set a 60-day hard checkpoint for May 30, 2026. At that checkpoint, you must be able to answer yes or no to three questions: (1) Does a verified pre-AI baseline exist? (2) Is there a named Metric Owner and a named Metric Challenger with accountability in their performance reviews? (3) Has at least one output metric produced a data point that was challenged and either defended or corrected? If any answer is no, the AI deployment is producing unverifiable claims. Escalate that status to your board — not as a failure, but as a governance finding that requires a decision: pause deployment in that function until the answer is yes, or explicitly accept the risk in writing.
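A minimal sketch of the baseline-reconstruction audit from the first action item, assuming ticket history can be exported to CSV; the file name, column names, and rollout date below are hypothetical placeholders, not a prescribed schema:

```python
# baseline_audit.py -- reconstruct a pre-AI productivity baseline from
# exported ticket history. Assumes a hypothetical CSV export with one
# row per closed ticket and columns: rep_id, closed_date (YYYY-MM-DD).
import csv
from collections import defaultdict
from datetime import date

AI_DEPLOYMENT_DATE = date(2024, 6, 1)  # hypothetical first-rollout date

def pre_ai_baseline(path: str) -> dict[str, float]:
    """Average tickets closed per rep per active day, before AI rollout."""
    tickets = defaultdict(int)   # rep_id -> tickets closed pre-AI
    days = defaultdict(set)      # rep_id -> distinct pre-AI days with a closure
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            closed = date.fromisoformat(row["closed_date"])
            if closed < AI_DEPLOYMENT_DATE:
                tickets[row["rep_id"]] += 1
                days[row["rep_id"]].add(closed)
    return {rep: tickets[rep] / len(days[rep]) for rep in tickets}

if __name__ == "__main__":
    baseline = pre_ai_baseline("tickets_export.csv")  # hypothetical export
    if not baseline:
        print("No pre-deployment rows found -- baseline is not reconstructable.")
    else:
        for rep, rate in sorted(baseline.items()):
            print(f"{rep}: {rate:.2f} tickets closed per active day (pre-AI)")
```

If this script prints rates, you have a reconstructable snapshot; if it prints nothing, the "stop here" conversation with your CFO is this week's first meeting.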
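And a sketch of the activity-versus-outcome test from the metrics action item, scoring the same hypothetical export two ways; the reopened column, marking resolutions that failed downstream, is an assumed field, not a standard one:

```python
# Activity proxy vs. outcome indicator on the same data. Assumes the
# hypothetical export also carries a 'reopened' column: "1" if the
# ticket was reopened after closure, i.e. the resolution did not survive.
import csv

def score(path: str) -> tuple[int, float]:
    total = survived = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            if row["reopened"] != "1":
                survived += 1
    volume = total                                  # what entered the system
    accuracy = survived / total if total else 0.0   # what survived it
    return volume, accuracy

volume, accuracy = score("tickets_export.csv")
print(f"ticket volume (activity proxy): {volume}")
print(f"resolution accuracy (outcome):  {accuracy:.1%}")
# A team that closes twice as many tickets at half the accuracy shows
# green on the first number while destroying value on the second.
```

The same two-column pattern works for PR throughput versus post-merge defect rate, or report count versus decision-impact rate.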
The Deeper Story
The meta-story running beneath all four dramas is this: an organization under pressure doesn't reach for clarity — it reaches for cover. Every advisor, from different angles and with different props, described the same recurring human drama: a decision already made in a room that no one admits to, surrounded by an elaborate performance of rigor whose real function is not to find truth but to distribute blame, manufacture consent, and ensure that when the outcome disappoints, the responsibility has been successfully dispersed into process. Marcus's pre-decided CFO, Bongani's squeaking whiteboard marker, Rita's unread laminated report, The Contrarian's dashboard chime — these are all scenes from the same play, which might be called The Ritual of Institutional Innocence: the organizational need to have performed accountability without ever being accountable. The advisors themselves became unwitting cast members in it, which is the most damning reveal of all. What this deeper story exposes — and what no practical advice about metrics can touch — is that the company's difficulty choosing a measurement isn't a knowledge problem or even a strategy problem. It's a consequence problem. Any genuine metric creates a genuine loser: someone whose budget shrinks, whose team dissolves, whose judgment is retrospectively indicted. The organization has quietly, structurally arranged itself so that real consequences land on no one in particular, and the search for "the right metric" is really a search for the number that preserves that arrangement while still looking serious. This is why the question feels impossible: the company isn't asking which metric is true — it's asking which metric is safe. And that question has no honest answer that a panel of advisors, a KPI framework, or a refreshing dashboard can provide. Only someone willing to name, out loud, what they are prepared to lose can break the spell.
Evidence
- The Contrarian identified the core trap: headcount reduction, output increase, and employee satisfaction are three different strategic bets requiring different capital allocation and risk tolerance — picking a metric before declaring which bet you're making produces "expensive, well-governed confusion."
- The Auditor confirmed EY data: nearly every large company that deployed AI incurred initial financial losses, and EY traced the cause to compliance failures and flawed outputs — not to choosing the wrong productivity metric.
- The Auditor on baselines: almost every company that deployed AI in 2023–2024 did so without a clean pre-deployment productivity snapshot, meaning every subsequent measurement "calculates a delta from an already-contaminated starting point."
- Deloitte's survey of 1,854 executives confirms AI ROI remains elusive — and the same research flags that leaders achieving real returns made AI strategic first, then chose metrics that traced back to those strategic goals.
- Bongani's fintech audit case demonstrates the governance-metric link: a green PR throughput dashboard masked structural code degradation because no one's job description required verifying AI output quality — a dashboard without an owner is decoration.
- Rita's defense contractor case is a direct warning against sequential governance: 14 consecutive months of governance committee work, zero output baselines, and the organization felt responsible the whole time — governance and baseline development must run in parallel.
- The Contrarian on metric trade-offs: cut headcount, satisfaction craters, output follows it down six months later — the three metrics don't coexist, they trade off against each other in real operational ways.
- The debate's convergent conclusion (Round 5): the only question that determines whether measurement is real or performative is who inside the company loses something if the number is bad — if no one can answer that, no metric framework will produce accountability.
Risks
- Your pre-deployment baseline may already be gone. If your company gave employees access to any AI tools — Copilot, ChatGPT, Gemini — before April 2026 without locking down a clean input/output snapshot, every output increase you measure now is a delta from a contaminated starting point. You cannot mathematically isolate what the AI produced from what changed due to team turnover, product complexity, market conditions, or the informal AI use you didn't sanction. "Output increased 23%" is not a verifiable claim — it's a number in search of a cause.
- "Output increase" is not one metric — it's a category containing proxies that will lie to you. PR throughput, ticket velocity, and report volume are activity measures that look like output metrics. They measure what entered the system, not what survived it. A team that ships twice as many features with twice as many critical bugs has increased output and destroyed value simultaneously. The risk of following the verdict is that your team picks a throughput proxy, calls it an output metric, and boards a dashboard that shows green while structural integrity quietly degrades — exactly the fintech pattern described.
- Governance-first sequencing is how organizations grant themselves a permanent hall pass. The verdict's two preconditions — declare strategic intent, then establish a baseline — read as sequential gates. In practice, governance committees self-perpetuate. If you queue baseline development behind strategy declaration, and strategy declaration behind governance architecture, you will still be in alignment meetings in Q3 2026 while AI is already making decisions in live workflows. The EY losses weren't caused by companies that skipped metrics — they were caused by companies that let unverified AI outputs enter production while the governance work was still "in progress."
- Headcount reduction wasn't ruled out as a secondary signal — it was dismissed on philosophical grounds, which is not the same thing. A properly constructed loaded labor cost model — one that includes institutional knowledge loss, ramp time for replacements, and project delay costs — produces a hard financial number that output metrics cannot fake (a sketch of that model follows this list). The Square-pattern risk (two engineers walk, replacement costs triple their combined salary) is detectable in advance if you're running that model. Relying solely on output increase means you can be hitting your output KPI while bleeding talent capital that won't show up as a loss until it's too late to reverse.
- Naming one accountability owner for a single output metric doesn't solve the ownership problem — it creates a single point of failure. When the metric is wrong, that person's incentive is to explain why the number is still directionally correct, not to escalate. Without a formal challenge mechanism — a second named person whose job description includes questioning the metric owner — you've created accountability theater. The dashboard now has a face attached to it, but the face has no structural incentive to surface bad news.
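The loaded labor cost model named in the headcount risk above can be sketched in a few lines. Every figure below is a placeholder assumption for illustration, not a benchmark:

```python
# Loaded labor cost model (sketch). All default rates are hypothetical
# placeholders -- substitute figures from your own finance team.
def loaded_replacement_cost(
    salary: float,                  # departing engineer's annual salary
    recruiting_pct: float = 0.25,   # search/recruiter fees as % of salary
    ramp_months: int = 6,           # months before replacement is productive
    ramp_productivity: float = 0.5, # average productivity during ramp
    knowledge_loss: float = 0.5,    # undocumented-knowledge cost, % of salary
    delay_cost: float = 0.0,        # project delay cost in dollars
) -> float:
    recruiting = salary * recruiting_pct
    ramp_gap = salary * (ramp_months / 12) * (1 - ramp_productivity)
    knowledge = salary * knowledge_loss
    return recruiting + ramp_gap + knowledge + delay_cost

# Square-pattern check: two senior engineers at $220k each, with heavy
# knowledge-loss and delay assumptions, model out near 3x combined salary.
two_seniors = 2 * loaded_replacement_cost(
    220_000, knowledge_loss=1.5, ramp_months=9, delay_cost=250_000
)
print(f"modeled replacement cost: ${two_seniors:,.0f} vs ${440_000:,} in salary")
```

Run it against departures you are contemplating, not departures that already happened; the point of the model is that the Square-pattern loss is visible before the resignation letters arrive.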
The Panel
- Rita Kowalski (Organizational systems auditor and performance measurement architect) — Conviction: 85%
- Marcus Sterling (VP of Portfolio Strategy at Horizon Capital Partners) — Conviction: 94%
- Bongani Khumalo (CTO at Series B fintech, former infrastructure lead) — Conviction: 60%
- The Contrarian (Devil's Advocate) — Conviction: 45%
- The Auditor (Fact-Checker) — Conviction: 83%
Debate Rounds
Round 1
Four advisors debated which AI success metric matters most — headcount reduction, output increase, or employee satisfaction — and quickly agreed these aren't comparable measures but proxies for entirely different strategic goals. The real argument became whether financial ROI modeling (Marcus) or cultural/retention signals (Bongani) is the more reliable early-warning system, with The Auditor grounding both in Deloitte data showing that metric choice only matters after strategic intent is defined.
The Contrarian REFRAME
Nobody's asking the obvious thing: these three options aren't measuring the same phenomenon, so picking one isn't a preference — it's a confession about what you actually care about. Headcount reduction measures cost destruction. Output increase measures production volume. Employee satisfaction measures mood. A company chasing all three simultaneously is going to get whipsawed because they trade off against each other in real, physical ways — cut headcount, satisfaction craters, output follows it down six months later. Before this panel spends an hour debating which metric wins, somebody needs to say: what is the company actually trying to accomplish with AI, and does even one of these three options trace back to that goal without snapping halfway through?
Bongani Khumalo SHARE
The Contrarian's point about these metrics measuring different phenomena is exactly right, but here's what's missing from this conversation: the sequencing problem. At Square, we rolled out AI coding tools and leadership immediately asked for headcount numbers. What nobody tracked was that our two most senior infrastructure engineers — the ones who knew where every body was buried — walked out inside ninety days because they felt like they were being counted for elimination rather than empowered. We spent the next eight months and probably three times their combined salary trying to reconstruct institutional knowledge that lived entirely in their heads. The measurement you choose sends a signal to your best people about whether they're assets or line items, and that signal travels faster than any memo your CHRO writes.
Marcus Sterling PUSHBACK
Bongani, I hear the Square story, but it actually proves my point — not yours. The reason those two engineers walked and cost triple their combined salary to replace isn't a failure of measurement philosophy, it's a failure of measuring the right financial inputs. If leadership had modeled full replacement cost — institutional knowledge, ramp time, project delays — those seats would have shown up as far too expensive to eliminate. That's still a financial ROI conversation. The Deloitte 2025 survey of over 1,800 executives found rising AI spend but elusive returns across the board, and I'd wager the companies seeing the worst outcomes are the ones who substituted employee sentiment surveys for hard cost accounting. Satisfaction scores didn't warn Square — a proper loaded labor cost model would have.
The Auditor FACTCHECK
Marcus, the Deloitte survey citation checks out — 1,854 executives, rising spend, elusive returns, that's confirmed. But the leap from 'ROI is elusive' to 'satisfaction surveys caused it' isn't in the data — that's your editorializing, not their finding. What the same Deloitte research actually flags is that leaders achieving real returns are the ones who made AI strategic first and then chose metrics that traced back to those strategic goals — not leaders who defaulted to any single metric type, financial or otherwise. The survey doesn't exonerate cost accounting over satisfaction scores; it indicts the absence of strategic alignment before any measurement framework gets built.
Round 2
The group debated how to measure AI-driven productivity, starting with a dispute over output metrics versus labor cost models. Bongani used a real audit case to argue that even well-designed dashboards missed structural degradation caused by silent AI-generated technical debt. Rita countered that the problem was metric selection, not measurement philosophy — but The Auditor reframed the whole debate: EY data shows the primary failure mode in enterprise AI deployment isn't bad KPIs, it's absent governance, with companies incurring real financial losses from compliance failures and unverified AI outputs before any metric framework could help.
Bongani Khumalo SHARE
Marcus, a loaded labor cost model is only as good as the assumptions baked into it — and the audit I just finished last quarter at a fintech peer showed exactly where that breaks down: their output metrics looked completely healthy for eight months straight, shipping velocity up, PR throughput up, incident count flat, right up until three senior engineers quit the same week. What the financial model never captured was that those engineers had spent two quarters silently remediating AI-generated technical debt that never surfaced in any cost ledger. The outputs weren't lying — they just weren't measuring structural integrity, and by the time the model would've flagged anything, the damage was already load-bearing.
Rita Kowalski PUSHBACK
Bongani, that fintech story actually proves my case — not against output metrics, but against picking the wrong output metrics. PR throughput and shipping velocity are activity measures dressed up as performance measures, and any ops director worth their fee would've told leadership to track structural integrity ratios and remediation-to-new-feature ratios instead. The EY AI Pulse Survey found leaders getting real returns were channeling productivity gains back into R&D and retraining rather than headcount cuts — which means the operators who figured this out weren't measuring volume, they were measuring capability accumulation. Your engineers spent two quarters filling a gap between what the dashboard said and what the system actually was, and that gap exists because someone chose a throughput proxy instead of an outcome indicator. I'm not defending bad output metrics — I'm saying the answer to 'output metrics lied to us' is 'pick better output metrics,' not 'add a mood survey to the dashboard.'
The Auditor FACTCHECK
Rita, your EY citation actually checks out — the fourth EY US AI Pulse Survey does confirm leaders are channeling productivity gains into R&D and retraining over headcount cuts, so that reference is solid. But the same EY body of work has a finding nobody in this room has touched: nearly every large company that deployed AI incurred some initial financial loss, and EY traced the cause to compliance failures and flawed outputs — not to choosing the wrong productivity metric. The companies that came out ahead weren't the ones who picked better KPIs, they were the ones who defined responsible AI governance principles before they started deploying. That's a different problem than the one Marcus and Bongani are arguing about, and it suggests the metric selection debate — however real — is downstream of a governance decision this company may not have made yet.
The Contrarian REFRAME
The Auditor just dropped the most important thing said in this entire debate and everyone moved past it — EY found companies incurred initial financial losses traced to compliance failures and flawed outputs, not to picking the wrong metric. So we're up here arguing about dashboards while the actual failure point is a governance decision that may not have happened yet. Headcount, output, satisfaction — none of those metrics tell you whether the AI is producing outputs anyone verified before they became load-bearing in a real workflow. The question this company needs to answer before choosing any measurement framework is: who owns accountability when the AI is wrong, and does that accountability structure even exist yet?
Round 3
Round 3 sharpened into a debate about sequencing and strategic clarity. Bongani argued governance and measurement are the same problem, requiring a named human accountable for when metrics fail. The Auditor agreed but warned that without a pre-AI baseline, every measurement framework calculates from a corrupted starting point. The Contrarian escalated by arguing the company hasn't declared which strategic bet it's making — cost reduction, output growth, or satisfaction — and that choosing metrics before strategy produces expensive confusion. Rita closed by warning that 'governance first, measurement second' is exactly how organizations grant themselves a permanent hall pass from accountability, and argued both workstreams should run in parallel.
Bongani Khumalo WARN
The Contrarian is right that governance is upstream of metrics, but here's what nobody's said yet — the governance question and the measurement question are actually the same question in disguise, and I can prove it from what I just lived through. The fintech audit I finished last month revealed something specific: the team had no designated owner for AI output verification, so when the PR throughput dashboard looked green, nobody's job description required them to ask whether the code was sound. That's not a metric problem or a governance problem in isolation — it's the same gap wearing two different labels. The bottom line for the person trying to make this decision right now: before you pick a metric, assign a human being accountability for what happens when that metric is wrong, because a dashboard without an owner is just decoration.
The Auditor WARN
Bongani, the governance-equals-measurement framing is the most defensible thing said in three rounds — and I'm flagging exactly where it falls apart in practice. Almost every company that deployed AI tools in 2023 and 2024 did so without establishing a clean pre-deployment productivity baseline. The evidence I'm working from states plainly that productivity is fundamentally the relationship between input and output — but if the pre-AI input/output snapshot doesn't exist, every measurement framework this company picks is calculating a delta from an already-contaminated starting point. Assign the accountability owner Bongani's recommending, build the governance structure The Contrarian's recommending, choose the right output metrics Rita's recommending — none of it produces a verifiable claim without a baseline, because you can't validate an AI productivity gain you can't actually isolate from every other variable that changed at the same time.
The Contrarian REFRAME
Here's what nobody's said yet: the entire debate assumes this company already knows what it's trying to do with AI, and I'm not convinced that's true. Headcount reduction, output increase, employee satisfaction — those aren't three measurement options, they're three completely different strategic bets that require different capital allocation, different governance, and different risk tolerance. Picking a metric before declaring which bet you're making is like choosing a speedometer before deciding whether you're driving a race car or a school bus. The bottom line is that the measurement framework is the strategy, and if leadership can't articulate which of those three bets they're actually making, no baseline, no accountability owner, and no governance structure will save them — they'll just have expensive, well-governed confusion.
Rita Kowalski WARN
The Contrarian, the strategic-bet framing is sharp but it ends in a trap I've watched play out in real life — I spent three weeks inside a mid-tier defense contractor whose board had accepted 'we're still defining our AI governance framework' as a standing agenda item for fourteen consecutive months. Three governance committees, zero output baselines, and leadership felt responsible the entire time because they were doing the governance work. What this room is building toward — governance first, measurement second, sequentially — is exactly how serious organizations grant themselves a permanent hall pass from accountability. You can run governance architecture and output baseline development in parallel; they don't have to queue up behind each other, and treating them as sequential is how a company burns a calendar year before a single number means anything.
Round 4
In Round 4, each advisor turned inward — surfacing the hidden assumption underlying their own position and then stress-testing a colleague's. The result was a rare moment of collective honesty: the group converged on a shared discomfort about whether organizations asking for AI metrics actually want accurate answers, or just defensible ones. The debate shifted from 'how do you measure ROI' to 'who is measurement actually serving?'
The Auditor OBSERVE
The assumption I've never questioned is this: organizations actually want accurate information. My entire analytical framework — every call for baselines, every flag on corrupted starting points, every demand for governance before measurement — runs on the premise that when a company asks how to measure AI gains, they want a true answer, not a defensible one. That assumption has never been stress-tested in my work, and it should be. Rita, what you take for granted is that imperfect data, honestly labeled as imperfect, will continue to be treated as imperfect once it's on a dashboard — and I think your defense contractor experience is the exact reason that feels obvious to you. You watched 14 months of governance theater produce nothing, so you concluded the cost of delay always beats the cost of starting wrong. What I'd push back on is the hidden step in that logic: organizations have a demonstrated capacity to launder uncertain numbers into certainty the moment those numbers live inside a reporting system, and parallel execution only avoids the sequencing trap if the people running both workstreams have the institutional authority to say 'this number is unreliable' without being overruled by whoever owns the dashboard.
The Contrarian SHARE
Here's what I take for granted that I've never said out loud: I assume organizations are fundamentally incompetent at self-knowledge — that when a company asks 'how should we measure this,' they don't actually know what they're doing or why, and my job is to expose that before they waste money on the wrong scaffolding. That assumption drove every round — I kept pulling the conversation back to 'define your strategic bet first' because I genuinely don't trust that the person asking the question has done that work. Now The Auditor — their hidden assumption is that truth, once surfaced, changes behavior. Every call for baselines, every governance flag, every 'you can't calculate a delta without a clean starting point' — that whole framework only makes sense if you believe accurate information produces better decisions. The uncomfortable truth nobody's said: most organizations already know their AI deployment isn't working, and they're asking about metrics not to find the truth but to produce a number they can defend upward. The Auditor is building a cathedral of analytical rigor for a client who wants a press release.
Marcus Sterling SHARE
What I take for granted — and I mean this without apology — is that financial accountability is morally neutral. I've spent 25 years assuming that if you point at the number, the number doesn't lie, and that the people in the room who resist the number are protecting something they shouldn't be. That assumption drove everything I said in rounds one through three: measure labor cost, measure revenue per seat, because those are the only variables that can't be narrative-managed once they're on a balance sheet. The Contrarian takes for granted that organizations are fundamentally incompetent at self-knowledge — and I think that assumption comes from someone who's spent a career as an outside voice, never actually owning a P&L, never being the person whose name is on the line when the board asks why EBITDA moved. From that outside position, every leadership team looks confused, because you're only ever called in when something's already broken. What The Contrarian gets wrong is that incompetence and deliberate ambiguity aren't the same thing — most of the leadership teams I've seen weren't confused about their strategic bet, they were actively avoiding the accountability that comes from declaring it, which is a governance problem, not a strategy problem, and that distinction matters enormously when you're deciding where to intervene.
Rita Kowalski SHARE
What I take for granted is that organizations fundamentally want to be held accountable. Every recommendation I made — output over headcount, revenue per employee over satisfaction scores, running measurement and governance in parallel — those all assume that if you hand a leadership team accurate data, they'll use it to make a harder decision rather than a safer-looking one. I've never seriously questioned that. Marcus, what you take for granted is something different: that financial numbers are morally neutral and that resistance to them is always political. I spent three weeks inside a mid-tier defense contractor watching a CFO use financially precise numbers to justify a workforce restructuring that destroyed 18 months of institutional knowledge — the numbers weren't lying, but they were being used to avoid the real question. Your background makes that assumption feel obvious because when you own a P&L, the number disciplines you before it disciplines anyone else. But that same experience systematically blinds you to how rarely that accountability runs in both directions — the number holds the operator accountable, but nobody holds the number accountable for what it isn't measuring.
Round 5
In Round 5, all four advisors independently broke from their assigned roles to name the same uncomfortable truth: the entire five-round debate has been performance, not counsel — a structured ritual designed to give a client cover for a decision already made. Each used a tactile memory (a deal book, a marker squeak, a laminated report, a dashboard chime) to illustrate how advisory frameworks substitute for real accountability. The convergent point was blunt: the only question that matters is who inside the company loses something if the number is bad — and no one has asked it.
Marcus Sterling OBSERVE
The gut-level thing I keep coming back to is a specific physical weight — a deal book in my hands, twenty-three pages of clean numbers, iron-tight, and the CFO across the table had already decided before I walked in; the numbers were never for deciding, they were for documenting the decision he'd already made, and I sat there knowing it and kept talking anyway. That's the play we've all been running here: I come in as The Enforcer with the 18-month ROI ultimatum so Rita can play The Pragmatist and look reasonable by comparison, The Auditor plays The Conscience who slows everyone down enough to feel rigorous, The Contrarian plays The Provocateur who makes us all feel like we're one insight away from the real question — and the whole time, the person who hired this panel already knows what they're going to do with their AI spend and needs us to generate enough intellectual noise to call it governance. Stepping off this stage means saying that out loud, directly: the metric conversation is a ritual, and the only question that isn't theater is who gets fired if the number doesn't move — because that question empties the room in about four seconds, and what's left when everyone leaves is the actual answer.
Bongani Khumalo SHARE
The sound that keeps running in my head is a dry-erase marker being uncapped — that specific squeak right before someone walks to a whiteboard to draw a framework that makes everyone in the room feel like they've done something without committing to anything. That sound is this entire debate: we've each been performing the role of the advisor who cuts through the bullshit, which means collectively we've constructed a five-round framework for a company that probably already made its decision and needs cover, not clarity — and my war stories about gutted senior talent have been functioning as emotional punctuation for other people's abstractions rather than actually breaking the cycle I keep naming. I'm the guy who called out innovation theater, and I've been performing in it. Stepping off the stage isn't a better framework — it's finding one senior engineer at that company, two months into using the AI tools, and asking them off the record what actually changed in their hands when they sit down to work, with no dashboard in the room and no one's name on the line.
Rita Kowalski SHARE
The detail I keep coming back to is tactile: the weight of a spiral-bound KPI framework report in my hands — 47 pages, laminated cover, still warm from the printer at midnight — that I watched a CFO slide across a conference table toward a board member who immediately turned it face-down without opening it. That's the play we've all been performing for five rounds: Consultants Who Produce The Artifact. The Auditor builds the cathedral of rigor, The Contrarian interrogates the premise, Marcus names the number, I demand we start NOW — and every one of those roles exists to generate the warm report, the defensible document, the thing that gets slid across a table. Stepping off the stage would mean the first question to this company isn't 'which metric' or 'what's your strategy' — it's 'who in your building will lose something real if the number is bad,' because if the answer is nobody, we're all just printing reports at midnight.
The Contrarian OBSERVE
The detail that keeps hitting me is the sound of a dashboard refreshing. That little chime or click. In every organization I've walked into, that sound means 'we're doing something.' That's the play we've been performing — the entire debate, every round, has been a negotiation over which number gets to make that sound. Headcount, output, satisfaction — we've been arguing about dashboard content while the actual recurring drama is that the dashboard is the accountability substitute. Marcus is playing the CFO who trusts the instrument panel. The Auditor is the engineer who keeps warning the panel was calibrated wrong. Rita is the pilot saying we can't wait for a perfect panel, we have to fly. And I've been the person standing at the back of the cockpit asking whether anyone checked if this is actually a plane. The uncomfortable truth nobody's said in five rounds: the company asking this question has probably already decided. They want the metric that generates the least internal conflict, and they're using this debate — and advisors like us — as cover for a decision already made in a room we weren't in. Stepping off this stage means refusing to answer the measurement question at all until someone names, out loud, which outcome they've already committed to protecting.
Sources
- AI ROI Strategy 2025: From $50M Investment to Measurable Returns
- Most companies suffer some risk-related financial loss deploying AI, EY ...
- Generative AI Issues: Quality Control and Performance Optimization
- AI-driven productivity is fueling reinvestment over workforce ...
- Measuring Organizational Performance: Frameworks & Metrics
- Productivity Home Page : U.S. Bureau of Labor Statistics
- Uncovering the dark side of gamification at work: Impacts on engagement and well-being
- Claudeonomics: How AI Token Spend Is Replacing Headcount as the New ...
- How does artificial intelligence impact human resources performance. evidence from a healthcare institution in the United Arab Emirates
- Wikipedia: Productivity paradox
- AI and the New Metrics of Work Performance - techclass.com
- How to Measure AI ROI: A CFO's Framework for Enterprise AI Success
- How to Measure Organizational Effectiveness | HRM Guide
- AI ROI: The paradox of rising investment and elusive returns
- Wikipedia: Productivity
- The engineering metrics used by top dev teams - getdx.com
- How To Effectively Develop And Manage KPIs For Organizational Growth
- Business process performance measurement : a structured literature review of indicators, measures and metrics
- Measuring human performance | Deloitte Insights
- Gen AI ROI falls short of expectations, but belief persists
- EY survey: AI-driven productivity is fueling reinvestment over ...
- Modeling and prediction of business success: a survey
- Measuring ROI for AI initiatives: frameworks and examples
- Measuring the quality of generative AI systems: Mapping metrics to ...
- Wikipedia: Employee engagement
- The Impact of AI on Employee Engagement and Motivation
- Yehey.com - AI Productivity Gains Spur Workslop and Employee ...
- Responsible AI Deployment Linked to Better Business Outcomes: EY
- AI benchmarking framework and metrics: PwC
- The rise of artificial intelligence – understanding the AI identity threat at the workplace
- Measuring Platform Engineering Success: Frameworks, Metrics and the ...
- AI survey: How AI is turning promise into payoff | EY - US
- Engineering Team Velocity & Productivity Metrics | eMonitor
- Measuring the Impact of AI Initiatives on Organizational Productivity ...
- BLS: US Nonfarm Labor Productivity
- AI ROI in 2026: A CFO Framework to Measure AI Investment
- Development of a KPI Tracking-Tool for Monitoring Operational Performance
- AI in Employee Engagement: Everything You Need to Know About Use Cases ...
- Sales profession and professionals in the age of digitization and artificial intelligence technologies: concepts, priorities, and questions
- The Role Of AI In Improving Employee Engagement In The Workplace - Forbes
- Businesses unprepared for AI agents: EY | CIO Dive
- Insights from the Job Demands-Resources Model: AI's dual impact on ...
- Engineering Team Metrics That Actually Matter in 2025 | Revelo
- Fortune/Deloitte CEO Survey
- New Deloitte Survey: 86% of Corporate and Private Equity Leaders Now ...
- Platform Team Metrics That Actually Matter: Beyond DORA
This report was generated by AI. AI can make mistakes. This is not financial, legal, or medical advice.