The Token-Max Trap

An executive-governance essay arguing that AI token consumption and agent deployment can become false productivity signals unless tied to verified value, cost control, and accountable decision-making.

Dr Danie Adendorff

6/19/2026 | 12 min read

When Executives Mistake AI Consumption for Productivity

Dr Danie Adendorff DSc

Artificial intelligence has introduced a new executive failure mode. It is not the familiar problem of a model giving a wrong answer, nor the older problem of an organisation buying technology before it understands the work. It is subtler: the confusion of AI consumption with organisational progress.

In many organisations, the managerial appetite for AI has moved faster than the discipline required to govern it. Boards ask for evidence of adoption. Executives want to show movement. Teams are encouraged to use copilots, build agents, automate workflows and produce more with less. The visible indicators soon become attractive: number of users, number of prompts, number of agents, number of workflows, number of tokens consumed.

Those figures are not meaningless. They can show whether people have access to the technology and whether experimentation is taking place. They can help finance departments understand exposure. They can reveal demand, training gaps and early patterns of use. But they do not, by themselves, prove productivity. They do not show that decisions improved, risk decreased, cost fell, customers received better service, or operations became more resilient.

That failure may be called the Token-Max Trap. It occurs when senior leaders, under pressure to demonstrate AI adoption, allow token consumption to become a proxy for productivity. The organisation begins to measure how much AI is being used before it has established whether AI is producing verified value.

Tokens must still be measured. They are a real cost input, much like electricity, cloud compute, labour hours, fuel, or ammunition expenditure in a military operation. The mistake is not counting them. The mistake is treating token burn as evidence of strategic progress. High token consumption may indicate activity. It does not prove intelligence, quality, judgement, productivity, or value.

The executive error

The first failure is mismeasurement. Senior leaders want evidence that the organisation is using AI, and that demand is understandable. Competitors are experimenting. Investors ask about AI strategy. Boards expect visible movement. Vendors promise acceleration. Consultants bring adoption maps and maturity curves. In that environment, the temptation is to select a metric that can be reported quickly.

A disciplined executive asks harder questions. Which process improved? Which cost fell after quality control? Which decision was better supported? Which risk was detected earlier? Which error rate declined? Which cycle time improved without increasing rework? The test is not whether the tool was used. The test is whether a verified organisational outcome changed.

A less disciplined version of the same question is much easier: how much AI are we using?

That question has a place, but it is incomplete. It measures consumption, not consequence. It reports exposure to AI systems, not organisational learning. It may be useful for cost surveillance, but it becomes dangerous when converted into a performance indicator.

This is the central pathology of the Token-Max Trap: management turns a cost denominator into a productivity numerator.

When usage data is useful

There is a fair objection. During the earliest stage of AI adoption, usage can be a legitimate leading indicator. Leadership may need to know whether staff are experimenting, which teams are adopting the tools, where demand is emerging, and whether access or training barriers are suppressing learning. In that limited setting, usage data is a diagnostic signal.

It should remain a signal, not a verdict. Usage should trigger workflow analysis, cost review, quality assessment and outcome measurement. The trap begins when executives stop treating usage as an early indicator and begin treating it as proof of transformation.

That distinction matters. Adoption may be necessary, but adoption is not value. Experimentation may be useful, but experimentation is not productivity. Token consumption shows that a system was used. It does not show that the use was necessary, efficient, safe, or worth the cost.

Goodhart's Law in token form

The Token-Max Trap is a contemporary version of Goodhart's Law: when a measure becomes a target, it ceases to be a reliable measure. Once token consumption becomes socially or managerially valuable, behaviour changes around it.

This does not require universal dishonesty. People respond to incentives, especially where status, promotion, funding, or managerial approval are attached to measurable signals. Poorly designed performance systems do not merely observe behaviour. They reshape it.

In the AI environment, the distortion can take several forms. Staff may send routine tasks to expensive frontier models when cheaper tools would have been sufficient. Teams may run unnecessary rewrites, duplicated prompts and repeated summarisation loops. Engineers may design elaborate agentic workflows that look sophisticated but consume tokens through planning, retrieval, tool calls, self-correction and retries. Managers may celebrate rising adoption curves without asking what each validated output actually cost.

The result is not transformation. It is metric obedience: the organisation rewards visible AI consumption before defining the value AI is supposed to produce.

Why agentic AI intensifies the problem

Agentic AI makes this failure more serious because much of its cost is hidden inside the workflow. A normal user prompt has a visible beginning and end. An agent may plan, search, call tools, inspect files, generate intermediate artefacts, retry failed steps and revise its own output. Each action consumes tokens, often invisibly to the executive who approved the system.

The technical evidence is already beginning to support the managerial concern. A 2026 paper by Longju Bai and colleagues, “How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks”, analysed trajectories from eight frontier models on SWE-bench Verified. The authors found that agentic coding tasks consumed roughly 1,000 times more tokens than code chat or code reasoning. Runs on the same task could differ by up to 30 times in total token consumption, and higher token use did not reliably translate into higher accuracy. In their results, accuracy often peaked at intermediate cost and then saturated or degraded at higher cost.

That finding undermines a common managerial assumption: that more AI activity means more useful output. In agentic systems, more tokens may simply mean more looping, more reprocessing, more context expansion, or more inefficient retries.

The economics of failure also change. A human worker may waste time. An agentic system may waste time, tokens, cloud resources, API budget, review capacity and managerial confidence simultaneously. It can generate a clean interface for the user while running a complex and expensive chain of model calls behind it.

This creates a governance requirement that many organisations have not yet absorbed. Token budgets should be treated as operational control limits, not as accounting entries discovered after the bill arrives. Agentic systems require cost ceilings, abort conditions, escalation thresholds, audit trails and value validation. Without those controls, the organisation has not deployed a productivity system. It has deployed a cost-generating system whose outputs may or may not justify its consumption.

The economic warning is no longer theoretical

The warning is visible beyond technical research. In June 2025, Gartner forecast that more than 40% of agentic-AI projects would be cancelled by the end of 2027 because of escalating costs, unclear business value, or inadequate risk controls. The significance lies less in the exact percentage than in the diagnosis. The pressure points are executive ones: cost, value and control.

This does not prove that AI lacks value. That would be too broad and technically careless. It points to something narrower and more useful: AI value is conditional. It depends on task selection, workflow design, model routing, human verification, cost control and managerial discipline. Organisations that treat AI as a governed capability may extract value. Organisations that treat AI consumption as a proxy for innovation will finance waste.

Falling token prices do not remove the problem. Reuters reported in June 2026, based on Wall Street Journal reporting it could not independently verify, that OpenAI was considering substantial token-price cuts as competition with Anthropic intensified. Even if unit prices fall, total cost can still rise when systems consume more tokens through volume, looping, context expansion, unnecessary frontier-model use and retry behaviour.

A cheaper token can still produce a larger bill if architecture and incentives encourage uncontrolled consumption.
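The arithmetic is easy to check. The sketch below uses purely hypothetical prices and volumes (none of the figures come from the sources cited in this article) to show a unit price that halves while tokens per task triple, so that the monthly bill still rises by half.

```python
def monthly_bill(price_per_million_tokens: float,
                 tokens_per_task: int,
                 tasks_per_month: int) -> float:
    """Total monthly spend: unit price multiplied by tokens consumed."""
    total_tokens = tokens_per_task * tasks_per_month
    return price_per_million_tokens * total_tokens / 1_000_000


# Hypothetical illustration only: the unit price halves, but agentic
# looping and context expansion triple the tokens used per task.
before = monthly_bill(price_per_million_tokens=10.0,
                      tokens_per_task=20_000,
                      tasks_per_month=50_000)
after = monthly_bill(price_per_million_tokens=5.0,
                     tokens_per_task=60_000,
                     tasks_per_month=50_000)

print(f"before price cut: {before:,.0f}")  # 10,000
print(f"after price cut:  {after:,.0f}")   # 15,000
```

The unit price is only one term in the product. Architecture and incentives control the other two.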

The hallucinated-proof problem

The Token-Max Trap is connected to a second and strategically more serious failure: the production of false evidence for AI success.

The KPMG case is a useful warning. The Financial Times reported in June 2026 that a KPMG report on AI contained inaccuracies identified as apparent AI hallucinations by GPTZero and verified by the FT. GPTZero's own investigation of KPMG's October 2025 report, “Total Experience: Redefining Excellence in the Age of Agentic AI”, stated that only five of the report's 45 citations accurately pointed to real sources, while many others were paraphrased, distorted, too vague to verify, or contained fabricated components. KPMG reportedly removed the report from some websites while investigating.

EY provides a second example. In May 2026, the Financial Times reported that EY had withdrawn a study on loyalty rewards programmes after researchers identified apparent AI hallucinations and fake footnotes. GPTZero's investigation of the EY Canada report “Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems” described many of its citations as hallucinated and criticised the document for misattributions, fake statistics and inaccurate claims.

Deloitte Australia provides a third, more government-facing example. The Guardian reported in October 2025 that Deloitte agreed to provide a partial refund to the Australian government after a report into the welfare compliance system was found to contain several errors and after Deloitte disclosed that generative AI had been used in part of the work. The reported errors included incorrect footnotes and references, nonexistent references, and an erroneous reference to a court decision. Deloitte maintained that the substance and recommendations of the report were not affected.

These cases should not be treated merely as reputational embarrassments. They point to a deeper institutional risk. AI can generate flawed work product, but it can also generate flawed evidence about AI's own success. That evidence may then enter board papers, strategy documents, consulting decks, procurement cases and investment proposals.

A dangerous loop becomes possible. AI is used to produce reports claiming that AI adoption is advancing. Those reports are cited as proof that AI adoption is working. Other executives feel pressure to adopt similar tools. Consultants, vendors and internal champions use the claims to justify further expenditure. If the evidence base is hallucinated, exaggerated, or inadequately checked, the organisation is not deciding on validated intelligence. It is participating in a self-reinforcing narrative.

Once false evidence enters the executive information environment, bad decisions become easier to authorise.

Where the decision chain breaks

The Token-Max Trap maps directly onto the Executive Intelligence Pipeline. The original signal is valid: AI may improve productivity, decision support, software delivery, research, customer service and operational efficiency. That signal deserves attention. The failure begins when the signal is not properly validated.

At the validation stage, executives do not test whether AI use produces measurable improvement. At interpretation, rising token consumption is read as productivity even when the output has not been checked. Escalation arrives late because finance, technical, quality, security, or operational warnings remain outside the decision forum until costs have already accumulated.

At the decision point, leaders encourage broad AI use before setting cost ceilings, routing rules, verification standards and stop conditions. At the action stage, employees and systems optimise toward visible usage rather than verified outcome. Adaptation then arrives after the budget shock, the withdrawn report, the failed pilot, or the difficult board question.

This is precisely the kind of failure Decision Before Consequence is concerned with. The problem is not that executives made a decision under uncertainty. Executives often have to do that. The problem is that enthusiasm, status pressure and weak measurement substituted for disciplined judgement before consequence arrived.

In this case, consequence may arrive as the token bill.

The Production-to-Decision Gap

The Token-Max Trap also strengthens the Production-to-Decision Gap doctrine. AI accelerates production before it necessarily improves decision quality. It can generate code, summaries, research notes, strategy drafts and workflow artefacts at speed. But faster production is not the same as validated delivery. More output is not the same as better judgement. More tokens are not the same as more intelligence.

The gap appears when AI-generated activity expands faster than the organisation's capacity to validate, prioritise, integrate, govern and learn from that activity. In such an environment, AI becomes an accelerant of unmanaged work. It creates more material to inspect, more claims to verify, more outputs to reconcile, and more costs to explain.

That is why token consumption should not be treated as a minor accounting problem. It is an executive-governance indicator. It shows whether AI use is connected to accountable outcomes or whether activity has expanded ahead of control.

The disciplined question is not: how much AI did we use? The disciplined question is: what verified organisational value did AI produce, at what cost, under whose authority, with what risk controls, and on what evidence?

Corrective governance

The solution is not to ban AI use. That would be strategically naïve. The solution is to govern AI consumption through decision-quality metrics rather than activity metrics.

Token usage should be monitored as a cost input. It should be linked to authorised workflows, defined budgets, expected outputs and measurable outcomes. Agentic systems should have ceilings and stop conditions. Workflows should escalate when token use exceeds expected thresholds or when an agent repeatedly retries without producing a valid result.
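A minimal sketch of such a control, assuming a hypothetical agent loop that reports each step's token count and whether the step produced a valid result, might look like the following. The class name, thresholds and step labels are illustrative assumptions, not part of any vendor's API.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run hits its hard token ceiling and must stop."""


@dataclass
class TokenBudget:
    ceiling: int              # hard stop: abort the run at this many tokens
    escalate_at: int          # soft threshold: flag the run for human review
    max_failed_retries: int   # consecutive invalid results before escalating
    used: int = 0
    failed_in_a_row: int = 0
    audit_log: list = field(default_factory=list)

    def record(self, step: str, tokens: int, valid: bool) -> str:
        """Log one agent step; return 'ok' or 'escalate', or raise at the ceiling."""
        self.used += tokens
        self.failed_in_a_row = 0 if valid else self.failed_in_a_row + 1
        self.audit_log.append((step, tokens, valid, self.used))
        if self.used >= self.ceiling:
            raise BudgetExceeded(f"ceiling reached at step '{step}': {self.used} tokens")
        if self.used >= self.escalate_at or self.failed_in_a_row >= self.max_failed_retries:
            return "escalate"
        return "ok"


# Hypothetical run: thresholds and step sizes are invented for illustration.
budget = TokenBudget(ceiling=100_000, escalate_at=60_000, max_failed_retries=2)
first = budget.record("plan", 20_000, valid=True)        # within limits
second = budget.record("retrieve", 45_000, valid=True)   # crosses the soft threshold
print(first, second)
```

The design choice worth noting is that the ceiling raises an exception instead of returning a status, so a runaway workflow cannot continue simply because the caller ignored a return value. Every step is appended to the audit log before any limit is enforced, including the step that triggers the stop.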

Model routing is especially important. Frontier models should not be used reflexively for every step of every workflow. Research on LLM routing gives this recommendation an evidential basis. RouteLLM, presented at ICLR 2025 and associated with the LMSYS/Berkeley Sky Computing work, formalises model routing as a cost-quality trade-off and shows that significant cost reductions are possible in benchmark settings while preserving much of the performance of stronger models. The exact saving in any organisation will depend on task mix, risk tolerance, evaluation quality and implementation discipline. The governance lesson is nevertheless clear: model selection must be cost-aware, task-specific and tied to the value and risk of the output.
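The principle can be illustrated with a deliberately simple, hand-written rule. This is not RouteLLM's method or API: RouteLLM learns its router from preference data, whereas the sketch below assumes a difficulty score and a stakes flag are already available from some upstream assessment. The tier names and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    difficulty: float   # 0.0 (routine) to 1.0 (hard); assumed upstream estimate
    high_stakes: bool   # True if an error would be costly (legal, financial, safety)


def route(task: Task, difficulty_threshold: float = 0.7) -> str:
    """Use the frontier tier only when difficulty or stakes justify its cost."""
    if task.high_stakes or task.difficulty >= difficulty_threshold:
        return "frontier"
    return "small"


# Hypothetical task mix for illustration.
tasks = [
    Task("summarise meeting notes", difficulty=0.2, high_stakes=False),
    Task("review contract clause", difficulty=0.3, high_stakes=True),
    Task("refactor legacy module", difficulty=0.9, high_stakes=False),
]
choices = {task.name: route(task) for task in tasks}
print(choices)
```

Even a crude rule of this kind forces the question that reflexive frontier-model use avoids: what is it about this task that justifies the more expensive model?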

Organisations should replace token-max indicators with value-linked measures: cost per validated output, cycle-time reduction after quality control, reduction in error rates, reduction in rework, percentage of AI outputs accepted without material correction, cost per resolved customer issue, cost per approved software change, and cost per decision-support product actually used by an accountable decision-maker.
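The difference between an activity metric and a value-linked metric shows up in a few lines of arithmetic. The sketch below compares two hypothetical teams; all figures are invented for illustration, and "validated" is assumed to mean accepted by an accountable reviewer without material correction.

```python
def value_metrics(total_cost: float, outputs_produced: int, outputs_validated: int) -> dict:
    """Compare cost per output (activity) with cost per validated output (value)."""
    if outputs_produced <= 0:
        raise ValueError("outputs_produced must be positive")
    return {
        "cost_per_output": total_cost / outputs_produced,
        "acceptance_rate": outputs_validated / outputs_produced,
        # No validated outputs means the spend bought nothing verifiable.
        "cost_per_validated_output": (
            total_cost / outputs_validated if outputs_validated else float("inf")
        ),
    }


# Hypothetical teams: A produces far more, B has far more accepted.
team_a = value_metrics(total_cost=1_200.0, outputs_produced=400, outputs_validated=100)
team_b = value_metrics(total_cost=900.0, outputs_produced=150, outputs_validated=120)

print(team_a)  # cost per output 3.0, cost per validated output 12.0
print(team_b)  # cost per output 6.0, cost per validated output 7.5
```

On an activity dashboard, Team A looks twice as efficient. Once validation is the denominator, Team B delivers accepted work at well under two thirds of Team A's cost.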

Tokens are not the enemy. Uncontrolled tokens are the symptom. The underlying disease is executive mismeasurement.

Conclusion

The Token-Max Trap is a useful concept because it reveals an AI-era version of an old management failure. Executives wanted evidence of innovation. They selected an easy number. Employees and systems responded to that number. Costs expanded. The connection between usage and value remained uncertain.

The failure is technical only at the surface. At root, it is a decision-governance failure.

Before organisations scale AI, they must define what counts as value, how that value will be measured, who has authority to spend, who verifies outputs, who stops runaway systems, and who remains accountable when consequences arrive.

AI can help write code, draft reports, produce research notes, generate summaries and automate workflows. But someone still has to decide what is true, what is useful, what is safe, what is worth paying for, what should be stopped, and who remains accountable when the bill arrives.

The token bill is not merely a technology cost. It is the invoice for unmanaged executive enthusiasm.

Sources and notes

This article draws on current reporting and research concerning AI token economics, agentic-AI cost opacity, model-routing economics, withdrawn professional-services AI reports, and the governance problem created when AI usage metrics are mistaken for productivity metrics. Company-specific and market-price claims are treated cautiously where they depend on anonymously sourced, paywalled, or secondary reporting. The central argument does not require every reported figure to be final or independently audited. The strategically significant pattern is that AI usage metrics are being confused with productivity metrics while token consumption, hallucinated evidence, agentic cost opacity and architectural overuse become material governance issues.

Wikipedia, Reddit, LinkedIn, YouTube, Medium and Facebook were excluded from the research base.

1. Gartner. “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027.” Press release, 25 June 2025. https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027

2. Longju Bai, Zhemin Huang, Xingyao Wang, Jiao Sun, Rada Mihalcea, Erik Brynjolfsson, Alex Pentland and Jiaxin Pei. “How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks.” arXiv:2604.22750, submitted 24 April 2026, revised 29 April 2026. https://arxiv.org/abs/2604.22750

3. Angela Christy. “OpenAI considers drastic price cuts, anticipating war for users with Anthropic, WSJ reports.” Reuters, 10 June 2026. https://www.reuters.com/technology/openai-considers-drastic-price-cuts-anticipating-war-users-with-anthropic-wsj-2026-06-11/

4. Financial Times. “KPMG report contained AI hallucinations on benefits of . . . AI.” 11 June 2026. https://www.ft.com/content/b3828e92-4961-4b39-84f0-c42f33be3c3f

5. Paul Esau, Om Ogale and Alex Cui. “Chasing the Hallucinations: KPMG’s AI-Powered Attempt at ‘Redefining Excellence’.” GPTZero, 12 June 2026. https://gptzero.me/news/investigations-kpmg/

6. Carly Page. “KPMG’s AI report becomes an accidental demo of AI hallucinations.” The Register, 12 June 2026. https://www.theregister.com/ai-and-ml/2026/06/12/kpmgs-ai-report-turns-into-a-demo-of-ai-hallucinations/5255029

7. Financial Times. “EY retracts study after researchers discover AI hallucinations.” 15 May 2026. https://www.ft.com/content/a61cbcae-95e4-4449-86e1-ef40fb306f4e

8. Om Ogale, Paul Esau and Alex Cui. “Hallucinations in Ernst & Young Report on Loyalty Fraud.” GPTZero, 14 May 2026. https://gptzero.me/investigations/ey

9. Krishani Dhanji. “Deloitte to pay money back to Albanese government after using AI in $440,000 report.” The Guardian, 6 October 2025; last modified 7 October 2025. https://www.theguardian.com/australia-news/2025/oct/06/deloitte-to-pay-money-back-to-albanese-government-after-using-ai-in-440000-report

10. Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M. Waleed Kadous and Ion Stoica. “RouteLLM: Learning to Route LLMs with Preference Data.” ICLR 2025 / OpenReview. https://openreview.net/forum?id=8sSqNntaMr

11. LMSYS. “RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing.” 1 July 2024. https://lmsys.org/blog/2024-07-01-routellm/

Author workflow disclosure

This article was produced through an AI-assisted but human-directed workflow. AI support was used for accessibility assistance, structuring, language refinement, source-discovery prompts, revision planning, and conversion of editorial comments into amendments. Dr Danie Adendorff retained responsibility for the argument, accepted or rejected changes, checked the logic of claims, assessed source credibility, and remains accountable for the final text. AI-generated material was not treated as empirical evidence, and synthetic or illustrative examples were not presented as observed data.