Issue #49 — BPMN Is Not Enough

Dear Reader,

An organisation about to scale AI often reaches for the same opening move. It buys a process-drawing tool and interviews the people doing the work, so that it can document the as-is processes before automating anything. The instinct underneath this move is correct (you cannot redesign what you do not understand), but the tool is insufficient. A swimlane diagram gives you a faithful blueprint of how humans did the work; it says nothing about how AI should do it.

Deloitte UK’s 2025 survey of 1,854 European executives quantifies what happens to organisations that act on the swimlane: those applying AI to existing processes are 1.6x more likely to report missed expectations than those who redesign work before deploying. GBTEC’s 2025 Global Process Excellence and AI-Readiness Report (600 senior operations leaders) puts a sharper point on it. 84% identify operational chaos as the silent killer of transformation, and 87% say agentic AI requires structured, governed processes before deployment. Both data sets point to the same conclusion: the BPMN tool was never the part that mattered.

The as-is problem

Michael Hammer’s 1990 reengineering argument is often reduced to a single line: don’t automate, obliterate. His actual point ran deeper than the slogan. Every existing process carries fingerprints of the medium that made it necessary. Paper-era processes had approval chains because no one person held the full record. Shift workers handed off because no individual could be on for sixteen hours. Specialists owned narrow steps because cross-training was expensive. Digitisation projects that preserved those structures were preserving solutions to problems the technology had just dissolved.

The same pattern repeats with AI. Your current process exists in its current shape because humans were doing it. The batch rhythms reflect attention spans. The multi-approval chains reflect distributed context that no single human held. AI removes some of those constraints, allowing continuous processing where humans needed batches and full-context synthesis where humans needed handoffs. It also introduces new constraints in their place: inference cost above all, and opacity in implicit decisions that a human reasoner could have explained when asked.

Hammer’s question, updated for the AI era: what was this process forced to look like because humans were doing it, and what does it look like when that constraint is gone?

ADAPT Digital’s formulation captures what happens when this question is skipped. Automation magnifies whatever is already happening, good and bad. Map without redesigning, and you industrialise yesterday’s compromises.

Three lenses BPMN does not capture

The map is missing three dimensions that determine whether AI deployment will work.

Decision type: explicit versus implicit. BPMN shows decision gateways. It does not distinguish between decisions that follow documented rules (which AI handles well and cheaply) and decisions that require experience, context, or judgment (which require either human-in-the-loop design or process restructuring that makes the underlying judgment explicit before any AI is deployed). Most processes contain both. Most process maps do not label which is which. A multi-approval chain that exists because no single approver has full context can often be consolidated when AI synthesises context at decision time, but only when the decision is genuinely explicit. If the senior approver was applying unwritten judgment the others lacked, removing them removes the judgment.

Error cost asymmetry. A 5% error rate on expense classification is recoverable. A 5% error rate in credit scoring or regulatory filing is a compliance event. Identical BPMN symbol; entirely different deployment calculus. The automation decision for any step is inseparable from the cost of being wrong on that step. EU AI Act Article 14 mandates human oversight for high-risk systems, but the business case for oversight exists independently of the regulation.
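The asymmetry is easiest to see as arithmetic: expected monthly error cost is volume × error rate × cost per error. The sketch below holds volume and the 5% error rate constant and varies only the cost of one error. The volumes and per-error costs are hypothetical placeholders, not figures from any survey, and an average understates a compliance event, whose cost sits in the tail rather than the mean.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    monthly_volume: int    # transactions per month (hypothetical)
    error_rate: float      # share of AI outputs that are wrong
    cost_per_error: float  # average cost of one error in EUR (hypothetical)


def expected_error_cost(step: Step) -> float:
    """Expected monthly cost of wrong outputs for one step."""
    return step.monthly_volume * step.error_rate * step.cost_per_error


# Same volume, same 5% error rate; only the cost of being wrong differs.
steps = [
    Step("expense classification", 20_000, 0.05, 5.0),
    Step("credit scoring", 20_000, 0.05, 2_000.0),
]

for s in steps:
    print(f"{s.name}: {expected_error_cost(s):,.0f} EUR per month")
```

Two steps drawn with the same BPMN symbol end up three orders of magnitude apart, which is why the automation decision has to be made per step.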

Human oversight architecture as a design choice. Most organisations default to HITL (AI proposes, human approves) for every step. That default eliminates the cost advantage of automation. There are three workable architectures, each with a different cost and risk profile (a distinction covered in more detail in Issue #33):

HITL (Human-in-the-Loop): a human approves AI output before action. Maximum accountability, highest latency, highest unit cost. Warranted where the cost of being wrong is severe or irreversible. A related, looser pattern is batch audit: AI acts autonomously and humans periodically review samples after the fact, which lowers operational cost in exchange for delayed error detection.
HOTL (Human-on-the-Loop): AI acts autonomously within defined parameters; humans intervene only on exceptions the system flags as uncertain. Low per-transaction cost, because human time is spent only on flagged cases. Quality depends entirely on the exception triggers: get those wrong and the human never sees the cases that matter.
HIC (Human-in-Command): a human sets boundaries, objectives, and kill-switch conditions but does not supervise individual transactions. The only viable model when the system operates faster than human cognition allows — algorithmic trading, real-time fraud screening, high-frequency agentic loops.
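To make the per-step choice concrete, here is a minimal sketch of one possible decision rule plus a HOTL exception trigger. The rule ordering, the 60-second human review time, the 0.90 confidence floor and the 10,000 amount cap are all hypothetical assumptions for illustration, not thresholds taken from Issue #33 or the EU AI Act.

```python
from enum import Enum


class Oversight(Enum):
    HITL = "human approves each output before action"
    HOTL = "AI acts; humans handle flagged exceptions"
    HIC = "human sets boundaries, objectives and kill-switch conditions"


def choose_oversight(decision_deadline_s: float,
                     severe_or_irreversible: bool,
                     human_review_s: float = 60.0) -> Oversight:
    """Hypothetical rule of thumb mirroring the three patterns described above."""
    if decision_deadline_s < human_review_s:
        # The step must act faster than a person can review it,
        # so humans govern the operating envelope instead.
        return Oversight.HIC
    if severe_or_irreversible:
        return Oversight.HITL
    return Oversight.HOTL


def hotl_route(confidence: float, amount: float,
               min_confidence: float = 0.90,
               max_auto_amount: float = 10_000.0) -> str:
    """Exception trigger for a HOTL step: returns 'auto' or 'escalate'.

    Confidence alone misses confidently wrong outputs, so a second,
    model-independent trigger (here: transaction amount) is added.
    """
    if confidence < min_confidence or amount > max_auto_amount:
        return "escalate"
    return "auto"


print(choose_oversight(decision_deadline_s=0.05, severe_or_irreversible=True).name)    # HIC
print(choose_oversight(decision_deadline_s=86_400, severe_or_irreversible=False).name)  # HOTL
print(hotl_route(confidence=0.97, amount=25_000))                                       # escalate
```

The second trigger in `hotl_route` is the design point: if the only trigger is the model’s own uncertainty, the cases the model gets confidently wrong never reach a human.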

The redesign question for any given step: which architecture is warranted here, and what process restructuring makes that transition safe? Answering that per step is the work BPMN does not do.

Who runs each step

The redesign is incomplete without deciding who, or what, executes each step. The default in early enterprise pilots is to use the most capable frontier model available, on the logic that better quality reduces risk. For high-volume steps this is the wrong calculus.

The principle is mundane and frequently ignored: use the smallest, cheapest model that meets the quality bar for each step. A document classifier does not need a reasoning model. An expense categoriser does not need GPT-5-class capability. Routing the right model to the right step is the difference between an AI programme with positive unit economics and one that looks good in a quarterly deck and bleeds money in production.

Current practice already shows why this matters. Gemini 3.5 is roughly twice the per-token cost of Gemini 3.1 and uses twice as many tokens per task on average, which makes it roughly four times as expensive per task. Frontier providers operate inference at thin or negative margins relative to the total cost of building the underlying capability. The direction prices move from here is not settled. Designs that assume frontier inference will keep getting cheaper per unit of work are placing a bet that may not pay.
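Per-task cost is the product of tokens per task and price per token, so the two multipliers compound rather than add. In the sketch below only the two 2x ratios come from the paragraph above; the 4,000-token baseline and the price of 1.0 per million tokens are hypothetical placeholders.

```python
def cost_per_task(tokens_per_task: float, price_per_million_tokens: float) -> float:
    """Inference cost of one task in currency units."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


# Hypothetical baseline for the older model; only the 2x ratios come from the text.
baseline = cost_per_task(tokens_per_task=4_000, price_per_million_tokens=1.0)
newer = cost_per_task(tokens_per_task=2 * 4_000, price_per_million_tokens=2 * 1.0)

print(f"cost ratio per task: {newer / baseline:.1f}x")  # cost ratio per task: 4.0x
```

The ratio does not depend on the baseline chosen, which is why a step designed around one model class can quietly quadruple in cost when it is upgraded.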

How fragile those economics are is easier to see in a worked example. Acropolium modelled an enterprise AI agent programme handling three million customer interactions a year. The business case assumes AI handles 50% of those interactions end-to-end, producing 575% ROI over the programme lifetime. If actual deployment achieves only 40% automation, lifetime ROI falls to roughly 440%. If it achieves 60%, ROI rises to roughly 680%. A ten-point miss in either direction on the single assumption most directly tied to the redesign work moves the programme’s lifetime ROI by 105 to 135 points. Programme economics are fragile to that one number, and the redesign work is what sets it.
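The swing can be read directly off the three figures quoted for the Acropolium scenario. This sketch does not reproduce Acropolium’s underlying model; it only computes the differences between the reported ROI values, which shows that the downside (135 points) is larger than the upside (105 points).

```python
# Lifetime ROI (%) at each automation rate, as reported for the Acropolium scenario.
roi_by_automation_rate = {0.40: 440, 0.50: 575, 0.60: 680}

base = roi_by_automation_rate[0.50]
downside = roi_by_automation_rate[0.40] - base
upside = roi_by_automation_rate[0.60] - base

print(f"10-point shortfall: {downside:+d} ROI points")  # -135
print(f"10-point overshoot: {upside:+d} ROI points")    # +105
```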

Two design implications follow. Classify each step’s quality requirement before assigning a model class, and test that a cheaper model fails the requirements before assuming the expensive one is needed. Then design the human/AI ratio at each step so it can be adjusted without rewriting the process. This is Hammer’s principle applied to economic risk. Do not embed assumptions you cannot tune.
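Both implications can be sketched in a few lines: try candidate models from cheapest to most expensive and stop at the first that clears the step’s quality bar, then record the choice and the human/AI ratio in configuration rather than in process logic. The model labels, costs, scores and the `evaluate` callable are hypothetical; in practice `evaluate` would be your own harness run on a labelled sample of real cases.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model label
    cost_per_1k_tasks: float  # quoted or measured inference cost


def cheapest_passing(candidates: Sequence[Candidate],
                     evaluate: Callable[[Candidate], float],
                     quality_bar: float) -> Optional[Candidate]:
    """Try models from cheapest to most expensive; return the first that meets the bar.

    `evaluate` stands in for your own test harness, for example accuracy
    on a labelled sample of real cases for this step.
    """
    for candidate in sorted(candidates, key=lambda c: c.cost_per_1k_tasks):
        if evaluate(candidate) >= quality_bar:
            return candidate
    return None  # nothing passes: redesign the step or keep it manual


# Tunable settings live in configuration, not in the process logic,
# so model class and human/AI ratio can change without a rewrite.
STEP_CONFIG = {
    "expense_categorisation": {"model": None, "ai_share": 0.5, "quality_bar": 0.95},
}

# Hypothetical evaluation scores on a labelled sample.
scores = {"small": 0.93, "medium": 0.96, "large": 0.97}
candidates = [Candidate("large", 40.0), Candidate("small", 2.0), Candidate("medium", 8.0)]

cfg = STEP_CONFIG["expense_categorisation"]
choice = cheapest_passing(candidates, lambda c: scores[c.name], cfg["quality_bar"])
cfg["model"] = choice.name if choice else None
print(cfg)
```

Here the small model is tested first and fails the 0.95 bar, so the medium model is selected and the large one is never needed. Re-running the same search after a price change or a new model release is a configuration update, not a process rewrite.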

What AI-ready discovery looks like

If the BPMN drawing tool is insufficient, the replacement is a different kind of discovery exercise, not a different drawing tool.

The minimum viable discovery stack for an enterprise serious about AI redesign runs three layers in parallel rather than as separate exercises. The first is a process-mining layer that reads actual execution traces from system logs (ERP, CRM, ticketing) and reconstructs the real process. Most organisations discover at this point that the documented process and the executed process diverge by far more than the leadership team assumed, and in ways that change the automation calculus. The second is a structured interview layer, run by a properly prompted LLM (supported by human analysts) that knows what to ask about implicit decisions, exception handling, and the unwritten judgment built into how the work actually happens. The interview captures what BPMN cannot. The third is a mapping layer that classifies each discovered step against the three lenses above and assigns a target execution architecture.
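The core of the process-mining layer is simple to illustrate: group log events by case, order them by timestamp, and count the distinct paths (variants) and the directly-follows transitions between activities. The hand-rolled sketch below uses a hypothetical three-ticket log; production tools (commercial platforms or the open-source PM4Py library) do far more, including conformance checking against a reference model.

```python
from collections import Counter, defaultdict

# Hypothetical event log extracted from a ticketing system: (case_id, activity, timestamp).
events = [
    ("T1", "receive", 1), ("T1", "triage", 2), ("T1", "resolve", 3),
    ("T2", "receive", 1), ("T2", "triage", 2), ("T2", "escalate", 3), ("T2", "resolve", 5),
    ("T3", "receive", 1), ("T3", "resolve", 2),
]

documented = ("receive", "triage", "resolve")  # the path on the as-is diagram


def variants(log):
    """Group events by case, order by timestamp, and count each distinct path."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    return Counter(tuple(a for _, a in sorted(evts)) for evts in by_case.values())


def directly_follows(variant_counts):
    """Count how often activity A is immediately followed by activity B."""
    dfg = Counter()
    for path, n in variant_counts.items():
        for a, b in zip(path, path[1:]):
            dfg[(a, b)] += n
    return dfg


v = variants(events)
print(f"cases following the documented path: {v[documented] / sum(v.values()):.0%}")
for (a, b), n in sorted(directly_follows(v).items()):
    print(f"{a} -> {b}: {n}")
```

Even in this toy log, only one case in three follows the documented path: one ticket skips triage entirely and another adds an escalation the diagram never shows.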

The output of all this is an AI-ready process specification rather than a swimlane diagram: each step labelled by decision type, error cost band, oversight pattern (HITL / HOTL / HIC), target model class, and the conditions under which any of those should be revisited.
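One way to hold such a specification is as structured records rather than a drawing, so that it can be queried and checked. The field names, enum values, example steps and the single consistency check below are hypothetical; the check simply encodes the earlier point that an implicit-judgment decision needs a human approving it or a redesign that makes the judgment explicit.

```python
from dataclasses import dataclass, field
from enum import Enum


class DecisionType(Enum):
    EXPLICIT = "documented rules"
    IMPLICIT = "experience, context or judgment"


class ErrorCostBand(Enum):
    RECOVERABLE = 1
    COSTLY = 2
    IRREVERSIBLE = 3


class Oversight(Enum):
    HITL = "HITL"
    HOTL = "HOTL"
    HIC = "HIC"


@dataclass
class StepSpec:
    step: str
    decision_type: DecisionType
    error_cost: ErrorCostBand
    oversight: Oversight
    model_class: str  # hypothetical labels such as "small classifier"
    revisit_when: list[str] = field(default_factory=list)


def review_flags(specs: list[StepSpec]) -> list[str]:
    """Flag steps where implicit judgment runs without a human approving each output."""
    return [s.step for s in specs
            if s.decision_type is DecisionType.IMPLICIT and s.oversight is not Oversight.HITL]


spec = [
    StepSpec("categorise expense", DecisionType.EXPLICIT, ErrorCostBand.RECOVERABLE,
             Oversight.HOTL, "small classifier",
             ["exception rate above 3% for a month", "model price change"]),
    StepSpec("approve credit limit", DecisionType.IMPLICIT, ErrorCostBand.IRREVERSIBLE,
             Oversight.HITL, "frontier reasoning model",
             ["underwriting policy change"]),
]

print(review_flags(spec))  # []
```

The `revisit_when` field is what turns the specification from a snapshot into something governable: each entry is a condition under which the step’s oversight pattern or model class should be re-examined.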

Where to start

BPMN remains useful as a starting point for the process map, provided it is treated as exactly that: a starting point, not the finished output, which will look very different. The practical sequence:

Discover what is actually happening (process mining over documented mapping, wherever logs exist).
Apply the three lenses to each step: decision type, error cost, choice of oversight architecture (HITL / HOTL / HIC).
Decide the execution layer for each step: model class, human/AI ratio, tunability.
Classify steps: automate as-is, automate with redesigned oversight, redesign process before automating, leave manual.
Begin with steps that are high-volume, explicit-decision, and recoverable-error. Design them from day one as HOTL (intervention on exceptions) with a swappable model class. High volume is exactly where price and model changes bite hardest. Build for that on the first deployment, not the third.
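The classification and first-wave steps can be sketched as two small functions. The sequence above names the four categories and the first-wave criteria but not the exact mapping, so the rules and the 10,000-per-month volume threshold below are hypothetical assumptions; the example steps and volumes are illustrative only.

```python
from dataclasses import dataclass

HIGH_VOLUME = 10_000  # hypothetical threshold: transactions per month


@dataclass
class Step:
    name: str
    monthly_volume: int
    explicit_decision: bool  # follows documented rules?
    recoverable_error: bool  # can a wrong output be corrected cheaply?


def classify(step: Step) -> str:
    """Hypothetical mapping onto the four categories in the sequence above."""
    if not step.explicit_decision and step.monthly_volume < HIGH_VOLUME:
        return "leave manual"
    if not step.explicit_decision:
        return "redesign process before automating"
    if not step.recoverable_error:
        return "automate with redesigned oversight"
    return "automate as-is"


def first_wave(steps: list[Step]) -> list[Step]:
    """High-volume, explicit-decision, recoverable-error steps, largest volume first."""
    picks = [s for s in steps
             if s.monthly_volume >= HIGH_VOLUME and s.explicit_decision and s.recoverable_error]
    return sorted(picks, key=lambda s: s.monthly_volume, reverse=True)


steps = [
    Step("invoice matching", 50_000, True, True),
    Step("expense categorisation", 20_000, True, True),
    Step("regulatory filing", 200, True, False),
    Step("key-account pricing", 300, False, False),
    Step("claims triage", 30_000, False, True),
]

for s in steps:
    print(f"{s.name}: {classify(s)}")
print("first wave:", [s.name for s in first_wave(steps)])
```

“Automate as-is” here means no process redesign before deployment, not no oversight: the first-wave steps would still be built as HOTL with a swappable model class, as the last step of the sequence recommends.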

The output is a sequenced automation roadmap that specifies what process changes must precede technical deployment, and what model is assigned to each step (with the governance triggers for revisiting that choice as conditions shift).

This is the same sequencing logic argued in Issue #46. The first deployment locks the pattern. Choose accordingly.

The Briefing

MIT Technology Review published “Enabling agent-first process redesign” in April, making the same argument in slightly different language. Scott Rodgers (global chief architect, Deloitte Microsoft Technology Practice) frames the operating-model shift in one line: humans as governors, agents as operators. The piece also concedes what most vendor coverage avoids: bolting AI agents onto fragmented legacy workflows with traditional optimisation methods repeats the failure mode of every previous IT modernisation wave. The “agent-first” framing is where the analyst layer is now converging.

Ed Zitron’s analysis of leaked Microsoft revenue-share data puts OpenAI’s inference economics deeply negative on a per-revenue-dollar basis, and a Google researcher’s early-2026 paper identifies inference as the primary bottleneck preventing frontier providers from reaching profitability. Inference now accounts for roughly 85% of enterprise AI budgets. Agentic systems, which decompose a single user request into many model calls, consume an estimated 5-30x more tokens per task than a standard chatbot turn, a multiplier that lands directly on whichever organisation is using the agents. The direction of travel for prices is not a settled question.

Questions for your leadership team

Is the team preparing our “AI process map” working with a drawing tool, or with a decision framework? What specifically do we expect to read off that map before deployment?
For each process being considered for automation, which decisions rest on documented rules and which on expert judgment? Who classified them, and when?
Where would the cost of an error be irreversible (regulatory, financial, reputational)? Does our human-oversight design at those points come from analysis, or from the default of “AI proposes, human approves” applied everywhere?
For each planned deployment, which model are we running and why that one rather than a cheaper alternative? Have we tested that a cheaper model fails the bar, or are we assuming?

Summary

The organisations capturing returns from AI made two design decisions where most made one. The first concerns what the process should look like once it is no longer constrained by the humans who used to run it. The second concerns how much of that design should remain tunable as the cost curve, the model class, and the quality bar move underneath it. A process map produced from interviews with the current operators answers neither question. It documents the current state, which is precisely the state the redesign is meant to leave behind.

Stay balanced,
Krzysztof Goworek

Krzysztof Goworek is founder of Quintant — AI governance and EU AI Act advisory for regulated enterprises.