Issue #52 — The Dependency You Don't Control

Use only as much AI as you really need

Dear Reader,

In early June, Microsoft’s Experiences + Devices division told thousands of its engineers to stop using Claude Code and move to GitHub Copilot. The tool had not failed. It had worked so well that engineers reached for it constantly, and because Claude Code bills enterprises by the token, the monthly cost climbed to roughly $2,000 per engineer. The division’s annual AI budget evaporated within a quarter. A few days later, on the night of 12 June, Anthropic switched off its two most capable models, Fable 5 and Mythos 5, for every user in the world. The US Commerce Department had issued an export-control directive barring the company from serving them to any foreign national, including its own non-citizen staff, which meant that even Andrej Karpathy, who had recently joined Anthropic, was locked out of the models. Because Anthropic could not cleanly separate foreign users from the rest, it switched the models off completely. Plenty of developers had already adopted them because the quality was very good; on Saturday they woke up to find the models gone.

In one case the cost ran out of control; in the other the product disappeared overnight. That frontier models can become uneconomically expensive has been discussed for a while. We have also seen what looks like deliberate lowering of a model’s reasoning quality. But this was the first time a top model had been switched off completely, overnight.

A technology that has not settled

It is worth saying why generative AI has these problems when the rest of the enterprise software stack does not. Software-as-a-service and cloud computing had their own unstable adolescence, and then they settled into something closer to utilities. They still raise prices and rewrite their terms, but they tend to do it on published schedules and within ranges a finance team can plan around, which is why a company can build a multi-year dependency on AWS or Salesforce without losing much sleep over it. Generative AI has not reached that stage, for two reasons.

The first is that the technology is still young and moves on every axis at once: capability, price and architecture all change from one quarter to the next. The assumption that inference cost will always fall, which is how technological progress usually shows up, no longer holds. Fable 5 lists at $50 per million output tokens, and because the newer Claude models use a tokeniser that produces up to 35% more tokens for the same text, a model whose headline price has not moved can still cost up to a third more per request. Agents sharpen the effect, since a single agentic task consumes five to thirty times the tokens of an ordinary chatbot exchange.
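To make the arithmetic concrete, here is a rough sketch in Python. The price, the 35% tokeniser effect and the 5x to 30x agent multiplier come from the paragraph above; the 1,000-token baseline reply is an assumption chosen purely for illustration, and input tokens are ignored.

```python
# Figures taken from the text above.
PRICE_PER_MILLION_OUTPUT_USD = 50.0   # Fable 5 list price for output tokens
TOKENISER_INFLATION = 0.35            # "up to 35% more tokens" (worst case)
AGENT_MULTIPLIERS = (5, 30)           # agentic task vs ordinary chat exchange

# Assumption for illustration only: an ordinary reply of 1,000 output tokens.
BASELINE_OUTPUT_TOKENS = 1_000


def output_cost_usd(tokens: float,
                    price_per_million: float = PRICE_PER_MILLION_OUTPUT_USD) -> float:
    """Cost of generating `tokens` output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Same text, old tokeniser vs new tokeniser, same headline price.
chat_cost = output_cost_usd(BASELINE_OUTPUT_TOKENS)
inflated_chat_cost = output_cost_usd(BASELINE_OUTPUT_TOKENS * (1 + TOKENISER_INFLATION))

# The same inflated exchange scaled up to an agentic task (low and high end).
agent_cost_range = tuple(
    output_cost_usd(BASELINE_OUTPUT_TOKENS * (1 + TOKENISER_INFLATION) * m)
    for m in AGENT_MULTIPLIERS
)

print(f"chat exchange:        ${chat_cost:.4f}")
print(f"with new tokeniser:   ${inflated_chat_cost:.4f}")
print(f"agentic task (range): ${agent_cost_range[0]:.4f} to ${agent_cost_range[1]:.4f}")
```

Under these assumptions a five-cent exchange becomes roughly seven cents with no change in list price, and a single agentic task lands somewhere between about 34 cents and two dollars. Multiplied across thousands of engineers and many tasks a day, that is how a budget disappears inside a quarter.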

The second reason matters more: generative AI reaches into the functioning of society far more deeply than any earlier wave of IT. A SaaS outage is an inconvenience for the people using that particular tool, whereas a model that can be talked into writing working exploit code is a different kind of object, one a government treats as a matter of national security. Quite what triggered the Fable 5 order is still contested. The official reason was the discovery of a way to defeat the model’s guardrails, though the researcher involved says the work was defensive rather than an attack, and Amazon, one of Anthropic’s largest investors, had reportedly flagged its own concerns beforehand. The order also lands in the middle of a months-long standoff between Anthropic and the administration, which may suggest retaliation. And Anthropic had spent the past three months telling the world how dangerous this model was, so it could hardly act surprised when a government eventually took the warning at face value. Cloud computing was never subject to an export-control order.

The practical conclusion: you should not plan as though AI will settle into a calm, predictable utility on the timeline that cloud did. For the foreseeable future, its price and its availability will stay hard to predict.

The exposure ladder

If the dependency is going to stay volatile, the question is how much of your business it is worth resting on it. Almost any task you might hand to a frontier model can be done at a lower level of risk. Picture the options as rungs on a ladder, ordered by how much you depend on an outside AI vendor.

At the bottom is plain deterministic code — rules and validations. It uses no model, so there is no token bill and no AI vendor who can cut off access. What remains is ordinary software risk, and its level depends on whether you run the code on-premise or on SaaS. Above that is a small or open-weights model: cheaper than the frontier and easier to replace, so a price rise or a loss of access lands more lightly. At the top is the frontier API: the most capability, but the most dependence on its price and its availability.
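For teams that want to audit their own workloads against the ladder, a minimal sketch might look like the following. The three rungs are the ones described above; the `Workload` record and the sample inventory are hypothetical and exist only to show the comparison.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    """Rungs of the exposure ladder, ordered by dependence on an outside AI vendor."""
    DETERMINISTIC_CODE = 1
    SMALL_OR_OPEN_MODEL = 2
    FRONTIER_API = 3


@dataclass(frozen=True)
class Workload:
    name: str
    current: Rung        # rung the workload runs on today
    lowest_viable: Rung  # lowest rung that would still do the job


def over_exposed(workloads: list[Workload]) -> list[Workload]:
    """Return the workloads that sit higher on the ladder than they need to."""
    return [w for w in workloads if w.current > w.lowest_viable]


# Hypothetical inventory, for illustration only.
inventory = [
    Workload("invoice field validation", Rung.FRONTIER_API, Rung.DETERMINISTIC_CODE),
    Workload("support ticket triage", Rung.FRONTIER_API, Rung.SMALL_OR_OPEN_MODEL),
    Workload("open-ended market analysis", Rung.FRONTIER_API, Rung.FRONTIER_API),
]

for w in over_exposed(inventory):
    print(f"{w.name}: {w.current.name} -> {w.lowest_viable.name}")
```

The code itself is trivial. The hard part is filling in the `lowest_viable` column honestly, which is a judgment about each task and not something a script can decide.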

The easiest thing has been to make the top rung the default, because it is the simplest to adopt. The frontier API can give a sensible answer to almost any question, so it ends up being asked every question. The result is that we run far more of our operations than we need to on a tool that is outside our control. The fix is the principle attributed to Einstein: make everything as simple as possible, but not simpler.

Climbing down

Use deterministic rules wherever the task is deterministic. That covers the large majority of tasks — validating a field, routing a request on known conditions, pulling data against a stable schema, checking a value against a threshold. A rule solves these exactly, instantly and with no cost per use. We also have proven, stable, well-studied machine-learning models that work very well for decision and prediction tasks. Part of the credibility of an AI programme comes from being able to say where AI is not needed.
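As an illustration of how little machinery such tasks need, here is a sketch of a field validation, a threshold check and a routing decision in plain Python. The IBAN field, the approval limit and the queue names are all hypothetical, and the pattern checks only the overall shape of an IBAN, not its checksum.

```python
import re

# Hypothetical rules; the field names, limit and queue names are illustrative.
IBAN_SHAPE = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$")
APPROVAL_LIMIT_EUR = 10_000


def is_valid_iban_shape(value: str) -> bool:
    """Check the overall shape of an IBAN (country code, check digits, body)."""
    return bool(IBAN_SHAPE.match(value.replace(" ", "").upper()))


def route_request(request: dict) -> str:
    """Route a payment request on known conditions, with no model involved."""
    if not is_valid_iban_shape(request.get("iban", "")):
        return "reject"
    if request["amount_eur"] > APPROVAL_LIMIT_EUR:
        return "manual_approval"
    return "auto_process"


print(route_request({"iban": "DE89 3704 0044 0532 0130 00", "amount_eur": 250}))
print(route_request({"iban": "DE89 3704 0044 0532 0130 00", "amount_eur": 25_000}))
print(route_request({"iban": "not-an-iban", "amount_eur": 250}))
```

The same input always produces the same answer, there is no token bill, and nothing in this code can be switched off by a third party.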

It is, on the other hand, worth using GenAI tools to generate that deterministic code quickly. They are good at it, and this is a design-time job, not a run-time one, so the tool being unavailable does not stop your operations.

Where a GenAI model genuinely is required, use the smallest one that clears the quality bar for that step. For capability you do not use, you pay twice — once in price and once in the dependency you build on the vendor. Take a simple classification task, sorting incoming support tickets by type. On a frontier model it might cost a couple of cents a call; on a well-chosen mid-size model it costs a fraction of that, at the same quality. The saving is obvious. Less obvious is that a smaller model is also a smaller dependency. An open-weights model such as MiniMax’s M3 — which recent coverage puts on a par with frontier models on several benchmarks at a fraction of the price — can be downloaded and run locally. Then no export decision and no overnight repricing can take it away from you.
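One way to turn "the smallest model that clears the quality bar" into a repeatable choice is sketched below. It assumes you have already measured each candidate's accuracy on your own labelled sample for that step. The candidate names, costs and accuracy figures are invented placeholders, not benchmark results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_call_usd: float
    accuracy: float        # measured on your own labelled sample for this step
    self_hostable: bool    # can it be downloaded and run locally?


def cheapest_that_clears(candidates: list[Candidate],
                         quality_bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose accuracy meets the bar, or None."""
    eligible = [c for c in candidates if c.accuracy >= quality_bar]
    return min(eligible, key=lambda c: c.cost_per_call_usd, default=None)


# Placeholder figures for a ticket-classification step, for illustration only.
candidates = [
    Candidate("frontier-api", cost_per_call_usd=0.02, accuracy=0.97, self_hostable=False),
    Candidate("mid-size-api", cost_per_call_usd=0.002, accuracy=0.96, self_hostable=False),
    Candidate("small-open-weights", cost_per_call_usd=0.0005, accuracy=0.90, self_hostable=True),
]

choice = cheapest_that_clears(candidates, quality_bar=0.95)
print(choice.name if choice else "no candidate clears the bar")
```

This sketch ranks on price alone. A team that weighs the dependency as heavily as the text suggests could prefer a `self_hostable` candidate whenever one clears the bar, even at a somewhat higher cost per call.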

A further case applies to a narrow band of work — tasks that are both high in volume and critical to keep running. For those it can be worth running a small model on your own infrastructure. Then you do not pay per use and no one outside can switch the model off. This makes sense where two conditions hold at once: the task is narrow enough that a small model can do it, and you have a team able to run and maintain that model in production. You trade some of the frontier model’s capability for control — a good trade for a payments-reconciliation step, a bad one for open-ended analysis. The Fable 5 order was, in the end, a US government decision about who may use an American model, and an open-weights model on your own infrastructure is the one option a foreign government’s export policy cannot reach.

Briefing

Mistral AI used its AI Now summit in Paris in late May to set out a full-stack strategy, launching its Vibe agent platform, pushing into industrial AI for aerospace and automotive, and detailing a data-centre build-out, with chief executive Arthur Mensch saying the company will explore designing its own chips. For European enterprises weighing concentration risk, the signal is clear: the continent now has a frontier-scale provider planning to cover the whole stack, from silicon to agents, rather than a thin layer wrapped around American models.

Microsoft, meanwhile, has started pulling away from its closest partner: its AI chief said the company has been “set free” from OpenAI to pursue superintelligence on its own MAI models, ending the period in which Microsoft’s AI strategy and OpenAI’s were effectively the same thing. The firm that built the strongest position in enterprise AI by leaning on a single provider now treats that dependence as something to grow out of.

Anthropic confidentially filed for a public listing at a valuation near $965 billion, weeks before the government switched off its flagship models.

Questions for your leadership team

  1. For our five largest AI workloads, which rung of the exposure ladder is each one on, and is it the lowest rung that would actually do the job? How many of them are deterministic problems we are solving with a probabilistic, metered model?
  2. If our primary model doubled in price or went dark tonight, which workflows would stop, and what would the business impact be? Do we have a plan for that situation?
  3. Do we have the engineering capability to run a model ourselves where it genuinely matters, or is our dependence on a single external provider effectively permanent?
  4. Are any of our critical workflows running on a model whose availability is decided by a foreign government’s export policy? If so, do we understand and accept the risk?

Summary

The price and availability of frontier models will keep moving for as long as the technology keeps developing fast and is treated as a geopolitical advantage, which is to say for a good while yet. What is within our control is how much of the business rests on them. The aim is to use the cheapest and most stable tool that will do each job. That way we manage the risk of unpredictable costs and politically driven decisions far better, and our processes become more predictable and efficient. The catch is that getting there takes more work than simply handing the decisions to an LLM.

Stay balanced, Krzysztof Goworek

Krzysztof Goworek is founder of Quintant — AI advisory that gets enterprises from experiment to production value.