Claude Opus 4.6 Performance Decline: What Businesses Must Learn
Claude Opus 4.6 lost 67% of its thinking depth while users still pay $200/month. Vendor lock-in with AI tools: 5 concrete recommendations for mid-sized companies.

Your ERP vendor swaps out the database backend overnight. The interface looks the same, but queries take twice as long and reports come back with gaps. No changelog, no announcement. That is exactly what is happening right now with Anthropic’s flagship model Claude Opus 4.6 – and it affects everyone running AI-assisted development or automation in production. This topic falls squarely into my AI & Automation consulting for mid-sized companies.
Thinking Depth in Freefall: The Performance Data on Claude Opus 4.6
On April 2, 2026, Stella Laurenzo, Senior Director AI at AMD, published a detailed analysis in GitHub issue #42796. Not opinion, but data: 6,852 sessions, 234,760 tool calls, 17,871 thinking blocks. The results are clear.
| Metric | Before | After | Change |
|---|---|---|---|
| Median thinking depth (characters) | 2,200 | 720 | -67% |
| Read:Edit ratio | 6.6 | 2.0 | -70% |
| Blind edits (edits without reading first) | 6.2% | 33.7% | +444% |
What does this mean in practice? The agent reads files less often before editing them. It thinks for shorter stretches before responding. A third of all code changes happen blind, without the agent having read the existing code first. The Read:Edit ratio – a measure of how thoroughly the agent works – dropped from 6.6 to 2.0. AMD switched providers.
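As a rough illustration of how such metrics can be derived from a tool-call log, here is a minimal sketch. The `ToolCall` record, the `"read"`/`"edit"` labels, and the definition of a blind edit (an edit to a file not previously read in the same session) are assumptions made for this example; the original analysis may define them differently. `pct_change` reproduces the last column of the table.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    session: str  # session identifier (hypothetical field)
    tool: str     # "read" or "edit" (hypothetical labels)
    path: str     # file the call touched


def session_metrics(calls: list[ToolCall]) -> dict[str, float]:
    """Compute the Read:Edit ratio and the share of blind edits."""
    reads = edits = blind = 0
    seen: set[tuple[str, str]] = set()  # (session, path) pairs already read
    for call in calls:
        key = (call.session, call.path)
        if call.tool == "read":
            reads += 1
            seen.add(key)
        elif call.tool == "edit":
            edits += 1
            if key not in seen:
                blind += 1  # edited without a prior read in this session
    return {
        "read_edit_ratio": reads / edits if edits else float("inf"),
        "blind_edit_share": blind / edits if edits else 0.0,
    }


def pct_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100


print(round(pct_change(2200, 720)))  # -67
print(round(pct_change(6.6, 2.0)))   # -70
print(round(pct_change(6.2, 33.7)))  # 444
```

Tracking these numbers on your own sessions gives you a baseline that does not depend on the vendor's communication.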
Adaptive Thinking and Quiet Parameter Changes
What happened? Anthropic made two changes that together explain the performance drop.
February 9, 2026: Introduction of “Adaptive Thinking.” The model automatically adjusts its thinking depth based on the perceived complexity of a task. Sounds reasonable, but there is a catch: the model routinely underestimates the complexity of real-world tasks.
March 3, 2026: The default thinking effort was quietly lowered from “high” to “medium.” There was no prominent announcement. Unless you actively tracked release notes on GitHub, you would not have noticed.
Think of it like a supplier silently switching materials. You ordered and received steel grade X. The next batch looks identical, but load-bearing capacity has dropped. The part only fails under stress.
Confirmed bug: Anthropic engineer Boris Cherny additionally acknowledged that the model produces zero thinking tokens in certain cases, responding with no reasoning at all.
The Rental Car Swap: Model Switches Without Warning
Thinking depth is only part of the problem. On GitHub, Issues #30350, #31480, and #19468 document another incident: users who requested Opus received Sonnet responses in certain cases. The smaller, cheaper model instead of the paid premium one.
Anyone who has ever booked a rental car knows the situation. You reserve a BMW 5 Series, but a 3 Series is waiting at the counter. The agent says: “It’s still a BMW.” Technically correct. Practically a different product. There was no public announcement and no changelog entry for these swaps at Anthropic. The Max Plan price stayed unchanged at $200 per month.
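One way to notice such a swap from your own logs is to compare the model you requested with the model each response reports. The sketch below assumes that every logged response is a dictionary with a `model` field and that model IDs may carry a date suffix, hence the prefix match. The IDs shown are placeholders, not verified identifiers.

```python
def model_matches(requested: str, reported: str) -> bool:
    """True if the reported model ID starts with the requested model ID."""
    return reported.lower().startswith(requested.lower())


def audit_responses(requested: str, responses: list[dict]) -> list[dict]:
    """Return the responses whose reported model differs from the request.

    A response without a "model" field is flagged as well, because the
    delivered model cannot be verified.
    """
    return [
        r for r in responses
        if not model_matches(requested, r.get("model", ""))
    ]


# Placeholder IDs for illustration only.
logged = [
    {"id": "a", "model": "claude-opus-4-6-20260101"},
    {"id": "b", "model": "claude-sonnet-4-6"},
]
for mismatch in audit_responses("claude-opus-4-6", logged):
    print("unexpected model:", mismatch["id"], mismatch["model"])
```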
Benchmark Data: Opus 4.6 Behind Sonnet and Its Own Predecessor
The problems are backed by data. Marginlab runs an automated daily tracker at marginlab.ai/trackers/claude-code/ that compares AI coding models. The current ranking:
| Rank | Model | Score | Status |
|---|---|---|---|
| 1 | Sonnet 4 (May 2025) | 70 | STBL (stable) |
| 3 | Opus 4.5 | 68 | STBL (stable) |
| 8 | Opus 4.6 | 62 | VOLA ALERT (volatile) |
The most expensive model lands at rank 8, behind its own predecessor and even behind the significantly cheaper Sonnet 4. The “VOLA ALERT” status means results fluctuate heavily between measurement days. Add to that several status page incidents: February 28, March 31, April 3-4, April 10.
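How Marginlab computes its volatility status is not described here. A simple stand-in is the standard deviation of daily scores compared against a threshold. In the sketch below, the function name, the threshold of 3 points, and the sample scores are all invented for illustration.

```python
from statistics import pstdev


def volatility_status(daily_scores: list[float], max_stdev: float = 3.0) -> str:
    """Label a score series "volatile" if its spread exceeds max_stdev.

    max_stdev is an assumed threshold, not the tracker's actual rule.
    """
    if len(daily_scores) < 2:
        raise ValueError("need at least two daily scores")
    spread = pstdev(daily_scores)  # population standard deviation
    return "volatile" if spread > max_stdev else "stable"


# Invented sample series, not tracker data.
print(volatility_status([70, 69, 71, 70]))      # stable
print(volatility_status([62, 55, 68, 58, 66]))  # volatile
```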
In Reddit threads and GitHub discussions, I see a pattern I recognize from ERP projects: users report problems, the vendor denies, data proves otherwise, the vendor partially concedes, then the cycle repeats. A corporate game of telephone where each round of internal escalation dilutes the original complaint a little more.
Vendor Lock-in with AI Tools: The Same Pattern as ERP
As a consultant who has guided mid-sized companies through ERP selection and IT strategy for years, I find this pattern familiar. It is the exact same vendor lock-in problem I described in my article on cloud exit for mid-sized companies.
- Initial enthusiasm. The product convinces, the decision comes fast
- Dependency grows. Teams build workflows around the tool, switching costs rise
- Performance drops or terms change. The vendor knows that switching is expensive
- The customer has no leverage. No SLAs for output quality, no contractual safeguards
With ERP systems, this means: annual license increases of 8-15% that are nearly impossible to negotiate. With AI tools, this means: $200 per month for a product whose quality has measurably declined, with no downward price adjustment.
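To see what an annual increase of 8-15% adds up to, the compounding can be worked out directly. This is plain arithmetic on an index base of 100; the five-year horizon is an assumption chosen for illustration.

```python
def compounded_cost(base: float, annual_increase: float, years: int) -> float:
    """Cost after `years` of a constant relative annual increase."""
    return base * (1 + annual_increase) ** years


# License cost indexed to 100 today, after five years.
print(round(compounded_cost(100, 0.08, 5), 1))  # 146.9
print(round(compounded_cost(100, 0.15, 5), 1))  # 201.1
```

At the upper end of the range, the license cost roughly doubles within five years.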
The difference from traditional ERP lock-in is speed. An ERP system degrades over years. With AI APIs, performance can shift overnight because a parameter was changed on the server. Anyone using AI productively in their mid-sized company needs to be aware of this risk.
The Workaround: What You Can Do Today
After public pressure, Anthropic provided a workaround. In Claude Code, you can either raise the thinking effort to the maximum or disable adaptive thinking:
# Option 1: Slash command in your session
/effort max
# Option 2: Environment variable (persistent)
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
This restores behavior to the level before March 3. It is a bandage, not a solution. You have to actively apply a workaround to get the performance you are paying for. And you have to trust that Anthropic will not quietly adjust the next parameter. If you are interested in a systematic approach to working with AI agents, my article on context engineering with GSD describes a method that reduces this kind of single-point-of-failure risk.
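Because the setting is easy to lose on a new machine or in a CI job, a small preflight check can help. The following sketch only verifies that the environment variable named above is set to `1`. The function name and the idea of running it before an automated job are assumptions; it does not call Claude Code itself.

```python
import os
from collections.abc import Mapping

# Variable name taken from the workaround above.
REQUIRED = {"CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1"}


def missing_settings(
    env: Mapping[str, str],
    required: Mapping[str, str] = REQUIRED,
) -> list[str]:
    """Return the names of required variables that are unset or differ."""
    return [name for name, value in required.items() if env.get(name) != value]


problems = missing_settings(os.environ)
if problems:
    print("workaround not active, missing:", ", ".join(problems))
```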
Five Recommendations Against AI Vendor Lock-in for Mid-Sized Companies
Concrete steps you can take based on these incidents:
- Measure your AI dependency. Document which business processes depend on which AI provider. This is the same exercise as an ERP dependency analysis
- Define an exit strategy. Can you switch to an alternative provider within 30 days? If not, your dependency is too high
- Monitor output quality. Set benchmarks for your specific use cases. When quality drops, you want to know from your own data, not from a Reddit thread
- Evaluate a multi-provider strategy. Building critical workflows on a single AI provider repeats the mistake of a single-vendor ERP strategy
- Push for contractual safeguards. SLAs for output quality are not yet standard. But the more companies demand them, the faster that will change
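The monitoring recommendation above can be sketched as a pass-rate check against your own baseline. The class name, the 5-point tolerance, and the representation of each benchmark task as a pass/fail boolean are assumptions. How you score a task depends on your use case.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    baseline: float          # pass rate measured when quality was acceptable
    tolerance: float = 0.05  # allowed drop before alerting (assumed value)

    def evaluate(self, results: list[bool]) -> tuple[float, bool]:
        """Return (pass_rate, degraded) for one benchmark run."""
        if not results:
            raise ValueError("benchmark run produced no results")
        rate = sum(results) / len(results)
        return rate, rate < self.baseline - self.tolerance


check = QualityCheck(baseline=0.9)
rate, degraded = check.evaluate([True] * 7 + [False] * 3)
print(rate, degraded)  # 0.7 True
```

Run the same fixed task set on a schedule, so a silent server-side change shows up in your own numbers first.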
For Shadow AI in your organization, this applies twice over: when employees build their own AI workflows on a single provider, the risk compounds.
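The multi-provider recommendation can be sketched as a thin abstraction layer: workflows call one function, and the providers behind it are interchangeable. Everything below is hypothetical. Each provider is just a callable from prompt to text, and real client code would be wrapped to fit that shape.

```python
from typing import Callable

# A provider is any function that turns a prompt into a completion.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str,
    providers: list[tuple[str, Provider]],
) -> tuple[str, str]:
    """Try providers in order and return (provider_name, completion)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real clients.
def primary(prompt: str) -> str:
    raise TimeoutError("unavailable")


def secondary(prompt: str) -> str:
    return prompt.upper()


print(complete_with_fallback("hi", [("primary", primary), ("secondary", secondary)]))
```

The same indirection shortens an exit: switching vendors means adding one wrapper, not rewriting every workflow.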
FAQ: Claude Opus 4.6 and AI Performance
What exactly happened with Claude Opus 4.6?
Anthropic introduced “Adaptive Thinking” on February 9, 2026 and lowered the default thinking effort from “high” to “medium” on March 3. Both without a prominent announcement. The result: 67% less thinking depth, a third of all code edits happening blind. On top of that, there were documented cases where users received the cheaper Sonnet model instead of the paid Opus model.
Does this affect me if I only use Claude through the web interface?
The documented issues primarily affect Claude Code (the CLI tool for developers) and the API. The web interface also uses Opus 4.6, but the specific metrics like the Read:Edit ratio do not apply there. The reduced thinking depth potentially affects all usage channels.
Is the /effort max workaround permanent?
No. It is an active setting you must apply per session or via an environment variable. Anthropic can change server-side parameters at any time. A long-term solution would be a contractual guarantee of model quality, which does not exist yet.
How does Opus 4.6 compare to competitors?
According to the Marginlab tracker, Opus 4.6 ranks 8th with a score of 62 and a “VOLA ALERT” (volatile) status. For comparison: Sonnet 4 (score 70, rank 1) and predecessor Opus 4.5 (score 68, rank 3) deliver more stable results at lower cost.
Next Step
Are you using AI tools in production and want a realistic assessment of your vendor dependency? I take inventory and develop a multi-provider strategy with you, one that does not fall apart at the first API update.
-> Schedule an initial call – free of charge
-> Or read more first: AI & Automation for Mid-Sized Companies