The post Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science appeared on BitcoinEthereumNews.com.

Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science

2026/04/13 21:42

BridgeMind AI claimed Anthropic’s Claude Opus 4.6 was secretly degraded after a hallucination benchmark retest. The viral post has since drawn sharp criticism for flawed methodology.

The claim triggered widespread debate over whether AI companies are quietly downgrading paid models to reduce costs.

BridgeMind Claims a 98% Surge in Hallucinations

BridgeMind, the team behind the BridgeBench coding benchmark, posted that Claude Opus 4.6 had fallen from second to tenth place on its hallucination leaderboard. Accuracy reportedly dropped from 83.3% to 68.3%.
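
The arithmetic behind these figures can be checked directly. The sketch below uses only the two accuracy scores reported above. Treating the hallucination rate as 100 minus accuracy is an assumption made here for illustration, since the article does not say how the headline percentage was derived.

```python
# Accuracy scores reported for Claude Opus 4.6 on the hallucination leaderboard.
accuracy_before = 83.3  # percent, original run
accuracy_after = 68.3   # percent, retest

# Absolute drop in percentage points, and the drop relative to the old score.
point_drop = accuracy_before - accuracy_after
relative_accuracy_drop = point_drop / accuracy_before * 100

# ASSUMPTION (not stated in the article): hallucination rate = 100 - accuracy.
halluc_before = 100 - accuracy_before
halluc_after = 100 - accuracy_after
relative_halluc_increase = (halluc_after - halluc_before) / halluc_before * 100

print(f"Accuracy drop: {point_drop:.1f} points ({relative_accuracy_drop:.1f}% relative)")
print(f"Assumed hallucination rate: {halluc_before:.1f}% -> {halluc_after:.1f}%")
print(f"Relative increase in assumed hallucination rate: {relative_halluc_increase:.1f}%")
```

Under that assumption the relative increase comes out near 90%. The headline figure may rest on a different definition or on unrounded scores.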

The post framed this as proof of “reduced reasoning levels.” However, a closer look at the underlying data tells a different story.

Critics Say the Comparison Is Fundamentally Flawed

Computer scientist Paul Calcraft called the claim “incredibly bad science,” pointing to a critical problem with the methodology.

The original high score came from just six benchmark tasks. The new retest expanded the benchmark to 30 tasks, so the two headline scores were measured on different task sets.

On the six overlapping tasks, performance was nearly identical, dropping only from 87.6% to 85.4%.

That small swing came mostly from a single extra fabrication in one task. With no repeated runs, this falls well within normal statistical variance for AI models.

Large language models are not deterministic, and one bad output on a small sample can shift results significantly.
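
A rough way to see why small samples matter is the standard error of a proportion. The sketch below is a simplification, not BridgeBench's own scoring: it assumes each task is an independent pass/fail trial, and the function name `standard_error` is introduced here for illustration. The two accuracy figures are the ones reported for the six overlapping tasks.

```python
import math


def standard_error(accuracy: float, n_tasks: int) -> float:
    """Binomial standard error of an accuracy measured over n_tasks trials.

    ASSUMPTION: every task is an independent pass/fail trial, which real
    benchmark tasks with partial scoring only approximate.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tasks)


# Scores reported on the six tasks shared by both runs.
before, after = 0.876, 0.854
observed_gap = before - after

for n_tasks in (6, 30):
    se = standard_error(before, n_tasks)
    print(
        f"n={n_tasks}: standard error ~ {se * 100:.1f} points, "
        f"observed gap = {observed_gap * 100:.1f} points"
    )
```

With only six tasks, the estimated standard error is several times larger than the 2.2-point gap, which is consistent with the critics' point that a single run cannot distinguish this change from noise.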

Broader Frustrations Fuel the Narrative

Still, the post struck a nerve. Since its February 2026 launch, Claude Opus 4.6 has faced persistent complaints about perceived quality decline.

Developers report shorter responses, weaker instruction-following, and reduced reasoning depth during peak hours.

Some of this traces to deliberate product changes. Anthropic introduced adaptive thinking controls that let the model self-adjust its reasoning budget. The default effort level was later set to medium, prioritizing efficiency over maximum depth.

An independent analysis of over 6,800 Claude Code sessions found reasoning depth dropped roughly 67% by late February.

The model’s file-read ratio before editing code fell from 6.6 to 2.0. That suggests it attempted fixes on code it had barely reviewed.

What This Means for AI Users

The episode reflects a growing tension in the AI industry. Companies optimize models for cost and scale after launch, while heavy users expect consistent peak performance. The gap between those priorities erodes trust.

Based on the available evidence, the BridgeBench data does not prove a deliberate downgrade. The benchmark comparison was apples-to-oranges, and the overlapping results were nearly identical.

However, the underlying frustration is not entirely baseless. Adaptive compute controls and service-level optimizations have changed how Claude Opus 4.6 behaves in practice. For developers relying on consistent output, those changes matter.

Anthropic has not issued a public statement on the specific BridgeBench claims as of April 13, 2026.

The post Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science appeared first on BeInCrypto.

Source: https://beincrypto.com/claude-opus-nerfed-bridgebench-claim-backlash/
