
GitHub Copilot CLI Adds Rubber Duck Feature for Cross-Model AI Code Review

2026/04/09 01:06
3 min read


Jessie A Ellis Apr 08, 2026 17:06

GitHub's new Rubber Duck feature pairs Claude models with GPT-5.4 for independent code review, closing 74.7% of the performance gap between Sonnet and Opus.


GitHub just shipped a feature that addresses one of the most frustrating problems with AI coding assistants: they make confident mistakes that snowball into bigger messes. The new Rubber Duck capability, now available in experimental mode for Copilot CLI, brings in a second AI model from a completely different family to critique the primary agent's work.

Here's the setup: when you're running a Claude model as your main orchestrator, Rubber Duck deploys GPT-5.4 as an independent reviewer. The goal isn't just catching typos—it's questioning architectural decisions before they become expensive technical debt.

The Numbers Worth Knowing

GitHub tested this on SWE-Bench Pro, a benchmark of gnarly real-world coding problems from open-source repos. Claude Sonnet 4.6 paired with Rubber Duck closed 74.7% of the performance gap between Sonnet and the more expensive Opus model running solo.

The gains weren't uniform. Rubber Duck showed the strongest results on complex problems spanning 3+ files that typically require 70+ steps to resolve. On these harder tasks, the Sonnet + Rubber Duck combo scored 3.8% higher than baseline Sonnet, jumping to 4.8% higher on the most difficult problems identified across three trials.

What It Actually Catches

GitHub shared specific examples from their testing. In one OpenLibrary case, Rubber Duck flagged that a proposed scheduler would start and immediately exit without running any jobs—and spotted that even if fixed, one scheduled task contained an infinite loop.
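The article doesn't show the flagged code, but the bug class is familiar: a scheduler whose run loop guards on a flag that is never set, so it returns before executing a single job. A minimal hypothetical sketch (all names invented for illustration):

```python
import time

class Scheduler:
    """Hypothetical illustration: the run loop checks a flag that is
    never set to True, so start() exits immediately."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.running = False   # bug: nothing ever flips this to True
        self.executed = []

    def start(self):
        # Loop condition is already False on entry, so no job ever runs
        # and no error is raised -- the failure is completely silent.
        while self.running:
            for job in self.jobs:
                self.executed.append(job())
            time.sleep(1)

scheduler = Scheduler(jobs=[lambda: "reindex"])
scheduler.start()
print(scheduler.executed)  # [] -- the scheduler "ran" but did nothing
```

Because the scheduler exits cleanly, nothing in logs or tests flags the problem, which is exactly why a second reviewer questioning the control flow is useful.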

Another catch: a single-line bug in a Solr integration where a loop silently overwrote the same dictionary key on every iteration. Three of four facet categories were being dropped from search queries with zero errors thrown. That's the kind of bug that passes code review and then haunts you in production for months.
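GitHub didn't publish the Solr code, but the pattern it describes is easy to reconstruct: a loop that assigns to the same dictionary key on every pass, so only the final value survives. A hypothetical sketch of the bug and one possible fix (function and field names are invented):

```python
facet_fields = ["author", "subject", "language", "publisher"]

def build_params_buggy(fields):
    params = {}
    for field in fields:
        # Same key every iteration: each assignment silently
        # overwrites the previous one, so only "publisher" remains.
        params["facet.field"] = field
    return params

def build_params_fixed(fields):
    # Collect all categories under the key instead of overwriting.
    return {"facet.field": list(fields)}

print(build_params_buggy(facet_fields))  # {'facet.field': 'publisher'}
print(build_params_fixed(facet_fields))  # all four categories preserved
```

No exception is thrown and the query still returns results, just with three of four facet categories missing, which matches the article's "zero errors thrown" description.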

A third example involved a NodeBB email confirmation flow where three files all read from a Redis key that new code stopped writing to. The confirmation UI and cleanup paths would have broken silently on deploy.
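The stale-key pattern can be sketched without Redis at all: a writer is refactored to a new key name while readers still look up the old one. A hypothetical illustration using a plain dict as a stand-in for Redis (key names and functions invented):

```python
store = {}  # stand-in for Redis

def confirm_email_new(store, user_id, code):
    # Refactored writer: stores the code under a new key name.
    store[f"pending:{user_id}"] = code

def render_confirmation_ui(store, user_id):
    # Reader untouched by the refactor: still reads the old key,
    # so after deploy it always sees nothing -- and raises nothing.
    return store.get(f"confirm:{user_id}")

confirm_email_new(store, "42", "abc123")
print(render_confirmation_ui(store, "42"))  # None -- UI breaks silently
```

Since `dict.get` (like a Redis GET on a missing key) returns an empty result rather than an error, all three reader paths would appear healthy until a user actually tried to confirm an email.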

When It Kicks In

Rubber Duck activates at three checkpoints: after drafting a plan (where GitHub expects the biggest wins), after complex implementations, and after writing tests but before running them. The agent can also call for a critique when it gets stuck in a loop.

Users can trigger a review manually at any point. Copilot queries Rubber Duck, processes the feedback, and shows what changed and why.

The feature works with all Claude family models—Opus, Sonnet, and Haiku—as orchestrators. GitHub says they're already exploring other model family pairings, including options for when GPT-5.4 serves as the primary orchestrator.

To access Rubber Duck, install GitHub Copilot CLI and run the /experimental slash command. You'll need GPT-5.4 enabled on your account and a Claude model selected from the model picker. Feedback goes to GitHub's community discussion board.

Image source: Shutterstock
  • github
  • ai coding
  • copilot cli
  • developer tools
  • machine learning
