
GitHub Copilot CLI Adds Rubber Duck Feature for Cross-Model AI Code Review



Jessie A Ellis Apr 08, 2026 17:06

GitHub's new Rubber Duck feature pairs Claude models with GPT-5.4 for independent code review, closing 74.7% of the performance gap between Sonnet and Opus.


GitHub just shipped a feature that addresses one of the most frustrating problems with AI coding assistants: they make confident mistakes that snowball into bigger messes. The new Rubber Duck capability, now available in experimental mode for Copilot CLI, brings in a second AI model from a completely different family to critique the primary agent's work.

Here's the setup: when you're running a Claude model as your main orchestrator, Rubber Duck deploys GPT-5.4 as an independent reviewer. The goal isn't just catching typos—it's questioning architectural decisions before they become expensive technical debt.

The Numbers Worth Knowing

GitHub tested this on SWE-Bench Pro, a benchmark of gnarly real-world coding problems from open-source repos. Claude Sonnet 4.6 paired with Rubber Duck closed 74.7% of the performance gap between Sonnet and the more expensive Opus model running solo.

The gains weren't uniform. Rubber Duck showed the strongest results on complex problems spanning 3+ files that typically require 70+ steps to resolve. On these harder tasks, the Sonnet + Rubber Duck combo scored 3.8% higher than baseline Sonnet, jumping to 4.8% higher on the most difficult problems identified across three trials.

What It Actually Catches

GitHub shared specific examples from their testing. In one OpenLibrary case, Rubber Duck flagged that a proposed scheduler would start and immediately exit without running any jobs—and spotted that even if fixed, one scheduled task contained an infinite loop.
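The failure mode described here is a common one: a scheduler launched on a background daemon thread with nothing keeping the main process alive. A minimal sketch of that pattern (all names hypothetical; this is not the OpenLibrary code):

```python
import threading
import time

def run_jobs(stop):
    # One of the scheduled tasks: in the flagged code, a loop like this
    # had no exit condition, so even a "fixed" scheduler would hang on it.
    while not stop.is_set():
        time.sleep(0.1)

def start_scheduler():
    # Bug pattern: the worker is a daemon thread and nothing blocks the
    # main thread afterward, so the process starts the scheduler and then
    # exits immediately without running any jobs.
    stop = threading.Event()
    t = threading.Thread(target=run_jobs, args=(stop,), daemon=True)
    t.start()
    return t, stop  # missing in the buggy version: t.join() or a run loop

t, stop = start_scheduler()
print(t.daemon, t.is_alive())  # True True -- the interpreter won't wait for it
stop.set()
```

Because the thread is a daemon and nothing joins it, the interpreter tears it down as soon as the main script returns.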

Another catch: a single-line bug in a Solr integration where a loop silently overwrote the same dictionary key on every iteration. Three of four facet categories were being dropped from search queries with zero errors thrown. That's the kind of bug that passes code review and then haunts you in production for months.
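The bug pattern is easy to reproduce: a loop that assigns to one fixed dictionary key instead of accumulating per-iteration values keeps only the last item. A minimal sketch (function and field names are hypothetical, not taken from the Solr integration in question):

```python
def build_params_buggy(facets):
    # Every iteration writes the same key, so only the last facet
    # category survives -- and no error is ever raised.
    params = {}
    for facet in facets:
        params["facet.field"] = facet  # same key overwritten each pass
    return params

def build_params_fixed(facets):
    # Collect all categories under the shared key instead.
    params = {"facet.field": []}
    for facet in facets:
        params["facet.field"].append(facet)
    return params

facets = ["author", "subject", "language", "publisher"]
print(build_params_buggy(facets))  # {'facet.field': 'publisher'} -- 3 of 4 dropped
print(build_params_fixed(facets))  # {'facet.field': ['author', 'subject', 'language', 'publisher']}
```

The buggy version returns a syntactically valid query, which is exactly why it sails through tests that only check for exceptions.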

A third example involved a NodeBB email confirmation flow where three files all read from a Redis key that new code stopped writing to. The confirmation UI and cleanup paths would have broken silently on deploy.

When It Kicks In

Rubber Duck activates at three checkpoints: after drafting a plan (where GitHub expects the biggest wins), after complex implementations, and after writing tests but before running them. The agent can also call for a critique when it gets stuck in a loop.

Users can trigger a review manually at any point. Copilot queries Rubber Duck, processes the feedback, and shows what changed and why.

The feature works with all Claude family models—Opus, Sonnet, and Haiku—as orchestrators. GitHub says they're already exploring other model family pairings, including options for when GPT-5.4 serves as the primary orchestrator.

To access Rubber Duck, install GitHub Copilot CLI and run the /experimental slash command. You'll need GPT-5.4 access enabled and a Claude model selected from the model picker. Feedback goes to GitHub's community discussion board.
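Assuming the standard npm distribution of Copilot CLI (package name as published by GitHub; verify against the official docs for your setup), the flow looks roughly like:

```shell
# Install GitHub Copilot CLI (requires Node.js)
npm install -g @github/copilot

# Launch the interactive session
copilot

# Inside the session:
#   /experimental   <- enable experimental features, including Rubber Duck
#   then select a Claude model as the orchestrator from the model picker
```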
