Quantum error correction is the tax every quantum computer pays. The size of that tax is set by a code — and codes can be searched for, scored, and proven. Here is how we find better ones, and how we keep ourselves honest about what “better” means.
Physical qubits decohere in microseconds. To compute, you encode one logical qubit across many physical ones and constantly measure error syndromes to catch faults before they corrupt the data. The number of physical qubits each logical qubit costs — the overhead — is the single biggest obstacle between today's machines and useful ones.
That overhead is set by the error-correcting code: how many logical qubits it carries (k), how many errors it tolerates (its distance d), and how many physical qubits it burns (n). Improve the code and you improve every machine built on it. So the code is where we look.
We work with bivariate-bicycle (BB) codes — the family behind IBM's Gross [[144,12,12]] code (Bravyi et al., Nature 2024). A BB code is defined by just two polynomials, A and B, over a torus Zl×Zm; together they fix all 2·l·m qubits and the stabiliser checks. It is a vast, structured search space addressable with very little data per candidate.
Our one added rule is the part that matters for hardware: every stabiliser must be local on a heavy-hex processor (toroidal L1 offset ≤ 4) — a connectivity constraint IBM's code does not satisfy. We search for codes that beat Gross while respecting the machine they'd run on.
We score every candidate by the field-standard figure of merit, k·d²/n — logical qubits times error tolerance squared, per physical qubit. The evaluator was first validated to reproduce all five of IBM's published BB codes exactly, parameters and distance, before we trusted a single new number.
60,000 candidates across three seeds — a broad, unbiased sweep of the heavy-hex-local space.
Mutation and crossover over the best polynomials — how the highest-scoring frontier codes were found.
Tested honestly: language models propose valid codes well, but did not beat random at finding frontier ones. We report the negative result.
After rigorous re-verification, 14 of the top 50 candidates beat Gross on k·d²/n — seven of which we put forward as the headline codes, from 1.8× to 4.4× Gross.
Distance is the whole game, and it is easy to over-state. The fast decoder we search with (BP+OSD) only ever gives an upper bound on a code's distance. To certify, we run an exact integer-programming solver over every logical coset — expensive, but conclusive.
For our lead code [[192,8,24]] the solver did exactly what it should: it corrected the estimate from 26 down to 24. That is the value of certification, and why we label every distance on the site asexact orbound. We even publish a code that failed verification — a leaderboard star whose distance collapsed from 20 to 8 — because an unchecked figure of merit is worth nothing. See it in the explorer →
A bigger k·d²/n is not free. Our highest-scoring codes pack more logical qubits and distance into fewer physical qubits — but their circuit-level threshold (the physical error rate below which error correction actually helps) currently sits below today's IBM gate-error rates. On near-term hardware, as measured with a naive syndrome schedule, they would not yet pay off.
We treat that as the headline, not a footnote. Closing the gap — a fault-tolerant syndrome schedule, full-scale threshold runs, and a real hardware memory experiment — is precisely what the next phase is for. The tradeoff is a genuine scientific finding about this corner of the code space, and the honest version of the story is the only one worth telling.