Sūtra
Under research
Results reproducible · hardware validation pending

The findings.

Every number here comes from the program's results and is reproducible from the open release. The headline: a systematic search found heavy-hex-native codes that beat IBM's Gross code on the figure of merit — and we are exact about which distances are proven and which are bounded.

The search · heavy-hex bivariate-bicycle codes
60,000 candidates searched (3 seeds, random + evolutionary)
14 of 50 beat Gross on k·d²/n (after rigorous re-verification)
1.8–4.4× figure-of-merit gain (seven headline codes)
2.00× exact-certified ([[192,8,24]], MIP solver)
Finding 01 · the codes

Seven codes that beat the benchmark.

Code [[n,k,d]]     Torus   k·d²/n   vs Gross   Distance   Qubits/logical   Novelty
[[144, 12, 12]]    12×6    12.0     1.00×      baseline   12.0             IBM Gross
[[196, 18, 24]]    14×7    52.9     4.41×      bound      10.9             novel
[[192, 16, 20]]    12×8    33.3     2.78×      bound      12.0             novel
[[196, 8, 26]]     14×7    27.6     2.30×      bound      24.5             novel
[[192, 8, 24]]     12×8    24.0     2.00×      exact      24.0             novel
[[168, 8, 22]]     12×7    23.1     1.92×      bound      21.0             cyclic risk
[[180, 6, 26]]     10×9    22.5     1.88×      bound      30.0             cyclic risk
[[198, 6, 27]]     11×9    22.1     1.84×      bound      33.0             cyclic risk

Distances: exact = certified by a mixed-integer programming (MIP) solver over every coset; bound = upper bound from belief propagation with ordered-statistics decoding (BP+OSD) at partial coverage. Cyclic-torus codes carry residual prior-art risk pending a live-literature check.
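The score, ratio, and qubits-per-logical columns follow directly from the [[n, k, d]] parameters, so they can be rechecked by hand. The following is a minimal Python sketch, not the program's own code: the names `figure_of_merit` and `qubits_per_logical` are hypothetical, and the parameter tuples are copied from the table in this section.

```python
# Recompute k*d^2/n, the ratio to Gross, and n/k from [[n, k, d]] parameters.
# Function names are illustrative; parameters are copied from the table.

GROSS = (144, 12, 12)  # IBM Gross code, the baseline

CODES = [
    (196, 18, 24),
    (192, 16, 20),
    (196, 8, 26),
    (192, 8, 24),
    (168, 8, 22),
    (180, 6, 26),
    (198, 6, 27),
]


def figure_of_merit(n: int, k: int, d: int) -> float:
    """Return k * d^2 / n for an [[n, k, d]] code."""
    return k * d * d / n


def qubits_per_logical(n: int, k: int) -> float:
    """Return physical qubits per logical qubit, n / k."""
    return n / k


baseline = figure_of_merit(*GROSS)

for n, k, d in CODES:
    score = figure_of_merit(n, k, d)
    print(
        f"[[{n}, {k}, {d}]]  k*d^2/n = {score:.2f}  "
        f"vs Gross = {score / baseline:.2f}x  "
        f"n/k = {qubits_per_logical(n, k):.1f}"
    )
```

For rows marked bound, d is an upper bound, so the recomputed score is an upper bound as well.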

Finding 02 · the certified one

Proven, not just estimated.

For the lead code, we didn't settle for a fast-decoder estimate. An exact solver checked every logical coset and revised the distance down, from the decoder's upper bound of 26 to a certified 24.

[[192, 8, 24]]
12×8 torus · weight-8 checks · asymmetric d_X=30, d_Z=24
Fast-decoder estimate: d ≤ 26
MIP exact: d = 24

k·d²/n = 8·24²/192 = 24.0, exactly 2.00× the Gross code's 12.0. An integer-programming solver solved every one of the 255 cosets. The asymmetric distance (d_X = 30, d_Z = 24) makes the code especially strong under biased noise. This is the result we'd stake the paper on.
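To make the difference between an estimate and an exact distance concrete, here is a toy sketch in Python. It is not the MIP procedure used for [[192, 8, 24]]: it swaps in exhaustive enumeration, which is feasible only for tiny codes, and applies it to the well-known [[7, 1, 3]] Steane code. All function names are hypothetical.

```python
from itertools import product

# Parity-check matrix of the [7,4,3] Hamming code. The Steane code uses the
# same matrix for its X-type and Z-type checks, so one matrix suffices here.
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def syndrome(checks, v):
    """Parity of v against each check row, over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in checks)


def row_space(checks):
    """All GF(2) linear combinations of the check rows (the stabilizers)."""
    n = len(checks[0])
    space = set()
    for coeffs in product((0, 1), repeat=len(checks)):
        vec = tuple(
            sum(c * row[i] for c, row in zip(coeffs, checks)) % 2
            for i in range(n)
        )
        space.add(vec)
    return space


def exact_distance(checks):
    """Minimum weight of a vector with zero syndrome that is not itself
    a stabilizer, found by exhaustive search over all 2^n vectors."""
    n = len(checks[0])
    stabilizers = row_space(checks)
    zero = (0,) * len(checks)
    best = None
    for v in product((0, 1), repeat=n):
        if syndrome(checks, v) == zero and v not in stabilizers:
            weight = sum(v)
            if best is None or weight < best:
                best = weight
    return best


print(exact_distance(H))  # prints 3
```

Exhaustive enumeration visits all 2^n vectors, so it is out of reach at n = 192; the exact result above relied on an integer-programming solver instead.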

Finding 03 · the honest tradeoff

More merit, less threshold — and we say so.

The figure-of-merit gain comes at a cost: the highest-scoring codes currently have a circuit-level threshold below the gate-error rates of today's IBM hardware, so at those error rates they would not yet suppress logical errors. This is the central result of the study, not a caveat buried in an appendix.

What we gained
  • Up to 4.4× Gross on k·d²/n.
  • As few as 10.9 physical qubits per logical qubit (Gross: 12.0).
  • Heavy-hex-local checks — a constraint Gross doesn't meet.
  • One distance certified exactly.
What it costs (for now)
  • Circuit-level threshold below IBM gate-error rates.
  • Highest-scoring code uses heavier weight-12 checks.
  • Threshold measured with a naive, non-FT schedule.
  • No hardware run yet — the demo is designed, not done.
Finding 04 · the ledger

Proven, bounded, pending, null.

The whole credibility of a QEC result rests on this distinction. So we publish it line by line — including the negative results and the work still to do.

Proven
Codes beating Gross on k·d²/n

14 of the top 50 candidates beat Gross after rigorous BP+OSD re-verification; reproducible from the open release.

Certified
[[192,8,24]] distance d = 24

Exact MIP solver, every one of 255 cosets solved. It corrected the fast-decoder estimate downward — the value of certification.

Bounded
High-k distances (e.g. [[196,18,24]], [[192,16,20]])

BP+OSD upper bounds at partial coverage (≈12–40% of logical classes). Strong, but not yet exactly certified — k is too large for the MIP solver.
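The scaling problem is easy to see by counting. A code with k logical qubits has 2^k − 1 nontrivial logical classes of a given Pauli type, which is consistent with the 255 cosets solved for k = 8. The short Python sketch below uses a hypothetical helper name; the coverage figures are the approximate range quoted above, not per-code measurements.

```python
def nontrivial_classes(k: int) -> int:
    """Number of nonzero logical classes of one Pauli type: 2^k - 1."""
    return 2 ** k - 1


for k in (8, 16, 18):
    print(f"k = {k:2d}: {nontrivial_classes(k):>7,d} classes")

# Illustrative only: classes covered at the quoted ~12% and ~40% coverage
# for k = 18. The page does not say which coverage applies to which code.
total = nontrivial_classes(18)
for coverage in (0.12, 0.40):
    covered = int(total * coverage)
    print(
        f"coverage {coverage:.0%}: {covered:,d} covered, "
        f"{total - covered:,d} not yet examined"
    )
```

Going from k = 8 to k = 18 multiplies the number of classes by roughly a thousand, and each class is its own optimization problem.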

Bounded
Heavy-hex implementability

Enforced via a toroidal L1 ≤ 4 locality surrogate; an explicit Eagle/Heron graph embedding has not yet been produced.
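One way to read the locality surrogate is as a wrapped Manhattan distance on the torus. The Python sketch below is an assumption about what such a check could look like, not the program's implementation: the function names, the coordinate convention, and the choice to measure each qubit against a single anchor site per check are all hypothetical.

```python
def toroidal_l1(a, b, l, m):
    """Manhattan distance between sites a and b on an l x m torus,
    taking the shorter way around in each direction."""
    dx = abs(a[0] - b[0]) % l
    dy = abs(a[1] - b[1]) % m
    return min(dx, l - dx) + min(dy, m - dy)


def is_local(anchor, support, l, m, radius=4):
    """True if every site in `support` lies within `radius` of `anchor`."""
    return all(toroidal_l1(anchor, q, l, m) <= radius for q in support)


# Example on a 12 x 8 torus, the shape used by [[192, 8, 24]].
# The supports below are made up for illustration.
L_DIM, M_DIM = 12, 8

# (11, 7) is one step from (0, 0) in each direction once wrapping is allowed.
print(toroidal_l1((0, 0), (11, 7), L_DIM, M_DIM))                 # 2
print(is_local((0, 0), [(1, 0), (0, 2), (11, 7)], L_DIM, M_DIM))  # True
print(is_local((0, 0), [(6, 4)], L_DIM, M_DIM))                   # False
```

A check like this constrains distances on the torus only; it says nothing about whether the resulting connectivity maps onto a specific device graph, which is why the embedding remains open.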

Pending
Circuit-level threshold vs Gross

Currently below IBM gate-error rates — the figure-of-merit gain trades against threshold. Full-scale threshold runs and a fault-tolerant schedule are the next phase.

Pending
Novelty vs the literature

Smith-Normal-Form analysis rules out equivalence to IBM's published codes; a live arXiv / code-tables dedupe is still required before any novelty claim is final.

Null result
LLM-guided discovery

Tested honestly across three rounds: language models are good constraint-respecting proposers but did not beat random search at finding frontier codes. Reported as a negative result.

Not yet
Hardware demonstration

An IBM memory-experiment protocol is designed and preflighted, but has not been run. No experimental logical-error claim is made.

The discipline, in one number

[[192, 20, 20]] topped the raw leaderboard at a claimed score of 41.67 (d=20). Rigorous verification dropped it to d=8, score 6.67 (Δd=-12). We publish failures like this on purpose.
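The size of the drop follows from the figure of merit itself, since the score scales with d². As a worked check using only the numbers quoted above:

```latex
\[
\left.\frac{k\,d^{2}}{n}\right|_{d=20} = \frac{20 \cdot 20^{2}}{192} \approx 41.67,
\qquad
\left.\frac{k\,d^{2}}{n}\right|_{d=8} = \frac{20 \cdot 8^{2}}{192} \approx 6.67,
\qquad
\frac{41.67}{6.67} \approx \left(\frac{20}{8}\right)^{2} = 6.25 .
\]
```

The verified score of 6.67 falls below the Gross baseline of 12.0.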

What stands between this and a hardware result is the next experiment.

A fault-tolerant schedule, full threshold runs, and a memory demo on real hardware — that is where backing makes the difference.