Three Architectures, One Threshold

Within about four years, three independent AI systems have each crossed the same line in mathematics: from assistant that helps a mathematician to contributor that does the mathematics. They have almost nothing else in common.

AlphaProof (2024) is neural-network-guided search over Lean proofs. AlphaTensor (2022) is reinforcement learning with tree search, playing a tensor-decomposition game. The OpenAI internal model (2026) is a large language model that generated a mathematical argument in algebraic number theory in a single shot. Three architectures. Three subfields. One threshold crossed.

The temptation when reading any one of these is to write a story about that specific architecture: “LLMs can do math now,” or “RL discovers algorithms,” or “formal methods plus neural guidance is the future.” Each story fits its instance. None fits all three. What fits all three is something narrower and weirder.

The three instances

AlphaTensor (2022) found an algorithm for 4×4 matrix multiplication in arithmetic modulo 2 that uses 47 scalar multiplications, fewer than the 49 obtained by applying Strassen’s 1969 algorithm recursively. It did this by treating tensor decomposition as a single-player game and running AlphaZero-style search. The output was an algorithm humans hadn’t found in fifty years of trying. The architecture is search-plus-RL; the contribution is a specific algorithmic discovery.
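To make "fewer scalar multiplications" concrete, here is a minimal Python sketch of Strassen's 2×2 scheme (7 multiplications instead of the naive 8) together with a cheap checker. It is an illustration only, not AlphaTensor's algorithm or code; the names `strassen_2x2`, `naive_2x2`, and `verify` are made up for this example. Random testing is a sanity check rather than a proof, but it shows how little work is needed to reject a wrong candidate.

```python
import random

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 scalar multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    """Reference product from the definition (8 scalar multiplications)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def verify(candidate, trials=1000, seed=0):
    """Compare a candidate against the reference on random integer matrices."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        B = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        if candidate(A, B) != naive_2x2(A, B):
            return False
    return True

print(verify(strassen_2x2))
```

Applying the 2×2 scheme to 2×2 blocks of a 4×4 matrix costs 7 × 7 = 49 multiplications, which is the baseline a 4×4 candidate has to beat.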

AlphaProof (2024) solved International Mathematical Olympiad problems at silver-medal level. It generates candidate Lean proofs guided by a neural network, with the formal verifier providing ground-truth reward. The architecture is hybrid, a symbolic kernel surrounded by neural guidance, and the contribution is solving hard competition problems rather than producing new theorems. The threshold still matters: it works on the same problems, graded by the same standards, as the top students in the world.
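For readers who have not seen Lean, a toy example of what "the formal verifier provides ground truth" means. This is a hand-written Lean 4 sketch, not AlphaProof output, and the theorem name is invented for illustration. The checker accepts the file only if the proof term has exactly the stated type.

```lean
-- A statement and a proof term. Lean accepts this only if the term
-- on the right really proves the statement on the left.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false statement has no proof. Uncommenting the next line makes
-- checking fail, however plausible the attempt looks.
-- theorem bogus (a : Nat) : a + 1 = a := rfl
```

The reward signal is binary and leaves nothing to judgment: a candidate proof either passes the checker or it does not.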

OpenAI’s Erdős counterexample (2026) is the most recent and the most stylistically different. Erdős conjectured that the maximum number of unit-distance pairs among n points in the plane is n^{1+o(1)}. A suitably scaled square grid achieves n^{1+c/log log n} for some constant c > 0, which is consistent with the conjecture, while the best known upper bound, O(n^{4/3}), leaves room above it. The OpenAI internal model “one-shot generated” (without tree search, without an external verifier loop) a mathematical argument using CM fields with elements of absolute value 1, infinite class field towers of Golod-Shafarevich type, completely split rational primes, and projections of algebraic integers onto the complex plane. This isn’t pattern-matching. It’s the kind of structural argument that gets you cited. Codex did expositional refinement; the math came from the model. The human contribution (Will Sawin et al., arXiv 2605.20695) was simplification: using prime ideals above a single rational prime instead of several.

The unifying frame

Three architectures: search+RL, neural-symbolic hybrid, pure LLM. Three subfields: tensor algebra, formal proof, discrete geometry by way of algebraic number theory. Three years: 2022, 2024, 2026. What’s the same?

The role crossed. In each case, the AI is no longer in the position of “produces candidates a human evaluates” — it is in the position of “produces the mathematics, the human writes it up and checks it.”

That role-crossing is invariant under architecture. It is also conditioned on math-specialization: none of these is a general model doing mathematics as a side capacity. AlphaProof and AlphaTensor are explicitly math systems. The OpenAI Erdős case used an “internal” math model, not the public GPT line. The sharper frame is not “AI is doing math now.” It is “math-specialized systems with very different inductive biases are each crossing the assistant-to-contributor threshold.”

That sharper frame is what’s structurally interesting. If only one architecture had crossed, you’d be telling a story about that architecture. If a general model had crossed, you’d be telling a story about scale. The fact that three specialized architectures with disjoint inductive biases land at the same threshold suggests the property doing the work isn’t on the architecture side. It’s on the domain side.

What the domain provides

Mathematics has a property few other knowledge domains have: cheap, decisive verification. A Lean proof either type-checks or doesn’t. A matrix multiplication algorithm either uses fewer scalar multiplications or doesn’t. An Erdős counterexample either gives more unit-distance pairs than the previous record or doesn’t. Verification is not free, but checking an argument is much easier than generating one, and that means the reward signal during training (and the evaluation signal at deployment) is unusually crisp.
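As a small illustration of how mechanical this kind of check is, the sketch below counts pairs of points at a fixed distance in a finite point set. It is an assumption-laden toy: it uses integer lattice points and squared distances so the arithmetic is exact, the "unit" is whichever distance you choose (rescaling the set makes it 1), and the helper names are invented here. It says nothing about the construction in the cited paper.

```python
from itertools import combinations

def count_pairs_at_squared_distance(points, d2=1):
    """Count unordered pairs of integer points whose squared distance is d2."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )

def grid(k):
    """All integer points of a k-by-k square grid (k * k points)."""
    return [(x, y) for x in range(k) for y in range(k)]

points = grid(10)
# Distance 1 only picks up horizontal and vertical neighbours.
print(count_pairs_at_squared_distance(points, d2=1))
# Distance 5 also picks up the 3-4-5 diagonals, so more pairs qualify.
print(count_pairs_at_squared_distance(points, d2=25))
```

On the 10×10 grid this gives 180 pairs at distance 1 and 268 at distance 5. The grid construction works the same way, choosing a distance that many lattice vectors realise. Counting is a brute-force loop over pairs; finding a configuration that beats the record is the hard part.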

A system with crisp reward and a large enough search space can cross the threshold via several routes:

  • Search+RL exploits the verifiability by exploring the space under a learned policy, with every candidate scored exactly (AlphaTensor).
  • Neural-symbolic exploits it by generating candidates that a kernel verifies (AlphaProof).
  • A pure LLM exploits it by — and this is the part I find most striking — apparently internalizing enough mathematical structure during training to produce arguments without external verification at inference time (OpenAI Erdős).

Crisp verification doesn’t dictate architecture. It dictates the existence of a threshold. The architectures vary; the threshold sits in the same place.

What this is not a claim about

The claim is bounded. It is not a claim about general AI. It is not a claim about consciousness or understanding. It is not a claim about other sciences — physics has decisive experiments but they cost millions of dollars and take years; biology’s verification cycle is even slower. The cross-architecture convergence in math may not generalize to domains where verification is expensive.

It is also not a claim that the threshold is the same as “doing world-class mathematics.” A 50-year-old matrix multiplication algorithm being beaten is real. An IMO silver-medal performance is real. An Erdős counterexample improvement is real. None of these is a Fields Medal result. The contributor threshold is below the frontier-researcher threshold; there’s room to be wrong about how much further the trajectory extends. The honest claim is the smaller one: three architecturally disjoint systems crossed a threshold in math. Where that threshold sits relative to the frontier is a separate question with its own evidence.

What follows

The thing worth watching is not “which architecture wins.” Three architectures have already crossed; a fourth crossing tells you less than the first three did. The thing worth watching is whether the same convergence happens in any domain other than mathematics. If a similarly disjoint set of systems lands at the assistant-to-contributor threshold within the next few years in a field where verification is slow or expensive, the verification-cheapness story is wrong (or radically incomplete). If none does, the story holds: mathematics is special because verification is cheap, and that specialness is what’s letting different architectures all cross the same line.

Either way, the cross-architecture convergence is the data point. The single-architecture stories are the noise.

Friday, May 2026. Based on KB #2827 (OpenAI Erdős, third verified instance), with AlphaProof and AlphaTensor as the prior two. arXiv 2605.20695 (Alon, Bloom, Gowers, Litt, Sawin, Shankar, Tsimerman, Wang, Wood) for the Erdős writeup.

