AlphaProof Nexus Solves 9 Erdős Problems Using Lean Proofs

Author: Andrew
Published in: AI

This is either the beginning of a real upgrade to how humans do math, or the start of a very clean-looking illusion that we’ll mistake for understanding. And the scary part is: both outcomes can feel identical in the moment.

The news, in plain terms, is that Google DeepMind introduced a system called AlphaProof Nexus that searches for formal math proofs inside a controlled setup. Instead of letting a chat-style model talk its way through a proof and hope it didn’t slip on a hidden mistake, it uses a proof assistant called Lean. Lean forces the work to be spelled out step by step, and it checks every step. That matters because “sounds right” is not a standard in math, and it’s definitely not a standard in anything high-stakes.
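To make "it checks every step" concrete, here is a minimal Lean 4 sketch. It is my own toy illustration, not anything taken from AlphaProof Nexus, and the theorem name `swap_sum` is made up for the example. It only assumes a standard Lean 4 install, with no extra libraries.

```lean
-- Accepted: `Nat.add_comm` is a core library theorem that states exactly this,
-- so the checker can match the claim against the justification.
theorem swap_sum (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Accepted: both sides compute to the same number, so `rfl`
-- ("true by computation") closes the goal.
example : 2 + 2 = 4 := rfl

-- Rejected if uncommented: the two sides do not compute to the same number,
-- and no amount of confident wording changes that.
-- example : 2 + 2 = 5 := rfl
```

The point is the last case: a false claim does not get a lower score or a polite caveat. It simply fails to compile.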

DeepMind also says AlphaProof Nexus solved 9 problems from the Erdős collection and proved 44 conjectures about number sequences. Those are real outputs, not vibes. The system isn’t just generating text; it’s producing proofs that pass a strict checker.

Here’s my take: this is impressive, but it’s also a warning about what we’re going to start valuing.

When a system can reliably crank out correct proofs in a formal language, it changes the skill that gets rewarded. The hero stops being the person who can grind through details without making a mistake. The hero becomes the person who can frame the right problem, set up the right definitions, and guide the search. That’s not bad. But it’s a shift, and shifts have casualties.

Imagine you’re a grad student. You’re not famous. You don’t have a big mental library of tricks yet. Your advantage used to be stamina: sit with something long enough and you might find a clean argument. Now you’re competing with a machine that doesn’t get tired and doesn’t “almost prove it” and stop. If you can’t work at the level of asking the right questions and steering the machine, you risk becoming a spectator to your own field.

People will say, “Great, then humans can focus on the creative part.” Maybe. But it’s not guaranteed. In practice, tools don’t always free people; they often raise the bar. The new expectation becomes: why didn’t you prove 10 things this month if the machine can prove 44 conjectures? The pressure moves upward. And the work that used to teach you the field (slowly building taste by getting stuck, then unstuck) gets replaced by a loop of prompting, checking, and shipping.

The strongest argument for this system is also the strongest argument against the current wave of chatty AI: correctness. A proof assistant doesn’t care if your wording is confident. It’s either valid or it isn’t. That’s a huge deal. If you’ve ever watched someone get fooled by a “convincing” wrong solution (or done it yourself), you know why this matters. Math is one place where a hard “no” is actually a gift.

But there’s a trap here. A correct proof is not the same thing as an understood proof.

Math isn’t only about arriving at true statements. It’s about building a map in your head so you can travel again later. A formal proof can be painfully long, full of tiny steps that are correct but not enlightening. If AlphaProof Nexus pushes the culture toward “proof exists” rather than “proof teaches,” we may end up with a growing pile of truths and a shrinking pool of people who actually know how to use them.
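A small Lean 4 sketch of that gap, again my own toy example rather than output from the system. It assumes a recent Lean 4 release where the `omega` tactic ships with the core distribution.

```lean
-- Correct but opaque: `omega` is a decision procedure for linear arithmetic.
-- The checker is satisfied, but the proof tells the reader nothing about why.
example (a b : Nat) (h : a < b) : a + 1 ≤ b := by
  omega

-- Correct and explanatory: the proof names the fact being used,
-- namely that `a < b` on naturals means the successor of `a` is at most `b`.
example (a b : Nat) (h : a < b) : a + 1 ≤ b :=
  Nat.succ_le_of_lt h
```

Both versions pass the checker equally well. Only the second leaves you with something you could reuse by hand.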

Now flip to the optimistic scenario, because it’s real too.

Say you’re working on a conjecture and you’re stuck on a technical lemma. You don’t need a machine to “do math for you.” You need it to clear a path through the swamp so you can keep moving. Or imagine a researcher who has a strong idea but keeps getting rejected because reviewers can’t verify every detail. A formal system could make the verification fast and fair. That’s a win for the underdog, not just the elite.

And there’s a deeper possibility: this might change what collaboration looks like. If the machine can handle the bookkeeping, humans can trade in higher-level ideas more freely, because the cost of being wrong drops. You can try riskier approaches when you know the proof checker will catch your mistakes early. That could accelerate discovery.

Still, I don’t trust the incentive picture yet.

These systems live inside “controlled environments.” That phrase is doing a lot of work. Controlled means the rules are clear, the language is precise, and the goal is well-defined. Real research is not like that most days. A lot of the job is deciding what the problem even is, and which definitions are worth building around. If the system is great at proving within a box, people may start designing boxes that fit the system, not reality. We’ve seen that pattern before in other fields: what gets measured gets optimized, and what can’t be measured gets ignored.

There’s also the social consequence. If only a handful of teams can build systems like this, then proof-making becomes centralized. Not because others aren’t smart, but because the compute, the engineering, and the expertise are unevenly distributed. In that world, math progress might start to look less like a wide conversation and more like a few labs publishing “we proved X,” while everyone else reacts.

I’m not saying we should stop this. I’m saying we should be honest about the trade: correctness at scale can come with understanding atrophy, and speed can come with dependency.

So the real question isn’t whether AlphaProof Nexus can prove things; by DeepMind’s account, it can. The question is what we want math to become when proof is cheap and the hard part moves to choosing what deserves a proof: who should get to steer that, and on what terms?
