Can You Solve the Puzzle That Broke Every AI Model?

What if the best measure of AI intelligence is not what it knows, but how quickly it can figure out something it has never seen before?

That is the question at the heart of ARC-AGI-3, the latest benchmark from the ARC Prize Foundation — and the results make for fascinating reading. The test presents AI models with 135 entirely novel interactive environments and puzzles, measuring how efficiently they can explore and solve them without any prior training. No hints, no tutorials, just a new situation and the challenge of working it out.

Every major frontier model sat the test and scored under 1%. Gemini: 0.37%. GPT: 0.26%. Claude: 0.25%. Grok: 0%. Human testers, meanwhile, solved every environment on their first attempt, scoring 100%.

As with any new benchmark, the methodology has prompted discussion in the research community. The scoring uses a squared efficiency penalty: a model that takes ten times as many steps as a human scores just 1% (one tenth, squared), even if it ultimately reaches the right answer. ARC founder François Chollet's perspective on this is perhaps the most thought-provoking element of the story: today's models tend to perform best when humans build tailored scaffolding around them, such as custom prompts, specific harnesses, and carefully designed instructions. His view is that genuine adaptability should not require that level of human preparation.
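To make the squared efficiency penalty described above concrete, here is a minimal sketch in Python. It is an illustration, not the benchmark's official scoring code: the function name `efficiency_score`, the cap at 100% for models that beat the human step count, and the zero score for unsolved environments are all assumptions made for this example.

```python
def efficiency_score(human_steps: int, model_steps: int, solved: bool = True) -> float:
    """Illustrative squared-efficiency score between 0.0 and 1.0.

    Assumptions (not taken from the official ARC-AGI-3 scoring code):
    - an unsolved environment scores 0
    - a model that uses fewer steps than the human is capped at 1.0
    """
    if not solved or model_steps <= 0 or human_steps <= 0:
        return 0.0
    # Ratio of human steps to model steps, capped so it never exceeds 1.
    ratio = min(1.0, human_steps / model_steps)
    # Squaring the ratio punishes inefficiency sharply.
    return ratio ** 2


if __name__ == "__main__":
    # A model taking ten times as many steps as the human baseline.
    print(f"{efficiency_score(human_steps=10, model_steps=100):.2%}")
    # A model taking twice as many steps.
    print(f"{efficiency_score(human_steps=10, model_steps=20):.2%}")
```

Under these assumptions, taking ten times as many steps gives (1/10)² = 1%, matching the figure quoted above, while taking only twice as many steps already cuts the score to 25%.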

That distinction matters enormously as the industry pushes toward AGI. OpenAI has already renamed its product division "AGI Deployment." The question is no longer whether AI is capable — it is whether capability built on human-generated data can evolve into true adaptability, or whether something fundamentally different is needed. A $2 million prize competition is now live on Kaggle, and the public puzzles are available to play yourself.

The models that eventually pass ARC-AGI-3 will not simply be smarter versions of today's tools. They will represent a fundamentally different approach to learning — and the race to build them is already underway.
