Hacker News

Maxproof

101 points by ilreb - 9 comments

thatsgcasey 5 mins ago

A pet project of mine is to generate podcasts to help me understand research papers while commuting to work. I ran the pipeline on this paper. Neat stuff! https://paperdive.ai/episodes/133-maxproof-scaling-mathemati...

daquisu 5 mins ago

"I thought it was interesting and a bit underappreciated that the fraction of gold medalists at the 2025 IMO (72/630 = 11.4%) is the highest it’s been since 1981.

Crudely, IMO gold medals are awarded to the highest-scoring 1/12 of contestants. However, because scores are integers up to 42 and there’s no provision for tiebreaking, it’s possible for a lot of contestants to be tied around the threshold. In that case, either all of them get a gold medal or none do, and the fraction of gold medalists might deviate substantially from 1/12. That’s what happened this year: 46 contestants all won a gold medal by scoring exactly 35 points.

In fact, bizarrely, 35 is the mode of the scores this year; the last time the modal score was a gold medal score was in 1994. And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others."

From https://blog.vero.site/post/imo-2025

quibono 5 mins ago

I was under the impression that the IMO is conducted in an official "exam" capacity, on site and in a very formal setting. So I find it hard to believe _direct_ LLM usage would be a factor. Then again, it very well could be a factor in the training and preparation? I imagine "Write me a prep document for the IMO" will surface all kinds of interesting things from the training set.

quietbritishjim 5 mins ago

> And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others.

This is the part of the quote you're replying to.

You seemed to take "of course" as an implication that the contestants used LLMs, and that's why they got the same score as the LLMs.

I took it to mean: since this was the modal score, there seemed to be 35 points' worth of significantly easier answers (relatively speaking) than the remaining points, so it's not a surprise that LLMs got the same easier bits right. (Though I doubt all contestants got their points on exactly the same answers.)

But it's certainly unclear what exactly the author meant.

dooglius 5 mins ago

This is not bizarre, it's a reflection of how the IMO is scored: 6 questions with scores from 0-7 but partial credit is rare. It's really a score of 5/6.
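
To make the arithmetic concrete, here is a rough sketch. It assumes, as a simplification, that every problem earns either 0 or the full 7 points (real marking does allow partial credit); the function name is made up for illustration, and the 72/630 figure is the one quoted upthread.

```python
from fractions import Fraction

PROBLEMS = 6          # problems on an IMO paper
MAX_PER_PROBLEM = 7   # each problem is marked from 0 to 7


def all_or_nothing_totals(problems=PROBLEMS, points=MAX_PER_PROBLEM):
    """Totals reachable if each problem scores 0 or full marks (an assumption)."""
    return [solved * points for solved in range(problems + 1)]


totals = all_or_nothing_totals()
print(totals)  # only seven distinct totals, so ties at a threshold are likely

# 35 corresponds to exactly five fully solved problems under this assumption.
solved_for_35 = totals.index(35)
print(solved_for_35)

# Gold share quoted upthread versus the nominal 1/12 target.
gold_share = Fraction(72, 630)
print(gold_share, gold_share > Fraction(1, 12))
```

With so few reachable totals, a large group tied at 35 moves the gold fraction well past 1/12, which is the effect the quoted post describes.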

pfannl 5 mins ago

The real AGI test is apparently not solving the IMO, but getting caught in the same scoring traffic jam as 46 teenagers.

thierrydamiba 5 mins ago

Is the harness more valuable than the weights?

korbonits 5 mins ago

Proves the need for more formal verification :)

minimaxir 5 mins ago

not a good day to be named Max