Don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
{{problem}}
REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
For the uninitiated, Paul Erdős was a pretty famous but very eccentric mathematician who lived for most of the 1900s.
He had a habit of seeking out and documenting mathematical problems people were working on.
The problems range in difficulty from "easy homework for a current undergrad in math" to "you're getting a Fields Medal if you can figure this out".
There's nothing that really connects the problems other than the fact that one of the smartest people of the last 100 years didn't immediately know the answer when someone posed it to him.
One of the things people have been doing with LLMs is to see if they can come up with proofs for these problems as a sort of benchmark.
Each time there's a new model release a few more get solved.
laurentiurad
This program was brought to you by the private equity engagement pod.
etaKl
1) How do you know the clanker respects the instruction not to search the internet?
2) Jared Lichtman is indeed a mathematician at Stanford University, but he is also involved in the AI startup math.inc, which seems more relevant here. Terence Tao is involved in a partnership program with that startup.
3) Liam Price is a general AI booster on Twitter. A lot of AI boosting on Twitter is not organic and who knows what help he got. Nothing in this Twitter is organic.
4) Scientific American is owned by Springer Nature, which is an AI booster: https://group.springernature.com/gp/group/ai
It seems like a lot of scientific advancements came from someone applying technique X from one field to problem Y in another. I feel like LLMs are much better at making these types of connections than humans because they 1) know about many more theories/approaches than a single human can, and 2) don't need to worry about looking silly in front of their peers.
LPisGood
Some Erdős problems are basically trivial using sophisticated techniques that were developed later.
I remember one of my professors, a coauthor of Erdős, boasting to us after a quiz about how proud he was to have assigned an Erdős problem that had gone unsolved for a while as just a quiz problem for his undergrads.
debo_
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.
This is how I feel when I read any mathematics paper.
gorgoiler
I asked ChatGPT to draw the outline of an ellipse using Unicode braille. I asked for 30x8 and it absolutely nailed it. A beautiful piece of ascii (er, Unicode) art. But I wanted to mark the origin! So I asked for a 31x7 ellipse instead. It completely flubbed it, and for 31x9 too.
When a model gives a really good answer, does that just mean it’s seen the problem before? When it gives a crappy answer, is that not simply indicating the problem is novel?
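The rendering task described in that comment is straightforward to do deterministically. Here is a minimal Python sketch of one way to draw an ellipse outline in Unicode braille; the dot-to-bit mapping follows the Unicode braille block (2x4 dots per character cell), but the parametric sampling strategy is my own assumption, not a reconstruction of what ChatGPT produced:

```python
import math

def braille_ellipse(cols, rows):
    """Render an ellipse outline in Unicode braille, sized in character cells."""
    w, h = cols * 2, rows * 4            # dot-grid dimensions (2x4 dots per cell)
    cx, cy = (w - 1) / 2, (h - 1) / 2    # centre of the dot grid; also the semi-axes
    # Bit for dot (row, col) within a 2x4 braille cell, per the Unicode
    # braille patterns block: dots 1-3 and 7 in the left column, 4-6 and 8 right.
    bits = [[0x01, 0x08], [0x02, 0x10], [0x04, 0x20], [0x40, 0x80]]
    grid = [[0x2800] * cols for _ in range(rows)]  # U+2800 is the blank braille cell
    # Sample the ellipse parametrically, densely enough that adjacent samples
    # land on adjacent dots, so the outline has no gaps.
    steps = 8 * (w + h)
    for i in range(steps):
        t = 2 * math.pi * i / steps
        x = round(cx + cx * math.cos(t))
        y = round(cy + cy * math.sin(t))
        grid[y // 4][x // 2] |= bits[y % 4][x % 2]
    return "\n".join("".join(map(chr, row)) for row in grid)

print(braille_ellipse(30, 8))
```

Because the outline is sampled rather than reasoned about, this handles 30x8, 31x7, and 31x9 identically, which is part of why the model's uneven performance on near-identical inputs is interesting.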
ripped_britches
At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try to solve them all on every new release.
Mandatory disclaimers: https://github.com/teorth/erdosproblems/wiki/Disclaimers-and...
Humans, and very often the machines we create, solve problems additively: we build on top of existing foundations, and we can get stuck in a way of thinking as a result, because people are loath to reinvent the wheel. So I don’t think it’s surprising that a naïve LLM, because of the way it’s trained, came up with something that many experts in the field didn’t try.
I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.
That said, I have no idea what the practical value of this Erdős problem is. As for whether this demonstrates that LLMs are not junk: my general impression is that it's like asking me in 1928 whether we should spend millions of dollars of research money on number theory. The answer is no, and get out of my office.
Given that the problem is 60 years old, isn't there a chance it was already solved indirectly, and the model just cross-referenced information to figure it out?
Looking at the website, this problem was never discussed by humans. The most recent comments were about GPT discovering it; I was expecting older comments for a 60-year-old problem.
Am I missing something?
Great discovery though. There might be other problems in the same situation that are worth a "GPT check".
jzer0cool
Could someone share a bit about the problem and the key part of the proof, for someone who knows just the basics of proofs?
nomilk
A similar announcement was made a few months ago, and Terence Tao came out a few days later and said it wasn't what it seemed at first, in that it was a rediscovery of an already known (albeit esoteric) result...
Discussed at the time: https://news.ycombinator.com/item?id=47774494
mrabcx
Can other AI agents such as Gemini, Claude, or DeepSeek also solve this problem?
winwang
Obviously nowhere near Erdős-problem complexity, but I've been using GPT (in Codex) to prove a couple of theorems (for algorithms), and I've found it a bit better than Claude (Code) in this respect.
iqihs
referring to Tao as just a 'mathematician' gave me a good chuckle
cubefox
Current headline:
"An amateur just solved a 60-year-old math problem—by asking AI"
A more honest title would be:
"An AI just solved a 60-year-old math problem—after being asked by an amateur"
(Imagine the headline claimed instead that a professor just solved a math problem by asking a grad student.)
ccppurcell
I will get downvoted for this, but I can't help thinking that billions of dollars have gone into ChatGPT over a period of years, and an LLM can direct all its "attention" (in a metaphorical sense) at one problem. I think if you gave top mathematicians a few million (a fraction of a percent of the ChatGPT budget) to solve this problem over four years, they probably would have at least made significant progress. I don't think ChatGPT has solved thousands of similar problems (even stretching that across all disciplines). Basically my thesis is that universal basic income could have had a similar impact, while also encouraging human flourishing elsewhere.
booleandilemma
What’s beginning to emerge is that the problem was maybe easier than expected, and it was like there was some kind of mental block
Hindsight is 20/20.
dnnddidiej
How do you get real mathematicians to check the potential slop? At some point Tao will be spammed by bots finding problems to solve and submitting possible proofs/answers.
resident423
I wonder if the rationalizations people come up with for why this isn't real intelligence will be as creative as ChatGPT's solution.
dataflow
Question for those who believe LLMs aren't intelligent and are merely statistical word predictors: how do you reconcile such achievements with that point of view?
(To be clear: I'm not agreeing or disagreeing. I sometimes feel the same too. I'm just curious how others reconcile these.)
echelon
Now do P vs NP.
If/when these things solve our hardest problems, that's going to lead to some very uncomfortable conversations and realizations.
userbinator
The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.
Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise may sometimes yield very useful results.
Also reminds me of the old saying, "a broken clock is right twice a day."
Drupon
> ChatGPT, prompted by an amateur, solves an Erdős problem.
There, fixed that for you.
wizardforhire
WTF!?
wiseowise
Wake me up when it creates a cancer cure or a fusion reactor.
homo__sapiens
Big if true.
brcmthrowaway
This is not a good Saturday night for humanity
tomlockwood
My big question with all these announcements is: how many other people were using AI on problems like this and failing? Given the excitement around AI at the moment, I think the answer is: a lot.
Then my second question is how much VC money did all those tokens cost.
quijoteuniv
AI is my favourite weird collaborator
mhb
> He’s 23 years old and has no advanced mathematics training.
How is he even posing the question, and how does he have even a vague idea of what the proof means or how to understand it?
jchook
Is the conjecture not trivially sound at an intuitive level? It's surprising that this proof was difficult.
ghstinda
Scientific American going out of business next, lol; weak headline. ChatGPT, let's have a better headline for the god among men who realized the capability of the new tool, which many underestimate or puff up needlessly. Fun times we live in. One love all.
nadermx
This just shows that with the right training (in this case, a thesis on Erdős problems) they were able to prompt and check the output. So it still needed the know-how to even begin to figure it out. "Lichtman proved Erdős right as part of his doctoral thesis in 2022."
https://archive.ph/2w4fi
Here is the chat:
Then "Thought for 80m 17s": https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...