This reminds me of Antirez's "Don't fall into the anti-AI hype" [0]
In a sentence: these foundation models are really good at optimizing extremely high-level, extremely well-defined problem spaces (e.g., multiplying matrices faster). In Antirez's case, it's "make Redis faster".
There have been two reactions: "Oh, it would never work for me" and "I have seen months of my life accomplished in an hour", and I think they're both right. I think we should be excited for Antirez (who has since been popping off [1]), and I think the rest of us should rest easy knowing that LLMs can't (and maybe were never meant to) tackle the tacit-knowledge-filled, human-system-centric, ambiguously-defined-problem-space jobs most mortals work.
[0] https://antirez.com/news/158 [1] https://antirez.com/news/164
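"Multiplying matrices faster" really is that well-defined a target. A classic instance of the kind of improvement being searched for is Strassen's trick, sketched below: multiplying two 2x2 matrices with 7 scalar multiplications instead of the naive 8 (this is illustrative background, not AlphaEvolve's own discovered algorithm).

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications
    instead of the naive 8 (Strassen, 1969).  Applied recursively
    to block matrices, this beats O(n^3) -- the kind of crisp,
    checkable objective an evolutionary code search can optimize."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]
```

The evaluation function here is trivial to write: the result either equals the naive product or it doesn't, and the multiplication count is directly measurable.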
The AI CEOs love to pontificate about AI curing cancer, but it seems like DeepMind is the only one actively working on these research problems, while OpenAI/Anthropic largely chase enterprise/coding revenue.
alecco
Are Googlers themselves happy using Gemini coding agent instead of Claude Code or Codex? (no snark, I'm really asking)
pingou
AI improving itself (or at least the architecture it runs on): the singularity is near, as they say.
Do we have other examples of AI being used to improve LLMs, apart from the creation of synthetic data and the testing of models?
zkmon
An issue I have been noticing with Claude is that, for simple tasks, it gives extremely bloated code and artifacts, which sometimes do not even work. Gemini balances this quite well, giving a working solution with the exact amount of code and minimal complexity, which is easier to manage.
The only thing I go to Claude for these days is front-end code (HTML). Even here it gives too much CSS (60% of the file size), but I'm OK with that since it adds a bit of a polished look, though it's heavy on file size.
cpard
All the *Evolve publications have very impressive results, but from the time I've spent on the published information, I feel the attention goes to the LLMs and the AI side of things, even though the reported outcomes are in almost all cases the result of very well-designed environments in which both the LLM and the evolutionary algorithm can work well.
This paper is a great example of that, and it's worth a read:
Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve https://arxiv.org/abs/2601.21096
How many times do we have to hear about Erdős problems again? :) It sounds like a great achievement for humanity at first, but after a while they keep coming back!
igorpcosta
There aren't a lot of opportunities in this space yet. This is the closest we can get to a high-degree-solver kind of problem.
There are only three companies doing this to date: Google, Sakana AI, and Autohand AI.
stijntonk
I wish that Google would focus on bringing their Gemini 3.x models to GA, and provide enough capacity that one doesn't constantly have to fight 429 errors.
It often feels like they do not want me to develop applications for corporate clients using their Vertex API. It's just such a shame, given that their models are so great for document analysis etc.
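In the meantime, the standard workaround for 429s is client-side retries with exponential backoff and jitter. A minimal sketch, where `request_fn` and `RateLimitError` are placeholders for whatever call and rate-limit exception your actual client library uses:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's HTTP 429 exception."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a callable on rate-limit errors, doubling the wait each
    attempt and adding jitter so many clients don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

This doesn't fix the capacity problem, but it turns hard failures into latency, which is usually tolerable for document-analysis batch jobs.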
sjhalani7
This is crazy: the fact that it is helping with stuff like quantum too is huge!
brkn
I would be interested to see how exactly the agent helped: how it was used, where it led to the given improvement, and how long it would have taken a human to come to the same solution.
AlphaEvolve couples MAP-Elites with LLMs. It's a key step in machine learning, in the vein of DQN for reinforcement learning.
AlphaEvolve brings diversity techniques from the genetic-algorithms community to large-scale, heavily optimized deep learning and RL models.
It is a mandatory step for moving forward. The approach is clean and simple, while generic.
The only caveat is the per-problem definition of the MAP-Elites descriptor dimensions. But surely this will get tackled somehow over the next few years.
If you don't know about MAP-Elites, go look up Jean-Baptiste Mouret's work and talks; it's both very interesting and universal.
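For readers who haven't seen it, the core of MAP-Elites fits in a few lines: instead of keeping one global best solution, keep the best ("elite") solution found in each cell of a behavior-descriptor grid. A minimal sketch (the toy problem in the usage below is mine, not from any AlphaEvolve paper; in AlphaEvolve the mutation operator is an LLM editing code):

```python
import random

def map_elites(evaluate, descriptor, mutate, random_solution,
               iterations=1000):
    """Minimal MAP-Elites loop.  `descriptor` maps a solution to a
    grid cell; `evaluate` returns a fitness; each cell keeps only
    its best-so-far solution, preserving diverse stepping stones."""
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive and random.random() < 0.9:
            # Usually mutate a randomly chosen existing elite...
            _, parent = random.choice(list(archive.values()))
            candidate = mutate(parent)
        else:
            # ...occasionally inject a fresh random solution.
            candidate = random_solution()
        cell = descriptor(candidate)
        fitness = evaluate(candidate)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)
    return archive
```

Toy usage, maximizing -(x - 3)^2 over x in [0, 10] with ten descriptor cells:

```python
archive = map_elites(
    evaluate=lambda x: -(x - 3.0) ** 2,
    descriptor=lambda x: min(9, max(0, int(x))),
    mutate=lambda x: x + random.gauss(0, 0.5),
    random_solution=lambda: random.uniform(0, 10),
    iterations=2000)
```

The archive ends up holding a good solution in many cells, not just one near x = 3; that illuminated map of diverse elites is what the caveat about choosing descriptor dimensions refers to.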
arian_
We went from 'AI will replace programmers' to 'AI will help programmers' to 'AI writes code while other AI reviews it' in about 18 months. At this rate the humans are just providing the electricity.
baq
RSI (recursive self-improvement) is here at both the hardware and the software level. Sprinkle in a couple of algorithmic breakthroughs and the results are nigh unimaginable.
From the comments, it seems that this community (mostly career software people) is starting to move into a new phase of grief about the median software engineer losing their hoped-for permanent place in society.
- 2021-2024 was Denial
- 2024-2025 was Anger and Bargaining
- 2026 seems to be some combo of anger, bargaining, and acceptance, depending mostly on your class/age
guybedo
and yet Gemini still can't code
maxothex
What I'm most curious about is how this translates to messy, real-world codebases without well-defined metrics. Most production software isn't chip design or kernel optimization - it's business logic with unclear success criteria. The infrastructure story is impressive, but I'd love to see how they handle domains where the evaluation function itself is ambiguous.
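For contrast, the kind of evaluator these systems do rely on is easy to sketch when the domain is well-defined. A hypothetical harness (the candidate task, test cases, and scoring here are made up for illustration): run the candidate against known-correct cases, reject any wrong answer outright, and score the survivors by speed.

```python
import time

def evaluate_candidate(sort_fn, trials=50):
    """Scalar evaluation function for a candidate sorting routine:
    -inf on any wrong answer, otherwise negated wall-clock time
    (higher is better).  For kernel optimization this is easy to
    write; for ambiguous business logic there is often no such
    single number to optimize, which is the commenter's point."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    start = time.perf_counter()
    for _ in range(trials):
        for inp, expected in cases:
            if sort_fn(list(inp)) != expected:
                return float("-inf")  # correctness is non-negotiable
    elapsed = time.perf_counter() - start
    return -elapsed
```

The whole evolutionary loop hinges on this function existing and being cheap; "increase customer satisfaction" has no equivalent.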
svieira
> In advertising and marketing, WPP used AlphaEvolve to refine AI model components, navigating complex, high-dimensional campaign data and achieving 10% accuracy gains over their competitive manual model optimizations.
Ah good, we're getting closer and closer to Venus, Inc. every day. /s
A fantastically simple solution for improving algorithms; I wish I had had this years ago for activation engineering: https://blog.n.ichol.ai/llm-activation-engineering-an-easy-f...
How do I access AlphaEvolve?
Meanwhile Gemini CLI has been broken for months!
https://github.com/google-gemini/gemini-cli/issues/22141
Welcome to HN, @berlianta! TIL a green username means a new user on HN, and stories posted by new users are called "noobstories" [1].
[1]: https://news.ycombinator.com/noobstories