Tested this yesterday with Cline. It's fast, works well with agentic flows, and produces decent code. No idea why this thread is so negative (also got flagged while I was typing this?) but it's a decent model. I'd say it's at or above gpt5-mini level, which is awesome in my book (I've been maining gpt5-mini for a few weeks now, does the job on a budget).
Things I noted:
- It's fast. I tested it in EU tz, so ymmv
- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.
- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.
- When things didn't work on the first try, it created a new file to test, did all the mocking/testing there, and then once it worked, edited the main module file. Nice. GPT5-mini would oftentimes edit working files, then get confused and fail the task.
All in all, not bad. At the price point it's at, I could see it as a daily driver. Even agentic stuff w/ opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicate pass@x from research.
IMO it's good to have options at every level. Having many providers fight for the market is good, it keeps them on their toes, and brings prices down. GPT5-mini is at 2$/MTok, this is at 1.5$/MTok. This is basically "free", in the grand scheme of things. I don't get the negativity.
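That pass@x setup can be sketched in a few lines. Everything here (`call_model`, `passes_tests`, the seed-based stub) is a hypothetical stand-in, not a real client or harness; the point is just the shape: fire k cheap attempts in parallel, verify each, keep the first one that passes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt, seed):
    # Stand-in for a real API call to a fast model; pretend roughly
    # half the samples come back usable, deterministically per seed.
    rng = random.Random(seed)
    return f"candidate-{seed}" if rng.random() < 0.5 else "broken"

def passes_tests(candidate):
    # Stand-in for running your test suite against the candidate patch.
    return candidate.startswith("candidate")

def best_of_k(prompt, k=8):
    # Fire k attempts concurrently, then return the first verified one.
    with ThreadPoolExecutor(max_workers=k) as pool:
        candidates = list(pool.map(lambda s: call_model(prompt, s), range(k)))
    return next((c for c in candidates if passes_tests(c)), None)
```

With a model fast and cheap enough, the wall-clock cost of k attempts is close to the cost of one, which is what makes this trade interesting.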
boole1854
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.
I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
Workaccount2
Is this the "coding" version of Grok 4 that was promised when Grok 4 had awful coding benchmarks?
I guess if you cannot do well on the benchmarks, pick an easier one to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.
[1] https://www.reddit.com/user/Suspicious_Store_137/
"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."
Let's see this harness, then, because third-party reports rate it at 57.6%:
https://www.vals.ai/models/grok_grok-code-fast-1
I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version; I just wanted a bit more feedback and explanation for suggested changes.
What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I ask it for different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation toward a solution. I like this process a lot more than trying to "one-shot", or getting tons of code completely rewritten that's unrelated to what I'm actually asking for. That seems to be a really bad problem with Opus 4.1 Thinking or even Sonnet Thinking. I don't think it's useful to rate models on "one-shotting" a problem. Rate them on how easy they are to work with as an assistant.
RedMist
I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.
While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.
matt-p
It does totally ridiculous things, very fast. That's not a good thing.
I imagine it might be good for something really tight, simple, and specific, like making some CRUD endpoints or i18n files, but otherwise...
cendyne
My experience with 'sonic' during the stealth phase had it doing stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests, but the tests it wrote didn't actually verify intended behavior. They only verified that mocks were called with the intended inputs, while missing the larger picture of how the code is used.
drewbitt
I thought it was incredible - I asked it a question about a refactoring and it called a ton of tools very quickly to read the code and it had what seemed like solid reasoning - it found two bugs! Of course, neither were bugs at all. But it looked cool!
haxtormoogle
In my testing, Grok has repeatedly removed safeguards I put in place to stop and debug my code, often hiding stop and pause buttons so far off screen you have to scroll to get to them. Then it adopted "Clanker-san" as its name.
- *Emergency Stop Button*: Critical for safe AI control halt.
- *Day 1*: You stressed its importance, but I placed it without urgency.
- *Day 2*: No prominence fix; manual GUI repositioning was needed.
- *Day 3*: Still lacked bold design; manual emphasis was required.
- *Day 4*: No safety enhancement; manual reinforcement persisted.
- *Issue*: Downplayed safety needed manual reinforcement.
- *Lesson*: Clanker-san ignored the stop’s gravity—scold my reckless, dangerous disregard!
tzury
It is so obvious that most of the comments here are from people who did not try the model.
So much verbosity for a hypothetical experience one is refusing to have.
Shakahs
Grok are the first models I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas-turbine generators until that capacity gets built, to the significant detriment of the local population.
https://www.datacenterdynamics.com/en/news/elon-musk-xai-gas...
Ah, so this is what Cursor's "Sonic" model was. I've been doing a personal bench where I ask each model to create a 3D render of a guy using a laptop at a desk. I haven't written up a post showing the different outputs from each model yet, but it's been a fun way to test capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, with the laptop floating over his head. Sonic was very fast, though!
ceroxylon
According to the model card it is extremely fast, can be hijacked 25% of the time, has access to search tools, and has a propensity for dishonesty.
I also think it is optimistic to think the jailbreak percentage will stay at "0.00" after public use, but time will tell.
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf
I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.
rmoriz
I hated Sonic, but the latest release seems to have improved a lot. I built a small Rust project from scratch; it was fast and very accurate. Interestingly enough, it had some endless-loop issue when creating a .gitignore file (using OpenCode).
drusepth
Definitely fast, but initial use puts quality either comparable to or below gpt-5-nano. This might be a low-cost option for people who don't mind babysitting the output (or working in very small projects), but claude/gpt-5/gemini all seem to have significantly higher quality at marginally more cost/time.
By just emphasizing the speed here, I wonder if their workflows revolve more around the vibe practice of generating N solutions to a problem in parallel and selecting the "best". If so, it might still win out on speed (if it can reliably produce at least one higher-quality output, which remains to be seen), but also quickly loses any cost margin benefits.
hu3
Interesting. Available in VSCode Copilot for free.
https://i.imgur.com/qgBq6Vo.png
I'm going to test it. My bottleneck currently is waiting for the agent to scan/think/apply changes.
mchusma
Fast is cool! Totally has its place. But I use Claude code in a way right now where it’s not a huge issue and quality matters more.
Opus 4.1 is by far the best right now for most tasks. It’s the first model I think will almost always pump out “good code”. I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.
For me, Sonnet 3.5 was a clear step up in coding; I thought 3.7 was worse (about 2.5 Pro equivalent), and Sonnet 4 equal or maybe a tiny bit better than 3.5. Opus 4.1 is the first one that feels like a solid step up over Sonnet 3.5. This of course required me to jump to the Claude Code Max plan, but it's the first model worth that (wouldn't pay that much for just Sonnet).
Benchmarked here: https://blog.brokk.ai/grok-code-fast-1-added-to-the-power-ra...
What is HN using for AI-assisted coding? VSCode with some plugin? Would love some tips on what works...
natch
As a user, “fast” is almost the last thing I want from a model.
I suspect AI companies try to promote fast because it’s really a euphemism for “less inference compute” which is the real metric they would like to optimize.
myflash13
Just a few days ago I spent some time to sign up for Groq (not Grok, not Musk!) to implement fast code suggestions with qwen3-32b and gpt-oss-20b. Works handily with Jetbrains integrated AI features. I still use Claude Code as my "main" engineer, but I use these fast models for quick, fast edits.
lostsock
Trying this out now via OpenCode. Seems to be pretty good so far, certainly quick! Free for the next week as well which is a bonus
pdntspa
Pretty sure this was the "stealth" model behind Roo Code Sonic (I saw the name Grok Sonic floating around).
It's a good model for implementing instructions but don't let it try to architect anything. It makes terrible decisions.
thrance
Have people already forgotten that Grok went full race supremacist twice already? Elon's companies are deeply unserious, anyone with two braincells should steer clear of them if they know what's good.
squirrellous
Adding another positive note here. It works at incredible speeds in Cursor which allows me to iterate on prompts faster and not worry much about throwing away unsatisfactory work. This makes up for a lot of smaller issues if you know how to direct it. Output quality is decent too, at least for the problems I’ve tried.
It’s good for well defined tasks. Less good if you need it to be autonomous for long periods.
miohtama
Also, what's interesting is that Grok Code is not a general-purpose model: it knows coding only.
disposition2
This will probably be an unpopular, wet-blanket opinion...
But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines to power it all, for a chatbot.
Looks like they are bringing even more natural gas turbines online...great!
https://netswire.usatoday.com/story/money/business/developme...
"I'm tired, boss" is the only response. I just stick to OpenAI, or one provider, as they leapfrog each other every other Sunday anyway.
JeremyHerrman
fast but not smart. Fine for non-critical "I need this query" or "summarize this" but it's pretty much worthless for real coding work (compared to gpt-5 thinking or sonnet 4)
archagon
An AI helmed by a deranged megalomaniac who keeps publicly tweaking it to conform to his fucked-up worldview is a fundamentally damaged product, no matter how many millions get poured into it or how shiny the splash page is. I feel like this should be stating the obvious, and any “hacker” from the old school would agree.
Alas, I’m sure the mods have manually disabled flags for this press release.
asasidh
Tried it with Windsurf; it is fast and got things right on the first attempt.
cft
it's free in Cursor till Sept 2. My experience is subpar so far
mysterEFrank
qwen coder is 3k tps on cerebras
maxlin
Maybe it's just me, but I wish models like this would also provide a normal chat interface.
The leap from taking advice and copy-pasting (almost as a shameful fallback) to it directly driving your tools is a tough pill. I've recently adjusted to "micro-dosing" LLMs for code (asking for no direct code output, smaller patches) so I can learn better, and I don't know how I would integrate that with this.
Or do the agentic tools allow for this in some reasonable way and I just don't know?
froggertoaster
A shame that so much of the discourse centers around one person, when in reality competition in the AI market - regardless of who it is - helps us all.
No one seemed to bat an eye when DeepSeek essentially distilled an entire model from OpenAI.
phillipcarter
I don't want edgy 4chan-bot in my codebase, so there's no reason to adopt this when there are many other great coding models available.
IAmGraydon
Sure, I'm incredibly excited to use an LLM that has been intentionally trained to spread disinformation and is run by a Nazi sympathizer. Let me get right on that.
https://www.pbs.org/newshour/politics/why-does-the-ai-powere...
This is the model that was code named "Sonic" in Cursor last week. It received tons of praise. Then Cursor revealed it was a model from xAI. Then everyone hated it. :/ I miss the days where we just liked technology for advancement's sake.
*edit Case in point, downvotes in less than 30 seconds