Sakana Fugu

199 points | 110 comments | 16 hours ago

agalamli

i've seen many AI models, tried some. i'm genuinely interested in trying this kind of model/architecture. however i'm a little confused about the pricing.

holistio

You pay $200/month to Anthropic, $200/month to OpenAI, $200/month to Cursor, $200/month to Google, and seeing that it didn't come to a nice round $1024/month, you pay $200/month to Sakana to coordinate it all, because why not.

While you're at it, feel free to send me $200 as well, I'll generate a crypto address ending with "AI".

quanto

There are so many derisive comments here.

David Ha, CEO and co-founder, was one of the youngest managing directors at Goldman Sachs before doing ML at Google. His ML publications were considered top-notch almost a decade ago. I had high hopes for him when he raised money and founded Sakana.

I do agree with some comments here that perhaps this particular product is not well thought out. I also agree with the criticism that David calls Sakana a frontier AI lab while making money just selling AI B2B applications to Japanese businesses. I also agree with the assessment that Sakana has abrasive and antagonistic, sometimes openly hostile, recruiting tactics. I also agree that his then-impressive publications may have lost their luster in the age of LLMs.

However, the man is clearly driven; and he and his team may have more to offer in future. I admire the man for not taking the conventional AI-research career path.

cortesi

As a developer outside the US I think it's vital to have alternatives to OpenAI and Anthropic, but sadly this is not it. For $200/month you get < 3 hours of use per week, the API is extremely slow, and the output quality in my tests is nowhere near Fable. It's nowhere remotely near usable as a day-to-day workhorse. Very disappointing.

https://x.com/cortesi/status/2068898694238486658

njoyablpnting

Looking at the technical report I'm a bit confused. The improvement from using their orchestrator models seems minimal (in some cases lower than just the model which I'm assuming is in the orchestrator's pool?). Maybe it's sort of acting as an additional reasoning step upfront? Sort of like how if you asked Claude to create a plan for how best to prompt itself, you would probably end up with a better result than just the base prompt.

Also, from the technical report, looks like they're training on the output of Claude Code, etc. I'm guessing this doesn't violate TOS because they're technically not a directly competing model. This brings me to what I see as the main risk with this service, which is that it seems like an easy thing for a frontier lab to make obsolete, either by models beginning to converge in terms of strengths or by improving their own harnesses to include more of this meta-reasoning.

blixt

I tried running this for some market research for my startup and it did a pretty nice job. It didn't necessarily find any obscure data, and it seemed to rely on older data than what I could find myself. On top of this, it had the same sycophantic tendencies as most LLMs these days (explaining why your idea is great and riffing on that), which I find to be an unnecessary use of resources.

All put together, paying ~$60 to get a hit-or-miss report seems a bit excessive, but obviously as the models they use under the hood get better it becomes more and more worth it, assuming they also improve their grounding/search capabilities.

I'm a big fan of Sakana though, and have followed David Ha / @hardmaru since the world models papers (with the racing car game and the Doom clone), which were incredible at the time.

epsteingpt

Beta user: they piloted OpenRouter fusion before it was seen as the viable step. Everyone's understood for months now that having different models check each other is the best path forward.

This gets you that in a nice neat package, without the underlying tinkering mechanics.

If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy.

They'll be incentivized for your success, not token-maximizing for their investors.

The team is super smart too. What's not to like?

Wishing them the best on launch.

embedding-shape

> Frontier-level performance without single-vendor dependency. [...] Plug collective intelligence directly into your workflows today with a single API.

Do multiple vendors run this "single API"? If not, how is this not replacing one single-vendor dependency with another?

prodigycorp

ngl, I thought sakana.ai was doing cooler stuff than this. that said, the release of a product like this makes sense because it follows your natural intuition when using these models. The best way to use LLMs is to have at least two in your pocket, because the models do a good job at covering each other's assets and filling in obvious model-specific blind spots.

it's interesting that they're offering it in the form of fixed-cost subscription plans too. My impression was that the first-party providers can do this because they have API inference margins to the tune of 80ish percent. Anyone else orchestrating on top of these models has to pass these costs through or eat them themselves.

Lwrless

Got myself the $20 subscription and tried it out. The 5-hour limit runs out surprisingly fast. Quality is okay but it feels slow, and even with my $20 Claude subscription on Fable, the credit usage ends up being lower. Fable usually catches issues in my Opus 4.8-generated code that I'd miss otherwise, but Fugu didn't. Makes me wonder if it's really at the Fable level. Hard to see the value here.

jordemort

Fugu, eh? So there’s a nonzero chance this thing might kill me?

mark_l_watson

Nice idea but expensive. It looks like they don’t add very low cost models like DeepSeek v4 flash into their mix.

After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.

monkeydust

Imho there are two dimensions here: first, the different LLMs, and second, the strategy you use to break down the problem in an agentic fashion (e.g. split it across separate agents, each with its own persona, and then have a judge evaluate across all agents). You can of course mix the dimensions as well, and that's what I have been tinkering* with for a good few months with some success. This was all done using a home-brew setup running on OpenRouter.

Personally I prefer understanding the dimensions and their interplay and controlling it myself, though I can see why OpenRouter and others are now offering this as a packaged solution.

Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.

* https://github.com/monkeydust/rightmind
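
For readers who have not tried this, here is a minimal sketch of the "separate agents with personas, then a judge" pattern described above. It is not the rightmind code or anything Sakana or OpenRouter ships. The `Model` type, `run_panel`, `judge`, and the stub models are all hypothetical names made up for illustration; in practice each `Model` would wrap a real LLM API call.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a real LLM call: (system_prompt, user_prompt) -> text.
Model = Callable[[str, str], str]


def run_panel(question: str, agents: Dict[str, Model], personas: Dict[str, str]) -> Dict[str, str]:
    """Ask every agent the same question, each under its own persona prompt."""
    return {name: model(personas[name], question) for name, model in agents.items()}


def judge(question: str, answers: Dict[str, str], judge_model: Model) -> str:
    """Show all candidate answers to a judge model and return its verdict."""
    listing = "\n".join(f"[{name}] {text}" for name, text in sorted(answers.items()))
    return judge_model(
        "You are an impartial judge. Pick or merge the best answer.",
        f"Question: {question}\nCandidates:\n{listing}",
    )


def make_stub(tag: str) -> Model:
    """Toy model so the sketch runs offline; it ignores the persona prompt."""
    def stub(system: str, user: str) -> str:
        return f"{tag} answer to: {user}"
    return stub


def stub_judge(system: str, user: str) -> str:
    """Toy judge: returns the longest candidate line. A real judge would be an LLM call."""
    candidates = [line for line in user.splitlines() if line.startswith("[")]
    return max(candidates, key=len)


if __name__ == "__main__":
    agents = {"skeptic": make_stub("skeptical"), "optimist": make_stub("optimistic")}
    personas = {"skeptic": "Find the flaws.", "optimist": "Find the upside."}
    answers = run_panel("Is this worth $200/month?", agents, personas)
    print(judge("Is this worth $200/month?", answers, stub_judge))
```

The two dimensions map onto the two dictionaries: swapping entries in `agents` changes which LLM answers, while swapping `personas` or the judge changes the decomposition strategy.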

GolfPopper

This is a joke, right?

hmokiguess

How do you configure it to run with pi or claude code? I'm curious to try it (via subscription ideally)

EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...

eevmanu

Reminds me of <https://github.com/irthomasthomas/llm-consortium>

david_shi

Their research around building a domain-specific model is pretty cool; it's kind of like Karpathy's autoresearch but pointed at deciding the optimal model to use at each step of the inference.

If cost becomes an even bigger problem, being able to choose "best performance possible" or "strong but cost effective" will be useful.

https://arxiv.org/pdf/2512.04695
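
To make the two modes concrete, here is a rough sketch of per-step model selection. The comment above describes a learned model doing the choosing; this sketch swaps in a hand-written rule instead, purely to illustrate the "best performance possible" versus "strong but cost effective" trade-off. `ModelOption`, `pick_model`, the `min_quality` threshold, and all scores and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float        # hypothetical benchmark score in [0, 1]
    cost_per_call: float  # hypothetical price in dollars


def pick_model(options: List[ModelOption], mode: str, min_quality: float = 0.8) -> ModelOption:
    """Choose a model for one step according to the requested mode."""
    if not options:
        raise ValueError("no models to choose from")
    if mode == "best":
        # Ignore cost entirely and take the highest-quality model.
        return max(options, key=lambda m: m.quality)
    if mode == "cost_effective":
        # Cheapest model that clears the quality bar.
        good_enough = [m for m in options if m.quality >= min_quality]
        if good_enough:
            return min(good_enough, key=lambda m: m.cost_per_call)
        # Nothing clears the bar: fall back to the highest-quality model.
        return max(options, key=lambda m: m.quality)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pool = [
        ModelOption("frontier", quality=0.95, cost_per_call=1.00),
        ModelOption("mid", quality=0.85, cost_per_call=0.20),
        ModelOption("flash", quality=0.60, cost_per_call=0.01),
    ]
    print(pick_model(pool, "best").name)
    print(pick_model(pool, "cost_effective").name)
```

A learned router would replace the fixed `quality` numbers with a per-task prediction, which is presumably where the research effort goes.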

ed_mercer

So basically... openrouter?

andai

See also: OpenRouter Fusion, similar idea, although it seems limited to internet research tasks? (Unclear, maybe someone who has used it can elaborate.)

What's nice is that OpenRouter included a Pareto graph showing the cost as well as the performance. (But not time, unfortunately -- model fusion adds a large factor to round trip time.) Benchmarks are a lot less helpful without that.
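
For anyone wanting to build the same kind of graph from their own benchmark runs, here is a small sketch that computes the cost/performance Pareto frontier. The function name, the `Point` layout, and the numbers are all made up for illustration; nothing here comes from OpenRouter's data.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, cost, score); lower cost and higher score are better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the configurations that no cheaper-or-equal configuration matches or beats on score."""
    frontier: List[Point] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, visit the best scorer first.
    for name, cost, score in sorted(points, key=lambda p: (p[1], -p[2])):
        if score > best_score:
            # Strictly better than everything cheaper, so it is not dominated.
            frontier.append((name, cost, score))
            best_score = score
    return frontier


if __name__ == "__main__":
    runs = [
        ("single-small", 1.0, 0.50),
        ("single-mid", 2.0, 0.40),   # costs more and scores less than single-small
        ("fusion", 3.0, 0.90),
        ("single-large", 3.0, 0.70), # same cost as fusion, lower score
    ]
    for name, cost, score in pareto_frontier(runs):
        print(name, cost, score)
```

Adding latency would mean checking dominance over three values instead of two, which the sorted sweep above does not handle; a pairwise comparison would be needed.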

OpenRouter: Surpassing frontier performance with fusion (blog post with benchmarks)

https://news.ycombinator.com/item?id=48525392

OpenRouter Fusion API

https://news.ycombinator.com/item?id=48537641

See also: Sibling comment with an open source implementation

https://news.ycombinator.com/item?id=48624782#48629598

I did my own last weekend in a few lines of Python, though I haven't tested it much yet. (Looking for some very hard, very cheap benchmarks, if such a thing exists!)

claw-el

Will Le Chat try to eat Sakana? There is Le Chaton Fat and then there is Sakana Fugu too..

chvid

This would have been much more interesting and impactful if it had relied on open source models rather than commercial models that are only available via an API.

The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.

panorama22

Is this the beginning of the Hyperion TechnoCore?

adamnemecek

Seems kinda underwhelming considering they raised like $400M.

nickandbro

Very interesting. I wonder if it functions kinda similarly to how OpenRouter's Fusion API does. Hopefully it doesn't take too long to respond.

bprasanna

Isn't this what Perplexity is?

puttycat

Can someone explain this in layman's terms? I don't understand any of it

dancemethis

Sakana is certainly a choice of a name. In Portuguese, it means anything from "scoundrel" to "sleazebag".

Lethalman

https://xkcd.com/927/

rvz

Just letting you guys know that the model is not a moat.

nixosbestos

AI noob question: is this like Amp? I just use Amp, I ask it to do neat stuff and it does it. I desperately need to invest in my AI skills but every day I open two new tabs and add them to my "AI stuff" folder, and then go back to drowning in work to do.

71bw

And yet, as per usual...

     Not yet available in the EU/EEA while we work toward compliance with GDPR and EU-specific regulations.

ljlolel

I’ve also developed and open-sourced a Mythos-level model using fusion/synthesis on TrustedRouter

https://trustedrouter.com/blog/fusion-evals-open-source

chenzhekl

I will probably never pay Sakana, as they are involved in military contracts.

https://japannews.yomiuri.co.jp/politics/defense-security/20...
