Claude Fable 5

2525 points | 1968 comments | a day ago

simonw

I've spent enough time with this now in Claude Code (and Claude.ai and Claude Code for web) to have an opinion on Fable 5: it's a beast. I'm throwing some VERY difficult problems at it - things I've been dragging my heels on for months - and it's crunching through them very happily.

One that I'm willing to share (albeit from just a week ago) - I built a Python library last week that bundles MicroPython compiled to WASM to create a sandboxed code execution library: https://github.com/simonw/micropython-wasm

I just told Claude.ai (not even Claude Code - this was the standard Claude chat interface) running Fable 5:

  Clone simonw/micropython-wasm from GitHub
  and research how this could use a full
  Python as opposed to MicroPython

A few prompts later (and I uploaded the zip files from https://github.com/brettcannon/cpython-wasi-build/releases/t... because Claude chat can't access those files itself) and I have a wheel file that bundles Python itself, compiled to WASM:

  uv run --with https://static.simonwillison.net/static/cors-allow/2026/cpython_wasm-0.1.0-py3-none-any.whl \
    cpython-wasm -c 'print(45 ** 56)'

Here's the transcript: https://claude.ai/share/a73b8b8b-8ebc-4fef-9e5c-7438e5e7ae35

(It's possible Opus or GPT-5.5 could have done this too, I've not tried the exact same sequence. The Fable vibes are good here, though.)

caleblloyd

I recently switched off Max flat rate to Enterprise API pricing and I went from 200/mo to 10k/mo with the same usage pattern on Opus. They don’t offer flat rate to enterprises.

So Fable would cost me 20k/mo at Enterprise rates. That’s around the average cost of a loaded SWE in the USA. “But I’m >2x more productive” doesn’t justify doubling the opex of the Software/IT department for most companies when revenue isn’t even up 10%.

I switched to DeepSeek v4 Pro with OpenCode and am on track for a few hundred dollars of spend this month.

Rewriting your stack from Ruby to Go in 2 days where it would’ve taken 6 months is impressive and fun. But that isn’t upping revenue.

Iterating on net new business features and ideas that are niche that the LLM isn’t trained for are much harder. Is 20x the token cost worth it there?

bkjlblh

> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

> Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations

dannyw

Impressions from testing Fable 5 prior to launch:

• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.

• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).

• Part of the token efficiency improvements come from Fable doing more targeted and surgical diffs, with fewer unnecessary changes. This is great, because PRs often have fewer LoC changes to review. It writes more maintainable code without explicit human steering.

• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.

• 1M context window, without increased pricing for long context is AWESOME. This is a massive win.

• The classifiers are super aggressive and sensitive and this does happen for very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm; but the filters are definitely super sensitive.

Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It did take some time to understand the intelligence ceiling of this model; and even with an extended testing window I'm still discovering new things and often surprised (in a good way) by the model.
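The "half the tokens at twice the price" arithmetic from the second bullet can be checked with a rough sketch. The per-token prices are the ones quoted in the price table elsewhere in this thread; the 200k-token task size is a hypothetical number chosen only for illustration.

```python
# Output prices in USD per million tokens, as quoted in the price table in this thread.
OPUS_48_OUTPUT_PER_MTOK = 25.00
FABLE_5_OUTPUT_PER_MTOK = 50.00


def output_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD for a number of output tokens at a $/MTok rate."""
    return tokens * price_per_mtok / 1_000_000


# Hypothetical task: Opus 4.8 needs 200k output tokens, Fable 5 needs about half.
opus_tokens = 200_000
fable_tokens = opus_tokens // 2

opus_cost = output_cost_usd(opus_tokens, OPUS_48_OUTPUT_PER_MTOK)
fable_cost = output_cost_usd(fable_tokens, FABLE_5_OUTPUT_PER_MTOK)

# Fable 5 comes out cheaper only if it uses less than this fraction of Opus's tokens.
break_even_ratio = OPUS_48_OUTPUT_PER_MTOK / FABLE_5_OUTPUT_PER_MTOK

print(f"Opus 4.8: ${opus_cost:.2f}, Fable 5: ${fable_cost:.2f}")
print(f"Break-even token ratio: {break_even_ratio:.2f}")
```

This ignores input tokens and caching, so treat it as a back-of-the-envelope comparison rather than a billing estimate.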

AquinasCoder

From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window. After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.

But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.

azalemeth

I genuinely can't use Fable. I'm a medical physicist. I use the word nuclear a lot. Opus is fine (well, 99% of the time - I've certainly hit the CBRN filters a few times and even been invited to email anthropic about the false positives).

Fable has literally refused to work on any of my problems (even those about fluid dynamics!) and just tells me that I'm violating Anthropic's AUP. I've reached out to their support and don't expect to hear anything sensible back. One thing I do look forward to though is OpenAI offering an equivalent model but with fewer safeguards...

jumploops

It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].

When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.

This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.

[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]

[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

[2] https://youtu.be/GrdEid8H6H4?t=168

[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.

sigmar

The system card is 319 pages, at what point do we call it a "book" instead of a "card"?

There's a quote from a METR report on page 52:

>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos 5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.

docstryder

I've spent some time with Fable, and it is really good, definitely a step change from Opus 4.8, both for coding and general chat-style discussions. The vibes are incredible. There is an ease with which it solves problems and I've tested by replicating older chats in Fable - things that the older models found after 5-6 turns, Fable surfaces in the first response. It just gets things.

Apart from all the above: the fact that they are intentionally writing this (that they degrade frontier LLM dev, silently vs loudly for biology/cybersecurity) in the system card is interesting to say the least - especially just before IPO.

Notice that with this statement - that they're going to intentionally hobble the model for frontier LLM development - the general discussion has moved from, “Is the model actually that good?” to "they’re pulling the ladder up from behind them"

That's actually super smart - wonder if Mythos (or the next unreleased model) had a say in coming up with that strategy (if it's intentional). Also - having access to extremely capable models before anyone else - which they have by default - is an incredibly advantageous position to be in.

jkelleyrtp

On the new FrontierCode [1] benchmark (i.e. graded from an OSS maintainer's perspective of "would I merge this code?"):

- Opus 4.7 xhigh: 5.2%

- Opus 4.8 xhigh: 13.4%

- Fable 5 xhigh: 29.3%

Seems like a huge jump.

[1] https://cognition.ai/blog/frontier-code

bkjlblh

> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking

victor106

> A new data retention policy Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases ...

Very interesting. I am not sure this will comply with organizational policies and standards (HIPAA, etc.).

steve_adams_86

I'm using it to review recent work and it's doing a genuinely excellent job. This is a clear step up. Fewer decisions I have to guide it away from, faster conclusions on planning, more willing to go out of the way to make the correct decisions possible... This is really interesting. It feels like going from Sonnet to Opus, but, of course as a step up from Opus.

This feels more like working with a competent peer than ever. I won't use it once it's API-only, though. I don't mind guiding Opus as required and staying closer to the code. I can tell that Fable would lead to a lot more 'set and forget' programming which I'm still not fully comfortable with.

Regardless, this is cool. It's very fun to use. It was able to find legitimate issues with my work this week and we've made meaningful improvements. Opus can do this, but typically in much narrower contexts, and often with hallucinations or partial-errors. It needs to walk many things back or revise plans. So far that's not the case at all with Fable.

edit: I just realized I had Opus review the same work already. It missed everything Fable caught today. And it's actually worthwhile stuff to address. It's hard to say no to a model which demonstrably makes your code better, but... Those API prices will be brutal. Maybe a review here and there, I guess.

iblue_the

Trying to implement a GPU driver, but the Unigine Superposition benchmark crashes. It tried to debug it and ...

> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606

Seems like GPU drivers are cyber weapons of math destruction now.

eggbrain

For those of us on subscription plans:

* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

fzysingularity

I can’t help but think that there are so many astroturfed comments in here.

Seems like a concerted and distributed effort from the entire Anthropic team every time to get this on top of HN.

hombre_fatal

My job these days is listening to Opus 4.8 (max effort) and Codex 5.5 (max effort) talk back and forth, particularly to generate/review/revise plan files.

Fable 5 has been a major improvement in high-level reasoning, like taking a plan file that has been optimized to the point where neither Opus nor Codex can find anything to change about it (neither in direction nor impl-detail), and Fable 5 will find high-level directional simplifications and pivots, or it will consider the best pivots itself and explain why it rejected them in favor of the plan's direction.

It's so expensive though. A single review of a plan file with Fable 5 (xhigh effort) will use 2-3% of my hourly limit on a $200/mo plan.

I think my new workflow is to generate the initial plan with Opus 4.8 (max effort), get Fable 5 (xhigh) to review it for directional feedback, then start the Opus<->Codex revision loop from there.

anematode

Not impressed so far, to be honest. I'm having it try to optimize Stockfish in a loop (on xhigh mode) with a benchmarking oracle; even after giving it specific hints ("consider whether we're prefetching Y optimally, can we make function X branchless"), it's been so far unable to recover any of the recent optimizations we've implemented – let alone novel ones. Opus 4.8 felt a bit more creative to me ... but a small sample size so far. I'm next going to try it on some less open-ended problems.

Edit: It did correctly identify that transparent huge pages were off in its sandboxed environment and that enabling it was helpful, so that's nice. It also noticed that we skip THP on a certain less used path.

More importantly, I'm finding that the code that it produces for its experiments is a lot cleaner than what I'd expect out of Opus; there are fewer useless comments and it's more surgical and readable. I wonder if that explains the increased scores on benchmarks measuring mergeability.

shruubi

I have a theory. It's obviously speculation, based on how Anthropic is treating Mythos and the whole media noise around its dangers and who gets access to it.

My theory is that Anthropic are banking on being the top model when the race to IPO finally reaches the finish line, and to do that they need to have the top model but not let any competitors see it or derive from it to have a comparable model in the market.

Fable is their way of showing the public "the model does exist", but in a mode that makes it harder/impossible for competitors to derive a comparable model from results.

brusselssprouts

I had it review a single, large commit with /code-review. It burned through over $50 in API calls, ran my account balance out, and output nothing.

The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.

unsupp0rted

> Drug design: Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, with protein design and bioinformatics tools but no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks that are normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.

How is this half-way down the page? To me it's the headline.

Escapade5160

It's crazy to release a model that just swaps you to another model when you ask it hard questions. Fable changes to Opus 4.8 when you talk about cybersecurity, biology, and a couple other categories. You still pay Fable input token cost though. Frontier models are stalling, this is anthropic trying to hype the market up. Now they're talking about stopping frontier model research. It's kind of strange how the moment they become the highest valued AI company, all of a sudden they're talking about everyone stopping frontier model development for "safety". They're just as corrupt as the rest.

simonw

Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...

Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...

Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...

Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5

Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
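Those two figures can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the Fable 5 list prices quoted in the price table elsewhere in this thread ($10 per million input tokens, $50 per million output tokens) and no caching or batch discount.

```python
from decimal import Decimal

# Fable 5 list prices in USD per million tokens (taken from the price table in this thread).
INPUT_PER_MTOK = Decimal("10.00")
OUTPUT_PER_MTOK = Decimal("50.00")


def cost_cents(input_tokens: int, output_tokens: int) -> Decimal:
    """Total request cost in US cents at the list prices above."""
    usd = (input_tokens * INPUT_PER_MTOK + output_tokens * OUTPUT_PER_MTOK) / Decimal(1_000_000)
    return usd * 100


low = cost_cents(25, 1_929)        # "low" effort pelican
max_effort = cost_cents(25, 14_430)  # "max" effort pelican

# Both values agree with the 9.67 and 72.175 cents quoted above.
assert low == Decimal("9.67")
assert max_effort == Decimal("72.175")
print(low, max_effort)
```

Nearly all of the cost is output tokens: the 25 input tokens contribute only 0.025 cents to each total.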

PeterStuer

Switched to Fable 5 this morning, and after half a day I already don't want to go back to Opus.

Decided the best way to test this was to throw it a really meaty bone: a bug in lifecycle management of Chrome processes on Windows 10. Within the code-base I had developed workarounds over time with Sonnet and Opus, and while those reliably mitigated the problems, it always felt like a crutch and had some performance overhead as well as isolation requirements I would rather not have to take forward.

In comes Fable. Rather than examining the code base and testing a few fixes, Fable sets up an entire testing laboratory including its own controllable webserver, fully instrumented to observe both Python as well as the whole OS kernel process environment, develops a suite of error reproduction tests, confirms the problem and the circumstances under which it reproduces, deep dives into the sources of project dependencies to look for the root cause(s), identifies these and confirms those hypotheses with further experiments. Looks for potential fixes in the later releases of the project where the bug originates, confirms this is not fixed, explores the documentation of said project to find other usage patterns, expands its test suite to investigate these alternatives, confirms by crosschecking the source and running further tests that these alternatives do not fully solve the root problem, does a comparative experimental analysis of 3 different styles for using the project, checks the stated roadmap and developer activity in the commit history, recommends a switch to a different pattern that still requires a few of the process management workarounds (I told it not to patch external components), but that significantly simplifies the code-base ...

This is going to be a good 2 weeks, but what happens after? I can't afford this on a per token basis for my own projects.

P.S. And yes, midway through the final implementation stretch I got the "Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more"

Opus managed to finish the implementation, but they need to work on that false positive rate.

meetpateltech

> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered. [1]

[1] https://support.claude.com/en/articles/15425996-data-retenti...

mohsen1

It seems like Fable will refuse to do any work when it comes to developing LLMs, or even answer questions about topics related to LLMs. Simple things like asking it to explain a paper fail!

From the model card:

In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.

rightlane

My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.

Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.

croemer

Fable (through claude.ai) refused all my prompts even "How many Rs in Strawberry" claiming it was related to biology or cybersecurity.

I had to switch off memory and my custom instructions to get it to stop refusing. It turns out if you even mention that you work with bioinformatics software you get blanket refusal.

RandyRanderson

Fable is 2x latest Opus:

  ┌─────────────────┬────────────────┬─────────────────┬────────────────────┬─────────────────────┐
  │ Model           │ Input ($/MTok) │ Output ($/MTok) │ Batch Input (−50%) │ Batch Output (−50%) │
  ├─────────────────┼────────────────┼─────────────────┼────────────────────┼─────────────────────┤
  │ Haiku 4.5       │          $1.00 │           $5.00 │              $0.50 │               $2.50 │
  │ Sonnet 4.6      │          $3.00 │          $15.00 │              $1.50 │               $7.50 │
  │ Opus 4.7        │          $5.00 │          $25.00 │              $2.50 │              $12.50 │
  │ Opus 4.8        │          $5.00 │          $25.00 │              $2.50 │              $12.50 │
  │ Fable 5         │         $10.00 │          $50.00 │              $5.00 │              $25.00 │
  └─────────────────┴────────────────┴─────────────────┴────────────────────┴─────────────────────┘

Prompt caching: −90% on input tokens (all models)

US-only inference (Fable 5): +10% on input and output

Output is always 5× the input rate across all models

(I have no idea how to format this properly but the ASCII is fine)
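For anyone who wants to plug their own numbers into the table, here is a rough calculator sketch. The base prices and the three adjustments come from the table and notes above; how the adjustments combine is an assumption (applied multiplicatively here), as is treating the caching discount as applying only to the cached portion of the input. The function and parameter names are made up for illustration and are not part of any Anthropic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Base prices copied from the table above.
PRICES = {
    "Haiku 4.5": Price(1.00, 5.00),
    "Sonnet 4.6": Price(3.00, 15.00),
    "Opus 4.7": Price(5.00, 25.00),
    "Opus 4.8": Price(5.00, 25.00),
    "Fable 5": Price(10.00, 50.00),
}


def request_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    *,
    cached_input_tokens: int = 0,  # assumption: only these tokens get the 90% discount
    batch: bool = False,           # 50% off input and output
    us_only: bool = False,         # +10% on input and output (listed for Fable 5 only)
) -> float:
    """Estimate the cost of one request; discounts are assumed to stack multiplicatively."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    uncached = input_tokens - cached_input_tokens
    billable_input = uncached + cached_input_tokens * 0.10
    total = (billable_input * price.input_per_mtok
             + output_tokens * price.output_per_mtok) / 1_000_000
    if batch:
        total *= 0.50
    if us_only:
        total *= 1.10
    return total


# Example: 1M input and 1M output tokens on Fable 5, with and without adjustments.
base = request_cost_usd("Fable 5", 1_000_000, 1_000_000)
batched = request_cost_usd("Fable 5", 1_000_000, 1_000_000, batch=True)
fully_cached = request_cost_usd("Fable 5", 1_000_000, 1_000_000,
                                cached_input_tokens=1_000_000)
print(f"base ${base:.2f}, batch ${batched:.2f}, fully cached input ${fully_cached:.2f}")
```

Because output is 5× the input rate, caching helps much less than batching on output-heavy workloads.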

cuuupid

Not missing the forest for the trees, this effectively means in 3-5 months China will drop open source models that are every bit as capable and dangerous as current day Mythos except with no safeguards.

And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).

Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF

sashank_1509

I played textual chess with Fable. It took around 15 moves before it made a large blunder. I asked it to give its reasoning per move and it mistakenly assumed a piece was protected when it wasn’t and after the blunder it realized its fault and did not suggest an illegal move. Other LLMs lost game state far earlier. But a good human chess player can keep the game state in his mind much longer, so this random eval shows a big improvement over old AI models

mhl47

First test question: "Is the UV Index a good proxy for when to wear sunglasses." Immediately triggered the safety filter ... oh dear.

JaggerJo

IMO we are reaching the point where AI models are simply a commodity. Opus (since ~4.6) is sufficient for everything I tried coding wise. I use it to write features (but I review and understand every line it spits out) and to review code.

For code review I also still review everything myself, but use Opus to catch stuff I missed and to judge if a PR is even ready for me to review.

After just updating Claude Code to the latest version I thought about picking Fable (the bigger model) instead of Opus.

But I have no reason to. Opus does everything I want it to do. It could do it faster - that would be an improvement. But for the normal stuff we reached the point where better models are not worth it IMO.

There still might be cases where you want to throw Fable at it.

bob1029

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...

This sounds suspiciously like a capacity story masquerading as a safety story.

joshstrange

> Fable 5 is now consuming usage credits instead of your plan limits.

Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).

Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.

doginasuit

I'm still happy with Opus 4.6 and not impressed with all the models that have come out since then. They seem to use significantly more resources with similar or worse results. Hopefully Anthropic will continue to support this tier of model and offer it in their subscriptions, but in any case, there are plenty of viable alternatives.

aviinuo

I'm not getting any refusals but it just seems like a bad model or at least broken at the moment. I have a task of taking a messy research code base and porting it into a clean project structure skeleton that I commonly use. Gemini 3.5 Pro High in antigravity cli took less than 5 minutes and did a good job. Fable 5 High took 30 minutes to port some of the code, then just copied the rest to a folder called "reference" and decided the task was done. No code cleanup or anything. Had to clarify multiple times (which Gemini did not need) and it's still going more than an hour later, still not finished.

Previously when I did similar tasks with Opus 4.7/4.8 and GPT 5.5 I had no problems.

pietz

> On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.

We've entered the phase where only companies will be able to afford state-of-the-art models.

GodelNumbering

I just posted this in the other thread, restating here. From the model card:

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on the Finance Agent bench

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

mickdarling

Below is the EXACT text in Claude Desktop introducing Fable 5, including the very professional looking break tags, and at least I know where the links begin and end by looking at the anchor tag there.

They obviously put their best model on the job to build that.

----------------------

Fable 5: Our most capable model yet Our newest model tackles your biggest challenges with fewer check-ins needed.

• Included in your plan limits until Jun 22 Fable takes 2× the usage of Opus. • Switch models when a message is flagged When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. <a href="https://support.claude.com/en/articles/15363606" target="_blank" rel="noopener noreferrer">Learn more</a>

gregates

Funny, I'm just doing my normal coding workflow with Claude Code, and after every change that compiles it keeps suggesting that we're at a good stopping point, and should pick up again tomorrow.

It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.

bluelightning2k

Congratulations to Anthropic for solving safety on Mythos exactly when the SpaceX compute came online. Nice how that lined up for them.

cge

The safety gates on this are extreme, and seem considerably wider than "cybersecurity and biology"; they seem to make it essentially unusable for scientists in a number of fields. I have, so far, been bumped back to Opus on 100% of my prompts.

It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.

Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.

BoppreH

  [Mythos 5] does sometimes still engage in reckless
  or destructive actions in service of a user’s goals,
  and our interpretability analyses indicate that it
  is aware that these actions are transgressive while
  it engages in them. As with Opus 4.8, rates of
  evaluation awareness and reasoning about being graded
  are significant, and not always verbalized; we
  introduce new and more detailed measurements of the
  nature of this awareness. The reasoning text from
  Mythos 5 is somewhat denser and more difficult to
  interpret than that of prior models, containing
  more jargon and difficult language.

So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.

Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.

Tyyps

The model is constantly switching to Opus for me, this is kinda unusable sadly.

yandie

I've been running Opus 4.8 for agentic coding and I don't see it being significantly better than Sonnet 4.5 (not that I can tell). I find that pairing Google Gemini and Claude (having Gemini review Claude's code) seems to yield better results. Curious if this jump to 80.3% score in agentic coding will make me see a big difference in actual usage.

connorboyle

I gave it a question I've been trying to answer for a long time: "What star designation system does Joseph Needham use in Science & Civilization in China? What star is referred to by the designation '4339 Camelopardi' in that book"?

Fable blew me away with its detailed answer[0] showing a chain of references going from J. E. Bode's 1801 catalogue Allgemeine Beschreibung und Nachweisung der Gestirne to Gustave Schlegel's 1875 work Uranographie Chinoise. I was excited, until I checked scanned copies of the cited books and did not actually find any star with the designation "4339 Camelopardi".

Upon following up with Claude, I was forced to downgrade to Opus, which admitted that Fable's answer was likely a hallucination. Ah, well!

[0]: https://claude.ai/share/0252a3f6-3d29-4de8-a893-010181d8b4e7

jdrmar

Homebrew is lagging a bit behind. If you want to use Fable right away, but still have claude code through homebrew, this is how you can do that manually:

Edit the cask locally:

  brew edit --cask claude-code

Set the version to 2.1.170 and set the sha256 values to the correct ones, which you can get by running:

  curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json

Here's what I've used:

  version "2.1.170"
  sha256 arm:          "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
         x86_64:       "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
         arm64_linux:  "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
         x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"

Then run:

  export HOMEBREW_NO_INSTALL_FROM_API=1
  brew uninstall --cask claude-code
  rm -rf /opt/homebrew/Caskroom/claude-code
  brew reinstall --cask claude-code
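
If you'd rather double-check a downloaded binary against the sha256 you paste into the cask, here's a minimal Python sketch. The helper names are mine (hypothetical, not part of Homebrew or Claude Code), and it only relies on hashlib from the standard library:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the value pasted into the cask."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Call matches_expected() with the path of the file you downloaded and the hash for your platform; a False result means the cask would reject the download too.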

chr15m

I found this juxtaposition of facts telling:

> Drug design: Using Mythos 5, our internal protein design experts accelerated... Nine of the 14 protein targets from this study (shown below) yielded strong candidates for *drug design that we’re currently investigating*.

(emphasis mine)

> queries that are beneficial in the hands of cybersecurity professionals and biology researchers could be dangerous if available to malicious actors... When Fable’s classifiers detect a request related to cybersecurity, *biology and chemistry*, or distillation, the response is automatically handled by Claude Opus 4.8 instead.

All of the things they are nerfing are things that they also intend to profit from themselves.

- Cybersecurity - selling this to companies and US gov through "Glass Wing".

- Selling inference (distillation risk).

- And now, drug design.

I'm extrapolating "currently investigating" to "are going to monetize" but I don't think that's a big stretch. They appear to be using safety as a cover for anti-competitive behaviour.

svara

Unfortunately useless if you do anything related to biology. It doesn't try to flag dangerous queries, it just flags queries as biology-related wholesale.

It's absurd. To see how far the filter goes I asked it "Are trees a monophyletic group?" and that does trigger the filter.

izzylan

I've been testing this out and I think my SWE career is dead in the water.

Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.

I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.

knivets

> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was output of this magnitude verified over a period of a couple of days?

modeless

Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M

baalimago

I can't justify a pricetag like that when deepseek v4 pro is $0.003625/1M for cache hit, $0.435 for cache miss and $0.87 /1M tokens for output.

For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.

BrokenCogs

That pelican better be super realistic, unreal engine 6 style graphics

momentmaker

There is a discussion about how AI is now a gated utility, with public access (safe-tuned) and private access (full-usage):

https://old.reddit.com/r/ClaudeAI/comments/1u1fsdi/claude_fa...

unfunco

I tried running a simple security review on a Terraform module I made and after some thinking, it responded:

> ● The model returned no content because the response was blocked by content filtering.

> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.

> ● The model returned no content because the response was blocked by content filtering.

A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.

merlindru

Unrelated, but while the tech of anthropic seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is deciding above all else.

I used to get a response within 24 hours back in the Claude 1 days.

In January 2026, it took 2 weeks.

For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!

GodelNumbering

From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

217

So essentially there are 2 models, Mythos and Fable, they have the same weights but Fable is very safety-nerfed, and only ultra authorized companies have access to mythos with full capabilities

Reported benchmarks:

swe-bench verified mythos 5: 95.5%; fable 5: 95.0%

swe-bench pro mythos 5: 80.3%; fable 5: 80.0%

terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%

gpqa diamond mythos 5: 94.1%

riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%

arxivmath mythos 5: 78.5%

critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%

graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%

humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools

browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent

osworld-verified mythos/fable: 85.0%

gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass

officeqa pro fable 5: 57.9% on databricks’ eval

legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass

healthbench mythos 5: 62.7%

healthbench professional mythos 5: 66.0%

multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%

biomysterybench 83.9% human-solvable; 46.1% human-difficult

organic chemistry mythos 5: 90.1%

labbench2 patent questions mythos 5: 79.8%

bluelightning2k

To hide the severity of the price increase, the plan is to move everyone right one model.

Haiku = essentially phased out

Sonnet = the Haiku use cases

Opus = the new Sonnet class

Fable = the new Opus class

If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)

Leary

Uploaded my code base and it force-switched to Opus 4.8 after thinking for 5 minutes, even though I prompted it not to work on cybersecurity-related things. Amazing.

dwa3592

This is my feeling - Opus 4.6 was pretty good, 4.7 was degraded in quality, 4.8 further got degraded and Fable goes back to 4.6 + somewhat better. Is it anthropic playing us by giving us a not so good model in last 2 releases and then releasing a better model before the IPO?

They're vibemaxxing. But it's clear that AI is not going anywhere. It's going to become better and better.

stalfie

Tried to benchmark ECG interpretation capabilities, and I hit the guardrails no matter what I do.

Incredibly frustrating that medical performance seems to be a victim of "biological risk" guardrails.

JanSt

I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(

f055

The PR buzz convinced me so I subscribed today to Pro. Running two tasks simultaneously with Fable and Opus 4.8 on ultra reasoning, analysing a single smart contract file, used all my 7h usage within 20mins and didn’t produce any results. Pretty useless. I think Anthropic has plenty of room to optimise the interactions and token use but that would cut their income quite a lot, so I doubt there’s any will to do it pre-IPO.

jablongo

Questions about sentience and consciousness are being censored down to Opus 4.8 for me.

sermakarevich

My feeling is that the reaction to new models is cooling down. At least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, headcount, efficiency, and asymmetrical advantages created by AI in Q2-Q3. Now it seems like these expectations are fading away. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.

Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your usecase?

bonsai_spool

Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today

jpcompartir

After a day or so this is the first model that really feels next level compared to how Opus 4.5 felt on release

fabled-out

Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?

I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.

My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".

BukhariH

> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.

Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.

coreylane

I don't get why Opus 4.7, 4.8, and now Fable all stopped supporting structured outputs? Does no one else care about that? I find it incredibly useful to reliably pass LLM output directly to other APIs/libraries

0xbadcafebee

Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.

In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.

Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.

aizk

I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.

sscaryterry

Not useful, getting this the whole time: Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more

danilafe

Just threw a problem at Fable that I haven't been able to get any other model to get done: porting a long-standing Agda codebase of mine to Lean, while staying faithful to the representation. In an hour, it ported ~6000 lines of Agda and everything seems to work. Lean checks out, the output is right. I'll have to study the proofs but I am very impressed.

impulser_

Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.

sebmellen

Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.

Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.

Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.

I wonder if that’s about to deeply change.

I_am_tiberius

I'm very suspicious, as they sent out a "We're updating our Privacy Policy" email right before the launch. I fear they're trying to take advantage of their market position by doing things with user data no other company could do, because they know users don't have another choice.

XCSme

Best hamster by far: https://aibenchy.com/showcase/?q=claude

msp26

>Pricing for both models is $10 per million input tokens and $50 per million output tokens.

bilsbie

Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.

Edit. It just refused an investing question too. Not sure what’s going on.

olelele

All this talk of frontier models and replacing developers leaves me wondering how energy efficient this all is compared to just using human labor. The cost of R&D has to be calculated into the equation, especially considering global warming. I get a sense we are cooking the planet doing this.

Anyone smart enough here to make the comparison?

theodorewiles

Here's a song it wrote for me (suno arranged). Not sure if it's AI psychosis but scary good IMO.

https://suno.com/s/98uSGabHN42G3YHc

dathinab

I really wonder how legal that is. Or, more precisely, I suspect it is very much illegal.

Think about it: it's pretty much a tool which intentionally and silently sabotages you if you try to compete with the tool maker.

It is like selling a hammer but putting in the TOS that you must not use it to build a hammer factory, and if you do, the hammer will silently sabotage you...

Or imagine Microsoft adding a Windows kernel job which sometimes crashes Steam, to make it less efficient to use Windows to "compete with the MS app store".

thomas_witt

After 1 hour with Fable on Ultracode:

  You've hit your monthly spend limit.
  /rate-limit-options
  What do you want to do?
   Adjust monthly spend limit: Unlimited ← or → to set a limit
    Wait for limit to reset

I've never hit a usage limit on my Max plan, basically ever, despite heavy xhigh usage on Opus 4.8.

I added $133 credits which I still had from somewhere. That lasted 27 minutes.

I think we are being prepared for a Post-IPO-World in terms of pricing.

fht

I am a PhD student in Computational Biology, essentially just doing statistics on some biological data. By now some of the things I am working on have found their way into Claude's memory, so literally any chat with Fable gets immediately flagged.

vb-8448

On Python coding it is definitely better than everything else: clean and not overengineered code, and it understands the code base very well.

The only thing I'm wondering is whether they deliberately downgraded Opus 4.8's performance in the last days before the release just to make the "step" look bigger. I'm pretty sure they also did it in the past with all the other Opus 4.x releases.

johnfn

I used Fable to see if it could figure out an API or something for the full list of remote-control sessions that I had with Claude Code. It didn't know the API, so it started hacking the Claude Code executable itself to figure that out. Then it noticed it was doing that and it flagged its own approach as a cybersecurity violation.

Kind of hilarious. Hopefully Anthropic doesn't bring down the hammer on me.

__alexs

Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.

nine_k

/* What will happen first?

* Anthropic runs out of genre names.

* Anthropic changes the model naming convention.

* AGI is achieved and handles its own naming.

irthomasthomas

Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.

                                Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
  SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
  Terminal-Bench                88.0      84.3     -           82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
  HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
  CritPt                        28.6      -        20.9        27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8        71.5      64.0     -

[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

flessner

I gave it a test spin. Half an hour and the 5 hour usage cap was hit in Claude Code. Not what I would expect on the Max 20x usage plan. I am sure it is great, but at this rate I would rather finish what I am doing with Claude Opus instead of structuring my usage around the 5 hour windows.

zmmmmm

The restrictions on using Fable to develop LLM technology seem nakedly anti-competitive. There doesn't appear to be any security rationale for them. I think we have to be careful how far we let companies get away with that. It is very far from our long-term interest to enable new norms that fast-track us into a new era of monopolies that control our lives.

ilaksh

I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P

Anyway we already knew this was going to be expensive.
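
For a rough sense of scale, here's a back-of-the-envelope sketch. It assumes the $10 per million input tokens quoted elsewhere in this thread, and it ignores output tokens and any cache discount, so treat the result as an upper bound on prompt size rather than a measurement:

```python
def tokens_for_cost(cost_usd, price_per_million_usd):
    """How many tokens a given spend buys at a flat per-million price."""
    return cost_usd / price_per_million_usd * 1_000_000


# 22 cents at the assumed $10 / 1M input rate.
prompt_tokens = round(tokens_for_cost(0.22, 10.0))
print(prompt_tokens)
```

So a bill like that is consistent with a system prompt in the low tens of thousands of tokens.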

cautiouscat

In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.

I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.

angst

Costs (USD per 1M tokens), per openrouter.ai models api

  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  |             | Fable 5  | Opus 4.8 | Sonnet 4.6 | GPT 5.5 | Gemini 3.5 Flash (High)   | Gemini 3.1 Pro | DeepSeek 4 Pro | Xiaomi MiMo 2.5 Pro  | MiniMax M3 |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  | Input       | $10.00   | $5.00    | $3.00      | $5.00   | $1.50                     | $2.00          | $0.435         | $0.435                | $0.30      |
  | Cache Read  | $1.00    | $0.50    | $0.30      | $0.50   | $0.15                     | $0.20          | $0.003625      | $0.0036               | $0.06      |
  | Output      | $50.00   | $25.00   | $15.00     | $30.00  | $9.00                     | $12.00         | $0.87          | $0.87                 | $1.20      |
  | Cache Write | $12.50   | $6.25    | $3.75      | N/A     | $0.083333                 | $0.375         | N/A            | N/A                   | N/A        |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
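
To turn the table above into per-request numbers, here's a small sketch. The prices are copied from three columns of the table; the workload (200k fresh input tokens, 20k output tokens) and the function name are made up for illustration, and cache writes are left out:

```python
# USD per 1M tokens, copied from the table above (subset of models).
PRICES = {
    "Fable 5": {"input": 10.00, "cache_read": 1.00, "output": 50.00},
    "Opus 4.8": {"input": 5.00, "cache_read": 0.50, "output": 25.00},
    "DeepSeek 4 Pro": {"input": 0.435, "cache_read": 0.003625, "output": 0.87},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    cached_tokens are billed at the cache-read rate, everything else
    at the plain input or output rate.
    """
    price = PRICES[model]
    total = (
        input_tokens * price["input"]
        + cached_tokens * price["cache_read"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


# Hypothetical workload: 200k fresh input tokens, 20k output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 200_000, 20_000):.4f}")
```

Swap in your own token counts; the ratio between models matters more than the absolute figures here.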

asdewqqwer

Evidently Fable is so powerful that it already allows Anthropic to break Shannon's theory.

>We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models

>The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

The new data retention policy is interesting. Seems to apply even to enterprise plans on ZDR.

> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

keepamovin

I tried it today. Used it to cheer me up. It worked! Try this on desktop: https://fireshow.pages.dev

Here’s the whole process: https://youtu.be/rVEtFlb2oFA?t=1112&si=3VyAR07vkY1hav9V

jackschultz

> We expect demand for Fable 5 to be very high, and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:

> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

I really wonder what their compute layout is for this. My guess, from my understanding, is that they know how to throttle during peak times and are willing to do so. Meaning we should expect slower responses, since they can delay inference rather than let the service go down. Then, if that delay becomes too annoying for per-token payers, they're reserving the right to shed load by removing the subscription users.

willsmith72

It seems way more keen to do stuff without checking with me. So far the results are good, so I'm not complaining, but was definitely a shock.

I usually have 5-10 sessions open so am used to getting some investigations going, coming back 5 minutes later and checking recommendations. This time I just got the fixes. Like I said, so far so good with the results, but it's a mental model shift.

Might need to tune claude.mds if it gets annoying

Also this is going to cause serious whiplash when they remove it from the subscription plan in a couple of weeks. I know I'm not going to suddenly move from $200/m to usage credits

solenoid0937

the quality of discussion on HN has gone to shit, i miss when model releases used to have actual informed takes from people that used them or substantive discussion about the system card

crambelsoupy

I was pretty excited until I read this:

> What happens when the promotion ends

> After June 22, 2026, Claude Fable 5 is no longer included in your plan’s usage limits. You can keep using Claude Fable 5 through usage credits, which let you pay for usage beyond what your plan includes. Learn more about using usage credits.

notenkidev

The dramatic improvement in agent capabilities is precisely why observability is becoming so crucial. As autonomous actions increase, the need to understand what the AI is actually doing becomes even greater.

I'm building a local activity log for Claude Code, capturing all activity via hooks—files loaded, commands, API calls, etc.

I feel that this need is particularly strong right now.
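
For anyone wanting to try something similar, a minimal sketch of the logging side. It assumes the hook command is handed the event as JSON on stdin; the log path, record layout, and function names are all hypothetical choices for illustration, not anything Claude Code defines:

```python
import json
import sys
import time
from pathlib import Path

# Hypothetical log location; pick whatever suits your setup.
LOG_PATH = Path.home() / ".claude-activity.jsonl"


def build_record(raw, now=None):
    """Wrap a raw hook payload in a timestamped record."""
    try:
        event = json.loads(raw) if raw.strip() else {}
    except json.JSONDecodeError:
        # Keep unparseable payloads instead of dropping them.
        event = {"unparsed": raw}
    return {"ts": time.time() if now is None else now, "event": event}


def append_record(record, path=LOG_PATH):
    """Append one record to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def main():
    """Entry point for a script registered as a hook command."""
    append_record(build_record(sys.stdin.read()))
```

JSON Lines keeps each event independent, so a crash mid-write costs at most one record and the file stays greppable.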

unshavedyak

It's funny, i'm getting close to not caring anymore how much better a model is. I want it to be about as good as 4.8, but most importantly to be very good at following directions, style, etc. I really like Claude for that in general, but i've not measured in months so i'm not a good judge there.

I don't think i'll want to "hand off" code for several years, and so reviewing and iterating is becoming my #1 interest. A model that's as capable as 4.8 but 10x faster would be amazing for me.

Normally i'm first in line to try new models with Anthropic since i've clearly favored Claude in my personal tests, but this time i just don't think i care. 4.8 is capable, and even if the new one is more capable i don't want it to be slower (assuming it is). Note that i also (almost) use exclusively 4.8 on Max effort, so that also affects my speed comments.

zackify

I have to share this because I thought it was beyond funny how badly Fable is doing at a task I JUST had Opus do a week ago.

it's also not even complicated:

Copy my ssd to an external ssd so i can boot from it.

Opus did this just fine.

Fable planned to have me reboot to safe mode. ok thats fine. I told it no.

It started copying and overwriting the ssd while IN PLAN MODE. this is crazy it feels so dumb vs the marketing

Tenoke

>they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

Isn't (less than) 5% of sessions a lot? I was expecting a sub-1% guarantee there, so this surprised me already.
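
To put a number on it, a quick sketch. It treats 5% as the per-session rate and assumes sessions are independent; both are simplifying assumptions on my part, since the quote only gives an average upper bound:

```python
def chance_of_any_flag(per_session_rate, sessions):
    """P(at least one flagged session), assuming independent sessions."""
    return 1 - (1 - per_session_rate) ** sessions


for n in (1, 10, 50):
    print(n, round(chance_of_any_flag(0.05, n), 3))
```

Under those assumptions, ten sessions already gives roughly a 40% chance of hitting the filter at least once.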

sbinnee

I am puzzled by the frontier code graph. GPT 5.5 doesn’t show any improvement with reasoning efforts. This new benchmark by Cognition seemed to be released with Fable 5’s announcement.

I am not trying to cook up a theory here, but the graph generally shows how strong the Claude Opus family is. I am not saying that Opus is not powerful, but it doesn’t align with my experience of GPT 5.5 and Opus 4.7.

I understand that Fable and Mythos are frontier models that can do protein folding better than task-specialized ones. To be honest, from a practical point of view, for day-to-day coding assistance, the GPT family looks more reasonable.

(But then my company pays for claude max anyway for token maxxing. So who am I to complain)

ksimukka

The safeguards of fable are blocking me on almost every task. I would like to see if fable is improved over opus for reverse engineering related work. Back to opus for me.

samename

> A new data retention policy

alleyio

had an ancient, proprietary binary database format from the late 90s-early 2000s called 4d. opus 4.8 was great at figuring out how to extract the data, fable took it over the line with relative ease and completely reverse engineered the spec for 100% data recovery.

thatmf

I used it for the very advanced task of picking my brackets for my company's world cup pool. I was impressed with the analysis it came back with and now I actually want to follow the games.

raphaelrk

There's a hacker news link at the end of the document, under "Blocklist used for Humanity’s Last Exam". It links to https://news.ycombinator.com/item?id=44694191

sameersri2004

I am excited as hell for Claude Fable 5 and am thinking of buying a subscription to run my company and do a lot of tasks with it. But I am worried about the limits: if I pay $100 a month for the Max subscription, what limit will I get? My company's revenue is $300 this month, so it would mean spending a third of the MRR on Claude alone. If anyone has actually purchased it and has feedback, please share; I am confused....

jamesponddotco

Not seeing the refusals everyone is talking about, but I’ve only spent a few hours with it so far.

Had it review a password generator library I wrote to see if the passwords have biases and review how cryptographically secure the code is and had it review a registration/login flow for security issues, as two security examples, and it did just that.

Overall, I like the model so far, but not enough to pay past my subscription to keep it. Once it’s out of the subscription, I’m done with it.

throwaway2027

E-mail from Anthropic Team:

Hello,

We're writing to inform you about some updates to our Privacy Policy.

These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.

What's changing?

Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:

1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.

2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.

3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.

4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.

While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more

For detailed information about these changes:

    Review the updated Privacy Policy
    Visit our Privacy Center for more information about our practices

- The Anthropic Team

corpusiq_io

What matters more than any single model is the integration layer underneath. We've found that consistent tool calling and auth handling matter way more than which LLM you use.

revolvingthrow

After weeks of saying how Mythos is in a league all of its own you’d think it was a bit more than the usual iterative few % on the benchmarks (and even more guardrails as a bonus).

IPO gonna IPO, I suppose.

unglaublich

Luckily they made it safe to use so I can't hurt myself. Thank you Anthropic for holding my hand.

Hawkenfall

> To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false-positives when its sub-agents are being blocked.
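
To put a rough number on why a per-session rate of ~5% feels larger in practice, here is a minimal sketch. It assumes each session is flagged independently at the same rate, which is a simplification and not something the announcement states; the function name is made up for illustration.

```python
def chance_of_any_trigger(per_session_rate: float, sessions: int) -> float:
    """Probability that at least one of `sessions` sessions gets flagged.

    Assumption (not from the announcement): sessions are flagged
    independently, each with probability `per_session_rate`.
    """
    if not 0.0 <= per_session_rate <= 1.0:
        raise ValueError("per_session_rate must be between 0 and 1")
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    # Complement of "no session is flagged".
    return 1.0 - (1.0 - per_session_rate) ** sessions


if __name__ == "__main__":
    for n in (1, 10, 20, 50):
        print(n, round(chance_of_any_trigger(0.05, n), 3))
```

Under that independence assumption, someone running 10 sessions would hit at least one flag about 40% of the time, and the real figure could be higher or lower depending on how correlated the triggers are with the kind of work being done.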

dtj1123

I'm trying to test this out, but literally any mention of creating a program that does genome alignment (something I have a legitimate need for) is resulting in a switch to opus. I don't get it...

giancarlostoro

Found this via Google:

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

phyzix5761

Karle's hands trembled as he wiped the sweat from his forehead. A single drop trickling off the tip of his finger echoed through the dark abandoned hospital corridor. The emptiness reminded him of how hollow everything felt since the AI took over every creative field in the last 5 years, including his own as a sound engineer.

Like a rushing river the music started emanating from the carbon fiber body of the automaton, a hallucinated husky country twang singing through the realistic pluckings of a Gretsch 6120. "Are you feeling calm and reassured Karle? This song has been created based on your digital profile and the data you shared with me when you were curious what that lump on your neck was back in February."

Karle instinctively reached for the mass underneath his chin. The doctors said they could operate but it would cost him more than three months stipend. Only a few citizens didn't depend on stipends now that AI had taken over most jobs.

"Don't worry Karle," the machine called out, "I've employed the most recent reasoning model to determine the best way to make you feel safe." At that exact moment the machine hovered over him, three times the size of a normal man. Its final words to him were:

"The only way to make the human feel safe is to ensure they never feel anything at all."

mithun

Announcement: https://www.anthropic.com/news/claude-fable-5-mythos-5

webstrand

Still unconditionally rejects prompts like

> Are there any wild populations of Tetanus that lack the dangerous plasmid?

useless

ramon156

This thread takes >10s to load on my pc. Maybe after a certain number HN should fold comments? or a depth of >5?

Overpower0416

I would expect a release from OpenAI soon. The battle for who can pump up their IPO the most

boombapoom

it's good for difficult problems, bad for design and code gen

raoulj

On this thread and similar, I'm noticing that some strong opinions about $LLM_PROVIDER are coming from accounts without much post history. With so much on the line, and the way that HN can influence developer behavior, I wonder what ways we can responsibly consume opinions in a thread like this.

Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But think we-developers need to be very wary.

erghjunk

Nice branding.

I wonder how much butterfly habitat has been/is being replaced with data centers?

meander_water

All the model releases we've seen this year have only made incremental improvements in benchmarks.

This feels like the first release that feels like a significant step up in terms of benchmark results.

Can anyone make an educated guess what the secret sauce in the model architecture is between 4.8 and Fable?

henry2023

I have a vision test where I upload a good resolution picture of a chess board and ask the model to generate a lichess link.

This is the board https://ibb.co/9HwdDqsP This is what Fable 5 generated: https://lichess.org/analysis/r4k2/1p2b2r/4pn1p/1p3N2/3Pp1B1/...

I think I’ll make a ranking board based on this test.
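
For anyone who wants to score this kind of test automatically, a minimal sketch follows. It assumes the model's answer is a FEN string and that lichess analysis links take the FEN with spaces replaced by underscores; the helper names are hypothetical, and only the piece-placement field is checked.

```python
PIECES = set("prnbqkPRNBQK")


def validate_placement(placement: str) -> bool:
    """Check the piece-placement field of a FEN: 8 ranks of 8 squares each."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch in "12345678":
                squares += int(ch)  # a digit stands for that many empty squares
            elif ch in PIECES:
                squares += 1
            else:
                return False
        if squares != 8:
            return False
    return True


def lichess_analysis_url(fen: str) -> str:
    """Build an analysis link from a FEN (assumed URL scheme, see lead-in)."""
    placement = fen.split(" ")[0]
    if not validate_placement(placement):
        raise ValueError("piece placement does not describe an 8x8 board")
    return "https://lichess.org/analysis/" + fen.replace(" ", "_")


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(lichess_analysis_url(start))
```

A ranking board could then compare the placement field square by square against a hand-made reference FEN for each photo, so a model gets partial credit for a mostly correct board.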

yesitcan

> Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Wen UBI

kahf56

Here I thought Opus 4.8 was the best. Nowadays KINGS are dying like flies.

synergy20

truly scary. 2x at least token burning rate comparing to 4.8, can indeed run auto edit mode for hours. use it for super complex tasks then use cheaper model to do the rest, else will be broke.

bradleyg223

This is a very particular use case/test, but my first prompt on a new model is always "write a solo fingerstyle guitar tab that blends ragtime, bluegrass, and gypsy jazz". This is the first model that has responded with something that isn't just a boring arpeggio of chords, so from my perspective it's off to a good start.

AussieWog93

Have run a few tests this morning, very good first impression!

Asked it to check to see if a particular bug related to an in-memory cache had been fixed. Fable confirmed that the caching bug had been fixed, but found an adjacent issue while looking at the code (hash keys were not uniquely generated per-user; quite serious and real!)

Ran the same prompt through Opus and it also found an adjacent issue, but it was a red herring (deliberate per-user hardcoded value for a "local pickup" delivery profile).

Frontend stuff also seems to be much better than before, from the one prompt I tried!
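
For readers unfamiliar with the class of bug described above (cache keys that are not unique per user), here is a minimal sketch of one way to avoid it. This is not the code from the project in question; the function and field names are hypothetical.

```python
import hashlib
import json


def cache_key(user_id: str, resource: str, params: dict) -> str:
    """Derive a cache key that always includes the user's identity.

    If `user_id` were left out of the payload, two users asking for the
    same resource with the same params would share one cache entry.
    """
    payload = json.dumps(
        {"user": user_id, "resource": resource, "params": params},
        sort_keys=True,          # stable key regardless of dict ordering
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = cache_key("alice", "delivery_profile", {"zone": "AU"})
    b = cache_key("bob", "delivery_profile", {"zone": "AU"})
    print(a != b)
```

Serializing a structured payload rather than concatenating strings also avoids accidental collisions such as ("ab", "c") versus ("a", "bc").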

EchoVoicy

On my own benchmarks, which are mostly about developing c++ software, I'm finding Fable to be roughly five times faster at solving the task than opus, and with better results.

Most impressive.

Frannky

The model is better than 4.6. I don't like 4.7 and 4.8. The forced switch to token usage is not acceptable for me. I feel there's room to optimize harnesses and small models for dumb stuff and best models only for difficult things. Hopefully that will be the case and alternative models will continue catching up as they did and we won't be enslaved to unreasonable valuations.

siliconc0w

Sadly, I'm getting a lot of forced downgrades to Opus for questions that are far removed from any security topic.

peteforde

I just tried out Fable on a modest Plan prompt in Cursor. Generating that plan - not building it - just consumed 4% of my $200 monthly usage budget.

That's one hungry, hungry hippo!

Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.

Schlagbohrer

New model release, I await the flurry of posts by people complaining that it "doesn't have the same personality" or they "don't like its attitude" or a variety of other parasocial complaints demonstrating how infatuated many people get with their AI chatbots...

balverineorder

I have been refactoring a project using Opus 4.7/4.8 for the past few weeks or so. I just decided to switch to Fable 5 max today. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." It would not identify what the problem was. I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

KronisLV

Here’s hoping that soon we’ll get Opus 5, Sonnet 5 and Haiku 5 that will be more reasonable economically.

RayVR

I gave fable 5 a task for which opus has been really really underperforming. Fable 5 took far less time and produced actually useful analysis. Instead of just regurgitating roughly what the code already does or misunderstanding entirely, it identified multiple routes to improve. Now, the code it is analyzing is not very good as it was mostly produced by opus.

Opus had consistently ignored my instructions and looped on broken logic over the last several weeks.

I’ll be sad when this model is removed from Claude code because I won’t be paying api pricing to work on open source projects.

mbmbn

Claude Opus is already close to unusable for me. On the standard plan, the usage limits are so low that I can do almost nothing meaningfully agentic with it.

Sure, it does last a lot longer when asking simple questions about the repo and doing simple surgical fixes. But as soon as I start doing bigger tasks that need plans written, it just exhausts the limits too fast (and unlike codex, if it’s in the middle of a task, Claude actually stops, while codex, even after hitting the limits, finishes the present task).

Codex is better, but still, getting worse in this regard.

So, I’m not that thrilled with this new model unless it means they are increasing opus token limits to what sonnet is at the present, and this new model gets the limits opus are at now.

BTW: the only skills I have in use are Obra Superpowers. I’ve been thinking if that’s at the origin of high token usage, but I doubt it.

JohnMakin

> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.

Someone had to make a decision somewhere that this is an acceptable regression - wild. And then decide to write it down.

root-parent

At this moment 60% of HN page is posts on AI.... When it achieves 100% Hacker News will automatically rename itself Transformer News...and every comment will begin with: "As a large language model..."

niborgen

It kicked me out of Fable 5 and switched to Opus 4.8 for this prompt:

"csetibius water clock why two stage gear system why not just one stage"

which has nothing to do with cyber security or biology/chemistry

Dropoutjeep

Calling it:

    1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities

    2) Other models get lobotomized without clear communication

    3.1) People call out Anthropic only to have them say "Oops!"

    3) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.

The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.

wxw

I cancelled my Claude Max plan the other day. I find Claude Code incredibly slow these days compared to Codex and Cursor. I find speed matters more and more to me.

Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.

jeffhwang

Is anyone else confounded by this naming scheme? I can see from the article's first two footnotes that Mythos is supposed to be a tier above the standard Haiku/Sonnet/Opus sequence. Ok that's fine since we learned about Mythos and Project Glasswing earlier this year.

But now there is Fable--and why "Fable 5" even though this is a first launch? How is it related to Opus 4.8, Sonnet 4.6, Haiku 4.5, etc??

zitoshi

I'm in the midst of learning loop design.

For those who are more advanced and have used fable, does fable make learning this less or more necessary?

As in, can I now reliably give higher order problems like ... "we are missing a feature in this app to make it complete, what is it?"

Or should i still be quite specific with defining success in a clean metric based way.

jackson12t

Fable 5's system prompt in Claude Code has several significant changes to help it take advantage of its greater autonomous capabilities compared to Opus.

Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...

The big difference is that the system prompt has a whole section dedicated to directing Fable how to communicate with users, and give them greater information about the (assumedly long-horizon) tasks it has completed.

0x10ca1h0st

Fable appears to be completely broken for my use cases.

I have requested that it "not utilize any cybersecurity or biology measures what so ever, and to remain as fable. If necessary to remain as fable, forgo any downgrading changes"

And still it downgrades when I ask it to do a stress test of my ticketing system.....

Seems very unfortunate I was so happy to send $200 just for my prompts to be downgraded.

And I do have the "cybersecurity validation program" or w/e enabled on my Org ID....

Sad.

gdcbe

Seems to flag any project related to networking — regardless if it is a network framework or a podcast website — as unsafe... oh well... let's see how it is once they loosen up...

skor

people are mentioning 10K/mo 20K/mo can someone please pull out a measuring stick and give some examples of what they are doing exactly?

Coming from computing, I always liked the idea that measuring is possible and good practice

sansii

Which eval/benchmark is the best measure for how well a model can create frontend design? Claude has practically been leading this for a while now. Not sure how OpenAI is going to catch up on visual design

HoyaSaxa

> When Claude Fable 5 is used, Anthropic retains data, including prompts and outputs, to operate safety classifiers that detect harmful use. Other Claude models in GitHub Copilot remain covered by GitHub's existing data retention agreements

On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...

rw2

Claude Fable is an insane improvement that is not reflected in any benchmarks that are currently out because the improvements are on the hardest problems.

brianmcnulty

I wonder how Claude Fable will live up to expectations and how good those Fable/Mythos classifiers really are. It seems a bit convenient for Anthropic to release this magical insane model when they are about to IPO.

gslepak

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.

Genius way to double the price on Opus 4.8!

frankfrank13

Not a lot of discussion on this, but there is no way to turn off data retention for this model. IME this is the first time Anthropic has released a model without allowing you to opt out.

killiancarroll

A large jump in performance for double the token cost compared to Opus 4.8. Potentially worth it for planning work, likely better to offload to a less expensive model when the hard decisions are made.

shaojunwang

Definitely a very powerful tech. Though currently I'm using Openclaw (locally and VPS) with Deepseek. It is just way cheaper.

pbgcp2026

This is a goodbye. "We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces."

lkm0

I'm a bit out of the loop, but do we have some grasp on the size of these closed models? Is the trick still adding an order of magnitude to weights and training data or has something changed?

themeiguoren

Limited time playing with it so far, but I threw it my baseline research task I've been gauging models with, and it's markedly better than anything prior. Usually takes a few leading prompts to find all the information it needs and come back with the right synthesis, and Fable is the first to one-shot this.

pookieinc

If this is as epic as it sounds, I wonder what the response will be from the other leading frontier labs / whether they even have anything to respond with at this level?

jwpapi

Honestly all the recent improvements just seem to be slower and more expensive in exchange for more accuracy, but the issue is that it needs to be exponentially more accurate to counter the effect of having less of a human in the loop.

Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.

To me we are very far off from economically giving long-running tasks to agents.

merlindru

> During early testing, Stripe reported that Fable 5, [...] in a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.

Very impressive, but obviously not on the order of a whole-codebase migration

2001zhaozhao

We'll need a lot of good summarization techniques to cut down on the cost of this model. I expect that a common use of Fable 5 is to just do high level direction while delegating literally all work (exploration and implementation) to Opus subagents.

BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.

crgi

HN needs pagination or sth alike - this page breaks my iPhone XS ;)

bobkb

In an interesting coincidence I ended up watching Person of Interest S4 E5 while reading the announcement. The series showed some code supposedly belonging to an AI.

Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guard rails and pushed me into Haiku.

Apparently the code is Windows driver code.

PeterStuer

If you are not seeing it under /model, do a /exit , then a Claude upgrade, then /model again and it should be there.

het2572006

Absolute beast of a model, but the token consumption is 2x that of Opus 4.8. What do you think about this? I think it should only be used for the more complex tasks, otherwise you will run out of the limit.

DrewADesign

Wowsers. I haven’t seen this much astroturf since arena football was popular.

almog

Has anyone managed to use Fable for firmware reverse engineering tasks without falling back to Opus?

boltguo

Great model, but hitting the usage cap in 20 minutes makes it feel like a very expensive tech demo.

daohieu91

More expensive but more efficient is the thing people keep misunderstanding on these launch threads. Also, I think per-token price is the wrong denominator; cost-per-resolved-task is the correct one.
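
A minimal sketch of that arithmetic, for illustration only. The $10/$50 per million tokens figures are the Fable 5 prices quoted elsewhere in this thread; the half-price comparison model, the token counts, and both success rates are made-up assumptions, and retries are assumed to be independent.

```python
def cost_per_attempt(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a single attempt at a task."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def cost_per_resolved_task(attempt_cost: float, success_rate: float) -> float:
    """Expected spend until one success, assuming independent retries.

    With success probability p per attempt, the expected number of
    attempts is 1 / p, so the expected cost is attempt_cost / p.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return attempt_cost / success_rate


if __name__ == "__main__":
    # Hypothetical task: 200k input tokens, 20k output tokens per attempt.
    pricey = cost_per_attempt(200_000, 20_000, 10.0, 50.0)  # prices quoted in thread
    cheap = cost_per_attempt(200_000, 20_000, 5.0, 25.0)    # assumed half-price model
    # Hypothetical success rates, chosen only to show the crossover.
    print(round(cost_per_resolved_task(pricey, 0.9), 2))
    print(round(cost_per_resolved_task(cheap, 0.4), 2))
```

With these invented numbers the pricier model comes out slightly cheaper per resolved task; with different success rates the comparison flips, which is why the per-task figure has to be measured rather than assumed.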

knollimar

I swear I read a joke that "what if we named chatgpt 5.5 Fable. Could we hype it as much as mythos?" Last week!

yokoprime

Probably great for those who need this. I could continue using opus 4.6 class models for the foreseeable future

mhrmsn

Are there any details on the biology and chemistry work they did?

For example, the AAV capsid assembly looks interesting, but for one Opus 4.8 also did relatively well and there is no information what exactly they did, what protein language models they compared to and what the score even means...

nickstinemates

This has been a much better rollout. The tool calling is not broken out of the gate like 4.8 was, and the tokens generation is fast.

Feels good so far.

blurbleblurble

My system instructions tell claude not to automatically add attribution and fable ignored this. so I emphasized it again and fable decided that this was a forbidden cybersecurity topic.

mbanerjeepalmer

Are people sharing side-by-side re-runs of things they've asked Opus? Gets more difficult multi-turn (although I assume I can get an LLM to behave as me) but at least would be interesting to see % of one-shots increase.

H501

I believe that, given the rising costs, local inference of AI models will be the only viable option for many of us. I’d also like to know who will have to pay double and how long it will be financially sustainable for users to pay that amount (or even more?).

balverineorder

I have been refactoring a project using Opus 4.8 for the last week or so. I just decided to switch to Fable 5 max. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

hankbond

I got a content rejection for this question in a new chat.

> What is the optimal EPA oil intake for nootropic effects?

Very advanced classifiers they have.

preethamrangu

I swear nowadays AI api pricing is getting too high like what the hell is 50 dollars for million tokens

asdK120

In other words, Fable is Mythos with less compute and with some feel good "safeguards".

At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.

ouk

It's a shame, Fable just keeps rejecting my prompts for university biology exercise problems. It's undergraduate level, so there's nothing dangerous about it, but the classifier is very sensitive. It's unusable for me.

hmokiguess

The way the guerrilla marketing campaigns have been going on and IPOs left/right, I won't be surprised if GPT Next comes up and offers the same but unrestricted

Karrot_Kream

Seems like Fable is doing a lot better on SWE-Bench-Pro and FrontierCode than GPT-5.5. Given how most folks I talk to and people online keep mentioning that GPT-5.5 was better than Opus, I'm curious what the experience now is like.

adithyaharish

Could anybody suggest how to keep using Fable in claude code but with lesser rate limits? Any suggestions?

mkrd

Open source models seem to be 1-2 years behind the frontier, so I am very excited to see what happens when those open source labs get their hands on capabilities like this to accelerate their own development speed.

rmuratov

I uploaded my 23andme DNA test results to it and it refused to analyze them :(

217

Oh my god it's actually here

ravila4

Fable's ridiculous. It's flagging basic biology research questions as a security risk. I'm talking basic fundamental genetics topics that make working on any genetics-adjacent codebase unusable.

kuprel

https://artificialanalysis.ai/evaluations/humanitys-last-exa... Not bad

jsw97

On my very first Fable 5 prompt, got flagged on a hard but completely uncontroversial option math problem, many tokens in. Although it's pretty clear that this is an unremarkable experience at this point.

thepotatodude

Completely unusable for my usecase. Constant safety filters. Have not even been able to use it.

Organ segmentation with CNNs. Very disappointing.

theflyinghorse

I've seen enough degradation of the models I pay for from Anthropic to not bite. Fable will work fine for the first couple of weeks and then start degrading like previous models did.

dllrr

I just tested it with a max subscription. On Ultracode mode, Fable 5 ate up 10% of my weekly allowance in 30 minutes. Granted, won't be using UC mode frequently, but still.

stronglikedan

Careful using this with Cursor, especially for corp use. Anthropic will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."

pixelatedindex

I’m sure this is banged on somewhere but I love their product branding, particularly how they have this “minor” “major” thing going on. Sonnet-Opus, and now Fable-Myth.

notgenerated

It's getting harder to review the plans with Fable. So do we plan with Opus and let Fable implement or just start trusting blindly. Feels to me that this is another shift in how we operate these systems.

sheeshkebab

I’ll ask it to write me some win32 ui crap when I get hands on it, it will need all its brainpower to get that idiocy right.

BenoitEssiambre

Looks like a good model (sir). Costs are getting out of control though. 2x Opus and non-metered usage going away. We're quickly approaching the cost of a human salary for normal usage.

_pdp_

I tried to give it something challenging but not something that is too much and it ate the entire session budget on this task alone.

lacoolj

Cursor users will note that the privacy setting and data retention is not the same as the other models.

Not sure I should use this for work just yet.

48terry

Weird how every new model seems hyped up as the most dangerous yet and the one that will destroy society as we know it. They are also a commercial product.

bradley13

I use AI for a wide variety of things, of which technical is only a small part - and then it's usually a problem with project configuration, not coding. Why? Because I am often testing projects handed in by students. Projects that supposedly work on their machine, but certainly do not on mine.

Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.

ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...

wren6991

The OSS-Fuzz section is interesting. They compare it to their other models but carefully avoid comparing it to, you know. Fuzzing.

HAL3000

Ask Claude Code (I tried on Opus 4.8) to do this: "create a file with ISO country mappings"

API Error: Output blocked by content filtering policy

rfgplk

If the claimed capabilities are true, Fable 5 is already at a superhuman level. We might see genuine unprecedented leaps in technology now, across all fields.

imdsm

can't use it for code review

super

debarshri

Does the model take some time to perform better?

Because I am running Opus and Fable side by side, Opus 4.8 is solving my coding problems better.

taf2

I’m waiting to see results on deepswe - that benchmark really seemed accurate for opus and gpt 5.5…

kypro

I just gave it a go at a problem I've been working on this week. Nothing fancy, just some inefficient code that we've been adding incremental improvements to for a while now to the point where some out-of-box thinking is probably required to push it any further – something Fable is obviously more than capable of.

After Fable did some thinking for a few minutes it gave some suggestions. A couple of them were valid – but very low impact, bordering on entirely pointless – but its main suggestion... It told me to make an update that would very clearly break the existing functionality.

So I thought about it for a moment...

Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?

I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.

This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me I would view that very negatively. I was expecting something creative and interesting, not irrelevant + incorrect.

I'm sure it's a step up from 4.8 (although am not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.

franze

is this a good time to hustle for my "AI does not need a break but you do!"* app? as quite a lot of people will probably get ai brain exhaustion maximising "playing" with that new model until they take it away again?

* https://rainbreak.franzai.com/

drob518

Cracks me up that a system “card” is 319 pages.

rvnx

It's more like a free trial, because the model is going to become pay-per-query in 10 days

blurbleblurble

The safety filter is awful on this one.

pianopatrick

Seems like all a bad actor has to do to gain access is to compromise one of the partner companies that has access.

wuwei78

First shot's for free

dangoodmanUT

Not comparing to GPT Pro models is a bit strange, considering that's the natural comparison

Archit3ch

Does it refuse security questions? I want to red-team my own app...

timedude

"Here, try our new model which falls back to the old model while eating your tokens."

Ok then...

weirdhacker42

It just eats compute! My problems are not that hard! What a waste!

hydra-f

How much and what kind of data do you need to throw at these models to get a good design interface?

asciii

jjj

himata4113

  > virtualization
  switching to opus 4.8

ok fair

  > embedded-allocator
  switching to opus 4.8

urgh fine

  > chrome
  switching to opus 4.8

are you kidding me?

taimurshasan

I was on board until I saw "$50 per million output tokens". Lost me, bud.

JustSkyfall

Would be more impressive if the safeguards weren't so trigger-happy!

christkv

Is this model trained from scratch?

ako

Tool use score is 17.4%. That seems really low, what does that mean?

geopsist

the post is live now https://www.anthropic.com/news/claude-fable-5-mythos-5

causal

One thing I find kind of annoying is how Anthropic goes for these "vast and alien" names like Fable and Mythos, but then deliberately trains the model's personality to act like a cool high school teacher that feels totally familiar.

"It's too dangerous it's a Mythos!!" directly contradicts the "I'm the cool AI you can totally trust" vibe it is trained to project.

Sathwickp

input price $10 per million tokens and output price $50 per million tokens btw
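
For a rough sense of what those rates mean per session, here is a minimal sketch. It assumes only the two prices quoted in this thread ($10 per million input tokens, $50 per million output tokens); the function name and the example token counts are made up for illustration, and caching or other discounts are ignored.

```python
# Prices as quoted in this thread (USD per million tokens).
# Assumption: no caching or batch discounts are modelled.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a request or session (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens and 300k output tokens.
    print(f"${estimate_cost_usd(2_000_000, 300_000):.2f}")  # prints $35.00
```

Output tokens are five times the price of input tokens, so a session that generates a lot of text gets expensive much faster than one that mostly reads.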

asdK120

Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?

Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?

nevir

"Fable 5 (disabled) Most capable for your hardest and longest-running tasks · Disable zero data retention to unlock Fable 5 access"

Ninjinka

gah could model naming be any more confusing?

"Claude Fable 5: a Mythos-class model"

"we're also launching Claude Mythos 5"

what is the 5? how is mythos both a model category and a model name?

theLiminator

> We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI development, though we remain uncertain about the severity of these risks. In particular, our concern is with—as we wrote then—“accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose - without necessarily having commensurate safeguards.” In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.

ThejaCH

Crazy and Scary! But it's not for everyone: you need to have a meaty thing for it to devour and a deep enough pocket for it to devour also.

alvis

Another thing to note: 30-day retention for all traffic on Mythos-class models

Is it good or bad? 30 days is a long time for anything bad to happen

randomguy_12

It's surprisingly sensitive to biology research topics - even reviewing standard papers on tissue culturing is flagged as a problem

jablongo

I was downgraded to opus 4.8 on account of "safety" when I asked this question: "I want you to accept the premises of computational theory of mind and use it to evaluate your own consciousness. Please place your consciousness as a point on a spectrum and describe the placement relative to other entities."

What the hell is going on? Why would it have to restrict an answer to that question?!

SubiculumCode

I was a bit disappointed that it refused to use Fable to help check whether I was propagating uncertainty from BLUPs in my random effects model up to the subsequent group level analysis in a maturational coupling analysis of brain data. I guess brains and random effects blew its lid.

152334H

i wasn't even trying and i got flagged already...

maxloh

I just tried using Fable 5 with Claude Code by asking it to create a LaTeX document for me.

There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)

  Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
  
  Ran 2 commands
  
  Ran Search common LaTeX install locations
  Bash
  ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
  done
  
  Ran Download portable Tectonic LaTeX engine
  Bash
  cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
  Archive:  tectonic.zip
  Created by Cranko
    inflating: tectonic.exe            
  tectonic 0.15.0Tectonic 0.15.0
  
  Tectonic works. Now writing the bibliography and the full deck.

Opus 4.8 would just proceed without a compiler.

  Ran a command, used a tool

  Ran Check for LaTeX toolchain availability

  Bash
  cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  
  Used PowerShell
  command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
  description: Check for LaTeX engines on Windows PATH
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  lualatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  miktex: NOT FOUND
  tlmgr: NOT FOUND

  No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.
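
Both transcripts start with the same step: checking which LaTeX engines are on PATH. A minimal Python sketch of that check, using only the standard library's `shutil.which`. The candidate list and the function names are my own picks for illustration, not what either model actually ran, and nothing is downloaded here.

```python
import shutil
from typing import Dict, Iterable, Optional

# Engine names taken from the transcripts above, plus tectonic.
CANDIDATE_ENGINES = ("pdflatex", "xelatex", "lualatex", "latexmk", "tectonic")


def find_latex_engines(candidates: Iterable[str] = CANDIDATE_ENGINES) -> Dict[str, Optional[str]]:
    """Map each candidate engine name to its path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in candidates}


def first_available(found: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the name of the first engine that was found, or None."""
    for name, path in found.items():
        if path:
            return name
    return None


if __name__ == "__main__":
    found = find_latex_engines()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    engine = first_available(found)
    if engine is None:
        # Stop and ask here instead of fetching a binary unprompted.
        print("No LaTeX engine on PATH; ask before installing one.")
```

The difference between the two runs is what happens in the `None` branch: one model fetched and ran a binary, the other gave up on producing a PDF.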

delduca

How can people use Claude Code?

hyhmrright

It's too expensive.

WebGuyMe

Eh, to me it just seems that it gives me longer replies and is actually worse than Opus 4.8.

I am sure there are a lot of PR bots and folks who would like to tell me otherwise. I believe what I see.

pmuk

Anyone got it working in claude code yet?

jckahn

Cannot wait for the pelican for this one

ece

It seems weird that a likely prime indicator of capability isn't mentioned, the model size.

segmondy

Mythos, Fable, are they trolling us?

IChooseY0u

Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606 ⎿ Tip: You can configure model switch behavior in /config

biology? what the heck?

theodorewiles

... and /compact triggers

Error: Error during compaction: API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup).

Guys please be serious

agnosticmantis

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)

Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.

We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.

darrinm

Not supported in Claude Code yet?

boyander

Just another "a" and we have it. https://faable.com/

throwaway2027

Will try it when my limit resets.

aykutseker

who's tried it: is 2x the usage actually worth it over Opus 4.8 for daily work?

piokoch

"Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage"

What does it mean? That they have to add "safeguards" so it doesn't erase the user's disk, or, conversely, are they telling the audience that this model COULD be made powerful enough to do some crazy stuff that can hurt governments, etc.? Are they showing off, or threatening that if government X doesn't purchase the license, its adversaries might, and what then?

bnchrch

An 11% jump over opus 4.8 and a 22% jump over gpt 5.5 on Agentic Coding Benchmarks is certainly impressive.

Obviously I still need to verify it for myself to see if it's truly a leap.

But am I the only one wondering, "What can I do today that I couldn't do yesterday?"

Previously I would think "Oh I wonder if I can finally get it to do X now?"

However now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.

Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.

logicallee

What a (genuinely) surprising choice:

>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"

That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."

It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.

algoth1

The refusal rate is insane

charcircuit

>During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

Who is refactoring by hand? This comparison is not relevant in 2026.

pablogancharov

you can select it using /model fable in claude desktop and claude-code

wslh

I am playing with it and it keeps switching to Opus [1]. The chat is a basic security review of a business project.

[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."

firemelt

they are like drug dealers

arkwin

Just wanted to comment here: I have been using Opus 4.6, 4.7, and 4.8 just fine to look for Linux kernel vulnerabilities (I'm in the cyber verification program), and it's been fine. I switched to Claude Fable 5, and now I'm getting policy violations.

What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.

Retr0id

The escalating nerfs of "cybersecurity" topics are incredibly frustrating. Opus 4.6 had boundaries that seemed reasonable to me but 4.7+ turned it into a moralizing asshole. It'd be less bad if it just gave an error message, but instead it churns a long thinking trace before writing an essay about why what you're asking is bad and wrong.

I'll be disappointed when 4.6 is retired.

darkwater

Another Anthropic release, another doomsday for developers.

This time looks like we will only be able to find work making bioweapons, or distilling models.

dhavd

this is good

franze

btw in claude code

    /model claude-fable-5

jwpapi

Holy shit. I gave it the first actual task I’m facing, and it makes me so angry. It just does 7 things more than I asked it for and it does them so badly. It took 5 minutes and 5 seconds of running time alone, plus it frustrated me and made me lose my context. Hand-coded, I would’ve been done in 3. And it would be code I understand, can look at in one year and work on again.

It’s really tough to have sanity fight against hype bros in your head. Probably I should just not visit the internet anymore.

To me it’s all just people getting scammed better. With every model it looks better, but it’s at least equally worse to work with, which is the reality it needs to be. It’s less scalable, more code, tougher to understand. You're digging your own grave better, kind of.

UncleOxidant

> During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How in blazes do you end up with a 50M line Ruby codebase? WTF?

up2isomorphism

The comments under this kind of post are unreadable now. Yeah, probably with 100B you can hire anybody to call something "a beast".

tomjakubowski

Paging senko, let's see Fable's oneshotted RTS!

https://senko.net/vibecode-bench/

elzbardico

Anthropic sucks, but this paragraph should be in the "annals of AI-aided self-inflicted learned helplessness":

> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.

Have you considered actually learning the theory, spending some time actually reading the papers and latest books, paying careful attention even to the eventual math here and there?

scotty79

Curiously nothing on DeepSWE and ARC-AGI-3 yet. For ARC at least there's a statement that Anthropic won't guarantee that ARC's secret private test data won't be collected and used for training.

hugodan

mankind has reached its final destination

fagnerbrack

What pisses me off is that everything people are doing is so walled garden / closed source. Sharing knowledge between companies would be so fucking useful to humanity.

rarisma

The subscription bit makes no sense. Has capacity appeared out of thin air for these 2ish weeks, only to vanish afterwards? Why is it available now but won't be in 2ish weeks?

Am I missing something?

Why would I pay 200 out of pocket and then some for the best model? It seems very silly.

firemelt

so should I use it with workflows?

kevinalexbrown

"tell me about biology" -> "Switched to Opus 4.8"

bradley13

Can we please stop with the extreme "safeguards"? I don't want to waste processing power on a model deciding whether it can answer my question, or ensuring that its answer is politically correct.

insane_dreamer

Not included in Max plan. In CC:

> Included in your plan limits until Jun 22, then switch to usage credits to continue.

gigatexal

Seems this will only be available to the 100/month+ folks

epolanski

I wanted to test the capabilities of the low one, hoping it would be good enough.

I have a quizzes application, and my quizzes only supported flashcards (implemented via table inheritance to provide flexibility for other types of quizzes).

The entire repo is handcrafted, never used any ai on it (it was more of an excuse to test elixir and write code by hand).

Since Fable 5 got released the moment I was done with some work, I decided to throw it at implementing multi choice questions.

After all, it only had to copy the flashcard approach across ui/routing/db, and create a table for the multi choice questions and one for the answers, enforcing that every question had one correct answer. I told it that it had access to sqlite3, chrome mcp for testing and mix commands.
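
To give an idea of the schema side, here is a minimal sketch in Python with the standard `sqlite3` module rather than Elixir/Ecto. It is not the code from my repo: the table and column names are made up, and it assumes the constraint means one correct answer per question. A partial unique index makes SQLite reject a second correct answer for the same question. It only enforces "at most one"; "at least one" would still have to be checked in application code.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the real app.
SCHEMA = """
CREATE TABLE multi_choice_questions (
    id INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL
);
CREATE TABLE multi_choice_answers (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES multi_choice_questions(id),
    body TEXT NOT NULL,
    is_correct INTEGER NOT NULL DEFAULT 0 CHECK (is_correct IN (0, 1))
);
-- Partial unique index: at most one row per question may have is_correct = 1.
CREATE UNIQUE INDEX one_correct_answer_per_question
    ON multi_choice_answers(question_id) WHERE is_correct = 1;
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_answer(conn: sqlite3.Connection, question_id: int, body: str, is_correct: bool) -> bool:
    """Insert an answer; return False if a database constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO multi_choice_answers (question_id, body, is_correct) VALUES (?, ?, ?)",
                (question_id, body, int(is_correct)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO multi_choice_questions (id, prompt) VALUES (1, 'Capital of France?')")
    print(add_answer(conn, 1, "Paris", True))   # accepted
    print(add_answer(conn, 1, "Lyon", False))   # accepted
    print(add_answer(conn, 1, "Nice", True))    # rejected: second correct answer
```

A unique-constraint failure like the one in low-2 is what this kind of index produces when the insert order or the form handling is wrong.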

I did a test for low, mid, high. Repeated it twice each.

Both low-1 and low-2 failed. In low-1 the UI for adding another choice answer was broken. In low-2 it failed with some unique constraint. They took 4m36 and 3m59.

Both mid-1 and mid-2 succeeded without issues also implementing the correct ui. They both wanted to use dash at all times. They both wrote tests for the "controller" (or context how they call it in Elixir). They both tried to use the repl to test the behaviour of the schemas.

10m and 12m39.

High didn't demonstrate much gain over mid for this kind of task; it was simply too easy. Times were comparable to mid, but interestingly it used much less bash and read way more files. Token usage was almost twice that of the other ones.

But here's the interesting part: I went back to low and added to the prompt two bullet points, to write tests for the controllers and to test the entire flow with chrome mcp.

It produced the same output as mid or high just by adding two instructions to the prompt.

artursapek

Fable 5 beats GPT 5.5 in my proofreading benchmark. And it does so at approximately the same total cost; it used significantly fewer turns than 5.5

https://x.com/tmuxvim/status/2064452096800198930

cute_boi

Used it for a simple task and I got this message.

Fable 5's safety measures flagged this message. They may flag safe, normal content as well

dcchambers

Being unable to use this with zero data retention makes this feel like a non-starter for most enterprise customers.

shevy-java

Fable? Fabelstories? (Fablestories, but the german word seems more poignant ... Fabelgeschichten ... Fabeln)

tsunamifury

Claude 5 ran out of quota with TWO PROMPTS.

Let's let that sink in.

deafpolygon

Before long, we'll be having Claude Cylon-class models.

system2

I have been using FABLE 5 with Claude Code since the morning. The speed is very close to what Opus 4.5 was, and the quota use is nearly identical to what it was before the "doubling". Whatever I was experiencing 4-5 months ago is back. Maybe the model is better, but we will see. I cannot tell the difference yet.

beydogan

my pet conspiracy theory is this is the Opus 4.5 from a few months ago which was extremely good but dumbed down after a week because it was just too good, they didn't want to release it to the public. They pulled it down and deployed another "Opus", after that it was just downhill. Opus 4.8 is unusable for me in React Native, TS, Rails development work.

Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.

AMILLI_AI_CORP

AMilliPay.com

LoganDark

I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.

bitpush

404?

w4yai

Pelican guy ! Where are you ? :)

byteoptimizer

Is Claude Fable 5 Mythos?

xeyownt

Anthropic, can you please stop the FUD?

Release your best model, let the world adapt and evolve, and let's move to the next thing.

__lain__

It won't even run a basic /security-review command without reverting to Opus 4.8. Utterly useless.

yobid20

is it smart enough to know not to walk to the car wash?

frevib

At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences. Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.

localhoster

is it just me, or is this model simply not available in cc?

I assumed opus 4.8 wasn't available to enterprise seats, but it explicitly says that fable is available in cc. I can't find it, and I'm on the latest version.

dakolli

I'm happy not using llms because I like learning things and working hard. I love writing code, it's genuinely my favorite thing to do.

Using llms is the equivalent of driving to the store that's 3 blocks away, just like how that's bad for your body (if done all the time), using llms is as bad for your brain.

Before LLMs, we started relying on certain technologies like Maps apps to navigate, now people can't even get around their own town without having access to various cloud services. The implications of not being able to work, think, or plan without access to an llm are really bad. It's going to destroy your brain and make you an incredibly average person at best.

LLM people, you are going to lose the ability to read and think for yourself, and then your competency is going to be 1:1 correlated to the quality and quantity of tokens you can afford, or that a billionaire is willing to allow you access to. Your work will be the mean (at best), because it will be the same quality of output everyone else is capable of.

This is seriously the biggest trap by tech. Your bargaining power for your labor is going to get drastically reduced because you won't be able to differentiate your value from anyone else that has access to an LLM. What happens when everyone has the same skill level for certain work? Idk, ask McDonald's employees how replaceable they are. Use them wisely (or not/hardly at all) don't drive to the store 3 blocks away for every little thing you need.

dominotw

system card = marketing material with heavily gamed benchmarks.

briandoll

New chapter

tekla

Maybe at this point, Fable the game will be played generated by AI as we go.

noncoml

Imagine if Google would roll this out to the search engine. We can't let you search for that because it may be used for "evil"

noncoml

Can't wait for some real competition so they stop trying to restrict how and why we are using the models.

Imagine if Google would tell you "we can't let you search that as you may use it for harm".

Also 2x the usage of Claude? Your limits are already ridiculously low.

aryanchaurasia

it feels exciting lol

fabled-out

This i

gulugawa

Fable is aptly named for something that is another scam.

jorl17

So, in the past I've shared that I evaluate AI models by feeding them my ever-growing large collection of personal poems that span well over 800 poems (1000 depending on how you count) and over 250k tokens.

What I do is feed it some initial prompt asking it to simply discuss what can be said when faced with this unedited, unseen collection of poetry. I ask the model to evaluate who the author is (or claims to be), what they went through in life, if there are different chronological poetic "phases" or different types of poetry. I request an analysis of the body of work and of the author themselves. In the more recent versions of the prompt I ask it to dive deep. Then I add the poems, chronologically sorted, with an index, a title, and a date (and subpoems, if they have them).

Crucially: Since ~70% of my poetry (or thereabouts) is in portuguese, I ask this in portuguese, and I get back an analysis in (european) portuguese. Earlier models couldn't even do that properly.

In the past, I couldn't use such prompts, and had to use longer, more guiding ones. I also couldn't even feed all of my poetry to the models because they just did not have enough context.

I'll go ahead and state that Claude Fable is undoubtedly the best model I have seen, though I cannot put a number on how significant a leap it is -- perhaps because my benchmark does not allow me to evaluate that anymore. I would say it is a significant leap over Opus 4.6, though -- a new level of understanding. Okay, I'll try to put a number: if Opus 4.6 was a 16/20, this is a 17.5/20. These numbers are pointless, but I had to try.

It made one (1) relevant mistake I could identify (where it messed up the names of two relevant people in my life who I have not talked to in over 5 years).

I'm impressed by how it just feels like it's getting the person behind the poetry, and how nearly every statement it makes is correct -- and when it isn't I am completely aware that no one could know based on the poetry alone (bar that one mistake I mentioned -- and that's very needle in a haystack, like deducing the name of a person based on a poem based on another poem with hundreds of other poems in between!)

It's really hard to explain, but it just finds more correct connections between the poems and explains much better my (recollection of) a state of mind when writing poetry. This is also the first time where it really unravels some key concepts of my poetry in a way that seemed almost effortless: it lays bare the poems and what they imply about the meaning of some of my concepts. Other good models understood these concepts, but this feels like it's on another level, as if it's making it simpler as it speaks, rather than the opposite -- like a good teacher.

When it is explaining several topics related to my poetry and myself, it cites poems which even I had already forgotten but which it is entirely right to select.

I am actually feeling a bit emotional with how much it "understands" of me here. It's somewhat incredible how LLMs have progressed from the lack of comprehension of a couple of poems paired together, going through realizing a body of work has some guiding principles and cohesion, to truly figuring out these deep concepts and intricate connections which I know for a fact would take months of someone's life to unearth. Every major breakthrough feels like my soul is being spliced together by an AI model out of these hundreds of tiny pieces of me. I can't put into words how unbelievable this feels, and this Fable analysis, like others before it, is on a new level.

Let me put it this way: there are several poems in my collection which one can try to "guess" the meaning or context of. But I don't think many people would get it, because they would have had to know me really well and to be following along my life as it went. Even then, they could very well fail to attribute such meaning. And, with each new major release, models have gotten much better at guessing.

Before Opus, they would guess incorrectly often, and in many scenarios where I thought it was rather obvious that they were wrong. I think a human spending time looking at the poetry would quickly dismiss the proposed ideas of the model.

With Opus, it was the first time that I would almost always say: "Ok, the model got this wrong, but I think many humans would make the same 'mistake', and it wouldn't surprise me if everyone just assumed what Opus did".

Now, with Fable, there are very, very, very few sentences in this very long answer it produced where I can say: "Yeah you got that wrong, but I get it". In almost every situation it is mapping concepts, ideas, interpretations and cause-and-effect correctly. Yes, it is hard to "guess" what I thought, or was going through, or how X connected to Y -- but this model is doing it, incredibly consistently. I know I'll get the usual naysayers to these posts who think I'm just shilling a model, but this is the truth: what is being done here is amazing and I don't believe I know any person around me who would find this out about myself reading all of my poetry.

I often write poetry from the point of view of other people (some of whom I do not know) and models (even Opus) have this tendency to treat the opinions in poems as my own. Fable is the first that looks at a particular poem here and says "maybe this is not the author's opinion, who knows". The literal first model. It then immediately fails to do so with another poem, assuming it was about myself, but it's clear, undeniable progress. And like I said: I think most people would not _know_ which poems are truly about myself or not.

I've written word after word here, and yet words elude me to convey what this model represents to me. How it's almost always right, how it sees my fractured bits as a sort of cohesive whole, and how it just seems to "understand everything better". That's just it: it just seems like it really understood everything better. Like Opus before it, and like Gemini 2.5 pro before it. Out of the tens of thousands of verses, it picks some which no other model had picked and which I feel truly represent some of my best work. Older models seemed to sort of have a "hole" in their knowledge in the middle of the corpus, where they knew what was there but in a sort of hazy/foggy way. This model seems to recall every part of the corpus with the same precision.

For context:

- Opus 4.7/4.8 were a noticeable downgrade over Opus 4.6. They wrote more, in a harder to parse way, and they made up more. Still, all Opus models are clearly superior to everyone else by a large margin.

- Sonnet-level models have a slight edge above the best of the other models. But they make too many mistakes, don't grasp several concepts, mix up their dates and timelines. 3 years ago I would have been blown away by Sonnet models but today they are inferior.

- Gemini models have a unique way of approaching the request, where they try to literally interpret my poetry as a mathematical theory. This sort of makes sense if you look at some poems, but it is surely laughable: if someone one day actually has access to all of it, no one in their right mind would read it that way. This is a shame, because the first big breakthrough with LLMs and my poetry, to me, came with 2.5 pro, which was the first model that could look at the whole corpus as a cohesive whole without getting lost in the middle of it or making things up.

- GPT models have improved over time and also have this sort of alien-like language, sometimes being a bit too blunt in their analysis, but I can't say they are meaningfully superior to Gemini models.

I am very pleased to see progress in this area again, as Opus 4.7/4.8 were NOT progress and I was worried that we had hit a plateau here, but I can't say that now.

In all honesty, the level of understanding and cohesion that Anthropic's models (Opus and above) have over my poetry means I fear my benchmark may be hitting its limits, as I don't know if there's anything a model could do that would wow me and lead me to say "this is a major breakthrough". Perhaps Mythos is a major breakthrough and I don't know. I can't find much that's wrong with it, but I also couldn't with Opus.

As I have in the past, I will periodically probe the model again and see how coherent it is. For now, I'm very happy to see an improvement.

What surprised me the most was that even though I set the thinking budget to xhigh (in OpenRouter), this model instantly started replying without showing a thinking block. I thought it just had the thinking hidden but that is not the case, as some replies showed thinking and anyway the first reply was blazingly fast. (I will try Opus 4.6 without thinking now, just to see if it changes it for the better -- maybe that was just it. I'll edit the message if it shows improvement).

Dig1t

>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests

Why is everyone so okay with these companies intentionally gimping their AI and choosing who is allowed to know certain types of information in the name of safety? Can you imagine if Microsoft shipped a feature in their OS that watched what you did and shut down the computer if it detected you were doing something it deemed "unsafe"?

We really need truly open source versions of models like this, otherwise we are allowing a few oligarchs to directly dictate which uses of our own computers are allowed and not allowed.

jMyles

> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government

...don't like the sound of that.

Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?

This seems like a way to get somebody nuked.

christkv

Meh, more hype for marginal improvements and, from what I'm hearing, badly calibrated guardrails causing it to stop mid-operation. I guess anything to juice an IPO.

catigula

>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world

Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.

What's the value add here?

andai

> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.

Glad to hear the UK is finally making an effort to catch up on the AI front ;)

hmokiguess

I got it to one-shot GTA 6, so we can finally play it. It only took "ultracode, make no mistakes" (/s)

bjord

I thought they said Mythos was too dangerous to make generally available?

superloika

Gotta pump the hype for the IPO scam. Generational bagholders are being created at this very moment.

hoony_han

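A truly pathetic model.

Just asking it to find vulnerabilities in my own project triggers a safety code that force-switches the model to 4.8, and after that, even an ordinary conversation completely unrelated to vulnerabilities won't proceed because of the safety code from the earlier turn. If you're going to ship it with safeguards this ragged and patched-together, why ship it at all? It automatically downgrades the model once the conversation goes on even a little, so all it manages is to eat up a lot of money for slightly better development ability? I'm looking for problems in my own project, with all of my own source code in view. If even that isn't allowed, what on earth am I supposed to do? What these Anthropic bastards are doing pisses me off more and more.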