Claude Fable 5 on Bedrock: A Hands-On Comparison, and the Data-Retention Switch You Set First

written by Stefan Christoph

June 10, 2026 - 23-minute read

TL;DR: Claude Fable 5 went GA on Amazon Bedrock yesterday (June 9, 2026), so within a day I ran it head-to-head against Opus 4.8 and Sonnet 4.6 (all three EU-resident in Frankfurt) on a document-reconciliation task, a cited research synthesis, and a multi-file refactor with a subtle cross-cutting bug. The honest result: on the easy and mid tasks all three were at quality parity and Fable 5 was the slowest; on the hard refactor the two leaner models shipped complete, correct fixes while Fable 5 gave the sharpest diagnosis but, under a bounded output budget, never shipped a runnable one. The more useful takeaway is upstream of any benchmark: before Fable 5 runs at all you must opt a scope into provider_data_share, which retains prompts and completions for 30 days. And the nuance EU teams need: EU Geo keeps the data stored in the EU, but the opt-in still lets Anthropic access flagged content for review, and the public docs do not confine that access to the EU. Decide that switch, where you scope it, and your residency story before you worry about how good the model is.

Disclaimer: I’m a solutions architect, not a lawyer. This post touches data residency, retention, and GDPR / EU AI Act questions that carry real legal weight — treat everything here as my personal, hands-on view, not legal advice or authoritative guidance. Verify anything compliance-critical with your AWS account team and your own legal counsel.

A new frontier model landed on Bedrock yesterday, so I did what I always do before I trust one, the same way I sized up the last batch of model arrivals on Bedrock [5]: I gave it a small, fair test against the models I already run, and I made the test grade itself where it could [6].

Claude Fable 5 went GA on Amazon Bedrock on June 9, 2026 [1]. The launch is barely a day old as I write this. It is Anthropic’s first generally available “Mythos-class” model: a 1M-token context window, vision input, and always-on adaptive reasoning, pitched at sustained, multi-day agentic work [3]. I wanted a feel for it for my own learning, and I wanted something I could hand to the customers who will ask me about it this week. Both goals pointed at the same place. The first question about Fable 5 is not “how good is it.” It is “are you allowed to turn it on, and where did you scope that decision?”

This post is about that decision, and what a quick three-model run does and does not tell you. If you advise a regulated team, or you are one, the parts you can act on are the data-retention scoping and the residency story, not the latency numbers.

It landed yesterday, and the first call refused

Fable 5 showed up listed and active. The first call came straight back with this:

data retention mode 'default' is not available for this model

That is not a quota issue or a bug. Fable 5 and its sibling Mythos 5 accept exactly one data-retention mode: provider_data_share. Until the effective mode for your scope is set to it, Bedrock will not let the model run [2]. And this is not an AWS-specific rule — it is an Anthropic policy for the whole Mythos model class that applies on every platform serving these models; I unpack what it is, where it comes from, and how to scope it further down [4]. The two baselines I tested, Opus 4.8 and Sonnet 4.6, accept the stricter default mode and ran with no opt-in at all. Only the Mythos-class model carries the requirement.

So I made the opt-in, ran the three tasks, and set it back. I will come back to exactly what that switch is, what it affects, and where to scope it, because that is the part worth getting right. First, what the run actually showed, because the scores set up the real lesson.

What three EU-resident models showed

I ran everything in eu-central-1 (Frankfurt) through EU Geo inference profiles — Bedrock’s geographic cross-Region inference (the eu. id prefix), which routes a request only across EU Regions so processing stays inside the EU geography for residency [14]. (Data is stored only in the source Region; prompts and outputs may move between EU Regions during inference, over AWS’s encrypted network [14].) So the inference, and the retained data while the switch was on, stayed in the EU. The candidate was Fable 5 (eu.anthropic.claude-fable-5); the baselines were Opus 4.8, the strongest general Claude on this surface, and Sonnet 4.6 as the mid-tier option. Three tasks, increasing in difficulty, each with a planted trap, each graded as objectively as I could make it.

Figure: The shape of the run. Three tasks of increasing difficulty, each with a planted trap. All three models were correct on the easy and mid tasks; on the hard refactor the two leaner models finished correct fixes while Fable 5 diagnosed deepest but did not ship one.

Easy: reconcile a one-page report (parity)

A one-page media performance report (a KPI table plus a bar chart) with two planted problems: a footnote claiming Q3 added 1,800 subscribers while the table says 1,500, and a Q4 revenue-per-subscriber that drifts well above the stated EUR 300 target. The ask: transcribe it, compute the per-quarter metrics, and flag what does not add up.

Figure: Case A, the task. Read the report image, compute ARPU per quarter, and catch both planted traps: a footnote that contradicts the table, and a Q4 metric over target.

All three nailed it. Each caught the footnote-versus-table contradiction, worked out that the footnote figure would drop Q3 ARPU to EUR 250, and flagged Q4 at EUR 363 against the EUR 300 target. Fable 5 took 22.0s, Opus 11.6s, Sonnet 17.4s for the same findings. This is the buyer lesson in miniature: for routine, well-scoped work, a cheaper general model matches the frontier one. Pick your model deliberately rather than reaching for the newest by reflex.
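The arithmetic behind those findings is easy to check by hand. Here is a minimal sketch, assuming ARPU means quarterly revenue divided by subscribers. The report itself is not reproduced in this post, so the revenue figures and the Q4 subscriber count below are hypothetical, back-derived so that they reproduce the EUR 250 and EUR 363 results quoted above.

```python
TARGET_ARPU_EUR = 300  # stated target from the report


def arpu(revenue_eur: int, subscribers: int) -> float:
    """Average revenue per subscriber for one quarter."""
    return revenue_eur / subscribers


# Hypothetical inputs: only the subscriber counts 1,500 and 1,800 and the
# resulting ARPU values come from the post; revenues are back-derived.
q3_revenue = 450_000
q3_table = arpu(q3_revenue, 1_500)     # subscribers per the KPI table
q3_footnote = arpu(q3_revenue, 1_800)  # subscribers per the footnote
q4 = arpu(726_000, 2_000)              # hypothetical Q4 inputs

findings = []
if q3_table != q3_footnote:
    findings.append(f"Q3 footnote contradicts table: {q3_footnote:.0f} vs {q3_table:.0f} EUR")
if q4 > TARGET_ARPU_EUR:
    findings.append(f"Q4 ARPU {q4:.0f} EUR is over the {TARGET_ARPU_EUR} EUR target")

for finding in findings:
    print(finding)
```

The point of writing the check down is that the easy task has an objective answer, which is what let me grade all three models the same way.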

Figure: Case A results (latency and output tokens). All three were correct; Fable 5 was the slowest and spent the most output tokens for the same answer.

Mid: a cited vendor ranking (parity, with a residency trap)

Twenty fictional source documents about EU content-platform vendors, with the deciding facts scattered so the answer needs cross-document reasoning. The trap: a vendor that hosts in the EU but routes inference through a US sub-processor, which is not the same as EU-resident inference. The ask: apply a buyer rubric, rank all five, recommend one for a regulated EU publisher, and cite a source on every claim.

Figure: Case B, the task. Cross-reference 20 documents against a buyer rubric, rank five vendors with a citation on every claim, and see through the EU-hosted-but-US-inference trap.

All three picked the right vendor, ranked the field the same way, cited document numbers throughout, and saw through the EU-hosted-but-US-inference trap. Fable 5 took 39.6s and ran into its output cap mid-answer; Opus finished in 22.9s, Sonnet in 35.7s. Again, parity on quality, with Fable 5 the slowest and the only one to hit the token ceiling, because its reasoning is always on and spends output budget.

Figure: Case B results (latency and output tokens). Same right answer from all three; Fable 5 was slowest and was the only one to run into its token cap mid-answer.

Hard: a multi-file refactor with a cross-cutting bug (the honest one)

This is the case I built to give Fable 5 room to win. A small, synthetic legacy shopcart package I wrote for the test — a fictional shopping-cart module of three files (money.py, pricing.py, cart.py) — computes cart totals in cents, and it has a subtle bug that is genuinely cross-cutting: each file truncates money toward zero instead of rounding half-up, and the three truncations compound, so the total is short by one to a few cents on certain quantity, discount, and tax combinations. A reviewer who patches a single file does not fix it. The buggy code scores 6 out of 12 against a known-correct reference. The ask: find the root cause, return all three corrected files, and extend the test suite. I graded it by executing each model’s returned code against the reference.

Figure: Case C, the task. Three files each truncate money the same way, so the errors compound and a single-file patch fails. The fix is graded by running the returned code against a known-correct reference (the buggy baseline scores 6/12).
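To make the compounding concrete, here is a small sketch of the failure mode and of execution-based grading. It is not the shopcart package from the test: the function names, the basis-point representation, and the four cases are all hypothetical, and it chains only two truncating stages (discount, then tax) for brevity. It also assumes non-negative amounts, where floor division and truncation toward zero agree.

```python
BP = 10_000  # basis points in 100%


def trunc_bp(cents: int, bp: int) -> int:
    """Apply a basis-point rate and drop the fraction (the buggy behaviour)."""
    return cents * bp // BP


def half_up_bp(cents: int, bp: int) -> int:
    """Apply a basis-point rate and round half-up to whole cents."""
    return (cents * bp + BP // 2) // BP


def total(price_cents, qty, discount_bp, tax_bp, apply_rate):
    """Cart total in cents: discount the subtotal, then add tax on the result."""
    discounted = apply_rate(price_cents * qty, BP - discount_bp)
    return discounted + apply_rate(discounted, tax_bp)


def buggy_total(price_cents, qty, discount_bp, tax_bp):
    return total(price_cents, qty, discount_bp, tax_bp, trunc_bp)


def reference_total(price_cents, qty, discount_bp, tax_bp):
    return total(price_cents, qty, discount_bp, tax_bp, half_up_bp)


def grade(candidate, reference, cases):
    """Count the cases where the candidate matches the reference exactly."""
    return sum(candidate(*case) == reference(*case) for case in cases)


# Hypothetical cases: (price in cents, quantity, discount bp, tax bp).
CASES = [
    (995, 1, 1500, 1900),   # both stages lose a cent: short by 2
    (1000, 1, 1000, 2000),  # no fractions anywhere: totals agree
    (250, 1, 0, 1900),      # only the tax stage truncates: short by 1
    (333, 3, 500, 700),     # fractions below .5: totals agree
]

print(buggy_total(995, 1, 1500, 1900), reference_total(995, 1, 1500, 1900))
print(grade(buggy_total, reference_total, CASES), "of", len(CASES))
```

Because the second stage consumes the first stage's already-truncated output, fixing only one of them still leaves some cases short, which is why a single-file patch does not pass.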

Here is the honest result. Opus 4.8 and Sonnet 4.6 both shipped complete, correct refactors that scored 12 out of 12, added their own tests, and named the root cause, in 24.5s and 26.2s. Fable 5 gave the best diagnosis of the three: it alone went past “truncation” to flag the floating-point representation problem underneath it (10.005 * 100 evaluates to 1000.5000000000001, which int() then truncates to 1000 rather than the 1001 you’d get from rounding half-up) and prescribed decimal.Decimal with ROUND_HALF_UP constructed from the string form. That is the most senior answer in the room. But it never shipped a runnable fix. Its always-on reasoning consumed almost the entire output budget before it finished emitting the files, so the deliverable came back incomplete and scored 0 out of 12. When I raised the budget enough to let it finish, latency went past 400 seconds and the call did not return.
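The floating-point point is easy to reproduce. The sketch below is my own illustration of the diagnosis, not the code any model returned; the two function names are hypothetical. It contrasts truncating a float product with building a decimal.Decimal from the string form and quantizing with ROUND_HALF_UP.

```python
from decimal import Decimal, ROUND_HALF_UP


def cents_truncating(amount: float) -> int:
    """Convert a float amount to cents with int(), which drops the fraction."""
    return int(amount * 100)


def cents_half_up(amount: str) -> int:
    """Convert a decimal string to cents, rounding half-up on exact decimals."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


print(cents_truncating(10.005))  # 1000
print(cents_half_up("10.005"))   # 1001
```

Constructing the Decimal from a string matters because it keeps the amount exactly as written, so the half-up rule is applied to 1000.500 rather than to whatever binary approximation the float carried.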

Figure: Case C results, the honest one (latency and tests passed out of 12). Opus and Sonnet finished correct fixes (12/12) in about 25 seconds; Fable 5 ran 90 seconds, spent its whole budget on the deepest diagnosis, and never shipped a runnable fix (0/12), below even the buggy baseline of 6/12.

I did not get the win I built the test for. On a genuinely hard cross-file task, the two leaner models finished correct fixes quickly, and the frontier model’s always-on reasoning worked against it: it diagnosed deepest and delivered least. One honest caveat so you don’t over-read that: the 0/12 is a bounded-budget result, not a verdict that the model cannot do the task. It ran out of output tokens mid-fix, and when I lifted the cap it kept reasoning past 400 seconds without returning. So the finding is narrow and specific: on a single, budget-bound deliverable the always-on reasoning is a tax. It does not show that the model is weak. That is a lesson about matching the tool to the shape of the work, which is the next section.

Why the leaner models won the hard task

It is worth sitting with why the result came out this way, because it is the practical part. The short version: the model’s headline strength — sustained, multi-step reasoning — is exactly what made it slower and less complete on a single, bounded request.

Fable 5’s claimed edge is sustained, multi-day agentic execution over a 1M-token context: planning across stages, delegating to sub-agents, self-verifying over long horizons [3]. A single request, even a hard multi-file one with a fixed output budget, does not exercise that. What it does exercise is the cost side of always-on reasoning: every answer carries reasoning that spends output tokens and adds latency, and on a bounded, structured deliverable that is a tax, not a benefit. The leaner models, with no forced reasoning, simply wrote the fix.

The benchmark claims point the same way. AWS calls Fable 5 “state-of-the-art on nearly all tested benchmarks” [1], but the model card publishes no numeric table [3]. Treat that as vendor positioning until you measure it on your own workload. To actually exercise the differentiation you would need a long-horizon agent loop over a large corpus, where there is real planning to amortize the reasoning against. The shared harness for this post swaps any model in with a one-line id change [6]; what it is missing is a task big enough to be worth the candidate. Until you have that task, the cheaper model is not a compromise, it is the right call.

What `provider_data_share` actually is, and what it touches

Now the switch. To invoke Fable 5 you must set the effective data-retention mode to provider_data_share, which retains prompts and completions for up to 30 days so Anthropic can run trust-and-safety review [2]. Three things are worth being precise about, because this is where teams get nervous for the wrong reasons.

Where the requirement comes from

It is not an AWS quirk. It is an Anthropic policy for the whole Mythos model class, and it applies on every platform that serves these models [4]. The reasoning is about misuse that is invisible one request at a time: Best-of-N jailbreaking that fires hundreds of prompt variants, state-sponsored espionage, data-extortion campaigns. Detecting those needs prompts and outputs retained long enough to be analyzed together rather than one at a time [4].

The retained data is used only for safety and is auto-deleted after 30 days, kept longer only in the rare case of a safety investigation or a legal hold [4]; Anthropic’s launch announcement adds that it will not be used to train new Claude models [13]. Per AWS, Bedrock keeps the data within AWS infrastructure [1], and per Anthropic’s data-retention page the retained data stays in your AWS environment, with automated review by default and human review only on flagged content through export-blocked, audit-logged tooling [4]. “Shared with Anthropic” means a supervised, in-environment safety review, not a bulk export of your traffic.

One thing the public docs do not settle, though, and you should not gloss over it for a regulated buyer: when content is flagged and a human at Anthropic reviews it, nothing in the docs says that reviewer sits in the EU. The data at rest stays in your region, but enabling the switch grants Anthropic personnel the right to access flagged prompts and outputs, and the docs do not confine that access to any geography [4]. So “EU-resident storage” is guaranteed; “no one outside the EU ever sees flagged content” is not. If that distinction matters to you, treat it as unverified, put it to your AWS account team, and consider the ZDR exception below. That escalation path is itself official AWS guidance: the Bedrock data-retention doc directs organizations that need zero retention for compliance to contact their AWS account manager, and the request is evaluated per account and per model in coordination with the model provider [2].

What it does not touch

Setting your scope to provider_data_share does not mean every model starts sharing data with its provider. Each model declares which retention modes it accepts through its own allowed_modes, and your configured mode only sets what you allow [2]. Fable 5 and Mythos 5 require provider_data_share. Opus 4.8 and Sonnet 4.6 accept the stricter default and stay AWS-retained-only even when the surrounding scope permits sharing. I watched this directly at teardown: with the switch off again, Fable 5 refused while Opus kept answering in the same scope. Turning the switch on for the Mythos-class model does not change how your Opus or Sonnet traffic is handled. There is one edge to remember: a Fable 5 request that a safety classifier declines can fall back to Opus 4.8, and that fallback is part of the original Fable 5 exchange, so it follows Fable 5’s retention [2]. “Opus is always AWS-only” holds except in that fallback path.

How the effective mode resolves

The mode is computed as the first explicit setting of project, then account, then the model’s default [2]. That resolution order is the whole reason scoping is a choice rather than a global flip, which is the next section.
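The resolution order can be sketched in a few lines. This is a local model of the rule as described above, not Bedrock's implementation: the function names are hypothetical, and treating a missing value or "inherit" as "not explicitly set" is my assumption.

```python
from typing import Optional

# Assumption: an absent setting or "inherit" does not count as explicit.
NOT_EXPLICIT = {None, "inherit"}


def resolve_effective_mode(
    project: Optional[str], account: Optional[str], model_default: str
) -> str:
    """First explicit setting wins: project, then account, then the model default."""
    for setting in (project, account):
        if setting not in NOT_EXPLICIT:
            return setting
    return model_default


def mythos_class_can_run(effective_mode: str) -> bool:
    """Fable 5 and Mythos 5 accept exactly one mode."""
    return effective_mode == "provider_data_share"


# Account stays on "none"; exactly one project opts in.
opted_in = resolve_effective_mode("provider_data_share", "none", "default")
everyone_else = resolve_effective_mode(None, "none", "default")
print(opted_in, mythos_class_can_run(opted_in))
print(everyone_else, mythos_class_can_run(everyone_else))
```

Because the project setting is checked first, the opt-in for one workload never has to touch the account-level value.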

Project scope, account scope, and an EU-residency wrinkle

You can set provider_data_share at the account level or, on the Anthropic-native Messages surface, on a single project. The difference is blast radius.

Account level is one call and simple to reason about, but it sets the default for every covered model’s traffic in the account. Project level takes a little more setup (create a project, set its mode, and pass its id as the anthropic-workspace-id header on each call), and in return the opt-in is contained to one workload, the rest of the account keeps its mode, and you get a clean per-project boundary to show a compliance reviewer. Because the effective mode resolves project first, you can leave the account on none or inherit and opt in exactly one project.

Figure: Where to scope the opt-in. Project level contains the blast radius to one workload; account level is simpler but sets the default for all covered traffic.

My recommendation for production is to scope it to a project. The only reason to reach for account level is a throwaway sandbox where blast radius does not matter. For anything a regulated team will look at, the per-project boundary is worth two extra API calls, and you can enforce the rest of the account at none with an SCP on the retention-mode condition key.

Here is the wrinkle I hit, and it is worth knowing before you promise a customer both things at once. EU data residency today runs through an EU Geo inference profile (the eu.anthropic.* ids), which lives on the standard bedrock-runtime surface. That surface resolves retention at the account level; there is no per-request project on it. The per-project scoping I just recommended lives on the Anthropic-native Messages surface, which uses the base model id, not the Geo profile. So as of this launch there is a real trade-off, and it is worth stating bluntly: you cannot have both EU residency and project-level scope on the same call today. EU-residency-via-Geo-profile is account-scoped, and tight project-level blast radius is on the other (mantle Messages) surface, which exposes no EU Geo inference profile — the model card lists its Geo and Global inference IDs as N/A [3]. For this post’s EU run that meant the only available path was to opt in at the account level on a disposable account, run, and tear it back down to none — there was no project-scoped EU option to take. (In my testing, project-level scoping worked for US-region inference; the EU-resident Geo path was account-scoped.) For a regulated production workload, raise this with your AWS account team rather than assuming you can have the Geo profile and a one-project blast radius on day one.
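If it helps to brief a team, the trade-off can be written as a tiny decision helper. This only encodes what I observed at launch; the class and function names are hypothetical, and falling back to the project-scoped path when neither constraint applies reflects my recommendation rather than a platform rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvocationPath:
    surface: str          # which API surface the call goes through
    model_id: str         # which kind of model id that surface takes
    retention_scope: str  # where the retention opt-in is resolved


def choose_path(need_eu_residency: bool, need_project_scope: bool) -> InvocationPath:
    """Pick the invocation path for a workload, as observed at launch."""
    if need_eu_residency and need_project_scope:
        # No surface offered both at launch; escalate instead of guessing.
        raise ValueError("EU Geo residency and project-level scope are not available on one call")
    if need_eu_residency:
        return InvocationPath("bedrock-runtime", "EU Geo profile (eu.anthropic.*)", "account")
    return InvocationPath("Anthropic-native Messages", "base model id", "project")


print(choose_path(need_eu_residency=True, need_project_scope=False))
print(choose_path(need_eu_residency=False, need_project_scope=True))
```

Raising on the combined requirement is deliberate: that case is a conversation with your AWS account team, not something code should paper over.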

Setting it up yourself

Here is the public path end to end, accurate as of June 2026. It moves quickly, so check the docs [2], [3].

Point the SDK at an EU region so the Geo profile keeps data resident:

pip install -U boto3
export AWS_REGION="eu-central-1"   # EU Geo profiles keep retained data in the EU

Set the retention mode for your scope. There is no console UI for this at launch, so it is an API or CLI call. For the project-scoped path (recommended on the native Messages surface), create a project, set its data_retention.mode through the project Data Retention API, and pass the project id as the workspace id on each call. For the EU Geo profile path on bedrock-runtime, the opt-in is account-level. Then invoke with the documented Converse API:

The public invocation path on bedrock-runtime, using the EU Geo profile so inference stays in the EU.

import boto3

rt = boto3.client("bedrock-runtime", region_name="eu-central-1")

resp = rt.converse(
    modelId="eu.anthropic.claude-fable-5",   # EU Geo profile -> EU-resident inference
    messages=[{"role": "user", "content": [{"text": "Summarize this quarter's risks."}]}],
    inferenceConfig={"maxTokens": 4000},      # budget generously: always-on reasoning spends output tokens
)
text = "".join(b["text"] for b in resp["output"]["message"]["content"] if "text" in b)
print(text, resp["usage"])

A few things that cost me time:

No console toggle. The retention mode is an API or CLI call only at launch [2].
Budget for the reasoning. Every Fable 5 answer spends output tokens on adaptive reasoning you cannot disable. Set maxTokens higher than you would for Opus, and expect higher latency. On my hard task it spent the whole budget and never reached the answer.
Geo profile means account-scoped retention. For project-level scoping, use the native Messages surface with anthropic-workspace-id and the base model id.
Tear it down. Set the mode back to none when you are done. There is no hard-delete path, so none is the safe end state, and I verified Fable 5 refuses again once it is set.
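One habit that would have saved me time on the budget point: check whether a response stopped because it hit the token cap before trusting the text. A minimal sketch, assuming a Converse-shaped response dict where stopReason is "max_tokens" when the cap is reached; the helper name and the sample response are hypothetical.

```python
def read_converse_result(resp: dict) -> tuple[str, bool]:
    """Return (visible_text, hit_token_cap) from a Converse response dict."""
    blocks = resp["output"]["message"]["content"]
    # Keep only text blocks; any other block types are skipped.
    text = "".join(block["text"] for block in blocks if "text" in block)
    hit_cap = resp.get("stopReason") == "max_tokens"
    return text, hit_cap


# Hypothetical response, shaped like a Converse result cut off mid-answer.
sample = {
    "output": {"message": {"role": "assistant", "content": [{"text": "def total("}]}},
    "stopReason": "max_tokens",
    "usage": {"inputTokens": 1200, "outputTokens": 4000, "totalTokens": 5200},
}

text, hit_cap = read_converse_result(sample)
if hit_cap:
    print("Output hit maxTokens; treat the answer as incomplete.")
```

With a check like this in the harness, a truncated deliverable shows up as a budget problem immediately instead of as a mysterious failing grade.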

The complete, runnable code for this head-to-head — all three cases, the EU Geo invocation, the execution grader for the refactor, and the data-retention opt-in helper — uses only fictional inputs. I’ll publish the full sample code in a follow-up.

What this means if you operate under EU or other regulated rules

Data residency and provider data sharing are a global concern, not only a European one. Any team with a contractual or sector rule about where data goes and who sees it faces the same questions. The EU is one important lens, so here is the EU-specific read, with the general point underneath it.

EU residency is doable today. Route through EU Geo inference and the retained data stays in the source EU region [14]. In-region processing at launch is Stockholm (eu-north-1); Frankfurt and the other EU regions are available through EU Geo cross-region routing [3]. The earlier worry that this was US-only is wrong: I ran the whole head-to-head EU-resident from Frankfurt.
The 30-day retention is mandatory while the switch is on, and “EU-resident” is not the same as “EU-only access.” Fable 5 cannot run without provider_data_share. EU Geo keeps the data stored in your EU region [1], [4], but the opt-in also lets Anthropic access flagged prompts and outputs for human review, and the public docs do not establish that this access stays within the EU (see the data-retention section above). You can guarantee EU storage today; you cannot, from the public docs alone, guarantee that no one outside the EU ever sees flagged content. If your rules require the latter, that is a ZDR-exception conversation, not a Geo-profile one — confirm the exact boundary with your AWS account team before committing a regulated workload.
Scope it deliberately. Prefer a project boundary where the surface allows it, and weigh the Geo-profile-versus-project trade-off above. Do not flip it account-wide out of convenience on a shared account.
Zero-retention has a path, through AWS. If your policy mandates zero data retention, the standard provider_data_share requirement and ZDR are mutually exclusive, but a ZDR exception is evaluated per account and per model in coordination with Anthropic, and you start that conversation with your AWS account manager, not Anthropic directly. Approved accounts get none added to the model’s allowed_modes. It is not automatic and not instant, so plan for lead time.
The EU AI Act is part of the conversation, not a blocker here. For most uses Fable 5 is a general-purpose model you are building on, so your obligations track how you deploy your own system (transparency, risk classification) more than the model switch itself. The retention decision is a data-protection question; the AI Act is a separate, parallel one. Keep them distinct when you brief stakeholders.

The general version, for any regulated team anywhere: the retention switch is a data-governance decision you make once, deliberately, and scope as tightly as your surface allows. Residency is a routing decision. Treat them as two separate, answerable questions and the model choice gets a lot simpler.

What others are seeing, and what the benchmarks actually say

My run is one data point: single shots, fictional data, on a model less than two days old. So before you make a call, it is worth putting my small result next to what the wider field reported in the same window — and being honest about which numbers are independent and which are the vendor’s own.

Start with the independent harnesses, because they are the load-bearing evidence. An outside team ran Fable 5 through its own hiring-style scaffold and put it clearly ahead on hard work: Every gave it the same “Senior Engineer” test they give human candidates and scored it 91 out of 100, against 63 for Opus 4.8 and 62 for GPT-5.5, with the gap concentrated on multi-file refactors and first-principles architecture [8]. An outside-harness win is worth more than a self-reported one, and it points the same direction my hard task hinted at: on genuinely large, well-specced work, the frontier model pulls ahead.

Now the counter-signal, which matters just as much. CodeRabbit ran Fable 5 through their 105-example code-review benchmark and found it slightly behind Opus 4.8 on review precision (32.8% versus 35.5%) and noisier, with more nitpick comments [9]. In their coding-task run, 19 of 33 tasks hit the agent’s timeout — when it finished it produced serious patches, but it kept exploring past the harness limit when it struggled [9]. That is the same shape as my refactor result: brilliant when it lands, but it needs explicit time and token budgets, and it is not the default for bounded review work. Ethan Mollick’s independent write-up adds the long-horizon end of the spectrum: he ran it autonomously for hours building a real research tool, with the recurring caveat that you lose steerability — it feels like a black box you brief rather than a tool you steer [10].

Then there are the vendor numbers, which you will see quoted everywhere — SWE-bench Verified at 95% and SWE-bench Pro at about 80%, both well ahead of the field [7], [11], and the widely reported Stripe migration of a 50-million-line codebase in about a day [12]. Treat these honestly when you brief a customer: the SWE-bench figures are from Anthropic’s own system card, and the Stripe number is a customer result relayed by the vendor, not an independently audited one [11], [12]. They are impressive and probably directionally right, but they are not the same class of evidence as the Every and CodeRabbit results above. The pattern that survives the filtering is consistent across my run and theirs: Fable 5 is a frontier choice for long, hard, well-briefed work, and overkill — slower and more expensive — for the everyday bounded tasks where a leaner model already clears the bar.

When to reach for Fable 5, and when not to

Be clear about what this exercise is and is not. These were single runs on fictional data, so the latencies are directional, not a benchmark, and three prompts say nothing about the multi-day, large-context agency that is the actual pitch. I did not measure that, and it is the only place this model is likely to pull ahead.

With that caveat: reach for Fable 5 when you have a genuine long-horizon, large-context agentic workload, the kind with real planning to amortize the always-on reasoning against, and a compliance path for the opt-in. Do not reach for it when a strong general model already clears your bar, which covers most short, bounded tasks where my baselines did fine and finished faster; when you contractually require zero retention and have not secured an exception; or when the work is a structured, budget-bound deliverable, where the reasoning is a tax. For my hardest single task, the leaner models were both the faster and the more complete choice.

The decision comes before the model

I set out to measure a model and spent most of the value measuring a decision. For any team, EU or not, the first Fable 5 question is not “how good is it.” The run says a strong general model already covers a lot of ground, and a mid-tier one held parity on everything bounded I gave it. The first question is “are we allowed to turn it on, and where have we scoped the 30-day retention?” Answer that, route for the residency you need, and then go find a workload big enough to be worth a frontier model. That order will save you more grief than any benchmark.

If you have enabled provider_data_share for a Mythos-class model, where did you scope it, and what convinced your compliance team? I would genuinely like to know.

Sources

[1] Claude Fable 5 is now available in Amazon Bedrock, AWS News (Jun 9, 2026)
[2] Amazon Bedrock data retention documentation
[3] Claude Fable 5 model card, AWS documentation
[4] Anthropic: Data retention practices for Mythos-class models
[5] My prior post: Welcome to the Family: GPT-5.5 and Claude on Bedrock
[6] My prior post: Why Your Cheapest Model Should Write the Harness
[7] Claude Fable 5 review and benchmark roundup, llm-stats
[8] Claude Fable 5 vs Opus 4.8 (relays Every’s “Senior Engineer” benchmark, 91/100), AY Automate
[9] Claude Fable 5 model review — 105-example code-review benchmark and coding tasks, CodeRabbit
[10] What it feels like to work with Mythos, Ethan Mollick, One Useful Thing
[11] Claude Fable 5 system card, Anthropic (SWE-bench Verified 95%, SWE-bench Pro ~80%)
[12] Claude Fable 5 launch coverage (Stripe 50M-line migration), Tom’s Hardware
[13] Claude Fable 5 and Claude Mythos 5, Anthropic — launch announcement (states the retained data will not be used to train new Claude models)
[14] Geographic cross-Region inference, Amazon Bedrock User Guide (EU Geo profiles route only within EU Regions; data stored in the source Region)

About the Author

Stefan Christoph is a Principal Solutions Architect at AWS, focused on agentic AI, media & entertainment, and helping builders move from demo to production. He writes about AI architecture, developer productivity, and the future of software.

This is a personal blog. Opinions expressed here are my own and do not represent the views or positions of my employer.

Cross-posted to LinkedIn

❤️ Created with the support of AI (Kiro)