Anthropic has never published a technical paper on Claude Mythos. That has not stopped the research community from theorizing. A new open-source project called OpenMythos, released on GitHub by Kye Gomez, attempts something ambitious: a first-principles theoretical reconstruction of what the Claude Mythos architecture might actually be, built entirely in PyTorch and grounded in peer-reviewed research.
The project is not a leaked model, a fine-tune, or a distillation. It is a hypothesis rendered in code, and the hypothesis is specific enough to be falsifiable, which is what makes it interesting.
The Main Claim: Claude Mythos Is a Recurrent-Depth Transformer
OpenMythos proposes that Claude Mythos belongs to a class of architectures called Recurrent-Depth Transformers (RDTs), also referred to in the literature as Looped Transformers. The concept is meaningfully different from standard transformer stacks.
In a conventional transformer such as GPT, LLaMA, or Mistral, the model passes input through a sequence of unique layers, one after another, each with its own independent weights. More capability generally means more layers and more parameters. In a Recurrent-Depth Transformer, a fixed set of weights is applied iteratively across T loop steps within a single forward pass. The same weights run multiple times. Reasoning depth is not a function of how many parameters are stored, but of how many iterations are run at inference time.
Think of it less like reading a book and more like refining a draft: the model returns to the same computational block again and again, improving its internal representation with each pass.
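The parameter arithmetic behind weight tying can be made concrete with a minimal sketch. This is illustrative only, not OpenMythos's actual code; the block is a toy linear layer and all dimensions are assumed:

```python
import torch
import torch.nn as nn

# Minimal sketch of weight tying vs. a standard stack (illustrative, not
# OpenMythos's actual code): one block looped 16 times stores 16x fewer
# parameters than 16 distinct layers of the same shape.
class TiedLoop(nn.Module):
    def __init__(self, dim: int, T: int):
        super().__init__()
        self.block = nn.Linear(dim, dim)   # a single set of weights
        self.T = T

    def forward(self, h):
        for _ in range(self.T):            # depth comes from iterations...
            h = torch.tanh(self.block(h))  # ...not from extra parameters
        return h

tied = TiedLoop(dim=64, T=16)
stack = nn.Sequential(*[nn.Linear(64, 64) for _ in range(16)])
n_tied = sum(p.numel() for p in tied.parameters())
n_stack = sum(p.numel() for p in stack.parameters())
print(n_stack // n_tied)  # → 16
```

Both models apply sixteen linear transformations per forward pass; only the stack pays for sixteen copies of the weights.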
How the Architecture Is Structured
OpenMythos instantiates this as a three-part structure: Prelude → Recurrent Block → Coda. The Prelude and Coda are standard transformer layers that run exactly once. The Recurrent Block is the computational core, looped up to T = 16 times.
At each loop step t, the hidden state is updated using the following rule:
hₜ₊₁ = A·hₜ + B·e + Transformer(hₜ, e)
Here, hₜ is the hidden state after loop iteration t, and e is the encoded input from the Prelude, re-injected at every step. The re-injection is deliberate: without it, the hidden state would drift away from the original input signal across deep loops. The learned matrices A and B govern how much of the previous hidden state and the encoded input carry forward at each step.
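One loop step under this rule can be sketched as follows. Everything here is a stand-in with assumed dimensions; in particular, adding e to h before the transformer core is a simplification of the two-argument Transformer(hₜ, e) in the update rule:

```python
import torch
import torch.nn as nn

# Sketch of one loop step: h_{t+1} = A·h_t + B·e + Transformer(h_t, e).
# Dimensions and modules are illustrative; feeding e by addition into the
# core approximates the two-argument Transformer(h_t, e) in the update rule.
dim = 32
A = nn.Linear(dim, dim, bias=False)  # learned carry of the previous state
B = nn.Linear(dim, dim, bias=False)  # learned re-injection of the input
core = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

def loop_step(h, e):
    return A(h) + B(e) + core(h + e)

e = torch.randn(1, 8, dim)           # encoded input from the Prelude
h = torch.zeros(1, 8, dim)
for t in range(16):                  # T = 16 iterations, same weights each time
    h = loop_step(h, e)              # e is re-injected at every step
print(h.shape)                       # torch.Size([1, 8, 32])
```

Note that without a stability constraint on A, repeatedly applying this step can inflate h, which is exactly the failure mode the architecture has to address.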
The FFN inside the Recurrent Block is not a standard feedforward layer. OpenMythos replaces it with a Mixture-of-Experts (MoE) layer following the design introduced in DeepSeekMoE: a large pool of fine-grained routed experts, with only a sparse top-K subset activated per token, alongside a small set of always-active shared experts that absorb common cross-domain patterns. Crucially, the router selects distinct expert subsets at each loop depth, meaning each iteration is computationally distinct despite sharing the same base weights. MoE provides domain breadth; looping provides reasoning depth.
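A depth-conditioned router can be sketched as below. This is a toy (tiny dimensions, linear "experts", a depth embedding added to the router input as an assumed conditioning mechanism), not the DeepSeekMoE or OpenMythos implementation:

```python
import torch
import torch.nn as nn

# Toy MoE whose router conditions on the loop depth t, so each iteration
# can select a different top-k expert subset. Illustrative only.
class DepthAwareMoE(nn.Module):
    def __init__(self, dim=32, n_experts=8, top_k=2, n_shared=1, max_depth=16):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.shared = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_shared))
        self.depth_emb = nn.Embedding(max_depth, dim)  # makes routing depth-aware
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x, t: int):
        logits = self.router(x + self.depth_emb(torch.tensor(t)))
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        rows = []
        for i, token in enumerate(x):                  # per-token sparse dispatch
            y = sum(m(token) for m in self.shared)     # always-active shared experts
            for k in range(self.top_k):                # plus the top-k routed experts
                y = y + weights[i, k] * self.experts[int(idx[i, k])](token)
            rows.append(y)
        return torch.stack(rows)

moe = DepthAwareMoE()
x = torch.randn(4, 32)  # 4 tokens
y = moe(x, t=3)         # the same call with a different t can route differently
print(y.shape)
```

Real MoE layers batch the dispatch for efficiency; the per-token loop here just makes the routing logic explicit.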
Attention defaults to Multi-head Latent Attention (MLA) from DeepSeek-V2, which caches a compressed low-rank KV latent rather than full key/value tensors, yielding a 10–20× reduction in KV memory at production scale.
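The memory claim is easy to sanity-check with back-of-envelope numbers. The dimensions below are assumed for illustration, not taken from DeepSeek-V2 or OpenMythos:

```python
# Illustrative KV-cache arithmetic (assumed dimensions). A standard cache
# stores full keys and values for every head; MLA stores one compressed
# low-rank latent per token instead.
n_heads, head_dim = 32, 128                 # hypothetical full-attention geometry
latent_dim = 512                            # hypothetical MLA latent width

full_kv_per_token = 2 * n_heads * head_dim  # K + V across all heads = 8192 values
mla_per_token = latent_dim                  # one shared latent = 512 values
print(full_kv_per_token / mla_per_token)    # → 16.0, inside the quoted 10–20x band
```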
Reasoning in Continuous Latent Space
One of the most important properties of this architecture is that reasoning occurs entirely in continuous latent space. There is no intermediate token emission between loop steps: the model does not produce text mid-thought and then re-read it. This is structurally distinct from chain-of-thought prompting, where reasoning is externalized as token sequences, and has been formally analyzed in both Saunshi et al. (2025) and COCONUT (2024).
Saunshi et al. (2025) formally show that each loop iteration in an RDT is functionally equivalent to one step of chain-of-thought, but operating over real-valued vectors rather than discrete tokens. Continuous latent thoughts can even encode multiple alternative next steps simultaneously, enabling something closer to breadth-first search over the reasoning space within a single forward pass.
This also explains a concrete capability advantage. A standard transformer trained on 5-hop reasoning chains fails when tested on 10-hop chains at inference time; it has no mechanism to extend its depth beyond what it saw during training. A Recurrent-Depth Transformer handles this naturally: running more inference-time loops extends the reasoning chain without any retraining. Harder problems receive more compute; simpler ones exit early.
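The depth-extension point is mechanically simple: loop count is a runtime argument, not a property of the weights. A toy weight-tied cell makes this concrete (a GRU cell stands in for the recurrent block here, which is an assumption, not the OpenMythos core):

```python
import torch
import torch.nn as nn

# Toy illustration: reasoning depth as an inference-time knob. The same
# weight-tied cell is simply unrolled for more steps on harder inputs.
block = nn.GRUCell(16, 16)             # stand-in for the recurrent block (assumed)
e = torch.randn(1, 16)                 # fixed encoded input

def run(T: int):
    h = torch.zeros(1, 16)
    for _ in range(T):
        h = block(e, h)                # identical weights at every iteration
    return h

easy, hard = run(4), run(32)           # 8x more depth, zero extra parameters
print(easy.shape == hard.shape)        # → True
```

A standard stack has no analogous knob: its depth is frozen into the parameter count at training time.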
Solving the Stability Problem
Training looped models has historically been brittle. The hidden state hₜ can grow unboundedly across iterations, a failure mode known as residual explosion. OpenMythos addresses this using a Linear Time-Invariant (LTI) injection constraint borrowed from the Parcae architecture (Prairie et al., 2026): the spectral radius of A, denoted ρ(A), is enforced to be less than 1 by construction, guaranteeing stability regardless of learning rate or gradient noise.
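One simple way to get ρ(A) < 1 by construction, sketched here as an assumption about what such a constraint could look like (Parcae's exact mechanism may differ), is to rescale A by its largest singular value, which always upper-bounds the spectral radius:

```python
import torch

# Assumed construction (the paper's exact mechanism may differ): rescale A so
# its largest singular value is 0.95. Since rho(A) <= sigma_max(A), the carry
# term A·h then contracts rather than amplifies across loop steps.
dim = 32
raw_A = torch.randn(dim, dim)
sigma_max = torch.linalg.matrix_norm(raw_A, ord=2)  # largest singular value
A = 0.95 * raw_A / sigma_max

rho = torch.linalg.eigvals(A).abs().max()           # spectral radius of A
print(bool(rho < 1))                                # → True: stable by construction
```

Because the bound holds by construction, no learning-rate schedule or gradient clipping is needed to keep the recurrence from exploding.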
A second failure mode also exists at the other extreme: beyond a certain loop depth, excessive recurrence degrades predictions, as the hidden state drifts past the solution and into noise. This is the 'overthinking' problem. Adaptive Computation Time (ACT) halting addresses it with a learned scalar per position that dynamically decides when to stop looping. Positions that are harder to process receive more computation; tokens that have already converged halt early.
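An ACT-style halting loop might look like the sketch below. The halting head, the threshold, and the frozen hidden state are all illustrative simplifications rather than OpenMythos's actual mechanism:

```python
import torch
import torch.nn as nn

# Illustrative ACT-style halting (assumed form): a learned scalar per position
# accumulates a halting probability, and a position stops looping once its
# running total crosses the threshold.
dim, T_max, threshold = 16, 16, 0.99
halt_head = nn.Linear(dim, 1)

h = torch.randn(4, dim)                   # 4 positions (a real model would
cum_halt = torch.zeros(4)                 # also update h at every step)
steps = torch.zeros(4, dtype=torch.long)
for t in range(T_max):
    active = cum_halt < threshold
    if not active.any():                  # every position has converged
        break
    steps[active] += 1                    # harder positions keep looping
    p = torch.sigmoid(halt_head(h)).squeeze(-1)
    cum_halt = cum_halt + active * p      # only active positions accumulate
print(steps.tolist())                     # per-position loop counts
```

Each position ends up with its own loop count, which is exactly the "more compute for harder tokens" behavior described above.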
Finally, depth-wise LoRA adapters introduce a small rank-r adaptation matrix at each iteration depth, giving each loop step slightly distinct behavior without adding substantial parameters, bridging the gap between pure weight-tying and fully distinct layers.
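A depth-wise LoRA scheme can be sketched as one tied base weight plus a rank-r adapter pair per depth. The shapes, the rank of 4, and the 16 depths below are assumed for illustration:

```python
import torch
import torch.nn as nn

# Sketch of depth-wise LoRA (assumed shapes): a tied base weight shared by all
# T loop steps, plus a tiny rank-r adapter pair per depth. The up-projection
# starts at zero so each adapter initially contributes nothing.
dim, r, T = 64, 4, 16
W = nn.Linear(dim, dim, bias=False)  # tied base weights
down = nn.ParameterList(nn.Parameter(torch.randn(dim, r) * 0.01) for _ in range(T))
up = nn.ParameterList(nn.Parameter(torch.zeros(r, dim)) for _ in range(T))

def apply_at_depth(h, t):
    return W(h) + (h @ down[t]) @ up[t]  # base output + depth-t LoRA delta

out = apply_at_depth(torch.randn(2, dim), t=0)
adapter_params = T * 2 * dim * r         # 16 rank-4 adapters
distinct_layers = T * dim * dim          # cost of 16 fully distinct layers
print(adapter_params / distinct_layers)  # → 0.125
```

At these sizes the adapters cost an eighth of what sixteen fully independent layers would, while still letting every depth behave differently.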
Introducing OpenMythos
An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch.
The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and… pic.twitter.com/YLvCid6CAr
— Kye Gomez (swarms) (@KyeGomezB) April 19, 2026
Why Parameter Efficiency Matters
The Parcae paper (Prairie et al., 2026) provides empirical grounding for the efficiency claim. At 770M parameters, an RDT matches a 1.3B standard transformer trained on identical data: roughly half the parameters for equal downstream quality. Optimal recurrence and optimal token count both follow power laws with consistent exponents across scales, establishing the first predictable scaling laws for looped training.
The implication is significant: reasoning depth scales with inference-time compute, not stored parameter count. This reframes one of the dominant assumptions in the scaling debate. The relevant axis may not be parameter count at training, but loop depth at inference.
What OpenMythos Contributes
OpenMythos provides four concrete research artifacts: a fully configurable PyTorch implementation of the RDT hypothesis with MoE FFN and Multi-head Latent Attention; LTI-stable recurrent injection integrated as a first-class training primitive; depth-wise LoRA adapters enabling per-iteration behavioral differentiation; and a reproducible research baseline for studying looped transformer dynamics and inference-time reasoning depth.
Whether or not Mythos is actually an RDT, OpenMythos gives the research community something concrete and runnable: an implementation of an architecture class the literature increasingly suggests is underexplored, and one that may represent a fundamentally different path to capable AI than simply training bigger models.
Check out the full code and notebook here.

