Most foundation models in biology have a fundamental blind spot: they see cells as frozen snapshots. Give a model a single-cell transcriptome (a readout of which genes are active in a cell at a given moment) and it can tell you a lot about what that cell is doing right now. What it can’t tell you is where that cell is headed.
That limitation matters enormously when studying aging. Age-related diseases like heart disease, Alzheimer’s dementia, and pulmonary fibrosis don’t happen overnight. They unfold across decades, driven by slow, progressive shifts in gene network states. To understand and eventually reverse these trajectories, you need a model that thinks in time, not just in snapshots.
That’s precisely what MaxToki is designed to do.
What MaxToki Is, Under the Hood
The team behind this research includes researchers from the Gladstone Institute of Cardiovascular Disease, the Gladstone Institute of Data Science and Biotechnology, and the Gladstone Institute of Neurological Disease, alongside the University of California San Francisco’s Division of Cardiology, Biological and Medical Informatics Graduate Program, Department of Pathology, Department of Neurology and Bakar Aging Research Institute, Department of Pediatrics and Cardiovascular Research Institute, and Institute for Human Genetics. Also contributing were the University of California Berkeley’s Department of Molecular and Cell Biology and NVIDIA, along with the Institute of Cardiovascular Regeneration and Centre for Molecular Medicine at Goethe University Frankfurt, the German Center for Cardiovascular Research, the Cardiopulmonary Institute, and the Clinic for Cardiology at University Hospital Frankfurt in Germany, and the Center for iPS Cell Research and Application at Kyoto University.
MaxToki is a transformer decoder model, the same architectural family behind large language models, but trained on single-cell RNA sequencing data. The model comes in two sizes: 217 million and 1 billion parameters.
The key representational choice is the rank value encoding. Rather than feeding raw transcript counts into the model, each cell’s transcriptome is represented as a ranked list of genes, ordered by their relative expression within that cell after scaling by expression across the entire pretraining corpus. This nonparametric approach deprioritizes ubiquitously expressed housekeeping genes and amplifies genes like transcription factors that have high dynamic range across distinct cell states, even when lowly expressed in absolute terms. It’s also more robust against technical batch effects, since relative rankings within a cell are more stable than absolute count values.
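The ranking step can be illustrated with a minimal sketch. It assumes the corpus scaling is a single positive factor per gene (for example, a typical nonzero expression level across the corpus); the article does not specify the exact factor, and the function name, gene list, and all numbers below are hypothetical.

```python
import numpy as np

def rank_value_encode(counts, corpus_scale, gene_ids, max_len=4096):
    """Return gene ids ordered by corpus-scaled expression, highest first."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: one positive scaling factor per gene, derived from the corpus.
    scaled = counts / np.asarray(corpus_scale, dtype=float)
    # Only genes detected in this cell take part in the ranking.
    detected = np.flatnonzero(counts > 0)
    # Stable sort on negated values gives descending order with deterministic ties.
    order = detected[np.argsort(-scaled[detected], kind="stable")]
    return [gene_ids[i] for i in order[:max_len]]

# Hypothetical cell: two housekeeping genes and two transcription factors.
gene_ids = ["ACTB", "GAPDH", "GATA4", "TBX5"]
counts = [500, 200, 20, 0]        # raw counts in this cell (made up)
corpus_scale = [400, 300, 5, 4]   # per-gene corpus scaling factors (made up)

encoded = rank_value_encode(counts, corpus_scale, gene_ids)
print(encoded)
```

In this toy cell the transcription factor GATA4 has by far the lowest raw count, yet it ranks first because its count is high relative to its corpus-wide level. The undetected gene TBX5 is left out of the sequence.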
Training took place in two stages. Stage 1 used Genecorpus-175M: roughly 175 million single-cell transcriptomes from publicly available data across a broad range of human tissues in health and disease, covering 10,795 datasets and producing roughly 290 billion tokens. Malignant cells and immortalized cell lines were excluded because their gain-of-function mutations would confound what the model learns about normal gene network dynamics, and no single tissue was permitted to make up more than 25% of the corpus. The model was trained with an autoregressive objective: given the preceding genes in the rank value encoding, predict the next ranked gene, conceptually similar to how language models predict the next token in a sentence.
A key technical finding from Stage 1 is that model performance on the generative objective scaled as a power law with the number of parameters. This motivated the choice to fully pretrain exactly two variants, the 217M and the 1B, rather than exploring the full spectrum, balancing performance against compute budget constraints.
Stage 2 extended the context length from 4,096 to 16,384 tokens using RoPE (Rotary Positional Embeddings) scaling, a technique that interpolates more tokens into the existing positional framework by reducing the rotation frequency. This expanded context allowed the model to process multiple cells in sequence, enabling temporal reasoning across a trajectory rather than reasoning about one cell at a time. Stage 2 training used Genecorpus-Aging-22M: roughly 22 million single-cell transcriptomes across roughly 600 human cell types from about 3,800 donors representing every decade of life from birth to 90-plus years, balanced by gender (49% male, 51% female), producing roughly 650 billion tokens. Combined across both stages, MaxToki trained on nearly 1 trillion gene tokens in total.
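The idea of fitting more positions into the same rotational range can be sketched with linear position interpolation, one common form of RoPE scaling. Whether MaxToki uses exactly this variant is an assumption, and the head dimension and base below are illustrative defaults, not values from the paper.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for RoPE; scale > 1 slows every rotation by that factor."""
    # One inverse frequency per pair of dimensions in the attention head.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # Dividing positions by `scale` is equivalent to lowering the rotation frequency.
    return np.outer(np.asarray(positions, dtype=float) / scale, inv_freq)

ORIGINAL_CONTEXT = 4096
EXTENDED_CONTEXT = 16384
scale = EXTENDED_CONTEXT / ORIGINAL_CONTEXT  # 4.0

original = rope_angles(np.arange(ORIGINAL_CONTEXT), head_dim=64)
extended = rope_angles(np.arange(EXTENDED_CONTEXT), head_dim=64, scale=scale)

# Every fourth extended position lands exactly on an original position;
# the three positions in between are interpolated.
print(np.allclose(extended[::4], original))
```

With this scaling, the 16,384 positions span nearly the same angular range the model saw at 4,096 tokens, so the extended positions stay close to the angles covered in Stage 1.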
https://www.biorxiv.org/content/10.64898/2026.03.30.715396v1.full.pdf
The Temporal Prompting Strategy
The most architecturally novel contribution of MaxToki is its prompting strategy. A prompt consists of a context trajectory (two or three cell states plus the timelapses between them) followed by a query. The model then performs one of two tasks:
Task 1: Given a context trajectory and a query cell, predict the timelapse (in months) needed to reach that query cell from the last context cell.
Task 2: Given a context trajectory and a query timelapse, generate the transcriptome of the cell that would arise after that duration.
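The structure of such a prompt can be sketched as a sequence of alternating cells and timelapses ending in a query. The class and function names here are hypothetical and only illustrate the layout described above; the real model works on token sequences, not Python objects.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Cell:
    genes: List[str]   # rank value encoding of one cell

@dataclass
class Timelapse:
    months: float      # elapsed time between two cells

Segment = Union[Cell, Timelapse]

def build_prompt(context_cells: List[Cell],
                 context_gaps: List[float],
                 query: Segment) -> Tuple[List[Segment], str]:
    """Interleave context cells with their timelapses, then append the query."""
    if len(context_cells) not in (2, 3):
        raise ValueError("context trajectory needs two or three cells")
    if len(context_gaps) != len(context_cells) - 1:
        raise ValueError("need one timelapse between each pair of context cells")
    prompt: List[Segment] = [context_cells[0]]
    for gap, cell in zip(context_gaps, context_cells[1:]):
        prompt += [Timelapse(gap), cell]
    prompt.append(query)
    # A cell query asks for a timelapse (Task 1); a timelapse query asks for a cell (Task 2).
    task = "predict_timelapse" if isinstance(query, Cell) else "generate_cell"
    return prompt, task

young = Cell(["GATA4", "ACTB"])
older = Cell(["ACTB", "GATA4"])
prompt_1, task_1 = build_prompt([young, older], [120.0], Cell(["ACTB", "GAPDH"]))
prompt_2, task_2 = build_prompt([young, older], [120.0], Timelapse(60.0))
print(task_1, task_2)
```

The only difference between the two tasks in this layout is the type of the final element, which is what lets one model handle both.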
For Task 1, a standard cross-entropy loss is insufficient because it treats each timelapse value as an unrelated category. Instead, the research team used continuous numerical tokenization with a mean-squared error (MSE) loss function, teaching the model that timelapses fall along a numerical continuum. This design choice produced dramatically lower prediction errors: the median prediction error for held-out ages dropped to 87 months with MaxToki, compared to 178 months for a linear SGDRegressor baseline and 180 months for the naive baseline of assuming each query cell was the most common age for that cell type and gender.
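A small numerical sketch shows why the choice of loss matters. The timelapse bins and probabilities are made up for illustration, and the loss functions are textbook definitions rather than the paper's training code.

```python
import numpy as np

def cross_entropy(probs, target_index):
    """Categorical cross-entropy for a single example."""
    return -np.log(probs[target_index])

def mse(predicted, target):
    """Squared error for a single numeric prediction."""
    return (predicted - target) ** 2

# Hypothetical timelapse classes in months; the true answer is 240.
bins = np.array([0, 120, 240, 360, 480])
target_index = 2

def confident_guess(index):
    """Put 0.9 probability on one class and spread the rest evenly."""
    probs = np.full(len(bins), 0.025)
    probs[index] = 0.9
    return probs

near_miss = confident_guess(3)   # guesses 360 months, off by 120
far_miss = confident_guess(0)    # guesses 0 months, off by 240

ce_near = cross_entropy(near_miss, target_index)
ce_far = cross_entropy(far_miss, target_index)
mse_near = mse(bins[3], bins[target_index])
mse_far = mse(bins[0], bins[target_index])

print(ce_near, ce_far)     # identical: cross-entropy ignores distance
print(mse_near, mse_far)   # the far miss is penalized more
```

Cross-entropy assigns the same penalty to both wrong guesses because it only looks at the probability placed on the correct class. MSE penalizes the far miss four times as heavily, which gives the model a gradient toward numerically closer answers.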
Crucially, the model is never explicitly told which cell type or gender it’s dealing with. It infers the trajectory context from the cells themselves, a form of in-context learning. This is why the model generalizes to held-out cell types it never saw during training: it achieves a Pearson correlation of 0.85 between predicted and ground truth timelapses on completely unseen cell type trajectories, and a Pearson correlation of 0.77 on held-out ages from held-out donors.
GPU Engineering at Scale
Training on nearly 1 trillion gene tokens required serious infrastructure work. For the 1 billion parameter variant, the team implemented FlashAttention-2 via the NVIDIA BioNeMo stack built on NeMo, Megatron-LM, and Transformer Engine. To enable FlashAttention-2, they modified feed-forward hidden dimensions to be evenly divisible by the number of attention heads, a hard compatibility requirement. Combined with mixed-precision training using bf16, these changes yielded roughly a 5x improvement in training throughput and a 4x increase in achievable micro-batch size on H100 80GB GPUs. For inference, adopting the Megatron-Core DynamicInferenceContext abstraction with key-value caching resulted in over 400x faster autoregressive generation compared to the naive baseline.
What the Model Learned, Without Being Told
Interpretability analysis on the 217 million parameter variant revealed something striking: roughly half of the attention heads learned, entirely through self-supervised training with no gene function labels, to pay significantly higher attention to transcription factors compared to other genes. Transcription factors are master regulators of cell state transitions, but the model discovered their importance on its own.
Ablation studies showed that both the context cells and the query cell are equally necessary for accurate predictions: masking either component significantly and equivalently degraded performance. Shuffling genes within the rank value encoding to produce “bag of genes” cells (preserving which genes are present but destroying their relative ordering) also significantly damaged predictions, demonstrating that the model learned to use the relative expression ordering of genes, not merely their presence or absence. Further attention analysis showed that individual heads specialized for different parts of the prompt, some attending primarily to context cells, others to timelapse tokens, others to the query, with many heads exhibiting cell type-specific activation patterns across the roughly 60 cell types examined.
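The “bag of genes” ablation amounts to a permutation of each cell's ranked gene list. A minimal sketch, with a hypothetical function name and a toy gene list:

```python
import random

def bag_of_genes(ranked_genes, seed=0):
    """Shuffle a ranked gene list: same genes, ordering information destroyed."""
    rng = random.Random(seed)      # seeded so the ablation is reproducible
    shuffled = list(ranked_genes)  # copy, leaving the original cell untouched
    rng.shuffle(shuffled)
    return shuffled

ranked = ["GATA4", "TBX5", "ACTB", "GAPDH", "MYH6"]  # toy rank value encoding
ablated = bag_of_genes(ranked)

# The gene set is preserved; only the rank information is lost.
print(sorted(ablated) == sorted(ranked))
```

Because the set of genes is unchanged, any drop in performance on the shuffled cells can be attributed to the loss of ordering rather than to missing genes.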
One failure mode of generative models is learning to output averaged representations. The research team trained a doublet detector (a classifier distinguishing individual cells from simulated doublets formed by merging two cells of the same cell type) on ground truth cells, then applied it to MaxToki-generated cells. Roughly 95% of generated cells were classified as singlets, indicating that the model produces single-cell resolution transcriptomes rather than blended averages.
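A training set for such a detector could be assembled roughly as follows. This sketch assumes that “merging” means summing the count vectors of two cells and that the caller passes cells of a single cell type; both are assumptions, and the function name and counts are hypothetical.

```python
import numpy as np

def make_doublet_training_set(cells, n_doublets, seed=0):
    """Return (X, y): real cells labeled 0 and simulated doublets labeled 1."""
    rng = np.random.default_rng(seed)
    cells = np.asarray(cells)
    # Pick two distinct cells for each simulated doublet.
    pairs = np.array([rng.choice(len(cells), size=2, replace=False)
                      for _ in range(n_doublets)])
    # Assumption: a doublet is the sum of the two cells' count vectors.
    doublets = cells[pairs[:, 0]] + cells[pairs[:, 1]]
    X = np.vstack([cells, doublets])
    y = np.concatenate([np.zeros(len(cells), dtype=int),
                        np.ones(n_doublets, dtype=int)])
    return X, y

# Four toy cells of one cell type, three genes each (made-up counts).
cells = [[5, 0, 2], [4, 1, 3], [6, 0, 1], [5, 2, 2]]
X, y = make_doublet_training_set(cells, n_doublets=3)
print(X.shape, y.tolist())
```

Any standard classifier can then be fit on `X` and `y` and applied to generated cells; a high singlet rate suggests the generator is not blending cells together.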
Inferring Age Acceleration in Disease, Including Diseases Never Seen During Training
Given that the model was trained only on healthy control donors, the research team tested whether it could infer aging signatures in disease states entirely absent from training. The approach: provide a context trajectory of normal cells, then query with a disease cell and test whether the model infers more or less elapsed time compared to an age-matched control cell.
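The comparison reduces to a difference between predicted timelapses for disease and control query cells. A minimal sketch, assuming the median is used as the summary statistic (the article does not say which statistic the authors used) and with made-up predictions:

```python
import numpy as np

def age_acceleration_months(disease_timelapses, control_timelapses):
    """Median predicted timelapse for disease cells minus that for matched controls."""
    return float(np.median(disease_timelapses) - np.median(control_timelapses))

# Hypothetical model outputs (months) for the same context trajectory.
disease_predictions = [130.0, 150.0, 170.0]
control_predictions = [80.0, 90.0, 100.0]

acceleration = age_acceleration_months(disease_predictions, control_predictions)
print(acceleration / 12, "years")
```

A positive value means the model reads the disease cells as older than age-matched controls; a value near zero means no inferred acceleration.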
In lung mucosal epithelial cells from donors exposed to heavy smoking, the model inferred roughly 5 years of age acceleration compared to age-matched non-smoking controls, consistent with prior reports linking smoking status to telomere shortening and lung aging signatures. In lung fibroblasts from patients with pulmonary fibrosis, a disease characterized by telomere attrition and cellular senescence, the model inferred roughly 15 years of age acceleration.
The Alzheimer’s disease analysis produced several clinically important findings. In microglia from Alzheimer’s patients drawn from the Mount Sinai NIH Neurobiobank, the model inferred roughly 3 years of age acceleration compared to age-matched controls. This result was replicated in an independent cohort from the Duke and Johns Hopkins Alzheimer Disease Research Centers using homeostatic microglia specifically. Critically, this second cohort also included patients with mild cognitive impairment and Alzheimer-resilient patients, individuals who share the same neuropathological changes as Alzheimer’s patients but exhibit no cognitive impairment. The model did not infer age acceleration in homeostatic microglia from either the mild cognitive impairment or resilient groups compared to controls, suggesting these patients may be protected from the disease-related age acceleration in this microglial subtype. This distinction between full Alzheimer’s disease and Alzheimer resilience, captured without any disease-specific training, is one of the most clinically significant findings in the paper.
Conclusion
MaxToki represents a significant step forward in how AI models can reason about biological time. By moving beyond single-cell snapshots to model entire trajectories of gene network change across the human lifespan, it addresses a limitation that has constrained computational biology for years. The combination of rank value encoding, continuous numerical tokenization, RoPE-based context extension, and in-context learning allowed the model to generalize to unseen cell types, unseen ages, and even disease states it was never trained on, all while learning, without any gene function labels, to pay higher attention to the transcription factors that actually drive cell state transitions.
What makes MaxToki particularly compelling for both researchers and engineers is that its predictions didn’t stop at the computational level. The model nominated novel pro-aging drivers in cardiac cell types that were subsequently validated to cause age-related gene network dysregulation in iPSC-derived cardiomyocytes and measurable cardiac dysfunction in living mice within six weeks, a direct line from in silico screening to in vivo consequence. With pretrained models and training code publicly available, MaxToki offers a reusable framework that the broader community can build on, fine-tune for specific disease contexts, and extend to new tissue types. As longitudinal single-cell datasets continue to grow, temporal foundation models like MaxToki may become a standard tool for identifying intervention points before age-related diseases take hold.

