Hugging Face has launched ml-intern, an open-source AI agent designed to automate end-to-end post-training workflows for large language models (LLMs). Built on the company's smolagents framework, the tool can autonomously perform literature review, dataset discovery, training script execution, and iterative evaluation, tasks that typically require significant manual effort from ML researchers and engineers.
What ml-intern Does
The agent operates as a continuous loop that mirrors the workflow of an ML researcher. It begins by searching arXiv and Hugging Face Papers, reading methodology sections and traversing citation graphs to identify relevant datasets and techniques. It then searches the Hugging Face Hub for referenced datasets, inspects their quality, and reformats them for training. When local compute is unavailable, the agent can launch jobs via Hugging Face Jobs. After each training run, it reads evaluation outputs, diagnoses failures such as reward collapse in RLHF pipelines, and retrains until benchmark performance improves.
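The loop described above can be sketched in plain Python. Everything below is illustrative: the function names (`search_papers`, `prepare_dataset`, `launch_training`, `evaluate`) are hypothetical stand-ins, not ml-intern's actual API.

```python
# Illustrative sketch of the research loop described above.
# All function names and return values are hypothetical stand-ins.

def search_papers(topic):
    """Stand-in for arXiv / HF Papers search plus citation-graph traversal."""
    return [{"title": f"Survey of {topic}", "datasets": ["demo/math-sft"]}]

def prepare_dataset(name):
    """Stand-in for Hub download, quality inspection, and reformatting."""
    return {"name": name, "quality_ok": True}

def launch_training(dataset):
    """Stand-in for a local run or a Hugging Face Jobs submission."""
    return {"checkpoint": f"ckpt-{dataset['name']}"}

def evaluate(checkpoint):
    """Stand-in for a benchmark evaluation (e.g. GPQA), as a fraction correct."""
    return 0.32

def post_train(topic, target_score, max_iters=5):
    """Research -> train -> evaluate -> retry until the target is reached."""
    papers = search_papers(topic)
    dataset = prepare_dataset(papers[0]["datasets"][0])
    best = 0.0
    for _ in range(max_iters):
        run = launch_training(dataset)
        score = evaluate(run["checkpoint"])
        best = max(best, score)
        if best >= target_score:
            break  # benchmark target reached; stop retraining
    return best

print(post_train("scientific reasoning", target_score=0.30))
```

The stubs collapse the interesting parts (failure diagnosis, retraining with changed hyperparameters), but the control flow is the point: the agent keeps iterating until evaluation clears a target.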
The entire monitoring stack relies on Trackio, a Hub-native experiment tracker positioned as an open-source alternative to Weights & Biases.
Performance on PostTrainBench
ml-intern was evaluated on PostTrainBench, a benchmark introduced by researchers at the University of Tübingen and the Max Planck Institute. The benchmark tests an agent's ability to post-train a base model within a strict 10-hour window on a single H100 GPU.
In the official launch demo, ml-intern took the Qwen3-1.7B base model, which scores a baseline of roughly 10% on GPQA, and pushed it to 32% in under 10 hours. The agent's progress was remarkably fast, crossing the 27.5% mark in just over 3 hours.
This result is particularly significant when compared to the current SOTA. Hugging Face's data shows the agent outperforming Claude Code, which currently sits at 22.99% on the same task. While the broader PostTrainBench paper recorded a high of 33% using the larger Gemma-3-4B, ml-intern's ability to extract 32% from the small 1.7B Qwen model demonstrates a level of data-efficiency that manual researchers often struggle to replicate in such a short timeframe.
https://x.com/akseljoonas/status/2046543093856412100
Technical Approaches: Synthetic Data and GRPO
Two technical strategies that ml-intern demonstrated in published demos are worth highlighting for practitioners.
Synthetic data generation: In a healthcare-domain test, the agent assessed the available medical datasets, determined their quality was insufficient for reliable fine-tuning, and wrote a script to generate synthetic training examples focused on edge cases, including medical hedging language and multilingual emergency-response scenarios. It then upsampled this data to improve the training distribution before evaluating on HealthBench.
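That pattern can be compressed into a short sketch: generate edge-case examples from templates, then upsample them into the training mix. The templates and weighting below are invented for illustration; the agent's actual generation script has not been published.

```python
import random

# Hypothetical templates for the two edge-case families described above:
# medical hedging language and multilingual emergency-response scenarios.
HEDGING = [
    "It is possible, though not certain, that {symptom} indicates {condition}.",
    "{symptom} may warrant evaluation for {condition}; clinical judgment is required.",
]
EMERGENCIES = {
    "en": "Call emergency services if {symptom} is accompanied by chest pain.",
    "es": "Llame a emergencias si {symptom} se acompaña de dolor en el pecho.",
}

def generate_synthetic(n, seed=0):
    """Produce n edge-case rows, mixing hedging and multilingual templates."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        if rng.random() < 0.5:
            text = rng.choice(HEDGING).format(symptom="dizziness", condition="anemia")
        else:
            lang = rng.choice(sorted(EMERGENCIES))
            text = EMERGENCIES[lang].format(symptom="dizziness")
        rows.append({"text": text, "tag": "edge_case"})
    return rows

def upsample(base, synthetic, factor=3):
    """Repeat the synthetic edge cases to shift the training distribution."""
    return base + synthetic * factor

base = [{"text": "Routine consultation note.", "tag": "base"}]
mix = upsample(base, generate_synthetic(4), factor=3)
print(len(mix))  # 1 base row + 4 synthetic rows * 3 = 13
```

In practice the generator would be an LLM rather than string templates, but the upsampling step, duplicating scarce edge cases so the fine-tuning distribution covers them, is the same idea.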
Autonomous RLHF via GRPO: In a math-domain test, the agent implemented a Group Relative Policy Optimization (GRPO) training script, a technique that performs reinforcement learning from human feedback with lower memory overhead than standard PPO. The agent launched training on A100 GPUs, monitored reward curves, and ran ablations to isolate effective components before finalizing the checkpoint.
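The core of GRPO is easy to state: sample a group of completions per prompt and normalize each completion's reward against the group's mean and standard deviation, rather than training a separate value network, which is where the memory savings over PPO come from. A minimal sketch of that advantage computation (not ml-intern's actual script):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / group std.

    One call covers the completions sampled for a single prompt; eps
    guards against division by zero when all rewards are identical.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

# One prompt, four sampled completions scored by a reward function
# (e.g. 1.0 for a correct math answer, 0.0/partial otherwise).
rewards = [1.0, 0.0, 0.5, 0.5]
advs = grpo_advantages(rewards)
print([round(a, 2) for a in advs])  # [1.41, -1.41, 0.0, 0.0]
```

These advantages then weight the policy-gradient loss exactly as in PPO; only the baseline (group statistics instead of a learned critic) changes.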
Key Takeaways
- Autonomous Research Loop: The agent replicates the full machine learning workflow, from performing literature reviews on arXiv and traversing citation graphs to autonomously executing training runs and diagnosing failures.
- Significant Reasoning Gains: In less than 10 hours, the agent pushed a Qwen3-1.7B model's scientific reasoning score on the GPQA benchmark from 8.5% to 32%, outperforming the reported GPQA results of Claude Code (22.99%).
- Advanced Training Methods: Beyond simple fine-tuning, ml-intern can generate high-quality synthetic data for edge cases and implement complex techniques like Group Relative Policy Optimization (GRPO) to optimize math performance.
- Native Ecosystem Integration: Built on the smolagents framework, the tool natively integrates with Hugging Face Jobs for compute and uses Trackio for open-source experiment tracking.
Introducing ml-intern, the agent that just automated the post-training team @huggingface
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU… pic.twitter.com/USLWv6lKz9
— Aksel (@akseljoonas) April 21, 2026

