This post is co-written with Mark Ross from Atos.
Organizations pursuing AI transformation face a well-known problem: how to upskill their workforce at scale in a way that changes how teams build, deploy, and use AI. Traditional AI training approaches, such as online courses, certification programs, and classroom-based instruction, are necessary but often insufficient. While they build foundational knowledge, many organizations struggle with low engagement, limited hands-on practice, and a gap between theoretical understanding and real-world application. As a result, teams may earn certifications without gaining the confidence or experience required to apply AI meaningfully to business problems.
Through Atos' partnership with AWS, we've long recognized that hands-on learning is the missing ingredient in effective AI enablement. When combined with structured e-learning and certification pathways, experiential learning helps translate knowledge into impact. Today, Atos employees hold over 5,800 AWS Certifications and 11 Golden Jackets, reflecting our strong foundation in cloud and AI skills. But with a commitment to achieving a 100% AI-fluent workforce by 2026, we knew we needed a learning model that could scale engagement, accelerate practical skills, and inspire engineers to apply AI in realistic scenarios.
To address this, Atos partnered with AWS to deliver a hands-on, gamified learning experience through the AWS AI League, designed to move beyond passive learning and immerse participants in real AI challenges. In this post, we explore how Atos used the AWS AI League to help accelerate AI education across 400+ participants, highlight the tangible benefits of gamified, experiential learning, and share actionable insights you can apply to your own AI enablement programs.
AI enablement through the AWS AI League
While e-learning courses and certifications are an essential foundation, many organizations struggle to translate that knowledge into hands-on skills, sustained engagement, and real business impact, particularly at scale.
The AWS AI League was designed to address this gap. Rather than focusing solely on conceptual learning, the program combines hands-on experimentation with structured competition, so builders can work directly with generative AI tools used in real-world environments. For Atos, this approach provided a way to accelerate applied AI skills across the organization while sustaining engagement, collaboration, and measurable outcomes.
The AWS AI League helps builders level up their AI skills by abstracting away deep infrastructure complexity while preserving the core mechanics of model customization and evaluation. Participants work with Amazon SageMaker and Amazon SageMaker JumpStart to fine-tune large language models (LLMs), gaining practical experience with techniques that are increasingly central to enterprise AI adoption.
Why fine-tuning matters for enterprise use cases
Fine-tuning a large language model is a form of transfer learning, a machine learning technique in which a pre-trained model is adapted using a smaller, domain-specific dataset rather than being trained from scratch. For enterprise teams, this approach offers a pragmatic path to customization: it helps reduce training time and computational cost while allowing models to reflect specialized knowledge, terminology, and decision logic.
In practice, organizations that use fine-tuning can adapt general-purpose models to specific domains where accuracy, reasoning, and explainability are critical. For Atos, this meant tailoring models to the insurance underwriting domain, where understanding risk profiles, policy conditions, exclusions, and premium calculations requires more than generic language fluency. The AWS AI League demonstrates that, with the right structure and tooling, teams across roles, including solutions architects, developers, consultants, and business analysts, can fine-tune and deploy models without requiring deep machine learning specialization. This makes fine-tuning a practical capability for partner organizations focused on delivering customer-ready AI solutions.
How the AWS AI League works
The AWS AI League follows a three-stage structure designed to build hands-on, production-oriented AI skills while maintaining momentum and engagement.

The program begins with an immersive workshop that introduces the fundamentals of fine-tuning using SageMaker JumpStart. SageMaker JumpStart provides access to pre-trained foundation models through a guided interface, allowing participants to focus on model behavior and outcomes rather than infrastructure setup.

Participants then move into an intensive model development phase. During this stage, teams iterate across multiple fine-tuning strategies, experimenting with dataset composition, augmentation techniques, and hyperparameter settings. Model submissions are evaluated on a dynamic leaderboard powered by an AI-based evaluation system, which benchmarks performance against a consistent set of criteria. This structure encourages rapid experimentation and makes progress visible, allowing teams to compare their customized models against larger baseline models.

The program culminates in a live, interactive finale. Top-performing teams demonstrate their models through real-time challenges, with outputs evaluated using a multi-dimensional scoring system. Technical judges assess depth and correctness, an AI benchmark measures objective performance, and audience voting introduces a practical, user-oriented perspective. Together, these dimensions reinforce the League's goal: turning hands-on learning into models that perform well in real-world scenarios.
Atos's use case – Intelligent Insurance Underwriter
With this foundation in place, Atos selected a use case that closely reflects real customer needs: the Intelligent Insurance Underwriter. Developed through an AWS AI League event, the goal was to fine-tune a large language model capable of analyzing complex insurance scenarios and providing expert-level underwriting guidance. The model was designed to assess risk, recommend appropriate policy conditions or deductibles, suggest premium adjustments, and clearly explain the reasoning behind each decision, all while aligning with professional industry standards.

This use case was chosen not as a theoretical exercise, but as a realistic example of how generative AI can support underwriting professionals by improving consistency and efficiency across insurance product lines. Built on cost-effective, fine-tuned open source models and powered by Amazon SageMaker, SageMaker Unified Studio, and Amazon S3, the solution incorporates a knowledge base alongside reasoning and recommendation modules trained on proprietary underwriting data. The result is an affordable, customized assistant that enhances team productivity, sharpens risk assessment accuracy, and integrates seamlessly with the deep industry expertise underwriters already rely on.
Fine-tuning with Amazon SageMaker Studio and Amazon SageMaker JumpStart
AWS AI League participants perform their model fine-tuning inside Amazon SageMaker Studio, a fully integrated, web-based development environment for machine learning. SageMaker Studio provides a low-code/no-code (LCNC) interface to build, fine-tune, deploy, and monitor generative AI models end-to-end. By following this approach, Atos participants could focus on experimentation and innovation rather than infrastructure management, helping accelerate time-to-value. The AI League now also offers customization of Amazon Nova models through serverless SageMaker model customization, and agentic challenges built on top of Amazon Bedrock AgentCore.
Users follow a streamlined sequence of steps within Amazon SageMaker Studio:
- Select a model – SageMaker JumpStart offers a catalog of pre-trained, publicly available foundation models for tasks such as text generation, summarization, and image creation. Participants can browse and select models from leading providers, which come pre-integrated for customization. For this competition, participants were required to fine-tune the Meta Llama 3.2 3B Instruct model, which can be done in a no-code way using Amazon SageMaker JumpStart.
- Provide a training dataset – Datasets stored in Amazon Simple Storage Service (Amazon S3) are connected directly to SageMaker, taking advantage of its virtually unlimited storage capacity for fine-tuning tasks.
- Perform fine-tuning – Users can configure hyperparameters such as learning rate, epochs, and batch size before launching the fine-tuning job. SageMaker then manages the training process, including provisioning compute resources and logging progress.
- Deploy the model – Once training is complete, participants can deploy their models directly from SageMaker Studio for inference or import them into Amazon Bedrock, which provides a fully managed environment for scalable production deployment.
- Evaluate and iterate – During the AWS AI League, evaluation was performed using LLM-as-a-Judge, an internal judging system that automatically scored models on quality, accuracy, and responsiveness.
This simplified workflow, depicted above, shows the AWS AI League model development lifecycle and how it helps reduce the complexity of developing and operationalizing specialized AI models, while preserving performance, transparency, and cost-efficiency. For Atos, this hands-on process provides a practical, production-ready foundation for extending generative AI capabilities into customer-facing solutions. Participants were required to generate insurance use case datasets in JSON Lines (JSONL) format. Each record consisted of two fields:
- Instruction – the prompt or question for the Intelligent Insurance Underwriter to consider.
- Response – an example of the ideal answer the fine-tuned model should produce.
These datasets formed the foundation for the model fine-tuning phase.
To simplify dataset creation, participants were given access to an AWS-provided PartyRock application, which offered a simple-to-use interface for generating and exporting data. Once complete, the datasets were uploaded to Amazon Simple Storage Service (Amazon S3), where they served as the input for model fine-tuning.
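The JSONL format described above is straightforward to produce programmatically. The following sketch, using hypothetical underwriting examples (not the actual competition dataset), shows what an instruction/response record looks like on disk and how to round-trip it:

```python
import json

# Hypothetical examples of the instruction/response records participants
# generated for the Intelligent Insurance Underwriter dataset.
records = [
    {
        "instruction": "A 24-year-old driver with two recent at-fault claims "
                       "applies for comprehensive auto coverage. How should "
                       "the risk be assessed?",
        "response": "This profile carries elevated risk due to age and claims "
                    "history. Recommend a higher deductible, a premium "
                    "loading, and a 12-month review clause.",
    },
    {
        "instruction": "What exclusions typically apply to a standard "
                       "homeowner's policy in a designated flood zone?",
        "response": "Flood damage is commonly excluded from standard cover "
                    "and requires a separate flood policy; note the "
                    "exclusion clearly in the policy terms.",
    },
]

# JSON Lines: one complete JSON object per line, no enclosing array.
with open("underwriter_train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Verify the file round-trips before uploading it to Amazon S3.
with open("underwriter_train.jsonl") as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed), sorted(parsed[0]))
```

The resulting file is what gets uploaded to S3 as the fine-tuning input; the field names shown here are illustrative of the two-field structure rather than a required schema.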
During fine-tuning, participants could adjust a range of hyperparameters to influence the process, including, but not limited to, the following:
- Epochs – The number of times the fine-tuning process passes over the dataset.
- Learning rate – The weighting applied to the updates the model makes each time it passes over the data.
After fine-tuning, participants deployed their customized language models in Amazon SageMaker and used the endpoints to perform inference. This allowed them to observe how the fine-tuned models responded to sample insurance queries and to assess the quality of their outputs.
Results varied across participants. Some fine-tuned models delivered strong, contextually relevant answers, while others displayed signs of overfitting, a condition where a model learns the training data too precisely, leading to repetitive or irrelevant responses when exposed to new inputs. Overtrained models, for instance, tend to echo phrases from the dataset rather than generalizing to unseen scenarios. Armed with these insights, participants evaluated their models' performance and determined which versions to submit to the AWS AI League leaderboard and which to refine or discard. This iterative process emphasized experimentation, data quality, and parameter tuning as key success factors in achieving high-performing generative AI models.
Gamification ignites participation
Hands-on labs and workshops are a great way to give people an opportunity to learn by doing, but a gamified approach where you compete with other people takes it to another level. Atos saw this with the AWS AI League. Following an initial kick-off workshop, Atos participants created and submitted initial models, before turning their attention to maximizing their scores on the leaderboard by iteratively creating or improving their datasets and tuning their hyperparameters over a two-week virtual league. By the completion of the virtual round, Atos had its highest-ever engagement for a gamified competition, with 409 participants on the leaderboard and over 4,100 fine-tuned models created.
Despite the gamified nature of the competition, communication channels and office hours were full of participants balancing knowledge sharing with each other while avoiding giving everything away. It was a fine balance that kept those who wanted to take part and improve well supported, while still leaving them to figure some things out for themselves. The friendly competition was fierce at the same time: to make the top five, a participant's fine-tuned model was required to achieve at least a 93% win rate against the answers provided by a much larger model, showing the power of fine-tuning for domain-specific knowledge. The virtual stage of the competition was fully automated, with a Llama 3.2 90B LLM acting as a judge to provide the scoring. Upon completion of the virtual round, the top five participants were taken forward to a live gameshow finale, competing for a place in the AWS finals at AWS re:Invent in Las Vegas in December.
To rank the top five, the live finale introduced additional scoring methods, as well as giving the finalists an opportunity to influence their model's responses. Finale scoring was split between 40% for LLM-as-a-Judge, 40% for our five human expert judges from Atos, and 20% for audience voting. Five rounds of questions provided ample opportunity to test the models' performance, and during each question the finalists were able to influence model output with system prompting and inference-time hyperparameter tuning (temperature and top p, which control the randomness and creativity of the answer). Finalists had only 90 seconds to tune their inference and submit their final answers, so it was a tense and close competition.
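The inference-time knobs the finalists adjusted map directly onto the request body sent to a deployed endpoint. The sketch below assembles a payload in the shape JumpStart text-generation endpoints for Llama models typically accept (an `inputs` string plus a `parameters` dictionary); the system prompt, question, and endpoint name are hypothetical:

```python
import json

def build_inference_payload(system_prompt, question, temperature, top_p):
    """Assemble a request body with the prompt plus sampling parameters."""
    prompt = f"{system_prompt}\n\nQuestion: {question}\nAnswer:"
    return {
        "inputs": prompt,
        "parameters": {
            "temperature": temperature,  # higher = more random sampling
            "top_p": top_p,              # nucleus sampling cutoff
            "max_new_tokens": 512,
        },
    }

payload = build_inference_payload(
    system_prompt="You are an expert insurance underwriting assistant.",
    question="Should a long-standing, claim-free customer be offered "
             "a lower deductible?",
    temperature=0.2,  # low temperature for consistent, grounded answers
    top_p=0.9,
)
body = json.dumps(payload)

# The serialized body would then be sent to the deployed endpoint, e.g.:
# boto3.client("sagemaker-runtime").invoke_endpoint(
#     EndpointName="underwriter-llama-3-2-3b",  # hypothetical name
#     ContentType="application/json",
#     Body=body,
# )
print(payload["parameters"])
```

Lowering temperature and top p narrows the token distribution the model samples from, which is why finalists turned them down for factual underwriting answers and up when a round rewarded creativity.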
Tips to fine-tune your way to success
The fine-tuning competition comes down to two key elements: the participants' ability to generate a good dataset for the subject of the competition, and their ability to find the optimal hyperparameters to use for fine-tuning with that dataset.
While AWS provided a PartyRock application to generate a dataset, some of the Atos participants took inspiration from it and remixed their own. The idea of these applications was to a) generate more data and b) generate diverse and unique data, both improvements over the AWS-provided application. Some participants chose to use other generative AI tools they had access to in order to generate their own responses, but this required them to write system prompts covering tasks the PartyRock application handled automatically, such as verifying that data was produced in the right format.
Larger datasets didn't necessarily lead to better results, so there was also a need to review the datasets that had been generated and work out how to improve them. Successful participants also used generative AI for this, obtaining general feedback on how to improve (for example, areas of insurance that may have been missing from the dataset for the Atos use case), as well as more specific feedback and actions to take on the dataset, such as removing items that were too similar. This resulted in a new PartyRock application being created and shared among participants to provide improvement feedback.
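One of those cleanup actions, removing items that are too similar, is easy to approximate locally. This sketch (not the participants' actual tooling) uses Python's standard-library `difflib` to drop records whose instruction text is nearly identical to one already kept; the sample records and the 0.9 similarity threshold are illustrative assumptions:

```python
import difflib

def drop_near_duplicates(records, threshold=0.9):
    """Keep only records whose instruction differs enough from every
    record already kept. O(n^2), fine for small curated datasets."""
    kept = []
    for rec in records:
        is_dup = any(
            difflib.SequenceMatcher(
                None, rec["instruction"].lower(), k["instruction"].lower()
            ).ratio() >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(rec)
    return kept

dataset = [
    {"instruction": "Assess the risk of a 24-year-old driver with two claims.",
     "response": "..."},
    {"instruction": "Assess the risk of a 24 year old driver with two claims.",
     "response": "..."},
    {"instruction": "What exclusions apply to flood damage on a homeowner policy?",
     "response": "..."},
]
cleaned = drop_near_duplicates(dataset)
print(len(cleaned))  # the two nearly identical driver questions collapse to one
```

A character-level ratio like this only catches near-verbatim repeats; participants leaning on an LLM for review could also catch paraphrased duplicates and topical gaps that simple string matching misses.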
Participants had control over several important hyperparameters that significantly influenced fine-tuning outcomes. Epochs determine how many times the training process passes over the entire dataset; too few epochs result in underfitting, where the model hasn't learned enough, while too many can cause overfitting, where the model memorizes training data rather than generalizing. Learning rate controls the magnitude of updates the model makes during each training step; a high learning rate enables faster training but risks overshooting optimal values, while a low learning rate provides more precise adjustments but requires longer training time.
Additional parameters included batch size, which affects training stability and memory usage, and Low-Rank Adaptation (LoRA) parameters such as lora_r and lora_alpha, which control the efficiency of the fine-tuning process. Successful participants approached hyperparameter tuning systematically, either changing single values one at a time to isolate their effects, or adjusting related parameters together while carefully logging results to identify patterns.
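That systematic approach amounts to sweeping a small search space and logging every run. The sketch below shows the bookkeeping pattern only: the search space values are illustrative, and `run_fine_tuning_job` is a deterministic placeholder standing in for launching a SageMaker fine-tuning job and reading back its evaluation loss:

```python
import csv
import itertools

# Hypothetical search space mirroring the knobs participants tuned.
search_space = {
    "epochs": [1, 3, 5],
    "learning_rate": [1e-5, 5e-5, 1e-4],
    "lora_r": [8, 16],
    "lora_alpha": [16, 32],
}

def run_fine_tuning_job(config):
    """Placeholder: a real version would launch a SageMaker training job
    with `config` and return the reported eval-loss. Here we return a
    deterministic dummy score so the sweep loop is runnable."""
    return round(1.0 / (config["epochs"] * config["lora_r"])
                 + config["learning_rate"], 4)

results = []
for combo in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), combo))
    config["eval_loss"] = run_fine_tuning_job(config)
    results.append(config)

# Log every run so patterns across related parameters are easy to spot.
with open("tuning_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(search_space) + ["eval_loss"])
    writer.writeheader()
    writer.writerows(results)

best = min(results, key=lambda r: r["eval_loss"])
print(len(results))  # 3 * 3 * 2 * 2 = 36 logged configurations
```

Even with a toy scoring function, the habit the code encodes is the useful part: every configuration and its outcome lands in one log, so the effect of changing a single parameter, or a related pair, can be read off afterwards instead of reconstructed from memory.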
Understanding model performance and overfitting
During fine-tuning, a model gradually becomes better at answering questions derived from the training and evaluation datasets, which are subsets of the same underlying data. However, the leaderboard evaluated each model using 87 unseen questions, examples that were not included in the training data. This discrepancy highlights an important aspect of model behavior.
During fine-tuning, participants could also monitor metrics such as evaluation loss (eval-loss) and perplexity (ppl), which help indicate how well a model fits the training data. Lower eval-loss and perplexity generally suggest the model is learning the dataset effectively, while large gaps between training and evaluation metrics can signal overfitting and a reduced ability to generalize. Evaluation loss is the loss value calculated on the validation or evaluation dataset during training; it measures how well the model predicts the correct next tokens for examples it has not directly trained on in that step. Perplexity is a commonly used metric for language models that represents how "surprised" the model is by the evaluation data. Lower perplexity indicates the model is better able to predict the correct next tokens, suggesting it has learned the underlying patterns in the dataset more effectively.
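The relationship between the two metrics is direct: for a cross-entropy loss in nats, perplexity is simply the exponential of the evaluation loss. This sketch uses made-up checkpoint numbers to show how the train/eval gap exposes overfitting; the 0.5 flag threshold is an illustrative choice, not a standard:

```python
import math

# Illustrative (invented) loss values at three training checkpoints.
checkpoints = {
    "epoch_1": {"train_loss": 1.90, "eval_loss": 1.95},
    "epoch_3": {"train_loss": 1.10, "eval_loss": 1.25},
    "epoch_8": {"train_loss": 0.25, "eval_loss": 1.60},
}

for name, m in checkpoints.items():
    ppl = math.exp(m["eval_loss"])          # perplexity = e^(eval loss)
    gap = m["eval_loss"] - m["train_loss"]  # large gap suggests overfitting
    flag = "possible overfitting" if gap > 0.5 else "ok"
    print(f"{name}: perplexity={ppl:.2f} gap={gap:.2f} ({flag})")
```

Note the pattern in the invented numbers: training loss keeps falling through epoch 8, but evaluation loss bottoms out around epoch 3 and then rises while the gap widens, exactly the signature of a model that has started memorizing its training set.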
As a result, some models became overfitted, meaning they performed extremely well on the data they had seen but struggled to generalize to new questions. This pattern could be observed by deploying a model to an inference endpoint and interacting with it directly: overfitted models often produced irrelevant or repetitive responses, a clear sign that they had memorized patterns from the training set rather than learning to reason more broadly.
Upskilling ambitions achieved
Through the AWS AI League, Atos's ambition was to put generative AI technology into participants' hands and enable them to feel more confident talking about and using it after the event, while having some fun and team building along the way. Participants learned how a smaller 3-billion-parameter model (Llama 3.2 3B Instruct) could outperform a much larger 90-billion-parameter model through fine-tuning with relevant domain knowledge, in this instance becoming a true virtual insurance underwriter assistant able to answer complex cases with appropriate recommendations on risk areas, suitable levels of deductibles, and so on. As generative AI and agentic AI grow, we see more use cases for specific knowledge within AI agents. Fine-tuning a model to provide this specific knowledge can result in a much smaller model that delivers faster inference at a lower cost than larger models, something that will be essential as we enter the age of agentic AI. As you move toward agentic architectures where multiple specialized AI agents collaborate to solve complex problems, having cost-effective, domain-specific models becomes crucial. Fine-tuned models can serve as specialized agents within larger agentic systems, each handling a specific domain while maintaining fast response times and manageable costs.
Conclusion
As you continue to explore generative AI implementations, the ability to efficiently build, customize, and deploy specialized models becomes increasingly important. The AWS AI League provides a structured pathway for partners like Atos to deepen their AI capabilities, whether enhancing existing offerings or creating entirely new, AI-driven services that address real-world customer needs. The program demonstrates how gamified learning can accelerate partners' AI innovation while driving measurable business outcomes.

The AWS AI League delivered measurable results for Atos beyond participant engagement. The program showed that fine-tuned 3B parameter models could achieve win rates exceeding 93% against much larger 90B parameter models for domain-specific tasks, demonstrating the cost-efficiency of specialized model development. From a resource perspective, the fine-tuned models required less computational infrastructure, running on ml.g5.4xlarge instances compared to the ml.g5.48xlarge instances needed for larger base models, translating to cost savings for inference at scale. The compressed learning timeline was particularly valuable, with participants able to develop practical AI skills in just two weeks that would typically require months of traditional training. The 409 active participants and 4,100+ fine-tuned models created during the event represented an acceleration in Atos's journey toward its 2026 goal of 100% AI fluency across its workforce. Post-event surveys indicated that 85% of participants felt more confident discussing and implementing generative AI solutions with customers, directly supporting Atos's business objectives.
If you're interested in building AI capabilities through hands-on, gamified learning, you can learn more about hosting your own AWS AI League event on the official website.
To learn more about implementing AI solutions, visit the AWS Artificial Intelligence blog for more stories about partners and customers implementing generative AI solutions across various industries.
About the authors
Nick McCarthy
Nick McCarthy is a Senior Generative AI Specialist Solutions Architect on the Amazon Bedrock team, based out of the AWS New York office. He helps customers customize their generative AI models on AWS. He has worked with clients across a wide range of industries, including healthcare, finance, sports, telecommunications, and energy, helping them accelerate business outcomes through the use of AI and machine learning. He holds a Bachelor's degree in Physics and a Master's degree in Machine Learning from UCL, London.
Mark Ross
Mark is the Chief Architect for AWS within Atos' Cloud and Modern Infrastructure engineering function, and has been working with AWS since 2017. Mark has over twenty years of technology experience across a variety of sectors including Financial Services, Fast Moving Consumer Goods, Government, Health, Utilities, and Media. Mark is passionate about helping customers build, migrate, and get the most out of AWS technology; he is an AWS Ambassador and an AWS Community Builder, and has held the coveted AWS Golden Jacket since 2021. Outside of work Mark loves travelling and rugby union.

