This submit is cowritten by Sebastian Angersbach, Philip Trempler, and Weiran Zhang from Volkswagen Group.
Volkswagen Group stands as one of many world’s largest automotive producers, delivering 6.6 million autos within the first 9 months of 2025. The Group includes ten distinct manufacturers from 5 European international locations: Volkswagen, Volkswagen Industrial Autos, ŠKODA, SEAT, CUPRA, Audi, Lamborghini, Bentley, Porsche, and Ducati. In 2025, the AWS Generative AI Innovation Heart labored with Volkswagen Group’s advertising and technical groups to construct an answer that might harness generative AI’s velocity and scale whereas sustaining the model precision that defines Volkswagen Group. The result’s an end-to-end advertising picture era and analysis pipeline, with picture era fashions hosted on Amazon SageMaker AI endpoints and picture analysis powered by Amazon Bedrock. The next diagram exhibits the end-to-end advertising picture era and analysis pipeline.
On this submit, we discover the challenges that Volkswagen Group confronted in producing brand-compliant advertising property at scale. We stroll via how we constructed a generative AI resolution that generates photorealistic car photographs, validates technical accuracy on the element degree, and helps implement model guideline compliance alignment throughout the ten manufacturers.
The problem – world scale meets model precision
For Volkswagen Group’s advertising groups, this scale creates a unprecedented problem: producing 1000’s of selling property yearly whereas ensuring that each picture displays the precise model requirements that clients have come to count on. A single car launch may require tons of of variations—totally different angles, environments, lighting situations, and regional variations—every historically requiring months of manufacturing work.
On-location picture shoots for a single mannequin may price upwards of six figures. They require bodily prototypes, skilled studio setups with exact lighting rigs, and complicated logistics to move autos between places for various environmental pictures. Past the manufacturing prices, the true bottleneck emerged within the validation course of: ensuring every asset aligned with its model’s distinctive voice and visible pointers earlier than it may attain the market.
What if Volkswagen may generate photorealistic car photographs in minutes as an alternative of weeks? The potential was clear—quicker time-to-market, dramatic price reductions, and the power to create personalised content material at scale. However for a premium automotive model, there was a non-negotiable constraint: each generated picture needed to be indistinguishable from skilled images and completely aligned with model pointers.
The problem prolonged past technical accuracy. Every of the Group’s ten manufacturers has its personal visible language: the understated class of Bentley calls for totally different staging than the performance-focused aesthetic of Porsche or the accessible modernity of ŠKODA. Options would want to generate high-quality photographs and in addition systematically validate that every asset honored its model’s distinctive identification.
Producing on-brand car photographs at scale
Step one in Volkswagen’s generative AI journey was deceptively easy: may basis fashions (FMs) generate photorealistic photographs of their autos? Preliminary experiments with base diffusion fashions revealed two essential gaps. First, whereas these fashions may produce spectacular automotive imagery, they lacked many years of Volkswagen design language. The tiniest options matter: the precise texture of a grille mesh, the exact geometry of headlight housings, the precise wheel spoke patterns for every mannequin line. The fashions would generate a Volkswagen, however with generic wheels and grille patterns that didn’t match an precise mannequin 12 months. Second, base fashions had no information of unreleased autos. They couldn’t generate photographs of subsequent 12 months’s fashions nonetheless underneath growth, severely limiting their utility for forward-looking advertising campaigns.
The answer required fine-tuning basis fashions on Volkswagen’s proprietary visible property. Working with SolidMeta, the group used DreamBooth fine-tuning methods with coaching information collected from digital twins in NVIDIA Omniverse. The next diagram illustrates this course of for the Volkswagen Tiguan. DreamBooth coaching works in two elements: first, the mannequin learns from VW Tiguan photographs paired with a singular identifier token [VW Tiguan] that teaches it this particular car. Second, the mannequin trains on generic automobile photographs to protect its normal capabilities and assist stop overfitting to the coaching set.
With this method, we will generate high-quality coaching information with exact management over car specs and environmental situations. The group deployed the Flux.1-Dev diffusion mannequin enhanced with a LoRA (Low-Rank Adaptation) adapter on an Amazon SageMaker AI endpoint. This method allowed them to specialize the mannequin’s understanding of the VW design language, right down to grille textures and particular trim choices, whereas sustaining the bottom mannequin’s normal picture era capabilities.
The structure used the managed infrastructure of Amazon SageMaker AI for each coaching and inference. The personalized mannequin was deployed to Amazon SageMaker AI endpoints configured for asynchronous inference on ml.g5.2xlarge GPU situations dealing with the computationally intensive diffusion course of. The group configured the pipeline for asynchronous inference with automated scaling, permitting it to deal with variable workloads effectively.
However producing photographs required greater than a fine-tuned mannequin, it required the proper prompts. The group shortly found that efficient prompts for automotive advertising imagery required specialised vocabulary and elegance modifiers that almost all customers lacked. A advertising group member may enter “silver VW in a forest,” however producing model compliance-aligned imagery required much more specificity: lighting situations, digicam angles, environmental particulars, and exact descriptions of auto options.
To bridge this hole, Volkswagen carried out an automatic immediate optimization system utilizing Amazon Nova Lite. Earlier than every picture era request, Nova Lite helps improve the person’s enter immediate, increasing it with brand-appropriate particulars, technical specs, and stylistic parts drawn from VW’s advertising pointers. A easy immediate turns into a complete description that guides the diffusion mannequin towards model compliance-aligned outputs.
The fine-tuned mannequin generated photographs with correct grille textures, right wheel designs particular to every trim degree, and correct car proportions distinctive to every VW model. The immediate optimization facilitated consistency in type and tone throughout totally different customers and use circumstances. Advertising and marketing groups may now generate high-quality car renderings faster – together with for unreleased fashions that may have been not possible to visualise with conventional strategies.
However a brand new problem emerged: at scale, how do you validate that each generated picture meets Volkswagen’s exacting requirements? Guide inspection of every picture wasn’t possible when producing tons of or 1000’s of variations. The group wanted an automatic high quality management system that might consider photographs with the identical precision as a human model knowledgeable—and do it at machine velocity.
Automated high quality management – component-level analysis
The group’s first intuition was to leverage established picture high quality metrics like PSNR (Peak Sign-to-Noise Ratio) and SSIM (Structural Similarity Index). These metrics shortly proved insufficient. They evaluated complete photographs together with backgrounds, making it not possible to isolate the car itself. Extra critically, they couldn’t determine which particular elements had been improper. A generated picture may rating acceptably whereas having an incorrect grille sample or improper wheel design—exactly the small print that matter most. The numerical scores typically didn’t align with human notion: photographs that appeared clearly improper to specialists may rating properly on conventional metrics.
The group wanted a unique method: consider autos the way in which human specialists do, by analyzing particular person elements with detailed standards particular to automotive design.
The answer mixed laptop imaginative and prescient segmentation with vision-language fashions (VLMs) as automated judges. The method begins by breaking down each reference images and generated photographs into particular person elements: wheels, grille, headlights, windshield, mirrors, doorways, bumpers, and logos. The next actual, photographic photographs of the Volkswagen Tiguan present this segmentation from 4 commonplace angles with bounding containers highlighting every element utilizing a zero-shot picture segmentation mannequin.
The next determine exhibits the identical course of utilized to a generated picture:
This segmentation makes use of the open supply Florence-2 mannequin, hosted on an Amazon SageMaker AI endpoint. With this, the group may specify precisely which elements to detect slightly than counting on generic object detection. To deal with occasional errors, the pipeline contains a big language mannequin (LLM)-aided verification step utilizing Nova Lite to verify every extracted section matches its meant label. After the elements are segmented and paired, they’re introduced side-by-side for analysis as proven within the following determine.
The group developed component-specific standards: for wheels, this contains spoke design, middle cap particulars, and rim profile; for grilles, it covers form, texture, and emblem positioning; for headlights, it evaluates housing, trim, and inner construction. Claude 4.5 Sonnet on Amazon Bedrock acts because the VLM choose, making use of these standards to every element pair. The mannequin receives a calibration information defining scores from 1 (apparent flaws seen to informal viewers) to five (no variations detectable even by specialists). Claude evaluates every criterion individually with detailed reasoning. The next determine demonstrates this for a headlight analysis.
Housing and trim obtain good 5/5 scores, however inner construction receives 4/5, with the reason: “the AI-generated picture exhibits extra element within the inner construction, which could not be correct based on the supplied reference picture.” This granular suggestions supplies precisely what Volkswagen wanted—particular, actionable insights about the place generated photographs deviate from reference specs.
The pipeline is orchestrated via AWS Step Features, with Amazon S3 offering storage for reference photographs, generated outputs, and analysis outcomes. The system can combination scores throughout a number of photographs to determine systematic points—for instance, discovering that sure angles constantly rating decrease, indicating a necessity for extra coaching information.
This component-based method solved the technical accuracy problem. However facilitating product correctness was solely half the battle. Volkswagen additionally wanted to validate that generated photographs honored every model’s distinctive identification and advertising pointers.
Facilitating model guideline compliance alignment
Element-level accuracy solved the technical problem of whether or not a generated grille or wheel matched specs. However Volkswagen’s model requirements lengthen far past technical correctness. Every of the Group’s ten manufacturers has fastidiously crafted pointers governing every part from coloration palettes and lighting situations to environmental contexts and emotional tone. A technically good picture of a Porsche may nonetheless violate model pointers if staged incorrectly or lit inappropriately.
Volkswagen’s model identification emphasizes sensible, attainable settings with softer night golden hour tones. Photos ought to present autos in city streets, countryside roads, household driveways—not fantastical or overly stylized environments. The staging should really feel genuine: autos parked legally, positioned naturally, and introduced in ways in which align with the model’s values of high quality, reliability, and considerate engineering.
The complexity multiplies when contemplating regional variations. What’s compliant in a single trade could violate laws or cultural norms in one other. Contemplate advertising the trunk characteristic of the Volkswagen Touareg. In Sweden, native legislation requires a canine to be transported in a security harness or transport field. If the German advertising group makes use of a picture displaying a canine unfastened within the trunk, that content material is legally non-compliant in Sweden. Multiply this by 1000’s of micro-regulations throughout dozens of markets, and handbook evaluation turns into not possible to scale.
The group developed an LLM-based model guideline analysis system to systematically assess these subjective parts. The method makes use of Claude 4.5 Sonnet on Amazon Bedrock, offering it with each the generated picture and Volkswagen’s complete model pointers as context. The mannequin evaluates a number of dimensions: model identification and design language, coloration illustration, picture type and tone, car presentation, staging and setting, perspective and focal size, and compliance with regional laws. The next determine exhibits an instance of a model compliance evaluation.
In contrast to the element analysis system that compares in opposition to reference photographs, this analysis is criteria-based. The mannequin assesses whether or not the picture honors brand-specific parts like Volkswagen’s signature coloration palette, whether or not the emotional tone is “disarmingly sincere, genuinely human, and surprisingly empathetic” for story-driven photographs, and whether or not the staging feels genuine slightly than overly produced.
The system proved notably useful for catching regional compliance points that may be almost not possible to determine manually. In a single instance, the system evaluated a picture meant for UK trade localization. Whereas the picture efficiently confirmed a right-hand drive car in a British city setting, the model guideline analysis flagged a essential subject. The next photographs present an instance of regional compliance analysis.
The mannequin assigned a 2/5 rating to “Logos and License Plates,” explaining that the license plate used a European continental type and recognized it as a German plate beginning with “WOI”. This element would instantly sign to UK clients that the picture wasn’t correctly localized. This type of delicate inconsistency may undermine the authenticity that Volkswagen works so laborious to keep up, but may go unnoticed in a handbook evaluation of tons of of photographs.
By combining component-level technical analysis with model guideline compliance checking, Volkswagen created a complete high quality management system. Generated photographs are routinely filtered for each accuracy and model alignment earlier than reaching advertising groups. The system supplies detailed suggestions on each dimensions, permitting groups to shortly determine which photographs meet the requirements and perceive precisely why others don’t.
The group acknowledged a possibility to go additional. Might they fine-tune the analysis fashions themselves to raised align with Volkswagen’s particular model experience? Might they train the AI judges to assume extra like Volkswagen’s personal advertising specialists?
Steady enchancment – customizing Nova Professional for model analysis
The model guideline analysis system utilizing Claude 4.5 Sonnet supplied robust outcomes, however the group noticed a possibility to go additional. Might they customise a basis mannequin particularly for Volkswagen’s model requirements, instructing it to judge photographs the way in which the corporate’s personal advertising specialists would?
One method is Supervised Effective-Tuning (SFT), however this sometimes requires 1000’s of labeled examples. Getting advertising analysts at Volkswagen Group to manually label 1000’s of photographs could be impractical and costly. The group wanted a extra environment friendly resolution.
Their perception was to make use of the model pointers themselves to generate artificial coaching information. Utilizing an LLM, they generated 1,000 picture prompts designed to provide model compliance-aligned photographs and 1,000 prompts crafted to violate particular model pointers. The next illustration exhibits a compliance-aligned and non-compliance-aligned immediate instance as a part of the artificial coaching information era course of.
As a result of the group knew which prompts had been compliance-aligned and which weren’t based mostly on how they had been constructed, they may routinely generate the corresponding analysis textual content for every picture as proven within the following illustration. This created full SFT coaching pairs: picture enter paired with the proper model analysis output.
With this artificial dataset, the group used the Amazon Nova Mannequin Customization SFT recipe operating on Amazon SageMaker Coaching Jobs to customise Nova Professional. The next determine illustrates this course of finish to finish.
The fine-tuning course of taught the mannequin to acknowledge and articulate model compliance points particular to Volkswagen’s pointers, from coloration palette adherence to environmental authenticity to regional regulatory necessities. The personalized mannequin’s reasoning grew to become extra exact, referencing Volkswagen-specific design language and model values in its evaluations.
This method presents a scalable path ahead. The identical method might be prolonged to every of Volkswagen Group’s ten manufacturers, customizing analysis fashions for the distinctive voice and pointers of Porsche, Audi, ŠKODA, and others. As the corporate generates extra real-world information and gathers suggestions from advertising groups, these personalized fashions might be constantly refined, creating an analysis system that may develop extra aligned with model experience over time.
“By combining our area experience with AWS, we constructed a generative AI platform that makes our advertising quicker, smarter, and safer.”
– Sebastian Angersbach, Head of IT Technique & Innovation, Volkswagen Group Providers
Acknowledgements
Particular because of Egor Krasheninnikov, Satyam Saxena, and Huong Vu for his or her invaluable contributions and steerage.
Concerning the Authors
Liam Byrne
Liam Byrne is an AI/ML Scientist on the AWS Generative AI Innovation Heart, the place he collaborates with enterprise clients to construct bespoke AI options that handle complicated enterprise challenges and drive worth creation. He has a selected curiosity in multimodal visible understanding, customized fashions, and AI Brokers.
Kim Robins
Kim Robins is a Senior Generative AI Strategist at AWS the place he’s guiding clients on navigating complicated and rising AI matters resembling Bodily AI, ADAS and Agentic Options that drive outsized enterprise outcomes.
Philipp Hinderberger
Philipp Hinderberger is a Senior Generative AI & ML Resolution Architect at AWS, the place he companions with automotive and manufacturing clients to design and implement AI options that remedy complicated enterprise challenges. His work focuses on AI brokers and mannequin customization.
Sebastian Angersbach
Sebastian Angersbach is Head of IT Service Technique and Innovation at Volkswagen Group Providers, the place he leads the groups for AI Options and Low‑Code Options and heads the AI Hub. He drives the strategic adoption and scaling of synthetic intelligence, particularly agentic ideas, throughout the VW Group.
Philip Trempler
Philip Trempler is an AI & Cloud Technique Lead at Volkswagen Group Providers, chargeable for the strategic partnership with AWS. He drives structure selections, interprets superior cloud and AI capabilities into scalable enterprise options, and bridges know-how management with enterprise influence.
Weiran Zhang
Weiran Zhang is a part of the AI Hub and IT Service Technique & Innovation at Volkswagen Group Providers. He focuses on prototyping, growing, and implementing AI-driven options, translating rising applied sciences into sensible, scalable purposes.

