Microsoft released three specialised artificial intelligence (AI) models on Thursday, focusing on image generation, voice generation, and speech-to-text transcription. The Redmond-based tech giant claims that these models outperform specialised models from rival companies such as Google and OpenAI. The models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, are also said to focus on fast generation and competitive pricing. They are currently available via Microsoft Foundry and are also being rolled out to various consumer products.
Microsoft Brings Three New AI Models
In a newsroom post, the tech giant announced the three new AI models. All of them are currently available via Microsoft Foundry and the MAI Playground. The biggest highlight is MAI-Transcribe-1, which the company claims delivers state-of-the-art (SOTA) speech-to-text transcription across the 25 most widely used languages.
These claims are based on Microsoft's internal testing on the FLEURS benchmark, where the model is said to outperform Gemini 3.1 Flash and GPT-Transcribe in error rate. Additionally, the company says Foundry users will find that it offers the "best price-performance of any large cloud provider."
Coming to MAI-Voice-1, the model is said to generate "natural, realistic speech, rich with nuance, emotional range, and expression." It is also said to maintain a consistent voice and speaker identity across long-form content. Within Foundry, the model will also let users create a custom voice from just a few seconds of audio.
Microsoft claims that this process is safe and secure. The model is said to generate 60 seconds of audio in a single second. Notably, it will also power Copilot Audio Expressions and Copilot Podcasts.
Finally, the MAI-Image-2 model builds on the capabilities of its predecessor and is said to deliver improved output quality at a faster speed. Microsoft revealed that the model was created in collaboration with photographers, designers, and visual storytellers, and that it focuses on natural lighting, accurate textures, and clear in-image text. Notably, WPP is among the first enterprise partners to have adopted the model.
Like the other two, the model is available via Microsoft Foundry and the MAI Playground. It is also rolling out to Copilot, Bing, and PowerPoint.

