Microsoft is doubling down on AI models that are not large language models. The company announced on Thursday that it is releasing three new models: brand-new models for voice and text transcription, and the second generation of its in-house image model.
The voice and text transcription models are the first of their kind from Microsoft. The transcription model can convert recordings into text in 25 different languages. It’s built for video captioning, meeting transcription and voice agents. The voice model can create audio recordings up to 60 seconds long. The company says its second-generation image model generates images faster and produces more lifelike depictions than its predecessor. All three models are available now in Microsoft’s Foundry and MAI playground, and the company plans to bring MAI-Image-2 to Bing and PowerPoint. Pricing information is available for developers.
These new models are a clear sign that Microsoft is looking to expand its offerings across the AI market. Microsoft’s Copilot is one of the most popular chatbots for businesses, especially those that already use Microsoft’s Office 365 suite and Azure cloud service. Aside from the now-outdated original image model, Microsoft has primarily focused on text-based models, trying to distinguish itself among its many competitors as a secure, enterprise-friendly option. Its newest AI tools, Copilot Cowork and Copilot Health, are evidence of that.
The models are also a reminder that Microsoft, as a legacy tech company, has the cash and compute to burn on these kinds of “side quests,” which even billion-dollar startups like OpenAI can’t always afford. Last week, OpenAI confirmed that it will discontinue its Sora AI video app, saying it will refocus on core activities. The AI industry in 2026 has been aiming to prove its tools are useful in the workplace, especially with Anthropic’s Claude Code leapfrogging the competition.
Generative media models, like those that power AI image and video generation, require a lot of compute and energy to run, resources that could be spent elsewhere. Google, another legacy tech company with billions of its budget allocated to AI research, indicated this week that it won’t be giving up on generative media but will be trying to make models more cost- and energy-efficient, as with its new Veo 3.1 Lite video model.

