Ship hyperpersonalized viewer experiences with an agentic AI film assistant utilizing Amazon Nova Sonic 2.0Recommendation methods are the spine of recent media streaming providers, shaping how customers uncover content material. Conventional machine studying (ML) methods use collaborative or content-based filtering to foretell content material preferences. Nevertheless, they usually miss context-dependent wants, similar to time of the day, temper, or social settings. For instance, after watching ‘The Shawshank Redemption,’ a system would possibly recommend extra jail dramas, ignoring that the person would possibly need one thing lighter to unwind. A hybrid strategy addresses this hole by combining conventional machine studying pattern-recognition capabilities with generative AI’s contextual understanding and conversational talents. Agentic AI takes this additional by participating customers by way of dynamic dialogue and reasoning about viewing context. These advice brokers synthesize data from a number of sources—plot summaries, critiques, viewing historical past—and incorporate real-time person suggestions. Customers can ask about particular scenes or themes, and the agent offers contextual explanations. This creates an expertise like consulting a educated curator who understands each content material and particular person preferences.
On this put up, we stroll by way of two use circumstances that assist improve the person viewing expertise. First, think about telling the AI agent that you really want one thing enjoyable after a protracted day, and getting suggestions that match how you’re feeling and never solely what you’ve watched earlier than. Second, image pausing midmovie to ask: “who’s that actor?” or “summarize what simply occurred?” and getting an on the spot reply. Constructing this conversational assistant requires orchestrating real-time speech processing, context administration, device invocation, and curated responses. This can be a advanced problem that we may help streamline utilizing agentic AI instruments and frameworks together with Strands Brokers SDK, Amazon Bedrock AgentCore, and Amazon Nova Sonic 2.0. This agentic AI system makes use of a Mannequin Context Protocol (MCP) to ship a private leisure concierge that understands person preferences by way of pure dialogue. We share the code samples for this software within the GitHub repository.
Structure
The answer structure focuses on 1/ film advice and a couple of/ film scene evaluation. We elaborate on each these flows in higher element within the subsequent sections of this put up.
Consumer interplay workflow
- Consumer authenticates with the online UI that’s hosted as a static web site on Amazon Easy Storage Service (Amazon S3) and serves by way of Amazon CloudFront with Amazon Cognito.
- A WebSocket connection is established from the shopper to the server hosted in AWS Fargate uncovered utilizing an Amazon CloudFront endpoint. The session communications between shopper and servers are carried out by way of this connection. WebSocket connections require JWT token validation at connection time. The session communications between shopper and servers are carried out by way of this connection.
- AWS Fargate server validates the incoming connection, instantiates a session with Amazon Nova Sonic 2.0 for bidirectional streaming communications with the server.
- Consumer voice instructions are despatched to the Amazon Nova Sonic mannequin by way of the established WebSocket connection. The Fargate container makes use of a bidirectional Smithy streaming RPC protocol to speak with the Nova Sonic mannequin. Responses from the mannequin are processed by the container.
- AWS Fargate container manages the device occasions from Nova Sonic and initiates an agentic workflow by utilizing the MCP server to course of person requests. Amazon Bedrock AgentCore Gateway helps rework AWS Lambda capabilities into MCP-compatible instruments for the agent.
- AWS Lambda makes use of Amazon Nova understanding fashions (micro, lite, professional) for processing, with OpenSearch and S3 Vector serving because the semantic search and storage layers. Outcomes are returned to the server by way of Amazon Bedrock AgentCore Gateway.
- AWS Fargate sends the response to Amazon Nova Sonic for the ultimate voice response formulation. The voice response is streamed to the online UI by way of the WebSocket connection.
Pure speech person interface
We use Amazon Nova Sonic 2.0, our newest speech-to-speech mannequin that delivers real-time, human-like voice conversations with low latency. This offers a person expertise with fluid exchanges that really feel genuinely conversational, serving to rework AI interactions from inflexible Q&A periods into dynamic, productive dialogues. With asynchronous help for job completion, you possibly can keep fluid dialogue whereas processing advanced duties within the background throughout lively conversations. Lastly, Nova Sonic 2.0 natively helps each textual content and streaming speech inputs, providing you with flexibility in the way you work together with the AI assistant. It’s also possible to outline the persona of the AI assistant by offering a system immediate at the start of the dialog. The power to manage the assistant’s persona makes certain that responses can keep on-brand and inside applicable boundaries, serving to to guard your service’s fame. We share some finest practices on creating efficient system prompts with Nova Sonic to assist maximize the outcomes. The entire system immediate outlined in our resolution will be discovered on this module.
Preprocessing workflow
The next diagram illustrates the offline processes that generate key insights from title catalog knowledge, film scenes, and film scripts. These insights help the film personalization and scene evaluation workflows in our resolution and function the foundational information for the film assistant agent.
To showcase the film personalization function, we created 500 pattern films to symbolize the catalog. The film’s metadata, together with title, style, and outline are transformed into an embedding, a numerical illustration that captures its which means. This permits semantic search, the place queries are matched primarily based on which means quite than actual key phrases. Different metadata together with forged members and launched dates are saved as attributes in the identical index inside an Amazon OpenSearch Service cluster with Amazon Easy Storage Service (Amazon S3) Vector because the storage layer. This index is used to energy the hybrid and seek for film advice workflow described within the earlier part.
To allow the scene evaluation function with excessive accuracy, we break up the processing of the media content material into two steps. First, we use Amazon Bedrock Knowledge Automation to extract key insights from the video content material. The insights embrace chapter stage abstract and the corresponding timecodes, transcriptions, audio segments and extra. Moreover, we use the movie star recognition function in Amazon Rekognition to establish celebrities showing within the chapters. Second, we use the embeddings generated from the film scripts extracted by way of Amazon Bedrock Knowledge Automation for semantic similarity search. These embeddings are the muse for which the agent makes use of to seek out probably the most semantically related moments inside the script that match the given film scene abstract. We offer extra particulars on this within the later part.
Film advice stream
The next person interplay demonstrates a content material personalization workflow in additional element:
Referencing the earlier diagram, when a person asks for a film advice, Amazon Nova Sonic acknowledges the person’s intent and triggers the suitable device to deal with the film advice requests. A Lambda perform is triggered utilizing AgentCore Gateway to course of the request. The perform first retrieves the person’s affinity from the DynamoDB desk to raised perceive the person’s profile. This desk represents a customized profile of every person’s preferences, tastes, and viewing patterns. For example, if the person has watched the Harry Potter collection previously, the system may assign greater affinity in the direction of fantasy and journey genres. Combining person affinity and the person question, we course of the request in a number of giant language mannequin (LLM) calls chained in a sequence. First, an LLM classifies the kind of search primarily based on the intent of the question. The classification helps decide the suitable search question to make use of. For example, basic film suggestions, direct film search, film quotes, or one thing utterly unrelated to films. We use Amazon Nova Micro for this immediate given its value efficiency profit. The immediate that extracts the intent classification, and different metadata will be discovered on this code pattern.
Subsequent, we ship the person question to a different LLM to rewrite it to supply a richer, extra related search question that can be utilized for semantic search towards the film catalog knowledge. For example, the next person question:
I’m on the lookout for some enjoyable films, what do you suggest?
might be rewritten to:
Enjoyable and entertaining films that supply humor, pleasure, or pleasurable storytelling
By means of our inner testing, we discovered that utilizing Amazon Nova Lite produced the structured response in probably the most price optimized method. The entire question rewrite immediate snippet is discovered within the following code pattern.
The output from the question to rewrite and intent classification prompts are used as parameters to an Amazon OpenSearch Service search question. We convert the rewritten question right into a 1024-dimension vector utilizing Nova embeddings for semantic search. Moreover, our search question incorporates recency and recognition boosting to raise newer, prime performing reveals within the advice rankings. An instance search question snippet can discovered within the _get_titles_from_query_v2 perform within the following OpenSearch module.
The outcomes from the pattern search question returned 30 related films. Lastly, we use the Amazon Rerank mannequin to re-rank the really useful films primarily based on the search outcomes and the rewritten person question to return the highest three most related films. This course of repeats all through the person session, with the dialog historical past being a part of the context to the earlier LLM chain to additional assist enrich the person expertise. Right here’s a screenshot of an instance for a film advice question:
Common advice request
Q: “I’m on the lookout for some enjoyable films, what do you suggest?”
Q: “Do you’ve one thing more moderen?”
Direct Film Search
Q: “I’m on the lookout for a film referred to as ‘Tears of Metal’, do you’ve this film?”
Film scene evaluation stream
Like the advice stream, right here we define a scene evaluation workflow enabled by way of the identical parts. Think about you needed to take a break and missed a couple of minutes of your favourite present, this assistant will offer you a abstract. It will possibly additionally offer you an in depth evaluation of a scene, together with the actors and what’s taking place within the scene.
When a person pauses the film to ask a query, the applying captures the related metadata similar to the present timecode and film title, then shops this data in an Amazon DynamoDB desk for later use. For instance, if the person asks, “Are you able to inform me what’s taking place on this scene?”, the applying references the person watch log to find the newest state and the film they’re watching. The scene evaluation is dealt with by a device triggered by Amazon Nova Sonic primarily based on the contextual understanding of the person dialog. We course of the scene evaluation request in a number of LLM calls chained in a sequence. First, we use Amazon Nova Micro with a immediate to categorise the intent of the scene evaluation primarily based on the person question. The immediate used for this job will be discovered within the movieScene_classfier perform within the film scene assistant module.
Based mostly on the intent classification performance, we set off the suitable workflow to course of scene particulars. For this part, let’s deal with the scene stage element. Utilizing the retrieved person watch log, we extract the chapter abstract, transcription, and recognized celebrities matching the given timecodes. The film insights together with scene element and film scripts, are processed utilizing Amazon Bedrock Knowledge Automation and saved in an Amazon OpenSearch Serverless assortment for semantic search and filters. We optionally embrace earlier chapter data when the person watch log is located at the start of a chapter in order that we are able to present higher context to assist enrich the scene evaluation. Subsequent, we use the extracted scene particulars to seek out probably the most semantically related segments from the film script and use the script element to supply the enriched scene understanding. To show this course of, let’s contemplate a scene extracted between 5:15-5:40 from “Tears of Metal” as follows:
Based mostly on the insights generated by Amazon Bedrock Knowledge Automation, the chapter abstract for the earlier scene is proven as follows:
In a historic European metropolis, a big robotic crashes by way of rooftops, inflicting destruction. A person with a cybernetic eye watches the robotic’s actions from a balcony. He communicates with somebody off-screen, saying “Get her to return in” and “2 minutes left.” He then urges, “Pace it up, bud,” indicating a time-sensitive operation. One other man with a rifle seems, positioned to have interaction the robotic. The scene reveals a navy or tactical operation in progress, with the robotic as the first risk. The cybernetic-eyed man screens a countdown timer, suggesting they’re getting ready for a confrontation with the robotic.n[spk_0]: Get her to return in. 2 minutes left. 2 minutes. Pace it up, bud.
Utilizing the earlier scene abstract, we carry out a semantic search towards the Amazon OpenSearch index managed by Amazon Bedrock Data Bases. The info supply accommodates the film script preprocessed with a piece dimension of 300 tokens coupled with 10% overlap. We re-rank the retrieved paperwork utilizing the Amazon Rerank mannequin to pick the highest phase for use as the ultimate illustration of the scene description. Right here’s a pattern script phase derived from the reranking course of:
It seems to be round after which spots Barley within the tower. BARLEY: (Into radio) OK, they’re coming. Two minutes left! 14. INT. CHURCH/BUNKER. MORNING The director listening to Barley over his radio. BARLEY: (CONT.) (By means of radio) Two minutes. DIRECTOR: (By means of radio to loudspeaker) Pace it up Thom! Vivacissimo! 15. EXT. STREET. DAY The recreation of Thom and Celia’s breakup resumes, Celia is now extra threatening than she was initially, now not upset a lot as intimidating. CELIA: Why don’t you simply admit that you simply’re freaked out by my robotic hand? Extra phrases seem within the sky for Thom to learn “Celia, I like your robotic hand”, though he decides to disregard them. 6. THOM: Pay attention, Celia, I used to be younger. However that’s no motive to destroy the world. Celia continues to stare menacingly at Thom, opening and shutting her robotic hand. 16. INT. CHURCH/BUNKER. MORNING Director and Assistant in workplace as they watch the scene unfold. ASSISTANT: Why does he do that? DIRECTOR: We already tried that one! 17. EXT. STREET. DAY.’
Lastly, we use Amazon Nova Micro to summarize the earlier scene data. The response is reworked to pure speech by Amazon Nova Sonic and streams again to the person to finish the interplay. Right here’s the ultimate narrative produced by Amazon Nova Sonic:
On this scene, Barley is counting down the seconds from atop a tower, creating a way of urgency. Beneath, in a bunker, the director is giving orders to hurry up the motion. On the road, Thom and Celia are going through one another with rigidity. Celia has a menacing robotic hand, and Thom admits his previous errors. The director and his assistant are watching the scene unfold, with the director remaining centered regardless of the assistant’s confusion.
Right here’s an end-to-end walkthrough of the answer offered at IBC2025, a number one annual commerce present for professionals within the media, leisure, and expertise sectors.
Conclusion
On this put up, we showcased a conversational AI agent that understands and responds in pure voice interplay to assist customers uncover films and TV reveals in a customized approach, whereas offering real-time insights throughout viewing. The system analyzes the customers’ particular person viewing patterns and the watch historical past to create customized profiles that drive related suggestions. For instance, a person who watches loads of motion films can obtain motion associated film suggestions when requested a generic theme query like “enjoyable films”. You may work together with the AI by way of voice instructions to obtain tailor-made content material ideas, entry detailed details about actors, and get on the spot explanations about particular scenes in films you’re at present watching. The answer showcases the flexibility of Nova Sonic 2.0 to know pure language, carry out semantic information base searches, handle playlists, and keep context all through multi-turn conversations. This strategy represents a major leap ahead from implicit feedback-based suggestions to express, conversational desire gathering. This in the end helps to create a extra participating and intuitive content material discovery expertise that may drive greater person engagement and repair retention.
Acknowledgement
This put up is the product of a unbelievable collaborative effort. Big because of Arturo Velasco, Daryl Cartwright, George Vasels, Juan Andres Caycedo, and Vince Palazzo for pouring their time and experience into the weblog, demo, and code samples. This wouldn’t exist with out them.
Concerning the authors
Wei Teh
Wei is a Machine Studying Options Architect at AWS, the place he companions with clients to drive enterprise outcomes by way of generative AI and agentic AI options. Outdoors of labor, he enjoys exploring the outside along with his household.
Tulip Gupta
Tulip Gupta is a Principal Options Architect at Amazon Net Companies the place she serves as each a technical chief and strategic advisor to AWS Media and leisure clients. She focuses on making use of AI and machine studying improvements inside the Media and Leisure business, serving to organizations leverage superior applied sciences to remodel content material creation, supply, and viewers experiences.

