Stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime now enable interactive, multi-turn agent workflows that were previously impossible with stateless implementations. Developers building AI agents often struggle when their workflows must pause mid-execution to ask users for clarification, request large language model (LLM)-generated content, or provide real-time progress updates during long-running operations; stateless MCP servers can't handle these scenarios. This release addresses those limitations by introducing three client capabilities from the MCP specification:
- Elicitation (request user input mid-execution)
- Sampling (request LLM-generated content from the client)
- Progress notifications (stream real-time updates)
These capabilities transform one-way tool execution into bidirectional conversations between your MCP server and clients.
Model Context Protocol (MCP) is an open standard defining how LLM applications connect with external tools and data sources. The specification defines server capabilities (tools, prompts, and resources that servers expose) and client capabilities (features clients offer back to servers). While our previous launch focused on hosting stateless MCP servers on AgentCore Runtime, this new capability completes the bidirectional protocol implementation. Clients connecting to AgentCore-hosted MCP servers can now respond to server-initiated requests. In this post, you'll learn how to build stateful MCP servers that request user input during execution, invoke LLM sampling for dynamic content generation, and stream progress updates for long-running tasks. You will see code examples for each capability and deploy a working stateful MCP server to Amazon Bedrock AgentCore Runtime.
From stateless to stateful MCP
The original MCP server support on AgentCore used stateless mode: each incoming HTTP request was independent, with no shared context between calls. This model is simple to deploy and reason about, and it works well for tool servers that receive inputs and return outputs. However, it has a fundamental constraint: the server can't maintain a conversation thread across requests, ask the user for clarification in the middle of a tool call, or report progress back to the client as work happens.
Stateful mode removes that constraint. When you run your MCP server with stateless_http=False, AgentCore Runtime provisions a dedicated microVM for each user session. The microVM persists for the session's lifetime (up to 8 hours, or 15 minutes of inactivity per the idleRuntimeSessionTimeout setting), with CPU, memory, and filesystem isolation between sessions. The protocol maintains continuity through an Mcp-Session-Id header: the server returns this identifier during the initialize handshake, and the client includes it in every subsequent request to route back to the same session.
The following table summarizes the key differences:

| | Stateless mode | Stateful mode |
|---|---|---|
| stateless_http setting | TRUE | FALSE |
| Session isolation | Dedicated microVM per session | Dedicated microVM per session |
| Session lifetime | Up to 8 hours; 15-min idle timeout | Up to 8 hours; 15-min idle timeout |
| Client capabilities | Not supported | Elicitation, sampling, progress notifications |
| Recommended for | Simple tool serving | Interactive, multi-turn workflows |
When a session expires or the server is restarted, subsequent requests with the old session ID return a 404. At that point, clients must re-initialize the connection to obtain a new session ID and start a fresh session. The configuration change to enable stateful mode is a single flag in your server startup:
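The recovery pattern can be sketched in a few lines. This is a hypothetical helper, not part of any SDK: `send` and `initialize` are placeholder callables standing in for whatever HTTP client you use.

```python
def call_with_reinit(send, initialize, session_id):
    """Send a request with the stored Mcp-Session-Id; if the server answers
    404 (expired or unknown session), initialize a fresh session and retry
    once with the new session ID."""
    status, body = send(session_id)
    if status == 404:
        # Re-run the initialize handshake to obtain a new Mcp-Session-Id
        session_id = initialize()
        status, body = send(session_id)
    return status, body, session_id
```

Note that any in-progress tool state from the expired session is lost; the new session starts fresh.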
mcp.run(
    transport='streamable-http',
    host='0.0.0.0',
    port=8000,
    stateless_http=False  # Enable stateful mode
)
Beyond this flag, the three client capabilities become available automatically once the MCP client declares support for them during the initialization handshake.
The three new client capabilities
Stateful mode brings three client capabilities from the MCP specification. Each addresses a different interaction pattern that agents encounter in production workflows.
Elicitation enables a server to pause execution and request structured input from the user through the client. The tool can ask targeted questions at the right moment in its workflow, gathering a preference, confirming a decision, or collecting a value that depends on previous results. The server sends an elicitation/create request with a message and an optional JSON schema describing the expected response structure. The client renders an appropriate input interface, and the user can accept (providing the data), decline, or cancel.
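On the wire, an elicitation exchange is plain JSON-RPC. The following sketch (field names follow the MCP specification; the id value and schema content are illustrative) shows roughly what the server sends and what an accepting client returns:

```python
import json

# Illustrative elicitation/create request for a single question.
elicitation_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "elicitation/create",
    "params": {
        "message": "How much did you spend?",
        "requestedSchema": {  # form mode: a flat JSON Schema object
            "type": "object",
            "properties": {"amount": {"type": "number"}},
            "required": ["amount"],
        },
    },
}

# An accept response carries the data; decline and cancel carry only the action.
accept_response = {"action": "accept", "content": {"amount": 45.50}}

print(json.dumps(accept_response))
```

FastMCP generates the requestedSchema for you from a Pydantic model, as the server example below shows.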
Sampling enables a server to request an LLM-generated completion from the client through sampling/createMessage. This is the mechanism that makes it possible for tool logic on the server to use language model capabilities without holding its own model credentials. The server provides a prompt and optional model preferences; the client forwards the request to its connected LLM and returns the generated response. Practical uses include generating personalized summaries, creating natural-language explanations of structured data, or producing recommendations based on prior conversation context.
Progress notifications allow a server to report incremental progress during long-running operations. Using ctx.report_progress(progress, total), the server emits updates that clients can display as a progress bar or status indicator. For operations that span multiple steps, for example, searching across data sources, this keeps users informed rather than staring at a blank screen.
All three capabilities are opt-in at the client level: a client declares which capabilities it supports during initialization, and the server must only use capabilities the client has advertised.
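The gating logic is simple: the server inspects the capabilities object the client sent in its initialize request and only uses features that appear there. A minimal sketch, assuming the capability keys defined by the MCP specification:

```python
def client_supports(capabilities: dict, feature: str) -> bool:
    """Return True if the client declared the given capability
    (for example 'elicitation' or 'sampling') during initialization."""
    return feature in capabilities

# A client that registered elicitation and sampling handlers, but nothing else
declared = {"elicitation": {}, "sampling": {}}
assert client_supports(declared, "sampling")
assert not client_supports(declared, "roots")
```

With FastMCP, simply registering a handler (elicitation_handler, sampling_handler, progress_handler) is what causes the corresponding capability to be declared.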
Elicitation: server-initiated user input
Elicitation is the mechanism by which an MCP server pauses mid-execution and asks the client to collect specific information from the user. The server sends an elicitation/create JSON-RPC request containing a human-readable message and a requestedSchema that describes the expected response. The client presents this as a form or prompt, and the user's response (or explicit decline) is returned to the server so execution can continue. The MCP specification supports two elicitation modes:
- Form mode: structured data collection directly through the MCP client. Suitable for preferences, configuration inputs, and confirmations that don't involve sensitive data.
- URL mode: directs the user to an external URL for interactions that should not pass through the MCP client, such as OAuth flows, payment processing, or credential entry.
The response uses a three-action model: accept (user provided data), decline (user explicitly rejected the request), or cancel (user dismissed without choosing). Servers should handle each case appropriately. The following example implements an add_expense_interactive tool that collects a new expense through four sequential elicitation steps: amount, description, category, and a final confirmation before writing to DynamoDB. Each step defines its expected input as a Pydantic model, which FastMCP converts to the JSON Schema sent in the elicitation/create request.
Server
The add_expense_interactive tool walks a user through four sequential questions before writing to Amazon DynamoDB. Each step defines its expected input as a separate Pydantic model, because the form mode schema must be a flat object. You could collect all four fields in a single model with four properties, but splitting them here gives the user one focused question at a time, which is the interactive pattern elicitation is designed for.
agents/mcp_client_features.py
import os
from pydantic import BaseModel
from fastmcp import FastMCP, Context
from fastmcp.server.elicitation import AcceptedElicitation
from dynamo_utils import FinanceDB

mcp = FastMCP(name='ElicitationMCP')
_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

class AmountInput(BaseModel):
    amount: float

class DescriptionInput(BaseModel):
    description: str

class CategoryInput(BaseModel):
    category: str  # one of: food, transport, bills, entertainment, other

class ConfirmInput(BaseModel):
    confirm: str  # Yes or No

@mcp.tool()
async def add_expense_interactive(user_alias: str, ctx: Context) -> str:
    """Interactively add a new expense using elicitation.

    Args:
        user_alias: User identifier
    """
    # Step 1: Ask for the amount
    result = await ctx.elicit('How much did you spend?', AmountInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    amount = result.data.amount

    # Step 2: Ask for a description
    result = await ctx.elicit('What was it for?', DescriptionInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    description = result.data.description

    # Step 3: Choose a category
    result = await ctx.elicit(
        'Choose a category (food, transport, bills, entertainment, other):',
        CategoryInput
    )
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    category = result.data.category

    # Step 4: Confirm before saving
    confirm_msg = (
        f'Confirm: add expense of ${amount:.2f} for {description}'
        f' (category: {category})? Answer Yes or No'
    )
    result = await ctx.elicit(confirm_msg, ConfirmInput)
    if not isinstance(result, AcceptedElicitation) or result.data.confirm != 'Yes':
        return 'Expense entry cancelled.'
    return db.add_transaction(user_alias, 'expense', -abs(amount), description, category)

if __name__ == '__main__':
    mcp.run(
        transport='streamable-http',
        host='0.0.0.0',
        port=8000,
        stateless_http=False
    )
Each await ctx.elicit() suspends the tool and sends an elicitation/create request over the active session. The isinstance(result, AcceptedElicitation) check handles decline and cancel uniformly at each step.
Client
Registering an elicitation_handler on fastmcp.Client is both how the handler is wired in and how the client advertises elicitation support to the server during initialization.
import asyncio
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Pre-loaded responses simulate the user answering each question in sequence
_responses = iter([
    {'amount': 45.50},
    {'description': 'Lunch at the office'},
    {'category': 'food'},
    {'confirm': 'Yes'},
])

async def elicit_handler(message, response_type, params, context):
    # In production: render a form and return the user's input
    response = next(_responses)
    print(f'  Server asks: {message}')
    print(f'  Responding: {response}\n')
    return response

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, elicitation_handler=elicit_handler) as client:
    await asyncio.sleep(2)  # allow session initialization
    result = await client.call_tool('add_expense_interactive', {'user_alias': 'me'})
    print(result.content[0].text)
Running this against the deployed server:

Server asks: How much did you spend?
Responding: {'amount': 45.5}
Server asks: What was it for?
Responding: {'description': 'Lunch at the office'}
Server asks: Choose a category (food, transport, bills, entertainment, other):
Responding: {'category': 'food'}
Server asks: Confirm: add expense of $45.50 for Lunch at the office (category: food)? Answer Yes or No
Responding: {'confirm': 'Yes'}
Expense of $45.50 added for me
The complete working example, including DynamoDB setup and AgentCore deployment, is available in the GitHub sample repository.
Use elicitation when your tool needs information that depends on previous results, is better collected interactively than upfront, or varies across users in ways that can't be parameterized in advance. A travel booking tool that first searches destinations and then asks the user to choose among them is a natural fit. A financial workflow that confirms a transaction amount before submitting is another. Elicitation isn't appropriate for sensitive inputs like passwords or API keys; use URL mode or a secure out-of-band channel for those.
Sampling: server-initiated LLM generation
Sampling is the mechanism by which an MCP server requests an LLM completion from the client. The server sends a sampling/createMessage request containing a list of conversation messages, a system prompt, and optional model preferences. The client forwards the request to its connected language model (subject to user approval) and returns the generated response. The server receives a structured result containing the generated text, the model used, and the stop reason.
This capability inverts the typical flow: instead of the client asking the server for tool results, the server asks the client for model output. The benefit is that the server doesn't need API keys or a direct model integration. The client keeps full control over which model is used, and the MCP specification requires a human-in-the-loop step where users can review and approve sampling requests before they're forwarded.
Servers can express model preferences using capability priorities (costPriority, speedPriority, intelligencePriority) and optional model hints. These are advisory; the client makes the final decision based on which models it has access to.
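As a sketch of what such a preference block looks like inside sampling/createMessage params (field names follow the MCP specification; the hint name and priority values are illustrative):

```python
# Illustrative sampling/createMessage params with advisory model preferences.
sampling_params = {
    "messages": [
        {"role": "user", "content": {"type": "text", "text": "Summarize these expenses."}}
    ],
    "modelPreferences": {
        "hints": [{"name": "claude-haiku"}],  # optional name hints, evaluated in order
        "costPriority": 0.8,          # 0-1: how much to favor cheaper models
        "speedPriority": 0.5,         # 0-1: how much to favor lower latency
        "intelligencePriority": 0.3,  # 0-1: how much to favor capability
    },
    "maxTokens": 300,
}
```

A client is free to ignore the hints entirely and substitute whatever model its user has approved.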
Server
The analyze_spending tool fetches transactions from DynamoDB, builds a prompt from the structured data, and delegates the analysis to the client's LLM via ctx.sample().
agents/mcp_client_features.py (added tool, same file as elicitation)
@mcp.tool()
async def analyze_spending(user_alias: str, ctx: Context) -> str:
    """Fetch expenses from DynamoDB and ask the client's LLM to analyze them.

    Args:
        user_alias: User identifier
    """
    transactions = db.get_transactions(user_alias)
    if not transactions:
        return f'No transactions found for {user_alias}.'
    lines = '\n'.join(
        f"- {t['description']} (${abs(float(t['amount'])):.2f}, {t['category']})"
        for t in transactions
    )
    prompt = (
        f'Here are the recent expenses for a user:\n{lines}\n\n'
        f'Please analyze the spending patterns and give 3 concise, '
        f'actionable recommendations to improve their finances. '
        f'Keep the response under 120 words.'
    )
    ai_analysis = 'Analysis unavailable.'
    try:
        response = await ctx.sample(messages=prompt, max_tokens=300)
        if hasattr(response, 'text') and response.text:
            ai_analysis = response.text
    except Exception:
        pass
    return f'Spending Analysis for {user_alias}:\n\n{ai_analysis}'
The tool calls await ctx.sample() and suspends. The server sends a sampling/createMessage request to the client over the open session. When the client returns the LLM response, execution resumes.
Client
The sampling_handler receives the prompt from the server and forwards it to a language model, in this example Claude Haiku on Amazon Bedrock. Registering the handler is also how the client declares sampling support to the server during initialization.
import json
import asyncio
import boto3
from mcp.types import CreateMessageResult, TextContent
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

MODEL_ID = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
bedrock = boto3.client('bedrock-runtime', region_name=region)

def _invoke_bedrock(prompt: str, max_tokens: int) -> str:
    body = json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': max_tokens,
        'messages': [{'role': 'user', 'content': prompt}]
    })
    resp = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(resp['body'].read())['content'][0]['text']

async def sampling_handler(messages, params, ctx):
    """Called by fastmcp.Client when the server issues ctx.sample()."""
    prompt = messages if isinstance(messages, str) else ' '.join(
        m.content.text for m in messages if hasattr(m.content, 'text')
    )
    max_tokens = params.maxTokens if params and hasattr(params, 'maxTokens') and params.maxTokens else 300
    text = await asyncio.to_thread(_invoke_bedrock, prompt, max_tokens)
    return CreateMessageResult(
        role='assistant',
        content=TextContent(type='text', text=text),
        model=MODEL_ID,
        stopReason='endTurn'
    )

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, sampling_handler=sampling_handler) as client:
    result = await client.call_tool('analyze_spending', {'user_alias': 'me'})
    print(result.content[0].text)
Running this against a user with four seeded expenses:

Spending Analysis for me:

Total Spending: $266.79

Breakdown:
- Food: $130.80 (49%)
- Bills: $120.00 (45%)
- Entertainment: $15.99 (6%)

3 Actionable Recommendations:
1. Meal prep at home — cook groceries into multiple meals to reduce restaurant spending and lower food costs by 20-30%.
2. Review entertainment subscriptions — audit all subscriptions and cancel unused services or share family plans.
3. Reduce energy costs — use programmable thermostats, LED bulbs, and unplug devices to lower electricity bills by 10-15% monthly.
Use sampling when your tool must produce natural-language output that benefits from a language model's capabilities. A tool that has collected a user's travel preferences and needs to generate a tailored trip itinerary narrative is a good example. Sampling isn't appropriate for deterministic operations like database queries, calculations, or API calls with well-defined outputs. We recommend that you use tool logic for those.
Progress notifications: real-time operation feedback
Progress notifications are events that a server sends during long-running operations to keep the client and the user informed about how much work has been completed. await ctx.report_progress(progress, total) emits a notifications/progress message and returns immediately. The server doesn't wait for a response; it's fire-and-forget in both directions. The client receives the notification asynchronously and can render a progress bar, log a status line, or use it to prevent the user from assuming the connection has stalled. The pattern is to call report_progress at each logical step of a multi-stage operation, with progress incrementing toward total.
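On the wire, each update is a notifications/progress message. A sketch of the payload (field names follow the MCP specification; the progressToken value is a placeholder that echoes the token the client attached to its original request):

```python
# Illustrative notifications/progress message for step 2 of 5.
progress_notification = {
    "jsonrpc": "2.0",
    "method": "notifications/progress",
    "params": {
        "progressToken": "req-42",  # placeholder; matches the requesting call
        "progress": 2,
        "total": 5,
    },
}

# A client can derive a percentage directly from these fields.
pct = int(progress_notification["params"]["progress"]
          / progress_notification["params"]["total"] * 100)
print(pct)  # 40
```

Because the message is a notification rather than a request, it carries no id and expects no reply, which is what makes it cheap to emit at every step.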
Server
The generate_report tool builds a monthly financial report in five steps, emitting a progress notification at the start of each one.
agents/mcp_progress_server.py
import os
from fastmcp import FastMCP, Context
from dynamo_utils import FinanceDB

mcp = FastMCP(name='Progress-MCP-Server')
_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

@mcp.tool()
async def generate_report(user_alias: str, ctx: Context) -> str:
    """Generate a monthly financial report, streaming progress at each stage.

    Args:
        user_alias: User identifier
    """
    total = 5

    # Step 1: Fetch transactions
    await ctx.report_progress(progress=1, total=total)
    transactions = db.get_transactions(user_alias)

    # Step 2: Group by category
    await ctx.report_progress(progress=2, total=total)
    by_category = {}
    for t in transactions:
        cat = t['category']
        by_category[cat] = by_category.get(cat, 0) + abs(float(t['amount']))

    # Step 3: Fetch budgets
    await ctx.report_progress(progress=3, total=total)
    budgets = {b['category']: float(b['monthly_limit']) for b in db.get_budgets(user_alias)}

    # Step 4: Compare spending vs budgets
    await ctx.report_progress(progress=4, total=total)
    lines = []
    for cat, spent in sorted(by_category.items(), key=lambda x: -x[1]):
        limit = budgets.get(cat)
        if limit:
            pct = (spent / limit) * 100
            status = 'OVER' if spent > limit else 'OK'
            lines.append(f'  {cat:<15} ${spent:>8.2f} / ${limit:.2f} [{pct:.0f}%] {status}')
        else:
            lines.append(f'  {cat:<15} ${spent:>8.2f} (no budget set)')

    # Step 5: Format and return
    await ctx.report_progress(progress=5, total=total)
    total_spent = sum(by_category.values())
    return (
        f'Monthly Report for {user_alias}\n'
        f'{"=" * 50}\n'
        f'  {"Category":<15} {"Spent":>10} {"Budget":>8} Status\n'
        f'{"-" * 50}\n'
        + '\n'.join(lines)
        + f'\n{"-" * 50}\n'
        f'  {"TOTAL":<15} ${total_spent:>8.2f}\n'
    )

if __name__ == '__main__':
    mcp.run(
        transport='streamable-http',
        host='0.0.0.0',
        port=8000,
        stateless_http=False
    )
Each await ctx.report_progress() is fire-and-forget: the notification is sent and execution moves immediately to the next step.
Client
The progress_handler receives progress, total, and an optional message each time the server emits a notification. Registering the handler is how the client declares progress support during initialization.
import logging
logging.getLogger('mcp.client.streamable_http').setLevel(logging.ERROR)

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

async def progress_handler(progress: float, total: float | None, message: str | None):
    pct = int((progress / total) * 100) if total else 0
    filled = pct // 5
    bar = '#' * filled + '-' * (20 - filled)
    print(f'\r  Progress: [{bar}] {pct}% ({int(progress)}/{int(total or 0)})',
          end='', flush=True)
    if total and progress >= total:
        print('  Completed!')

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, progress_handler=progress_handler) as client:
    result = await client.call_tool('generate_report', {'user_alias': 'me'})
    print(result.content[0].text)
As the server moves through its five stages, the client renders the bar in place:
Progress: [####—————-] 20% (1/5)
Progress: [########————] 40% (2/5)
Progress: [############——–] 60% (3/5)
Progress: [################—-] 80% (4/5)
Progress: [####################] 100% (5/5)  Completed!
Use progress notifications for any tool call that takes more than a few seconds and involves discrete, measurable steps. Operations like searching multiple data sources, running a sequence of API calls, processing a batch of data, or running a multi-step booking workflow are all good candidates. A tool that completes in under a second generally doesn't need progress reporting; the overhead of emitting events isn't worthwhile for fast operations.
Conclusion
In this post, you were introduced to stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime. We explained the difference between stateless and stateful MCP deployments, walked through elicitation, sampling, and progress notifications with code examples, and showed how you can deploy a stateful MCP server to AgentCore Runtime. With these capabilities, you can build MCP servers that engage users in structured conversations, use the client's LLM for content generation, and provide real-time visibility into long-running operations, all hosted on managed, isolated infrastructure powered by AgentCore Runtime. We encourage you to explore the following resources to get started:
About the Authors
Evandro Franco
Evandro Franco is a Sr. Data Scientist at Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, primarily on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, and serverless to machine learning. In his free time, Evandro enjoys playing with his son, mainly building fun Lego bricks.
Phelipe Fabres
Phelipe Fabres is a Sr. Solutions Architect for Generative AI at AWS for Startups. He is part of a global Frontier AI team with a focus on customers that are building Foundation Models/LLMs/SLMs. He has extensive experience with agentic systems and software-driven AI systems, and more than 10 years of work in software development, from monoliths to event-driven architectures, with a Ph.D. in Graph Theory. In his free time, Phelipe enjoys playing with his daughter, mainly board games and drawing princesses.
Zihang Huang
Zihang Huang is a Solutions Architect at AWS. He is an agentic expert for connected vehicles, smart home, renewable energy, and industrial IoT. Currently, he focuses on agentic AI solutions with AgentCore, physical AI, IoT, edge computing, and big data. Before AWS, he gained technical experience at Bosch and Alibaba Cloud.
Sayee Kulkarni
Sayee Kulkarni is a Software Development Engineer on the AWS Bedrock AgentCore service. Her team is responsible for building and maintaining the AgentCore Runtime platform, a foundational component that enables customers to leverage agentic AI capabilities. She is driven by delivering tangible customer value, and this customer-centric focus motivates her work. Sayee has led key initiatives including MCP stateful capabilities and other core platform features, enabling customers to build more sophisticated and production-ready AI agents.

