Amazon SageMaker JumpStart provides pretrained models for a variety of problem types to help you get started with AI workloads. SageMaker JumpStart offers access to solutions for top use cases that can be deployed to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters. Through pre-set deployment options, customers can quickly move from model selection to model deployment.
Model deployments through SageMaker JumpStart are fast and straightforward. Customers can choose options based on expected concurrent users, with visibility into P50 latency, time-to-first token (TTFT), and throughput (tokens/second/user). Although concurrent user configuration options are helpful for general-purpose scenarios, they aren’t task-aware, and we recognize that customers use SageMaker JumpStart for diverse, specific use cases like content generation, content summarization, or Q&A. Each use case might require specific configurations to improve performance. Moreover, the definition of performance isn’t limited to latency alone; some customers might measure performance in throughput or lowest cost per token.
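To make the three metrics above concrete, the following is a minimal sketch of how they could be computed from per-request timings. The `RequestTrace` fields, the `summarize` helper, and the sample numbers are all hypothetical and chosen for illustration; this is not how SageMaker JumpStart itself measures or reports these values.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestTrace:
    """Timing for one streamed request (hypothetical record layout)."""
    start_s: float         # time the request was sent
    first_token_s: float   # time the first output token arrived
    end_s: float           # time the last output token arrived
    output_tokens: int     # number of tokens generated


def summarize(traces):
    """Return P50 latency, P50 TTFT, and mean per-user throughput."""
    latencies = [t.end_s - t.start_s for t in traces]
    ttfts = [t.first_token_s - t.start_s for t in traces]
    # Per-user throughput: tokens one user receives per second of request time.
    throughputs = [t.output_tokens / (t.end_s - t.start_s) for t in traces]
    return {
        "p50_latency_s": median(latencies),
        "p50_ttft_s": median(ttfts),
        "tokens_per_s_per_user": sum(throughputs) / len(throughputs),
    }


# Made-up sample data: three requests that each generate 100 tokens.
traces = [
    RequestTrace(0.0, 0.2, 2.0, 100),
    RequestTrace(0.0, 0.3, 4.0, 100),
    RequestTrace(0.0, 0.4, 5.0, 100),
]
summary = summarize(traces)
print(summary)
```

A single deployment can look good on one of these numbers and poor on another, which is why the choice of metric matters as much as the measurement.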
Building on this foundation, we’re excited to announce the launch of SageMaker JumpStart optimized deployments. SageMaker JumpStart optimized deployments address the need for rich yet straightforward deployment customization on SageMaker JumpStart by offering pre-defined deployment configurations designed for specific use cases. Customers maintain the same level of visibility into the details of their proposed deployments, but now deployments are optimized for their specific use case and performance constraint.
Prerequisites
To begin using SageMaker JumpStart optimized deployments, customers require at minimum the following:
After these requirements are in place, customers can begin using SageMaker JumpStart optimized deployments right away.
Getting started
To get started using SageMaker JumpStart optimized deployments, open SageMaker Studio and choose Models. Select any of the models that support optimized deployments (listed in the following section) and choose Deploy in the top-right corner. The resulting screen now includes a collapsible window labeled “Performance”, which contains the selection options for optimized deployments.
The displayed options require users to first select a use case. For text-based models, these use cases can range from generative writing to chat-style interactions; image and video will feature different use cases after support is added for these input types. After selecting a use case, customers must select one of three constraint optimizations: Cost optimized, Throughput optimized, or Latency optimized. There is also a Balanced option for customers seeking the best average performance across all logged metrics.
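The relationship between the three constraints and the Balanced option can be sketched as a selection over candidate configurations. Everything in this example is an assumption made for illustration: the `Candidate` fields, the `pick` helper, the candidate names, and the numbers are hypothetical, and reading "Balanced" as an equal-weight average of normalized metrics is one possible interpretation, not the documented SageMaker JumpStart logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical pre-set deployment configuration."""
    name: str
    cost_per_1k_tokens: float  # lower is better
    tokens_per_s: float        # higher is better
    p50_latency_s: float       # lower is better


def _normalize(values, higher_is_better):
    """Scale values to [0, 1] so that 1.0 is always the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def pick(candidates, constraint):
    """Choose a candidate for 'cost', 'throughput', 'latency', or 'balanced'."""
    if constraint == "cost":
        return min(candidates, key=lambda c: c.cost_per_1k_tokens)
    if constraint == "throughput":
        return max(candidates, key=lambda c: c.tokens_per_s)
    if constraint == "latency":
        return min(candidates, key=lambda c: c.p50_latency_s)
    if constraint == "balanced":
        # Assumed reading of "Balanced": equal-weight mean of normalized metrics.
        cost = _normalize([c.cost_per_1k_tokens for c in candidates], False)
        tput = _normalize([c.tokens_per_s for c in candidates], True)
        lat = _normalize([c.p50_latency_s for c in candidates], False)
        scores = [(a + b + c) / 3 for a, b, c in zip(cost, tput, lat)]
        return candidates[scores.index(max(scores))]
    raise ValueError(f"unknown constraint: {constraint}")


# Made-up candidates; none of these numbers come from a real benchmark.
candidates = [
    Candidate("config-a", 0.10, 40.0, 3.0),
    Candidate("config-b", 0.40, 120.0, 1.5),
    Candidate("config-c", 0.50, 60.0, 0.8),
    Candidate("config-d", 0.20, 100.0, 1.0),
]
for constraint in ("cost", "throughput", "latency", "balanced"):
    print(constraint, "->", pick(candidates, constraint).name)
```

In this made-up data set each constraint selects a different configuration, and the balanced choice is the one that wins no single metric but is close to the best on all three.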
After a selection is made, a pre-set deployment configuration is defined for the endpoint. Customers can further review and select additional configuration values like timeouts, endpoint naming, and security settings. After configuration is complete, customers choose the Deploy option in the bottom-right corner.
Available models
SageMaker JumpStart optimized deployments are available for the following models:
- Meta
  - Llama-3.1-8B-Instruct
  - Llama-2-7b-hf
  - Llama-3.2-3B
  - Meta-Llama-3-8B
  - Llama-3.2-1B-Instruct
  - Llama-3.2-1B
  - Llama-3.1-70B-Instruct
  - Llama-3.2-3B-Instruct
- Microsoft
- Mistral AI
  - Mistral-7B-Instruct-v0.2
  - Mistral-Small-24B-Instruct-2501
  - Mistral-7B-v0.1
  - Mistral-7B-Instruct-v0.3
  - Mixtral-8x7B-Instruct-v0.1
- Qwen
  - Qwen3-8B
  - Qwen3-32B
  - Qwen3-0.6B
  - Qwen2.5-7B-Instruct
  - Qwen2.5-72B-Instruct
  - Qwen2-VL-7B-Instruct
  - Qwen2-1.5B-Instruct
  - Qwen2-7B
- Google
  - gemma-7b
  - gemma-7b-it
  - gemma-2b
- Tiiuae
These are the launch models for optimized deployments, and we’re actively expanding support to include more models.
Call to action
Customers can start working with SageMaker JumpStart optimized deployments immediately. Select one of the available optimized deployment models in the SageMaker Studio model hub. Experiment with the different deployment options to determine the right configuration for your application.
About the authors
Dan Ferguson
Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan works to support customers on their journey to integrating ML workflows efficiently, effectively, and sustainably.
Malav Shastri
Malav Shastri is a Software Development Engineer at AWS, where he works on the Amazon SageMaker JumpStart and Amazon Bedrock teams. His role focuses on enabling customers to take advantage of state-of-the-art open source and proprietary foundation models and traditional machine learning algorithms. Malav holds a Master’s degree in Computer Science.
Pooja Karadgi
Pooja Karadgi leads product and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub within SageMaker. She is dedicated to accelerating customer AI adoption by simplifying foundation model discovery and deployment, enabling customers to build production-ready generative AI applications across the entire model lifecycle, from onboarding and customization to deployment.

