DeepSeek-R1 Model Now Available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart


Today, we are excited to announce that DeepSeek-R1 distilled Llama and Qwen models are available through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, you can now deploy DeepSeek AI's first-generation frontier model, DeepSeek-R1, along with the distilled variants ranging from 1.5 to 70 billion parameters to build, experiment, and responsibly scale your generative AI ideas on AWS.

In this post, we demonstrate how to get started with DeepSeek-R1 on Amazon Bedrock Marketplace and SageMaker JumpStart. You can follow similar steps to deploy the distilled versions of the models as well.

Overview of DeepSeek-R1

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning (RL) step, which was used to refine the model's responses beyond the standard pre-training and fine-tuning process. By incorporating RL, DeepSeek-R1 can adapt more effectively to user feedback and objectives, ultimately enhancing both relevance and clarity. In addition, DeepSeek-R1 employs a chain-of-thought (CoT) approach, meaning it's equipped to break down complex queries and reason through them in a step-by-step manner. This guided reasoning process allows the model to produce more accurate, transparent, and detailed answers. The model combines RL-based fine-tuning with CoT capabilities, aiming to generate structured responses while focusing on interpretability and user interaction. With its wide-ranging capabilities, DeepSeek-R1 has captured the industry's attention as a versatile text-generation model that can be integrated into various workflows such as agents, logical reasoning, and data interpretation tasks.
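As a small illustration of what this looks like in practice (a sketch, assuming the model wraps its chain-of-thought in `<think>...</think>` tags, as DeepSeek-R1 conventionally does; the completion string below is invented for the example), a thin post-processing step can separate the reasoning trace from the final answer:

```python
# Minimal sketch: split a DeepSeek-R1 style completion into its chain-of-thought
# and final answer, assuming the reasoning is wrapped in <think>...</think> tags.
# The completion string is a made-up example, not real model output.
completion = (
    "<think>The user asks for 15% of 80. 0.15 * 80 = 12.</think>\n"
    "15% of 80 is 12."
)

reasoning, _, answer = completion.partition("</think>")
reasoning = reasoning.replace("<think>", "").strip()
answer = answer.strip()

print("Reasoning trace:", reasoning)
print("Final answer:", answer)
```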

DeepSeek-R1 uses a Mixture of Experts (MoE) architecture and is 671 billion parameters in size. The MoE architecture activates only 37 billion parameters per query, enabling efficient inference by routing requests to the most relevant expert "clusters." This approach allows the model to specialize in different problem domains while maintaining overall efficiency. DeepSeek-R1 requires at least 800 GB of HBM memory in FP8 format for inference. In this post, we will use an ml.p5e.48xlarge instance to deploy the model. ml.p5e.48xlarge comes with 8 NVIDIA H200 GPUs providing 1128 GB of GPU memory.
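As a quick sanity check on those figures (a back-of-the-envelope sketch, not part of the original announcement), the arithmetic below shows why an ml.p5e.48xlarge has enough GPU memory for FP8 inference:

```python
# Back-of-the-envelope check that ml.p5e.48xlarge can host DeepSeek-R1 in FP8.
# The figures come from the post: 671B total parameters, ~800 GB HBM needed for
# inference, and 8x NVIDIA H200 GPUs totaling 1128 GB of GPU memory.

total_params_b = 671        # total parameters, in billions
bytes_per_param_fp8 = 1     # FP8 stores one byte per parameter

weights_gb = total_params_b * bytes_per_param_fp8   # ~671 GB for weights alone
required_gb = 800                                   # post's figure, incl. KV cache and overhead
available_gb = 1128                                 # 8 x H200 (141 GB each) on ml.p5e.48xlarge

print(f"FP8 weights alone: ~{weights_gb} GB")
print(f"Required (per the post): {required_gb} GB; available: {available_gb} GB")
print(f"Headroom: {available_gb - required_gb} GB")
```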

DeepSeek-R1 distilled models bring the reasoning capabilities of the main R1 model to more efficient architectures based on popular open models like Qwen (1.5B, 7B, 14B, and 32B) and Llama (8B and 70B). Distillation refers to a process of training smaller, more efficient models to mimic the behavior and reasoning patterns of the larger DeepSeek-R1 model, using it as a teacher model.

You can deploy the DeepSeek-R1 model either through SageMaker JumpStart or Amazon Bedrock Marketplace.
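For readers who prefer the SDK over the console, here is a minimal sketch of deploying one of the distilled models with the SageMaker Python SDK's JumpStart interface. The model_id and instance_type below are assumptions for illustration; look up the exact model ID and a supported instance type in SageMaker JumpStart before running this.

```python
# Minimal sketch: deploy a DeepSeek-R1 distilled model via SageMaker JumpStart.
# model_id and instance_type are assumptions -- verify them in JumpStart first.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="deepseek-llm-r1-distill-qwen-7b",  # hypothetical ID; confirm in JumpStart
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumed: smaller distilled models fit on a single-GPU instance
    accept_eula=True,               # some JumpStart models require accepting an EULA
)

# Send a simple inference request to the deployed endpoint.
response = predictor.predict({
    "inputs": "What is 17 * 24? Think step by step.",
    "parameters": {"max_new_tokens": 512, "temperature": 0.6},
})
print(response)

# Clean up the endpoint when finished to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```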