|
|
|
|
|
<br>DeepSeek-R1 is based on DeepSeek-V3, a mixture-of-experts (MoE) model recently open-sourced by DeepSeek. This [base model](https://githost.geometrx.com) is fine-tuned using Group Relative Policy [Optimization](https://haloentertainmentnetwork.com) (GRPO), a reasoning-oriented variant of reinforcement learning (RL). The research team also performed knowledge distillation from DeepSeek-R1 to open-source Qwen and Llama models and released several versions of each. |
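
To make the "group relative" part of GRPO a little more concrete, here is a minimal sketch of the advantage computation at its core: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group instead of against a learned value function. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration. This is not DeepSeek's training code, and it leaves out the clipped policy-ratio objective and the KL penalty that a full GRPO update would also need.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per sampled response to the same
    prompt. `eps` is a hypothetical guard against division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average get a positive advantage and are reinforced, while those below it get a negative one. Because the baseline comes from the group itself, no separate critic model is needed for this step.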