Third-Party Large Models
Specifications of Third-Party Large Models
In addition to Pangu models, ModelArts Studio integrates popular open-source third-party NLP models.
For example, DeepSeek-V3, released on December 26, 2024, is a Mixture-of-Experts (MoE) language model with 671B parameters. On mathematical and coding benchmarks, DeepSeek-V3 matches or outperforms leading closed-source models such as GPT-4o. DeepSeek-R1, which shares the DeepSeek-V3 architecture, was officially open-sourced on January 20, 2025. As an outstanding representative of models with strong reasoning capabilities, DeepSeek-R1 has attracted great attention: on core tasks such as mathematical reasoning and code generation, it achieves performance on par with or better than top closed-source models such as GPT-4o and OpenAI o1, and is recognized as a leading LLM in the industry.
ModelArts Studio provides third-party NLP models of different specifications to meet the requirements of different scenarios. The following table lists the supported models; choose the one that best fits your development and application needs.
| Supported Region | Model Name | Maximum Context Length | Maximum Output Length | Description |
|---|---|---|---|---|
| CN-Hong Kong | DeepSeek-R1-32K-0.0.2 | 32K | 8K | Released in June 2025. Supports inference with a context length of 32K tokens and up to 256 concurrent calls; 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-R1-0528. |
| | DeepSeek-V3-32K-0.0.2 | 32K | 8K | Released in June 2025. Supports inference with a context length of 32K tokens and up to 256 concurrent calls; 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-V3-0324. |
| | DeepSeek-R1-Distil-Qwen-32B-0.0.1 | 32K | 8K | DeepSeek-R1-Distill-Qwen-32B is fine-tuned from the open-source model Qwen2.5-32B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-LLama-70B-0.0.1 | 32K | 8K | DeepSeek-R1-Distill-Llama-70B is fine-tuned from the open-source model Llama-3.1-70B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-LLama-8B-0.0.1 | 32K | 8K | DeepSeek-R1-Distill-Llama-8B is fine-tuned from the open-source model Llama-3.1-8B using data generated by DeepSeek-R1. |
| | Qwen3-235B-A22B-0.0.1 | 32K | 8K | Qwen3-235B-A22B supports seamless switching between thinking and non-thinking modes within a single dialogue. Its reasoning capability significantly outperforms that of QwQ, and its general capability far exceeds that of Qwen2.5-72B-Instruct, achieving SOTA performance among models of the same scale in the industry. |
| | Qwen3-32B-0.0.1 | 32K | 8K | Qwen3-32B supports seamless switching between thinking and non-thinking modes within a single dialogue. Its reasoning capability significantly outperforms that of QwQ, and its general capability far exceeds that of Qwen2.5-32B-Instruct, achieving SOTA performance among models of the same scale in the industry. |
| | Qwen3-30B-A3B-0.0.1 | 32K | 8K | Qwen3-30B-A3B supports seamless switching between thinking and non-thinking modes within a single dialogue. Its reasoning capability significantly outperforms that of QwQ, and its general capability far exceeds that of Qwen2.5-32B-Instruct, achieving SOTA performance among models of the same scale in the industry. |
| | Qwen3-14B-0.0.1 | 32K | 8K | Qwen3-14B supports seamless switching between thinking and non-thinking modes within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses that of Qwen2.5-14B. |
| | Qwen3-8B-0.0.1 | 32K | 8K | Qwen3-8B supports seamless switching between thinking and non-thinking modes within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses that of Qwen2.5-7B. |
| | Qwen2.5-72B-0.0.1 | 32K | 8K | Compared with Qwen2, Qwen2.5 has acquired significantly more knowledge and has greatly improved coding and mathematics capabilities. It also delivers significant improvements in instruction following, long-text generation, understanding structured data (e.g., tables), and generating structured outputs, especially JSON. |
| | Qwen-QWQ-32B-0.0.1 | 32K | 8K | QwQ is a reasoning model trained on Qwen2.5-32B. Reinforcement learning greatly improves its reasoning capability. Its core metrics on mathematics and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full DeepSeek-R1, and all significantly surpass those of DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. |
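The Qwen3 models above switch between thinking and non-thinking modes within a dialogue; the upstream Qwen3 release documents `/think` and `/no_think` soft switches appended to a user message for this. The sketch below shows how a chat request might toggle the mode per turn while respecting the 32K-token context and 8K-token output limits listed above. The soft-switch convention and the model ID are assumptions based on the upstream Qwen3 documentation, not a ModelArts Studio API reference.

```python
# Sketch: build a chat payload for a Qwen3 model with a 32K-token context
# and an 8K-token maximum output, using the /no_think soft switch from the
# upstream Qwen3 release to disable the reasoning trace for one turn.
# (Assumption: the deployed endpoint honors the same convention.)

MAX_CONTEXT_TOKENS = 32 * 1024   # 32K total context window
MAX_OUTPUT_TOKENS = 8 * 1024     # 8K maximum output

def build_payload(user_text: str, thinking: bool = True) -> dict:
    """Build an OpenAI-style chat payload, toggling thinking mode per turn."""
    # Appending /no_think disables the reasoning trace for this turn only;
    # omitting it (or appending /think) keeps the default thinking mode.
    content = user_text if thinking else f"{user_text} /no_think"
    return {
        "model": "Qwen3-32B-0.0.1",  # placeholder model ID
        "messages": [{"role": "user", "content": content}],
        "max_tokens": MAX_OUTPUT_TOKENS,
    }

# With the full 8K output reserved, at most 24K tokens remain for the input.
max_input_tokens = MAX_CONTEXT_TOKENS - MAX_OUTPUT_TOKENS
payload = build_payload("Prove that sqrt(2) is irrational.", thinking=False)
```

Reserving the full output budget up front is a conservative choice: it guarantees a long response never collides with the context limit, at the cost of a smaller input allowance.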
Supported Platform Operations
| Model Name | Model Evaluation | Real-Time Inference | Model Commissioning in Experience Center |
|---|---|---|---|
| DeepSeek-V3-32K-0.0.2 | √ | √ | √ |
| DeepSeek-R1-32K-0.0.2 | √ | √ | √ |
| DeepSeek-R1-Distil-Qwen-32B-0.0.1 | √ | √ | √ |
| DeepSeek-R1-Distill-LLama-70B-0.0.1 | √ | √ | √ |
| DeepSeek-R1-Distill-LLama-8B-0.0.1 | √ | √ | √ |
| Qwen3-235B-A22B-0.0.1 | √ | √ | √ |
| Qwen3-32B-0.0.1 | √ | √ | √ |
| Qwen3-30B-A3B-0.0.1 | √ | √ | √ |
| Qwen3-14B-0.0.1 | √ | √ | √ |
| Qwen3-8B-0.0.1 | √ | √ | √ |
| Qwen2.5-72B-0.0.1 | √ | √ | √ |
| Qwen-QWQ-32B-0.0.1 | √ | √ | √ |
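Real-time inference in the table above corresponds to calling a deployed model over HTTP. The sketch below prepares such a call assuming an OpenAI-compatible chat-completions schema; the endpoint URL, API key, and request shape are placeholders and assumptions, so check the deployment details of your own service before use.

```python
import json
from urllib import request

# Placeholders -- substitute the values from your own deployment.
API_URL = "https://example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def chat(model: str, prompt: str, max_tokens: int = 1024) -> request.Request:
    """Prepare a chat-completions request (OpenAI-compatible schema, assumed)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # must not exceed the model's 8K output limit
        "stream": False,
    }
    return request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = chat("DeepSeek-V3-32K-0.0.2", "Summarize the MoE architecture in one sentence.")
# Sending the request is left to the caller, e.g.:
#   with request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Separating request construction from transmission makes the payload easy to inspect or log before any network traffic occurs.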
Dependency of Third-Party Large Models on Resource Pools
| Model Name | Cloud-based Deployment (Arm+Snt9B3) |
|---|---|
| DeepSeek-V3-32K-0.0.2 | Supported |
| DeepSeek-R1-32K-0.0.2 | Supported |
| DeepSeek-R1-Distil-Qwen-32B-0.0.1 | Supported |
| DeepSeek-R1-Distill-LLama-70B-0.0.1 | Supported |
| DeepSeek-R1-Distill-LLama-8B-0.0.1 | Supported |
| Qwen3-235B-A22B-0.0.1 | Supported |
| Qwen3-32B-0.0.1 | Supported |
| Qwen3-30B-A3B-0.0.1 | Supported |
| Qwen3-14B-0.0.1 | Supported |
| Qwen3-8B-0.0.1 | Supported |
| Qwen2.5-72B-0.0.1 | Supported |
| Qwen-QWQ-32B-0.0.1 | Supported |