Updated on 2025-11-20 GMT+08:00

Third-Party Large Models

Specifications of Third-Party Large Models

ModelArts Studio currently focuses on the NLP field and offers a selection of popular open-source NLP models from third-party providers for you to choose from.

For example, DeepSeek-V3, released on December 26, 2024, is a Mixture-of-Experts (MoE) language model with 671B parameters that outperforms GPT-4.5 on mathematical and coding evaluation benchmarks. DeepSeek-R1, which shares the DeepSeek-V3 architecture, was officially open-sourced on January 20, 2025. As an outstanding representative of models with strong reasoning capabilities, DeepSeek-R1 has attracted great attention: it matches or even surpasses top closed-source models such as GPT-4o and OpenAI o1 in core tasks such as mathematical reasoning and code generation, and is recognized as a leading LLM in the industry. DeepSeek has since open-sourced updated versions of both language models, DeepSeek-V3-0324 and DeepSeek-R1-0528, which offer enhanced capabilities and have also been integrated into ModelArts Studio.

In addition to the DeepSeek models, ModelArts Studio integrates the Qwen series and DeepSeek distilled models: the Qwen3 series (Qwen3-8B/14B/30B-A3B/32B/235B-A22B), Qwen2.5-72B, Qwen2.5-VL-32B, QwQ-32B, DeepSeek-R1-Distill-Qwen-32B, and DeepSeek-R1-Distill-Llama-70B/8B.

ModelArts Studio provides you with third-party NLP models of different specifications to meet different scenarios and requirements. The following table lists the supported models; choose the most suitable one based on your development and application requirements.

Table 1 Specifications of third-party large models

| Supported Region | Model Name | Maximum Context Length | Description |
| --- | --- | --- | --- |
| CN-Hong Kong | DeepSeek-V3-32K-0.0.1 | 32K | Released in March 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. |
| | DeepSeek-V3-32K-0.0.2 | 32K | Released in June 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-V3-0324. |
| | DeepSeek-R1-32K-0.0.1 | 32K | Released in March 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. |
| | DeepSeek-R1-32K-0.0.2 | 32K | Released in June 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-R1-0528. |
| | DeepSeek-R1-Distill-Qwen-32B | 32K | Fine-tuned from the open-source model Qwen2.5-32B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-Llama-70B | 32K | Fine-tuned from the open-source model Llama-3.1-70B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-Llama-8B | 32K | Fine-tuned from the open-source model Llama-3.1-8B using data generated by DeepSeek-R1. |
| | Qwen3-235B-A22B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue (a usage sketch follows this table). Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-72B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-32B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-32B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-30B-A3B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-32B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-14B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses Qwen2.5-14B. |
| | Qwen3-8B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses Qwen2.5-7B. |
| | Qwen2.5-72B | 32K | Compared with Qwen2, Qwen2.5 has acquired significantly more knowledge and offers greatly improved coding and mathematics capabilities. It also brings significant improvements in instruction following, long-text generation, understanding structured data (e.g., tables), and generating structured outputs, especially JSON. |
| | Qwen2.5-VL-32B | 32K | Provides image recognition, precise visual positioning, text recognition and understanding, document parsing, and video comprehension capabilities. |
| | QwQ-32B | 32K | The QwQ reasoning model trained from Qwen2.5-32B, with reasoning capability greatly improved through reinforcement learning. Its core metrics in mathematics and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full DeepSeek-R1, significantly surpassing DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. |
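The Qwen3 entries above mention switching between thinking mode and non-thinking mode within a dialogue. The sketch below shows how such a switch is commonly driven through an OpenAI-compatible chat endpoint. The base URL, API key, and the enable_thinking field are assumptions for illustration, not confirmed ModelArts Studio parameters; take the actual endpoint and request schema from the API information of your deployed service.

```python
from openai import OpenAI

# Placeholders: take the real endpoint and key from your deployed
# service in the ModelArts Studio console.
client = OpenAI(
    base_url="https://example.com/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Many Qwen3 servings accept an `enable_thinking` switch as an extra
# request field; whether this deployment exposes it this way is an
# assumption. With thinking disabled, the model answers directly
# instead of emitting a reasoning trace first.
response = client.chat.completions.create(
    model="Qwen3-32B",
    messages=[{"role": "user", "content": "Summarize MoE models in one sentence."}],
    extra_body={"enable_thinking": False},
)
print(response.choices[0].message.content)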

Platform Operations Supported by Third-Party Large Models

Table 2 Platform operations supported by third-party large models

| Model Name | Model Evaluation | Real-Time Inference | Model Commissioning in Experience Center |
| --- | --- | --- | --- |
| DeepSeek-V3-32K-0.0.1 | | | |
| DeepSeek-V3-32K-0.0.2 | | | |
| DeepSeek-R1-32K-0.0.1 | | | |
| DeepSeek-R1-32K-0.0.2 | | | |
| DeepSeek-R1-Distill-Qwen-32B | | | |
| DeepSeek-R1-Distill-Llama-70B | | | |
| DeepSeek-R1-Distill-Llama-8B | | | |
| Qwen3-235B-A22B | | | |
| Qwen3-32B | | | |
| Qwen3-30B-A3B | | | |
| Qwen3-14B | | | |
| Qwen3-8B | | | |
| Qwen2.5-72B | | | |
| Qwen2.5-VL-32B | | | |
| QwQ-32B | | | |
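Real-time inference in the table above means calling the deployed model service online. As a minimal sketch, assuming the service exposes an OpenAI-compatible chat completions endpoint (the URL, key, and model ID below are placeholders), a streaming call would look like this:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Stream tokens as they are generated, the usual pattern for
# interactive, real-time inference.
stream = client.chat.completions.create(
    model="DeepSeek-V3-32K-0.0.2",  # one of the models listed above
    messages=[{"role": "user", "content": "Explain what a 32K context length allows."}],
    stream=True,
)
for chunk in stream:
    # Some servers send chunks without content (e.g., role or usage
    # updates), so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```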

Dependency of Third-Party Large Models on Resource Pools

Table 3 Dependency of third-party large models on resource pools

| Model Name | Cloud-based Deployment |
| --- | --- |
| DeepSeek-V3-32K-0.0.1 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-V3-32K-0.0.2 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-32K-0.0.1 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-32K-0.0.2 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Qwen-32B | Supported. Two inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Llama-70B | Supported. Four inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Llama-8B | Supported. One inference unit is required to deploy the model. |
| Qwen3-235B-A22B | Supported. 16 inference units are required to deploy the model. |
| Qwen3-32B | Supported. Four inference units are required to deploy the model. |
| Qwen3-30B-A3B | Supported. Two inference units are required to deploy the model. |
| Qwen3-14B | Supported. One inference unit is required to deploy the model. |
| Qwen3-8B | Supported. One inference unit is required to deploy the model. |
| Qwen2.5-72B | Supported. Four inference units are required to deploy the model. |
| Qwen2.5-VL-32B | Supported. Four inference units are required to deploy the model. |
| QwQ-32B | Supported. Four inference units are required to deploy the model. |
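For capacity planning, the inference-unit requirements in Table 3 can be transcribed into a small lookup to estimate how many units a combination of deployments needs. The numbers below come directly from Table 3; the helper itself is only an illustrative sketch.

```python
# Inference units required per model, transcribed from Table 3.
INFERENCE_UNITS = {
    "DeepSeek-V3-32K-0.0.1": 16,
    "DeepSeek-V3-32K-0.0.2": 16,
    "DeepSeek-R1-32K-0.0.1": 16,
    "DeepSeek-R1-32K-0.0.2": 16,
    "DeepSeek-R1-Distill-Qwen-32B": 2,
    "DeepSeek-R1-Distill-Llama-70B": 4,
    "DeepSeek-R1-Distill-Llama-8B": 1,
    "Qwen3-235B-A22B": 16,
    "Qwen3-32B": 4,
    "Qwen3-30B-A3B": 2,
    "Qwen3-14B": 1,
    "Qwen3-8B": 1,
    "Qwen2.5-72B": 4,
    "Qwen2.5-VL-32B": 4,
    "QwQ-32B": 4,
}

def total_units(models: list[str]) -> int:
    """Total inference units needed to deploy each listed model once."""
    return sum(INFERENCE_UNITS[m] for m in models)

# Example: deploying DeepSeek-R1-32K-0.0.2 and Qwen3-32B together
# requires 16 + 4 = 20 inference units.
print(total_units(["DeepSeek-R1-32K-0.0.2", "Qwen3-32B"]))
```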