Updated on 2025-11-20 GMT+08:00

Third-Party Large Models

Specifications of Third-Party Large Models

ModelArts Studio currently focuses on the NLP field and offers a selection of popular open-source NLP models from third-party providers for you to choose from.

For example, DeepSeek-V3, released on December 26, 2024, is a Mixture-of-Experts (MoE) language model with 671B parameters that outperforms GPT-4.5 on mathematical and coding evaluation benchmarks. DeepSeek-R1, which shares the DeepSeek-V3 architecture, was officially open-sourced on January 20, 2025. As an outstanding representative of models with strong reasoning capabilities, DeepSeek-R1 has attracted great attention: it matches or even surpasses top closed-source models such as GPT-4o and OpenAI o1 in core tasks such as mathematical reasoning and code generation, and is recognized as a leading LLM in the industry. DeepSeek has since open-sourced updated versions of both language models, DeepSeek-V3-0324 and DeepSeek-R1-0528, which offer enhanced capabilities and have also been integrated into ModelArts Studio.

In addition to the DeepSeek models, ModelArts Studio integrates the Qwen series and DeepSeek distilled models: the Qwen3 series (Qwen3-8B/14B/30B-A3B/32B/235B-A22B), Qwen2.5-72B, Qwen2.5-VL-32B, QwQ-32B, DeepSeek-R1-Distill-Qwen-32B, and DeepSeek-R1-Distill-Llama-70B/8B.

ModelArts Studio provides you with third-party NLP models of different specifications to meet different scenarios and requirements. The following table lists the supported models; choose the most suitable one based on your development and application requirements.

Table 1 Specifications of third-party large models

| Supported Region | Model Name | Maximum Context Length | Description |
| --- | --- | --- | --- |
| CN-Hong Kong | DeepSeek-V3-32K-0.0.1 | 32K | Released in March 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. |
| | DeepSeek-V3-32K-0.0.2 | 32K | Released in June 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-V3-0324. |
| | DeepSeek-R1-32K-0.0.1 | 32K | Released in March 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. |
| | DeepSeek-R1-32K-0.0.2 | 32K | Released in June 2025. Supports inference with a 32K-token context length and up to 256 concurrent calls. 16 inference units are required to deploy the model. The base model of this version is the open-source DeepSeek-R1-0528. |
| | DeepSeek-R1-Distill-Qwen-32B | 32K | Fine-tuned from the open-source model Qwen2.5-32B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-Llama-70B | 32K | Fine-tuned from the open-source model Llama-3.1-70B using data generated by DeepSeek-R1. |
| | DeepSeek-R1-Distill-Llama-8B | 32K | Fine-tuned from the open-source model Llama-3.1-8B using data generated by DeepSeek-R1. |
| | Qwen3-235B-A22B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue (a usage sketch follows this table). Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-72B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-32B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-32B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-30B-A3B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability significantly outperforms QwQ, and its general capability far exceeds Qwen2.5-32B-Instruct, reaching the SOTA level among models of the same scale in the industry. |
| | Qwen3-14B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses Qwen2.5-14B. |
| | Qwen3-8B | 32K | Supports seamless switching between thinking mode and non-thinking mode within a single dialogue. Its reasoning capability reaches the SOTA level among models of the same scale in the industry, and its general capability significantly surpasses Qwen2.5-7B. |
| | Qwen2.5-72B | 32K | Compared with Qwen2, Qwen2.5 has acquired significantly more knowledge and offers greatly improved coding and mathematics capabilities. It also brings significant improvements in instruction following, long-text generation, understanding structured data (e.g., tables), and generating structured outputs, especially JSON. |
| | Qwen2.5-VL-32B | 32K | Provides image recognition, precise visual positioning, text recognition and understanding, document parsing, and video comprehension capabilities. |
| | QwQ-32B | 32K | The QwQ reasoning model trained from Qwen2.5-32B, with reasoning capability greatly improved through reinforcement learning. Its core metrics in mathematics and code (AIME 24/25, LiveCodeBench) and some general metrics (IFEval, LiveBench, etc.) reach the level of the full DeepSeek-R1, significantly surpassing DeepSeek-R1-Distill-Qwen-32B, which is also based on Qwen2.5-32B. |
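The Qwen3 entries above mention switching between thinking mode and non-thinking mode within a dialogue. The sketch below shows how such a switch is commonly driven through an OpenAI-compatible chat endpoint. The base URL, API key, and the enable_thinking field are assumptions for illustration, not confirmed ModelArts Studio parameters; take the actual endpoint and request schema from the API information of your deployed service.

```python
from openai import OpenAI

# Placeholders: take the real endpoint and key from your deployed
# service in the ModelArts Studio console.
client = OpenAI(
    base_url="https://example.com/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Many Qwen3 servings accept an `enable_thinking` switch as an extra
# request field; whether this deployment exposes it this way is an
# assumption. With thinking disabled, the model answers directly
# instead of emitting a reasoning trace first.
response = client.chat.completions.create(
    model="Qwen3-32B",
    messages=[{"role": "user", "content": "Summarize MoE models in one sentence."}],
    extra_body={"enable_thinking": False},
)
print(response.choices[0].message.content)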

Platform Operations Supported by Third-Party Large Models

Table 2 Platform operations supported by third-party large models

| Model Name | Model Evaluation | Real-Time Inference | Model Commissioning in Experience Center |
| --- | --- | --- | --- |
| DeepSeek-V3-32K-0.0.1 | | | |
| DeepSeek-V3-32K-0.0.2 | | | |
| DeepSeek-R1-32K-0.0.1 | | | |
| DeepSeek-R1-32K-0.0.2 | | | |
| DeepSeek-R1-Distill-Qwen-32B | | | |
| DeepSeek-R1-Distill-Llama-70B | | | |
| DeepSeek-R1-Distill-Llama-8B | | | |
| Qwen3-235B-A22B | | | |
| Qwen3-32B | | | |
| Qwen3-30B-A3B | | | |
| Qwen3-14B | | | |
| Qwen3-8B | | | |
| Qwen2.5-72B | | | |
| Qwen2.5-VL-32B | | | |
| QwQ-32B | | | |
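Real-time inference in the table above means calling the deployed model service online. As a minimal sketch, assuming the service exposes an OpenAI-compatible chat completions endpoint (the URL, key, and model ID below are placeholders), a streaming call would look like this:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Stream tokens as they are generated, the usual pattern for
# interactive, real-time inference.
stream = client.chat.completions.create(
    model="DeepSeek-V3-32K-0.0.2",  # one of the models listed above
    messages=[{"role": "user", "content": "Explain what a 32K context length allows."}],
    stream=True,
)
for chunk in stream:
    # Some servers send chunks without content (e.g., role or usage
    # updates), so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```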

Dependency of Third-Party Large Models on Resource Pools

Table 3 Dependency of third-party large models on resource pools

| Model Name | Cloud-based Deployment |
| --- | --- |
| DeepSeek-V3-32K-0.0.1 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-V3-32K-0.0.2 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-32K-0.0.1 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-32K-0.0.2 | Supported. 16 inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Qwen-32B | Supported. Two inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Llama-70B | Supported. Four inference units are required to deploy the model. |
| DeepSeek-R1-Distill-Llama-8B | Supported. One inference unit is required to deploy the model. |
| Qwen3-235B-A22B | Supported. 16 inference units are required to deploy the model. |
| Qwen3-32B | Supported. Four inference units are required to deploy the model. |
| Qwen3-30B-A3B | Supported. Two inference units are required to deploy the model. |
| Qwen3-14B | Supported. One inference unit is required to deploy the model. |
| Qwen3-8B | Supported. One inference unit is required to deploy the model. |
| Qwen2.5-72B | Supported. Four inference units are required to deploy the model. |
| Qwen2.5-VL-32B | Supported. Four inference units are required to deploy the model. |
| QwQ-32B | Supported. Four inference units are required to deploy the model. |
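For capacity planning, the inference-unit requirements in Table 3 can be transcribed into a small lookup to estimate how many units a combination of deployments needs. The numbers below come directly from Table 3; the helper itself is only an illustrative sketch.

```python
# Inference units required per model, transcribed from Table 3.
INFERENCE_UNITS = {
    "DeepSeek-V3-32K-0.0.1": 16,
    "DeepSeek-V3-32K-0.0.2": 16,
    "DeepSeek-R1-32K-0.0.1": 16,
    "DeepSeek-R1-32K-0.0.2": 16,
    "DeepSeek-R1-Distill-Qwen-32B": 2,
    "DeepSeek-R1-Distill-Llama-70B": 4,
    "DeepSeek-R1-Distill-Llama-8B": 1,
    "Qwen3-235B-A22B": 16,
    "Qwen3-32B": 4,
    "Qwen3-30B-A3B": 2,
    "Qwen3-14B": 1,
    "Qwen3-8B": 1,
    "Qwen2.5-72B": 4,
    "Qwen2.5-VL-32B": 4,
    "QwQ-32B": 4,
}

def total_units(models: list[str]) -> int:
    """Total inference units needed to deploy each listed model once."""
    return sum(INFERENCE_UNITS[m] for m in models)

# Example: deploying DeepSeek-R1-32K-0.0.2 and Qwen3-32B together
# requires 16 + 4 = 20 inference units.
print(total_units(["DeepSeek-R1-32K-0.0.2", "Qwen3-32B"]))
```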