Updated on 2024-05-23 GMT+08:00

Overview

Serverless GPU is a cloud computing service that allocates GPU computing resources flexibly, efficiently, and on demand. By provisioning GPUs only when they are needed, it addresses problems such as low resource utilization, high costs, and poor scalability that come with long-term dedicated GPU use. This document describes the functions and advantages of serverless GPUs.

Long-term use of traditional GPUs raises many issues, such as the need to pre-plan resource requirements and the risk of resource waste. Serverless GPU offers a flexible alternative: users simply choose an appropriate GPU model and computing resource scale to efficiently handle acceleration workloads such as AI model inference and training, accelerated audio and video production, and graphics and image acceleration.

GPU functions provide GPU hardware acceleration for simulation, scientific computing, audio/video processing, AI, and image processing to improve service efficiency.

Table 1 GPU function specifications

| Card Type | vGPU Memory (GB) | vGPU Computing Power | Feature |
|---|---|---|---|
| NVIDIA-T4 | 1–16 (must be an integer) | Automatically allocated by the system | T4 is specifically designed for AI inference workloads, such as neural networks that process video, voice, search queries, and images. The T4 GPU provides 16 GB GDDR6 memory, 320 Turing Tensor Cores, and 2,560 Turing CUDA Cores, and delivers strong performance across multiple precisions (FP32, FP16, INT8, and INT4). Its peak performance is 65 TFLOPS for FP16, 130 TOPS for INT8, and 260 TOPS for INT4. |
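Per the table above, the vGPU memory for an NVIDIA-T4 GPU function must be an integer from 1 to 16 GB. The constraint can be sketched as a small client-side validation helper (the function name is hypothetical, not part of any official SDK):

```python
def validate_vgpu_memory(size_gb: int) -> int:
    """Validate a requested vGPU memory size for an NVIDIA-T4 GPU function.

    Per Table 1, the size must be an integer between 1 and 16 GB.
    Returns the size unchanged if valid; raises otherwise.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(size_gb, int) or isinstance(size_gb, bool):
        raise TypeError("vGPU memory must be an integer number of GB")
    if not 1 <= size_gb <= 16:
        raise ValueError("vGPU memory must be between 1 and 16 GB")
    return size_gb
```

A request for 8 GB passes, while 0, 17, or a fractional value is rejected before it ever reaches the service.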

Figure 1 GPU cloud product selection guide
  • This feature is supported only in CN East-Shanghai1.
  • GPU functions do not support the following network segments: 192.168.64.0/18, 192.168.128.0/18, 10.192.64.0/18, and 10.192.128.0/18.
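The network restriction above can be verified before deployment. Below is a minimal sketch, using Python's standard `ipaddress` module, that checks whether a planned subnet overlaps any of the unsupported segments listed in the note (the helper name is illustrative only):

```python
import ipaddress

# Network segments that GPU functions do not support (from the note above).
RESERVED_SEGMENTS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "192.168.64.0/18",
        "192.168.128.0/18",
        "10.192.64.0/18",
        "10.192.128.0/18",
    )
]

def subnet_is_allowed(cidr: str) -> bool:
    """Return True if the subnet overlaps none of the unsupported segments."""
    subnet = ipaddress.ip_network(cidr)
    return not any(subnet.overlaps(reserved) for reserved in RESERVED_SEGMENTS)
```

For example, `subnet_is_allowed("10.0.0.0/24")` returns True, while `subnet_is_allowed("192.168.70.0/24")` returns False because 192.168.70.0/24 falls inside 192.168.64.0/18.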