Updated on 2025-10-20 GMT+08:00

Function Instance Types and Usage Modes

This section describes the FunctionGraph instance types (CPU and GPU), their usage modes and billing, and their specifications.

Instance Types

Function instances are classified into the following types:

CPU instances: The basic function instance type, suitable for burst traffic and compute-intensive scenarios.
GPU instances: Instances backed by Turing-architecture GPUs, suitable for audio/video, AI, and image processing scenarios. GPU hardware acceleration improves processing efficiency for these workloads.
GPU instances can be deployed only with container images or custom runtimes.

Usage Modes

Both CPU and GPU instances can be used in two modes: on-demand and reserved. The two modes are described as follows:

In on-demand mode, instances are automatically scaled by FunctionGraph based on invocations.

Instances are created when requests arrive and are destroyed after 1 minute of inactivity. The first invocation on a newly created instance incurs a cold start.

Constraints:

By default, a single Huawei Cloud account (IAM account) can have a maximum of 1,000 instances in a region. If you need more instances, submit a service ticket.

Billing mode:

In on-demand mode, billing begins when the function is triggered by the request and ends when the request is completed. A single instance can process a single request or multiple requests based on the configuration. For details about the execution duration, see Table 1. For details about how to configure concurrent processing, see Configuring Single-Instance Multi-Concurrency.

When no function is invoked, the system does not allocate compute resources and no fees are generated. You will be billed only when the function is invoked and executed. For details about service billing, see FunctionGraph Billing Overview.

**Table 1** Execution duration description
Request Mode	Duration	Example
Single-instance single-concurrency	The execution duration starts when the request arrives at the instance and ends when the request is completed.	If a request arrives at 00:00:00 and ends at 00:00:05, the billing duration is 5 seconds. If three requests arrive at the same time and each takes 5 seconds, the total duration is 3 × 5 = 15 seconds.
Single-instance multi-concurrency	The execution duration starts when the first request arrives at the instance and ends when the last request is completed.	If the first request arrives at 00:00:00 and ends at 00:00:05, the last request arrives at 00:00:03 and ends at 00:00:08, and both are executed in the same instance, the total duration is 8 seconds.
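
The two rules in Table 1 can be reproduced with a short calculation. The following is a minimal sketch, not FunctionGraph code: the function names and the `(arrival, completion)` tuple format are illustrative assumptions, and the multi-concurrency helper assumes all requests overlap on one instance, as in the table's example.

```python
def _seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' timestamp to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def single_concurrency_duration(requests):
    """Sum each request's own duration (one request per instance at a time)."""
    return sum(_seconds(end) - _seconds(start) for start, end in requests)


def multi_concurrency_duration(requests):
    """Time from the first arrival to the last completion on one instance.

    Assumption: the requests overlap on the same instance, as in Table 1.
    """
    starts = [_seconds(start) for start, _ in requests]
    ends = [_seconds(end) for _, end in requests]
    return max(ends) - min(starts)


# Table 1, row 1: three simultaneous 5-second requests
three_requests = [("00:00:00", "00:00:05")] * 3
print(single_concurrency_duration(three_requests))  # 15

# Table 1, row 2: two overlapping requests in the same instance
overlapping = [("00:00:00", "00:00:05"), ("00:00:03", "00:00:08")]
print(multi_concurrency_duration(overlapping))  # 8
```

With the same two overlapping requests, the single-concurrency rule would give 5 + 5 = 10 seconds, which illustrates why multi-concurrency can reduce the billed duration.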

In reserved mode, you manage the lifecycle of function instances yourself, which gives you flexible control over computing resources. When reserved instances are configured for a function, FunctionGraph schedules incoming requests to these instances first. If traffic exceeds the capacity of the reserved instances, FunctionGraph automatically allocates on-demand instances to ensure service continuity.
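
The reserved-first behavior described above can be pictured with a simplified capacity model. This is only a sketch under stated assumptions: `split_load` and its parameters are hypothetical names, every instance is assumed to handle the same number of concurrent requests, and the actual FunctionGraph scheduler is not documented here and may behave differently.

```python
import math


def split_load(concurrent_requests: int, reserved_instances: int,
               per_instance_concurrency: int = 1):
    """Return (requests served by reserved instances, on-demand instances needed).

    Hypothetical model: reserved capacity is filled first, and any overflow
    is served by on-demand instances of the same per-instance concurrency.
    """
    reserved_capacity = reserved_instances * per_instance_concurrency
    on_reserved = min(concurrent_requests, reserved_capacity)
    overflow = concurrent_requests - on_reserved
    on_demand_instances = math.ceil(overflow / per_instance_concurrency)
    return on_reserved, on_demand_instances


# 25 concurrent requests, 2 reserved instances, 10 requests per instance:
# 20 requests fit on reserved instances, 5 overflow to 1 on-demand instance.
print(split_load(25, 2, 10))  # (20, 1)
```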

After reserved instances are created for a function, the code, dependencies, and initializer of the function are automatically loaded. Reserved instances stay alive in the execution environment, so cold starts do not affect your services. You are advised to configure a fixed number of reserved instances based on service requirements, or to use scheduled scaling, metric-based scaling, or intelligent recommendation policies to follow traffic peaks and troughs.

Note:

To ensure service stability and reliability, do not rely on the initializer of a reserved instance to run one-time tasks.

Billing mode:

For details, see the execution duration (reserved instances) billing item in Billing Items and Reserved Instance Billing.

Specifications

CPU instance

CPU instances include the following specifications.

**Table 2** CPU instance specifications
Memory	Max. Code Package Size	Max. Execution Duration	Max. Disk Size
128–32,768 MB. The value must be a multiple of 64.	ZIP file: 1.5 GB (after decompression). OBS bucket: 300 MB (after compression).	259,200 seconds. If the execution takes longer than 900 seconds, use asynchronous invocation.	512 MB (default) or 10 GB.
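
The limits in Table 2 can be checked before a function is configured. The sketch below is illustrative only: the helper names are assumptions and are not part of any FunctionGraph SDK, and the numeric limits are copied from Table 2.

```python
MIN_MEMORY_MB = 128
MAX_MEMORY_MB = 32_768
MEMORY_STEP_MB = 64
MAX_EXECUTION_SECONDS = 259_200
ASYNC_THRESHOLD_SECONDS = 900


def is_valid_memory(memory_mb: int) -> bool:
    """Memory must be within 128-32,768 MB and a multiple of 64."""
    return (MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB
            and memory_mb % MEMORY_STEP_MB == 0)


def requires_async(duration_seconds: int) -> bool:
    """Executions longer than 900 seconds should use asynchronous invocation."""
    if not 0 < duration_seconds <= MAX_EXECUTION_SECONDS:
        raise ValueError("duration must be between 1 and 259,200 seconds")
    return duration_seconds > ASYNC_THRESHOLD_SECONDS


print(is_valid_memory(192))   # True
print(is_valid_memory(200))   # False: not a multiple of 64
print(requires_async(3600))   # True
```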

GPU instance

GPU instances have the following specifications.

**Table 3** GPU instance specifications
GPU	Total Graphics Memory	FP16 Compute (TFLOPS)	FP32 Compute (TFLOPS)	Optional Split: Graphics Memory (MB)	Optional Split: Memory (MB)	On-demand Mode	Reserved Mode	Idle Mode
NVIDIA T4	16 GB	65	8	1024–16384 (1–16 GB). The value must be a multiple of 1024.	128–32768. The value must be a multiple of 64.	Supported	Supported	Supported
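
A requested GPU split can be checked against the ranges in Table 3 in the same way. This is a minimal sketch: `validate_gpu_split` is a hypothetical helper, not a FunctionGraph API, and the limits are taken from the table above.

```python
def validate_gpu_split(graphics_memory_mb: int, memory_mb: int) -> list[str]:
    """Return a list of problems; an empty list means the split is valid."""
    problems = []
    # Graphics memory: 1024-16384 MB in steps of 1024 MB
    if not 1024 <= graphics_memory_mb <= 16384 or graphics_memory_mb % 1024 != 0:
        problems.append("graphics memory must be 1024-16384 MB, multiple of 1024")
    # Memory: 128-32768 MB in steps of 64 MB
    if not 128 <= memory_mb <= 32768 or memory_mb % 64 != 0:
        problems.append("memory must be 128-32768 MB, multiple of 64")
    return problems


print(validate_gpu_split(4096, 8192))  # []
print(len(validate_gpu_split(1500, 100)))  # 2
```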