Updated on 2025-08-22 GMT+08:00

Function Instance Types and Usage Modes

This section describes the CPU and GPU instance types, billing modes, and specifications.

Instance Type

Function instances are classified into the following types:

  • CPU instances: Basic FunctionGraph instances, which are suitable for burst traffic and compute-intensive scenarios.
  • GPU instances: Instances based on the NVIDIA Turing architecture, which are suitable for audio/video, AI, and image processing scenarios. GPU hardware accelerates these workloads to improve processing efficiency.

    GPU instances can be deployed only using container images or custom runtimes.

Usage Modes

CPU and GPU instances can be used in two modes: on-demand and reserved. The two modes are described as follows:

In on-demand mode, instances are automatically scaled by FunctionGraph based on invocations.

Instances are created when requests arrive and are destroyed after 1 minute of inactivity. The first invocation on a newly created instance involves a cold start.
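The on-demand lifecycle can be illustrated with a minimal sketch. It assumes a simplified model that is not part of FunctionGraph itself: requests are served one at a time by a single instance, and the instance is treated as destroyed once the gap since the last completed request reaches 60 seconds. The function name count_cold_starts and the exact boundary behavior are hypothetical.

```python
IDLE_TIMEOUT_S = 60  # from the text: instances are destroyed after 1 minute of inactivity


def count_cold_starts(requests):
    """Count cold starts in a simplified single-instance model.

    `requests` is a list of (start, end) times in seconds, sorted by start
    time and non-overlapping. A cold start is assumed whenever no instance
    exists yet or the previous instance has been idle for the full timeout.
    """
    cold_starts = 0
    last_end = None  # completion time of the previous request, if any
    for start, end in requests:
        if last_end is None or start - last_end >= IDLE_TIMEOUT_S:
            cold_starts += 1  # a new instance must be created
        last_end = end
    return cold_starts


# The second request reuses the warm instance; the third arrives after a
# long idle gap, so the model counts a second cold start.
print(count_cold_starts([(0, 5), (30, 35), (200, 205)]))
```

In this model, keeping the gap between requests under one minute avoids repeated cold starts, which is the effect reserved instances provide without depending on traffic.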

Constraints:

By default, a single Huawei Cloud account (IAM account) can have a maximum of 1,000 instances in a region. If you need more instances, submit a service ticket.

Billing mode:

In on-demand mode, billing begins when a request triggers the function and ends when the request is completed. A single instance can process one request at a time or multiple requests concurrently, depending on the configuration. For details about the execution duration, see Table 1. For details about how to configure concurrent processing, see Configuring Single-Instance Multi-Concurrency.

When no function is invoked, the system does not allocate compute resources and no fees are generated. You will be billed only when the function is invoked and executed. For details about service billing, see FunctionGraph Billing Overview.

Table 1 Execution duration description

| Request Mode | Duration | Example |
|---|---|---|
| Single-instance single-concurrency | The execution duration starts when the request arrives at the instance and ends when the request is completed. | If a request arrives at 00:00:00 and ends at 00:00:05, the billing duration is 5 seconds. If three requests arrive at the same time and each takes 5 seconds, the total duration is 3 × 5 = 15 seconds. |
| Single-instance multi-concurrency | The execution duration starts when the first request arrives at the instance and ends when the last request is completed. | If the first request arrives at 00:00:00 and ends at 00:00:05, the last request arrives at 00:00:03 and ends at 00:00:08, and both are executed in the same instance, the total duration is 8 seconds. |
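The two duration rules in Table 1 can be checked with a short sketch. The function names are hypothetical, times are seconds since 00:00:00, and the multi-concurrency function assumes that all listed requests run on the same instance with no idle gap between them, as in the table's example. It is an illustration of the arithmetic, not the actual billing implementation.

```python
def billed_seconds_single(requests):
    """Single-instance single-concurrency: each request is measured on its own.

    `requests` is a list of (arrival, completion) times in seconds.
    """
    return sum(end - start for start, end in requests)


def billed_seconds_multi(requests):
    """Single-instance multi-concurrency: first arrival to last completion.

    Assumes every request in `requests` is handled by the same instance
    and that the requests overlap without an idle gap.
    """
    if not requests:
        return 0
    first_arrival = min(start for start, _ in requests)
    last_completion = max(end for _, end in requests)
    return last_completion - first_arrival


# Examples taken from Table 1.
print(billed_seconds_single([(0, 5)]))          # one 5-second request
print(billed_seconds_single([(0, 5)] * 3))      # three simultaneous 5-second requests
print(billed_seconds_multi([(0, 5), (3, 8)]))   # two overlapping requests on one instance
```

With overlapping requests, multi-concurrency yields 8 seconds where measuring each request separately would yield 10, which is why concurrency configuration affects the billed duration.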

In reserved mode, you manage the lifecycle of function instances yourself, which gives you flexible control over compute resources. When reserved instances are configured for a function, FunctionGraph schedules incoming requests to these instances first. If traffic exceeds the capacity of the reserved instances, FunctionGraph automatically allocates on-demand instances to handle the excess, ensuring service continuity.
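The reserved-first behavior can be sketched as a simple capacity calculation. This is an illustrative model only: the function plan_instances, its parameters, and the assumption that every instance handles the same fixed number of concurrent requests are hypothetical and do not describe the actual FunctionGraph scheduler.

```python
import math


def plan_instances(concurrent_requests, reserved_instances, concurrency_per_instance=1):
    """Split concurrent requests between reserved and on-demand capacity.

    Hypothetical model: reserved instances are filled first, and any
    overflow is served by newly allocated on-demand instances.
    """
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")

    reserved_capacity = reserved_instances * concurrency_per_instance
    on_reserved = min(concurrent_requests, reserved_capacity)
    overflow = concurrent_requests - on_reserved
    return {
        "requests_on_reserved": on_reserved,
        "requests_on_demand": overflow,
        # Number of on-demand instances needed to absorb the overflow.
        "on_demand_instances": math.ceil(overflow / concurrency_per_instance),
    }


# 25 concurrent requests, 2 reserved instances, 10 requests per instance:
# 20 requests fit on reserved capacity and 5 overflow to one on-demand instance.
print(plan_instances(25, 2, concurrency_per_instance=10))
```

Only the overflow requests are exposed to cold starts in this model, since the reserved instances are already initialized.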

After reserved instances are created for a function, the code, dependencies, and initializer of the function are automatically loaded. Reserved instances stay alive in the execution environment, so cold starts do not affect your services. Depending on your service requirements and traffic peaks and troughs, you are advised to configure a fixed number of reserved instances or to use scheduled scaling, metric-based scaling, or intelligent recommendation policies.

Note:

To ensure service stability and reliability, do not rely on the initializer of a reserved instance to perform one-time tasks.

Billing mode:

For details, see the execution duration (reserved instances) billing item in Billing Items and Reserved Instance Billing.

Specifications

  • CPU instance

    CPU instances include the following specifications.

    Table 2 CPU instance specifications

    | Memory | Max. Code Package Size | Max. Execution Duration | Max. Disk Size |
    |---|---|---|---|
    | 128–32,768 MB. The value must be a multiple of 64. | ZIP file: 1.5 GB (after decompression). OBS bucket: 300 MB (after compression). | 259,200 seconds. If the execution takes longer than 900 seconds, use asynchronous invocation. | The value can be 512 MB (default) or 10 GB. |

  • GPU instance

    GPU instances have the following specifications.

    Table 3 GPU instance specifications

    | GPU | Total Graphics Memory | Compute (TFLOPS): FP16 | Compute (TFLOPS): FP32 | Optional Split Specifications: Graphics Memory (MB) | Optional Split Specifications: Memory (MB) | On-demand Mode | Reserved Mode | Idle Mode |
    |---|---|---|---|---|---|---|---|---|
    | NVIDIA T4 | 16 GB | 65 | 8 | 1024–16384 (1–16 GB). The value must be a multiple of 1024 MB. | 128–32768. The value must be a multiple of 64. | Supported | Supported | Supported |
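The limits in Table 2 and Table 3 can be expressed as a small pre-deployment check. The sketch below is not part of any FunctionGraph SDK: the function names and parameters are hypothetical, and the 10 GB disk option is assumed to equal 10,240 MB (1 GB = 1024 MB, consistent with the 1–16 GB range in Table 3).

```python
def _check_memory(memory_mb):
    """Memory rule shared by both tables: 128-32768 MB, multiple of 64."""
    if not (128 <= memory_mb <= 32768) or memory_mb % 64 != 0:
        return ["memory must be 128-32768 MB and a multiple of 64"]
    return []


def validate_cpu_spec(memory_mb, timeout_s, disk_mb=512):
    """Return a list of violations of the Table 2 limits (empty if valid)."""
    errors = _check_memory(memory_mb)
    if not (0 < timeout_s <= 259200):
        errors.append("execution duration must not exceed 259,200 seconds")
    if disk_mb not in (512, 10240):  # assumption: 10 GB == 10240 MB
        errors.append("disk size must be 512 MB or 10 GB")
    return errors


def validate_gpu_spec(gpu_memory_mb, memory_mb):
    """Return a list of violations of the Table 3 split specifications."""
    errors = []
    if not (1024 <= gpu_memory_mb <= 16384) or gpu_memory_mb % 1024 != 0:
        errors.append("graphics memory must be 1024-16384 MB and a multiple of 1024")
    return errors + _check_memory(memory_mb)


def needs_async_invocation(timeout_s):
    """Table 2 advises asynchronous invocation for runs longer than 900 seconds."""
    return timeout_s > 900


print(validate_cpu_spec(memory_mb=256, timeout_s=30))        # valid: no violations
print(validate_cpu_spec(memory_mb=200, timeout_s=30))        # 200 is not a multiple of 64
print(validate_gpu_spec(gpu_memory_mb=1500, memory_mb=4096)) # 1500 is not a multiple of 1024
print(needs_async_invocation(3600))
```

Returning a list of violations rather than raising on the first one lets a caller report every problem with a configuration at once.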