Updated on 2025-07-08 GMT+08:00

Creating a Ray Service

Prerequisites

Creating a Ray Service

  1. Log in to the Workspace Management Console.
  2. Select the created workspace and click Access Workspace. In the navigation pane, choose Resources and Assets > Ray Services, and click Create Ray Service in the upper right corner.
  3. On the displayed page, set required parameters, including Basic Settings, Log Settings, Ray Cluster Settings, Data, and Ray Serve Settings.

    For details, see Table 1.
    Table 1 Parameters for creating a Ray service

    Parameter

    Description

    Basic Settings

    Ray Service Name

    Name of the Ray service to be created.

    Add Description

    Click Add Description and enter the introduction to the Ray service in the text box. It can contain a maximum of 1,000 characters.

    Image Package Source

    The options are Public Ray image package and My Ray service image package. Select My Ray service image package.

    • Public Ray image package: public image packages provided by DataArtsFabric. These are open-source Ray images that support enhanced DataArtsFabric features such as channel encryption, secure dashboard access, and key encryption and decryption.
    • My Ray service image package: custom Ray images that tenants create and deploy as required, using the image package management provided by DataArtsFabric.

    Image Package Name

    Name of the service image package to be used.

    Image Package Version

    Select a version of the image package as required.

    Log Settings

    Enabling LTS

    Whether to store Ray service runtime logs in the log service provided by Huawei Cloud LTS.

    After this function is enabled, logs in the following paths are collected:

    • /tmp/ray/session_latest/logs/**/*
    • /var/log/service-log/**/*

    Log Group

    Select a log group of Huawei Cloud LTS. You can create a log group on the LTS console. For details, see Creating a Log Group.

    Log Stream

    Select a log stream of Huawei Cloud LTS. You can create a log stream on the LTS console. For details, see Creating a Log Stream.
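    Once LTS is enabled, any file written under the collected paths above is picked up. A minimal sketch of directing application logs into /var/log/service-log/ so they are collected (the log_dir parameter is for illustration and testing only, not a DataArtsFabric setting):

```python
import logging
import os


def get_service_logger(name: str, log_dir: str = "/var/log/service-log") -> logging.Logger:
    """Return a logger writing under a collected path, so LTS picks up its output.

    The log_dir override exists only for local testing; in the Ray service
    container the default path matches one of the collected log paths.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, f"{name}.log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```
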

    Ray Cluster Settings

    Head Specifications

    Head node specifications of the Ray cluster to be created. Set this parameter as required.

    All available specifications are displayed in the specification list. The selected specification must be downward compatible with the created Ray resource; that is, a large resource specification can be split into multiple smaller resource specifications. For example, if the fabric.ray.dpu.d4x resource is created, you can select fabric.ray.dpu.d1x, fabric.ray.dpu.d2x, or fabric.ray.dpu.d4x for Head Specifications.

    Worker Specifications

    Worker group specifications of the Ray cluster to be created. You can click Add Worker Group to create multiple worker groups of different specifications.

    Select a specification from the resource specification list for worker node deployment, and set the minimum and maximum number of worker nodes. The minimum number must be at least 1, and the maximum number can be set based on workloads.

    When the Ray cluster is initialized, the minimum number of worker nodes is created. The cluster then dynamically scales the number of worker nodes up to the maximum based on workloads.

    The selected worker node specification must also be downward compatible with the existing resource. For example, if the purchased Ray resource is fabric.ray.dpu.d4x and fabric.ray.dpu.d1x is selected for Head Specifications, you can also select fabric.ray.dpu.d1x for Worker Specifications and set the maximum number of worker nodes to 3.

    Data

    Data Input

    Model path used for running the inference service. After the Ray service is created, the model files in this path are copied to the Ray service cluster.

    Ray Serve Settings

    Add Application

    You can click Add Application to configure and customize deployment files, running environments, and scheduling parameters. A maximum of five applications can be added.

    Application Name

    Name of the application to be created.

    Code Directory

    Code directory required for inference. You can select OBS, Image Path, or Other.

    Deployment File Path

    Path of the inference instance in the code.

    Routing Prefix

    Routing prefix for inference. The routing prefix of each application must be unique.

    Environment Variables

    Select Environment Variables as required and click Add to configure environment variables. For details, see Managing Environment Variables of a Training Container.

    Deployment

    Inference instance corresponding to the application. Select Deployment and set this parameter based on the specifications of each application.

    Multiple deployments can be created in a single application. Configurations of the Ray actor, automatic scaling, and inference can be customized for each deployment.

    Resources required by the Ray actor of each deployment can be configured separately. However, the total resources required by all deployments in a single application cannot exceed the worker specifications configured in the basic settings.

    You can configure either a fixed number of replicas or an automatic scaling range with a maximum number of replicas for a deployment. If a fixed number of replicas has been configured for a deployment, automatic scaling cannot be configured.

Viewing Ray Service Details

  1. Log in to the Workspace Management Console.
  2. Select the created workspace and click Access Workspace. In the navigation pane on the left, choose Resources and Assets > Ray Services.
  3. On the displayed page, click the name of the target Ray service to access its details page.

    On the details page, you can view the Ray service overview and the Ray Serve settings. For details, see Table 2 and Table 3.

    Table 2 Parameters on the Overview tab

    Parameter

    Description

    Ray Service Name

    User-defined Ray service name.

    Ray Service ID

    Unique ID of the Ray service.

    Status

    Status of the current Ray service.

    Description

    Custom description of the Ray service.

    Created By

    Creator of the Ray service.

    Created

    Time when the Ray service is created.

    Image Package Version

    Version of the Ray service image required in the current Ray service deployment.

    Head Specifications

    Resource specifications and quantity required by head nodes in the Ray service deployment.

    Worker Specifications

    Resource specifications and quantity required by worker nodes in the Ray service deployment.

    Dashboard

    Link for accessing the Ray dashboard.

    Data

    Path and environment variables generated based on the user-defined input path.

    Log Transfer to LTS

    You can select Yes or No. If you enabled LTS in the log settings when creating the Ray service, this parameter is set to Yes.

    View LTS Logs

    If Log Transfer to LTS is enabled, you can click the link to go to the LTS log stream to view logs.

    Table 3 Parameters on the Ray Serve Settings tab

    Parameter

    Description

    Application Name

    Name of the created application.

    Inference Address

    Address for calling the inference service. For details, see Running an Inference Service.

    Code Directory

    Directory of the code required for inference.

    Deployment File Path

    Path of the inference instance in the code.

    Routing Prefix

    Routing prefix for inference. The routing prefix of each application must be unique.

    Environment Variables

    Environment variables in the container, which are generated based on the code directory and model directory.

    Deployment

    Inference instance corresponding to the application.

    Multiple deployments can be created in a single application. Configurations of the Ray actor, automatic scaling, and inference can be customized for each deployment.

    Resources required by the Ray actor of each deployment can be configured separately. However, the total resources required by all deployments in a single application cannot exceed the worker specifications configured in the basic settings.

    You can configure either a fixed number of replicas or an automatic scaling range with a maximum number of replicas for a deployment. If a fixed number of replicas has been configured for a deployment, automatic scaling cannot be configured.
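    Once an application is running, requests go to its inference address combined with its routing prefix. A minimal sketch of composing and issuing such a call; the address, prefix, and request payload below are placeholders, so substitute the values shown on the Ray Serve Settings tab:

```python
import json
import urllib.request


def invoke_url(inference_address: str, route_prefix: str) -> str:
    """Join the inference address and the application's routing prefix."""
    return inference_address.rstrip("/") + "/" + route_prefix.strip("/")


if __name__ == "__main__":
    # Placeholder endpoint and prefix for illustration only.
    url = invoke_url("https://example-ray-service.endpoint", "app1")
    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": "hello"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(resp.status, resp.read().decode())
```
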