
Using Snt9B for Inference in a Lite Cluster Resource Pool

Description

This case describes how to use the Deployment mechanism to deploy a real-time inference service in the Snt9B environment: create a pod to host the service, log in to the pod container to start the real-time service, and then open a terminal as the client to call the service and test its functions.

Figure 1 Task diagram

Procedure

  1. Pull the image. The test image is bert_pretrain_mindspore:v1, which contains the test data and code.

    docker pull swr.cn-southwest-2.myhuaweicloud.com/os-public-repo/bert_pretrain_mindspore:v1
    docker tag swr.cn-southwest-2.myhuaweicloud.com/os-public-repo/bert_pretrain_mindspore:v1 bert_pretrain_mindspore:v1

  2. Create the config.yaml file on the host.

    This file configures the pod. For debugging, start the pod with the sleep command so that you can log in to it. Alternatively, replace the command with your job's boot command (for example, python inference.py); the job then runs as soon as the container starts. A sketch of this substitution follows the file content.

    The file content is as follows:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: yourapp
      labels:
        app: infers
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: infers
      template:
        metadata:
          labels:
            app: infers
        spec:
          schedulerName: volcano
          nodeSelector:
            accelerator/huawei-npu: ascend-1980
          containers:
          - image: bert_pretrain_mindspore:v1    # Inference image name
            imagePullPolicy: IfNotPresent
            name: mindspore
            command:
            - "sleep"
            - "1000000000000000000"
            resources:
              requests:
                huawei.com/ascend-1980: "1"      # Number of required NPUs. The maximum value is 16. You can add lines below to configure resources such as memory and CPU. The key remains unchanged.
              limits:
                huawei.com/ascend-1980: "1"      # Limits the number of cards. The key remains unchanged. The value must be consistent with that in requests.
            volumeMounts:
            - name: ascend-driver                # Mount the driver. Retain the settings.
              mountPath: /usr/local/Ascend/driver
            - name: ascend-add-ons               # Mount the driver add-ons. Retain the settings.
              mountPath: /usr/local/Ascend/add-ons
            - name: hccn                         # HCCN configuration of the driver. Retain the settings.
              mountPath: /etc/hccn.conf
            - name: npu-smi                      # npu-smi tool
              mountPath: /usr/local/sbin/npu-smi
            - name: localtime                    # The container time must be the same as the host time.
              mountPath: /etc/localtime
          volumes:
          - name: ascend-driver
            hostPath:
              path: /usr/local/Ascend/driver
          - name: ascend-add-ons
            hostPath:
              path: /usr/local/Ascend/add-ons
          - name: hccn
            hostPath:
              path: /etc/hccn.conf
          - name: npu-smi
            hostPath:
              path: /usr/local/sbin/npu-smi
          - name: localtime
            hostPath:
              path: /etc/localtime
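
    If the pod should run your job directly instead of sleeping, replace the command fragment in the file above with the job's boot command, as mentioned earlier. A minimal sketch, assuming your image contains an inference.py script (an illustrative name, not part of the test image):

    command:
    - "python"
    - "inference.py"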

  3. Create a pod based on the config.yaml file.

    kubectl apply -f config.yaml
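
    If the file is applied successfully, kubectl echoes the created resource, typically:

    deployment.apps/yourapp created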

  4. Run the following command to check the pod status. If 1/1 Running is displayed, the pod has started successfully.

    kubectl get pod -A
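
    If many pods are running in the cluster, you can optionally filter the output by the label defined in config.yaml (assuming the Deployment was created in the default namespace):

    kubectl get pod -n default -l app=infers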

  5. Go to the container. Replace {pod_name} with your pod name (displayed by the get pod command) and {namespace} with your namespace (default in this example).

    kubectl exec -it {pod_name} -n {namespace} -- bash

  6. Activate the conda environment.

    su - ma-user                # Switch to the ma-user identity.
    conda activate MindSpore    # Activate the MindSpore environment.

  7. Create the test script test.py.

    from flask import Flask, request
    import json 
    app = Flask(__name__)
    
    @app.route('/greet', methods=['POST'])
    def say_hello_func():
        print("----------- in hello func ----------")
        data = json.loads(request.get_data(as_text=True))
        print(data)
        username = data['name']
        rsp_msg = 'Hello, {}!'.format(username)
        return json.dumps({"response":rsp_msg}, indent=4)
    
    @app.route('/goodbye', methods=['GET'])
    def say_goodbye_func():
        print("----------- in goodbye func ----------")
        return '\nGoodbye!\n'
    
    
    @app.route('/', methods=['POST'])
    def default_func():
        print("----------- in default func ----------")
        data = json.loads(request.get_data(as_text=True))
        return '\n called default func !\n {} \n'.format(str(data))
    
    # host must be "0.0.0.0", port must be 8080
    if __name__ == '__main__':
        app.run(host="0.0.0.0", port=8080)
    Run the code. Once it is running, a real-time service is deployed and the container acts as the server.
    python test.py
    Figure 2 Deploying a real-time service
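
    Alternatively, to keep using the same terminal instead of opening a second one, you can start the service in the background; the nohup usage and log path here are illustrative, not part of the original procedure:

    nohup python test.py > /tmp/test_server.log 2>&1 &     # Run the Flask server in the background.
    tail -f /tmp/test_server.log                           # Watch the server log; press Ctrl+C to stop watching.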

  8. Open another terminal in XShell and access the container (as the client) by referring to step 5. Run the following commands to test the three APIs of the custom image. If the expected responses are returned (see Figure 3 and the sample response after it), the service is invoked successfully.

    curl -X POST -H "Content-Type: application/json" --data '{"name":"Tom"}'  127.0.0.1:8080/
    curl -X POST -H "Content-Type: application/json" --data '{"name":"Tom"}' 127.0.0.1:8080/greet
    curl -X GET 127.0.0.1:8080/goodbye
    Figure 3 Accessing a real-time service
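
    For reference, based on the code in test.py, the /greet call should return a JSON body like the following (pretty-printed because the handler uses json.dumps with indent=4):

    {
        "response": "Hello, Tom!"
    }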

    Set limits and requests to proper values to restrict the number of CPUs and the memory size. A single Snt9B node provides eight Snt9B cards, 192 vCPUs, and 1,536 GB of memory (192u1536g). Plan the CPU and memory allocation properly to avoid task failures caused by insufficient CPU or memory limits; see the sketch below for one possible split.
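
    For example, assuming the node's CPUs and memory are split evenly across its eight cards, a single-card pod could extend the resources section of config.yaml as follows. The CPU and memory values are illustrative; size them for your workload:

    resources:
      requests:
        huawei.com/ascend-1980: "1"
        cpu: "24"        # Illustrative: 192 vCPUs / 8 cards
        memory: 192Gi    # Illustrative: 1536 GB / 8 cards
      limits:
        huawei.com/ascend-1980: "1"
        cpu: "24"        # Keep limits >= requests; equal values shown for simplicity.
        memory: 192Gi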