Running Inference Tasks on Snt9B in a Lite Cluster Resource Pool

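Scenario

This example shows how to deploy an online inference service in an Snt9B environment using a Kubernetes Deployment. You first create a pod to host the service, then log in to the pod's container and deploy the online service there, and finally open a new terminal that acts as the client to access and test the service.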

Figure 1 Task diagram

Procedure

Pull the image. The test image is bert_pretrain_mindspore:v1, which already contains the test data and code.

docker pull swr.cn-southwest-2.myhuaweicloud.com/os-public-repo/bert_pretrain_mindspore:v1
docker tag swr.cn-southwest-2.myhuaweicloud.com/os-public-repo/bert_pretrain_mindspore:v1 bert_pretrain_mindspore:v1

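Create a config.yaml file on the host.

The config.yaml file configures the pod. This example starts the pod with the sleep command so that you can enter the pod for debugging. You can also change command to the startup command of your task (for example, "python inference.py"); the task then runs once the container starts.

The content of config.yaml is as follows: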

apiVersion: apps/v1
kind: Deployment
metadata:
  name: yourapp
  labels:
    app: infers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: infers
  template:
    metadata:
      labels:
        app: infers
    spec:
      schedulerName: volcano
      nodeSelector:
        accelerator/huawei-npu: ascend-1980
      containers:
      - image: bert_pretrain_mindspore:v1                  # Inference image name
        imagePullPolicy: IfNotPresent
        name: mindspore
        command:
        - "sleep"
        - "1000000000000000000"
        resources:
          requests:
            huawei.com/ascend-1980: "1"             # Number of required NPUs; keep the key unchanged. The maximum value is 16. You can add lines below to configure resources such as memory and CPU.
          limits:
            huawei.com/ascend-1980: "1"             # NPU limit; keep the key unchanged. The value must be consistent with that in requests.
        volumeMounts:
        - name: ascend-driver               # Driver mount; do not change
          mountPath: /usr/local/Ascend/driver
        - name: ascend-add-ons              # Driver add-ons mount; do not change
          mountPath: /usr/local/Ascend/add-ons
        - name: hccn                        # Driver hccn configuration; do not change
          mountPath: /etc/hccn.conf
        - name: npu-smi                     # npu-smi tool
          mountPath: /usr/local/sbin/npu-smi
        - name: localtime                   # The container time must be the same as the host time.
          mountPath: /etc/localtime
      volumes:
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver
      - name: ascend-add-ons
        hostPath:
          path: /usr/local/Ascend/add-ons
      - name: hccn
        hostPath:
          path: /etc/hccn.conf
      - name: npu-smi
        hostPath:
          path: /usr/local/sbin/npu-smi
      - name: localtime
        hostPath:
          path: /etc/localtime

Create the pod from config.yaml.
```
kubectl apply -f config.yaml
```
Run the following command to check whether the pod has started. If the pod shows "1/1" and "Running", it has started successfully.
```
kubectl get pod -A
```
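The readiness check can also be scripted. The following is a minimal sketch, assuming the default `kubectl get pod -A` column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name `pod_is_running` and the sample row are hypothetical and only illustrate the parsing.

```python
def pod_is_running(line: str) -> bool:
    """Return True if one row of `kubectl get pod -A` shows a ready, running pod.

    Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
    """
    fields = line.split()
    if len(fields) < 4:
        return False
    ready, status = fields[2], fields[3]
    ready_count, _, total_count = ready.partition("/")
    if not (ready_count.isdigit() and total_count.isdigit()):
        return False  # header row or unexpected format
    return status == "Running" and int(total_count) > 0 and ready_count == total_count


# Hypothetical sample row, for illustration only (not real cluster output).
sample = "default   yourapp-0000000000-xxxxx   1/1   Running   0   2m"
print(pod_is_running(sample))
```

The header row and rows in other states such as Pending are rejected, so the function can be applied to every line of the command output.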
Enter the container. Replace {pod_name} with the name of your pod (as shown by get pod) and {namespace} with your namespace (default is the default value).
```
kubectl exec -it {pod_name} -n {namespace} -- bash
```

Activate the conda environment.

su - ma-user   # Switch to the ma-user account
conda activate MindSpore   # Activate the MindSpore environment

Create the test code file test.py.

from flask import Flask, request
import json 
app = Flask(__name__)

@app.route('/greet', methods=['POST'])
def say_hello_func():
    print("----------- in hello func ----------")
    data = json.loads(request.get_data(as_text=True))
    print(data)
    username = data['name']
    rsp_msg = 'Hello, {}!'.format(username)
    return json.dumps({"response":rsp_msg}, indent=4)

@app.route('/goodbye', methods=['GET'])
def say_goodbye_func():
    print("----------- in goodbye func ----------")
    return '\nGoodbye!\n'


@app.route('/', methods=['POST'])
def default_func():
    print("----------- in default func ----------")
    data = json.loads(request.get_data(as_text=True))
    return '\n called default func !\n {} \n'.format(str(data))

# host must be "0.0.0.0", port must be 8080
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080)
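To see what the /greet handler returns without starting the server, its response-building logic can be reproduced with the standard library alone. This is a sketch: `build_greeting` is a hypothetical helper name that mirrors the body of `say_hello_func` and is not part of test.py.

```python
import json


def build_greeting(raw_body: str) -> str:
    """Mirror say_hello_func: parse the JSON body and format the reply."""
    data = json.loads(raw_body)
    rsp_msg = 'Hello, {}!'.format(data['name'])
    return json.dumps({"response": rsp_msg}, indent=4)


# Same payload as the one sent to /greet by the curl test command.
print(build_greeting('{"name":"Tom"}'))
```

A body without a `name` key raises `KeyError` here; in the Flask service an unhandled exception like this would surface to the client as an HTTP 500 response.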

Run the code. As shown in the following figure, this deploys an online service, and this container acts as the server.

python test.py

Figure 2 Deploying the online service

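Open a new terminal in XShell and enter the container by referring to steps 5 to 7. This container acts as the client. Run the following commands to verify the three API endpoints of the custom image. If the output matches the figure, the service was called successfully.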
```
curl -X POST -H "Content-Type: application/json" --data '{"name":"Tom"}'  127.0.0.1:8080/
curl -X POST -H "Content-Type: application/json" --data '{"name":"Tom"}' 127.0.0.1:8080/greet
curl -X GET 127.0.0.1:8080/goodbye
```
Figure 3 Accessing the online service
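The same three calls can be issued from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_request` and `call` are hypothetical, and the address is the one used by the curl commands. Sending the requests requires test.py to be running in the same container, so the network call is left commented out.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # same address and port as the curl commands


def build_request(path: str, payload=None) -> urllib.request.Request:
    """Build a POST request with a JSON body, or a GET request when payload is None."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method="POST")


def call(req: urllib.request.Request) -> str:
    """Send the request and return the response body as text (needs the service running)."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


# One request per endpoint defined in test.py.
requests_to_send = [
    build_request("/", {"name": "Tom"}),
    build_request("/greet", {"name": "Tom"}),
    build_request("/goodbye"),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # print(call(req))  # uncomment inside the container while test.py is running
```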

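Configure the CPU and memory sizes under limits and requests. A single Snt9B node has 8 Snt9B cards, 192 CPU cores, and 1536 GB of memory. Plan the values accordingly so that tasks do not fail because the CPU and memory limits are too small.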
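One way to plan these values is to split the node's CPU and memory in proportion to the number of NPUs a pod requests. The sketch below assumes the node specification stated above (8 cards, 192 cores, 1536 GB). The proportional split, the function name `plan_resources`, and the use of the `Gi` unit for memory are illustrative assumptions, not an official sizing rule.

```python
NODE_NPUS = 8          # Snt9B cards per node, as stated above
NODE_CPU_CORES = 192   # CPU cores per node
NODE_MEMORY_GB = 1536  # memory per node


def plan_resources(npus: int) -> dict:
    """Return a proportional CPU/memory share for a pod requesting `npus` cards.

    Assumption: CPU and memory are divided evenly across the cards of one node.
    """
    if not 1 <= npus <= NODE_NPUS:
        raise ValueError("npus must be between 1 and %d" % NODE_NPUS)
    cpu = NODE_CPU_CORES * npus // NODE_NPUS
    memory = NODE_MEMORY_GB * npus // NODE_NPUS
    return {
        "huawei.com/ascend-1980": str(npus),
        "cpu": str(cpu),
        "memory": "%dGi" % memory,  # unit choice is an assumption
    }


print(plan_resources(1))
```

For a single card this gives 24 cores and 192 GB. The resources a node can actually allocate are typically lower than its raw capacity, so treat these numbers as an upper bound rather than a recommended setting.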