Updated on 2022-08-12 GMT+08:00

Deploying Kubeflow

This section describes how to deploy Kubeflow on Huawei Cloud CCE and use it to build simple TensorFlow training jobs. It also compares training performance in single-GPU and multi-GPU scenarios.

For details about the deployment process, see the official documentation at https://www.kubeflow.org/docs/started/getting-started/.

Prerequisites

  • A cluster named clusterA has been created on CCE and has at least one available GPU node with two or more GPUs.
  • An EIP has been bound to the node, and kubectl has been configured.

Installing ksonnet

At the time of writing, the latest ksonnet version is v0.13.1 (see the ksonnet releases page on GitHub for newer versions). Install it as follows:

export KS_VER=0.13.1
export KS_PKG=ks_${KS_VER}_linux_amd64
wget -O /tmp/${KS_PKG}.tar.gz \
  https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_PKG}.tar.gz
mkdir -p ${HOME}/bin
tar -xvf /tmp/${KS_PKG}.tar.gz -C ${HOME}/bin
cp ${HOME}/bin/${KS_PKG}/ks /usr/local/bin
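The steps above can be wrapped into a reusable function that takes the version as a parameter. This is a minimal sketch, assuming the release assets keep the ks_<version>_linux_amd64.tar.gz naming shown above and that you can write to /usr/local/bin. The function names ks_pkg_name, ks_download_url, and install_ksonnet are hypothetical helpers, not part of ksonnet.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers wrapping the ksonnet installation steps.
# Assumes the release asset naming ks_<version>_linux_amd64.tar.gz.

# Print the package name for a ksonnet version, e.g. ks_0.13.1_linux_amd64
ks_pkg_name() {
  printf 'ks_%s_linux_amd64' "$1"
}

# Print the GitHub release URL of the tarball for a ksonnet version
ks_download_url() {
  printf 'https://github.com/ksonnet/ksonnet/releases/download/v%s/%s.tar.gz' \
    "$1" "$(ks_pkg_name "$1")"
}

# Download, unpack, and install ks (defaults to 0.13.1).
# Assumes write access to /usr/local/bin.
install_ksonnet() {
  local ver="${1:-0.13.1}"
  local pkg
  pkg="$(ks_pkg_name "$ver")"
  wget -O "/tmp/${pkg}.tar.gz" "$(ks_download_url "$ver")" || return 1
  mkdir -p "${HOME}/bin"
  tar -xzf "/tmp/${pkg}.tar.gz" -C "${HOME}/bin" || return 1
  cp "${HOME}/bin/${pkg}/ks" /usr/local/bin/ || return 1
  # Confirm that the installed binary runs and reports its version
  ks version
}
```

For example, source the file and run install_ksonnet 0.13.1. Keeping the URL construction in its own function makes it easy to check the download address before anything is fetched.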

Downloading kfctl.sh

Run the following commands:

export KUBEFLOW_SRC=/home/kubeflow_src
mkdir -p ${KUBEFLOW_SRC}
cd ${KUBEFLOW_SRC}
export KUBEFLOW_TAG=v0.4.1

curl \
  https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash
  • KUBEFLOW_SRC is the directory into which Kubeflow is downloaded.
  • KUBEFLOW_TAG is the Kubeflow version to download, for example, v0.4.1.
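Before moving on, you can confirm that download.sh fetched the scripts into ${KUBEFLOW_SRC}. The following minimal sketch checks for scripts/kfctl.sh, the path that the deployment commands in the next step call. The function name check_kfctl is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical check that kfctl.sh was downloaded into the source directory.

# Usage: check_kfctl [src_dir]   (defaults to $KUBEFLOW_SRC)
# Returns 0 if <src_dir>/scripts/kfctl.sh exists and is executable.
check_kfctl() {
  local src="${1:-${KUBEFLOW_SRC:-}}"
  if [ -z "$src" ]; then
    echo "KUBEFLOW_SRC is not set" >&2
    return 1
  fi
  if [ ! -x "${src}/scripts/kfctl.sh" ]; then
    echo "kfctl.sh not found or not executable under ${src}/scripts" >&2
    return 1
  fi
  echo "found ${src}/scripts/kfctl.sh"
}
```

Run check_kfctl after the download; a non-zero return usually means download.sh did not complete or KUBEFLOW_SRC points to a different directory.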

Deploying Kubeflow

Run the following commands:

${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none 
cd ${KFAPP} 
${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s 
${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s
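  • KFAPP is the name of the Kubeflow Deployment, which kfctl.sh init also uses as the application directory name. Set it before running the commands above, for example, export KFAPP=kfapp.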
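The following sketch wraps the four commands above in a function that stops early if KUBEFLOW_SRC or KFAPP is empty, and stops at the first failing kfctl.sh step. The function name deploy_kubeflow is hypothetical. Its body runs in a subshell, so the cd into ${KFAPP} does not change your current directory.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper around the kfctl.sh init/generate/apply sequence.
# The ( ) body runs in a subshell, so the cd below does not affect the caller.
deploy_kubeflow() (
  if [ -z "${KUBEFLOW_SRC:-}" ] || [ -z "${KFAPP:-}" ]; then
    echo "set KUBEFLOW_SRC and KFAPP before deploying" >&2
    exit 1
  fi
  # Chain with && so that a failed step prevents the later ones from running
  "${KUBEFLOW_SRC}/scripts/kfctl.sh" init "${KFAPP}" --platform none &&
    cd "${KFAPP}" &&
    "${KUBEFLOW_SRC}/scripts/kfctl.sh" generate k8s &&
    "${KUBEFLOW_SRC}/scripts/kfctl.sh" apply k8s
)
```

Explicit && chaining is used instead of set -e because set -e is silently ignored when a function is called from an if or || context.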

After the commands complete, run kubectl get po -n kubeflow to check whether the related pods have started. Because storage has not been configured yet, some pods are not running at this point.
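To list only the pods that still need attention, you can filter the kubectl output. This is a minimal sketch, assuming the default column order of kubectl get po --no-headers (NAME, READY, STATUS, RESTARTS, AGE). The function name not_running_pods is hypothetical. It prints a pod when its STATUS is not Running or when not all of its containers are ready.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical filter for `kubectl get po --no-headers` output.
# Prints NAME READY STATUS for pods that are not Running or not fully ready.
not_running_pods() {
  awk '{
    split($2, ready, "/")   # READY is "<ready>/<total>", e.g. 0/1
    if ($3 != "Running" || ready[1] != ready[2]) print $1 " " $2 " " $3
  }'
}
```

For example: kubectl get po -n kubeflow --no-headers | not_running_pods. Pods of completed jobs also appear in the result, because their STATUS is not Running.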

Configuring the Storage Required by Kubeflow

Kubeflow v0.4.1 depends on the following storage volumes:

  • katib-mysql
  • mysql-pv-claim
  • minio-pv-claim

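Therefore, on the Resource Management > Storage page of the CCE console, select cluster clusterA and namespace kubeflow, and create three storage volumes. This example uses volumes named cce-sfs-kubeflow-minio, cce-sfs-kubeflow-mysql, and cce-sfs-kubeflow-katib.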

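After the volumes are created, replace the original PVC name with the name of the new volume in each of the following Deployments. Each kubectl edit command opens the Deployment in an editor (vi by default); run the :%s substitution in the editor, then save and exit with :wq.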

kubectl edit deploy minio -n kubeflow
:%s/minio-pv-claim/cce-sfs-kubeflow-minio/g
:wq
kubectl edit deploy mysql -n kubeflow
:%s/mysql-pv-claim/cce-sfs-kubeflow-mysql/g
:wq
kubectl edit deploy vizier-db -n kubeflow
:%s/katib-mysql/cce-sfs-kubeflow-katib/g
:wq
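If you prefer not to edit interactively, the same substitutions can be scripted. The following sketch is an alternative to kubectl edit, not part of the procedure above: it fetches each Deployment as YAML, applies the same replacement as the :%s commands with sed, and writes the result back with kubectl replace. It assumes the volume names used in this example and that the claim names contain only letters, digits, and hyphens (so they are safe in a sed pattern). The function names swap_claim and repoint_kubeflow_volumes are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive equivalent of the kubectl edit + :%s steps above.
# Assumes claim names contain only letters, digits, and hyphens (safe in sed).

# Replace every occurrence of OLD ($1) with NEW ($2) in YAML read from stdin
swap_claim() {
  sed "s/$1/$2/g"
}

# Repoint the minio, mysql, and vizier-db Deployments to the new volumes
repoint_kubeflow_volumes() {
  local ns="kubeflow" entry deploy old new
  # Each entry is deployment:old-claim:new-claim
  for entry in \
    minio:minio-pv-claim:cce-sfs-kubeflow-minio \
    mysql:mysql-pv-claim:cce-sfs-kubeflow-mysql \
    vizier-db:katib-mysql:cce-sfs-kubeflow-katib; do
    IFS=: read -r deploy old new <<< "$entry"
    kubectl get deploy "$deploy" -n "$ns" -o yaml \
      | swap_claim "$old" "$new" \
      | kubectl replace -f - || return 1
  done
}
```

kubectl replace is used here because the fetched YAML includes the current resourceVersion, so the write fails instead of silently overwriting if the Deployment changed in the meantime.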

After a while, all pods should be in the Running state.