Deploying Kubeflow
This section describes the process of deploying Kubeflow on Huawei Cloud CCE and using Kubeflow to build simple TensorFlow training jobs. It also compares the training performance in the single-GPU and multi-GPU scenarios.
For details about the deployment process, see the official document at https://www.kubeflow.org/docs/started/getting-started/.
Prerequisites
- A cluster named clusterA has been created on CCE. The cluster has an available GPU node that has two or more GPUs.
- An EIP has been bound to the node, and kubectl has been configured.
Installing ksonnet
The latest ksonnet release listed on the ksonnet releases page (https://github.com/ksonnet/ksonnet/releases) is v0.13.1. Install it as follows:
export KS_VER=0.13.1
export KS_PKG=ks_${KS_VER}_linux_amd64
wget -O /tmp/${KS_PKG}.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_PKG}.tar.gz
mkdir -p ${HOME}/bin
tar -xvf /tmp/$KS_PKG.tar.gz -C ${HOME}/bin
cp ${HOME}/bin/$KS_PKG/ks /usr/local/bin
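To confirm that the installation above succeeded, a quick check along the following lines may help. This is a minimal sketch: have_cmd is a hypothetical helper name introduced here for illustration, and the check only assumes that /usr/local/bin is on your PATH.

```shell
# Return 0 if the named command is found on PATH, non-zero otherwise.
# (have_cmd is a hypothetical helper, not part of ksonnet.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd ks; then
  # Print the installed ksonnet version.
  ks version
else
  echo "ks not found on PATH; check that /usr/local/bin is on PATH" >&2
fi
```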
Downloading kfctl.sh
Run the following commands:
export KUBEFLOW_SRC=/home/kubeflow_src
mkdir ${KUBEFLOW_SRC}
cd ${KUBEFLOW_SRC}
export KUBEFLOW_TAG=v0.4.1
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash
- KUBEFLOW_SRC is the download directory of Kubeflow.
- KUBEFLOW_TAG is the Kubeflow version to download, for example, v0.4.1.
Deploying Kubeflow
Run the following commands:
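${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none
cd ${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s
${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s
- KFAPP is the Kubeflow deployment name. It must be set before you run these commands, because the init step creates a directory with this name and the next command changes into it.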
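The commands above rely on KUBEFLOW_SRC and KFAPP being set, and KFAPP is not assigned anywhere earlier in this section. A pre-flight check such as the following sketch can catch an empty variable before kfctl.sh runs. The require_vars helper and the value my-kubeflow are hypothetical and used only for illustration.

```shell
# Return non-zero if any of the named variables is unset or empty.
# (require_vars is a hypothetical helper introduced for this example.)
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (my-kubeflow is a placeholder deployment name):
#   export KFAPP=my-kubeflow
#   require_vars KUBEFLOW_SRC KFAPP && \
#     ${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none
```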
After the commands complete, run kubectl get po -n kubeflow to check whether the Kubeflow pods have started. Because storage has not been configured yet, some pods will not be running.
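To pick out the pods that are not yet running, the output of kubectl get po can be filtered. The sketch below assumes the default column layout of that output (NAME, READY, STATUS, RESTARTS, AGE); list_not_running is a hypothetical helper name.

```shell
# Read "kubectl get po" output on stdin and print the name and status
# of every pod whose STATUS column is not "Running".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
list_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Example usage (requires kubectl configured for the cluster):
#   kubectl get po -n kubeflow | list_not_running
```

Pods that have finished normally (status Completed) are also listed by this filter, so an entry in the output is not necessarily an error.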
Configuring the Storage Required by Kubeflow
Kubeflow v0.4.1 depends on the following storage volumes:
- katib-mysql
- mysql-pv-claim
- minio-pv-claim
On the Resource Management > Storage page of the CCE console, select cluster clusterA and namespace kubeflow, and create three storage volumes, one for each of the volumes listed above.
After the volumes are created, replace the volume name in each of the following Deployments. Each kubectl edit command opens the Deployment in an editor (vi in this example). In the editor, run the substitution command, and then save and exit.
kubectl edit deploy minio -n kubeflow
:%s/minio-pv-claim/cce-sfs-kubeflow-minio/g
:wq!
kubectl edit deploy mysql -n kubeflow
:%s/mysql-pv-claim/cce-sfs-kubeflow-mysql/g
:wq!
kubectl edit deploy vizier-db -n kubeflow
:%s/katib-mysql/cce-sfs-kubeflow-katib/g
:wq!
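If you prefer not to edit interactively, the same substitution can be sketched as a pipeline. This is an alternative to the documented procedure, not part of it, and it has not been verified against a CCE cluster. swap_claim is a hypothetical helper, and the example assumes the volume names used above.

```shell
# Replace every occurrence of the old name ($1) with the new name ($2)
# in the YAML read from stdin. This mirrors the :%s/old/new/g
# substitution performed in the editor.
swap_claim() {
  sed "s/$1/$2/g"
}

# Example usage (requires kubectl configured for the cluster):
#   kubectl get deploy minio -n kubeflow -o yaml \
#     | swap_claim minio-pv-claim cce-sfs-kubeflow-minio \
#     | kubectl replace -f -
```

Like the editor command, this replaces every matching string in the manifest, so review the output of the first two stages before piping it to kubectl replace.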
After a short time, all pods should be in the Running state.