Deploying Kubeflow
This section describes the process of deploying Kubeflow on HUAWEI CLOUD CCE and using Kubeflow to build simple TensorFlow training jobs. It also compares the training performance in the single-GPU and multi-GPU scenarios.
For details about the deployment process, see https://bbs.huaweicloud.com/blogs/413d1821c1a211e89fc57ca23e93a89f and the official document at https://www.kubeflow.org/docs/started/getting-started/.
Prerequisites
- A cluster named clusterA has been created on CCE. The cluster has an available GPU node that has two or more GPUs.
- An EIP has been bound to the node, and kubectl has been configured.
Installing ksonnet
The latest ksonnet version is v0.13.1. Releases are published at https://github.com/ksonnet/ksonnet/releases. The installation procedure is as follows:
export KS_VER=0.13.1
export KS_PKG=ks_${KS_VER}_linux_amd64
wget -O /tmp/${KS_PKG}.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_PKG}.tar.gz
mkdir -p ${HOME}/bin
tar -xvf /tmp/$KS_PKG.tar.gz -C ${HOME}/bin
cp ${HOME}/bin/$KS_PKG/ks /usr/local/bin
Downloading kfctl.sh
Run the following commands:
mkdir ${KUBEFLOW_SRC}
cd ${KUBEFLOW_SRC}
export KUBEFLOW_TAG=v0.4.1
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash
- KUBEFLOW_SRC is the download directory of Kubeflow.
- KUBEFLOW_TAG indicates the Kubeflow version. The latest version is v0.4.1.
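The commands above assume that KUBEFLOW_SRC is already set in the current shell. A minimal sketch of preparing both variables before running the download follows. The directory ${HOME}/kubeflow-src is a hypothetical choice made for illustration, not a path required by Kubeflow.

```shell
# Assumption: ${HOME}/kubeflow-src is an illustrative location; any writable directory works.
export KUBEFLOW_SRC="${KUBEFLOW_SRC:-${HOME}/kubeflow-src}"
# Kubeflow version used in this section.
export KUBEFLOW_TAG="${KUBEFLOW_TAG:-v0.4.1}"

# -p avoids an error if the directory already exists.
mkdir -p "${KUBEFLOW_SRC}"
echo "Kubeflow ${KUBEFLOW_TAG} will be downloaded to ${KUBEFLOW_SRC}"
```

Values that are already exported in the shell are kept, so the defaults apply only when the variables are unset or empty.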
Configuring the Docker Proxy
This test was conducted in a CN East region, where some images cannot be pulled due to network problems. To download these images, configure a proxy for Docker on the node where the containers run.
mkdir -p /etc/systemd/system/docker.service.d
vi /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80/" "HTTPS_PROXY=http://proxy.example.com:80/"
Replace proxy.example.com:80 with an available proxy address. Save the file, exit the vi editor, and run the following commands to make the proxy setting take effect:
systemctl daemon-reload
systemctl restart docker
Run the following command to confirm that the Docker proxy takes effect:
systemctl show --property=Environment docker
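The drop-in file described in this section can also be written without opening an editor, which may be convenient when several nodes need the same setting. The following is a sketch only: the function name write_docker_proxy_conf and its optional directory argument are hypothetical helpers introduced here, and the proxy address remains a placeholder.

```shell
# Write a Docker proxy drop-in file non-interactively.
# $1: proxy URL; $2: drop-in directory (defaults to the systemd path used in this section).
# Assumption: write_docker_proxy_conf is an illustrative helper, not part of Docker or systemd.
write_docker_proxy_conf() {
  proxy_url="$1"
  dropin_dir="${2:-/etc/systemd/system/docker.service.d}"
  mkdir -p "${dropin_dir}"
  cat > "${dropin_dir}/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=${proxy_url}" "HTTPS_PROXY=${proxy_url}"
EOF
}

# Example (run as root on the node, then reload systemd and restart Docker):
# write_docker_proxy_conf "http://proxy.example.com:80/"
```

The function only writes the file. Reloading systemd and restarting Docker are still required for the proxy to take effect.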
Deploying Kubeflow
Run the following commands:
${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none
cd ${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s
${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s
- KFAPP indicates the Kubeflow Deployment name.
After execution is complete, run the kubectl get po -n kubeflow command to check whether the related resources are started normally. Because the storage has not been configured, some pods are not running.
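To see at a glance which pods are still waiting, for example for storage, the pod listing can be filtered on its STATUS column. The helper below is a hypothetical sketch: the name list_not_running is introduced here for illustration, and it assumes the default column layout of kubectl get po (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the name and status of every pod whose STATUS column is not "Running".
# Expects `kubectl get po --no-headers` output on stdin.
# Assumption: list_not_running is an illustrative helper, not a kubectl feature.
list_not_running() {
  awk '$3 != "Running" { print $1, $3 }'
}

# Example (requires kubectl access to the cluster):
# kubectl get po -n kubeflow --no-headers | list_not_running
```

Pods in any other state, such as Pending or Completed, are listed, so an empty result means every pod reports Running.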
Configuring the Storage Required by Kubeflow
Kubeflow v0.4.1 depends on the following storage volumes:
- katib-mysql
- mysql-pv-claim
- minio-pv-claim
Therefore, select cluster clusterA and namespace kubeflow on the Resource Management > Storage page of the CCE console to create three storage volumes.
After the volume creation is complete, modify the volume-name parameter of the following Deployments:
kubectl edit deploy minio -n kubeflow
:%s/minio-pv-claim/cce-sfs-kubeflow-minio/g
:wq!
kubectl edit deploy mysql -n kubeflow
:%s/mysql-pv-claim/cce-sfs-kubeflow-mysql/g
:wq!
kubectl edit deploy vizier-db -n kubeflow
:%s/katib-mysql/cce-sfs-kubeflow-katib/g
:wq!
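The same substitution can be scripted instead of being typed into the editor. The following is a sketch under assumptions: swap_claim is a hypothetical helper introduced here, and piping the exported manifest back through kubectl replace is one possible approach that has not been verified against this Kubeflow version.

```shell
# Rewrite every occurrence of a volume claim name in a manifest read from stdin.
# $1: old claim name; $2: new claim name.
# Assumption: swap_claim is an illustrative helper; the names contain no sed metacharacters.
swap_claim() {
  sed "s/$1/$2/g"
}

# Example (requires kubectl access to the cluster):
# kubectl get deploy minio -n kubeflow -o yaml \
#   | swap_claim minio-pv-claim cce-sfs-kubeflow-minio \
#   | kubectl replace -f -
```

Like the editor command, this replaces every occurrence of the old name in the manifest, so it is worth reviewing the filtered output before sending it to the cluster.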
After a while, all pods enter the Running state.