创建TFJob
功能介绍
创建TFJob。
TFJob即Tensorflow任务,是基于Tensorflow开源框架的kubernetes自定义资源类型,有多种角色可以配置,能够帮助我们更简单地实现Tensorflow的单机或分布式训练。Tensorflow开源框架的信息详见:https://www.tensorflow.org 。
URI
POST /apis/kubeflow.org/v1/namespaces/{namespace}/tfjobs
参数 |
是否必选 |
描述 |
---|---|---|
namespace |
Yes |
object name and auth scope, such as for teams and projects |
参数 |
是否必选 |
描述 |
---|---|---|
pretty |
No |
If 'true’, then the output is pretty printed. |
请求消息
请求参数:
请求参数的详细描述请参见表154。
请求示例:
{ "apiVersion": "kubeflow.org/v1", "kind": "TFJob", "metadata": { "name": "tfjob-test" }, "spec": { "backoffLimit": 6, "tfReplicaSpecs": { "Ps": { "replicas": 1, "template": { "spec": { "containers": [ { "args": [ "python", "/opt/tf-benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", "--batch_size=1", "--model=resnet50", "--variable_update=parameter_server", "--flush_stdout=true", "--num_gpus=1", "--local_parameter_device=cpu", "--device=cpu", "--data_format=NHWC" ], "image": "*.*.*.215:20202/cci/tf-benchmarks-cpu:v1", "name": "tensorflow", "ports": [ { "containerPort": 2222, "name": "tfjob-port" } ], "resources": { "limits": { "cpu": "2", "memory": "4Gi" }, "requests": { "cpu": "2", "memory": "4Gi" } } } ], "restartPolicy": "OnFailure", "imagePullSecrets": [ { "name": "imagepull-secret" } ] } } }, "Worker": { "replicas": 1, "template": { "spec": { "containers": [ { "args": [ "python", "/opt/tf-benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", "--batch_size=1", "--model=resnet50", "--variable_update=parameter_server", "--flush_stdout=true", "--local_parameter_device=cpu", "--device=cpu", "--data_format=NHWC" ], "image": "*.*.*.215:20202/cci/tf-benchmarks-cpu:v1", "name": "tensorflow", "ports": [ { "containerPort": 2222, "name": "tfjob-port" } ], "resources": { "limits": { "cpu": "2", "memory": "4Gi" }, "requests": { "cpu": "2", "memory": "4Gi" } } } ], "restartPolicy": "OnFailure", "imagePullSecrets": [ { "name": "imagepull-secret" } ] } } } } } }
响应消息
响应参数:
响应参数的详细描述请参考表154。
响应示例:
{ "apiVersion": "kubeflow.org/v1", "kind": "TFJob", "metadata": { "creationTimestamp": "2019-07-23T12:39:47Z", "generation": 1, "name": "tfjob-test", "namespace": "kube-test", "resourceVersion": "72050567", "selfLink": "/apis/kubeflow.org/v1/namespaces/kube-test/tfjobs/tfjob-test", "uid": "f461f966-ad46-11e9-aaa4-340a9837e413" }, "spec": { "backoffLimit": 6, "tfReplicaSpecs": { "Ps": { "replicas": 1, "template": { "spec": { "containers": [ { "args": [ "python", "/opt/tf-benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", "--batch_size=1", "--model=resnet50", "--variable_update=parameter_server", "--flush_stdout=true", "--num_gpus=1", "--local_parameter_device=cpu", "--device=cpu", "--data_format=NHWC" ], "image": "*.*.*.215:20202/cci/tf-benchmarks-cpu:v1", "name": "tensorflow", "ports": [ { "containerPort": 2222, "name": "tfjob-port" } ], "resources": { "limits": { "cpu": "2", "memory": "4Gi" }, "requests": { "cpu": "2", "memory": "4Gi" } } } ], "imagePullSecrets": [ { "name": "imagepull-secret" } ], "restartPolicy": "OnFailure" } } }, "Worker": { "replicas": 1, "template": { "spec": { "containers": [ { "args": [ "python", "/opt/tf-benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", "--batch_size=1", "--model=resnet50", "--variable_update=parameter_server", "--flush_stdout=true", "--local_parameter_device=cpu", "--device=cpu", "--data_format=NHWC" ], "image": "*.*.*.215:20202/cci/tf-benchmarks-cpu:v1", "name": "tensorflow", "ports": [ { "containerPort": 2222, "name": "tfjob-port" } ], "resources": { "limits": { "cpu": "2", "memory": "4Gi" }, "requests": { "cpu": "2", "memory": "4Gi" } } } ], "imagePullSecrets": [ { "name": "imagepull-secret" } ], "restartPolicy": "OnFailure" } } } } }, "status": { } }
状态码
状态码 |
描述 |
---|---|
200 |
OK |
201 |
Created |
202 |
Accepted |
401 |
Unauthorized |
400 |
Badrequest |
500 |
Internal error |
403 |
Forbidden |