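Querying Job Details
Use the get command to query the details of a job. The same command can also be used to obtain a job template.
Command Structure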
health get job ID [flags]
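Parameter | Short Form | Mandatory | Description |
---|---|---|---|
ID | None | No | Job ID. |
--detail | -d | No | Used together with ID. Returns detailed information about the job. |
--sample | -s | No | Obtains a job template. The template is in YAML format. |
--limit | -l | No | Maximum number of records returned by the request. The default value is 10. |
--offset | -o | No | Offset, that is, the number of records to skip before the first returned record. The default value is 0. |
--event | -e | No | Obtains job events or the events of a task in the job. Used alone, it returns job events. Used together with --task, it returns the events of the specified task and also prints the list of task instances. |
--log | -g | No | Local path for storing the task log. It must be used together with --task to obtain the log of a task in the job. |
--task | -a | No | Task name. For a concurrent task, the instance with index 0 is obtained by default. To view another instance, use the format --task task-name;instance-index, for example, --task task-1;1. |
--finish-from-time | -x | No | Start of the job completion time range to query, for example, 2006-01-02 15:04:05. |
--finish-to-time | -y | No | End of the job completion time range to query, for example, 2006-01-02 15:04:05. |
--create-from-time | -c | No | Start of the job creation time range to query, for example, --create-from-time="2006-01-02 15:04:05". |
--create-to-time | -m | No | End of the job creation time range to query, for example, --create-to-time="2006-01-02 15:04:05". |
--labels | -k | No | List of job labels, separated by commas (,), for example, "a,b". |
--status | -q | No | Job status, used to filter the job list. Valid values: Succeeded, Running, Pending, Failed, Cancelling, Cancelled, and Unknown. |
--workflow-name | -t | No | Workflow name. |
--user-name | -u | No | User name. |
--job-name | -j | No | Job name. |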
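
The parameters in the table map directly onto a command line, so a script can assemble the call instead of concatenating strings by hand. The following is a minimal Python sketch and is not part of eihealth-toolkit: `build_get_job_args` and `record_window` are hypothetical helper names introduced here for illustration. It assumes only what the table states: times are written as `YYYY-MM-DD HH:MM:SS` (the layout of the `2006-01-02 15:04:05` examples), labels are joined with commas, and an instance of a concurrent task is selected as `name;index`.

```python
from datetime import datetime
from typing import List, Optional, Sequence, Tuple

# Matches the "2006-01-02 15:04:05" examples in the parameter table.
TIME_LAYOUT = "%Y-%m-%d %H:%M:%S"


def build_get_job_args(
    job_id: Optional[str] = None,
    *,
    status: Optional[str] = None,
    labels: Sequence[str] = (),
    create_from: Optional[datetime] = None,
    create_to: Optional[datetime] = None,
    task: Optional[str] = None,
    task_index: Optional[int] = None,
    event: bool = False,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> List[str]:
    """Return an argv list for `health get job` (hypothetical helper)."""
    args = ["health", "get", "job"]
    if job_id:
        args.append(job_id)
    if status:
        args += ["--status", status]
    if labels:
        args += ["--labels", ",".join(labels)]
    if create_from:
        args += ["--create-from-time", create_from.strftime(TIME_LAYOUT)]
    if create_to:
        args += ["--create-to-time", create_to.strftime(TIME_LAYOUT)]
    if task:
        # "name;index" selects one instance of a concurrent task.
        # Without an index, the table says instance 0 is used.
        selector = task if task_index is None else f"{task};{task_index}"
        args += ["--task", selector]
    if event:
        args.append("--event")
    if limit is not None:
        args += ["--limit", str(limit)]
    if offset is not None:
        args += ["--offset", str(offset)]
    return args


def record_window(limit: int, offset: int = 0) -> Tuple[int, int]:
    """Return the 1-based first and last record numbers for --limit/--offset."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    return offset + 1, offset + limit


if __name__ == "__main__":
    print(build_get_job_args(status="Failed", labels=["a", "b"], limit=10, offset=3))
    print(record_window(10, 3))  # records 4 to 13
```

The list can be passed to `subprocess.run` without `shell=True`, which sidesteps quoting. This matters for values such as `task-1;1`: when the command is typed into a POSIX shell, the value must be quoted because `;` separates commands there. `record_window` takes the limit explicitly rather than assuming a default.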
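Command Examples
This section uses Windows as an example to describe how to use eihealth-toolkit. The usage on Linux and macOS is basically the same.
- Run the health get job -s command to obtain a template. For details about the template and how to use it, see "Obtaining a Job Template".
- Obtain job details, displayed in template format.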

      health get job 000c6057-cc6c-11ed-bbec-fa163ef30f89
      job:
        id: 000c6057-cc6c-11ed-bbec-fa163ef30f89
        name: job-7402
        description: ""
        priority: 0
        timeout: 1440
        output_dir: /job-7402-de91a3e0-076c-4327-a41c-8e88c7aec6ae
        workflow_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
        io_acc_id: ""
        node_labels: []
        tasks:
          - task_name: task-1-test-echo
            inputs: []
            resources:
              cpu: 0.1C
              memory: 0.1G
              gpu: "0"
        tool_type: workflow
        tool_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
        labels: []
- Obtain job details, displayed in JSON format.

      health get job f17a3542-3f7c-11eb-868a-fa163e3ddba1 --detail
      {
        "jobs": [{
          "id": "2",
          "name": "zx-1030-mkdir",
          "description": "测试文件创建",
          "priority": 0,
          "timeout": 1440,
          "output_dir": "",
          "status": "SUCCEEDED",
          "create_time": "2021-01-20T03:38:14Z",
          "finish_time": "2021-01-20T03:43:23Z",
          "tool_info": {
            "tool_id": "",
            "tool_name": "",
            "tool_version": "",
            "tool_type": ""
          },
          "tasks": [{
            "task_name": "task0",
            "display_name": "",
            "output_dir": "",
            "whole_output_dir": "",
            "resources": {
              "cpu": "0.1C",
              "memory": "0.1G",
              "gpu_type": "",
              "gpu": "0"
            },
            "inputs": [{
                "name": "in-dir",
                "values": [
                  "ei_eihealth_x00356764_02:/zx-1030/"
                ]
              },
              {
                "name": "in-str",
                "values": [
                  "mkdir1030"
                ]
              }
            ],
            "app_info": {
              "app_id": "2",
              "app_name": "zx-1030-mkdir",
              "app_version": "1.0.0",
              "app_src_project_name": "",
              "app_labels": [],
              "app_summary": "",
              "app_description": "",
              "app_image": "ei_eihealth_x00356764_02/modelarts-base-cpu-py3:custom-2.0.2",
              "app_commands": [
                "mkdir ${in-dir}${in-str}"
              ],
              "app_input_parameters": [{
                  "name": "in-dir",
                  "pattern": "",
                  "type": "DIRECTORY",
                  "required": true,
                  "description": ""
                },
                {
                  "name": "in-str",
                  "pattern": "",
                  "type": "STRING",
                  "required": true,
                  "description": ""
                }
              ],
              "app_output_parameters": []
            }
          }],
          "task_runtime_info": [{
            "task_name": "task0",
            "status": "SUCCEEDED",
            "create_time": "2021-01-20 11:38:22",
            "finish_time": "2021-01-20 11:43:22",
            "run_time": "5m0s"
          }],
          "dag": {
            "task0": {}
          },
          "io_acc_expected_usage": 10,
          "io_acc_info": {
            "id": "35673038-d57b-4dab-942a-72cf3e11e7df",
            "type": "IO_PERFORMANCE_BANDWIDTH",
            "space": 500,
            "free_space": 500.0
          }
        }],
        "count": 1
      }
- Obtain the job list.

      health get job    # With no parameters, 100 records are returned by default.
      job_id                                job_name             tool_name          tool_version  tool_type  status     user_name       create_time          finish_time          labels
      4b682e15-ab92-11ee-a057-fa163ef319da  cli-demo-job         cp-test            2.0.0         workflow   PENDING    wwx-test-admin  2024-01-05 14:18:51  --
      e7e55c6e-aaf6-11ee-a057-fa163ef319da  cli-demo-job-import  cli-demo-workflow  4.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:46:32  2024-01-04 19:47:50
      aee9e91a-aaf6-11ee-a057-fa163ef319da  job-6685             cli-demo-workflow  4.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:44:56  2024-01-04 19:45:50
      58a8f13b-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   FAILED     wwx-test-admin  2024-01-04 19:21:03  2024-01-04 19:23:54
      35ff73b3-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:20:05  2024-01-04 19:24:52
      24b72eee-aaf3-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:19:36  2024-01-04 19:25:10
      4ccef1fb-aaf2-11ee-a057-fa163ef319da  job                  cp-test            2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-04 19:13:34  2024-01-04 19:17:34

      health get job -j cli-demo-job
      job_id                                job_name      tool_name  tool_version  tool_type  status     user_name       create_time          finish_time          labels
      70f1baa8-ab96-11ee-a057-fa163ef319da  cli-demo-job  cp-test    2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-05 14:48:32  2024-01-05 14:55:13
      6c6098f0-ab96-11ee-a057-fa163ef319da  cli-demo-job  cp-test    2.0.0         workflow   SUCCEEDED  wwx-test-admin  2024-01-05 14:48:24  2024-01-05 14:54:25

      health get job -l 3          # Same as: health get job -l 3 -o 0
      # Lists basic information about the jobs in the current project.
      # Returns 3 records, that is, records 1 to 3.

      health get job -o 10         # Same as: health get job -l 100 -o 10
      # Lists basic information about the jobs in the current project.
      # Returns 100 records, that is, records 11 to 110.

      health get job -l 10 -o 3
      # Lists basic information about the jobs in the current project.
      # Skips 3 records and returns 10 records starting from the 4th, that is, records 4 to 13.
- Obtain job events.

      health get job 550e8400-e29b-41d4-a716-446655440000 --event
      ------------------------------------------------------------------------------------------------------------------------
      成功关联执行器    2024-01-05 14:18:51
      ------------------------------------------------------------------------------------------------------------------------
      执行 create, 共计 1 个子任务    2024-01-05 14:18:51
      ------------------------------------------------------------------------------------------------------------------------
      执行 create, 共计 1 个子任务    2024-01-05 14:18:51
      ------------------------------------------------------------------------------------------------------------------------
      创建k8s Job对象 task-3-two-cp-0-bd5e1f7dac10005f 成功.    2024-01-05 14:18:51
      ------------------------------------------------------------------------------------------------------------------------
      等待任务 task-3-two-cp-0-bd5e1f7dac10005f 执行完成    2024-01-05 14:18:56
      ------------------------------------------------------------------------------------------------------------------------
      元素(task-3-two-cp-0)第1次重试执行(create),当前异常:Failed to wait the Job(task-3-two-cp-0-bd5e1f7dac10005f) has desiredReplicas: the pod list of job:task-3-two-cp-0-bd5e1f7dac10005f is empty .    2024-01-05 14:18:51
      ------------------------------------------------------------------------------------------------------------------------
      创建k8s Job对象 task-2-cp-dir-0-bd5e1f7dac10005f 成功.    2024-01-05 14:18:56
      ------------------------------------------------------------------------------------------------------------------------
- Obtain the events of a specific task in a job.

      health get job 550e8400-e29b-41d4-a716-446655440000 --event --task task-lmx-job-1
      Task event list:
      Status            Times  Type    Details                                              First Report Time    Last Report Time
      SuccessfulCreate  1      Normal  Created pod: task-1-rename-0-1b840133ac100049-hkppv  2022-05-24 18:04:55  2022-05-24 18:04:55
      JobIsComplete     1      Normal  Pod exits with success, the job is complete          2022-05-24 18:07:09  2022-05-24 18:07:09
      Task instances list:
      Name                                    Status     PodIP        Node            RestartCount  Request/Limit(CPU)  Request/Limit(Memory)  CreateTime
      task-1-rename-0-1b840133ac100049-hkppv  Succeeded  172.16.1.20  192.168.125.40  0             /                   /                      2022-05-24T10:04:55Z
- Obtain the events of an instance of a concurrent task.

      health get job c5b3d272-f398-11ec-845a-fa163ef3fac0 --task task-1-test-bingfasmial;1 --event
      Task event list:
      Status            Times  Type    Details                                                        First Report Time    Last Report Time
      SuccessfulCreate  1      Normal  Created pod: task-1-test-bingfasmial-1-59620029ac100038-jkdpt  2022-06-24 16:37:20  2022-06-24 16:37:20
      JobIsComplete     1      Normal  Pod exits with success, the job is complete                    2022-06-24 16:37:23  2022-06-24 16:37:23
      Task instances list:
      Name                                              PodIP        Node            RestartCount  Request/Limit(CPU)  Request/Limit(Memory)  CreateTime
      task-1-test-bingfasmial-1-59620029ac100038-jkdpt  172.16.3.37  192.168.54.255  0             1/1                 1G/1G                  2022-06-24 16:37:20
- Obtain the log of a specific task in a job.

      health get job 550e8400-e29b-41d4-a716-446655440000 --log ./test/demo.log --task task-xxx-job-1
      download the log of task task-lmx-job-1 successfully!
- Obtain the job list filtered by multiple conditions.

      health get job --status Failed --user-name ei_eihealth --create-from-time "2022-12-15 00:40:11" --create-to-time "2022-12-17 00:40:11" --finish-from-time "2022-12-14 17:05:09" --finish-to-time "2022-12-19 23:04:07" --labels "label1,lab_el-A" --job-name h-err-1 --workflow-name herr --limit 1 --offset 1
      job_id                                job_name     tool_name  tool_version  tool_type  status  user_name                 create_time          finish_time          labels
      8a6078d9-c307-11ed-a824-fa163e504fdd  job-4127-01  new-01     wewe          workflow   FAILED  ei_eihealth_h00541446_01  2023-03-15 16:01:07  2023-03-15 16:02:51  label1,lab_el-A
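
Because --detail prints JSON (see the JSON example earlier in this section), its output can be post-processed by a script. The following Python sketch is illustrative only: `summarize_tasks` is a hypothetical helper, and it assumes the captured output is exactly one JSON document with the `jobs` and `task_runtime_info` fields shown in that example. Other toolkit versions may print different fields.

```python
import json
from typing import Dict, List


def summarize_tasks(detail_json: str) -> List[Dict[str, str]]:
    """Collect per-task status and run time from `--detail` output (hypothetical helper)."""
    doc = json.loads(detail_json)
    rows = []
    for job in doc.get("jobs", []):
        for info in job.get("task_runtime_info", []):
            rows.append({
                "job": job.get("name", ""),
                "task": info.get("task_name", ""),
                "status": info.get("status", ""),
                "run_time": info.get("run_time", ""),
            })
    return rows


# Trimmed copy of the --detail example in this section.
SAMPLE = """
{
  "jobs": [{
    "id": "2",
    "name": "zx-1030-mkdir",
    "status": "SUCCEEDED",
    "task_runtime_info": [{
      "task_name": "task0",
      "status": "SUCCEEDED",
      "create_time": "2021-01-20 11:38:22",
      "finish_time": "2021-01-20 11:43:22",
      "run_time": "5m0s"
    }]
  }],
  "count": 1
}
"""

if __name__ == "__main__":
    for row in summarize_tasks(SAMPLE):
        print(row["job"], row["task"], row["status"], row["run_time"])
```

Using `.get` with defaults keeps the sketch from failing when a field is absent, at the cost of silently printing empty values. A stricter script could index the fields directly so that a changed output format raises an error instead.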