Updated: 2024-05-14 GMT+08:00

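Querying Job Details

Use the get command to query the details of a job. The same command can also be used to obtain a job template.

Command Structure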

health get job ID [flags]
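Table 1 Parameters

| Parameter | Short form | Mandatory | Description |
|---|---|---|---|
| ID | | | If omitted, all jobs in the current project are listed. If a job ID is specified, the information for that job is shown; multiple job IDs can be specified at once. Without --detail, basic job information is shown in YAML format. With --detail, detailed job information is shown in JSON format. |
| --detail | -d | | Used together with ID to return the detailed information of a job. |
| --sample | -s | | Obtains a job template. The template is in YAML format. |
| --limit | -l | | Maximum number of records returned by a single request (default: 10). |
| --offset | -o | | Offset from which the query starts, that is, the number of records to skip. Default: 0. |
| --event | -e | | Obtains the events of a job or of a task in the job. Used alone, it returns the job events. Used with --task, it returns the events of the specified task together with the list of task instances. |
| --log | -g | | Local path where the task log is saved. Must be used with --task to obtain the log of a task in the job. |
| --task | -a | | Task name. For a concurrent task, the instance with index 0 is used by default. To view another instance, use the format --task <task name>;<instance index>, for example --task task-1;1. |
| --finish-from-time | -x | | Start of the job completion time range. Example: 2006-01-02 15:04:05. |
| --finish-to-time | -y | | End of the job completion time range. Example: 2006-01-02 15:04:05. |
| --create-from-time | -c | | Start of the job creation time range. Example: --create-from-time="2006-01-02 15:04:05". |
| --create-to-time | -m | | End of the job creation time range. Example: --create-to-time="2006-01-02 15:04:05". |
| --labels | -k | | List of job labels, separated by commas (,). Example: "a,b". |
| --status | -q | | Job status, used when listing jobs. Values: Succeeded, Running, Pending, Failed, Cancelling, Cancelled, Unknown. |
| --workflow-name | -t | | Workflow name. |
| --user-name | -u | | User name. |
| --job-name | -j | | Job name. |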
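
The filter flags in Table 1 can be combined in a single call. The following Python sketch shows one way a wrapper script might assemble and check the argument list before invoking the CLI. The helper names build_get_job_args and format_task_spec are hypothetical and are not part of eihealth-toolkit; only the flag names, the status values, and the time layout are taken from the table. The sketch builds the argument list and does not run the command.

```python
from datetime import datetime

# Status values listed for --status in Table 1.
VALID_STATUSES = {
    "Succeeded", "Running", "Pending", "Failed",
    "Cancelling", "Cancelled", "Unknown",
}

# Python equivalent of the documented time layout "2006-01-02 15:04:05".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_task_spec(task_name, index=None):
    """Return the value for --task; "name;index" selects one concurrent instance."""
    if index is None:
        return task_name
    if index < 0:
        raise ValueError("instance index must be non-negative")
    return f"{task_name};{index}"


def build_get_job_args(status=None, labels=None, create_from=None,
                       create_to=None, limit=None, offset=None):
    """Hypothetical helper: assemble an argument list for `health get job`."""
    args = ["health", "get", "job"]
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unsupported status: {status}")
        args += ["--status", status]
    if labels:
        # Labels are passed as one comma-separated value.
        args += ["--labels", ",".join(labels)]
    for flag, value in (("--create-from-time", create_from),
                        ("--create-to-time", create_to)):
        if value is not None:
            # strptime raises ValueError if the value does not match the layout.
            datetime.strptime(value, TIME_FORMAT)
            args += [flag, value]
    if limit is not None:
        args += ["--limit", str(limit)]
    if offset is not None:
        args += ["--offset", str(offset)]
    return args
```

On a machine where the health executable is installed and configured, the returned list could be passed to subprocess.run. Passing a list rather than a single command string avoids shell quoting issues with values that contain spaces or a semicolon, such as the time values or a --task value like task-1;1.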

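Command Examples

This section uses Windows as an example to describe how to use eihealth-toolkit. Usage on Linux and macOS is essentially the same.

  • Run health get job -s to obtain a job template. For details about the template and how to use it, see Obtaining a Job Template.
  • Get job details, displayed in YAML format.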
    health get job 000c6057-cc6c-11ed-bbec-fa163ef30f89
    job:
      id: 000c6057-cc6c-11ed-bbec-fa163ef30f89
      name: job-7402
      description: ""
      priority: 0
      timeout: 1440
      output_dir: /job-7402-de91a3e0-076c-4327-a41c-8e88c7aec6ae
      workflow_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
      io_acc_id: ""
      node_labels: []
      tasks:
      - task_name: task-1-test-echo
        inputs: []
        resources:
          cpu: 0.1C
          memory: 0.1G
          gpu: "0"
      tool_type: workflow
      tool_id: f1af14bb-cc69-11ed-bbec-fa163ef30f89
      labels: []
    
  • Get job details, displayed in JSON format.
    health get job f17a3542-3f7c-11eb-868a-fa163e3ddba1 --detail
    
    {
    
    	"jobs": [{
    		"id": "2",
    		"name": "zx-1030-mkdir",
    		"description": "测试文件创建",
    		"priority": 0,
    		"timeout": 1440,
    		"output_dir": "",
    		"status": "SUCCEEDED",
    		"create_time": "2021-01-20T03:38:14Z",
    		"finish_time": "2021-01-20T03:43:23Z",
    		"tool_info": {
    			"tool_id": "",
    			"tool_name": "",
    			"tool_version": "",
    			"tool_type": ""
    		},
    		"tasks": [{
    			"task_name": "task0",
    			"display_name": "",
    			"output_dir": "",
    			"whole_output_dir": "",
    			"resources": {
    				"cpu": "0.1C",
    				"memory": "0.1G",
    				"gpu_type": "",
    				"gpu": "0"
    			},
    			"inputs": [{
    					"name": "in-dir",
    					"values": [
    						"ei_eihealth_x00356764_02:/zx-1030/"
    					]
    				},
    				{
    					"name": "in-str",
    					"values": [
    						"mkdir1030"
    					]
    				}
    			],
    			"app_info": {
    				"app_id": "2",
    				"app_name": "zx-1030-mkdir",
    				"app_version": "1.0.0",
    				"app_src_project_name": "",
    				"app_labels": [],
    				"app_summary": "",
    				"app_description": "",
    				"app_image": "ei_eihealth_x00356764_02/modelarts-base-cpu-py3:custom-2.0.2",
    				"app_commands": [
    					"mkdir ${in-dir}${in-str}"
    				],
    				"app_input_parameters": [{
    						"name": "in-dir",
    						"pattern": "",
    						"type": "DIRECTORY",
    						"required": true,
    						"description": ""
    					},
    					{
    						"name": "in-str",
    						"pattern": "",
    						"type": "STRING",
    						"required": true,
    						"description": ""
    					}
    				],
    				"app_output_parameters": []
    			}
    		}],
    		"task_runtime_info": [{
    			"task_name": "task0",
    			"status": "SUCCEEDED",
    			"create_time": "2021-01-20 11:38:22",
    			"finish_time": "2021-01-20 11:43:22",
    			"run_time": "5m0s"
    		}],
    		"dag": {
    			"task0": {}
    		},
    		"io_acc_expected_usage": 10,
    		"io_acc_info": {
    			"id": "35673038-d57b-4dab-942a-72cf3e11e7df",
    			"type": "IO_PERFORMANCE_BANDWIDTH",
    			"space": 500,
    			"free_space": 500.0
    		}
    	}],
    	"count": 1
    }
  • Get the job list.
    health get job    # With no parameters, 100 records are returned by default
    job_id                                    job_name                  tool_name              tool_version  tool_type     status         user_name           create_time           finish_time           labels
    4b682e15-ab92-11ee-a057-fa163ef319da      cli-demo-job              cp-test                2.0.0         workflow      PENDING        wwx-test-admin      2024-01-05 14:18:51   --
    e7e55c6e-aaf6-11ee-a057-fa163ef319da      cli-demo-job-import       cli-demo-workflow      4.0.0         workflow      FAILED         wwx-test-admin      2024-01-04 19:46:32   2024-01-04 19:47:50
    aee9e91a-aaf6-11ee-a057-fa163ef319da      job-6685                  cli-demo-workflow      4.0.0         workflow      FAILED         wwx-test-admin      2024-01-04 19:44:56   2024-01-04 19:45:50
    58a8f13b-aaf3-11ee-a057-fa163ef319da      job                       cp-test                2.0.0         workflow      FAILED         wwx-test-admin      2024-01-04 19:21:03   2024-01-04 19:23:54
    35ff73b3-aaf3-11ee-a057-fa163ef319da      job                       cp-test                2.0.0         workflow      SUCCEEDED      wwx-test-admin      2024-01-04 19:20:05   2024-01-04 19:24:52
    24b72eee-aaf3-11ee-a057-fa163ef319da      job                       cp-test                2.0.0         workflow      SUCCEEDED      wwx-test-admin      2024-01-04 19:19:36   2024-01-04 19:25:10
    4ccef1fb-aaf2-11ee-a057-fa163ef319da      job                       cp-test                2.0.0         workflow      SUCCEEDED      wwx-test-admin      2024-01-04 19:13:34   2024-01-04 19:17:34
    health get job -j cli-demo-job
    job_id                                    job_name               tool_name            tool_version  tool_type     status         user_name           create_time           finish_time           labels
    70f1baa8-ab96-11ee-a057-fa163ef319da      cli-demo-job           cp-test              2.0.0         workflow      SUCCEEDED      wwx-test-admin      2024-01-05 14:48:32   2024-01-05 14:55:13
    6c6098f0-ab96-11ee-a057-fa163ef319da      cli-demo-job           cp-test              2.0.0         workflow      SUCCEEDED      wwx-test-admin      2024-01-05 14:48:24   2024-01-05 14:54:25
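    health get job -l 3
    # Equivalent to: health get job -l 3 -o 0
    # Lists basic information about the jobs in the current project.
    # Returns 3 records, that is, records 1 to 3.
    
    health get job -o 10
    # Equivalent to: health get job -l 100 -o 10
    # Lists basic information about the jobs in the current project.
    # Returns 100 records, that is, records 11 to 110.
    
    health get job -l 10 -o 3
    # Lists basic information about the jobs in the current project.
    # Skips 3 records and returns 10 records starting from the 4th, that is, records 4 to 13.
  • Get job events.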
    health get job 550e8400-e29b-41d4-a716-446655440000 --event
    
    ------------------------------------------------------------------------------------------------------------------------
    成功关联执行器
    2024-01-05 14:18:51
    ------------------------------------------------------------------------------------------------------------------------
    执行 create, 共计 1 个子任务
    2024-01-05 14:18:51
    ------------------------------------------------------------------------------------------------------------------------
    执行 create, 共计 1 个子任务
    2024-01-05 14:18:51
    ------------------------------------------------------------------------------------------------------------------------
    创建k8s Job对象 task-3-two-cp-0-bd5e1f7dac10005f 成功.
    2024-01-05 14:18:51
    ------------------------------------------------------------------------------------------------------------------------
    等待任务 task-3-two-cp-0-bd5e1f7dac10005f 执行完成
    2024-01-05 14:18:56
    ------------------------------------------------------------------------------------------------------------------------
    元素(task-3-two-cp-0)第1次重试执行(create),当前异常:Failed to wait the Job(task-3-two-cp-0-bd5e1f7dac10005f) has desiredReplicas: the pod list of job:task-3-two-cp-0-bd5e1f7dac10005f is empty .
    2024-01-05 14:18:51
    ------------------------------------------------------------------------------------------------------------------------
    创建k8s Job对象 task-2-cp-dir-0-bd5e1f7dac10005f 成功.
    2024-01-05 14:18:56
    ------------------------------------------------------------------------------------------------------------------------
  • Get the events of a specific task in a job.
    health get job 550e8400-e29b-41d4-a716-446655440000 --event --task task-lmx-job-1
    
    Task event list:
    Status              Times  Type        Details                                                First Report Time       Last Report Time
    SuccessfulCreate    1      Normal      Created pod: task-1-rename-0-1b840133ac100049-hkppv    2022-05-24 18:04:55     2022-05-24 18:04:55
    JobIsComplete       1      Normal      Pod exits with success, the job is complete            2022-05-24 18:07:09     2022-05-24 18:07:09
    
    Task instances list:
    Name                                            Status          PodIP           Node                    RestartCount    Request/Limit(CPU)      Request/Limit(Memory)   CreateTime
    task-1-rename-0-1b840133ac100049-hkppv          Succeeded       172.16.1.20     192.168.125.40          0               /                       /                       2022-05-24T10:04:55Z
  • Get the events of one instance of a concurrent task.
    health get job c5b3d272-f398-11ec-845a-fa163ef3fac0 --task task-1-test-bingfasmial;1 --event
    
    Task event list:
    Status              Times  Type        Details                                                          First Report Time       Last Report Time
    SuccessfulCreate    1      Normal      Created pod: task-1-test-bingfasmial-1-59620029ac100038-jkdpt    2022-06-24 16:37:20     2022-06-24 16:37:20
    JobIsComplete       1      Normal      Pod exits with success, the job is complete                      2022-06-24 16:37:23     2022-06-24 16:37:23
    
    Task instances list:
    Name                                                    PodIP           Node                    RestartCount    Request/Limit(CPU)      Request/Limit(Memory)   CreateTime
    task-1-test-bingfasmial-1-59620029ac100038-jkdpt        172.16.3.37     192.168.54.255          0               1/1                     1G/1G                   2022-06-24 16:37:20
  • Get the log of a specific task in a job.
    health get job 550e8400-e29b-41d4-a716-446655440000 --log ./test/demo.log --task task-lmx-job-1
    download the log of task task-lmx-job-1 successfully!
  • Get the job list, filtered by multiple conditions.
    health get job --status Failed --user-name ei_eihealth --create-from-time "2022-12-15 00:40:11" --create-to-time "2022-12-17 00:40:11" --finish-from-time "2022-12-14 17:05:09" --finish-to-time "2022-12-19 23:04:07" --labels "label1,lab_el-A" --job-name h-err-1 --workflow-name herr --limit 1 --offset 1
    
    job_id                                  job_name       tool_name  tool_version  tool_type   status    user_name                   create_time           finish_time           labels
    8a6078d9-c307-11ed-a824-fa163e504fdd    job-4127-01    new-01     wewe          workflow    FAILED    ei_eihealth_h00541446_01    2023-03-15 16:01:07   2023-03-15 16:02:51   label1,lab_el-A
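
The --limit and --offset examples above follow simple arithmetic: skip offset records, then return up to limit records. The following Python sketch shows that arithmetic and how it might be used to page through a longer job list. The helper names record_range and page_offsets are hypothetical and are not part of eihealth-toolkit.

```python
def record_range(limit, offset=0):
    """Return the 1-based (first, last) record positions one request covers."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    return offset + 1, offset + limit


def page_offsets(total, limit):
    """Return the --offset values needed to walk `total` records page by page."""
    if limit < 1 or total < 0:
        raise ValueError("limit must be >= 1 and total must be >= 0")
    return list(range(0, total, limit))
```

For example, record_range(10, 3) returns (4, 13), which matches health get job -l 10 -o 3 above, and page_offsets(25, 10) returns [0, 10, 20], the offsets a script would pass to three successive calls with -l 10.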
