文档首页/ 数据仓库服务 GaussDB(DWS)/ 性能白皮书/ 测试方法/ 执行查询与结果收集/ 通过编写shell脚本自动化执行查询和结果收集
更新时间:2024-10-25 GMT+08:00

通过编写shell脚本自动化执行查询和结果收集

  1. 登录ECS,进入到/opt目录下,使用vim命令生成query.conf和run_query.sh两个脚本文件。脚本内容如下,编辑后按:wq!保存脚本配置。

    run_query.sh脚本如下:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    88
    89
    90
    91
    92
    93
    94
    95
    96
    #!/bin/bash
    script_path=$(cd `dirname $0`;pwd)
    query_mode=$1
    query_path=$2
    query_object=$3
    query_log=${script_path}/query_log_`date +%y%m%d_%H%M%S`
    source ${script_path}/query.conf
    
    function usage()
    {
        echo "[NOTICE]: This script is used to run queries and collect cost time, according to sepcified path and query file name."
        echo "          You can run the script as below:" 
        echo -e ""
        echo "          1. config the query.conf file."
        echo "          2. run the script in batch mode. "
        echo "          eg. sh run_query.sh batch [query file's absolute path]"
        echo -e ""
        echo "          3. run the script in single mode."
        echo "          eg. sh run_query.sh single [query file's absolute path] [specified query file name]"
    }
    
    function log_file_init()
    {
        mkdir -p ${query_log}/explain_log
        mkdir -p ${query_log}/pre_warm_log
        mkdir -p ${query_log}/query_test_log
        touch ${query_log}/query_result.csv
        echo "query name,cost time1,cost time2,cost time3,average cost" > ${query_log}/query_result.csv
    }
    
    function single_query()
    {
        echo "[INFO]: Single mode query is to start."
        echo "*****${query_object} begin*****"
        collect_plan
        pre_warm
        query_test
        echo "*****${query_object} end*****"
        echo "[INFO]: Single mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
    function batch_query()
    {
        echo "[INFO]: Batch mode query is to start."
        for query_object in `ls ${query_path}`
        do
            echo "*****${query_object} begin*****"
            collect_plan
            pre_warm
            query_test
            echo "*****${query_object} end*****"
        done
        echo "[INFO]: Batch mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
    function collect_plan()
    {
        echo "[STEP1]: Starting to collect plan."
        echo "explain performance" > ${query_log}/explain_log/${query_object}.tmp
        cat ${query_path}/${query_object} >> ${query_log}/explain_log/${query_object}.tmp
        gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_log}/explain_log/${query_object}.tmp > ${query_log}/explain_log/${query_object}.explain 2>&1
        echo "[STEP1]: Finished."
    }
    
    function pre_warm()
    {
        echo "[STEP2]: Starting to pre-warm."
        for i in {1..2}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/pre_warm_log/${query_object}.pre${i} 2>&1
        done
        echo "[STEP2]: Finished."
    }
    
    function query_test()
    {
        time1=""
        time2=""
        time3=""
        echo "[STEP3]: Starting to do real test."
        for i in {1..3}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/query_test_log/${query_object}.real${i} 2>&1
            let "`eval echo "time"${i}`=`cat ${query_log}/query_test_log/${query_object}.real${i}|grep "total time:"|awk {'print$3'}`"
        done
        time_ave=`echo "scale=2;(${time1}+${time2}+${time3})/3"|bc -l`
        echo "${query_object},${time1},${time2},${time3},${time_ave}" >> ${query_log}/query_result.csv
        echo "[step3]: Finished. The average time:${time_ave} ms."
    }
    case ${query_mode} in
        single)log_file_init;single_query;;
        batch)log_file_init;batch_query;;
        *)usage;;
    esac
    

    query.conf为集群信息配置文件,包含如下五个变量:

    1
    2
    3
    4
    5
    cluster_ip=127.0.0.1                集群主cn节点内网ip 
    db_name=tpcds_test                  数据库名称
    db_port=6000                        数据库端口号
    db_user=tpcds_user                  数据库用户
    user_pwd=123456                     数据库用户密码
    

  2. 编辑query.conf为集群对应的信息后,先执行source gsql_env命令配置变量后,执行sh run_query.sh即可开始查询执行和结果收集。

    示例:

    sh run_query.sh batch query1000x/

    参数1:批量执行选择batch,单个query执行选择single。

    参数2:tpcds1000x或者tpch1000x query存放的绝对路径。

    参数3:如果参数1选择batch,此参数忽略;如果参数1选择single,此参数为具体执行的query名称,例如Q1。
    • gsql客户端的使用需要每次连接后,都要执行source gsql_env导入变量命令,执行查询脚本前请确认gsql可执行。
    • query1000x文件夹中必须包含tpch 22个标准查询文件Q1~Q22或者tpcds 99个标准查询文件Q1~Q99。
    • run_query.sh脚本依赖bc命令,执行前确认bc命令存在。
    • 每个查询默认会跑6次,第1次收集执行计划,第2,3次预热,第4到第6次正式查询,最终结果取后3次查询的平均值。
    • 查询脚本执行后会立即生成query_log_yymmdd_hhmmss名称的目录,其中:
      • explain_log子目录存放查询计划。
      • pre_warm子目录存放预热执行结果。
      • query_test子目录存放正式查询执行结果。
      • query_result.csv文件,csv格式汇总所有查询的执行结果。