
Using Shell Scripts to Automatically Execute Queries and Collect Results

  1. Log in to the ECS, go to the /opt directory, and run the vim command to create two script files: query.conf and run_query.sh. The script content is as follows. After editing, enter :wq! to save the files and exit.

    run_query.sh script:

    #!/bin/bash
    script_path=$(cd "$(dirname "$0")"; pwd)
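    # Positional arguments: $1 = run mode (single|batch), $2 = directory holding
    # the query files, $3 = a single query file name (single mode only).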
    query_mode=$1
    query_path=$2
    query_object=$3
    query_log=${script_path}/query_log_`date +%y%m%d_%H%M%S`
    source ${script_path}/query.conf
    
    function usage()
    {
        echo "[NOTICE]: This script is used to run queries and collect cost time, according to sepcified path and query file name."
        echo "          You can run the script as below:" 
        echo -e ""
        echo "          1. config the query.conf file."
        echo "          2. run the script in batch mode. "
        echo "          eg. sh run_query.sh batch [query file's absolute path]"
        echo -e ""
        echo "          3. run the script in single mode."
        echo "          eg. sh run_query.sh single [query file's absolute path] [specified query file name]"
    }
    
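    # Create the per-run log directory tree and the result CSV with its header row.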
    function log_file_init()
    {
        mkdir -p ${query_log}/explain_log
        mkdir -p ${query_log}/pre_warm_log
        mkdir -p ${query_log}/query_test_log
        touch ${query_log}/query_result.csv
        echo "query name,cost time1,cost time2,cost time3,average cost" > ${query_log}/query_result.csv
    }
    
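    # Run one specified query end to end: plan collection, warm-up, then timed runs.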
    function single_query()
    {
        echo "[INFO]: Single mode query is to start."
        echo "*****${query_object} begin*****"
        collect_plan
        pre_warm
        query_test
        echo "*****${query_object} end*****"
        echo "[INFO]: Single mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
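    # Run every query file found in ${query_path} end to end.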
    function batch_query()
    {
        echo "[INFO]: Batch mode query is to start."
        for query_object in `ls ${query_path}`
        do
            echo "*****${query_object} begin*****"
            collect_plan
            pre_warm
            query_test
            echo "*****${query_object} end*****"
        done
        echo "[INFO]: Batch mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
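    # STEP1: prepend "explain performance" to the query and save the execution plan.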
    function collect_plan()
    {
        echo "[STEP1]: Starting to collect plan."
        echo "explain performance" > ${query_log}/explain_log/${query_object}.tmp
        cat ${query_path}/${query_object} >> ${query_log}/explain_log/${query_object}.tmp
        gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_log}/explain_log/${query_object}.tmp > ${query_log}/explain_log/${query_object}.explain 2>&1
        echo "[STEP1]: Finished."
    }
    
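    # STEP2: execute the query twice to warm up caches before the timed runs.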
    function pre_warm()
    {
        echo "[STEP2]: Starting to pre-warm."
        for i in {1..2}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/pre_warm_log/${query_object}.pre${i} 2>&1
        done
        echo "[STEP2]: Finished."
    }
    
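    # STEP3: run the query three times, extract "total time" from the gsql output,
    # and append the three timings plus their average to query_result.csv.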
    function query_test()
    {
        time1=""
        time2=""
        time3=""
        echo "[STEP3]: Starting to do real test."
        for i in {1..3}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/query_test_log/${query_object}.real${i} 2>&1
            eval "time${i}=$(grep 'total time:' ${query_log}/query_test_log/${query_object}.real${i} | awk '{print $3}')"
        done
        time_ave=`echo "scale=2;(${time1}+${time2}+${time3})/3"|bc -l`
        echo "${query_object},${time1},${time2},${time3},${time_ave}" >> ${query_log}/query_result.csv
        echo "[step3]: Finished. The average time:${time_ave} ms."
    }
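
    # Dispatch on the run mode passed as the first argument; print usage otherwise.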
    case ${query_mode} in
        single)log_file_init;single_query;;
        batch)log_file_init;batch_query;;
        *)usage;;
    esac
    

    query.conf is the cluster information configuration file, which contains the following variables:

    cluster_ip=127.0.0.1    # private network IP address of the cluster's primary Coordinator node
    db_name=tpcds_test      # database name
    db_port=6000            # database port number
    db_user=tpcds_user      # database user
    user_pwd=123456         # database user password
    
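    Before running the script, you can check that the configured values allow a database connection. The following is a minimal sketch, assuming query.conf is saved in /opt and gsql is already on the PATH:

    source /opt/query.conf
    gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -c "SELECT 1;"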

  2. After adding the cluster information to the query.conf script, run the source gsql_env command to configure the gsql environment variables, and then run the sh run_query.sh command to execute queries and collect results.

    Example: sh run_query.sh batch query1000x/

    Parameter 1: Set it to batch for batch execution or single for single-query execution.

    Parameter 2: the absolute path of the directory that stores the TPC-DS 1000X or TPC-H 1000X queries.

    Parameter 3: If parameter 1 is set to batch, ignore this parameter. If parameter 1 is set to single, set this parameter to the name of the query to be executed, for example, Q1.
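    For example, to execute only query Q1 in single mode (the path below is illustrative):

    sh run_query.sh single /opt/query1000x Q1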
    1. To use the gsql client, run the source gsql_env command each time you log in to the ECS. Before running the query script, ensure that gsql can be executed.
    2. By default, each query is executed six times: the execution plan is collected on the first run, warm-up is performed on the second and third runs, and the formal timed runs are the fourth to sixth. The final result is the average of the three formal runs.
    3. After the query script is executed, a directory named query_log_yymmdd_hhmmss is generated in the same directory as the script.
      • The explain_log subdirectory stores the query execution plans.
      • The pre_warm_log subdirectory stores the warm-up execution results.
      • The query_test_log subdirectory stores the formal (timed) query execution results.
      • The query_result.csv file summarizes the execution results of all queries in CSV format.
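    After a run, query_result.csv can be analyzed with standard shell tools. A minimal sketch (the log directory name below is illustrative) that lists the ten slowest queries by average elapsed time:

    # Skip the CSV header, then sort numerically on column 5 (average cost, in ms), largest first.
    tail -n +2 query_log_220726_153000/query_result.csv | sort -t, -k5 -rn | head -10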