
Using Shell Scripts to Automatically Execute Queries and Collect Results

  1. Log in to the ECS, go to the /opt directory, and run the vim command to create two script files: query.conf and run_query.sh. The script content is as follows. After editing, enter :wq! to save the files and exit.

    run_query.sh script:

    #!/bin/bash
    script_path=$(cd "$(dirname "$0")"; pwd)
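    # Positional arguments: $1 = run mode (single|batch), $2 = directory holding
    # the query files, $3 = a single query file name (single mode only).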
    query_mode=$1
    query_path=$2
    query_object=$3
    query_log=${script_path}/query_log_`date +%y%m%d_%H%M%S`
    source ${script_path}/query.conf
    
    function usage()
    {
        echo "[NOTICE]: This script is used to run queries and collect cost time, according to sepcified path and query file name."
        echo "          You can run the script as below:" 
        echo -e ""
        echo "          1. config the query.conf file."
        echo "          2. run the script in batch mode. "
        echo "          eg. sh run_query.sh batch [query file's absolute path]"
        echo -e ""
        echo "          3. run the script in single mode."
        echo "          eg. sh run_query.sh single [query file's absolute path] [specified query file name]"
    }
    
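    # Create the per-run log directory tree and the result CSV with its header row.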
    function log_file_init()
    {
        mkdir -p ${query_log}/explain_log
        mkdir -p ${query_log}/pre_warm_log
        mkdir -p ${query_log}/query_test_log
        touch ${query_log}/query_result.csv
        echo "query name,cost time1,cost time2,cost time3,average cost" > ${query_log}/query_result.csv
    }
    
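    # Run one specified query end to end: plan collection, warm-up, then timed runs.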
    function single_query()
    {
        echo "[INFO]: Single mode query is to start."
        echo "*****${query_object} begin*****"
        collect_plan
        pre_warm
        query_test
        echo "*****${query_object} end*****"
        echo "[INFO]: Single mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
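    # Run every query file found in ${query_path} end to end.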
    function batch_query()
    {
        echo "[INFO]: Batch mode query is to start."
        for query_object in `ls ${query_path}`
        do
            echo "*****${query_object} begin*****"
            collect_plan
            pre_warm
            query_test
            echo "*****${query_object} end*****"
        done
        echo "[INFO]: Batch mode query is finished."
        echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}."
    }
    
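    # STEP1: prepend "explain performance" to the query and save the execution plan.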
    function collect_plan()
    {
        echo "[STEP1]: Starting to collect plan."
        echo "explain performance" > ${query_log}/explain_log/${query_object}.tmp
        cat ${query_path}/${query_object} >> ${query_log}/explain_log/${query_object}.tmp
        gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_log}/explain_log/${query_object}.tmp > ${query_log}/explain_log/${query_object}.explain 2>&1
        echo "[STEP1]: Finished."
    }
    
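    # STEP2: execute the query twice to warm up caches before the timed runs.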
    function pre_warm()
    {
        echo "[STEP2]: Starting to pre-warm."
        for i in {1..2}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/pre_warm_log/${query_object}.pre${i} 2>&1
        done
        echo "[STEP2]: Finished."
    }
    
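    # STEP3: run the query three times, extract "total time" from the gsql output,
    # and append the three timings plus their average to query_result.csv.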
    function query_test()
    {
        time1=""
        time2=""
        time3=""
        echo "[STEP3]: Starting to do real test."
        for i in {1..3}
        do
            gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/query_test_log/${query_object}.real${i} 2>&1
            eval "time${i}=$(grep 'total time:' ${query_log}/query_test_log/${query_object}.real${i} | awk '{print $3}')"
        done
        time_ave=`echo "scale=2;(${time1}+${time2}+${time3})/3"|bc -l`
        echo "${query_object},${time1},${time2},${time3},${time_ave}" >> ${query_log}/query_result.csv
        echo "[step3]: Finished. The average time:${time_ave} ms."
    }
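
    # Dispatch on the run mode passed as the first argument; print usage otherwise.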
    case ${query_mode} in
        single)log_file_init;single_query;;
        batch)log_file_init;batch_query;;
        *)usage;;
    esac
    

    query.conf is the cluster information configuration file, which contains the following variables:

    cluster_ip=127.0.0.1    # private network IP address of the cluster's primary Coordinator node
    db_name=tpcds_test      # database name
    db_port=6000            # database port number
    db_user=tpcds_user      # database user
    user_pwd=123456         # database user password
    
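    Before running the script, you can check that the configured values allow a database connection. The following is a minimal sketch, assuming query.conf is saved in /opt and gsql is already on the PATH:

    source /opt/query.conf
    gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -c "SELECT 1;"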

  2. After adding the cluster information to the query.conf script, run the source gsql_env command to configure the gsql environment variables, and then run the sh run_query.sh command to execute queries and collect results.

    Example: sh run_query.sh batch query1000x/

    Parameter 1: Set it to batch for batch execution or single for single-query execution.

    Parameter 2: the absolute path of the directory that stores the TPC-DS 1000X or TPC-H 1000X queries.

    Parameter 3: If parameter 1 is set to batch, ignore this parameter. If parameter 1 is set to single, set this parameter to the name of the query to be executed, for example, Q1.
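    For example, to execute only query Q1 in single mode (the path below is illustrative):

    sh run_query.sh single /opt/query1000x Q1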
    1. To use the gsql client, run the source gsql_env command each time you log in to the ECS. Before running the query script, ensure that gsql can be executed.
    2. By default, each query is executed six times: the execution plan is collected on the first run, warm-up is performed on the second and third runs, and the formal timed runs are the fourth to sixth. The final result is the average of the three formal runs.
    3. After the query script is executed, a directory named query_log_yymmdd_hhmmss is generated in the same directory as the script.
      • The explain_log subdirectory stores the query execution plans.
      • The pre_warm_log subdirectory stores the warm-up execution results.
      • The query_test_log subdirectory stores the formal (timed) query execution results.
      • The query_result.csv file summarizes the execution results of all queries in CSV format.
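    After a run, query_result.csv can be analyzed with standard shell tools. A minimal sketch (the log directory name below is illustrative) that lists the ten slowest queries by average elapsed time:

    # Skip the CSV header, then sort numerically on column 5 (average cost, in ms), largest first.
    tail -n +2 query_log_220726_153000/query_result.csv | sort -t, -k5 -rn | head -10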