更新时间:2024-10-25 GMT+08:00
通过编写shell脚本自动化执行查询和结果收集
- 登录ECS,进入到/opt目录下,使用vim命令生成query.conf和run_query.sh两个脚本文件。脚本内容如下,编辑后按:wq!保存脚本配置。
run_query.sh脚本如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
#!/bin/bash script_path=$(cd `dirname $0`;pwd) query_mode=$1 query_path=$2 query_object=$3 query_log=${script_path}/query_log_`date +%y%m%d_%H%M%S` source ${script_path}/query.conf function usage() { echo "[NOTICE]: This script is used to run queries and collect cost time, according to sepcified path and query file name." echo " You can run the script as below:" echo -e "" echo " 1. config the query.conf file." echo " 2. run the script in batch mode. " echo " eg. sh run_query.sh batch [query file's absolute path]" echo -e "" echo " 3. run the script in single mode." echo " eg. sh run_query.sh single [query file's absolute path] [specified query file name]" } function log_file_init() { mkdir -p ${query_log}/explain_log mkdir -p ${query_log}/pre_warm_log mkdir -p ${query_log}/query_test_log touch ${query_log}/query_result.csv echo "query name,cost time1,cost time2,cost time3,average cost" > ${query_log}/query_result.csv } function single_query() { echo "[INFO]: Single mode query is to start." echo "*****${query_object} begin*****" collect_plan pre_warm query_test echo "*****${query_object} end*****" echo "[INFO]: Single mode query is finished." echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}." } function batch_query() { echo "[INFO]: Batch mode query is to start." for query_object in `ls ${query_path}` do echo "*****${query_object} begin*****" collect_plan pre_warm query_test echo "*****${query_object} end*****" done echo "[INFO]: Batch mode query is finished." echo "[NOTICE]: Get more details by query_result.csv and other logs in ${query_log}." } function collect_plan() { echo "[STEP1]: Starting to collect plan." echo "explain performance" > ${query_log}/explain_log/${query_object}.tmp cat ${query_path}/${query_object} >> ${query_log}/explain_log/${query_object}.tmp gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_log}/explain_log/${query_object}.tmp > ${query_log}/explain_log/${query_object}.explain 2>&1 echo "[STEP1]: Finished." } function pre_warm() { echo "[STEP2]: Starting to pre-warm." for i in {1..2} do gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/pre_warm_log/${query_object}.pre${i} 2>&1 done echo "[STEP2]: Finished." } function query_test() { time1="" time2="" time3="" echo "[STEP3]: Starting to do real test." for i in {1..3} do gsql -h ${cluster_ip} -d ${db_name} -p ${db_port} -U ${db_user} -W "${user_pwd}" -f ${query_path}/${query_object} > ${query_log}/query_test_log/${query_object}.real${i} 2>&1 let "`eval echo "time"${i}`=`cat ${query_log}/query_test_log/${query_object}.real${i}|grep "total time:"|awk {'print$3'}`" done time_ave=`echo "scale=2;(${time1}+${time2}+${time3})/3"|bc -l` echo "${query_object},${time1},${time2},${time3},${time_ave}" >> ${query_log}/query_result.csv echo "[step3]: Finished. The average time:${time_ave} ms." } case ${query_mode} in single)log_file_init;single_query;; batch)log_file_init;batch_query;; *)usage;; esac
query.conf为集群信息配置文件,包含如下五个变量:
1 2 3 4 5
cluster_ip=127.0.0.1 集群主cn节点内网ip db_name=tpcds_test 数据库名称 db_port=6000 数据库端口号 db_user=tpcds_user 数据库用户 user_pwd=123456 数据库用户密码
- 编辑query.conf为集群对应的信息后,先执行source gsql_env命令配置变量后,执行sh run_query.sh即可开始查询执行和结果收集。
示例:
sh run_query.sh batch query1000x/
参数1:批量执行选择batch,单个query执行选择single。
参数2:tpcds1000x或者tpch1000x query存放的绝对路径。
参数3:如果参数1选择batch,此参数忽略;如果参数1选择single,此参数为具体执行的query名称,例如Q1。- gsql客户端的使用需要每次连接后,都要执行source gsql_env导入变量命令,执行查询脚本前请确认gsql可执行。
- query1000x文件夹中必须包含tpch 22个标准查询文件Q1~Q22或者tpcds 99个标准查询文件Q1~Q99。
- run_query.sh脚本依赖bc命令,执行前确认bc命令存在。
- 每个查询默认会跑6次,第1次收集执行计划,第2,3次预热,第4到第6次正式查询,最终结果取后3次查询的平均值。
- 查询脚本执行后会立即生成query_log_yymmdd_hhmmss名称的目录,其中:
- explain_log子目录存放查询计划。
- pre_warm子目录存放预热执行结果。
- query_test子目录存放正式查询执行结果。
- query_result.csv文件,csv格式汇总所有查询的执行结果。
父主题: 执行查询与结果收集