更新时间:2024-08-20 GMT+08:00
TPC-DS测试集
您可以通过命令生成方法生成TPC-DS测试集,也可以直接通过脚本生成方法生成,另我们已经给出前面20个的TPC-DS测试集供您参考。
命令生成方法
TPC-DS标准99个SQL查询语句可用如下方法生成:
- 准备工作。生成TPC-DS查询语句前需要修改query_templates目录下的文件:
- 登录测试过程申请的ECS,进入/data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates目录:
1
cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates
- 新建文件hwdws.tpl,内容为:
1 2 3 4 5
define __LIMITA = ""; define __LIMITB = ""; define __LIMITC = "limit %d"; define _BEGIN = "-- begin query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE]; define _END = "-- end query " + [_QUERY] + " in stream " + [_STREAM] + " using template " + [_TEMPLATE];
- 因TPC-DS工具中SQL语句生成模板有语法错误,需修改query77.tpl,将135行的‘, coalesce(returns, 0) returns’改为‘, coalesce(returns, 0) as returns’。
- 登录测试过程申请的ECS,进入/data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/query_templates目录:
- 执行以下命令生成查询语句:
1 2
cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools ./dsqgen -input ../query_templates/templates.lst -directory ../query_templates/ -scale 1000 -dialect hwdws
执行后会生成query_0.sql文件,里面放着99个标准SQL语句,需要手动去切分成99个文件。
- 生成的标准查询中如下日期函数语法在GaussDB(DWS)暂不支持,需要手动进行修改:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
Q5: and (cast('2001-08-19' as date) + 14 days) 修改为 and (cast('2001-08-19' as date) + 14) Q12:and (cast('1999-02-28' as date) + 30 days) 修改为 and (cast('1999-02-28' as date) + 30) Q16:(cast('1999-4-01' as date) + 60 days) 修改为 (cast('1999-4-01' as date) + 60) Q20:and (cast('1998-05-05' as date) + 30 days) 修改为 and (cast('1998-05-05' as date) + 30) Q21:and d_date between (cast ('2000-05-19' as date) - 30 days) 修改为 and d_date between (cast ('2000-05-19' as date) - 30) and (cast ('2000-05-19' as date) + 30 days) 修改为 and (cast ('2000-05-19' as date) + 30) Q32:(cast('1999-02-22' as date) + 90 days) 修改为 (cast('1999-02-22' as date) + 90) Q37:and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60 days) 修改为 and d_date between cast('1998-04-29' as date) and (cast('1998-04-29' as date) + 60) Q40:and d_date between (cast ('2002-05-10' as date) - 30 days) 修改为 and d_date between (cast ('2002-05-10' as date) - 30) and (cast ('2002-05-10' as date) + 30 days) 修改为 and (cast ('2002-05-10' as date) + 30) Q77:and (cast('1999-08-29' as date) + 30 days) 修改为 and (cast('1999-08-29' as date) + 30) Q80:and (cast('2002-08-04' as date) + 30 days) 修改为 and (cast('2002-08-04' as date) + 30) Q82:and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60 days) 修改为 and d_date between cast('1998-01-18' as date) and (cast('1998-01-18' as date) + 60) Q92:(cast('2001-01-26' as date) + 90 days) 修改为 (cast('2001-01-26' as date) + 90) Q94:(cast('1999-5-01' as date) + 60 days) 修改为 (cast('1999-5-01' as date) + 60) Q95:(cast('1999-4-01' as date) + 60 days) 修改为 (cast('1999-4-01' as date) + 60) Q98:and (cast('2002-04-01' as date) + 30 days) 修改为 and (cast('2002-04-01' as date) + 30)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
# -*- coding: utf-8 -*- import os import sys import re import stat def gen_thp_seq(dsqgen_file, scale, query_dir): flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL modes = stat.S_IWUSR | stat.S_IRUSR if not os.path.exists(query_dir): os.mkdir(query_dir) cmd = dsqgen_file + ' -input ../query_templates/templates.lst -directory ../query_templates/ -scale ' + str(scale) + ' -dialect hwdws' os.system(cmd) with open('query_0.sql', 'r') as f1: line = f1.readline() queryname = '' while line: if '-- begin' in line.strip(): #line:'-- begin query 1 in stream 0 using template query96.tpl\n' queryname = line.split(' ')[-1][5:-5] fquery = os.fdopen(os.open(query_dir + '/Q' + queryname, flags, modes), 'w+') line = f1.readline() continue if not queryname or line == '\n': line = f1.readline() continue if '-- end' in line.strip(): fquery.close() line = f1.readline() continue if 'days)' in line: line = line.replace('days', '') fquery.write(line) line = f1.readline() print("TPCDS Q1~Q99 query store at " + query_dir) os.system('rm -rf query_0.sql') if __name__ == '__main__': if len(sys.argv) != 4: print('Wrong number of parameters!') print('Usage:python3 gen_tpcds_thpseq.py dsqgen_file_path scale query_dir') print("""Parameter: qgen_file_path: tpcds dsqgen文件路径 scale: 生成查询对应的数据规模 query_dir: 生成文件的存放路径""") print("""Example: python3 gen_tpcds_thpseq.py ./dsqgen 1000 tpcds_query1000x""") sys.exit(1) dsqgen_file_path = sys.argv[1] scale = sys.argv[2] query_dir = sys.argv[3] try: if not re.match(r'^\.?\/(\w+\/?)+$', dsqgen_file_path): print("error param qgenfilepath:", dsqgen_file_path) if not re.match(r'\d+', scale): print('error param scale:', scale) if not re.match(r'^\/?(\w+\/?)+$', query_dir): print('error param query_dir:', query_dir) except Exception as ex: print('exception: invalid param!') if not os.path.isfile(dsqgen_file_path): print('The file %s is not exist!' % dsqgen_file_path) sys.exit(1) gen_thp_seq(dsqgen_file_path, int(scale), query_dir) |
将以上脚本保存在/data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools/gen_tpcds_thpseq.py,执行如下命令可获得99个SQL语句,其中1000为数据规模,代表TPCDS 1000x,tpcds_query1000x为生成的SQL语句存放的位置:
cd /data1/script/tpcds-kit/DSGen-software-code-3.2.0rc1/tools/ python3 gen_tpcds_thpseq.py ./dsqgen 1000 tpcds_query1000x
SQL1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
with customer_total_return as (select sr_customer_sk as ctr_customer_sk ,sr_store_sk as ctr_store_sk ,sum(SR_RETURN_AMT_INC_TAX) as ctr_total_return from store_returns ,date_dim where sr_returned_date_sk = d_date_sk and d_year =2001 group by sr_customer_sk ,sr_store_sk) select c_customer_id from customer_total_return ctr1 ,store ,customer where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_store_sk = ctr2.ctr_store_sk) and s_store_sk = ctr1.ctr_store_sk and s_state = 'PA' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id limit 100; |
SQL2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
with wscs as (select sold_date_sk ,sales_price from (select ws_sold_date_sk sold_date_sk ,ws_ext_sales_price sales_price from web_sales union all select cs_sold_date_sk sold_date_sk ,cs_ext_sales_price sales_price from catalog_sales)), wswscs as (select d_week_seq, sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales, sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales, sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales, sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales, sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales, sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales, sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales from wscs ,date_dim where d_date_sk = sold_date_sk group by d_week_seq) select d_week_seq1 ,round(sun_sales1/sun_sales2,2) ,round(mon_sales1/mon_sales2,2) ,round(tue_sales1/tue_sales2,2) ,round(wed_sales1/wed_sales2,2) ,round(thu_sales1/thu_sales2,2) ,round(fri_sales1/fri_sales2,2) ,round(sat_sales1/sat_sales2,2) from (select wswscs.d_week_seq d_week_seq1 ,sun_sales sun_sales1 ,mon_sales mon_sales1 ,tue_sales tue_sales1 ,wed_sales wed_sales1 ,thu_sales thu_sales1 ,fri_sales fri_sales1 ,sat_sales sat_sales1 from wswscs,date_dim where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 1999) y, (select wswscs.d_week_seq d_week_seq2 ,sun_sales sun_sales2 ,mon_sales mon_sales2 ,tue_sales tue_sales2 ,wed_sales wed_sales2 ,thu_sales thu_sales2 ,fri_sales fri_sales2 ,sat_sales sat_sales2 from wswscs ,date_dim where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 1999+1) z where d_week_seq1=d_week_seq2-53 order by d_week_seq1; |
SQL3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
select dt.d_year ,item.i_brand_id brand_id ,item.i_brand brand ,sum(ss_ext_sales_price) sum_agg from date_dim dt ,store_sales ,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 125 and dt.d_moy=11 group by dt.d_year ,item.i_brand ,item.i_brand_id order by dt.d_year ,sum_agg desc ,brand_id limit 100; |
SQL4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total ,'s' sale_type from customer ,store_sales ,date_dim where c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year union all select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total ,'c' sale_type from customer ,catalog_sales ,date_dim where c_customer_sk = cs_bill_customer_sk and cs_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year union all select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total ,'w' sale_type from customer ,web_sales ,date_dim where c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year ) select t_s_secyear.customer_id ,t_s_secyear.customer_first_name ,t_s_secyear.customer_last_name ,t_s_secyear.customer_preferred_cust_flag from year_total t_s_firstyear ,year_total t_s_secyear ,year_total t_c_firstyear ,year_total t_c_secyear ,year_total t_w_firstyear ,year_total t_w_secyear where t_s_secyear.customer_id = t_s_firstyear.customer_id and t_s_firstyear.customer_id = t_c_secyear.customer_id and t_s_firstyear.customer_id = t_c_firstyear.customer_id and t_s_firstyear.customer_id = t_w_firstyear.customer_id and t_s_firstyear.customer_id = t_w_secyear.customer_id and t_s_firstyear.sale_type = 's' and t_c_firstyear.sale_type = 'c' and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_c_secyear.sale_type = 'c' and t_w_secyear.sale_type = 'w' and t_s_firstyear.dyear = 2000 and t_s_secyear.dyear = 2000+1 and t_c_firstyear.dyear = 2000 and t_c_secyear.dyear = 2000+1 and t_w_firstyear.dyear = 2000 and t_w_secyear.dyear = 2000+1 and t_s_firstyear.year_total > 0 and t_c_firstyear.year_total > 0 and t_w_firstyear.year_total > 0 and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end order by t_s_secyear.customer_id ,t_s_secyear.customer_first_name ,t_s_secyear.customer_last_name ,t_s_secyear.customer_preferred_cust_flag limit 100; |
SQL5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 |
with ssr as (select s_store_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select ss_store_sk as store_sk, ss_sold_date_sk as date_sk, ss_ext_sales_price as sales_price, ss_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from store_sales union all select sr_store_sk as store_sk, sr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, sr_return_amt as return_amt, sr_net_loss as net_loss from store_returns ) salesreturns, date_dim, store where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and store_sk = s_store_sk group by s_store_id) , csr as (select cp_catalog_page_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select cs_catalog_page_sk as page_sk, cs_sold_date_sk as date_sk, cs_ext_sales_price as sales_price, cs_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from catalog_sales union all select cr_catalog_page_sk as page_sk, cr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, cr_return_amount as return_amt, cr_net_loss as net_loss from catalog_returns ) salesreturns, date_dim, catalog_page where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and page_sk = cp_catalog_page_sk group by cp_catalog_page_id) , wsr as (select web_site_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss from ( select ws_web_site_sk as wsr_web_site_sk, ws_sold_date_sk as date_sk, ws_ext_sales_price as sales_price, ws_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from web_sales union all select ws_web_site_sk as wsr_web_site_sk, wr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, wr_return_amt as return_amt, wr_net_loss as net_loss from web_returns left outer join web_sales on ( wr_item_sk = ws_item_sk and wr_order_number = ws_order_number) ) salesreturns, date_dim, web_site where date_sk = d_date_sk and d_date between cast('2002-08-05' as date) and (cast('2002-08-05' as date) + 14 ) and wsr_web_site_sk = web_site_sk group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , 'store' || s_store_id as id , sales , returns , (profit - profit_loss) as profit from ssr union all select 'catalog channel' as channel , 'catalog_page' || cp_catalog_page_id as id , sales , returns , (profit - profit_loss) as profit from csr union all select 'web channel' as channel , 'web_site' || web_site_id as id , sales , returns , (profit - profit_loss) as profit from wsr ) x group by rollup (channel, id) order by channel ,id limit 100; |
SQL6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
select a.ca_state state, count(*) cnt from customer_address a ,customer c ,store_sales s ,date_dim d ,item i where a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq = (select distinct (d_month_seq) from date_dim where d_year = 1998 and d_moy = 7 ) and i.i_current_price > 1.2 * (select avg(j.i_current_price) from item j where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt, a.ca_state limit 100; |
SQL7
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
select i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, item, promotion where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_cdemo_sk = cd_demo_sk and ss_promo_sk = p_promo_sk and cd_gender = 'M' and cd_marital_status = 'U' and cd_education_status = 'College' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 1999 group by i_item_id order by i_item_id limit 100; |
SQL8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
select s_store_name ,sum(ss_net_profit) from store_sales ,date_dim ,store, (select ca_zip from ( SELECT substr(ca_zip,1,5) ca_zip FROM customer_address WHERE substr(ca_zip,1,5) IN ( '74804','87276','13428','49436','56281','79805', '46826','68570','20368','28846','41886', '68164','68097','16113','18727','96789', '63317','57937','19554','69911','83554', '84246','61336','46999','25229','15960', '61657','28058','64558','39712','74928', '34018','87826','69733','26479','73630', '88683','61704','81441','42706','54175', '45152','49049','30850','63980','40484', '71665','63755','23769','79855','24308', '28241','16343','25663','85999','46359', '93691','34706','99973','74947','60316', '58637','48063','81363','19268','66228', '78136','16368','99907','58139','17043', '89764','14834','25152','70158','76080', '81251','83972','48635','54671','35602', '10788','57325','46354','92707','41103', '89761','31840','69225','76139','18826', '12556','51692','20579','50965','32136', '71357','16309','82922','59273','40999', '73273','93217','65679','12653','16978', '27319','41973','65580','56237','17799', '53192','63632','37089','65994','58048', '14388','58085','80614','40042','79194', '42268','61913','97332','37349','72146', '52681','18176','39332','89283','69023', '84175','11520','33483','60169','93562', '10097','14536','70276','64042','22822', '87229','51528','70269','44519','48044', '78170','81440','60315','14543','30719', '13240','62325','35517','51529','98085', '79007','16582','95187','15625','88780', '38656','74607','75117','62819','31929', '27665','88890','98611','53527','48652', '10324','62273','17726','36232','38526', '50705','61179','30363','54408','58631', '23622','50319','33299','78829','91267', '25571','60347','44750','62797','10713', '46494','91163','20973','42007','54724', '89203','12561','71116','60404','70589', '66744','46074','69138','34737','25092', '59246','74778','40140','89476','71030', '89861','93207','44996','34850','48752', '79574','29570','76507','79728','43195', '47596','46415','42514','68144','14169', '17041','75747','33630','19378','32618', '78704','75807','76800','80916','87272', '37109','21714','14867','83806','33895', '80637','20658','75224','92772','55791', '58603','31681','38788','91922','42465', '74371','72854','75746','80383','75909', '37151','82077','80604','66771','46075', '58723','15380','83174','53615','50347', '98340','68957','63361','18705','47629', '76013','68572','50588','31168','41563', '38936','88746','19052','75648','46403', '24332','54711','28218','80432','53870', '25049','32562','94211','99803','49133', '21202','50005','17953','14324','85525', '51984','37304','69870','64321','66962', '66453','40619','91199','54400','28804', '65544','27059','31143','20303','52429', '24476','91458','52514','55145','99015', '51657','10001','96434','38325','39628', '83338','62381','67697','61542','86076', '89833','32657','56881','93983','85031', '57530','56318','46934','34740','79458', '88443','15861','56034','24808','32336', '34312','96450','12923','91876','53509', '30241','35816','52377','23946','23644', '16413','35796','59100','21689','49199', '40062','82510','14072','78823','49158', '99933','75399','11365','44799','77549', '19569','39186','78909','68143','70468', '14944','33047','98329','42262','68647', '65754','27357','56372','18073','12363', '64467','26221','32914','70431','42436', '55316','33335','27701','44687','22360', '76124','44007','59525','51574','71555', '43130','64199','19616','94285') intersect select ca_zip from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt FROM customer_address, customer WHERE ca_address_sk = c_current_addr_sk and c_preferred_cust_flag='Y' group by ca_zip having count(*) > 10)A1)A2) V1 where ss_store_sk = s_store_sk and ss_sold_date_sk = d_date_sk and d_qoy = 1 and d_year = 1999 and (substr(s_zip,1,2) = substr(V1.ca_zip,1,2)) group by s_store_name order by s_store_name limit 100; |
SQL9
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
select case when (select count(*) from store_sales where ss_quantity between 1 and 20) > 40845849 then (select avg(ss_ext_tax) from store_sales where ss_quantity between 1 and 20) else (select avg(ss_net_paid) from store_sales where ss_quantity between 1 and 20) end bucket1 , case when (select count(*) from store_sales where ss_quantity between 21 and 40) > 5712087 then (select avg(ss_ext_tax) from store_sales where ss_quantity between 21 and 40) else (select avg(ss_net_paid) from store_sales where ss_quantity between 21 and 40) end bucket2, case when (select count(*) from store_sales where ss_quantity between 41 and 60) > 30393328 then (select avg(ss_ext_tax) from store_sales where ss_quantity between 41 and 60) else (select avg(ss_net_paid) from store_sales where ss_quantity between 41 and 60) end bucket3, case when (select count(*) from store_sales where ss_quantity between 61 and 80) > 46385791 then (select avg(ss_ext_tax) from store_sales where ss_quantity between 61 and 80) else (select avg(ss_net_paid) from store_sales where ss_quantity between 61 and 80) end bucket4, case when (select count(*) from store_sales where ss_quantity between 81 and 100) > 29981928 then (select avg(ss_ext_tax) from store_sales where ss_quantity between 81 and 100) else (select avg(ss_net_paid) from store_sales where ss_quantity between 81 and 100) end bucket5 from reason where r_reason_sk = 1 ; |
SQL10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3, cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5, cd_dep_college_count, count(*) cnt6 from customer c,customer_address ca,customer_demographics where c.c_current_addr_sk = ca.ca_address_sk and ca_county in ('Clark County','Richardson County','Tom Green County','Sullivan County','Cass County') and cd_demo_sk = c.c_current_cdemo_sk and exists (select * from store_sales,date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 and 1+3) and (exists (select * from web_sales,date_dim where c.c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 ANd 1+3) or exists (select * from catalog_sales,date_dim where c.c_customer_sk = cs_ship_customer_sk and cs_sold_date_sk = d_date_sk and d_year = 2000 and d_moy between 1 and 1+3)) group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count limit 100; |
SQL11
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total ,'s' sale_type from customer ,store_sales ,date_dim where c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year union all select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum(ws_ext_list_price-ws_ext_discount_amt) year_total ,'w' sale_type from customer ,web_sales ,date_dim where c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year ) select t_s_secyear.customer_id ,t_s_secyear.customer_first_name ,t_s_secyear.customer_last_name ,t_s_secyear.customer_birth_country from year_total t_s_firstyear ,year_total t_s_secyear ,year_total t_w_firstyear ,year_total t_w_secyear where t_s_secyear.customer_id = t_s_firstyear.customer_id and t_s_firstyear.customer_id = t_w_secyear.customer_id and t_s_firstyear.customer_id = t_w_firstyear.customer_id and t_s_firstyear.sale_type = 's' and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_w_secyear.sale_type = 'w' and t_s_firstyear.dyear = 2001 and t_s_secyear.dyear = 2001+1 and t_w_firstyear.dyear = 2001 and t_w_secyear.dyear = 2001+1 and t_s_firstyear.year_total > 0 and t_w_firstyear.year_total > 0 and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else 0.0 end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end order by t_s_secyear.customer_id ,t_s_secyear.customer_first_name ,t_s_secyear.customer_last_name ,t_s_secyear.customer_birth_country limit 100; |
SQL12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio from web_sales ,item ,date_dim where ws_item_sk = i_item_sk and i_category in ('Music', 'Shoes', 'Children') and ws_sold_date_sk = d_date_sk and d_date between cast('2000-05-14' as date) and (cast('2000-05-14' as date) + 30 ) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 100; |
SQL13
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and((ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'U' and cd_education_status = '4 yr Degree' and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3 )or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D' and cd_education_status = '2 yr Degree' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1 ) or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'S' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1 )) and((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('IL', 'WI', 'TN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('MO', 'OK', 'WA') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('NE', 'VA', 'GA') and ss_net_profit between 50 and 250 )) ; |
SQL14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 |
with cross_items as (select i_item_sk ss_item_sk from item, (select iss.i_brand_id brand_id ,iss.i_class_id class_id ,iss.i_category_id category_id from store_sales ,item iss ,date_dim d1 where ss_item_sk = iss.i_item_sk and ss_sold_date_sk = d1.d_date_sk and d1.d_year between 2000 AND 2000 + 2 intersect select ics.i_brand_id ,ics.i_class_id ,ics.i_category_id from catalog_sales ,item ics ,date_dim d2 where cs_item_sk = ics.i_item_sk and cs_sold_date_sk = d2.d_date_sk and d2.d_year between 2000 AND 2000 + 2 intersect select iws.i_brand_id ,iws.i_class_id ,iws.i_category_id from web_sales ,item iws ,date_dim d3 where ws_item_sk = iws.i_item_sk and ws_sold_date_sk = d3.d_date_sk and d3.d_year between 2000 AND 2000 + 2) where i_brand_id = brand_id and i_class_id = class_id and i_category_id = category_id ), avg_sales as (select avg(quantity*list_price) average_sales from (select ss_quantity quantity ,ss_list_price list_price from store_sales ,date_dim where ss_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2 union all select cs_quantity quantity ,cs_list_price list_price from catalog_sales ,date_dim where cs_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2 union all select ws_quantity quantity ,ws_list_price list_price from web_sales ,date_dim where ws_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2) x) select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales) from( select 'store' channel, i_brand_id,i_class_id ,i_category_id,sum(ss_quantity*ss_list_price) sales , count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2000+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales) union all select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales from catalog_sales ,item ,date_dim where cs_item_sk in (select ss_item_sk from cross_items) and cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and d_year = 2000+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales) union all select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales from web_sales ,item ,date_dim where ws_item_sk in (select ss_item_sk from cross_items) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 2000+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales) ) y group by rollup (channel, i_brand_id,i_class_id,i_category_id) order by channel,i_brand_id,i_class_id,i_category_id limit 100; with cross_items as (select i_item_sk ss_item_sk from item, (select iss.i_brand_id brand_id ,iss.i_class_id class_id ,iss.i_category_id category_id from store_sales ,item iss ,date_dim d1 where ss_item_sk = iss.i_item_sk and ss_sold_date_sk = d1.d_date_sk and d1.d_year between 2000 AND 2000 + 2 intersect select ics.i_brand_id ,ics.i_class_id ,ics.i_category_id from catalog_sales ,item ics ,date_dim d2 where cs_item_sk = ics.i_item_sk and cs_sold_date_sk = d2.d_date_sk and d2.d_year between 2000 AND 2000 + 2 intersect select iws.i_brand_id ,iws.i_class_id ,iws.i_category_id from web_sales ,item iws ,date_dim d3 where ws_item_sk = iws.i_item_sk and ws_sold_date_sk = d3.d_date_sk and d3.d_year between 2000 AND 2000 + 2) x where i_brand_id = brand_id and i_class_id = class_id and i_category_id = category_id ), avg_sales as (select avg(quantity*list_price) average_sales from (select ss_quantity quantity ,ss_list_price list_price from store_sales ,date_dim where ss_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2 union all select cs_quantity quantity ,cs_list_price list_price from catalog_sales ,date_dim where cs_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2 union all select ws_quantity quantity ,ws_list_price list_price from web_sales ,date_dim where ws_sold_date_sk = d_date_sk and d_year between 2000 and 2000 + 2) x) select this_year.channel ty_channel ,this_year.i_brand_id ty_brand ,this_year.i_class_id ty_class ,this_year.i_category_id ty_category ,this_year.sales ty_sales ,this_year.number_sales ty_number_sales ,last_year.channel ly_channel ,last_year.i_brand_id ly_brand ,last_year.i_class_id ly_class ,last_year.i_category_id ly_category ,last_year.sales ly_sales ,last_year.number_sales ly_number_sales from (select 'store' channel, i_brand_id,i_class_id,i_category_id ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_week_seq = (select d_week_seq from date_dim where d_year = 2000 + 1 and d_moy = 12 and d_dom = 17) group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year, (select 'store' channel, i_brand_id,i_class_id ,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_week_seq = (select d_week_seq from date_dim where d_year = 2000 and d_moy = 12 and d_dom = 17) group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year where this_year.i_brand_id= last_year.i_brand_id and this_year.i_class_id = last_year.i_class_id and this_year.i_category_id = last_year.i_category_id order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id limit 100; |
SQL15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
select ca_zip ,sum(cs_sales_price) from catalog_sales ,customer ,customer_address ,date_dim where cs_bill_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') or ca_state in ('CA','WA','GA') or cs_sales_price > 500) and cs_sold_date_sk = d_date_sk and d_qoy = 2 and d_year = 1999 group by ca_zip order by ca_zip limit 100; |
SQL16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
select count(distinct cs_order_number) as "order count" ,sum(cs_ext_ship_cost) as "total shipping cost" ,sum(cs_net_profit) as "total net profit" from catalog_sales cs1 ,date_dim ,customer_address ,call_center where d_date between '1999-2-01' and (cast('1999-2-01' as date) + 60 ) and cs1.cs_ship_date_sk = d_date_sk and cs1.cs_ship_addr_sk = ca_address_sk and ca_state = 'TX' and cs1.cs_call_center_sk = cc_call_center_sk and cc_county in ('Barrow County','Luce County','Mobile County','Richland County', 'Wadena County' ) and exists (select * from catalog_sales cs2 where cs1.cs_order_number = cs2.cs_order_number and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) and not exists(select * from catalog_returns cr1 where cs1.cs_order_number = cr1.cr_order_number) order by count(distinct cs_order_number) limit 100; |
SQL17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave ,stddev_samp(ss_quantity) as store_sales_quantitystdev ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov ,count(sr_return_quantity) as store_returns_quantitycount ,avg(sr_return_quantity) as store_returns_quantityave ,stddev_samp(sr_return_quantity) as store_returns_quantitystdev ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity) as catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = '2000Q1' and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = d2.d_date_sk and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') group by i_item_id ,i_item_desc ,s_state order by i_item_id ,i_item_desc ,s_state limit 100; |
SQL18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
select i_item_id, ca_country, ca_state, ca_county, avg( cast(cs_quantity as decimal(12,2))) agg1, avg( cast(cs_list_price as decimal(12,2))) agg2, avg( cast(cs_coupon_amt as decimal(12,2))) agg3, avg( cast(cs_sales_price as decimal(12,2))) agg4, avg( cast(cs_net_profit as decimal(12,2))) agg5, avg( cast(c_birth_year as decimal(12,2))) agg6, avg( cast(cd1.cd_dep_count as decimal(12,2))) agg7 from catalog_sales, customer_demographics cd1, customer_demographics cd2, customer, customer_address, date_dim, item where cs_sold_date_sk = d_date_sk and cs_item_sk = i_item_sk and cs_bill_cdemo_sk = cd1.cd_demo_sk and cs_bill_customer_sk = c_customer_sk and cd1.cd_gender = 'M' and cd1.cd_education_status = 'Primary' and c_current_cdemo_sk = cd2.cd_demo_sk and c_current_addr_sk = ca_address_sk and c_birth_month in (10,1,8,7,3,5) and d_year = 1998 and ca_state in ('NE','OK','NC' ,'CO','ID','AR','MO') group by rollup (i_item_id, ca_country, ca_state, ca_county) order by ca_country, ca_state, ca_county, i_item_id limit 100; |
SQL19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
select i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, sum(ss_ext_sales_price) ext_price from date_dim, store_sales, item,customer,customer_address,store where d_date_sk = ss_sold_date_sk and ss_item_sk = i_item_sk and i_manager_id=62 and d_moy=11 and d_year=2000 and ss_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and substr(ca_zip,1,5) <> substr(s_zip,1,5) and ss_store_sk = s_store_sk group by i_brand ,i_brand_id ,i_manufact_id ,i_manufact order by ext_price desc ,i_brand ,i_brand_id ,i_manufact_id ,i_manufact limit 100 ; |
SQL20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
select i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Sports', 'Shoes', 'Women') and cs_sold_date_sk = d_date_sk and d_date between cast('2001-03-21' as date) and (cast('2001-03-21' as date) + 30) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 100; |
父主题: 附录