Updated on 2024-08-16 GMT+08:00

Presto Development Plan

Scenario Description

Assume that you need to develop a Presto data analysis application to obtain the call_center table of TPCDS Catalog provided by Presto.

Development Guidelines

  1. Prepare data.

    1. Create three tables: employee information table employees_info, employee contact table employees_contact, and extended employee information table employees_info_extended.
      • Employee information table employees_info contains fields such as employee ID, name, salary currency, salary, tax category, work place, and hire date. In salary currency, R indicates RMB and D indicates USD.
      • Fields in the employees_contact table include the employee ID, phone number, and email address.
      • Fields in the employees_info_extended table include the employee ID, name, mobile phone number, e-mail address, salary currency, salary, tax category, and work place. The partition field is the hire date.

        For details about table creation codes, see Creating a Hive Table.

    2. Load employee information to employees_info.

      For details about data loading codes, see Loading Hive Data.

      Table 1 provides employee information.

      Table 1 Employee information

      Employee ID

      Name

      Salary Currency

      Salary

      Tax Category

      Work Place

      Hire Date

      1

      Wang

      R

      8000.01

      personal income tax&0.05

      China:Shenzhen

      2014

      3

      Tom

      D

      12000.02

      personal income tax&0.09

      America:NewYork

      2014

      4

      Jack

      D

      24000.03

      personal income tax&0.09

      America:Manhattan

      2014

      6

      Linda

      D

      36000.04

      personal income tax&0.09

      America:NewYork

      2014

      8

      Zhang

      R

      9000.05

      personal income tax&0.05

      China:Shanghai

      2014

    3. Load employee contact information to employees_contact.

      Table 2 provides employee contact information.

      Table 2 Employee contact information

      Employee ID

      Phone Number

      e-mail

      1

      135 XXXX XXXX

      xxxx@xx.com

      3

      159 XXXX XXXX

      xxxxx@xx.com.cn

      4

      186 XXXX XXXX

      xxxx@xx.org

      6

      189 XXXX XXXX

      xxxx@xxx.cn

      8

      134 XXXX XXXX

      xxxx@xxxx.cn

  2. Analyze data.

    For details about data analysis codes, see Querying Hive Data.

    • Query contact information of employees whose salaries are paid in USD.
    • Query the IDs and names of employees who were hired in 2014, and load the query results to the partition with the hire date of 2014 in the employees_info_extended table.
    • Collect the number of records in the employees_info table.
    • Query information about employees whose email addresses end with "cn".

  3. Submit a data analysis task to collect the number of records in the employees_info table. For details, see Analyzing Hive Data.