Updated on 2022-09-14 GMT+08:00

Typical Scenario Description

Scenarios

A user develops a Hive data analysis application for managing employee information described in Table 1 and Table 2.

Procedure

  1. Prepare data.

    1. Create three tables: employee information table employees_info, contact table employees_contact, and extended employee information table employees_info_extended.
      • Fields in the employees_info table include the employee ID, name, salary currency, salary, tax category, work place, and hiring date. R indicates RMB, and D indicates USD.
      • Fields in the employees_contact table include the employee ID, mobile phone number, and e-mail address.
      • Fields in the employees_info_extended table include the employee ID, name, mobile phone number, e-mail address, salary currency, salary, tax category, and work place. The partition field is the hiring date.

        For table creation codes, see Creating a Table.

    2. Load employee information to employees_info.

      For data loading codes, see Loading Data.

      Table 1 describes employee information.

      Table 1 Employee information

      ID

      Name

      Salary Currency

      Salary

      Tax Category

      Work Place

      Hiring Date

      1

      Wang

      R

      8000.01

      personal income tax&0.05

      Country1:City1

      2014

      3

      Tom

      D

      12000.02

      personal income tax&0.09

      Country2:City2

      2014

      4

      Jack

      D

      24000.03

      personal income tax&0.09

      Country3:City3

      2014

      6

      Linda

      D

      36000.04

      personal income tax&0.09

      Country4:City4

      2014

      8

      Zhang

      R

      9000.05

      personal income tax&0.05

      Country5:City5

      2014

    3. Load employee contact information to employees_contact.

      Table 2 describes employee contact information.

      Table 2 Employee contact information

      ID

      Mobile Phone Number

      E-mail Address

      1

      135 XXXX XXXX

      xxxx@xx.com

      3

      159 XXXX XXXX

      xxxxx@xx.com.cn

      4

      186 XXXX XXXX

      xxxx@xx.org

      6

      189 XXXX XXXX

      xxxx@xxx.cn

      8

      134 XXXX XXXX

      xxxx@xxxx.cn

    4. Load extended employee information to employees_info_extended.

      Table 3 describes the extended employee information.

      Table 3 Extended employee information

      ID

      Name

      Mobile Phone Number

      E-mail Address

      Salary Currency

      Salary

      Tax Category

      Work Place

      Hiring Date

      1

      Wang

      135 XXXX XXXX

      xxxx@xx.com

      R

      8000.01

      personal income tax&0.05

      Country1:City1

      2014

      3

      Tom

      159 XXXX XXXX

      xxxxx@xx.com.cn

      D

      12000.02

      personal income tax&0.09

      Country2:City2

      2014

      4

      Jack

      186 XXXX XXXX

      xxxx@xx.org

      D

      24000.03

      personal income tax&0.09

      Country3:City3

      2014

      6

      Linda

      189 XXXX XXXX

      xxxx@xxx.cn

      D

      36000.04

      personal income tax&0.09

      Country4:City4

      2014

      8

      Zhang

      134 XXXX XXXX

      xxxx@xxxx.cn

      R

      9000.05

      personal income tax&0.05

      Country5:City5

      2014

  2. Analyze data.

    For data analysis codes, see Querying Data.

    • Query contact information of employees whose salaries are paid in USD.
    • Query the IDs and names of employees who were hired in 2014, and load query results to the partition with the hiring time of 2014 in employees_info_extended.
    • Collect statistics for the number of records in the employees_info table.
    • Query information about employees whose email addresses end with "cn".

  3. Submit a data analysis task to collect statistics for the number of records in the employees_info table.

    For details about the implementation, see Example Program Guide.