Updated on 2024-08-16 GMT+08:00

Impala Development Plan

Scenario Description

Assume that you need to develop an Impala data analysis application to manage the employee information of an enterprise. Table 1 and Table 2 provide employee information.

Development Guidelines

  1. Prepare data.

    1. Create three tables: employee information table employees_info, employee contact table employees_contact, and extended employee information table employees_info_extended.
      • Employee information table employees_info contains fields such as employee ID, name, salary currency, salary, tax category, work place, and hire date. In salary currency, R indicates RMB and D indicates USD.
      • Fields in the employees_contact table include the employee ID, phone number, and email address.
      • Fields in the employees_info_extended table include the employee ID, name, mobile phone number, e-mail address, salary currency, salary, tax category, and work place. The partition field is the hire date.

        For details about table creation codes, see Creating an Impala Table.

    2. Load employee information to employees_info.

      For details about data loading codes, see Loading Impala Data.

      Table 1 provides employee information.

      Table 1 Employee information

      Employee ID

      Name

      Salary Currency

      Salary

      Tax Category

      Work Place

      Hire Date

      1

      Wang

      R

      8000.01

      personal income tax&0.05

      China:Shenzhen

      2014

      3

      Tom

      D

      12000.02

      personal income tax&0.09

      America:NewYork

      2014

      4

      Jack

      D

      24000.03

      personal income tax&0.09

      America:Manhattan

      2014

      6

      Linda

      D

      36000.04

      personal income tax&0.09

      America:NewYork

      2014

      8

      Zhang

      R

      9000.05

      personal income tax&0.05

      China:Shanghai

      2014

    3. Load employee contact information to employees_contact.

      Table 2 provides employee contact information.

      Table 2 Employee contact information

      Employee ID

      Phone Number

      Email

      1

      135 XXXX XXXX

      xxxx@xx.com

      3

      159 XXXX XXXX

      xxxxx@xx.com.cn

      4

      186 XXXX XXXX

      xxxx@xx.org

      6

      189 XXXX XXXX

      xxxx@xxx.cn

      8

      134 XXXX XXXX

      xxxx@xxxx.cn

  2. Analyze data.

    For details about data analysis codes, see Querying Impala Data.

    • Query contact information of employees whose salaries are paid in USD.
    • Query the IDs and names of employees who were hired in 2014, and load the query results to the partition with the hire date of 2014 in the employees_info_extended table.
    • Collect the number of records in the employees_info table.
    • Query information about employees whose email addresses end with "cn".

  3. Submit a data analysis task to collect the number of records in the employees_info table. For details, see Analyzing Impala Data.