Using an Impala Client

Impala is a massively parallel processing (MPP) SQL query engine for processing vast amounts of data stored in Hadoop clusters. It is an open source software written in C++ and Java. It provides high performance and low latency compared with other SQL engines for Hadoop.

Background

Suppose a user develops an application to manage users who use service A in an enterprise. The procedure of operating service A on the Impala client is as follows:

Operations on common tables:

Create the user_info table.
Add users' educational backgrounds and titles to the table.
Query user names and addresses by user ID.
Delete the user information table after service A ends.

**Table 1** User information
No.	Name	Gender	Age	Address
12005000201	A	Male	19	City A
12005000202	B	Female	23	City B
12005000203	C	Male	26	City C
12005000204	D	Male	18	City D
12005000205	E	Female	21	City E
12005000206	F	Male	32	City F
12005000207	G	Female	29	City G
12005000208	H	Female	30	City H
12005000209	I	Male	26	City I
12005000210	J	Female	25	City J

Prerequisites

The client has been installed. For example, the client is installed in the /opt/hadoopclient directory. The client directory in the following operations is only an example. Change it to the actual installation directory.

Procedure

Log in to the node where the client is installed as the client installation user.
Run the following command to go to the client installation directory:
cd /opt/hadoopclient
Run the following command to configure environment variables:
source bigdata_env
Run the Impala client command to implement service A.
- Operations on internal tables:
  Run the Impala client command impala-shell.
  
  By default, impala-shell attempts to connect to the Impala daemon on port 21000 of localhost. To connect to another host, use the -i < host:port > option, for example, impala-shell -i xxx.xxx.xxx.xxx:21000. To automatically connect to a specific Impala database, use the -d <database> option. For example, if all your Kudu tables are in the impala_kudu database, -d impala_kudu can use this database. To exit Impala Shell, run the quit command.
  1. Create the user_info user information table according to Table 1 and add data to it.
```
create table user_info(id string,name string,gender string,age int,addr string);
insert into table user_info(id,name,gender,age,addr) values("12005000201", "A", "Male", 19, "City A");
```
    ... (Other statements are the same.)
  2. Add users' educational backgrounds and titles to the user_info table.
    For example, to add educational background and title information about user 12005000201, run the following commands.
```
alter table user_info add columns(education string,technical string);
```
  3. Query user names and addresses by user ID.
    For example, to query the name and address of user 12005000201, run the following command:
```
select name,addr from user_info where id='12005000201';
```
  4. Delete the user information table:
```
drop table user_info;
```
- Operations on external partition tables:
  Create an external partition table and import data.
  1. Create a path for storing external table data.
    kinit hive (Run this command only in security mode.)
    
    The user must have the hive administrator permissions.
    
    hdfs dfs -mkdir /hive
    
    hdfs dfs -mkdir /hive/user_info
  2. Create a table.
    impala-shell
    
    By default, impala-shell attempts to connect to the Impala daemon on port 21000 of localhost. To connect to another host, use the -i < host:port > option, for example, impala-shell -i xxx.xxx.xxx.xxx:21000. To automatically connect to a specific Impala database, use the -d <database> option. For example, if all your Kudu tables are in the impala_kudu database, -d impala_kudu can use this database. To exit Impala Shell, run the quit command.
```
create external table user_info(id string,name string,gender string,age int,addr string) partitioned by(year string) row format delimited fields terminated by ' ' lines terminated by '\n' stored as textfile location '/hive/user_info';
```
    fields terminated indicates delimiters, for example, spaces.
    
    lines terminated indicates line breaks, for example, \n.
    
    /hive/user_info indicates the path of the data file.
  3. Import data.
    1. Execute the insert statement to insert data.
```
insert into user_info partition(year="2018") values ("12005000201", "A", "Male", 19, "City A");
```
    2. Run the load data command to import file data.
      1. Create a file based on the data in Table 1. For example, the file name is txt.log. Fields are separated by space, and the line feed characters are used as the line breaks.
      2. Upload the file to HDFS.
        hdfs dfs -put txt.log /tmp
      3. Load data to the table.
        load data inpath '/tmp/txt.log' into table user_info partition (year='2018');
  4. Query the imported data:
```
select * from user_info;
```
  5. Delete the user information table:
```
drop table user_info;
```