Using Impala from Scratch

Updated at: Feb 12, 2020 GMT+08:00

Impala is a massively parallel processing (MPP) SQL query engine for processing vast amounts of data stored in Hadoop clusters. It is open-source software written in C++ and Java, and it provides high performance and low latency compared with other SQL engines for Hadoop.

Background

After an MRS cluster is successfully created, the original client is installed in the /opt/client directory on all nodes of the cluster by default. Before using the client, download and update the client configuration file, and ensure that the Core nodes of the cluster are available.

Suppose an enterprise user develops an application to manage users of service A. The procedure for operating service A on the Impala client is as follows:

Operations on common tables:

  • Create the user_info table.
  • Add users' educational backgrounds and titles to the table.
  • Query user names and addresses by user ID.
  • Delete the user information table after service A ends.
Table 1 User information

  ID           Name  Gender  Age  Address
  -----------  ----  ------  ---  -------
  12005000201  A     Male    19   City A
  12005000202  B     Female  23   City B
  12005000203  C     Male    26   City C
  12005000204  D     Male    18   City D
  12005000205  E     Female  21   City E
  12005000206  F     Male    32   City F
  12005000207  G     Female  29   City G
  12005000208  H     Female  30   City H
  12005000209  I     Male    26   City I
  12005000210  J     Female  25   City J

Procedure

  1. Download the client configuration file.

    1. Log in to the MRS management console. In the navigation tree on the left, choose Clusters > Active Clusters and click the cluster name.
    2. Click the Components tab.
    3. Click Download Client.

      Set Client Type to Only configuration files, set Save Path to Server, and click OK to generate the client configuration file. By default, the generated file is saved in the /tmp/MRS-client directory on the active management node.

      Figure 1 Downloading only the client configuration files

  2. Use the MRS client on any Core node in the cluster. For details, see Using the client on a Core node.
  3. Run the Impala client command to implement service A.

    Run the Impala client command directly:

    impala-shell.sh

    By default, impala-shell attempts to connect to the Impala daemon on port 21000 of localhost. To connect to another host, use the -i <host:port> option. To automatically connect to a specific Impala database, use the -d <database> option. For example, if all your Kudu tables are in a database named impala_kudu, starting the shell with -d impala_kudu makes that database the default. To exit the Impala shell, run the quit command.
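    For example, the following invocation connects to a specific daemon and switches to the impala_kudu database (impalad-host is a placeholder for one of your Impalad node names):

    impala-shell -i impalad-host:21000 -d impala_kudu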

    Operations on internal tables:

    1. Create the user_info user information table according to Table 1 and add data to it.
      create table user_info(id string,name string,gender string,age int,addr string);
      insert into user_info(id,name,gender,age,addr) values("12005000201", "A", "Male", 19, "City A");

      ... (The insert statements for the other rows follow the same pattern.)
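      Alternatively, Impala accepts multiple rows in a single insert statement. A minimal sketch using the next two rows from Table 1:

      insert into user_info values
      ("12005000202", "B", "Female", 23, "City B"),
      ("12005000203", "C", "Male", 26, "City C");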

    2. Add users' educational backgrounds and titles to the user_info table.

      For example, to record educational background and title information for users such as 12005000201, first add education and technical columns to the table:

      alter table user_info add columns(education string,technical string);
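      To verify that the new columns were added, you can describe the table:

      describe user_info;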
    3. Query user names and addresses by user ID.

      For example, to query the name and address of user 12005000201, run the following command:

      select name,addr from user_info where id='12005000201';
    4. Delete the user information table.
      drop table user_info;

    Operations on external partition tables:

    Create an external partition table and import data.

    1. Create a path for storing external table data.

      hdfs dfs -mkdir /hive/user_info

    2. Create a table.
      create external table user_info(id string,name string,gender string,age int,addr string) partitioned by(year string) row format delimited fields terminated by ' ' lines terminated by '\n' stored as textfile location '/hive/user_info';

      fields terminated by specifies the field delimiter, a space in this example.

      lines terminated by specifies the line separator, \n in this example.

      /hive/user_info is the HDFS path where the table's data files are stored.
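      For reference, each line of a matching data file contains the five non-partition columns separated by spaces, for example (first row of Table 1):

      12005000201 A Male 19 City A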

    3. Import data.
      1. Execute the insert statement to insert data.
        insert into user_info partition(year="2018") values ("12005000201", "A", "Male", "19", "City A");
      2. Run the load data command to import file data.
        1. Create a file based on the data in Table 1, for example, a file named txt.log. Separate fields with spaces and end each record with a line break.
        2. Upload the file to HDFS.

          hdfs dfs -put txt.log /tmp

        3. Load data to the table.

          load data inpath '/tmp/txt.log' into table user_info partition (year='2011');
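          To confirm that both partitions (year=2018 and year=2011) now exist, you can list them:

          show partitions user_info;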

    4. Query the imported data.
      select * from user_info;
    5. Delete the user information table.
      drop table user_info;

      Because user_info is an external table, this statement removes only the table definition; the data files in /hive/user_info remain in HDFS.

  4. Delete the cluster.

    For details, see Terminating a Cluster in the User Guide.
