Updated on 2022-02-22 GMT+08:00

Using Impala to Operate Kudu

You can use Impala SQL statements to insert, query, update, and delete data in Kudu, as an alternative to building custom Kudu applications with the Kudu APIs.

Prerequisite

A complete cluster client has been installed, for example, in the /opt/Bigdata/client directory. The client directory used in the following operations is only an example; replace it with the actual installation directory.

Impala on Kudu

  1. Log in to the node where the client is installed.
  2. Run the following command to initialize environment variables:

    source /opt/Bigdata/client/bigdata_env

  3. If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user, replacing <Service user> with the actual user name. If Kerberos authentication is not enabled for the cluster, skip this step.

    kinit <Service user>

  4. Run the following command to log in to the Impala client:

    impala-shell

    By default, impala-shell attempts to connect to the Impala daemon on port 21000 of localhost. To connect to another host, use the -i <host:port> option. To connect to a specific Impala database automatically, use the -d <database> option; for example, if all your Kudu tables are in the impala_kudu database, -d impala_kudu connects directly to that database. To exit the Impala shell, run the quit command.

  5. Run the following commands to create an Impala table and load the prepared data into it, for example, the data in the HDFS directory /tmp/data10:

    create table dataorigin (name string, age string, pt string, date_p date) row format delimited fields terminated by ',' stored as textfile;

    load data inpath '/tmp/data10' overwrite into table dataorigin;

  6. Run the following command to create a Kudu table. In the command, kudu.master_addresses specifies the IP addresses and ports of the KuduMaster instances, separated by commas. Replace them with the actual addresses in your cluster.

    create table dataorigin2 (name string, age string, pt string, date_p date, primary key(name)) stored as kudu TBLPROPERTIES('kudu.master_addresses'='192.168.190.164:7051,192.168.204.178:7051,192.168.244.63:7051');

  7. Perform the following operations on the Kudu table.

    1. Insert data.

      insert into dataorigin2 select * from dataorigin;

    2. Update data.

      UPDATE dataorigin2 SET date_p="2021-03-31" where age="73";

    3. Upsert rows.

      UPSERT INTO dataorigin2 VALUES ("spjted","75","28","2021-04-01");

      UPSERT INTO dataorigin2 VALUES ("kwhakb","92","29","2021-04-02");

      UPSERT INTO dataorigin2 VALUES ("oftrkf","13","30","2021-04-03");

      UPSERT INTO dataorigin2 VALUES ("kiewti","36","31","2021-04-04");

      UPSERT INTO dataorigin2 VALUES ("rknmql","98","32","2021-04-05");

      UPSERT INTO dataorigin2 VALUES ("fwcoij","52","33","2021-04-06");

      UPSERT INTO dataorigin2 VALUES ("pgvpdo","37","34","2021-04-04");

    4. Delete the rows whose date_p is 2021-03-31.

      DELETE FROM dataorigin2 WHERE date_p="2021-03-31";
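Step 4 runs impala-shell interactively. The statements in steps 5 to 7 can also be run one at a time with the impala-shell -q option, which executes a single query and then exits. The following is a minimal sketch of a hypothetical wrapper, run_impala, that combines -q with the -i and -d options described in step 4. IMPALAD and IMPALA_DB are illustrative variable names (not client settings) that default to localhost:21000 and the default database; when impala-shell is not on the PATH, the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the MRS client) for running one statement.
# IMPALAD and IMPALA_DB are illustrative variable names chosen for this sketch.
run_impala() {
  # $1: the SQL statement to run.
  if command -v impala-shell >/dev/null 2>&1; then
    # -i: Impala daemon host:port, -d: database to use, -q: run one query and exit.
    impala-shell -i "${IMPALAD:-localhost:21000}" -d "${IMPALA_DB:-default}" -q "$1"
  else
    # Dry run: impala-shell is not available, so print the command instead.
    printf 'impala-shell -i %s -d %s -q %s\n' \
      "${IMPALAD:-localhost:21000}" "${IMPALA_DB:-default}" "$1"
  fi
}
```

For example, with IMPALAD set to the host:port of an Impala daemon, run_impala 'select count(*) from dataorigin2;' counts the rows in the Kudu table without opening an interactive session.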
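The contents of /tmp/data10 used in step 5 are not shown. Because dataorigin is defined with fields terminated by ',' and stored as textfile, the files are expected to contain one comma-separated row per line, in the column order name,age,pt,date_p. The sketch below is a hypothetical helper, make_sample_rows, that generates such rows with made-up values and uploads them to /tmp/data10 only if the hdfs command is available (assumed to be on the PATH after sourcing bigdata_env). The local file name /tmp/data10_sample.csv is also illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints N comma-separated rows matching the
# dataorigin columns (name,age,pt,date_p). All values are made up.
make_sample_rows() {
  # $1: number of rows to generate.
  local i=1
  while [ "$i" -le "$1" ]; do
    # name is unique per row, so it can serve as the primary key of dataorigin2.
    # date_p cycles through valid March 2021 days in YYYY-MM-DD form.
    printf 'user%04d,%d,%d,2021-03-%02d\n' \
      "$i" $(( (i * 37) % 90 + 10 )) "$i" $(( (i - 1) % 28 + 1 ))
    i=$(( i + 1 ))
  done
}

# Upload only when the HDFS client command is available (assumption: it is
# on the PATH after sourcing bigdata_env).
if command -v hdfs >/dev/null 2>&1; then
  make_sample_rows 100 > /tmp/data10_sample.csv
  hdfs dfs -mkdir -p /tmp/data10
  hdfs dfs -put -f /tmp/data10_sample.csv /tmp/data10/
fi
```

Note that LOAD DATA moves, rather than copies, the files from /tmp/data10 into the table's directory, so the data must be uploaded again before step 5 is repeated.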
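In step 6, kudu.master_addresses takes a comma-separated list of host:port pairs, one per KuduMaster instance. The hypothetical helper below, kudu_master_addresses, builds that string from a list of hosts. It assumes the port 7051 used in the example (the default Kudu master RPC port), which can be overridden through the illustrative KUDU_MASTER_PORT variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins KuduMaster hosts into the host:port,host:port
# form expected by the kudu.master_addresses table property.
kudu_master_addresses() {
  # Arguments: KuduMaster host names or IP addresses.
  # KUDU_MASTER_PORT is an illustrative override; 7051 matches the example.
  local port="${KUDU_MASTER_PORT:-7051}" out="" host
  for host in "$@"; do
    # ${out:+$out,} adds a comma only when at least one entry already exists.
    out="${out:+$out,}$host:$port"
  done
  printf '%s\n' "$out"
}
```

Calling kudu_master_addresses 192.168.190.164 192.168.204.178 192.168.244.63 prints the same value used in the step 6 example, which can then be pasted into the TBLPROPERTIES clause.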
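The UPDATE and UPSERT statements in step 7 write string literals into date_p, which is a DATE column, so each literal must be a real calendar date in YYYY-MM-DD form; a string such as 2021-03-32 does not denote a valid date. The sketch below is a hypothetical check, is_valid_date, that could be run on values before building the statements. It assumes GNU coreutils date (for the -d option). Because it compares the formatted result with the input, a date command that normalizes an out-of-range day instead of rejecting it still does not pass the check.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if $1 is a real calendar date in
# YYYY-MM-DD form. Assumes GNU coreutils date; BSD/macOS date differs.
is_valid_date() {
  # Reject anything that is not exactly four digits, dash, two, dash, two.
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  # -u avoids local daylight-saving effects; the round trip must be unchanged.
  [ "$(date -u -d "$1" +%Y-%m-%d 2>/dev/null)" = "$1" ]
}
```

For example, is_valid_date 2020-02-29 succeeds, while is_valid_date 2021-02-29 fails because 2021 is not a leap year.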