Updated: 2024-12-10 GMT+08:00

Using Impala to Operate Kudu Tables

You can use Impala SQL to insert, query, update, and delete data in Kudu. This is an alternative to building a custom Kudu application with the Kudu API.

Prerequisites

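The full cluster client has been installed. In this example, the client is installed in /opt/Bigdata/client. The client directory used in the following steps is only an example; replace it with your actual installation directory.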

Impala on Kudu

  1. Log in to the node where the client is installed.
  2. Run the following command to initialize the environment variables.

    source /opt/Bigdata/client/bigdata_env

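  3. If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user. Skip this step if Kerberos authentication is not enabled.

    kinit <service user>

  4. Run the following command to log in to the Impala client.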

    impala-shell

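    By default, impala-shell tries to connect to the Impala daemon on port 21000 of localhost. To connect to a different host, use the -i <host:port> option. To connect to a specific Impala database automatically, use the -d <database> option. For example, if all of your Kudu tables are in the database impala_kudu, -d impala_kudu selects that database. To exit impala-shell, enter quit;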

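  5. Run the following commands to create an Impala table and load prepared data into it, for example from the HDFS path /tmp/data10. Each line of the data file holds the four comma-separated fields name, age, pt, and date_p. LOAD DATA moves the file from its HDFS location into the table's directory.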

    create table dataorigin (name string,age string,pt string, date_p date) row format delimited fields terminated by ',' stored as textfile;

    load data inpath '/tmp/data10' overwrite into table dataorigin;

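  6. Run the following command to create a Kudu table. kudu.master_addresses lists the IP addresses and ports of the KuduMaster instances; replace the example addresses with those of your cluster.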

    create table dataorigin2 (name string,age string,pt string, date_p date, primary key(name)) stored as kudu TBLPROPERTIES('kudu.master_addresses'='192.168.190.164:7051,192.168.204.178:7051,192.168.244.63:7051');

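    If Ranger authorization is enabled for the Impala cluster, the preceding command fails. In that case, add the custom Impalad configuration parameter --kudu_master_hosts=192.168.190.164:7051,192.168.204.178:7051,192.168.244.63:7051, restart the Impala cluster, and then create the Kudu table with the following command: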

    create table dataorigin2 (name string,age string,pt string, date_p date, primary key(name)) stored as kudu;

  7. Run the following commands to operate on the Kudu table.

    1. Insert data

      insert into dataorigin2 select * from dataorigin;

    2. Update data

      UPDATE dataorigin2 SET date_p="2021-03-31" where age="73";

    3. Upsert rows (update a row if its primary key already exists, otherwise insert it)

      UPSERT INTO dataorigin2 VALUES ("spjted","75","28","2021-04-01");

      UPSERT INTO dataorigin2 VALUES ("kwhakb","92","29","2021-04-02");

      UPSERT INTO dataorigin2 VALUES ("oftrkf","13","30","2021-04-03");

      UPSERT INTO dataorigin2 VALUES ("kiewti","36","31","2021-04-04");

      UPSERT INTO dataorigin2 VALUES ("rknmql","98","32","2021-04-05");

      UPSERT INTO dataorigin2 VALUES ("fwcoij","52","33","2021-04-06");

      UPSERT INTO dataorigin2 VALUES ("pgvpdo","37","34","2021-04-04");

    4. Delete rows

      DELETE FROM dataorigin2 WHERE date_p="2021-03-31";
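Steps 2 to 4 can also be scripted for non-interactive use. The following is a minimal sketch under stated assumptions, not part of the product documentation. It assumes a POSIX shell, and IMPALAD, IMPALA_DB, IMPALA_SHELL, and run_impala are hypothetical names introduced here. It relies only on the impala-shell options -i (daemon host:port), -d (default database), and -q (run one statement and exit).

```shell
#!/bin/sh
# Hypothetical wrapper for running single statements without an interactive session.
# IMPALAD and IMPALA_DB are placeholders; set them to your daemon and database.
IMPALAD="${IMPALAD:-localhost:21000}"
IMPALA_DB="${IMPALA_DB:-impala_kudu}"
# Overridable so the call can be dry-run (for example IMPALA_SHELL=echo).
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Run one SQL statement: -i picks the daemon, -d the database, -q the statement.
run_impala() {
  "$IMPALA_SHELL" -i "$IMPALAD" -d "$IMPALA_DB" -q "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# source /opt/Bigdata/client/bigdata_env
# kinit <service user>          # only if Kerberos authentication is enabled
# run_impala "show tables;"
```

Setting IMPALA_SHELL=echo makes run_impala print the arguments it would pass, which is a simple way to check the invocation before running it against a real cluster.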
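If you do not yet have a data file for step 5, the following sketch writes hypothetical sample rows in the layout that the dataorigin table expects: name,age,pt,date_p, comma-separated, with dates in yyyy-MM-dd form. The function name gen_dataorigin_rows, the generated values, and the local file name ./data10 are illustrative assumptions. Names are kept unique because name becomes the primary key of the Kudu table in step 6.

```shell
#!/bin/sh
# Hypothetical helper: write N sample rows for the dataorigin table to a file.
# Row format: name,age,pt,date_p   (for example: user1,19,1,2021-01-01)
gen_dataorigin_rows() {
  n="$1"
  out="$2"
  : > "$out"                      # create or truncate the output file
  i=1
  while [ "$i" -le "$n" ]; do
    age=$(( 18 + i % 60 ))        # ages 18..77 (the table stores them as strings)
    day=$(( (i - 1) % 28 + 1 ))   # cycles through 2021-01-01 .. 2021-01-28
    printf 'user%d,%d,%d,2021-01-%02d\n' "$i" "$age" "$i" "$day" >> "$out"
    i=$(( i + 1 ))
  done
}

# Example (commented out): generate 100 rows locally, then upload them to the
# HDFS path that LOAD DATA INPATH reads in step 5.
# gen_dataorigin_rows 100 ./data10
# hdfs dfs -put -f ./data10 /tmp/data10
```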
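The kudu.master_addresses value in step 6 and the --kudu_master_hosts parameter share the same form: a comma-separated list of host:port entries for the KuduMaster instances. Below is a small sketch for building that string from a list of master IP addresses. kudu_master_list and KUDU_MASTER_PORT are hypothetical names. The port defaults to 7051 because that is the Kudu master's default RPC port and matches the example addresses.

```shell
#!/bin/sh
# Hypothetical helper: join KuduMaster hosts into "h1:7051,h2:7051,...".
# KUDU_MASTER_PORT overrides the port (default 7051, the Kudu master RPC port).
kudu_master_list() {
  port="${KUDU_MASTER_PORT:-7051}"
  list=""
  for h in "$@"; do
    list="${list:+$list,}$h:$port"   # insert a comma only after the first entry
  done
  printf '%s\n' "$list"
}

# Example (commented out): print the Impalad custom parameter for three masters.
# printf '%s\n' "--kudu_master_hosts=$(kudu_master_list 192.168.190.164 192.168.204.178 192.168.244.63)"
```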
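The date_p column has type DATE. Every date literal in the UPDATE, UPSERT, and DELETE statements of step 7 must therefore be a real calendar date in yyyy-MM-dd form; a value such as 2021-03-32 is not a valid DATE. When UPSERT statements are generated from external input, the dates can be checked first. The sketch below uses plain POSIX sh, so it does not depend on GNU date. is_valid_date and upsert_dataorigin2 are hypothetical names. The sketch also assumes that the values contain no double quotes, because they are not escaped.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a real calendar date (yyyy-MM-dd).
# Variables are global because POSIX sh has no "local".
is_valid_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  y=${1%%-*}
  md=${1#*-}
  m=${md%%-*}
  d=${md#*-}
  # Strip leading zeros: $(( )) would read "0800" as octal, and the month
  # case below matches unpadded numbers.
  while [ "${y#0}" != "$y" ]; do y=${y#0}; done
  y=${y:-0}
  m=${m#0}
  d=${d#0}
  [ "$m" -ge 1 ] && [ "$m" -le 12 ] || return 1
  case "$m" in
    4|6|9|11) max=30 ;;
    2)
      # Gregorian leap year: divisible by 4, except centuries not divisible by 400.
      if [ $(( y % 4 )) -eq 0 ] && { [ $(( y % 100 )) -ne 0 ] || [ $(( y % 400 )) -eq 0 ]; }; then
        max=29
      else
        max=28
      fi
      ;;
    *) max=31 ;;
  esac
  [ "$d" -ge 1 ] && [ "$d" -le "$max" ]
}

# Hypothetical helper: print an UPSERT for dataorigin2 only if the date is valid.
# Arguments: name age pt date_p (values must not contain double quotes).
upsert_dataorigin2() {
  is_valid_date "$4" || { echo "invalid date: $4" >&2; return 1; }
  printf 'UPSERT INTO dataorigin2 VALUES ("%s","%s","%s","%s");\n' "$1" "$2" "$3" "$4"
}
```

For example, upsert_dataorigin2 spjted 75 28 2021-04-01 prints a statement in the same form as step 7, while a date of 2021-03-32 is rejected with a message on standard error.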