Updated on 2024-07-19 GMT+08:00

Adapting Sqoop 1.4.7 to MRS 3.x Clusters

Sqoop is a tool designed for efficiently transmitting a large amount of data between Apache Hadoop and structured databases (such as relational databases). Customers need to use Sqoop to migrate data in MRS. However, MRS of an earlier version does not provide Sqoop. This section describes how to install and use Sqoop. In MRS 3.1.0 or later, you can select the Sqoop component during cluster creation.

Prerequisites

The MRS client and the JDK environment have been installed.

Procedure

  1. Download the open-source sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz package.
  2. Save the downloaded package to the /opt/Bigdata/client directory on the node where the MRS client is installed and decompress it.

    tar zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

  3. Download the MySQL JDBC driver mysql-connector-java-xxx.jar from the MySQL official website. For details about how to select the MySQL JDBC driver, see the following table.

    Table 1 Version information

    JDBC Driver Version

    MySQL Version

    Connector/J 5.1

    MySQL 4.1, MySQL 5.0, MySQL 5.1, and MySQL 6.0 alpha

    Connector/J 5.0

    MySQL 4.1, MySQL 5.0 servers, and distributed transaction (XA)

    Connector/J 3.1

    MySQL 4.1, MySQL 5.0 servers, and MySQL 5.0 except distributed transaction (XA)

    Connector/J 3.0

    MySQL 3.x and MySQL 4.1

  4. Put the MySQL driver package in the /opt/Bigdata/client/sqoop-1.4.7.bin__hadoop-2.6.0/lib directory of Sqoop and modify the owner group and permission of the JAR package. For details, see the owner group and permission of omm:wheel and 755 in Figure 1.

    Figure 1 Owner group and permission of the MySQL driver package

  5. Replace the JAR package in the lib directory of Sqoop with that starting with jackson in the lib directory of Hive on the MRS client, for example, /opt/Bigdata/client/Hive/Beeline/lib.

    Figure 2 JAR package starting with jackson

  6. Copy the jline package from the /opt/Bigdata/client/Hive/Beeline/lib directory of the MRS Hive client to the lib directory of Sqoop.
  7. Run the vim $JAVA_HOME/jre/lib/security/java.policy command to add the following configuration:

    permission javax.management.MBeanTrustPermission "register";

  8. Run the following commands to go to the conf directory of the Sqoop and add the configuration items of variables:

    cd /opt/Bigdata/client/sqoop-1.4.7.bin__hadoop-2.6.0/conf

    cp sqoop-env-template.sh sqoop-env.sh

  9. Run the vim sqoop-env.sh command to set the environment variables of Sqoop. Change the Hadoop and Hive directories as required.

    export HADOOP_COMMON_HOME=/opt/Bigdata/client/HDFS/hadoop
    export HADOOP_MAPRED_HOME=/opt/Bigdata/client/HDFS/hadoop
    export HIVE_HOME=/opt/Bigdata/MRS_1.9.X/install/FusionInsight-Hive-3.1.0/hive (Enter the actual path.)
    export HIVE_CONF_DIR=/opt/Bigdata/client/Hive/config
    export HCAT_HOME=/opt/Bigdata/client/Hive/HCatalog
    Figure 3 Setting environment variables of Sqoop

  10. Build the sqoop script. For example:

    /opt/Bigdata/FusionInsight_Current/1_19_SqoopClient/install/FusionInsight-Sqoop-1.4.7/bin/sqoop import 
    --connect jdbc:mysql://192.168.0.183:3306/test 
    --driver com.mysql.jdbc.Driver 
    --username 'root' 
    --password 'xxx' 
    --query "SELECT id, name  FROM tbtest WHERE \$CONDITIONS" 
    --hcatalog-database default 
    --hcatalog-table test 
    --num-mappers 1