Updated on 2024-07-19 GMT+08:00

Common Sqoop Commands and Parameters

Common Sqoop commands

Table 1 Common Sqoop commands

Command

Description

import

Imports data to a cluster.

export

Exports data of a cluster.

codegen

Obtains data from a table in the database to generate a Java file and compress the file.

create-hive-table

Creates a Hive table.

eval

Executes a SQL statement and view the result.

import-all-tables

Imports all tables in a database to HDFS.

job

Generates a Sqoop job.

list-databases

Lists database names.

list-tables

List table names.

merge

Merges data in different HDFS directories and saves the data to a specified directory.

metastore

Starts the metadata database to record the metadata of a Sqoop job.

help

Prints help information.

version

Prints the version information.

Common Parameters

Table 2 Common parameters

Category

Parameter

Description

Parameters for database connection

--connect

Specifies the URL for connecting to a relational database.

--connection-manager

Specifies the connection manager class.

--driver jdbc

Specifies the driver package for database connection.

--help

Prints help information.

--password

Specifies the password for connecting to a database.

--username

Specifies the username for connecting to a database.

--verbose

Prints detailed information on the console.

import parameters

--fields-terminated-by

Specifies the field delimiter, which must be the same as that in a Hive table or HDFS file.

--lines-terminated-by

Specifies the line delimiter, which must be the same as that in a Hive table or HDFS file.

--mysql-delimiters

Specifies the default delimiter settings of MySQL.

export parameters

--input-fields-terminated-by

Specifies the field delimiter.

--input-lines-terminated-by

Specifies the line delimiter.

Hive parameters

--hive-delims-replacement

Replaces characters such as \r and \n in data with user-defined characters.

--hive-drop-import-delims

Removes characters such as \r and \n when data is imported to Hive.

--map-column-hive

Specifies the data type of fields during the generation of a Hive table.

--hive-partition-key

Creates a partition.

--hive-partition-value

Imports data to a specified partition of a database.

--hive-home

Specifies the installation directory for Hive.

--hive-import

Specifies that data is imported from a relational database to Hive.

--hive-overwrite

Overwrites existing Hive data.

--create-hive-table

Creates a Hive table. The default value is false. A destination table will be created if it does not exist.

--hive-table

Specifies a Hive table to which data is to be imported.

--table

Specifies the relational database table.

--columns

Specifies the fields of a relational data table to be imported.

--query

Specifies the query statement for importing the query result.

HCatalog parameters

--hcatalog-database

Specifies a Hive database and imports data to it using HCatalog.

--hcatalog-table

Specifies a Hive table and imports data to it using HCatalog.

Others

-m or --num-mappers

Specifies the number of map tasks used by a Sqoop job.

--split-by

Specifies the column based on which Sqoop splits work units. This parameter is used together with -m.

--target-dir

Specifies the temporary directory of HDFS.

--null-string string

Specifies the string to be written for a null value for string columns.

--null-non-string

Specifies the string to be written for a null value for non-string columns.

--check-column

Specifies the column for determining incremental data import.

--incremental append or lastmodified

Incrementally imports data.

append: appends records, for example, appending records that are greater than the value specified by last-value.

lastmodified: appends data that is modified after the date specified by last-value.

--last-value

Specifies the last value of the check column from the previous import.

--input-null-string

Specifies the string to be interpreted as NULL for string columns.

--input-null-non-string

Specifies the string to be interpreted as null for non-string columns. If this parameter is not specified, NULL will be used.