Updated on 2024-10-25 GMT+08:00

Customizing Row Separators in Hive Tables

Scenario

In most cases, Hive tables stored as text files use a line break as the row delimiter; that is, each line break terminates a row during queries. However, some data files delimit rows with special characters rather than a line break.

MRS Hive allows you to use a different character or character combination to delimit rows of Hive text data. When creating a table, set inputformat to SpecifiedDelimiterInputFormat, and set the following parameter before each query. The table data is then read using the specified row delimiter.

set hive.textinput.record.delimiter='';

In the current version, the Hue component does not support configuring multiple separators when files are imported to a Hive table.

Procedure

  1. Log in to the node where the Hive client is installed as the Hive client installation user.
  2. Run the following commands to switch to the client installation directory, configure environment variables, and authenticate the user:

    cd Client installation directory

    source bigdata_env

    kinit Hive service user

    Skip the kinit command if Kerberos authentication is not enabled for the cluster.

  3. Run the following command to log in to the Hive client:

    beeline

  4. Specify inputFormat and outputFormat when creating a table.

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      [(col_name data_type [COMMENT col_comment], ...)]
      [ROW FORMAT row_format]
      STORED AS
        inputformat 'org.apache.hadoop.hive.contrib.fileformat.SpecifiedDelimiterInputFormat'
        outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

  5. Specify the row delimiter before running a query.

    set hive.textinput.record.delimiter='!@!';

    Hive will use '!@!' as the row delimiter.
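
To picture what a custom row delimiter does to the content of a text file, the following Python sketch splits a sample string on '!@!'. It is only an illustration of the idea, not the SpecifiedDelimiterInputFormat implementation. The function name split_records, the sample data, and the choice to drop the empty chunk after a trailing delimiter are all assumptions made for this example.

```python
from typing import List


def split_records(raw: str, delimiter: str) -> List[str]:
    """Split raw text into rows on a custom record delimiter (illustrative only)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    records = raw.split(delimiter)
    # Assumption: a delimiter at the very end of the data terminates the last
    # row rather than starting a new, empty one, so drop the empty final chunk.
    if records and records[-1] == "":
        records.pop()
    return records


# Hypothetical file content: fields are separated by ',' and rows by '!@!'.
# The second row contains an embedded line break inside a field.
sample = "1,alice!@!2,bob\nsmith!@!3,carol!@!"

rows = split_records(sample, "!@!")
for row in rows:
    print(repr(row))
```

With '!@!' as the delimiter, the embedded line break stays inside the second row instead of ending it. Splitting the same sample on line breaks would cut that row in two, which is the situation this feature is meant to avoid.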