Abnormal Hive Query Due to Damaged Data in the JSON Table
Issue
A user fails to query JSON data on the Hive client because the JSON table contains damaged data.
- On the Hive client, the user executes the default open-source serialized JSON statement to create a table.
create external table if not exists test (
name string
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
location 'hdfs://hacluster/user/hive/warehouse/database/table/jsondata';
- Failed to query the table.
This section applies to versions later than MRS 1.9.2.
Cause Analysis
When the default open-source serialized JSON statement is used to create a Hive table, damaged JSON data in the table cannot be filtered out. As a result, the query is abnormal.
Procedure
- Download json-serde-1.3.8-jar-with-dependencies.jar to the local host and upload the JAR file to all HiveServer nodes.
- Log in to the nodes where the JAR package is uploaded in 1 and the Hive and HDFS clients are installed as the client installation user.
- Run the following commands to authenticate the user:
cd Client installation directory
source bigdata_env
kinit User who has the Hive administrator rights (Skip this operation for clusters with Kerberos authentication disabled)
- Run the following command to create an HDFS directory for storing JAR files:
hdfs dfs -mkdir HDFS directory for storing JAR files
- Run the following command to upload the JAR files in 2 to HDFS:
hdfs dfs -put JAR file storage path HDFS directory for storing JAR files
- Run the following commands to enable Hive to load a specified JAR file when executing a CLI task:
beeline
set role admin; (Skip this step for clusters with Kerberos authentication disabled)
add jar HDFS directory for storing JAR files
- Run the following commands to create the table again:
create external table if not exists test (
name string
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile
location 'hdfs://hacluster/user/hive/warehouse/database/table/jsondata';
- Run the following command to modify the table property to ignore damaged JSON data:
ALTER TABLE test SET SERDEPROPERTIES( "ignore.malformed.json" = "true");
After this property is modified, NULL is displayed by default if the queried table contains damaged JSON data.
- Run the following command to query table data:
select * from test;
If the data query is successful, the damaged JSON data is ignored. The damaged JSON data is displayed as NULL.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.