Why Does the Data Volume Change When Data Is Imported from DLI to OBS?
Symptom
When DLI is used to insert data into an OBS temporary table, only part of the data appears to be imported.
Possible Causes
Possible causes are as follows:
- The amount of data read during job execution is incorrect.
- The data volume is incorrectly verified.
Run a query statement to check whether the amount of imported data is correct.
To control the number of files written to OBS, add DISTRIBUTE BY number to the end of the INSERT statement. For example, adding DISTRIBUTE BY 1 consolidates the output of multiple tasks into a single file.
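As a sketch, the clause is appended to the INSERT statement as shown below; the table names obs_temp_table and source_table are placeholders, not names from this document:

```sql
-- Hypothetical tables: obs_temp_table (OBS temporary table) and source_table.
INSERT INTO obs_temp_table
SELECT *
FROM source_table
DISTRIBUTE BY 1;  -- consolidate the output of all tasks into one file
```

A larger number (for example, DISTRIBUTE BY 4) produces that many output files instead of one.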
Procedure
- On the DLI management console, check whether the number of result rows in the SQL job details is correct. In this case, the check shows that the amount of data is correct.
Figure 1 Checking the amount of data
- Check whether the method used to verify the data volume is correct. Perform the following steps to verify the data amount:
- Download the data file from OBS.
- Open the data file in a text editor. The data volume appears to be less than the expected volume.
If you used this method, note that a text editor may fail to load all of the data, especially when the file is large or the output is split across multiple files.
Run a query statement to check the amount of data imported into the OBS bucket. The query result indicates that all the data was imported.
This issue is therefore caused by incorrect verification of the data volume, not by a failed import.
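Instead of opening the downloaded files in a text editor, you can count rows programmatically across all downloaded part files. The following is a minimal sketch, assuming the files have been downloaded locally and one row is stored per line; the directory and file-name pattern are placeholders:

```python
# Sketch: count total data rows across locally downloaded OBS part files.
# The directory and "part-*.csv" pattern are hypothetical examples.
import glob
import os
import tempfile


def count_rows(pattern: str) -> int:
    """Count lines across every file matching the glob pattern."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            total += sum(1 for _ in f)
    return total


# Demo: two small part files stand in for downloaded OBS objects.
tmpdir = tempfile.mkdtemp()
for i, rows in enumerate([["a", "b"], ["c"]]):
    with open(os.path.join(tmpdir, f"part-{i}.csv"), "w", encoding="utf-8") as f:
        f.write("\n".join(rows) + "\n")

total = count_rows(os.path.join(tmpdir, "part-*.csv"))
```

Comparing this total with the row count reported by the DLI query confirms whether the import, or only the verification method, is at fault.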
Related Information
For details about the SQL syntax for inserting data, see Inserting Data in Data Lake Insight Spark SQL Syntax Reference.