Updated on 2024-01-23 GMT+08:00

Why Does the Data Volume Change When Data Is Imported from DLI to OBS?

Symptom

When DLI is used to insert data into an OBS temporary table, only part of the data appears to have been imported.

Possible Causes

Possible causes are as follows:

  • The job read an incorrect amount of data during execution.
  • The method used to verify the data volume is incorrect.

Run a query statement to check whether the amount of imported data is correct.

If OBS limits the number of files that can be stored, append DISTRIBUTE BY number to the end of the insert statement. For example, if DISTRIBUTE BY 1 is appended, the output of multiple tasks is written to a single file instead of one file per task.
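
The following is a minimal sketch of such a statement. The table names obs_tmp_table and source_table are hypothetical placeholders; substitute your own tables.

```sql
-- Hypothetical table names, used for illustration only.
-- DISTRIBUTE BY 1 sends all rows to a single task, so one output file is written.
INSERT INTO obs_tmp_table
SELECT * FROM source_table
DISTRIBUTE BY 1;
```

Because a single task then writes all the data, this may slow down large imports.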

Procedure

  1. On the DLI management console, check whether the number of results in the SQL job details is correct. In this case, the check shows that the amount of data is correct.

    Figure 1 Checking the amount of data

  2. Check whether the method used to verify the data volume is correct. In this case, the data volume was verified as follows:

    1. Download the data file from OBS.
    2. Open the data file in a text editor. The data volume displayed is less than expected.

    If you used this method, the discrepancy arises because the text editor cannot read all the data.

    Instead, run a query statement to view the amount of data imported into the OBS bucket. The query result indicates that all the data has been imported.

    The issue is therefore caused by incorrect verification of the data volume.
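
The verification query mentioned above can be sketched as follows. The table name obs_tmp_table is a hypothetical placeholder for the OBS table that the data was inserted into.

```sql
-- Hypothetical table name, used for illustration only.
-- Counts the rows actually stored in the OBS table.
SELECT COUNT(*) FROM obs_tmp_table;
```

Compare the returned count with the number of results shown in the SQL job details.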

Related Information

For details about the SQL syntax for inserting data, see Inserting Data in Data Lake Insight Spark SQL Syntax Reference.