Help Center/ Data Ingestion Service/ FAQs/ Dump Questions/ How Does DIS Dump Data to a Specific Column of DWS?
Updated on 2024-10-25 GMT+08:00

How Does DIS Dump Data to a Specific Column of DWS?

DIS can dump source data in JSON format to DWS. Before dumping data, you need to configure the source data schema.

A source data schema is a user's JSON data sample used to describe the JSON data format. DIS can generate an Avro schema based on the JSON data sample and convert the JSON data uploaded to a stream to the Parquet or CarbonData format.

  1. Create a source data schema. For details, see Managing a Source Data Schema. The following describes how to create a source data schema when adding a dump task.
    1. Select a stream whose source data type is JSON.
    2. On the Dump Tasks tab page, click Create Dump Task.
    3. Set Dump Destination to DWS and configure Source Data Schema by importing a file.
    4. Enter the source data sample, click Convert Source Data Sample, and click Submit.
      Figure 1 Creating a source data schema
  2. Configure schema filtering.

    Schema filtering is valid only for the root node or level-1 subnode of the source data schema that is not of the array type. For details about how to create a source data schema, see Managing a Source Data Schema.

    1. Enable schema filtering.
    2. In the Source Data Attribute Name list, select the corresponding attribute names to map the specific columns in the DWS table.

      Attributes in the source data attribute list are generated by the name field in the source data schema and match the column name in the DWS table.

      Figure 2 Configuring schema attributes
    3. As shown in Figure 2, only id is selected as the source data attribute name, which is less than the total number of fields in the corresponding table.

      Create a cluster on DWS and run the following command to create a table:

      CREATE TABLE dis_test3(id TEXT,dev TEXT,online BIGINT,module TEXT default 'a',logTime TEXT,appId TEXT,event TEXT);

    4. After data is dumped from DIS to DWS, log in to the cluster database and query data in the dis_test3 table. You can find that data is inserted only into the id and module columns. The data in the module column is the default data, as shown in Figure 3.

    Figure 3 Schema filtering result