Updated on 2022-12-07 GMT+08:00

Managing a Source Data Schema

A source data schema is a JSON or CSV data sample that you provide to describe the format of your JSON or CSV data. For example, DIS can generate an Avro schema from the sample and convert the JSON or CSV data uploaded to a stream to the Parquet or CarbonData format.
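
To make the idea of deriving an Avro schema from a data sample concrete, the following minimal Python sketch infers an Avro record schema from one JSON object. It is an illustration only, not how DIS implements the conversion: the type-mapping rules, the function names (infer_avro_type, schema_from_json_sample), and the default record name "root" are all assumptions.

```python
import json


def infer_avro_type(value, name):
    """Map one JSON value to an Avro type (simplified, illustrative rules)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous; an empty array defaults to string items.
        items = infer_avro_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": key, "type": infer_avro_type(val, key)}
                for key, val in value.items()
            ],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")


def schema_from_json_sample(sample_text, record_name="root"):
    """Parse a JSON sample and return an Avro record schema as a dict."""
    sample = json.loads(sample_text)
    if not isinstance(sample, dict):
        raise ValueError("the sample must be a JSON object")
    return infer_avro_type(sample, record_name)


# Hypothetical sample data, used only for illustration.
sample = '{"device": "sensor-1", "temperature": 21.5, "online": true, "tags": ["a", "b"]}'
schema = schema_from_json_sample(sample)
print(json.dumps(schema, indent=2))
```

A schema generated this way reflects only the single sample it was derived from, so fields that can be absent or null in real data would need to be adjusted by hand.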

A source data schema can be created from any of the following three entrances:

  • Enable Schema when creating a stream. For details, see Figure 1.
  • Keep Schema disabled when creating a stream. After the stream is created, choose Stream Management in the navigation tree and click the created stream. Click Create Source Data Schema next to Source Data Type. For details, see Figure 2.
  • Keep Schema disabled when creating a stream. After the stream is created, choose Stream Management in the navigation tree and click the created stream. On the Dump Tasks tab page, click Create Dump Task. On the displayed page, create a source data schema. For details, see Figure 3.
Figure 1 Entrance 1 for creating a schema
Figure 2 Entrance 2 for creating a schema
Figure 3 Entrance 3 for creating a schema

Creating a Schema for Source Data by Importing Files

Use the following method to create a source data schema:

  1. When configuring Source Data Schema, click Import File.
  2. In the left text box, enter a JSON or CSV source data sample, or click the import icon to import a sample from a file.

    Only .txt, .json, .csv, and .java files can be imported as source data samples.

  3. In the left text box, click the icon for generating an Avro schema. The schema generated from the source data sample is displayed in the right text box.

  4. To modify the Avro schema, click the edit icon in the right text box.

  5. Click Format to format the parsed data.

  6. To delete the source data sample, click the delete icon.
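
The import steps above can be approximated offline with a short Python sketch: check that a file has one of the importable extensions, derive an Avro schema from a single CSV sample row, and pretty-print the result in the spirit of the Format button. This is a sketch under stated assumptions, not the console's actual logic: the type-guessing rules and the generated field names (field_0, field_1, ...) are hypothetical.

```python
import csv
import io
import json
from pathlib import Path

# File extensions that the procedure above lists as importable.
ALLOWED_SUFFIXES = {".txt", ".json", ".csv", ".java"}


def is_importable(filename):
    """Return True if the file extension is one of the importable types."""
    return Path(filename).suffix.lower() in ALLOWED_SUFFIXES


def guess_type(text):
    """Guess an Avro primitive type for one CSV cell (illustrative rules)."""
    for avro_type, cast in (("long", int), ("double", float)):
        try:
            cast(text)
            return avro_type
        except ValueError:
            pass
    return "string"


def schema_from_csv_sample(sample_text, record_name="root"):
    """Build an Avro record schema from the first row of a CSV sample."""
    try:
        row = next(csv.reader(io.StringIO(sample_text)))
    except StopIteration:
        raise ValueError("the CSV sample is empty") from None
    fields = [
        # Hypothetical naming: CSV samples carry no column names here.
        {"name": f"field_{index}", "type": guess_type(cell.strip())}
        for index, cell in enumerate(row)
    ]
    return {"type": "record", "name": record_name, "fields": fields}


csv_schema = schema_from_csv_sample("1,2.5,abc")
formatted = json.dumps(csv_schema, indent=2)  # readable, indented form
print(formatted)
```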

Creating a Schema for Source Data by Creating a Schema Tree

Use the following method to create a source data schema:

  1. When configuring Source Data Schema, click Create Schema Tree.
  2. After configuring an attribute name and data type, click Add to add a root node, as shown in Figure 4.

    Figure 4 Adding the root node

  3. Select the created root node and configure an attribute name and data type in the same way to add subnodes.

    Figure 5 Creating a subnode
    • To delete a node, select the check box of the node and click Delete.
    • To edit the attributes of a node, select the check box of the node and click Edit.
    • To delete all nodes, click Reset.

  4. Click Submit.
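
The schema tree built in the steps above can be pictured as a small data structure: a root node with an attribute name and data type, plus subnodes added under it. The Python sketch below models such a tree and renders it as an Avro-style record. The SchemaNode class, its methods, and the assumption that a node of type "record" holds subnodes are illustrative and not taken from the DIS console.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaNode:
    """One node of a schema tree: an attribute name, a data type, and subnodes."""
    name: str
    data_type: str
    children: List["SchemaNode"] = field(default_factory=list)

    def add(self, name, data_type):
        """Add a subnode and return it (comparable to clicking Add)."""
        child = SchemaNode(name, data_type)
        self.children.append(child)
        return child

    def delete(self, name):
        """Remove the direct subnode with the given name (comparable to Delete)."""
        self.children = [c for c in self.children if c.name != name]

    def reset(self):
        """Remove all subnodes (comparable to Reset)."""
        self.children.clear()

    def to_avro(self):
        """Render the tree as an Avro-style schema (assumed mapping)."""
        if self.data_type != "record":
            return self.data_type
        return {
            "type": "record",
            "name": self.name,
            "fields": [{"name": c.name, "type": c.to_avro()} for c in self.children],
        }


# Hypothetical attribute names, used only for illustration.
root = SchemaNode("root", "record")
root.add("device", "string")
location = root.add("location", "record")
location.add("latitude", "double")
location.add("longitude", "double")
print(root.to_avro())
```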

Modifying a Source Data Schema

Do not modify the source data schema of a stream if the stream has dump tasks.

  1. Log in to the DIS console with your account.
  2. In the upper left corner of the page, click the region icon and select a region.
  3. In the navigation tree on the left, choose Stream Management.

    1. Click a stream name to access its details page.
    2. Click View Existing Source Data Schema next to Source Data Type.
    3. In the Source Data Schema text box, click the edit icon to modify the source data schema.
      Figure 6 Modifying the source data schema

      If a stream has dump tasks, modifying the source data schema of the stream will cause some data to fail to be dumped.

    4. After the modification is complete, click Submit. To discard the changes, click Cancel.
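
Because modifying the schema of a stream that has dump tasks can cause data to fail to be dumped, it may help to compare the existing and modified schemas before clicking Submit. The following Python sketch lists the top-level fields that were removed, added, or changed in type. It is only a review aid under stated assumptions: this document does not specify which changes DIS treats as incompatible, and the function names and sample schemas are hypothetical.

```python
def field_types(schema):
    """Map each top-level field name of a record schema to its type."""
    return {f["name"]: f["type"] for f in schema.get("fields", [])}


def diff_schemas(old, new):
    """Summarize top-level differences between two record schemas."""
    old_fields, new_fields = field_types(old), field_types(new)
    shared = old_fields.keys() & new_fields.keys()
    return {
        "removed": sorted(old_fields.keys() - new_fields.keys()),
        "added": sorted(new_fields.keys() - old_fields.keys()),
        "retyped": sorted(n for n in shared if old_fields[n] != new_fields[n]),
    }


# Hypothetical schemas, used only for illustration.
old_schema = {
    "type": "record",
    "name": "root",
    "fields": [
        {"name": "device", "type": "string"},
        {"name": "temperature", "type": "double"},
    ],
}
new_schema = {
    "type": "record",
    "name": "root",
    "fields": [
        {"name": "device", "type": "string"},
        {"name": "temperature", "type": "string"},
        {"name": "humidity", "type": "double"},
    ],
}
changes = diff_schemas(old_schema, new_schema)
print(changes)
```

An empty result in all three lists means the top-level fields are unchanged; nested records would need a recursive comparison, which this sketch omits.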