Importing Stream Data from DIS to DWS

Updated at: Mar 13, 2020 GMT+08:00

You can use Data Ingestion Service (DIS) to import stream data into a database in a Data Warehouse Service (DWS) cluster in real time. Data stored in DIS streams is periodically imported to DWS. Before the data reaches DWS, it is staged on OBS as temporary data, which is deleted after the dump to DWS completes.

Copying data from DIS to DWS involves three phases:

  1. Creating a Data Warehouse Cluster, Database, and Table
  2. Creating a DIS Stream and Accessing Real-Time Data
  3. Viewing Data Imported from DIS

Creating a Data Warehouse Cluster, Database, and Table

  1. Create a data warehouse cluster.

    For details, see Creating a Cluster in the Data Warehouse Service Management Guide.

    If a data warehouse cluster is already available, skip this step.

    For example, create a data warehouse cluster named dws-demo.

  2. Use an SQL client to access the data warehouse cluster.

    For details, see Methods of Connecting to a Cluster in the Data Warehouse Service Management Guide.

    You can choose any method to connect to a cluster.

  3. On the SQL client, run SQL statements to create a database, a schema, and a DWS table.

    For details, see Before You Start in the Data Warehouse Service Database Developer Guide.

    • Create a database account whose username is joe and password is Bigdata@123.
    • Set the database name to db_tpcds.
    • Set the schema to myschema.

      If you do not specify a schema, the public schema is used by default.

    • Create a DWS table named mytable.

      When creating the DWS table, design its structure based on the source data. The columns and column types in the DWS table must match the fields and field types of the source data.
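The bullets above could be carried out with statements along the following lines. This is a minimal sketch rather than the exact statements from the Database Developer Guide: the names joe, db_tpcds, myschema, and mytable come from this section, but the three columns are hypothetical placeholders, and the GRANT statements are one assumed way to give joe the read and write permissions that the dump task needs later.

```sql
-- Run as an administrator while connected to the default database.
CREATE USER joe WITH PASSWORD 'Bigdata@123';
CREATE DATABASE db_tpcds;

-- Reconnect to db_tpcds before running the remaining statements
-- (for example, with \c db_tpcds in a psql-style client).

CREATE SCHEMA myschema;

-- Hypothetical columns: replace them with columns that match
-- the fields and field types of your source data.
CREATE TABLE myschema.mytable
(
    device_id   VARCHAR(32),
    event_time  TIMESTAMP,
    reading     NUMERIC(10, 2)
);

-- Assumed permission setup: let joe reach the schema and
-- read from and write to the target table.
GRANT USAGE ON SCHEMA myschema TO joe;
GRANT SELECT, INSERT ON myschema.mytable TO joe;
```

With this hypothetical table and a comma delimiter, a matching source record would look like dev-001,2020-03-13 10:00:00,23.50.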

Creating a DIS Stream and Accessing Real-Time Data

  1. Log in to the DIS management console and create a DIS stream.

    For the detailed procedure, see Step 1: Creating a DIS Stream in the Data Ingestion Service User Guide.

    A DIS stream used to import data to DWS must meet the following requirements:

    • Region: Select the region where the data warehouse cluster resides.
    • Source Data Type: Only CSV is supported.

  2. On the DIS console, add a dump task for the newly created stream. Set Dump Destination to DWS to dump stream data to DWS.

    For the detailed procedure, see Managing Dump Tasks in the Data Ingestion Service User Guide.

    When adding a dump task, set the DWS parameters to the values you used in Creating a Data Warehouse Cluster, Database, and Table. The parameters are as follows:

    • Dump Destination: Select DWS. Data in the stream is stored in DIS and periodically imported to DWS. Before the data reaches DWS, it is staged on OBS as temporary data, which is deleted after the dump to DWS completes.
    • DWS Cluster: Name of the data warehouse cluster to which the data will be dumped, for example, dws-demo.
    • DWS Database: Name of the DWS database in which the dumped data will be stored, for example, db_tpcds.
    • Database Schema: Schema of the DWS database, for example, myschema.
    • DWS Table: DWS table in the specified database schema, for example, mytable.
    • Delimiter: Character that separates the fields of each row when data is dumped to the DWS table.
    • Username: Username used to access the target DWS database, for example, joe. The user must have read and write permissions on the table specified by DWS Table.
    • Password: Password of the user specified by Username.

  3. Set up the environment for creating a DIS application and send real-time data to DIS.

    For details, see Getting Started > Step 2: Preparing a DIS Application Development Environment and Getting Started > Step 3: Sending Data to DIS in the Data Ingestion Service User Guide.

Viewing Data Imported from DIS

  1. On the SQL client, connect to the database storing the imported data in the data warehouse cluster.

    For details, see Methods of Connecting to a Cluster in the Data Warehouse Service Management Guide.

    You can choose any method to connect to a cluster.

  2. Run the query command to view the data.

    The following is an example command. Replace table_name with the value of DWS Table that you set when adding the dump task.

    SELECT * FROM table_name;
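
If you followed the example names in this guide, the check might look like the sketch below. It assumes the table is myschema.mytable. Because DIS imports data periodically, the row count may lag behind what has been sent to the stream.

```sql
-- Count the rows that have been imported so far.
SELECT COUNT(*) FROM myschema.mytable;

-- Inspect a small sample instead of fetching the whole table.
SELECT * FROM myschema.mytable LIMIT 10;
```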
    
