Importing Stream Data from DIS to GaussDB(DWS)

Updated at: Dec 30, 2020 GMT+08:00

You can use Data Ingestion Service (DIS) to import stream data to the database of a data warehouse cluster in real time. Stream data stored in DIS streams is periodically imported to GaussDB(DWS). Before the data is imported, it is stored on OBS as temporary data. The temporary data is deleted after it has been dumped to GaussDB(DWS).

Copying data from DIS to GaussDB(DWS) involves three phases:

  1. Creating a Data Warehouse Cluster, Database, and Table
  2. Creating a DIS Stream and Accessing Real-Time Data
  3. Viewing Data Imported from DIS

Creating a Data Warehouse Cluster, Database, and Table

  1. Create a data warehouse cluster.

    For details, see Creating Clusters.

    If a cluster is already available, skip this step.

    For example, create a data warehouse cluster named dws-demo.

  2. Use an SQL client to access the data warehouse cluster.

    For details, see Methods of Connecting to a Cluster.

    You can choose any method to connect to a cluster.

  3. On the SQL client, run the SQL statements to create a database and a GaussDB(DWS) table, and configure the schema.

    For details, see Before You Start.

    • Create a database account whose username is joe and password is Bigdata@123.
    • Set the database name to db_tpcds.
    • Set the schema to myschema.

      If you do not specify a schema, the public schema is used by default.

    • Create a GaussDB(DWS) table named mytable.

      When creating the GaussDB(DWS) table, design its structure based on the source data. The fields and field types in the GaussDB(DWS) table must match those of the source data.
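
    The settings listed above could be applied with statements along the following lines. This is a minimal sketch, not the definitive procedure: the column names and types are hypothetical and only illustrate a table that mirrors a three-field CSV source, and the statements assume you are connected as a user who is allowed to create users, databases, and schemas.

    ```sql
    -- Run as an administrator while connected to the default database.
    CREATE USER joe WITH PASSWORD 'Bigdata@123';
    CREATE DATABASE db_tpcds;

    -- Reconnect to db_tpcds before running the remaining statements.
    CREATE SCHEMA myschema;

    -- Hypothetical columns: replace them with the fields of your CSV source
    -- data, in the same order and with compatible types.
    CREATE TABLE myschema.mytable
    (
        id         INTEGER,
        event_time TIMESTAMP,
        message    VARCHAR(256)
    );
    ```

    Qualifying the table name with myschema avoids creating the table in the public schema by accident.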

Creating a DIS Stream and Accessing Real-Time Data

  1. Log in to the DIS management console and create a DIS stream.

    For details, see Creating a DIS Stream.

    The DIS streams for importing data from DIS to GaussDB(DWS) have the following requirements:

    • Region: Select the region where the data warehouse cluster resides.
    • Source Data Type: Only CSV is supported.

  2. On the DIS console, add a dump task for the newly created stream. Set Dump Destination to GaussDB(DWS) to dump stream data to GaussDB(DWS).

    For details, see Creating a Dump Task.

    When adding a dump task, set GaussDB(DWS) parameters according to Creating a Data Warehouse Cluster, Database, and Table. The parameters are as follows:

    • Dump Destination: Select GaussDB(DWS). Stream data is stored in DIS, and periodically imported to GaussDB(DWS). Before data is imported to GaussDB(DWS), it is stored on OBS as temporary data and will be deleted after it is dumped to GaussDB(DWS).
    • GaussDB(DWS) Cluster: Name of the data warehouse cluster to which the data will be dumped, for example, dws-demo.
    • GaussDB(DWS) Database: Name of the GaussDB(DWS) database in which the dumped data will be stored, for example, db_tpcds.
    • Database Schema: Schema of the GaussDB(DWS) database, for example, myschema.
    • GaussDB(DWS) Table: Table in the specified database schema to which the data will be dumped, for example, mytable.
    • Delimiter: Delimiter that separates the fields in each row of data dumped to the GaussDB(DWS) table.
    • Username: Username of the target GaussDB(DWS) database user for data dumping, for example, joe. The user must have read and write permissions on the table specified by GaussDB(DWS) Table.
    • Password: Password of the user specified by Username.
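
    If the dump user does not own the target table, the required permissions might be granted as in the following sketch. It assumes that "read and write permissions" corresponds to the SELECT and INSERT privileges plus USAGE on the schema. This mapping is an assumption, not something the dump task documentation spells out here, so adjust the privilege list if the dump task reports a permission error.

    ```sql
    -- Run as an administrator (or the table owner) while connected to db_tpcds.
    -- Assumption: read/write access for the dump user means SELECT and INSERT.
    GRANT USAGE ON SCHEMA myschema TO joe;
    GRANT SELECT, INSERT ON myschema.mytable TO joe;
    ```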

  3. Set up the environment for creating a DIS application and send real-time data to DIS.

    For details, see Getting Started > Step 2: Preparing a DIS Application Development Environment and Getting Started > Step 3: Sending Data to DIS in the Data Ingestion Service User Guide.

Viewing Data Imported from DIS

  1. On the SQL client, connect to the database storing the imported data in the data warehouse cluster.

    For details, see Methods of Connecting to a Cluster.

    You can choose any method to connect to a cluster.

  2. Run the query command to view the data.

    The following is an example command. Replace table_name with the value of GaussDB(DWS) Table that you set when adding the dump task.

    SELECT * FROM table_name;
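
    With the example names used in this section (schema myschema and table mytable), a quick check might look like the following sketch. The LIMIT value is arbitrary and only keeps the preview short.

    ```sql
    -- Count the rows that have been dumped so far.
    SELECT COUNT(*) FROM myschema.mytable;

    -- Preview a few of the imported rows.
    SELECT * FROM myschema.mytable LIMIT 10;
    ```

    Because stream data is imported periodically, the row count should increase between runs while the application keeps sending data to DIS.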
    
