Updated on 2024-05-21 GMT+08:00

Synchronizing Data

Scenarios

To analyze data in some databases of your GaussDB(for MySQL) instance, you can synchronize data from your GaussDB(for MySQL) instance to an HTAP instance for analysis.

Prerequisites

  • The parameters in your GaussDB(for MySQL) instance have been configured based on Table 1.
    Table 1 Parameter description

    Parameter

    Value

    How to Modify

    default_authentication_plugin

    mysql_native_password

    Modifying a Parameter Template

    binlog_expire_logs_seconds

    86400

    NOTE:

    It is recommended that the binlog retention period be greater than one day (= 86,400s = 60 (seconds) x 60 (minutes) x 24 (hours). This prevents incremental replication failures caused by a short binlog retention period.

    Modifying a Parameter Template

    log_bin

    NOTE:

    To use this parameter, ensure that the GaussDB(for MySQL) kernel version is earlier than 2.0.45.230900.

    ON

    How Do I Enable and View Binlog of My GaussDB(for MySQL) Instance?

    rds_global_sql_log_bin

    NOTE:

    To use this parameter, ensure that the GaussDB(for MySQL) kernel version is 2.0.45.230900 or later.

    ON

    How Do I Enable and View Binlog of My GaussDB(for MySQL) Instance?

    binlog_format

    ROW

    Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.

    binlog_row_image

    FULL

    Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.

    log_bin_use_v1_row_events

    OFF

    Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.

  • Databases and tables have been created for your GaussDB(for MySQL) instance.

Creating a Synchronization Task

  1. Log in to the management console.
  2. Click in the upper left corner and select a region and a project.
  3. Click in the upper left corner of the page, choose Databases > GaussDB(for MySQL).
  4. On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
  5. In the navigation pane on the left, choose HTAP Analysis.
  6. Click the name of an HTAP instance to go to the Basic Information page.
  7. In the navigation pane on the left, choose Data Synchronization. On the displayed page, click Create Synchronization Task.

    Before the synchronization, check the storage and connection status of the HTAP instance by referring to Viewing Metrics.

  8. Configure required parameters.

    Figure 1 Creating a synchronization task

    • Currently, the databases whose name is Chinese cannot be synchronized.
    • Synchronizing Data from a Read Replica: Select Yes: You need to select a read replica. Full data is synchronized from the selected read replica, preventing query pressure on the primary node during full synchronization. If there is only one read replica, this node is selected by default. During a full synchronization, ensure that the read replica is available, or you need to perform the synchronization again after the full synchronization fails.
    • Database to be Synchronized: Select a database that the data will be synchronized to from the drop-down list. You can modify the database parameters as required.
      Table 2 Parameter description

      Parameter

      Default Value

      Allowed Value

      Description

      Suggestion

      default_isolation_level

      QUERY_SNAPSHOT

      • READ_COMMITTED
      • READ_UNCOMMITTED
      • QUERY_RAW
      • QUERY_SNAPSHOT

      Controls the default isolation level. This parameter is suitable only for the MaterializeMySQL engine.

      If the synchronized data will be modified later, set the parameter to QUERY_SNAPSHOT to accelerate queries, but generating snapshots will increase the data synchronization latency. If the synchronized data will not be modified later, set the parameter to READ_COMMITTED to reduce the data synchronization latency.

      enable_snapshots

      true

      • true
      • false

      Disables or enables snapshots. This parameter can be set to true only when mvcc_isolation_level is set to query_snapshot.

      Set this parameter to true only when default_isolation_level is set to QUERY_SNAPSHOT.

      enable_chunk_level

      true

      • true
      • false

      Enables or disables chunk-level parallel replication.

      If you want to quickly complete full synchronization, enable this function. However, during the synchronization, the read pressure of read replicas increases. If there is no time requirement on full synchronization, disable this function.

      max_sync_threads

      1

      1 to vCPUs/2

      Controls the maximum number of threads for full replication.

      If you want to quickly complete full synchronization and HTAP instances deliver good performance, configure multiple threads. However, increasing this value will increase the read pressure of read replicas and affect the performance of HTAP instance. If there are no time requirements on the full synchronization, do not change this parameter value.

      max_rows_in_buffer

      2000000

      1 to 10000000

      Controls the maximum number of rows of data that is allowed to cache in memory (for a single table and the cache data unable to query) When the number of rows exceeds the parameter value, the data will be materialized.

      -

      max_rows_in_buffers

      6000000

      1 to 10000000

      Controls the maximum number of rows of data that is allowed to cache in memory (for databases and the cache data unable to query) When the number of rows exceeds the parameter value, the data will be materialized.

      -

      max_bytes_in_buffer

      200000000

      1~1000000000

      Controls the maximum number of bytes of data that is allowed to cache in memory (for a single table and the cache data unable to query) When the number of bytes exceeds the parameter value, the data will be materialized.

      -

      max_bytes_in_buffers

      600000000

      1~1000000000

      Controls the maximum number of bytes of data that is allowed to cache in memory (for databases and the cache data unable to query) When the number of bytes exceeds the parameter value, the data will be materialized.

      -

      max_flush_data_time

      5000

      1~10000

      Controls how long that data is cached in the memory, in milliseconds.

      If you want to shorten the data synchronization delay, set this parameter to a smaller value (at least 1000 ms). If the value is too small, the table may be fragmented, affecting the query performance. This parameter, max_rows_in_buffer, max_rows_in_buffers, max_bytes_in_buffer and max_bytes_in_buffers determine the time for flushing data to disks. If either of the conditions is met, the data is flushed to the disk.

      treat_numeric_string_as_integer

      false

      • true
      • false

      Whether numeric strings are treated as integers during block-level task generation during full synchronization.

      -

      min_binlog_expire_logs_seconds

      86400

      0~2592000

      Controls the minimum duration for storing binlogs. If the duration for storing binlogs is less than the value of this parameter, the synchronization will fail. 0 indicates that the duration is not limited.

      -

    • Synchronization Scope: Select All Tables or Some Tables.
    • Blacklist and Whitelist: If Synchronization Scope is set to Some Tables, you need to configure tables for the blacklist or whitelist.
      The tables in the whitelist or blacklist can be added or deleted. For details, see Adding or Deleting Tables in the Blacklist or Whitelist.
      • The blacklist or whitelist cannot be empty.

      • You need to configure table for either the whitelist or blacklist as required. If you select the whitelist, only the tables in the whitelist are synchronized. The tables in the blacklist cannot be synchronized.
      • The tables to be synchronized must contain primary keys or a non-empty unique key. Otherwise, the tables cannot be synchronized to the HTAP instance.
      • Extra disk space may be used during backend data combination and query. You are advised to reserve 50% free disk space for the system.
      • To quickly find the desired tables to be added, you can enter multiple tables in the search box and click . Use commas (,), spaces (), and line breaks (\n) to separate multiple tables. The tables that you entered are selected by default in the Available Tables list and displayed in the Selected Tables list.

    • Table Synchronization: Enable or disable table synchronization as required.
      • If you select Enabled:
        1. Select a table on the left and perform column operations, such as ORDER BY, PARTITION BY, SAMPLE BY, PRIMARY KEY, and TTL. For details about the syntax, see Syntax.
          Figure 2 Table synchronization

          Example:

          • In ORDER BY(COL1, COL2), COL1, and COL2 cannot be NULL.
          • PARTITION BY toYYYYMM(data_time);
          • By default, PRIMARY KEY is the same as ORDER BY KEY and does not need to be specified.
          • If SAMPLE BY is specified, ORDER BY must contain the content of SAMPLE BY, for example, SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID));
          • Do not set the TTL to a small value (at least one day), for example, TTL time + INTERVAL 1 DAY

          When configuring ORDER BY KEY, you need to ensure that the data is consistent after synchronization.

          The value of PARTITION BY KEY cannot be too large, or the performance is affected.

        2. After entering statements in Column Operation column, click Verify on the right of this area.
      • If you select Disabled, go to 9.

  9. After the settings are complete, click Create Synchronization Task.
  10. On the details page, confirm the settings and click Sync Now. If you need to modify the information, click Previous.

    If you click Previous on the page or click in the upper left corner of the page to return to the data synchronization page, a task to be synchronized will be generated. The status of the task is Synchronization Stage: Waiting for synchronization. To start the task, click Synchronize in the Operation column. To modify the task, see Editing a Data Synchronization Task.

  11. Click Back to Synchronization List to return to the data synchronization page. View details about the source database, destination database, status, and operations.

    Figure 3 Viewing task status

    If the status of a task is Synchronization Stage: Incremental synchronization in progress, the data synchronization is complete.

Viewing a Data Synchronization Task

  1. Log in to the management console.
  2. Click in the upper left corner and select a region and a project.
  3. Click in the upper left corner of the page, choose Databases > GaussDB(for MySQL).
  4. On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
  5. In the navigation pane on the left, choose HTAP Analysis.
  6. Click the name of an HTAP instance to go to the Basic Information page.
  7. In the navigation pane on the left, choose Data Synchronization.
  8. Locate a task and click View in the Operation column.

    Figure 4 Viewing a data synchronization task

  9. View the detailed information about the task.

Editing a Data Synchronization Task

When the status of a task is Synchronization Stage: Waiting for synchronization, you can edit the task.

  1. Log in to the management console.
  2. Click in the upper left corner and select a region and a project.
  3. Click in the upper left corner of the page, choose Databases > GaussDB(for MySQL).
  4. On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
  5. In the navigation pane on the left, choose HTAP Analysis.
  6. Click the name of an HTAP instance to go to the Basic Information page.
  7. In the navigation pane on the left, choose Data Synchronization.
  8. Locate a task and click Edit in the Operation column.

    Figure 5 Editing a data synchronization task

  9. Configure required parameters.
  10. After the settings are complete, click Edit Synchronization Task.
  11. Confirm the settings and click Sync Now.
  12. Click Back to Synchronization List to return to the data synchronization page. View details about the source database, destination database, status, and operations.

    Figure 6 Viewing task status

    If the status of a task is Synchronization Stage: Incremental synchronization in progress, the data synchronization is complete.

Deleting a Data Synchronization Task

  1. Log in to the management console.
  2. Click in the upper left corner and select a region and a project.
  3. Click in the upper left corner of the page, choose Databases > GaussDB(for MySQL).
  4. On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
  5. In the navigation pane on the left, choose HTAP Analysis.
  6. Click the name of an HTAP instance to go to the Basic Information page.
  7. In the navigation pane on the left, choose Data Synchronization.
  8. Locate a task and click Delete in the Operation column.

    Figure 7 Deleting a data synchronization task

  9. In the displayed dialog box, click Yes.