Synchronizing Data
Scenarios
To analyze data in some databases of your GaussDB(for MySQL) instance, you can synchronize data from your GaussDB(for MySQL) instance to an HTAP instance for analysis.
Prerequisites
- The parameters in your GaussDB(for MySQL) instance have been configured based on Table 1.
Table 1 Parameter description Parameter
Value
How to Modify
default_authentication_plugin
mysql_native_password
binlog_expire_logs_seconds
86400
NOTE:It is recommended that the binlog retention period be greater than one day (= 86,400s = 60 (seconds) x 60 (minutes) x 24 (hours). This prevents incremental replication failures caused by a short binlog retention period.
log_bin
NOTE:To use this parameter, ensure that the GaussDB(for MySQL) kernel version is earlier than 2.0.45.230900.
ON
How Do I Enable and View Binlog of My GaussDB(for MySQL) Instance?
rds_global_sql_log_bin
NOTE:To use this parameter, ensure that the GaussDB(for MySQL) kernel version is 2.0.45.230900 or later.
ON
How Do I Enable and View Binlog of My GaussDB(for MySQL) Instance?
binlog_format
ROW
Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.
binlog_row_image
FULL
Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.
log_bin_use_v1_row_events
OFF
Run the SHOW VARIABLES; command to check the parameter value. If you need to change the parameter value, contact the customer service personnel.
- Databases and tables have been created for your GaussDB(for MySQL) instance.
Creating a Synchronization Task
- Log in to the management console.
- Click
in the upper left corner and select a region and a project.
- Click
in the upper left corner of the page, choose .
- On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
- In the navigation pane on the left, choose HTAP Analysis.
- Click the name of an HTAP instance to go to the Basic Information page.
- In the navigation pane on the left, choose Data Synchronization. On the displayed page, click Create Synchronization Task.
Before the synchronization, check the storage and connection status of the HTAP instance by referring to Viewing Metrics.
- Configure required parameters.
Figure 1 Creating a synchronization task
- Currently, the databases whose name is Chinese cannot be synchronized.
- Synchronizing Data from a Read Replica: Select Yes: You need to select a read replica. Full data is synchronized from the selected read replica, preventing query pressure on the primary node during full synchronization. If there is only one read replica, this node is selected by default. During a full synchronization, ensure that the read replica is available, or you need to perform the synchronization again after the full synchronization fails.
- Database to be Synchronized: Select a database that the data will be synchronized to from the drop-down list. You can modify the database parameters as required.
Table 2 Parameter description Parameter
Default Value
Allowed Value
Description
Suggestion
default_isolation_level
QUERY_SNAPSHOT
- READ_COMMITTED
- READ_UNCOMMITTED
- QUERY_RAW
- QUERY_SNAPSHOT
Controls the default isolation level. This parameter is suitable only for the MaterializeMySQL engine.
If the synchronized data will be modified later, set the parameter to QUERY_SNAPSHOT to accelerate queries, but generating snapshots will increase the data synchronization latency. If the synchronized data will not be modified later, set the parameter to READ_COMMITTED to reduce the data synchronization latency.
enable_snapshots
true
- true
- false
Disables or enables snapshots. This parameter can be set to true only when mvcc_isolation_level is set to query_snapshot.
Set this parameter to true only when default_isolation_level is set to QUERY_SNAPSHOT.
enable_chunk_level
true
- true
- false
Enables or disables chunk-level parallel replication.
If you want to quickly complete full synchronization, enable this function. However, during the synchronization, the read pressure of read replicas increases. If there is no time requirement on full synchronization, disable this function.
max_sync_threads
1
1 to vCPUs/2
Controls the maximum number of threads for full replication.
If you want to quickly complete full synchronization and HTAP instances deliver good performance, configure multiple threads. However, increasing this value will increase the read pressure of read replicas and affect the performance of HTAP instance. If there are no time requirements on the full synchronization, do not change this parameter value.
max_rows_in_buffer
2000000
1 to 10000000
Controls the maximum number of rows of data that is allowed to cache in memory (for a single table and the cache data unable to query) When the number of rows exceeds the parameter value, the data will be materialized.
-
max_rows_in_buffers
6000000
1 to 10000000
Controls the maximum number of rows of data that is allowed to cache in memory (for databases and the cache data unable to query) When the number of rows exceeds the parameter value, the data will be materialized.
-
max_bytes_in_buffer
200000000
1~1000000000
Controls the maximum number of bytes of data that is allowed to cache in memory (for a single table and the cache data unable to query) When the number of bytes exceeds the parameter value, the data will be materialized.
-
max_bytes_in_buffers
600000000
1~1000000000
Controls the maximum number of bytes of data that is allowed to cache in memory (for databases and the cache data unable to query) When the number of bytes exceeds the parameter value, the data will be materialized.
-
max_flush_data_time
5000
1~10000
Controls how long that data is cached in the memory, in milliseconds.
If you want to shorten the data synchronization delay, set this parameter to a smaller value (at least 1000 ms). If the value is too small, the table may be fragmented, affecting the query performance. This parameter, max_rows_in_buffer, max_rows_in_buffers, max_bytes_in_buffer and max_bytes_in_buffers determine the time for flushing data to disks. If either of the conditions is met, the data is flushed to the disk.
treat_numeric_string_as_integer
false
- true
- false
Whether numeric strings are treated as integers during block-level task generation during full synchronization.
-
min_binlog_expire_logs_seconds
86400
0~2592000
Controls the minimum duration for storing binlogs. If the duration for storing binlogs is less than the value of this parameter, the synchronization will fail. 0 indicates that the duration is not limited.
-
- Synchronization Scope: Select All Tables or Some Tables.
- Blacklist and Whitelist: If Synchronization Scope is set to Some Tables, you need to configure tables for the blacklist or whitelist.
The tables in the whitelist or blacklist can be added or deleted. For details, see Adding or Deleting Tables in the Blacklist or Whitelist.
- You need to configure table for either the whitelist or blacklist as required. If you select the whitelist, only the tables in the whitelist are synchronized. The tables in the blacklist cannot be synchronized.
- The tables to be synchronized must contain primary keys or a non-empty unique key. Otherwise, the tables cannot be synchronized to the HTAP instance.
- Extra disk space may be used during backend data combination and query. You are advised to reserve 50% free disk space for the system.
- To quickly find the desired tables to be added, you can enter multiple tables in the search box and click
. Use commas (,), spaces (), and line breaks (\n) to separate multiple tables. The tables that you entered are selected by default in the Available Tables list and displayed in the Selected Tables list.
- Table Synchronization: Enable or disable table synchronization as required.
- If you select Enabled:
- Select a table on the left and perform column operations, such as ORDER BY, PARTITION BY, SAMPLE BY, PRIMARY KEY, and TTL. For details about the syntax, see Syntax.
Figure 2 Table synchronization
Example:
- In ORDER BY(COL1, COL2), COL1, and COL2 cannot be NULL.
- PARTITION BY toYYYYMM(data_time);
- By default, PRIMARY KEY is the same as ORDER BY KEY and does not need to be specified.
- If SAMPLE BY is specified, ORDER BY must contain the content of SAMPLE BY, for example, SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID));
- Do not set the TTL to a small value (at least one day), for example, TTL time + INTERVAL 1 DAY
When configuring ORDER BY KEY, you need to ensure that the data is consistent after synchronization.
The value of PARTITION BY KEY cannot be too large, or the performance is affected.
- After entering statements in Column Operation column, click Verify on the right of this area.
- Select a table on the left and perform column operations, such as ORDER BY, PARTITION BY, SAMPLE BY, PRIMARY KEY, and TTL. For details about the syntax, see Syntax.
- If you select Disabled, go to 9.
- If you select Enabled:
- After the settings are complete, click Create Synchronization Task.
- On the details page, confirm the settings and click Sync Now. If you need to modify the information, click Previous.
If you click Previous on the page or click
in the upper left corner of the page to return to the data synchronization page, a task to be synchronized will be generated. The status of the task is Synchronization Stage: Waiting for synchronization. To start the task, click Synchronize in the Operation column. To modify the task, see Editing a Data Synchronization Task.
- Click Back to Synchronization List to return to the data synchronization page. View details about the source database, destination database, status, and operations.
Figure 3 Viewing task status
If the status of a task is Synchronization Stage: Incremental synchronization in progress, the data synchronization is complete.
Viewing a Data Synchronization Task
- Log in to the management console.
- Click
in the upper left corner and select a region and a project.
- Click
in the upper left corner of the page, choose .
- On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
- In the navigation pane on the left, choose HTAP Analysis.
- Click the name of an HTAP instance to go to the Basic Information page.
- In the navigation pane on the left, choose Data Synchronization.
- Locate a task and click View in the Operation column.
Figure 4 Viewing a data synchronization task
- View the detailed information about the task.
Editing a Data Synchronization Task
When the status of a task is Synchronization Stage: Waiting for synchronization, you can edit the task.
- Log in to the management console.
- Click
in the upper left corner and select a region and a project.
- Click
in the upper left corner of the page, choose .
- On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
- In the navigation pane on the left, choose HTAP Analysis.
- Click the name of an HTAP instance to go to the Basic Information page.
- In the navigation pane on the left, choose Data Synchronization.
- Locate a task and click Edit in the Operation column.
Figure 5 Editing a data synchronization task
- Configure required parameters.
- After the settings are complete, click Edit Synchronization Task.
- Confirm the settings and click Sync Now.
- Click Back to Synchronization List to return to the data synchronization page. View details about the source database, destination database, status, and operations.
Figure 6 Viewing task status
If the status of a task is Synchronization Stage: Incremental synchronization in progress, the data synchronization is complete.
Deleting a Data Synchronization Task
- Log in to the management console.
- Click
in the upper left corner and select a region and a project.
- Click
in the upper left corner of the page, choose .
- On the Instances page, locate a GaussDB(for MySQL) instance and click the instance name to go to the Basic Information page.
- In the navigation pane on the left, choose HTAP Analysis.
- Click the name of an HTAP instance to go to the Basic Information page.
- In the navigation pane on the left, choose Data Synchronization.
- Locate a task and click Delete in the Operation column.
Figure 7 Deleting a data synchronization task
- In the displayed dialog box, click Yes.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot