Updated on 2024-10-30 GMT+08:00

Migrating Data

InfluxDB Community Edition is a popular time series database that focuses on high-performance query and storage of time series data.

GeminiDB Influx is a cloud-native NoSQL time series database developed by Huawei. It uses a decoupled compute and storage architecture and is fully compatible with InfluxDB. This highly available database is secure and scalable, can be deployed, backed up, and restored quickly, and provides monitoring and alarm management. You can also scale storage and compute resources separately. GeminiDB Influx delivers better query, write, and data compression performance than InfluxDB Community Edition.

This section describes how to migrate data from InfluxDB Community Edition to GeminiDB Influx.

Migration Principles

The migration tool parses the tsm and wal files of InfluxDB Community Edition and writes the parsed data to a line protocol file. The line protocol file is then parsed and its data is migrated to the destination.

The migration process is divided into two phases: export and import.

  • In the export phase, the tsm and wal files of InfluxDB Community Edition are parsed concurrently and the parsed data is written to the line protocol file.
  • In the import phase, the line protocol file is read concurrently and the data is sent to each node in the GeminiDB Influx cluster.
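
For reference, each record in the intermediate file uses the InfluxDB line protocol format (measurement, tag set, field set, and an optional nanosecond timestamp); the measurement and values below are illustrative:

```
cpu,host=server01,region=cn-north usage_idle=90.5,usage_user=2.1 1616803200000000000
```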

The migration tool supports full migration and incremental migration, which can be configured in the configuration file.

Precautions

  • Deploy the migration tool on the same server as InfluxDB Community Edition, and prepare the configuration file.
  • The migration tool needs to extract data from tsm and wal to the local line protocol file, obtain data from the line protocol file, and send the data to the destination GeminiDB Influx database. This process may affect the performance of the source side. You are advised to run the migration tool during off-peak hours.
  • Reserve sufficient disk space, because the tsm and wal file data needs to be extracted to the line protocol file.
  • The migration tool supports only InfluxDB Community Edition 1.x.

Prerequisites

  • Ensure that the network connection between the source and destination is normal.
  • The corresponding database has been created and the retention policy (RP) has been configured in the destination GeminiDB Influx.
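
Since GeminiDB Influx is InfluxDB-compatible, the destination database and retention policy can be created from any InfluxDB 1.x client. A minimal sketch, assuming the influx 1.x CLI is installed; DB_HOST, DB_USER, and DB_PASS are placeholders, and mydb and rp_7d are example names:

```shell
# Sketch: create the destination database and retention policy before migration.
# DB_HOST, DB_USER, and DB_PASS are placeholders; mydb and rp_7d are example names.
DB_HOST=${DB_HOST:-127.0.0.1}; DB_USER=${DB_USER:-rwuser}; DB_PASS=${DB_PASS:-yourpassword}
CREATE_DB='CREATE DATABASE mydb'
CREATE_RP='CREATE RETENTION POLICY "rp_7d" ON "mydb" DURATION 7d REPLICATION 1 DEFAULT'
if command -v influx >/dev/null; then
  influx -host "$DB_HOST" -username "$DB_USER" -password "$DB_PASS" -execute "$CREATE_DB"
  influx -host "$DB_HOST" -username "$DB_USER" -password "$DB_PASS" -execute "$CREATE_RP"
else
  echo "influx CLI not found; run $CREATE_DB and $CREATE_RP from another InfluxDB 1.x client"
fi
```

Adjust DURATION and REPLICATION to match your retention requirements and node count.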

Exporting Data

To obtain the GeminiDB Influx migration tool cvtLocDataTool_all.tar, choose Service Tickets > Create Service Ticket in the upper right corner of the console.

  1. Prepare for data export.

    Run the migration tool to parse the prepared tsm and wal files and convert them into files in the line protocol format for import.

    Because the tsm and wal files store compressed data, the exported line protocol files are much larger than the source files. On the ECS used for the export, reserve disk space at least 30 times the total size of the source tsm and wal directories.
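
Since the required headroom scales with the source data size, it can be estimated by measuring the tsm and wal directories and multiplying by 30. A minimal sketch; the default paths below are assumptions and should point at the directories referenced by orgData:

```shell
# Estimate the disk space to reserve for the exported line protocol files.
# INFLUX_DATA and INFLUX_WAL are assumed defaults; point them at your directories.
INFLUX_DATA=${INFLUX_DATA:-/var/lib/influxdb/data}
INFLUX_WAL=${INFLUX_WAL:-/var/lib/influxdb/wal}
total_kb=$(du -sk "$INFLUX_DATA" "$INFLUX_WAL" 2>/dev/null | awk '{s+=$1} END {print s+0}')
needed_gb=$(( total_kb * 30 / 1024 / 1024 ))
echo "tsm+wal total: ${total_kb} KB; reserve at least ${needed_gb} GB"
```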

  2. Modify export configuration files.

    Create an export directory, decompress cvtLocDataTool_all.tar to the export directory, and modify the ./cvtLocDataTool/config/toolcfg.json file in the export directory. The file template content is as follows:

    {
        "orgData" : "./sample_data",
        "expBeginTime" : "2021-03-27T08:00:00+08:00",
        "expEndTime"   : "2021-03-27T20:00:00+08:00",
        "mutilProc"    : true,
        "Concurrent Number" : 12, 
        "openDebugLog" : false,
        "ignoreDBs" : "_internal|myfirstdb"
    }
    • orgData: directory that stores the source wal and tsm files. Organize the tsm and wal directories so that each tsm file is located at xxx/data/[database name]/[RP name]/[shard ID]/xxxx.tsm and each wal file at xxx/wal/[database name]/[RP name]/[shard ID]/xxxx.wal.
    • expBeginTime: start time (GMT+08:00) of the data to be exported. If this parameter is left blank, no start time is applied.
    • expEndTime: end time (GMT+08:00) of the data to be exported. If this parameter is left blank, no end time is applied.
    • mutilProc: whether to enable multi-process processing. Set this parameter to true.
    • Concurrent Number: number of concurrent tasks, which depends on the performance of the ECS running the tool. 8 is recommended for 16 vCPUs | 64 GB, and 12 for 32 vCPUs | 128 GB.
    • openDebugLog: whether to print debug logs. This parameter is used only for testing. Set it to false.
    • ignoreDBs: list of databases that do not need to be exported. Separate multiple databases with vertical bars (|). The default value is _internal.
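
Before running the export, it can help to confirm that the layout under orgData matches the expected structure. A sketch; DATA_ROOT is an assumption and should be set to the orgData value in toolcfg.json:

```shell
# List tsm/wal files at the expected depth:
# <root>/data/<database>/<rp>/<shard id>/*.tsm and
# <root>/wal/<database>/<rp>/<shard id>/*.wal
# DATA_ROOT is an assumption; set it to the orgData value in toolcfg.json.
DATA_ROOT=${DATA_ROOT:-./sample_data}
find "$DATA_ROOT/data" -mindepth 4 -maxdepth 4 -name '*.tsm' 2>/dev/null | head -5
find "$DATA_ROOT/wal"  -mindepth 4 -maxdepth 4 -name '*.wal' 2>/dev/null | head -5
```

If either command prints nothing, the files are not at the expected depth and the export may skip them.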

  3. Run the export script.

    After modifying the configuration file, run the export script:

    nohup python cvtAllData.py &

    Run the following command to check whether the task is complete:

    ps -ef|grep cvtAllData.py|grep -v grep

    After the script is executed, the exported file is stored in the cvtLocDataTool/rstData/Output/ directory.

Importing Data

  1. After the export task is complete, modify the import configuration file.

    Decompress importInflux.zip, go to the ./importInflux/import/ directory, and modify the config.json file. The file content is as follows:

    {
      "ImportDir":"/root/stefan/stefan-AKC/data/",
      "ProcessorsNum":6,   
      "ConnectDbPool":"xxx.xxx.xxx.xxx",
      "Ssl":false,
      "dropDatabases":"stefaninflux|stefaninflux1|prism"
    }
    • ImportDir: directory containing the data to be imported, that is, the output directory generated in the export step. Specify an absolute path.
    • ProcessorsNum: total number of concurrent tasks. A value ranging from 2 to 3 x the number of nodes is recommended, where the number of nodes is the number of GeminiDB Influx instance nodes. For example, if ConnectDbPool is set to the IP addresses of three nodes, the minimum number of concurrent tasks is 2 and the maximum is 9 (3 x 3).
    • ConnectDbPool: IP addresses of the connection pool. Enter the IP addresses of the GeminiDB Influx instance nodes, separated by vertical bars (|).
    • Ssl: whether SSL is enabled for the GeminiDB Influx instance. If SSL is enabled, set this parameter to true; otherwise, set it to false.
    • dropDatabases: list of databases to be deleted before data import. This parameter takes effect together with the deleteDb option. Separate multiple databases with vertical bars (|). If no databases need to be deleted, leave this parameter blank.
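
The recommended ProcessorsNum range can be derived from the node count in ConnectDbPool. A small sketch; the IP addresses below are placeholders:

```shell
# Count the nodes in the connection pool and print the suggested range.
# The IP addresses below are placeholders.
POOL="192.0.2.1|192.0.2.2|192.0.2.3"
nodes=$(awk -F'|' '{print NF}' <<< "$POOL")
echo "ProcessorsNum: between 2 and $(( 3 * nodes ))"
```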

  2. Run the import script.

    After modifying the import configuration file, go to the ./importInflux/import/ directory.

    cd ./importInflux/import/

    Run the following command to execute an import task:

    nohup ./import -host $host -username $username -password $password -deleteDb &
    • $host, $username, and $password indicate the instance IP address, database account, and password, respectively. If the password contains special characters, such as ! or @, insert an escape character (\) before each of them.
    • The deleteDb parameter is optional. Specify it only if you want to delete the databases listed in dropDatabases before importing.

  3. View import task logs.

    Import logs are recorded in the decompression directory ./importInflux/import. If an import exception occurs, collect the logs in this directory.

  4. Retry the import task.

    If you need to run the task again, delete the ./importInflux/import/data directory first. If you do not delete it, the import resumes from the last incomplete or failed task.

    rm -rf ./importInflux/import/data

  5. Check the import results.

    After the import is complete, verify the data integrity. You can sample and compare data from both the source and target ends to confirm their consistency.
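
One way to sample-compare is to run the same aggregate query on both sides and compare the results. A sketch, assuming the influx 1.x CLI is installed; SRC_HOST, DST_HOST, the database mydb, and the measurement cpu are placeholders for your environment:

```shell
# Run an identical COUNT query against source and destination and compare.
# SRC_HOST/DST_HOST, mydb, and cpu are placeholder values.
SRC_HOST=${SRC_HOST:-127.0.0.1}
DST_HOST=${DST_HOST:-127.0.0.1}
Q='SELECT COUNT(*) FROM "cpu" WHERE time >= now() - 1d'
src=$(influx -host "$SRC_HOST" -database mydb -format csv -execute "$Q" 2>/dev/null)
dst=$(influx -host "$DST_HOST" -database mydb -format csv -execute "$Q" 2>/dev/null)
if [ -n "$src" ] && [ "$src" = "$dst" ]; then
  echo "sampled counts match"
else
  echo "counts differ or query failed; investigate before cutover"
fi
```

Repeat the check for several measurements and time ranges before relying on it.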

Migration Performance Reference

  • Migration environment
    • Source: InfluxDB and the migration tool are deployed on an ECS with 4 vCPUs and 16 GB of memory.
    • Destination: a three-node GeminiDB Influx instance with 4 vCPUs and 16 GB of memory.
  • Migration performance
    • The data export rate of a single process on the source is 1 GB/min.
    • The single-thread import rate of the destination is 1 GB/min.