Updated on 2024-08-03 GMT+08:00

Performance of GeminiDB Cassandra and On-Premises Open Source Cassandra Clusters

This section compares the performance of a GeminiDB Cassandra cluster with that of an on-premises open-source Cassandra cluster, covering the test environment, test models, test procedure, and results.

Test Environment

  • Open-source Cassandra test environment
    Table 1 Test environment description

    Name: Open-source Cassandra cluster
    Version: 3.11.5
    Nodes: 3
    OS: CentOS 7.4
    ECS specifications:
      • General computing-plus 4 vCPUs | 16 GB
      • General computing-plus 8 vCPUs | 32 GB
      • General computing-plus 16 vCPUs | 64 GB
      • General computing-plus 32 vCPUs | 128 GB
  • GeminiDB Cassandra test environment
    Table 2 Test environment description

    Name: GeminiDB Cassandra cluster
    Region: CN-Hong Kong
    AZ: AZ 3
    Version: 3.11
    Nodes: 3
    Instance specifications:
      • 4 vCPUs | 16 GB
      • 8 vCPUs | 32 GB
      • 16 vCPUs | 64 GB
      • 32 vCPUs | 128 GB

Load Test Tool Environment

  • Load test tool specifications
    Table 3 Specifications description

    Name: Test client ECS
    vCPUs: 16
    Memory: 64 GB
    OS: CentOS 7.4

  • Load test tool information
    Table 4 Load test tool information

    Test tool: YCSB
    Version: 0.12.0
    Download address: https://github.com/brianfrankcooper/YCSB

    curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.12.0/ycsb-0.12.0.tar.gz

Test Models

Table 5 Test models

Service Model               Description
_read95_update5             95% read and 5% update
_update50_read50            50% update and 50% read
_read65_update25_insert10   65% read, 25% update, and 10% insert
_insert90_read10            90% insert and 10% read
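Each test model maps directly onto YCSB core-workload proportions. As a minimal sketch, the _read95_update5 model could be written to a properties file like this (the file name and record/operation counts are illustrative, not the values used in this test):

```shell
# Write a YCSB workload definition for the _read95_update5 model.
# The property names are standard YCSB CoreWorkload settings.
cat > workload_read95_update5 <<'EOF'
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=1000000
operationcount=1000000
readproportion=0.95
updateproportion=0.05
insertproportion=0
scanproportion=0
requestdistribution=zipfian
EOF

# Confirm the read/update mix written to the file
grep -E 'read|update' workload_read95_update5
```

The other models differ only in the three proportion lines, which must sum to 1.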

Test Procedure

Testing open-source Cassandra

  1. Purchase an ECS.

    1. Log in to the management console.
    2. Choose Computing > Elastic Cloud Server.
    3. Click Buy ECS in the upper right corner of the page and configure related parameters as follows:
      • Region: CN-Hong Kong
      • AZ: AZ3
      • Specifications: General computing-plus | c6.xlarge.4
      • Image: Public image and CentOS 7.6 64bit (40 GB)
      • Data Disk: Ultra-high I/O and 200 GB
      • Network: Select a VPC and subnet.
      • Other parameters: Set other parameters as needed. You can ignore optional parameters.
    4. Repeat the preceding steps to create five ECSs named Cassandra-1 (192.168.0.15), Cassandra-2 (192.168.0.240), Cassandra-3 (192.168.0.153), Cassandra-4 (192.168.0.175), and ycsb-Cassandra (192.168.0.60).

      ECSs Cassandra-1, Cassandra-2, and Cassandra-3 are for initializing Cassandra clusters. ECS Cassandra-4 is for capacity expansion. ECS ycsb-Cassandra serves as the load test server.

      Figure 1 ECS details
    5. After those ECSs are created, log in to them using the remote login option provided on the management console.
      Figure 2 Logging in to an ECS
    6. Install a Java runtime (Cassandra 3.11 requires Java 8):

      yum install -y java-1.8.0-openjdk

    7. Install the Cassandra service and create a data directory.
      1. Download the Cassandra installation package:

        wget https://archive.apache.org/dist/cassandra/3.11.5/apache-cassandra-3.11.5-bin.tar.gz

      2. Decompress the installation package:

        tar -zxvf apache-cassandra-3.11.5-bin.tar.gz -C /root/

      3. Move it to the installation directory:

        mv /root/apache-cassandra-3.11.5 /usr/local/Cassandra

      4. Configure environment variables (single quotes prevent $PATH from being expanded when the line is written):

        echo 'export PATH=/usr/local/Cassandra/bin:$PATH' >> /etc/profile

      5. Apply the variables:

        source /etc/profile

      6. Create a data directory:

        mkdir /data

      7. Verify the installation by connecting with cqlsh:

        cqlsh

        Figure 3 Successful installation

  2. Configure an open-source Cassandra cluster.

    1. Log in to ECSs Cassandra-1, Cassandra-2, and Cassandra-3.
    2. Go to the /usr/local/Cassandra/conf directory and modify the cassandra-topology.properties file as follows:
      • Comment out the content in the area marked by No.1 in Figure 4.
      • Add the content in the area marked by No.2 in Figure 4.
      Figure 4 Modifying the configuration file

      The cassandra-topology.properties configuration files of Cassandra-1, Cassandra-2, and Cassandra-3 must be identical.

    3. Modify the cassandra.yaml file as follows, setting listen_address and rpc_address to each node's own IP address:

      data_file_directories:
          - /data
      commitlog_directory: /usr/local/Cassandra/commitlog
      saved_caches_directory: /usr/local/Cassandra/saved_caches
      seed_provider:
          # Addresses of hosts that are deemed contact points.
          # Cassandra nodes use this list of hosts to find each other and learn
          # the topology of the ring.  You must change this if you are running
          # multiple nodes!
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                # seeds is actually a comma-delimited list of addresses.
                # Ex: "<ip1>,<ip2>,<ip3>"
                - seeds: "192.168.0.153,192.168.0.240,192.168.0.15"    # IP addresses of the three nodes in the cluster
      listen_address: 192.168.0.153    # this node's IP address
      rpc_address: 192.168.0.153       # this node's IP address
    4. Run the following command on Cassandra-1, Cassandra-2, and Cassandra-3 to start the Cassandra cluster:

      cassandra -R &
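Since only listen_address and rpc_address differ between nodes, the per-node edit to cassandra.yaml can be scripted. A sketch using sed (the two-line sample file below stands in for a real cassandra.yaml, and NODE_IP would be each node's own address):

```shell
# Stand-in for a freshly installed cassandra.yaml (defaults to localhost).
cat > cassandra.yaml <<'EOF'
listen_address: localhost
rpc_address: localhost
EOF

# Point both addresses at this node's IP; run once per node with its own IP.
NODE_IP=192.168.0.153
sed -i "s/^listen_address:.*/listen_address: ${NODE_IP}/" cassandra.yaml
sed -i "s/^rpc_address:.*/rpc_address: ${NODE_IP}/" cassandra.yaml

grep -E '^(listen|rpc)_address' cassandra.yaml
```

On the real nodes the file to edit would be /usr/local/Cassandra/conf/cassandra.yaml.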

  3. Add nodes to the open-source Cassandra cluster.

    1. Log in to Cassandra-4.
    2. Go to the /usr/local/Cassandra/conf directory and edit the cassandra-topology.properties file as follows:
      • Comment out the content in the area marked by No.1 in Figure 5.
      • Add the content in the area marked by No.2 in Figure 5.
        Figure 5 Editing the configuration file
    3. Modify the cassandra.yaml file as follows, setting listen_address and rpc_address to the new node's IP address:

      data_file_directories:
          - /data
      commitlog_directory: /usr/local/Cassandra/commitlog
      saved_caches_directory: /usr/local/Cassandra/saved_caches
      seed_provider:
          # Addresses of hosts that are deemed contact points.
          # Cassandra nodes use this list of hosts to find each other and learn
          # the topology of the ring.  You must change this if you are running
          # multiple nodes!
          - class_name: org.apache.cassandra.locator.SimpleSeedProvider
            parameters:
                # seeds is actually a comma-delimited list of addresses.
                # Ex: "<ip1>,<ip2>,<ip3>"
                - seeds: "192.168.0.153,192.168.0.240,192.168.0.15"    # IP addresses of the three seed nodes; must match the seeds configured on the existing nodes
      listen_address: 192.168.0.175    # this node's IP address
      rpc_address: 192.168.0.175       # this node's IP address
    4. Log in to Cassandra-1.
    5. Stop compaction on all nodes:

      nodetool disableautocompaction

    6. Stop the ongoing compaction task:

      nodetool stop COMPACTION

    7. Limit migration traffic of the node:

      nodetool setstreamthroughput 32

      This caps streaming throughput at 32 Mbit/s per node (the default is 200 Mbit/s) to reduce the impact of migration on services.

    8. Log in to Cassandra-4.
    9. Start the Cassandra service:

      cassandra -R &

    10. Log in to Cassandra-1.
    11. During the scaling, run the following command every 30 seconds:

      nodetool status

      If the status of Cassandra-4 is UJ (Up/Joining), data is being migrated. The migration is complete when the status changes to UN (Up/Normal).

      Figure 6 Node statuses
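The 30-second status check can be scripted. A sketch (wait_for_un is a hypothetical helper, not part of nodetool; it takes the status command as an argument so the example can be exercised below with a stub instead of a live cluster):

```shell
# Poll a `nodetool status`-style command until the given node reports UN
# (Up/Normal). UJ (Up/Joining) means data is still being migrated.
wait_for_un() {
  local cmd="$1" node_ip="$2" state=""
  while [ "$state" != "UN" ]; do
    [ -n "$state" ] && sleep 30        # re-check every 30 seconds
    state=$($cmd | awk -v ip="$node_ip" '$2 == ip {print $1}')
    echo "node $node_ip state: $state"
  done
}

# Stub standing in for `nodetool status` output (Cassandra-4 already UN):
fake_status() { echo "UN  192.168.0.175  105.3 GB  256  ..."; }
wait_for_un fake_status 192.168.0.175
```

With a live cluster this would be invoked as `wait_for_un "nodetool status" 192.168.0.175`.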

Testing GeminiDB Cassandra

  1. Purchase a GeminiDB Cassandra cluster.

    1. Log in to the management console.
    2. Choose Databases > GeminiDB.
    3. Click Buy DB Instance in the upper right corner of the page and set required parameters as follows:
      • Region: CN-Hong Kong
      • Compatible API: Cassandra
      • Specifications: 4 vCPUs | 16 GB
      • Storage Space: 200 GB
      • Nodes: Enter 3.
      • VPC: The same as that of the purchased ECS.
      • Security Group: The same as that of the purchased ECS.

  2. Add nodes to the GeminiDB Cassandra cluster.

    1. Log in to the management console.
    2. Choose Databases > GeminiDB.
    3. Select an existing GeminiDB Cassandra instance.
    4. Click the instance name to enter the Basic Information page.
    5. In the Node Information area on the Basic Information page, click Add Node.
      Figure 7 Node information

    6. On the displayed page, click + to the right of the Add Nodes field.
      Figure 8 Adding nodes

    7. Wait until the nodes are added.
    8. View the change of QPS during the scale-out process.
      Figure 9 QPS changes

      During the scale-out, the QPS of the GeminiDB Cassandra instance dips slightly for about 10 seconds, which has almost no effect on services. The whole scaling process takes about 10 minutes.

      After the scale-out is complete, you can analyze test data.

Test Results

  • Performance results
    Table 6 Performance data (average QPS)

    Open-source Cassandra cluster

    Node Class         Client Threads  Preset Data (GB)  _read95_update5  _update50_read50  _read65_update25_insert10  _insert90_read10
    4 vCPUs | 16 GB    32              50                2884             5068              8484                       10694
    8 vCPUs | 32 GB    64              100               2796             2904              5180                       7854
    16 vCPUs | 64 GB   128             200               5896             14776             14304                      15707
    32 vCPUs | 128 GB  256             400               8964             22284             19592                      22344

    GeminiDB Cassandra cluster

    Node Class         Client Threads  Preset Data (GB)  _read95_update5  _update50_read50  _read65_update25_insert10  _insert90_read10
    4 vCPUs | 16 GB    32              50                8439             10565             9468                       23830
    8 vCPUs | 32 GB    64              100               24090            24970             21716                      44548
    16 vCPUs | 64 GB   128             200               48985            51335             43557                      67290
    32 vCPUs | 128 GB  256             400               91280            85748             74313                      111540

    GeminiDB-to-open-source QPS ratio

    Node Class         Client Threads  Preset Data (GB)  _read95_update5  _update50_read50  _read65_update25_insert10  _insert90_read10
    4 vCPUs | 16 GB    32              50                2.93             2.08              1.12                       2.23
    8 vCPUs | 32 GB    64              100               8.62             8.60              4.19                       5.67
    16 vCPUs | 64 GB   128             200               8.31             3.47              3.05                       4.28
    32 vCPUs | 128 GB  256             400               10.18            3.85              3.79                       4.99
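The ratio rows follow directly from the raw QPS figures. For example, for the 4 vCPUs | 16 GB node class:

```shell
# Divide GeminiDB QPS by open-source QPS for each of the four workloads
# (values taken from the 4 vCPUs | 16 GB rows of Table 6).
ratios=$(awk 'BEGIN {
  split("2884 5068 8484 10694", oss, " ")    # open-source Cassandra
  split("8439 10565 9468 23830", gdb, " ")   # GeminiDB Cassandra
  for (i = 1; i <= 4; i++)
    printf "%.2f%s", gdb[i] / oss[i], (i < 4 ? " " : "\n")
}')
echo "$ratios"
```

This prints 2.93 2.08 1.12 2.23, matching the first ratio row of the table.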

  • Test Conclusion
    1. The GeminiDB Cassandra cluster delivers up to ten times the read performance (QPS) of the open-source Cassandra cluster.
    2. GeminiDB Cassandra cluster gives you basically the same write performance as the open-source cluster.
    3. Adding nodes slightly affects both the GeminiDB Cassandra and open-source clusters.
      • Scale-out of a GeminiDB Cassandra cluster is fast, requires no parameter changes, and affects services only briefly (about 10 seconds); the whole process takes about 10 minutes.
      • For an open-source Cassandra cluster, the time needed for adding nodes depends on the data volume and parameter settings, and the impact on performance varies. In this test, the scale-out took more than 30 minutes when the preset data size was 50 GB.
      • Calculation formula: Highest migration speed = stream throughput limit per node (set by nodetool setstreamthroughput; 200 Mbit/s by default, 32 Mbit/s in this test) x original nodes

        In this test, the highest migration speed = 32 Mbit/s x 3 = 96 Mbit/s = 12 MB/s = 720 MB/min ≈ 0.703 GB/min. So, migrating 50 GB of data in this scenario took about 71.1 minutes (50/0.703).
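The arithmetic above can be checked with a quick awk script (values as in this test: 32 Mbit/s per node, 3 original nodes, 50 GB of preset data; 1 GB is taken as 1024 MB, as in the figures above):

```shell
# Estimate scale-out migration time from the stream-throughput cap.
per_node_mbit=32   # nodetool setstreamthroughput value (Mbit/s per node)
nodes=3            # original nodes streaming data to the new node
data_gb=50         # preset data volume to migrate

est=$(awk -v m="$per_node_mbit" -v n="$nodes" -v d="$data_gb" 'BEGIN {
  mb_per_s   = m * n / 8             # Mbit/s across all nodes -> MB/s
  gb_per_min = mb_per_s * 60 / 1024  # MB/s -> GB/min
  printf "%.3f GB/min, %.1f min", gb_per_min, d / gb_per_min
}')
echo "$est"
```

This prints 0.703 GB/min, 71.1 min, matching the figures above.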