Updated on 2024-11-29 GMT+08:00

Migrating Solr Data Using Solr2ES

Scenario

This section describes how to use the Solr2ES tool to query Solr indexes and write the index data to an Elasticsearch cluster. Both Solr and Elasticsearch must be running properly.

Prerequisites

  • The Solr and Elasticsearch clusters are running properly.
  • To ensure data consistency after the migration, upper-layer services must stop writing to the source Solr cluster. Read operations can continue. After the migration is complete, switch the upper-layer services to the destination Elasticsearch cluster for both reads and writes.
  • Solr and Elasticsearch clients of the same version have been installed, and the client nodes can communicate with the Solr and Elasticsearch clusters. In this section, the installation directory /opt/client is used as an example.
  • A human-machine user, for example, test, has been created in the source and destination clusters, added to the solr, elasticsearch, and supergroup (primary) user groups, and assigned the Manager_administrator role.

Procedure

Modify configuration files

  1. Check whether Solr and Elasticsearch are in normal mode.

    1. Check whether Solr is in normal mode.

      Log in to FusionInsight Manager as user test (change the password upon your first login) and choose Cluster > Services > Solr. Click the hyperlink next to Solr WebUI to go to the Solr Admin UI page. On the Dashboard page, check whether solr.hdfs.security.kerberos.enabled=true exists. If yes, Solr is in the security mode. If no, Solr is in the normal mode.

    2. Check whether Elasticsearch is in normal mode.

      On Manager, choose Cluster > Services > Elasticsearch and click the Configurations tab. Search for the ELASTICSEARCH_SECURITY_ENABLE parameter. If the parameter is found and its value is true, Elasticsearch is in security mode. Otherwise, Elasticsearch is in normal mode.

    • If both Solr and Elasticsearch are in normal mode, go to step 4.
    • Otherwise, go to step 2.

  2. In the destination cluster, choose System > Permission > User. Locate the newly created user and choose More > Download Authentication Credential. Then select the cluster information, and click OK to download the file.
  3. Upload the user.keytab and krb5.conf files obtained after the decompression to the Elasticsearch/tools/elasticsearch-data2es/solr2es/conf directory on the Elasticsearch client.
  4. Log in to the node where the client is deployed as user root.
  5. Run the following command to go to the configuration file directory of the Solr2ES tool:

    cd /opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf

  6. Before modifying any configuration file, review the following descriptions.

    | Configuration File | Description |
    | --- | --- |
    | user.keytab, krb5.conf | Authentication files of the user that runs the migration task. Download them by referring to steps 2 and 3. They are required only when Solr or Elasticsearch is in security mode. |
    | solr2es.jaas.conf | JAAS configuration file required for authentication. If Solr is in security mode, configure the authentication information of the Client module. If Elasticsearch is in security mode, configure the authentication information of the EsClient module. |
    | solr2es.properties | General parameters of a migration task, which take effect for all indexes. |
    | migrateIndex.properties | Index migration settings. |
    | solrCursorMark.properties | If an exception occurs during an index migration task, the Solr cursor mark of the index being migrated is automatically written to this file and used as the query condition of the next Solr query. You do not need to configure this file because the migration tool generates it automatically. |

  7. Configure the solr2es.jaas.conf file.

    • If both Solr and Elasticsearch are in normal mode, skip this step and go to step 9.
    • If Solr is in security mode, you need to configure authentication information for the Client module. Otherwise, the Client module does not need to be configured.
    • If Elasticsearch is in security mode, you need to configure the authentication information of the EsClient module. Otherwise, the EsClient module does not need to be configured.

  8. Configure the Client or EsClient module.

    Run the following command to open the solr2es.jaas.conf file, and modify the keyTab and principal parameters. Retain the default values of other parameters unless otherwise specified.

    vi solr2es.jaas.conf

    In the following example, keyTab is the path of the user.keytab file, test is the username, and principal is in the format of test@<System domain name>.

    Client {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf/user.keytab"
      principal="test@<System domain name>"
      useTicketCache=false
      storeKey=true
      debug=true;
    };
    EsClient {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf/user.keytab"
      principal="test@<System domain name>"
      useTicketCache=false
      storeKey=true
      debug=true;
    };

    You can log in to Manager, choose System > Permission > Domain and Mutual Trust, and view the value of Local Domain, which is the current system domain name.
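
The two modules differ only in their name, so the file can also be generated rather than edited by hand. The following is a minimal sketch, not part of the Solr2ES tool: render_jaas_section is a hypothetical helper, and HADOOP.COM is only an assumed example domain that must be replaced with the Local Domain value shown on Manager.

```shell
# Print one JAAS section.
#   $1 section name (Client or EsClient), $2 keytab path, $3 principal
render_jaas_section() {
    cat <<EOF
$1 {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="$2"
  principal="$3"
  useTicketCache=false
  storeKey=true
  debug=true;
};
EOF
}

# Assumed example values: user "test" and domain HADOOP.COM.
CONF_DIR=/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf
render_jaas_section Client   "$CONF_DIR/user.keytab" "test@HADOOP.COM"
render_jaas_section EsClient "$CONF_DIR/user.keytab" "test@HADOOP.COM"
```

Redirect the output to solr2es.jaas.conf only after reviewing it. Emit only the Client section if just Solr is in security mode, and only the EsClient section if just Elasticsearch is.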

  9. Run the following command to modify the solr2es.properties configuration file:

    vi solr2es.properties

    Set the parameters by referring to Table 1. A configuration example is as follows:

    ### Solr to ES Configurations ###
    # Await timeout in seconds after migrate service is shutdown.
    awaitTimeout=10
    
    ### ES Client configurations ###
    esServerHost=ip1:port,ip2:port,ip3:port
    principal=test
    isSecureMode=true
    connectTimeout=5000
    socketTimeout=300000
    connectionRequestTimeout=100000
    maxConnPerRoute=100
    maxConnTotal=1000
    
    ### Solr Client Configuration ###
    # Set to true if Solr is in security mode; otherwise false.
    solrKerberosEnabled=true
    # Please change it with Solr ZK host.
    solrZookeeperHost=ip1:port,ip2:port,ip3:port/solr
    
    ### Zookeeper Client Configuration ###
    # Zookeeper server principal.
    zookeeperServerPrincipal=zookeeper/HADOOP.HADOOP.COM
    # solr client socket timeout.
    socketTimeoutMillis=180000
    # Zookeeper connect timeout.
    zookeeperConnectTimeout=30000
    # Set to true if ZooKeeper is in security mode; otherwise false.
    zookeeperIsSecureMode=true

    Table 1 solr2es.properties parameter configuration

    | Parameter | Default Value | Description |
    | --- | --- | --- |
    | awaitTimeout | 10 | Time to wait, in seconds, from the completion of the index migration task until the scheduling thread pool is completely shut down. |
    | esServerHost | ip1:port1,ip2:port2,ip3:port3 | Elasticsearch instances to which data is imported, in the format IP address:Port number. To facilitate load balancing, configure multiple instances and separate them with commas (,). To view the IP addresses and port numbers, log in to FusionInsight Manager, choose Cluster > Services > Elasticsearch, and click the Configurations tab and then the All Configurations sub-tab. On the sub-tab page, choose EsNode1 > Indicate List, and check the value of INSTANCE_SERVER_PORT_LIST in the right pane. |
    | principal | test | Username used for Elasticsearch authentication. Set this parameter when Elasticsearch is in security mode, and use the same user as the principal configured for the EsClient module. |
    | isSecureMode | true | Whether the Elasticsearch cluster is in security mode. true indicates security mode, and false indicates normal mode. |
    | connectTimeout | 5000 | Timeout for establishing an HTTP connection, in milliseconds. |
    | socketTimeout | 300000 | Timeout for waiting for a response on an HTTP connection, in milliseconds. |
    | connectionRequestTimeout | 100000 | Maximum time to wait for an available connection from the connection pool, in milliseconds. |
    | maxConnPerRoute | 100 | Maximum number of concurrent connections to the same route. |
    | maxConnTotal | 1000 | Maximum number of connections in the connection pool. |
    | solrKerberosEnabled | true | Whether Solr is in security mode. true indicates security mode, and false indicates normal mode. |
    | solrZookeeperHost | ip1:port1,ip2:port2,ip3:port3/solr | ZooKeeper address used by Solr. Obtain it from the value of -DzkHost on the Dashboard page of the Solr Admin UI. |
    | zookeeperServerPrincipal | zookeeper/HADOOP.<System domain name> | Principal of the ZooKeeper server. Obtain it from the value of -Dzookeeper.server.principal on the Dashboard page of the Solr Admin UI, for example, zookeeper/hadoop.hadoop.com. |
    | socketTimeoutMillis | 180000 | Timeout for the HTTP object on the Solr client to wait for a connection response, in milliseconds. |
    | zookeeperConnectTimeout | 30000 | Timeout for creating a ZooKeeper connection, in milliseconds. |
    | zookeeperIsSecureMode | true | Whether ZooKeeper is in security mode. true indicates security mode, and false indicates normal mode. |
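
Both esServerHost and solrZookeeperHost are comma-separated lists of IP address:Port number pairs, so a mistyped entry can be caught before the tool is run. The following is a minimal sketch, not part of the Solr2ES tool: get_prop and check_host_list are hypothetical helpers, and the check only tests the shape of each entry, not whether the host is reachable.

```shell
# Print the value of key $2 from properties file $1 (last definition wins).
get_prop() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if every comma-separated entry in $1 looks like host:port
# with a non-empty host and a numeric port. Runs in a subshell so that the
# IFS and globbing changes do not leak into the caller.
check_host_list() (
    [ -n "$1" ] || exit 1
    IFS=,
    set -f
    for entry in $1; do
        case $entry in *:*) ;; *) exit 1 ;; esac
        host=${entry%:*}
        port=${entry##*:}
        [ -n "$host" ] || exit 1
        case $port in ''|*[!0-9]*) exit 1 ;; esac
    done
    exit 0
)

# Check the file edited in this step, if it exists on this node.
PROPS=/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf/solr2es.properties
if [ -f "$PROPS" ]; then
    check_host_list "$(get_prop "$PROPS" esServerHost)" \
        || echo "esServerHost: expected ip:port[,ip:port...]"
    zk=$(get_prop "$PROPS" solrZookeeperHost)
    # Strip the trailing /solr path before checking the host list.
    check_host_list "${zk%%/*}" \
        || echo "solrZookeeperHost: expected ip:port[,ip:port...]/solr"
fi
```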

  10. Configure the indexes that need to be migrated.

    Set the parameters according to Table 2. The following is a configuration example:

    vi migrateIndex.properties

    ### Solr Index1 Configuration ###
    # Solr index name. It can be different from the ES index name.
    solrIndex=Collection_1
    # Shards to query in the Solr collection. ALL: all shards are queried. shard1,shard2: only shard1 and shard2 are queried.
    solrShards=ALL
    # Number of concurrent shard threads for querying Solr collection. Default thread number is 3
    queryThreadNumber=3
    # Solr query string.
    solrQueryString=*:*
    # Solr query field.
    solrQueryFields=*
    # Solr result size in one query return.
    solrRowsPerQuery=4000
    # Solr cursor mark: sort clauses must include the uniqueKey field (either asc or desc). If any other sort is needed, please separate by comma and then add it behind uniqueKey sort, "solrSort=id asc,title desc" for example.
    solrSort=id asc
    # Solr primary key. After the configuration, it is used as the _id of the ES.
    solrUniqueKey=
    
    ### ES Index Configurations ###
    # ES index name. It can be different from the Solr index name.
    esIndex=collection_1
    # If the ES index contains different types, assign the correct one, for example, _doc.
    esType=
    # ES batch docs size in one bulk. Proper bulk size will improve index performance.
    esBulkSize=1000
    # Concurrent thread number to migrate each index from Solr to ES. Default thread number is 8.
    migrateThreadNumber=8
    # If the index to be migrated does not exist in the ES, run this statement to create it. The field type in the ES must match the field type in the Solr collection.
    esIndexMappings={"mappings":{"properties":{"id":{"type":"keyword"},"name":{"type":"text"},"features":{"type":"text"},"title":{"type":"text"},"description":{"type":"text"},"comments":{"type":"text"},"keywords":{"type":"text"},"price":{"type":"double"},"price_c":{"type":"keyword"},"weight":{"type":"double"},"popularity":{"type":"long"},"subject":{"type":"text"},"author":{"type":"keyword"},"author_s":{"type":"text"},"category":{"type":"text"},"age":{"type":"text"},"last_modified":{"type":"date"}}},"settings":{"number_of_shards":"3","number_of_replicas":"1"}}

    Table 2 Parameters in the migrateIndex.properties configuration file

    | Parameter | Default Value | Description |
    | --- | --- | --- |
    | solrIndex | Collection_1 | Name of the Solr index to be migrated. |
    | solrShards | ALL | Solr shard names. The default value ALL indicates that all Solr shards are migrated. You can also set this parameter to the names of the shards to be migrated, separated by commas (,), for example, shard1,shard2. |
    | queryThreadNumber | 3 | Number of concurrent threads for querying Solr shards. The default value 3 indicates that three Solr shards are queried at the same time. |
    | solrQueryString | *:* | Solr query statement. If there is no special query condition, retain the default value *:*. |
    | solrQueryFields | * | Solr query fields. If all fields need to be migrated, retain the default value *. |
    | solrRowsPerQuery | 4000 | Number of documents returned by each Solr query. The default value is 4000. |
    | solrSort | id asc | The tool queries Solr using a cursor mark, so the query results must be sorted in ascending or descending order by the uniqueKey field. To sort by additional fields, append them after the uniqueKey clause, for example, id asc,title desc. |
    | solrUniqueKey | - | Unique key of the Solr index. This parameter is left blank by default. If you set it, its value is used as the _id field in Elasticsearch. |
    | esIndex | collection_1 | Name of the destination Elasticsearch index. If the index does not exist, the system automatically creates one. |
    | esType | - | Type used in the Elasticsearch index. This parameter is left blank by default. Change the value as required. |
    | esBulkSize | 1000 | Number of documents in one Elasticsearch bulk request. If the documents are small, increase this value to improve Elasticsearch write performance. Keep each bulk batch at about 5–10 MB. |
    | migrateThreadNumber | 8 | Number of concurrent threads for writing data to Elasticsearch. The default value is 8. |
    | esIndexMappings | See the preceding configuration example. | If the index to be migrated does not exist in Elasticsearch, the tool uses this statement to create it automatically. The field types in the Elasticsearch index must match those in the Solr index. |
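
Because cursor-mark queries require the uniqueKey field in the sort clause, a quick check of solrSort can catch a misconfiguration before a long migration starts. The following is a minimal sketch, not part of the Solr2ES tool: sort_includes_key is a hypothetical helper, and it assumes that clauses are written as "<field> asc" or "<field> desc" with a single space.

```shell
# Succeed if the solrSort value $1 contains the clause "<key> asc" or
# "<key> desc", where <key> is $2. Runs in a subshell so that the IFS and
# globbing changes do not leak into the caller.
sort_includes_key() (
    sort_spec=$1
    key=$2
    IFS=,
    set -f
    for clause in $sort_spec; do
        # Trim spaces around the clause, e.g. after a comma.
        clause=$(printf '%s' "$clause" | sed 's/^ *//; s/ *$//')
        case $clause in
            "$key asc"|"$key desc") exit 0 ;;
        esac
    done
    exit 1
)

# Example with the values used in this section.
if sort_includes_key "id asc,title desc" id; then
    echo "solrSort includes the uniqueKey field"
fi
```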

    • Only one index can be migrated in a single task, and the shards of that index are migrated concurrently. To migrate multiple indexes, configure multiple concurrent migration tasks. Increase or decrease the number of concurrent tasks based on the host performance and how busy the clusters are.
    • For the meaning of each migration parameter, see Table 2. Adjust parameters such as solrRowsPerQuery, queryThreadNumber, migrateThreadNumber, and esBulkSize, and verify the migration performance in a test environment first.
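
As a starting point for tuning esBulkSize, the 5–10 MB guideline can be converted into a document count by dividing the target batch size by the average document size. The following is a rough sketch under assumptions: bulk_docs_for is a hypothetical helper, and the average document size (2 KB here) is an assumed figure that you would measure on your own data.

```shell
# Estimate how many documents fit in one bulk batch of the target size.
#   $1 average document size in bytes, $2 target batch size in MB
bulk_docs_for() {
    echo $(( $2 * 1024 * 1024 / $1 ))
}

bulk_docs_for 2048 5     # prints 2560
bulk_docs_for 2048 10    # prints 5120
```

With 2 KB documents, this suggests an esBulkSize between about 2560 and 5120. Treat the result only as an initial value to verify in a test environment.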

Migrate Solr Data to Elasticsearch

  1. Run the following commands to execute the index migration task:

    cd /opt/client

    source bigdata_env

    kinit test

    cd /opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es

    java -cp ./conf/:../lib/* com.*.fusioninsight.elasticsearch.solr2es.Solr2ES
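
Before starting the Java process, it can help to confirm that the configuration files described earlier are present in the conf directory. The following is a minimal sketch, not part of the Solr2ES tool: preflight is a hypothetical helper, and it checks only that the files exist, not that their contents are valid.

```shell
# Succeed if every file needed for the run exists in the conf directory.
#   $1 conf directory
#   $2 "true" if Solr or Elasticsearch is in security mode
preflight() (
    conf_dir=$1
    files="solr2es.properties migrateIndex.properties"
    if [ "$2" = "true" ]; then
        # Authentication files are needed only in security mode.
        files="$files solr2es.jaas.conf user.keytab krb5.conf"
    fi
    missing=0
    for f in $files; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Usage, from the solr2es directory:
#   preflight ./conf true && echo "configuration files are in place"
```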

  2. Run the curl command to check whether the data is imported by viewing the index.

    • Security mode

      curl -XGET --tlsv1.2 --negotiate -k -u : "https://ip:port/esIndex/_search?pretty"

    • Normal mode

      curl -XGET "http://ip:port/esIndex/_search?pretty"

    • ip: IP address of any EsNode instance in the Elasticsearch cluster.
    • port: port number of the EsNode instance. To view it, log in to FusionInsight Manager, choose Cluster > Services > Elasticsearch, and click the Configurations tab and then All Configurations. On the displayed page, search for SERVER_PORT to obtain the port number of the EsNodeX instance.
    • esIndex: name of the destination index, that is, the value of esIndex in migrateIndex.properties.
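
The two modes differ only in the URL scheme and the authentication options, so the commands can be wrapped in one helper. The following is a minimal sketch: es_base_url and es_get are hypothetical helper names, and the curl options are the ones used in this section.

```shell
# Print the base URL of an EsNode: https in security mode, http otherwise.
#   $1 "true" for security mode, $2 IP address, $3 port
es_base_url() {
    if [ "$1" = "true" ]; then scheme=https; else scheme=http; fi
    printf '%s://%s:%s\n' "$scheme" "$2" "$3"
}

# Send a GET request for path $4 using the options that match the mode.
es_get() {
    secure=$1
    url="$(es_base_url "$1" "$2" "$3")/$4"
    if [ "$secure" = "true" ]; then
        curl -XGET --tlsv1.2 --negotiate -k -u : "$url"
    else
        curl -XGET "$url"
    fi
}

# Usage (replace ip, port, and esIndex with real values):
#   es_get true ip port "esIndex/_search?pretty"
```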

  3. Run the curl command to check whether the data volume is correct.

    • Security mode

      curl -XGET --tlsv1.2 --negotiate -k -u : 'https://ip:port/_cat/indices?v'

    • Normal mode

      curl -XGET "http://ip:port/_cat/indices?v"

    For details about how to use the curl command, see Running curl Commands in Linux.
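
To compare the data volume with the source, the docs.count column of the _cat/indices output can be extracted for the destination index and checked against the document count reported by Solr for the same query. The following is a minimal sketch: docs_count_for is a hypothetical helper that locates columns by the header names printed when the v parameter is used.

```shell
# Read `_cat/indices?v` output on stdin and print docs.count for index $1.
docs_count_for() {
    awk -v idx="$1" '
        # Header row: remember the position of each column name.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows: print the count for the requested index.
        $(col["index"]) == idx { print $(col["docs.count"]) }
    '
}

# Usage in normal mode (replace ip and port with real values):
#   curl -XGET "http://ip:port/_cat/indices?v" | docs_count_for collection_1
```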