Migrating Solr Data Using Solr2ES
Scenario
This section describes how to query Solr indexes and write index data to the Elasticsearch cluster when Solr and Elasticsearch are running properly.
Prerequisites
- The Solr and Elasticsearch clusters are running properly.
- To ensure data consistency after the migration, upper-layer services must stop writing to the source Solr cluster. Read operations can continue as normal. After the migration is complete, switch the upper-layer services to the destination Elasticsearch cluster for both reads and writes.
- The Solr and Elasticsearch clients of the same version have been installed, and the client nodes can communicate with the Solr and Elasticsearch clusters. For example, the installation directory is /opt/client.
- A human-machine user, for example, test, has been created in the source and destination clusters, added to the solr, elasticsearch, and supergroup (primary) user groups, and assigned the Manager_administrator role.
Procedure
Modify configuration files
- Check whether Solr and Elasticsearch are in normal mode.
- Check whether Solr is in normal mode.
Log in to FusionInsight Manager as user test (change the password upon your first login) and choose Cluster > Services > Solr. Click the hyperlink next to Solr WebUI to go to the Solr Admin UI page. On the Dashboard page, check whether solr.hdfs.security.kerberos.enabled=true exists. If it does, Solr is in security mode. If it does not, Solr is in normal mode.
- Check whether Elasticsearch is in normal mode.
On Manager, choose Cluster > Services > Elasticsearch and click the Configurations tab. On the displayed page, search for the ELASTICSEARCH_SECURITY_ENABLE parameter. If the parameter is found and its value is true, Elasticsearch is in security mode.
- If both Solr and Elasticsearch are in normal mode, go to 4.
- Otherwise, go to 2.
- In the destination cluster, choose System > Permission > User. Locate the newly created user and choose More > Download Authentication Credential. Then select the cluster information, and click OK to download the file.
- Upload the user.keytab and krb5.conf files obtained after the decompression to the Elasticsearch/tools/elasticsearch-data2es/solr2es/conf directory on the Elasticsearch client.
- Log in to the node where the client is deployed as user root.
- Run the following command to go to the configuration file directory of the Solr2ES tool:
cd /opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf
- Before modifying any configuration files, read the description information listed below.
| Configuration File | Description |
|---|---|
| user.keytab, krb5.conf | Authentication files of the user used in the migration task, which can be downloaded by referring to 2 and 3. You need to download the files only when Solr or Elasticsearch is in security mode. |
| solr2es.jaas.conf | JAAS configuration file required for authentication. If Solr is in security mode, you need to configure authentication information for the Client module. If Elasticsearch is in security mode, you need to configure the authentication information of the EsClient module. |
| solr2es.properties | General parameters configured for a migration task, which take effect for all indexes. |
| migrateIndex.properties | Used to configure the index migration information. |
| solrCursorMark.properties | If an exception occurs during the index migration task, the Solr Cursor Mark of the current migration index is automatically written to this file and used as the query condition of the next Solr query. You do not need to configure this file because it is automatically generated by the migration tool. |
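Before editing the files described above, it can help to confirm which of them are actually present in the conf directory. The following is a minimal sketch, not part of the Solr2ES tool: missing_conf_files is a hypothetical helper name, and the assumption that user.keytab, krb5.conf, and solr2es.jaas.conf only matter in security mode is taken from the descriptions above.

```shell
# Hypothetical helper (not shipped with Solr2ES): print every expected
# configuration file that is missing from the conf directory.
# $1 = conf directory
# $2 = "secure" if Solr or Elasticsearch is in security mode, anything else otherwise
missing_conf_files() {
    conf_dir=$1
    mode=$2
    files="solr2es.properties migrateIndex.properties"
    if [ "$mode" = "secure" ]; then
        # Authentication files are only needed in security mode.
        files="$files user.keytab krb5.conf solr2es.jaas.conf"
    fi
    for f in $files; do
        [ -f "$conf_dir/$f" ] || echo "$f"
    done
}

# Example against the client path used in this section:
# missing_conf_files /opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf secure
```

An empty result means that all expected files were found. solrCursorMark.properties is deliberately not checked because the tool generates it.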
- Configure the solr2es.jaas.conf file.
- If both Solr and Elasticsearch are in normal mode, skip this step and go to 9.
- If Solr is in security mode, you need to configure authentication information for the Client module. Otherwise, the Client module does not need to be configured.
- If Elasticsearch is in security mode, you need to configure the authentication information of the EsClient module. Otherwise, the EsClient module does not need to be configured.
- Configure the Client or EsClient module.
Run the following command to modify the keyTab and principal parameters in the solr2es.jaas.conf file. Retain the default values of other parameters unless otherwise specified.
In the following example, keyTab is the path of the user.keytab file, test is the username, and principal is in the format of test@<System domain name>.
vi solr2es.jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf/user.keytab"
  principal="test@<System domain name>"
  useTicketCache=false
  storeKey=true
  debug=true;
};
EsClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es/conf/user.keytab"
  principal="test@<System domain name>"
  useTicketCache=false
  storeKey=true
  debug=true;
};
You can log in to Manager, choose System > Permission > Domain and Mutual Trust, and view the value of Local Domain, which is the current system domain name.
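Two easy mistakes in this step are leaving the <System domain name> placeholder in place and pointing keyTab at a file that does not exist. The following sketch checks for both. check_jaas is a hypothetical helper, not part of the tool, and it assumes that each keyTab="..." setting sits on its own line and that the path contains no spaces.

```shell
# Hypothetical helper (not shipped with Solr2ES): report obvious problems
# in a JAAS file. Returns 0 if none were found, 1 otherwise.
# $1 = path of the JAAS file, for example solr2es.jaas.conf
check_jaas() {
    jaas_file=$1
    status=0
    # The placeholder must be replaced with the real system domain name.
    if grep -q '<System domain name>' "$jaas_file"; then
        echo "principal still contains the <System domain name> placeholder"
        status=1
    fi
    # Extract every keyTab="..." path and confirm that the file exists.
    for keytab in $(sed -n 's/.*keyTab="\([^"]*\)".*/\1/p' "$jaas_file" | sort -u); do
        if [ ! -f "$keytab" ]; then
            echo "keytab not found: $keytab"
            status=1
        fi
    done
    return $status
}

# Example usage in the conf directory:
# check_jaas solr2es.jaas.conf && echo "solr2es.jaas.conf looks consistent"
```

The check is purely textual. It does not verify that the keytab actually matches the principal.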
- Run the following command to modify the solr2es.properties configuration file:
vi solr2es.properties
Set the parameters by referring to Table 1 (solr2es.properties parameter configuration). A configuration example is as follows:
### Solr to ES Configurations ###
# Await timeout in seconds after migrate service is shutdown.
awaitTimeout=10

### ES Client configurations ###
esServerHost=ip1:port,ip2:port,ip3:port
principal=test
isSecureMode=true
connectTimeout=5000
socketTimeout=300000
connectionRequestTimeout=100000
maxConnPerRoute=100
maxConnTotal=1000

### Solr Client Configuration ###
# Solr is on safe mode: true; otherwise false.
solrKerberosEnabled=true
# Please change it with Solr ZK host.
solrZookeeperHost=ip1:port,ip2:port,ip3:port/solr

### Zookeeper Client Configuration ###
# Zookeeper server principal.
zookeeperServerPrincipal=zookeeper/HADOOP.HADOOP.COM
# solr client socket timeout.
socketTimeoutMillis=180000
# Zookeeper connect timeout.
zookeeperConnectTimeout=30000
# Zookeeper client is safe mode
zookeeperIsSecureMode=true
Table 1 solr2es.properties parameter configuration

| Parameter | Default Value | Description |
|---|---|---|
| awaitTimeout | 10 | Time to wait, in seconds, between the completion of the index migration task and the full shutdown of the scheduling thread pool. |
| esServerHost | ip1:port1,ip2:port2,ip3:port3 | Elasticsearch instances to which data is imported. To facilitate load balancing, configure multiple IP address and port pairs. Each pair is in the IP address:Port number format, and pairs are separated by commas (,). To view the IP addresses and port numbers, log in to FusionInsight Manager, choose Cluster > Services > Elasticsearch, and click the Configurations tab then the All Configurations sub-tab. On the sub-tab page that is displayed, choose EsNode1 > Indicate List, and check the value of INSTANCE_SERVER_PORT_LIST in the right pane. |
| principal | test | Username used for Elasticsearch authentication. Set this parameter when Elasticsearch is in security mode. It must match the user in the principal of the EsClient module. |
| isSecureMode | true | Specifies whether the Elasticsearch cluster is in security mode. true indicates that the cluster is in security mode, and false indicates that the cluster is in normal mode. |
| connectTimeout | 5000 | Timeout period for establishing an HTTP connection, in milliseconds. |
| socketTimeout | 300000 | Timeout interval for waiting for an HTTP connection response, in milliseconds. |
| connectionRequestTimeout | 100000 | Maximum timeout interval for obtaining an available connection from the connection pool, in milliseconds. |
| maxConnPerRoute | 100 | Maximum number of concurrent connections to the same route. |
| maxConnTotal | 1000 | Maximum number of connections in the connection pool. |
| solrKerberosEnabled | true | Specifies whether Solr is in security mode. true indicates that Solr is in security mode, and false indicates that Solr is in normal mode. |
| solrZookeeperHost | ip1:port1,ip2:port2,ip3:port3/solr | ZooKeeper address used by Solr. You can obtain it from the value of -DzkHost on the Dashboard page of the Solr Admin UI. |
| zookeeperServerPrincipal | zookeeper/HADOOP.<System domain name> | Principal of the ZooKeeper server. Obtain it from the value of -Dzookeeper.server.principal on the Dashboard page of the Solr Admin UI, for example, zookeeper/hadoop.hadoop.com. |
| socketTimeoutMillis | 180000 | Timeout interval for the HTTP object on the Solr client to wait for a connection response, in milliseconds. |
| zookeeperConnectTimeout | 30000 | Timeout interval for creating a ZooKeeper connection, in milliseconds. |
| zookeeperIsSecureMode | true | Specifies whether ZooKeeper is in security mode. true indicates that ZooKeeper is in security mode, and false indicates that ZooKeeper is in normal mode. |
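Because the sample file ships with placeholders such as ip1:port, a quick check that the host lists were actually filled in can catch mistakes before the migration starts. The sketch below is an illustration only: get_prop and valid_host_list are hypothetical helper names, get_prop handles only plain key=value lines, and valid_host_list assumes IPv4 addresses or host names with numeric ports.

```shell
# Hypothetical helpers (not shipped with Solr2ES).

# Print the value of a key from a properties file (plain key=value lines only).
# $1 = file, $2 = key
get_prop() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if $1 is a comma-separated list of host:port entries
# in which every port is numeric.
valid_host_list() {
    printf '%s\n' "$1" | grep -Eq '^[^,:]+:[0-9]+(,[^,:]+:[0-9]+)*$'
}

# Example usage in the conf directory:
# hosts=$(get_prop solr2es.properties esServerHost)
# valid_host_list "$hosts" || echo "esServerHost is not in ip:port,ip:port format"
#
# solrZookeeperHost ends with a path such as /solr, so strip it first:
# zk=$(get_prop solr2es.properties solrZookeeperHost)
# valid_host_list "${zk%%/*}" || echo "solrZookeeperHost is not in ip:port,ip:port/solr format"
```

An unedited value such as ip1:port,ip2:port fails the check because the port is not numeric.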
- Configure the indexes that need to be migrated.
Run the following command to open the migrateIndex.properties file, and set the parameters according to Table 2:
vi migrateIndex.properties
The following is a configuration example:
### Solr Index1 Configuration ###
# Solr index name, can be different with esIndexName.
solrIndex=Collection_1
# Query shard name in the solr collection. All: indicates that all shards are queried. shard1,shard2: queries data of shard1 and shard2.
solrShards=ALL
# Number of concurrent shard threads for querying Solr collection. Default thread number is 3
queryThreadNumber=3
# Solr query string.
solrQueryString=*:*
# Solr query field.
solrQueryFields=*
# Solr result size in one query return.
solrRowsPerQuery=4000
# Solr cursor mark: sort clauses must include the uniqueKey field (either asc or desc). If any other sort is needed, please separate by comma and then add it behind uniqueKey sort, "solrSort=id asc,title desc" for example.
solrSort=id asc
# Solr primary key. After the configuration, it is used as the _id of the ES.
solrUniqueKey=

### ES Index Configurations ###
# ES index name, can be different with Solr index name.
esIndex=collection_1
# If ES index contains different types, please assign a correct one, for example, _doc.
esType=
# ES batch docs size in one bulk. Proper bulk size will improve index performance.
esBulkSize=1000
# Concurrent thread number to migrate each index from Solr to ES. Default thread number is 8.
migrateThreadNumber=8
# If the index to be migrated does not exist in the ES, run this statement to create it. The field type in the ES must match the field type in the Solr collection.
esIndexMappings={"mappings":{"properties":{"id":{"type":"keyword"},"name":{"type":"text"},"features":{"type":"text"},"title":{"type":"text"},"description":{"type":"text"},"comments":{"type":"text"},"keywords":{"type":"text"},"price":{"type":"double"},"price_c":{"type":"keyword"},"weight":{"type":"double"},"popularity":{"type":"long"},"subject":{"type":"text"},"author":{"type":"keyword"},"author_s":{"type":"text"},"category":{"type":"text"},"age":{"type":"text"},"last_modified":{"type":"date"}}},"settings":{"number_of_shards":"3","number_of_replicas":"1"}}
Table 2 Parameters in the migrateIndex.properties configuration file

| Parameter | Default Value | Description |
|---|---|---|
| solrIndex | Collection_1 | Name of the Solr index to be migrated. |
| solrShards | ALL | Solr shard name. The default value is ALL, indicating that all Solr shards are migrated. You can also set this parameter to the names of the shards to be migrated. Separate multiple shard names with commas (,), for example, shard1,shard2. |
| queryThreadNumber | 3 | Number of concurrent threads for querying Solr shards. The default value is 3, indicating that three Solr shards are queried at the same time. |
| solrQueryString | *:* | Solr query statement. If there is no special query condition, retain the default value *:*. |
| solrQueryFields | * | Solr query field. If all fields need to be migrated, retain the default value *. |
| solrRowsPerQuery | 4000 | Number of documents returned by each Solr query. The default value is 4000. |
| solrSort | id asc | The tool queries Solr using cursor marks, so the sort must include the uniqueKey field in ascending or descending order. You can append other sort fields after it as required, for example, id asc,title desc. |
| solrUniqueKey | - | Unique key of a Solr index. By default, this parameter is left blank. After you set this parameter, the value is used as the value of the _id field of Elasticsearch. |
| esIndex | collection_1 | Name of the Elasticsearch destination index. If the index does not exist, the system will automatically create one. |
| esType | - | Type used in the Elasticsearch index. This parameter is left blank by default. You can change the value as required. |
| esBulkSize | 1000 | Number of documents in one Elasticsearch bulk request. If the source documents are small, increase this value to improve the Elasticsearch write performance. Aim for a bulk batch size of 5–10 MB. |
| migrateThreadNumber | 8 | Number of concurrent threads for writing data to Elasticsearch. The default value is 8. |
| esIndexMappings | (See the reference example.) | If the index to be migrated does not exist in Elasticsearch, the program uses this statement to automatically create it. The field types in the Elasticsearch index must match those in the Solr index. |
- Only one index can be migrated in a single task. Shards in one index are concurrently migrated. If multiple indexes need to be migrated, you can configure multiple concurrent migration tasks. The number of concurrent tasks can be increased or decreased based on the host performance and cluster idleness.
- For details about the index migration parameters, see Table 2. You can adjust parameters such as solrRowsPerQuery, queryThreadNumber, migrateThreadNumber, and esBulkSize, and verify the migration performance in a test environment before running the migration in production.
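When tuning solrRowsPerQuery and esBulkSize, a rough request count makes the trade-off concrete, and it is worth confirming that solrSort really includes the unique key, since cursor-based paging depends on it. The following is a back-of-the-envelope sketch under stated assumptions: the document count is hypothetical, the helper names are illustrative, and the estimate ignores shards, retries, and the final empty cursor page.

```shell
# Hypothetical helpers (not shipped with Solr2ES).

# Ceiling division: number of batches needed for $1 items at $2 items per batch.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Succeed if the sort clause list ($1) contains a clause on the unique key field ($2).
sort_has_unique_key() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//' | grep -Eq "^$2 +(asc|desc) *$"
}

docs=1000000   # hypothetical number of documents in the Solr collection
echo "Approximate Solr queries:   $(ceil_div "$docs" 4000)"   # solrRowsPerQuery=4000
echo "Approximate ES bulk writes: $(ceil_div "$docs" 1000)"   # esBulkSize=1000

sort_has_unique_key "id asc,title desc" id || echo "solrSort must include the uniqueKey field"
```

Raising either value reduces the number of round trips at the cost of larger responses or bulk bodies, which is why the effect should be measured in a test environment.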
Migrate Solr Data to Elasticsearch
- Run the following commands to execute the index migration task:
cd /opt/client
source bigdata_env
kinit test
cd /opt/client/Elasticsearch/tools/elasticsearch-data2es/solr2es
java -cp ./conf/:../lib/* com.*.fusioninsight.elasticsearch.solr2es.Solr2ES
- Run the curl command to check whether the data is imported by viewing the index.
- Security mode
curl -XGET --tlsv1.2 --negotiate -k -u : "https://ip:port/esIndex/_search?pretty"
- Normal mode
- ip: indicates the IP address of any EsNode in the Elasticsearch cluster.
- To view the port number, log in to FusionInsight Manager, choose Cluster > Services > Elasticsearch, click the Configurations tab and then All Configurations. In the displayed page, search for SERVER_PORT to obtain the port number of the EsNodeX instance.
- Run the curl command to check whether the data volume is correct.
For details about how to use the curl command, see Running curl Commands in Linux.
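One way to check the data volume is to compare the document count reported by Elasticsearch with the number of documents found by Solr. The sketch below only parses the responses. It assumes the standard Elasticsearch _count API response (a JSON object with a count field) and a Solr JSON select response containing numFound, for example from a q=*:* query with rows=0. The helper names are hypothetical, and the Solr request itself is not shown because its URL depends on your deployment.

```shell
# Hypothetical helpers (not shipped with Solr2ES). Both read JSON from stdin.

# Extract the "count" value from an Elasticsearch _count response.
es_count() {
    sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract "numFound" from a Solr JSON select response.
solr_num_found() {
    sed -n 's/.*"numFound" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example usage in security mode (ip, port, and esIndex as described above):
# es=$(curl -XGET --tlsv1.2 --negotiate -k -u : "https://ip:port/esIndex/_count" | es_count)
# echo "Elasticsearch documents: $es"
```

If the two numbers differ, check whether solrQueryString or solrShards restricted the migrated data before assuming that documents were lost.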