Backing Up Solr Service Data

Scenario

To keep Solr service data secure, back it up routinely and before any major operation on Solr (such as an upgrade or migration). If an exception occurs or the operation does not achieve the expected result, the backup data can be used to restore the system, minimizing the adverse impact on services.

You can create a backup task on FusionInsight Manager to back up Solr service data. Both periodic and manual backup tasks are supported.

- Search and query functions are not affected while a snapshot is being created. Data written after snapshot creation starts is not included in the snapshot. Only one snapshot can be created at a time.
- If some of the indexes selected when the backup task was created are deleted before the task starts, the deleted indexes are not backed up. If all selected indexes are deleted, the backup task fails.
- Ensure that all instances in the cluster are running normally and can receive requests. To ensure a successful backup, do not add, delete, stop, or restart Solr instances, stop or restart the Solr service, or stop or restart the cluster during the backup.
- If a large amount of data in the cluster needs to be backed up, back it up in batches at the index level. Otherwise, the backup takes a long time.
- If a backup task fails, go to the backup directory on the target (RemoteHDFS), that is, the Target Path configured for the backup to remote HDFS. Delete the subdirectory named Backup task name_Data source_Task creation time that corresponds to the failed task to remove the data that failed to be backed up.
- Before the backup, check that each index to be backed up is in the green state and has no lost shards. Otherwise, the backup fails.
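The green-state and no-lost-shard check can be approximated outside FusionInsight Manager. The following is a minimal sketch, assuming you have already fetched the JSON response of the standard Solr Collections API CLUSTERSTATUS action (/solr/admin/collections?action=CLUSTERSTATUS); how to reach that endpoint in a secured FusionInsight cluster is not covered here. The helper name check_collection_health is hypothetical, and "green" is approximated as every replica of every active shard being active on a live node, which may not match the Manager's own definition exactly.

```python
from typing import Any


def check_collection_health(cluster_status: dict[str, Any],
                            collection: str) -> list[str]:
    """Return problems found for one collection; an empty list means it looks green."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    coll = cluster.get("collections", {}).get(collection)
    if coll is None:
        return [f"collection {collection!r} not found"]

    problems = []
    for shard_name, shard in coll.get("shards", {}).items():
        # Only active shards serve data; for example, a parent shard left
        # inactive after a shard split is skipped.
        if shard.get("state") != "active":
            continue
        replicas = shard.get("replicas", {})
        # A replica counts as healthy only if it is active AND its node is live.
        healthy = [r for r in replicas.values()
                   if r.get("state") == "active"
                   and r.get("node_name") in live_nodes]
        if not healthy:
            problems.append(f"{shard_name}: no active replica (shard lost)")
        elif len(healthy) < len(replicas):
            problems.append(f"{shard_name}: {len(replicas) - len(healthy)} "
                            "replica(s) not active")
    return problems
```

A collection for which the function returns a non-empty list should be repaired, or left out of Backup Content, before the backup task is run.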

Prerequisites

If data needs to be backed up to the remote HDFS, a standby cluster has been prepared for data backup, and its authentication mode is the same as that of the active cluster. Other backup modes do not require a standby cluster; however, Solr data can currently be backed up only to HDFS.
If the Solr cluster is in normal mode, its service data cannot be backed up to HDFS in a cluster in security mode.

If the active cluster is deployed in security mode and the active and standby clusters are not managed by the same FusionInsight Manager, mutual trust has been configured between them. For details, see Configuring Cross-Manager Mutual Trust Between Clusters. If the active cluster is deployed in normal mode, mutual trust is not required.

The active and standby clusters have consistent time, and their NTP services use the same time source.
The HDFS of the standby cluster has sufficient space. Saving backup files in a custom directory is recommended.
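The authentication-mode, mutual-trust, and time-source prerequisites can be expressed as a simple checklist. This is a hypothetical sketch, not a FusionInsight API: the ClusterInfo fields (auth_mode, manager_id, ntp_source) and the function prerequisite_issues are illustrative names whose values you would fill in by hand from each cluster's configuration. Free HDFS space is not checked.

```python
from dataclasses import dataclass


@dataclass
class ClusterInfo:
    # Illustrative fields; these are not read from any FusionInsight API.
    auth_mode: str   # "normal" or "security"
    manager_id: str  # identifies the FusionInsight Manager managing the cluster
    ntp_source: str  # time source used by the cluster's NTP service


def prerequisite_issues(active: ClusterInfo, standby: ClusterInfo,
                        mutual_trust_configured: bool) -> list[str]:
    """Return the unmet prerequisites for backing up to remote HDFS."""
    issues = []
    # The standby cluster must use the same authentication mode; this also
    # rules out backing up a normal-mode Solr cluster to a security-mode HDFS.
    if active.auth_mode != standby.auth_mode:
        issues.append("authentication modes differ")
    # Mutual trust is needed only for a security-mode active cluster whose
    # standby cluster is managed by a different FusionInsight Manager.
    if (active.auth_mode == "security"
            and active.manager_id != standby.manager_id
            and not mutual_trust_configured):
        issues.append("cross-Manager mutual trust is not configured")
    # Both clusters must synchronize time from the same NTP source.
    if active.ntp_source != standby.ntp_source:
        issues.append("NTP time sources differ")
    return issues
```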

Procedure

On FusionInsight Manager, choose O&M > Backup and Restoration > Backup Management.
Click Create.
Set Name to the name of the backup task.
Select the desired cluster from Backup Object.

Set Mode to the type of the backup task.

Periodic indicates that the system executes the backup task periodically; Manual indicates that the backup task is executed manually. If you select Periodic, set the parameters described in Table 1.

**Table 1** Periodic backup parameters
Parameter	Description
Started	Indicates the time when the task is started for the first time.
Period	Indicates the task execution interval. The options include Hours and Days.
Backup Policy	Indicates the volume of data to be backed up in each task execution. Only Full backup every time is supported.
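To see when a periodic task will run, the Started and Period settings can be turned into concrete execution times. The sketch below assumes that Period is an integer count of Hours or Days and that runs fall on fixed multiples of the period after the first start time; how the actual scheduler handles missed or overlapping runs is not described on this page. The function name next_runs is hypothetical.

```python
from datetime import datetime, timedelta

UNITS = {"Hours": timedelta(hours=1), "Days": timedelta(days=1)}


def next_runs(started: datetime, period: int, unit: str,
              now: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` assumed execution times at or after `now`."""
    step = UNITS[unit] * period
    if now <= started:
        first = started
    else:
        # Ceiling division: the number of whole periods needed to reach `now`.
        periods = -(-(now - started) // step)
        first = started + periods * step
    return [first + i * step for i in range(count)]
```

For example, with Started at 02:00 on 1 January 2024 and a Period of 1 day, calling next_runs at 10:00 on 3 January returns 02:00 on 4, 5, and 6 January.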

In Configuration, choose Solr > Solr under Service data.
Set Path Type of Solr to a backup directory type.

The following backup directory types are supported:

RemoteHDFS: indicates that the backup files are stored in the HDFS directory of the standby cluster.
If you select this option, set the following parameters:
- Destination NameService Name: indicates the NameService name of the standby cluster, for example, hacluster. You can obtain it on the NameService Management page of the HDFS service in the standby cluster.
- IP Mode: indicates the mode of the target IP address. The system automatically selects the IP address mode based on the cluster network type, for example, IPv4 or IPv6.
- Destination Active NameNode IP Address: indicates the service plane IP address of the active NameNode in the standby cluster.
- Destination Standby NameNode IP Address: indicates the service plane IP address of the standby NameNode in the standby cluster.
- Destination NameNode RPC Port: indicates the value of dfs.namenode.rpc.port in the HDFS basic configuration of the standby cluster.
- Target Path: indicates the HDFS directory in the standby cluster for storing backup data. The path cannot be a hidden HDFS directory, such as a snapshot or recycle bin directory, or a default system directory, such as /hbase or /user/hbase/backup.
- Maximum Number of Backup Copies: indicates the number of backup file sets that can be retained in the backup directory.
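Several of the causes of a failed verification are simple input mistakes, so the RemoteHDFS values can be sanity-checked before they are entered. This is a minimal, hypothetical sketch: validate_remote_hdfs is not a FusionInsight function, the forbidden directories are only the examples named above (HDFS snapshot directories are called .snapshot and trash directories .Trash), and the requirement that at least one backup copy be kept is an assumption.

```python
import ipaddress
from pathlib import PurePosixPath

# Examples taken from this page only; the Manager's real rule set may be broader.
HIDDEN_DIR_NAMES = {".snapshot", ".Trash"}
SYSTEM_DIRS = [PurePosixPath("/hbase"), PurePosixPath("/user/hbase/backup")]


def validate_remote_hdfs(ip_mode: str, active_ip: str, standby_ip: str,
                         rpc_port: int, target_path: str,
                         max_copies: int) -> list[str]:
    """Return a list of problems found in RemoteHDFS settings (empty if none)."""
    errors = []
    wanted_version = 6 if ip_mode == "IPv6" else 4
    # Both NameNode addresses must parse and match the selected IP Mode.
    for role, ip in (("active", active_ip), ("standby", standby_ip)):
        try:
            if ipaddress.ip_address(ip).version != wanted_version:
                errors.append(f"{role} NameNode IP {ip} does not match {ip_mode}")
        except ValueError:
            errors.append(f"{role} NameNode IP {ip!r} is not a valid address")
    if not 1 <= rpc_port <= 65535:
        errors.append(f"RPC port {rpc_port} is out of range")
    path = PurePosixPath(target_path)
    # Reject any path that passes through a hidden snapshot or trash directory.
    if HIDDEN_DIR_NAMES & set(path.parts):
        errors.append("target path is inside a hidden HDFS directory")
    # Reject the listed system directories and anything beneath them.
    if any(path == d or d in path.parents for d in SYSTEM_DIRS):
        errors.append("target path is a default system directory")
    if max_copies < 1:
        errors.append("maximum number of backup copies must be at least 1")
    return errors
```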
Set Backup Content to one or more collections to be backed up.

You can select backup data using either of the following methods:
- Adding a backup data file
  1. Click Add.
  2. Select the table to be backed up under File Directory, and click Add to add the table to Backup Content.
  3. Click OK.
- Selecting using regular expressions
  1. Click Query Regular Expression.
  2. Enter a slash (/) in the first text box. This root directory is not an actual directory; it represents the internal Solr directory.
  3. Enter a regular expression in the second text box. Standard regular expressions are supported. For example, to get indexes containing solr, enter .*solr.*. To get indexes starting with solr, enter solr.*. To get indexes ending with solr, enter .*solr.
  4. Click Refresh to view the matching tables under Directory Name.
  5. Click Synchronize to save the result.
  - When entering regular expressions, click the add or delete icon next to an expression to add or delete it.
  - If the selected table or directory is incorrect, click Clear Selected Node to deselect it.
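The three example patterns only behave as described (for example, solr.* selecting just the indexes that start with solr) if the expression must match the whole index name. The sketch below assumes that whole-name matching, modeled with Python's re.fullmatch, so you can preview a pattern against a list of index names before entering it; select_indexes and the index names are hypothetical.

```python
import re


def select_indexes(names: list[str], pattern: str) -> list[str]:
    """Return the index names that the pattern matches in full."""
    rx = re.compile(pattern)
    return [name for name in names if rx.fullmatch(name)]


indexes = ["solr_logs", "app_solr", "my_solr_idx", "metrics"]
print(select_indexes(indexes, ".*solr.*"))  # ['solr_logs', 'app_solr', 'my_solr_idx']
print(select_indexes(indexes, "solr.*"))    # ['solr_logs']
print(select_indexes(indexes, ".*solr"))    # ['app_solr']
```

With re.search instead of re.fullmatch, solr.* would also match app_solr and my_solr_idx, which is why whole-name matching is assumed here.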
Click Verify to check whether the backup task is configured correctly.

The possible causes of the verification failure are as follows:
- The destination active or standby NameNode IP address or NameService name is incorrect.
- The index to be backed up does not exist in the cluster.
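A missing index can be spotted before clicking Verify by comparing the selected names with the collections the cluster reports. This sketch assumes you have the JSON response of the standard Solr Collections API LIST action (/solr/admin/collections?action=LIST), which returns the collection names in a collections array; obtaining that response from a secured FusionInsight cluster is outside its scope, and missing_collections is a hypothetical helper.

```python
import json


def missing_collections(list_response: str, selected: list[str]) -> list[str]:
    """Return the selected collections absent from a Collections API LIST response."""
    existing = set(json.loads(list_response).get("collections", []))
    # Preserve the order in which the collections were selected.
    return [name for name in selected if name not in existing]
```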
Click OK.
In the Operation column of the created task in the backup task list, click More and select Back Up Now to execute the backup task.

After the backup task is executed, the system automatically creates a subdirectory for each backup task in the backup directory. The subdirectory is named in the format Backup task name_Data source_Task creation time and stores the latest backup files of the data source. Each time the backup task is executed, a snapshot directory named _Snapshot absolute seconds is created in this subdirectory. When the number of snapshot directories exceeds the value of Maximum Number of Backup Copies, the earliest snapshot directory is automatically deleted.
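The retention rule can be sketched as a function that picks the snapshot directories to remove. Because the exact snapshot directory format is not spelled out beyond ending in absolute seconds, the sketch assumes the absolute-seconds value is the part of the name after the last underscore (for example, a hypothetical name such as snap_1700000100); snapshots_to_delete is illustrative only and does not touch HDFS.

```python
def snapshots_to_delete(snapshot_dirs: list[str], max_copies: int) -> list[str]:
    """Return the oldest snapshot directories that exceed the retention limit."""
    def seconds(name: str) -> int:
        # Assumed format: the absolute seconds follow the last underscore.
        return int(name.rsplit("_", 1)[-1])

    oldest_first = sorted(snapshot_dirs, key=seconds)
    excess = len(oldest_first) - max_copies
    return oldest_first[:excess] if excess > 0 else []
```

Sorting by the parsed number rather than by the raw string keeps the order correct even if the seconds values have different numbers of digits.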