Routine Maintenance
To keep the system running properly and stably over the long term, MRS cluster administrators or maintenance engineers need to periodically check the items listed in Table 1 and rectify faults based on the check results. System administrators are advised to record the result of each task and sign off in accordance with enterprise management regulations.
| Routine Maintenance Period | Task | Routine Maintenance Content |
|---|---|---|
| Every day | Checking the cluster service status | |
| | Checking the cluster host status | |
| | Checking the cluster alarm information | Check whether any alarms were generated on the previous day, including those that were cleared automatically. |
| | Checking the cluster audit information | Check whether any Critical or Major operations were performed on the previous day and whether those operations were valid. |
| | Checking the cluster backup | Check whether OMS, LDAP, DBService, and NameNode were automatically backed up on the previous day. |
| | Checking the health check results | Perform a health check on FusionInsight Manager and download the health check report to see whether the cluster has any exceptions. You are advised to enable the automatic health check, export the latest cluster health check result, and repair unhealthy items based on the result. |
| | Checking the network communication | Check the running status of the cluster network and whether communication between nodes is delayed. |
| | Checking the storage status | Check whether the total amount of data stored in the cluster has increased suddenly. |
| | Checking logs | |
| Every week | Managing users | Check whether any user passwords are about to expire and notify the users to change them. After changing the password of a Machine-Machine user, download the keytab file again. |
| | Analyzing alarms | Export the alarms generated in a specified period and analyze them. |
| | Scanning disks | Check the disk health status. You are advised to use professional disk health check tools. |
| | Collecting statistics of storage | Check the disk data on cluster nodes in batches to see whether data is evenly distributed. Identify disks that hold too much or too little data and check whether they are normal. |
| | Recording changes | Organize and record operations on cluster configuration parameters and files to provide a reference for fault analysis and rectification. |
| Every month | Analyzing logs | |
| | Diagnosing the network | Analyze the cluster network health status. |
| | Managing hardware | Check the equipment rooms where the devices are running and clean the devices. |
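
The two storage checks in the table (a sudden increase in total cluster data, and unevenly filled disks) can be partly automated once the figures have been collected. The following is a minimal Python sketch of one possible approach, not an MRS or FusionInsight Manager feature. The function names, the 20% growth threshold, the 0.25 deviation tolerance, and the disk labels are all assumptions chosen for illustration; the input numbers are expected to come from whatever monitoring export is already in use.

```python
from statistics import median


def sudden_increase(daily_totals_gb, threshold=0.2):
    """Return True if the latest daily total grew by more than `threshold`
    (a fraction, 0.2 = 20%) compared with the previous day.

    `daily_totals_gb` is a list of total cluster storage figures, oldest first.
    The threshold is an illustrative assumption, not a documented limit.
    """
    if len(daily_totals_gb) < 2:
        return False  # not enough history to compare
    previous, latest = daily_totals_gb[-2], daily_totals_gb[-1]
    if previous <= 0:
        return latest > 0
    return (latest - previous) / previous > threshold


def uneven_disks(usage_by_disk, tolerance=0.25):
    """Return the disks whose used fraction (0.0 to 1.0) differs from the
    median used fraction by more than `tolerance`.

    Disks in the result hold noticeably more or less data than the others
    and are candidates for a closer manual check.
    """
    if not usage_by_disk:
        return {}
    mid = median(usage_by_disk.values())
    return {
        disk: used
        for disk, used in usage_by_disk.items()
        if abs(used - mid) > tolerance
    }


if __name__ == "__main__":
    # Hypothetical sample data; real values would come from monitoring exports.
    totals = [1000.0, 1010.0, 1300.0]
    usage = {
        "node1:disk1": 0.52,
        "node1:disk2": 0.55,
        "node2:disk1": 0.91,
        "node2:disk2": 0.12,
    }
    print("Sudden increase:", sudden_increase(totals))
    print("Disks to inspect:", sorted(uneven_disks(usage)))
```

Comparing against the median rather than the mean keeps a single very full or very empty disk from shifting the baseline. Both thresholds should be tuned to the cluster's normal growth pattern before the results are relied on.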