Cluster O&M
Account Maintenance Suggestions
Administrators are advised to check accounts routinely. The checks cover the following items:
- Check whether the accounts of the OS, Manager, and each component are necessary and whether temporary accounts have been deleted.
- Check whether the permissions of the accounts are appropriate. Different administrators have different rights.
- Check and audit the logins and operation records of all types of accounts.
Password Maintenance Suggestions
Accessing the portal requires identity authentication. The complexity and validity period of account passwords must meet your security requirements.
Refer to the following suggestions to maintain passwords:
- Assign dedicated personnel to keep OS passwords.
- Use passwords that meet defined strength requirements, such as a minimum length or a mix of uppercase and lowercase letters.
- Encrypt passwords before transferring them, and do not transfer them via email.
- Encrypt passwords for storage.
- Remind enterprise users to change passwords during system handover.
- Change passwords periodically.
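The strength requirements mentioned in the list above can be sketched as a small check. This is only an illustration: the function name `password_problems` and the default minimum length of 8 are assumptions, not values defined by Manager, and should be replaced with the thresholds your own security policy requires.

```python
from typing import List

# Hypothetical threshold; use the value required by your security policy.
MIN_LENGTH = 8


def password_problems(password: str, min_length: int = MIN_LENGTH) -> List[str]:
    """Return the strength rules the password violates (empty list if none)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    return problems
```

Returning the list of violated rules, rather than a single true or false value, makes it easier to tell a user exactly what to fix.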
Log Maintenance Suggestions
Operation logs help you discover exceptions such as illegal operations and logins by unauthorized users. The system records important operations in logs, which you can use to locate problems.
- Checking Logs Regularly
Check system logs periodically and handle exceptions such as unauthorized operations or logins in a timely manner.
- Backing Up Logs Regularly
Audit logs provided by Manager and clusters record user activity and operation information. You can export audit logs from Manager. If the system holds too many audit logs, you can configure dump parameters to dump them to a specified server so that the cluster nodes keep sufficient disk space.
- Maintenance Owner
Network monitoring engineers and system maintenance engineers
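A periodic log check for suspicious logins could be scripted along the following lines. This is a minimal sketch under stated assumptions: the line layout `timestamp,user,operation,result` is hypothetical and is not the format of the audit logs exported from Manager, and the function name `failed_logins` and the threshold of 3 are illustrative. Adapt the parsing to the columns of your actual exported logs.

```python
from collections import Counter


def failed_logins(lines, threshold=3):
    """Return {user: count} for users with at least `threshold` failed logins.

    Assumes each line looks like 'timestamp,user,operation,result'.
    This layout is hypothetical, not the Manager audit log format.
    """
    counts = Counter()
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        _, user, operation, result = parts
        if operation == "login" and result == "failed":
            counts[user] += 1
    return {user: n for user, n in counts.items() if n >= threshold}
```

Users returned by this function are candidates for manual review, not proof of an attack; a forgotten password produces the same pattern.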
Manager Routine Maintenance
To ensure long-term, stable running of the system, administrators or maintenance engineers need to periodically check the items listed in the following table and rectify any detected faults based on the check results. It is recommended that administrators or engineers record the result of each task and sign off in accordance with enterprise management regulations.
| Routine Maintenance Frequency | Role | Check Item |
|---|---|---|
| Daily | Check the cluster service status. | |
| Daily | Check the cluster host status. | |
| Daily | Check the cluster alarm information. | Check whether alarms were generated for unhandled exceptions on the previous day, including alarms that were automatically cleared. |
| Daily | Check the cluster audit information. | Check whether critical and major operations were performed on the previous day and whether the operations were valid. |
| Daily | Check the cluster backup status. | Check whether OMS, LDAP, DBService, and NameNode were automatically backed up on the previous day. |
| Daily | View the health check result. | Perform a health check on Manager and download the health check report to check whether the current cluster is abnormal. You are advised to enable the automatic health check, export the latest cluster health check result, and repair unhealthy items based on the result. |
| Daily | Check the network communication. | Check the cluster network status and check whether the network communication between nodes is delayed. |
| Daily | Check the storage status. | Check whether the total data storage volume of the cluster increases abruptly. |
| Daily | Check logs. | |
| Weekly | User management | Check whether any user password is about to expire and notify the user to change it. To change the password of a machine-machine user, you need to download the keytab file again. |
| Weekly | Analyze alarms. | Export and analyze alarms generated in a specified period. |
| Weekly | Scan disks. | Check the disk health status. You are advised to use a dedicated disk check tool. |
| Weekly | Collect statistics on storage. | Check in batches whether the disk data of cluster nodes is evenly stored, filter out the disks whose data increases significantly or is insufficient, and check whether the disks are normal. |
| Weekly | Record changes. | Organize and record the operations on cluster configuration parameters and files to provide reference for fault analysis and handling. |
| Monthly | Analyze logs. | |
| Monthly | Diagnose the network. | Analyze the network health status of the cluster. |
| Monthly | Manage hardware. | Check the equipment room environment and clean the devices. |
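
The weekly storage statistics task in the table (checking whether disk data is evenly stored) can be partly automated. The following is a minimal sketch, not a Manager feature: the function names `collect_usage` and `unbalanced_disks`, the example mount points, and the 20-point tolerance are all assumptions chosen for illustration. It relies only on `shutil.disk_usage` from the Python standard library.

```python
import shutil
from statistics import mean


def collect_usage(mount_points):
    """Return {mount_point: used percentage} for the given paths on this node."""
    result = {}
    for path in mount_points:
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # skip pseudo file systems that report no capacity
        result[path] = 100.0 * usage.used / usage.total
    return result


def unbalanced_disks(usage_percent, tolerance=20.0):
    """Return disks whose usage differs from the mean by more than `tolerance` points."""
    if not usage_percent:
        return {}
    average = mean(usage_percent.values())
    return {
        disk: pct
        for disk, pct in usage_percent.items()
        if abs(pct - average) > tolerance
    }
```

For example, `unbalanced_disks(collect_usage(["/data1", "/data2"]))` would report the data disks on one node that stand out from the average, where `/data1` and `/data2` are placeholder mount points. Comparing successive runs of `collect_usage` is one way to spot the abrupt growth mentioned in the daily storage check.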