ALM-13009 ZooKeeper ZNode Capacity Usage Exceeds the Threshold
Alarm Description
The system checks the status of second-level ZNodes in the ZooKeeper data directory every hour. This alarm is generated when the capacity usage of a ZNode exceeds the threshold.
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
13009 | Critical (default threshold: 90%); Major (default threshold: 80%) | Quality of service | ZooKeeper | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm was generated. |
Location Information | ServiceName | Specifies the service for which the alarm was generated. |
Location Information | ServiceDirectory | Specifies the directory for which the alarm was generated. |
Location Information | RoleName | Specifies the role for which the alarm was generated. |
Additional Information | Trigger Condition | Specifies the alarm triggering condition. |
Impact on the System
ZooKeeper cannot provide services for external systems, and the services of upstream components (such as Yarn, Flink, and Spark) that depend on the alarm directory are abnormal.
Possible Causes
- A large volume of data has been written to the ZooKeeper data directory.
- The threshold is improperly defined.
Handling Procedure
Check whether a large volume of data is written to the alarm directory.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Alarm > Alarms. Click the drop-down list in the row containing ALM-13009 ZooKeeper ZNode Capacity Usage Exceeds the Threshold, and find the ZNode for which the alarm is generated in the Location area.
- Choose Cluster > Services > ZooKeeper. On the page that is displayed, click the Resource tab. In the Used Resources (By Second-Level ZNode) area, click By capacity and check whether a large amount of data is written to the top-level ZNode directory.
- Check whether data in the directory can be deleted.
  Deleting data from ZooKeeper is a high-risk operation. Exercise caution when performing this operation.
- Log in to the ZooKeeper client and delete unnecessary data from the directory to which a large amount of data is written.
  - Go to the ZooKeeper client installation directory, for example, /opt/client, and configure environment variables.
    cd /opt/client
    source bigdata_env
  - Run the following command to authenticate the user (skip this step for a cluster in normal mode):
  - Run the following command to log in to the client tool:
    zkCli.sh -server <Service IP address of the node where any ZooKeeper instance resides>:<Client port>
  - Run the following command to delete unnecessary data:
- Log in to FusionInsight Manager and choose Cluster > Services > ZooKeeper. On the page that is displayed, click the Configuration tab and then the All Configurations sub-tab, and search for max.data.size. The value of max.data.size is the maximum capacity quota of the ZooKeeper directory, in bytes. Then search for the GC_OPTS configuration item and check the value of Xmx.
- Compare the values of max.data.size and Xmx*0.65. The threshold is the smaller of the two values multiplied by 80%. To raise the threshold, increase max.data.size or Xmx, whichever currently yields the smaller value.
- Check whether the alarm is cleared.
  - If yes, no further action is required.
  - If no, collect the fault information as described in the following steps.
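The threshold comparison in the procedure above (the smaller of max.data.size and Xmx*0.65, multiplied by 80%) can be checked by hand before changing any configuration. The following is a minimal sketch, not a tool shipped with the product: the function name znode_threshold_bytes is hypothetical, it assumes that awk is available, and it assumes that the Xmx value has already been converted to bytes. The optional third argument is the alarm percentage and defaults to 80.

```shell
#!/bin/sh
# Sketch: estimate the ZNode capacity alarm threshold in bytes.
# The function name and argument order are assumptions made for this example.
#   $1  value of max.data.size, in bytes
#   $2  value of Xmx, converted to bytes (for example, 4G = 4294967296)
#   $3  alarm percentage (optional, defaults to 80)
znode_threshold_bytes() {
    awk -v max="$1" -v xmx="$2" -v pct="${3:-80}" 'BEGIN {
        max = max + 0                              # force numeric comparison
        heap_cap = xmx * 0.65                      # 65% of the configured heap
        base = (max < heap_cap) ? max : heap_cap   # take the smaller value
        printf "%.0f\n", base * pct / 100          # apply the alarm percentage
    }'
}

# Example: max.data.size = 1 GiB, Xmx = 4 GiB.
# Here max.data.size is the smaller value, so the result is 80% of 1 GiB.
znode_threshold_bytes 1073741824 4294967296   # prints 858993459
```

If the result is lower than expected, the smaller of the two inputs is the one that limits the threshold, so that is the value to increase.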
Collect the fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select ZooKeeper for the target cluster.
- Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact O&M engineers and provide the collected logs.
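The Start Date and End Date values for log collection (10 minutes before and after the alarm generation time) can be worked out with a short script. This is a sketch only: the function name log_window is hypothetical, it assumes GNU date (the -d option), and it treats the timestamp as UTC, so adjust it if FusionInsight Manager displays alarm times in a different time zone.

```shell
#!/bin/sh
# Sketch: print the log collection window around an alarm generation time.
# Assumes GNU date; the function name is an assumption made for this example.
#   $1  alarm generation time, for example "2024-05-01 12:00:00" (read as UTC)
log_window() {
    alarm_epoch=$(date -u -d "$1" +%s) || return 1
    margin=600   # 10 minutes, in seconds
    echo "Start Date: $(date -u -d "@$((alarm_epoch - margin))" '+%Y-%m-%d %H:%M:%S')"
    echo "End Date: $(date -u -d "@$((alarm_epoch + margin))" '+%Y-%m-%d %H:%M:%S')"
}

log_window "2024-05-01 12:00:00"
# Start Date: 2024-05-01 11:50:00
# End Date: 2024-05-01 12:10:00
```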
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.