Viewing MRS Cluster Events
Alarms and events are important mechanisms for ensuring the stability, reliability, and performance of MRS clusters.
Events record status changes and system operations that occur while the system is running. They are used to audit or trace system behavior, such as the startup, shutdown, and active/standby switchover of component instances in an MRS cluster, and node-level changes such as slow disk isolation.
For details about common events in an MRS cluster, see Table 2.
Generally, you do not need to handle events manually. However, for events of Major or higher severity, check whether the related components are running properly and whether related alarms exist. For details, see Viewing Alarms of an MRS Cluster.
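The triage rule above can be sketched in a few lines of Python. This is only an illustration: it assumes events are available as plain dictionaries with a `severity` field, and the `needs_review` helper and field names are hypothetical rather than part of any MRS API. The sample records reuse names and severities from Table 2.

```python
# Severities that the guidance above says warrant a manual check.
# Assumption: "Major or higher" means Major and Critical.
REVIEW_SEVERITIES = {"critical", "major"}


def needs_review(event):
    """Return True if the event's severity is Major or higher.

    `event` is a hypothetical dict with a "severity" key; it is not
    an object returned by any MRS API.
    """
    return event.get("severity", "").strip().lower() in REVIEW_SEVERITIES


if __name__ == "__main__":
    # Sample records using IDs, names, and severities from Table 2.
    events = [
        {"id": 12083, "name": "Slow Disk Isolated", "severity": "Major"},
        {"id": 14005, "name": "NameNode Switchover", "severity": "Minor"},
        {"id": 12019, "name": "Stop Service", "severity": "Warning"},
    ]
    for e in events:
        if needs_review(e):
            print(f"Check components and alarms for event {e['id']}: {e['name']}")
```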
Video Tutorial
This tutorial shows how to view cluster alarms and events and how to configure an alarm threshold.
The UI may vary depending on the version. This tutorial is for reference only.
Viewing Cluster Events on the Management Console
- Log in to the MRS console.
- On the Active Clusters page, select a running cluster and click its name to switch to the cluster details page.
- On the Dashboard page, click Synchronize next to IAM User Sync to synchronize IAM users.
- Choose Alarms > Events. On the displayed page, view event information about the cluster.
Figure 1 Viewing MRS cluster events
In the event list, you can check events generated in the current cluster, including the event name, severity, and generation time.
You can click the arrow before an event name to expand the event details.
- To export event information, click Export, select the file format, and export the event file to a local directory.
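Once events are exported, the file can be summarized offline. The sketch below counts events per severity. It assumes the export is a CSV file with a header row and a severity column named `Event Severity`; both are assumptions, so check the header of your own export and pass the matching column name. The `count_by_severity` helper and the `events.csv` file name are hypothetical.

```python
import csv
from collections import Counter


def count_by_severity(csv_file, column="Event Severity"):
    """Count rows per severity in an exported event file.

    Assumptions: the export is CSV with a header row, and the severity
    column is named `column`. Rows with an empty severity are skipped.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader if row.get(column))


# Example usage (the file name is hypothetical):
#
#     with open("events.csv", newline="", encoding="utf-8") as f:
#         print(count_by_severity(f))
```

Passing an open file object rather than a path keeps the helper easy to try on in-memory sample data before pointing it at a real export.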
Viewing Cluster Events on Manager
- Log in to FusionInsight Manager of the MRS cluster.
For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
- Open the Events page. On this page, you can view information about all events in the cluster, including the event name, ID, severity, generation time, source, object, and location. By default, each page displays the latest 10 events.
- For clusters of MRS 2.x or earlier, choose Alarms > Events.
- Click Export All to export all event details.
- You can use the refresh icon to manually refresh the current page and the column settings icon to select the columns to display.
- You can filter events by object or cluster.
- You can click Advanced Search to search for events by event ID, name, severity, start time, or end time.
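The Advanced Search criteria listed above amount to a conjunction of optional filters. The following sketch applies the same criteria to event records held locally, for example after an export. The record layout (`id`, `name`, `severity`, `generated`), the `search_events` helper, the substring matching on names, and the sample timestamps are all illustrative assumptions, not the behavior of FusionInsight Manager.

```python
from datetime import datetime


def search_events(events, event_id=None, name=None, severity=None,
                  start=None, end=None):
    """Filter event records by ID, name, severity, and time range.

    Each record is a hypothetical dict with "id", "name", "severity",
    and "generated" (a datetime) keys. Criteria left as None are ignored.
    """
    matches = []
    for e in events:
        if event_id is not None and e["id"] != event_id:
            continue
        # Case-insensitive substring match (an assumption; the Manager UI
        # may match names differently).
        if name is not None and name.lower() not in e["name"].lower():
            continue
        if severity is not None and e["severity"] != severity:
            continue
        if start is not None and e["generated"] < start:
            continue
        if end is not None and e["generated"] > end:
            continue
        matches.append(e)
    return matches


if __name__ == "__main__":
    # Illustrative records; the timestamps are made up for the example.
    sample = [
        {"id": 12083, "name": "Slow Disk Isolated", "severity": "Major",
         "generated": datetime(2024, 5, 1, 9, 30)},
        {"id": 14005, "name": "NameNode Switchover", "severity": "Minor",
         "generated": datetime(2024, 5, 2, 14, 0)},
    ]
    for e in search_events(sample, severity="Major", start=datetime(2024, 5, 1)):
        print(e["id"], e["name"])
```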
You can click the expand icon on the left of an event to view detailed event parameters. Table 1 describes the parameters.
Table 1 Event parameters
Parameter | Description |
---|---|
Event ID | Event ID. |
Event Name | Event name. |
Event Severity | Event severity. The options are Critical, Major, Minor, and Suggestion. |
Generated | Time when an event is generated. |
Object | Object for which the event is generated. |
Serial Number | Number of events generated by the system. |
Location | Detailed information for locating the event: Source (cluster for which the event is generated), ServiceName (service for which the event is generated), RoleName (role for which the event is generated), and HostName (host for which the event is generated). |
Additional Info | Error information. |
Event Cause | Possible cause of an event. |
Source | Cluster name. |
Common Events of an MRS Cluster
Table 2 Common events of an MRS cluster
Event ID | Component | Event Name | Event Severity |
---|---|---|---|
12019 | Manager | Stop Service | Warning |
12021 | Manager | Stop RoleInstance | Warning |
12023 | Manager | Delete Node | Warning |
12024 | Manager | Restart Service | Warning |
12025 | Manager | Restart RoleInstance | Warning |
12026 | Manager | Manager Switchover | Minor |
12065 | Manager | Restart Process | Minor |
12070 | Manager | Job Running Succeeded | Warning |
12071 | Manager | Job Running Failed | Warning |
12072 | Manager | Job Killed | Warning |
12082 | Manager | Automatic Disk Isolation Stopped | Major |
12083 | Manager | Slow Disk Isolated | Major |
12084 | Manager | Disk Data Balancing Failed | Major |
12085 | Manager | Disk Recovered | Major |
12086 | Manager | Restart Agent | Warning |
12087 | Manager | Cancel Disk Isolation Failed | Major |
12088 | Manager | Disk Isolation Cancelled | Major |
12089 | Manager | Disk Isolation Failed | Major |
12090 | Manager | Disk Node Isolated | Major |
12091 | Manager | Disk Node Isolation Cancelled | Major |
12092 | Manager | Disk Node Instance Started | Major |
12093 | Manager | Disk Node Isolation Failed | Major |
12094 | Manager | Disk Node Instance Start Failed | Major |
12095 | Manager | Cancel Disk Node Isolation Failed | Major |
12096 | Manager | Disk Node Recovered | Major |
12097 | Manager | Abnormal Connection to OMS Node Network | Major |
12152 | Manager | Start Periodic Replication | Minor |
12153 | Manager | Periodic Replication Completed | Minor |
12154 | Manager | Start Streaming Replication | Minor |
12155 | Manager | Restart Streaming Replication | Minor |
12156 | Manager | Stop Streaming Replication | Minor |
12157 | Manager | Skip Periodic Synchronization | Minor |
12158 | Manager | Host Information Lost | Minor |
14005 | HDFS | NameNode Switchover | Minor |
14028 | HDFS | HDFS Disk Balance | Minor |
14029 | HDFS | Active NameNode In Safe Mode and New FSimage Generated | Minor |
17001 | Oozie | Oozie Workflow Execution Failed | Major |
17002 | Oozie | Oozie Scheduled Job Execution Failed | Major |
18001 | Yarn | ResourceManager Switchover | Minor |
18004 | Mapreduce | JobHistoryServer Switchover | Minor |
18029 | Yarn | Jobs Occupied Too Many Storage Resources | Minor |
19001 | HBase | HMaster Failover | Minor |
19027 | HBase | RegionServer-level Hotspot Transfer | Major |
19028 | HBase | Hotspot Region Splitting | Major |
19029 | HBase | Hotspot Region Isolation | Major |
20003 | Hue | Hue Failover | Minor |
23002 | Loader | Loader Failover | Major |
24002 | Flume | Flume Channel Overflow | Major |
25001 | LdapServer | LdapServer Failover | Minor |
27000 | DBService | DBServer Switchover | Minor |
38003 | Kafka | Topic Data Storage Period Changed | Warning |
43014 | Spark | Spark Data Skew | Warning |
43015 | Spark | Spark SQL Ultra-Large Query Results | Warning |
43016 | Spark | Spark SQL Timeout | Warning |
43024 | Spark | Start JDBCServer | Warning |
43025 | Spark | Stop JDBCServer | Warning |
43026 | Spark | ZooKeeper Connection Succeeded | Warning |
43027 | Spark | ZooKeeper Connection Failed | Warning |
43601 | GraphBase | GraphBase Failover | Minor |
45002 | HetuEngine | QAS Failover | Minor |
45597 | IoTDB | Region Replica Supplementation | Warning |
45651 | Flink | FlinkServer Failover | Minor |
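When processing event IDs in scripts, a local lookup table built from the list above can turn an ID into a readable description. The sketch below holds only a small excerpt of the table, and the `COMMON_EVENTS` dictionary and `describe_event` helper are hypothetical names used for illustration, not part of MRS.

```python
# A small excerpt of the common-events table, keyed by event ID.
# Values are (component, event name, severity), copied from the table.
COMMON_EVENTS = {
    12026: ("Manager", "Manager Switchover", "Minor"),
    12083: ("Manager", "Slow Disk Isolated", "Major"),
    14005: ("HDFS", "NameNode Switchover", "Minor"),
    17001: ("Oozie", "Oozie Workflow Execution Failed", "Major"),
    24002: ("Flume", "Flume Channel Overflow", "Major"),
}


def describe_event(event_id):
    """Return a one-line description, or a fallback for unlisted IDs."""
    entry = COMMON_EVENTS.get(event_id)
    if entry is None:
        return f"{event_id}: not in the local excerpt"
    component, name, severity = entry
    return f"{event_id}: {name} ({component}, {severity})"


if __name__ == "__main__":
    print(describe_event(12083))
    print(describe_event(99999))
```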
Helpful Links
- You can configure event notifications to receive notification messages through different subscription endpoints (such as SMS messages and emails). For details, see Configuring Notifications for MRS Cluster Alarms and Events.
- You can view MRS operation logs on the management console. For details, see Viewing MRS Operation Logs.