ALM-43029 JDBCServer Job Submission Timed Out
This section applies only to MRS 3.5.0 or later.
Alarm Description
After a user submits a JDBC job, the system attempts to create a JDBCServer process and establish a session connection. This alarm is generated if the preset threshold is exceeded before the connection is established. Two configuration parameters affect alarm triggering:
- spark.thriftserver.proxy.create.session.monitor.enabled: whether to enable the alarm function. The cluster default value is true.
- spark.thriftserver.proxy.create.session.timeout.threshold: maximum time allowed for submitting a JDBC job, in seconds. This alarm is reported when the system detects that the job has not started after the threshold is exceeded. The default value is 180 seconds.
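For reference only, a minimal sketch of how these two parameters might appear in the JDBCServer's spark-defaults.conf is shown below, using the default values described above. The file excerpt and the plain-number value format are assumptions; on an MRS cluster these settings are normally adjusted through the Spark service configuration on FusionInsight Manager rather than by editing the file directly.

```
# Hypothetical spark-defaults.conf excerpt (values are the documented defaults).
# Whether the JDBC job submission timeout alarm is enabled.
spark.thriftserver.proxy.create.session.monitor.enabled true
# Maximum time allowed for submitting a JDBC job, in seconds (assumed plain-number format).
spark.thriftserver.proxy.create.session.timeout.threshold 180
```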
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared
---|---|---
43029 | Major | No
Alarm Parameters
Type | Parameter | Description
---|---|---
Location Information | Source | Specifies the cluster for which the alarm was generated.
Location Information | ServiceName | Specifies the service for which the alarm was generated.
Location Information | RoleName | Specifies the role for which the alarm was generated.
Location Information | HostName | Specifies the host for which the alarm was generated.
Additional Information | User_Queue | Name of the user who submits the job and the queue for which the alarm is generated.
Impact on the System
JDBC job submission takes longer due to high system load, which may also reduce job execution efficiency. Because the detection is asynchronous, the job can still start properly after this alarm is reported.
Possible Causes
The JDBCServer on the node is overloaded, and the cluster is unhealthy, as reflected by system metrics and job execution status.
Handling Procedure
Check the JDBCServer instance for which the alarm is generated.
1. On FusionInsight Manager, choose O&M > Alarm > Alarms and select the alarm whose ID is 43029. Check the role name and the IP address of the host where the alarm is generated in Location. Check the username and queue in Additional Information.
Re-execute affected JDBCServer jobs.
2. Choose Cluster > Services > Yarn > ResourceManager (Active) to log in to the YARN web UI. Find the corresponding application based on the username and queue name in Additional Information and check whether the job submission is affected based on the driver log and Spark UI (see the YARN client sketch after these steps for a programmatic alternative). Confirm and record the affected jobs so that they can be executed again.
3. On FusionInsight Manager, choose Cluster > Services > Spark > Instances, click the JDBCServer for which this alarm is generated, and choose More > Restart Instance.
4. Choose O&M > Alarm > Alarms, search for the reported alarm, and click Clear in the Operation column.
5. Execute the affected job again (for example, over JDBC, as in the JDBC sketch after these steps) and check whether the alarm is triggered again for the job.
   - If no, no further action is required.
   - If yes, go to 6.
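For step 2, if you prefer to list candidate applications programmatically instead of browsing the YARN web UI, the following minimal Java sketch filters applications by the user and queue reported in Additional Information. The user and queue values, and the assumption that the cluster's YARN client configuration (yarn-site.xml) is on the classpath, are placeholders for illustration; this is not part of the documented procedure.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class FindAffectedApps {
    public static void main(String[] args) throws Exception {
        // Hypothetical values: replace with the user and queue from the alarm's
        // Additional Information (User_Queue).
        String user = "spark_user";
        String queue = "default";

        YarnClient yarnClient = YarnClient.createYarnClient();
        // Picks up yarn-site.xml from the classpath (assumed to point at this cluster).
        yarnClient.init(new Configuration());
        yarnClient.start();
        try {
            for (ApplicationReport app : yarnClient.getApplications()) {
                if (user.equals(app.getUser()) && queue.equals(app.getQueue())) {
                    System.out.printf("%s  %s  %s%n",
                            app.getApplicationId(), app.getName(),
                            app.getYarnApplicationState());
                }
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```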
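For step 5, a minimal sketch of re-submitting a job over JDBC from a client host is shown below, assuming the Hive JDBC driver is on the classpath. The connection URL, credentials, and SQL statement are placeholders; a secured (Kerberos) cluster would additionally need the cluster's authentication parameters in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResubmitJdbcJob {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL: replace host and port with your cluster's JDBCServer
        // address; a secured cluster also needs its authentication parameters.
        String url = "jdbc:hive2://<jdbcserver-host>:<port>/default";

        try (Connection conn = DriverManager.getConnection(url, "spark_user", "");
             Statement stmt = conn.createStatement();
             // Replace with the SQL of the affected job recorded in step 2.
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```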
Collect fault information.
6. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
7. Expand the Service drop-down list, and select Spark for the target cluster.
8. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and after the alarm generation time, respectively. Then, click Download.
9. Contact O&M engineers and provide the collected logs.
Alarm Clearance
This alarm needs to be cleared manually.
Related Information
None.