Updated on 2024-11-13 GMT+08:00

ALM-45004 Tasks Stacked on HetuEngine Compute Instance

This section applies to MRS 3.3.1 or later.

Alarm Description

The system checks the number of running tasks on a HetuEngine compute instance every 30 seconds. This alarm is generated when the number of running tasks is greater than 50.

This alarm is cleared when the number of tasks running on the HetuEngine compute instance is no more than 50.
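
The generation and clearance rule above can be sketched as a simple threshold check. This is only an illustration of the stated rule, not the actual monitoring code: the function names and the sample data are hypothetical, and only the 30-second interval and the threshold of 50 come from this section.

```python
# Values taken from the alarm description above.
THRESHOLD = 50               # alarm is generated when running tasks > 50
CHECK_INTERVAL_SECONDS = 30  # the system checks every 30 seconds


def is_alarm_active(running_tasks: int, threshold: int = THRESHOLD) -> bool:
    """Return True when the running-task count is greater than the threshold."""
    return running_tasks > threshold


def simulate(samples):
    """Replay a list of per-check task counts (hypothetical data) and
    return (elapsed_seconds, event) pairs for each state change."""
    events = []
    active = False
    for i, count in enumerate(samples):
        now_active = is_alarm_active(count)
        if now_active and not active:
            events.append((i * CHECK_INTERVAL_SECONDS, "generated"))
        elif active and not now_active:
            events.append((i * CHECK_INTERVAL_SECONDS, "cleared"))
        active = now_active
    return events


if __name__ == "__main__":
    # Exactly 50 running tasks does not trigger the alarm; 51 does.
    print(simulate([10, 50, 51, 60, 50, 49]))
```

Note that a count of exactly 50 neither generates the alarm nor keeps it active, which matches "greater than 50" and "no more than 50" in the description.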

Alarm Attributes

| Alarm ID | Alarm Severity | Auto Cleared |
| --- | --- | --- |
| 45004 | Major | Yes |

Alarm Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| Location Information | Source | Specifies the cluster for which the alarm was generated. |
| Location Information | ServiceName | Specifies the service for which the alarm was generated. |
| Location Information | RoleName | Specifies the role for which the alarm was generated. |
| Location Information | HostName | Specifies the host for which the alarm was generated. |
| Additional Information | Running Queries Backlog | Specifies the tenant name of the compute instance for which the alarm is generated and how much the threshold is exceeded. |

Impact on the System

The performance of the compute instance deteriorates and the SQL response becomes slow.

Possible Causes

  • The compute instance specification is too small.
  • Large SQL tasks occupy too many compute resources. No resource is available for other tasks, and the compute instance cannot respond quickly. As a result, tasks are stacked.

Handling Procedure

Check whether compute instance resources are properly configured.

  1. Log in to FusionInsight Manager as an administrator who can access the HetuEngine web UI.
  2. Choose O&M > Alarm > Alarms > Tasks Stacked on HetuEngine Compute Instance, check the Additional Information of the alarm, and view and record the tenant name for which the alarm is generated.
  3. Choose Cluster > Services > HetuEngine. In the Basic Information area in the Dashboard tab, click the link next to HSConsole Web UI. The HSConsole page is displayed.
  4. On the Compute Instance page, click Configure in the Operation column of the tenant to which the compute instance belongs. Check whether the resources configured for the compute instance are appropriate. (The minimum resources are used by default. You can adjust the configuration based on site requirements.)

    • If yes, go to 8.
    • If no, go to 5.

  5. Return to the compute instance list, click Stop Instances in the Operation column, and stop instances as prompted.

    Tasks submitted to the stopped compute instances will be interrupted.

  6. Click Configure, add resources to the target compute instance based on the site requirements, and click OK. Click Start Instances and start instances as prompted.
  7. Wait 2 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.

Check whether there are large SQL tasks.

  8. On the Compute Instances page, expand the instances of the tenant and click LINK in the WebUI column of a compute instance to view the status of all tasks.
  9. In the Sort column, select Execution Time to sort the running tasks and check whether there are tasks that have been running for hours.

    • If yes, go to 10.
    • If no, go to 12.

  10. End the tasks that have been running for a long time based on service requirements, and optimize the service SQL statements.
  11. Wait 2 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 12.
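
The check for long-running tasks can also be sketched in code, for example when task data has been copied out of the web UI for review. This is a minimal sketch under stated assumptions: the `Task` record, its field names, and the one-hour cutoff are hypothetical and are not part of any HetuEngine API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record; field names are illustrative only.
    query_id: str
    elapsed_seconds: float


def long_running(tasks, min_hours: float = 1.0):
    """Return tasks that have run for at least min_hours,
    longest-running first (mirrors sorting by execution time)."""
    cutoff = min_hours * 3600
    hits = [t for t in tasks if t.elapsed_seconds >= cutoff]
    return sorted(hits, key=lambda t: t.elapsed_seconds, reverse=True)


if __name__ == "__main__":
    sample = [
        Task("q1", 120),      # 2 minutes
        Task("q2", 7200),     # 2 hours
        Task("q3", 3 * 3600), # 3 hours
    ]
    for task in long_running(sample):
        print(task.query_id, round(task.elapsed_seconds / 3600, 1), "h")
```

Which tasks to end remains a service decision; the sketch only narrows down the candidates.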

Collect fault information.

  12. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  13. Expand the Service drop-down list, select HetuEngine for the target cluster, and click OK.
  14. Expand the Hosts drop-down list. In the Select Host dialog box that is displayed, select the hosts to which the role belongs, and click OK.
  15. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click Download.
  16. Contact O&M engineers and provide the collected logs.
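
The log collection window described above (10 minutes before and after the alarm generation time) can be worked out as follows. This is a small sketch for convenience only: the function name and the sample timestamp are hypothetical, and the values still have to be entered manually in the Start Date and End Date fields.

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after
    the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time.
    start, end = log_window(datetime(2024, 11, 13, 10, 5, 0))
    print("Start Date:", start)
    print("End Date:  ", end)
```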

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.