Updated on 2023-04-28 GMT+08:00

Creating a HetuEngine Compute Instance

Scenario

This section describes how to create a HetuEngine compute instance. Before you stop a cluster in which compute instances have been created, manually stop the compute instances. After the cluster is restarted, manually start the compute instances before you use them.

Prerequisites

  • You have created a user for accessing the HetuEngine web UI, for example, hetu_user. For details, see Creating a HetuEngine User.
  • You have created a tenant in the cluster to be operated. Ensure that the tenant has sufficient memory and CPUs when modifying the HetuEngine compute instance configuration.
    • You must use a leaf tenant when you create a HetuEngine compute instance because YARN jobs can only be submitted to the queues of a leaf tenant.
    • To avoid uncertainties caused by resource competition, you are advised to create independent resource pools for tenants used by HetuEngine.

Procedure

  1. Log in to FusionInsight Manager as user hetu_user and choose Cluster > Services > HetuEngine.
  2. In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI. The HSConsole page is displayed.
  3. Click Create Configuration above the instance list. In the Configure Instance dialog box, set parameters.

    1. Set parameters in the Basic Configuration area. For details about the parameters, see Table 1.
      Table 1 Basic configuration

      | Parameter | Description | Example Value |
      |---|---|---|
      | Resource Queue | Resource queue of the instance. Only one compute instance can be created in a resource queue. | Select a queue from the Resource Queue drop-down list. |
      | Instance Deployment Timeout Period (s) | Timeout interval for starting a compute instance by Yarn service deployment. The system starts timing when the compute instance is started. If the compute instance is still in the Creating or Starting state after the time specified by this parameter expires, the compute instance status is displayed as Error and the compute instance that is being created or started on Yarn is stopped. | 300. The value ranges from 1 to 2147483647. |

    2. Set parameters in the Coordinator Container Resource Configuration area. For details about the parameters, see Table 2.
      Table 2 Parameters for configuring Coordinator container resources

      | Parameter | Description | Example Value |
      |---|---|---|
      | Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 5120. The value ranges from 1 to 2147483647. |
      | vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 1. The value ranges from 1 to 2147483647. |
      | Quantity | Number of containers allocated by Yarn to the compute instance Coordinator | Default value: 2. The value ranges from 1 to 3. |
      | JVM | Log in to FusionInsight Manager and choose Cluster > Services > HetuEngine > Configurations. On the All Configurations tab page, search for extraJavaOptions. The value of this parameter in the coordinator.jvm.config parameter file is the value of the JVM parameter. | - |

    3. Set parameters in the Worker Container Resource Configuration area. For details about the parameters, see Table 3.
      Table 3 Parameters for configuring Worker container resources

      | Parameter | Description | Example Value |
      |---|---|---|
      | Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Worker | Default value: 10240. The value ranges from 1 to 2147483647. |
      | vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Worker | Default value: 1. The value ranges from 1 to 2147483647. |
      | Quantity | Number of containers allocated by Yarn to the compute instance Worker | Default value: 2. The value ranges from 1 to 256. |
      | JVM | Log in to FusionInsight Manager and choose Cluster > Services > HetuEngine > Configurations. On the All Configurations tab page, search for extraJavaOptions. The value of this parameter in the worker.jvm.config parameter file is the value of the JVM parameter. | - |
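
The value ranges listed in Tables 1 to 3 can be checked before the values are entered in the Configure Instance dialog box. The following Python sketch is only an illustration: the names RANGES, DEFAULTS, and validate_instance_config, as well as the key names, are hypothetical and are not part of HetuEngine or HSConsole. Only the bounds and default values come from the tables (the timeout default uses the example value 300).

```python
INT32_MAX = 2147483647  # upper bound quoted in Tables 1 to 3

# (area, field) -> (minimum, maximum); the key names are hypothetical.
RANGES = {
    ("instance", "deployment_timeout_s"): (1, INT32_MAX),
    ("coordinator", "memory_mb"): (1, INT32_MAX),
    ("coordinator", "vcores"): (1, INT32_MAX),
    ("coordinator", "quantity"): (1, 3),
    ("worker", "memory_mb"): (1, INT32_MAX),
    ("worker", "vcores"): (1, INT32_MAX),
    ("worker", "quantity"): (1, 256),
}

# Default or example values from the same tables.
DEFAULTS = {
    ("instance", "deployment_timeout_s"): 300,
    ("coordinator", "memory_mb"): 5120,
    ("coordinator", "vcores"): 1,
    ("coordinator", "quantity"): 2,
    ("worker", "memory_mb"): 10240,
    ("worker", "vcores"): 1,
    ("worker", "quantity"): 2,
}


def validate_instance_config(config):
    """Return a list of error messages; an empty list means every value is in range."""
    errors = []
    for key, (low, high) in RANGES.items():
        value = config.get(key, DEFAULTS[key])
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            errors.append(f"{key[0]}.{key[1]}={value!r} is outside {low}..{high}")
    return errors


if __name__ == "__main__":
    # Four Coordinator containers exceed the documented maximum of 3.
    for message in validate_instance_config({("coordinator", "quantity"): 4}):
        print(message)
```

Fields that are missing from the dictionary fall back to the documented defaults, so an empty dictionary passes the check.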

    4. Set parameters in the Advanced Configuration area. For details about the parameters, see Table 4.
      Table 4 Advanced configuration parameters

      | Parameter | Description | Example Value |
      |---|---|---|
      | Ratio of Query Memory | Ratio of the node query memory to the JVM memory. The default value is 0.7. When it is set to 0, the automatic calculation of query memory is disabled. In this case, you can start the compute instance only if the value of -Xmx in the JVM configuration is no less than the sum of memory.heap-headroom-per-node and query.max-memory-per-node configured for the coordinator or worker. | 0 |
      | Scaling | If dynamic scaling is enabled, you can increase or decrease the number of workers without restarting the instance. However, the instance performance may deteriorate. For details about the parameters for enabling dynamic scaling, see Configuring the Number of Worker Nodes. | OFF |
      | Maintenance Instance | To enable automatic refresh of materialized views, a compute instance that is set as a maintenance instance must exist. | OFF |
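
The two memory rules in the description of Ratio of Query Memory are simple arithmetic and can be sketched as follows. This is a minimal illustration, assuming that all sizes are expressed in MB. The function names are hypothetical, and the numbers in the usage lines are made-up values, not recommendations.

```python
def node_query_memory_mb(jvm_xmx_mb, ratio=0.7):
    """Node query memory implied by the ratio (query memory / JVM memory)."""
    if ratio < 0:
        raise ValueError("ratio must not be negative")
    return jvm_xmx_mb * ratio


def manual_memory_fits(jvm_xmx_mb, heap_headroom_mb, query_max_memory_per_node_mb):
    """Start condition when the ratio is 0.

    -Xmx must be no less than the sum of memory.heap-headroom-per-node
    and query.max-memory-per-node.
    """
    return jvm_xmx_mb >= heap_headroom_mb + query_max_memory_per_node_mb


if __name__ == "__main__":
    # Illustrative values only: an 8192 MB heap with the default ratio.
    print(node_query_memory_mb(8192))
    # Ratio set to 0: check manually configured values against -Xmx.
    print(manual_memory_fits(8192, heap_headroom_mb=2048,
                             query_max_memory_per_node_mb=6144))
```

Because the condition is "no less than", a configuration whose two values add up exactly to -Xmx still passes the check.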

    5. Configure custom parameters. Choose Advanced Configuration > Custom Configuration, select the target parameter file from the Parameter File drop-down list, and add custom parameters to that file.
      • You can click Add to add custom configuration parameters.
      • You can click Delete to delete custom configuration parameters.
      • You can set Parameter File to resource-groups.json to configure the resource group mechanism. Table 5 describes the resource group configuration parameter. For details about how to configure a resource group, see Configuring Resource Groups.
        Table 5 Resource group configuration parameter

        | Parameter | Description |
        |---|---|
        | resourcegroups | Resource management group configuration of the cluster. Select resource-groups.json from the drop-down list of the parameter file. |

        Example value of resourcegroups:

        {
          "rootGroups": [{
            "name": "global",
            "softMemoryLimit": "100%",
            "hardConcurrencyLimit": 1000,
            "maxQueued": 10000,
            "killPolicy": "no_kill"
          }],
          "selectors": [{
            "group": "global"
          }]
        }
      • When you add a custom parameter to the coordinator.config.properties, worker.config.properties, log.properties, or resource-groups.json parameter file: if the parameter already exists in the selected parameter file, the custom value replaces the existing value. If the parameter does not exist in the selected parameter file, it is added to that file.
      • killPolicy: After a query is submitted to a worker, if the total memory usage exceeds softMemoryLimit, one of the following policies is used to terminate running queries:
        • no_kill (default value): Do not terminate the queries.
        • recent_queries: Terminate the queries in reverse order of execution. The most recently started query is terminated first.
        • oldest_queries: Terminate the queries in order of execution. The oldest query is terminated first.
        • finish_percentage_queries: Terminate the queries based on query execution percentage. The query with the smallest percentage of execution will be terminated first.
        • high_memory_queries: Terminate the queries based on memory usage. Queries with high memory usage are terminated first to free up more memory with the minimum number of query terminations. If the difference between the memory usage of two queries is less than 10%, the query with slower progress (smaller execution percentage) is terminated. If the difference between the execution percentages of two queries is also less than 5%, the query with larger memory usage is terminated.
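
The resource-groups.json example value and the killPolicy options can be combined into a small generator that rejects unknown policy names before the text is pasted into the custom configuration. The following Python sketch is an illustration only: build_resource_groups and KILL_POLICIES are hypothetical names, and the defaults simply mirror the example value in Table 5.

```python
import json

# The five policy names listed for killPolicy.
KILL_POLICIES = {
    "no_kill",
    "recent_queries",
    "oldest_queries",
    "finish_percentage_queries",
    "high_memory_queries",
}


def build_resource_groups(kill_policy="no_kill", soft_memory_limit="100%",
                          hard_concurrency_limit=1000, max_queued=10000):
    """Return resource-groups.json text for a single root group named "global"."""
    if kill_policy not in KILL_POLICIES:
        raise ValueError(f"unknown killPolicy: {kill_policy!r}")
    config = {
        "rootGroups": [{
            "name": "global",
            "softMemoryLimit": soft_memory_limit,
            "hardConcurrencyLimit": hard_concurrency_limit,
            "maxQueued": max_queued,
            "killPolicy": kill_policy,
        }],
        # Route every query to the "global" group, as in the example value.
        "selectors": [{"group": "global"}],
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(build_resource_groups(kill_policy="high_memory_queries"))
```

Generating the text with json.dumps guarantees that the value is syntactically valid JSON. It does not check whether HetuEngine accepts a particular combination of limits.
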
    6. Determine whether to start the instance immediately after the configuration is complete.
      • Select Start Now to start the instance immediately after the configuration is complete.
      • Deselect Start Now and manually start the instance after the configuration is complete.

  4. Click OK and wait until the instance configuration is complete.

    • Restarting HetuEngine

      During the restart or rolling restart of the HetuEngine service, do not create, start, stop, or delete HetuEngine compute instances on HSConsole.

    • Performing batch operations on HetuEngine compute instances

      By default, a maximum of 10 compute instances can be in the starting, creating, deleting, stopping, scaling out, scaling in, or rolling restart state at the same time. O&M tasks beyond this limit are queued and executed in the background. To change the number of concurrent tasks, log in to FusionInsight Manager and choose Cluster > Services > HetuEngine > Configurations. On the All Configurations tab page, search for hsbroker.event.task.executor.threads and change its value.

    • Restarting HetuEngine compute instances
      • During the restart or rolling restart of HetuEngine compute instances, do not change data sources on the HetuEngine web UI, and do not perform any change operations on HetuEngine, including restarting HetuEngine and changing its configurations.
      • If a compute instance has only one coordinator or worker node, do not perform a rolling restart of the instance.
      • If the number of worker nodes is greater than 10, the rolling restart of the instance may take more than 200 minutes. During this period, do not perform other O&M operations.
      • During the rolling restart of compute instances, HetuEngine releases YARN resources and applies for them again. Ensure that YARN has sufficient CPU and memory to start 20% of the workers and that YARN resources are not preempted by other jobs. Otherwise, the rolling restart will fail.

        Viewing YARN resources: Log in to FusionInsight Manager and choose Tenant Resources. On the navigation pane on the left, choose Tenant Resources Management to view the available queue resources of YARN in the Resource Quota area.

        Viewing the CPU and memory of a worker container: Log in to FusionInsight Manager as a user who can access the HetuEngine WebUI and choose Cluster > Services > HetuEngine. In the Basic Information area, click the link next to HSConsole WebUI to go to the HSConsole page. Click Operation in the row where the target instance is located and click Configure.

      • During the rolling restart, ensure that Application Manager of coordinators or workers in the YARN queue runs stably.
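
The requirement that YARN can start 20% of the workers during a rolling restart can be estimated from the worker container settings. The following Python sketch rests on assumptions: the function names are hypothetical, rounding the 20% share up to a whole worker is a conservative guess rather than documented behavior, and the default container sizes are the defaults from Table 3.

```python
def rolling_restart_headroom(worker_count, worker_memory_mb=10240,
                             worker_vcores=1, percent=20):
    """Estimate the YARN resources needed to start `percent` of the workers."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Integer ceiling division; rounding up is an assumption of this sketch.
    workers = (worker_count * percent + 99) // 100
    return {
        "workers": workers,
        "memory_mb": workers * worker_memory_mb,
        "vcores": workers * worker_vcores,
    }


def queue_has_headroom(available_memory_mb, available_vcores, worker_count, **kwargs):
    """Compare the estimate with the free queue resources shown in Resource Quota."""
    need = rolling_restart_headroom(worker_count, **kwargs)
    return (available_memory_mb >= need["memory_mb"]
            and available_vcores >= need["vcores"])


if __name__ == "__main__":
    # Ten workers with default containers: two workers' worth of resources.
    print(rolling_restart_headroom(10))
    print(queue_has_headroom(available_memory_mb=16384, available_vcores=4,
                             worker_count=10))
```

The estimate covers worker containers only. It does not account for coordinator containers or for resources taken by other jobs in the same queue.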

      Troubleshooting

      • If the Application Manager of coordinators or workers in the YARN queue is restarted during the rolling restart, the compute instance may become abnormal. In this case, stop the compute instance and then start it again to recover.
      • If the rolling restart of a compute instance fails, the instance enters the subhealthy state. As a result, the configuration or number of coordinator or worker nodes may become inconsistent. The compute instance cannot recover from the subhealthy state automatically. You need to manually check and rectify the fault, perform the rolling restart again, or stop and then start the compute instance.