Creating a HetuEngine Compute Instance
Scenario
This section describes how to create a HetuEngine compute instance. Before you stop a cluster in which compute instances have been created, manually stop the compute instances first. After the cluster is restarted, manually start the compute instances before you use them.
A single tenant can create multiple compute instances to balance loads, improving performance and fault tolerance.
Prerequisites
- You have created a user for accessing the HetuEngine web UI, for example, hetu_user. For details, see Creating a HetuEngine User.
- You have created a tenant in the cluster to be operated. Ensure that the tenant has sufficient memory and CPUs when modifying the HetuEngine compute instance configuration.
- You must use a leaf tenant when you create a HetuEngine compute instance because YARN jobs can only be submitted to the queues of a leaf tenant.
- To avoid uncertainties caused by resource competition, you are advised to create independent resource pools for tenants used by HetuEngine.
- The startup of HetuEngine compute instances depends on Python 3. Ensure that Python 3 has been installed on all nodes in the cluster and the Python soft link has been added to the /usr/bin/ directory. For details, see How Do I Do If an Error Is Reported Indicating that Python Does Not Exist When a Compute Instance Fails to Start?.
- The HetuEngine service is running properly.
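The Python 3 prerequisite can be spot-checked on a node before you create an instance. The following is a minimal sketch, not part of the product: it assumes the soft link is named /usr/bin/python (the prerequisite above only says "the Python soft link" in /usr/bin/), and the function name is_python3 is hypothetical.

```python
import subprocess

# Assumption: the soft link is /usr/bin/python. The prerequisite only
# states that a Python soft link must exist in /usr/bin/.
DEFAULT_LINK = "/usr/bin/python"


def is_python3(interpreter_path=DEFAULT_LINK):
    """Return True if the interpreter at the path runs and reports major version 3."""
    try:
        result = subprocess.run(
            [interpreter_path, "-c", "import sys; print(sys.version_info[0])"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, broken link, not executable, or the process hung.
        return False
    return result.returncode == 0 and result.stdout.strip() == "3"
```

Calling is_python3() on each node returns False when the link is missing or points to a Python 2 interpreter, which are the cases the linked troubleshooting topic covers.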
Procedure
- Log in to FusionInsight Manager as user hetu_user and choose Cluster > Services > HetuEngine.
- In the Basic Information area on the Dashboard page, click the link next to HSConsole WebUI. The HSConsole page is displayed.
- Click Compute Instance then Create Configuration and configure the compute instance parameters.
- Set parameters in the Basic Configuration area. For details about the parameters, see Table 1.
Table 1 Basic configuration

| Parameter | Description | Example Value |
| --- | --- | --- |
| Tenant | Tenant to which the instance belongs. Only tenants without compute instances can be selected for new compute instances. | Select a value from the Tenant drop-down list. |
| Instance Deployment Timeout Period (s) | Timeout interval for starting a compute instance by Yarn service deployment. The system starts timing when the compute instance is started. If the compute instance is still in the Creating or Starting state after the time specified by this parameter expires, the compute instance status is displayed as Error and the compute instance that is being created or started on Yarn is stopped. | 300. The value ranges from 1 to 2147483647. |
| Instance Count | Number of compute instances created under the current tenant. | 1. The value ranges from 1 to 50. |
- Set parameters in the Coordinator Container Resource Configuration area. For details about the parameters, see Table 2.
Table 2 Parameters for configuring Coordinator container resources

| Parameter | Description | Example Value |
| --- | --- | --- |
| Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 5120. The value ranges from 1 to 2147483647. |
| vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 1. The value ranges from 1 to 2147483647. |
| Quantity | Number of containers allocated by Yarn to the compute instance Coordinator | Default value: 2. The value ranges from 1 to 3. |
| JVM | Log in to FusionInsight Manager and choose Cluster > Services > HetuEngine > Configurations. On the All Configurations tab page, search for extraJavaOptions. The value of this parameter in the coordinator.jvm.config parameter file is the value of the JVM parameter. | - |
- Set parameters in the Worker Container Resource Configuration area. For details about the parameters, see Table 3.
Table 3 Parameters for configuring Worker container resources

| Parameter | Description | Example Value |
| --- | --- | --- |
| Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Worker | Default value: 10240. The value ranges from 1 to 2147483647. |
| vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Worker | Default value: 1. The value ranges from 1 to 2147483647. |
| Quantity | Number of containers allocated by Yarn to the compute instance Worker | Default value: 2. The value ranges from 1 to 256. |
| JVM | Log in to FusionInsight Manager and choose Cluster > Services > HetuEngine > Configurations. On the All Configurations tab page, search for extraJavaOptions. The value of this parameter in the worker.jvm.config parameter file is the value of the JVM parameter. | - |
- Set parameters in the Advanced Configuration area. For details about the parameters, see Table 4.
Table 4 Advanced configuration parameters

| Parameter | Description | Example Value |
| --- | --- | --- |
| Ratio of Query Memory | Ratio of the node query memory to the JVM memory. The default value is 0.7. When it is set to 0, the compute function is disabled. In this case, you can start the compute instance only if the value of -Xmx in the JVM configuration is no less than the sum of memory.heap-headroom-per-node and query.max-memory-per-node configured for the coordinator or worker. | 0.7 |
| Scaling | If auto scaling is enabled, you can increase or decrease the number of workers without restarting the instance. However, the instance performance may deteriorate. Auto scaling cannot be enabled in multi-instance mode. For details about the parameters for enabling auto scaling, see Configuring the Number of Worker Nodes. | - |
| Maintenance Instance | Automatic refresh of materialized views requires one compute instance to be set as the maintenance instance. The maintenance instance is globally unique: if there are multiple compute instances, only one of them can be the maintenance instance. | - |
- Configure Custom Configuration parameters. You can add custom parameters to a specified parameter file. Select the specified parameter file from the Parameter File drop-down list.
- You can click Add to add custom configuration parameters.
- You can click Delete to delete custom configuration parameters.
- You can set Parameter File to resource-groups.json to configure the resource group mechanism. Table 5 describes the resource group configuration parameter. For details about how to configure a resource group, see Configuring Resource Groups.
Table 5 Resource group configuration parameter

| Parameter | Description | Example Value |
| --- | --- | --- |
| resourcegroups | Resource management group configuration of the cluster. Select resource-groups.json from the drop-down list of the parameter file. | { "rootGroups": [{ "name": "global", "softMemoryLimit": "100%", "hardConcurrencyLimit": 1000, "maxQueued": 10000, "killPolicy": "no_kill" }], "selectors": [{ "group": "global" }] } |
- Custom parameters can be configured in the coordinator.config.properties, worker.config.properties, log.properties, and resource-groups.json parameter files. If a custom parameter already exists in the specified parameter file, its custom value replaces the existing value. If it does not exist, the custom parameter is added to the specified parameter file.
- killPolicy: After a query is submitted to a worker, if the total memory usage exceeds softMemoryLimit, one of the following policies is used to terminate running queries:
- no_kill (default value): Do not terminate the queries.
- recent_queries: Terminate the queries based on the execution sequence in descending order.
- oldest_queries: Terminate the queries based on the execution sequence.
- finish_percentage_queries: Terminate the queries based on query execution percentage. The query with the smallest percentage of execution will be terminated first.
- high_memory_queries: Terminate the queries based on memory usage. Queries with high memory usage are terminated first to free up more memory with the minimum number of query terminations. If the difference between the memory usage of two queries is less than 10%, the query with slower progress (smaller execution percentage) is terminated. If the difference between the execution percentages of two queries is less than 5%, the query with larger memory usage is terminated.
- Determine whether to start the instance immediately after the configuration is complete.
- If yes, the instance is automatically started immediately after the configuration is complete.
- If no, you need to manually start the instance after the configuration is complete.
- Click OK and wait until the instance configuration is complete.
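The value ranges in Tables 1 to 3, the killPolicy values, and the -Xmx condition that applies when Ratio of Query Memory is 0 can be checked before you enter them on HSConsole. The following is a minimal sketch under stated assumptions: the dictionary keys and function names are hypothetical labels, not HetuEngine parameter names, and the check does not call any HetuEngine API. Only top-level rootGroups entries are inspected.

```python
import json

INT_MAX = 2147483647

# Ranges copied from Tables 1 to 3. The key names are hypothetical labels.
RANGES = {
    "deploy_timeout_s": (1, INT_MAX),
    "instance_count": (1, 50),
    "coordinator_memory_mb": (1, INT_MAX),
    "coordinator_vcores": (1, INT_MAX),
    "coordinator_quantity": (1, 3),
    "worker_memory_mb": (1, INT_MAX),
    "worker_vcores": (1, INT_MAX),
    "worker_quantity": (1, 256),
}

# killPolicy values listed in the procedure above.
KILL_POLICIES = {
    "no_kill",
    "recent_queries",
    "oldest_queries",
    "finish_percentage_queries",
    "high_memory_queries",
}


def check_instance_config(values, resource_groups_json=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, value in values.items():
        if name not in RANGES:
            continue  # Unknown keys are ignored in this sketch.
        low, high = RANGES[name]
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{name}={value!r} is outside {low} to {high}")
    if resource_groups_json is not None:
        # Only top-level groups are checked; nested groups are not visited.
        for group in json.loads(resource_groups_json).get("rootGroups", []):
            policy = group.get("killPolicy", "no_kill")
            if policy not in KILL_POLICIES:
                problems.append(f"unknown killPolicy {policy!r} in group {group.get('name')!r}")
    return problems


def heap_covers_query_memory(xmx_mb, heap_headroom_mb, query_max_memory_mb):
    """Condition from Table 4 for Ratio of Query Memory = 0, with all sizes in MB."""
    return xmx_mb >= heap_headroom_mb + query_max_memory_mb
```

For example, check_instance_config({"worker_quantity": 300}) reports one problem because the Worker Quantity range ends at 256.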
Precautions for Maintaining Compute Instances
- During the restart or rolling restart of the HetuEngine service, do not create, start, stop, or delete HetuEngine compute instances on HSConsole.
- By default, a maximum of 10 compute instances can be in the starting, creating, deleting, stopping, scaling out, scaling in, or rolling restart state at the same time. O&M tasks beyond this limit wait in the background until they can be executed. To change the number of concurrent tasks, log in to FusionInsight Manager, choose Cluster > Services > HetuEngine > Configurations, and click All Configurations. On the displayed page, search for hsbroker.event.task.executor.threads and change its value.
- Precautions for restarting HetuEngine compute instances
- During the restart or rolling restart of HetuEngine compute instances, do not make any changes to HetuEngine data sources or on the HetuEngine web UI, and do not restart HetuEngine or change its configurations.
- If a compute instance has only one coordinator or worker node, do not perform a rolling restart of the instance.
- If the number of worker nodes is greater than 10, the rolling restart of the instance may take more than 200 minutes. During this period, do not perform other O&M operations.
- During the rolling restart of compute instances, HetuEngine releases YARN resources and applies for them again. Ensure that the CPU and memory of YARN are sufficient for starting 20% of the workers and that YARN resources are not preempted by other jobs. Otherwise, the rolling restart will fail.
Viewing YARN resources: Log in to FusionInsight Manager and choose Tenant Resources. On the navigation pane on the left, choose Tenant Resources Management to view the available queue resources of YARN in the Resource Quota area.
Viewing the CPU and memory of a worker container: Log in to FusionInsight Manager as a user who can access the HetuEngine WebUI and choose Cluster > Services > HetuEngine. In the Basic Information area, click the link next to HSConsole WebUI to go to the HSConsole page. Click Operation in the row where the target instance is located and click Configure.
- During the rolling restart, ensure that Application Manager of coordinators or workers in the YARN queue runs stably.
- HetuEngine compute instance restart exception handling
- If Application Manager of coordinators or workers in the YARN queues is restarted during the rolling restart, the compute instances may become abnormal. In this case, stop the compute instances and then start them again to recover.
- After the rolling restart of a compute instance fails, the instance is in the subhealthy state, and the configuration or number of coordinator or worker nodes may be inconsistent. The subhealthy state cannot be recovered automatically. You need to manually check and rectify the fault, and then perform the rolling restart again or stop and then restart the compute instance.
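The 20% resource requirement for a rolling restart can be estimated from the worker container settings. The following is a rough sketch, assuming that the worker count is rounded up to a whole container and that at least one worker is started at a time. Both are assumptions, and the function names are hypothetical. Compare the result with the available queue resources shown in the Resource Quota area.

```python
def rolling_restart_needs(worker_count, worker_memory_mb, worker_vcores, percent=20):
    """Estimate the YARN resources needed to start `percent` of the workers.

    Returns (workers_in_batch, memory_mb, vcores).
    """
    # Integer ceiling avoids float rounding. Rounding up and the minimum
    # of one worker are assumptions made for this sketch.
    batch = max(1, (worker_count * percent + 99) // 100)
    return batch, batch * worker_memory_mb, batch * worker_vcores


def queue_can_roll(free_memory_mb, free_vcores, worker_count, worker_memory_mb, worker_vcores):
    """Return True if the free queue resources cover the estimated batch."""
    _, need_mb, need_vcores = rolling_restart_needs(
        worker_count, worker_memory_mb, worker_vcores
    )
    return free_memory_mb >= need_mb and free_vcores >= need_vcores
```

With 10 workers at the default 10240 MB and 1 vcore each, the estimate is 2 workers, 20480 MB, and 2 vcores.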
Compute Instance Statuses
After a compute instance is created, you can view information about the created instance on the Compute Instance tab page, including the tenant name, number of instances, instance status, and total resources. Instance statuses are as follows:
- Green icon: The instance is in the running or subhealthy state.
- Red icon: The instance is faulty.
- Gray icon: The instance has been stopped and is to be started.
- Blue icon: The instance is in other states, including scaling out, scaling in, rolling restart, creating, starting, safely starting, shutting down, safely shutting down, terminating, terminated, and stopping.
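If you record instance states in your own monitoring scripts, the icon colors above can be expressed as a simple lookup. This is an illustrative sketch only: the state strings and the function name icon_for are hypothetical and are not values returned by HSConsole.

```python
# States with a dedicated color, taken from the list above. The string
# spellings are assumptions made for this sketch.
ICON_BY_STATE = {
    "running": "green",
    "subhealthy": "green",
    "faulty": "red",
    "stopped": "gray",
}


def icon_for(state):
    """Return the icon color for a state; every other state maps to blue."""
    return ICON_BY_STATE.get(state.strip().lower(), "blue")
```

Transitional states such as scaling out or rolling restart fall through to blue, which matches the last item in the list.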