One-Click Pressure Test for Lite Server Nodes
Scenario
The Lite Server task center provides a one-click pressure test. You can quickly run a pressure test on Lite Server nodes without learning the details of components such as AI Core and HBM. The task tests the bandwidth, compute, and power consumption of Ascend servers under load and helps diagnose faults, providing hardware assurance for high-load scenarios such as AI training and inference. The task can also run concurrently on multiple servers in a batch, which improves efficiency.
Constraints
- Currently, only Ascend Snt9b and Ascend Snt9b23 nodes are supported.
- The NodeTaskHub plug-in is required for the node where the task is to be created. Ensure that the plug-in is installed before task creation. For details, see Managing Lite Server AI Plug-ins.
- Only one pressure test task can run on a node at a time. A task cannot be interrupted once started, so plan task priorities in advance.
- Ensure that no services are running on the target nodes. Running services or commands on a node during the pressure test can cause service interruptions or errors.
- Install the MCU, driver, and firmware of Ascend HDK 23.0.0 or later before starting the pressure test. These are pre-installed on the preconfigured OS. If you use a custom OS, ensure that they have been installed correctly.
- The pressure test requires the Ascend-docker-runtime development kit. This software is pre-installed on the default OS. If you use a custom OS, ensure the software has been installed correctly.
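The constraints above can be checked before a batch of nodes is submitted. The following is a minimal sketch only and is not part of ModelArts: the `Node` record and all of its field names are hypothetical placeholders for information you would collect yourself, and it assumes HDK versions are plain dotted numbers such as 23.0.0. The model names and the plug-in name simply mirror this page.

```python
from dataclasses import dataclass, field

# Values taken from the constraints on this page.
SUPPORTED_MODELS = {"Ascend Snt9b", "Ascend Snt9b23"}
REQUIRED_PLUGIN = "NodeTaskHub"
MIN_HDK = (23, 0, 0)


@dataclass
class Node:
    """Hypothetical inventory record; none of these fields come from a real API."""
    name: str
    model: str
    hdk_version: str
    plugins: set = field(default_factory=set)
    running_task: bool = False      # a pressure test is already running
    docker_runtime: bool = True     # Ascend-docker-runtime is installed


def parse_version(text):
    """Turn '23.0.0' into (23, 0, 0). Raises ValueError for non-numeric parts."""
    parts = tuple(int(p) for p in text.split("."))
    # Pad short versions such as '24.1' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def preflight(node):
    """Return a list of human-readable problems; an empty list means the node passes."""
    problems = []
    if node.model not in SUPPORTED_MODELS:
        problems.append(f"unsupported server model: {node.model}")
    if REQUIRED_PLUGIN not in node.plugins:
        problems.append(f"{REQUIRED_PLUGIN} plug-in is not installed")
    if node.running_task:
        problems.append("another pressure test task is already running")
    try:
        if parse_version(node.hdk_version) < MIN_HDK:
            problems.append(f"Ascend HDK {node.hdk_version} is older than 23.0.0")
    except ValueError:
        problems.append(f"unrecognized HDK version: {node.hdk_version}")
    if not node.docker_runtime:
        problems.append("Ascend-docker-runtime is not installed")
    return problems
```

Calling `preflight(node)` for each candidate node and submitting only those that return an empty list keeps a single failing node from blocking the rest of the batch.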
Procedure
- Log in to the ModelArts console.
- In the navigation pane on the left, choose the Lite Server entry under Resource Management. On the displayed page, click the Task Center tab.
Figure 1 Task center
- Click Create Task in the upper left corner. On the displayed Job Templates page, locate Ascend Stress Testing, and click Create Task.
Figure 2 Task templates
- On the Ascend Stress Testing page, enter the task name and description, set the server model and type, select one or more pressure test cases, confirm the notice, and click Create now.
Table 1 Parameters for creating a task

| Parameter | Description |
| --- | --- |
| Name | Name of the pressure test task. The system fills in a name automatically, and you can change it. |
| Description | Description of the task, which makes the task easier to find later. |
| Server Model | Only Ascend Snt9b and Ascend Snt9b23 are supported. |
| Type | Select Single node or Integrated rack, or search for a specific node by keyword. |
| Test Case | Select one or more of the pressure test cases listed after this table. The selected cases can be executed one by one or at the same time. |

Test cases:
- AI Core Stress Test: Stresses the AI Cores to detect errors and help diagnose issues. The test uses 20 to 40 GB of memory on the host server. Before you start, make sure that enough memory is available, or the test may fail.
- HBM Stress Test: Stresses the high-bandwidth memory (HBM) and reports the results.
- P2P Stress Test: Checks for faults on the HCCS communication links between all devices on the test node.
- View the task execution status in the Task Center tab.
Figure 3 Task execution status
- Click the task name to access its details page, where you can view the task details.
Figure 4 Task details
- On the task details page, locate the target node and click View Logs in the Operation column. In the displayed window on the right, view the detailed log about task execution.
Figure 5 Viewing logs
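Because the AI Core Stress Test uses 20 to 40 GB of host memory and may fail without it, it can help to check available memory on a node before creating the task. The sketch below is an illustration rather than a ModelArts tool: it parses Linux `/proc/meminfo` text, and the function names and the 40 GiB default threshold (the upper end of the range quoted on this page, treated as GiB to stay conservative) are assumptions.

```python
REQUIRED_GIB = 40  # assumed threshold: upper end of the 20 to 40 GB range


def mem_available_gib(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


def enough_memory_for_ai_core(meminfo_text, required_gib=REQUIRED_GIB):
    """True if the host has at least `required_gib` GiB of available memory."""
    return mem_available_gib(meminfo_text) >= required_gib
```

On the node itself, the text can be read with `open("/proc/meminfo").read()` and passed to `enough_memory_for_ai_core`. Keeping the parser separate from the file read makes it easy to run the same check on output collected from many nodes.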