Help Center/ Cloud Eye/ User Guide/ Server Monitoring/ Installing the GPU Metrics Collection Plug-in (Linux)
Updated on 2023-11-08 GMT+08:00

Installing the GPU Metrics Collection Plug-in (Linux)

Scenarios

This topic describes how to install the plug-in to collect GPU and RAID metrics.

  • ECSs support GPU metrics while BMSs do not.
  • BMSs support RAID metrics while ECSs do not.
  • If the Agent is upgraded to 1.0.5 or later, the corresponding plug-in must use the latest version. Otherwise, the metric collection will fail.

Prerequisites

  • The Agent has been installed and is running properly.
  • GPU metric collection requires ECSs to support GPU.
  • Run the following command to check the Agent version:

    if [[ -f /usr/local/uniagent/extension/install/telescope/bin/telescope ]]; then /usr/local/uniagent/extension/install/telescope/bin/telescope -v; elif [[ -f /usr/local/telescope/bin/telescope ]]; then echo "old agent"; else echo 0; fi

    • If old agent is displayed, the early version of the Agent is used.
    • If a version is returned, the new version of the Agent is used.
    • If 0 is returned, the Agent is not installed.

Procedure (New Version)

  1. Log in to an ECS as user root.
    • To monitor the BMS software RAID metrics, log in to a BMS.
    • The examples in the following procedure are based on the GPU plug-in installation. The installation for the software RAID plug-in is similar.
  2. Run the following command to go to the Agent installation path /usr/local/telescope:

    cd /usr/local/uniagent/extension/install/telescope

  3. Run the following command to create the plugins folder:

    mkdir plugins

  4. Run the following command to enter the plugins folder:

    cd plugins

  5. To download the script of the GPU metric collection plug-in, run the following command:

    wget https://telescope-eu-west-101.obs.eu-west-101.myhuaweicloud.eu/gpu_collector

    Table 1 Obtaining the plug-in installation package

    Name

    Download Path

    Linux 64-bit installation package of the GPU metric collection plug-in

    eu-west-101: https://telescope-eu-west-101.obs.eu-west-101.myhuaweicloud.eu/gpu_collector

  6. Run the following command to add the script execution permissions:

    chmod 755 gpu_collector

  7. Run the following command to create the conf.json file, add the configuration content, and configure the plug-in path and metric collection period crontime, which is measured in seconds:

    vi conf.json

    GPU metric plug-in configuration

    { 
       "plugins": [ 
         { 
           "path": "/usr/local/uniagent/extension/install/telescope/plugins/gpu_collector", 
           "crontime": 60 
         }
      ] 
    }

    RAID metric plug-in configuration

    { 
       "plugins": [ 
         { 
           "path": "/usr/local/uniagent/extension/install/telescope/plugins/raid_monitor.sh", 
           "crontime": 60 
         }
      ] 
    }
    • The parameters gpu_collector and raid_monitor.sh indicate the GPU plug-in and RAID plug-in configuration.
    • The collection period of the plug-in is 60 seconds. If the collection period is incorrectly configured, the metric collection will be abnormal.
    • Do not change the plug-in path without permission. Otherwise, the metric collection will be abnormal.
  8. Open the conf_ces.json file in the /usr/local/uniagent/extension/install/telescope/bin directory. Add "EnablePlugin": true to the file to enable the plug-in to collect metric data.
    {    
        "Endpoint": "Region address. Retain the default value.",
        "EnablePlugin": true
    }
  9. Restart the Agent:

    ps -ef | grep telescope | grep -v grep | awk '{print $2}' | xargs kill -9

Procedure (for the Early Version of the Agent)

  1. Log in to an ECS as user root.
    • To monitor the BMS software RAID metrics, log in to a BMS.
    • The examples in the following procedure are based on the GPU plug-in installation. The installation for the software RAID plug-in is similar.
  2. Run the following command to go to the Agent installation path /usr/local/telescope:

    cd /usr/local/telescope

  3. Run the following command to create the plugins folder:

    mkdir plugins

  4. Run the following command to enter the plugins folder:

    cd plugins

  5. To download the script of the GPU metric collection plug-in, run the following command:

    wget https://telescope-eu-west-101.obs.eu-west-101.myhuaweicloud.eu/gpu_collector

    Table 2 Obtaining the plug-in installation package

    Name

    Download Path

    Linux 64-bit installation package of the GPU metric collection plug-in

    eu-west-101: https://telescope-eu-west-101.obs.eu-west-101.myhuaweicloud.eu/gpu_collector

  6. Run the following command to add the script execution permissions:

    chmod 755 gpu_collector

  7. Run the following command to create the conf.json file, add the configuration content, and configure the plug-in path and metric collection period crontime, which is measured in seconds:

    vi conf.json

    GPU metric plug-in configuration

    { 
       "plugins": [ 
         { 
           "path": "/usr/local/telescope/plugins/gpu_collector", 
           "crontime": 60 
         }
      ] 
    }

    RAID metric plug-in configuration

    { 
       "plugins": [ 
         { 
           "path": "/usr/local/telescope/plugins/raid_monitor.sh", 
           "crontime": 60 
         }
      ] 
    }
    • The parameters gpu_collector and raid_monitor.sh indicate the GPU plug-in and RAID plug-in configuration.
    • The collection period of the plug-in is 60 seconds. If the collection period is incorrectly configured, the metric collection will be abnormal.
    • Do not change the plug-in path without permission. Otherwise, the metric collection will be abnormal.
  8. Open the conf_ces.json file in the /usr/local/telescope/bin directory. Add "EnablePlugin": true to the file to enable the plug-in to collect metric data.
    {    
        "Endpoint": "Region address. Retain the default value.",
        "EnablePlugin": true
    }
  9. Run the following command to restart the Agent:

    /usr/local/telescope/telescoped restart