Updated on 2025-05-07 GMT+08:00

Cloud O&M Team

The cloud O&M team is responsible for the routine management and maintenance of cloud infrastructure and ensures its high availability, security, and performance. The team works with application O&M administrators to keep cloud-based services running stably and securely. It also continuously improves O&M efficiency through automation and intelligent O&M technologies. The team usually consists of cloud infrastructure administrators, cloud network administrators, middleware administrators, database administrators, and automation engineers. Table 1 lists their responsibilities and skill requirements.

Table 1 Roles and responsibilities of a cloud O&M team

Role

Responsibility

Skill Requirements

Source

Cloud infrastructure administrator

Responsibilities:
  • Perform routine O&M on infrastructure such as storage, VMs, and operating systems on the cloud platform.
  • Monitor and optimize cloud resource usage to ensure proper resource allocation.
  • Handle VM, storage, and operating system faults to keep the system highly available.
  • Periodically apply system patches and harden system security.

Skill requirements:
  • Be familiar with the VM and cloud storage services of mainstream cloud platforms.
  • Master the management and optimization of Linux and Windows.
  • Be familiar with cloud-native monitoring and O&M tools.
  • Be able to write scripts.
  • Excel in troubleshooting and problem solving.

Source: IT department

Cloud network administrator

Responsibilities:
  • Design, configure, and perform routine O&M on the cloud network architecture to ensure network stability and security.
  • Manage network components such as VPNs, private lines, VPCs, subnets, network ACLs, routes, load balancers, and firewalls.
  • Monitor network performance, rectify network faults, and optimize network latency and bandwidth usage.
  • Ensure network security and defend against threats such as DDoS attacks.

Skill requirements:
  • Be familiar with the cloud platform's network services (such as VPC, VPN, private line, load balancer, and firewall services) and their configuration.
  • Be familiar with network protocols such as TCP/IP, HTTP, DNS, and TLS.
  • Be able to troubleshoot network faults.
  • Be familiar with network security technologies (such as firewall rule configuration and intrusion detection).

Source: IT department

Middleware administrator

Responsibilities:
  • Install, configure, and maintain message queue services (such as Kafka and RabbitMQ), web servers (such as Nginx and Apache), application servers (such as Tomcat and JBoss), and cache services (such as Memcached and Redis).
  • Monitor performance metrics, identify bottlenecks, and improve the performance and efficiency of middleware services.
  • Quickly diagnose and rectify middleware faults to ensure business continuity.

Skill requirements:
  • Master common middleware technologies, such as Kafka, RabbitMQ, Nginx, and Tomcat.
  • Be familiar with deploying and managing middleware services on mainstream cloud platforms.
  • Be familiar with operating systems such as Linux and Windows Server.
  • Understand DevOps concepts and practices.
  • Be able to write scripts.
  • Excel in troubleshooting and problem solving.

Source: IT department

Database administrator

Responsibilities:
  • Deploy, configure, monitor, and maintain cloud databases.
  • Ensure high availability and data security of databases, and periodically perform backup and restoration drills.
  • Optimize database performance and resolve problems such as slow queries and lock waits.
  • Manage database permissions and access control to ensure data compliance.

Skill requirements:
  • Be familiar with the database services and database management services of the cloud platform.
  • Be familiar with managing mainstream databases (such as MySQL and PostgreSQL).
  • Master database performance optimization techniques (such as index optimization and database sharding/partitioning).
  • Have O&M experience in database backup and restoration, primary/secondary replication, and distributed architectures.
  • Be familiar with database security policies and data encryption technologies.

Source: IT department

Automation engineer

Responsibilities:
  • Develop and maintain automated O&M tools to improve O&M efficiency.
  • Implement automatic deployment, monitoring, and scaling of cloud resources.
  • Write scripts or code to automate routine O&M tasks.
  • Promote the adoption of intelligent O&M technologies such as AIOps.

Skill requirements:
  • Be familiar with automation tools (such as Ansible, Terraform, and SaltStack).
  • Master scripting languages (such as Python and Shell) and cloud platform APIs.
  • Understand DevOps and be familiar with CI/CD processes and tools.
  • Understand AIOps technologies.

Source: IT department