Updated on 2024-06-26 GMT+08:00

e-backup (EOM)

Introduction

The e-backup add-on offers cluster backup and restoration. It backs up application data and service data to OBS and provides local and remote data backup.

Constraints

  • Do not add, delete, or modify the cluster during the backup/restore. Otherwise, the backup/restore may fail or become incomplete.
  • If you change the cluster, wait 15 minutes for the cluster to stabilize before performing a backup.
  • When EVS disk snapshots are used for backup, only EVS PVs are supported and the snapshot constraints apply (for example, cross-AZ restoration is not supported). The pricing is the same as that of EVS disk snapshots.
  • When restic is used for backup, data of EVS, SFS, SFS Turbo, and OBS PVs is backed up and uploaded to the OBS backup repository.
  • restic creates a snapshot for the data at the backup time point and uploads the data, which does not affect subsequent data read and write. However, restic does not verify the file content and service consistency. restic restrictions apply.
  • The memory occupied by restic is related to the size of the PV data backed up for the first time. If the data size is greater than 500 GB, you are advised to use the migration methods provided by cloud storage services. If you use this add-on, you can modify the resource quotas of the restic container by referring to the operation guide.
  • You can use Hooks to ensure service data consistency for stateful applications during backup, for example, synchronizing memory data to files.
  • During the restore, you can adjust configurations to adapt to the environment differences before and after the migration.
    • An application can be restored from the original namespace to another specified namespace. However, confirm that the application is not accessed through a fixed Service during the restore.
    • You can change the image address (repo) of the application to another image path. The image name and tag remain unchanged during the restore.
    • You can change the name of the storage class used by the application to a new one. Note that the backend storage resources must be of the same type, for example, from block storage to block storage.
  • Velero and restic constraints apply. For example, during the restore, the ClusterIP of a Service is cleared to better adapt to the differences between the source and target Kubernetes clusters.

Installing the Add-on

  1. Log in to the CCE console. In the navigation pane, choose Add-ons. Locate the e-backup add-on and click Install.
  2. On the Install Add-on page, select the cluster, set parameters, and click Install.

    The following parameter is supported:

    volumeWorkerNum: number of concurrent volume backup jobs. The default value is 3.

Using the Add-on

e-backup uses OBS buckets as the backup storage location. Before backing up data, complete the operations in Preparing Keys and Creating a Storage Location.

Backups can be immediate or periodic. Restores are immediate only.

Preparing Keys

  1. Obtain an access key.

    Log in to the CCE console, move the cursor to the username in the upper right corner, and choose My Credentials. In the navigation pane on the left, choose Access Keys. On the page displayed, click Add Access Key.

  2. Create a key file and encode it as a Base64 string.

    # Create a key file.
    $ vi credential-for-huawei-obs
    HUAWEI_CLOUD_ACCESS_KEY_ID=your_access_key
    HUAWEI_CLOUD_SECRET_ACCESS_KEY=your_secret_key
    
    # Encode the key file using Base64.
    $ base64 -w 0 credential-for-huawei-obs
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXHWOBS

  3. Create a secret.

    Create a secret using the following YAML content:
    apiVersion: v1
    kind: Secret
    metadata:
      labels:
        secret.everest.io/backup: 'true'   # The secret is used by e-backup to access the backup storage location.
      name: secret-secure-opaque
      namespace: velero                  # The value must be velero. The secret must be in the same namespace as e-backup.
    type: cfe/secure-opaque
    data:
      # String obtained after the credential file is Base64-encoded.
      cloud: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXHWOBS
    • The secret must be in the same namespace as e-backup, that is, velero.
    • secret.data stores the secret for accessing OBS. The key must be cloud, and the value is the Base64-encoded string obtained in step 2. The value must be a single line. base64 -w 0 produces output without line breaks, but if you encode the file without this option, manually delete the line breaks before writing the string into secret.data.
    • The secret must be labeled secret.everest.io/backup: 'true', indicating that the secret is used to manage the backup storage location.
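
The key preparation steps above can also be scripted. The following is a minimal shell sketch, not part of the add-on: the function names encode_credential and render_backup_secret are hypothetical, and GNU base64 (for the -w option) is assumed. It encodes the credential file on a single line and prints a Secret manifest with the same fields as the YAML above.

```shell
# Hypothetical helper: encode a credential file as a single-line Base64 string.
# Assumes GNU coreutils base64, where -w 0 disables line wrapping.
encode_credential() {
  base64 -w 0 "$1"
}

# Hypothetical helper: print a Secret manifest for e-backup.
# $1 is the path of the credential file created in step 2.
render_backup_secret() {
  encoded="$(encode_credential "$1")" || return 1
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  labels:
    secret.everest.io/backup: 'true'
  name: secret-secure-opaque
  namespace: velero
type: cfe/secure-opaque
data:
  cloud: ${encoded}
EOF
}
```

For example, render_backup_secret credential-for-huawei-obs | kubectl create -f - would create the secret directly from the key file without copying the encoded string by hand.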

Creating a Storage Location

Create a Kubernetes resource object used by e-backup as the backup storage location to obtain and detect information about the backend OBS.

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: backup-location-001
  namespace: velero            # The object must be in the same namespace as e-backup.
spec:
  config:
    endpoint: obs.{regionname}.myhuaweicloud.com   # OBS endpoint
  credential:
    name: secret-secure-opaque   # Name of the created secret
    key: cloud                   # Key in secret.data
  objectStorage:
    bucket: tools-cce        # OBS bucket name
    prefix: for-backup       # Subpath name
  provider: huawei           # Uses the OBS service.
  • The prefix field is optional, and other fields are mandatory. The value of provider is fixed at huawei.
  • You can obtain the endpoint from Regions and Endpoints. Ensure that all nodes in the cluster can access the endpoint. If the endpoint does not carry a protocol header (http or https), https is used by default.
  • Correctly set name and key in the credential. Otherwise, e-backup cannot access the storage location.

After the creation is complete, wait about 30 seconds for e-backup to check and synchronize the backup storage location, then check whether PHASE is Available. The storage location can be used only when PHASE is Available.

$ kubectl get backupstoragelocations.velero.io backup-location-001 -n velero 
NAME                  PHASE       LAST VALIDATED   AGE   DEFAULT
backup-location-001   Available   23s              23m  

If PHASE is not Available for a long time, view the e-backup logs to locate the fault. After e-backup is installed, a workload named velero is created in the velero namespace, and faults are recorded in the logs of this workload.
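
The availability check can be automated instead of waiting a fixed time. The following is a rough sketch: wait_for_location is a hypothetical function name, and it assumes that the PHASE column shown above is read from the status.phase field of the object.

```shell
# Hypothetical helper: poll a BackupStorageLocation until PHASE is Available.
# $1: location name, $2: max attempts (default 10), $3: seconds between attempts (default 5)
wait_for_location() {
  name="$1"; attempts="${2:-10}"; interval="${3:-5}"
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Assumption: the PHASE column is backed by .status.phase.
    phase="$(kubectl -n velero get backupstoragelocations.velero.io "$name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)" || phase=""
    if [ "$phase" = "Available" ]; then
      echo "Available"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "not Available after $attempts attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

If the function returns a non-zero status, the logs of the velero workload are the next place to look. Assuming the workload is a Deployment, kubectl -n velero logs deployment/velero would print them.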

Immediate Backup

The backup process starts immediately and stops upon completion. This mode is commonly used for cloning and migration.

You can use the Backup manifest below and run kubectl create to create a backup task.

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: backup-01
  namespace: velero
spec:
  includedNamespaces:
  - nginx
  - mysql
  labelSelector:
    matchExpressions:
    - key: direction
      operator: In
      values:
      - back
      - front
    matchLabels:
      app: nginx
      backup: velero
  runMode: Normal
  appData:
    volumes: Restic
  hooks:
    resources:
    - name: hook01
      includedNamespaces:
      - nginx
      labelSelector: {}
      pre:
      - exec:
          command:
          - /bin/sh
          - -c
          - echo hello > hello.txt && echo goodbye > goodbye.txt
          container: container-0
          onError: Fail
          timeout: 30s
      post:
      - exec:
          command:
          - /bin/sh
          - -c
          - echo hello > hello.txt && echo goodbye > goodbye.txt
          container: container-0
          onError: Fail
          timeout: 30s
  storageLocation: backup-location-001
  ttl: 720h0m0s

Parameter description:

  • Backup parameters
    • storageLocation: (mandatory) name of the backup storage location where the data to be backed up is stored.
    • ttl: duration for storing backups in the location, after which the backups are deleted. The value must be in the specified format. h, m, and s indicate hour, minute, and second, respectively. For example, 24h indicates one day, and 3h4m5s indicates three hours, four minutes, and five seconds. The default value is 720h0m0s (30 days).
  • Resource filtering: The following parameters are used as filters. The intersection of these fields, if all configured, is used to filter all resources in the cluster.
    • includedNamespaces and excludedNamespaces: whether to back up resources in certain namespaces. These two parameters conflict with each other. Choose one to configure. By default, all namespaces are selected.
    • labelSelector: backs up resources with specific labels. The working principle is the same as that in Kubernetes.
    • runMode: (mandatory) backup mode. Value options include Normal (backing up applications and data), AppOnly (backing up applications only), DataOnly (backing up data only), and DryRun (not backing up applications and data; for verification only).
  • Service data backup: Service data can be backed up through Everest snapshots (supported only when EVS PVs are used as the data volumes) or restic backups (which back up all data volumes except hostPath ones). The two modes can be used together.
    • appData: PV data backup mode. The value can be Restic or Snapshot (not used by default). The Snapshot mode takes effect only when the storage supports snapshots and the CSI snapshot plugin is deployed in the cluster.
  • hooks: Hooks are commands executed before or after a backup to precisely manage your backups. A hook is similar to the kubectl exec command and applies to pods only.
    • includedNamespaces and excludedNamespaces: whether to execute a hook on pods in certain namespaces. These two parameters conflict with each other. Choose one to configure. By default, all namespaces are selected.
    • labelSelector: executes a hook on pods with certain labels. The working principle is the same as that in Kubernetes.
    • command: command to be executed.
    • container: name of the container on which the command is executed. Defaults to the first container when there are multiple containers in the pod.
    • onError: action to take when the hook fails to be executed. The value can be Continue or Fail. Defaults to Fail. Continue indicates that the subsequent operations go on regardless of hook execution failures. Fail indicates that subsequent operations will not continue upon a hook execution failure.
    • timeout: hook execution timeout, after which the hook fails. Defaults to 30s.

    Hook failures affect only pods. The backup of other objects such as Services is not affected.

    Hooks are not globally available. If the pod to execute a hook on is not selected as the backup object, the hook will not be executed. It can be considered that you further filter the objects to be backed up through includedNamespaces or excludedNamespaces.

All configurable items are described above. The following provides some backup configuration suggestions.

  • Retain backups by day (24 hours).
  • Use includedNamespaces to specify the backup scope because in most cases, applications are deployed in a specific namespace. Use labelSelector to control backup objects more precisely. Before doing so, ensure that all target objects have the corresponding labels. Using includedNamespaces and labelSelector together can satisfy most scenarios.
  • When using Restic to back up service data, if you are not familiar with the opt-in/opt-out modes, you can skip adding annotations to the pods that require volume backup. Instead, set defaultVolumesToRestic to true to back up the service data of the pod volumes. The value false indicates no backups.
  • Use hooks to precisely control your backups. Avoid long-time running tasks. Do not directly operate the file system when running the commands in the hook.
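
Because ttl is written in hours, minutes, and seconds, a retention period planned in days has to be converted. A small sketch of that arithmetic follows; days_to_ttl is a hypothetical helper name, not an e-backup command.

```shell
# Hypothetical helper: convert a retention period in whole days to the ttl format.
days_to_ttl() {
  echo "$(( $1 * 24 ))h0m0s"
}

days_to_ttl 30   # prints: 720h0m0s
```

The value printed for 30 days matches the default ttl described above.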

After the backup is complete, run the following commands to view the backup status (status):

$ kubectl -n velero get backups backup-01 -o yaml | grep  "phase"
  phase: Completed

$ kubectl -n velero get backups backup-01 -o yaml
......
status:
  ......

Backup statuses

  • FailedValidation: The backup manifest is incorrectly configured. Check Backup.Status.ValidationErrors to find the cause.
  • InProgress: The backup is in progress.
  • Completed: The backup is complete and no error occurs.
  • PartiallyFailed: The backup is complete, but an error (such as hook execution error) occurs during the backup of certain objects.
  • Failed: The backup fails, and an error that affects the entire process occurs.
  • Deleting: The backup is being deleted.
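
In a script, these statuses can be turned into exit codes. The following is a minimal sketch under the assumption that the phase is exposed in status.phase, as the grep output above suggests. The function names classify_backup_phase and wait_for_backup are hypothetical.

```shell
# Hypothetical helper: map a backup phase to an exit status.
# 0: Completed, 1: finished with problems, 2: still running or unknown.
classify_backup_phase() {
  case "$1" in
    Completed) return 0 ;;
    PartiallyFailed|Failed|FailedValidation) return 1 ;;
    *) return 2 ;;
  esac
}

# Hypothetical helper: poll a backup until it reaches a final phase.
# $1: backup name, $2: max attempts (default 60), $3: seconds between attempts (default 10)
wait_for_backup() {
  name="$1"; attempts="${2:-60}"; interval="${3:-10}"
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl -n velero get backups "$name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)" || phase=""
    rc=0
    classify_backup_phase "$phase" || rc=$?
    if [ "$rc" -ne 2 ]; then
      echo "$phase"
      return "$rc"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out (last phase: ${phase:-unknown})" >&2
  return 2
}
```

For example, wait_for_backup backup-01 prints the final phase and returns 0 only when the backup is Completed, so the call can gate a later restore step.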

After the initial backup is complete, the backups and restic folders are displayed in the OBS bucket.

Backup logs are stored in an OBS bucket. Assume that the backup name is backup-01. Go to the OBS console, locate the storage location based on the configured bucket name and sub-path name, go to the backups/backup-01 directory, and find the backup-01-logs.gz file. Then, download, decompress, and view the logs.
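
Once the log file has been downloaded, it can be searched without unpacking it to disk. The sketch below assumes only that the file is gzip-compressed text. show_log_errors is a hypothetical helper, and matching on the word "error" is a guess at what is worth reading first, not a documented log format.

```shell
# Hypothetical helper: print lines that mention errors in a downloaded log file.
# $1 is the local path of the .gz file, for example backup-01-logs.gz.
show_log_errors() {
  gunzip -c "$1" | grep -i "error"
}
```

For example, show_log_errors backup-01-logs.gz prints the matching lines, and prints nothing if the log contains no such lines.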

Periodic Backup

Data is backed up periodically as configured. This mode is commonly used for disaster recovery.

You can use the Schedule manifest below and run the kubectl create command to create a schedule. You can label the schedule as required. The labels you add in the manifest will be attached to the backups created by the schedule. After a schedule is created in a cluster, a backup is performed immediately. Then, data is backed up periodically as specified.

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: schedule-backup-001
  namespace: velero
spec:
  schedule: 0 */10 * * *
  template:
    runMode: Normal
    hooks: {}
    includedNamespaces:
    - nginx
    - mysql
    labelSelector:
      matchExpressions:
      - key: direction
        operator: In
        values:
        - back
        - front
      matchLabels:
        app: nginx
        backup: velero
    storageLocation: backup-location-001
    ttl: 720h0m0s

Parameter description:

  • schedule: execution time of periodic backups. The @every format and standard Linux cron expressions are supported.
    • @every NUnit: N is a positive integer, and Unit is s, m, or h (seconds, minutes, or hours). For example, @every 2h30m indicates that the backup is triggered every 2 hours and 30 minutes.
    • Cron expression: The five values stand for minutes, hours, day-of-month, month, and day-of-week, respectively.
  • template: backup manifest, which is the same as spec in Immediate Backup.
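
A step value in a cron field is easy to misread, so it helps to expand it. The sketch below assumes standard cron semantics, where */N selects every N-th value counted from the start of the field's range. expand_step is a hypothetical helper written only for this illustration.

```shell
# Hypothetical helper: expand a cron step field "*/N" over the inclusive range [min, max].
expand_step() {
  step="${1#\*/}"; min="$2"; max="$3"
  v="$min"; out=""
  while [ "$v" -le "$max" ]; do
    out="${out:+$out }$v"
    v=$((v + step))
  done
  echo "$out"
}

# Hour field of the schedule "0 */10 * * *" used in the manifest above.
expand_step '*/10' 0 23   # prints: 0 10 20
```

Under this reading, the example schedule runs at 00:00, 10:00, and 20:00 each day, so the gap between the last run of one day and the first run of the next is 4 hours rather than 10. If a constant interval is required, the @every format may be the better fit.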

Deleting a Backup

When a large amount of backup data has been generated, you can delete backup objects and related objects (such as backups, restores, and schedules) from the cluster and delete the backups from the storage location.

You can use the DeleteBackupRequest manifest below and run the kubectl create command to create a backup deletion request.

apiVersion: velero.io/v1
kind: DeleteBackupRequest
metadata:
  name: backup-001-delete
  namespace: velero
spec:
  backupName: backup-001  # Name of the backup to be deleted.

Query the status.

$ kubectl -n velero get deletebackuprequests backup-001-delete -o yaml | grep " phase"
   phase: InProgress
  • InProgress: The deletion task is in progress.
  • Processed: The deletion task has been processed.
  • The Processed state indicates that e-backup has processed the task but may not have completed it. You can check the errors in the deletebackuprequest.status.errors field. If e-backup correctly and completely processes the deletion task, the DeleteBackupRequest object is also deleted.
  • Do not manually delete the content in the storage location (OBS bucket).
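
The three possible outcomes described above (still in progress, processed with errors, or request object already removed) can be told apart in a script. The following is a sketch only: check_delete_request is a hypothetical name, and it treats a failed lookup as "object removed", which can also be caused by a connectivity or permission problem.

```shell
# Hypothetical helper: report the state of a DeleteBackupRequest.
# Prints "gone" if the object cannot be read, the phase otherwise.
# Returns 1 if the request was processed but recorded errors.
check_delete_request() {
  name="$1"
  phase="$(kubectl -n velero get deletebackuprequests "$name" \
    -o jsonpath='{.status.phase}' 2>/dev/null)" || {
    # Assumption: a failed lookup means the object was deleted after success.
    echo "gone"
    return 0
  }
  errors="$(kubectl -n velero get deletebackuprequests "$name" \
    -o jsonpath='{.status.errors}' 2>/dev/null)" || errors=""
  if [ "$phase" = "Processed" ] && [ -n "$errors" ]; then
    echo "Processed with errors: $errors"
    return 1
  fi
  echo "${phase:-unknown}"
}
```

For example, check_delete_request backup-001-delete can be run repeatedly until it prints gone or reports errors.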

Immediate Restore

Use an immediate backup as the data source and restore data to another namespace or cluster. This mode applies to all scenarios.

You can use the Restore manifest below and run the kubectl create command to create a restore task.

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-01
  namespace: velero
spec:
  backupName: backup-01
  hooks:
    resources:
    - name: restore-hook-1
      includedNamespaces:
      - mysql
      labelSelector: {}
      postHooks:
      - init:
          initContainers:
          - name: restore-hook-init1
            image: alpine:latest
            volumeMounts:
            - mountPath: /restores/pvc1-vm
              name: pvc1-vm
            command:
            - /bin/ash
            - -c
            - echo -n "FOOBARBAZ" >> /restores/pvc1-vm/foobarbaz
          - name: restore-hook-init2
            image: alpine:latest
            volumeMounts:
            - mountPath: /restores/pvc2-vm
              name: pvc2-vm
            command:
            - /bin/ash
            - -c
            - echo -n "DEADFEED" >> /restores/pvc2-vm/deadfeed
      - exec:
          execTimeout: 1m
          waitTimeout: 5m
          onError: Fail
          container: mysql
          command:
          - /bin/bash
          - '-c'
          - 'while ! mysql_isready; do sleep 1; done'
      - exec:
          container: mysql
          waitTimeout: 6m
          execTimeout: 1m
          onError: Continue
          command:
          - /bin/bash
          - '-c'
          - 'mysql < /backup/backup.sql'
  includedNamespaces:
  - nginx
  - mysql
  namespaceMapping:
    nginx: nginx-another
    mysql: mysql-another
  labelSelector: {}
  preserveNodePorts: false
  storageClassMapping:
    disk: csi-disk
    obs: csi-obs
  imageRepositoryMapping:
    quay.io/coreos:  swr.ap-southeast-1.myhuaweicloud.com/everest

Parameter description:

  • Data source

    backupName: (mandatory) immediate backup that is used as the data source.

  • Resource filtering parameters: similar to those in Immediate Backup.
  • Customized processing
    • namespaceMapping: restores the backup data to another namespace. The value is a mapping in the format of Source: Target. The new namespace does not need to exist in the destination cluster.
    • storageClassMapping: changes the storageClassName used by backup resources such as PVs and PVCs. The storageClass types must be the same.
    • imageRepositoryMapping: changes the images field of the backup. It is used for repository mapping, excluding the change of the image name and tag (to prevent the migration and upgrade from being coupled). For example, after you migrate quay.io/coreos/etcd:2.5 to SWR, you can use swr.ap-southeast-1.myhuaweicloud.com/everest/etcd:2.5 in the local image repository. The configuration format is as follows: quay.io/coreos: swr.ap-southeast-1.myhuaweicloud.com/everest
    • preserveNodePorts: If you set this parameter to false, the system preserves only the nodePorts you configure, not those automatically generated by the Service.
  • hooks: You can add init hooks (used to add initContainers to the pod) and exec hooks (used to execute some commands). For details about how to configure an init hook, see the definition of initContainers in Kubernetes. The following describes the overall hook configuration and the parameters of an exec hook.
    • includedNamespaces and excludedNamespaces: whether to execute a hook on pods in certain namespaces. These two parameters conflict with each other. Choose one to configure. By default, all namespaces are selected.
    • labelSelector: executes a hook on pods with certain labels. The working principle is the same as that in Kubernetes.
    • command: command to be executed.
    • container: name of the container on which the command is executed. Defaults to the first container when there are multiple containers in the pod.
    • onError: action to take when the hook fails to be executed. The value can be Continue or Fail. Defaults to Fail. Continue indicates that the subsequent operations go on regardless of hook execution failures. Fail indicates that subsequent operations will not continue upon a hook execution failure.
    • execTimeout: hook execution timeout, after which the hook fails. Defaults to 30s.
    • waitTimeout: timeout period from the time when e-backup prepares to execute the hook to the time when the container starts to execute the hook. If this period is exceeded, the hook fails. The default value is 0s, indicating that there is no timeout limit.

The following provides some restore configuration suggestions.

  • Select a correct data source and ensure that the backup is in the Completed state.
  • Set the parameters related to resource filtering only when necessary.
  • Service data is restored by e-backup based on the selected backup mode. No manual configurations or operations are required.
  • For details about how to use hooks, see the usage suggestions in Immediate Backup. You can skip waitTimeout unless necessary.
  • You are advised to restore what has been backed up to a new namespace to avoid misconfigurations that may disable the restored application.
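
The effect of imageRepositoryMapping on a single image reference can be pictured as a prefix replacement. The sketch below only illustrates that idea with the example from the parameter description. map_image is a hypothetical function, not the add-on's implementation, and it assumes that only a leading repository path followed by a slash is rewritten.

```shell
# Hypothetical illustration: rewrite the repository prefix of an image reference,
# leaving the image name and tag untouched.
# $1: image reference, $2: source repository, $3: target repository
map_image() {
  image="$1"; from="$2"; to="$3"
  case "$image" in
    "$from"/*)
      rest=${image#"$from"/}
      echo "$to/$rest"
      ;;
    *)
      # Images outside the source repository are left as they are.
      echo "$image"
      ;;
  esac
}

map_image quay.io/coreos/etcd:2.5 quay.io/coreos swr.ap-southeast-1.myhuaweicloud.com/everest
# prints: swr.ap-southeast-1.myhuaweicloud.com/everest/etcd:2.5
```

Because the name and tag (etcd:2.5) pass through unchanged, the images must already exist under the target repository with the same name and tag before the restore.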

After the restoration is complete, run the following commands to view the restoration status (status):

$ kubectl -n velero get restores restore-01 -o yaml | grep " phase"
  phase: Completed

$ kubectl -n velero get restores restore-01 -o yaml
......
status:
  ......

Status description

  • FailedValidation: The restore manifest is incorrectly configured. Check Restore.Status.ValidationErrors to find the cause.
  • InProgress: The restore is in progress.
  • Completed: The restore is complete and no error occurs.
  • PartiallyFailed: The restore is complete, but an error (such as hook execution error) occurs during the restore of certain objects.
  • Failed: The restore fails, and an error that affects the entire process occurs.

Check the logs, warnings, and errors generated during the restore.

Assume that the restore name is restore-01. Go to the OBS console, locate the storage location based on the configured bucket name and sub-path name, and go to the restores/restore-01 directory. The following two files exist:

  • restore-01-logs.gz: log file, which can be downloaded, decompressed, and viewed.
  • restore-01-results.gz: restore result file, including warnings and errors.

Change History

Table 1 Release history

Add-on Version | Supported Cluster Version | New Feature
1.2.0 | v1.15, v1.17, v1.19, v1.21 | Supports EulerOS 2.0 (SP5, SP9); supports security hardening; optimizes functions