Updated on 2022-12-16 GMT+08:00

Backup and Restoration

The following figure shows the process of physical fine-grained backup and restoration:

Snapshot

A snapshot is a complete backup that records point-in-time configuration data and service data of a GaussDB(DWS) cluster. A snapshot can be used to restore a cluster. Snapshots are stored on OBS.

  • GaussDB(DWS) provides a certain amount of free storage space for snapshot data. Snapshot storage that exceeds the free space is billed on a pay-per-use basis.
  • The free space equals the total storage space of the cluster (storage space of a single node × number of nodes).
  • The snapshot management function depends on OBS.
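The free-space rule above reduces to simple arithmetic. The following minimal sketch computes the free snapshot quota and the billable excess, assuming sizes are expressed in GB and that only usage above the quota is billed; the function names and the example cluster (3 nodes with 160 GB each) are hypothetical and do not come from any GaussDB(DWS) API.

```python
def free_snapshot_quota_gb(per_node_storage_gb: float, node_count: int) -> float:
    """Free snapshot space equals the cluster's total storage space."""
    return float(per_node_storage_gb * node_count)


def billable_snapshot_gb(snapshot_usage_gb: float,
                         per_node_storage_gb: float,
                         node_count: int) -> float:
    """Only snapshot usage above the free quota is billed (pay-per-use)."""
    free = free_snapshot_quota_gb(per_node_storage_gb, node_count)
    return max(0.0, float(snapshot_usage_gb - free))


if __name__ == "__main__":
    # Hypothetical cluster: 3 nodes x 160 GB = 480 GB of free snapshot space.
    print(free_snapshot_quota_gb(160, 3))     # 480.0
    print(billable_snapshot_gb(600, 160, 3))  # 120.0 (600 GB used, 480 GB free)
    print(billable_snapshot_gb(300, 160, 3))  # 0.0 (usage fits in the free space)
```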

A snapshot contains the data of the databases running in the cluster and the cluster information, including the number of nodes, node flavors, and administrator names. To restore a cluster from a snapshot, GaussDB(DWS) uses the cluster information to create a new cluster and then restores all databases from the snapshot. The new cluster has the same configuration as the original cluster, including the number and flavor of nodes. When you restore a cluster from a snapshot, parameter values default to those recorded in the snapshot unless you specify different values.
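The parameter-resolution rule can be pictured as "snapshot values first, then explicit overrides." The sketch below is a hypothetical illustration, not the GaussDB(DWS) restore API: the parameter keys, the flavor name, and the decision to reject overrides of node count and flavor (based on the statement that the new cluster keeps the original node number and flavor) are all assumptions.

```python
from typing import Any, Dict, Optional

# Assumption: node count and flavor are carried over from the snapshot
# and cannot be overridden during restoration.
FIXED_KEYS = frozenset({"node_count", "node_flavor"})


def resolve_restore_params(snapshot_params: Dict[str, Any],
                           overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from the values recorded in the snapshot and apply only the
    parameters the user explicitly specified."""
    overrides = overrides or {}
    fixed = FIXED_KEYS.intersection(overrides)
    if fixed:
        raise ValueError(f"cannot override {sorted(fixed)} when restoring from a snapshot")
    resolved = dict(snapshot_params)  # copy so the snapshot record itself is not modified
    resolved.update(overrides)
    return resolved


if __name__ == "__main__":
    snapshot = {"node_count": 3, "node_flavor": "flavor-a",
                "cluster_name": "dws-prod", "port": 8000}
    print(resolve_restore_params(snapshot, {"cluster_name": "dws-restored"}))
    # {'node_count': 3, 'node_flavor': 'flavor-a', 'cluster_name': 'dws-restored', 'port': 8000}
```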

There are two types of snapshots: automated and manual.

Backup and Restoration Policies of Automated Snapshots

Automated snapshots use differential incremental backup. The first automated snapshot is a full backup (the base version). After that, the system creates a full backup at a specified interval and creates incremental backups between consecutive full backups. Each incremental backup records the changes made since the previous backup. When restoring from a snapshot, GaussDB(DWS) uses the most recent full backup preceding that snapshot together with all incremental backups from that full backup up to the selected snapshot. Therefore, no data is lost.

To ensure that every incremental snapshot remains usable for restoration, GaussDB(DWS) does not immediately delete a snapshot whose retention period has expired, because later incremental snapshots may still depend on it. GaussDB(DWS) deletes the previous full automated snapshot and its related incremental snapshots only after a new full snapshot has been created. If you disable the automated snapshot function for an existing cluster, all of its automated snapshots are deleted; manual snapshots are not deleted.
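The restore-chain and retention rules above can be sketched as follows. This is an illustrative model, not GaussDB(DWS) internals: the `Snapshot` record, integer timestamps, and the assumption that an old chain is deleted only when a newer full snapshot exists and every member of the chain has passed the retention period are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    snap_id: str
    kind: str      # "full" or "incr" (hypothetical labels)
    created: int   # creation time, e.g. epoch seconds


def restore_chain(snapshots: List[Snapshot], target_id: str) -> List[Snapshot]:
    """Backups needed to restore target: the latest full backup at or before
    the target, plus every incremental between that full backup and the target."""
    ordered = sorted(snapshots, key=lambda s: s.created)
    end = [s.snap_id for s in ordered].index(target_id)  # ValueError if unknown
    start = end
    while ordered[start].kind != "full":
        start -= 1
        if start < 0:
            raise ValueError("no full backup precedes the target snapshot")
    return ordered[start:end + 1]


def split_chains(snapshots: List[Snapshot]) -> List[List[Snapshot]]:
    """Group snapshots into chains, each starting with a full backup."""
    chains: List[List[Snapshot]] = []
    for s in sorted(snapshots, key=lambda s: s.created):
        if s.kind == "full":
            chains.append([s])
        elif chains:
            chains[-1].append(s)
        # An incremental with no preceding full backup is ignored in this sketch.
    return chains


def deletable(snapshots: List[Snapshot], now: int, retention: int) -> List[Snapshot]:
    """Expired snapshots are removed only as a whole chain, and only when a
    newer full snapshot exists (assumed rule, see lead-in)."""
    result: List[Snapshot] = []
    for chain in split_chains(snapshots)[:-1]:  # the newest chain is always kept
        if all(now - s.created > retention for s in chain):
            result.extend(chain)
    return result


if __name__ == "__main__":
    snaps = [Snapshot("F1", "full", 0), Snapshot("I1", "incr", 10),
             Snapshot("I2", "incr", 20), Snapshot("F2", "full", 30),
             Snapshot("I3", "incr", 40)]
    print([s.snap_id for s in restore_chain(snaps, "I2")])  # ['F1', 'I1', 'I2']
    print([s.snap_id for s in deletable(snaps, now=100, retention=50)])  # ['F1', 'I1', 'I2']
```

Because `deletable` never considers the newest chain, expired snapshots in a cluster that has only one full backup are kept, matching the rule that deletion waits for a new full snapshot.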

Ecosystem Interconnection

Storage media for database backup include NetBackup, EISOO, A8000, OBS, and local disks. Local disk storage competes with database data for disk space, so remote storage media such as NetBackup, EISOO, and A8000 are used to manage backup data. These storage services also perform operations such as data deduplication and report backup space usage.

OBS is an easy-to-use data storage service provided by Huawei. During backup, the backup process sends data directly to OBS for storage, so you do not need to set up an intermediate client on a local backup server.

Mainstream backup software uses the architecture shown in the following figure. Each node has a backup client. When a backup task starts, the task is delivered to each client. Each client creates an eefproc process, which invokes the database backup command. The generated backup data is written to a pipe, and the eefproc process reads the data from the pipe and sends it to the backup server.
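The pipe-based data path can be illustrated with a small sketch: a child process writes backup data to a pipe, and a reader forwards it chunk by chunk to a destination without staging the whole backup on local disk. The function `stream_backup`, the `send` callback (standing in for the transfer to the backup server), the chunk size, and the fake backup command are hypothetical; this is not the implementation of eefproc or of any vendor client.

```python
import subprocess
import sys
from typing import Callable, List

CHUNK_SIZE = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def stream_backup(backup_cmd: List[str], send: Callable[[bytes], None]) -> int:
    """Run backup_cmd with stdout connected to a pipe and forward each chunk
    to send() as soon as it is read. Returns the command's exit code."""
    with subprocess.Popen(backup_cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            chunk = proc.stdout.read(CHUNK_SIZE)
            if not chunk:  # EOF: the backup command closed its end of the pipe
                break
            send(chunk)
    # Leaving the with-block waits for the child, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    received: List[bytes] = []
    # Stand-in for the database backup command: a child process writing bytes to stdout.
    fake_backup = [sys.executable, "-c",
                   "import sys; sys.stdout.buffer.write(b'x' * 200000)"]
    rc = stream_backup(fake_backup, received.append)
    print(rc, sum(len(c) for c in received))  # 0 200000
```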

For backup and restoration ecosystem interconnection, GaussDB(DWS) uses the standard X/Open Backup Services API (XBSA) interface for backup. Backup vendors implement the XBSA protocol and use the non-intrusive Roach client provided by GaussDB(DWS) to connect to the GaussDB(DWS) backup storage service.
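The value of a standard interface is that the database-side client codes against one contract while each vendor supplies its own storage implementation. The sketch below shows that decoupling with a Python abstract class loosely modeled on the XBSA flow (transaction, object, data). All class and method names are hypothetical; they are neither the XBSA C API nor the Roach client's code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional


class BackupStorage(ABC):
    """Hypothetical vendor-side contract, loosely modeled on the XBSA
    transaction -> object -> data flow. Not the real XBSA API."""

    @abstractmethod
    def begin_txn(self) -> None: ...

    @abstractmethod
    def create_object(self, name: str) -> None: ...

    @abstractmethod
    def send_data(self, chunk: bytes) -> None: ...

    @abstractmethod
    def end_data(self) -> None: ...

    @abstractmethod
    def end_txn(self, commit: bool) -> None: ...

    @abstractmethod
    def get_object(self, name: str) -> bytes: ...


class InMemoryStorage(BackupStorage):
    """Toy 'vendor' implementation: objects become visible only on commit."""

    def __init__(self) -> None:
        self._committed: Dict[str, bytes] = {}
        self._pending: Dict[str, bytes] = {}
        self._current: Optional[str] = None
        self._buf: List[bytes] = []

    def begin_txn(self) -> None:
        self._pending = {}

    def create_object(self, name: str) -> None:
        self._current, self._buf = name, []

    def send_data(self, chunk: bytes) -> None:
        self._buf.append(chunk)

    def end_data(self) -> None:
        assert self._current is not None, "end_data called without create_object"
        self._pending[self._current] = b"".join(self._buf)
        self._current = None

    def end_txn(self, commit: bool) -> None:
        if commit:
            self._committed.update(self._pending)
        self._pending = {}

    def get_object(self, name: str) -> bytes:
        return self._committed[name]


def backup_objects(storage: BackupStorage, objects: Dict[str, Iterable[bytes]]) -> None:
    """Database-side client (hypothetical): uses only the abstract contract,
    so any vendor implementation can be plugged in unchanged."""
    storage.begin_txn()
    try:
        for name, chunks in objects.items():
            storage.create_object(name)
            for chunk in chunks:
                storage.send_data(chunk)
            storage.end_data()
    except Exception:
        storage.end_txn(commit=False)  # roll back partially sent objects
        raise
    storage.end_txn(commit=True)


if __name__ == "__main__":
    store = InMemoryStorage()
    backup_objects(store, {"dn1/base": [b"abc", b"def"]})
    print(store.get_object("dn1/base"))  # b'abcdef'
```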

Figure 1 Backup and restoration ecosystem interconnection architecture

For details, see Snapshots.