On the Road with Databases

openGauss Backup and Recovery in Practice: The gs_probackup Tool

Full backup

  1. Choose the directory that will hold the backup files and initialize it as the backup path
gs_probackup init -B backup-path 
  2. Add a backup instance under the backup path; each instance corresponds to one directory
gs_probackup add-instance -B backup-path --instance instance_name

[opengauss@node1]$ gs_probackup add-instance -B /home/opengauss/backup/ --instance instance1

[opengauss@node1]$ ll
total 0
drwx------ 2 opengauss dbgrp 31 Apr 27 14:45 instance1
  3. Run the backup

    Run a full backup first

gs_probackup backup -B /home/opengauss/backup/ --instance=instance1 -b full 

[opengauss@node1 instance1]$ gs_probackup backup -B /home/opengauss/backup/ --instance=instance1 -b full 
INFO: Backup start, gs_probackup version: 2.4.2, instance: instance1, backup ID: SVQLS2, backup mode: FULL, wal mode: STREAM, remote: false, compress-algorithm: none, compress-level: 1
LOG: Backup destination is initialized
LOG: This openGauss instance was initialized with data block checksums. Data block corruption will be detected
LOG: Database backup start
LOG: started streaming WAL at 0/5C000000 (timeline 1)
[2025-05-04 20:38:26]: check identify system success                                                
[2025-05-04 20:38:26]: send START_REPLICATION 0/5C000000 success                                    
[2025-05-04 20:38:26]: keepalive message is received                                                
[2025-05-04 20:38:26]: keepalive message is received                                                
INFO: PGDATA size: 1707MB
INFO: Start backing up files
LOG: Creating page header map "/home/opengauss/backup/backups/instance1/SVQLS2/page_header_map"
[2025-05-04 20:38:31]: keepalive message is received                                                
Progress: [==================================================] 100% (2726/2726, done_files/total_files). backup file 
INFO: Finish backuping file
INFO: Data files are transferred, time elapsed: 7s
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
LOG: stop_lsn: 0/5C0001E8
LOG: Looking for LSN 0/5C0001E8 in segment: 00000001000000000000005C
LOG: Found WAL segment: /home/opengauss/backup/backups/instance1/SVQLS2/database/pg_xlog/00000001000000000000005C
LOG: Thread [0]: Opening WAL segment "/home/opengauss/backup/backups/instance1/SVQLS2/database/pg_xlog/00000001000000000000005C"
LOG: Found LSN: 0/5C0001E8
LOG: finished streaming WAL at 0/5D000000 (timeline 1)
LOG: Getting the Recovery Time from WAL
LOG: Thread [0]: Opening WAL segment "/home/opengauss/backup/backups/instance1/SVQLS2/database/pg_xlog/00000001000000000000005C"
INFO: Syncing backup files to disk
Progress: [==================================================] 100% (2726/2726, done_files/total_files). Sync backup file 
INFO: Finish Syncing backup files.
INFO: Backup files are synced, time elapsed: 1s
INFO: Validating backup SVQLS2
INFO: Begin validate file
Progress: [==================================================] 100% (2728/2728, done_files/total_files). validate file 
INFO: Finish validate file. 
INFO: Backup SVQLS2 data files are valid
INFO: Backup SVQLS2 resident size: 1724MB
INFO: Backup SVQLS2 completed
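
The three steps above can be chained in a small script. The following is a minimal sketch, not a definitive procedure: the variable names, the run helper and the DRY_RUN switch are illustrative, and the paths are the ones used in this walkthrough. It passes the data directory explicitly with -D, whereas the session above appears to rely on the environment for it.

```shell
# Illustrative values taken from this walkthrough; adjust to your environment.
BACKUP_PATH=${BACKUP_PATH:-/home/opengauss/backup}
PGDATA_PATH=${PGDATA_PATH:-/home/opengauss/gauss_data}
INSTANCE=${INSTANCE:-instance1}

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# first_full_backup: init -> add-instance -> full backup, stopping at the first failure.
first_full_backup() {
    run gs_probackup init -B "$BACKUP_PATH" &&
    run gs_probackup add-instance -B "$BACKUP_PATH" -D "$PGDATA_PATH" --instance "$INSTANCE" &&
    run gs_probackup backup -B "$BACKUP_PATH" --instance="$INSTANCE" -b full
}

# Usage (commented out so that loading this file has no side effects):
#   DRY_RUN=1 first_full_backup   # preview the three commands
#   first_full_backup             # actually run them
```

Previewing with DRY_RUN=1 first makes it easy to confirm the paths before anything is written to the backup directory.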

View the backups

[opengauss@node1 instance1]$ gs_probackup show  -B /home/opengauss/backup/ 

BACKUP INSTANCE 'instance1'
======================================================================================================================================================
 Instance   Version  ID      Recovery Time           Mode  WAL Mode  TLI  Time    Data   WAL  Zratio  Start LSN   Stop LSN    Type  S3 Status  Status 
======================================================================================================================================================
 instance1  9.2      SVQLS2  2025-05-04 20:38:33+08  FULL  STREAM    1/0    9s  1708MB  16MB    1.00  0/5C000028  0/5C0001E8  FILE  UNKNOWN    OK   

Incremental backup

Running an incremental backup fails with ERROR: cannot get cbm tracked lsn location, maybe enable_cbm_tracking is off

[opengauss@node1 ~]$ gs_probackup backup -B /home/opengauss/backup/ --instance=instance1 -b ptrack 
INFO: Backup start, gs_probackup version: 2.4.2, instance: instance1, backup ID: SVQMFW, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: none, compress-level: 1
LOG: Backup destination is initialized
LOG: This openGauss instance was initialized with data block checksums. Data block corruption will be detected
LOG: Database backup start
LOG: Latest valid FULL backup: SVQMFB
WARNING: Backup SVQMFO has status: ERROR. Cannot be a parent.
INFO: Parent backup: SVQMFB
LOG: started streaming WAL at 0/4000000 (timeline 1)
[2025-05-04 20:52:44]: check identify system success                                                
[2025-05-04 20:52:44]: send START_REPLICATION 0/4000000 success                                     
INFO: PGDATA size: 622MB
LOG: Current tli: 1
LOG: Parent start_lsn: 0/3000028
LOG: start_lsn: 0/4000028
INFO: Extracting pagemap of changed blocks
INFO: change bitmap start lsn location is 0/3000028
[2025-05-04 20:52:44]: keepalive message is received                                                
ERROR: cannot get cbm tracked lsn location, maybe enable_cbm_tracking is off
WARNING: backup in progress, stop backup
[2025-05-04 20:52:44]: keepalive message is received                                                
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
WARNING: Backup SVQMFW is running, setting its status to ERROR

Incremental backups require the parameter enable_cbm_tracking = on

[opengauss@node1 ~]$ gs_guc  reload -c "enable_cbm_tracking = on"
The gs_guc run with the following arguments: [gs_guc -c enable_cbm_tracking = on reload ].
NOTICE: Turn on cbm tracking function.
expected instance path: [/home/opengauss/gauss_data/postgresql.conf]
gs_guc reload: enable_cbm_tracking=on: [/home/opengauss/gauss_data/postgresql.conf]
server signaled

Total instances: 1. Failed instances: 0.
Success to perform gs_guc!
gs_probackup backup -B /home/opengauss/backup/ --instance=instance1 -b ptrack 
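
Before retrying the incremental backup, it can help to confirm that the reload actually took effect. The sketch below asks the running server for the current value; the function names are illustrative, and it assumes gsql can connect to the postgres database without further options.

```shell
# is_on: succeed only if the given value, ignoring whitespace, is "on".
is_on() {
    [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "on" ]
}

# cbm_tracking_enabled: query the server for enable_cbm_tracking.
# -A (unaligned) and -t (tuples only) reduce the output to the bare value.
cbm_tracking_enabled() {
    is_on "$(gsql -d postgres -A -t -c 'show enable_cbm_tracking;')"
}

# Usage:
#   cbm_tracking_enabled && echo "CBM tracking is on" || echo "CBM tracking is off"
```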

The backup still fails, now with ERROR: could not find valid CBM file that contains the merging start point

[opengauss@node1 ~]$ gs_probackup backup -B /home/opengauss/backup/ --instance=instance1 -b ptrack 
INFO: Backup start, gs_probackup version: 2.4.2, instance: instance1, backup ID: SVQMIW, backup mode: PTRACK, wal mode: STREAM, remote: false, compress-algorithm: none, compress-level: 1
LOG: Backup destination is initialized
LOG: This openGauss instance was initialized with data block checksums. Data block corruption will be detected
LOG: Database backup start
LOG: Latest valid FULL backup: SVQMFB
WARNING: Backup SVQMFW has status: ERROR. Cannot be a parent.
WARNING: Backup SVQMFO has status: ERROR. Cannot be a parent.
INFO: Parent backup: SVQMFB
LOG: started streaming WAL at 0/6000000 (timeline 1)
[2025-05-04 20:54:32]: check identify system success                                                
[2025-05-04 20:54:32]: send START_REPLICATION 0/6000000 success                                     
[2025-05-04 20:54:32]: keepalive message is received                                                
INFO: PGDATA size: 622MB
LOG: Current tli: 1
LOG: Parent start_lsn: 0/3000028
LOG: start_lsn: 0/6000028
INFO: Extracting pagemap of changed blocks
INFO: change bitmap start lsn location is 0/3000028
INFO: change bitmap end lsn location is 00000000/06000028
[2025-05-04 20:54:32]: keepalive message is received                                                
ERROR: query failed: ERROR:  could not find valid CBM file that contains the merging start point 00000000/03000028
 query was: SELECT path,changed_block_number,changed_block_list FROM                          pg_cbm_get_changed_block($1, $2)
Please check the replication slots, if it has slots useless,delete it and try again.
WARNING: backup in progress, stop backup
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
WARNING: Backup SVQMIW is running, setting its status to ERROR

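The cause is that an incremental backup must be built on top of a full backup, but the current full backup was taken while enable_cbm_tracking was off, so incremental backups based on it cannot find a valid CBM file. The fix is to turn on enable_cbm_tracking and then take a new full backup.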
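
One way to make a scheduled job tolerate this situation is to fall back to a full backup whenever the incremental one fails. This is only a sketch of that idea, not part of the original procedure: backup_with_fallback is a hypothetical helper, it assumes enable_cbm_tracking is already on, and it assumes gs_probackup exits with a non-zero status when a backup ends in ERROR.

```shell
# backup_with_fallback BACKUP_PATH INSTANCE
# Try a PTRACK (incremental) backup; if it fails, take a new FULL backup,
# which then serves as the parent for later incremental backups.
backup_with_fallback() {
    backup_path=$1
    instance=$2
    if gs_probackup backup -B "$backup_path" --instance="$instance" -b ptrack; then
        echo "incremental backup succeeded"
    else
        echo "incremental backup failed; taking a new full backup" >&2
        gs_probackup backup -B "$backup_path" --instance="$instance" -b full
    fi
}

# Usage:
#   backup_with_fallback /home/opengauss/backup/ instance1
```

The failed attempt still leaves an entry with status ERROR in the backup catalog, as seen in the listing that follows.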

In the end, both the full and the incremental backup succeeded

[opengauss@node1 ~]$ gs_probackup show  -B /home/opengauss/backup/ 

BACKUP INSTANCE 'instance1'
=====================================================================================================================================================
 Instance   Version  ID      Recovery Time           Mode    WAL Mode  TLI  Time   Data   WAL  Zratio  Start LSN  Stop LSN   Type  S3 Status  Status 
=====================================================================================================================================================
 instance1  9.2      SVQNR6  2025-05-04 21:21:07+08  PTRACK  STREAM    1/1    4s  256MB  16MB    1.00  0/D000028  0/D0001E8  FILE  UNKNOWN    OK     
 instance1  9.2      SVQNQY  2025-05-04 21:20:59+08  FULL    STREAM    1/0    4s  622MB  16MB    1.00  0/C000028  0/C0001E8  FILE  UNKNOWN    OK     
 instance1  9.2      SVQNCO  ----                    PTRACK  STREAM    1/1    2s      0     0    1.00  0/8000028  0/0        FILE  UNKNOWN    ERROR  
 instance1  9.2      SVQMFB  2025-05-04 20:52:26+08  FULL    STREAM    1/0    4s  622MB  16MB    1.00  0/3000028  0/30001E8  FILE  UNKNOWN    OK  
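
The listing mixes successful and failed backups. A small filter can reduce it to backup ID and status, which is convenient for spotting ERROR entries. The sketch below parses the plain-text table shown above; list_backup_status is an illustrative name, and it relies on the ID being the third column and the status being the last one.

```shell
# list_backup_status INSTANCE
# Reads `gs_probackup show` plain output on stdin and prints "ID STATUS"
# for every row that belongs to the given instance.
list_backup_status() {
    awk -v inst="$1" '$1 == inst { print $3, $NF }'
}

# Usage:
#   gs_probackup show -B /home/opengauss/backup/ | list_backup_status instance1
#   gs_probackup show -B /home/opengauss/backup/ | list_backup_status instance1 | grep ERROR
```

A failed entry that is no longer needed can then be removed by ID, for example with gs_probackup delete -B /home/opengauss/backup/ --instance instance1 -i SVQNCO.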

Testing data recovery

Restore to a point in time

Prepare test data

openGauss=# select * from backup_time;
 id |       time_t        
----+---------------------
  1 | 2025-05-08 17:15:37
  2 | 2025-05-08 17:15:46
(2 rows)

Take an incremental backup

[opengauss@node1 ~]$ gs_probackup show -B /home/opengauss/backup/

BACKUP INSTANCE 'instance1'
=======================================================================================================================================================
 Instance   Version  ID      Recovery Time           Mode    WAL Mode  TLI  Time   Data   WAL  Zratio  Start LSN   Stop LSN    Type  S3 Status  Status 
=======================================================================================================================================================
 instance1  9.2      SVXT0M  2025-05-08 17:58:02+08  PTRACK  STREAM    1/1   11s  301MB  16MB    1.00  0/12000028  0/120001E8  FILE  UNKNOWN    OK     
 instance1  9.2      SVQNR6  2025-05-04 21:21:07+08  PTRACK  STREAM    1/1    4s  256MB  16MB    1.00  0/D000028   0/D0001E8   FILE  UNKNOWN    OK     
 instance1  9.2      SVQNQY  2025-05-04 21:20:59+08  FULL    STREAM    1/0    4s  622MB  16MB    1.00  0/C000028   0/C0001E8   FILE  UNKNOWN    OK   

Stop the database

[opengauss@node1 ~]$ gs_ctl stop
[2025-05-08 18:00:13.991][1246928][][gs_ctl]: gs_ctl stopped ,datadir is /home/opengauss/gauss_data 
waiting for server to shut down..... done
server stopped

Simulate data loss

Move the data directory away

[opengauss@node1 ~]$ mv gauss_data/ gauss_data_bak

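Start the restore to the specified point in time

Specify the time to restore to. Currently, only a Recovery Time listed for an existing backup can be given; arbitrary points in time are not supported.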

[opengauss@node1 ~]$ gs_probackup restore  -B /home/opengauss/backup/ --instance instance1 --recovery-target-time='2025-05-08 17:58:02'
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SVXT0M
LOG: check external directories of backup SVXT0M
INFO: Validating parents for backup SVXT0M
INFO: Validating backup SVQNQY
INFO: Begin validate file
Progress: [==================================================] 100% (1656/1656, done_files/total_files). validate file 
INFO: Finish validate file. 
INFO: Backup SVQNQY data files are valid
INFO: Validating backup SVQNR6
INFO: Begin validate file
Progress: [==================================================] 100% (1656/1656, done_files/total_files). validate file 
INFO: Finish validate file. 
INFO: Backup SVQNR6 data files are valid
INFO: Validating backup SVXT0M
INFO: Begin validate file
Progress: [==================================================] 100% (1677/1677, done_files/total_files). validate file 
INFO: Finish validate file. 
INFO: Backup SVXT0M data files are valid
LOG: Thread [1]: Opening WAL segment "/home/opengauss/backup/backups/instance1/SVXT0M/database/pg_xlog/000000010000000000000012"
INFO: Backup validation completed successfully on time 2025-05-08 17:58:02+08, xid 13837 and LSN 0/120001E8
INFO: Backup SVXT0M is valid.
INFO: Restoring the database from backup at 2025-05-08 17:57:58+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks... in /home/opengauss/gauss_data
INFO: Start restoring backup files. DATA size: 666MB
INFO: Begin restore file
Progress: [==================================================] 100% (1677/1677, done_files/total_files). Restore file 
INFO: Finish restore file
INFO: Backup files are restored. Transfered bytes: 666MB, time elapsed: 1s
INFO: Restore incremental ratio (less is better): 100% (666MB/666MB)
INFO: Start Syncing restored files to disk
Progress: [==================================================] 100% (1677/1677, done_files/total_files). Sync restore file 
INFO: Finish Syncing restored files.
INFO: Restored backup files are synced, time elapsed: 1s
INFO: Restore of backup SVXT0M completed.
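
Because the target time has to match a Recovery Time from the backup list, it can be extracted from the show output instead of being typed by hand. The following is a sketch under a few assumptions: latest_recovery_time is an illustrative name, the listing is ordered newest first as in the output above, and dropping the time zone suffix is acceptable because the restore command above also omits it.

```shell
# latest_recovery_time INSTANCE
# Reads `gs_probackup show` plain output on stdin and prints the Recovery Time
# of the first row for the instance whose status is OK.
latest_recovery_time() {
    awk -v inst="$1" '
        !found && $1 == inst && $NF == "OK" {
            t = $5
            sub(/[+-][0-9:]+$/, "", t)   # drop the time zone suffix such as +08
            print $4, t
            found = 1
        }'
}

# Usage:
#   target=$(gs_probackup show -B /home/opengauss/backup/ | latest_recovery_time instance1)
#   gs_probackup restore -B /home/opengauss/backup/ --instance instance1 \
#       --recovery-target-time="$target"
```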

Verify that the test data was restored successfully


[opengauss@node1 ~]$ gsql -r -d postgres
gsql ((openGauss 7.0.0-RC1 build cff7b04d) compiled at 2025-04-22 13:13:05 commit 0 last mr  debug)
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# \d
                                   List of relations
 Schema |        Name        |   Type   |   Owner   |             Storage              
--------+--------------------+----------+-----------+----------------------------------
 public | backup_time        | table    | opengauss | {orientation=row,compression=no}
 public | backup_time_id_seq | sequence | opengauss | 
(2 rows)

openGauss=# select * from backup_time;
 id |       time_t        
----+---------------------
  1 | 2025-05-08 17:15:37
  2 | 2025-05-08 17:15:46
(2 rows)
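
The manual check above can also be scripted. The sketch below compares the row count of backup_time with an expected value; verify_restore is an illustrative name, and it assumes the server has been started again after the restore (for example with gs_ctl start) and that gsql can connect to the postgres database without extra options.

```shell
# verify_restore EXPECTED_ROWS
# Succeeds only if backup_time contains exactly the expected number of rows.
verify_restore() {
    expected=$1
    actual=$(gsql -d postgres -A -t -c 'select count(*) from backup_time;' | tr -d '[:space:]')
    [ "$actual" = "$expected" ]
}

# Usage:
#   gs_ctl start
#   verify_restore 2 && echo "restore verified" || echo "row count mismatch"
```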
