http://blog.sina.com.cn/s/blog_544a710b0101a122.html http://blog.51cto.com/heyiyi/1898506 https://blog.csdn.net/fjgui/article/details/47421609 https://blog.csdn.net/baiyinqiqi/article/details/47951687
1.standby端,在$PGDATA/recovery里加上recovery_target_timeline = 'latest' pg9以后的官方文档有了这么一段话:
Allow standby recovery to switch to a new timeline automatically (Heikki Linnakangas) Now standby servers scan the archive directory for new timelines periodically 什么是new timeline?后面就会看到
2.关掉primary
pg_ctl stop -D $PGDATA -m fast
2018-11-27 17:23:01.059 CST,,,1624,,5bfcd2a7.658,1,,2018-11-27 13:14:15 CST,,0,LOG,00000,"shutting down",,,,,,,,,"" 2018-11-27 17:23:01.443 CST,,,1624,,5bfcd2a7.658,2,,2018-11-27 13:14:15 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,"" 2018-11-27 17:23:01.672 CST,"repl","",3204,"172.16.10.142:58547",5bfd0cf5.c84,1,"",2018-11-27 17:23:01 CST,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,"" 2018-11-27 17:23:02.839 CST,"role1","pdb1",3205,"10.1.161.35:54606",5bfd0cf6.c85,1,"",2018-11-27 17:23:02 CST,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,""
3.在standby端promote
pg_ctl promote -D $PGDATA
2018-11-27 17:25:02.448 CST,,,1940,,5bfd0d6e.794,1,,2018-11-27 17:25:02 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused Is the server running on host ""172.16.10.100"" and accepting TCP/IP connections on port 5432? ",,,,,,,,,"" 2018-11-27 17:25:03.792 CST,,,31753,,5bfd0874.7c09,7,,2018-11-27 17:03:48 CST,1/0,0,LOG,00000,"received promote request",,,,,,,,,"" 2018-11-27 17:25:03.792 CST,,,31753,,5bfd0874.7c09,8,,2018-11-27 17:03:48 CST,1/0,0,LOG,00000,"redo done at 0/19000028",,,,,,,,,"" 2018-11-27 17:25:03.792 CST,,,31753,,5bfd0874.7c09,9,,2018-11-27 17:03:48 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2018-11-27 17:06:58.916715+08",,,,,,,,,"" 2018-11-27 17:25:03.794 CST,,,31753,,5bfd0874.7c09,10,,2018-11-27 17:03:48 CST,1/0,0,LOG,00000,"selected new timeline ID: 2",,,,,,,,,"" 2018-11-27 17:25:03.836 CST,,,31753,,5bfd0874.7c09,11,,2018-11-27 17:03:48 CST,1/0,0,FATAL,42501,"could not open file ""recovery.conf"": Permission denied",,,,,,,,,"" 2018-11-27 17:25:03.836 CST,,,31751,,5bfd0874.7c07,3,,2018-11-27 17:03:48 CST,,0,LOG,00000,"startup process (PID 31753) exited with exit code 1",,,,,,,,,"" 2018-11-27 17:25:03.836 CST,,,31751,,5bfd0874.7c07,4,,2018-11-27 17:03:48 CST,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,"" 2018-11-27 17:25:03.836 CST,"postgres","pdb1",32068,"[local]",5bfd091d.7d44,1,"idle",2018-11-27 17:06:37 CST,3/0,0,WARNING,57P02,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,"psql"
recovery.conf没权限更改,实例进程被终止,再开起来已经无法继续
2018-11-28 10:12:51.648 CST,,,18795,,5bfdf855.496b,7,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"received promote request",,,,,,,,,""
2018-11-28 10:12:51.648 CST,,,18795,,5bfdf855.496b,8,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"redo done at 0/1A000028",,,,,,,,,""
2018-11-28 10:12:51.648 CST,,,18795,,5bfdf855.496b,9,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2018-11-28 10:10:28.375684+08",,,,,,,,,""
2018-11-28 10:12:51.649 CST,,,18795,,5bfdf855.496b,10,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"selected new timeline ID: 2",,,,,,,,,""
2018-11-28 10:12:51.697 CST,,,18795,,5bfdf855.496b,11,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
2018-11-28 10:12:51.715 CST,,,18795,,5bfdf855.496b,12,,2018-11-28 10:07:17 CST,1/0,0,LOG,00000,"MultiXact member wraparound protections are now enabled",,,,,,,,,""
2018-11-28 10:12:51.716 CST,,,18793,,5bfdf855.4969,3,,2018-11-28 10:07:17 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
2018-11-28 10:12:51.716 CST,,,19752,,5bfdf9a3.4d28,1,,2018-11-28 10:12:51 CST,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,""
2018-11-28 10:12:51.760 CST,,,19753,,5bfdf9a3.4d29,1,,2018-11-28 10:12:51 CST,,0,LOG,00000,"archive command failed with exit code 1","The failed archive command was: test ! -f /mysqldata/
pg/pgarch/00000002.history && cp pg_xlog/00000002.history /mysqldata/pg/pgarch/00000002.history",,,,,,,,""
2018-11-28 10:12:52.763 CST,,,19753,,5bfdf9a3.4d29,2,,2018-11-28 10:12:51 CST,,0,LOG,00000,"archive command failed with exit code 1","The failed archive command was: test ! -f /mysqldata/
pg/pgarch/00000002.history && cp pg_xlog/00000002.history /mysqldata/pg/pgarch/00000002.history",,,,,,,,""
2018-11-28 10:12:53.766 CST,,,19753,,5bfdf9a3.4d29,3,,2018-11-28 10:12:51 CST,,0,LOG,00000,"archive command failed with exit code 1","The failed archive command was: test ! -f /mysqldata/
pg/pgarch/00000002.history && cp pg_xlog/00000002.history /mysqldata/pg/pgarch/00000002.history",,,,,,,,""
2018-11-28 10:12:53.766 CST,,,19753,,5bfdf9a3.4d29,4,,2018-11-28 10:12:51 CST,,0,WARNING,01000,"archiving transaction log file ""00000002.history"" failed too many times, will try again l
ater",,,,,,,,,""
[postgres@mycat02 ~]$ pg_controldata
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6583145462094845370
Database cluster state: in production
这时standby已经转为primary了,到$PGDATA下可以看到recovery.conf变为了recovery.done
4.把原来的primary恢复,成为新环境下的standby
cd $PGDATA mv recovery.done recovery.conf standby_mode = on # 指定为从库 primary_conninfo = 'host=172.16.10.143 port=5432 user=repl password=mall%9K0924' # 对应的主库信息 recovery_target_timeline = 'latest' # 这个说明这个流复制同步到最新的数据 vi postgres.conf hot_standby = on
新从库上
[postgres@mysql56 pg_log]$ pg_controldata pg_control version number: 942 Catalog version number: 201409291 Database system identifier: 6583145462094845370 Database cluster state: in archive recovery
5.级联状态
master_172.16.10.143 --> slave01_172.16.10.100 --> slave02_172.16.10.142
master
postgres=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 20456 usesysid | 16426 usename | repl application_name | walreceiver client_addr | 172.16.10.100 client_hostname | client_port | 39208 backend_start | 2018-11-28 10:17:55.837594+08 backend_xmin | state | streaming sent_location | 0/1A000348 write_location | 0/1A000348 flush_location | 0/1A000348 replay_location | 0/1A000348 sync_priority | 0 sync_state | async
slave01
pdb1=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 8202 usesysid | 16426 usename | repl application_name | walreceiver client_addr | 172.16.10.142 client_hostname | client_port | 60725 backend_start | 2018-11-28 10:17:55.108761+08 backend_xmin | 1892 state | streaming sent_location | 0/1A000348 write_location | 0/1A000348 flush_location | 0/1A000348 replay_location | 0/1A000348 sync_priority | 0 sync_state | async