【案例】Oracle报错ORA-16433非归档丢失redo无法启动的恢复过程
转惜纷飞
今天ML的群中女神和travel在纠结一个恢复的问题,11.2.0.3版本,非归档,大概是rm掉current的log,然后重建controlfille后恢复导致一系列问题,并最终出现ora-600 2662错误,虽然这个错误很常见,但是你发现推进scn也是无法open,感觉有点怪,远程了女神的电脑,操作不便,最后将文件压缩传过来,我在自己的vmware进行了恢复。
由于环境的差异,所以解压后我先进行rename file操作,如下:
SQL> SELECT name FROM v$datafile;
NAME
------------------------------------------------
/u01/app/oracle/oradata/travel/system01.dbf
.........
/u01/app/oracle/oradata/travel/users01.dbf
SQL> SELECT member FROM v$logfile;
MEMBER
-------------------------------------------------
/u01/app/oracle/oradata/travel/redo03.log
/u01/app/oracle/oradata/travel/redo02.log
/u01/app/oracle/oradata/travel/redo01.log
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/system01.dbf' TO '/home/oracle/travel/travel/system01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/sysaux01.dbf' TO '/home/oracle/travel/travel/sysaux01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/undotbs01.dbf' TO '/home/oracle/travel/travel/undotbs01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/users01.dbf' TO '/home/oracle/travel/travel/users01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo01.log' TO '/home/oracle/travel/travel/redo01.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo02.log' TO '/home/oracle/travel/travel/redo02.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo03.log' TO '/home/oracle/travel/travel/redo03.log';
SQL> SELECT name FROM v$datafile;
NAME
-------------------------------------------------
/home/oracle/travel/travel/system01.dbf
/home/oracle/travel/travel/sysaux01.dbf
/home/oracle/travel/travel/undotbs01.dbf
/home/oracle/travel/travel/users01.dbf
SQL> SELECT member FROM v$Logfile;
MEMBER
-------------------------------------------------
/home/oracle/travel/travel/redo03.log
/home/oracle/travel/travel/redo02.log
/home/oracle/travel/travel/redo01.log
此时,进行recover,会报错ORA-16433,如下:
SQL> recover DATABASE;
ORA-00283: recovery SESSION canceled due TO errors
ORA-16433: The DATABASE must be opened IN READ/WRITE mode.
SQL> recover DATABASE USING backup controlfile until cancel;
ORA-00283: recovery SESSION canceled due TO errors
ORA-16433: The DATABASE must be opened IN READ/WRITE mode.
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'
关于该错误,通过oerr命令可以大概了解一下其含义,如下:
[oracle@11gR2_primary ~]$ oerr ora 16433
16433, 00000, "The database must be opened in read/write mode."
// *Cause: An attempt was made to open the database in read-only mode after an
// operation that requires that the database be opened in read/write
// mode.
// *Action: Open the database in read/write mode. The database can then be
// opened in read-only mode.
通过这个错误,我们可以得出一个信息,数据库可以以read only模式打开,换句话讲,你查询如下几个值都是一样的:
select checkpoint_change# from v$database; --来自controlfile
select checkpoint_change# from v$datafile; --来自controlfile
select checkpoint_change# from v$datafile_orader; --来自datafile header
其实,通常来讲,只要上述几个值一样,那么我们的db应该都是可以直接open打开的,然而,这里却不行,read only都不行,我试过。
然后下面就开始我的恢复工作。
第一次尝试恢复:
---重建controlfile
SQL> startup nomount pfile=/home/oracle/travel/travel/a.ora
ORACLE instance started.
Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG
2 MAXLOGFILES 16
3 MAXLOGMEMBERS 3
4 MAXDATAFILES 100
5 MAXINSTANCES 8
6 MAXLOGHISTORY 292
7 LOGFILE
8 GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512,
9 GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512,
10 GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512
11 -- STANDBY LOGFILE
12 DATAFILE
13 '/home/oracle/travel/travel/system01.dbf',
14 '/home/oracle/travel/travel/sysaux01.dbf',
15 '/home/oracle/travel/travel/undotbs01.dbf',
16 '/home/oracle/travel/travel/users01.dbf'
17 CHARACTER SET AL32UTF8
18 ;
Control file created.
SQL>
开始开始进行recover:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 244977 generated at 01/19/2013 01:56:54 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_805082211.dbf
ORA-00280: change 244977 for thread 1 is in sequence #1
Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'
ORA-01112: media recovery not started
因为本身是非归档,所以这个不完全恢复的步骤无非是为了后面可以进行open resetlog。然后停库,加入隐含参数进行open:
SQL> shutdown immediate
ORA-01109: DATABASE NOT OPEN
DATABASE dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile=/home/oracle/travel/travel/b.ora
ORACLE instance started.
Total System Global Area 626327552 bytes
Fixed SIZE 2230952 bytes
Variable SIZE 184550744 bytes
DATABASE Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> ALTER DATABASE mount
DATABASE altered.
SQL> ALTER DATABASE OPEN resetlogs;
ALTER DATABASE OPEN resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance TERMINATED. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0],
[244998], [4194432], [], [], [], [], [], []
Process ID: 3641
SESSION ID: 1 Serial NUMBER: 5
此时alert log对应的错误如下:
Fri Dec 14 21:58:39 2012
SMON: enabling cache recovery
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc (incident=4937):
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_4937/travel_ora_3641_i4937.trc
Fri Dec 14 21:58:43 2012
Dumping diagnostic data in directory=[cdmp_20121214215843], requested by (instance=1, osid=3641), summary=[incident=4937].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3641): terminating the instance due to error 600
Instance terminated by USER, pid = 3641
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3641) as a result of ORA-1092
甚至手工执行如下操作都无法open数据库,如下:
SQL> ALTER SESSION SET events '10015 trace name ADJUST_SCN level 14';
SESSION altered.
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'
最后dunmp controlfile和CURRENT redo logfile,发现low cache rba信息居然是最大值:
SQL> oradebug setmypid
Statement processed.
SQL> ALTER system SET events 'immediate trace name controlf level 4' ;
System altered.
SQL> ALTER system dump logfile '/home/oracle/travel/travel/redo01.log';
SQL> oradebug tracefile_name
/oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_4229.trc
++++++++ controlfile dump 片段
***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
(size = 8180, compat size = 8180, section max = 11, section in-use = 0,
last-recid= 0, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:0
low cache rba:(0xffffffff.ffffffff.ffff) on disk rba:(0x1.3.0) ---low cache rba为无穷大
on disk scn: 0x0000.0003bcf7 12/14/2012 21:58:39
resetlogs scn: 0x0000.0003bcf2 12/14/2012 21:58:36
heartbeat: 802069494 mount id: 2869233386
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #3 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
+++++++redo01.log dump 片段
DUMP OF REDO FROM FILE '/home/oracle/travel/travel/redo01.log'
Opcodes *.*
RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
Times: creation thru eternity
FILE HEADER:
Compatibility Vsn = 186646528=0xb200000
Db ID=2872261344=0xab333ae0, Db Name='TRAVEL'
Activation ID=2872292516=0xab33b4a4
Control Seq=233=0xe9, File size=102400=0x19000
File Number=1, Blksiz=512, File Type=2 LOG
descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000003bcee-0xffffffffffff"
thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
resetlogs count: 0x2ffc9463 scn: 0x0000.0003bcee (244974)
prev resetlogs count: 0x2ffc7a41 scn: 0x0000.0003bcea (244970)
Low scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
Enabled scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Thread closed scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Disk cksum: 0x3467 Calc cksum: 0x3467
Terminal recovery stop scn: 0x0000.00000000
Terminal recovery 01/01/1988 00:00:00
Most recent redo scn: 0x0000.00000000
Largest LWN: 0 blocks
End-of-redo stream : No
Unprotected mode
Miscellaneous flags: 0x800000
Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
Zero blocks: 0
Format ID is 2
redo log key is 853c461da2eec7ed4b45ce75b8c27d7
redo log key flag is 5
Enabled redo threads: 1
REDO RECORD - Thread:1 RBA: 0x000001.00000002.0010 LEN: 0x0070 VLD: 0x05
SCN: 0x0000.0003bcf2 SUBSCN: 1 01/19/2013 01:56:54
(LWN RBA: 0x000001.00000002.0010 LEN: 0001 NST: 0001 SCN: 0x0000.0003bcee) ,
CHANGE #1 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:17.3 ENC:0
END OF REDO DUMP
可以看到实际上current redo的LWN RBA是对的,为1.2.10,是小于on disk rba (0×1.3.0)的。那么这里为什么会出现这个情况?
猜测可能是写紊乱了。到这里也就比较明白了,仍然是conrolfile有问题。
既然如此,那么我直接将controlfile文件rm掉,然后再次重建。如下:
SQL> startup nomount pfile='/home/oracle/travel/travel/a.ora
ORACLE instance started.
Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG
2 MAXLOGFILES 16
3 MAXLOGMEMBERS 3
4 MAXDATAFILES 100
5 MAXINSTANCES 8
6 MAXLOGHISTORY 292
7 LOGFILE
8 GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512,
9 GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512,
10 GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512
11 -- STANDBY LOGFILE
12 DATAFILE
13 '/home/oracle/travel/travel/system01.dbf',
14 '/home/oracle/travel/travel/sysaux01.dbf',
15 '/home/oracle/travel/travel/undotbs01.dbf',
16 '/home/oracle/travel/travel/users01.dbf'
17 CHARACTER SET AL32UTF8
18 ;
Control file created.
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 244985 generated at 12/14/2012 23:05:06 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf
ORA-00280: change 244985 for thread 1 is in sequence #1
Specify log: {=suggested | filename | AUTO | CANCEL}
AUTO
ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'
SQL>
SQL>
SQL> select file#,CHECKPOINT_CHANGE#,LAST_CHANGE# from v$datafile order by 1;
FILE# CHECKPOINT_CHANGE# LAST_CHANGE#
---------- ------------------ ------------
1 244985
2 244985
3 244985
4 244985
SQL> select file#,CHECKPOINT_CHANGE# from v$datafile_header order by 1;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 244985
2 244985
3 244985
4 244985
SQL> select CHECKPOINT_CHANGE# from v$database;
CHECKPOINT_CHANGE#
------------------
0
此时停库,然后在pfile中加入隐含参数:
*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE
接着再次进行mount,并进行scn 推进:
SQL> ALTER DATABASE OPEN resetlogs;
ALTER DATABASE OPEN resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery TO be consistent
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'
SQL>
SQL> shutdown immediate
ORA-01109: DATABASE NOT OPEN
DATABASE dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/home/oracle/travel/travel/b.ora
ORACLE instance started.
Total System Global Area 626327552 bytes
Fixed Size 2230952 bytes
Variable Size 184550744 bytes
Database Buffers 436207616 bytes
Redo Buffers 3338240 bytes
SQL> alter database mount
2 ;
Database altered.
SQL> alter session set events '10015 trace name ADJUST_SCN level 10';
Session altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
SQL> alter database open noresetlogs;
alter database open noresetlogs
*
ERROR at line 1:
ORA-01588: must use RESETLOGS option for database open
SQL> alter database open resetlogs;
Database altered.
SQL> show parameter name
NAME TYPE VALUE
-------------Oracleо----------------------- ----------- ------------------------------
db_file_name_convert string
db_name string travel
db_unique_name string travel
global_names boolean FALSE
instance_name string travel
lock_name_space string
log_file_name_convert string
processor_group_name string
service_names string travel
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
SQL> alter system switch logfile;
System altered.
这里说明一下是我的alert log里面还出现了数据字典不一致的问题,因为毕竟是强制open的,如下:
Sun Jan 20 00:25:34 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_m000_4681.trc:
ORA-25153: Temporary Tablespace is Empty
Sun Jan 20 00:25:38 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_j004_4692.trc (incident=21817):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_21817/travel_j004_4692_i21817.trc
Sun Jan 20 00:25:43 2013
Dumping diagnostic data in directory=[cdmp_20130120002543], requested by (instance=1, osid=4692 (J004)), summary=[incident=21817].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Jan 20 00:25:47 2013
这个ora-600错误很常见了,通过是index信息不一致。
其trace内容如下:
* kdsgrp1-1: *************************************************
row 0x0040589e.4b continuation at
0x0040589e.4b file# 1 block# 22686 slot 75 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 75 ..... nrows: 75
kdsgrp - dump CR block dba=0x0040589e
Block header dump: 0x0040589e
Object id on Block Y
seg/obj: 0x12 csc: 0x00.3bca9 itc: 1 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x40589f ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0001.01a.000000a1 0x00c05e97.001d.0d --U- 1 fsc 0x0053.0003bcab
bdba: 0x0040589e
data_block_dump,data header at 0x7a26a044
===============
tsiz: 0x1fb8
hsiz: 0xa8
pbl: 0x7a26a044
76543210
flag=--------
ntab=1
nrow=75
frre=-1
fsbo=0xa8
...........
seg/obj: 0×12 这是obj$对象,属于bootstrap$核心对象,涉及到这类的对象处理相对麻烦,针对这类情况,数据库open后都建议把数据导出,然后重建库,我这里就不在继续描述这个ora-600错误了,我博客也有类似的例子。
**最后来个小节:
1. oracle通过系统checkpoint scn,datafile checkpoint scn,start scn三者之间的比较来判断数据文件是否需要进行介质恢复.
2. 在redo 线程打开的情况下,即数据库open的情况下,stop scn会被设置为无穷大,当正常关闭时,stop scn等于datafile scn.
这里需要注意的是,stop scn是存放在controlfile中的,网上部分资料说是存在datafile header中,这个说法是错误的。
3. oracle在open之前是先判断是否进行介质恢复,然后再是判断是否进行instance recovery。
4. 关于4种scn的关系如下:
system checkpoint scn — 存放在controlfile中
datafile checkpoint scn –存放在controlfile中
start scn —存放在datafile header中
stop scn —存放在controlfile中**
system scn,datafile checkpoint scn,start scn,这3种scn用于判断数据文件是否需要进行介质恢复。这3个相等这不需要介质恢复。
如何这4个都相等,那么就不需要进行实例恢复。stop scn是用于判断是否进行实例恢复的。
5. 如果stop scn比其他的几个scn要大,那么就需要进行instance recover,需要进行扫描redo,实例恢复的起点是low cache rba,终点
是redo log的最末端。