据库系统存储有块盘坏了导致RAC节点一状态异常,OCR盘没有挂载,更换磁盘后重启节点一,OCR挂载正常,节点一也正常;节点二alert日志中频繁报错Transaction recovery: lock conflict caught and ignored。
1. 查看恢复时使用的回滚段 - select b.name useg, b.inst# instid, b.status$ status, a.ktuxeusn
- xid_usn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq, a.ktuxesiz undoblocks,
- a.ktuxesta txstatus
- from x$ktuxe a, undo$ b
- where a.ktuxecfl like '%DEAD%'
- and a.ktuxeusn = b.us#;
复制代码
2.通过dump undo header 寻找事务使用的undo segment ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU12_1920852065$'; select * from v$diag_info WHERE name='Default Trace File'; /home/u01/app/oracle/diag/rdbms/sghsdbp/sghsdbp2/trace/sghsdbp2_ora_252651.trc
3.分析dump文件
这里的值为0x017356c6 查看undo块地址对应十进制 select to_number('017356c6','xxxxxxxxxxxxx') from dual; 20971856
4.查看dump对应的undo块 fileID: select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(20971856) from dual; 5 blockID: select DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(20971856) from dual; 336 alter system dump datafile 5 block 336; select * from v$diag_info WHERE name='Default Trace File'; /home/u01/app/oracle/diag/rdbms/sghsdbp/sghsdbp2/trace/sghsdbp2_ora_252651.trc
5.分析dump的undo块
分析undo record *----------------------------- * Rec #0x2 slt: 0x03 objn: 939448(0x000e55b8) objd: 939448 tblspc: 9(0x00000009) * Layer: 11 (Row) opc: 1 rci 0x01 ..... *----------------------------- * Rec #0xa slt: 0x03 objn: 87951(0x00113113) objd: 87951 tblspc: 9(0x00000009) * Layer: 10 (Table) opc: 22 rci 0x00 select * from dba_objects where object_id=87951 select * from LC10019999.SYLOCKAID; 重建LC10019999.SYLOCKAID表。重建之后大概15分钟之后报错的频率慢慢降低,直到不再报错,系统正常。
|