|
|||||||||||||
|
|
Confirmation of the new control file: SQL> startup ORACLE instance started. Total System Global Area 289406976 bytes Fixed Size 1290184 bytes Variable Size 104857656 bytes Database Buffers 176160768 bytes Redo Buffers 7098368 bytes Database mounted. Database opened. SQL> col name for a52 SQL> select name from v$controlfile; NAME ---------------------------------------------------- D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\CONTROL01.CTL D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\CONTROL02.CTL F:\ORACLE\CONTROL03.CTL Now I’m ready to disconnect the flash drive. With no direct warning (nothing raised to the user interface until an operation is attempted), the instance crashes. SQL> select name from v$controlfile; select name from v$controlfile * ERROR at line 1: ORA-03113: end-of-file on communication channel The alert log shows the following (plus much more; this is the relevant part): Thu Mar 06 12:41:15 2008 Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_ckpt_2756.trc: ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: 'F:\ORACLE\CONTROL03.CTL' ORA-27072: File I/O error OSD-04008: WriteFile() failure, unable to write to file O/S-Error: (OS 1006) The volume for a file has been externally altered so that the opened file is no longer valid. …some more messages… Thu Mar 06 12:41:29 2008 Instance terminated by CKPT, pid = 2756 The fix part is easy, well known, and well documented elsewhere, but were you absolutely certain what actually happens when a control file is lost?
Coming back to online redo log files, is the loss of a member survivable, that is, does the instance crash like it did for loss of a control file? The answer is: it depends. A group can have one or more members. Let’s look at #members = 1. Like the previous example, I’ll create a group with a member on a flash drive, and make that group the active group (what is the meaning of STATUS for the values of ACTIVE, CURRENT, INACTIVE and UNUSED?). Once the group is active, I’ll pull the plug. More than one way to do this, but what is the current redo log group? From the alert log:
Just like the loss of a control file, the instance is terminated.
SQL> select * from v$logfile;
GROUP# STATUS TYPE MEMBER
---------- ------- ------- ------------------------------------------------
7 ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO07.LOG
8 ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO08.LOG
10 ONLINE F:\ORACLE\REDO10.LOG
9 ONLINE D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO09.LOG
SQL> select * from v$logfile;
select * from v$logfile
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
In this case, it was LGWR that was responsible for terminating the instance. Note that it was CKPT in the other example. What gets recorded in the control file? The SCN. Which process stamps the control file with the SCN? The checkpoint process. If trying this at home, running the database is noarchivelog mode, and using only one member in the current group, what will it take to be able to open the database again? Had that group been multiplexed, and the same file was “pulled” from the system, what happens? Life goes on, and the alert log is written to with a message about the missing member. Let’s add a second member to group 10, placing it on a different (i.e., permanent) drive, and then pull the flash drive again (that should be a hint about what it takes to open the database). Here is what the alert log shows: Thu Mar 06 14:51:45 2008 Thread 1 advanced to log sequence 216 Current log# 10 seq# 216 mem# 0: F:\ORACLE\REDO10.LOG Current log# 10 seq# 216 mem# 1: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO11.LOG Thu Mar 06 14:53:36 2008 Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2224.trc: ORA-00345: redo log write error block 23 count 2 ORA-00312: online log 10 thread 1: 'F:\ORACLE\REDO10.LOG' ORA-27072: File I/O error OSD-04008: WriteFile() failure, unable to write to file O/S-Error: (OS 1006) The volume for a file has been externally altered so that the opened file is no longer valid. Thu Mar 06 14:53:36 2008 Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2224.trc: ORA-00343: too many errors, log member closed ORA-00346: log member marked as STALE ORA-00312: online log 10 thread 1: 'F:\ORACLE\REDO10.LOG' The best part about the messages is the lack of two words: Instance terminated. After a shutdown and startup, the alert log will still “complain” about the missing file, but the bottom line is that the database will be open. Thu Mar 06 14:56:51 2008 ALTER DATABASE OPEN Thu Mar 06 14:56:52 2008 Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2288.trc: ORA-00313: open failed for members of log group 10 of thread 1 Thu Mar 06 14:56:52 2008 Errors in file d:\oracle\product\10.2.0\admin\db10\bdump\db10_lgwr_2288.trc: ORA-00313: open failed for members of log group 10 of thread 1 Thu Mar 06 14:56:52 2008 Thread 1 opened at log sequence 216 Current log# 10 seq# 216 mem# 1: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO11.LOG Successful open of redo thread 1 …some other messages… Thu Mar 06 14:56:59 2008 Completed: ALTER DATABASE OPEN Loss of datafilesSo far, recovering from loss of a control file or redo log file is pretty close to losing no data. The single member redo log group example is a case where data loss can take place. With respect to datafiles, there are two kinds: the ones that matter to you (your data) and the ones that matter to Oracle (its data). To clarify this categorization, here are two questions: 1. What happens if one your tablespaces loses a datafile? 2. What happens if the SYSTEM tablespace, as an example of an owned-by-Oracle tablespace, loses a datafile? If you answered those questions as is, what was your frame of reference with respect to archiving? If you didn’t consider archiving when formulating your answers, then you stand a 50-50% chance of having been wrong. If operating in NOARCHIVELOG mode, and you lose a datafile – any datafile – it’s game over, instance terminates. If operating in ARCHIVELOG mode, then the answers to the two questions above are different: loss of your datafile means the instance keeps going, loss of an Oracle specific datafile means the instance is terminated. The section titled “Recovering After the Loss of Datafiles: Scenarios” in the Backup and Recovery Advanced User’s Guide spells this out quite nicely. In ClosingPretty much all of the backup and recovery scenarios related to loss of special files can be replicated on a PC using a very inexpensive flash drive. Whether on an AIX 5300L gazillion CPU machine with a gazillion megabytes of RAM or on your personal desktop/laptop that just barely runs Oracle, the principles are virtually the same. Give it a try and see for yourself. These are drills best practiced without the pressure of having to do these things when your instance is crashed and time is money.
|
||||||||||||||||||||||||||||||||||||||||||||||
![]() |
![]()