Oracle® Real Application Clusters Administrator's Guide 10g Release 1 (10.1) Part Number B10765-01
This chapter explains how to use Recovery Manager (RMAN) to back up and restore Oracle Real Application Clusters (RAC) databases, and how RAC performs instance recovery. It also describes parallel backup and recovery with SQL*Plus.
Instance failure occurs when software or hardware problems disable an instance. After instance failure, Oracle automatically uses the online redo logs to perform recovery as described in this section.
Instance recovery in RAC does not include the recovery of applications that were running on the failed instance. Oracle Clusterware restarts the instance automatically. To trigger application recovery, you can also use callout programs, as described in the example in the Oracle Real Application Clusters Deployment and Performance Guide.
Applications that were running continue by using failure recognition and recovery. This provides consistent and uninterrupted service in the event of hardware or software failures. When one instance performs recovery for another instance, the surviving instance reads online redo logs generated by the failed instance and uses that information to ensure that committed transactions are recorded in the database. Thus, data from committed transactions is not lost. The instance performing recovery rolls back transactions that were active at the time of the failure and releases resources used by those transactions.
Note: All online redo logs must be accessible for instance recovery. Therefore, Oracle recommends that you mirror your online redo logs.
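As an illustration of the mirroring recommendation, the following SQL*Plus sketch lists how many members each redo log group in each thread has, then adds a second member to one group. The file name and group number are hypothetical. In RAC, place the new member on shared storage that every instance can read, because a surviving instance must read the failed instance's redo.

```sql
-- Show each redo thread's groups and how many members (mirrors) each has
SELECT THREAD#, GROUP#, MEMBERS, STATUS
  FROM V$LOG
 ORDER BY THREAD#, GROUP#;

-- Add a mirror member to group 1 (hypothetical shared-storage path)
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/racdb/redo01b.log' TO GROUP 1;
```

Repeat the ALTER DATABASE statement for each group that reports only one member.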
When multiple node failures occur, as long as one instance survives, RAC performs instance recovery for any other instances that fail. If all instances of a RAC database fail, then Oracle automatically recovers the instances the next time one instance opens the database. The instance performing recovery can mount the database in either shared or exclusive mode from any node of a RAC database. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle running in exclusive mode, except that one instance performs instance recovery for all the failed instances.
Oracle provides RMAN for backing up and restoring the database. RMAN enables you to back up, copy, restore, and recover datafiles, control files, SPFILEs, and archived redo logs. RMAN is included with the Oracle server and it is installed by default. You can run RMAN from the command line or you can use it from the Backup Manager in Oracle Enterprise Manager. The procedures for using RMAN in RAC environments do not differ substantially from those for Oracle single-instance environments. Refer to the Oracle Backup and Recovery documentation set for more information about single-instance RMAN backup procedures.
RMAN can restore the server parameter file either to the default location or to a location that you specify. This procedure is described in Oracle Database Backup and Recovery Basics.
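A minimal RMAN sketch of both options follows. It assumes the instance is started in NOMOUNT mode, control file autobackups are enabled, and no recovery catalog is used, which is why the DBID must be set. The DBID and the target file name are hypothetical.

```sql
# Identify the database whose autobackup holds the SPFILE (hypothetical DBID)
SET DBID 1234567890;

# Option 1: restore the SPFILE to its default location
RESTORE SPFILE FROM AUTOBACKUP;

# Option 2: restore it to a location you specify instead
# RESTORE SPFILE TO '/tmp/spfileTEMP.ora' FROM AUTOBACKUP;
```

If the instance is currently running with the server parameter file, you cannot restore it to the default location; restoring to another location with the TO clause avoids that conflict.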
You cannot specify a net service name that uses Oracle Net to distribute RMAN connections to more than one instance. Each net service name used in an RMAN connection must specify only one instance. This applies to all RMAN connections, whether from the command line or through the CONNECT clause in ALLOCATE CHANNEL or CONFIGURE CHANNEL RMAN commands. Additionally, when you use RMAN from the command line, you can connect to only one instance in a RAC database at a time. Assume that node1, node2, and node3 are net service names for three instances in a RAC configuration. In this case, connect to the target database with only one of these net service names, for example:
% rman TARGET SYS/oracle@node2 CATALOG rman/cat@catdb
When making backups using channels connected to different instances, each allocated channel can connect to a different instance in the cluster, and each channel connection must resolve to only one instance. For example, configure automatic channels as follows:
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node3';
During a backup, the instances to which the channels connect must be either all mounted or all open. For example, if the node1 instance has the database mounted while the node2 and node3 instances have the database open, then the backup fails.
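Before you start a backup, a quick SQL*Plus check such as the following shows the state of every instance in the cluster. All instances that the RMAN channels connect to should report the same status, either MOUNTED or OPEN.

```sql
-- One row per running instance; STATUS shows, for example, MOUNTED or OPEN
SELECT INST_ID, INSTANCE_NAME, HOST_NAME, STATUS
  FROM GV$INSTANCE
 ORDER BY INST_ID;
```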
In some cluster database configurations, some nodes of the cluster have faster access to certain datafiles than to other datafiles. RMAN automatically detects this, which is known as node affinity awareness. When deciding which channel to use to back up a particular datafile, RMAN gives preference to the nodes with faster access to the datafiles that you want to back up. For example, if you have a three-node cluster, and if node 1 has faster read/write access to datafiles 7, 8, and 9 than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3.
To use node affinity, configure RMAN channels on the nodes of the cluster that have affinity to the datafiles you want to back up. For example, use the syntax:
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Refer to Oracle Database Recovery Manager Reference for more information about the CONNECT clause of the CONFIGURE CHANNEL statement.
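The CONFIGURE commands shown earlier make the channels persistent. For a one-off job, you can instead allocate the channels inside a RUN block. The following RMAN sketch reuses the three-node node affinity example: with a channel on each node, RMAN may prefer the node 1 channel when it backs up datafiles 7, 8, and 9. The user names and passwords are the placeholders used above; the channel names c1 to c3 are arbitrary.

```sql
RUN {
  # These channels exist only for the duration of this RUN block
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
  ALLOCATE CHANNEL c3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
  # RMAN picks the channel for each file; where it detects affinity,
  # it favors the node with faster access to the datafile
  BACKUP DATAFILE 7, 8, 9;
}
```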
The node performing the backup must be able to read all of the files specified in the BACKUP command. For example, assume that you run the following command on node 1 of a three-node cluster:
BACKUP DATABASE PLUS ARCHIVELOG;
In this case, RMAN attempts to back up all datafiles, archived redo logs, and SPFILEs. Because the datafiles are either cluster file system files or files on a shared disk, RMAN can read them. However, RMAN cannot back up any of the archived redo logs that the local node cannot read. The archiving scenarios in Chapter 6, "Configuring Recovery Manager and Archiving", explain how to configure the environment so that all archived redo logs are accessible by the node performing the backup.
The BACKUP command must be able to delete the archived redo logs from disk after backing them up. The following script is an example of one method for deleting the archived redo logs from each node after backing them up:
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node1';
DELETE ARCHIVELOG LIKE '%arc_dest_1%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node2';
DELETE ARCHIVELOG LIKE '%arc_dest_2%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node3';
DELETE ARCHIVELOG LIKE '%arc_dest_3%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
When configuring the backup options for RAC, you have three possible configurations:
Network Backup Server. A dedicated backup server performs and manages backups for the cluster and the cluster database. None of the nodes have local backup appliances.
One Local Drive. One node has access to a local backup appliance and performs and manages backups for the cluster database. All nodes of the cluster should be on a cluster file system to be able to read all datafiles, archived redo logs, and SPFILEs. Oracle recommends that you do not use the non-cluster file system archiving scheme if you have backup media on only one local drive.
Multiple Drives. Each node has access to a local backup appliance and can write to its own local backup media.
In the cluster file system scheme, any node can access all the datafiles, archived redo logs, and SPFILEs. In the non-cluster file system scheme, you must write the backup script so that the backup is distributed to the correct drive and path for each node. For example, node 1 can back up the archived redo logs whose path names begin with /arc_dest_1, node 2 can back up the archived redo logs whose path names begin with /arc_dest_2, and node 3 can back up the archived redo logs whose path names begin with /arc_dest_3.
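If you prefer to control the distribution explicitly rather than rely only on autolocation, an RMAN script along the following lines assigns each node's archive destination to a channel on that node. It is a sketch that assumes the /arc_dest_n directory names from the example above and the SYS/oracle@noden connect strings used earlier in this chapter; the channel names t1 to t3 are arbitrary.

```sql
RUN {
  ALLOCATE CHANNEL t1 DEVICE TYPE sbt CONNECT 'SYS/oracle@node1';
  ALLOCATE CHANNEL t2 DEVICE TYPE sbt CONNECT 'SYS/oracle@node2';
  ALLOCATE CHANNEL t3 DEVICE TYPE sbt CONNECT 'SYS/oracle@node3';
  # Each channel backs up only the logs in its own node's local directory
  BACKUP
    (ARCHIVELOG LIKE '%arc_dest_1%' CHANNEL t1)
    (ARCHIVELOG LIKE '%arc_dest_2%' CHANNEL t2)
    (ARCHIVELOG LIKE '%arc_dest_3%' CHANNEL t3);
}
```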
RMAN automatically performs autolocation of all files that it needs to back up or restore. This feature is automatically enabled whenever the allocated channels use different CONNECT or PARMS settings.
If you use the non-cluster file system local archiving scheme, then a node can read only the archived redo logs that were generated by an instance on that node. RMAN never attempts to back up an archived redo log on a channel that cannot read it.
During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node attempts to restore only the files that were backed up to the drive attached to that node. For example, assume that log sequence 1001 is backed up to the drive attached to node 1, while log 1002 is backed up to the drive attached to node 2. If you then allocate channels that connect to each node, then the channel connected to node 1 can restore log 1001 (but not 1002), and the channel connected to node 2 can restore log 1002 (but not 1001).
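For the example above, a restore of both logs might look like the following RMAN sketch. It assumes that both sequences belong to redo thread 1, which the example does not state; in RAC each instance writes its own redo thread, so the thread number matters.

```sql
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'SYS/oracle@node1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'SYS/oracle@node2';
  # Autolocation: c1 restores sequence 1001, c2 restores sequence 1002
  RESTORE ARCHIVELOG FROM SEQUENCE 1001 UNTIL SEQUENCE 1002 THREAD 1;
}
```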
This section describes the options for backup scenarios.
In a cluster file system backup scheme, each node in the cluster has read access to all the datafiles, archived redo logs, and SPFILEs. This includes Automatic Storage Management (ASM), cluster file systems, and Network Attached Storage (NAS).
This scheme assumes that only one node in the cluster has a local backup appliance such as a tape drive. In this case, run the following one-time configuration commands:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Because any node performing the backup has read/write access to the archived redo logs written by the other nodes, the backup script for any node is simple:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this case, the tape drive receives all datafiles, archived redo logs, and SPFILEs.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration so that one channel is configured for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK. The following backup script, which you can run from any node in the cluster, distributes the datafile, archived redo log, and SPFILE backups among the backup drives:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
For example, if the database contains 10 datafiles and 100 archived redo logs are on disk, then the node 1 backup drive can back up datafiles 1, 3, and 7 and logs 1-33. Node 2 can back up datafiles 2, 5, and 10 and logs 34-66. The node 3 backup drive can back up datafiles 4, 6, 8 and 9 as well as archived redo logs 67-100.
In a non-cluster file system environment, each node can back up only its own local archived redo logs. For example, node 1 cannot access the archived redo logs on node 2 or node 3 unless you configure the network file system (NFS) for remote access. If you do not configure NFS, distribute the backup to multiple drives. If you configure NFS for backups, then you can back up to only one drive.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration to configure one channel for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK.
Develop a production backup script for whole database backups that you can run from any node. The RMAN autolocation feature ensures that the channel allocated on each node only backs up the archived redo logs that are located on that node. The following example uses automatic channels to make a database and archived redo log backup:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this example, the datafile, archived redo log, and SPFILE backups are distributed among the different tape drives. However, channel 1 can read only the logs archived locally in /arc_dest_1, so the autolocation feature restricts channel 1 to backing up the archived redo logs in the /arc_dest_1 directory. Likewise, because node 2 can read only files in the /arc_dest_2 directory, channel 2 backs up only the archived redo logs in /arc_dest_2, and so on. The important point is that all logs are backed up, but they are distributed among the different drives.
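To confirm that every archived redo log reached one of the drives, you can list the archived log backups from RMAN. For each backup set, the output shows the thread and sequence numbers it contains, so you can check that no sequence is missing from any thread.

```sql
# List every backup that contains archived redo logs
LIST BACKUP OF ARCHIVELOG ALL;
```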
Media recovery must be user-initiated through a client application, whereas instance recovery is performed automatically by the database. To perform media recovery, use RMAN to restore backups of the datafiles and then recover the database. The procedures for RMAN media recovery in RAC environments do not differ substantially from the media recovery procedures for single-instance environments. The node that performs the recovery must be able to restore all the required datafiles. That node must also be able to either read all the required archived redo logs on disk or restore them from backups.
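Media recovery does not always involve the whole database. As a sketch, assuming the database runs in ARCHIVELOG mode, the following RMAN commands recover a single damaged datafile while the rest of the database stays open. Datafile 7 is a hypothetical example; the RMAN SQL command passes each ALTER DATABASE statement through to the server.

```sql
# Take only the damaged file offline; the rest of the database stays open
SQL 'ALTER DATABASE DATAFILE 7 OFFLINE';
RESTORE DATAFILE 7;
RECOVER DATAFILE 7;
SQL 'ALTER DATABASE DATAFILE 7 ONLINE';
```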
This section describes the RMAN restore scenarios.
The restore and recovery procedures in a cluster file system scheme do not differ substantially from Oracle single-instance scenarios.
First, refer to "Backing Up to One Local Drive in the Cluster File System Archiving Scheme" to perform the one-time configuration.
In this example, assume that node 3 performs the backups. If node 3 is available for the restore and recovery processing, and if all the existing archived redo logs have been backed up or are on disk, then run the following commands to perform complete recovery:
RESTORE DATABASE;
RECOVER DATABASE;
If node 3 performed the backups but is unavailable, then configure a media management device for one of the remaining nodes and make the backup media from node 3 available to this device.
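A minimal RMAN sketch of that situation follows, assuming node 1 is the replacement node and its media manager has already been set up to read node 3's backup media. Any vendor-specific channel settings, normally passed with PARMS, are omitted because they depend on your media management software.

```sql
RUN {
  # Channel on the surviving node that now has access to node 3's media
  ALLOCATE CHANNEL t1 DEVICE TYPE sbt CONNECT 'SYS/oracle@node1';
  RESTORE DATABASE;
  RECOVER DATABASE;
}
```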
First, refer to "Backing Up to Multiple Drives in the Cluster File System Archiving Scheme" to perform the one-time configuration so that one channel is configured for each node in the cluster. If all existing archived redo logs have been backed up or are on disk, then run the following commands for complete recovery from any node in the cluster:
RESTORE DATABASE;
RECOVER DATABASE;
Because RMAN autolocates the backups before restoring them, the channel connected to each node only restores the files that were backed up to the tape drive attached to the node.
In this scheme, each node archives locally to a different directory. For example, node 1 archives to /arc_dest_1, node 2 archives to /arc_dest_2, and node 3 archives to /arc_dest_3. You must configure NFS so that the recovery node can read the archiving directories on the remaining nodes. The restore and recovery procedure depends on whether the backups are distributed or nondistributed.
First, refer to "Backing Up to Multiple Drives in a Non-Cluster File System Backup Scheme". If all nodes are available and if all archived redo logs have been backed up, then you can perform a complete restore and recovery by mounting the database and running the following commands from any node:
RESTORE DATABASE;
RECOVER DATABASE;
The recovery node begins a server session on each node in the cluster. Because this example assumes that database backups are distributed, the server sessions restore the backup datafiles from the tape drives attached to each node. Because the NFS configuration gives each node read access to the archiving directories of the other nodes, the recovery node can read and apply the archived redo logs located on the local and remote disks. No manual transfer of archived redo logs is required.
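The complete sequence, run at the RMAN prompt while connected to the target through one instance, might look like the following sketch. It assumes that the instances on the other nodes are also started and mounted, because the configured channels connect to them.

```sql
# Mount the database on the instance RMAN is connected to
STARTUP MOUNT;
# The configured channels restore from the drive attached to each node
RESTORE DATABASE;
# Archived redo logs on remote nodes are read through NFS
RECOVER DATABASE;
ALTER DATABASE OPEN;
```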
The resetlogs operation automatically archives online logs. This ensures that your database has the necessary archived redo logs if recovery was done to a point in time in online or standby logs. You do not need to perform a full backup after a resetlogs operation.
The default archived redo log format includes a resetlogs ID, so there is no need to change backup scripts after using resetlogs. RMAN backs up logs from earlier incarnations when you run a BACKUP ARCHIVELOG ALL, BACKUP ARCHIVELOG FROM TIME, or BACKUP ARCHIVELOG FROM SCN command.
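For reference, the three commands named above take forms such as the following. Each command is an alternative to the others; the time window and SCN are hypothetical values.

```sql
# Back up all archived redo logs, including those from earlier incarnations
BACKUP ARCHIVELOG ALL;

# Back up logs generated during the last seven days (hypothetical window)
BACKUP ARCHIVELOG FROM TIME 'SYSDATE-7';

# Back up logs starting at a given SCN (hypothetical value)
BACKUP ARCHIVELOG FROM SCN 1000000;
```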
You can use parallel instance recovery, parallel failure recovery, and parallel media recovery in RAC databases. Refer to Oracle Database Backup and Recovery Advanced User's Guide for more information on these topics.
With the RMAN RESTORE and RECOVER commands, Oracle automatically parallelizes the following three stages of recovery:
When restoring datafiles, the number of channels you allocate in the RMAN recover script effectively sets the parallelism that RMAN uses. For example, if you allocate five channels, you can have up to five parallel streams restoring datafiles.
Similarly, when you are applying incremental backups, the number of channels you allocate determines the potential parallelism.
RMAN applies archived redo logs using a specific number of parallel processes as determined by the setting for the RECOVERY_PARALLELISM initialization parameter. This is described under the topic "Setting the RECOVERY_PARALLELISM Parameter".
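As an illustration of the first stage, the following RMAN sketch allocates five channels, so the datafile restore can use up to five parallel streams. The channel names are arbitrary, and the sbt device type assumes a configured media manager; with DISK channels the principle is the same.

```sql
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c3 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c4 DEVICE TYPE sbt;
  ALLOCATE CHANNEL c5 DEVICE TYPE sbt;
  # Up to five datafile restore streams, one per channel
  RESTORE DATABASE;
  RECOVER DATABASE;
}
```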
Media recovery parallelism is controlled by the PARALLEL clause of the ALTER DATABASE RECOVER statement.
If you have user-managed methods to back up and recover your database, then you can parallelize instance and media recovery using either of the procedures described in this section.
The RECOVERY_PARALLELISM initialization parameter specifies the number of processes that participate in instance and crash recovery. One process reads the archived redo log files sequentially and dispatches redo information to several recovery processes. The recovery processes then apply the changes from the archived redo log files to the datafiles. A value of 0 or 1 indicates that Oracle performs recovery serially by one process. The value of this parameter cannot exceed the value of the PARALLEL_MAX_SERVERS parameter.
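The following SQL*Plus sketch sets the parameter for all instances. Because RECOVERY_PARALLELISM cannot be changed with ALTER SYSTEM while an instance is running, the value is written to the server parameter file and takes effect at the next startup. The value 4 is only an example; check PARALLEL_MAX_SERVERS first, because the setting must not exceed it.

```sql
-- Upper bound for RECOVERY_PARALLELISM
SHOW PARAMETER PARALLEL_MAX_SERVERS

-- Store the new value in the SPFILE for every instance (SID='*')
ALTER SYSTEM SET RECOVERY_PARALLELISM = 4 SCOPE=SPFILE SID='*';
```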
On multiple-CPU systems, the default for instance, crash, and media recovery is to operate in parallel mode. You can, however, enforce serial recovery by using either the RECOVERY_PARALLELISM parameter or the NOPARALLEL clause of the ALTER DATABASE statement.
When you use the RECOVER statement to parallelize instance and media recovery, the allocation of recovery processes to instances is operating system-specific. The DEGREE keyword of the PARALLEL clause can signify either the number of processes on each instance of a RAC database or the number of processes to distribute across all instances.
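In SQL*Plus, the RECOVER command accepts a parallel clause as well. The following sketch shows the parallel and the serial form; run one or the other. The degree of 4 is an example value.

```sql
-- Media recovery with a degree of parallelism of 4
RECOVER DATABASE PARALLEL 4

-- Alternatively, force serial media recovery
RECOVER DATABASE NOPARALLEL
```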
To use a flash recovery area in RAC, you must place it on an ASM disk group, a cluster file system, or a shared directory that is configured through NFS for each RAC instance. In other words, the flash recovery area must be shared among all the instances of a RAC database. In addition, set the DB_RECOVERY_FILE_DEST parameter to the same value on all instances.
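A SQL*Plus sketch of this configuration follows, assuming an ASM disk group named RECOVERY (a hypothetical name) and a 10 GB quota. The size must be set before the destination, and SID='*' applies both settings to every instance.

```sql
-- Quota for the flash recovery area (must be set first)
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 10G SCOPE=BOTH SID='*';

-- Shared location used by all instances
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+RECOVERY' SCOPE=BOTH SID='*';
```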