Contents

Title and Copyright Information

Send Us Your Comments

Preface

Related Documentation

Documentation Accessibility

Part I Getting Started

1 Overview of High Availability

Introduction to High Availability

What is Availability?

Importance of Availability

Causes of Downtime

What Does This Book Contain?

Who Should Read This Book?

2 Determining Your High Availability Requirements

Why It Is Important to Determine High Availability Requirements

Analysis Framework for Determining High Availability Requirements

Business Impact Analysis

Cost of Downtime

Recovery Time Objective

Recovery Point Objective

Choosing a High Availability Architecture

HA Systems Capabilities

Business Performance, Budget and Growth Plans

High Availability Best Practices

Part II Oracle Database High Availability Features, Architectures, and Policies

3 Oracle Database High Availability Features

Oracle Real Application Clusters

Oracle Data Guard

Oracle Streams

Online Reorganization

Transportable Tablespaces

Automatic Storage Management

Flashback Technology

Oracle Flashback Query

Oracle Flashback Version Query

Oracle Flashback Transaction Query

Oracle Flashback Table

Oracle Flashback Drop

Oracle Flashback Database

Dynamic Reconfiguration

Oracle Fail Safe

Recovery Manager

Flash Recovery Area

Hardware Assisted Resilient Data (HARD) Initiative

4 High Availability Architectures

Oracle Database High Availability Architectures

"Database Only" Architecture

"RAC Only" Architecture

"Data Guard Only" Architecture

Maximum Availability Architecture

Streams Architecture

Choosing the Correct HA Architecture

Assessing Other Architectures

5 Operational Policies for High Availability

Introduction to Operational Policies for High Availability

Service Level Management for High Availability

Planning Capacity to Promote High Availability

Change Management for High Availability

Backup and Recovery Planning for High Availability

Disaster Recovery Planning

Planning Scheduled Outages

Staff Training for High Availability

Documentation as a Means of Maintaining High Availability

Physical Security Policies and Procedures for High Availability

Part III Configuring a Highly Available Oracle Environment

6 System and Network Configuration

Overview of System Configuration Recommendations

Recommendations for Configuring Storage

Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant

Use an Array That Can Be Serviced Online

Mirror and Stripe for Protection and Performance

Load-Balance Across All Physical Interfaces

Create Independent Storage Areas

Storage Recommendations for Specific HA Architectures

Define ASM Disk and Failure Groups Properly

Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption

Storage Recommendation for RAC

Protect the Oracle Cluster Registry and Voting Disk From Media Failure

Recommendations for Configuring Server Hardware

Server Hardware Recommendations for All Architectures

Use Fewer, Faster, and Denser Components

Use Redundant Hardware Components

Use Systems That Can Detect and Isolate Failures

Protect the Boot Disk With a Backup Copy

Server Hardware Recommendations for RAC

Use a Supported Cluster System to Run RAC

Choose the Proper Cluster Interconnect

Server Hardware Recommendations for Data Guard

Use Identical Hardware for Every Machine at Both Sites

Recommendations for Configuring Server Software

Server Software Recommendations for All Architectures

Use the Same OS Version, Patch Level, Single Patches, and Driver Versions

Use an Operating System That is Fault-Tolerant to Hardware Failures

Configure Swap Partitions Appropriately

Set Operating System Parameters to Enable Future Growth

Use Logging or Journal File Systems

Mirror Disks That Contain Oracle and Application Software

Server Software Recommendations for RAC

Use Supported Clustering Software

Use Network Time Protocol (NTP) On All Cluster Nodes

Recommendations for Configuring the Network

Network Configuration Best Practices for All Architectures

Ensure That All Network Components Are Redundant

Use Load Balancers to Distribute Incoming Requests

Network Configuration Best Practices for RAC

Classify Network Interfaces Using the Oracle Interface Configuration Tool

Network Configuration Best Practices for Data Guard

Configure System TCP Parameters Appropriately

Use WAN Traffic Managers to Provide Site Failover Capabilities

7 Oracle Configuration Best Practices

Configuration Best Practices for the Database

Configuration Best Practices for Real Application Clusters

Register All Instances with Remote Listeners

Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability

Configuration Best Practices for Data Guard

Use a Simple, Robust Archiving Strategy and Configuration

Use Multiplexed Standby Redo Logs and Configure Size Appropriately

Enable FORCE LOGGING Mode

Use Real Time Apply

Configure the Database and Listener for Dynamic Service Registration

Tune the Network in a WAN Environment

Determine the Data Protection Mode

Determining the Protection Mode

Changing the Data Protection Mode

Conduct a Performance Assessment with the Proposed Network Configuration

Use a LAN or MAN for Maximum Availability or Maximum Protection Modes

Set SYNC=NOPARALLEL/PARALLEL Appropriately

Use ARCH for the Greatest Performance Throughput

Use the ASYNC Attribute with a 50 MB Buffer for Maximum Performance Mode

Evaluate SSH Port Forwarding with Compression

Set LOG_ARCHIVE_LOCAL_FIRST to TRUE

Provide Secure Transmission of Redo Data

Set DB_UNIQUE_NAME

Set LOG_ARCHIVE_CONFIG Correctly

Recommendations for the Physical Standby Database Only

Tune Media Recovery Performance

Recommendations for the Logical Standby Database Only

Use Supplemental Logging and Primary Key Constraints

Set the MAX_SERVERS Initialization Parameter

Increase the PARALLEL_MAX_SERVERS Initialization Parameter

Set the TRANSACTION_CONSISTENCY Initialization Parameter

Skip SQL Apply for Unnecessary Objects

Configuration Best Practices for MAA

Configure Multiple Standby Instances

Configure Connect-Time Failover for Network Service Descriptors

Recommendations for Backup and Recovery

Use Recovery Manager to Back Up Database Files

Understand When to Use Backups

Perform Regular Backups

Initial Data Guard Environment Set-Up

Recovering from Data Failures Using File or Block Media Recovery

Double Failure Resolution

Long-Term Backups

Use an RMAN Recovery Catalog

Use the Autobackup Feature for the Control File and SPFILE

Use Incrementally Updated Backups to Reduce Restoration Time

Enable Change Tracking to Reduce Backup Time

Create Database Backups on Disk in the Flash Recovery Area

Create Tape Backups from the Flash Recovery Area

Determine Retention Policy and Backup Frequency

Configure the Size of the Flash Recovery Area Properly

In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites

During Backups, Use the Target Database Control File as the RMAN Repository

Regularly Check Database Files for Corruption

Periodically Test Recovery Procedures

Back Up the OCR to Tape or Offsite

Recommendations for Fast Application Failover

Configure Connection Descriptors for All Possible Production Instances

Use RAC Availability Notifications and Events

Use Transparent Application Failover If RAC Notification Is Not Feasible

New Connections

Existing Connections

LOAD_BALANCE Parameter in the Connection Descriptor

FAILOVER Parameter in the Connection Descriptor

SERVICE_NAME Parameter in the Connection Descriptor

RETRIES Parameter in the Connection Descriptor

DELAY Parameter in the Connection Descriptor

Configure Services

Configure CRS for High Availability

Configure Service Callouts to Notify Middle-Tier Applications and Clients

Publish Standby or Nonproduction Services

Publish Production Services

Part IV Managing a Highly Available Oracle Environment

8 Using Oracle Enterprise Manager for Monitoring and Detection

Overview of Monitoring and Detection for High Availability

Using Enterprise Manager for System Monitoring

Set Up Default Notification Rules for Each System

Use Database Target Views to Monitor Health, Availability, and Performance

Use Event Notifications to React to Metric Changes

Use Events to Monitor Data Guard System Availability

Managing the HA Environment with Enterprise Manager

Check Enterprise Manager Policy Violations

Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines

Use Enterprise Manager to Manage Data Guard Targets

Highly Available Architectures for Enterprise Manager

Recommendations for an HA Architecture for Enterprise Manager

Protect the Repository and Processes As Well as the Configuration They Monitor

Place the Management Repository in a RAC Instance and Use Data Guard

Configure At Least Two Management Service Processes and Load Balance Them

Consider Hosting Enterprise Manager on the Same Hardware as an HA System

Monitor the Network Bandwidth Between Processes and Agents

Unscheduled Outages for Enterprise Manager

Additional Enterprise Manager Configuration

Configure a Separate Listener for Enterprise Manager

Install the Management Repository Into an Existing Database

9 Recovering from Outages

Recovery Steps for Unscheduled Outages

Recovery Steps for Unscheduled Outages on the Primary Site

Recovery Steps for Unscheduled Outages on the Secondary Site

Recovery Steps for Scheduled Outages

Recovery Steps for Scheduled Outages on the Primary Site

Recovery Steps for Scheduled Outages on the Secondary Site

Preparing for Scheduled Secondary Site Maintenance

10 Detailed Recovery Steps

Summary of Recovery Operations

Complete or Partial Site Failover

Complete Site Failover

Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server

Database Failover

When to Use Data Guard Failover

When Not to Use Data Guard Failover

Data Guard Failover Using SQL*Plus

Physical Standby Failover Using SQL*Plus

Logical Standby Failover Using SQL*Plus

Database Switchover

When to Use Data Guard Switchover

When Not to Use Data Guard Switchover

Data Guard Switchover Using SQL*Plus

Physical Standby Switchover Using SQL*Plus

Logical Standby Switchover Using SQL*Plus

RAC Recovery

RAC Recovery for Unscheduled Outages

Automatic Instance Recovery for Failed Instances

Single Node Failure in Real Application Clusters

Multiple Node Failures in Real Application Clusters

Automatic Service Relocation

RAC Recovery for Scheduled Outages

Disabling CRS-Managed Resources

Planned Service Relocation

Apply Instance Failover

Performing an Apply Instance Failover Using SQL*Plus

Step 1: Ensure That the Chosen Standby Instance is Mounted

Step 2: Verify Oracle Net Connection to the Chosen Standby Host

Step 3: Start Recovery on the Chosen Standby Instance

Step 4: Copy Archived Redo Logs to the New Apply Host

Step 5: Verify the New Configuration

Recovery Solutions for Data Failures

Detecting and Recovering From Datafile Block Corruption

Detecting Datafile Block Corruption

Recovering From Datafile Block Corruption

Determine the Extent of the Corruption Problem

Replace or Move Away From Faulty Hardware

Determine Which Objects Are Affected

Decide Which Recovery Method to Use

Recovering From Media Failure

Determine the Extent of the Media Failure

Replace or Move Away From Faulty Hardware

Decide Which Recovery Action to Take

Recovery Methods for Data Failures

Use RMAN Datafile Media Recovery

Use RMAN Block Media Recovery

Re-Create Objects Manually

Use Data Guard to Recover From Data Failure

Recovering from User Error with Flashback Technology

Resolving Row and Transaction Inconsistencies

Flashback Query

Flashback Version Query

Flashback Transaction Query

Example: Using Flashback Technology to Investigate Salary Discrepancy

Resolving Table Inconsistencies

Flashback Table

Resolving Database-Wide Inconsistencies

Flashback Database

Using Flashback Database to Repair a Dropped Tablespace

RAC Rolling Upgrade

Applying a Patch with opatch

Rolling Back a Patch with opatch

Using opatch to List Installed Software Components and Patches

Recommended Practices for RAC Rolling Upgrades

Upgrade with Logical Standby Database

Online Object Reorganization

Online Table Reorganization

Online Index Reorganization

Online Tablespace Reorganization

11 Restoring Fault Tolerance

Restoring Full Fault Tolerance

Restoring Failed Nodes or Instances in a RAC Cluster

Recovering Service Availability

Considerations for Client Connections After Restoring a RAC Instance

Restoring the Standby Database After a Failover

Restoring a Physical Standby Database After a Failover

Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN

Step 2P: Flash Back the Previous Production Database

Step 3P: Mount New Standby Database From Previous Production Database

Step 4P: Archive to New Standby Database From New Production Database

Step 5P: Start Managed Recovery

Step 6P: Restart MRP After It Encounters the End-of-Redo Marker

Restoring a Logical Standby Database After a Failover

Step 1L: Retrieve END_PRIMARY_SCN

Step 2L: Flash Back the Previous Production Database

Step 3L: Open New Logical Standby Database and Start SQL Apply

Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage

Step 1: Start the Standby Database

Step 2: Start Recovery

Step 3: Verify Log Transport Services on Production Database

Step 4: Verify that Recovery is Progressing on Standby Database

Step 5: Restore Production Database Protection Mode

Restoring Fault Tolerance after a Standby Database Data Failure

Step 1: Fix the Cause of the Outage

Step 2: Restore the Backup of Affected Datafiles

Step 3: Restore Required Archived Redo Log Files

Step 4: Start the Standby Database

Step 5: Start Recovery or Apply

Step 6: Verify Log Transport Services On the Production Database

Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database

Step 8: Restore Production Database Protection Mode

Restoring Fault Tolerance After the Production Database Has Opened Resetlogs

Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production

Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production

Restoring Fault Tolerance after Dual Failures

A Hardware Assisted Resilient Data (HARD) Initiative

Preventing Data Corruptions with HARD-Compliant Storage

Data Corruptions

Types of Data Corruption Addressed by HARD

Possible HARD Checks

B Database SPFILE and Oracle Net Configuration File Samples

SPFILE Samples

Oracle Net Configuration Files

SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration

LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration

TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration

Index