Oracle® High Availability Architecture and Best Practices
10g Release 1 (10.1)

Part Number B10726-01

Contents

Title and Copyright Information

Send Us Your Comments

Preface

Audience
Organization
Related Documentation
Conventions
Documentation Accessibility

Part I Getting Started

1 Overview of High Availability

Introduction to High Availability
What is Availability?
Importance of Availability
Causes of Downtime
What Does This Book Contain?
Who Should Read This Book?

2 Determining Your High Availability Requirements

Why It Is Important to Determine High Availability Requirements
Analysis Framework for Determining High Availability Requirements
Business Impact Analysis
Cost of Downtime
Recovery Time Objective
Recovery Point Objective
Choosing a High Availability Architecture
HA Systems Capabilities
Business Performance, Budget and Growth Plans
High Availability Best Practices

Part II Oracle Database High Availability Features, Architectures, and Policies

3 Oracle Database High Availability Features

Oracle Real Application Clusters
Oracle Data Guard
Oracle Streams
Online Reorganization
Transportable Tablespaces
Automatic Storage Management
Flashback Technology
Oracle Flashback Query
Oracle Flashback Version Query
Oracle Flashback Transaction Query
Oracle Flashback Table
Oracle Flashback Drop
Oracle Flashback Database
Dynamic Reconfiguration
Oracle Fail Safe
Recovery Manager
Flash Recovery Area
Hardware Assisted Resilient Data (HARD) Initiative

4 High Availability Architectures

Oracle Database High Availability Architectures
"Database Only" Architecture
"RAC Only" Architecture
"Data Guard Only" Architecture
Maximum Availability Architecture
Streams Architecture
Choosing the Correct HA Architecture
Assessing Other Architectures

5 Operational Policies for High Availability

Introduction to Operational Policies for High Availability
Service Level Management for High Availability
Planning Capacity to Promote High Availability
Change Management for High Availability
Backup and Recovery Planning for High Availability
Disaster Recovery Planning
Planning Scheduled Outages
Staff Training for High Availability
Documentation as a Means of Maintaining High Availability
Physical Security Policies and Procedures for High Availability

Part III Configuring a Highly Available Oracle Environment

6 System and Network Configuration

Overview of System Configuration Recommendations
Recommendations for Configuring Storage
Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
Use an Array That Can Be Serviced Online
Mirror and Stripe for Protection and Performance
Load-Balance Across All Physical Interfaces
Create Independent Storage Areas
Storage Recommendations for Specific HA Architectures
Define ASM Disk and Failure Groups Properly
Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
Storage Recommendation for RAC
Protect the Oracle Cluster Registry and Voting Disk From Media Failure
Recommendations for Configuring Server Hardware
Server Hardware Recommendations for All Architectures
Use Fewer, Faster, and Denser Components
Use Redundant Hardware Components
Use Systems That Can Detect and Isolate Failures
Protect the Boot Disk With a Backup Copy
Server Hardware Recommendations for RAC
Use a Supported Cluster System to Run RAC
Choose the Proper Cluster Interconnect
Server Hardware Recommendations for Data Guard
Use Identical Hardware for Every Machine at Both Sites
Recommendations for Configuring Server Software
Server Software Recommendations for All Architectures
Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
Use an Operating System That Is Fault-Tolerant to Hardware Failures
Configure Swap Partitions Appropriately
Set Operating System Parameters to Enable Future Growth
Use Logging or Journal File Systems
Mirror Disks That Contain Oracle and Application Software
Server Software Recommendations for RAC
Use Supported Clustering Software
Use Network Time Protocol (NTP) On All Cluster Nodes
Recommendations for Configuring the Network
Network Configuration Best Practices for All Architectures
Ensure That All Network Components Are Redundant
Use Load Balancers to Distribute Incoming Requests
Network Configuration Best Practices for RAC
Classify Network Interfaces Using the Oracle Interface Configuration Tool
Network Configuration Best Practices for Data Guard
Configure System TCP Parameters Appropriately
Use WAN Traffic Managers to Provide Site Failover Capabilities

7 Oracle Configuration Best Practices

Configuration Best Practices for the Database
Use Two Control Files
Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
Configure the Size of Redo Log Files and Groups Appropriately
Multiplex Online Redo Log Files
Enable ARCHIVELOG Mode
Enable Block Checksums
Enable Database Block Checking
Log Checkpoints to the Alert Log
Use Fast-Start Checkpointing to Control Instance Recovery Time
Capture Performance Statistics About Timing
Use Automatic Undo Management
Use Locally Managed Tablespaces
Use Automatic Segment Space Management
Use Temporary Tablespaces and Specify a Default Temporary Tablespace
Use Resumable Space Allocation
Use a Flash Recovery Area
Enable Flashback Database
Set Up and Follow Security Best Practices
Use the Database Resource Manager
Use a Server Parameter File
Configuration Best Practices for Real Application Clusters
Register All Instances with Remote Listeners
Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
Configuration Best Practices for Data Guard
Use a Simple, Robust Archiving Strategy and Configuration
Use Multiplexed Standby Redo Logs and Configure Size Appropriately
Enable FORCE LOGGING Mode
Use Real Time Apply
Configure the Database and Listener for Dynamic Service Registration
Tune the Network in a WAN Environment
Determine the Data Protection Mode
Determining the Protection Mode
Changing the Data Protection Mode
Conduct a Performance Assessment with the Proposed Network Configuration
Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
Set SYNC=NOPARALLEL/PARALLEL Appropriately
Use ARCH for the Greatest Performance Throughput
Use the ASYNC Attribute with a 50 MB Buffer for Maximum Performance Mode
Evaluate SSH Port Forwarding with Compression
Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
Provide Secure Transmission of Redo Data
Set DB_UNIQUE_NAME
Set LOG_ARCHIVE_CONFIG Correctly
Recommendations for the Physical Standby Database Only
Tune Media Recovery Performance
Recommendations for the Logical Standby Database Only
Use Supplemental Logging and Primary Key Constraints
Set the MAX_SERVERS Initialization Parameter
Increase the PARALLEL_MAX_SERVERS Initialization Parameter
Set the TRANSACTION_CONSISTENCY Initialization Parameter
Skip SQL Apply for Unnecessary Objects
Configuration Best Practices for MAA
Configure Multiple Standby Instances
Configure Connect-Time Failover for Network Service Descriptors
Recommendations for Backup and Recovery
Use Recovery Manager to Back Up Database Files
Understand When to Use Backups
Perform Regular Backups
Initial Data Guard Environment Set-Up
Recovering from Data Failures Using File or Block Media Recovery
Double Failure Resolution
Long-Term Backups
Use an RMAN Recovery Catalog
Use the Autobackup Feature for the Control File and SPFILE
Use Incrementally Updated Backups to Reduce Restoration Time
Enable Change Tracking to Reduce Backup Time
Create Database Backups on Disk in the Flash Recovery Area
Create Tape Backups from the Flash Recovery Area
Determine Retention Policy and Backup Frequency
Configure the Size of the Flash Recovery Area Properly
In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
During Backups, Use the Target Database Control File as the RMAN Repository
Regularly Check Database Files for Corruption
Periodically Test Recovery Procedures
Back Up the OCR to Tape or Offsite
Recommendations for Fast Application Failover
Configure Connection Descriptors for All Possible Production Instances
Use RAC Availability Notifications and Events
Use Transparent Application Failover If RAC Notification Is Not Feasible
New Connections
Existing Connections
LOAD_BALANCE Parameter in the Connection Descriptor
FAILOVER Parameter in the Connection Descriptor
SERVICE_NAME Parameter in the Connection Descriptor
RETRIES Parameter in the Connection Descriptor
DELAY Parameter in the Connection Descriptor
Configure Services
Configure CRS for High Availability
Configure Service Callouts to Notify Middle-Tier Applications and Clients
Publish Standby or Nonproduction Services
Publish Production Services

Part IV Managing a Highly Available Oracle Environment

8 Using Oracle Enterprise Manager for Monitoring and Detection

Overview of Monitoring and Detection for High Availability
Using Enterprise Manager for System Monitoring
Set Up Default Notification Rules for Each System
Use Database Target Views to Monitor Health, Availability, and Performance
Use Event Notifications to React to Metric Changes
Use Events to Monitor Data Guard System Availability
Managing the HA Environment with Enterprise Manager
Check Enterprise Manager Policy Violations
Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
Use Enterprise Manager to Manage Data Guard Targets
Highly Available Architectures for Enterprise Manager
Recommendations for an HA Architecture for Enterprise Manager
Protect the Repository and Processes As Well as the Configuration They Monitor
Place the Management Repository in a RAC Instance and Use Data Guard
Configure At Least Two Management Service Processes and Load Balance Them
Consider Hosting Enterprise Manager on the Same Hardware as an HA System
Monitor the Network Bandwidth Between Processes and Agents
Unscheduled Outages for Enterprise Manager
Additional Enterprise Manager Configuration
Configure a Separate Listener for Enterprise Manager
Install the Management Repository Into an Existing Database

9 Recovering from Outages

Recovery Steps for Unscheduled Outages
Recovery Steps for Unscheduled Outages on the Primary Site
Recovery Steps for Unscheduled Outages on the Secondary Site
Recovery Steps for Scheduled Outages
Recovery Steps for Scheduled Outages on the Primary Site
Recovery Steps for Scheduled Outages on the Secondary Site
Preparing for Scheduled Secondary Site Maintenance

10 Detailed Recovery Steps

Summary of Recovery Operations
Complete or Partial Site Failover
Complete Site Failover
Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
Database Failover
When to Use Data Guard Failover
When Not to Use Data Guard Failover
Data Guard Failover Using SQL*Plus
Physical Standby Failover Using SQL*Plus
Logical Standby Failover Using SQL*Plus
Database Switchover
When to Use Data Guard Switchover
When Not to Use Data Guard Switchover
Data Guard Switchover Using SQL*Plus
Physical Standby Switchover Using SQL*Plus
Logical Standby Switchover Using SQL*Plus
RAC Recovery
RAC Recovery for Unscheduled Outages
Automatic Instance Recovery for Failed Instances
Single Node Failure in Real Application Clusters
Multiple Node Failures in Real Application Clusters
Automatic Service Relocation
RAC Recovery for Scheduled Outages
Disabling CRS-Managed Resources
Planned Service Relocation
Apply Instance Failover
Performing an Apply Instance Failover Using SQL*Plus
Step 1: Ensure That the Chosen Standby Instance is Mounted
Step 2: Verify Oracle Net Connection to the Chosen Standby Host
Step 3: Start Recovery on the Chosen Standby Instance
Step 4: Copy Archived Redo Logs to the New Apply Host
Step 5: Verify the New Configuration
Recovery Solutions for Data Failures
Detecting and Recovering From Datafile Block Corruption
Detecting Datafile Block Corruption
Recovering From Datafile Block Corruption
Determine the Extent of the Corruption Problem
Replace or Move Away From Faulty Hardware
Determine Which Objects Are Affected
Decide Which Recovery Method to Use
Recovering From Media Failure
Determine the Extent of the Media Failure
Replace or Move Away From Faulty Hardware
Decide Which Recovery Action to Take
Recovery Methods for Data Failures
Use RMAN Datafile Media Recovery
Use RMAN Block Media Recovery
Re-Create Objects Manually
Use Data Guard to Recover From Data Failure
Recovering from User Error with Flashback Technology
Resolving Row and Transaction Inconsistencies
Flashback Query
Flashback Version Query
Flashback Transaction Query
Example: Using Flashback Technology to Investigate Salary Discrepancy
Resolving Table Inconsistencies
Flashback Table
Flashback Drop
Resolving Database-Wide Inconsistencies
Flashback Database
Using Flashback Database to Repair a Dropped Tablespace
RAC Rolling Upgrade
Applying a Patch with opatch
Rolling Back a Patch with opatch
Using opatch to List Installed Software Components and Patches
Recommended Practices for RAC Rolling Upgrades
Upgrade with Logical Standby Database
Online Object Reorganization
Online Table Reorganization
Online Index Reorganization
Online Tablespace Reorganization

11 Restoring Fault Tolerance

Restoring Full Tolerance
Restoring Failed Nodes or Instances in a RAC Cluster
Recovering Service Availability
Considerations for Client Connections After Restoring a RAC Instance
Restoring the Standby Database After a Failover
Restoring a Physical Standby Database After a Failover
Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
Step 2P: Flash Back the Previous Production Database
Step 3P: Mount New Standby Database From Previous Production Database
Step 4P: Archive to New Standby Database From New Production Database
Step 5P: Start Managed Recovery
Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
Restoring a Logical Standby Database After a Failover
Step 1L: Retrieve END_PRIMARY_SCN
Step 2L: Flash Back the Previous Production Database
Step 3L: Open New Logical Standby Database and Start SQL Apply
Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage
Step 1: Start the Standby Database
Step 2: Start Recovery
Step 3: Verify Log Transport Services on Production Database
Step 4: Verify that Recovery is Progressing on Standby Database
Step 5: Restore Production Database Protection Mode
Restoring Fault Tolerance after a Standby Database Data Failure
Step 1: Fix the Cause of the Outage
Step 2: Restore the Backup of Affected Datafiles
Step 3: Restore Required Archived Redo Log Files
Step 4: Start the Standby Database
Step 5: Start Recovery or Apply
Step 6: Verify Log Transport Services On the Production Database
Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
Step 8: Restore Production Database Protection Mode
Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
Restoring Fault Tolerance after Dual Failures

A Hardware Assisted Resilient Data (HARD) Initiative

Preventing Data Corruptions with HARD-Compliant Storage
Data Corruptions
Types of Data Corruption Addressed by HARD
Possible HARD Checks

B Database SPFILE and Oracle Net Configuration File Samples

SPFILE Samples
Oracle Net Configuration Files
SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration

Index