Contents
- Audience
- Organization
- Related Documentation
- Conventions
- Documentation Accessibility
- Introduction to High Availability
- What is Availability?
- Importance of Availability
- Causes of Downtime
- What Does This Book Contain?
- Who Should Read This Book?
- Why It Is Important to Determine High Availability Requirements
- Analysis Framework for Determining High Availability Requirements
- Business Impact Analysis
- Cost of Downtime
- Recovery Time Objective
- Recovery Point Objective
- Choosing a High Availability Architecture
- HA Systems Capabilities
- Business Performance, Budget and Growth Plans
- High Availability Best Practices
- Oracle Real Application Clusters
- Oracle Data Guard
- Oracle Streams
- Online Reorganization
- Transportable Tablespaces
- Automatic Storage Management
- Flashback Technology
- Oracle Flashback Query
- Oracle Flashback Version Query
- Oracle Flashback Transaction Query
- Oracle Flashback Table
- Oracle Flashback Drop
- Oracle Flashback Database
- Dynamic Reconfiguration
- Oracle Fail Safe
- Recovery Manager
- Flash Recovery Area
- Hardware Assisted Resilient Data (HARD) Initiative
- Oracle Database High Availability Architectures
- "Database Only" Architecture
- "RAC Only" Architecture
- "Data Guard Only" Architecture
- Maximum Availability Architecture
- Streams Architecture
- Choosing the Correct HA Architecture
- Assessing Other Architectures
- Introduction to Operational Policies for High Availability
- Service Level Management for High Availability
- Planning Capacity to Promote High Availability
- Change Management for High Availability
- Backup and Recovery Planning for High Availability
- Disaster Recovery Planning
- Planning Scheduled Outages
- Staff Training for High Availability
- Documentation as a Means of Maintaining High Availability
- Physical Security Policies and Procedures for High Availability
- Overview of System Configuration Recommendations
- Recommendations for Configuring Storage
- Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
- Use an Array That Can Be Serviced Online
- Mirror and Stripe for Protection and Performance
- Load-Balance Across All Physical Interfaces
- Create Independent Storage Areas
- Storage Recommendations for Specific HA Architectures
- Define ASM Disk and Failure Groups Properly
- Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
- Storage Recommendation for RAC
- Protect the Oracle Cluster Registry and Voting Disk From Media Failure
- Recommendations for Configuring Server Hardware
- Server Hardware Recommendations for All Architectures
- Use Fewer, Faster, and Denser Components
- Use Redundant Hardware Components
- Use Systems That Can Detect and Isolate Failures
- Protect the Boot Disk With a Backup Copy
- Server Hardware Recommendations for RAC
- Use a Supported Cluster System to Run RAC
- Choose the Proper Cluster Interconnect
- Server Hardware Recommendations for Data Guard
- Use Identical Hardware for Every Machine at Both Sites
- Recommendations for Configuring Server Software
- Server Software Recommendations for All Architectures
- Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
- Use an Operating System That Is Fault-Tolerant to Hardware Failures
- Configure Swap Partitions Appropriately
- Set Operating System Parameters to Enable Future Growth
- Use Logging or Journal File Systems
- Mirror Disks That Contain Oracle and Application Software
- Server Software Recommendations for RAC
- Use Supported Clustering Software
- Use Network Time Protocol (NTP) On All Cluster Nodes
- Recommendations for Configuring the Network
- Network Configuration Best Practices for All Architectures
- Ensure That All Network Components Are Redundant
- Use Load Balancers to Distribute Incoming Requests
- Network Configuration Best Practices for RAC
- Classify Network Interfaces Using the Oracle Interface Configuration Tool
- Network Configuration Best Practices for Data Guard
- Configure System TCP Parameters Appropriately
- Use WAN Traffic Managers to Provide Site Failover Capabilities
- Configuration Best Practices for the Database
- Use Two Control Files
- Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
- Configure the Size of Redo Log Files and Groups Appropriately
- Multiplex Online Redo Log Files
- Enable ARCHIVELOG Mode
- Enable Block Checksums
- Enable Database Block Checking
- Log Checkpoints to the Alert Log
- Use Fast-Start Checkpointing to Control Instance Recovery Time
- Capture Performance Statistics About Timing
- Use Automatic Undo Management
- Use Locally Managed Tablespaces
- Use Automatic Segment Space Management
- Use Temporary Tablespaces and Specify a Default Temporary Tablespace
- Use Resumable Space Allocation
- Use a Flash Recovery Area
- Enable Flashback Database
- Set Up and Follow Security Best Practices
- Use the Database Resource Manager
- Use a Server Parameter File
- Configuration Best Practices for Real Application Clusters
- Register All Instances with Remote Listeners
- Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
- Configuration Best Practices for Data Guard
- Use a Simple, Robust Archiving Strategy and Configuration
- Use Multiplexed Standby Redo Logs and Configure Size Appropriately
- Enable FORCE LOGGING Mode
- Use Real Time Apply
- Configure the Database and Listener for Dynamic Service Registration
- Tune the Network in a WAN Environment
- Determine the Data Protection Mode
- Determining the Protection Mode
- Changing the Data Protection Mode
- Conduct a Performance Assessment with the Proposed Network Configuration
- Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
- Set SYNC=NOPARALLEL/PARALLEL Appropriately
- Use ARCH for the Greatest Performance Throughput
- Use the ASYNC Attribute with a 50 MB Buffer for Maximum Performance Mode
- Evaluate SSH Port Forwarding with Compression
- Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
- Provide Secure Transmission of Redo Data
- Set DB_UNIQUE_NAME
- Set LOG_ARCHIVE_CONFIG Correctly
- Recommendations for the Physical Standby Database Only
- Tune Media Recovery Performance
- Recommendations for the Logical Standby Database Only
- Use Supplemental Logging and Primary Key Constraints
- Set the MAX_SERVERS Initialization Parameter
- Increase the PARALLEL_MAX_SERVERS Initialization Parameter
- Set the TRANSACTION_CONSISTENCY Initialization Parameter
- Skip SQL Apply for Unnecessary Objects
- Configuration Best Practices for MAA
- Configure Multiple Standby Instances
- Configure Connect-Time Failover for Network Service Descriptors
- Recommendations for Backup and Recovery
- Use Recovery Manager to Back Up Database Files
- Understand When to Use Backups
- Perform Regular Backups
- Initial Data Guard Environment Set-Up
- Recovering from Data Failures Using File or Block Media Recovery
- Double Failure Resolution
- Long-Term Backups
- Use an RMAN Recovery Catalog
- Use the Autobackup Feature for the Control File and SPFILE
- Use Incrementally Updated Backups to Reduce Restoration Time
- Enable Change Tracking to Reduce Backup Time
- Create Database Backups on Disk in the Flash Recovery Area
- Create Tape Backups from the Flash Recovery Area
- Determine Retention Policy and Backup Frequency
- Configure the Size of the Flash Recovery Area Properly
- In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
- During Backups, Use the Target Database Control File as the RMAN Repository
- Regularly Check Database Files for Corruption
- Periodically Test Recovery Procedures
- Back Up the OCR to Tape or Offsite
- Recommendations for Fast Application Failover
- Configure Connection Descriptors for All Possible Production Instances
- Use RAC Availability Notifications and Events
- Use Transparent Application Failover If RAC Notification Is Not Feasible
- New Connections
- Existing Connections
- LOAD_BALANCE Parameter in the Connection Descriptor
- FAILOVER Parameter in the Connection Descriptor
- SERVICE_NAME Parameter in the Connection Descriptor
- RETRIES Parameter in the Connection Descriptor
- DELAY Parameter in the Connection Descriptor
- Configure Services
- Configure CRS for High Availability
- Configure Service Callouts to Notify Middle-Tier Applications and Clients
- Publish Standby or Nonproduction Services
- Publish Production Services
- Overview of Monitoring and Detection for High Availability
- Using Enterprise Manager for System Monitoring
- Set Up Default Notification Rules for Each System
- Use Database Target Views to Monitor Health, Availability, and Performance
- Use Event Notifications to React to Metric Changes
- Use Events to Monitor Data Guard System Availability
- Managing the HA Environment with Enterprise Manager
- Check Enterprise Manager Policy Violations
- Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
- Use Enterprise Manager to Manage Data Guard Targets
- Highly Available Architectures for Enterprise Manager
- Recommendations for an HA Architecture for Enterprise Manager
- Protect the Repository and Processes As Well as the Configuration They Monitor
- Place the Management Repository in a RAC Instance and Use Data Guard
- Configure At Least Two Management Service Processes and Load Balance Them
- Consider Hosting Enterprise Manager on the Same Hardware as an HA System
- Monitor the Network Bandwidth Between Processes and Agents
- Unscheduled Outages for Enterprise Manager
- Additional Enterprise Manager Configuration
- Configure a Separate Listener for Enterprise Manager
- Install the Management Repository Into an Existing Database
- Recovery Steps for Unscheduled Outages
- Recovery Steps for Unscheduled Outages on the Primary Site
- Recovery Steps for Unscheduled Outages on the Secondary Site
- Recovery Steps for Scheduled Outages
- Recovery Steps for Scheduled Outages on the Primary Site
- Recovery Steps for Scheduled Outages on the Secondary Site
- Preparing for Scheduled Secondary Site Maintenance
- Summary of Recovery Operations
- Complete or Partial Site Failover
- Complete Site Failover
- Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
- Database Failover
- When to Use Data Guard Failover
- When Not to Use Data Guard Failover
- Data Guard Failover Using SQL*Plus
- Physical Standby Failover Using SQL*Plus
- Logical Standby Failover Using SQL*Plus
- Database Switchover
- When to Use Data Guard Switchover
- When Not to Use Data Guard Switchover
- Data Guard Switchover Using SQL*Plus
- Physical Standby Switchover Using SQL*Plus
- Logical Standby Switchover Using SQL*Plus
- RAC Recovery
- RAC Recovery for Unscheduled Outages
- Automatic Instance Recovery for Failed Instances
- Single Node Failure in Real Application Clusters
- Multiple Node Failures in Real Application Clusters
- Automatic Service Relocation
- RAC Recovery for Scheduled Outages
- Disabling CRS-Managed Resources
- Planned Service Relocation
- Apply Instance Failover
- Performing an Apply Instance Failover Using SQL*Plus
- Step 1: Ensure That the Chosen Standby Instance is Mounted
- Step 2: Verify Oracle Net Connection to the Chosen Standby Host
- Step 3: Start Recovery on the Chosen Standby Instance
- Step 4: Copy Archived Redo Logs to the New Apply Host
- Step 5: Verify the New Configuration
- Recovery Solutions for Data Failures
- Detecting and Recovering From Datafile Block Corruption
- Detecting Datafile Block Corruption
- Recovering From Datafile Block Corruption
- Determine the Extent of the Corruption Problem
- Replace or Move Away From Faulty Hardware
- Determine Which Objects Are Affected
- Decide Which Recovery Method to Use
- Recovering From Media Failure
- Determine the Extent of the Media Failure
- Replace or Move Away From Faulty Hardware
- Decide Which Recovery Action to Take
- Recovery Methods for Data Failures
- Use RMAN Datafile Media Recovery
- Use RMAN Block Media Recovery
- Re-Create Objects Manually
- Use Data Guard to Recover From Data Failure
- Recovering from User Error with Flashback Technology
- Resolving Row and Transaction Inconsistencies
- Flashback Query
- Flashback Version Query
- Flashback Transaction Query
- Example: Using Flashback Technology to Investigate Salary Discrepancy
- Resolving Table Inconsistencies
- Flashback Table
- Flashback Drop
- Resolving Database-Wide Inconsistencies
- Flashback Database
- Using Flashback Database to Repair a Dropped Tablespace
- RAC Rolling Upgrade
- Applying a Patch with opatch
- Rolling Back a Patch with opatch
- Using opatch to List Installed Software Components and Patches
- Recommended Practices for RAC Rolling Upgrades
- Upgrade with Logical Standby Database
- Online Object Reorganization
- Online Table Reorganization
- Online Index Reorganization
- Online Tablespace Reorganization
- Restoring Full Tolerance
- Restoring Failed Nodes or Instances in a RAC Cluster
- Recovering Service Availability
- Considerations for Client Connections After Restoring a RAC Instance
- Restoring the Standby Database After a Failover
- Restoring a Physical Standby Database After a Failover
- Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
- Step 2P: Flash Back the Previous Production Database
- Step 3P: Mount New Standby Database From Previous Production Database
- Step 4P: Archive to New Standby Database From New Production Database
- Step 5P: Start Managed Recovery
- Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
- Restoring a Logical Standby Database After a Failover
- Step 1L: Retrieve END_PRIMARY_SCN
- Step 2L: Flash Back the Previous Production Database
- Step 3L: Open New Logical Standby Database and Start SQL Apply
- Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage
- Step 1: Start the Standby Database
- Step 2: Start Recovery
- Step 3: Verify Log Transport Services on Production Database
- Step 4: Verify that Recovery is Progressing on Standby Database
- Step 5: Restore Production Database Protection Mode
- Restoring Fault Tolerance after a Standby Database Data Failure
- Step 1: Fix the Cause of the Outage
- Step 2: Restore the Backup of Affected Datafiles
- Step 3: Restore Required Archived Redo Log Files
- Step 4: Start the Standby Database
- Step 5: Start Recovery or Apply
- Step 6: Verify Log Transport Services On the Production Database
- Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
- Step 8: Restore Production Database Protection Mode
- Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
- Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
- Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
- Restoring Fault Tolerance after Dual Failures
- Preventing Data Corruptions with HARD-Compliant Storage
- Data Corruptions
- Types of Data Corruption Addressed by HARD
- Possible HARD Checks
- SPFILE Samples
- Oracle Net Configuration Files
- SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
- LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
- TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration