8 Fast Connection Failover

The Fast Connection Failover mechanism depends on the Implicit Connection Cache documented in Chapter 7, "Implicit Connection Caching". As a result, for Fast Connection Failover to be available, implicit connection caching must be enabled.

This chapter is divided into the following sections:

Introduction
Using Fast Connection Failover
Understanding Fast Connection Failover
Comparison of Fast Connection Failover and TAF

Introduction

Fast Connection Failover offers a driver-independent way for your JDBC application to take advantage of the connection failover facilities offered by Oracle Database 10g Release 1 (10.1). The advantages of Fast Connection Failover include:

Driver independence. Fast Connection Failover supports both the Thin and OCI JDBC drivers.
Integration with the Implicit Connection Cache. The two features work together synergistically to improve application performance and high availability.
Integration with RAC for superior RAC/HA event notification mechanisms.
Easy integration with application code. You simply enable Fast Connection Failover and forget it.

What Can Fast Connection Failover Do?

Rapid detection and cleanup of invalid cached connections (DOWN event processing).
Load balancing of available connections (UP event processing).
Runtime work request distribution to all active RAC instances.

Using Fast Connection Failover

Applications manage fast connection failover through DataSource instances.

Fast Connection Failover Prerequisites

Fast connection failover is available under the following circumstances:

The implicit connection cache is enabled. Fast Connection Failover works in conjunction with the JDBC connection caching mechanism. This helps applications manage connections to ensure high availability.
The application uses service names to connect to the database; the application cannot use system identifiers (SIDs).
The underlying database has 10g Release 1 (10.1) Real Application Clusters (RAC) capability. If failover events are not propagated, connection failover cannot occur.
Oracle Notification Service (ONS) is configured and available on the node where JDBC is running. JDBC depends on ONS to propagate database events and notify JDBC of them.
The JVM in which your JDBC instance is running must have oracle.ons.oraclehome set to point to your ORACLE_HOME.

Configuring ONS For Fast Connection Failover

In order for Fast Connection Failover to work, you must configure Oracle Notification Service (ONS) correctly. ONS is shipped as part of 10g Release 1 (10.1).

ONS Configuration File

ONS configuration is controlled by the ONS configuration file, ORACLE_HOME/opmn/conf/ons.config. This file tells the ONS daemon details about how it should behave and who it should talk to. Configuration information within ons.config is defined in simple name/value pairs. There are three values that should always be configured within ons.config. The first is localport, the port that ONS binds to on the localhost interface to talk to local clients. An example of the localport configuration is:

localport=4100

The second value is remoteport, the port that ONS binds to on all interfaces for talking to other ONS daemons. An example of the remoteport configuration is:

remoteport=4200

The third value specifies nodes, a list of other ONS daemons to talk to. Node values are given as a comma-separated list of either hostnames or IP addresses plus ports. Note that the port value that is given is the remote port that each ONS instance is listening on. In order to maintain an identical file on all nodes, the host:port of the current ONS node can also be listed in the nodes list. It will be ignored when reading the list.

The nodes listed in the nodes line correspond to the individual nodes in the RAC cluster. Listing the nodes ensures that the mid-tier node can communicate with the RAC nodes. At least one mid-tier node and one node in the RAC cluster must be configured to see one another. As long as one node on each side is aware of the other, all nodes are visible. You need not list every single cluster and mid-tier node in each RAC node's ONS config file. In particular, if one RAC cluster node is aware of the mid-tier, all nodes in the cluster are aware of it.

An example of the nodes configuration is:

nodes=myhost.example.com:4200,123.123.123.123:4200

There are also several optional values that can be provided in ons.config. The first optional value is loglevel. This specifies the level of messages that should be logged by ONS. This value is an integer that ranges from 1 (fewest messages logged) to 9 (most messages logged, use only for debugging purposes). The default value is 3. An example is:

loglevel=3

The second optional value is a logfile name. This specifies a log file that ONS should use for logging messages. The default value for logfile is $ORACLE_HOME/opmn/logs/ons.log. An example is:

logfile=/private/oraclehome/opmn/logs/myons.log

The third optional value is a walletfile name. A wallet file is used by the Oracle SSL layer to store SSL certificates. If a wallet file is specified to ONS, it will use SSL when communicating with other ONS instances and require SSL certificate authentication from all ONS instances that try to connect to it. This means that if you want to turn on SSL for one ONS instance, you must turn it on for all instances that are connected. This value should point to the directory where your ewallet.p12 file is located. An example is:

walletfile=/private/oraclehome/opmn/conf/ssl.wlt/default

One optional value is reserved for use on the server side. useocr=on is used to tell ONS to store all RAC nodes and port numbers in Oracle Cluster Registry (OCR) instead of in the ONS configuration file. Do not use this option on the client side.

The ons.config file allows blank lines and comments on lines that begin with #.
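The rules above (name/value pairs, three values that should always be present, a comma-separated nodes list, blank lines and # comments allowed) can be checked before starting the daemon. The following is a minimal sketch only: OnsConfigCheck is a hypothetical helper, not part of ONS, and it assumes that java.util.Properties reads the file the same way ONS does. Properties is more lenient than that (for example, it also treats ! as a comment marker and \ as an escape character), so treat the result as a sanity check rather than a definitive validation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of ONS: checks ons.config text for the
// three values this chapter says should always be configured.
class OnsConfigCheck {
    static final String[] REQUIRED = {"localport", "remoteport", "nodes"};

    // Parses name=value pairs; Properties skips blank lines and # comments.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Returns the required names that are missing or empty.
    static List<String> missing(Properties p) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = p.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    // Splits the nodes value into its host:port entries.
    static List<String> nodes(Properties p) {
        List<String> out = new ArrayList<>();
        for (String entry : p.getProperty("nodes", "").split(",")) {
            if (!entry.trim().isEmpty()) {
                out.add(entry.trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "# example ons.config\n\n"
            + "localport=4100\n"
            + "remoteport=4200\n"
            + "nodes=racnode1.example.com:4200,racnode2.example.com:4200\n";
        Properties p = parse(sample);
        System.out.println("missing required values: " + missing(p));
        System.out.println("nodes: " + nodes(p));
    }
}
```

In practice you would read the text from ORACLE_HOME/opmn/conf/ons.config instead of the inline sample used here.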

Client-side ONS Configuration

On the client side, you must configure the RAC nodes in the ONS configuration file. A sample configuration file might look like this:

Example 8-1 ons.config file

# This is an example ons.config file
#
# The first three values are required
localport=4100
remoteport=4200
nodes=racnode1.example.com:4200,racnode2.example.com:4200

After configuring ONS, you start the ONS daemon with the onsctl command. It is the user's responsibility to make sure that an ONS daemon is running at all times.

Using the onsctl Command

After configuring, you use ORACLE_HOME/opmn/bin/onsctl to start, stop, reconfigure, and monitor the ONS daemon. Table 8-1 is a summary of the commands that onsctl supports.

Table 8-1 onsctl commands

Command	Effect	Output
`start`	starts the ONS daemon	onsctl: ons started
`stop`	stops the ONS daemon	onsctl: shutting down ons daemon...
`ping`	Verifies whether the ONS daemon is running	ons is running ...
`reconfig`	triggers a reload of the ONS configuration without shutting down the ONS daemon
`help`	prints a help summary message for onsctl
`detailed`	prints a detailed help message for onsctl

Server-side ONS Configuration Using racgons

You configure the server side by using racgons to add the mid-tier node information to OCR. This command is found in ORA_CRS_HOME/bin/racgons. Before using racgons, you must edit ons.config to set useocr=on.

The mid-tier nodes should be configured in OCR so that all nodes share the configuration and, no matter which RAC nodes are up, they can communicate with the mid-tier. When running on a cluster, always configure the ONS hosts and ports by using racgons rather than the ONS configuration files. The racgons command stores the ONS hosts and ports in OCR, where every node can see them. That way, you do not need to edit a file on every node to change the configuration; you run a single command on one of the cluster nodes.

The racgons command allows you to specify hosts and ports on one node, then propagate your changes among all nodes in a cluster. The command takes two forms:

racgons add_config hostname:port [hostname:port] [hostname:port] ...
racgons remove_config hostname[:port] [hostname:port] [hostname:port] ...

The add_config version adds the listed hostnames; the remove_config version removes them. Both commands propagate the changes among all instances in a cluster.

With remove_config, if multiple port numbers are configured for a host, only the specified port number is removed for that hostname. If only the hostname is specified, all port numbers for that host are removed.
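The add and remove rules can be illustrated with a small in-memory model. This is a sketch under stated assumptions: OnsHostPortModel and its methods are hypothetical names chosen for illustration, the hostnames are made up, and the class neither calls racgons nor touches OCR. It only mirrors the behavior described above, where removing hostname:port drops one port and removing a bare hostname drops every port for that host.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the add_config/remove_config rules. Hypothetical class:
// it does not invoke racgons and does not read or write OCR.
class OnsHostPortModel {
    private final Map<String, Set<Integer>> ports = new TreeMap<>();

    // Mirrors "add_config hostname:port".
    void add(String hostPort) {
        int i = hostPort.lastIndexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("expected hostname:port, got " + hostPort);
        }
        ports.computeIfAbsent(hostPort.substring(0, i), h -> new TreeSet<>())
             .add(Integer.parseInt(hostPort.substring(i + 1)));
    }

    // Mirrors "remove_config hostname[:port]".
    void remove(String hostOrHostPort) {
        int i = hostOrHostPort.lastIndexOf(':');
        if (i < 0) {
            // Hostname only: every port for that host is removed.
            ports.remove(hostOrHostPort);
            return;
        }
        String host = hostOrHostPort.substring(0, i);
        Set<Integer> configured = ports.get(host);
        if (configured == null) {
            return;
        }
        // hostname:port: only the named port is removed.
        configured.remove(Integer.parseInt(hostOrHostPort.substring(i + 1)));
        if (configured.isEmpty()) {
            ports.remove(host);
        }
    }

    Map<String, Set<Integer>> view() {
        return ports;
    }

    public static void main(String[] args) {
        OnsHostPortModel model = new OnsHostPortModel();
        model.add("midtier1.example.com:4200");
        model.add("midtier1.example.com:4300");
        model.remove("midtier1.example.com:4300");
        System.out.println(model.view());
    }
}
```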

Other Uses of racgons

You should run racgons whenever you add a new node to the cluster.

Enabling Fast Connection Failover

An application enables fast connection failover by invoking setFastConnectionFailoverEnabled(true) on a DataSource instance before retrieving any connections from that instance.

You cannot enable Fast Connection Failover when reinitializing a connection cache; you must enable it before using the OracleDataSource.

Note:

After a cache is Fast Connection Failover-enabled, you cannot disable Fast Connection Failover during the lifetime of that cache.

To enable fast connection failover, you must:

Configure and start ONS. If ONS is not correctly set up, implicit connection cache creation fails and an ONSException is thrown at the first getConnection() request. See "Configuring ONS For Fast Connection Failover".
Set the FastConnectionFailoverEnabled property before making the first getConnection() request to an OracleDataSource. When Fast Connection Failover is enabled, the failover applies to all connections in the connection cache. If your application explicitly creates a connection cache using the Connection Cache Manager, you must set FastConnectionFailoverEnabled before retrieving any connections.
Use a service name rather than a SID when setting the OracleDataSource url property.

Example 8-2 Enabling Fast Connection Failover

// declare datasource (ods and ds are OracleDataSource, ctx is a JNDI Context)
ods.setUrl(
  "jdbc:oracle:oci:@(DESCRIPTION="
  + "(ADDRESS=(PROTOCOL=TCP)(HOST=cluster_alias)(PORT=1521))"
  + "(CONNECT_DATA=(SERVICE_NAME=service_name)))");
ods.setUser("scott");
ods.setConnectionCachingEnabled(true);
// must be set before the first getConnection() request
ods.setFastConnectionFailoverEnabled(true);
ctx.bind("myDS", ods);
ds = (OracleDataSource) ctx.lookup("myDS");
try {
  ds.getConnection();  // transparently creates and accesses cache
}
catch (SQLException se) {
  // handle connection errors here
}
catch (ONSException ons) {
  // Work can continue here, but cache is not FCF enabled
}
...

Querying Fast Connection Failover Status

An application determines whether fast connection failover is enabled by calling OracleDataSource.getFastConnectionFailoverEnabled(), which returns true if failover is enabled, false otherwise.

Understanding Fast Connection Failover

After Fast Connection Failover is enabled, the mechanism is automatic; no application intervention is needed. This section discusses how a connection failover is presented to an application and what steps the application takes to recover.

What The Application Sees

When a RAC service failure is propagated to the JDBC application, the database has already rolled back the local transaction. The cache manager then cleans up all invalid connections. When an application holding an invalid connection tries to do work through that connection, it receives a SQLException ORA-17008, Closed Connection.

When an application receives a Closed Connection error message, it should:

Retry the connection request. This is essential, because the old connection is no longer open.
Replay the transaction. All work done before the connection was closed has been lost.

Note:

The application should not try to roll back the transaction; the transaction was already rolled back in the database by the time the application received the exception.
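The recovery steps above (request a new connection, replay the transaction, do not roll back) can be wrapped in a small retry helper. The following is a sketch, not a prescribed pattern: FcfRetry, Work, and withRetry are hypothetical names, and the code assumes that the driver reports the Closed Connection error as vendor code 17008 through SQLException.getErrorCode().

```java
import java.sql.SQLException;

// Sketch of application-level recovery after a Closed Connection error.
// FcfRetry, Work, and withRetry are hypothetical names for illustration.
class FcfRetry {
    // Assumption: ORA-17008 surfaces as vendor code 17008 in getErrorCode().
    static final int CLOSED_CONNECTION = 17008;

    // One complete unit of work: obtain a fresh connection, run the whole
    // transaction, commit, and close the connection.
    interface Work<T> {
        T run() throws SQLException;
    }

    static <T> T withRetry(Work<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (e.getErrorCode() != CLOSED_CONNECTION) {
                    throw e; // unrelated error: leave it to the caller
                }
                // No rollback here: the database has already rolled back.
                last = e;
            }
        }
        throw last; // every attempt hit a closed connection
    }
}
```

The Work passed to withRetry should call getConnection() on the data source itself, so that each attempt obtains a new connection from the cache and replays the transaction from the beginning rather than reusing the invalid connection.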

What's Happening

Under Fast Connection Failover, each connection in the cache maintains a mapping to a service, instance, database, and hostname.

When a database generates a RAC event, that event is forwarded to the virtual machine in which JDBC is running. A daemon thread inside the virtual machine receives the RAC event and passes it on to the connection cache manager. The connection cache manager then throws SQL exceptions to the applications affected by the RAC event.

A typical failover scenario might work like this:

A database instance fails, leaving several stale connections in the cache.
The RAC mechanism in the database generates a RAC event which is sent to the virtual machine containing JDBC.
The daemon thread inside the virtual machine finds all the connections affected by the RAC event, notifies them of the closed connection via SQL exceptions, and rolls back any open transactions.
Each individual connection receives a SQL exception and must retry.
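The scenario above can be pictured with a toy model of the cache. This is an illustration only and not Oracle's implementation: FailoverCacheModel, CachedConnection, and onInstanceDown are hypothetical names, the event is reduced to a database name and an instance name, and an IllegalStateException stands in for the SQL exception that the application would receive.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of DOWN event processing; not Oracle's implementation.
class FailoverCacheModel {
    // Each cached connection keeps the mapping described in this section.
    static final class CachedConnection {
        final String service, instance, database, host;
        private boolean valid = true;

        CachedConnection(String service, String instance, String database, String host) {
            this.service = service;
            this.instance = instance;
            this.database = database;
            this.host = host;
        }

        // Stands in for any JDBC call made through the connection.
        void doWork() {
            if (!valid) {
                throw new IllegalStateException("Closed Connection");
            }
        }
    }

    private final List<CachedConnection> cache = new ArrayList<>();

    CachedConnection add(String service, String instance, String database, String host) {
        CachedConnection c = new CachedConnection(service, instance, database, host);
        cache.add(c);
        return c;
    }

    // Handles a DOWN event for one instance: marks the affected connections
    // invalid, removes them from the cache, and returns how many were hit.
    int onInstanceDown(String database, String instance) {
        int invalidated = 0;
        for (CachedConnection c : cache) {
            if (c.database.equals(database) && c.instance.equals(instance)) {
                c.valid = false;
                invalidated++;
            }
        }
        cache.removeIf(c -> !c.valid);
        return invalidated;
    }

    int size() {
        return cache.size();
    }
}
```

An application that still holds one of the invalidated connections sees the failure only when it next tries to use it, which is the point at which it must retry.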

Comparison of Fast Connection Failover and TAF

Fast Connection Failover differs from TAF in the following ways:

Application-Level Connection Retries. Fast Connection Failover supports application-level connection retries. This gives the application control of responding to connection failovers: the application can choose whether to retry the connection or to rethrow the exception. TAF supports connection retries only at the OCI/Net layer.
Integration with the Connection Cache. Fast Connection Failover is well-integrated with the Implicit Connection Cache, which allows the connection cache manager to manage the cache for high availability. For example, failed connections are automatically invalidated in the cache. TAF works at the network level on a per-connection basis, which means that the connection cache cannot be notified of failures.
Event-Based. Fast Connection Failover is based on the RAC event mechanism. This means that Fast Connection Failover is efficient and detects failures quickly for both active and inactive connections. TAF is based on the network call mechanism.
Load-Balancing Support. Fast Connection Failover supports UP event load balancing of connections and runtime work request distribution across active RAC instances.

Note:
We do not recommend using Transparent Application Failover (TAF) and Fast Connection Failover in the same application.