Oracle Parallel Server Getting Started 
Release 8.0.5 for Windows NT 
A64425-01
 

B
Troubleshooting

Specific topics covered in this chapter are:

Cluster Configuration Tips

A large fraction of the cluster problems reported to Oracle Corporation are due to incorrect cluster configuration, particularly of the Cluster Manager (CM) and interconnect components.

The information in this section is based on Oracle Corporation's reference implementation of the cluster Operating System Dependent (OSD) modules. Consequently, some of this information may not be applicable to your particular cluster environment.

Additional Information:  

Consult your hardware vendor for more details about installing and configuring your particular cluster configuration.

 
 

Note:  

The registry instructions in this section assume REGEDT32, not REGEDIT. 


 
 

Cluster Software

Make sure all nodes have exactly the same cluster OSD software installed, as well as the same registry configuration. To verify the software, check that the OSD files on every node have the same time stamps and file sizes.
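One way to carry out this comparison is to record the name, size, and modification time of each OSD file on every node and then compare the records. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes that you run build_manifest on each node against the directory that holds the OSD files and bring the results together yourself; both function names are hypothetical.

```python
import os

def build_manifest(directory):
    """Map each file name in `directory` to (size in bytes, modification time)."""
    manifest = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            manifest[entry.name] = (info.st_size, int(info.st_mtime))
    return manifest

def diff_manifests(reference, other):
    """Return the sorted names of files that are missing on one node
    or whose size or time stamp differs between the two nodes."""
    names = set(reference) | set(other)
    return sorted(n for n in names if reference.get(n) != other.get(n))
```

An empty list from diff_manifests means the two nodes agree on every file in the directory.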

CM Configuration

Typically, each node in a cluster has at least two network cards, one for the corporate network and one for the cluster interconnect. A computer, however, can have only one host name associated with it. To get around this problem, a separate host name can be assigned to the computer just for the cluster interconnect.

To specify a host name for the cluster interconnect:

  1. For each node, ping the host name. For example:

    C:\> PING OPS1-NT.US.ORACLE.COM

    A message similar to the one below appears:

    Reply from 144.25.188.247: bytes=32 time<10ms TTL=126

    The IP address returned is for the corporate network, not the cluster interconnect.

  2. For each node, determine which Ethernet card will be used for the cluster interconnect by entering:

    C:\> IPCONFIG /ALL

    The output looks similar to the sample shown below:

    Windows NT IP Configuration 
     
                  Host Name . . . . . . . . . : ops1-nt.us.oracle.com 
     
    Ethernet adapter El90x1: 
     
                  Description . . . . . . . . : 3Com 3C90x Ethernet Adapter 
                  IP Address. . . . . . . . . : 144.25.188.247 
     
    Ethernet adapter CpqNF31: 
     
                  Description . . . . . . . . : Compaq NetFlex-3 Driver 
                  IP Address. . . . . . . . . : 144.25.190.247

    In this case, the first interface is used for the corporate network, while the second interface (144.25.190.247) is the one intended for the cluster interconnect.

  3. Specify a new host name for each node's interconnect IP address in the HOSTS file (SYSTEMROOT\SYSTEM32\DRIVERS\ETC\HOSTS). For example:

    144.25.190.247 ops1-ipc 
    144.25.190.248 ops2-ipc 
    144.25.190.249 ops3-ipc 
    144.25.190.250 ops4-ipc

The HOSTS file should have one entry for each node's interconnect, and should be copied to all nodes of the cluster so that they can see each other. To verify that they can see each other, ping each host from each node. For example:

C:\> PING OPS3-IPC
  • For each node, ensure the DefinedNodes value is specified in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. DefinedNodes specifies the member nodes in the cluster. For example:

    DefinedNodes: REG_MULTI_SZ: ops1-ipc ops2-ipc ops3-ipc ops4-ipc

    Note:  

    DefinedNodes must be of value class REG_MULTI_SZ, and each host name entry must be entered on a separate line in the Multi-String Editor dialog box. 


     
     
  • For each node, ensure the CmHostName value is specified in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. CmHostName specifies the node's interconnect host name. For example:

    CmHostName: REG_SZ: ops1-ipc
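The HOSTS entries and the two registry values have to agree with one another, and a mismatch is easy to overlook. The following Python sketch shows one way to cross-check them. It is illustrative only: the function names are hypothetical, the registry values are passed in by hand rather than read from the registry, and the rule that CmHostName should appear in DefinedNodes is an assumption, not something this guide states.

```python
def parse_hosts(text):
    """Parse HOSTS-file text into a {host name: IP address} dictionary."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            hosts[name.lower()] = ip
    return hosts

def check_cm_settings(hosts, defined_nodes, cm_host_name):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    for node in defined_nodes:
        if node.lower() not in hosts:
            problems.append(f"{node} is in DefinedNodes but not in HOSTS")
    # Assumption: the node's own interconnect name is one of the defined nodes.
    if cm_host_name.lower() not in (n.lower() for n in defined_nodes):
        problems.append(f"CmHostName {cm_host_name} is not in DefinedNodes")
    return problems

HOSTS_TEXT = """\
144.25.190.247 ops1-ipc
144.25.190.248 ops2-ipc
144.25.190.249 ops3-ipc
144.25.190.250 ops4-ipc
"""
hosts = parse_hosts(HOSTS_TEXT)
nodes = ["ops1-ipc", "ops2-ipc", "ops3-ipc", "ops4-ipc"]
print(check_cm_settings(hosts, nodes, "ops1-ipc"))  # prints []
```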

    Cluster Configuration Verification

    To verify your cluster configuration:

    1. Start PGMS on each node.
    2. Check the bottom of the PGMS.LOG file, stored in SYSTEMROOT\SYSTEM32\PGMS.LOG, to ensure that each time a node is brought up, PGMS reconfigures with the correct number of nodes. For example, if two nodes are up, the following should be in the log file:

      15:06:46 | MESSAGE | 006f | HandleReconfig(): Reconfig OK - nodes(2) 
      rcfgGen(5) master(0)

    3. If you are unable to bring up PGMS, check your cluster configuration to make sure that it is correct.
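The node count in the last reconfiguration message can also be read programmatically. The following Python sketch assumes the log contains messages in exactly the format of the sample above; the function name is hypothetical and the code is not part of the Oracle software.

```python
import re

# Matches the node count in a "Reconfig OK" message from PGMS.LOG.
RECONFIG = re.compile(r"Reconfig OK - nodes\((\d+)\)")

def last_reconfig_node_count(log_text):
    """Return the node count from the last 'Reconfig OK' message,
    or None if the log contains no such message."""
    matches = RECONFIG.findall(log_text)
    return int(matches[-1]) if matches else None

sample = (
    "15:06:46 | MESSAGE | 006f | HandleReconfig(): Reconfig OK - nodes(2) "
    "rcfgGen(5) master(0)\n"
)
print(last_reconfig_node_count(sample))  # prints 2
```

Comparing the returned value with the number of nodes you have started gives a quick pass or fail check after each node is brought up.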

    CM Troubleshooting

    During normal operation, the CM on each node checks in with the CMs on the other nodes to ensure the health of each member. These check-ins occur at an interval of N milliseconds, as specified by the PollInterval registry value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. A node is allowed to miss M check-ins before it is cast out of the cluster, as specified by the MissCount value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM.

    Failed check-ins are recorded to the CM error log file (CM.LOG).

    These check-in packets are typically UDP packets, and may be lost.

    If one of your database instances is dropping out of the cluster under heavy activity, you may see messages in CM.LOG file similar to:

    05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s) 
    05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s) 
    05:01:28 | MESSAGE | PollingThread(): node(1) failure detected

    This occurs if the check-in messages were lost because of the heavy activity. Make sure there is a dedicated interconnect for Oracle Parallel Server that is separate from the rest of the network. Slightly increasing the MissCount value may also help.
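To judge how often check-ins are being missed, the messages in CM.LOG can be tallied per node. The following Python sketch is illustrative only. It assumes that every relevant line follows the format of the sample messages above, and the function name is hypothetical.

```python
import re

MISSED = re.compile(r"PollingThread\(\): node\((\d+)\) missed\((\d+)\) checkin\(s\)")
FAILED = re.compile(r"PollingThread\(\): node\((\d+)\) failure detected")

def summarize_cm_log(log_text):
    """Return ({node: highest missed count seen}, {nodes with a detected failure})."""
    worst = {}
    failed = set()
    for line in log_text.splitlines():
        missed = MISSED.search(line)
        if missed:
            node, count = int(missed.group(1)), int(missed.group(2))
            worst[node] = max(worst.get(node, 0), count)
            continue
        failure = FAILED.search(line)
        if failure:
            failed.add(int(failure.group(1)))
    return worst, failed

sample = """\
05:01:25 | MESSAGE | PollingThread(): node(1) missed(3) checkin(s)
05:01:27 | MESSAGE | PollingThread(): node(1) missed(5) checkin(s)
05:01:28 | MESSAGE | PollingThread(): node(1) failure detected
"""
print(summarize_cm_log(sample))  # prints ({1: 5}, {1})
```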


    Note:  

    MissCount * PollInterval should never be greater than 20 seconds. 


     
     

    CM Secondary Backup

    If you are using the secondary disk backup feature of the CM, try to use a partition on a disk that is not heavily used. The backup disk file is written to by every node member during each check-in. If the backup disk is heavily used, it may cause the CM to miss check-ins and falsely drop node members.


    Note:  

    If you are using the secondary disk backup feature, do not lower PollInterval below 500 milliseconds, because every node writes to the disk backup partition every PollInterval. 
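The two limits given in the notes of this section (MissCount * PollInterval no greater than 20 seconds, and PollInterval no lower than 500 milliseconds when the secondary disk backup is used) can be checked before you change the registry. The following Python sketch shows the arithmetic. The function name and the sample values 1000 and 5 are hypothetical and are not recommended settings.

```python
MAX_DETECTION_MS = 20_000      # MissCount * PollInterval must not exceed 20 seconds
MIN_POLL_WITH_BACKUP_MS = 500  # lower bound when the secondary disk backup is used

def check_cm_timing(poll_interval_ms, miss_count, uses_disk_backup=False):
    """Return a list of problems with the proposed PollInterval and MissCount."""
    problems = []
    detection_ms = poll_interval_ms * miss_count
    if detection_ms > MAX_DETECTION_MS:
        problems.append(
            f"MissCount * PollInterval is {detection_ms} ms, above {MAX_DETECTION_MS} ms"
        )
    if uses_disk_backup and poll_interval_ms < MIN_POLL_WITH_BACKUP_MS:
        problems.append(
            f"PollInterval is {poll_interval_ms} ms, below {MIN_POLL_WITH_BACKUP_MS} ms"
        )
    return problems

print(check_cm_timing(1000, 5))  # prints []
```

With these sample values, a failed node is detected after 5 * 1000 = 5000 milliseconds, well inside the 20-second limit.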


     
     

    CM Error Log File Specification

    The CM error log file (CM.LOG) is specified by the ErrorLog value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM:

    ErrorLog: REG_SZ: c:\orant\rdbms80\trace\cm.log

    Oracle Corporation recommends specifying an error log location of ORANT\RDBMS80\TRACE\CM.LOG.

    Performance and Manager Configuration Tips

    You must configure the Performance and Management (PM) module so that PGMS can determine the cluster configuration. Each OPS database corresponds to a PGMS group or domain. For example, the INITSID.ORA and INIT_COM.ORA files could have the following parameters defined:

    INITOPS1.ORA:

    instance_number=1 
    

    INITOPS2.ORA:

    instance_number=2 
    

    INITOPS3.ORA:

    instance_number=3 
    

    INITOPS4.ORA:

    instance_number=4 
    

    INIT_COM.ORA:

    db_name=ops

    The HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>PM key would then contain:


    where:


    Note:  

    Each row entry must be entered on a separate line in the Multi-String Editor dialog box. Instance numbers must be sequential, such as 0, 1, 2. Do not skip instance numbers, such as 0, 1, 3. Also, the key name (OPS) must match the value of DB_NAME in INIT_COM.ORA. 
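The rule that instance numbers must be sequential can be checked mechanically once the numbers have been collected from the PM key or from the INITSID.ORA files. The following Python sketch assumes the numbers are already available as a list of integers and accepts any starting number; the function name is hypothetical.

```python
def find_instance_number_problems(instance_numbers):
    """Return a list describing duplicated or skipped instance numbers."""
    problems = []
    seen = set()
    for number in instance_numbers:
        if number in seen:
            problems.append(f"duplicate instance number {number}")
        seen.add(number)
    if seen:
        # Every number between the lowest and highest must be present.
        expected = set(range(min(seen), max(seen) + 1))
        for number in sorted(expected - seen):
            problems.append(f"skipped instance number {number}")
    return problems

print(find_instance_number_problems([0, 1, 2]))  # prints []
print(find_instance_number_problems([0, 1, 3]))  # prints ['skipped instance number 2']
```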


     
     

    ORA-29702

    If the instance numbers in the PM key do not match those specified in the INITSID.ORA file, you will receive the following error in ORACLE_HOME\RDBMS80\TRACE\SIDLMON.TRC upon instance startup:

    ORA-29702: error occurred in Group Membership Service operation

    Starting Services

    If you are having difficulty starting services or the database, check the PGMS.LOG file stored in SYSTEMROOT\SYSTEM32\PGMS.LOG.

    If you used the CRTSRV script in "Step 4: Create Services" in Chapter 5, "Configuring Oracle Parallel Server", OraclePGMSService automatically starts up and shuts down when the OracleServiceSID service is started and stopped.

    If you did not use the CRTSRV script, you can still have OraclePGMSService start up automatically with an OracleServiceSID service by entering the following at the command line on each node:

    C:\> OPSREG80 ADD SID 
    

    To stop OraclePGMSService from starting up automatically with the OracleServiceSID service, enter the following at the command line on each node:

    C:\> OPSREG80 DEL SID

    DYNAMIC RESOURCES ALLOCATED or DYNAMIC LOCKS ALLOCATED

    The DYNAMIC RESOURCES ALLOCATED and DYNAMIC LOCKS ALLOCATED messages appear if the LM_RESS and LM_LOCKS values are not sufficient, and additional IDLM resources or locks must be allocated dynamically from the SGA.

    If these messages appear often, the SGA may become exhausted. To avoid this, increase the LM_RESS and LM_LOCKS parameters to suit your database needs.

    Additional Information:  

    See Chapter 15, "Allocating PCM Instance Locks Oracle Parallel Server," of the Oracle8 Parallel Server Concepts and Administration guide. 

     
     

    Understanding the Trace Files

    This section discusses the following trace file subjects:

    Background Thread Trace Files

    Oracle Parallel Server background threads use trace files to record occurrences and exceptions of database operations, as well as errors. These detailed trace logs are helpful to Oracle support to debug problems in your cluster configuration. Background thread trace files are created regardless of whether the BACKGROUND_DUMP_DEST parameter is set in the INIT_COM.ORA initialization parameter file. If BACKGROUND_DUMP_DEST is set, the trace files are stored in the directory specified. If the parameter is not set, the trace files are stored in the ORACLE_HOME\RDBMS80\TRACE directory.

    Oracle8 database creates a different trace file for each background thread. The name of the trace file contains the name of the background thread, followed by the extension .TRC, such as:

    Oracle Parallel Server trace information is reported in the following trace files:
    Trace File     Description
    SIDLCKN.TRC    Trace file for the LCKn process. This trace file shows lock requests for other background processes.
    SIDLMDN.TRC    Trace file for the LMDn process. This trace file shows lock requests.
    SIDLMON.TRC    Trace file for the LMON process. This trace file shows the status of the cluster, including the "Reconfiguration complete" message.
    SIDP00N.TRC    Trace file for the parallel query slaves.

     

    User Thread Trace Files

    Trace files are also created for user threads if the USER_DUMP_DEST parameter is set in the initialization parameter file. The trace files for the user threads have the form ORAXXXXX.TRC, where XXXXX is a 5-digit number indicating the Windows NT thread ID.

    Alert File

    The alert file, SIDALRT.LOG, contains important information about error messages and exceptions that occur during database operations. Each instance has one alert file; information is appended to the file each time you start the instance. All threads can write to the alert file.

    SIDALRT.LOG is found in the directory specified by the BACKGROUND_DUMP_DEST parameter in the INIT_COM.ORA initialization parameter file. If the BACKGROUND_DUMP_DEST parameter is not set, the SIDALRT.LOG file is generated in ORACLE_HOME\RDBMS80\TRACE.
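The rule for locating the background trace files and the alert file can be written out as a short function. The following Python sketch only restates the rule above. The function names are hypothetical, and the SID OPS1 and the Oracle home C:\ORANT are used purely for illustration.

```python
import ntpath  # Windows path rules, usable on any platform

def background_trace_dir(oracle_home, background_dump_dest=None):
    """Return BACKGROUND_DUMP_DEST if it is set, otherwise ORACLE_HOME\\RDBMS80\\TRACE."""
    if background_dump_dest:
        return background_dump_dest
    return ntpath.join(oracle_home, "RDBMS80", "TRACE")

def alert_file_path(sid, oracle_home, background_dump_dest=None):
    """Return the full path of SIDALRT.LOG for the given SID."""
    directory = background_trace_dir(oracle_home, background_dump_dest)
    return ntpath.join(directory, f"{sid}ALRT.LOG")

print(alert_file_path("OPS1", r"C:\ORANT"))  # prints C:\ORANT\RDBMS80\TRACE\OPS1ALRT.LOG
```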

    Error Call Trace Stack

    Oracle Worldwide Support may ask you to create an error call trace stack for a particular trace file. An error call trace stack provides a program trace of specific background or user threads in the database.

    To create an error call trace:

    1. Obtain the Oracle process ID for the background processes:

      C:\> SVRMGR30
      SVRMGR30> CONNECT INTERNAL/PASSWORD
      SELECT PID "Oracle Process Id", 
             NAME 
          FROM V$PROCESS, V$BGPROCESS 
          WHERE V$PROCESS.ADDR = V$BGPROCESS.PADDR;

      Output displayed looks like this:

      Oracle Pro NAME 
      ---------- ----- 
               2 PMON 
               3 LMON 
               4 LMD0 
               5 DBW0 
               6 LGWR 
               7 CKPT 
               8 SMON 
               9 RECO 
              10 SNP0 
              11 SNP1 
              13 LCK0 

    2. Dump the trace stack to the trace file. For example, to dump out the trace stack of LMON:

      a. Set the Oracle process ID to LMON, which is 3 in this example:

        SVRMGR30> ORADEBUG SETORAPID 3

      b. Dump the error stack to SIDLMON.TRC:

        SVRMGR30> ORADEBUG DUMP ERRORSTACK 3

    Cluster Tracing

    CM and PGMS tracing can be helpful to Oracle Worldwide Support in debugging your cluster configuration problems in cases where the database is not starting, a particular node is hanging, or there is a node crash.

    PGMS Tracing

    PGMS tracing is stored in the PGMS log file, SYSTEMROOT\SYSTEM32\PGMS.LOG.


    Note:  

    Do not enable detailed tracing during normal database operation. 


     
     

    To enable detailed PGMS tracing:

    1. De-install the OraclePGMSService:

      PGMS /R

    2. Re-install OraclePGMSService with debug flags turned on:

      PGMS /I:"C:\ORANT\BIN\PGMS.EXE /D /V /S"

      where:

      /D    debug tracing
      /V    verbose tracing
      /S    spy on PGMS network packets

       
       

    To disable tracing:

    1. De-install the OraclePGMSService:

      PGMS /R

    2. Re-install OraclePGMSService with debug flags turned off:

      PGMS /I:"C:\ORANT\BIN\PGMS.EXE"

    CM Tracing

    CM tracing is stored in the error log file, CM.LOG. The location of CM.LOG is defined by the ErrorLog value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM.

    To enable detailed CM tracing:

    1. Stop CMSRVR.EXE by rebooting the node.
    2. Specify the CMSrvrpath value in HKEY_LOCAL_MACHINE>SOFTWARE>ORACLE>OSD>CM. ErrorLog specifies the CM log file.

      CMSrvrpath: REG_SZ: c:\orant\osdbin\cmsrvr.exe /v /c /s 

      where:

      /v    verbose
      /c    trace client request
      /s    spy on CM network traffic

       
       

    Using PhysicalDrive for Raw Partitions

    When creating symbolic links for the logical partitions with the SETLINKS utility, do not use the prefix \\.\PhysicalDrive. If you use \\.\PhysicalDrive as a symbolic link, you may corrupt your database files. Use the symbolic links provided in the ORALINKx.TBL file(s), as described in Chapter 5, "Configuring Oracle Parallel Server".

    SHUTDOWN ABORT

    SHUTDOWN ABORT is not recommended. Oracle Corporation recommends shutting down the OracleServiceSID service so that resources, such as memory usage or files, will be cleaned up by the Windows NT operating system correctly.

    To shut down OracleServiceSID:

    Contacting Oracle Worldwide Customer Support

    If, after reading this appendix, you still cannot resolve your problems, call Oracle Worldwide Customer Support to report the error. Please have the following information at hand:

    Severe Errors

    If an ORA-600 error occurs, it is written to the SIDALRT.LOG file. If an ORA-600 error or any other severe error appears in the SIDALRT.LOG file, provide all files in ORACLE_HOME\RDBMS80\TRACE, together with the PGMS.LOG file located in SYSTEMROOT\SYSTEM32.



     
     
    Oracle 
    Copyright © 1998 Oracle Corporation. 
    All Rights Reserved. 
     