Thursday, October 31, 2019

Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1210883.1)

In this Document
Purpose
Scope
Details
 
 Case 1: Single Private Network Adapter
 Case 2: Multiple Private Network Adapters
 2.1. Default Status
 2.2. When Private Network Adapter Fails
 2.3. When Another Private Network Adapter Fails
 2.4. When Private Network Adapter Restores
 
 Miscellaneous
 HAIP Log File
 L1. Log Sample When Private Network Adapter Fails
 L2. Log Sample When Private Network Adapter Restores
  
 Known Issues
References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.

PURPOSE

This document explains the ora.cluster_interconnect.haip resource in 11gR2 Grid Infrastructure.

SCOPE


DETAILS

Redundant Interconnect without any 3rd-party IP failover technology (bond, IPMP or similar) is supported natively by Grid Infrastructure starting from 11.2.0.2. Multiple private network adapters can be defined either during the installation phase or afterward using the oifcfg command. Oracle Database, CSS, OCR, CRS, CTSS, and EVM components in 11.2.0.2 employ it automatically.

Grid Infrastructure can activate a maximum of four private network adapters at a time, even if more are defined. The ora.cluster_interconnect.haip resource will start one to four link-local HAIP addresses on private network adapters for interconnect communication for Oracle RAC, Oracle ASM, Oracle ACFS, etc.

Grid automatically picks free link-local addresses from the reserved 169.254.*.* subnet for HAIP. According to RFC 3927, the link-local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and the corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative.

After GI is configured, more private network interfaces can be added with the "<GRID_HOME>/bin/oifcfg setif" command. The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there's only one active private network, Grid will create one; if two, Grid will create two; and if more than two, Grid will create four HAIPs. The number of HAIPs won't change even if more private network adapters are activated later; a restart of Clusterware on all nodes is required for the number to change. However, the newly activated adapters can be used for failover purposes.


NOTE: If using the 11.2.0.2 (and above) Redundant Interconnect/HAIP feature (as documented in Case 2 below), at present it is REQUIRED that all interconnect interfaces be placed on separate subnets. If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur.

At the time of this writing, a redundant private network requires a different subnet for each network adapter. For example, if eth1, eth2 and eth3 are used for the private network, each should be on a different subnet. Refer to Case 2.
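For example, a second and third private interface could be registered on distinct subnets with oifcfg (the interface names and subnets below are illustrative):

$GRID_HOME/bin/oifcfg setif -global eth6/10.11.0.0:cluster_interconnect
$GRID_HOME/bin/oifcfg setif -global eth7/10.12.0.0:cluster_interconnect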

When Oracle Clusterware is fully up, the haip resource should show a status of ONLINE:
$GRID_HOME/bin/crsctl stat res -t -init
..
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE      <node1>

Case 1: Single Private Network Adapter

If multiple physical network adapters are bonded together at the OS level and presented as a single device name, for example bond0, it's still considered a single network adapter environment. A single private network adapter does not offer true HAIP; as there's only one adapter, at least two are recommended to gain true redundancy. If only one private network adapter is defined, such as eth1 in the example below, one virtual IP will be created by HAIP. Here is what's expected when Grid is up and running:
$GRID_HOME/bin/oifcfg getif
eth1  10.x.x.128  global  cluster_interconnect
eth3  10.1.x.x  global  public

$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.x.x.128  PRIVATE  255.255.255.128
eth1  169.254.0.0  UNKNOWN  255.255.0.0
eth3  10.1.x.x  PRIVATE  255.255.255.128

Note: 1. the 169.254.0.0 subnet on eth1 is started by the haip resource; 2. refer to note 1386709.1 for an explanation of the output

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.x.x.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6369306 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4270790 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3037449975 (2.8 GiB)  TX bytes:2705797005 (2.5 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:22:22
          inet addr:169.254.x.x  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Instance alert.log (ASM and database):

Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-22, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
  [name='eth3', type=1, ip=10.x.x.168, mac=00-16-3e-11-11-44, net=10.1.x.x/25, mask=255.255.255.128, use=public/1]
..
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
..
Cluster communication is configured to use the following interface(s) for this instance
  169.254.x.x

Note: the interconnect will use the virtual private IP 169.254.x.x instead of the real private IP. A pre-11.2.0.2 instance will by default still use the real private IP;
to take advantage of the new feature, the init.ora parameter cluster_interconnects can be updated each time Grid is restarted.
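As a sketch only (the instance SID and address below are assumptions, and the current HAIP address must be re-checked after each Grid restart), the parameter could be set like this:

SQL> alter system set cluster_interconnects='169.254.17.81' scope=spfile sid='RAC1';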


For 11.2.0.2 and above, v$cluster_interconnects will show haip info:

SQL> select name,ip_address from v$cluster_interconnects;

NAME            IP_ADDRESS
--------------- ----------------
eth1:1          169.254.x.x

Case 2: Multiple Private Network Adapters

Multiple switches can be deployed if there is more than one private network adapter on each node; if one network adapter fails, the HAIP on that network segment will be failed over to the surviving adapters on all nodes.

2.1. Default Status

Here is an example of three private networks (eth1, eth6 and eth7) when Grid is up and running:
$GRID_HOME/bin/oifcfg getif
eth1 10.x.x.128   global  cluster_interconnect
eth3 10.1.x.x  global  public
eth6 10.11.x.x  global  cluster_interconnect
eth7 10.12.x.x  global  cluster_interconnect

$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.x.x.128  PRIVATE  255.255.255.128
eth1  169.254.0.x  UNKNOWN  255.255.192.0
eth1  169.254.192.x  UNKNOWN  255.255.192.0
eth3  10.1.x.x  PRIVATE  255.255.255.128
eth6  10.11.x.x  PRIVATE  255.255.255.128
eth6  169.254.64.x UNKNOWN  255.255.192.0
eth7  10.12.x.x  PRIVATE  255.255.255.128
eth7  169.254.128.x  UNKNOWN  255.255.192.0

Note: the haip resource started four virtual private IPs: two on eth1, and one each on eth6 and eth7

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.x.x.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15176906 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10239298 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7929246238 (7.3 GiB)  TX bytes:5768511630 (5.3 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.x.x  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.x.x  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth6      Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:10.11.x.x  Bcast:10.11.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7068185 errors:0 dropped:0 overruns:0 frame:0
          TX packets:595746 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2692567483 (2.5 GiB)  TX bytes:382357191 (364.6 MiB)

eth6:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.x.x  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.x.x  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6435829 errors:0 dropped:0 overruns:0 frame:0
          TX packets:314780 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2024577502 (1.8 GiB)  TX bytes:172461585 (164.4 MiB)

eth7:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Instance alert.log (ASM and database):

Private Interface 'eth1:1'
 configured from GPnP for use as a private interconnect.
  [name='eth1:1', type=1, ip=169.254.xx.xx, mac=00-16-3e-11-11-22, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth6:1' configured from GPnP for use as a private interconnect.
  [name='eth6:1', type=1, ip=169.254.xx.xx, mac=00-16-3e-11-11-77, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth7:1' configured from GPnP for use as a private interconnect.
  [name='eth7:1', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-88, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth1:2' configured from GPnP for use as a private interconnect.
  [name='eth1:2', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-22, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
  [name='eth3', type=1, ip=10.x.x.68, mac=00-16-3e-11-11-44, net=10.1.x.x/25, mask=255.255.255.128, use=public/1]
Picked latch-free SCN scheme 3

..
Cluster communication is configured to use the following interface(s) for this instance
  169.254.x.98
  169.254.x.250
  169.254.x.237
  169.254.x.103

Note: interconnect communication will use all four virtual private IPs; in case of network failure, as long as there is one private network adapter functioning, all four IPs will remain active.

2.2. When Private Network Adapter Fails

If one private network adapter fails (eth6 in this example), the virtual private IP on eth6 will be relocated automatically to a healthy adapter; this is transparent to instances (ASM or database):
$GRID_HOME/bin/oifcfg iflist -p -n
eth1  10.x.x.128  PRIVATE  255.255.255.128
eth1  169.254.0.x  UNKNOWN  255.255.192.0
eth1  169.254.128.x  UNKNOWN  255.255.192.0
eth7  10.12.x.x  PRIVATE  255.255.255.128
eth7  169.254.64.x  UNKNOWN  255.255.192.0
eth7  169.254.192.x  UNKNOWN  255.255.192.0

Note: the virtual private IP on the eth6 subnet, 169.254.64.x, relocated to eth7

ifconfig
..
eth1      Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:10.x.x.168  Bcast:10.1.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15183840 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10245071 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7934311823 (7.3 GiB)  TX bytes:5771878414 (5.3 GiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.x.x  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:22
          inet addr:169.254.x.x  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.x.x  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6438985 errors:0 dropped:0 overruns:0 frame:0
          TX packets:315877 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2026266447 (1.8 GiB)  TX bytes:173101641 (165.0 MiB)

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

2.3. When Another Private Network Adapter Fails

If another private network adapter goes down (eth1 in this example), the virtual private IP on it will be relocated automatically to another healthy adapter with no impact on instances (ASM or database):
$GRID_HOME/bin/oifcfg iflist -p -n
eth7  10.12.x.x  PRIVATE  255.255.255.128
eth7  169.254.64.x  UNKNOWN  255.255.192.0
eth7  169.254.192.x  UNKNOWN  255.255.192.0
eth7  169.254.0.x  UNKNOWN  255.255.192.0
eth7  169.254.128.x  UNKNOWN  255.255.192.0

ifconfig
..
eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.x.x  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6441559 errors:0 dropped:0 overruns:0 frame:0
          TX packets:317271 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2027824788 (1.8 GiB)  TX bytes:173810658 (165.7 MiB)

eth7:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:4    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

2.4. When Private Network Adapter Restores

If private network adapter eth6 is restored, it will be activated automatically and virtual private IPs will be reassigned to it:
$GRID_HOME/bin/oifcfg iflist -p -n
..
eth6  10.11.x.x  PRIVATE  255.255.255.128
eth6  169.254.128.x  UNKNOWN  255.255.192.0
eth6  169.254.0.x  UNKNOWN  255.255.192.0
eth7  10.12.x.x  PRIVATE  255.255.255.128
eth7  169.254.64.x  UNKNOWN  255.255.192.0
eth7  169.254.192.x  UNKNOWN  255.255.192.0

ifconfig
..
eth6      Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:10.11.x.x  Bcast:10.11.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:121 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:185138 (180.7 KiB)  TX bytes:56439 (55.1 KiB)

eth6:1    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.x.x  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth6:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:77
          inet addr:169.254.x.x  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7      Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:10.12.x.x  Bcast:10.12.0.255  Mask:255.255.255.128
          inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6442552 errors:0 dropped:0 overruns:0 frame:0
          TX packets:317983 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2028404133 (1.8 GiB)  TX bytes:174103017 (166.0 MiB)

eth7:2    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth7:3    Link encap:Ethernet  HWaddr 00:16:3E:11:11:88
          inet addr:169.254.x.x  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Miscellaneous

It's NOT supported to disable or stop HAIP while the cluster is up and running unless otherwise advised by Oracle Support/Development (a read-only verification example follows the list below).
1. The feature is disabled in 11.2.0.2/11.2.0.3 if Sun Cluster exists
2. The feature does not exist in Windows 11.2.0.2/11.2.0.3
3. The feature is disabled in 11.2.0.2/11.2.0.3 if Fujitsu PRIMECLUSTER exists
4. With the fix for bug 11077756 (fixed in 11.2.0.2 GI PSU6 and 11.2.0.3), HAIP will be disabled if it fails to start while running the root script (root.sh or rootupgrade.sh); for more details, refer to bug 11077756
5. The feature is disabled on Solaris 11 if IPMP is used for the private network. Tracking bug 16982332
6. The feature is disabled on HP-UX and AIX if the cluster_interconnect/"private network" is InfiniBand
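Since stopping HAIP manually is not supported, verification should be read-only; for example, the resource state can be checked at any time with:

$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init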


HAIP Log File

The haip resource is managed by ohasd.bin; the resource logs are located in $GRID_HOME/log/<nodename>/ohasd/ohasd.log and $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log
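To locate HAIP-related events quickly, the agent log can be searched directly, for example:

grep -i 'HAIP' $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log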

L1. Log Sample When Private Network Adapter Fails

In a multiple private network adapter environment, if one of the adapters fails:
  • ohasd.log
2010-09-24 09:10:00.891: [GIPCHGEN][1083025728]gipchaInterfaceFail: marking interface failing 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x4d }
2010-09-24 09:10:00.902: [GIPCHGEN][1138145600]gipchaInterfaceDisable: disabling interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1cd }
2010-09-24 09:10:00.902: [GIPCHDEM][1138145600]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1ed }
  • orarootagent_root.log
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} failed to receive ARP request
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Assigned IP 169.254.x.x no longer valid on inf eth6
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp {
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Adding 169.254.x.x on eth6:1
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp }
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} Reassigned IP:  169.254.x.x on interface eth6
2010-09-24 09:09:58.013: [ USRTHRD][1082325312] {0:0:2} HAIP:  Updating member info HAIP1;10.11.x.x#0;10.11.x.x#1
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.x.x' from inf 'eth6' to inf 'eth7'
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.x.x' from inf 'eth1' to inf 'eth7'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP:  Moving ip '169.254.x.x' from inf 'eth7' to inf 'eth1'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2}  Arp::sCreateSocket {
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.034: [ USRTHRD][1093232960] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2}  Arp::sCreateSocket }
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2}  Arp::sProbe {
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2}  Arp::sProbe }

2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Adding 169.254.x.x on eth7:2
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp }
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} Assigned IP:  169.254.x.x on interface eth7

2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Adding 169.254.x.x on eth1:3
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Assigned IP:  169.254.x.x on interface eth1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2}  Arp::sAnnounce {
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Arp::sSend:  sending type 1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2}  Arp::sAnnounce }
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Adding 169.254.x.x on eth7:3
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Assigned IP:  169.254.x.x on interface eth7
2010-09-24 09:10:06.047: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth6', mask '10.11.x.x'
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {

2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth7', mask '10.12.x.x'
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth1', mask '10.1.x.x'
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  0 ]:  eth7 - 169.254.112.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  1 ]:  eth1 - 169.254.178.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  2 ]:  eth7 - 169.254.244.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[  3 ]:  eth1 - 169.254.30.x

Note: from the above, even though only NIC eth6 failed, there can be multiple virtual private IP movements among the surviving NICs
  • ocssd.log
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: [network]  failed send attempt endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.x.x:60169', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, req 0x2aaab00117f0 [00000000004b0cae] { gipcSendRequest : addr 'udp://10.11.x.x:41486', data 0x2aaab0050be8, len 80, olen 0, parentEndp 0xe1b9150, ret gipcretEndpointNotAvailable (40), objFlags 0x0, reqFlags 0x2 }
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos op  :  sgipcnValidateSocket
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos dep :  Invalid argument (22)
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos loc :  address not
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos info:  addr '10.11.x.x:60169', len 80, buf 0x2aaab0050be8, cookie 0x2aaab00117f0
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcInternalSendSync: failed sync request, ret gipcretEndpointNotAvailable (40)
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcSendSyncF [gipchaLowerInternalSend : gipchaLower.c : 755]: EXCEPTION[ ret gipcretEndpointNotAvailable (40) ]  failed to send on endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.x.x:60169', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, addr 0xe4e6d10 [00000000000007ed] { gipcAddress : name 'udp://10.11.x.x:41486', objFlags 0x0, addrFlags 0x1 }, buf 0x2aaab0050be8, len 80, flags 0x0
2010-09-24 09:09:58.314: [GIPCHGEN][1089964352] gipchaInterfaceFail: marking interface failing 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:09:58.314: [GIPCHALO][1089964352] gipchaLowerInternalSend: failed to initiate send on interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }, hctx 0xde81d10 [0000000000000010] { gipchaContext : host '<node1>', name 'CSS_a2b2', luid '4f06f2aa-00000000', numNode 1, numInf 3, usrFlags 0x0, flags 0x7 }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 1, flags 0x14d }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }
2010-09-24 09:09:58.327: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.327: [GIPCHGEN][1089964352] gipchaInterfaceReset: resetting interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.338: [GIPCHDEM][1089964352] gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x16d }
2010-09-24 09:09:58.338: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'udp://10.11.x.x:41486'
2010-09-24 09:09:58.338: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2014f0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:10:00.454: [    CSSD][1108904256]clssnmSendingThread: sending status msg to all nodes

Note: from the above, ocssd.bin won't fail as long as at least one private network adapter is working

L2. Log Sample When Private Network Adapter Restores

In a multiple private network adapter environment, if one of the failed adapters is restored:
  • ohasd.log
2010-09-24 09:14:30.962: [GIPCHGEN][1083025728]gipchaNodeAddInterface: adding interface information for inf 0x2aaaac1a53d0 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local bootstrap interface for node '<node1>', haName 'CLSFRAME_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local interface for node '<node1>', haName 'CLSFRAME_a2b2', inf '10.11.x.x:13235'
  • ocssd.log
2010-09-24 09:14:30.961: [GIPCHGEN][1091541312] gipchaNodeAddInterface: adding interface information for inf 0x2aaab005af00 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local bootstrap interface for node '<node1>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local interface for node '<node1>', haName 'CSS_a2b2', inf '10.11.x.x:10884'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local (nil), ip '10.21.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.12.x.x'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }
2010-09-24 09:14:31.446: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.446: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }

 

Known Issues


Refer to note 1640865.1 for known HAIP issues in 11gR2/12c Grid Infrastructure

New TFA, ORAchk and EXAchk version 19.3 released


ORAchk, EXAchk & TFA Combined into Autonomous Health Framework (AHF)

Oracle used to deliver three separate tools:
  • ORAchk
  • EXAchk
  • Trace File Analyzer (TFA)
These are now combined into a single installer called Autonomous Health Framework (AHF).
This single platform-specific installer for TFA, ORAchk and EXAchk can be installed either by root (recommended) or non-root users. The installer includes and builds on all functionality of the previous tools.
There is no change to the command-line tools; the same commands you used before will still work with this version.
orachk, exachk and tfactl can be found in the AHF_LOC/bin directory.
Engineered systems have AHF_LOC/bin/exachk and non-engineered systems have AHF_LOC/bin/orachk.
TFA, ORAchk and EXAchk remain a value add-on to the existing support contract. There is no extra fee or license required for use.
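As a sketch, assuming the AHF archive has been downloaded and staged under /tmp (the archive name and paths here are assumptions), a root installation could look like:

unzip AHF-LINUX_v19.3.zip -d /tmp/ahf
/tmp/ahf/ahf_setup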

Automatic masking or sanitizing of sensitive information

After copies of diagnostic data are collected, TFA, ORAchk and EXAchk use Adaptive Classification and Redaction (ACR) to sanitize sensitive data in the collections.
ACR uses a machine learning based engine to redact a pre-defined set of entity types in a given set of files. ACR also sanitizes or masks entities that occur in path names.
  • Sanitization replaces a sensitive value with random characters
  • Masking replaces a sensitive value with a series of asterisks
ACR currently sanitizes the following entity types:
  • Host names
  • IP addresses
  • MAC addresses
  • Oracle Database names
  • Tablespace names
  • Service names
  • Ports
  • Operating system user names
ORAchk/EXAchk sanitization will convert sensitive data to a string of random characters.
To sanitize ORAchk/EXAchk output, include the -sanitize option, e.g.: orachk -profile asm -sanitize
You can also sanitize post-process by passing in an existing log, HTML report or zip file, e.g.: orachk -sanitize {file_name}
TFA diagnostic collections can be redacted (sanitized or masked). To enable automatic redaction, use tfactl set redact=[mask|sanitize|none] (the default is none)
Alternatively, collections can be redacted on demand, e.g.: tfactl diagcollect -srdc ORA-00600 -mask or tfactl diagcollect -srdc ORA-00600 -sanitize
If you want to reverse-lookup a sanitized value, use orachk/exachk -rmap, e.g.: orachk -rmap pu406jKxg,kEvGFDT will print the real values associated with those sanitized values
Note: orachk -rmap can also be used to look up a value sanitized by TFA.
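Putting these together, a minimal sanitize-then-lookup session might look like this (the report file name is an assumption; the tokens are the illustrative ones above):

orachk -profile asm -sanitize
orachk -sanitize orachk_report.html
orachk -rmap pu406jKxg,kEvGFDT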

Problem repair automation options

ORAchk and EXAchk now have the capability to automatically fix problems when found.
Certain checks have a repair command associated with them.
If you want to see what the repair command actually does, you can use orachk -showrepair {check_id}
To run the repair commands, include one of the following options:
orachk -repaircheck all
orachk -repaircheck {check_id},[{check_id},{check_id}..]
orachk -repaircheck {file}
Where {check_id} refers to the specific check(s) you want to repair and {file} contains a list of check IDs to be repaired; see the sketch below.
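For example, the {file} form could be driven from a simple list of check IDs (the file path and IDs here are placeholders):

cat > /tmp/checks.txt <<EOF
{check_id_1}
{check_id_2}
EOF
orachk -repaircheck /tmp/checks.txt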

Performance improvements in diagnostic collection & DBA tools

TFA indexes diagnostic data, which is used for DBA tools and diagnostic collections. The indexing has been changed to improve performance.
This change results in lower CPU usage and faster average execution times for diagnostic collections and for running DBA tools such as ls, grep, tail, vi, etc.
If you do not use the DBA tools and are prepared to wait longer for DBA tool execution to complete, you can disable this indexing by running:
tfactl set indexInventory=false
Additionally, when TFA is run on an Exadata machine, cell collection now takes place in parallel and uses EXAchk to call diagpack for improved performance.

New Service Request Data Collections (SRDCs)

This release includes new SRDCs.
As with all other SRDCs, use tfactl diagcollect -srdc srdc_name; a usage example follows the list.
  • ahf: Oracle ORAchk or Oracle EXAchk problems (to be run after running with -debug)
  • dbacl: Problems with Access Control Lists (ACLs)
  • dbaqgen: Problems in an Oracle Advanced Queuing Environment
  • dbaqmon: Queue Monitor (QMON) problems
  • dbaqnotify: Notification problems in an Oracle Advanced Queuing Environment
  • dbaqperf: Performance problems in an Oracle Advanced Queuing Environment
  • dbparameters: Oracle Database single instance shutdown problems
  • emagtpatchdeploy: Enterprise Manager 13c Agent patch deployment problems
  • emagentperf: Enterprise Manager 13c Agent performance problems
  • emagentstartup: Enterprise Manager 13c Agent startup problems
  • emfleetpatching: Enterprise Manager Fleet Maintenance Patching problems
  • empatchplancrt: Enterprise Manager patch plan creation problems
  • exservice: Exadata: Storage software service or offload server service problems
  • ORA-25319: for ORA-25319 problems
  • ORA-01000: for ORA-01000 problems
  • ORA-00018: for ORA-00018 problems
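For instance, a collection for a single-instance database shutdown problem would use the dbparameters SRDC listed above:

tfactl diagcollect -srdc dbparameters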

ORAchk and EXAchk integration with Database AutoUpgrade & Cluster Verification Utility (CVU)

Both CVU and the database AutoUpgrade tool (in analyze mode) are run by ORAchk & EXAchk in -preupgrade mode.
All findings from AutoUpgrade and CVU are cross-verified to avoid duplication and contradiction.
Results are included in the ORAchk/EXAchk report output.
CVU checks are only run when a CVU version of 11.2.0.4 or greater is found on the system.
CVU related options (see the example after this list):
  • -cvuhome: where to find the CVU installation
  • -cvuonly: only run CVU checks
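A hypothetical pre-upgrade run pointing at a specific CVU installation (the path is an assumption):

orachk -preupgrade -cvuhome /u01/app/cvu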

ORAchk support for ODA X8 machines

ORAchk now provides support for ODA X8 machines.

ORAchk and EXAchk support for generic asymmetrical oracle_homes

ORAchk and EXAchk used to require all Oracle homes to be present on all database servers. If they were not, you would see issues such as skipped checks.
ORAchk and EXAchk now support asymmetrical oracle_homes, so it does not require the same oracle_home to exist on each node of the cluster.

EXAchk support for Oracle Exadata using RoCE InfiniBand

RDMA over Converged Ethernet (RoCE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network.
EXAchk release 19.3 adds support for RoCE InfiniBand for Oracle Exadata. There are no checks for InfiniBand switches and fabrics when run on RoCE InfiniBand for Oracle Exadata; if the "-profile switch" option is used, it will throw an error saying it is not a supported profile.

How to change Hostname / IP for a Grid Infrastructure Oracle Restart Standalone Configuration (SIHA) 11.2 and Later (Doc ID 1552810.1)

In this Document
Goal
Solution
References


APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

GOAL

This document details the required steps that must be executed after the hostname is updated/changed/modified.
If you intend to change only the IP address, nothing such as a deconfig/reconfig is needed.

SOLUTION

Therefore, please perform the following steps after the hostname has been updated/changed/modified in the Oracle Restart configuration:
1) Reconfigure the CSS & OHAS services as the root user.

First deconfigure the stack; this removes any configuration on the system that referenced the old hostname:
# <Grid Infrastructure Oracle Home>/crs/install/roothas.pl -deconfig -force
For 12.2 and later (including 18c), the script is a shell script instead:
# <Grid Infrastructure Oracle Home>/crs/install/roothas.sh -deconfig -force

Then reconfigure the stack by re-running the root script:
# <Grid Infrastructure Oracle Home>/crs/install/roothas.pl
(roothas.sh for 12.2 and later), or alternatively:
# cd <Grid Infrastructure Oracle Home>
# ./root.sh

Once the stack is reconfigured, go to the Grid home's bin directory. Use the srvctl add database command with the -c SINGLE flag to add the database to the Oracle Restart configuration; a sketch follows.
Also use the srvctl add command to add the listener, the Oracle ASM instance, all Oracle ASM disk groups, and any database services to the Oracle Restart configuration.
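A sketch of the database re-registration, assuming a database named orcl and a hypothetical Oracle home path:

$> <Grid Infrastructure Oracle Home>/bin/srvctl add database -d orcl -o /u01/app/oracle/product/12.1.0/dbhome_1 -c SINGLE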

2) Please perform the next steps as oracle or grid OS user (as the Grid Infrastructure OS owner): 
$> <Grid Infrastructure Oracle Home>/bin/crsctl modify resource "ora.cssd" -init -attr "AUTO_START=1"  -unsupported
 NOTE: "-unsupported" is not required for 11.2 version

3) Restart the OHAS stack as grid or oracle OS user (as the Grid Infrastructure OS owner):
$> <Grid Infrastructure Oracle Home>/bin/crsctl stop has

$> <Grid Infrastructure Oracle Home>/bin/crsctl start has

4) Check the CSS & OHAS state as grid or oracle OS user (as the Grid Infrastructure OS owner):
$> <Grid Infrastructure Oracle Home>/bin/crsctl check has

$> <Grid Infrastructure Oracle Home>/bin/crsctl check css

$> <Grid Infrastructure Oracle Home>/bin/crsctl stat resource

$> <Grid Infrastructure Oracle Home>/bin/crsctl stat res -t

Note: If the CSS & OHAS services did NOT start, then you will need to reboot the Linux/UNIX box and check them again.

5) Recreate the default listener (LISTENER) using port 1521 (or your desired port), through the NETCA GUI located in the new Grid Infrastructure Oracle Home (or manually if you do not have graphical access), as the grid or oracle OS user (the Grid Infrastructure OS owner):
$> srvctl add listener

$> srvctl start listener

6) Create the init+ASM.ora file in the <Grid Infrastructure Oracle Home>/dbs directory with the following parameters:
asm_diskgroups= <list of diskgroups>

asm_diskstring= '/dev/oracleasm/disks/*'  

instance_type='asm'

large_pool_size=12M
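For example, the file could be created in one step (the diskgroup names are assumptions; substitute your own):

$> cat > <Grid Infrastructure Oracle Home>/dbs/init+ASM.ora <<EOF
asm_diskgroups='DATA','FRA'
asm_diskstring='/dev/oracleasm/disks/*'
instance_type='asm'
large_pool_size=12M
EOF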

7) Add the ASM instance as grid or oracle OS user (as the Grid Infrastructure OS owner):
$> <Grid Infrastructure Oracle Home>/bin/srvctl add asm

8) Enable ASM instance auto start as the grid or oracle OS user (the Grid Infrastructure OS owner) as follows:
$> <Grid Infrastructure Oracle Home>/bin/crsctl modify resource "ora.asm" -init -attr "AUTO_START=1"  -unsupported
 NOTE: "-unsupported" is not required for 11.2 version

9) Make sure the disks are discovered by kfod as the grid or oracle OS user (the Grid Infrastructure OS owner) as follows:

Example:
$> <Grid Infrastructure Oracle Home>/bin/kfod asm_diskstring='ORCL:*' disks=all

Or
$> <Grid Infrastructure Oracle Home>/bin/kfod asm_diskstring='<full path ASM disks location>/*' disks=all

10) If the disks are discovered, then start up the ASM instance as the grid or oracle OS user (the Grid Infrastructure OS owner) as follows:
$> export ORACLE_SID=+ASM

$> <Grid Infrastructure Oracle Home>/bin/sqlplus "/ as sysasm"

SQL> startup pfile=init+ASM.ora  -- init file from step 6

SQL> show parameter asm

11) Validate that the candidate disks are being discovered:
SQL> select path from v$asm_disk;

12) Create a new ASM instance spfile:
SQL> create spfile from pfile;

13) Add the new ASM spfile and listener to the new ASM instance resource:
$> <Grid Infrastructure Oracle Home>/bin/srvctl modify asm  -p <spfile full path>

$> <Grid Infrastructure Oracle Home>/bin/srvctl modify asm  -l LISTENER

14) Validate the OHAS (Oracle Restart) services start as follows: 
$> <Grid Infrastructure Oracle Home>/bin/crsctl  stop has

$> <Grid Infrastructure Oracle Home>/bin/crsctl  start has

$> <Grid Infrastructure Oracle Home>/bin/crsctl  stat res

$> <Grid Infrastructure Oracle Home>/bin/crsctl  stat res -t




REFERENCES

NOTE:363609.1 - Preparing For Changing the IP Address, Hostname or Domain Of Oracle Database Servers
NOTE:1645523.1 - Alternative Procedure To Upgrade ASM From Release 10.2 Or 11.1 To Release 11.2.0.# or To 12.1.0.1 On Unix/Linux Configurations (Standalone) Using ASM Role separation.
NOTE:986740.1 - How to Reconfigure Oracle Restart
NOTE:1434351.1 - Alternative Way To Upgrade An ASM Standalone Configuration From Release 11.2.0.<#> to release 11.2.0.<#>.
NOTE:293678.1 - How To Reconfigure DB Control After a Hostname, Domainname or Listener Change Has Occurred On The Server
NOTE:1422517.1 - Reconfiguring & Recreating The 11gR2/12cR1 Restart/OHAS/SIHA Stack Configuration (Standalone) / How to Reconfigure Oracle Restart
NOTE:887658.1 - Reconfigure HAS and CSS for nonRAC ASM on 11.2
https://docs.oracle.com/database/121/LADBI/app_ts.htm#LADBI7947
