HAIP Subnet Flipping Upon Link Down/Up Event

HAIP (High Availability IP) is Oracle’s private network redundancy feature in Grid Infrastructure. When a cluster has multiple private network interfaces, HAIP automatically spreads interconnect traffic across them and fails over if one link goes down, without requiring OS-level bonding. Bug 29379299 reveals a critical flaw in how HAIP handles link recovery.

What happens:

  1. Both private network links are UP. HAIP is healthy.
  2. Link A goes DOWN. HAIP correctly fails over to Link B. All traffic continues on Link B.
  3. Link A comes back UP.
  4. Bug: Instead of staying on Link B (the current active link), HAIP “flips”: it attempts to move traffic back to Link A, which triggers a brief HAIP reconfiguration. During this reconfiguration, cluster interconnect traffic is interrupted, which can cause:
    • False node evictions (CSS thinks a node is dead due to missed heartbeats)
    • ORA-29740 (evicted by member) in database alert logs
    • Brief application outages even though both links are UP
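
To make the sequence above concrete, here is a toy Python model of a single private address that lives on one of two links. It is only a sketch of the behaviour described in this post, not Oracle's implementation: the class ToyHaip, its failback switch and the reconfigs counter are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHaip:
    """Toy model (not Oracle code): one private address on one of two links."""
    failback: bool  # True models the "flip" back to a recovered link
    up: dict = field(default_factory=lambda: {"A": True, "B": True})
    active: str = "A"
    reconfigs: int = 0  # every move of the address counts as one reconfiguration

    def _move_to(self, link: str) -> None:
        if link != self.active:
            self.active = link
            self.reconfigs += 1

    def link_down(self, link: str) -> None:
        self.up[link] = False
        if link == self.active:
            survivors = [name for name, is_up in self.up.items() if is_up]
            if survivors:
                self._move_to(survivors[0])

    def link_up(self, link: str) -> None:
        self.up[link] = True
        # Move if failback is enabled, or if the current link is itself down.
        if self.failback or not self.up[self.active]:
            self._move_to(link)


def replay(failback: bool) -> ToyHaip:
    """Replay steps 1 to 4: both links up, Link A down, Link A up again."""
    haip = ToyHaip(failback=failback)
    haip.link_down("A")
    haip.link_up("A")
    return haip
```

In this model, replay(True) ends on Link A after two reconfigurations, while replay(False) stays on Link B after one. The second reconfiguration is the window in which interconnect traffic is interrupted.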

Affected versions:

Oracle Grid Infrastructure 12.1.0.2 through 18c are affected. The fix is included in 19c starting with the 19.4 RU.

Workarounds for older releases:

Option 1 — Disable HAIP and use OS-level bonding (bonding/teaming at the OS layer is more stable in most environments anyway):

# In GI home, set in crsconfig_params or run during install
-haip_no_config

Option 2 — Apply the one-off patch for Bug 29379299 (available for 12.1.0.2 and 12.2.0.1 through MOS).

Recommendation: If you’re on an affected version and your cluster has had unexplained node evictions following network maintenance or link flaps, check the CSS daemon log ($GRID_HOME/log/<hostname>/cssd/ocssd.log; on releases that write Clusterware diagnostics to ADR, look for the ocssd trace under $ORACLE_BASE/diag/crs/<hostname>/crs/trace instead) for HAIP reconfiguration events close to the eviction timestamps. If the two line up, this bug is a likely culprit.
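
The timestamp correlation can be scripted. The Python sketch below rests on two assumptions: each log line starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp, and the reconfiguration and eviction messages can be matched with regular expressions you supply yourself. The function names and the 60-second default window are illustrative choices, not anything Oracle ships, and this post does not specify the actual message text, so substitute whatever strings your own log contains.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix "YYYY-MM-DD HH:MM:SS.mmm"; adjust if your release differs.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def matching_times(lines, pattern):
    """Return the timestamps of the lines whose text matches `pattern`."""
    wanted = re.compile(pattern, re.IGNORECASE)
    times = []
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp and wanted.search(line):
            times.append(datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def correlate(lines, reconfig_pattern, eviction_pattern, window_seconds=60):
    """Pair each eviction with reconfigurations seen up to `window_seconds` earlier."""
    lines = list(lines)  # allow a file object or generator to be scanned twice
    reconfigs = matching_times(lines, reconfig_pattern)
    evictions = matching_times(lines, eviction_pattern)
    window = timedelta(seconds=window_seconds)
    return [
        (reconfig, eviction)
        for eviction in evictions
        for reconfig in reconfigs
        if timedelta(0) <= eviction - reconfig <= window
    ]


def scan_file(path, reconfig_pattern, eviction_pattern, window_seconds=60):
    """Convenience wrapper: run correlate() over a log file on disk."""
    with open(path, errors="replace") as log:
        return correlate(log, reconfig_pattern, eviction_pattern, window_seconds)
```

For example, scan_file("ocssd.log", r"haip", r"evict") returns a list of (reconfiguration time, eviction time) pairs. Both patterns here are placeholders. An empty list does not rule the bug out; it only means nothing matched within the window.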
