OEM: The ILOM server is currently offline or unreachable on the network.

Hi all!
I just got an alarm from OEM with this message. How do you check it?
– First, make sure you can connect to the ILOM from a DBNode.
– From there, you can test the IPv4 and/or IPv6 interfaces with ping, as shown below.

This is also documented in the Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection – Test IPv4 or IPv6 Network Configuration (CLI).

In my case it was only a false alarm, since I could log in to the ILOM and reach the DBNodes from it:

[root@greporasrv01db01 ~]# ssh greporasrv01-ilom.grepora.com
The authenticity of host 'greporasrv01-ilom.grepora.com (10.48.18.64)' can't be established.
RSA key fingerprint is 59:c5:9f:b1:60:59:15:16:94:c8:94:88:7b:4e:52:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'greporasrv01-ilom.grepora.com' (RSA) to the list of known hosts.
Password: 

Oracle(R) Integrated Lights Out Manager

Version 3.2.9.23 r116695

Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: greporasrv01-ilom

-> show /SP/network

 /SP/network
    Targets:
        interconnect
        ipv6
        test

    Properties:
        commitpending = (Cannot show property)
        dhcp_clientid = none
        dhcp_server_ip = none
        ipaddress = 10.50.12.64
        ipdiscovery = static
        ipgateway = 10.50.12.1
        ipnetmask = 255.255.255.0
        macaddress = 00:10:E0:95:73:E6
        managementport = MGMT
        outofbandmacaddress = 00:10:E0:95:73:E6
        pendingipaddress = 10.50.12.64
        pendingipdiscovery = static
        pendingipgateway = 10.50.12.1
        pendingipnetmask = 255.255.255.0
        pendingmanagementport = MGMT
        pendingvlan_id = (none)
        sidebandmacaddress = 00:10:E0:95:73:E7
        state = ipv4-only
        vlan_id = (none)

    Commands:
        cd
        set
        show

-> cd /SP/network/test
/SP/network/test

-> show

 /SP/network/test
    Targets:

    Properties:
        ping = (Cannot show property)
        ping6 = (Cannot show property)

    Commands:
        cd
        set
        show

-> set ping=10.50.12.51       -- DBNode1
Ping of 10.50.12.51 succeeded

-> set ping=10.50.12.52       -- DBNode2
Ping of 10.50.12.52 succeeded
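
If you have several nodes to test, it can help to save the ILOM session to a file and summarize the results afterwards. The sketch below is only an illustration: the function name ilom_ping_summary is made up for this example, and it assumes the saved text contains result lines in the "Ping of <address> succeeded" form shown above. Any other "Ping of" line is reported as FAILED, because the exact failure wording is not shown in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ILOM): read a saved ILOM session on stdin
# and print one "<address> OK|FAILED" line per "Ping of ..." result line.
ilom_ping_summary() {
  awk '
    /^Ping of / {
      sub(/\r$/, "")                       # tolerate CRLF line endings
      status = ($NF == "succeeded") ? "OK" : "FAILED"
      print $3, status
    }
  '
}

# Example (the file name is an assumption):
#   ilom_ping_summary < ilom_session.log
```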

 

Verifying topology of Exadata

Hey all!
Some time ago I needed to check the topology of a client’s Exadata due to a network issue, and I made a very useful note. Sharing it with you now. 😀

This and other cool commands can be found here: Network Diagnostics information for Oracle Database Machine Environments (Doc ID 1053498.1)

# /opt/oracle.SupportTools/ibdiagtools/verify-topology -t quarterrack

Newer versions don’t require the -t option.
For a half rack, -t halfrack should be used instead.

OK, but how do you know which rack type you have? You can get it from here:

[root@greporaexa onecommand]# grep -i MACHINETYPES databasemachine.xml
[MACHINETYPES]X4-2 Eighth Rack HC 4TB[/MACHINETYPES]
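
To tie the two steps together, here is a minimal sketch that turns the MACHINETYPES line into a -t value. The function name rack_type_arg is hypothetical. Treating an Eighth Rack like a quarter rack, and printing nothing for any other rack type (so that -t is simply left out), are assumptions of this example rather than documented behavior. Only the two -t values mentioned in this post are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a MACHINETYPES line to a verify-topology -t value.
# Assumption: Eighth and Quarter Racks both use "quarterrack".
rack_type_arg() {
  local line
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$line" in
    *eighth*|*quarter*) echo quarterrack ;;
    *half*)             echo halfrack ;;
    *)                  echo "" ;;   # unknown or other type: caller omits -t
  esac
}

# Example, run from the onecommand directory:
#   arg=$(rack_type_arg "$(grep -i MACHINETYPES databasemachine.xml)")
#   /opt/oracle.SupportTools/ibdiagtools/verify-topology ${arg:+-t "$arg"}
```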

Hope it helps! 🙂

Exacheck: The bundle patch version installed does not match the bundle patch version registered in the database

Hi all!
So, running Exacheck on a recently created database, I found this error:

 FAIL => The bundle patch version installed does not match the bundle patch version registered in the database: [host]:[sid],...

This means that a bundle patch with a sqlpatch component was applied to the OH but not to this database. It happens because Exacheck tries to match the patch information stored in oraInventory with the patch information stored in dba_registry_sqlpatch.

Also note that in some situations running datapatch may require the database to be in upgrade mode. If you are patching Exadata, which is generally a RAC-based environment, you need to set cluster_database=false and job_queue_processes to at least 1 before starting the database with the startup upgrade command. This should be described in the readme of the related patch.

When checking for this, I found a really interesting validation script here:

# Patch ID of the database bundle patch in this Oracle home (first "database" entry, ignoring JavaVM)
opatch_bp=$($ORACLE_HOME/OPatch/opatch lspatches 2>/dev/null|grep -iwv javavm|grep -wi database|head -1|awk -F';' '{print $1}')
# Status of that patch ID as registered inside the database
database_bp_status=$(echo -e "set heading off feedback off timing off \n select STATUS from dba_registry_sqlpatch where PATCH_ID = $opatch_bp;"|$ORACLE_HOME/bin/sqlplus -s " / as sysdba" | sed -e '/^ *$/d')
if [ "$database_bp_status" = "SUCCESS" ]
then
      echo "SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully."
else
      echo "FAILURE: Bundle patch installed in the database does not match the software home, or is installed with errors."
fi

To fix it, set the environment variables for the correct database, go to $ORACLE_HOME/OPatch and run:


Exadata: 7 Useful Commands to check Port/Sensor Alarms

Hello all!

These days I had an alarm with the message below:

Message=The aggregate sensor /SYS/CABLE_CONN_STAT has a fault.

There are some useful commands I used to verify all ports/sensors in my Exadata cluster.

In summary, these commands:
1) Use Intelligent Platform Management Interface (IPMI) to read the Sensor Data Record (SDR) repository
2) Use Intelligent Platform Management Interface (IPMI) to view the ILOM SP System Event Log (SEL)
3) Display all host nodes with ibhosts
4) Use ibcheckstate to scan InfiniBand fabric and validate the port logical and physical state
5) Use ibcheckerrors to scan InfiniBand fabric and validate the connectivity as described in the topology file
6) Check sensor health from the switch
7) Check the overall health of the InfiniBand switch, on the Exadata switch itself
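
As a rough sketch of how steps 1 to 5 could be run in one go from a database node: the wrapper name run_check is made up for this example, and the exact invocations in the comments (ipmitool sdr list, ipmitool sel list, and the InfiniBand tools with no arguments) are my assumptions, not necessarily the commands from the original note. Steps 6 and 7 run on the switch itself and are left out.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a diagnostic only if its binary is on PATH,
# so the same script can be used on nodes where a tool is not installed.
run_check() {
  local label=$1; shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $label"
    "$@"
  else
    echo "== $label: skipped ($1 not found)"
  fi
}

# Intended usage as root on a database node (invocations are assumptions):
#   run_check "SDR repository"   ipmitool sdr list
#   run_check "System Event Log" ipmitool sel list
#   run_check "IB hosts"         ibhosts
#   run_check "IB port state"    ibcheckstate
#   run_check "IB error check"   ibcheckerrors
```

Skipping a missing tool instead of failing keeps the output readable when the same script is copied to nodes with different packages installed.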

The Commands are:


Exadata: ORA-07445: exception encountered: core dump [ocl_lock_get_waitobj_owner()+26] [11] [0x000000000] [] [] []

Hello all,

This error is generated by unpublished bug 17891564, as described in the MOS note ORA-7445 [ocl_lock_get_waitobj_owner] on an Exadata storage cell (Doc ID 1906366.1).

It affects Exadata storage cells with image versions between 11.2.1.2.0 and 11.2.3.3.0. The CELLSRV process crashes with this error, as shown below:

Cellsrv encountered a fatal signal 11
Errors in file /opt/oracle/cell11.2.3.3.0_LINUX.X64_131014.1/log/diag/asm/cell//trace/svtrc_11711_27.trc  (incident=257):
ORA-07445: exception encountered: core dump [ocl_lock_get_waitobj_owner()+26] [11] [0x000000000] [] [] []
Incident details in: /opt/oracle/cell11.2.3.3.0_LINUX.X64_131014.1/log/diag/asm/cell//incident/incdir_257/svtrc_11711_27_i257.trc

The CELLSRV process should restart automatically after this error.
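
To check quickly whether a cell falls in the affected range, a version comparison like the one below may help. It is a sketch only: the function name version_in_range is hypothetical, the range is treated as inclusive at both ends (the post just says "between"), and it relies on GNU sort -V. Pass only the five-part image version, without any trailing build date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 lies within [$2, $3] (inclusive).
# Relies on GNU sort -V for dotted-version ordering.
version_in_range() {
  local ver=$1 lo=$2 hi=$3
  # lo must sort first among (lo, ver), and hi must sort last among (ver, hi)
  [ "$(printf '%s\n%s\n' "$lo" "$ver" | sort -V | head -n 1)" = "$lo" ] &&
  [ "$(printf '%s\n%s\n' "$ver" "$hi" | sort -V | tail -n 1)" = "$hi" ]
}

# Example with the range quoted above:
#   if version_in_range 11.2.3.2.1 11.2.1.2.0 11.2.3.3.0; then
#     echo "image version is in the affected range"
#   fi
```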


Infiniband Error: Cable is present on Port “X” but it is polling for peer port

Facing this error? Let me guess: Ports 03, 05, 06, 08, 09 and 12 are alerting? You have a Quarter Rack? Have you recently upgraded the Exadata plugin to version 12.1.0.3 or higher?
Don’t panic!

This is probably related to Bug 15937297 : EM 12C HAS ERRORS CABLE IS PRESENT ON PORT ‘N’ BUT IT IS POLLING FOR PEER PORT. The full message may look like “Cable is present on Port 6 but it is polling for peer port. This could happen when the peer port is unplugged/disabled”.

In fact, the bug was closed as not a bug. 🙂
As part of the 12.1.0.3 Exadata plugin, the IB switch ports are now checked for non-terminated cables, so these ‘polling for peer port’ errors are the expected behavior. Since the ‘polling for peer port’ check is an enhancement in the 12.1.0.3 plugin, this explains why you most likely did not see these errors until you upgraded the OMS to 12.1.0.2 and then updated the plugins.

In Quarter Racks, ports 3, 5, 6, 8, 9 and 12 are usually cabled ahead of time, but not terminated. In some racks port 32 may also be unterminated. Checking the incident in OEM, you might see something like this image:

newscreenshot-2016-12-26-as-20-03-50
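
If you want to separate the expected alerts from anything that deserves a closer look, a small filter over the alert text is enough. This is a sketch under assumptions: the function name expected_polling_port is hypothetical, the message is assumed to follow the wording quoted above, and only the six Quarter Rack ports listed in this post count as expected (add 32 to the list if your rack has that port unterminated too).

```shell
#!/usr/bin/env bash
# Hypothetical filter: return 0 if the alert is for one of the Quarter Rack
# ports listed above as cabled but unterminated, 1 for any other port,
# and 2 if the text does not look like a "polling for peer port" alert.
expected_polling_port() {
  local msg=$1 port
  # Extract the port number, dropping leading zeros ("08" becomes "8")
  port=$(printf '%s\n' "$msg" |
         sed -n 's/.*[Pp]ort 0*\([0-9][0-9]*\) but it is polling.*/\1/p')
  [ -n "$port" ] || return 2
  case "$port" in
    3|5|6|8|9|12) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example:
#   expected_polling_port "Cable is present on Port 6 but it is polling for peer port." \
#     && echo "expected on a Quarter Rack"
```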
