Clear Exadata Component Messages After Maintenance

Hi all,

Quick one today: So you completed a maintenance in a component (a memory component, as per example below) but keep receiving messages of failure?

Well, try clearing all the error messages after complete the maintenance and lets check if the threshold is reached again. If so, we may need to really replace it.

How to do it? Easy:

ssh root@grepora01-ilom
-> show /SYS/MB/P0/D3
Expected:
[...]
fault_state = Faulted
[..]
-> set /SYS/MB/P0/D3 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0/D3 (y/n)? y
-> show /SYS/MB/P0/D3
[Expected]
 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE
Properties:
type = DIMM
ipmi_name = MB/P0/D3
fru_name = 16384MB DDR4 SDRAM DIMM
fru_manufacturer = Samsung
fru_part_number = %
fru_rev_level = 01
fru_serial_number = %
 fault_state = OK
clear_fault_action = (none)

Exadata Compute Node Not Starting after a long Period…

Well,
After a long time on a graceful reboot, the compute node was simply not starting… What do to?
The best is:

1. Connect to ILOM Console:

Go to: Host Management –> Power control –> select Power Cycle in drop down list.

2. Connect to ILOM Server start SP console:
You may do it from another node, of course.

[root@grepora02 ~]# ssh root@grepora01-ilom Password:

Oracle(R) Integrated Lights Out Manager

Version 3.2.9.23 r116695

Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: grepora01-ilom

-> start /SP/console Are you sure you want to start /SP/console (y/n)? y

And, if not, as always, create a SR and follow with Oracle is the best way to go…

Hope it helps!

OEM: The ILOM server is currently offline or unreachable on the network.

Hi all!
Just got an alarm from OEM with this message. How to check it?
– First thing is to be able to connect on ILOM from DBNode.
– From there we can test the IPv4 and/or IPv6 interfaces through ping, as pe shown below.

This is also documented as per this Doc: Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection – Test IPv4 or IPv6 Network Configuration (CLI)

In my case, it was only a false alarm, as I was able to connect to other DBNodes from this ILOM:

[root@greporasrv01db01 ~]# ssh greporasrv01-ilom.jcrew.com
The authenticity of host 'greporasrv01-ilom.grepora.com (10.48.18.64)' can't be established.
RSA key fingerprint is 59:c5:9f:b1:60:59:15:16:94:c8:94:88:7b:4e:52:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'greporasrv01-ilom.grepora.com' (RSA) to the list of known hosts.
Password: 

Oracle(R) Integrated Lights Out Manager

Version 3.2.9.23 r116695

Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: greporasrv01-ilom

-> show /SP/network

 /SP/network
    Targets:
        interconnect
        ipv6
        test

    Properties:
        commitpending = (Cannot show property)
        dhcp_clientid = none
        dhcp_server_ip = none
        ipaddress = 10.50.12.64
        ipdiscovery = static
        ipgateway = 10.50.12.1
        ipnetmask = 255.255.255.0
        macaddress = 00:10:E0:95:73:E6
        managementport = MGMT
        outofbandmacaddress = 00:10:E0:95:73:E6
        pendingipaddress = 10.50.12.64
        pendingipdiscovery = static
        pendingipgateway = 10.50.12.1
        pendingipnetmask = 255.255.255.0
        pendingmanagementport = MGMT
        pendingvlan_id = (none)
        sidebandmacaddress = 00:10:E0:95:73:E7
        state = ipv4-only
        vlan_id = (none)

    Commands:
        cd
        set
        show

-> cd /SP/network/test
/SP/network/test

-> show

 /SP/network/test
    Targets:

    Properties:
        ping = (Cannot show property)
        ping6 = (Cannot show property)

    Commands:
        cd
        set
        show

-> set ping=10.50.12.51       -- DBNode1
Ping of 10.50.12.51 succeeded

-> set ping=10.50.12.52       -- DBNode2
Ping of 10.50.12.52 succeeded

 

Infiniband Error: Cable is present on Port “X” but it is polling for peer port

Facing this error? Let me guess: Ports 03, 05, 06, 08, 09 and 12 are alerting? You have a Quarter Rack? Have recently installed Exadata plugin to version 12.1.0.3 or higher?
Don’t panic!

This is probably related to Bug 15937297 : EM 12C HAS ERRORS CABLE IS PRESENT ON PORT ‘N’ BUT IT IS POLLING FOR PEER PORT. The full message might be like “Cable is present on Port 6 but it is polling for peer port. This could happen when the peer port is unplugged/disabled“.

In fact, the bug was closed as not a bug. 🙂
As part of the 12.1.0.3 Exadata plugin, the IB switch ports are now checked for non-terminated cables. So these errors ‘polling for peer port’ are the expected behavior.  Once ‘polling for peer port’ is an enhanced feature of the 12.1.0.3 plugin, this explains why you most likely did not see these errors until you upgraded the OMS to 12.1.0.2 and then updated the plugins.

In Quarter Racks, the following ports 3, 5, 6, 8, 9 and 12 are usually cabled ahead of time, but not terminated. In some racks port 32 may also be unterminated. Checking for incident in OEM you might see something like this image:

newscreenshot-2016-12-26-as-20-03-50

Continue reading