OEM: The number of hanging transactions are hang_trans is %

Hi all!
So, today is a quick one, just to link some posts together. It seems this message from OEM is not clear enough for some people, especially non-specialists in Oracle: it means something is locked in your database!

If this is the case, contact a DBA.

If you ARE a DBA, you may want to read this post about easily locating and solving locks: Solving Simple Locks Through @lock2s and @killlocker.

Also, if the session comes from a DBLink, it is always useful to read this: Lock by DBLink – How to locate the remote session?

There is also some additional material about specific issues and bugs in this regard here: Tag: LOCK.

I hope it helps!
Cheers!

OEM Information Reports: ORA-00600 [kpndbcon-svchpnotNULL]

Having this error from an Information Report?

ORA-00600 [kpndbcon-svchpnotNULL]
ORA-00600: internal error code, arguments: [kpndbcon-svchpnotNULL], [], [], [], [], [], [], [], [], [], [], []

Don’t worry… this is not really an Oracle issue. The error happens when one of the following occurs while the report is running (it takes 2 or 3 minutes):

  • The database session in the OEM Repository (Database Repository) is killed.
  • The database session in the Target Database (where OEM has to connect and get the data) is killed.
  • There are network issues between the OEM Repository and the Target Database, causing timeouts or making the session end abnormally.
  • A high workload in one of the databases causes a timeout, making the session end abnormally.

So basically this is a communication problem between the OEM Repository and the database the data is being read from. Keeping reports like this running over database links is not something Oracle supports at all, since any network issue can make the report fail.

Some references about it:

  • ORA-00600 [kpndbcon-svchpnotNULL] Errors (Doc ID 1615517.1)
  • ORA-00600 [kpndbcon-svchpnotNULL] query through dblink (Doc ID 1490700.1)
  • Information Publisher Report fails with Error Rendering Element. Exception: ORA-00600 [kpndbcon-svchpnotNULL] (Doc ID 1930280.1)


So what’s the solution?
The solution here is easy: just re-run the report.

Hope it helps. Cheers!

Managing AWR Warehouse Repository Database

1. Change Retention Period Of AWR Warehouse Repository Database

The AWR retention on the repository database can be changed as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -retention=<New retention period (in years)>
For example: 
[oracle@oem13c oms]$ emcli awrwh_reconfigure -retention=5
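Since the retention is just a number of years passed to emcli, a small wrapper can catch typos before anything is changed. The sketch below is a hypothetical dry-run helper (the name `awrwh_retention_cmd` is mine, not part of EM CLI): it only validates the value, assuming whole years, and prints the same command shown above so you can review it and run it yourself.

```shell
# Hypothetical dry-run helper: validate a retention value and print the
# emcli command from step 1 instead of running it.
# Assumption: retention is given in whole years.
awrwh_retention_cmd() {
  local years="$1"
  case "$years" in
    ''|*[!0-9]*)
      echo "invalid retention: '$years' (expected whole years)" >&2
      return 1 ;;
  esac
  if [ "$years" -lt 1 ]; then
    echo "invalid retention: '$years' (must be at least 1 year)" >&2
    return 1
  fi
  echo "./emcli awrwh_reconfigure -retention=$years"
}

# Usage, from <OMS_HOME>/bin:
#   awrwh_retention_cmd 5
```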

2. Change Staging Location Of AWR Dump Files

By default, for the AWR Warehouse, the target database creates its dump files in the home directory. So, after adding the target to the AWR Warehouse, we need to reconfigure it with EM CLI to change the dump file directory, as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure_src -target_name=<target database name> -target_type=rac_database -src_dir="<directory path>"
For example: 
[oracle@oem13c ]$ ./emcli awrwh_reconfigure_src -target_name=greporadb -target_type=rac_database -src_dir="/awrdata/awrw"
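Before pointing `-src_dir` at a new location, it may be worth confirming on the source database host that the directory exists, is writable by the user running the check (ideally the database owner, assumed here to be `oracle`), and has free space. The helper below is a hypothetical pre-check sketch using only `test` and POSIX `df -Pk`; it does not call emcli.

```shell
# Hypothetical pre-check for the directory passed as -src_dir.
# Run it on the source database host, ideally as the database owner.
check_awr_stage_dir() {
  local dir="$1" free_mb
  if [ ! -d "$dir" ]; then
    echo "MISSING: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "NOT WRITABLE: $dir"
    return 1
  fi
  # POSIX df output in 1K blocks: field 4 of the second line is "Available".
  free_mb=$(df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024) }')
  echo "OK: $dir (${free_mb} MB free)"
}

# Usage:
#   check_awr_stage_dir "<directory path>"
```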

3. Change Upload Interval Of SnapShots In AWR Warehouse Repository Database

The upload interval can be changed as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -upload_interval=<New upload interval (in hours)>
For example: 
[oracle@oem13c oms]$ ./emcli awrwh_reconfigure -upload_interval=12

4. List Current Configuration

This can be done as follows:

[oracle@oem13c oms]$ emcli awrwh_reconfigure -list
Upload Interval (hours) = 12
Retention (years) = 5
Dump Location = /awrdata/awrw/
AWR Warehouse reconfigured successfully
[oracle@oem13c oms]$
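If you want to use these values in a script (for example, to compare them with what you expect after a change), the `-list` output is simple `Label = value` text. The sketch below is a hypothetical parser; it assumes the output looks exactly like the sample above, which may differ between EM releases.

```shell
# Hypothetical parser: read "emcli awrwh_reconfigure -list" output on stdin
# and print the value for one label, e.g. "Retention (years)".
# Assumes lines of the form "Label = value".
awrwh_setting() {
  local label="$1"
  awk -F' = ' -v want="$label" '$1 == want { print $2 }'
}

# Usage:
#   emcli awrwh_reconfigure -list | awrwh_setting "Retention (years)"
```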

Reference
EM13c: How To Change Retention Period Of AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247437.1)
EM13c: How To Change Staging Location Of Dump Files On AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247439.1)
EM13c: How To Change Upload Interval Of SnapShots In AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247438.1)

OEM: The ILOM server is currently offline or unreachable on the network.

Hi all!
I just got an alarm from OEM with this message. How do you check it?
– The first thing is to be able to connect to the ILOM from the DB node.
– From there, we can test the IPv4 and/or IPv6 interfaces through ping, as shown below.

This is also documented as per this Doc: Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection – Test IPv4 or IPv6 Network Configuration (CLI)

In my case, it was only a false alarm, as I was able to connect to the ILOM and ping the DB nodes from it:

[root@greporasrv01db01 ~]# ssh greporasrv01-ilom.grepora.com
The authenticity of host 'greporasrv01-ilom.grepora.com (10.48.18.64)' can't be established.
RSA key fingerprint is 59:c5:9f:b1:60:59:15:16:94:c8:94:88:7b:4e:52:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'greporasrv01-ilom.grepora.com' (RSA) to the list of known hosts.
Password: 

Oracle(R) Integrated Lights Out Manager

Version 3.2.9.23 r116695

Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: greporasrv01-ilom

-> show /SP/network

 /SP/network
    Targets:
        interconnect
        ipv6
        test

    Properties:
        commitpending = (Cannot show property)
        dhcp_clientid = none
        dhcp_server_ip = none
        ipaddress = 10.50.12.64
        ipdiscovery = static
        ipgateway = 10.50.12.1
        ipnetmask = 255.255.255.0
        macaddress = 00:10:E0:95:73:E6
        managementport = MGMT
        outofbandmacaddress = 00:10:E0:95:73:E6
        pendingipaddress = 10.50.12.64
        pendingipdiscovery = static
        pendingipgateway = 10.50.12.1
        pendingipnetmask = 255.255.255.0
        pendingmanagementport = MGMT
        pendingvlan_id = (none)
        sidebandmacaddress = 00:10:E0:95:73:E7
        state = ipv4-only
        vlan_id = (none)

    Commands:
        cd
        set
        show

-> cd /SP/network/test
/SP/network/test

-> show

 /SP/network/test
    Targets:

    Properties:
        ping = (Cannot show property)
        ping6 = (Cannot show property)

    Commands:
        cd
        set
        show

-> set ping=10.50.12.51       -- DBNode1
Ping of 10.50.12.51 succeeded

-> set ping=10.50.12.52       -- DBNode2
Ping of 10.50.12.52 succeeded
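When the ILOM session is captured to a file (for example with `script` or your terminal's logging), the ping results can be checked automatically. This is a hypothetical sketch: it assumes every result line starts with `Ping of <ip>` as above, and treats any last word other than `succeeded` as a failure, since the exact failure wording is not shown here.

```shell
# Hypothetical filter: read a captured ILOM transcript on stdin and print the
# IP of every "Ping of <ip> ..." line that did not end in "succeeded".
# Returns 1 if any ping failed or no ping result was found, 0 otherwise.
ilom_ping_failures() {
  awk '
    /^Ping of / {
      seen++
      if ($NF != "succeeded") { print $3; bad++ }
    }
    END {
      if (seen == 0 || bad > 0) exit 1
      exit 0
    }
  '
}

# Usage:
#   ilom_ping_failures < ilom_session.log
```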

 

OEM 13C Patching Agent: [ERROR- Failed to Update Target Type Metadata]

While applying a patch to OEM 13c, specifically to the OMS Agent, I got this error when trying to start it again:

Collection Status                            : Collections enabled
Heartbeat Status                             : OMS responded illegally [ERROR- Failed to Update Target Type Metadata]
Last attempted heartbeat to OMS              : 2018-04-17 12:12:51
Last successful heartbeat to OMS             : (none)
Next scheduled heartbeat to OMS              : 2018-04-17 12:13:21
---------------------------------------------------------------
Agent is Running and Ready
[oracle@oem13c oms]$ ./emctl upload agent
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS version not checked yet. If this issue persists check trace files for ping to OMS related errors. (OMS_DOWN)
[oracle@oem13c oms]$ ./emctl pingOMS
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD pingOMS error: OMS sent an invalid response: “ERROR- Failed to Update Target Type Metadata”
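To catch this state in a script (for example right after restarting the agent during patching), the `emctl status agent` output can be checked for the two lines shown above. This is a hypothetical sketch that assumes the `Label : value` layout from that output; it prints the heartbeat status and returns non-zero when the OMS rejected the heartbeat or no heartbeat has succeeded yet.

```shell
# Hypothetical check: read "emctl status agent" output on stdin, print the
# Heartbeat Status value, and return 1 if the OMS responded illegally or the
# last successful heartbeat is "(none)".
agent_heartbeat_check() {
  awk -F' *: ' '
    /^Heartbeat Status/          { hb = $2 }
    /^Last successful heartbeat/ { last = $2 }
    END {
      print hb
      if (hb ~ /responded illegally/ || last == "(none)") exit 1
      exit 0
    }
  '
}

# Usage:
#   ./emctl status agent | agent_heartbeat_check
```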

Nice, huh?
I found the MOS note EM 13c Agent : pingOMS error: OMS sent an invalid response: “ERROR- Failed to Update Target Type Metadata” (Doc ID 2318564.1), which says:

“This particular issue is caused when any Agent Plugin is upgraded to a higher level than the OMS plugin.”
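The condition described in the note boils down to comparing dotted version strings for the same plugin on the Agent and on the OMS. A minimal sketch, assuming you already have both version strings at hand (copied from the console or from emcli output; how you obtain them is outside this sketch) and that GNU `sort -V` is available for version ordering:

```shell
# Hypothetical helper: compare the Agent-side and OMS-side versions of one
# plugin. Prints "AHEAD" when the Agent plugin version is higher than the OMS
# plugin version (the case the MOS note describes), otherwise "OK".
# Requires GNU sort -V for dotted version ordering.
plugin_version_status() {
  local agent_ver="$1" oms_ver="$2" highest
  if [ "$agent_ver" = "$oms_ver" ]; then
    echo "OK"
    return 0
  fi
  highest=$(printf '%s\n%s\n' "$agent_ver" "$oms_ver" | sort -V | tail -n 1)
  if [ "$highest" = "$agent_ver" ]; then
    echo "AHEAD"
  else
    echo "OK"
  fi
}

# Usage (version strings are placeholders):
#   plugin_version_status <agent plugin version> <OMS plugin version>
```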

According to the MOS note, the solution is to roll back the Agent plugins that are ahead of the OMS version. Checking it:
Continue reading

Exadata: 7 Useful Commands to check Port/Sensor Alarms

Hello all!

These days I had an alarm with the message below:

Message=The aggregate sensor /SYS/CABLE_CONN_STAT has a fault.

Here are some useful commands I used to verify all ports/sensors in my Exadata cluster.

In summary, these commands:
1) Use Intelligent Platform Management Interface (IPMI) to read the Sensor Data Record (SDR) repository
2) Use Intelligent Platform Management Interface (IPMI) to view the ILOM SP System Event Log (SEL)
3) Display all host nodes with ibhosts
4) Use ibcheckstate to scan InfiniBand fabric and validate the port logical and physical state
5) Use ibcheckerrors to scan InfiniBand fabric and validate the connectivity as described in the topology file
6) Check sensor health from the switch
7) Check the overall health of the InfiniBand switch, on the Exadata switch itself

The Commands are:

Continue reading
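The first item in the list reads the SDR repository through IPMI; a common way to do that on a node is `ipmitool sdr`, whose output has `name | reading | status` columns. Assuming that layout (an assumption on my side, so check it on your hardware), a hypothetical filter like the one below keeps only sensors whose status is not `ok` or `ns` (no reading), which makes a fault such as the /SYS/CABLE_CONN_STAT one easier to spot.

```shell
# Hypothetical filter: read "ipmitool sdr" style output (NAME | READING | STATUS)
# on stdin and print "NAME: STATUS" for every sensor not reporting ok or ns.
sdr_problems() {
  awk -F'|' '
    NF >= 3 {
      status = $3
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      if (status != "ok" && status != "ns") {
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        print name ": " status
      }
    }
  '
}

# Usage:
#   ipmitool sdr | sdr_problems
```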

Oracle ASR: Communication Issues

Hello all,
Are you getting notifications like this one from your ASR?

ALERT: Oracle Auto Service Request (ASR) has detected a heartbeat failure for these assets.

[list with "Serial"; "Hostname"; "Information" of affected targets]

IMPACT: ASR would not be able to create a Service Request (SR) if a fault were to occur.
ACTION: Determine why the heartbeat has failed for these assets and resolve the issue.

This is only a notification saying that ASR is not able to reach a target.
For detailed information on how to troubleshoot, you can access MOS Oracle Auto Service Request (ASR) No Heartbeat Issue – How to Resolve (Doc ID 1346328.2)

In general, the things to test are:
– Access from ASR server to transport.oracle.com, via https, using port 443.
– Access from ASR server to ASR assets, via http, using port 6481.

Continue reading
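Both checks above are TCP reachability tests, so they can be scripted from the ASR server. The sketch below is hypothetical and requires bash (it uses bash's `/dev/tcp` redirection) plus coreutils `timeout`; it only proves a TCP connection can be opened on the port, not that HTTPS/HTTP or ASR itself works. `<asset-hostname>` in the usage comment is a placeholder.

```shell
# Hypothetical reachability check (requires bash and coreutils "timeout"):
# try to open a TCP connection to host:port within 5 seconds.
check_port() {
  local host="$1" port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK $host:$port"
  else
    echo "FAIL $host:$port"
    return 1
  fi
}

# Usage, from the ASR server:
#   check_port transport.oracle.com 443
#   check_port <asset-hostname> 6481
```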

OEM Alarm – %MB of Audit Trail files (sizeOfOSAuditFiles:FILE_SIZE)

Hello All,
After upgrading an OEM to 13c, I started to receive notifications for the event “sizeOfOSAuditFiles:FILE_SIZE”.

This is a new event introduced in OEM DB Plugin 12.1.0.7.0, under the “Operating System Audit Records” metric group. Upgrading the DB Plugin was part of the OEM upgrade change, since we had some old versions.

This event is only a notification about file size, for space management purposes. The default thresholds are 10MB (warning) and 20MB (critical), which in most cases are pretty low values.
It refers specifically to the location set in the audit_file_dest parameter, if you want to check it.

Options to reduce the noise include disabling this metric or increasing the thresholds accordingly, which is what I did.
For now, I just increased the thresholds to 500MB/2048MB (warning/critical), which I consider good values for this environment.
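To see where a database stands against these thresholds before changing them in OEM, the size of audit_file_dest (which you can get with `show parameter audit_file_dest` in SQL*Plus) can be checked on the OS. This is a hypothetical sketch: it measures the whole directory with `du -sm`, which may differ slightly from how the OEM metric counts audit files, and the `>=` comparison is an assumption.

```shell
# Hypothetical OS-side check: total size of an audit directory in MB compared
# with warning/critical thresholds (defaults: 500/2048 MB, as chosen above).
audit_dest_status() {
  local dir="$1" warn_mb="${2:-500}" crit_mb="${3:-2048}" size_mb level
  size_mb=$(du -sm "$dir" | awk '{ print $1 }')
  if [ "$size_mb" -ge "$crit_mb" ]; then
    level=CRITICAL
  elif [ "$size_mb" -ge "$warn_mb" ]; then
    level=WARNING
  else
    level=OK
  fi
  echo "$level ${size_mb}MB $dir"
}

# Usage (the path is a placeholder for your audit_file_dest):
#   audit_dest_status <audit_file_dest> 500 2048
```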

Some references about it can be found at:
– Enterprise Manager Oracle Database Plug-in Metric Reference Manual (Plug-in Release 12.1.0.7) – Database Instance – Operating System Audit Records
– EM 12c, EM 13c: Troubleshooting Database Metrics in Enterprise Manager 12c and 13c Cloud Control (Doc ID 2032156.1)

Hope it helps,
Cheers!

OEM Alarms: TaskZombieException

Hello all,
Recently I started getting several alarms like this:

Internal error detected: java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:1017.

Or

Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

And in the agent log, this message:

# $AGENT_INST/sysman/log/gcagent.log:

2017-06-13 12:54:09,232 [355:GC.Executor.14 (oracle_database:DB_DB12:%DB%)] ERROR - oracle_database:DB_DB12:%DB% oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)

To avoid generating pages and new incidents, I changed the _zombieCreateIncident parameter to false on the agents of the affected servers, as described in MOS EM12c: Incident constantly raised for Oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie (Doc ID 2116834.1), related to MOS Bug 22674258 – Zombie processes created for DB metric collection /workaround do not work.
I also added some other parameters that should help, like increasing the wait, all mentioned in the same MOS notes.

Like this, in $AGENT_INST/sysman/config/emd.properties:

#GrepOra magical fixes:
_zombieSuspensions=true
_canceledThreadWait=900
_zombieThreadPercentThreshold=0
_zombieCreateIncident=false
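Since the same properties have to go to the agents on several servers, a small helper that sets a key idempotently avoids duplicate entries. This is a hypothetical sketch (the function name is mine): it backs up the file, replaces an existing `key=value` line or appends one, and does not restart or reload the agent.

```shell
# Hypothetical helper: set KEY=VALUE in an emd.properties-style file.
# Writes a timestamped backup, replaces lines starting with "KEY=",
# or appends "KEY=VALUE" if the key is absent. Keeps the original file's
# permissions by rewriting it in place.
set_emd_property() {
  local file="$1" key="$2" value="$3" tmp
  cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)" || return 1
  tmp="$file.tmp.$$"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   props="$AGENT_INST/sysman/config/emd.properties"
#   set_emd_property "$props" _zombieCreateIncident false
#   set_emd_property "$props" _canceledThreadWait 900
```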

And it solved the issue. At least, the noise stopped. 🙂

Hope it helps,
Cheers!

OEM 12c+: Prevent OMS from Sending Old Notifications for Events

Are you receiving old notifications from OEM? Like from 2 or 3 days ago, mostly already solved, or raised after a blackout?
Yeah, this is annoying, especially when getting floods and floods of notifications.

OK, so here goes a very good tip: you can set a grace period for notifications! 🙂
Easy, easy: do it this way:

cd /bin
emctl set property -name oracle.sysman.core.notification.grace_period -value [provide value in minutes]

The oracle.sysman.core.notification.grace_period OMS parameter has been introduced in 12c and allows the user to configure the grace period within which the notification should be sent. The value is set in minutes.

For example:

emctl set property -name oracle.sysman.core.notification.grace_period -value 1440

With this, OMS sends only those notifications which have been raised in the past 1440 mins (last 24 hours) and ignores all the notifications for events / incidents created prior to this time period.

After this, you’ll need to restart the OMS:

emctl stop oms
emctl start oms

The oracle.sysman.core.notification.grace_period parameter applies to all the notification methods, but if the requirement is to specify the grace period for a particular notification method only, you can use the parameters below accordingly:

oracle.sysman.core.notification.grace_period_connector: For Connectors
oracle.sysman.core.notification.grace_period_email: For email notifications
oracle.sysman.core.notification.grace_period_oscmd: For OS Command notifications
oracle.sysman.core.notification.grace_period_plsql: For PLSQL notifications
oracle.sysman.core.notification.grace_period_snmp: For SNMP Trap notifications
oracle.sysman.core.notification.grace_period_ticket: For ticketing tools
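Because the value is in minutes, it is easy to get the arithmetic wrong when thinking in hours or days. The hypothetical helper below converts whole hours to minutes and prints the matching `emctl set property` command, optionally for one of the per-method properties listed above; it does not run emctl.

```shell
# Hypothetical dry-run helper: grace period in whole hours -> emctl command.
# Optional 2nd argument: connector, email, oscmd, plsql, snmp or ticket,
# which selects the per-method property instead of the global one.
grace_period_cmd() {
  local hours="$1" method="${2:-}" prop minutes
  case "$hours" in
    ''|*[!0-9]*) echo "hours must be a whole number" >&2; return 1 ;;
  esac
  if [ "$hours" -lt 1 ]; then
    echo "hours must be at least 1" >&2
    return 1
  fi
  prop="oracle.sysman.core.notification.grace_period"
  case "$method" in
    '') ;;
    connector|email|oscmd|plsql|snmp|ticket) prop="${prop}_${method}" ;;
    *) echo "unknown notification method: $method" >&2; return 1 ;;
  esac
  # awk does the multiplication so values like "08" are read as decimal.
  minutes=$(awk -v h="$hours" 'BEGIN { print h * 60 }')
  echo "emctl set property -name $prop -value $minutes"
}

# Usage:
#   grace_period_cmd 24          # global property, 1440 minutes
#   grace_period_cmd 24 email    # email notifications only
```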

This is well described in MOS: 12c Cloud control: How to Prevent OMS from Sending out Old Notifications for Events / Incidents Occurred in the Past? (Doc ID 1605351.1)

Cheers!