EM Repository: ORA-00060: Deadlock detected error in alert log

Hello!
Some time ago I found several "ORA-00060: Deadlock detected" errors in the alert log of a client's OEM database, like this:

Thu Dec 22 09:01:55 2016
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1757.trc.
Thu Dec 22 09:02:07 2016
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1759.trc.
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1759.trc.
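
To collect the trace files worth reading from an alert log like the one above, a minimal sketch could look like this. The function name `deadlock_traces` is hypothetical, and it assumes the message format shown in the excerpt (a path ending in `.trc` followed by a final dot):

```shell
# Hypothetical helper: print each trace file referenced by an ORA-00060
# line in the given alert log, once, in sorted order.
deadlock_traces() {
    grep 'ORA-00060' "$1" \
        | sed -n 's/.*More info in file \(.*\.trc\)\.$/\1/p' \
        | sort -u
}
```

Called as `deadlock_traces alert_oemdb.log` (the file name is only an example), it lists each trace once even when the same file is reported several times.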

In summary, after investigating the trace (as per below), I found that the issue is caused by the following command:


Infiniband Error: Cable is present on Port “X” but it is polling for peer port

Facing this error? Let me guess: Ports 03, 05, 06, 08, 09 and 12 are alerting? You have a Quarter Rack? Have you recently upgraded the Exadata plugin to version 12.1.0.3 or higher?
Don’t panic!

This is probably related to Bug 15937297: EM 12C HAS ERRORS CABLE IS PRESENT ON PORT ‘N’ BUT IT IS POLLING FOR PEER PORT. The full message might look like “Cable is present on Port 6 but it is polling for peer port. This could happen when the peer port is unplugged/disabled”.

In fact, the bug was closed as not a bug. 🙂
As of the 12.1.0.3 Exadata plugin, the IB switch ports are checked for non-terminated cables, so these ‘polling for peer port’ errors are expected behavior. Since this check is an enhancement introduced in the 12.1.0.3 plugin, it explains why you most likely did not see these errors until you upgraded the OMS to 12.1.0.2 and then updated the plugins.

In Quarter Racks, ports 3, 5, 6, 8, 9 and 12 are usually cabled ahead of time but not terminated. In some racks, port 32 may also be unterminated. Checking the incidents in OEM, you might see something like this:

[Screenshot: newscreenshot-2016-12-26-as-20-03-50]


RS-7445 [Serv MS leaking memory] [It will be restarted] [] [] [] [] [] [] [] [] [] []

Hello!
Seeing this error in the cell alerthistory.log? Don’t panic!
Take a look in MOS: Exadata Storage Cell reports error RS-7445 [Serv MS Leaking Memory] (Doc ID 1954357.1). It’s related to Bug  – RS-7445 [SERV MS LEAKING MEMORY].

The issue is a memory leak in the Java executable and affects systems running with JDK 7u51 or later versions. This is relevant for all versions in Release 11.2 to 12.1.

What happens is that the MS process consumes too much memory. Normally MS uses around 1GB, but because of the bug the allocated memory can grow up to 2GB. You can check it as in the example below:

[root@exaserver ~]# ps -feal|grep java
0 S root     16493 14737  0  80   0 - 15317 pipe_w 18:34 pts/0    00:00:00 grep java
0 S root     22310 27043  2  80   0 - 267080 futex_ 18:15 ?       00:00:27 /usr/java/default/bin/java -Xms256m -Xmx512m -XX:-UseLargePages -Djava.library.path=/opt/oracle/cell/cellsrv/lib -Ddisable.checkForUpdate=true -jar /opt/oracle/cell/oc4j/ms/j2ee/home/oc4j.jar -out /opt/oracle/cell/cellsrv/deploy/log/ms.lst -err /opt/oracle/cell/cellsrv/deploy/log/ms.err

Note that the SZ column (267080 here) is counted in pages: 267080 * 4096 bytes ≈ 1043MB (about 1GB). If your number is noticeably higher than this, it indicates the presence of the bug.
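
The same arithmetic can be scripted. This is only a sketch: the function name `pages_to_mib` is hypothetical, and it assumes the SZ value from `ps -l` is a page count, with the page size taken from `getconf PAGESIZE` unless you pass it explicitly:

```shell
# Hypothetical helper: convert a page count (the SZ column of `ps -l`)
# to whole MiB. The second argument overrides the system page size.
pages_to_mib() {
    local pages=$1 page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1024 / 1024 ))
}

pages_to_mib 267080 4096   # prints 1043
```

Integer division rounds down, which is good enough to tell "around 1GB" from "close to 2GB".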


Getting today’s Errors and Warnings from MySQL log

Quick one!

# Warnings

grep "$(date +%y%m%d)" /var/log/mysqld.log | grep "\[Warning\]"

# Errors

grep "$(date +%y%m%d)" /var/log/mysqld.log | grep "\[ERROR\]"

And a Bonus!
To get entries from X days ago:

grep "$(date --date="46 days ago" +%y%m%d)" /var/log/mysqld.log
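
If you run these often, the three one-liners can be folded into a single function. This is a rough sketch, not a polished tool: `mysql_log_entries` is a hypothetical name, it assumes GNU `date` and log lines that start with a `yymmdd` stamp as in the commands above (servers that log ISO-style timestamps would need a different date format):

```shell
# Hypothetical helper: print log entries of a given level from N days ago.
# Usage: mysql_log_entries LOGFILE LEVEL [DAYS_AGO]   (DAYS_AGO defaults to 0)
mysql_log_entries() {
    local logfile=$1 level=$2 days_ago=${3:-0}
    local stamp
    # GNU date assumed; yields e.g. 161222 for 22 Dec 2016
    stamp=$(date --date="${days_ago} days ago" +%y%m%d) || return 1
    # Anchor the stamp at the start of the line, then match the tag literally
    grep "^${stamp}" "$logfile" | grep -F "[${level}]"
}
```

For example, `mysql_log_entries /var/log/mysqld.log ERROR` shows today’s errors and `mysql_log_entries /var/log/mysqld.log Warning 46` shows warnings from 46 days ago.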

Matheus.

Monitoring MySQL with Nagios – Quick View

Hi all!
As you know, there are some commercial solutions for monitoring/alerting on MySQL, like MySQL Enterprise Monitor or Oracle Grid/Cloud Control.

But since we are using MySQL instead of Oracle Database, we can assume the decision was probably based on cost. So, considering open source solutions, we basically have Nagios, Zabbix, OpenNMS…

For Nagios, in my opinion the cream of the crop is mysql_health_check.pl.
White paper, presentation and code below:
White Paper
Presentation
Code
A good one by Sheeri Cabral, posted here!

Anyway, with these two plugins we can do lots of magic:

1. check_mysql.pl
– Checks the status of the MySQL server (slow queries, etc.)
– Queries per second graph

2. check_db_query.pl
– Allows you to run SQL queries and set thresholds for warning and critical. Ex:

check_db_query.pl -d database -q query [-w warn] [-c crit] [-C conn_file] [-p placeholder]

Example of a Nagios command definition:

define command{
command_name    check_db_entries
command_line    /usr/local/bin/perl $USER1$/check_db_query.pl -d "$ARG1$" -q "$ARG2$" $ARG3$
}
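
To make the warning/critical thresholds concrete: Nagios reads a plugin’s exit code, where 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The sketch below only illustrates that convention with a hypothetical `check_threshold` function; it is not the code of check_db_query.pl, and it assumes a non-negative integer value where higher is worse:

```shell
# Hypothetical illustration of the Nagios plugin exit-code convention.
# Usage: check_threshold VALUE WARN CRIT
check_threshold() {
    local value=$1 warn=$2 crit=$3
    case $value in
        ''|*[!0-9]*) echo "UNKNOWN - non-numeric value '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value $value >= $crit"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value $value >= $warn"; return 1
    fi
    echo "OK - value $value"; return 0
}
```

The first line printed is what Nagios shows as the status text, and the return code decides the state of the service.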

So now you just have to write your queries and set up your free MySQL monitoring! 🙂
Matheus.

ASM: Disk Size Imbalance Query

It can be useful if you work frequently with OEM metrics…

# OEM’s Query

SELECT file_num,
       MAX(extent_count) max_disk_extents,
       MIN(extent_count) min_disk_extents,
       MAX(extent_count) - MIN(extent_count) disk_extents_imbalance
FROM (SELECT number_kffxp file_num,
             disk_kffxp disk_num,
             COUNT(xnum_kffxp) extent_count
      FROM x$kffxp
      WHERE group_kffxp = 1
        AND disk_kffxp != 65534
      GROUP BY number_kffxp, disk_kffxp
      ORDER BY number_kffxp, disk_kffxp)
GROUP BY file_num
HAVING MAX(extent_count) - MIN(extent_count) > 5
ORDER BY disk_extents_imbalance DESC;

# Matheus’ Query

select max(free_mb) biggest, min(free_mb) lowest, avg(free_mb) AVG,
trunc(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb))),2)||'%' as balanced,
trunc(100-(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb)))),2)||'%' as inbalanced
from v$asm_disk
where group_number in
(select group_number from v$asm_diskgroup where name = upper('&DG'));

I wrote my own query for two reasons:
1) I didn’t have the OEM query at the time I wrote it.
2) My query measures the imbalance against the average free space of the disks (how far the group is from what it would look like if every disk were balanced), rather than the actual difference between the disks with the maximum and the minimum usage…
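
To see what my query computes, here is the same formula applied to a made-up set of free_mb values. The function name `asm_balance` and the sample numbers are hypothetical; it just mirrors the GREATEST(avg*100/max, min*100/avg) expression, truncated to two decimals:

```shell
# Hypothetical helper: apply the balance formula from my query to a list of
# free_mb values passed as arguments (one per disk, at least one non-zero).
asm_balance() {
    printf '%s\n' "$@" | awk '
        NR == 1 { max = min = $1 }
        { sum += $1; if ($1 > max) max = $1; if ($1 < min) min = $1 }
        END {
            avg = sum / NR
            a = avg * 100 / max          # average vs. biggest
            b = min * 100 / avg          # lowest vs. average
            bal = (a > b) ? a : b        # GREATEST(a, b)
            printf "balanced=%.2f%% imbalanced=%.2f%%\n",
                   int(bal * 100) / 100, int((100 - bal) * 100) / 100
        }'
}

asm_balance 900 1000 1100   # prints balanced=90.90% imbalanced=9.09%
```

With disks at 900, 1000 and 1100 MB free, the average is 1000, so the two ratios are 90.90% and 90%, and the larger one is reported as the balance.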

So, you can choose the one you need… 🙂

Matheus.

Service Detected on OEM but not in SRVCTL or SERVICE_NAMES Parameter?

Okay, it happens.
It happened to me after moving a database from one cluster to another. The service was registered with SRVCTL in the old cluster but was no longer needed, so it was not registered in the new cluster.
But OEM insists on listing, for example, “service3” as offline. The problem is that you cannot remove it with SRVCTL, because it was never registered there, right? See the example below:

Listing services:
srvdatabase1:/home/oracle>srvctl status service -d systemdb
Service service1_systemdb is running on nodes: srvdatabase1
Service service2 is running on nodes: srvdatabase1
Service service2_systemdb is running on nodes: srvdatabase1

In the service_names parameter:
srvdatabase1:/home/oracle>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 8 15:21:00 2015
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> show parameters service;
NAME                                 TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
service_names                        string
service2,test,systemdb

And the offline alarm is still raised for “service3”?
The easiest fix:

SQL> exec dbms_service.DELETE_SERVICE('service3');
PL/SQL procedure successfully completed.

Matheus.