EM Repository: ORA-00060: Deadlock detected error in alert log

Hello!
Some time ago I found some ORA-00060: Deadlock detected errors in a client's OEM repository database, like this:

Thu Dec 22 09:01:55 2016
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1757.trc.
Thu Dec 22 09:02:07 2016
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1759.trc.
ORA-00060: Deadlock detected. More info in file /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1759.trc.
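If you want to start digging yourself, the deadlock graph and the SQL that hit ORA-00060 are recorded in the trace file named in the alert log. A minimal sketch, reusing the trace path from the log above (the strings are the usual section markers Oracle writes to deadlock traces):

# show the deadlock graph recorded in the trace
grep -A 20 "DEADLOCK DETECTED" /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1757.trc
# show the SQL statement that was running when the deadlock was detected
grep -A 5 "Current SQL statement" /oracle/oemdb/diag/rdbms/oemdb/oemdb/trace/oemdb_ora_1757.trc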

In summary, after investigating the trace (as per below), I found that the issue is caused by the following command:

More: “EM Repository: ORA-00060: Deadlock detected error in alert log”

umount: This utility only unmounts cifs filesystems

Can’t umount a filesystem on Red Hat, getting the error “This utility only unmounts cifs filesystems”, even with force?

[@Linux-redhat5 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.11 (Tikanga)

ERROR:

[@Linux-redhat5 /]# umount /cifsfilesystem/data/custom
This utility only unmounts cifs filesystems.
This utility only unmounts cifs filesystems.
This utility only unmounts cifs filesystems.
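Before fighting with force options, it's worth confirming which umount binary the shell is resolving (this message is typically printed by the cifs umount helper) and what filesystem type the kernel actually reports for that mount. A quick diagnostic sketch, using the mount point from the error above:

# which umount is being resolved from PATH? Is it pointing at the cifs helper?
type -a umount
# what does the kernel think is mounted there?
grep /cifsfilesystem/data/custom /proc/mounts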

More: “umount: This utility only unmounts cifs filesystems”

Infiniband Error: Cable is present on Port “X” but it is polling for peer port

Facing this error? Let me guess: ports 03, 05, 06, 08, 09 and 12 are alerting? You have a Quarter Rack? Have you recently upgraded the Exadata plugin to version 12.1.0.3 or higher?
Don’t panic!

This is probably related to Bug 15937297: EM 12C HAS ERRORS CABLE IS PRESENT ON PORT ‘N’ BUT IT IS POLLING FOR PEER PORT. The full message looks like “Cable is present on Port 6 but it is polling for peer port. This could happen when the peer port is unplugged/disabled”.

In fact, the bug was closed as not a bug. 🙂
As part of the 12.1.0.3 Exadata plugin, the IB switch ports are now checked for non-terminated cables, so these ‘polling for peer port’ errors are expected behavior. Since this check is a new feature of the 12.1.0.3 plugin, it explains why you most likely did not see these errors until you upgraded the OMS to 12.1.0.2 and then updated the plugins.

In Quarter Racks, ports 3, 5, 6, 8, 9 and 12 are usually cabled ahead of time but not terminated. In some racks port 32 may also be unterminated. Checking for incidents in OEM, you might see something like this image:

[Screenshot: OEM incident showing the “polling for peer port” alerts]
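If you want to double-check from the fabric side that those really are unterminated ports, iblinkinfo (from the infiniband-diags package, run as root from a database node) lists the state of every switch port, and ports with no peer typically show up as Down/Polling. A hedged sketch:

# list only the switch ports still polling for a peer
iblinkinfo | grep -i polling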

More: “Infiniband Error: Cable is present on Port “X” but it is polling for peer port”

RS-7445 [Serv MS leaking memory] [It will be restarted] [] [] [] [] [] [] [] [] [] []

Hello!
Seeing this error in the cell alerthistory.log? Don’t panic!
Take a look at MOS: Exadata Storage Cell reports error RS-7445 [Serv MS Leaking Memory] (Doc ID 1954357.1). It’s related to Bug – RS-7445 [SERV MS LEAKING MEMORY].

The issue is a memory leak in the Java executable and affects systems running JDK 7u51 or later. It is relevant for all versions from release 11.2 to 12.1.

What happens is that the MS process consumes high memory (up to 2 GB). Normally MS uses around 1 GB, but because of the bug the allocated memory can grow up to 2 GB. You can check it as per the example below:

[root@exaserver ~]# ps -feal|grep java
0 S root     16493 14737  0  80   0 - 15317 pipe_w 18:34 pts/0    00:00:00 grep java
0 S root     22310 27043  2  80   0 - 267080 futex_ 18:15 ?       00:00:27 /usr/java/default/bin/java -Xms256m -Xmx512m -XX:-UseLargePages -Djava.library.path=/opt/oracle/cell/cellsrv/lib -Ddisable.checkForUpdate=true -jar /opt/oracle/cell/oc4j/ms/j2ee/home/oc4j.jar -out /opt/oracle/cell/cellsrv/deploy/log/ms.lst -err /opt/oracle/cell/cellsrv/deploy/log/ms.err

Note that the SZ column from ps is in 4 KB pages: 267080 * 4096 bytes ≈ 1043 MB (about 1 GB). If your number is significantly higher than this, it indicates the presence of the bug.
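If you prefer to get that figure directly in MB, the sketch below does the conversion, assuming the ps -feal column layout shown above (PID in the 4th field, SZ in the 10th):

# print PID and virtual size in MB for the MS java process (SZ is in 4 KB pages)
ps -feal | grep '[j]ava' | awk '{printf "%s %.0f MB\n", $4, $10*4096/1024/1024}'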

More: “RS-7445 [Serv MS leaking memory] [It will be restarted] [] [] [] [] [] [] [] [] [] []”

Oracle – Free DevOps Massive Open Online Course (MOOC)

Hey!
Just passing by to let you know that Oracle is organizing a free Massive Open Online Course to cover DevOps topics.


The course will be delivered over three weeks from 13 January to 3 February. 

The discussed topics are:

• What is DevOps?
• Test-Driven Development (TDD) Using Python or Java
• Agile and Project Management Using Oracle Developer Cloud Service
• Continuous Integration and Continuous Deployment Using Oracle Developer Cloud Service

You can enroll in the course by clicking here.

Enjoy!

MySQL Network Connections on ‘TIME_WAIT’

Hi all!
Recently I caught a bunch of connections in ‘TIME_WAIT’ on a MySQL Server through ‘netstat -antp | grep 3306’…
After some time, we identified this was caused by the environment not using DNS, only fixed IPs (uuugh!)…

As you know, for security reasons MySQL maintains a host cache for established connections. From the MySQL docs:

“For each new client connection, the server uses the client IP address to check whether the client host name is in the host cache. If not, the server attempts to resolve the host name. First, it resolves the IP address to a host name and resolves that host name back to an IP address. Then it compares the result to the original IP address to ensure that they are the same. The server stores information about the result of this operation in the host cache. If the cache is full, the least recently used entry is discarded.”
9.12.6.2 DNS Lookup Optimization and the Host Cache

For this reason, there was a DNS reverse lookup for each login, and it was hanging these connections.
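By the way, on MySQL 5.6 and later (with performance_schema enabled) you can look at that cache directly and see whether the lookups are validating; a hedged sketch using the performance_schema host_cache table:

# inspect the host cache: one row per client IP, with the resolution result
mysql -e "SELECT IP, HOST, HOST_VALIDATED FROM performance_schema.host_cache"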

The solution?
Right way: add an A record in DNS for the hosts (and make sure reverse lookup works). Use DNS!
Quick way: add the mapping for the connecting hosts to /etc/hosts on the database server, avoiding the DNS lookup.
Quicker way: set the skip-name-resolve option in /etc/my.cnf, as sketched below. This disables host name resolution for new connections at the database layer and solves the problem.
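A minimal sketch of the ‘quicker way’, assuming a standard /etc/my.cnf and a mysqld service script (names may differ on your distro). One caveat: with skip-name-resolve, grants defined by host name stop matching, so make sure your users are granted by IP (or %) before enabling it:

# add these two lines to /etc/my.cnf (under the existing [mysqld] section, if any):
#   [mysqld]
#   skip-name-resolve
# then restart MySQL and confirm the variable is ON
service mysqld restart
mysql -e "SHOW VARIABLES LIKE 'skip_name_resolve'"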

This is a good post (in Portuguese) about it: http://wagnerbianchi.com/blog/?p=831

See ya!
Matheus.

RAC on AIX: Network Best Practices

Hi all!
A while ago I ran into some performance issues on AIX, working with RAC instances of different configurations (CPU/memory). The root cause was basically an inefficient network configuration for the interconnect (UDP).
As you know, UDP is a protocol without acknowledgements (and for that reason carries less metadata and is faster). By default, every server has a buffer pool to send UDP (and TCP) messages and another to receive them.
In my situation, since one of the instances was ‘smaller’, its buffers were automatically set smaller too, and this was causing high interconnect block-send statistics on the other instances. Indeed, there were lots of resends caused by buffer overflows on this smaller instance…

There is a simple way to evaluate how much UDP loss you are having on your AIX server:

netstat -s | grep 'socket buffer overflows'

If you are seeing a considerable number of overflows, it’s recommended to re-evaluate the size of your udp_recvspace. And, of course, keep the related buffer sizes consistent with each other.

Oracle recommends, at least (a sketch for checking and setting these follows the list):

tcp_recvspace = 65536
tcp_sendspace = 65536
udp_sendspace = ((DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB) but no lower than 65536
udp_recvspace = 655360 (Minimum recommended value is 10x udp_sendspace, parameter value must be less than sb_max)
rfc1323 = 1
sb_max = 4194304
ipqmaxlen = 512
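To check and apply these on AIX you can use the no command; a hedged sketch with the values above (-p makes the change persistent across reboots, and ipqmaxlen is a load-time tunable, so it takes -r and a reboot):

# current values
no -a | egrep 'udp_sendspace|udp_recvspace|tcp_sendspace|tcp_recvspace|rfc1323|sb_max|ipqmaxlen'
# apply the recommendations (size udp_sendspace by the formula above for your database)
no -p -o tcp_sendspace=65536
no -p -o tcp_recvspace=65536
no -p -o udp_sendspace=65536
no -p -o udp_recvspace=655360
no -p -o rfc1323=1
no -p -o sb_max=4194304
no -r -o ipqmaxlen=512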

These and other details about configuring RAC on AIX can be found in the note: RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1)

I’d recommend you take a look too.

Have a nice day!
Matheus.

RHEL: Figuring out CPUs, Cores and Hyper-Threading

Hi all!
It’s a recurrent subject, right? But no one is ever 100% sure how to figure it out… So let me quickly show you my way:

– Physical CPUs (sockets):

[root@mysrvr ~]# grep -i "physical id" /proc/cpuinfo | sort -u | wc -l
2
[root@mysrvr ~]# dmidecode -t processor |grep CPU
        Version: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
        Version: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

So, 2 physical CPUs.

– Physical Cores

[root@mysrvr ~]# egrep -e "core id" -e ^physical /proc/cpuinfo|xargs -l2 echo|sort -u
physical id : 0 core id : 0
physical id : 0 core id : 1
physical id : 0 core id : 2
physical id : 0 core id : 3
physical id : 1 core id : 0
physical id : 1 core id : 1
physical id : 1 core id : 2
physical id : 1 core id : 3

Each one of the physical processors has 4 cores.
So there are two quad-cores, giving us 8 physical cores in total.

– Logical CPUs

[root@mysrvr ~]# grep -i "processor" /proc/cpuinfo | sort -u | wc -l
16

OK, so we have twice as many logical CPUs as physical cores.
This means Hyper-Threading (a technology of Intel processors) is enabled.
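As a cross-check, on newer releases (lscpu ships with util-linux-ng from RHEL 6 onward) you can get the same breakdown in a single command; a hedged sketch:

# sockets, cores per socket, threads per core and total logical CPUs at a glance
lscpu | egrep 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core|^CPU\(s\)'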

Not so hard, right?

These links cover the same topic and are quite good for understanding the concepts:
https://access.redhat.com/discussions/480953
https://www.redhat.com/archives/redhat-list/2011-August/msg00009.html
http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html

Matheus.