Diagsnap Causing Node Eviction on 12c

Hi all,

So, you know how it is when we are having node evictions over a RAC. Lot’s of sessions getting killed, some effects for the clients and applications, also boss bossing in our shoulders to have a quick resolution over this. Plus it’s never an easy thing to drill down and understand.

Troubleshooting node reboots/evictions within Grid Infrastructure (GI) often is difficult due to the lack of Network and OS level resource information. To help circumvent this situation the diagsnap feature has been developed and integrated with Grid Infrastructure. Diagsnap is triggered to collect Network and OS level resource information when a given node is about to get evicted or when Grid Infrastructure is about to crash.

The diagsnap feature is enabled automatically starting from 12.1.0.2 Oct2017 PSU and 12.2.0.1 Oct2017 RU.
For more information about the diagsnap feature, refer to the Document 2345654.1 “What is diagsnap resource in 12c GI and above?”

However, after a lot of research and SR logs sending and interaction with Oracle to investigate a node eviction, we ended up finding a match to Bug 25785073 – OCSSD hangs while DIAGSNAP takes pstack causing a node reboot(Doc ID 25785073.8).

So Diagsnap is not helping, it is the cause of the issues.

After some research, seems this is not the only bug related to it. See some more:

  • Bug 27182006 – Auxiliary commands generated by DIAGSNAP spin CPU(Doc ID 27182006.8)
  • Bug 24692439 – Auxiliary commands generated by DIAGSNAP consumes high CPU(Doc ID 24692439.8)
  • Bug 23101338 – Disable diagsnap after 12.1.0.2.160419 GI PSU patch was installed(Doc ID 23101338.8)
  • Bug 28462215 – The Process diagsnap.pl is Restarted Every 2 Minutes(Doc ID 28462215.8)

Well,

At least for the 12Cs, I’m disabling it in all my environments:

– Check if diagsnap is disabled or not (DIAGSNAP=Disable if disabled, nothing if enabled)

$ egrep '^DIAGSNAP|^PSTACK' /u01/app/12.2.0.1/grid/crf/admin/crf$(hostname -s).ora

– Disable diagsnap (this will disable diagsnap on all nodes)

$ /u01/app/12.2.0.1/grid/bin/oclumon manage -disable diagsnap

– Check that diagsnap is disabled (DIAGSNAP=Disable if disabled, nothing if enabled) — to be done on each node

$ egrep '^DIAGSNAP|^PSTACK' /u01/app/12.2.0.1/grid/crf/admin/crf$(hostname -s).ora

Hope it helps!
Cheers!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.