Diagsnap Causing Node Eviction on 12c

Hi all,

You know how it is when a RAC starts evicting nodes: lots of sessions get killed, clients and applications feel it, and the boss is looking over your shoulder for a quick resolution. On top of that, it’s never an easy thing to drill down into and understand.

Troubleshooting node reboots/evictions within Grid Infrastructure (GI) often is difficult due to the lack of Network and OS level resource information. To help circumvent this situation the diagsnap feature has been developed and integrated with Grid Infrastructure. Diagsnap is triggered to collect Network and OS level resource information when a given node is about to get evicted or when Grid Infrastructure is about to crash.

The diagsnap feature is enabled automatically starting from 12.1.0.2 Oct2017 PSU and 12.2.0.1 Oct2017 RU.
For more information about the diagsnap feature, refer to Document 2345654.1 “What is diagsnap resource in 12c GI and above?”

However, after a lot of research, uploading logs to an SR and going back and forth with Oracle to investigate a node eviction, we ended up matching it to Bug 25785073 – OCSSD hangs while DIAGSNAP takes pstack causing a node reboot (Doc ID 25785073.8).

So diagsnap was not helping; it was the cause of the issue.

After some more research, it seems this is not the only bug related to it. Here are some more:

  • Bug 27182006 – Auxiliary commands generated by DIAGSNAP spin CPU (Doc ID 27182006.8)
  • Bug 24692439 – Auxiliary commands generated by DIAGSNAP consumes high CPU (Doc ID 24692439.8)
  • Bug 23101338 – Disable diagsnap after 12.1.0.2.160419 GI PSU patch was installed (Doc ID 23101338.8)
  • Bug 28462215 – The Process diagsnap.pl is Restarted Every 2 Minutes (Doc ID 28462215.8)

Well, at least for 12c, I’m disabling it in all my environments:

– Check if diagsnap is disabled or not (DIAGSNAP=Disable if disabled, nothing if enabled)

$ egrep '^DIAGSNAP|^PSTACK' /u01/app/12.2.0.1/grid/crf/admin/crf$(hostname -s).ora

– Disable diagsnap (this will disable diagsnap on all nodes)

$ /u01/app/12.2.0.1/grid/bin/oclumon manage -disable diagsnap

– Check that diagsnap is disabled (DIAGSNAP=Disable if disabled, nothing if enabled) — to be done on each node

$ egrep '^DIAGSNAP|^PSTACK' /u01/app/12.2.0.1/grid/crf/admin/crf$(hostname -s).ora
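
If you have many nodes to check, the egrep above can be wrapped in a small helper. This is only a minimal sketch: the function name diagsnap_status is made up for illustration, and it assumes (as described above) that a DIAGSNAP=Disable line means disabled and that no DIAGSNAP line means enabled. Matching case-insensitively is my own assumption, in case the value is written in a different case on your version.

```shell
#!/bin/sh
# Hypothetical helper: report the diagsnap state from a CRF config file.
# Assumption taken from the post: "DIAGSNAP=Disable" present => disabled,
# no such line => enabled. Case-insensitive matching is an extra assumption.
diagsnap_status() {
    crf_file="$1"
    if [ ! -r "$crf_file" ]; then
        # File missing or unreadable: we cannot tell.
        echo "unknown"
        return 1
    fi
    if grep -Eiq '^DIAGSNAP=Disable' "$crf_file"; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

On each node it could then be called as: diagsnap_status /u01/app/12.2.0.1/grid/crf/admin/crf$(hostname -s).ora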

Hope it helps!
Cheers!

Oracle Patching with OPlan

Everyone I’ve worked with knows that I don’t like patching (and sometimes I try to imagine who does), but patches are necessary to correct bugs and improve the stability of the Oracle software.

When you have a single-node server with one database, patch planning is a no-brainer. But when you have a RAC with multiple nodes, different Oracle homes and so on, the planning and preparation get more complex, and it is easy to miss or overlook a step, which can lead to issues during your patching.

So to help me with all that I use oplan. Oplan is a tool that comes along with OPatch, and you can get its latest version in patch 6880880.

More information on oplan can be found here: Oracle Software Patching with OPLAN (Doc ID 1306814.1)

OK, so what do I use it for most?

Generating the patch apply steps, which come in very handy:

$ORACLE_HOME/OPatch/oplan/oplan generateApplySteps <bundle patch location>

And my favorite, the rollback steps, which I have used more times than I would like to admit:

$ORACLE_HOME/OPatch/oplan/oplan generateRollbackSteps <bundle patch location>

Also, as an extra way back, I take a tar of the Oracle binaries being patched beforehand, as there were times when even the rollback did not work :-/
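
The tar backup can be sketched as below. Treat it as an illustration only: the function name backup_home and the backup directory are hypothetical, and I am assuming that everything running from that home is stopped and that the user running it can read every file in the home.

```shell
#!/bin/sh
# Hypothetical helper: archive a home directory before patching.
# Usage: backup_home <home_dir> <backup_dir>
# Prints the path of the archive it created.
backup_home() {
    home_dir="$1"
    backup_dir="$2"
    if [ ! -d "$home_dir" ]; then
        echo "not a directory: $home_dir" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    stamp=$(date +%Y%m%d_%H%M%S)
    archive="$backup_dir/$(basename "$home_dir")_$stamp.tar.gz"
    # -C makes the paths inside the archive relative to the parent directory,
    # so the archive can be extracted back next to the original location.
    tar -czf "$archive" -C "$(dirname "$home_dir")" "$(basename "$home_dir")" || return 1
    echo "$archive"
}
```

For example: backup_home "$ORACLE_HOME" /backup/pre_patch (the /backup/pre_patch location is just a placeholder).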

Both the apply and the rollback steps will be created under the directory below, each as an HTML file and a text file.

$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/
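
Since every run gets its own timestamp directory, a small helper to find the most recent one can save some typing. This is a sketch under assumptions: the function name latest_oplan_dir is made up, and it picks the newest directory by modification time rather than by parsing the timestamp in the name.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified oplan output
# directory under <oracle_home>/cfgtoollogs/oplan.
latest_oplan_dir() {
    base="$1/cfgtoollogs/oplan"
    [ -d "$base" ] || return 1
    # -d lists the directories themselves, -t sorts newest first.
    ls -1td "$base"/*/ 2>/dev/null | head -n 1
}
```

For example: ls -l "$(latest_oplan_dir "$ORACLE_HOME")"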

This output is there to help you organise your steps; read it through before executing to make sure it makes sense in your environment.

Oplan has its limitations. From the Oracle note mentioned above:

Data Guard configurations are not supported.
OPlan can be used to create patch plans for Oracle home's running Oracle Data Guard configurations, but OPlan does not consider such an environment usable as 'Data Guard Standby-First Patch Apply' alternative. See the following for additional information on 'Data Guard Standby-First Patch Apply'

<Document 1265700.1> Oracle Patch Assurance - Data Guard Standby-First Patch Apply

Shared Oracle Home Configurations are not supported.

Single Instance Databases running in the same configuration are not supported

Even so, I would still use it, as it generates a plan based on your target environment, adding information that you would otherwise have to work out manually if you only read the README files from the patch.

Hope it helps.

Thanks and until next time

Elisson Almeida

RMOUG Training Days 2020!

Hi all,

I’m happy to let you know I’ll be at the RMOUG Training Days 2020 this year!

The event is happening this year from Feb 18th to 20th. Here is the event home, for additional information.

I see the agenda is full of great names, exponents of the Oracle community. Check here for the complete agenda.

I can barely wait to be among some long-distance friends in this great place!

See you there!

Automatic SQL Tuning Advisor Raising ORA-00600: internal error code, arguments: [qksvcReplaceVC0]

Hi all,

So I kept getting this error from a database, frequently and always at the same hour:

ORA-00600: internal error code, arguments: [qksvcReplaceVC0], [], [], [], [], [], [], [], [], [], [], []

It did not take much to match it to the Automatic SQL Tuning Advisor.

This only seems to happen during execution of Automatic SQL Tuning Advisor. Several bugs have been logged for the issue but have not been resolved as the error is not reproducible at will. For example:

Bug 17401718: ORA-600 [QKSVCREPLACEVC0] USING SQL TUNING ADVISOR
Bug 16491690: ORA-600 [QKSVCREPLACEVC0] WHEN AUTOMATIC SQL TUNING ADVISOR EXECUTED
Bug 13959984: ORA-00600 [QKSVCREPLACEVC0]

How to fix it? Apply the patches!

To work around it?

A few options:

1. Setting “_replace_virtual_columns” to false.

You can set this parameter either at session level (in the session where the Automatic SQL Tuning Advisor starts)
or at system level, with the following commands:

SQL> alter session set "_replace_virtual_columns"=false;

SQL> alter system set "_replace_virtual_columns"=false;

2. Since it only fails in the SQL Tuning Advisor auto task and has no effect on the database, the error can be ignored.
Alternatively, you can disable that auto task and just run the advisor manually when required:

– Check auto job status

SQL> select client_name,status from dba_autotask_task;

SQL> select client_name,status from dba_autotask_client;

SQL> select client_name, operation_name, status from dba_autotask_operation;

– Disable SQL Tuning Advisor job

SQL> exec dbms_auto_task_admin.disable ('sql tuning advisor', null, null);

-OR-

SQL> exec dbms_auto_task_admin.disable (client_name => 'sql tuning advisor', operation => null, window_name => null);

– Enable SQL Tuning Advisor job

SQL> exec dbms_auto_task_admin.enable ('sql tuning advisor', null, null);

-OR-

SQL> exec dbms_auto_task_admin.enable (client_name => 'sql tuning advisor', operation => null, window_name => null);
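
If you need to flip this on several databases, the calls above can be generated from the shell. This is just a sketch: the function name autotask_sql is hypothetical, and it only prints the SQL (the same dbms_auto_task_admin calls and dba_autotask_client check shown above), so you can review it before piping it anywhere.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL*Plus commands that enable or disable
# the SQL Tuning Advisor auto task, followed by a status check.
# Usage: autotask_sql enable|disable
autotask_sql() {
    action="$1"
    case "$action" in
        enable|disable) ;;
        *)
            echo "usage: autotask_sql enable|disable" >&2
            return 1
            ;;
    esac
    cat <<EOF
exec dbms_auto_task_admin.$action(client_name => 'sql tuning advisor', operation => null, window_name => null);
select client_name, status from dba_autotask_client;
exit
EOF
}
```

For example, assuming OS authentication as SYSDBA is available on the server: autotask_sql disable | sqlplus -s "/ as sysdba"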

 

Hope it helps!