Oracle 12.2.0.1 Trace Facility Problems

Hey!
Quick one today… Not sure if you noticed, but 12.2.0.1 has some problems with the trace facility. Some components/processes generate huge amount of trace files.

MOS notes:
– RMAN BACKUP Command Always Generates Trace File Having “kcbtse_print_kod” Message (Doc ID 2311975.1)
– MMON trace file grows and is flooded with: kmgs_parameter_update_timeout Gen0 messages (Doc ID 2319931.1)

Both have their patches already…

Additionally, Mike Dietrich’s blog posts about above notes:

https://mikedietrichde.com/2018/05/24/rman-backup-generates-traces-in-oracle-12-2-0-1/
https://mikedietrichde.com/2018/05/23/mmon-unconditional-traces-in-oracle-12-2-0-1/

 

Trace on Standby: Tracing MRP Process

Hey all!
Recently I got the following error when trying to start a trace in a standby. The idea was to start a trace in MRP to check on performance, but when trying to use DBMS_MONITOR.session_trace_enable:

oracle:standby-srvr /tmp: sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jan 19 14:07:56 2018

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> exec DBMS_MONITOR.session_trace_enable(session_id =>1970, serial_num=>55126, waits=>TRUE, binds=>TRUE);
BEGIN DBMS_MONITOR.session_trace_enable(session_id =>1970, serial_num=>55126, waits=>TRUE, binds=>TRUE); END;

*
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_MONITOR.SESSION_TRACE_ENABLE' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

OMG!
Don’t panic, you can accomplish the same using oradebug:

-- In my case:
SQL> select process,pid from V$managed_standby where process like '%MRP%';

PROCESS PID
--------- ------------------------
MRP0 45693

SQL> oradebug setospid 19564
Oracle pid: 105, Unix process pid: 19564, image: oracle@standby-srvr (MRP0)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug Event 10046 trace name context forever, level 12
Statement processed.
SQL>

You can apply same for any process/sid you need.

In case you desire to get the spid from a sid:

col machine format a30
col process format 999999
select p.spid,b.sid, p.pid
from v$session b, v$process p
where b.paddr=p.addr
and sid=&sid
/

Ok, and how to stop then?
Easy:

SQL> oradebug setospid 19564
SQL> oradebug Event 10046 trace name context off
SQL> oradebug tracefile_name

Enjoy!

Logs Exploding: ORA-00904: “UTL_RAW”.”CAST_FROM_NUMBER”: invalid identifier

Hello all!
Last week I suddenly had several database connections being denied due “ORA-09925: Unable to create audit trail file”. This just a few hours a maintenance in this Database.

When investigating, realized there was TO MUCH traces like “$DBNAME_q00%.trc using several GBs.
As you know, those traces are from QMON (Queue Monitor) Background Process, usually related to Oracle Streams Advanced Queueing.
Oh! Also noticed that database was started with srvctl, as per usually recommended.

After some research I found MOS Note: ORA-00904 Reported by QMON When Starting Database With SRVCTL (Doc ID 1950142.1).
This situation seems to be a match to bug Bug 18680601/19995869. Regarding Doc:
– As solution, Oracle recommends apply merge of patches 19995869 and patch 18628216.
– As immediate action, the workaround is shutdown database and start with sqlplus instead of srvctl.

Check below one of my trace files to match to your situation.

Hope it helps.

Cheers!

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /app/oracle/product/11.2.0.4
System name:    Linux
Node name:      racgrepora1
Release:        2.6.32-642.15.1.el6.x86_64
Version:        #1 SMP Mon Feb 20 02:26:38 EST 2017
Machine:        x86_64
Instance name: GREPORADB
Redo thread mounted by this instance: 1
Oracle process number: 77
Unix process pid: 36422, image: oracle@racgrepora1 (Q001)


*** 2017-08-01 05:05:51.352
*** SESSION ID:(646.64779) 2017-08-01 05:05:51.352
*** CLIENT ID:() 2017-08-01 05:05:51.352
*** SERVICE NAME:(SYS$BACKGROUND) 2017-08-01 05:05:51.352
*** MODULE NAME:(Streams) 2017-08-01 05:05:51.352
*** ACTION NAME:(QMON Slave) 2017-08-01 05:05:51.352

KSV 12801 error in slave process

*** 2017-08-01 05:05:51.352
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier

Application Looping Until Lock a Row with NOWAIT Clause

Yesterday I treated an interesting situation:
A BATCH stayed on “SQL*Net message from client” event but the last_call_et was always on 1 or 0. Seems OK, with some client contention to send the commands to the DBMS, right? Nope.

It was caused by a loop in the application code “waiting” for a row lock but without “DBMS waiting events” (something like “select * from table for update nowait”). Take a look in how it was identified below.

First the session with no SQL_ID, no wait events and last_Call_et=0 of a “BATH_PROCESS” user:

proddb2> @sid
Sid:9796
Inst:
LAST_CALL_ET SQL_ID   EVENT STATUS SID SERIAL# INST_ID USERNAME
------------ ------- ------------- ---------- ------------------------
0 SQL*Net message from client INACTIVE 9796 45117 2 BATCH_PROCESS
proddb2> @trace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

As you see, with no idea about what is happening, I started a trace. The trace was stuck with this:

*** 2015-06-15 14:03:25.755
WAIT #4574470448: nam='SQL*Net message from client' ela=993072 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326636999
CLOSE #4574470448:c=10,e=15,dep=0,type=3,tim=12833326637228
PARSE #4574470448:c=25,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326637286
BINDS #4574470448:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=1000000 frm=00 csi=00 siz=24 off=0
kxsbbbfp=110a8d0d8 bln=22 avl=05 flg=05
value=5022011
WAIT #4574470448: nam='gc cr block 2-way' ela= 709 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326638533
WAIT #4574470448: nam='gc cr block 2-way' ela= 541 p1=3 p2=2088264 p3=4367 obj#=0 tim=12833326639352
WAIT #4574470448: nam='gc cr block 2-way' ela= 651 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326641673
WAIT #4574470448: nam='enq: TX - row lock contention' ela= 1093 name|mode=1415053318 usn<obj#=23141074 tim=12833326643029
EXEC #4574470448:c=1776,e=5836,p=0,cr=117,cu=1,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326643150
ERROR #4574470448:err=54 tim=12833326643172
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 9 driver id=1413697536 break?=1 p3=0 obj#=23141074 tim=12833326643373
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 503 driver id=1413697536 break?=0 p3=0 obj#=23141074 tim=12833326643891
WAIT #4574470448: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326643915

AHÁ!
Did you see the “err=54” there? Yes. You know this error:

ORA-00054: Resource busy and acquire with NOWAIT specified

It’s caused by a SELECT FOR UPDATE NOWAIT in the code.
But, this select is in a loop, so the session don’t go ahead until have it.
(Obviously it could be coded with some treatment/better logic for this loop and errors, buuuut…)

What can we do now?
The easy way is to discover the holding session and kill it.
And sometimes the easy way is the best way. 😉

For that, we use the “obj#” and “value”, also bolded in the trace.
As I know the application, I know that the used field in all “where clauses” is the “RECNO” column. But if you don’t, it’s needed to discover. With this information in mind:

proddb2>select * from dba_objects where object_id='23141074';
OWNER OBJECT_NAME
------------------------------ ----------------
OWNER_EXAMPLE TABLE_XPTO
proddb2> select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011;
COL_KEY FSAMED0 FSAMED1 FSMNEG1 FSMNEG2 FSMNEG3 COL_DATE RECNO
------- ---------- ---------- ---------- ---------- ---------- -----
1002974 0 0 -516.8 0 0 15/06/2015 00:00:00 5022011

Ok, I know the row that is holded by the other session.
Let’s discover which session is causing a lock by myself (but in my case, without “NOWAIT” clause, to have time to find the holder):

proddb5>select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011 for update;

In another sqlplus session:

proddb2> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- ----------------------
5 14174 479 MATHEUS_BOESING enq:TX - row lock contention VALID 11006 1
2 4233 12879 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 15410 7697 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT

AHÁ again!
The SID 11006. Let’s see who is there:

proddb2> @sid
Sid:11006
Inst:
SQL_ID SEQ# EVENT STATUS SID SERIAL# INST_ID USERNAME
-------------------- ---------- --------------------------------------
9jzm6vn5j06js 24919 enq: TX - row lock contention ACTIVE 11006 44627 1 DBLINK_OTHER_BATCH_SCHEMA

Ok, it’s another session of a different batch process in a remote database holding this row. As it’s less relevant, lets kill! Muahaha!
Then, you’ll see, my session get the lock and is in the middle of a transaction:

proddb1> @kill
***
sid : 11006
serial : 44627
***
System altered.
***
proddb1> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- --------------------
5 14174 479 MATHEUS_BOESING transaction UNKNOWN
2 4332 56037 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 12058 9 MATHEUS_BOESING class slave wait NO HOLDER

To release the “row locked” to my principal process, lets suicide (kill my own session, this case, that is holding the row lock right now).

proddb5> @kill
***
sid : 14174
serial : 479
***
System altered.
***

After kill all the holding sessions, my BATCH_PROCESS just gone! 😀
Take a look on the trace (running ok):

WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531019
FETCH #4576933904:c=45,e=71,p=0,cr=3,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531062
WAIT #4576933904: nam='SQL*Net message from client' ela= 562 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531654
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531788
FETCH #4576933904:c=55,e=86,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531826
WAIT #4576933904: nam='SQL*Net message from client' ela= 715 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532576
WAIT #4576933904: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532721
FETCH #4576933904:c=61,e=96,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981532758
WAIT #4576933904: nam='SQL*Net message from client' ela= 600 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981533617
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534163
FETCH #4576933904:c=52,e=82,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981534203
WAIT #4576933904: nam='SQL*Net message from client' ela= 517 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534752

Now, with the problem solved, lets disable the trace and continue the other daily tasks… 🙂

proddb2> @untrace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

I hope it was useful!
If helped you, make a comment! 😀

See ya!
Matheus.