EM Event: Metrics “Current Open Cursors Count” is at %

Hi all,
This is simple right?

Some dev is forgetting to close the cursors. 🙂
If you don’t know what I’m talking about, I couldn’t find better reference than this article by Tech on the Net.

As DBA, we can identify the application schema which is causing the issue by the following:More“EM Event: Metrics “Current Open Cursors Count” is at %”

Exadata: 7 Useful Commands to check Port/Sensor Alarms

Hello all!

This days I had an alarm with message below:

Message=The aggregate sensor /SYS/CABLE_CONN_STAT has a fault.

There is some useful commands I used to verify all ports/sensors in my exadata cluster.

In summary, these commands:
1) Use Intelligent Platform Management Interface (IPMI) to read the Sensor Data Record (SDR) repository
2) Use Intelligent Platform Management Interface (IPMI) to view the ILOM SP System Event Log (SEL)
3) Display all host nodes with ibhosts
4) Use ibcheckstate to scan InfiniBand fabric and validate the port logical and physical state
5) Use ibcheckerrors to scan InfiniBand fabric and validate the connectivity as described in the topology file
6) Checking for sensor healthy from switch
7) Check the overall health of the InfiniBand switch, on the Exadata switch itself

The Commands are:

More“Exadata: 7 Useful Commands to check Port/Sensor Alarms”

Long running jobs AQ$_PLSQL_NTFN_%

Hello all!

Some time ago I had an issue with some jobs running for 64 days. Cool han?
All jobs names were AQ$_PLSQL_NTFN_%. All activity of jobs were related to SQLID f7zggdz9p7bhk in wait event “Streams AQ: waiting for messages in the queue”.

# SQLID f7zggdz9p7bhk:
begin SYS.SCHEDULER$_JOB_EVENT_HANDLER(context => :1,reginfo => sys.aq$_reg_info(:2, :3, :4, :5, :6, :7),descr => sys.aq$_descriptor(:8, :9, :10, sys.msg_prop_t(:11, :12, :13, :14, :15, :16, :17, :18, sys.aq$_agent(:19, :20, :21), :22, :23), sys.aq$_ntfn_descriptor(:24), :25, :26), payload => :27, payloadl => :28); end;

Reviewing on MOS, found a match to Bug 20528052 – Many AQ$_PLSQL_NTFN jobs executed affecting database performance (Doc ID 20528052.8).

The root cause is procedure SYS.SCHEDULER$_JOB_EVENT_HANDLER keep waiting AQ PL/SQL Notification callbacks associated with scheduler job email notifications for messages in “SYS”.”SCHEDULER$_EVENT_QUEUE” which no longer exist.

From MOS, this can be easily solved with fix for Bug 16623661 – No email notification is sent by scheduler, wich is included in 12.1.0.1 (base) and affect versions 11.2.0.2 and later.
Also from MOS, and which I did, is drop these jobs. The problem is these jobs just restart and have the same problem again after automated recreation.

So, apply the fix or upgrade your database. 😉

Some related references:
Bug 20528052 – Many AQ$_PLSQL_NTFN jobs executed affecting database performance (Doc ID 20528052.8)
AQ$_PLSQL_NTFN Scheduler Jobs Executed in large numbers affecting Database performance (Doc ID 2001165.1)
Bug 21665897: FIX FOR BUG 14712567 CAUSES LARGE NUMBERS OF AQ$_PLSQL_NTFN_% TO BE SPAWNED

Hope it helps!
Cheers!

Purge Recycle Bin Older than…

Hello all!

Have your recyclebin being full/big but don’t want to make a complete “PURGE RECYCLE BIN;”?
Want remove only older than x days (like older than 30 days)?

Please don’t:

DELETE from DBA_RECYCLEBIN where droptime<sysdate-30;
PURGE DBA_RECYCLEBIN where droptime<sysdate-30;

A good way to perform that is to execute output from:

# 30 Days
select 'purge table '||owner||'."'||OBJECT_NAME||'";' 
from dba_recyclebin where type='TABLE' 
and to_date(droptime,'YYYY-MM-DD:HH24:MI:SS')<sysdate-30;

# 30 minutes
select 'purge table '||owner||'."'||OBJECT_NAME||'";' 
from dba_recyclebin where type='TABLE' and 
to_date(droptime,'YYYY-MM-DD:HH24:MI:SS')<sysdate-(30/(24*60));

There is some reference here too:
How To Drop / Delete / Purge Recyclebin or DBA_RECYCLBIN For Objects Older Than x Days/minutes or selective purge of objects from recycle bin (Doc ID 1596147.1)

Hope it help you!
Cheers!

Logs Exploding: ORA-00904: “UTL_RAW”.”CAST_FROM_NUMBER”: invalid identifier

Hello all!
Last week I suddenly had several database connections being denied due “ORA-09925: Unable to create audit trail file”. This just a few hours a maintenance in this Database.

When investigating, realized there was TO MUCH traces like “$DBNAME_q00%.trc using several GBs.
As you know, those traces are from QMON (Queue Monitor) Background Process, usually related to Oracle Streams Advanced Queueing.
Oh! Also noticed that database was started with srvctl, as per usually recommended.

After some research I found MOS Note: ORA-00904 Reported by QMON When Starting Database With SRVCTL (Doc ID 1950142.1).
This situation seems to be a match to bug Bug 18680601/19995869. Regarding Doc:
– As solution, Oracle recommends apply merge of patches 19995869 and patch 18628216.
– As immediate action, the workaround is shutdown database and start with sqlplus instead of srvctl.

Check below one of my trace files to match to your situation.

Hope it helps.

Cheers!

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /app/oracle/product/11.2.0.4
System name:    Linux
Node name:      racgrepora1
Release:        2.6.32-642.15.1.el6.x86_64
Version:        #1 SMP Mon Feb 20 02:26:38 EST 2017
Machine:        x86_64
Instance name: GREPORADB
Redo thread mounted by this instance: 1
Oracle process number: 77
Unix process pid: 36422, image: oracle@racgrepora1 (Q001)


*** 2017-08-01 05:05:51.352
*** SESSION ID:(646.64779) 2017-08-01 05:05:51.352
*** CLIENT ID:() 2017-08-01 05:05:51.352
*** SERVICE NAME:(SYS$BACKGROUND) 2017-08-01 05:05:51.352
*** MODULE NAME:(Streams) 2017-08-01 05:05:51.352
*** ACTION NAME:(QMON Slave) 2017-08-01 05:05:51.352

KSV 12801 error in slave process

*** 2017-08-01 05:05:51.352
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier

OEM Alarm – %MB of Audit Trail files (sizeOfOSAuditFiles:FILE_SIZE)

Hello All,
After upgrading a OEM to 13c, I started to receive notifications for event “sizeOfOSAuditFiles:FILE_SIZE“.

This is a new event implemented on OEM DB Plugin 12.1.0.7.0 under “Operating System Audit Records” metric group. Upgrading DB Plugin was part of OEM Upgrade change once we had some old versioned.

This event is only a notification related to file size for space management ends. The default thresholds are 10MB (warning) and 20MB (critical), which in most of times it’s a pretty low value.
This is specifically related to location under parameter audit_file_dest if you want to check.

Between options to reduce the noise are disable this metric or increase thresholds accordingly, which was what I did.
At this moment, I just increased thresholds to 500MB/2048MB, which I consider good values for the environment.

Some reference about can be found at:
– Enterprise Manager Oracle Database Plug-in Metric Reference Manual (Plug-in Release 12.1.0.7) – Database Instance – Operating System Audit Records
– EM 12c, EM 13c: Troubleshooting Database Metrics in Enterprise Manager 12c and 13c Cloud Control (Doc ID 2032156.1)

Hope it helps,
Cheers!

ORA-00600: [qkswcWithQbcRefdByMain4]

Hello all,
This days I found this in a client’s 12c Database when trying to create a Materialized View:

ORA-00600: internal error code, arguments: [qkswcWithQbcRefdByMain4]

A perfect match to MOS ORA-00600 [qkswcWithQbcRefdByMain4] when Create MV “WITH” clause (Doc ID 2232872.1).

The root cause is documented on BUG 22867413 – ORA-600 CALLING DBMS_ADVISOR.TUNE_MVIEW.
The given solution is to apply Patch 22867413.

After applying patch, issue solved. 🙂

More“ORA-00600: [qkswcWithQbcRefdByMain4]”

SQLNET.ORA Parameter: SQLNET.EXPIRE_TIME

Hello all,
Asking why to use this parameter? Why this is set in your environment? Or even how it can hep you? Here we go:

As per Oracle documentation, this parameter is used to avoid unused sessions to be kept open in database and locking resources.
It describe in minutes how much time a client/probe can be inactive before be ended.

I recently found an environment where this parameter was too low (1 minute), potentially causing some overhead in communication only for validations. By documentation, if it’s decided to enable it, Oracle recommends value “10”.
This way, after checking and no one be aware about this parameter reason to be there, I just suppressed him from SQLNET.ORA (going back to default “0”, which is equivalent to “disabled”), so we could even reduce the network workload.
However, now we are aware that in case we have abnormal closure of clients, we can have some unused connections opened consuming resources… Not a problem at this point… 🙂

More“SQLNET.ORA Parameter: SQLNET.EXPIRE_TIME”

CRS Not Starting after Removing OS User: How to Workaround and How to Solve!

Hello all!
Turns that a few days ago a client reached me because his CRSD was simply not starting. Like this:

[root@proddb proddb]$ ./crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'proddb'
CRS-2676: Start of 'ora.crsd' on 'proddb' succeeded

[root@proddb proddb]$ ps -ef |grep crsd
root 19217 13424 0 11:53 pts/0 00:00:00 grep crsd

After some investigation, I found the following:

2017-01-24 14:00:06.859: [ CRSSEC][1690195712]{1:51052:2} Exception: OwnerEntry construction failed to retrieve user id by name with ACL string: owner:jacknobody:rwx and error: 1
2017-01-24 14:00:06.912: [ CRSSEC][1690195712]{1:51052:2} Exception: ACL entry creation failed for: owner:jacknobody:rwx

Hmmm, seems some CRS resources are owned by “Jack Nobody”… Turns that I this us was removed from OS:

[root@proddb proddb]$ cat /etc/passwd |grep jacknobody
[root@proddb proddb]$ 

What to do now?

More“CRS Not Starting after Removing OS User: How to Workaround and How to Solve!”

OGG-00446 No data selecting position from checkpoint table GGATE.CHECKPOINTTABLE for group

After GoldenGate upgrade GoldenGate from 12.1 to 12.2, replicat abend with error:

ERROR OGG-00446 No data selecting position from checkpoint table GGATE.CHECKPOINTTABLE for group ‘R_CRM’, key 2028080425 (0x78e20d29), SQL .

ERROR OGG-01668 PROCESS ABENDING.

It appears replicat is not registered on CHECKPOINTTABLE (is’t true, it’s registered)

Try below:

As a workaround you can ALTER the replicat to the same checkpoint location when you do an INFO REP , DETAIL and then START.

Ex:

GGSCI (labp.grepora.net) 27> info R_CRM

REPLICAT   R_CRM     Last Started 2017-05-09 13:03   Status STOPPED
Checkpoint Lag       00:00:00 (updated 02:40:11 ago)
Log Read Checkpoint  File ./dirdat/s5000609
                     2017-07-04 10:51:44.035765  RBA 98381832


GGSCI (labp.grepora.net) 28> alter R_CRM extseqno 00609 extrba 98381832

2017-07-04 14:44:39  INFO    OGG-06594  Replicat R_CRMhas been altered through GGSCI. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start R_CRM with NOFILTERDUPTRANSACTIONS option.

REPLICAT altered.


GGSCI (labp.grepora.net) 29> start R_CRM

Sending START request to MANAGER ...
REPLICAT R_CRM starting

Cheers!