Logs Exploding: ORA-00904: “UTL_RAW”.”CAST_FROM_NUMBER”: invalid identifier

Hello all!
Last week I suddenly had several database connections being denied due “ORA-09925: Unable to create audit trail file”. This just a few hours a maintenance in this Database.

When investigating, realized there was TO MUCH traces like “$DBNAME_q00%.trc using several GBs.
As you know, those traces are from QMON (Queue Monitor) Background Process, usually related to Oracle Streams Advanced Queueing.
Oh! Also noticed that database was started with srvctl, as per usually recommended.

After some research I found MOS Note: ORA-00904 Reported by QMON When Starting Database With SRVCTL (Doc ID 1950142.1).
This situation seems to be a match to bug Bug 18680601/19995869. Regarding Doc:
– As solution, Oracle recommends apply merge of patches 19995869 and patch 18628216.
– As immediate action, the workaround is shutdown database and start with sqlplus instead of srvctl.

Check below one of my trace files to match to your situation.

Hope it helps.

Cheers!

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /app/oracle/product/11.2.0.4
System name:    Linux
Node name:      racgrepora1
Release:        2.6.32-642.15.1.el6.x86_64
Version:        #1 SMP Mon Feb 20 02:26:38 EST 2017
Machine:        x86_64
Instance name: GREPORADB
Redo thread mounted by this instance: 1
Oracle process number: 77
Unix process pid: 36422, image: oracle@racgrepora1 (Q001)


*** 2017-08-01 05:05:51.352
*** SESSION ID:(646.64779) 2017-08-01 05:05:51.352
*** CLIENT ID:() 2017-08-01 05:05:51.352
*** SERVICE NAME:(SYS$BACKGROUND) 2017-08-01 05:05:51.352
*** MODULE NAME:(Streams) 2017-08-01 05:05:51.352
*** ACTION NAME:(QMON Slave) 2017-08-01 05:05:51.352

KSV 12801 error in slave process

*** 2017-08-01 05:05:51.352
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-12801: error signaled in parallel query server PZ99, instance racwt22:JTS012 (2)
ORA-00904: "UTL_RAW"."CAST_FROM_NUMBER": invalid identifier

OEM Alarm – %MB of Audit Trail files (sizeOfOSAuditFiles:FILE_SIZE)

Hello All,
After upgrading a OEM to 13c, I started to receive notifications for event “sizeOfOSAuditFiles:FILE_SIZE“.

This is a new event implemented on OEM DB Plugin 12.1.0.7.0 under “Operating System Audit Records” metric group. Upgrading DB Plugin was part of OEM Upgrade change once we had some old versioned.

This event is only a notification related to file size for space management ends. The default thresholds are 10MB (warning) and 20MB (critical), which in most of times it’s a pretty low value.
This is specifically related to location under parameter audit_file_dest if you want to check.

Between options to reduce the noise are disable this metric or increase thresholds accordingly, which was what I did.
At this moment, I just increased thresholds to 500MB/2048MB, which I consider good values for the environment.

Some reference about can be found at:
– Enterprise Manager Oracle Database Plug-in Metric Reference Manual (Plug-in Release 12.1.0.7) – Database Instance – Operating System Audit Records
– EM 12c, EM 13c: Troubleshooting Database Metrics in Enterprise Manager 12c and 13c Cloud Control (Doc ID 2032156.1)

Hope it helps,
Cheers!

OEM Alarms: TaskZombieException

Hello all,
Recently I started to have several alarms like this:

Internal error detected: java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:1017.

Or

Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

In agent log the message:

# $AGENT_INST/sysman/log/gcagent.log:

2017-06-13 12:54:09,232 [355:GC.Executor.14 (oracle_database:DB_DB12:%DB%)] ERROR - oracle_database:DB_DB12:%DB% oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)

To avoid generating pages and new incidents, I changed parameter _zombieCreateIncident to false on agents in related servers, as per described in MOS EM12c: Incident constantly raised for Oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie (Doc ID 2116834.1) regarding MOS Bug 22674258 – Zombie processes created for DB metric collection /workaround do not work.
I also added some other parameters that should help, like increasing the wait, all mentioned in same MOS notes.

Like this, in $AGENT_INST/sysman/config/emd.properties:

#GrepOra magical fixes:
_zombieSuspensions=true
_canceledThreadWait=900
_zombieThreadPercentThreshold=0
_zombieCreateIncident=false

And it solved the issue. At leat, the noise stopped. 🙂

Hope it helps,
Cheers!

OEM 12c+: Prevent OMS from Sending Old Notifications for Events

Are you receiving old notifications from OEM? Like 2 or 3 days past, mostly already solved or after a blackout?
Yeah, this is annoying, specially when getting floods and floods of notifications.

Ok, so here go a very good tip: You can set grace period for notifications! 🙂
Easy easy, do this way:

cd /bin
emctl set property -name oracle.sysman.core.notification.grace_period -value [provide value in minutes]

The oracle.sysman.core.notification.grace_period OMS parameter has been introduced in 12c and allows the user to configure the grace period within which the notification should be sent. The value is set in minutes.

For example:

emctl set property -name oracle.sysman.core.notification.grace_period -value 1440

With this, OMS sends only those notifications which have been raised in the past 1440 mins (last 24 hours) and ignores all the notifications for events / incidents created prior to this time period.

After this, you’ll need to start OMS:

emctl start oms

The oracle.sysman.core.notification.grace_period parameter applies to all the Notification methods, but if the requirement is to specify the grace period for a particular notification method only, you can use the below parameters accordingly:

oracle.sysman.core.notification.grace_period_connector: For Connectors
oracle.sysman.core.notification.grace_period_email: For email notifications
oracle.sysman.core.notification.grace_period_oscmd: For OS Command notifications
oracle.sysman.core.notification.grace_period_plsql: For PLSQL notifications
oracle.sysman.core.notification.grace_period_snmp: For SNMP Trap notifications
oracle.sysman.core.notification.grace_period_ticket: For ticketing tools

This is weel described as per MOS: 12c Cloud control: How to Prevent OMS from Sending out Old Notifications for Events / Incidents Occurred in the Past? (Doc ID 1605351.1)

Cheers!

ORA-00600: [qkswcWithQbcRefdByMain4]

Hello all,
This days I found this in a client’s 12c Database when trying to create a Materialized View:

ORA-00600: internal error code, arguments: [qkswcWithQbcRefdByMain4]

A perfect match to MOS ORA-00600 [qkswcWithQbcRefdByMain4] when Create MV “WITH” clause (Doc ID 2232872.1).

The root cause is documented on BUG 22867413 – ORA-600 CALLING DBMS_ADVISOR.TUNE_MVIEW.
The given solution is to apply Patch 22867413.

After applying patch, issue solved. 🙂

Continue reading

SQLNET.ORA Parameter: SQLNET.EXPIRE_TIME

Hello all,
Asking why to use this parameter? Why this is set in your environment? Or even how it can hep you? Here we go:

As per Oracle documentation, this parameter is used to avoid unused sessions to be kept open in database and locking resources.
It describe in minutes how much time a client/probe can be inactive before be ended.

I recently found an environment where this parameter was too low (1 minute), potentially causing some overhead in communication only for validations. By documentation, if it’s decided to enable it, Oracle recommends value “10”.
This way, after checking and no one be aware about this parameter reason to be there, I just suppressed him from SQLNET.ORA (going back to default “0”, which is equivalent to “disabled”), so we could even reduce the network workload.
However, now we are aware that in case we have abnormal closure of clients, we can have some unused connections opened consuming resources… Not a problem at this point… 🙂

Continue reading

CRS Not Starting after Removing OS User: How to Workaround and How to Solve!

Hello all!
Turns that a few days ago a client reached me because his CRSD was simply not starting. Like this:

[root@proddb proddb]$ ./crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'proddb'
CRS-2676: Start of 'ora.crsd' on 'proddb' succeeded

[root@proddb proddb]$ ps -ef |grep crsd
root 19217 13424 0 11:53 pts/0 00:00:00 grep crsd

After some investigation, I found the following:

2017-01-24 14:00:06.859: [ CRSSEC][1690195712]{1:51052:2} Exception: OwnerEntry construction failed to retrieve user id by name with ACL string: owner:jacknobody:rwx and error: 1
2017-01-24 14:00:06.912: [ CRSSEC][1690195712]{1:51052:2} Exception: ACL entry creation failed for: owner:jacknobody:rwx

Hmmm, seems some CRS resources are owned by “Jack Nobody”… Turns that I this us was removed from OS:

[root@proddb proddb]$ cat /etc/passwd |grep jacknobody
[root@proddb proddb]$ 

What to do now?

Continue reading