Everything Stopped on ORA-02002!

Hi all,
So, I got contacted by a client in emergency because no more sessions were being opened on database with error ORA-02002. Even worse, everything stopped working and started raising this error.

Unusual, right?

So , checking on this trace generated for error here is what I found:

ORA-02002: error while writing to audit trail
ORA-00604: error occurred at recursive SQL level 1
ORA-01653: unable to extend table SYS.AUD$ by 8 in tablespace AUDIT_TBS

Seems client has moved the AUD$ to another tablespace to avoid filling SYSTEM, just like described in this post. The tablespace got full, however, due bad monitoring, it got full.

As the auditing facility was unable to write to the AUDIT_TRAIL table, SQL statements that were currently being audited also failed.

What to do?

  1. Turn off auditing for the database
  2. Restart database
  3. Add space to tablespace
  4. Re-enable auditing, if required.

To avoid application back and forth, I did it in restrict mode, until get all fixed.

To avoid this sort of issue:

  • Be sure you are monitoring properly the new tablespace.
  • Place the audit tablespace on a reliable disk location and perhaps mirrored for protection.

Hope it helps,
Cheers!

12.2 Scheduler Job Disappeared After Creation

Hi all,
Just sharing an experience here. Recently in a migrated 12.2 environment I created a Scheduler Job and it simply disappeared. How come?
Well, seems since 12.2 the AUTO_DROP attribute is set to true by default. I absolutely don’t know why. So I had to recreate the job and set auto_drop to false, simple like this.

I couldn’t find any related bug on MOS, but I’d relate this as a bug, once the default behavior have changed with no prior information.
So, now on, I strongly suggest you to check on your job AUTO_DROP attribute after job creation.

Here is a test case that reproduces the problem:

In a 12.2 database:

begin 
  DBMS_SCHEDULER.create_job (
    job_name        => 'TEST_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN null; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'SYSTIMESTAMP + INTERVAL ''10'' SECOND',
    end_date        => NULL,
    enabled         => false,
    comments        => 'Test job');
end;
/
select ENABLED, AUTO_DROP, STATE, RUN_COUNT, FAILURE_COUNT,DEFERRED_DROP from dba_scheduler_jobs where job_name = 'TEST_JOB';
-- Job should appear here

begin 
  dbms_scheduler.enable('TEST_JOB'); 
  dbms_scheduler.run_job('TEST_JOB',false); 
end;
/
 
select ENABLED, AUTO_DROP, STATE, RUN_COUNT, FAILURE_COUNT,DEFERRED_DROP from dba_scheduler_jobs where job_name = 'TEST_JOB';

-- There is no job now in 12.2.

You can take the AUTO_DROP easily from the query above, or using getting the job_ddl and you’ll see the AUTO_DROP set to true, as in the bottom of this page by Tim Hall.

Hope it helps!
Cheers!

ORA-00600 [ipc_recreate_que_2]

Hi all,
So this week I started receiving this error in a client environment. This was happening due Bug 26803191 – Getting ORA-00600 [ipc_recreate_que_2] instead of ORA-27515 – superseded (Doc ID 26803191.8)
In summary, failures due to low memlock limit are leading ORA-00600 [ipc_recreate_que_2] instead of ORA-27515, patching ensure that ORA-27515 is raised on failures due to low memlock limit instead of this ORA-600.

As workaround, the memlock limited can be changed, as also described in MOS ORA-00600: Internal Error Code, Arguments: [ipc_recreate_que_2] When Running Exachk (Doc ID 2480088.1).

Before:

[boesing@exa1dbadm01 trace]$ grep memlock /etc/security/limits.conf |grep oracle
# - memlock - max locked-in-memory address space (KB)
oracle soft memlock 237809520
oracle hard memlock 237809520

After:

[boesing@exa1dbadm01 ~]$ grep memlock /etc/security/limits.conf
# - memlock - max locked-in-memory address space (KB)
oracle soft memlock unlimited
oracle hard memlock unlimited
* soft memlock 32768
* hard memlock 32768

Hope it helps,
Cheers!

ORA-00600: internal error code, arguments: [13011], [7907] on SYS.DBMS_AQ_INV

Hi all,

So a couple weeks ago I was facing the following on a client environment. It was appearing every 10 mins on alert.log:

ORA-00600: internal error code, arguments: [13011], [7907], [12679954], [1], [13461738], [0], [], [], [], [], [], []
ORA-01403: no data found
ORA-06512: at "SYS.DBMS_AQ_INV", line 1248

I could find several references to ORA-600 [13011], always related to some internal corruption. As this table is related to Advanced Queueing, decided to check on Scheduler Job table structures:

SQL> analyze table SYS.SCHEDULER$_EVENT_LOG validate structure cascade; 

Table analyzed. 

SQL> analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure cascade; 
analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure cascade 
* 
ERROR at line 1: 
ORA-01499: table/index cross reference failure - see trace file

SQL> analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure;

Table analyzed.

Ahá!
Also on the generated trace file:

2019-03-01 22:26:37.736 
SESSION ID:(39.32751) 2019-03-01 22:26:37.736 
CLIENT ID) 2019-03-01 22:26:37.736 
SERVICE NAME:(SYS$USERS) 2019-03-01 22:26:37.736 
MODULE NAME:(sqlplus.exe) 2019-03-01 22:26:37.736 
CLIENT DRIVER:(SQL*PLUS) 2019-03-01 22:26:37.736 
ACTION NAME) 2019-03-01 22:26:37.736 
CONTAINER ID:(1) 2019-03-01 22:26:37.736 
Table/Index row count mismatch 
table 273184 : index 275017, 1832 
Index root = tsn: 1 rdba: 0x00c0128a

So ANALYZE on table SCHEDULER$_JOB_RUN_DETAILS fails with CASCADE but succeeds without CASCADE. This means that there is a problem with one of the index of this table. It has gone out of sync with the table.

SQL> select 'alter index '||owner||'.'||index_name||' rebuild;' from dba_indexes where table_name='SCHEDULER$_JOB_RUN_DETAILS';

'ALTERINDEX'||OWNER||'.'||INDEX_NAME||'REBUILD;'
--------------------------------------------------------------------------------
alter index SYS.I_SCHEDULER_JOB_RUN_DETAILS rebuild;

SQL> alter index SYS.I_SCHEDULER_JOB_RUN_DETAILS rebuild online;

Index altered.

After this, error solved and no more recurrence of that ORA-600.

Also note I couldn’t find any document about this on MOS, so this is kind of exclusive by now. 🙂

Cheers!

Starting ASM: ORA-29701: unable to connect to Cluster Synchronization Service

Hey all,
So, I bet you have seen this error already, as this is quite common when messing up with Cluster configuration, which DBAs love to do…. no?

Well, here is what you may be facing:

SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
SQL>

The error is kind of clear: Cluster Synchronization Service (CSS) is not available. So, let’s start it from ASM Cluster (or HAS).

$GRID_HOME/bin/crsctl start resource -all

Or, for Standalone:

$GRID_HOME/bin/crsctl start has

To check on status:

$GRID_HOME/bin/crsctl status resource -t

Complete example (attention to CSSD):

[root@greporasrv1 ~]# crsctl start has
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
[root@greporasrv1 ~]# crsctl start resource -all
CRS-5702: Resource ‘ora.evmd’ is already running on ‘greporasrv1’
CRS-2501: Resource ‘ora.ons’ is disabled
CRS-2672: Attempting to start ‘ora.cssd’ on ‘greporasrv1’
CRS-2672: Attempting to start ‘ora.diskmon’ on ‘greporasrv1’
CRS-2676: Start of ‘ora.diskmon’ on ‘greporasrv1’ succeeded
CRS-2676: Start of ‘ora.cssd’ on ‘greporasrv1’ succeeded
CRS-4000: Command Start failed, or completed with errors.
[root@greporasrv1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
OFFLINE OFFLINE greporasrv1 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 ONLINE ONLINE greporasrv1 STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE greporasrv1 STABLE
--------------------------------------------------------------------------------
[root@greporasrv1 ~]#

Hope that worked! 😀

Oh, it didn’t? Did you changed hostname name or something? In this case, you may want to deconfig HAS and reconfigure using root.sh (part regular installation):

cd $ORACLE_HOME
./crs/install/roothas.pl -deconfig -force
./crs/install/roothas.pl -delete -force
./root.sh

 

Hey! Be careful with that, it might be unrecoverable. 😉


			

12cR1 – Stop CRS: OC4J Fails to Stop

Hi all!
So, I started to to see this error to stop a 12.1.0.2 CRS :

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'grepora-srv-01'
CRS-2673: Attempting to stop 'ora.crsd' on 'grepora-srv-01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'grepora-srv-01'
CRS-2679: Attempting to clean 'ora.oc4j' on 'grepora-srv-01'
CRS-2680: Clean of 'ora.oc4j' on 'grepora-srv-01' failed
CRS-2799: Failed to shut down resource 'ora.oc4j' on 'grepora-srv-01'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'grepora-srv-01' has failed
CRS-2675: Stop of 'ora.crsd' on 'grepora-srv-01' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'grepora-srv-01'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'grepora-srv-01' has failed
CRS-4687: Shutdown command has completed with errors.

When running a crsctl stat res -t it will show something like this

ora.oc4j 1 ONLINE UNKNOWN grpora-srv-01 CLEANING

Solution? Follow The rootupgrade.sh fails because the oc4j resource fails to stop (Doc ID 1921629.1):

1) Stop the OC4J resource, as grid user:

 srvctl stop oc4j -f

Note: Even if the oc4j does not stop, continue with step 2
2) Disable the OC4J resource, as grid user:

 srvctl disable oc4j

3) Confirm the OC4J resource is NOT running, as grid user:

 srvctl status oc4j

If the OC4J resource is still running, go to the node where oc4j is running and recycle GI by issuing the following as root:

 crsctl stop crs -f
 crsctl start crs

Why?
oc4j is used only for Quality of Service Management (QoS feature), as per:
https://docs.oracle.com/database/121/APQOS/install_config.htm#APQOS151
http://docs.oracle.com/cd/E11882_01/server.112/e24611.pdf

Searched on MOS and only found notes saying we cannot remove default services (it is not supported). But as a DBA, in the middle of the night with environment down, this may be a solution for you, specially if you are not using QoS.

Hope it helps. Cheers!

ORA-31600: invalid input value DATABASE LINK for parameter OBJECT_TYPE in function GET_DDL When Dropping DBLink

So while back there was an error while trying to drop a public database link

SQL> drop public database link "BOB.DATABASE_LINK_1";
drop public database link "BOB.DATABASE_LINK_1"
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-31600: invalid input value DATABASE LINK for parameter OBJECT_TYPE in
function GET_DDL
ORA-06512: at "SYS.DBMS_METADATA", line 5805
ORA-06512: at "SYS.DBMS_METADATA", line 8344
ORA-06512: at line 14

And the even strange part is that executing DBMS_METADATE.get_ddl worked against the database link

SQL> SELECT dbms_metadata.get_ddl('DB_LINK','BOB.DATABASE_LINK_1','SYS') FROM dual;

CREATE DATABASE LINK "BOB.DATABASE_LINK_1"
CONNECT TO "NYU" IDENTIFIED BY VALUES '*******'
USING 'ORCL';

The solution relied on MOS note Cannot drop a database link after changing the global_name ORA-02024 (Doc ID 382994.1) all though the symptoms in the note were not a match to the problem above the solution did the trick.
Which is to backup the table sys.link$ as CTAS and delete the row corresponding to the database link which you need to drop.

Take a complete consistent backup of the database or use CTAS can be used to backup sys.link$:

$sqlplus /nolog
connect / as sysdba

SQL> create table backup_link$ as select * from sys.link$:

Delete the DBLINK as follows:

$sqlplus /nolog
connect / as sysdba

SQL> delete sys.link$ where name='db_link_name>';

SQL>commit;

Verify if the operation was correctly proceeded:

select db_link, username, host from user_db_links;

Hope it helps!

DG Broker ORA-16766: Redo Apply is stopped after a Database Restart

So, the other day I was engaged to check a dataguard, while checking its status with the dg broker I found this error

Error: ORA-16766: Redo Apply is stopped

Checking for it on dg broker:

DGMGRL> show configuration verbose

Configuration - DGCONFIG

Protection Mode: MaxPerformance
Databases:
PRIMARY - Primary database
STANDBY - Physical standby database
Error: ORA-16766: Redo Apply is stopped

Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

GMGRL> show database verbose STANDBY

Database - STANDBY

Enterprise Manager Name: STANDBY
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 3 seconds ago)
Apply Lag: 1 hour(s) 50 minutes 6 seconds (computed 0 seconds ago)
Apply Rate: 828.00 KByte/s
Real Time Query: OFF
Instance(s):
STANDBY

Database Error(s):
ORA-16766: Redo Apply is stopped

Properties:
DGConnectIdentifier = 'STANDBY'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '900'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
ApplyLagThreshold = '0'
TransportLagThreshold = '0'
TransportDisconnectedThreshold = '30'
SidName = 'STANDBY'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=SB_SERVER)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STANDBY_DGMGRL)(INSTANCE_NAME=STANDBY)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = 'STANDBY_%r_%t_%s.arc'
TopWaitEvents = '(monitor)'

Database Status:
ERROR

The standby state was APPLY-ON but when checked the standby if the MRP process was running I found none!

sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jan 4 17:05:12 2019

Copyright (c) 1982, 2013, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

17:05:12 SYS@STANDBY AS SYSDBA> SELECT PROCESS, STATUS FROM v$managed_standby;

PROCESS STATUS
--------- ------------
ARCH CLOSING
ARCH CLOSING
ARCH CLOSING
ARCH CLOSING
RFS IDLE
RFS IDLE
RFS IDLE
RFS WRITING

8 rows selected.

Elapsed: 00:00:00.01

Checking further I found that the database was bounced while the MRP was running and that was the only explanation that I could see to see a difference between the dg broker also I could not find any error in the dg broker log $ORACLE_HOME/rdbms/log/drc*.log

After setting the state to apply-off and them to apply-on the issue was resolved.

DGMGRL> edit database STANDBY set state='apply-off';
Succeeded.
DGMGRL> edit database STANDBY set state='apply-on';
Succeeded.
DGMGRL> show database verbose STANDBY;

Database - STANDBY

Enterprise Manager Name: STANDBY
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Apply Rate: 0 Byte/s
Real Time Query: OFF
Instance(s):
STANDBY

Properties:
DGConnectIdentifier = 'STANDBY'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '900'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = ''
FastStartFailoverTarget = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
ApplyLagThreshold = '0'
TransportLagThreshold = '0'
TransportDisconnectedThreshold = '30'
SidName = 'STANDBY'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=SB_SERVER)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STANDBY_DGMGRL)(INSTANCE_NAME=STANDBY)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = 'STANDBY_%r_%t_%s.arc'
TopWaitEvents = '(monitor)'

Database Status:
SUCCESS

DGMGRL> show configuration

Configuration - DGCONFIG

Protection Mode: MaxPerformance
Databases:
PRIMARY - Primary database
STANDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> exit

sqlplus / as sysdba
17:17:54 SYS@STANDBY AS SYSDBA> SELECT PROCESS, STATUS FROM v$managed_standby;

PROCESS STATUS
--------- ------------
ARCH CLOSING
ARCH CLOSING
ARCH CLOSING
ARCH CLOSING
RFS IDLE
RFS IDLE
RFS IDLE
RFS IDLE
MRP0 APPLYING_LOG

9 rows selected.

Elapsed: 00:00:00.00

Hope it helps, unti the next one.

Elisson Almeida

 

ORA 600 [kdt_bseg_srch_cbk PITL1] / ORA-00700 [PITL6] / ORA-00700 [PITL5]

Hi all,
So, right after a patching, a client environment started receiving entries on alert log like:

ORA-00700: soft internal error, arguments: [PITL6], [], [], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5], [11], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kdt_bseg_srch_cbk PITL1], [2], [], [], [], [], [], [], [], [], [], []

First a quick word regarding ORA-700: An ORA-700 is a so-called ‘soft’ assert. Soft asserts are triggered when the caller wants to make a note of the fact that something unexpected has happened, but would like to continue on because the failure is not fatal to the process or the instance. This was introduced in 12c and got some of ORA-600 messages (the informative ones), to leave ORA-600 for more critical issues.

Now to the errors: This is a clear match to Bug 28483184 (Bug 28483184 – ORA-600[PITL1] ON UPDATE TO COMPRESSED BLOCK WITH FIX FOR BUG 28364411 INSTALLED), which consists in a known defect in the Oracle code allows continued insertion of non-header block rows past the number of block ITLs in data blocks of OLTP-compressed tables.

More info: MOS Errors Noted in 12.2 and Above During DML on Compressed Tables: ORA-00600 [PITL1] / ORA-00600 [kdt_bseg_srch_cbk PITL1] / ORA-00700 [PITL6] / ORA-00700 [kdt_bseg_srch_cbk PITL5] (Doc ID 2420831.1)

To fix it: Patch on RDBMS for bug 28483184 and reorganize on involved tables. Seems to me that the patch only fixes the incorrect creation of non-header blocks, but doesn’t fix those that already have the problem in place.

My twists: I’d recommend you to increase a little bit the PCTFREE during the table reorganization, also based on Oracle’s recommendation. Recreating the tables with a larger PERCENT FREE will enable more space for ITLs. This is advantageous since the fix changes out a block that has the maximum amount of ITLs for one that has less ITLs.

If arranging a patching window is a big problem, note this patch is online installable. I’d recommend to do it offline using RAC Rolling however, as this seems a more consistent process in general.

So in summary, recommended actions are:
– Apply Patch 28483184 (RAC Rolling, Standby-First and online installable).
– Reorganize table, using methods mentioned by Suresh (ex: Create table as select (CTAS), Alter table move, Data Pump export / import, Online redefinition)

Hope that helps you!

Changing initrans on a Partitioned Index

Hi all,

We were having high ITL waits and high number of deadlocks (due to the hanging transactions waiting on the ITL on the data block) on a table and index and part of the fix was to increase the INITRANS on the table and on the index.

INITRANS is a physical attribute that determines the initial number of concurrent transaction entries allocated within each data block for a given table/index/cluster. Every transaction that updates a block has to acquire an Interested Transaction List (ITL) slot, in order to store the transaction id, rollback information and ultimately lock the necessary row/index entry within the block.

For an existing object it needs to be rebuild as INITRANS is a physical attribute on the Oracle datablock. So if you change on a table and you want it to take effect for the current data you need to move the table or one partition for example.

If you are doing this type of change on a partitioned index you need to change the index default attribute as well rebuild the index partition or the entire index.

But one thing caught my eye while working on this as after setting the new INITRANS default attribute and rebuilt the index online it did not changed the INITRANS values.

See below the execution on a test system

Continue reading