Exadata: Cell Server Crashing on ORA-00600: [LinuxBlockIO::reap]

Hi all,

So I started facing this in a client environment. Here is the alert message:

Target name=db01cel08.xxx.com
Message=ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []
Event reported time=Dec 19, 2019 2:14:16 AM EDT

Checking on the cell server, I saw these alerts:

[root@db01 ~]# ssh db01cel08
Last login: Thu Dec 19 04:45:13 2019 from db01.xxx.com
[root@db01cel08 ~]# cellcli
CellCLI: Release 12.1.2.3.5 - Production on Fri Dec 19 17:13:31 EDT 2019

Copyright (c) 2007, 2016, Oracle. All rights reserved.

CellCLI> LIST ALERTHISTORY detail

[...]

name: 10
alertDescription: "ORA-07445: exception encountered: core dump [__intel_new_memset()+62] [11] [0x000000000] [] [] []"
alertMessage: "ORA-07445: exception encountered: core dump [__intel_new_memset()+62] [11] [0x000000000] [] [] []"
alertSequenceID: 10
alertShortName: ADR
alertType: Stateless
beginTime: 2019-12-19T02:00:04-04:00
endTime:
examinedBy:
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:04-04:00
severity: critical
alertAction: "Errors in file /opt/oracle/cell/log/diag/asm/cell/SYS_112331_170406/trace/cellofltrc_19796_53.trc (incident=25). Diagnostic package is attached. It is also accessible at https://db01cel08.xxx.com/diagpack/download?name=db01cel08_2019_12_19T02_00_04_10.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

name: 11
alertDescription: "ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []"
alertMessage: "ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []"
alertSequenceID: 11
alertShortName: ADR
alertType: Stateless
beginTime: 2019-12-19T02:00:04-04:00
endTime:
examinedBy:
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:04-04:00
severity: critical
alertAction: "Errors in file /opt/oracle/cell/log/diag/asm/cell/db01cel08/trace/svtrc_9174_12.trc (incident=25). Diagnostic package is attached. It is also accessible at https://db01cel08.xxxx.com/diagpack/download?name=jdb01cel08_2019_12_19T02_00_04_11.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

name: 12_1
alertDescription: "A SQL PLAN quarantine has been added"
alertMessage: "A SQL PLAN quarantine has been added. As a result, Smart Scan is disabled for SQL statements with the quarantined SQL plan. Quarantine id : 21 Quarantine type : SQL PLAN Quarantine reason : Crash Quarantine Plan : SYSTEM Quarantine Mode : FULL_Quarantine DB Unique Name : XPTODB Incident id : 25 SQLID : 8j0az9sgxs5yh SQL Plan details : {SQL_PLAN_HASH_VALUE=281152830, PLAN_LINE_ID=9} In addition, the following disk region has been quarantined, and Smart Scan will be disabled for this region: Disk Region : {Grid Disk Name=Unknown, offset=186750337024, size=1M} "
alertSequenceID: 12
alertShortName: Software
alertType: Stateful
beginTime: 2019-12-19T02:00:12-04:00
examinedBy:
metricObjectName: QUARANTINE/21
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:12-04:00
severity: critical
alertAction: "A SQL statement caused the Cell Server (CELLSRV) service on the cell to crash. A SQL PLAN quarantine has been created to prevent the same SQL statement from causing the same cell to crash. When possible, disable offload for the SQL statement or apply the RDBMS patch that fixes the crash, then remove the quarantine with the following CellCLI command: CellCLI> drop quarantine 21 All quarantines are automatically removed when a cell is patched or upgraded. For information about how to disable offload for the SQL statement, refer to the section about 'SQL Processing Offload' in Oracle Exadata Storage Server User's Guide. Diagnostic package is attached. It is also accessible at https://db01cel08.xxx.com/diagpack/download?name=db01cel08_2019_12_19T02_00_12_12_1.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

CellCLI>
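The quarantine alert packs the identifiers you need (quarantine id, SQL ID, plan hash value) into one long string. Here is a minimal Python sketch for pulling them out, assuming the "Label : value" layout seen in the alert above. The function name is hypothetical, and other cell software releases may format the message differently.

```python
import re

def parse_quarantine_alert(message):
    """Extract key fields from a 'SQL PLAN quarantine' alertMessage string.

    Assumes the 'Label : value' layout shown in the alert above.
    Fields that are not found are simply left out of the result.
    """
    patterns = {
        "quarantine_id": r"Quarantine id\s*:\s*(\d+)",
        "db_unique_name": r"DB Unique Name\s*:\s*(\S+)",
        "incident_id": r"Incident id\s*:\s*(\d+)",
        "sql_id": r"SQLID\s*:\s*(\w+)",
        "plan_hash_value": r"SQL_PLAN_HASH_VALUE=(\d+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            fields[name] = match.group(1)
    return fields

# Shortened copy of the alertMessage from alert 12_1 above.
sample = (
    "A SQL PLAN quarantine has been added. Quarantine id : 21 "
    "Quarantine type : SQL PLAN Quarantine reason : Crash "
    "Quarantine Plan : SYSTEM Quarantine Mode : FULL_Quarantine "
    "DB Unique Name : XPTODB Incident id : 25 SQLID : 8j0az9sgxs5yh "
    "SQL Plan details : {SQL_PLAN_HASH_VALUE=281152830, PLAN_LINE_ID=9}"
)
info = parse_quarantine_alert(sample)
print(info)
```

With the SQL ID in hand you can look up the offending statement on the database side before deciding whether to drop the quarantine.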

After some research, we could match the situation to Bug 13245134 – Query may fail with errors ORA-27618, ORA-27603, ORA-27626 or ORA-00600[linuxblockio::reap_1] or ora-600 [cacheput::process_1].

It is also described in the MOS note: Exadata/SuperCluster: 11.2 databases missing fix for the bug 13245134 may lead to cell service crash with ora-600 [linuxblockio::reap_1]/ora-600 [cacheput::process_1] or ORA-27626: Exadata error: 242/Smart scan issues on the RDBMS side.

To resolve the crashes quickly, I applied patch 13245134 online.

After applying it, the issue was solved:

[oracle@db01 ~]$ /oracle/xptodb/product/11.2.0.4/OPatch/opatch lsinventory -OH $ORACLE_HOME | grep 13245134
Patch (online) 13245134: applied on Thu Dec 19 23:34:50 EST 2019
13245134
[oracle@db01 ~]$

Hope it helps!

Automatic SQL Tuning Advisor Raising ORA-00600: internal error code, arguments: [qksvcReplaceVC0]

Hi all,

So I started receiving this error frequently from one database, always at the same hour:

ORA-00600: internal error code, arguments: [qksvcReplaceVC0], [], [], [], [], [], [], [], [], [], [], []

It did not take much to match it to the Automatic SQL Tuning Advisor.

This only seems to happen during execution of Automatic SQL Tuning Advisor. Several bugs have been logged for the issue but have not been resolved as the error is not reproducible at will. For example:

Bug 17401718: ORA-600 [QKSVCREPLACEVC0] USING SQL TUNING ADVISOR
Bug 16491690: ORA-600 [QKSVCREPLACEVC0] WHEN AUTOMATIC SQL TUNING ADVISOR EXECUTED
Bug 13959984: ORA-00600 [QKSVCREPLACEVC0]

How to fix it? Apply the patches!

How to work around it?

A few options:

1. Set “_replace_virtual_columns” to false.

You can set this parameter at session level (in the session where the Automatic SQL Tuning Advisor starts) or at system level with the following commands:

SQL> alter session set "_replace_virtual_columns"=false;

SQL> alter system set "_replace_virtual_columns"=false;

2. Since it only fails in the SQL Tuning Advisor auto task and has no effect on the database, the error can be ignored.
You can also disable that auto task and just run the advisor manually when required:

-- check auto job status

SQL> select client_name,status from dba_autotask_task;

SQL> select client_name,status from dba_autotask_client;

SQL> select client_name, operation_name, status from dba_autotask_operation;

-- disable SQL Tuning Advisor job

SQL> exec dbms_auto_task_admin.disable ('sql tuning advisor', null, null);

-OR-

SQL> exec dbms_auto_task_admin.disable (client_name => 'sql tuning advisor', operation => null, window_name => null);

-- enable SQL Tuning Advisor job

SQL> exec dbms_auto_task_admin.enable ('sql tuning advisor', null, null);

-OR-

SQL> exec dbms_auto_task_admin.enable (client_name => 'sql tuning advisor', operation => null, window_name => null);

 

Hope it helps!

ORA-00600: internal error code, arguments: [kghfrempty:ds]

Hi all,
This one is annoying, mostly because it looked very unusual to me. I would have expected a different error.

I have seen this “[kghfrempty:ds]” a couple of times in the past, and if you have seen it too you may have noticed that it is almost as generic as ORA-600 itself. That is because this function usually sits at the top of the call stack for KGH errors in general. Some examples:

1.

kghnerror <- kghfrempty <- kghfrempty_ex <- qerhjFreeSpace
<- qerhjFetch <- qersoProcessULS <- qersoFetch <- opifch2 <- kpoal8
<- opiodr <- kpoodrc <- rpiswu2 <- kpoodr <- upirtrc
<- kpurcsc <- kpufch0 <- kpufch <- OCIStmtFetch2 <- qksanExecSql <- qksanAnalyzeSql

2.

kghfrempty <- kghfrempty_partial <- kghgex <- kghfnd <- kghalo <- kghgex <- kghalf <- ktmrProcessCRClone 
<- ktmbRead <- ktmrget_int <- ktmrget <- kdmsTransGet <- kdst_fetch_imc  <- kdsttgr  <- qertbFetch

As you can see, it can be related to analyze bugs (stack 1), the In-Memory Column Store, IMC (stack 2), view creation, rollback of clusterwide transactions, trigger creation, function-based indexes, or several other generic situations. There is no single pattern.
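Matching a stack to an area boils down to spotting a telltale function among the frames. Here is a minimal Python sketch of that idea. The marker table is my own assumption, built only from the two example stacks above; it is not an Oracle-published list, and the function names are hypothetical.

```python
# Hypothetical mapping: function seen in the stack -> likely problem area.
# Only these two entries come from the example stacks in this post.
STACK_MARKERS = {
    "qksanAnalyzeSql": "SQL analyze",
    "kdst_fetch_imc": "In-Memory Column Store (IMC)",
}

def parse_stack(text):
    """Split an 'f1 <- f2 <- f3' stack (possibly multi-line) into frame names."""
    frames = text.replace("\n", " ").split("<-")
    return [frame.strip() for frame in frames if frame.strip()]

def classify_stack(text):
    """Return the areas whose marker function appears in the stack."""
    frames = set(parse_stack(text))
    return [area for marker, area in STACK_MARKERS.items() if marker in frames]

stack_1 = """kghnerror <- kghfrempty <- kghfrempty_ex <- qerhjFreeSpace
<- qerhjFetch <- qersoProcessULS <- qersoFetch <- opifch2 <- kpoal8
<- opiodr <- kpoodrc <- rpiswu2 <- kpoodr <- upirtrc
<- kpurcsc <- kpufch0 <- kpufch <- OCIStmtFetch2 <- qksanExecSql <- qksanAnalyzeSql"""

stack_2 = """kghfrempty <- kghfrempty_partial <- kghgex <- kghfnd <- kghalo <- kghgex <- kghalf <- ktmrProcessCRClone
<- ktmbRead <- ktmrget_int <- ktmrget <- kdmsTransGet <- kdst_fetch_imc  <- kdsttgr  <- qertbFetch"""

print(classify_stack(stack_1))
print(classify_stack(stack_2))
```

An empty result only means the stack did not hit one of the markers you listed; the MOS notes are still the place to confirm any match.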

So what to do if you get this message? Well, generically, read all the available MOS notes and try to match your case. If you find a match, pray that a fix exists for your current version, or open an SR to either request a backport or start a new investigation.

A few examples of things to check:

  • ALERT: Bug 13605839 ORA-600 [ktbsdp1] ORA-600 [kghfrempty:ds] ORA-600 [kdBlkCheckError]. Corruption in Rollback with Clusterwide Global Transactions in RAC (Doc ID 1527740.1)
  • SQL Analyze Fails with ORA-600[kghfrempty:ds] (Doc ID 2247180.1)
  • Bug 21556276 – dataguide createviewonpath hit ora 600 [kghfrempty:ds] (Doc ID 21556276.8)
  • Bug 20878625 – ORA-600[kghfrempty:ds] with IMC enabled (Doc ID 20878625.8)
  • Bug 19212166 – ORA-600 [KGHFREMPTY:DS] With Cache Line Alignment Enabled (Doc ID 19212166.8)
  • Bug 6797925 – ORA-600 [kghfrempty:ds] During Creation Of Trigger In XML Schema (Doc ID 6797925.8)
  • 11.1: ORA-600 [17147] AND [kghfrempty:ds] On Select Using Functional Indexes (Doc ID 884882.1)
  • ORA-600 [kghfrempty:ds] And ORA-600 [kghrcdepth:ds] While Gathering Statistics (Doc ID 1480132.1)

But I recently came across an undocumented scenario that causes this, and it might help you fix it as well.
I noticed I was getting this error for different queries, but always from the same application. So I did a complete review of the application layer and found that the application was using JDBC 7. It so happens that we had just upgraded from 12.1 to 19c.

So, after upgrading the JDBC driver, it started to work properly again and no MOS SR was required. Lesson learned: when you receive this [kghfrempty:ds], especially after a migration, also check your application's JDBC version.

For the compatibility matrix, see: Starting With Oracle JDBC Drivers – Installation, Certification, and More! (Doc ID 401934.1)

Hope it helps, cheers!

ORA-00600 [ipc_recreate_que_2]

Hi all,
So this week I started receiving this error in a client environment. It was happening due to Bug 26803191 – Getting ORA-00600 [ipc_recreate_que_2] instead of ORA-27515 – superseded (Doc ID 26803191.8).
In summary, failures caused by a low memlock limit raise ORA-00600 [ipc_recreate_que_2] instead of ORA-27515. The patch ensures that ORA-27515 is raised in that situation instead of the ORA-600.

As a workaround, the memlock limit can be changed, as also described in MOS note ORA-00600: Internal Error Code, Arguments: [ipc_recreate_que_2] When Running Exachk (Doc ID 2480088.1).

Before:

[boesing@exa1dbadm01 trace]$ grep memlock /etc/security/limits.conf |grep oracle
# - memlock - max locked-in-memory address space (KB)
oracle soft memlock 237809520
oracle hard memlock 237809520

After:

[boesing@exa1dbadm01 ~]$ grep memlock /etc/security/limits.conf
# - memlock - max locked-in-memory address space (KB)
oracle soft memlock unlimited
oracle hard memlock unlimited
* soft memlock 32768
* hard memlock 32768
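If you want to check this across several nodes without eyeballing grep output, here is a small Python sketch that reads limits.conf-style text and reports the memlock entries for one user. The function name is hypothetical. It only handles plain `<domain> <type> <item> <value>` lines, so files under /etc/security/limits.d and the `-` type (soft and hard at once) are out of scope here.

```python
def memlock_limits(limits_text, user="oracle"):
    """Return e.g. {'soft': '...', 'hard': '...'} for `user` from limits.conf text."""
    limits = {}
    for line in limits_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        # Expected layout: <domain> <type> <item> <value>
        if len(parts) == 4 and parts[0] == user and parts[2] == "memlock":
            limits[parts[1]] = parts[3]
    return limits

# The "Before" and "After" contents shown above.
before = """# - memlock - max locked-in-memory address space (KB)
oracle soft memlock 237809520
oracle hard memlock 237809520
"""

after = """# - memlock - max locked-in-memory address space (KB)
oracle soft memlock unlimited
oracle hard memlock unlimited
* soft memlock 32768
* hard memlock 32768
"""

print(memlock_limits(before))
print(memlock_limits(after))
```

The wildcard (`*`) lines are deliberately skipped: the sketch reports only the entries written for the named user.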

Hope it helps,
Cheers!

ORA-00600: internal error code, arguments: [13011], [7907] on SYS.DBMS_AQ_INV

Hi all,

So a couple of weeks ago I was facing the following in a client environment. It was appearing every 10 minutes in the alert.log:

ORA-00600: internal error code, arguments: [13011], [7907], [12679954], [1], [13461738], [0], [], [], [], [], [], []
ORA-01403: no data found
ORA-06512: at "SYS.DBMS_AQ_INV", line 1248

I could find several references to ORA-600 [13011], always related to some internal corruption. As this package is related to Advanced Queuing, I decided to validate the structure of the Scheduler job tables:

SQL> analyze table SYS.SCHEDULER$_EVENT_LOG validate structure cascade; 

Table analyzed. 

SQL> analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure cascade; 
analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure cascade 
* 
ERROR at line 1: 
ORA-01499: table/index cross reference failure - see trace file

SQL> analyze table SYS.SCHEDULER$_JOB_RUN_DETAILS validate structure;

Table analyzed.

Ahá!
Also on the generated trace file:

2019-03-01 22:26:37.736 
SESSION ID:(39.32751) 2019-03-01 22:26:37.736 
CLIENT ID:() 2019-03-01 22:26:37.736 
SERVICE NAME:(SYS$USERS) 2019-03-01 22:26:37.736 
MODULE NAME:(sqlplus.exe) 2019-03-01 22:26:37.736 
CLIENT DRIVER:(SQL*PLUS) 2019-03-01 22:26:37.736 
ACTION NAME:() 2019-03-01 22:26:37.736 
CONTAINER ID:(1) 2019-03-01 22:26:37.736 
Table/Index row count mismatch 
table 273184 : index 275017, 1832 
Index root = tsn: 1 rdba: 0x00c0128a

So ANALYZE on table SCHEDULER$_JOB_RUN_DETAILS fails with CASCADE but succeeds without it. This means there is a problem with one of the indexes on this table: it has gone out of sync with the table. So I generated the rebuild statement and rebuilt the index:

SQL> select 'alter index '||owner||'.'||index_name||' rebuild;' from dba_indexes where table_name='SCHEDULER$_JOB_RUN_DETAILS';

'ALTERINDEX'||OWNER||'.'||INDEX_NAME||'REBUILD;'
--------------------------------------------------------------------------------
alter index SYS.I_SCHEDULER_JOB_RUN_DETAILS rebuild;

SQL> alter index SYS.I_SCHEDULER_JOB_RUN_DETAILS rebuild online;

Index altered.

After this, the error was solved and that ORA-600 did not recur.

Also note I couldn’t find any document about this on MOS, so this is kind of exclusive for now. 🙂

Cheers!

ORA 600 [kdt_bseg_srch_cbk PITL1] / ORA-00700 [PITL6] / ORA-00700 [PITL5]

Hi all,
So, right after patching, a client environment started logging entries like these in the alert log:

ORA-00700: soft internal error, arguments: [PITL6], [], [], [], [], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5], [11], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kdt_bseg_srch_cbk PITL1], [2], [], [], [], [], [], [], [], [], [], []

First, a quick word regarding ORA-700: an ORA-700 is a so-called ‘soft’ assert. Soft asserts are triggered when the caller wants to make a note that something unexpected has happened, but would like to continue because the failure is not fatal to the process or the instance. It was introduced in 12c and took over some of the ORA-600 messages (the informational ones), leaving ORA-600 for more critical issues.
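When scanning an alert log it helps to separate the soft asserts from the real ORA-600s and to pull out the non-empty arguments, since those are what you search for on MOS. Here is a minimal Python sketch, assuming the message layout shown in the lines above. The function name and the "soft"/"hard" labels are my own.

```python
import re

# Matches both ORA-00600 and ORA-00700 lines and captures the argument list.
ASSERT_RE = re.compile(r"ORA-00(600|700): .*?arguments: (.*)")

def parse_assert(line):
    """Return (kind, args) for an ORA-00600/ORA-00700 line, or None if no match.

    kind is 'soft' for ORA-00700 and 'hard' for ORA-00600;
    args holds only the non-empty bracketed arguments, in order.
    """
    match = ASSERT_RE.search(line)
    if match is None:
        return None
    kind = "soft" if match.group(1) == "700" else "hard"
    args = [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(2)) if arg]
    return kind, args

alert_lines = [
    "ORA-00700: soft internal error, arguments: [PITL6], [], [], [], [], [], [], [], [], [], [], []",
    "ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5], [11], [], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [kdt_bseg_srch_cbk PITL1], [2], [], [], [], [], [], [], [], [], [], []",
]
for line in alert_lines:
    print(parse_assert(line))
```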

Now to the errors: this is a clear match to Bug 28483184 (Bug 28483184 – ORA-600[PITL1] ON UPDATE TO COMPRESSED BLOCK WITH FIX FOR BUG 28364411 INSTALLED), a known defect in the Oracle code that allows continued insertion of non-header block rows past the number of block ITLs in data blocks of OLTP-compressed tables.

More info: MOS Errors Noted in 12.2 and Above During DML on Compressed Tables: ORA-00600 [PITL1] / ORA-00600 [kdt_bseg_srch_cbk PITL1] / ORA-00700 [PITL6] / ORA-00700 [kdt_bseg_srch_cbk PITL5] (Doc ID 2420831.1)

To fix it: apply the RDBMS patch for bug 28483184 and reorganize the affected tables. It seems to me that the patch only prevents the problem from being created again, but doesn’t fix blocks that already have it.

My twist: I’d recommend increasing PCTFREE a little during the table reorganization, which is also based on Oracle’s recommendation. Recreating the tables with a larger PCTFREE leaves more room for ITLs. This is advantageous since the fix swaps out a block that has the maximum number of ITLs for one that has fewer ITLs.

If arranging a patching window is a big problem, note that this patch is online installable. However, I’d recommend applying it offline in RAC rolling mode, as this seems a more consistent process in general.

So in summary, the recommended actions are:
– Apply Patch 28483184 (RAC Rolling, Standby-First and online installable).
– Reorganize the affected tables, using one of the methods mentioned by Suresh (e.g. create table as select (CTAS), alter table move, Data Pump export/import, online redefinition).

Hope that helps you!

ORA-600 [kggsmGetString:1]

Hi all!
So, checking on this error, I found the following relevant reference: ORA-600 [kggsmGetString:1] (Doc ID 1541187.1).
Since it happened while a SQL statement was running, the only match could be Bug 17235420 – ORA-600 [kggsmGetString:1] with adaptive cursor sharing (Doc ID 17235420.8).

The problem?
No detailed information, no workaround, only patches to apply. The only information is “ORA-600 [kggsmGetString:1] can occur in 12c if adaptive cursor sharing is used.”

So, checking the SQL plan details, I could see it was using the following adaptive optimizations:

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - statistics feedback used for this statement
   - this is an adaptive plan (rows marked '-' are inactive)
   - 2 Sql Plan Directives used for this statement

MOS note “Recommendations for Adaptive Features in Oracle Database 12c Release 1 (Adaptive Features, Adaptive Statistics and 12c SQL Performance)” (Doc ID 2312911.1) recommends applying the Oct 2017 BP or one-off patches to disable adaptive statistics only.

In this case, the easiest solution is to disable both adaptive plans and adaptive statistics, as this release has only one parameter to control them: optimizer_adaptive_features, which defaults to TRUE.

And it can be changed online:

SQL> show parameter adaptive

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_features	     boolean	 TRUE
optimizer_adaptive_reporting_only    boolean	 FALSE
parallel_adaptive_multi_user	     boolean	 TRUE
SQL> alter system set optimizer_adaptive_features=false scope=both;

System altered.

SQL> show parameter adaptive

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_features	     boolean	 FALSE
optimizer_adaptive_reporting_only    boolean	 FALSE
parallel_adaptive_multi_user	     boolean	 TRUE

After that, the issue was solved. The ORA-600 did not recur.

Hope it helps!

OEM Information Reports: ORA-00600 [kpndbcon-svchpnotNULL]

Having this error from an Information Report?

ORA-00600 [kpndbcon-svchpnotNULL]
ORA-00600: internal error code, arguments: [kpndbcon-svchpnotNULL], [], [], [], [], [], [], [], [], [], [], []

Don’t worry. This is not directly an Oracle issue. The cause is that, while the report is running (it takes 2 or 3 minutes), one of the following happens:

  • The database session in the OEM Repository (Database Repository) is killed.
  • The database session in the target database (where OEM has to connect and get the data) is killed.
  • There are network issues between the OEM Repository and the target database, causing timeouts or making the session end abnormally.
  • High workload in one of the databases causes a timeout, making the session end abnormally.

So basically this is a communication problem between the OEM Repository and the database the data is being fetched from.
Keeping reports like this running over database links is something Oracle doesn’t support at all, because any network issue can cause the report to fail.

Some references about it:

  • ORA-00600 [kpndbcon-svchpnotNULL] Errors (Doc ID 1615517.1)
  • ORA-00600 [kpndbcon-svchpnotNULL] query through dblink (Doc ID 1490700.1)
  • Information Publisher Report fails with Error Rendering Element. Exception: ORA-00600 [kpndbcon-svchpnotNULL] (Doc ID 1930280.1)


So what’s the solution?
The solution here is easy: just re-run it.

Hope it helps. Cheers!

ORA-600 [kwqitnmphe:ltbagi]

Hi all,

So, some time ago I started receiving an internal error “ORA-600 [kwqitnmphe:ltbagi]” from a client’s database. Everything was up and running fine, but an ORA-600 is always an ORA-600.

Investigating the issue, I found it can be related to several bugs, as per MOS note ORA-600 [kwqitnmphe:ltbagi] (Doc ID 1346009.1):

Bug       Fixed in                           Description
--------  ---------------------------------  ------------------------------------------------------------
17831758  12.1.0.2, 12.2.0.1                 ORA-600 [kwqitnmphe:ltbagi] in Qnnn background process
20987661  12.2.0.1                           QMON slave processes reporting ORA-600 [kwqitnmphe:ltbagi]
18591240  11.2.0.4.BP17, 12.1.0.2, 12.2.0.1  ORA-600 [kwqitnmphe:ltbagi] is seen immediately after ORA-1089
18536720  12.1.0.2, 12.2.0.1                 ORA-600 [kwqitnmphe:ltbagi] processing History IOT in AQ
16204151  12.1.0.2, 12.2.0.1                 ORA-600 [kwqitnmphe:ltbagi] when subscriber is dropped pending enqueue/dequeue
12423122  11.2.0.3, 12.1.0.1                 ORA-600 [kwqitnmphe:ltbagi] when scheduler uses AQ

In my situation it was a match to the QMON slave processes issue, the only one not yet fixed in 12.1 (my DB is 12.1, bad luck?), as per MOS Bug 20987661 – QMON slave processes reporting ORA-600 [kwqitnmphe:ltbagi] (Doc ID 20987661.8).
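The observation that only the QMON bug lacks a 12.1 fix can be checked mechanically against the table above. Here is a rough Python sketch. Matching on the release family prefix is a simplification I am assuming: it ignores which patch set within 12.1 you actually run. The function name is hypothetical.

```python
# Bug number -> releases from the "Fixed" column of the table above.
FIXED_IN = {
    "17831758": ["12.1.0.2", "12.2.0.1"],
    "20987661": ["12.2.0.1"],
    "18591240": ["11.2.0.4.BP17", "12.1.0.2", "12.2.0.1"],
    "18536720": ["12.1.0.2", "12.2.0.1"],
    "16204151": ["12.1.0.2", "12.2.0.1"],
    "12423122": ["11.2.0.3", "12.1.0.1"],
}

def unfixed_in_family(fixed_in, family):
    """Return bugs with no fix listed for any release in `family` (e.g. '12.1')."""
    prefix = family + "."
    return [
        bug
        for bug, releases in fixed_in.items()
        if not any(release.startswith(prefix) for release in releases)
    ]

unfixed = unfixed_in_family(FIXED_IN, "12.1")
print(unfixed)  # prints ['20987661']
```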

For now it is fixed only in 12.2.0.1. For 12.1 we have a temporary workaround:

As sysdba:

  DECLARE
   po dbms_aqadm.aq$_purge_options_t;
  BEGIN
     po.block := FALSE;
     DBMS_AQADM.PURGE_QUEUE_TABLE(
       queue_table     => 'SYS.SYS$SERVICE_METRICS_TAB',
       purge_condition => NULL,
       purge_options   => po);
  END;
  /

Hope it helps!
Cheers!

ORA-00600: [qkswcWithQbcRefdByMain4]

Hello all,
These days I found this in a client’s 12c database when trying to create a materialized view:

ORA-00600: internal error code, arguments: [qkswcWithQbcRefdByMain4]

A perfect match to MOS ORA-00600 [qkswcWithQbcRefdByMain4] when Create MV “WITH” clause (Doc ID 2232872.1).

The root cause is documented on BUG 22867413 – ORA-600 CALLING DBMS_ADVISOR.TUNE_MVIEW.
The given solution is to apply Patch 22867413.

After applying patch, issue solved. 🙂
