ORA-19665: size % in file header does not match actual file size of %

That’s an unexpected message to get, right?
I got it together with an ORA-07445: exception encountered: core dump [kcflfi()+1016] [SIGFPE] [Integer divide by zero] [0x10047EF18] [] []

What’s next?
After some checks, I found the following (the related trace gave me the file name, which matches file_id 106):

SQL> select file_name, bytes from dba_data_files where file_id=106;

FILE_NAME                                         BYTES 
------------------------------------------------ --------------
+DATA/MYDB/DATAFILE/DATAFILE_XX.558.1015447173   14529069056

SQL> select name, bytes from v$datafile where file#= 106;

NAME                                             BYTES 
------------------------------------------------ --------------
+DATA/MYDB/DATAFILE/DATAFILE_XX.558.1015447173   14529067281

This means the data dictionary and the control file report different sizes for the same datafile.
Looking at MOS, it seems to be a match to ORA-07445: Exception Encountered (Doc ID 1958870.1).

How to resolve it?

SQL> alter database datafile 106 offline for drop;
RMAN> restore datafile 106;
RMAN> recover datafile 106;
SQL> alter database datafile 106 online;

This resolved my case, and both views now report the same size.

BE CAREFUL:

  • Make sure you have a backup before dropping the datafile.
  • Make sure you can take the datafile offline, or proceed during non-business hours.
  • Follow change procedures for Production, of course. Things may get wild.

And what if I don’t have a backup?
1. You may want to take it. It may not work, though, considering the original mismatch.
2. Export/Import logically:
– Export the data from the related tablespace (Data Pump or Legacy Export; check for limitations and datatypes) (see the sketch below).
– Drop the tablespace and recreate it.
– Import the data back.
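For reference, here is a minimal Data Pump sketch of that logical path. The directory object, dump file names, tablespace name (USERS_TBS), sizes, and credentials are hypothetical placeholders, so adjust them to your environment:

-- 1) Export the tablespace contents
expdp system DIRECTORY=DATA_PUMP_DIR DUMPFILE=users_tbs.dmp LOGFILE=users_tbs_exp.log TABLESPACES=USERS_TBS

-- 2) Recreate the tablespace (size and location are examples)
SQL> drop tablespace USERS_TBS including contents and datafiles;
SQL> create tablespace USERS_TBS datafile '+DATA' size 10G autoextend on;

-- 3) Import the data back
impdp system DIRECTORY=DATA_PUMP_DIR DUMPFILE=users_tbs.dmp LOGFILE=users_tbs_imp.log TABLESPACES=USERS_TBS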

As usual, test it in a non-production environment to validate your plan and commands.

I hope it helps!

Relying on Guaranteed Restore Points? Be careful!

Hi all,

Are you relying on Guaranteed Restore Points (GRP) as a fallback plan for your migration or upgrade strategy? Be careful!
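For context, the typical GRP fallback setup looks more or less like this minimal sketch (the restore point name BEFORE_UPGRADE is just an example):

SQL> create restore point BEFORE_UPGRADE guarantee flashback database;
SQL> select name, guarantee_flashback_database, scn, time from v$restore_point;

-- And the rollback path, if the upgrade needs to be reverted:
SQL> shutdown immediate
SQL> startup mount
SQL> flashback database to restore point BEFORE_UPGRADE;
SQL> alter database open resetlogs;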

While performing a non-Prod upgrade with the AutoUpgrade tool, after completing the upgrade, I wanted to roll it back and go through the process again. This is what happened:

SQL> startup
ORA-29702: error occurred in Cluster Group Service operation

When looking into it, I found this blog post from Mike that I had missed last year: https://mikedietrichde.com/2020/11/13/ora-29702-and-your-instance-does-not-startup-in-the-cluster-anymore/

This means my database is not starting anymore! Oh man, glad that I’m in the testing phase!

This is caused by Bug 31561819 – Incompatible maxmembers at CRSD Level Causing Database Instance Not Able to Start.

As per Mike’s post, “you don’t need to even restore or flashback a database to hit this error. A simple instance in NOMOUNT state leads to the same error. Without even any datafile.”

The bug is fixed in:

  • 19.9.0.0.201020 (Oct 2020) OCW RU
  • 18.12.0.0.201020 (Oct 2020) OCW RU
  • 12.2.0.1.201020 (Oct 2020) OCW RU

That said, you should include this patch BEFORE starting any move! Do it right away if you are on these versions!

Also, be aware of the latest change regarding restore point propagation in 19c, as per MOS note Automatic Propagate Restore Points from Primary to Standby site in 19C (Doc ID 2463082.1).

In my case, the usage is exactly for a 12.1 -> 19c upgrade. So, the fix is not even available (no Extended Support in place). Therefore, we had to think about alternative fallback plans, like a physical standby. But this is a topic for another post.

So for YOU:

  • Apply this patch if you can!
  • If not, be very careful with the fallback plans and, as usual: Test, Test and Test!

See you next post!

Exadata DNS Change – Pitfalls to be avoided

Hi all, it’s been a while but here I am!

There were some changes in the infrastructure at the place I work, and I was asked to do a DNS change on a somewhat old Exadata X5. I had never done one before, so the idea of this post is to help others who might face the issues I had.

The first thing I did was look up the documentation and review the steps. Yes, there are blogs about it, but the docs help you get at least a first glance of the situation.

Long story short: Exadata has lots of components and the new DNS should be changed on all of them.

Here is a summary of the steps.

Infiniband switches

Connect to the switches, su to ilom-admin, and change the DNS:

su - ilom-admin
show /SP/clients/dns
set /SP/clients/dns nameserver=192.168.16.1,192.168.16.2,192.168.16.3
show /SP/clients/dns

 

Database nodes

For my image, I only needed to change /etc/resolv.conf; if you have a newer one, you will need to use ipconf. That's why you need to go to the documentation first: at least there we hope they will mention the pitfalls (well, keep reading and you will see that was not my case).
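On those older images, the change is simply the nameserver entries in /etc/resolv.conf on each database node. A minimal sketch (the search domain is a placeholder, and the IPs follow the examples used in this post):

search mydomain.com
nameserver 192.168.16.1
nameserver 192.168.16.2
nameserver 192.168.16.3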

I also changed the DNS on each database node's ILOM, running ipmitool from each node:

ipmitool sunoem cli 'show /SP/clients/dns'
ipmitool sunoem cli 'set /SP/clients/dns nameserver=192.168.16.1,192.168.16.2,192.168.16.3'
ipmitool sunoem cli 'show /SP/clients/dns'


Cell nodes – Here things start to get interesting

For the storage cell there are some points that need to be taken under consideration:

Increase the ASM disk_repair_time – the goal here is to avoid a full rebalance if you finish the work within its timeframe. If you don't know this parameter: ASM will wait for up to the interval specified by DISK_REPAIR_TIME for the disk(s) to come back online. If they come back within this interval, a resync operation occurs, where only the extents modified while the disks were offline are written back. If they do not come back within this interval, ASM initiates a forced drop of the disk(s), which triggers a rebalance. You can check and adjust it as shown below.
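Here is a minimal sketch for checking and raising it, assuming a diskgroup called DATA and a window of 8.5 hours (adjust both to your environment):

-- Check the current value per diskgroup
SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number = a.group_number and a.name = 'disk_repair_time';

-- Raise it for the maintenance window (example value)
SQL> alter diskgroup DATA set attribute 'disk_repair_time' = '8.5h';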

On each cell node, we need to make sure all disks are OK, inactivate all grid disks, stop all cell services, and use ipconf to change the DNS configuration:

#Check that putting the grid disks offline will not cause a problem for Oracle ASM - it should all say YES on the 3rd column 
cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome

#Inactivate all grid disks on the cell - may take a while to complete
cellcli -e ALTER GRIDDISK ALL INACTIVE


#Confirm the grid disks are offline, it should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, and asmdeactivationoutcome=Yes for all grid disks
cellcli -e LIST GRIDDISK ATTRIBUTES name, asmmodestatus,asmdeactivationoutcome

#Confirm that the disks are offline
cellcli -e LIST GRIDDISK

#Shut down the cell services and ocrvottargetd service
cellcli -e ALTER CELL SHUTDOWN SERVICES ALL
service ocrvottargetd stop # on some images this service does not exist

To execute ipconf the old way, we only need to call it and follow the prompts; if you have a newer image, you will need to provide its parameters, as shown in the documentation.

The documentation says that after this we could start the cell services back up, but I would recommend validating the DNS before doing that. Why? Because mine did not work, and I could have had a bigger issue with a cell node without DNS trying to start its services.

So, how to test? Use nslookup, dig, and curl:

nslookup dns_domain.com
curl -v 192.168.16.1:53
dig another_server_in_the_network

 

My tests did not work: I was able to ping the DNS servers but not to resolve any name. I had an SR open on MOS, but it did not help much either. Since this is a production system, I checked whether the firewall was up on the Linux side, and to my surprise it was.

I tried to manually add rules to iptables, but it did not work. Then I came across this note: Exadata: New DNS server is not accessible after changing using IPCONF (Doc ID 1581417.1).

And there it was, I needed to restart the cellwall service to recreate the iptables rules.

# Restart cellwall service
service cellwall restart
service cellwall status

One final point: check whether ASM started a rebalance or not. If it did, do not bring down another cell node until the rebalance is finished, otherwise you may run into deeper issues. A quick way to check is shown below.
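For reference, a minimal sketch of bringing the cell back and checking for a rebalance (the cellcli commands mirror the inactivation steps above; the last query runs on the ASM instance and returns no rows when no rebalance/resync is running):

#Start the cell services back and reactivate the grid disks
cellcli -e ALTER CELL STARTUP SERVICES ALL
cellcli -e ALTER GRIDDISK ALL ACTIVE
cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome

#On the ASM instance, check for running operations
SQL> select * from gv$asm_operation;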

 

I hope it helps!

Elisson Almeida

Failed Logon Delay Causing Performance Issues

The other day, when I got to the office, I was called to check a database that was running slow. They had implemented a new process there and wanted to make sure it was not impacted.

When I checked, I saw this issue in OEM:

 

User SYS causing a strange wait event: Failed Logon Delay

Someone had created a process running as the user SYS, but it was not fully configured, and part of the process was trying to connect with the wrong password.

While they were looking through the configuration files and servers to see where the issue was coming from, I started my own investigation to speed things up.

First, I had to enable auditing, as it was disabled for unsuccessful login attempts:

SQL> audit session whenever not successful;

Audit succeeded.

 

Then I was able to see where the failed connections came from. I just needed to look for return code 1017 in sys.aud$, as ORA-01017 is "invalid username/password; logon denied":

col ntimestamp# for a30 heading "Timestamp"
col userid for a6 heading "Username"
col userhost for a15 heading "Machine"
col spare1 for a10 heading "OS User"
col comment$text for a80 heading "Details"

select ntimestamp#, userid, userhost, spare1, comment$text,returncode from sys.aud$ where returncode=1017 or returncode=28000;
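If you prefer a dictionary view over sys.aud$, the same check can be done against DBA_AUDIT_SESSION (a quick sketch with the same return codes):

select os_username, username, userhost, terminal, timestamp, returncode from dba_audit_session where returncode in (1017, 28000) order by timestamp;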

 

Oldie but goldie =)

Hope it helps,

Elisson Almeida

MySQL won’t start [ERROR] Found option without preceding group in config file

Hey folks,

Have you ever received a call for a MySQL on Windows that stopped working after someone did something to its my.cnf (my.ini on Windows)? Then you try to start the service from cmd and get the following error:

mysqld: [ERROR] Found option without preceding group in config file C:\ProgramData\MySQL\MySQL Server 8.0\my.ini at line 1.
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!

Well, for some reason, the editor that was used (no idea which one it was) threw some stray bytes at the beginning of the file. To solve that (on Windows, at least), open the file in Notepad++ and go to Format > Convert to ANSI. Save the file and start the service again.
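Just for reference, that error means mysqld found an option line before any [group] header, usually because of stray bytes or a missing header at the top of the file. A healthy my.ini starts with a group header, roughly like this minimal sketch (paths and values are just examples):

[mysqld]
# options must come after a group header
datadir=C:/ProgramData/MySQL/MySQL Server 8.0/Data
port=3306

[client]
port=3306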

What was the weirdest thing that happened to you on a Windows Server?

MySQL Error1075 – Incorrect table definition; What’s happening?

Hey Folks,

A few months ago, I found an issue where, for some reason, someone ignored the warnings and tried to restore a backup from a different version of MySQL (or even MariaDB, IDK). As a result, half the database was running without Primary Keys. So when a system tried to update its schema, we were getting errors (like the one below) thrown in our faces.

1075 - Incorrect table definition; there can be only one auto column and it must be defined as a key

Ok, first things first: you would like to run an ALTER TABLE TABLE_NAME_HERE ADD PRIMARY KEY (ID); and see if it works. The error was being thrown because the auto-increment column didn't have any key defined on it… If you have problems with duplicated records on it, you can try the following script to solve the issue.

First, get the max id from the table (helper queries are shown right after the UPDATE), and then run the following for each duplicated id:

UPDATE TABLE_NAME_HERE JOIN (SELECT @sequence := MAX_ID_HERE ) r SET id=@sequence:=@sequence+1 where id= DUPLICATED_ID_HERE;
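A couple of helper queries for that step, finding the duplicated ids and the current max (table and column names follow the placeholders above), might look like this:

-- Which ids are duplicated
SELECT id, COUNT(*) FROM TABLE_NAME_HERE GROUP BY id HAVING COUNT(*) > 1;

-- Current max id, to use as the starting value for @sequence
SELECT MAX(id) FROM TABLE_NAME_HERE;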

WARNING

Be aware that if the rows that were duplicated were referenced as FKs in another table, you will get some headaches (well, you already have problems…)!!

19c+ DBMS_STATS.GATHER_TABLE_STATS Memory Leaking on PGA

Hi all,

So, here is a quick post, though one that wasn't so easy to map.

It happens that right after a 19c upgrade from 18c, we started facing memory leak messages related to the PGA area on a regular basis. After a while following the v$process_memory allocations, it was possible to map it to an ODI routine running DBMS_STATS.GATHER_TABLE_STATS. The PGA exceeded the limit just a few minutes after it started.
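For reference, the kind of query used to follow that growth, a sketch joining v$process_memory with the session info and ordering by the biggest consumers:

SQL> select s.sid, s.username, s.program, pm.category, round(pm.allocated/1024/1024,1) alloc_mb, round(pm.max_allocated/1024/1024,1) max_alloc_mb from v$process_memory pm, v$process p, v$session s where pm.pid = p.pid and p.addr = s.paddr order by pm.allocated desc;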

Nice, huh? So what is happening?

Long story short: Bug 30846782 : 19C+ FAST/EXCESSIVE PGA GROWTH WHEN USING DBMS_STATS.GATHER_TABLE_STATS.

According to Oracle Docs: No Workaround!

But here is the golden info I want to share after an SR:

EXECUTE IMMEDIATE 'alter session set "_fix_control"=''20424684:OFF''';

You are welcome!

Cheers!

DBCA “Recovery manager failed to restore datafiles”

Hi all,

If that’s the 5th Blog Post with the same title one open, don’t close, READ THIS ONE!

This one is different.

It’s actually another case about Oracle throwing generic errors for DBCA where 99% of times it’s the same error, so all blogs are different but the same in essence, and none resolve your problem. So, let’s go by parts:

The Error from Client:

(Screenshot: DBCA error "Recovery manager failed to restore datafiles")

 

Generic Case (if this is the first blog you open about the subject):

  • Make sure the file $ORACLE_HOME/bin/oracle has its privileges set to 6751 in both (ASM and DB) homes. It should look like this:
[oracle@PROD01 bin]$ cd /u01/app/oracle/product/19c/db/bin
[oracle@PROD01 bin]$
[oracle@PROD01 bin]$ ls -ltr oracle*
-rwsr-s--x. 1 oracle asmadmin 441253104 Aug 27 22:29 oracle
  • If you are not sure, set it accordingly:
cd $ORACLE_HOME/bin && chmod 6751 oracle
  • Not yet? Check the disks assigned to ASM privileges and groups:
kfod status=TRUE asm_diskstring='/dev/asm*' disk=ALL
  • After all this, still not working? Go for the atypical case below:

 

Atypical Case (Exception)

After some struggle and no success, I started validating everything I could. A few words before the silver bullet here:

Noticed that I assumed you have ASM? How come? Well, most likely the error happens at the point where DBCA runs an RMAN restore to create the database base files and metadata, and it fails when the write happens on ASM, since this is the most failure-prone part of the process.

By looking deeper on the installation logs I could see:

[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: restoring datafile 00001 to +DATA
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: reading from backup piece /ora01/app/oracle/product/19c/db/assistants/dbca/templates/Seed_Database.dfb
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: ORA-19870: error while restoring backup piece /ora01/app/oracle/product/19c/db/assistants/dbca/templates/Seed_Database.dfb
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-19504: failed to create file "+DATA"
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-17502: ksfdcre:4 Failed to create file +DATA
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-15001: diskgroup "DATA" does not exist or is not mounted
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-01017: invalid username/password; logon denied

Bingo, so it’s a password issue?

Well, I’m creating the database and this actually matches with all the chmod 6751 thing…

What then?

Well, after a while going crazy validating passwd files and so on, I realized something about the oracle user:

[oracle@PROD01 bin]$ id -a
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),10(wheel),203(dba),503(asmadmin),504(asmoper),525(madhoc) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[oracle@PROD01 bin]$ grep oracle /etc/group
wheel:x:10:oracle
asmadmin:x:503:oracle
asmoper:x:504:oracle
madhoc:x:525:oracle

Can you see that the oracle user is part of the oinstall group but doesn't really appear for it in /etc/group? The same goes for the dba group.

Well, let’s force it?

[oracle@PROD01 bin]$ sudo su -
Last login: Fri Aug 28 14:13:21 PDT 2020 on pts/3
[root@DMSDB1PA ~]# usermod -g oinstall -G oinstall,dba,asmadmin,asmoper,madhoc oracle
[root@PROD01 ~]# id oracle
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),10(wheel),203(dba),503(asmadmin),504(asmoper),525(madhoc)
[root@PROD01 ~]# grep oracle /etc/group
wheel:x:10:oracle
dba:x:203:oracle
asmadmin:x:503:oracle
asmoper:x:504:oracle
oinstall:x:501:oracle
madhoc:x:525:oracle
[root@PROD01 ~]#

Well done!

Now try running DBCA again. That was a very tricky issue to find.

You know what else? At the point I was writing this, I decided to have another look and ended up finding this MOS note: "ORA-17502 /ORA-01017: invalid username/password; logon denied" While Creating 19c Database (Doc ID 2545858.1). There is even a bug for it: BUG:29821687 – ORA-17502 /ORA-01017: INVALID USERNAME/PASSWORD; LOGON DENIED WHILE CREATING 19C DATABASE

You have the workaround already though. Go champs!

Hope it helps you, cheers!

19c: Could not execute DBMS_LOGMNR.START_LOGMNR: ORA-44609: CONTINOUS_MINE is desupported for use with DBMS_LOGMNR.START_LOGMNR.

Hi all,
This is to show you that we should never trust documentation 100%, and how the yearly release cadence can put additional pressure on the documentation and cause errors…

So, I started supporting a new tool for data mining. There were no version restrictions as per their documentation, so I was more than happy about creating a PDB on my brand new 19c CDB, proudly using 19c for this new tool!

What happened?

Could not execute DBMS_LOGMNR.START_LOGMNR: ORA-44609: CONTINOUS_MINE is desupported for use with DBMS_LOGMNR.START_LOGMNR.

Doing my validation against the database:

SQL> execute dbms_logmnr.start_logmnr( options => dbms_logmnr.dict_from_online_catalog + SYS.DBMS_LOGMNR.CONTINUOUS_MINE);
BEGIN dbms_logmnr.start_logmnr( options => dbms_logmnr.dict_from_online_catalog + SYS.DBMS_LOGMNR.CONTINUOUS_MINE); END;

*
ERROR at line 1:
ORA-44609: CONTINOUS_MINE is desupported for use with DBMS_LOGMNR.START_LOGMNR.

Why is this happening?

This happens because the CONTINUOUS_MINE feature has been deprecated since 12.2 and is desupported/unavailable from 19c onwards.

It seems this log mining tool's devs don't check what changes in new database versions before confirming compatibility, right?

And they don’t do it or a while, as this as announced as deprecated on 12.2…

Which makes me wonder how confident they are about the tool running at their other clients… LOL

Anyway, are you facing the same? Here is some reference documentation to answer your boss with:

From: 19.1 Announcement – ORA-44609: CONTINOUS_MINE is Desupported For Use With DBMS_LOGMNR.START_LOGMNR (Doc ID 2456684.1):

  • CONTINUOUS_MINE was deprecated in Oracle Database 12c release 2 (12.2) and starting with 19.1 is desupported. There is no replacement functionality.
  • The continuous_mine option for the dbms_logmnr.start_logmnr package is desupported in Oracle Database 19c (19.1), and is no longer available.

The real reason behind it: nothing populates V$LOGMNR_LOGS automatically anymore, so the ORA-01291 occurs.

Here is a quick test case for you, from the 19c Oracle Database Utilities guide, "Mining without specifying the list of redo log files" (19c – 22. Using LogMiner to Analyze Redo Log Files).

I just put it all together to show you how it does [not] work:

  • Setting up it all:
ALTER DATABASE add SUPPLEMENTAL LOG DATA;
SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;
EXECUTE DBMS_LOGMNR_D.BUILD (OPTIONS=>DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
@?/rdbms/admin/dbmslm.sql
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
create user BLABLA identified by BLABLA default tablespace users quota unlimited on users profile default;
grant connect, resource to BLABLA;
  • Doing some stuff to generate logs:
connect BLABLA/BLABLA
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
set time on
CREATE TABLE TEST_NULLS (COLUMNA1 NUMBER(3,0));
ALTER TABLE TEST_NULLS ADD (COLUMNA2 NUMBER(3) DEFAULT 0 NOT NULL);
insert into TEST_NULLS(columna1) values (4);
commit;
select * from TEST_NULLS;
update TEST_NULLS set columna2=221 where columna1=4;
commit;
select * from TEST_NULLS;
  • Gathering info or mining:
connect / as sysdba;
set echo on
set serveroutput on
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
set linesize 254
set pagesize 3000
column name format a40;
SELECT FILENAME name FROM V$LOGMNR_LOGS;
SELECT NAME, FIRST_TIME FROM V$ARCHIVED_LOG;
SELECT NAME, FIRST_TIME FROM V$ARCHIVED_LOG WHERE SEQUENCE# = (SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE DICTIONARY_BEGIN = 'YES');
  • Enter the FIRST_TIME returned by the previous query as the start time:
EXEC DBMS_LOGMNR.START_LOGMNR(STARTTIME =>'&1',ENDTIME => SYSDATE,OPTIONS => DBMS_LOGMNR.DICT_FROM_REDO_LOGS + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);
EXEC DBMS_LOGMNR.START_LOGMNR(STARTTIME =>'&1',ENDTIME => SYSDATE,OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);
select FIRST_CHANGE#,NEXT_CHANGE# from V$archived_log;
SELECT CHECKPOINT_CHANGE#, CURRENT_SCN FROM V$DATABASE;
  • Use the CHECKPOINT_CHANGE# as STARTSCN and the CURRENT_SCN as ENDSCN:
EXEC DBMS_LOGMNR.START_LOGMNR(STARTSCN =>&1,ENDSCN => &2,OPTIONS => DBMS_LOGMNR.DICT_FROM_REDO_LOGS + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);
EXEC DBMS_LOGMNR.START_LOGMNR(STARTSCN =>&1,ENDSCN => &2,OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);
EXEC DBMS_LOGMNR.START_LOGMNR(STARTTIME =>SYSDATE, ENDTIME => SYSDATE +5/24,OPTIONS => DBMS_LOGMNR.DICT_FROM_REDO_LOGS + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);
EXEC DBMS_LOGMNR.START_LOGMNR(STARTTIME =>SYSDATE ,ENDTIME => SYSDATE +5/24 ,OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.PRINT_PRETTY_SQL);

Ref: 19c – 22. Using LogMiner to Analyze Redo Log Files

  • For all of these, you can expect:
ERROR at line 1:
ORA-01291: missing log file
ORA-06512: at "SYS.DBMS_LOGMNR", line 72
ORA-06512: at line 1

Hmmmm… So, the 19c Documentation is not working? Precisely.

As per (Doc ID 2613490.1), this will be fixed in 20.1 documentation.

  • Sections 22.13.2 Examples of Mining without specifying the list of redo log files explicitly and child example topics will be removed.
  • Section 22.4.2. Automatic Redo log Files options will be changed to Specifying Redo Log Files for Data Mining.
  • Section 22.7.4 The next sentence will be removed too.

In summary, any reference to automatic mining in the documentation will be removed, as this feature is not supported in 19.1 and higher.

Ok, but this doesn’t solve my problem. What should I do with the client tool?

So, CONTINUOUS_MINE was the only method of starting LogMiner without first adding logfiles using DBMS_LOGMNR.ADD_LOGFILE.

What can you do to work around it?

  1. Add a logfile manually with DBMS_LOGMNR.ADD_LOGFILE for each logfile.
  2. Remove SYS.DBMS_LOGMNR.CONTINUOUS_MINE from the code.
    1. For this, you'll need to specify the logfiles explicitly, so I guess some additional coding will be needed on your side… A minimal sketch of this approach is right below.

I hope it helps!
Matheus.

Exadata: Cell Server Crashing on ORA-00600: [LinuxBlockIO::reap]

Hi all,

So I started facing this in a client environment. Here is the alert message:

Target name=db01cel08.xxx.com
Message=ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []
Event reported time=Dec 19, 2019 2:14:16 AM EDT

When checking on the cellserver I see this message:

[root@db01 ~]# ssh db01cel08
Last login: Thu Dec 19 04:45:13 2019 from db01.xxx.com
[root@db01cel08 ~]# cellcli
CellCLI: Release 12.1.2.3.5 - Production on Fri Dec 19 17:13:31 EDT 2019

Copyright (c) 2007, 2016, Oracle. All rights reserved.

CellCLI> LIST ALERTHISTORY detail

[...]

name: 10
alertDescription: "ORA-07445: exception encountered: core dump [__intel_new_memset()+62] [11] [0x000000000] [] [] []"
alertMessage: "ORA-07445: exception encountered: core dump [__intel_new_memset()+62] [11] [0x000000000] [] [] []"
alertSequenceID: 10
alertShortName: ADR
alertType: Stateless
beginTime: 2019-12-19T02:00:04-04:00
endTime:
examinedBy:
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:04-04:00
severity: critical
alertAction: "Errors in file /opt/oracle/cell/log/diag/asm/cell/SYS_112331_170406/trace/cellofltrc_19796_53.trc (incident=25). Diagnostic package is attached. It is also accessible at https://db01cel08.xxx.com/diagpack/download?name=db01cel08_2019_12_19T02_00_04_10.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

name: 11
alertDescription: "ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []"
alertMessage: "ORA-00600: internal error code, arguments: [LinuxBlockIO::reap], [0x60000D502388], [], [], [], [], [], [], [], [], [], []"
alertSequenceID: 11
alertShortName: ADR
alertType: Stateless
beginTime: 2019-12-19T02:00:04-04:00
endTime:
examinedBy:
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:04-04:00
severity: critical
alertAction: "Errors in file /opt/oracle/cell/log/diag/asm/cell/db01cel08/trace/svtrc_9174_12.trc (incident=25). Diagnostic package is attached. It is also accessible at https://db01cel08.xxxx.com/diagpack/download?name=jdb01cel08_2019_12_19T02_00_04_11.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

name: 12_1
alertDescription: "A SQL PLAN quarantine has been added"
alertMessage: "A SQL PLAN quarantine has been added. As a result, Smart Scan is disabled for SQL statements with the quarantined SQL plan. Quarantine id : 21 Quarantine type : SQL PLAN Quarantine reason : Crash Quarantine Plan : SYSTEM Quarantine Mode : FULL_Quarantine DB Unique Name : XPTODB Incident id : 25 SQLID : 8j0az9sgxs5yh SQL Plan details : {SQL_PLAN_HASH_VALUE=281152830, PLAN_LINE_ID=9} In addition, the following disk region has been quarantined, and Smart Scan will be disabled for this region: Disk Region : {Grid Disk Name=Unknown, offset=186750337024, size=1M} "
alertSequenceID: 12
alertShortName: Software
alertType: Stateful
beginTime: 2019-12-19T02:00:12-04:00
examinedBy:
metricObjectName: QUARANTINE/21
notificationState: 1
sequenceBeginTime: 2019-12-19T02:00:12-04:00
severity: critical
alertAction: "A SQL statement caused the Cell Server (CELLSRV) service on the cell to crash. A SQL PLAN quarantine has been created to prevent the same SQL statement from causing the same cell to crash. When possible, disable offload for the SQL statement or apply the RDBMS patch that fixes the crash, then remove the quarantine with the following CellCLI command: CellCLI> drop quarantine 21 All quarantines are automatically removed when a cell is patched or upgraded. For information about how to disable offload for the SQL statement, refer to the section about 'SQL Processing Offload' in Oracle Exadata Storage Server User's Guide. Diagnostic package is attached. It is also accessible at https://db01cel08.xxx.com/diagpack/download?name=db01cel08_2019_12_19T02_00_12_12_1.tar.bz2 It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at https://db01cel08.xxx.com/diagpack"

CellCLI>

After some research, we could match the situation to Bug 13245134 – Query may fail with errors ORA-27618, ORA-27603, ORA-27626 or ORA-00600[linuxblockio::reap_1] or ora-600 [cacheput::process_1]

It’s also described as per: Exadata/SuperCluster: 11.2 databases missing fix for the bug 13245134 may lead to cell service crash with ora-600 [linuxblockio::reap_1]/ora-600 [cacheput::process_1] or ORA-27626: Exadata error: 242/Smart scan issues on the RDBMS side

In order to resolve the crashes quickly, I applied the patch online with:
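(The exact command didn't make it into this post, but an online OPatch apply for this fix looks roughly like the sketch below. The patch staging directory, instance name, SYS password, and node name are hypothetical placeholders:)

$ cd /tmp/patches/13245134
$ $ORACLE_HOME/OPatch/opatch apply online -connectString xptodb1:sys:<sys_password>:db01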

After applying, all got solved:

[oracle@db01 ~]$ /oracle/xptodb/product/11.2.0.4/OPatch/opatch lsinventory -OH $ORACLE_HOME | grep 13245134
Patch (online) 13245134: applied on Thu Dec 19 23:34:50 EST 2019
13245134
[oracle@db01 ~]$

Hope it helps!