Oracle: “Invisible” Locks!

Have you ever heard:
“My session was killed. I’m trying again. There are no locks, but my SQL doesn’t run!”

By default, when a session is disconnected from Oracle, its uncommitted SQL is undone (rolled back). So, in case of a long-running transaction, there is probably a rollback in place.

How can an Oracle rollback take that long? Well, that’s how Oracle’s mechanism works. The RDBMS basically assumes you know what you are doing and starts writing the new blocks right away, so the commit will be very quick. Which, however, is bad in case of rollbacks.

How to check if this is your case?

select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta
from x$ktuxe
where ktuxecfl = 'DEAD'
/

  KTUXEUSN Time                   KTUXESIZ KTUXESTA
---------- -------------------- ---------- ----------------
        31 07-APR-2018 16:48:53      17705 ACTIVE
        48 07-APR-2018 16:48:53     108856 ACTIVE

Hm.. What can I do? Not much, really: we need to wait for the rollback to complete.
One thing that can be done to speed up the rollback is changing the fast_start_parallel_rollback parameter, as per:

SQL> show parameters roll

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
rollback_segments                    string
transactions_per_rollback_segment    integer     5
SQL> alter system set fast_start_parallel_rollback=HIGH;

System altered.
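By the way, KTUXESIZ is the count of undo blocks still to be applied, so sampling it twice gives a rough completion estimate. A minimal sketch (the interval and the arithmetic here are mine, not an exact formula):

-- Sample the remaining undo blocks, wait a known interval, then sample again:
select ktuxeusn, ktuxesiz from x$ktuxe where ktuxecfl = 'DEAD';
-- rate      = (first KTUXESIZ - second KTUXESIZ) / seconds elapsed
-- remaining = second KTUXESIZ / rate   (in seconds, per undo segment)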

Hope it helps!
Cheers!

Auditing Logons with Triggers

Hello all!
So, I was attending a client who wanted to audit logons on the database without the Advanced Security pack. Complicated? Not at all.

Performant? Ehh, not really the best option, but Oracle is not really fast at opening sessions anyway (this is why we have connection pools), so I consider it not bad…

Ok, how to do it?

A sequence for ID control:

create sequence sys.logon_capture_seq;

A table for logon logging:

create table sys.logon_capture (
 id                     number,
 capture_time           date,
 authenticated_identity varchar2(30),
 authentication_method  varchar2(30),
 identification_type    varchar2(30),
 network_protocol       varchar2(30),
 session_user           varchar2(30),
 os_user                varchar2(30),
 host                   varchar2(30),
 ip_address             varchar2(30),
 program                varchar2(30),
 module                 varchar2(30),
 action                 varchar2(30),
 service_name           varchar2(30))
tablespace logon_capture;

* Here is an important point: always be sure to have enough space in this tablespace; otherwise, all new connections can freeze.

Create the logon trigger:

create or replace trigger SYS.trg_capture_logons
after logon on database
when (SYS_CONTEXT ('USERENV', 'SESSION_USER') not in ('SYS'))
begin
  insert into sys.logon_capture (id,capture_time,authenticated_identity,authentication_method,identification_type,network_protocol,session_user,os_user,host,ip_address,program,module,action,service_name)
  select
    sys.logon_capture_seq.nextval,
    sysdate,
    substr(sys_context('userenv','authenticated_identity'),1,30),
    substr(sys_context('userenv','authentication_method'),1,30),
    substr(sys_context('userenv','identification_type'),1,30),
    substr(sys_context('userenv','network_protocol'),1,30),
    substr(sys_context('userenv','session_user'),1,30),
    substr(sys_context('userenv','os_user'),1,30),
    substr(sys_context('userenv','host'),1,30),
    substr(sys_context('userenv','ip_address'),1,30),
    substr(program,1,30),
    substr(sys_context('userenv','module'),1,30),
    substr(sys_context('userenv','action'),1,30),
    substr(sys_context('userenv','service_name'),1,30)
  from v$session
  where sid = sys_context('userenv','sid');
  commit;
exception
  when others then null; -- never block a logon because the capture insert failed
end;
/

Know what is a good idea? A cleanup job, keeping only the last 6 months of “audit” data:

begin
  DBMS_SCHEDULER.CREATE_JOB (
   job_name             => 'SYS.PURGE_LOGON_CAPTURE',
   job_type             => 'PLSQL_BLOCK',
 job_action           => 'begin delete from sys.logon_capture where capture_time < add_months(sysdate, -6); commit; end;',
 number_of_arguments  => 0,
   start_date           => trunc(sysdate+1) + 23/24,
   repeat_interval      => 'FREQ=DAILY;BYHOUR=23;BYMINUTE=40;BYSECOND=0',
   enabled              => false,
   auto_drop            => false,
   comments             => '');
end;
/
exec DBMS_SCHEDULER.ENABLE ( name => 'SYS.PURGE_LOGON_CAPTURE' );
select owner, enabled from dba_scheduler_jobs where job_name = 'PURGE_LOGON_CAPTURE';
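Once data starts coming in, a quick look at who is connecting is straightforward. For illustration (this query is mine, built on the table's own columns):

select session_user, os_user, host, count(*) logons
from sys.logon_capture
where capture_time > sysdate - 1
group by session_user, os_user, host
order by logons desc;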

Hope it helps you!

CRS-10051: CVU found following errors with Clusterware setup : PRCW-1015 : Wallet % does not exist

Hello all!
So, recently I found this error in a CRS alert log from a client environment. Interesting error…

2018-03-26 16:33:53.277 [SRVM(9624)]CRS-10051: CVU found following errors with Clusterware setup : PRCW-1015 : Wallet greporadb does not exist. 
CLSW-9: The cluster wallet to be operated on does not exist. :[1015]PRCW-1015 : Wallet greporadb does not exist.

I found this also related to the error:

PRVG-1512 : Failed to retrieve current selection of public and private network classifications

So, it was mapped to the known Bug 18234669, as described in CRS-10051: CVU Found Following Errors With Clusterware Setup :PRCW-1015 : Wallet ora603ut does not exist (Doc ID 2008466.1).

Ok, but what to do?

1) In case you have the wallet, you can simply add it back to the cluster:

crsctl add wallet -type CVUDB -name [dbname]

2) In case you don’t have it, you can simply disable the resource ora.cvu, which is the one running this check:

> Checking Status

[root@grepora-srv ~]# crsctl stat res ora.cvu -p | grep CHECK_RESULT
CHECK_RESULTS=PRVG-1512 : Failed to retrieve current selection of public and private network classifications,PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet c4prod does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRVG-1512 : Failed to retrieve current selection of public and private network classifications,PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet c4prod does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRVG-1512 : Failed to retrieve current selection of public and private network classifications,PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet c4prod does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRVG-1512 : Failed to retrieve current selection of public and private network classifications,PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015],PRCW-1015 : Wallet greporadb does not exist. ,CLSW-9: The cluster wallet to be operated on does not exist. :[1015]

> Disable CVU:

oracle:grepora-srv: srvctl disable cvu
oracle:grepora-srv:
oracle:grepora-srv: crsctl stat res ora.cvu -p | grep ENABLED
ENABLED=0
oracle:grepora-srv: srvctl status cvu                       
CVU is disabled
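
If you later recreate the wallet or apply the fix for the bug, re-enabling it should be just the reverse (a sketch):

srvctl enable cvu
srvctl start cvu
srvctl status cvu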

Hope it helps!

Services [not] Starting Automatically with CRS after Reboot

Hello all!
So, these days a client asked me to check why his database was not starting with CRS after a reboot. I started investigating and noticed the following…

On Oracle 11.2, the database auto-start policy in the clusterware is restore, which means that clusterware will remember the last state of the database. Like the database, Oracle 11.2 ships by default with several important resources that have the attribute AUTO_START=restore in their profiles.

With that, if the database was stopped normally, then on the next restart of clusterware it won’t be started. On the other hand, if the server crashes or the OS is rebooted for some reason, then clusterware will start the database, because its last state was ONLINE (running).

In my case, a second reboot was done with the database down, and this is why the database didn’t start. The same applied to some other resource types.

The conclusion I reached?
For best availability, it is a good practice to change this default behaviour after each 11.2 installation. Some checking and adjusting scripts below:

To check all Resources:

crsctl stat res -p

Checking on database resource:

...
NAME=ora.grepora.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
...

But besides the DB, several resources were set to restore or even never, like ASM and the Voting Disk DG… Check AUTO_START for all resources:

[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=never
AUTO_START=never
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore

Well, so to fix all the resources:

crsctl modify resource ora.APS1.dg -attr AUTO_START=always
crsctl modify resource ora.DG1.dg -attr AUTO_START=always
crsctl modify resource ora.FRA.dg -attr AUTO_START=always
crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
crsctl modify resource ora.asm -attr AUTO_START=always
crsctl modify resource ora.gsd -attr AUTO_START=always
crsctl modify resource ora.net1.network -attr AUTO_START=always
crsctl modify resource ora.ons -attr AUTO_START=always
crsctl modify resource ora.registry.acfs -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
crsctl modify resource ora.cvu -attr AUTO_START=always
crsctl modify resource ora.grepora.db -attr AUTO_START=always
crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
crsctl modify resource ora.oc4j -attr AUTO_START=always
crsctl modify resource ora.scan1.vip -attr AUTO_START=always
crsctl modify resource ora.scan2.vip -attr AUTO_START=always
crsctl modify resource ora.scan3.vip -attr AUTO_START=always
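
By the way, rather than typing all of that by hand, you can generate the commands from the current profiles with a one-liner. A sketch (review the generated commands before running them):

crsctl stat res -p | awk -F= '/^NAME=/{n=$2} /^AUTO_START=/{if ($2 != "always") print "crsctl modify resource " n " -attr AUTO_START=always"}'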

Below the output and the checking:

[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.asm -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.gsd -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.net1.network -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.ons -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.registry.acfs -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.cvu -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.grepora.db -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.oc4j -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan3.vip -attr AUTO_START=always
[root@greporasrv1 ~]#
[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
[root@greporasrv1 ~]#


After Patch: MRP0: Background Media Recovery terminated with error 10485

Ok,
I had that some time ago after applying Patch 27475598 – Oracle JavaVM Component 11.2.0.4.180417 Database PSU.
Why? Well, this patch is not RAC Rolling Installable and also not Data Guard Standby-First Installable.

This means there is downtime for this patch, no escape.

I had to do the following (skipping all the standard opatch steps, which you can see in the README):

  • Stop DG Replication:
dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-off';
show database mydg
  • Run postinstall.sql in upgrade mode with only 1 instance up (disable RAC):
cd $ORACLE_HOME/sqlpatch/27475598
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> alter system set cluster_database=false scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP UPGRADE
SQL> @postinstall.sql
SQL> alter system set cluster_database=true scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP

Ok, all good, seems all fine.

But now when starting my DG replication:

dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-on';
show database mydg

What I see is:

DGMGRL> show database mydg

Database - mydg

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds (computed 1 second ago)
  Apply Lag:       41 minutes 53 seconds (computed 1 second ago)
  Apply Rate:      (unknown)
  Real Time Query: OFF
  Instance(s):
    myprod

  Database Error(s):
    ORA-16766: Redo Apply is stopped

Database Status:
ERROR

DGMGRL>

And on Database Alert Log:

MRP0: Background Media Recovery terminated with error 10485
Errors in file /u01/app/oracle/diag/rdbms/axwest/greporaprod/trace/greporaprod_pr00_42628.trc:
ORA-10485: Real-Time Query cannot be enabled while applying migration redo.
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (greporaprod)

Well, in my case it happened because I use Active Data Guard, with the standby open read only. The solution? Start your DG in mount mode to apply the migration redo!
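
Something along these lines (a sketch of the idea, not verbatim steps from my terminal):

-- On the standby: stay in mount mode while the migration redo applies
SQL> shutdown immediate
SQL> startup mount
DGMGRL> edit database 'mydg' set state='apply-on';
-- once the apply lag reaches zero:
SQL> alter database open read only;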

This is well described as per MOS: MRP process getting terminated with error ORA-10485 (Doc ID 1618485.1).

After getting in sync, you can simply promote it to read-only mode again.

Hope it helps!

12c New Feature: Unused Privileges

Did you know 12c has a mechanism to examine which privileges are actually being used by a user, a module, or the database as a whole, as opposed to merely the privileges granted?
This is a great security improvement: it helps you give users precisely the privileges they need, without granting too much.

Here’s how it works:

1. First you have to set up a capture process using the new package DBMS_PRIVILEGE_CAPTURE.

# Example:

BEGIN 
DBMS_PRIVILEGE_CAPTURE.CREATE_CAPTURE( 
        name         => 'all_priv_analysis_pol', 
        description  => 'database-wide policy to analyze all privileges', 
        type         => DBMS_PRIVILEGE_CAPTURE.G_DATABASE); 
END;
/
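
If I recall correctly, CREATE_CAPTURE only creates the policy; you still have to enable it before anything is recorded (check the docs for your exact release):

BEGIN 
DBMS_PRIVILEGE_CAPTURE.ENABLE_CAPTURE( 
        name         => 'all_priv_analysis_pol'); 
END;
/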

With the policy created and enabled, the capture runs database wide. Let it run for a bit.
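
When you have captured enough, disable the policy first; as far as I know, GENERATE_RESULT expects the capture to be disabled:

BEGIN 
DBMS_PRIVILEGE_CAPTURE.DISABLE_CAPTURE( 
        name         => 'all_priv_analysis_pol'); 
END;
/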

2. Now you can either generate a report:

# Example

BEGIN 
DBMS_PRIVILEGE_CAPTURE.GENERATE_RESULT( 
        name         => 'all_priv_analysis_pol'); 
END;
/

Or examine the new views:

DBA_USED_PRIVS 
DBA_UNUSED_PRIVS 
DBA_USED_OBJPRIVS 
DBA_UNUSED_OBJPRIVS

3. To clean up, drop the capture:

BEGIN 
DBMS_PRIVILEGE_CAPTURE.DROP_CAPTURE( 
        name         => 'all_priv_analysis_pol'); 
END;
/

So now you can ensure you follow the best practice of least privilege for your users.

Cheers!

Exacheck: The bundle patch version installed does not match the bundle patch version registered in the database

Hi all!
So, running Exacheck on a recently created database, found this error:

 FAIL => The bundle patch version installed does not match the bundle patch version registered in the database: [host]:[sid],...

This means that a bundle patch with sqlpatch was applied to the Oracle Home but not to this database. It happens because Exacheck tries to match the patch info stored in oraInventory with the patch info stored in dba_registry_sqlpatch.

Also note that, in some situations, running datapatch may require the database to be in upgrade mode; and if you are patching Exadata, which is generally a RAC-based environment, you need to set cluster_database=false and job_queue_processes to at least 1 before starting the database with the startup upgrade command. This should be described in the README of the related patch.

When checking for this, I found a really interesting validation script here. As per:

opatch_bp=$($ORACLE_HOME/OPatch/opatch lspatches 2>/dev/null|grep -iwv javavm|grep -wi database|head -1|awk -F';' '{print $1}') 
database_bp_status=$(echo -e "set heading off feedback off timing off \n select STATUS from dba_registry_sqlpatch where PATCH_ID = $opatch_bp;"|$ORACLE_HOME/bin/sqlplus -s " / as sysdba" | sed -e '/^ *$/d')
if [ "$database_bp_status" == SUCCESS ]
then
      echo "SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully."
else
      echo "FAILURE: Bundle patch installed in the database does not match the software home, or is installed with errors." 
 fi

To fix, just set the environment variables to the correct database, go to $ORACLE_HOME/OPatch and run datapatch:

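The usual invocation looks like this (a sketch; confirm the exact flags in the patch README):

cd $ORACLE_HOME/OPatch
./datapatch -verbose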

Trace on Standby: Tracing MRP Process

Hey all!
Recently I got the following error when trying to start a trace on a standby. The idea was to trace the MRP process to check on performance, but this happened when trying to use DBMS_MONITOR.session_trace_enable:

oracle:standby-srvr /tmp: sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jan 19 14:07:56 2018

Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> exec DBMS_MONITOR.session_trace_enable(session_id =>1970, serial_num=>55126, waits=>TRUE, binds=>TRUE);
BEGIN DBMS_MONITOR.session_trace_enable(session_id =>1970, serial_num=>55126, waits=>TRUE, binds=>TRUE); END;

*
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_MONITOR.SESSION_TRACE_ENABLE' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

OMG!
Don’t panic, you can accomplish the same using oradebug:

-- In my case:
SQL> select process,pid from V$managed_standby where process like '%MRP%';

PROCESS   PID
--------- ------------------------
MRP0      45693

SQL> oradebug setospid 19564
Oracle pid: 105, Unix process pid: 19564, image: oracle@standby-srvr (MRP0)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug Event 10046 trace name context forever, level 12
Statement processed.
SQL>

You can apply the same to any process/SID you need.

In case you want to get the SPID from a SID:

col machine format a30
col process format 999999
select p.spid,b.sid, p.pid
from v$session b, v$process p
where b.paddr=p.addr
and sid=&sid
/

Ok, and how do you stop it then?
Easy:

SQL> oradebug setospid 19564
SQL> oradebug Event 10046 trace name context off
SQL> oradebug tracefile_name
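
oradebug tracefile_name prints the trace file path; from there you can format it with tkprof, for example (the file name below is just a placeholder):

tkprof <trace_file>.trc mrp_trace.txt sys=no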

Enjoy!

Recover Standby Using an Incremental Backup

Hey all,
I know there are a lot of posts about this on the internet, but I’m writing my own anyway. The main reason is that the others have lots of steps and outputs, and I miss simplicity. Hope you enjoy this one as well.

But why?
Sometimes the standby database fails for some reason (for example, after NOLOGGING actions in the primary database), and then it can be necessary to recover the datafiles from an incremental backup, as the archived logs will not contain the needed information.

The recovery of the standby is relatively simple.

1. Take a cumulative incremental backup on the primary

Example

Either take a level 1 backup

backup as compressed backupset incremental level 1 cumulative database filesperset 15;

Or get the scn to recover from on the standby using the following SQL

select min(f.fhscn) from x$kcvfh f, v$datafile d where f.hxfil =d.file# and d.enabled != 'READ ONLY';

2. Run the backup using the SCN

backup as compressed backupset incremental from scn &scn database;

3. Create a copy of controlfile for Standby

alter database create standby controlfile as '/tmp/newctfl.ctl';

4. Copy incremental backup files and controlfile to standby

5. Restore copied Controlfile

Example

SQL> shutdown abort; 
SQL> startup nomount; 
RMAN> restore controlfile from '/tmp/newctfl.ctl'; 
SQL> alter database mount;

6. Catalog the backup files in the standby controlfile

Example

catalog start with '/tmp/rmanbackup/';

7. Recover the database with no redo

Example

recover database noredo;

8. Start the managed recovery

Example

alter database recover managed standby database using current logfile disconnect from session;
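
To confirm the standby is catching up again, something like this does the trick (a sketch):

select name, value from v$dataguard_stats where name in ('apply lag', 'transport lag');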

This is also an easier alternative to the much more complex process described in MOS Rolling a Standby Forward using an RMAN Incremental Backup To Fix The Nologging Changes (Doc ID 958181.1).

Hope you enjoy!
Cheers!

CRS Not Starting after Removing OS User: How to Workaround and How to Solve!

Hello all!
Turns out that a few days ago a client reached me because his CRSD was simply not starting. Like this:

[root@proddb proddb]$ ./crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'proddb'
CRS-2676: Start of 'ora.crsd' on 'proddb' succeeded

[root@proddb proddb]$ ps -ef |grep crsd
root 19217 13424 0 11:53 pts/0 00:00:00 grep crsd

After some investigation, I found the following:

2017-01-24 14:00:06.859: [ CRSSEC][1690195712]{1:51052:2} Exception: OwnerEntry construction failed to retrieve user id by name with ACL string: owner:jacknobody:rwx and error: 1
2017-01-24 14:00:06.912: [ CRSSEC][1690195712]{1:51052:2} Exception: ACL entry creation failed for: owner:jacknobody:rwx

Hmmm, seems some CRS resources are owned by “jacknobody”… Turns out this user had been removed from the OS:

[root@proddb proddb]$ cat /etc/passwd |grep jacknobody
[root@proddb proddb]$ 

What to do now?

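One possible approach (a sketch; it assumes the ACL simply references the dropped user by name, and the resource name below is illustrative):

# Workaround: recreate the missing OS user so the ACL owner resolves again
useradd jacknobody
crsctl start res ora.crsd -init

# Proper fix afterwards: reassign the resource ACL to a valid owner
crsctl setperm resource my.resource -o oracle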