12cR1 – Stop CRS: OC4J Fails to Stop

Hi all!
So, I started to to see this error to stop a 12.1.0.2 CRS :

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'grepora-srv-01'
CRS-2673: Attempting to stop 'ora.crsd' on 'grepora-srv-01'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'grepora-srv-01'
CRS-2679: Attempting to clean 'ora.oc4j' on 'grepora-srv-01'
CRS-2680: Clean of 'ora.oc4j' on 'grepora-srv-01' failed
CRS-2799: Failed to shut down resource 'ora.oc4j' on 'grepora-srv-01'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'grepora-srv-01' has failed
CRS-2675: Stop of 'ora.crsd' on 'grepora-srv-01' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'grepora-srv-01'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'grepora-srv-01' has failed
CRS-4687: Shutdown command has completed with errors.

When running a crsctl stat res -t it will show something like this

ora.oc4j 1 ONLINE UNKNOWN grpora-srv-01 CLEANING

Solution? Follow The rootupgrade.sh fails because the oc4j resource fails to stop (Doc ID 1921629.1):

1) Stop the OC4J resource, as grid user:

 srvctl stop oc4j -f

Note: Even if the oc4j does not stop, continue with step 2
2) Disable the OC4J resource, as grid user:

 srvctl disable oc4j

3) Confirm the OC4J resource is NOT running, as grid user:

 srvctl status oc4j

If the OC4J resource is still running, go to the node where oc4j is running and recycle GI by issuing the following as root:

 crsctl stop crs -f
 crsctl start crs

Why?
oc4j is used only for Quality of Service Management (QoS feature), as per:
https://docs.oracle.com/database/121/APQOS/install_config.htm#APQOS151
http://docs.oracle.com/cd/E11882_01/server.112/e24611.pdf

Searched on MOS and only found notes saying we cannot remove default services (it is not supported). But as a DBA, in the middle of the night with environment down, this may be a solution for you, specially if you are not using QoS.

Hope it helps. Cheers!

GRID Infrastructure life after rootcrs.pl -deconfig -force -verbose -lastnode

Hi all,

I had a client which asked us to reconfigure the GRID infrastructure on 11g after the did a:

$GI_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode

The “deconfig” option is used when we need to remove the GI configuration cleanly and the “lastnode”  is executed on the last cluster node.

But what we need to do to recreate the the cluster? Well most would say “Run root.sh again” and that should solve it on most cases.

But when I tried to execute it I have several issues on crsconfig_params file. I tried to manually add the missing data and as there was much info to add but what to do next?

A colleague pointed to 2 MOS notes:

How to Deconfigure/Reconfigure(Rebuild OCR) or Deinstall Grid Infrastructure (Doc ID 1377349.1)
How to Configure or Re-configure Grid Infrastructure With config.sh/config.bat (Doc ID 1354258.1)

So if you follow those notes  you should prepare a response run the config.sh to create a proper crsconfig_params and then run root.sh

$GI_HOME/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /app/11.2.0.4/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/11.2.0.4/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'greporarac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'greporarac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'greporarac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'greporarac1'
CRS-2676: Start of 'ora.diskmon' on 'greporarac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'greporarac1' succeeded
PROT-1: Failed to initialize ocrconfig
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup

Failed to create Oracle Cluster Registry configuration, rc 255
Oracle Grid Infrastructure Repository configuration failed at /app/11.2.0.4/grid/crs/install/crsconfig_lib.pm line 6911.
/app/11.2.0.4/grid/perl/bin/perl -I/app/11.2.0.4/grid/perl/lib -I/app/11.2.0.4/grid/crs/install /app/11.2.0.4/grid/crs/install/rootcrs.pl execution failed
$GI_HOME/crs/config/config.sh -silent -responseFile $GI_HOME/crs/config/grid_configwizard_1.rsp -ignorePreReq

As a root user, execute the following script(s):
1. /app/11.2.0.4/grid/root.sh

Execute /app/11.2.0.4/grid/root.sh on the following nodes:
[rac1, rac1]

Successfully Setup Software.
[WARNING] [INS-32091] Software installation was successful. But some configuration assistants failed, were cancelled or skipped.
ACTION: Refer to the logs or contact Oracle Support Services.
oracle@rac1:/app/11.2.0.4/grid/crs/config>


root@rac1 /app/11.2.0.4/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /app/11.2.0.4/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /app/11.2.0.4/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to oracle-ohasd.conf

CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded

ASM created and started successfully.

Disk Group CRS created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 04713db813e14f5abf6b385896b1ca1d.
Successful addition of voting disk 8910f31e58db4f30bfdca40e34a0ffbd.
Successful addition of voting disk 7fefe24a8d4b4fcbbfd2d25a4307a1e0.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 04713db813e14f5abf6b385896b1ca1d (/dev/mapper/ora-pure-ractestwt1-crs-1) [CRS]
2. ONLINE 8910f31e58db4f30bfdca40e34a0ffbd (/dev/mapper/ora-pure-ractestwt1-crs-2) [CRS]
3. ONLINE 7fefe24a8d4b4fcbbfd2d25a4307a1e0 (/dev/mapper/ora-pure-ractestwt1-crs-3) [CRS]
Located 3 voting disk(s).


CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.CRS.dg' on 'rac1'
CRS-2676: Start of 'ora.CRS.dg' on 'rac1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.

p_raghu's password on rac1:
Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /app/11.2.0.4/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /app/11.2.0.4/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to oracle-ohasd.conf

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

 

Services [not] Starting Automatically with CRS after Reboot

Hello all!
So, these days a client asked me to check why his database was not starting with CRS after reboot. So I started investigating and noticed this…

On Oracle 11.2, the database auto start policy in the clusterware is restore, which means that clusterware will remember the last state of the database. As well as database, Oracle 11.2 comes by default with several important resources with attribute AUTO_START=restore in the profile.

With that, if the database was stopped normally then on the next restart of clusterware it won’t be started. Otherwise if the server crashes or by some reason the OS is rebooted then clusterware will start the database because last state was ONLINE (running).

In my case, a second reboot was done with database down, and this is why database didn’t started. And same applied to some other target types.

The conclusion I get?
For best availability, it is a good practice to change this default behaviour after each installation on 11.2. Some checking and adjusting scripts below:

To check all Resources:

crsctl stat res -p

Checking on database resource:

...
NAME=ora.grepora.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
...

But besides DB, several resources were in restore or even in never, like ASM and Voting Disk DG… Check for all resourced AUTO_START:

[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=never
AUTO_START=never
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore

Well, so to fix all the resources:

crsctl modify resource ora.APS1.dg -attr AUTO_START=always
crsctl modify resource ora.DG1.dg -attr AUTO_START=always
crsctl modify resource ora.FRA.dg -attr AUTO_START=always
crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
crsctl modify resource ora.asm -attr AUTO_START=always
crsctl modify resource ora.gsd -attr AUTO_START=always
crsctl modify resource ora.net1.network -attr AUTO_START=always
crsctl modify resource ora.ons -attr AUTO_START=always
crsctl modify resource ora.registry.acfs -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
crsctl modify resource ora.cvu -attr AUTO_START=always
crsctl modify resource ora.grepora.db -attr AUTO_START=always
crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
crsctl modify resource ora.oc4j -attr AUTO_START=always
crsctl modify resource ora.scan1.vip -attr AUTO_START=always
crsctl modify resource ora.scan2.vip -attr AUTO_START=always
crsctl modify resource ora.scan3.vip -attr AUTO_START=always

Below the output and the checking:

[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.asm -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.gsd -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.net1.network -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.ons -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.registry.acfs -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.cvu -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.grepora.db -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.oc4j -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan3.vip -attr AUTO_START=always
[root@greporasrv1 ~]#
[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
[root@greporasrv1 ~]#

 

After Patch: MRP0: Background Media Recovery terminated with error 10485

Ok,
I had that some time ago after applying Patch 27475598 – Oracle JavaVM Component 11.2.0.4.180417 Database PSU.
Why? Well, this is Non RAC-Rolling Installable and also Not Data Guard Standby First Installable.

This means there downtime for this patch, no escape.

I had to (skipping all the standard opatch steps, you can see those on README):

  • Stop DG Replication:
dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-off';
show database mydg
  • Run postinstall.sql in upgrade mode with only 1 instance on (disable RAC):
cd $ORACLE_HOME/sqlpatch/27475598
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> alter system set cluster_database=false scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP UPGRADE
SQL> @postinstall.sql
SQL> alter system set cluster_database=true scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP

Ok, all good, seems all fine.

But now when starting my DG replication:

dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-on';
show database mydg

What I see is:

DGMGRL> show database mydg

Database - mydg

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds (computed 1 second ago)
  Apply Lag:       41 minutes 53 seconds (computed 1 second ago)
  Apply Rate:      (unknown)
  Real Time Query: OFF
  Instance(s):
    myprod

  Database Error(s):
    ORA-16766: Redo Apply is stopped

Database Status:
ERROR

DGMGRL>

And on Database Alert Log:

MRP0: Background Media Recovery terminated with error 10485
Errors in file /u01/app/oracle/diag/rdbms/axwest/greporaprod/trace/greporaprod_pr00_42628.trc:
ORA-10485: Real-Time Query cannot be enabled while applying migration redo.
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (greporaprod)

Well, in my case it happens because I use an Active Dataguard, in open read only. The solution? Start you DG in Mount Mode to apply the patching replication!

This is well described as per MOS: MRP process getting terminated with error ORA-10485 (Doc ID 1618485.1).

After getting sync, you can simple promote it to read only mode again.

Hope it helps!

Shellscript: Which database is this service_name from?

Hey!
Strange question, right? You should know you applications and databases and how everything works, right?

Happen that some days ago a client requested me to identify from which database is a service_name. Easy? Not if the server have 46 database instances running…

Ok, now how can I can this information as quick as possible?

I know that the command below give me all distinct databases in CRS:

/u01/app/12.1.0.2/grid/bin/crsctl stat res -t -w "TYPE = ora.database.type"

Now, lets make some sed magic:

[root@nprd01dbadm01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl stat res -t -w "TYPE = ora.database.type" |grep ora. |sed 's/ora./srvctl status service -d /' |sed 's/.db/ |grep \$1/' | head -1
srvctl status service -d dba |grep $1

Continue reading

RAC on AIX: Network Best Practices

Hi all!
A few time ago I passed by some performance issues on AIX working with instances with different configuration (proc/mem). The root cause was basically the inefficient configuration of networking for interconnect (UDP).
As you know, the UDP is a non-response (for that reason with less metadata and faster) protocol. By the default, every server have a pool to send udp (and tcp) messages and another to recieve.
In my situation, once there was an ‘inferior’ instance, the pools were automatically set smaller in this one, and it was causing a high interconnection block sending statistics from other instances. In deed, it was lots of resends caused by overflows in this smaller instance…

There is one one to evaluete how much loss are you having for UDP in your AIX server:

netstat -s | grep 'socket buffer overflows'

If you are having considerable number of overflows, it’s recommended to reavaluate the sized of your udp_recvspace. And, of course, maintain the calculation of pools.

Oracle recommends, at least:

tcp_recvspace = 65536
tcp_sendspace = 65536
udp_sendspace = ((DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB) but no lower than 65536
udp_recvspace = 655360 (Minimum recommended value is 10x udp_sendspace, parameter value must be less than sb_max)
rfc1323 = 1
sb_max = 4194304
ipqmaxlen = 512

This and others details about configuring RAC on AIX ban be found in note: RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1)

I’d recommend you take a look too.

Have a nice day!
Matheus.

VMWare: Adding Shared Disks for Clustered Oracle Database

Hi folks!
Today a friend asked about how to configure disks on VMWare to create a virtualized cluster database. I revisited my old notes and decided to share. Here it goes…

First, I really have some constraints about it:
– Fake “high availability”: To have HA with VM it’s not needed 2 vms, if a host fail VMWare should make a VMotion (if well configured), and no services will be affected. So, one VM is ok.
– Not real “horizontally scallated”: It probably would be better to use one server as physical than have two vms on it. Not make sense to do it…

So, why?
To prove concept, evaluate RAC configuration (caches on sequences, etc) and labs, to learn and practice RAC stuffs…

Ok, now how to make it happen?

1. Add new disk to one of the machines. Some way, one will be the “primary” and share disks with another.
add1.jpg

2. Set Mode Thick Eager Zeroed

add2.jpg

3. Create a specific controller to this “shared disks”

add3.jpg

4. Set controller to virtual sharing

add4.jpg

# Other Machine
5. Adding the existent disk to other VM (not primary, but from primary)

add5.jpg
6. Select disk from primary

add6.jpg

add7.jpg

7. Create a new controller, as you made on primary and select it:

add8.jpgadd9.jpg

8. Set controller to virtual sharing
add10

OBS:
If this error happen, one of your controller is not in sharing mode. Please check it.
add11

 

And here we are! 🙂
Good lab!
Matheus.

Kludge: Mounting ACFS Thought Shellscript

Just the script. The history is here.
This is a “workaround” script. As always, is recommended to use Oracle Restart, like I posted here.

#!/bin/sh
$GRID_HOME/bin/srvctl add filesystem -d /dev/asm/dbhome-270 -g 'DGHOME' -v DBHOME -m /oracle/db -u oracle
if [ $? = "0" -o $? = "2" ]; then
$GRID_HOME/bin/srvctl start filesystem -d /dev/asm/dbhome-270
if [ $? = "0" ]; then
chown oracle:oinstall /oracle/db
chmod 775 /oracle/db
$GRID_HOME/bin/srvctl status filesystem -d /dev/asm/dbhome-270
exit 0
fi
$GRID_HOME/bin/srvctl status filesystem -d /dev/asm/dbhome-270
fi

There is a good post ACFS and ACFS restart scripting:
https://levipereira.wordpress.com/2011/07/28/oracle-acfs-filesystem-managed-by-ohas-on-oracle-restart/

See ya!

Matheus.

Scheduler Job by Node (RAC Database)

Sometimes you want to run something just in one node of the RAC. Here is an example to do it:

create or replace procedure USER_JOB.PRC_SOMETHING is
begin
-- do something
null;
end;
/
begin
sys.dbms_scheduler.create_job(job_name => 'USER_JOB.JOB_SOMETHING',
job_type            => 'PLSQL_BLOCK',
job_action          => 'USER_JOB.PRC_SOMETHING;',
start_date          => sysdate,
repeat_interval     => 'Freq=Minutely;Interval=30',
end_date            => to_date(null),
job_class           => 'DEFAULT_JOB_CLASS',
enabled             => true,
auto_drop           => false,
comments            => 'Something Job.');
end;
/
begin
dbms_scheduler.set_attribute(name => 'USER_JOB.JOB_SOMETHING',
 attribute=>'INSTANCE_ID', value=> 1);
end;
/

Matheus.