Services [not] Starting Automatically with CRS after Reboot

Hello all!
So, these days a client asked me to check why his database was not starting with CRS after reboot. So I started investigating and noticed this…

On Oracle 11.2, the database auto start policy in the clusterware is restore, which means that clusterware will remember the last state of the database. As well as database, Oracle 11.2 comes by default with several important resources with attribute AUTO_START=restore in the profile.

With that, if the database was stopped normally then on the next restart of clusterware it won’t be started. Otherwise if the server crashes or by some reason the OS is rebooted then clusterware will start the database because last state was ONLINE (running).

In my case, a second reboot was done with database down, and this is why database didn’t started. And same applied to some other target types.

The conclusion I get?
For best availability, it is a good practice to change this default behaviour after each installation on 11.2. Some checking and adjusting scripts below:

To check all Resources:

crsctl stat res -p

Checking on database resource:

...
NAME=ora.grepora.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
...

But besides DB, several resources were in restore or even in never, like ASM and Voting Disk DG… Check for all resourced AUTO_START:

[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=never
AUTO_START=never
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=always
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore
AUTO_START=restore

Well, so to fix all the resources:

crsctl modify resource ora.APS1.dg -attr AUTO_START=always
crsctl modify resource ora.DG1.dg -attr AUTO_START=always
crsctl modify resource ora.FRA.dg -attr AUTO_START=always
crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
crsctl modify resource ora.asm -attr AUTO_START=always
crsctl modify resource ora.gsd -attr AUTO_START=always
crsctl modify resource ora.net1.network -attr AUTO_START=always
crsctl modify resource ora.ons -attr AUTO_START=always
crsctl modify resource ora.registry.acfs -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
crsctl modify resource ora.cvu -attr AUTO_START=always
crsctl modify resource ora.grepora.db -attr AUTO_START=always
crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
crsctl modify resource ora.oc4j -attr AUTO_START=always
crsctl modify resource ora.scan1.vip -attr AUTO_START=always
crsctl modify resource ora.scan2.vip -attr AUTO_START=always
crsctl modify resource ora.scan3.vip -attr AUTO_START=always

Below the output and the checking:

[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.VDISK_OCR.dg -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.asm -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.gsd -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.net1.network -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.ons -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.registry.acfs -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN1.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN2.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.LISTENER_SCAN3.lsnr -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.cvu -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.grepora.db -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.greporasrv2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.oc4j -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan1.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan2.vip -attr AUTO_START=always
[root@greporasrv1 ~]# crsctl modify resource ora.scan3.vip -attr AUTO_START=always
[root@greporasrv1 ~]#
[root@greporasrv1 ~]# crsctl stat res -p |grep AUTO_START
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
AUTO_START=always
[root@greporasrv1 ~]#

 

After Patch: MRP0: Background Media Recovery terminated with error 10485

Ok,
I had that some time ago after applying Patch 27475598 – Oracle JavaVM Component 11.2.0.4.180417 Database PSU.
Why? Well, this is Non RAC-Rolling Installable and also Not Data Guard Standby First Installable.

This means there downtime for this patch, no escape.

I had to (skipping all the standard opatch steps, you can see those on README):

  • Stop DG Replication:
dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-off';
show database mydg
  • Run postinstall.sql in upgrade mode with only 1 instance on (disable RAC):
cd $ORACLE_HOME/sqlpatch/27475598
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> alter system set cluster_database=false scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP UPGRADE
SQL> @postinstall.sql
SQL> alter system set cluster_database=true scope=spfile;
SQL> SHUTDOWN
SQL> STARTUP

Ok, all good, seems all fine.

But now when starting my DG replication:

dgmgrl /
show configuration
show database mydg
edit database 'mydg' set state='apply-on';
show database mydg

What I see is:

DGMGRL> show database mydg

Database - mydg

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds (computed 1 second ago)
  Apply Lag:       41 minutes 53 seconds (computed 1 second ago)
  Apply Rate:      (unknown)
  Real Time Query: OFF
  Instance(s):
    myprod

  Database Error(s):
    ORA-16766: Redo Apply is stopped

Database Status:
ERROR

DGMGRL>

And on Database Alert Log:

MRP0: Background Media Recovery terminated with error 10485
Errors in file /u01/app/oracle/diag/rdbms/axwest/greporaprod/trace/greporaprod_pr00_42628.trc:
ORA-10485: Real-Time Query cannot be enabled while applying migration redo.
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (greporaprod)

Well, in my case it happens because I use an Active Dataguard, in open read only. The solution? Start you DG in Mount Mode to apply the patching replication!

This is well described as per MOS: MRP process getting terminated with error ORA-10485 (Doc ID 1618485.1).

After getting sync, you can simple promote it to read only mode again.

Hope it helps!

Shellscript: Which database is this service_name from?

Hey!
Strange question, right? You should know you applications and databases and how everything works, right?

Happen that some days ago a client requested me to identify from which database is a service_name. Easy? Not if the server have 46 database instances running…

Ok, now how can I can this information as quick as possible?

I know that the command below give me all distinct databases in CRS:

/u01/app/12.1.0.2/grid/bin/crsctl stat res -t -w "TYPE = ora.database.type"

Now, lets make some sed magic:

[root@nprd01dbadm01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl stat res -t -w "TYPE = ora.database.type" |grep ora. |sed 's/ora./srvctl status service -d /' |sed 's/.db/ |grep \$1/' | head -1
srvctl status service -d dba |grep $1

Continue reading

RAC on AIX: Network Best Practices

Hi all!
A few time ago I passed by some performance issues on AIX working with instances with different configuration (proc/mem). The root cause was basically the inefficient configuration of networking for interconnect (UDP).
As you know, the UDP is a non-response (for that reason with less metadata and faster) protocol. By the default, every server have a pool to send udp (and tcp) messages and another to recieve.
In my situation, once there was an ‘inferior’ instance, the pools were automatically set smaller in this one, and it was causing a high interconnection block sending statistics from other instances. In deed, it was lots of resends caused by overflows in this smaller instance…

There is one one to evaluete how much loss are you having for UDP in your AIX server:

netstat -s | grep 'socket buffer overflows'

If you are having considerable number of overflows, it’s recommended to reavaluate the sized of your udp_recvspace. And, of course, maintain the calculation of pools.

Oracle recommends, at least:

tcp_recvspace = 65536
tcp_sendspace = 65536
udp_sendspace = ((DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB) but no lower than 65536
udp_recvspace = 655360 (Minimum recommended value is 10x udp_sendspace, parameter value must be less than sb_max)
rfc1323 = 1
sb_max = 4194304
ipqmaxlen = 512

This and others details about configuring RAC on AIX ban be found in note: RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1)

I’d recommend you take a look too.

Have a nice day!
Matheus.

VMWare: Adding Shared Disks for Clustered Oracle Database

Hi folks!
Today a friend asked about how to configure disks on VMWare to create a virtualized cluster database. I revisited my old notes and decided to share. Here it goes…

First, I really have some constraints about it:
– Fake “high availability”: To have HA with VM it’s not needed 2 vms, if a host fail VMWare should make a VMotion (if well configured), and no services will be affected. So, one VM is ok.
– Not real “horizontally scallated”: It probably would be better to use one server as physical than have two vms on it. Not make sense to do it…

So, why?
To prove concept, evaluate RAC configuration (caches on sequences, etc) and labs, to learn and practice RAC stuffs…

Ok, now how to make it happen?

1. Add new disk to one of the machines. Some way, one will be the “primary” and share disks with another.
add1.jpg

2. Set Mode Thick Eager Zeroed

add2.jpg

3. Create a specific controller to this “shared disks”

add3.jpg

4. Set controller to virtual sharing

add4.jpg

# Other Machine
5. Adding the existent disk to other VM (not primary, but from primary)

add5.jpg
6. Select disk from primary

add6.jpg

add7.jpg

7. Create a new controller, as you made on primary and select it:

add8.jpgadd9.jpg

8. Set controller to virtual sharing
add10

OBS:
If this error happen, one of your controller is not in sharing mode. Please check it.
add11

 

And here we are! 🙂
Good lab!
Matheus.

Kludge: Mounting ACFS Thought Shellscript

Just the script. The history is here.
This is a “workaround” script. As always, is recommended to use Oracle Restart, like I posted here.

#!/bin/sh
$GRID_HOME/bin/srvctl add filesystem -d /dev/asm/dbhome-270 -g 'DGHOME' -v DBHOME -m /oracle/db -u oracle
if [ $? = "0" -o $? = "2" ]; then
$GRID_HOME/bin/srvctl start filesystem -d /dev/asm/dbhome-270
if [ $? = "0" ]; then
chown oracle:oinstall /oracle/db
chmod 775 /oracle/db
$GRID_HOME/bin/srvctl status filesystem -d /dev/asm/dbhome-270
exit 0
fi
$GRID_HOME/bin/srvctl status filesystem -d /dev/asm/dbhome-270
fi

There is a good post ACFS and ACFS restart scripting:
https://levipereira.wordpress.com/2011/07/28/oracle-acfs-filesystem-managed-by-ohas-on-oracle-restart/

See ya!

Matheus.

Scheduler Job by Node (RAC Database)

Sometimes you want to run something just in one node of the RAC. Here is an example to do it:

create or replace procedure USER_JOB.PRC_SOMETHING is
begin
-- do something
null;
end;
/
begin
sys.dbms_scheduler.create_job(job_name => 'USER_JOB.JOB_SOMETHING',
job_type            => 'PLSQL_BLOCK',
job_action          => 'USER_JOB.PRC_SOMETHING;',
start_date          => sysdate,
repeat_interval     => 'Freq=Minutely;Interval=30',
end_date            => to_date(null),
job_class           => 'DEFAULT_JOB_CLASS',
enabled             => true,
auto_drop           => false,
comments            => 'Something Job.');
end;
/
begin
dbms_scheduler.set_attribute(name => 'USER_JOB.JOB_SOMETHING',
 attribute=>'INSTANCE_ID', value=> 1);
end;
/

Matheus.

ORA-27302: failure occurred at: sskgpcreates

# Error:

dbsrvr1:/home/oracle>srvctl start database -d mydb
PRCR-1079 : Failed to start resource ora.mydb.db
CRS-5017: The resource action "ora.mydb.db start" encountered the following error:
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates
. For details refer to "(:CLSN00107:)" in "/grid/product/11.2.0.4/log/dbsrvr2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.mydb.db' on 'dbsrvr2' failed
CRS-2632: There are no more servers to try to place resource 'ora.mydb.db' on that would satisfy its placement policy

Seems the weeror is happening on dbsrvr2, right?
The doc below talks more about the error and the semaphores calculation:
Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28 (Doc ID 949468.1)

Let’s make an adjust here:

[root@dbsrvr2 ~]# cat /etc/sysctl.conf |grep sem
kernel.sem = 250 32000 100 142
[root@dbsrvr2 ~]# vi /etc/sysctl.conf
[root@dbsrvr2 ~]# cat /etc/sysctl.conf |grep sem
kernel.sem = 250 32000 100 256
[root@dbsrvr2 ~]# sysctl -p

And try again:

dbsrvr1:/home/oracle>srvctl start database -d mydb
dbsrvr1:/home/oracle>

Well done! 😀

Matheus.

Sqlplus: Connect without configure TNSNAMES

Okey, you must to know, but is always useful to remmember that… If you don’t want to configure your TNSNAMES, you can connect directly to description of your database. This way:

sqlplus> conn matheus_boesing@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=mydb.domain.net)(PORT=1531)))(CONNECT_DATA=(service_name=mydb)))
Enter password: ********
Connected.
sqlplus>

Based on this, I made two scripts, to connect with the sid (c.sql) or with the service_name (s.sql) and make my life easier. Here the scripts:

sqlplus>get c
1 DEFINE VHOST = &1.
2 DEFINE VPORT = &2.
3 DEFINE VSID = &3.
4 DEFINE VDESC='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=&VHOST)(PORT=&VPORT)))(CONNECT_DATA=(SID=&VSID)(server=dedicated)))'
5 disconnect
6 connect matheus_boesing@&&VDESC
7 set linesize 1000
8 set sqlprom '&&VSID> '
9 select instance_name, host_name
10 from v$instance;
11 exec dbms_application_info.SET_MODULE('MATHEUS_BOESING','DBA');
12 alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS';
13 UNDEFINE VDESC
14 UNDEFINE 1
15 UNDEFINE 2
16* UNDEFINE 3
sqlplus>get s
1 DEFINE VHOST = &1.
2 DEFINE VPORT = &2.
3 DEFINE VSID = &3.
4 DEFINE VDESC='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=&VHOST)(PORT=&VPORT)))(CONNECT_DATA=(SERVICE_NAME=&VSID)(server=dedicated)))'
5 prompt &VDESC
6 disconnect
7 connect matheus_boesing@&&VDESC
8 set linesize 1000
9 set sqlprom '&&VSID> '
10 select instance_name, host_name
11 from v$instance;
12 exec dbms_application_info.SET_MODULE('MATHEUS_BOESING','DBA');
13 alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS';
14 UNDEFINE VDESC
15 UNDEFINE 1
16 UNDEFINE 2
17* UNDEFINE 3
sqlplus>

It can be used like this:

sqlplus>@s mydb.domain.net 1531 mydb
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=mydb.domain.net)(PORT=1531)))(CONNECT_DATA=(SERVICE_NAME=mydb)(server=dedicated)))
Enter password: ********
Connected.

Ok, but, let’s suppose you are working in a cluster and wants to connect directly to the another instance. I made the script below (ci.sql). It’s not beautiful, but is a lot hopeful:

sqlplus> get ci
1 DEFINE VINT = &1.
2 undefine VHOST
3 undefine VSID
4 VARIABLE VCONN varchar2(100)
5 PRINT ret_val
6 BEGIN
7 SELECT '@c '||host_name||' 1521 '||INSTANCE_NAME
8 INTO :VCONN
9 FROM gv$instance where INSTANCE_NUMBER=&VINT;
10 END;
11 /
12 set head off;
13 spool auxcon.sql
14 prompt set head on;
15 print :VCONN
16 prompt set head on;
17 spool off;
18* @auxcon
sqlplus>

As you see, you inform the inst_id you want to connect. It can be used like:

mydb> @instance
INSTANCE_NAME
------------------------------
mydb_2
mydb> @instances
INST_NUMBER INST_NAME
----------- ---------------------------------------
1 db2srvr2p.grepora.net:mydb_1
2 db1srvr1p.grepora.net:mydb_2
mydb> @ci 1
@c db2srvr2p.grepora.net 1521 mydb_1
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Enter password: ********
Connected.
mydb_1> @instance
INSTANCE_NAME
------------------------------
mydb_1

These scripts use to help me a lot on daily basis, and it’s exclusive.
I couldn’t find anything like this so far. So, I made it. 🙂

Matheus.