ORA-27302: failure occurred at: sskgpcreates

# Error:

dbsrvr1:/home/oracle>srvctl start database -d mydb
PRCR-1079 : Failed to start resource ora.mydb.db
CRS-5017: The resource action "ora.mydb.db start" encountered the following error:
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates
. For details refer to "(:CLSN00107:)" in "/grid/product/11.2.0.4/log/dbsrvr2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.mydb.db' on 'dbsrvr2' failed
CRS-2632: There are no more servers to try to place resource 'ora.mydb.db' on that would satisfy its placement policy

Seems the weeror is happening on dbsrvr2, right?
The doc below talks more about the error and the semaphores calculation:
Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28 (Doc ID 949468.1)

Let’s make an adjust here:

[root@dbsrvr2 ~]# cat /etc/sysctl.conf |grep sem
kernel.sem = 250 32000 100 142
[root@dbsrvr2 ~]# vi /etc/sysctl.conf
[root@dbsrvr2 ~]# cat /etc/sysctl.conf |grep sem
kernel.sem = 250 32000 100 256
[root@dbsrvr2 ~]# sysctl -p

And try again:

dbsrvr1:/home/oracle>srvctl start database -d mydb
dbsrvr1:/home/oracle>

Well done! 😀

Matheus.

Sqlplus: Connect without configure TNSNAMES

Okey, you must to know, but is always useful to remmember that… If you don’t want to configure your TNSNAMES, you can connect directly to description of your database. This way:

sqlplus> conn matheus_boesing@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=mydb.domain.net)(PORT=1531)))(CONNECT_DATA=(service_name=mydb)))
Enter password: ********
Connected.
sqlplus>

Based on this, I made two scripts, to connect with the sid (c.sql) or with the service_name (s.sql) and make my life easier. Here the scripts:

sqlplus>get c
1 DEFINE VHOST = &1.
2 DEFINE VPORT = &2.
3 DEFINE VSID = &3.
4 DEFINE VDESC='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=&VHOST)(PORT=&VPORT)))(CONNECT_DATA=(SID=&VSID)(server=dedicated)))'
5 disconnect
6 connect matheus_boesing@&&VDESC
7 set linesize 1000
8 set sqlprom '&&VSID> '
9 select instance_name, host_name
10 from v$instance;
11 exec dbms_application_info.SET_MODULE('MATHEUS_BOESING','DBA');
12 alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS';
13 UNDEFINE VDESC
14 UNDEFINE 1
15 UNDEFINE 2
16* UNDEFINE 3
sqlplus>get s
1 DEFINE VHOST = &1.
2 DEFINE VPORT = &2.
3 DEFINE VSID = &3.
4 DEFINE VDESC='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=&VHOST)(PORT=&VPORT)))(CONNECT_DATA=(SERVICE_NAME=&VSID)(server=dedicated)))'
5 prompt &VDESC
6 disconnect
7 connect matheus_boesing@&&VDESC
8 set linesize 1000
9 set sqlprom '&&VSID> '
10 select instance_name, host_name
11 from v$instance;
12 exec dbms_application_info.SET_MODULE('MATHEUS_BOESING','DBA');
13 alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS';
14 UNDEFINE VDESC
15 UNDEFINE 1
16 UNDEFINE 2
17* UNDEFINE 3
sqlplus>

It can be used like this:

sqlplus>@s mydb.domain.net 1531 mydb
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=mydb.domain.net)(PORT=1531)))(CONNECT_DATA=(SERVICE_NAME=mydb)(server=dedicated)))
Enter password: ********
Connected.

Ok, but, let’s suppose you are working in a cluster and wants to connect directly to the another instance. I made the script below (ci.sql). It’s not beautiful, but is a lot hopeful:

sqlplus> get ci
1 DEFINE VINT = &1.
2 undefine VHOST
3 undefine VSID
4 VARIABLE VCONN varchar2(100)
5 PRINT ret_val
6 BEGIN
7 SELECT '@c '||host_name||' 1521 '||INSTANCE_NAME
8 INTO :VCONN
9 FROM gv$instance where INSTANCE_NUMBER=&VINT;
10 END;
11 /
12 set head off;
13 spool auxcon.sql
14 prompt set head on;
15 print :VCONN
16 prompt set head on;
17 spool off;
18* @auxcon
sqlplus>

As you see, you inform the inst_id you want to connect. It can be used like:

mydb> @instance
INSTANCE_NAME
------------------------------
mydb_2
mydb> @instances
INST_NUMBER INST_NAME
----------- ---------------------------------------
1 db2srvr2p.grepora.net:mydb_1
2 db1srvr1p.grepora.net:mydb_2
mydb> @ci 1
@c db2srvr2p.grepora.net 1521 mydb_1
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Enter password: ********
Connected.
mydb_1> @instance
INSTANCE_NAME
------------------------------
mydb_1

These scripts use to help me a lot on daily basis, and it’s exclusive.
I couldn’t find anything like this so far. So, I made it. 🙂

Matheus.

ORA-29760: instance_number parameter not specified

I felt myself stupid when I lost a few minutes to undestand this error:

SQL> startup pfile=init_corpdb.ora
ORA-29760: instance_number parameter not specified

Do you belive the solution was simply to set a number in ORACLE_SID?
Take a look:

dbsrvr>echo $ORACLE_SID
corpdb
dbsrvr>export ORACLE_SID=corpdb_1
dbsrvr>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Sun Jun 28 00:18:05 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup pfile=init_corpdb.ora
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2220200 bytes
Variable Size 889196376 bytes
Database Buffers 3372220416 bytes
Redo Buffers 12144640 bytes
Database mounted.
Database opened.

I hope neve miss time with this again… 😛

Matheus.

GoldenGate: RAC One Node Archivelog Missing

The situation:

We have a GoldenGate on Allow Mode running some extracts on RAC One Node Database (reading the archivelogs). And then, suddenly, the instance crashes (network lost contact to the server) and the other instance (thread) was auto started by CRS. To the database no problems: The other node redologs was used during the startup recover and every thing is ok.

The application running with Weblogic serverpool and gridlink just had a little contention and continued the operation thought the started instance. The Goldengate switch was manually made, but some sequences was lost. What we found? the sequences was in the old thread’s redologfiles. It should be backed up if fast_start_mttr_target was different to zero. Buuut, the world is not so beautiful:

raconenodedb> show parameters mttr
NAME TYPE VALUE
------------------------------------
fast_start_mttr_target integer 0

How we solved?
Simple solution: identified the group/thread and made a cp from ASM. The copied redolog was used as archivelog on goldengate and everything was ok.

raconenodedb> select sequence#,group#,thread# from v$log where thread#=2 order by 1;
SEQUENCE# GROUP# THREAD#
---------- ---------- ----------
39636 6 2
39637 7 2
39638 8 2
39639 9 2
39640 10 2
ASMCMD> cp group_10.288.859482805 /oracle/grup10_thread2
copying +DGDATA/MYDB/ONLINELOG/group_10.288.859482805 -> /oracle/grup10_thread2

Easy like that.

Matheus.

High CPU usage by LMS and Node Evictions: Solved by Setting “_high_priority_processes”

Another thing that may help you in environments with highly interdependent applications:

Our env has high interconnect network block changing, and, as a consequence, high CPU usage by Global Cache Services (GCS)/Lock Manager Server Process (LMS).

This way, for each little latency in the interconnect interface, we were having a node eviction and all the impacts to the legacy application you can imagine (without gridlink or any solution to make the relocation ‘transparent’, as is usual to legacy application) and, of course, the business impact.

Oracle obviously suggested that we reduce the block concurrency over the cluster nodes grouping the application by affinity. But, it’s just no applicable to our env… 🙁

When nothing seemed to help, the workaround came from here: Top 5 Database and/or Instance Performance Issues in RAC Environment (Doc ID 1373500.1).

Here is our change:

boesing@proddb> alter system set "_high_priority_processes"='LMS*|LGWR|VKTM' scope=spfile sid='*';
System altered.

No magic, but the problem stopped to happen. After that, we’re having some warnings about clock synchronization over the cluster nodes on CRS alerts. Like this:

CRS-2409:The clock on host proddb1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization. Service is running in observer mode.

I believe it happens because VKTM lost priority. But it’s OK: The node evictions has stopped! 😀

Matheus.

How to Prevent Automatic Database Startup

This is a quick post! 😀
About Oracle Restart
Reference to SRVCTL

Ok!
In a nutshell, my notes:

To register the database, if not already registered:
> srvctl add database -d $DBNAME -o $ORACLE_HOME -p $ORACLE_HOME/dbs/spfile.ora -y manual

Once the database is registered, change the management policy for the database to manual:
> srvctl modify database -d $DBNAME -y manual

Matheus.

Ordering Sequences over RAC – Hang on ‘DFS lock handle’

Hi all!
Whats up?
I had a fun weekend. So, some things to write about. 🙂

This post is just to show an exerience with the event ‘DFS lock handle’, related to sequence ordering over the cluster nodes.

I started a process to make pk id’s adjustment, related to application’s number limitation (int 32 bits -> 4294967296), looking for move older entries and “release” some ids.
As a legacy of database unification, we have a lot of tables with the same name in different schemas, those use the same sequence for pk ids generation.
The readjustment involve a select of a sequence and is runned in a lot of parallel “sqlplus” call to optimize the time and fit to the business maintenance window.

When I started, I just used the global service name (dedicated connection) and the scanlistener. The result was distribuiting connections over the 5 nodes of the cluster. Bad idea.
In the first, I suspected about the concurrency by the sequence over different nodes (could occour if the node caches are too small), based on a few XA transaction bugs involving this event.

By the way, if you’re facing this hang with XA transactions, please take a look on “High rdbms ipc reply and DFS lock handle in 11gR2 RAC With XA Enabled Application (Doc ID 1361615.1)“.
It can be solved by setting “_clusterwide_global_transactions” to FALSE.
It’s recommendable, additionally, to read the Best Practices for Using XA with RAC.

Take a look on the blocking session over the cluster nodes:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- ----------------- -------------- ----------- ---------------- -----------------
9386 147 4 DFS lock handle 9zr9vpvmkqzkv VALID 10968 4
9499 179 4 DFS lock handle fqk9y9q7u2d5c VALID 11082 4
8821 153 4 DFS lock handle 2jd84taf7krh2 VALID 13902 4
22442 1155 3 DFS lock handle 8ycpfxq2jthq3 VALID 9067 3
9860 1339 3 DFS lock handle 2jmzv23ug9kth VALID 10299 3
9772 1529 3 DFS lock handle 802kn9htah6pt VALID 22442 3
22543 1673 5 DFS lock handle 6tgvwkt6cqngk VALID 3074 5
22307 135 5 DFS lock handle 5b3zgqgq7bbdz VALID 3665 5
21010 91 5 DFS lock handle gkmycubvn9aa3 VALID 3546 5
9508 1459 3 DFS lock handle 7cw6bcjsf8xf2 VALID 10387 3
10299 4669 3 DFS lock handle 7y2tnuckh37wp VALID 11795 3
121 139 5 DFS lock handle 6tgvwkt6cqngk VALID 3310 5
596 113 5 DFS lock handle 8yqbzu29shvnm VALID 2603 5
360 113 5 DFS lock handle dv49pafm9z8zy VALID 596 5
10740 3177 3 DFS lock handle c6q65hnq0ju7x VALID 11707 3
9838 181 4 DFS lock handle aqa7afq2upkuq VALID 9386 4
714 77 5 DFS lock handle ft8xzyzhycpn2 VALID 360 5
9951 147 4 DFS lock handle 697mts944db7y VALID 9725 4
950 109 5 DFS lock handle cd2gsz5rb2qw9 VALID 3192 5
10387 1529 3 DFS lock handle 2tqnrbh0x60dp VALID 12238 3
10064 143 4 DFS lock handle d833wg4u9cfyb VALID 10649 1
833 1503 5 DFS lock handle 7ynbg2t4taxha VALID 2366 5
10649 53 1 DFS lock handle 0sgzmj1tbx4rh VALID 10737 1
2249 149 5 DFS lock handle aa6jr8ugxaz4z VALID 833 5
9612 175 4 DFS lock handle d2nrr4gtdjq9b VALID 9499 4
10825 57 1 DFS lock handle acmyc4sw7zzc2 VALID 10649 1
2603 1415 5 DFS lock handle fg47vs5wa8zq8 VALID 22307 5
2485 65 5 DFS lock handle 702x9zwtfktu6 VALID 714 5
10737 55 1 DFS lock handle bthxrpmz0ug63 VALID 12148 3

Ok doke, let’s cancel the sessions and rerun the process just in node node (by SID). It should solve the small caches over cluster hang, without need to modify the sequence, right?

snailBeeep. Wrong:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- --------------- ------------- ----------- ---------------- -----------------
2494 53953 4 DFS lock handle fc3cam368zsp6 UNKNOWN
6561 32113 4 DFS lock handle f618p0hd4xsy0 UNKNOWN
9269 111 4 DFS lock handle fkn8hxbsfkfnz UNKNOWN
9047 175 4 DFS lock handle fqk9y9q7u2d5c VALID 8931 4
459 12605 4 DFS lock handle 5b3zgqgq7bbdz VALID 9271 4
1929 305 4 DFS lock handle 6tgvwkt6cqngk VALID 8026 4
7349 1013 4 DFS lock handle 802kn9htah6pt UNKNOWN
7800 175 4 DFS lock handle 0hc1bmqj1fp4f UNKNOWN
21475 17349 4 DFS lock handle cfh3r4sq788vu VALID 9042 4
8026 641 4 DFS lock handle 6tgvwkt6cqngk VALID 459 4
14919 59 4 DFS lock handle gkmycubvn9aa3 VALID 15373 4
15032 2267 4 DFS lock handle 9zr9vpvmkqzkv VALID 7688 4
15145 2411 4 DFS lock handle ddkqx4xttc9s9 UNKNOWN
15373 1657 4 DFS lock handle 2jd84taf7krh2 VALID 15713 4
8934 157 4 DFS lock handle 8ycpfxq2jthq3 VALID 1929 4
15826 551 4 DFS lock handle d8dhmr2sx08xq VALID 9612 4
15713 3357 4 DFS lock handle 2jmzv23ug9kth VALID 10177 4
8821 155 4 DFS lock handle 9fpmw9cwak21s UNKNOWN
16050 7007 4 DFS lock handle 4t5qkth35r2um VALID 8705 4
2042 1269 4 DFS lock handle 7cw6bcjsf8xf2 UNKNOWN

What a hell!
Lets take a look in one of the sqls to find the sequence…

proddb4> @sqlid 6tgvwkt6cqngk
UPDATE TABLE_XPTO SET RECNO = SEQ_OWNER.SEQ_NAME.NEXTVAL WHERE ROWID=:B1 EXTVAL WHERE ROWID=:B1

And what about the sequence configuration?

proddb4> @getddl sequence SEQ_OWNER SEQ_NAME
create sequence SEQ_OWNER.SEQ_NAME
minvalue 1
maxvalue 9999999999
start with 85669803
increment by 1
cache 120000
cycle
order;

ORDER!
Man, of course. It create a several control over the nodes just to keep the sequence in order, as explained in this post by Christo Kutrovsky.

To my situation, in the business maintenance window, it’s not an important constraint. So, lets disable the ordering:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME noorder;
Sequence altered.

Then, TAADÃÃ!

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
----- ---------- ------- ------------------------ ------------- ----------- ---------------- -----------------
15145 2411 4 library cache: mutex X ddkqx4xttc9s9 UNKNOWN
15032 2267 4 library cache: mutex X 9zr9vpvmkqzkv UNKNOWN
14919 59 4 library cache: mutex X gkmycubvn9aa3 NOT IN WAIT
9269 111 4 library cache: mutex X fkn8hxbsfkfnz UNKNOWN
9047 175 4 library cache: mutex X fqk9y9q7u2d5c UNKNOWN
8934 157 4 library cache: mutex X 8ycpfxq2jthq3 UNKNOWN
8821 155 4 library cache: mutex X 9fpmw9cwak21s UNKNOWN
8026 641 4 library cache: mutex X 6tgvwkt6cqngk UNKNOWN
7800 175 4 library cache: mutex X 0hc1bmqj1fp4f UNKNOWN
7349 1013 4 library cache: mutex X 802kn9htah6pt UNKNOWN
2042 1269 4 library cache: mutex X 7cw6bcjsf8xf2 UNKNOWN
9160 1205 4 library cache: mutex X 6tgvwkt6cqngk NOT IN WAIT
9042 293 4 library cache: mutex X 4jc1u6n2qx94z UNKNOWN
10177 5611 4 library cache: mutex X c6q65hnq0ju7x UNKNOWN
9271 235 4 library cache: mutex X d2nrr4gtdjq9b NOT IN WAIT
9951 1191 4 library cache: mutex X bkdhxhhqdbqb9 UNKNOWN
8931 291 4 library cache: mutex X 697mts944db7y UNKNOWN
9838 1315 4 library cache: mutex X 6q6r7ht1hnctg NOT IN WAIT
8818 325 4 library cache: mutex X 2tqnrbh0x60dp UNKNOWN

Of course we’re having some mutex x, but it’s a lot better then DFS lock, and the process just “go”. 🙂

After be done, to keep the configuration, let’s enable ordering again:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME order;
Sequence altered.

Matheus.