Application Hangs: resmgr:become active

Application APP hangs with resmgr:become active. There is a resource plan defined who has a specific group to this Application. What is wrong and how to fix?

Here I presume you what is a resource manager and a resource plan. And, of course, for what purpose they exists. You must to know that this event is related to high active sessions in the group of resource plan too.

Before everything else, please understand if this is an acceptable behavior of the application. Then, in which resource group the sessions in this event are. The are other application in this same group with an unacceptable behavior? Yes? So, fix it.
No? Consider tho adjust the resource plan, switch the application to a new group, or, like in my case, remap the Application APP to the right group… ¬¬

To make it clear: In my case, the mapping is missing, so the schema MYAPP (Application APP) fit to OTHER_GROUP, where we use to set minimal limits:

SID SERIAL# INST_ID USERNAME RESOURCE_CONSUMER_GROUP EVENT
----- ---------- ---------- ------------------------------ -----------
492 29459 2 MYAPP OTHER_GROUPS resmgr:become active
1102 19145 2 MYAPP OTHER_GROUPS resmgr:become active
955 33161 2 MYAPP OTHER_GROUPS resmgr:become active
1084 33839 2 MYAPP OTHER_GROUPS db file sequential read
MYDB> show parameters resource_manager_plan
NAME TYPE VALUE
--------------------- ------ --------------
resource_manager_plan string MYDB_PLAN
MYDB> select group_or_subplan, active_sess_pool_p1, cpu_p1, cpu_p2, cpu_p3, cpu_p4 from DBA_RSRC_PLAN_DIRECTIVES where plan = 'MYDB_PLAN'
Enter value for plano: MYDB_PLAN
GROUP_OR_SUBPLAN ACTIVE_SESS_POOL_P1 CPU_P1 CPU_P2 CPU_P3 CPU_P4
------------------------------ ------------------- ---------- ---------- ---------- ----------
BATCH_GROUP 60 0 10 0 0
SYS_GROUP 80 0 0 0
APP_PLAN 20 0 30 0 0
OTHER_GROUPS 20 0 20 0 0
GGATE_GROUP 0 10 0 0
PAYTRUE_GROUP 40 0 30 0 0
DBA_GROUP 20 0 0 0

You can configure the mapping by user like that:

BEGIN
DBMS_RESOURCE_MANAGER.clear_pending_area;
DBMS_RESOURCE_MANAGER.create_pending_area;
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (
attribute => DBMS_RESOURCE_MANAGER.oracle_user,
-- DBMS_RESOURCE_MANAGER.service_name (or a lot of possibilities. Google it!)
value => 'MYAPP',
consumer_group => 'APP_PLAN');
DBMS_RESOURCE_MANAGER.validate_pending_area;
DBMS_RESOURCE_MANAGER.submit_pending_area;
END;
/

To switch the connected sessions, it can be done like:

SELECT 'EXEC DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS ('''||SID||''','''||SERIAL#||''',''APP_PLAN'');' FROM V$SESSION where username='MYAPP'
and RESOURCE_CONSUMER_GROUP='OTHER_GROUPS';

Remember that creating a resource plan without making the mappings is a bit pointless… 😛

Matheus.

Application Looping Until Lock a Row with NOWAIT Clause

Yesterday I treated an interesting situation:
A BATCH stayed on “SQL*Net message from client” event but the last_call_et was always on 1 or 0. Seems OK, with some client contention to send the commands to the DBMS, right? Nope.

It was caused by a loop in the application code “waiting” for a row lock but without “DBMS waiting events” (something like “select * from table for update nowait”). Take a look in how it was identified below.

First the session with no SQL_ID, no wait events and last_Call_et=0 of a “BATH_PROCESS” user:

proddb2> @sid
Sid:9796
Inst:
LAST_CALL_ET SQL_ID   EVENT STATUS SID SERIAL# INST_ID USERNAME
------------ ------- ------------- ---------- ------------------------
0 SQL*Net message from client INACTIVE 9796 45117 2 BATCH_PROCESS
proddb2> @trace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

As you see, with no idea about what is happening, I started a trace. The trace was stuck with this:

*** 2015-06-15 14:03:25.755
WAIT #4574470448: nam='SQL*Net message from client' ela=993072 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326636999
CLOSE #4574470448:c=10,e=15,dep=0,type=3,tim=12833326637228
PARSE #4574470448:c=25,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326637286
BINDS #4574470448:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=1000000 frm=00 csi=00 siz=24 off=0
kxsbbbfp=110a8d0d8 bln=22 avl=05 flg=05
value=5022011
WAIT #4574470448: nam='gc cr block 2-way' ela= 709 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326638533
WAIT #4574470448: nam='gc cr block 2-way' ela= 541 p1=3 p2=2088264 p3=4367 obj#=0 tim=12833326639352
WAIT #4574470448: nam='gc cr block 2-way' ela= 651 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326641673
WAIT #4574470448: nam='enq: TX - row lock contention' ela= 1093 name|mode=1415053318 usn<obj#=23141074 tim=12833326643029
EXEC #4574470448:c=1776,e=5836,p=0,cr=117,cu=1,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326643150
ERROR #4574470448:err=54 tim=12833326643172
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 9 driver id=1413697536 break?=1 p3=0 obj#=23141074 tim=12833326643373
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 503 driver id=1413697536 break?=0 p3=0 obj#=23141074 tim=12833326643891
WAIT #4574470448: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326643915

AHÁ!
Did you see the “err=54” there? Yes. You know this error:

ORA-00054: Resource busy and acquire with NOWAIT specified

It’s caused by a SELECT FOR UPDATE NOWAIT in the code.
But, this select is in a loop, so the session don’t go ahead until have it.
(Obviously it could be coded with some treatment/better logic for this loop and errors, buuuut…)

What can we do now?
The easy way is to discover the holding session and kill it.
And sometimes the easy way is the best way. 😉

For that, we use the “obj#” and “value”, also bolded in the trace.
As I know the application, I know that the used field in all “where clauses” is the “RECNO” column. But if you don’t, it’s needed to discover. With this information in mind:

proddb2>select * from dba_objects where object_id='23141074';
OWNER OBJECT_NAME
------------------------------ ----------------
OWNER_EXAMPLE TABLE_XPTO
proddb2> select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011;
COL_KEY FSAMED0 FSAMED1 FSMNEG1 FSMNEG2 FSMNEG3 COL_DATE RECNO
------- ---------- ---------- ---------- ---------- ---------- -----
1002974 0 0 -516.8 0 0 15/06/2015 00:00:00 5022011

Ok, I know the row that is holded by the other session.
Let’s discover which session is causing a lock by myself (but in my case, without “NOWAIT” clause, to have time to find the holder):

proddb5>select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011 for update;

In another sqlplus session:

proddb2> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- ----------------------
5 14174 479 MATHEUS_BOESING enq:TX - row lock contention VALID 11006 1
2 4233 12879 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 15410 7697 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT

AHÁ again!
The SID 11006. Let’s see who is there:

proddb2> @sid
Sid:11006
Inst:
SQL_ID SEQ# EVENT STATUS SID SERIAL# INST_ID USERNAME
-------------------- ---------- --------------------------------------
9jzm6vn5j06js 24919 enq: TX - row lock contention ACTIVE 11006 44627 1 DBLINK_OTHER_BATCH_SCHEMA

Ok, it’s another session of a different batch process in a remote database holding this row. As it’s less relevant, lets kill! Muahaha!
Then, you’ll see, my session get the lock and is in the middle of a transaction:

proddb1> @kill
***
sid : 11006
serial : 44627
***
System altered.
***
proddb1> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- --------------------
5 14174 479 MATHEUS_BOESING transaction UNKNOWN
2 4332 56037 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 12058 9 MATHEUS_BOESING class slave wait NO HOLDER

To release the “row locked” to my principal process, lets suicide (kill my own session, this case, that is holding the row lock right now).

proddb5> @kill
***
sid : 14174
serial : 479
***
System altered.
***

After kill all the holding sessions, my BATCH_PROCESS just gone! 😀
Take a look on the trace (running ok):

WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531019
FETCH #4576933904:c=45,e=71,p=0,cr=3,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531062
WAIT #4576933904: nam='SQL*Net message from client' ela= 562 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531654
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531788
FETCH #4576933904:c=55,e=86,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531826
WAIT #4576933904: nam='SQL*Net message from client' ela= 715 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532576
WAIT #4576933904: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532721
FETCH #4576933904:c=61,e=96,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981532758
WAIT #4576933904: nam='SQL*Net message from client' ela= 600 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981533617
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534163
FETCH #4576933904:c=52,e=82,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981534203
WAIT #4576933904: nam='SQL*Net message from client' ela= 517 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534752

Now, with the problem solved, lets disable the trace and continue the other daily tasks… 🙂

proddb2> @untrace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

I hope it was useful!
If helped you, make a comment! 😀

See ya!
Matheus.

Adding ASM Disks on RHEL Cluster with Failgroups

# Recognizing as ASMDISK on ASM Libs (ORACLEASM):

1) All cluster nodes: /etc/init.d/oracleasm scandisk
[root@db1host1p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@db2host2p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

2) One of cluster nodes:
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA059 /dev/asmdsk/DGDATA059
Marking disk "DGDATA059" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA060 /dev/asmdsk/DGDATA060
Marking disk "DGDATA060" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA061 /dev/asmdsk/DGDATA061
Marking disk "DGDATA061" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA062 /dev/asmdsk/DGDATA062
Marking disk "DGDATA062" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA159 /dev/asmdsk/DGDATA159
Marking disk "DGDATA159" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA160 /dev/asmdsk/DGDATA160
Marking disk "DGDATA160" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA161 /dev/asmdsk/DGDATA161
Marking disk "DGDATA161" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA162 /dev/asmdsk/DGDATA162
Marking disk "DGDATA162" as an ASM disk: [ OK ]

3) All cluster nodes: /etc/init.d/oracleasm scandisk
[root@db1host1p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@db2host2p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

# Adding Disk on Diskgroup (sqlplus / as sysasm – ASM Instance)
1) Listing Failgroups
SQL> select distinct failgroup from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DGDATA');
FAILGROUP
----------------------------------------------------
FGMASTER
FGAUX

1) Adding Disks (naming and setting rebalance power):
SQL> alter diskgroup DGDATA
2 add failgroup FG01 disk
3 'ORCL:DGDATA059' name DGDATA059,
4 'ORCL:DGDATA060' name DGDATA060,
5 'ORCL:DGDATA061' name DGDATA061,
6 'ORCL:DGDATA062' name DGDATA062
7 add failgroup FG02 disk
8 'ORCL:DGDATA159' name DGDATA159,
9 'ORCL:DGDATA160' name DGDATA160,
10 'ORCL:DGDATA161' name DGDATA161,
11 'ORCL:DGDATA162' name DGDATA162
12 rebalance power 10 nowait;
Diskgroup altered

2) Be patient, and wait the rebalancing:
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----------- ---------- ---------- ---------- -----------
4 REBAL RUN 10 10 191386 540431 1651 211 5 REBAL WAIT 4
SQL> /
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ --------------- ----------------------------------------
4 REBAL RUN 10 10 443438 548118 2345 44 5 REBAL WAIT 4
SQL> /
no rows selected

Well done! 😀
Matheus.

Manually Mounting ACFS

A server rebooted and I needed to remount the ACFS where the Oracle Home is. About that:
Today’s post: Manually Mounting ACFS
Tomorrow’s Someday’s post: Kludge: Mounting ACFS Thought Shellscript
Day Before Tomorrow’s Another Day’s post: Auto Mounting Cluster Services Through Oracle Restart

But, first, some usefull links:
– ACFS Introduction
– ACFS Advanced
– ACFS Command-Line Utilities

# Manually Mounting ACFS
Checked my $ORACLE_HOME (mounted on ACFS) is not available to start the database. Checked ACFS service is down. So, let’s do all the process:

# Starting ACFS
[root@db1host1p ~]$ $GRID_HOME/bin/acfsload start -s

# Volumes OFFLINE: Let’s Enable it:
[root@db1host1p ~]$ $GRID_HOME/bin/crsctl stat res -t |grep acfs
ora.dghome.sephome.acfs
ONLINE OFFLINE db1host1p
[root@db1host1p ~]$ su - grid
[grid@db1host1p ~]$ asmcmd
ASMCMD> volinfo -a
Diskgroup Name: DGHOME
Volume Name: LVHOME
Volume Device: /dev/asm/lvhome-270
State: DISABLED
Size (MB): 10240
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /oracle/MYDB
ASMCMD> volenable -a
ASMCMD> volinfo -a
Diskgroup Name: DGHOME
Volume Name: LVHOME
Volume Device: /dev/asm/lvhome-270
State: ENABLED
Size (MB): 10240
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /oracle/MYDB

[root@db1host1p ~]$ $GRID_HOME/bin/crsctl stat res -t |grep acfs
ora.dghome.sephome.acfs
ONLINE ONLINE db1host1p mounted on /oracle/MYDB
ONLINE ONLINE db2host2p mounted on /oracle/MYDB

# As root, let’s mount it:
[root@db1host1p ~]# mount -t acfs /dev/asm/lvhome-270 /oracle/MYDB

# Then, with the $ORACLE_HOME available:
[oracle@db1host1p ~]$ srvctl start instance -d MYDB -i MYDB001

Matheus.

Rebuild all indexes of a Partioned Table

Another quick post!

Regarding you frequently need to collect all indexes of a partioned table (local and global indexes), this is a quick script that make the task a little bit easier:

begin
-- local indexes
for i in (select p.index_owner owner, p.index_name, p.partition_name
from dba_indexes i, dba_ind_partitions p
where i.owner='&OWNER'
and   i.table_name='&TABLE'
and   i.partitioned='YES'
and   i.visibility='VISIBLE' -- Rebuild only of the visible indexes, to get real effect :)
and   p.index_name=i.index_name
and   p.index_owner=i.owner
order by 1,2) loop
execute immediate 'alter index '||i.owner||'.'||i.index_name||' rebuild  partition '||i.partition_name||' online parallel 12'; -- parallel 12 solve most of the problems
execute immediate 'alter index '||i.owner||'.'||i.index_name||' parallel 1'; -- If you don't use parallel indexes in your database, or the default parallel of the index, or what you want...
end loop;
-- global indexes
for i in (select i.owner owner, i.index_name
from dba_indexes i
where i.owner='&OWNER'
and   i.table_name='&TABLE'
and   i.partitioned='NO'
and   i.visibility='VISIBLE' -- same comment
order by 1,2) loop
execute immediate 'alter index '||i.owner||'.'||i.index_name||' rebuild online parallel 12'; -- same
execute immediate 'alter index '||i.owner||'.'||i.index_name||' parallel 1'; -- same :)
end loop;
end;
/

I hope this script make your life easier. Hugs!

Matheus.

Ordering Sequences over RAC – Hang on ‘DFS lock handle’

Hi all!
Whats up?
I had a fun weekend. So, some things to write about. 🙂

This post is just to show an exerience with the event ‘DFS lock handle’, related to sequence ordering over the cluster nodes.

I started a process to make pk id’s adjustment, related to application’s number limitation (int 32 bits -> 4294967296), looking for move older entries and “release” some ids.
As a legacy of database unification, we have a lot of tables with the same name in different schemas, those use the same sequence for pk ids generation.
The readjustment involve a select of a sequence and is runned in a lot of parallel “sqlplus” call to optimize the time and fit to the business maintenance window.

When I started, I just used the global service name (dedicated connection) and the scanlistener. The result was distribuiting connections over the 5 nodes of the cluster. Bad idea.
In the first, I suspected about the concurrency by the sequence over different nodes (could occour if the node caches are too small), based on a few XA transaction bugs involving this event.

By the way, if you’re facing this hang with XA transactions, please take a look on “High rdbms ipc reply and DFS lock handle in 11gR2 RAC With XA Enabled Application (Doc ID 1361615.1)“.
It can be solved by setting “_clusterwide_global_transactions” to FALSE.
It’s recommendable, additionally, to read the Best Practices for Using XA with RAC.

Take a look on the blocking session over the cluster nodes:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- ----------------- -------------- ----------- ---------------- -----------------
9386 147 4 DFS lock handle 9zr9vpvmkqzkv VALID 10968 4
9499 179 4 DFS lock handle fqk9y9q7u2d5c VALID 11082 4
8821 153 4 DFS lock handle 2jd84taf7krh2 VALID 13902 4
22442 1155 3 DFS lock handle 8ycpfxq2jthq3 VALID 9067 3
9860 1339 3 DFS lock handle 2jmzv23ug9kth VALID 10299 3
9772 1529 3 DFS lock handle 802kn9htah6pt VALID 22442 3
22543 1673 5 DFS lock handle 6tgvwkt6cqngk VALID 3074 5
22307 135 5 DFS lock handle 5b3zgqgq7bbdz VALID 3665 5
21010 91 5 DFS lock handle gkmycubvn9aa3 VALID 3546 5
9508 1459 3 DFS lock handle 7cw6bcjsf8xf2 VALID 10387 3
10299 4669 3 DFS lock handle 7y2tnuckh37wp VALID 11795 3
121 139 5 DFS lock handle 6tgvwkt6cqngk VALID 3310 5
596 113 5 DFS lock handle 8yqbzu29shvnm VALID 2603 5
360 113 5 DFS lock handle dv49pafm9z8zy VALID 596 5
10740 3177 3 DFS lock handle c6q65hnq0ju7x VALID 11707 3
9838 181 4 DFS lock handle aqa7afq2upkuq VALID 9386 4
714 77 5 DFS lock handle ft8xzyzhycpn2 VALID 360 5
9951 147 4 DFS lock handle 697mts944db7y VALID 9725 4
950 109 5 DFS lock handle cd2gsz5rb2qw9 VALID 3192 5
10387 1529 3 DFS lock handle 2tqnrbh0x60dp VALID 12238 3
10064 143 4 DFS lock handle d833wg4u9cfyb VALID 10649 1
833 1503 5 DFS lock handle 7ynbg2t4taxha VALID 2366 5
10649 53 1 DFS lock handle 0sgzmj1tbx4rh VALID 10737 1
2249 149 5 DFS lock handle aa6jr8ugxaz4z VALID 833 5
9612 175 4 DFS lock handle d2nrr4gtdjq9b VALID 9499 4
10825 57 1 DFS lock handle acmyc4sw7zzc2 VALID 10649 1
2603 1415 5 DFS lock handle fg47vs5wa8zq8 VALID 22307 5
2485 65 5 DFS lock handle 702x9zwtfktu6 VALID 714 5
10737 55 1 DFS lock handle bthxrpmz0ug63 VALID 12148 3

Ok doke, let’s cancel the sessions and rerun the process just in node node (by SID). It should solve the small caches over cluster hang, without need to modify the sequence, right?

snailBeeep. Wrong:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- --------------- ------------- ----------- ---------------- -----------------
2494 53953 4 DFS lock handle fc3cam368zsp6 UNKNOWN
6561 32113 4 DFS lock handle f618p0hd4xsy0 UNKNOWN
9269 111 4 DFS lock handle fkn8hxbsfkfnz UNKNOWN
9047 175 4 DFS lock handle fqk9y9q7u2d5c VALID 8931 4
459 12605 4 DFS lock handle 5b3zgqgq7bbdz VALID 9271 4
1929 305 4 DFS lock handle 6tgvwkt6cqngk VALID 8026 4
7349 1013 4 DFS lock handle 802kn9htah6pt UNKNOWN
7800 175 4 DFS lock handle 0hc1bmqj1fp4f UNKNOWN
21475 17349 4 DFS lock handle cfh3r4sq788vu VALID 9042 4
8026 641 4 DFS lock handle 6tgvwkt6cqngk VALID 459 4
14919 59 4 DFS lock handle gkmycubvn9aa3 VALID 15373 4
15032 2267 4 DFS lock handle 9zr9vpvmkqzkv VALID 7688 4
15145 2411 4 DFS lock handle ddkqx4xttc9s9 UNKNOWN
15373 1657 4 DFS lock handle 2jd84taf7krh2 VALID 15713 4
8934 157 4 DFS lock handle 8ycpfxq2jthq3 VALID 1929 4
15826 551 4 DFS lock handle d8dhmr2sx08xq VALID 9612 4
15713 3357 4 DFS lock handle 2jmzv23ug9kth VALID 10177 4
8821 155 4 DFS lock handle 9fpmw9cwak21s UNKNOWN
16050 7007 4 DFS lock handle 4t5qkth35r2um VALID 8705 4
2042 1269 4 DFS lock handle 7cw6bcjsf8xf2 UNKNOWN

What a hell!
Lets take a look in one of the sqls to find the sequence…

proddb4> @sqlid 6tgvwkt6cqngk
UPDATE TABLE_XPTO SET RECNO = SEQ_OWNER.SEQ_NAME.NEXTVAL WHERE ROWID=:B1 EXTVAL WHERE ROWID=:B1

And what about the sequence configuration?

proddb4> @getddl sequence SEQ_OWNER SEQ_NAME
create sequence SEQ_OWNER.SEQ_NAME
minvalue 1
maxvalue 9999999999
start with 85669803
increment by 1
cache 120000
cycle
order;

ORDER!
Man, of course. It create a several control over the nodes just to keep the sequence in order, as explained in this post by Christo Kutrovsky.

To my situation, in the business maintenance window, it’s not an important constraint. So, lets disable the ordering:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME noorder;
Sequence altered.

Then, TAADÃÃ!

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
----- ---------- ------- ------------------------ ------------- ----------- ---------------- -----------------
15145 2411 4 library cache: mutex X ddkqx4xttc9s9 UNKNOWN
15032 2267 4 library cache: mutex X 9zr9vpvmkqzkv UNKNOWN
14919 59 4 library cache: mutex X gkmycubvn9aa3 NOT IN WAIT
9269 111 4 library cache: mutex X fkn8hxbsfkfnz UNKNOWN
9047 175 4 library cache: mutex X fqk9y9q7u2d5c UNKNOWN
8934 157 4 library cache: mutex X 8ycpfxq2jthq3 UNKNOWN
8821 155 4 library cache: mutex X 9fpmw9cwak21s UNKNOWN
8026 641 4 library cache: mutex X 6tgvwkt6cqngk UNKNOWN
7800 175 4 library cache: mutex X 0hc1bmqj1fp4f UNKNOWN
7349 1013 4 library cache: mutex X 802kn9htah6pt UNKNOWN
2042 1269 4 library cache: mutex X 7cw6bcjsf8xf2 UNKNOWN
9160 1205 4 library cache: mutex X 6tgvwkt6cqngk NOT IN WAIT
9042 293 4 library cache: mutex X 4jc1u6n2qx94z UNKNOWN
10177 5611 4 library cache: mutex X c6q65hnq0ju7x UNKNOWN
9271 235 4 library cache: mutex X d2nrr4gtdjq9b NOT IN WAIT
9951 1191 4 library cache: mutex X bkdhxhhqdbqb9 UNKNOWN
8931 291 4 library cache: mutex X 697mts944db7y UNKNOWN
9838 1315 4 library cache: mutex X 6q6r7ht1hnctg NOT IN WAIT
8818 325 4 library cache: mutex X 2tqnrbh0x60dp UNKNOWN

Of course we’re having some mutex x, but it’s a lot better then DFS lock, and the process just “go”. 🙂

After be done, to keep the configuration, let’s enable ordering again:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME order;
Sequence altered.

Matheus.

ASM: Adding disk “_DROPPED%” FORCE

Ok doke,
First let I make it clear: Adding a disk with force should be avoided, mainly by all the rebalance involved. The best choice, if you has “time”, is to just put disks online, like:

1) ALTER DISKGROUP ONLINE DISK ; or
2) ALTER DISKGROUP ONLINE DISKS IN FAILGROUP ; or
3) ALTER DISKGROUP ONLINE ALL;

But, the post is about adding back to DG the dropped disks.
Let’s imagine, to undestand my situation, you lost the contact with one of your two site storage… In this example, represented by failgroup FGAUX. You would see the disks like this:

SQL> select name,failgroup,state from v$asm_disk a where state <> 'NORMAL';

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
_DROPPED_0000_DGDATA FGAUX FORCING
_DROPPED_0001_DGDATA FGAUX FORCING
_DROPPED_0002_DGDATA FGAUX FORCING

So, you know your disks by the name pattern (0 are FGMAIN and 1 are FGAUX, the problematic). You can do something like:

[root@database-host ~]# /etc/init.d/oracleasm listdisks |grep DGDATA
DGDATA001
DGDATA002
DGDATA003
DGDATA101
DGDATA102
DGDATA103

Now, make the simple… 🙂

SQL> ALTER DISKGROUP DGDATA ADD
FAILGROUP FGAUX
DISK
'ORCL:DGDATA101' name DGDATA101 FORCE,
'ORCL:DGDATA102' name DGDATA102 FORCE,
'ORCL:DGDATA103' name DGDATA103 FORCE;

Diskgroup altered.

SQL> ALTER DISKGROUP DGDATA rebalance power 8;

Diskgroup altered.

While waiting the reball, let’s see the disks in DG:

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
3 REBAL WAIT 8
SQL> select name,failgroup,state from v$asm_disk a where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
_DROPPED_0000_DGDATA FGAUX FORCING
_DROPPED_0001_DGDATA FGAUX FORCING
_DROPPED_0002_DGDATA FGAUX FORCING
DGDATA101 FGAUX NORMAL
DGDATA102 FGAUX NORMAL
DGDATA103 FGAUX NORMAL
DGDATA001 FGMAIN NORMAL
DGDATA002 FGMAIN NORMAL
DGDATA003 FGMAIN NORMAL

And, when the rebalance end, the situation will be OK:

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
3 REBAL RUN 8 8 629 19087 10143 1

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

no rows selected

SQL> select name,failgroup,state from v$asm_disk a where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
DGDATA101 FGAUX NORMAL
DGDATA102 FGAUX NORMAL
DGDATA103 FGAUX NORMAL
DGDATA001 FGMAIN NORMAL
DGDATA002 FGMAIN NORMAL
DGDATA003 FGMAIN NORMAL

OK? Easy! 😀

Matheus.

Grepping Entries from Alert.log

Hey hey,
One more McGayver by me! Haha
Again to find some information in alert. This time, I’m looking to count and list all occurrences of an action in alert. To archive this, I made the script below.

grep-swiss-knife-590x295

The functionality is just a little bit more complex than the script of the last post, but stills quite simple. Take a look:

Parameters:
PAR1: name of alert (the main alert.log)
PAR2: Searched token
PAR3: Start day you want to, in the format “Mon dd” or just “Mon”. Below an example.
PAR4: Start Year (4 digits)
PAR5: [optional]End day you want to, in the format “Mon dd” or just “Mon”. The default value is “until now”.
PAR6: [optional]End Year (4 digits). The default value is “until now”. If you use the PAR5, you have to use PAR6.
PAR7: [optional] List All entries and when?. If you want to use this PAR, you must to use PAR5 and PAR6.

Examples (Looking for service reconfigurations):
Ex1: sh grep_entries_alert.sh alert_xxdb_1.log “services=” “Apr 12” 2015
(Seach between April 12 and now and count entries).
Ex2: sh grep_entries_alert.sh alert_xxdb_1.log “services=” “Apr 01” 2015 “May 30” 2015
(Seach between April 01 and May 30 and count the entries).
Ex3: sh grep_entries_alert.sh alert_xxdb_1.log “services=” “Apr 01” 2015 “May 30” 2015 LIST
(Seach between April 01 and May 30 and count the entries and list them all…)

# Script grep_entries_alert.sh
if [ $# -lt 6 ]; then
FIN=`cat $1 |wc -l`
else FIN=`cat $1 |grep -n $5 |grep $6$ |head -n 1 |cut -d':' -f1`
fi
BEG=`cat $1 |grep -n "$3" |grep $4$ |head -n 1 |cut -d':' -f1`
NMB=`expr $FIN - $BEG`
ENTR=`cat $1 |head -n $FIN |tail -$NMB| grep $2|wc -l`
echo Number of Entries: $ENTR >log.log
if [ $# -lt 7 ]; then
echo ------- Complete List Of Entries and When ---------- >> log.log
for line in `cat $1 |head -n $FIN |tail -$NMB| grep -n $2|cut -d':' -f1`;do
LR=`expr $line + $BEG` # To get "real line", without the displacement
DAT=`expr $LR - 1`     # To get line date of entry
echo awk \'NR==$DAT\' $1 >>aux.sh # Printing the lines just calculted
echo awk \'NR==$LR\' $1 >>aux.sh  # with aux.sh
done;
sh aux.sh >>log.log
fi
cat log.log

It’s not beautiful. But it works! 🙂

After that, there is the new blog sponsor:
MacGyver-macgyver-880400_200_228
(Hahahaha)

Matheus.

Grepping Alert by Day

Hi all,
For that moment when your alert is very big and some OS doesn’t “work very well with it” (in my case was using AIX), I jerry-ringged the shellscript bellow. It puts in a new log just the log entries of a selected day.

24 7 365

The call can be made with two or three parameters, this way:

Parameters:
PAR1:
name of alert (the main alert.log)
PAR2: Day you want to, in the format “Mon dd”. Below an example.
PAR3: [optional] desired year. The default is the current year. But is useful specially on the “new year” period…

Examples:
Ex1: sh grep_day.sh alert_xxdb_1.log “Apr 12”
Ex2: sh grep_day.sh alert_xxdb_1.log “Apr 12” 2014

Generated files:
dalert_2015Apr12.log
dalert_2014Apr12.log

# Script grep_day.sh
if [ $# -lt 3 ]; then
YEAR=`date +"%Y"`
else YEAR=$3
fi
DATEFORMAT=`echo $2|cut -d' ' –f1`""`echo $2|cut -d' ' –f2`
BEG=`cat $1 |grep -n "$2" |grep $YEAR |head -1 |cut -d':' -f1`
FIN=`cat $1 |grep -n "$2" | grep $YEAR |tail -1 |cut -d':' -f1`
NMB=`expr $FIN - $BEG`
cat $1 |head -$FIN |tail -$NMB > dalert_$YEAR$DATEFORMAT.log

Belive me! It can be useful…. haha

See ya!

Matheus.

Oracle Convert Number into Days, Hours, Minutes

There’s a little trick…
Today I had to convert a “number” of minutes into hours:minutes format. Something like convert 570 minutes in format hour:minutes. As you know, 570/60 is “9,5” and should be “9:30”.

Lets use 86399 seconds (23:59:59) as example:

I began testing “to_char(to_date)” functions:
boesing@db>select to_char(to_date(86399,'sssss'),'hh24:mi:ss') formated from dual;

FORMATED
——–
23:59:59

Ok, it works. But using “seconds past midnight” (sssss). By the way, it works between 0 and 86399 only:

boesing@db> select to_char(to_date(86400,'sssss'),'hh24:mi:ss') from dual;
select to_char(to_date(86400,'sssss'),'hh24:mi:ss') from dual
*
ERROR at line 1:
ORA-01853: seconds in day must be between 0 and 86399

The problem remains. How to use minutes in 3 digits (570 minutes -> 9:30), for example?
The best way I solve was:

--- Seconds in hours:minutes:seconds
--- If you comment the first "TO_CHAR" line, can be minutes in hours:minutes too..
select
TO_CHAR(TRUNC(vlr/3600),'FM9900') || ':' || -- hours
TO_CHAR(TRUNC(MOD(vlr,3600)/60),'FM00') || ':' || -- minutes
TO_CHAR(MOD(vlr,60),'FM00') -- second
from dual;

It always works. 🙂

boesing@db>select
2 TO_CHAR(TRUNC(86399/3600),'FM9900') || ':' || -- hours
3 TO_CHAR(TRUNC(MOD(86399,3600)/60),'FM00') || ':' || -- minutes
4 TO_CHAR(MOD(86399,60),'FM00') -- second
5 from dual;

TO_CHAR(TRUNC
————-
23:59:59

boesing@db>select
2 TO_CHAR(TRUNC(570/3600),’FM9900′) || ‘:’ || — hours
3 TO_CHAR(TRUNC(MOD(570,3600)/60),’FM00′) || ‘:’ || — minutes
4 TO_CHAR(MOD(570,60),’FM00′) — second
5 from dual;

TO_CHAR(TRUNC
————-
00:09:30

boesing@db>select
2 TO_CHAR(TRUNC(MOD(570,3600)/60),’FM00′) || ‘:’ || — hours
3 TO_CHAR(MOD(570,60),’FM00′) — minutes
4 from dual;

TO_CHAR
——-
09:30

Any better way? Leave a comment. Thanks!

Matheus.