High CPU usage by LMS and Node Evictions: Solved by Setting “_high_priority_processes”

Another thing that may help you in environments with highly interdependent applications:

Our env has high interconnect network block changing, and, as a consequence, high CPU usage by Global Cache Services (GCS)/Lock Manager Server Process (LMS).

This way, for each little latency in the interconnect interface, we were having a node eviction and all the impacts to the legacy application you can imagine (without gridlink or any solution to make the relocation ‘transparent’, as is usual to legacy application) and, of course, the business impact.

Oracle obviously suggested that we reduce the block concurrency over the cluster nodes grouping the application by affinity. But, it’s just no applicable to our env… 🙁

When nothing seemed to help, the workaround came from here: Top 5 Database and/or Instance Performance Issues in RAC Environment (Doc ID 1373500.1).

Here is our change:

boesing@proddb> alter system set "_high_priority_processes"='LMS*|LGWR|VKTM' scope=spfile sid='*';
System altered.

No magic, but the problem stopped to happen. After that, we’re having some warnings about clock synchronization over the cluster nodes on CRS alerts. Like this:

CRS-2409:The clock on host proddb1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization. Service is running in observer mode.

I believe it happens because VKTM lost priority. But it’s OK: The node evictions has stopped! 😀

Matheus.

VPD: “row cache objects” latch contention

The other day, we found high occurrence of latch events in our principal/core environment (11.2.0.3.0). The origins are all “different businesses channels” that access objects through the use of VPD. The latch events was bit by bit dominating the environment during the last months and turn on an “attention alarm” to us.

Then we found the the note: Bug 12772404 – Significant “row cache objects” latch contention when using VPD – superseded (Doc ID 12772404.8).

The situation is right the same:

“When VPD is used, intense row cache objects latch contention (dc_users) may caused by an internal Exempt Access Policy privilege check. Rediscovery Information: 
VPD is in use 
Significant “latch: row cache objects” waits occur
The waits are for the latch covering dc_users”

Take a look on the DC_USERS latches:
dc_users

And about the workaround:
“There is no direct workaround available.
The following guidelines may help to alleviate the problem :
– Dropping the database roles from our user:
The Number of Roles granted to user can increase the row cache
look-ups proportionally. When database is required to check whether
a system privilege is granted to User, it checks if that privilege
is granted to any of the User’s roles. Hence, it’s not helpful
to do something like “set role A, B, C, D, F …” to recreate its
environment for every execution.
– Changing the policy function might be helpful in some cases
eg: To use CONTEXT dependent policies instead of DYNAMIC policies”

Take a look in one of the examples of:

boesing@mydb4> /
P1RAW EVENT USERNAME SQL_ID SQL_CHILD_NUMBER LAST_CALL_ET SID SEQ# WAIT_TIME SECOND
--------- ---------------- ------ ------ ---------- ---------- ------
0700011807B50D08 latch: row cache objects CHANNELAPP 4nwvpx8xt3h3m 22 0 1276 59113 0
0700011807B50D08 latch: row cache objects CHANNELAPP fp3mft3usb74w 0 21719 16636 0
0700011807B50D08 latch: row cache objects CHANNELAPP 58pund2p09hgg 0 6774 11061 0
0700011807B50D08 latch: row cache objects OTHER_CHANNELAPP 54a2wfa60rgu1 1 0 8046 12386 0
0700011807B50D08 latch: row cache objects CHANNELAPP 1gwr69wduk9v4 42 0 9454 53927 0
0700011807B50D08 latch: row cache objects OTHER_CHANNELAPP 9pqrqqfzukrq4 68 0 9732 19311 0
0700011807B50D08 latch: row cache objects CHANNELAPP d1bnq8wb0nhrf 0 1 11425 56830 -1
0700011807B50D08 latch: row cache objects CHANNELAPP 32aqdd8cbmc4b 0 11711 39182 0
0700011807B50D08 latch: row cache objects IB_RUN adgnrpwazbfmz 0 12133 3372 0
0700011807B50D08 latch: row cache objects IB_RUN cqmgxvb78q9hy 0 17913 6345 0
0700011807B50D08 latch: row cache objects CHANNELAPP byzm159jbjxaa 0 6 19606 52624 0
0700011807B50D08 latch: row cache objects OTHER_CHANNELAPP 2kbjztd9yzqfm 61 0 20732 28687 0
0700011807B50D08 latch: row cache objects CHANNELAPP 6dvagdabts9nx 19 7 21011 504 0
0700011807B50D08 latch: row cache objects CHANNELAPP 9pqrqqfzukrq4 78 0 21439 19030 0
0700011807B50D08 latch: row cache objects CHANNELAPP gq1avu79h2np3 85 0 3815 33831 -1
boesing@mydb4>SELECT child# FROM v$latch_children WHERE addr= '0700011807B50D08';
CHILD#
----------
8
boesing@mydb4> select s.kqrstcln latch#, s.kqrstcid cache#, kqrsttxt name from x$kqrst s where s.kqrstcln=8;
LATCH# CACHE# NAME
---------- ---------- --------------------------------
8 10 dc_users
8 7 dc_users
8 7 dc_users
8 7 dc_users

The problem was definitively solved by applying the 11.2.0.4.2 PSU. No problems after that.
Good luck, if it’s your situation.

Hugs!
Matheus.

Adding ASM Disks on RHEL Cluster with Failgroups

# Recognizing as ASMDISK on ASM Libs (ORACLEASM):

1) All cluster nodes: /etc/init.d/oracleasm scandisk
[root@db1host1p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@db2host2p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

2) One of cluster nodes:
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA059 /dev/asmdsk/DGDATA059
Marking disk "DGDATA059" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA060 /dev/asmdsk/DGDATA060
Marking disk "DGDATA060" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA061 /dev/asmdsk/DGDATA061
Marking disk "DGDATA061" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA062 /dev/asmdsk/DGDATA062
Marking disk "DGDATA062" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA159 /dev/asmdsk/DGDATA159
Marking disk "DGDATA159" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA160 /dev/asmdsk/DGDATA160
Marking disk "DGDATA160" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA161 /dev/asmdsk/DGDATA161
Marking disk "DGDATA161" as an ASM disk: [ OK ]
[root@db1host1p ~]# /etc/init.d/oracleasm createdisk DGDATA162 /dev/asmdsk/DGDATA162
Marking disk "DGDATA162" as an ASM disk: [ OK ]

3) All cluster nodes: /etc/init.d/oracleasm scandisk
[root@db1host1p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@db2host2p ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

# Adding Disk on Diskgroup (sqlplus / as sysasm – ASM Instance)
1) Listing Failgroups
SQL> select distinct failgroup from v$asm_disk where group_number in (select group_number from v$asm_diskgroup where name='DGDATA');
FAILGROUP
----------------------------------------------------
FGMASTER
FGAUX

1) Adding Disks (naming and setting rebalance power):
SQL> alter diskgroup DGDATA
2 add failgroup FG01 disk
3 'ORCL:DGDATA059' name DGDATA059,
4 'ORCL:DGDATA060' name DGDATA060,
5 'ORCL:DGDATA061' name DGDATA061,
6 'ORCL:DGDATA062' name DGDATA062
7 add failgroup FG02 disk
8 'ORCL:DGDATA159' name DGDATA159,
9 'ORCL:DGDATA160' name DGDATA160,
10 'ORCL:DGDATA161' name DGDATA161,
11 'ORCL:DGDATA162' name DGDATA162
12 rebalance power 10 nowait;
Diskgroup altered

2) Be patient, and wait the rebalancing:
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----------- ---------- ---------- ---------- -----------
4 REBAL RUN 10 10 191386 540431 1651 211 5 REBAL WAIT 4
SQL> /
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ --------------- ----------------------------------------
4 REBAL RUN 10 10 443438 548118 2345 44 5 REBAL WAIT 4
SQL> /
no rows selected

Well done! 😀
Matheus.

Manually Mounting ACFS

A server rebooted and I needed to remount the ACFS where the Oracle Home is. About that:
Today’s post: Manually Mounting ACFS
Tomorrow’s Someday’s post: Kludge: Mounting ACFS Thought Shellscript
Day Before Tomorrow’s Another Day’s post: Auto Mounting Cluster Services Through Oracle Restart

But, first, some usefull links:
– ACFS Introduction
– ACFS Advanced
– ACFS Command-Line Utilities

# Manually Mounting ACFS
Checked my $ORACLE_HOME (mounted on ACFS) is not available to start the database. Checked ACFS service is down. So, let’s do all the process:

# Starting ACFS
[root@db1host1p ~]$ $GRID_HOME/bin/acfsload start -s

# Volumes OFFLINE: Let’s Enable it:
[root@db1host1p ~]$ $GRID_HOME/bin/crsctl stat res -t |grep acfs
ora.dghome.sephome.acfs
ONLINE OFFLINE db1host1p
[root@db1host1p ~]$ su - grid
[grid@db1host1p ~]$ asmcmd
ASMCMD> volinfo -a
Diskgroup Name: DGHOME
Volume Name: LVHOME
Volume Device: /dev/asm/lvhome-270
State: DISABLED
Size (MB): 10240
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /oracle/MYDB
ASMCMD> volenable -a
ASMCMD> volinfo -a
Diskgroup Name: DGHOME
Volume Name: LVHOME
Volume Device: /dev/asm/lvhome-270
State: ENABLED
Size (MB): 10240
Resize Unit (MB): 32
Redundancy: MIRROR
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /oracle/MYDB

[root@db1host1p ~]$ $GRID_HOME/bin/crsctl stat res -t |grep acfs
ora.dghome.sephome.acfs
ONLINE ONLINE db1host1p mounted on /oracle/MYDB
ONLINE ONLINE db2host2p mounted on /oracle/MYDB

# As root, let’s mount it:
[root@db1host1p ~]# mount -t acfs /dev/asm/lvhome-270 /oracle/MYDB

# Then, with the $ORACLE_HOME available:
[oracle@db1host1p ~]$ srvctl start instance -d MYDB -i MYDB001

Matheus.

Rebuild all indexes of a Partioned Table

Another quick post!

Regarding you frequently need to collect all indexes of a partioned table (local and global indexes), this is a quick script that make the task a little bit easier:

begin
-- local indexes
for i in (select p.index_owner owner, p.index_name, p.partition_name
from dba_indexes i, dba_ind_partitions p
where i.owner='&OWNER'
and   i.table_name='&TABLE'
and   i.partitioned='YES'
and   i.visibility='VISIBLE' -- Rebuild only of the visible indexes, to get real effect :)
and   p.index_name=i.index_name
and   p.index_owner=i.owner
order by 1,2) loop
execute immediate 'alter index '||i.owner||'.'||i.index_name||' rebuild  partition '||i.partition_name||' online parallel 12'; -- parallel 12 solve most of the problems
execute immediate 'alter index '||i.owner||'.'||i.index_name||' parallel 1'; -- If you don't use parallel indexes in your database, or the default parallel of the index, or what you want...
end loop;
-- global indexes
for i in (select i.owner owner, i.index_name
from dba_indexes i
where i.owner='&OWNER'
and   i.table_name='&TABLE'
and   i.partitioned='NO'
and   i.visibility='VISIBLE' -- same comment
order by 1,2) loop
execute immediate 'alter index '||i.owner||'.'||i.index_name||' rebuild online parallel 12'; -- same
execute immediate 'alter index '||i.owner||'.'||i.index_name||' parallel 1'; -- same :)
end loop;
end;
/

I hope this script make your life easier. Hugs!

Matheus.

How to Prevent Automatic Database Startup

This is a quick post! 😀
About Oracle Restart
Reference to SRVCTL

Ok!
In a nutshell, my notes:

To register the database, if not already registered:
> srvctl add database -d $DBNAME -o $ORACLE_HOME -p $ORACLE_HOME/dbs/spfile.ora -y manual

Once the database is registered, change the management policy for the database to manual:
> srvctl modify database -d $DBNAME -y manual

Matheus.

Service Detected on OEM but not in SRVCTL or SERVICE_NAMES Parameter?

Okey, it happens.
To me, after a database moving from a cluster to another. The service was registered by SRVCTL in the old cluster but is not needed. So, was not registered in the new cluster.
But OEM insists to list, for example, the “service3” as offline. The problem is that you can not remove it by SRVCTL, because you had not registered, right? See the example below:

Listing services:
srvdatabase1:/home/oracle>srvctl status service -d systemdb
Service service1_systemdb is running on nodes: srvdatabase1
Service service2 is running on nodes: srvdatabase1
Service service2_systemdb is running on nodes: srvdatabase1

In the service_name parameter:
srvdatabase1:/home/oracle>sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Jun 8 15:21:00 2015
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> show parameters service;
NAME                                 TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
service_names                        string
service2,test,systemdb

And the offline alarm goes to “service3“?
The easiest fix:

SQL> exec dbms_service.DELETE_SERVICE('service3');
PL/SQL procedure successfully completed.

Matheus.

Ordering Sequences over RAC – Hang on ‘DFS lock handle’

Hi all!
Whats up?
I had a fun weekend. So, some things to write about. 🙂

This post is just to show an exerience with the event ‘DFS lock handle’, related to sequence ordering over the cluster nodes.

I started a process to make pk id’s adjustment, related to application’s number limitation (int 32 bits -> 4294967296), looking for move older entries and “release” some ids.
As a legacy of database unification, we have a lot of tables with the same name in different schemas, those use the same sequence for pk ids generation.
The readjustment involve a select of a sequence and is runned in a lot of parallel “sqlplus” call to optimize the time and fit to the business maintenance window.

When I started, I just used the global service name (dedicated connection) and the scanlistener. The result was distribuiting connections over the 5 nodes of the cluster. Bad idea.
In the first, I suspected about the concurrency by the sequence over different nodes (could occour if the node caches are too small), based on a few XA transaction bugs involving this event.

By the way, if you’re facing this hang with XA transactions, please take a look on “High rdbms ipc reply and DFS lock handle in 11gR2 RAC With XA Enabled Application (Doc ID 1361615.1)“.
It can be solved by setting “_clusterwide_global_transactions” to FALSE.
It’s recommendable, additionally, to read the Best Practices for Using XA with RAC.

Take a look on the blocking session over the cluster nodes:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- ----------------- -------------- ----------- ---------------- -----------------
9386 147 4 DFS lock handle 9zr9vpvmkqzkv VALID 10968 4
9499 179 4 DFS lock handle fqk9y9q7u2d5c VALID 11082 4
8821 153 4 DFS lock handle 2jd84taf7krh2 VALID 13902 4
22442 1155 3 DFS lock handle 8ycpfxq2jthq3 VALID 9067 3
9860 1339 3 DFS lock handle 2jmzv23ug9kth VALID 10299 3
9772 1529 3 DFS lock handle 802kn9htah6pt VALID 22442 3
22543 1673 5 DFS lock handle 6tgvwkt6cqngk VALID 3074 5
22307 135 5 DFS lock handle 5b3zgqgq7bbdz VALID 3665 5
21010 91 5 DFS lock handle gkmycubvn9aa3 VALID 3546 5
9508 1459 3 DFS lock handle 7cw6bcjsf8xf2 VALID 10387 3
10299 4669 3 DFS lock handle 7y2tnuckh37wp VALID 11795 3
121 139 5 DFS lock handle 6tgvwkt6cqngk VALID 3310 5
596 113 5 DFS lock handle 8yqbzu29shvnm VALID 2603 5
360 113 5 DFS lock handle dv49pafm9z8zy VALID 596 5
10740 3177 3 DFS lock handle c6q65hnq0ju7x VALID 11707 3
9838 181 4 DFS lock handle aqa7afq2upkuq VALID 9386 4
714 77 5 DFS lock handle ft8xzyzhycpn2 VALID 360 5
9951 147 4 DFS lock handle 697mts944db7y VALID 9725 4
950 109 5 DFS lock handle cd2gsz5rb2qw9 VALID 3192 5
10387 1529 3 DFS lock handle 2tqnrbh0x60dp VALID 12238 3
10064 143 4 DFS lock handle d833wg4u9cfyb VALID 10649 1
833 1503 5 DFS lock handle 7ynbg2t4taxha VALID 2366 5
10649 53 1 DFS lock handle 0sgzmj1tbx4rh VALID 10737 1
2249 149 5 DFS lock handle aa6jr8ugxaz4z VALID 833 5
9612 175 4 DFS lock handle d2nrr4gtdjq9b VALID 9499 4
10825 57 1 DFS lock handle acmyc4sw7zzc2 VALID 10649 1
2603 1415 5 DFS lock handle fg47vs5wa8zq8 VALID 22307 5
2485 65 5 DFS lock handle 702x9zwtfktu6 VALID 714 5
10737 55 1 DFS lock handle bthxrpmz0ug63 VALID 12148 3

Ok doke, let’s cancel the sessions and rerun the process just in node node (by SID). It should solve the small caches over cluster hang, without need to modify the sequence, right?

snailBeeep. Wrong:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- --------------- ------------- ----------- ---------------- -----------------
2494 53953 4 DFS lock handle fc3cam368zsp6 UNKNOWN
6561 32113 4 DFS lock handle f618p0hd4xsy0 UNKNOWN
9269 111 4 DFS lock handle fkn8hxbsfkfnz UNKNOWN
9047 175 4 DFS lock handle fqk9y9q7u2d5c VALID 8931 4
459 12605 4 DFS lock handle 5b3zgqgq7bbdz VALID 9271 4
1929 305 4 DFS lock handle 6tgvwkt6cqngk VALID 8026 4
7349 1013 4 DFS lock handle 802kn9htah6pt UNKNOWN
7800 175 4 DFS lock handle 0hc1bmqj1fp4f UNKNOWN
21475 17349 4 DFS lock handle cfh3r4sq788vu VALID 9042 4
8026 641 4 DFS lock handle 6tgvwkt6cqngk VALID 459 4
14919 59 4 DFS lock handle gkmycubvn9aa3 VALID 15373 4
15032 2267 4 DFS lock handle 9zr9vpvmkqzkv VALID 7688 4
15145 2411 4 DFS lock handle ddkqx4xttc9s9 UNKNOWN
15373 1657 4 DFS lock handle 2jd84taf7krh2 VALID 15713 4
8934 157 4 DFS lock handle 8ycpfxq2jthq3 VALID 1929 4
15826 551 4 DFS lock handle d8dhmr2sx08xq VALID 9612 4
15713 3357 4 DFS lock handle 2jmzv23ug9kth VALID 10177 4
8821 155 4 DFS lock handle 9fpmw9cwak21s UNKNOWN
16050 7007 4 DFS lock handle 4t5qkth35r2um VALID 8705 4
2042 1269 4 DFS lock handle 7cw6bcjsf8xf2 UNKNOWN

What a hell!
Lets take a look in one of the sqls to find the sequence…

proddb4> @sqlid 6tgvwkt6cqngk
UPDATE TABLE_XPTO SET RECNO = SEQ_OWNER.SEQ_NAME.NEXTVAL WHERE ROWID=:B1 EXTVAL WHERE ROWID=:B1

And what about the sequence configuration?

proddb4> @getddl sequence SEQ_OWNER SEQ_NAME
create sequence SEQ_OWNER.SEQ_NAME
minvalue 1
maxvalue 9999999999
start with 85669803
increment by 1
cache 120000
cycle
order;

ORDER!
Man, of course. It create a several control over the nodes just to keep the sequence in order, as explained in this post by Christo Kutrovsky.

To my situation, in the business maintenance window, it’s not an important constraint. So, lets disable the ordering:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME noorder;
Sequence altered.

Then, TAADÃÃ!

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
----- ---------- ------- ------------------------ ------------- ----------- ---------------- -----------------
15145 2411 4 library cache: mutex X ddkqx4xttc9s9 UNKNOWN
15032 2267 4 library cache: mutex X 9zr9vpvmkqzkv UNKNOWN
14919 59 4 library cache: mutex X gkmycubvn9aa3 NOT IN WAIT
9269 111 4 library cache: mutex X fkn8hxbsfkfnz UNKNOWN
9047 175 4 library cache: mutex X fqk9y9q7u2d5c UNKNOWN
8934 157 4 library cache: mutex X 8ycpfxq2jthq3 UNKNOWN
8821 155 4 library cache: mutex X 9fpmw9cwak21s UNKNOWN
8026 641 4 library cache: mutex X 6tgvwkt6cqngk UNKNOWN
7800 175 4 library cache: mutex X 0hc1bmqj1fp4f UNKNOWN
7349 1013 4 library cache: mutex X 802kn9htah6pt UNKNOWN
2042 1269 4 library cache: mutex X 7cw6bcjsf8xf2 UNKNOWN
9160 1205 4 library cache: mutex X 6tgvwkt6cqngk NOT IN WAIT
9042 293 4 library cache: mutex X 4jc1u6n2qx94z UNKNOWN
10177 5611 4 library cache: mutex X c6q65hnq0ju7x UNKNOWN
9271 235 4 library cache: mutex X d2nrr4gtdjq9b NOT IN WAIT
9951 1191 4 library cache: mutex X bkdhxhhqdbqb9 UNKNOWN
8931 291 4 library cache: mutex X 697mts944db7y UNKNOWN
9838 1315 4 library cache: mutex X 6q6r7ht1hnctg NOT IN WAIT
8818 325 4 library cache: mutex X 2tqnrbh0x60dp UNKNOWN

Of course we’re having some mutex x, but it’s a lot better then DFS lock, and the process just “go”. 🙂

After be done, to keep the configuration, let’s enable ordering again:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME order;
Sequence altered.

Matheus.

ASM: Adding disk “_DROPPED%” FORCE

Ok doke,
First let I make it clear: Adding a disk with force should be avoided, mainly by all the rebalance involved. The best choice, if you has “time”, is to just put disks online, like:

1) ALTER DISKGROUP ONLINE DISK ; or
2) ALTER DISKGROUP ONLINE DISKS IN FAILGROUP ; or
3) ALTER DISKGROUP ONLINE ALL;

But, the post is about adding back to DG the dropped disks.
Let’s imagine, to undestand my situation, you lost the contact with one of your two site storage… In this example, represented by failgroup FGAUX. You would see the disks like this:

SQL> select name,failgroup,state from v$asm_disk a where state <> 'NORMAL';

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
_DROPPED_0000_DGDATA FGAUX FORCING
_DROPPED_0001_DGDATA FGAUX FORCING
_DROPPED_0002_DGDATA FGAUX FORCING

So, you know your disks by the name pattern (0 are FGMAIN and 1 are FGAUX, the problematic). You can do something like:

[root@database-host ~]# /etc/init.d/oracleasm listdisks |grep DGDATA
DGDATA001
DGDATA002
DGDATA003
DGDATA101
DGDATA102
DGDATA103

Now, make the simple… 🙂

SQL> ALTER DISKGROUP DGDATA ADD
FAILGROUP FGAUX
DISK
'ORCL:DGDATA101' name DGDATA101 FORCE,
'ORCL:DGDATA102' name DGDATA102 FORCE,
'ORCL:DGDATA103' name DGDATA103 FORCE;

Diskgroup altered.

SQL> ALTER DISKGROUP DGDATA rebalance power 8;

Diskgroup altered.

While waiting the reball, let’s see the disks in DG:

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
3 REBAL WAIT 8
SQL> select name,failgroup,state from v$asm_disk a where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
_DROPPED_0000_DGDATA FGAUX FORCING
_DROPPED_0001_DGDATA FGAUX FORCING
_DROPPED_0002_DGDATA FGAUX FORCING
DGDATA101 FGAUX NORMAL
DGDATA102 FGAUX NORMAL
DGDATA103 FGAUX NORMAL
DGDATA001 FGMAIN NORMAL
DGDATA002 FGMAIN NORMAL
DGDATA003 FGMAIN NORMAL

And, when the rebalance end, the situation will be OK:

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
3 REBAL RUN 8 8 629 19087 10143 1

SQL> select * from v$asm_operation where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

no rows selected

SQL> select name,failgroup,state from v$asm_disk a where group_number=(select group_number from v$asm_diskgroup where name='DGDATA');

NAME FAILGROUP STATE
------------------------------ ------------------------------ --------
DGDATA101 FGAUX NORMAL
DGDATA102 FGAUX NORMAL
DGDATA103 FGAUX NORMAL
DGDATA001 FGMAIN NORMAL
DGDATA002 FGMAIN NORMAL
DGDATA003 FGMAIN NORMAL

OK? Easy! 😀

Matheus.

Whats is the main characteristics/skills of a DBA? [part 2]

This is a second round about this post.

In the first part, I listed the top 8 “Personal Characteristics” of a DBA, to me and 7 other authors. Now, let’s do the same about the “Technical Characteristics”, or just “Skills”…

DBA

As I said in the first part of, it’s not an usual subject, so, identify and select these skills was not an easy task. Also as in the first part, I made my own list before the search. And I believe in something like that…

# Top 8 DBA Skills:
(By Matheus)

1) Database Architecture

Components, pools, events, basic flow and structure, reports, logs, traces, views, SQL concepts… In a nutshell, how to use a Database. 

2) Logic and some Programming Skills

Basic programming skills/logic is needed. Programming best practices, techniques and logics base in, at least, 2 different paradigms.

3) Infrastructure Knowledge

Components that surround or are basic services to DB : Networks, Computer Architecture / processors , Storage , Application Server operation , etc.

4) Data Structs and Database Design/Modeling

After all, it’s what database means: understand about structure of objects, partitioning, indexes, statistics, transactions, data modeling, manipulation tools and data migration.

5) Solid SQL and PL/SQL

As a base of data, you must know where data is, how to get it, how to manipulate it efficiently and aligned with good development practices. Isn’t it?

6) Database and SQL Performance Tuning

Wide and the hardest to archive in my opinion. For archive excellence on it, you’ll need all items above. Database operating knowledge, their pools, optimizers, available resources, development/programming and SQL to find gaps in the code, and infrastructure knowledge to understand possible external interference in the functioning of the DB.

7) Security and Oracle Support Interacting

Knowing good patching management practices, how to apply them, techniques and strategies for patch, upgrade and migration, openness and interaction on SRs, navigation Metalink, seraching and understanding of bugs, backup/recovery (DRs) management, access and possible security gaps in infrastructure, systems, or credentials.

8) Knows the Environment / Applications

To know the main applications of the company where you are, it implementation language, some business rules, the environmental behavior, most normal events, peak hours, the operation of legacy applications, people, company practices and their major gaps is essential.

What do you think about my list?
Here is the lists of 5 important references I found, just to have in mind:

# Burleson
– System analisys and design skills
– Database design skills
– Physical disk storage skills
– Databa Secutiry Skills
– Backup and Recovery Skills
– Change Control Management Skills
(http://www.dba-oracle.com/oracle_tips_dba_job_skills.htm)

# Craig Mullins
– Data modeling and database design
– Metadata managements and respository usage
– Database schema creationg and management
– Backup and recovery
– Ensuring data integrity
– Performance management and tunning
– Ensuring availability
– SQL code reviews and walk-thru
– Procedural skills
– Data security
– Capacity planning
– General database management
– General system managements and networking skills
– ERP and business knowledge
– Extensible datatype administration
– Web-specific technology expertise
– Storage management technics
(http://www.craigsmullins.com/dbta_085.htm)

# Mark Spenik and Orryn Sledge
– Knowledge of Structured Query Language (SQL)
– Sound database design
– General understanding of network architectures (for example, Client/Server, Internet/Intranet, Enterprise)
– Knowledge about the database itself
(http://www.developer.com/db/article.php/718491/What-Is-a-Database-Administrator.htm)

# Oracle
– Installing and upgrading the Oracle server and application tools
– Allocating system storage and planning future storage requirements for the database system
– Creating primary database storage structures (tablespaces) after application developers have designed an application
– Creating primary objects (tables, views, indexes) once application developers have designed an application
– Modifying the database structure, as necessary, from information given by application developers
– Enrolling users and maintaining system security
– Ensuring compliance with your Oracle license agreement
– Controlling and monitoring user access to the database
– Monitoring and optimizing the performance of the database
– Planning for backup and recovery of database information
– Maintaining archived data on tape
– Backing up and restoring the database
– Contacting Oracle Corporation for technical support
(http://www.developer.com/db/article.php/718491/What-Is-a-Database-Administrator.htm)

# Wikipedia
– Communication
– Database theory
– Database design
– Technology knowledgements
– SQL & PL/SQL
– Computing architectures
– Operating Systems
(http://en.wikipedia.org/wiki/Database_administrator)

What do YOU think about it?

Liked this post?
Leave a comment and feel free to share it!