RMAN-06059: expected archived log not found

# Error

RMAN-03002: failure of backup command at 06/28/2015 14:56:30
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file +DGFRA/corpdb/archivelog/2015_06_27/thread_1_seq_20198.1192.883524615
ORA-17503: ksfdopn:2 Failed to open file +DGFRA/corpdb/archivelog/2015_06_27/thread_1_seq_20198.1192.883524615
ORA-15012: ASM file '+DGFRA/corpdb/archivelog/2015_06_27/thread_1_seq_20198.1192.883524615' does not exist

# Solution
First of all, you need to know which files exists or not:

RMAN> CROSSCHECK ARCHIVELOG ALL;

Then, clear the missing and run another backup.

RMAN> DELETE EXPIRED ARCHIVELOG ALL;

It’s hardly recommended that you make a full backup after that, to ensure you have a recoverable state.

Matheus.

RHEL: Adding User/Group to SSH and SUDOERS file

Some Linux basics… To add a group or a user (this case “new_group”) to the ssh and sudoers file:

[root@db-server ~]# vi /etc/ssh/sshd_config
[root@db-server ~]# cat /etc/ssh/sshd_config |grep new_group
AllowGroups ssh_group1 root oinstall linux new_group
[root@db-server ~]# vi /etc/sudoers
[root@db-server ~]# cat /etc/sudoers |grep new_group
%new_group      ALL=(ALL)       PASSWD: ALL
[root@db-server ~]# service sshd restart
Stopping sshd:                                             [  OK  ]
Starting sshd:                                             [  OK  ]
[root@db-server ~]#

Matheus.

ORA-00845: MEMORY_TARGET not supported on this system (RHEL)

# Solution:
Make sure that /dev/shm is mounted. You can check this by typing df -k at the command prompt. It will look something like this:

Filesystem Size Used Avail Use% Mounted on

shmfs 1G 512M 512M 50% /dev/shm

If you don’t find it then you will have to manually mount it as root user. The size should be more than MEMORY_TARGET or MEMORY_MAX_TARGET.

For example, if the MEMORY_TARGET is less than 2 GB, you should make like that:

#root: mount -t tmpfs shmfs -o size=2048m /dev/shm

I recommend you add an entry in /etc/fstab so that the mount remains persistent even after a reboot.
To make it, add the following entry in /etc/fstab:

shmfs /dev/shm tmpfs size=2048m 0 0

Helped?
Share this post!

Matheus.

ORA-01548: active rollback segment found, terminate

# Problem

SQL> drop tablespace UNDOTBS1;
drop tablespace UNDOTBS1
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU10_1251904955$' found, terminate dropping tablespace
SQL> drop rollback segment "_SYSSMU3_1251904955$";
Rollback segment dropped.
SQL> drop tablespace UNDOTBS1;
drop tablespace UNDOTBS1
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU10_1251904955$' found, terminate dropping tablespace


# Solution

CREATE ROLLBACK SEGMENT rb1 STORAGE(INITIAL 1M next 1M minextents 20) tablespace UNDOTBS5;
CREATE ROLLBACK SEGMENT rb2 STORAGE(INITIAL 1M next 1M minextents 20) tablespace UNDOTBS5;
CREATE ROLLBACK SEGMENT rb3 STORAGE(INITIAL 1M next 1M minextents 20) tablespace UNDOTBS5;

# Why?
The UNDO_MANAGEMENT is set as ‘MANUAL’, right? To drop any undo the default UNDO must have at least one segment.

Matheus.

GoldenGate: RAC One Node Archivelog Missing

The situation:

We have a GoldenGate on Allow Mode running some extracts on RAC One Node Database (reading the archivelogs). And then, suddenly, the instance crashes (network lost contact to the server) and the other instance (thread) was auto started by CRS. To the database no problems: The other node redologs was used during the startup recover and every thing is ok.

The application running with Weblogic serverpool and gridlink just had a little contention and continued the operation thought the started instance. The Goldengate switch was manually made, but some sequences was lost. What we found? the sequences was in the old thread’s redologfiles. It should be backed up if fast_start_mttr_target was different to zero. Buuut, the world is not so beautiful:

raconenodedb> show parameters mttr
NAME TYPE VALUE
------------------------------------
fast_start_mttr_target integer 0

How we solved?
Simple solution: identified the group/thread and made a cp from ASM. The copied redolog was used as archivelog on goldengate and everything was ok.

raconenodedb> select sequence#,group#,thread# from v$log where thread#=2 order by 1;
SEQUENCE# GROUP# THREAD#
---------- ---------- ----------
39636 6 2
39637 7 2
39638 8 2
39639 9 2
39640 10 2
ASMCMD> cp group_10.288.859482805 /oracle/grup10_thread2
copying +DGDATA/MYDB/ONLINELOG/group_10.288.859482805 -> /oracle/grup10_thread2

Easy like that.

Matheus.

Error: Starting ACFS in RHEL 6 (Can’t exec “/usr/bin/lsb_release”)

Quick tip:

# Error:
[root@db1gridserver1 bin]# ./acfsload start -s
Can’t exec “/usr/bin/lsb_release”: No such file or directory at /grid/product/11.2.0/lib/osds_acfslib.pm line 511.
Use of uninitialized value $LSB_RELEASE in split at /grid/product/11.2.0/lib/osds_acfslib.pm line 516.

# Solution:
[root@db1gridserver1 bin]# yum install redhat-lsb-core-4.0

Note: Bug 17359415 – Linux: Configuring ACFS reports that cannot execute ‘/usr/bin/lsb_release’ (Doc ID 17359415.8)

Matheus.

Getting Oracle Parameters: Hidden and Unhidden

Today’s post is a quick post!
Very quick post! very very quick post!
But it’s a helpful post!

Connected as sys with sysdba:

select x.ksppinm name,
ksppdesc description,
y.kspftctxvl value,
y.kspftctxdf isdefault,
decode(bitand(y.kspftctxvf, 7), 1,'MODIFIED',4,'SYSTEM_MOD','FALSE') ismod, decode(bitand(y.kspftctxvf, 2), 2, 'TRUE', 'FALSE') isadj
from sys.x$ksppi x, sys.x$ksppcv2 y
where x.inst_id = userenv('Instance')
and y.inst_id = userenv('Instance')
and x.indx + 1 = y.kspftctxpn
order by name;

Matheus.

Application Hangs: resmgr:become active

Application APP hangs with resmgr:become active. There is a resource plan defined who has a specific group to this Application. What is wrong and how to fix?

Here I presume you what is a resource manager and a resource plan. And, of course, for what purpose they exists. You must to know that this event is related to high active sessions in the group of resource plan too.

Before everything else, please understand if this is an acceptable behavior of the application. Then, in which resource group the sessions in this event are. The are other application in this same group with an unacceptable behavior? Yes? So, fix it.
No? Consider tho adjust the resource plan, switch the application to a new group, or, like in my case, remap the Application APP to the right group… ¬¬

To make it clear: In my case, the mapping is missing, so the schema MYAPP (Application APP) fit to OTHER_GROUP, where we use to set minimal limits:

SID SERIAL# INST_ID USERNAME RESOURCE_CONSUMER_GROUP EVENT
----- ---------- ---------- ------------------------------ -----------
492 29459 2 MYAPP OTHER_GROUPS resmgr:become active
1102 19145 2 MYAPP OTHER_GROUPS resmgr:become active
955 33161 2 MYAPP OTHER_GROUPS resmgr:become active
1084 33839 2 MYAPP OTHER_GROUPS db file sequential read
MYDB> show parameters resource_manager_plan
NAME TYPE VALUE
--------------------- ------ --------------
resource_manager_plan string MYDB_PLAN
MYDB> select group_or_subplan, active_sess_pool_p1, cpu_p1, cpu_p2, cpu_p3, cpu_p4 from DBA_RSRC_PLAN_DIRECTIVES where plan = 'MYDB_PLAN'
Enter value for plano: MYDB_PLAN
GROUP_OR_SUBPLAN ACTIVE_SESS_POOL_P1 CPU_P1 CPU_P2 CPU_P3 CPU_P4
------------------------------ ------------------- ---------- ---------- ---------- ----------
BATCH_GROUP 60 0 10 0 0
SYS_GROUP 80 0 0 0
APP_PLAN 20 0 30 0 0
OTHER_GROUPS 20 0 20 0 0
GGATE_GROUP 0 10 0 0
PAYTRUE_GROUP 40 0 30 0 0
DBA_GROUP 20 0 0 0

You can configure the mapping by user like that:

BEGIN
DBMS_RESOURCE_MANAGER.clear_pending_area;
DBMS_RESOURCE_MANAGER.create_pending_area;
DBMS_RESOURCE_MANAGER.set_consumer_group_mapping (
attribute => DBMS_RESOURCE_MANAGER.oracle_user,
-- DBMS_RESOURCE_MANAGER.service_name (or a lot of possibilities. Google it!)
value => 'MYAPP',
consumer_group => 'APP_PLAN');
DBMS_RESOURCE_MANAGER.validate_pending_area;
DBMS_RESOURCE_MANAGER.submit_pending_area;
END;
/

To switch the connected sessions, it can be done like:

SELECT 'EXEC DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS ('''||SID||''','''||SERIAL#||''',''APP_PLAN'');' FROM V$SESSION where username='MYAPP'
and RESOURCE_CONSUMER_GROUP='OTHER_GROUPS';

Remember that creating a resource plan without making the mappings is a bit pointless… 😛

Matheus.

Application Looping Until Lock a Row with NOWAIT Clause

Yesterday I treated an interesting situation:
A BATCH stayed on “SQL*Net message from client” event but the last_call_et was always on 1 or 0. Seems OK, with some client contention to send the commands to the DBMS, right? Nope.

It was caused by a loop in the application code “waiting” for a row lock but without “DBMS waiting events” (something like “select * from table for update nowait”). Take a look in how it was identified below.

First the session with no SQL_ID, no wait events and last_Call_et=0 of a “BATH_PROCESS” user:

proddb2> @sid
Sid:9796
Inst:
LAST_CALL_ET SQL_ID   EVENT STATUS SID SERIAL# INST_ID USERNAME
------------ ------- ------------- ---------- ------------------------
0 SQL*Net message from client INACTIVE 9796 45117 2 BATCH_PROCESS
proddb2> @trace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

As you see, with no idea about what is happening, I started a trace. The trace was stuck with this:

*** 2015-06-15 14:03:25.755
WAIT #4574470448: nam='SQL*Net message from client' ela=993072 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326636999
CLOSE #4574470448:c=10,e=15,dep=0,type=3,tim=12833326637228
PARSE #4574470448:c=25,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326637286
BINDS #4574470448:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=1000000 frm=00 csi=00 siz=24 off=0
kxsbbbfp=110a8d0d8 bln=22 avl=05 flg=05
value=5022011
WAIT #4574470448: nam='gc cr block 2-way' ela= 709 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326638533
WAIT #4574470448: nam='gc cr block 2-way' ela= 541 p1=3 p2=2088264 p3=4367 obj#=0 tim=12833326639352
WAIT #4574470448: nam='gc cr block 2-way' ela= 651 p1=442 p2=5944 p3=8483 obj#=0 tim=12833326641673
WAIT #4574470448: nam='enq: TX - row lock contention' ela= 1093 name|mode=1415053318 usn<obj#=23141074 tim=12833326643029
EXEC #4574470448:c=1776,e=5836,p=0,cr=117,cu=1,mis=0,r=0,dep=0,og=1,plh=1139820409,tim=12833326643150
ERROR #4574470448:err=54 tim=12833326643172
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 9 driver id=1413697536 break?=1 p3=0 obj#=23141074 tim=12833326643373
WAIT #4574470448: nam='SQL*Net break/reset to client' ela= 503 driver id=1413697536 break?=0 p3=0 obj#=23141074 tim=12833326643891
WAIT #4574470448: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833326643915

AHÁ!
Did you see the “err=54” there? Yes. You know this error:

ORA-00054: Resource busy and acquire with NOWAIT specified

It’s caused by a SELECT FOR UPDATE NOWAIT in the code.
But, this select is in a loop, so the session don’t go ahead until have it.
(Obviously it could be coded with some treatment/better logic for this loop and errors, buuuut…)

What can we do now?
The easy way is to discover the holding session and kill it.
And sometimes the easy way is the best way. 😉

For that, we use the “obj#” and “value”, also bolded in the trace.
As I know the application, I know that the used field in all “where clauses” is the “RECNO” column. But if you don’t, it’s needed to discover. With this information in mind:

proddb2>select * from dba_objects where object_id='23141074';
OWNER OBJECT_NAME
------------------------------ ----------------
OWNER_EXAMPLE TABLE_XPTO
proddb2> select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011;
COL_KEY FSAMED0 FSAMED1 FSMNEG1 FSMNEG2 FSMNEG3 COL_DATE RECNO
------- ---------- ---------- ---------- ---------- ---------- -----
1002974 0 0 -516.8 0 0 15/06/2015 00:00:00 5022011

Ok, I know the row that is holded by the other session.
Let’s discover which session is causing a lock by myself (but in my case, without “NOWAIT” clause, to have time to find the holder):

proddb5>select * from OWNER_EXAMPLE.TABLE_XPTO WHERE recno=5022011 for update;

In another sqlplus session:

proddb2> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- ----------------------
5 14174 479 MATHEUS_BOESING enq:TX - row lock contention VALID 11006 1
2 4233 12879 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 15410 7697 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT

AHÁ again!
The SID 11006. Let’s see who is there:

proddb2> @sid
Sid:11006
Inst:
SQL_ID SEQ# EVENT STATUS SID SERIAL# INST_ID USERNAME
-------------------- ---------- --------------------------------------
9jzm6vn5j06js 24919 enq: TX - row lock contention ACTIVE 11006 44627 1 DBLINK_OTHER_BATCH_SCHEMA

Ok, it’s another session of a different batch process in a remote database holding this row. As it’s less relevant, lets kill! Muahaha!
Then, you’ll see, my session get the lock and is in the middle of a transaction:

proddb1> @kill
***
sid : 11006
serial : 44627
***
System altered.
***
proddb1> @me
INST_ID SID SERIAL# USERNAME EVENT BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------- ---------- ---------- --------------- --------------------
5 14174 479 MATHEUS_BOESING transaction UNKNOWN
2 4332 56037 MATHEUS_BOESING PX Deq: Execution Msg NOT IN WAIT
1 12058 9 MATHEUS_BOESING class slave wait NO HOLDER

To release the “row locked” to my principal process, lets suicide (kill my own session, this case, that is holding the row lock right now).

proddb5> @kill
***
sid : 14174
serial : 479
***
System altered.
***

After kill all the holding sessions, my BATCH_PROCESS just gone! 😀
Take a look on the trace (running ok):

WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531019
FETCH #4576933904:c=45,e=71,p=0,cr=3,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531062
WAIT #4576933904: nam='SQL*Net message from client' ela= 562 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531654
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981531788
FETCH #4576933904:c=55,e=86,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981531826
WAIT #4576933904: nam='SQL*Net message from client' ela= 715 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532576
WAIT #4576933904: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981532721
FETCH #4576933904:c=61,e=96,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981532758
WAIT #4576933904: nam='SQL*Net message from client' ela= 600 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981533617
WAIT #4576933904: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534163
FETCH #4576933904:c=52,e=82,p=0,cr=2,cu=0,mis=0,r=5,dep=0,og=1,plh=419358542,tim=12833981534203
WAIT #4576933904: nam='SQL*Net message from client' ela= 517 driver id=1413697536 #bytes=1 p3=0 obj#=23141074 tim=12833981534752

Now, with the problem solved, lets disable the trace and continue the other daily tasks… 🙂

proddb2> @untrace
Enter value for sid: 9796
Enter value for serial: 45117
PL/SQL procedure successfully completed.

I hope it was useful!
If helped you, make a comment! 😀

See ya!
Matheus.