Ordering Sequences over RAC – Hang on ‘DFS lock handle’

Hi all!
Whats up?
I had a fun weekend. So, some things to write about. 🙂

This post is just to show an exerience with the event ‘DFS lock handle’, related to sequence ordering over the cluster nodes.

I started a process to make pk id’s adjustment, related to application’s number limitation (int 32 bits -> 4294967296), looking for move older entries and “release” some ids.
As a legacy of database unification, we have a lot of tables with the same name in different schemas, those use the same sequence for pk ids generation.
The readjustment involve a select of a sequence and is runned in a lot of parallel “sqlplus” call to optimize the time and fit to the business maintenance window.

When I started, I just used the global service name (dedicated connection) and the scanlistener. The result was distribuiting connections over the 5 nodes of the cluster. Bad idea.
In the first, I suspected about the concurrency by the sequence over different nodes (could occour if the node caches are too small), based on a few XA transaction bugs involving this event.

By the way, if you’re facing this hang with XA transactions, please take a look on “High rdbms ipc reply and DFS lock handle in 11gR2 RAC With XA Enabled Application (Doc ID 1361615.1)“.
It can be solved by setting “_clusterwide_global_transactions” to FALSE.
It’s recommendable, additionally, to read the Best Practices for Using XA with RAC.

Take a look on the blocking session over the cluster nodes:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- ----------------- -------------- ----------- ---------------- -----------------
9386 147 4 DFS lock handle 9zr9vpvmkqzkv VALID 10968 4
9499 179 4 DFS lock handle fqk9y9q7u2d5c VALID 11082 4
8821 153 4 DFS lock handle 2jd84taf7krh2 VALID 13902 4
22442 1155 3 DFS lock handle 8ycpfxq2jthq3 VALID 9067 3
9860 1339 3 DFS lock handle 2jmzv23ug9kth VALID 10299 3
9772 1529 3 DFS lock handle 802kn9htah6pt VALID 22442 3
22543 1673 5 DFS lock handle 6tgvwkt6cqngk VALID 3074 5
22307 135 5 DFS lock handle 5b3zgqgq7bbdz VALID 3665 5
21010 91 5 DFS lock handle gkmycubvn9aa3 VALID 3546 5
9508 1459 3 DFS lock handle 7cw6bcjsf8xf2 VALID 10387 3
10299 4669 3 DFS lock handle 7y2tnuckh37wp VALID 11795 3
121 139 5 DFS lock handle 6tgvwkt6cqngk VALID 3310 5
596 113 5 DFS lock handle 8yqbzu29shvnm VALID 2603 5
360 113 5 DFS lock handle dv49pafm9z8zy VALID 596 5
10740 3177 3 DFS lock handle c6q65hnq0ju7x VALID 11707 3
9838 181 4 DFS lock handle aqa7afq2upkuq VALID 9386 4
714 77 5 DFS lock handle ft8xzyzhycpn2 VALID 360 5
9951 147 4 DFS lock handle 697mts944db7y VALID 9725 4
950 109 5 DFS lock handle cd2gsz5rb2qw9 VALID 3192 5
10387 1529 3 DFS lock handle 2tqnrbh0x60dp VALID 12238 3
10064 143 4 DFS lock handle d833wg4u9cfyb VALID 10649 1
833 1503 5 DFS lock handle 7ynbg2t4taxha VALID 2366 5
10649 53 1 DFS lock handle 0sgzmj1tbx4rh VALID 10737 1
2249 149 5 DFS lock handle aa6jr8ugxaz4z VALID 833 5
9612 175 4 DFS lock handle d2nrr4gtdjq9b VALID 9499 4
10825 57 1 DFS lock handle acmyc4sw7zzc2 VALID 10649 1
2603 1415 5 DFS lock handle fg47vs5wa8zq8 VALID 22307 5
2485 65 5 DFS lock handle 702x9zwtfktu6 VALID 714 5
10737 55 1 DFS lock handle bthxrpmz0ug63 VALID 12148 3

Ok doke, let’s cancel the sessions and rerun the process just in node node (by SID). It should solve the small caches over cluster hang, without need to modify the sequence, right?

snailBeeep. Wrong:

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
------ ---------- ---------- --------------- ------------- ----------- ---------------- -----------------
2494 53953 4 DFS lock handle fc3cam368zsp6 UNKNOWN
6561 32113 4 DFS lock handle f618p0hd4xsy0 UNKNOWN
9269 111 4 DFS lock handle fkn8hxbsfkfnz UNKNOWN
9047 175 4 DFS lock handle fqk9y9q7u2d5c VALID 8931 4
459 12605 4 DFS lock handle 5b3zgqgq7bbdz VALID 9271 4
1929 305 4 DFS lock handle 6tgvwkt6cqngk VALID 8026 4
7349 1013 4 DFS lock handle 802kn9htah6pt UNKNOWN
7800 175 4 DFS lock handle 0hc1bmqj1fp4f UNKNOWN
21475 17349 4 DFS lock handle cfh3r4sq788vu VALID 9042 4
8026 641 4 DFS lock handle 6tgvwkt6cqngk VALID 459 4
14919 59 4 DFS lock handle gkmycubvn9aa3 VALID 15373 4
15032 2267 4 DFS lock handle 9zr9vpvmkqzkv VALID 7688 4
15145 2411 4 DFS lock handle ddkqx4xttc9s9 UNKNOWN
15373 1657 4 DFS lock handle 2jd84taf7krh2 VALID 15713 4
8934 157 4 DFS lock handle 8ycpfxq2jthq3 VALID 1929 4
15826 551 4 DFS lock handle d8dhmr2sx08xq VALID 9612 4
15713 3357 4 DFS lock handle 2jmzv23ug9kth VALID 10177 4
8821 155 4 DFS lock handle 9fpmw9cwak21s UNKNOWN
16050 7007 4 DFS lock handle 4t5qkth35r2um VALID 8705 4
2042 1269 4 DFS lock handle 7cw6bcjsf8xf2 UNKNOWN

What a hell!
Lets take a look in one of the sqls to find the sequence…

proddb4> @sqlid 6tgvwkt6cqngk
UPDATE TABLE_XPTO SET RECNO = SEQ_OWNER.SEQ_NAME.NEXTVAL WHERE ROWID=:B1 EXTVAL WHERE ROWID=:B1

And what about the sequence configuration?

proddb4> @getddl sequence SEQ_OWNER SEQ_NAME
create sequence SEQ_OWNER.SEQ_NAME
minvalue 1
maxvalue 9999999999
start with 85669803
increment by 1
cache 120000
cycle
order;

ORDER!
Man, of course. It create a several control over the nodes just to keep the sequence in order, as explained in this post by Christo Kutrovsky.

To my situation, in the business maintenance window, it’s not an important constraint. So, lets disable the ordering:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME noorder;
Sequence altered.

Then, TAADÃÃ!

proddb4> @sess
User:MATHEUS
SID SERIAL# INST_ID EVENT SQL_ID BLOCKING_SE BLOCKING_SESSION BLOCKING_INSTANCE
----- ---------- ------- ------------------------ ------------- ----------- ---------------- -----------------
15145 2411 4 library cache: mutex X ddkqx4xttc9s9 UNKNOWN
15032 2267 4 library cache: mutex X 9zr9vpvmkqzkv UNKNOWN
14919 59 4 library cache: mutex X gkmycubvn9aa3 NOT IN WAIT
9269 111 4 library cache: mutex X fkn8hxbsfkfnz UNKNOWN
9047 175 4 library cache: mutex X fqk9y9q7u2d5c UNKNOWN
8934 157 4 library cache: mutex X 8ycpfxq2jthq3 UNKNOWN
8821 155 4 library cache: mutex X 9fpmw9cwak21s UNKNOWN
8026 641 4 library cache: mutex X 6tgvwkt6cqngk UNKNOWN
7800 175 4 library cache: mutex X 0hc1bmqj1fp4f UNKNOWN
7349 1013 4 library cache: mutex X 802kn9htah6pt UNKNOWN
2042 1269 4 library cache: mutex X 7cw6bcjsf8xf2 UNKNOWN
9160 1205 4 library cache: mutex X 6tgvwkt6cqngk NOT IN WAIT
9042 293 4 library cache: mutex X 4jc1u6n2qx94z UNKNOWN
10177 5611 4 library cache: mutex X c6q65hnq0ju7x UNKNOWN
9271 235 4 library cache: mutex X d2nrr4gtdjq9b NOT IN WAIT
9951 1191 4 library cache: mutex X bkdhxhhqdbqb9 UNKNOWN
8931 291 4 library cache: mutex X 697mts944db7y UNKNOWN
9838 1315 4 library cache: mutex X 6q6r7ht1hnctg NOT IN WAIT
8818 325 4 library cache: mutex X 2tqnrbh0x60dp UNKNOWN

Of course we’re having some mutex x, but it’s a lot better then DFS lock, and the process just “go”. 🙂

After be done, to keep the configuration, let’s enable ordering again:

proddb4> alter sequence SEQ_OWNER.SEQ_NAME order;
Sequence altered.

Matheus.

ORA-24777: use of non-migratable database link not allowed

Hoje reencontrei este erro e, como de costume, já havia esquecido qual era a solução. Por isso fica aqui minha anotação. 🙂
Tive que puxar mais da memória do que qualquer outra coisa, pois todos os links da primeira página do Google tinham apenas o stack do erro mas sem solução. Que na realidade é bem simples:

O erro ocorre devido sessão utilizando shared connection (XA) utilizar DBLink com conexão dedicated. O uso de XA no JDBC é uma configuração do datasource, caso você esteja utilizando weblogic, jboss, tomcat ou outro com suporte a java.
A solução que adotei até hoje consiste setar o “host” do DBLink ou a entrada de TNS utilizada pelo DBLink no arquivo tnsnames do BD de origem para utilizar tipo de conexão “shared” para conexão remota.

Ex Antes:  (DESCRIPTION= (ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=))) (CONNECT_DATA= (SERVICE_NAME=<serviço>) (SERVER=DEDICATED)))
Ex Depois:  (DESCRIPTION= (ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=))) (CONNECT_DATA= (SERVICE_NAME=<serviço>) (SERVER=SHARED)))

Segundo o stackoverflow, pode ser resolvido criando o database link como shared.
Ex: CREATE SHARED DATABASE LINK CONNECT TO bob IDENTIFIED  BY “MyNewPasswd1” AUTHENTICATED BY jim IDENTIFIED BY “JimsPass23” USING ‘DB01’;

Na realidade a segunda solução me pareceu muito mais elegante. Mas só percebi que existia depois de já ter resolvido pela primeira abordagem.
Se acontecer novamente, testo da segunda forma e comento aqui. Prometo. 😉

Ah!
Acho que não é necessario mencionar que o BD destino do DBLink precisa estar escutando em algum serviço com suporte a XA, né?
Não é o assunto deste post, então não vou entrar em detalhes. Mas basicamente você precisa setar para um valor diferente de zero os parâmetros: max_shared_servers; shared_servers; max_dispatchers e dispatchers e subir o serviço do dispartcher no listener. Taca no Google aí que essa parte é barbada!

Abraço!
Matheus.