HANGANALYZE Part 1

Hi all!
I realized I have some posts about database hangs but none about hanganalyze, system state or ashdump usage. So let’s fix it. 🙂
To organize the ideas, I’m going to split the subject into three posts. This first one is about hanganalyze.

See the second part of this post here: HANGANALYZE Part 2.

OK, so let me quote the clearest description from Oracle that I could find:
“Hanganalyze tries to work out who is waiting for who by building wait chains, and then depending on the level will request various processes to dump their errorstack.”

This is very similar to what we can do manually through v$wait_chains, but it is quicker and ‘official’, so let’s use it! 😀

But before I show how to do it, it’s important to mention that Oracle does not recommend using ‘numeric events’ without an SR (MOS), according to Note 75713.1.

So, how do you do it? There are basically two ways, where LL is the level (explained below):

1) ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level LL';
   or, as an event setting: EVENT="60 trace name HANGANALYZE level 5"
2) ORADEBUG hanganalyze LL


I prefer to use ORADEBUG on the database server if possible, given that you are already dealing with a hang:

sqlplus / as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug hanganalyze LL

For example, connected with sqlplus / as sysdba:

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /db/oracle/diag/rdbms/greporadb/greporadb/trace/greporadb_ora_2096.trc

What does this ‘level’ mean?

Level | Description                                             | Comment
1     | Very minimal output                                     | Could be useful…
2     | Minimal output                                          | Useful for some cases…
3     | Dump only processes thought to be in a hang             | Most common level
4     | Dump leaf nodes in wait chains                          | You really need this info?
5     | Dump all processes involved in wait chains              | Can be a lot!
6     | Dump errorstacks of processes involved in wait chains   | Can be high overhead
10    | Dump all processes                                      | Not a good idea…

But take care! Using too high a level will cause lots of processes to be asked to dump their stacks. This can be very expensive…
In summary, remember Note 75713.1!

And if you have RAC?

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all 
SQL> oradebug -g def hanganalyze LL

OR

SQL> oradebug setmypid
SQL> oradebug unlimit 
SQL> oradebug -g all hanganalyze LL
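
If you want to keep these sequences handy in a runbook, here is a minimal Python sketch that just builds the command text shown above. The function name `hanganalyze_script` and the idea of rejecting levels that are not in the level table are my own assumptions, not something Oracle provides. It only produces text; you still paste it into sqlplus / as sysdba yourself.

```python
# Levels listed in the level table of this post (restricting to them is an assumption).
VALID_LEVELS = {1, 2, 3, 4, 5, 6, 10}


def hanganalyze_script(level, rac=False):
    """Return the oradebug command sequence from this post as plain text.

    level: hanganalyze level (LL in the post)
    rac:   if True, use the cluster-wide '-g all' form shown for RAC
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"level {level} is not in the level table")
    lines = ["oradebug setmypid", "oradebug unlimit"]
    if rac:
        lines.append(f"oradebug -g all hanganalyze {level}")
    else:
        lines.append(f"oradebug hanganalyze {level}")
    return "\n".join(lines)


# Example: single-instance and RAC variants for the most common level.
print(hanganalyze_script(3))
print(hanganalyze_script(3, rac=True))
```

Refusing unknown levels is just a guard against typos at 3 a.m.; drop it if you need a level that is not in the table.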

Nice, and what does hanganalyze output look like?
Here is an example of output from Oracle:

    ==============
    HANG ANALYSIS:
    ==============
    Open chains found:
   >>   This process (below) is running
    Chain 1 :  :
        
   >>   Below is a wait chain. Sid 16 waits for Sid 17
    Chain 2 :  :
        
     -- 
    Other chains found:
    Chain 3 :  :
        
    Extra information that will be dumped at higher levels:
   >> This just shows which nodes would be dumped at each level
    [level  4] :   2 node dumps -- [LEAF] [LEAF_NW] [IGN_DMP]
    [level  5] :   2 node dumps -- [NLEAF]
    [level 10] :  10 node dumps -- [IGN]
    
    State of nodes
   >> All nodes are listed below. The "state" column shows the state
   >> that the session is in
    ([nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor):
   >> The first nodes are IGN (ignore)
    [0]/1/1/0x826f94c0/IGN/1/2//none
    [1]/2/1/0x826f9d2c/IGN/3/4//none
    [2]/3/1/0x826fa598/IGN/5/6//none
    [3]/4/1/0x826fae04/IGN/7/8//none
    [4]/5/1/0x826fb670/IGN/9/10//none
    [5]/6/1/0x826fbedc/IGN/11/12//none
    [6]/7/1049/0x826fc748/IGN/13/14//none
    [7]/8/1049/0x826fcfb4/IGN/15/16//none
    [8]/9/1049/0x826fd820/IGN/17/18//none
    [9]/10/1049/0x826fe08c/IGN/19/20//none
   >> Below are LEAF nodes in various states
    [12]/13/158/0x826ff9d0/LEAF_NW/21/22//none
    [15]/16/416/0x82701314/NLEAF/23/26/[16]/none
    [16]/17/941/0x82701b80/LEAF/24/25//15
    [17]/18/344/0x827023ec/NLEAF/27/28/[16]/none
   >> You are told which processes are being dumped
   >> They will dump errorstacks to their own trace files
    Dumping System_State and Fixed_SGA in process with ospid 18668
    Dumping Process information for process with ospid 18656
    Dumping Process information for process with ospid 18658
    ...
    ================================
    PROCESS DUMP FROM HANG ANALYZER:
    ================================
   >> This process dumps its errorstack and processstate. 
   >> See  for details of this information
    ----- Call Stack Trace -----
    calling              call     entry             
     ...
    ======================================
    END OF PROCESS DUMP FROM HANG ANALYZER
    ======================================
    ====================
    END OF HANG ANALYSIS
    ====================

And what about the node states?

State   | Meaning
IGN     | Ignore
LEAF    | A waiting leaf node
LEAF_NW | A running (using CPU?) leaf node
NLEAF   | An element in a chain but not at the end (not a leaf)
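
To make the "State of nodes" section easier to read, here is a small Python sketch that parses those lines and prints who waits for whom. It assumes the nine-field layout shown in the header line of the example output, and it assumes that each number in [adjlist] is the node the current node is waiting on (which matches "Sid 16 waits for Sid 17" in the example). The names `Node`, `parse_nodes` and `wait_edges` are hypothetical helpers of mine, and real trace files may differ between versions.

```python
import re
from typing import NamedTuple


class Node(NamedTuple):
    nodenum: int
    sid: int
    serial: int
    state: str      # IGN, LEAF, LEAF_NW or NLEAF
    adjlist: tuple  # node numbers this node is waiting on (assumed meaning)


def parse_nodes(trace_text):
    """Return {nodenum: Node} for every 'State of nodes' line in the text."""
    nodes = {}
    for line in trace_text.splitlines():
        fields = line.strip().split("/")
        # A node line has exactly 9 fields and starts with "[<number>]".
        if len(fields) != 9 or not re.fullmatch(r"\[\d+\]", fields[0]):
            continue
        nodenum = int(fields[0][1:-1])
        adjlist = tuple(int(n) for n in re.findall(r"\[(\d+)\]", fields[7]))
        nodes[nodenum] = Node(nodenum, int(fields[1]), int(fields[2]),
                              fields[4], adjlist)
    return nodes


def wait_edges(nodes):
    """Return (waiter_sid, blocker_sid) pairs derived from the adjacency lists."""
    edges = []
    for node in nodes.values():
        for target in node.adjlist:
            blocker = nodes.get(target)
            if blocker is not None:
                edges.append((node.sid, blocker.sid))
    return edges


# Lines copied from the example output in this post.
SAMPLE = """
    ([nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor):
    [0]/1/1/0x826f94c0/IGN/1/2//none
    [12]/13/158/0x826ff9d0/LEAF_NW/21/22//none
    [15]/16/416/0x82701314/NLEAF/23/26/[16]/none
    [16]/17/941/0x82701b80/LEAF/24/25//15
    [17]/18/344/0x827023ec/NLEAF/27/28/[16]/none
"""

nodes = parse_nodes(SAMPLE)
for waiter, blocker in wait_edges(nodes):
    print(f"SID {waiter} waits for SID {blocker}")
```

On the sample this reports that SIDs 16 and 18 both wait for SID 17, the LEAF node at the end of the chain, so that session is where I would start looking.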

Cool, right?
This is a very useful tool for analyzing wait chains during a hang. It also generates files that can be attached to an SR, if needed.

There is an observation in MOSC about the “DUMP” keyword; let me reproduce it:
“Note that in 11g+ the “ORADEBUG HANGANALYZE NN” form will also try to include SHORT_STACK dumps in the hanganalyze chains for level 3 and higher. Short stacks will NOT be included in event triggered HANGANALYZE (like from ALTER SESSION) nor from “ORADEBUG DUMP HANGANALYZE nn”, only from ORADEBUG HANGANALYZE nn (no DUMP keyword).”

OK, but I’m in a hang situation. What if I can’t log in as sysdba to my database?
In that case, wait for next week’s post. There is a very useful kludge. 😉

# KB:
Troubleshooting Database Hang Issues (Doc ID 1378583.1)
How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
Troubleshooting Database Contention With V$Wait_Chains (Doc ID 1428210.1)
EVENT: HANGANALYZE – Reference Note (Doc ID 130874.1)
Important Customer information about using Numeric Events (Doc ID 75713.1)

Matheus.
