Hi all!
I realized I have some posts about database hangs but have no posts about hanganalyze, system state or ashdump usage. So let’s fix it. 🙂
To organize the ideas I’m going to split the subject on three posts. This first will be about hanganalyse.
See the second part of this post here: HANGANALIZE Part 2.
Ok, so let me refer the most clear Oracle words I could found:
“Hanganalyze tries to work out who is waiting for who by building wait chains, and then depending on the level will request various processes to dump their errorstack.”
This is very similar to what we can do manually through v$wait_chains. But is quicker and ‘oficial’, so let’s use! 😀
But before I show how you can do it, it’s important to mention that Oracle does not recommend you to use ‘numeric events’ without a SR (MOS), according to Note: 75713.1.
So, how to do it? Basically 2 ways:
1) ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level LL'; OR EVENT="60 trace name HANGANALYZE level 5" 2) ORADEBUG hanganalyze LL
I prefer to use ORADEBUG on database server if possible, regarding you already are with some hanging:
sqlplus / as sysdba oradebug setmypid; oradebug unlimit; oradebug hanganalyze LL
For example, connected with sqlplus / as sysdba:
SQL> oradebug setmypid; Statement processed. SQL> oradebug unlimit; Statement processed. SQL> oradebug hanganalyze 3 Hang Analysis in /db/oracle/diag/rdbms/greporadb/greporadb/trace/greporadb_ora_2096.trc
What this ‘level’ means?
Level | Description | Comment |
1 | Very minimal output | Could be useful… |
2 | Minimal output | Useful for some cases… |
3 | Dump only processes thought to be in a hang | Most common level |
4 | Dump leaf nodes in wait chains | You really need this info? |
5 | Dump all processes involved in wait chains | can be a lot! |
6 | Dump errorstacks of processes involved in wait chains | can be high overhead |
10 | Dump all processes | not a good idea… |
But take care! Using too high a level will cause lots of processes to be asked to dump their stack. This can be very expensive…
In summary, Remember the Note: 75713.1!
If you have a RAC?
SQL> oradebug setmypid SQL> oradebug unlimit SQL> oradebug setinst all SQL> oradebug -g def hanganalyze LL
OR
SQL> oradebug setmypid SQL> oradebug unlimit SQL> oradebug -g all hanganalyze LL
Nice, and how hanganalize looks like?
Here it goes an Oracle’s example of output:
============== HANG ANALYSIS: ============== Open chains found: >> This process (below) is running Chain 1 : : >> Below is a wait chain. Sid 16 waits for Sid 17 Chain 2 : : -- Other chains found: Chain 3 : : Extra information that will be dumped at higher levels: >> This just shows which nodes would be dumped at each level [level 4] : 2 node dumps -- [LEAF] [LEAF_NW] [IGN_DMP] [level 5] : 2 node dumps -- [NLEAF] [level 10] : 10 node dumps -- [IGN] State of nodes >> All nodes are listed below. The "state" column shows the state >> that the session is in ([nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor): >> The first nodes are IGN (ignore) [0]/1/1/0x826f94c0/IGN/1/2//none [1]/2/1/0x826f9d2c/IGN/3/4//none [2]/3/1/0x826fa598/IGN/5/6//none [3]/4/1/0x826fae04/IGN/7/8//none [4]/5/1/0x826fb670/IGN/9/10//none [5]/6/1/0x826fbedc/IGN/11/12//none [6]/7/1049/0x826fc748/IGN/13/14//none [7]/8/1049/0x826fcfb4/IGN/15/16//none [8]/9/1049/0x826fd820/IGN/17/18//none [9]/10/1049/0x826fe08c/IGN/19/20//none >> Below are LEAF nodes in various states [12]/13/158/0x826ff9d0/LEAF_NW/21/22//none [15]/16/416/0x82701314/NLEAF/23/26/[16]/none [16]/17/941/0x82701b80/LEAF/24/25//15 [17]/18/344/0x827023ec/NLEAF/27/28/[16]/none >> You are told which processes are being dumped >> They will dump errorstacks to their own trace files Dumping System_State and Fixed_SGA in process with ospid 18668 Dumping Process information for process with ospid 18656 Dumping Process information for process with ospid 18658 ... ================================ PROCESS DUMP FROM HANG ANALYZER: ================================ >> This process dumps its errorstack and processstate. >> See for details of this informaiton ----- Call Stack Trace ----- calling call entry ... ====================================== END OF PROCESS DUMP FROM HANG ANALYZER ====================================== ==================== END OF HANG ANALYSIS ====================
And about Node States?
State | Meaning |
IGN | Ignore |
LEAF | A waiting leaf node |
LEAF_NW | A running (using CPU?) leaf node |
NLEAF | An element in a chain but not at the end (not a leaf) |
Cool, right?
There is a very useful tool to analyze chains of hanging. And also generate files that can be added to an SR, if needed.
There is an observetion in MOSC about “DUMP” word, let’me reproduce it:
“Note that in 11g+ the “ORADEBUG HANGANALYZE NN” form will also try to include SHORT_STACK dumps in the hanganalyze chains for level 3 and higher. Short stacks will NOT be included in event triggered HANGANALYZE (like from ALTER SESSION) nor from “ORADEBUG DUMP HANGANALYZE nn”, only from ORADEBUG HANGANALYZE nn (no DUMP keyword).”
OK, but I’m in a hang situation, what if a can’t loging as sysdba in my database?
This case, wait the next week post. There is a very useful kludge. 😉
# KB:
Troubleshooting Database Hang Issues (Doc ID 1378583.1)
How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
Troubleshooting Database Contention With V$Wait_Chains (Doc ID 1428210.1)
EVENT: HANGANALYZE – Reference Note (Doc ID 130874.1)
Important Customer information about using Numeric Events (Doc ID 75713.1)
Matheus.
Pingback: SYSTEMSTATE DUMP – |GREP ORA
Pingback: ASHDUMP for Instance Crash/Hang ‘Post Mortem’ Analysis – |GREP ORA
Pingback: HANGANALYZE Part 2 – |GREP ORA