OEM 12c Agent: java.lang.OutOfMemoryError: Java heap space

Hi all,

I got this error from OEM at a client site. When I connected, I noticed the OEM agent was down.

According to the logs, the agent failed to restart because of the error “java.lang.OutOfMemoryError: Java heap space”.

I tried to run clearstate but got the same error.

Searching MOS, I found Duplicate 1952593.1: EM12c: emctl start agent Fails With Target Interaction Manager failed at Startup java.lang.OutOfMemoryError: Java heap space reported in gcagent_errors.log (Doc ID 1902124.1)

The solution seems to be moving the contents of /agent_inst/sysman/emd/state/ to another location before running clearstate.

After doing so, everything worked fine! So if you are facing the same issue, try this out!

oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/bin$ ./emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/bin$ cd ..
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst$ cd sysman
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman$ cd log/
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ ls -lrt | tail
-rw-r----- 1 oracle dba 4096 Jan 9 09:47 emagent.nohup.lr
-rw-r----- 1 oracle dba 11165 Jan 9 09:48 startup.info
-rw-r----- 1 oracle dba 153432 Jan 9 09:48 gcagent_errors.log
-rw------- 1 oracle dba 540010788 Jan 9 09:48 heapDump_7.hprof
-rw-r----- 1 oracle dba 4687452 Jan 9 09:48 gcagent.log
-rw-r----- 1 oracle dba 1746 Jan 9 09:48 agabend.log
-rw-r----- 1 oracle dba 2863 Jan 9 09:48 gcagent_pfu.log
-rw-r----- 1 oracle dba 84308 Jan 9 09:48 emagent.nohup
-rw-r----- 1 oracle dba 762142 Jan 9 10:06 emdctlj.log
-rw-r----- 1 oracle dba 121782 Jan 9 10:06 emctl.log
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ date
Monday, January 9, 2020 10:06:24 AM PST
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ tail gcagent.log
2020-01-09 09:48:50,875 [32:F9C26A76] INFO - *jetty*: Graceful shutdown SslSelectChannelConnector@0.0.0.0:3872
2020-01-09 09:48:50,877 [32:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@63b5a40a@63b5a40a/emd/lifecycle/main,null
2020-01-09 09:48:50,877 [32:F9C26A76] INFO - *jetty*: Graceful shutdown HTTPLifecycleHandler@7284aa02
2020-01-09 09:48:50,877 [32:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@5dac13d7@5dac13d7/emd/main,null
2020-01-09 09:48:50,877 [32:F9C26A76] INFO - *jetty*: Graceful shutdown HTTPRequestHandler@52a34783
2020-01-09 09:48:50,877 [32:F9C26A76] INFO - *jetty*: Graceful shutdown ServletContextHandler@201d592a@201d592a/emd/browser,null
2020-01-09 09:48:50,878 [32:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@19a9bea3@19a9bea3/emd/persistence/main,null
2020-01-09 09:48:50,878 [32:F9C26A76] INFO - *jetty*: Graceful shutdown HTTPAgentPersistenceHandler@3d89acb5
2020-01-09 09:48:50,878 [32:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@5722cc7e@5722cc7e/,null
2020-01-09 09:48:53,932 [32:F9C26A76] INFO - *jetty*: Shutdown hook complete
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ tail gcagent_errors.log
at oracle.sysman.gcagent.target.interaction.execution.Threshold.initThreshold(Threshold.java:4132)
at oracle.sysman.gcagent.target.interaction.execution.TargetInteractionMgr.init(TargetInteractionMgr.java:225)
at oracle.sysman.gcagent.target.interaction.execution.TargetInteractionMgr.tmNotifier(TargetInteractionMgr.java:171)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeNotifier(TMComponentSvc.java:998)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeInitializationStep(TMComponentSvc.java:1083)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.doInitializationStep(TMComponentSvc.java:916)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.notifierDriver(TMComponentSvc.java:812)
at oracle.sysman.gcagent.tmmain.TMMain.startup(TMMain.java:256)
at oracle.sysman.gcagent.tmmain.TMMain.agentMain(TMMain.java:557)
at oracle.sysman.gcagent.tmmain.TMMain.main(TMMain.java:546)
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ less gcagent_errors.log
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log$ cd ..
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman$ cd config/
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ ls -lrt | tail
-rw-r----- 1 oracle dba 8001 May 13 2016 b64LocalCertificate.txt
-rw-r----- 1 oracle dba 17893 May 13 2016 b64InternetCertificate.txt
-rw-r----- 1 oracle dba 7833 May 13 2016 emd.properties.2016_05_13_10_05_31
-rw------- 1 oracle dba 499 May 20 2016 s_jvm_options.opt.save
-rw-r----- 1 oracle dba 8202 Sep 26 01:20 emd.properties.bkp
-rw-r----- 1 oracle dba 8204 Jan 9 09:46 emd.properties.bak
-rw-r----- 1 oracle dba 156 Jan 9 09:46 private.properties.bak
-rw-r----- 1 oracle dba 8204 Jan 9 09:48 emd.properties
-rw-r----- 1 oracle dba 266 Jan 9 09:48 autotune.properties
-rw-r----- 1 oracle dba 156 Jan 9 09:48 private.properties
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ date
Monday, January 9, 2020 10:07:56 AM PST
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ n/emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent is Not Running
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ which emctl
/db/oracle/product/agent12c/12.1.0.4/agent_inst/bin/emctl
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ e agent; emctl start agent

Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
EMD clearstate failed: Offline clearstate failed : java.lang.OutOfMemoryError: Java heap space
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
Starting agent ...........................................................................................................................failed.
Consult emctl.log and emagent.nohup in: /db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/config$ cd ..
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman$ cd emd
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$ mkdir state-20200109
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$ mv state/* state\-20200109/
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$ du -hs state*
1K state
166M state-20200109
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$ emctl clearstate agent; emctl sta
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
EMD clearstate completed successfully
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
Starting agent ......................... started.
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 4
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.4.0
OMS Version : 12.1.0.4.0
Protocol Version : 12.1.0.1.0
Agent Home : /db/oracle/product/agent12c/12.1.0.4/agent_inst
Agent Log Directory : /db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/log
Agent Binaries : /db/oracle/product/agent12c/12.1.0.4/core/12.1.0.4.0
Agent Process ID : 27885
Parent Process ID : 27848
Agent URL : https://x0319vp114.nordstrom.net:3872/emd/main/
Local Agent URL in NAT : https://x0319vp114.nordstrom.net:3872/emd/main/
Repository URL : https://oemcloud.nordstrom.net:4900/empbs/upload
Started at : 2020-01-09 10:15:39
Started by user : oracle
Operating System : SunOS version 5.11 (sparcv9)
Last Reload : (none)
Last successful upload : 2020-01-09 10:17:35
Last attempted upload : 2020-01-09 10:17:35
Total Megabytes of XML files uploaded so far : 0.59
Number of XML files pending upload : 1
Size of XML files pending upload(MB) : 0
Available disk space on upload filesystem : 50.38%
Collection Status : Collections enabled
Heartbeat Status : Ok
Last attempted heartbeat to OMS : 2020-01-09 10:16:54
Last successful heartbeat to OMS : 2020-01-09 10:16:54
Next scheduled heartbeat to OMS : 2020-01-09 10:17:54

---------------------------------------------------------------
Agent is Running and Ready
oracle@dbserver:/db/oracle/product/agent12c/12.1.0.4/agent_inst/sysman/emd$
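
If you want to script the "move the state files aside" step from the session above, here is a minimal sketch. The function name move_agent_state and the state-YYYYMMDD naming are my own choices for illustration, and the agent instance home is an assumption you must pass in yourself. The emctl calls are left as comments, so nothing touches a real agent until you run them by hand.

```shell
#!/bin/sh
# Sketch only: move the agent state files aside so clearstate can run.
# move_agent_state is a hypothetical helper, not part of emctl.

# move_agent_state AGENT_INST
# Moves everything under AGENT_INST/sysman/emd/state into a dated sibling
# directory (state-YYYYMMDD) and prints the path of that directory.
move_agent_state() {
    agent_inst=$1
    emd_dir="$agent_inst/sysman/emd"
    backup_dir="$emd_dir/state-$(date +%Y%m%d)"

    [ -d "$emd_dir/state" ] || { echo "no state dir under $emd_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    # Move visible entries only, like the "mv state/*" in the session above.
    for entry in "$emd_dir"/state/*; do
        [ -e "$entry" ] || continue   # empty directory: nothing to move
        mv "$entry" "$backup_dir"/ || return 1
    done
    echo "$backup_dir"
}

# Usage (not run here), with the agent down:
#   move_agent_state /db/oracle/product/agent12c/12.1.0.4/agent_inst
#   emctl clearstate agent
#   emctl start agent
```

Moving rather than deleting keeps the old state around in case you need it for a service request.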

 

ORA-04030: out of process memory when trying to allocate % bytes (kkoutlCreatePh,ub1 : kkoabr)

Hi all,

So, I started seeing this in a client environment. Finding no clear reference on MOS, I researched the case and noticed some high PGA allocation, as shown below.

SYS@proddb AS SYSDBA PROD> select pid, serial#,category, allocated/1024/1024 MB, used/1024/1024 MB_used, max_allocated/1024/1024 MB_MAX_ALLOCATED_ON_PGA
2 from v$process_memory where pid=852;

       PID    SERIAL# CATEGORY                MB    MB_USED MB_MAX_ALLOCATED_ON_PGA
---------- ---------- --------------- ---------- ---------- -----------------------
       852         91 SQL             .086807251  .00806427              .672416687
       852         91 PL/SQL          .087730408 .078926086              .126182556
       852         91 Freeable             .5625          0
       852         91 Other           2.25187302                         2.25187302

This seems to match MOS note ORA-04030 Error With High “kkoutlCreatePh” (Doc ID 1618444.1).

The solution? I simply disabled the following parameter:

"_b_tree_bitmap_plans"=false

 

Trace for additional info:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/12.1.0/db_1
System name: Linux
Node name: proddb.local
Release: 4.1.12-94.8.3.el7uek.x86_64
Version: #2 SMP Wed Apr 25 19:57:32 PDT 2018
Machine: x86_64
Instance name: proddb
Redo thread mounted by this instance: 1
Oracle process number: 854
Unix process pid: 28899, image: oracle@proddb.local

*** 2019-01-09 11:20:09.130
*** SESSION ID:(5429.55092) 2019-01-09 11:20:09.130
*** CLIENT ID:() 2019-01-09 11:20:09.130
*** SERVICE NAME:(XXXXX) 2019-01-09 11:20:09.130
*** MODULE NAME:(JDBC Thin Client) 2019-01-09 11:20:09.130
*** CLIENT DRIVER:(jdbcthin) 2019-01-09 11:20:09.130
*** ACTION NAME:() 2019-01-09 11:20:09.130

[TOC00000]
Jump to table of contents
Dump continued from file: /u01/app/oracle/diag/rdbms/proddb/proddb/trace/proddb_ora_28899.trc
[TOC00001]
ORA-04030: out of process memory when trying to allocate 34392040 bytes (kkoutlCreatePh,ub1 : kkoabr)

[TOC00001-END]
[TOC00002]
========= Dump for incident 2710965 (ORA 4030) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
72% 18 GB, 1356 chunks: "free memory " SQL
kxs-heap-c ds=0x7fa8df330220 dsprt=0x7fa8df49dbe0
27% 6809 MB, 3366 chunks: "permanent memory " SQL
kxs-heap-c ds=0x7fa8df330220 dsprt=0x7fa8df49dbe0
0% 100 MB, 894 chunks: "permanent memory " SQL
kkoutlCreatePh ds=0x7fa8cce16708 dsprt=0x7fa8df330220
0% 23 MB, 589515 chunks: "chedef : qcuatc "
TCHK^2a9688d9 ds=0x7fa8df33feb8 dsprt=0x7fa8df49c9e0
0% 18 MB, 150124 chunks: "opndef: qcopCreateOpnViaM "
TCHK^2a9688d9 ds=0x7fa8df33feb8 dsprt=0x7fa8df49c9e0
0% 16 MB, 214829 chunks: "logdef: qcopCreateLog "
TCHK^2a9688d9 ds=0x7fa8df33feb8 dsprt=0x7fa8df49c9e0
0% 11 MB, 241 chunks: "free memory "
top call heap ds=0x7fa8df49dbe0 dsprt=(nil)
0% 9643 KB, 4623 chunks: "qkkele " SQL
kxs-heap-c ds=0x7fa8df330220 dsprt=0x7fa8df49dbe0
0% 6534 KB, 58578 chunks: "optdef: qcopCreateOptInte "
TCHK^2a9688d9 ds=0x7fa8df33feb8 dsprt=0x7fa8df49c9e0
0% 4134 KB, 15399 chunks: "kccdef : qcsvwsci "
TCHK^2a9688d9 ds=0x7fa8df33feb8 dsprt=0x7fa8df49c9e0

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
25 GB total:
25 GB commented, 794 KB permanent
12 MB free (0 KB in empty extents),
24 GB, 1 heap: "kxs-heap-c " 67 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
25 GB total:
203 MB commented, 6809 MB permanent
18 GB free (0 KB in empty extents),
6398 MB, 9243 chunks: "allocator state " 6398 MB free held
3758 MB, 4623 chunks: "qkkele " 3749 MB free held
3650 MB, 4623 chunks: "qkkkey " 3650 MB free held

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------

Dump of Real-Free Memory Allocator Heap [0x7fa8df317000]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=65536
blkdstbl=0x7fa8df317010, iniblk=524288 maxblk=524288 numsegs=318
In-use num=2965 siz=641597440, Freeable num=0 siz=0, Free num=254 siz=3586195456

==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------

Dumping Work Area Table (level=1)
=====================================

Global SGA Info
---------------

global target: 102400 MB
auto target: 62376 MB
max pga: 2048 MB
pga limit: 4096 MB
pga limit known: 0
pga limit errors: 0

pga inuse: 33104 MB
pga alloc: 35238 MB
pga freeable: 1225 MB
pga freed: 433664026 MB
pga to free: 0 %
broker request: 0

pga auto: 12 MB
pga manual: 0 MB

pga alloc (max): 35238 MB
pga auto (max): 12284 MB
pga manual (max): 2 MB

# workareas : 0
# workareas(max): 551

Oracle memory usage on Linux / Unix

Hi all,

So one of the most important things that we need to do when setting up a new server or checking the capacity of the server is to see how much memory Oracle is using.

When checking the capacity there are some practical things that always help me to get a fast glimpse of the system:

  • When you open topas and press M, you will see the output below:
Topas Monitor for host: SERVER1 Interval: 2 Sat Dec 8 03:39:59 2019
================================================================================
REF1 SRAD TOTALMEM INUSE FREE FILECACHE HOMETHRDS CPUS
--------------------------------------------------------------------------------
0 0 59.8G 59.6G 212.3 16.3G 528 0-15
1 1 61.4G 61.2G 188.8 15.7G 536 16-31

In the memory section you will see three categories: INUSE, FREE and FILECACHE. This shows roughly what the memory is being used for, but without much granularity.

  • When using top, you get the summary below:
top - 11:48:08 up 119 days, 10:18, 1 user, load average: 26.76, 26.16, 25.95
Tasks: 1936 total, 38 running, 1898 sleeping, 0 stopped, 0 zombie
Cpu(s): 79.3%us, 1.1%sy, 0.0%ni, 15.1%id, 4.3%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 263750172k total, 219075656k used, 44674516k free, 797476k buffers
Swap: 16773116k total, 505760k used, 16267356k free, 88055108k cached

Again, you only get high-level usage. So here comes the question:

How do you prove that you have a memory shortage?

On Linux I often use vmstat and look at the si and so columns (swap in and swap out): if both stay at 0, the system is not actively swapping. The free command should likewise show little or no swap in use:

/home/oracle> vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
15 3 505760 44896608 797480 88062288 0 0 7037 1304 0 0 29 2 61 8 0
16 1 505760 44922964 797480 88062320 0 0 432272 144314 38784 31348 41 2 52 5 0
14 2 505760 44943072 797480 88062320 0 0 468904 155424 32676 27522 34 1 60 5 0
15 2 505760 44943032 797480 88062328 0 0 431032 144275 32596 27469 34 1 60 5 0
15 2 505760 44920136 797480 88062352 0 0 396232 145052 30772 26657 32 1 62 6 0
19 1 505760 44928576 797480 88062360 0 0 429360 160158 33640 28012 36 1 58 5 0
15 3 505760 44935340 797480 88062368 0 0 477232 161849 28393 21423 41 1 53 5 0
17 1 505760 44924744 797480 88062368 0 0 515265 160212 27478 20578 40 1 54 5 0
16 1 505760 44921596 797480 88062368 0 0 495408 159304 25458 19548 37 1 58 5 0
18 1 505760 44918144 797480 88062384 0 0 552880 168895 28203 22774 38 1 56 5 0
15 2 505760 44922344 797480 88062392 0 0 546920 160463 25321 19151 37 1 58 5 0
16 4 505760 44921544 797480 88062400 0 0 571544 153810 25429 20011 36 1 58 5 0
16 1 505760 44919620 797480 88062400 0 0 577552 160004 27132 20111 40 1 54 5 0
19 2 505760 44360240 797480 88062400 0 0 584969 155553 29467 22145 41 2 52 5 0
/home/oracle> free
total used free shared buffers cached
Mem: 263750172 219060896 44689276 91608 797480 88062464
-/+ buffers/cache: 130200952 133549220
Swap: 16773116 505760 16267356
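
The si/so check is easy to script. Below is a minimal sketch, assuming the Linux vmstat layout shown above, where a header row names the columns. The function name swap_activity is made up for this example. Keep in mind that the first sample vmstat prints is an average since boot, so a short run can be skewed by old swap activity.

```shell
# swap_activity: read vmstat output on stdin and report how many samples
# show swap-in (si) or swap-out (so) activity.
# Hypothetical helper; assumes the header row names the columns.
swap_activity() {
    awk '
        # Locate si and so by name from the header row instead of
        # hard-coding their positions.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows: numeric first field, header already seen.
        si && $1 ~ /^[0-9]+$/ {
            samples++
            if ($si > 0 || $so > 0) swapping++
        }
        END {
            printf "%d of %d samples show swap activity\n", swapping, samples
        }
    '
}

# Usage: vmstat 1 10 | swap_activity
```

Fed the vmstat samples above, every row has si and so at 0, so it would report no swap activity.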

To check the memory usage (RSS) of a specific process, I often combine ps with a few other commands to calculate the memory for a given process ID, as below:

/home/oracle> ps -eo rss,pid,euser,lstart,args:100 --sort %mem | grep -v grep | grep 35796 | awk '{printf $1/1024 "MB"; $1=""; print }'| sort
19.6016MB 35796 oracle Sat Sep 8 02:43:54 2018 ora_lg00_ORC1
34.957MB 32340 oracle Sat Jan 5 11:50:09 2019 oracleORC1 (LOCAL=NO)

RSS is resident memory, but when it comes to shared memory like the Oracle SGA, the methods above can be misleading, not to say wrong: because the SGA is shared between processes, it gets counted more than once in the results. I sometimes use pmap to check a process's memory as well, when available:

/home/oracle> pmap 35796
35796: ora_lg00_ORC1
total 0K

Still, when checking at server-wide scope, do you want to keep doing manual work and lots of math? I don’t think so. 🙂

That’s why coming across smem made my life a lot easier. It is a Python script that gives you a nice breakdown of memory usage without the misleading double counting.
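
To get a feeling for where numbers free of double counting come from: on Linux, /proc/<pid>/smaps reports a Pss (proportional set size) value per mapping, where each shared page is divided among the processes that map it. The sketch below simply adds up those Pss lines for one process. It is my own illustration, not smem's code, and the function name pss_kb is hypothetical.

```shell
# pss_kb SMAPS_FILE
# Sum the "Pss:" lines of an smaps-format file and print the total in kB.
# Because Pss splits shared pages between the processes mapping them,
# adding Pss across processes does not count shared memory (such as the
# SGA) once per process, unlike adding RSS.
pss_kb() {
    awk '$1 == "Pss:" { total += $2 } END { printf "%d\n", total }' "$1"
}

# Usage (reading another user's smaps usually requires root):
#   pss_kb /proc/35796/smaps
```

The exact match on "Pss:" deliberately skips related fields such as SwapPss and the Pss_Anon/Pss_File breakdown that newer kernels add.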

You can see the commands and processes and their memory:

[root@srv01 smem-1.4]# ./smem -trk | head
PID User Command Swap USS PSS RSS
4829 root /opt/stackdriver/collectd/s 444.0K 4.0G 4.0G 4.0G
5647 oracle asm_gen0_+ASM 50.1M 424.4M 425.0M 437.8M
16512 oracle rman software/product/11.2. 0 172.9M 173.7M 177.8M
85107 oracle ora_n001_db01 42.3M 147.8M 147.8M 185.8M
85103 oracle ora_n000_db01 42.4M 146.5M 146.6M 184.6M
85109 oracle ora_n002_db01 42.2M 145.6M 145.6M 183.5M
85111 oracle ora_n003_db01 42.1M 145.1M 145.2M 183.1M
7287 oracle ora_dia0_db01 1.6M 68.6M 68.8M 107.8M

As well as the overall usage per user:

[root@srv01 smem-1.4]# ./smem -turk
User Count Swap USS PSS RSS
oracle 1358 4.8G 7.8G 8.0G 76.6G
root 43 12.0M 4.1G 4.1G 4.2G
user1 10 0 321.0M 328.0M 369.2M
nobody 2 96.0K 2.1M 2.3M 6.0M
user2 2 0 684.0K 1.7M 7.7M
user4 2 0 632.0K 1.7M 7.9M
user4 1 72.0K 536.0K 540.0K 2.1M
ntp 1 424.0K 332.0K 368.0K 2.4M
smmsp 1 1.3M 160.0K 298.0K 1.9M
rpc 1 336.0K 68.0K 73.0K 1.7M
rpcuser 1 808.0K 4.0K 16.0K 1.9M
---------------------------------------------------
1422 4.8G 12.2G 12.5G 81.3G

Hope it helps, see you next time!

OEM after a Maintenance: A memory component is suspected of causing a fault with a 100% certainty. Component Name : % Fault class : fault.memory.intel.dimm_ce

Hi all!
So, I had this message from a memory component in my Exadata:

Message=A memory component is suspected of causing a fault with a 100% certainty. Component Name : /SYS/MB/P0/D3 Fault class : fault.memory.intel.dimm_ce

But this was right after maintenance on the server. Checking on ILOM:

-> show /SYS/MB/P0/D3

 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)

Checking on CellCLI alert history:

CellCLI> list alerthistory detail

	 name:                   13_1
	 alertDescription:       "A memory component suspected of causing a fault"
	 alertMessage:           "A memory component is suspected of causing a fault with a 100% certainty.  Component Name : /SYS/MB/P0/D3  Fault class    : fault.memory.intel.dimm_ce  Fault message  : http://support.oracle.com/msg/SPX86A-8002-XM"
	 alertSequenceID:        13
	 alertShortName:         Hardware
	 alertType:              Stateful
	 beginTime:              %
	 endTime:                %
	 examinedBy:             
	 metricObjectName:       /SYS/MB/P0/D3_FAULT
	 notificationState:      1
	 sequenceBeginTime:      %
	 severity:               critical
	 alertAction:            "For additional information, please refer to http://support.oracle.com/msg/SPX86A-8002-XM Automatic Service Request has been notified with Unique Identifier: %.  Diagnostic package is attached. It is also accessible at % It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at %"

Hm… Let’s read the MOS: SPX86A-8002-XM – Memory Correctable ECC (Doc ID 1615285.1)

“Suggested Action for System Administrator

Replace the faulty memory DIMM at the earliest possible convenience.”

Hmm… But as I said, this was right after maintenance on the server. What if the two are related?
OK, some additional information:

-> version 
SP firmware 3.2.9.23 
SP firmware build number: 116695 
SP firmware date: Thu Mar 30 11:38:01 CST 2017 
SP filesystem version: 0.2.10

At the current firmware level (SP firmware 3.2.9.23), the memory correctable error threshold for DIMM replacement is 240 CEs in a 72-hour period.

So, the suggestion is:
– Clear all the error messages after completing the maintenance and check whether the threshold is reached again. If so, we may really need to replace the DIMM.

How to do it? Easy:

ssh root@grepora01-ilom
-> show /SYS/MB/P0/D3
Expected:
[...]
fault_state = Faulted
[..]
-> set /SYS/MB/P0/D3 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0/D3 (y/n)? y
-> show /SYS/MB/P0/D3
Expected:
 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)

Hope it helps!
Cheers!

“TNS-12531: TNS:cannot allocate memory error” – Are you sure, Oracle?

Hey guys!
So, I was working on a server build and everything was running fine until I tried to start the listener. The process hung on “Starting /u01/app/grid/product/12.1.0/grid/bin/tnslsnr: please wait…” and then raised TNS-12531: TNS:cannot allocate memory error.

First things first, I looked the error up using oerr:

TNS-12531: TNS: cannot allocate memory
Cause: Sufficient memory could not be allocated to perform the desired activity.
Action: Either free some resource for TNS, or add more memory to the machine. For further details, turn on tracing and re-execute the operation.

Should be simple, right? Well, not in this case. The server had plenty of resources, and the database was not even up yet, so over 90% of the server memory was free.

I checked all sorts of things before turning to the server's network configuration.
Searching around, I found that the listener also throws this error when the server's hostname definition differs from what the /etc/hosts file resolves.

Once those matched, voilà, the listener started successfully.
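
A quick sanity check for this kind of mismatch is to see whether the name the OS reports is listed in /etc/hosts at all. Here is a rough sketch, assuming the plain hosts file format (address first, then names). The function name hosts_addresses_for is invented for this example, and it only inspects the file, not DNS or other name services.

```shell
# hosts_addresses_for NAME [HOSTS_FILE]
# Print the addresses that a hosts file maps NAME to, one per line.
# Prints nothing and returns 1 when NAME is not listed.
# Hypothetical helper; HOSTS_FILE defaults to /etc/hosts.
hosts_addresses_for() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }              # drop comments
        NF < 2 { next }                 # skip blank or address-only lines
        {
            for (i = 2; i <= NF; i++)   # field 1 is the address
                if (tolower($i) == tolower(name)) { print $1; found = 1; break }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Usage:
#   hosts_addresses_for "$(hostname)" || echo "hostname not found in /etc/hosts"
```

If the function prints nothing, or prints an address that is not the one the listener should bind to, that is a good place to start looking.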

Not the memory, right? Oracle and its tricks…

That kept bugging me, so I looked further and found this article, which shows a trace of the listener with a bit more information.

Hope this can save you some minutes on troubleshooting.

Until next time!

AIX: “WARNING: Heavy swapping observed on system in last 5 mins.”

Quick one today!

Seeing a message like the one below in your 11.2.0.3 database on AIX?

WARNING: Heavy swapping observed on system in last 5 mins. 
pct of memory swapped in [31.28%] pct of memory swapped out [3.81%]. 
Please make sure there is no memory pressure and the SGA and PGA are configured correctly. 
Look at DBRM trace file for more details.

Stand down: this issue is caused by unpublished Bug 11801934, mentioned in MOS False Swap Warning Messages Printed To Alert.log On AIX (Doc ID 1508575.1).

Basically, it happens because v$osstat does not reflect the proper stats for swap space paging.

So, stay calm and see you next week!