OEM13c: Regions that display real-time data will not be displayed. Exception while loading RAC Database Home Page: null

Hi all,
So, I was hitting this issue on a Database Home page in OEM 13c:

image_OEMBug.png

Actually, the OMS log was showing several null pointer exceptions… So, what's the deal?

Everything seemed to match MOS Bug 22957131 – OEM13C: Exception while loading RAC Database Home Page: null.

The solution?
– Patch 25197714 for the EM 13.2 OMS
– Patch 25155095 for the EM 13.1 OMS

Also, those fixes are included in the following Bundle Patches:
– 13.2.1.0.161231
– 13.1.1.0.161220

Applying the patch solved my case. Hope it helps you!

More references:
– EM 13C: Target Database Home Page Displays Message in Enterprise Manager 13c Cloud Control: Regions that display real-time data will not be displayed. Exception while loading RAC Database Home Page: null (Doc ID 2210123.1)
– Note 2219797.1 Enterprise Manager 13.2 Master Bundle Patch List
– Note 2124038.1 Enterprise Manager 13.1 Master Bundle Patch List for the Management Agent and Plug-ins

OEM Metric “Memory Utilization” Different on 12c and 13c

So, as a rollout strategy, we created a new OEM 13c to decommission a 12c. However, during the tests, I noticed the Memory Utilization metric was very different between 12c and 13c. Why?

It turns out that Memory Utilization is calculated differently in 12c and 13c, and 13c also seems to be more accurate, as per MOS note The Host Memory Utilization Percentage Calculation in Enterprise Manager Cloud Control (Doc ID 1908853.1).

Well, those who are familiar with memory use computations in the operating system might become confused when examining the memory use metric data from Enterprise Manager 12c and 13c Cloud Control. Metrics such as Memory Utilization (%) do not have an equivalent in the OS, but OS data will be used in its derivation.

This is the formula used by Enterprise Manager 12.1.0.3 for Linux Memory Utilization (%), for example:

Memory Utilization (%) = (100.0 * (activeMem) / realMem)
 = 100 * 25046000/99060536
 = 25.28
EM Shows : 25.5

* Here, activeMem is Active Memory (Active), and realMem is Total Memory (MemTotal).

Comparing this value with MemFree is not a valid comparison, and doing so might give the impression that utilization is not being accurately represented.

Also, the “OEM13c value” was already collected in OEM 12c, but under the metric name “Used Logical Memory”. Basically, “Memory Utilization” in 12c uses “activeMem” instead of “realMem-(freeMem+Buffers+Cached)”, as per the image below.

OEM12_grep_mem

The formula in place in 13c is exactly the one used to fix the issue described in MOS note EM 13c: Incorrect Memory Utilization Reported for Linux Hosts in Enterprise Manager 13.1.0.0.0 Cloud Control (Doc ID 2144976.1).

Example:

[root@greporasrv ~]# free
             total       used       free     shared    buffers     cached
Mem:     264087460  257669460    6418000    7657500     461088   11008128
-/+ buffers/cache:  246200244   17887216
Swap:     25165820    3365104   21800716

Memory Utilization (%) = (100.0 * (realMem - (freeMem + Buffers + Cached)) / realMem)
 = 100 * (264087460 - (6418000 + 461088 + 11008128)) / 264087460
 = 93.22678328
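
As a quick cross-check, both formulas can be reproduced with awk. This is only a sketch: the function names mem_util_12c and mem_util_13c are mine (OEM does not provide them), and the inputs are the kB values quoted in the two examples above.

```shell
#!/bin/sh
# Hypothetical helper names; inputs are kB values as shown by /proc/meminfo or `free`.

# 12c-style: 100 * Active / MemTotal
mem_util_12c() {
    # $1 = activeMem, $2 = realMem
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100.0 * a / t }'
}

# 13c-style: 100 * (MemTotal - (MemFree + Buffers + Cached)) / MemTotal
mem_util_13c() {
    # $1 = realMem, $2 = freeMem, $3 = Buffers, $4 = Cached
    awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
        'BEGIN { printf "%.2f\n", 100.0 * (t - (f + b + c)) / t }'
}

mem_util_12c 25046000 99060536                    # prints 25.28
mem_util_13c 264087460 6418000 461088 11008128    # prints 93.23
```

Running the same host's numbers through both functions makes the gap between the two metrics easy to explain during a migration.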

As per OEM13c:

OEM13_grep_mem.jpg

Also, checking on the server with sar, the value in OEM 13c does indeed seem more accurate:

[root@greporasrv ~]# sar -r
Linux 2.6.39-400.294.4.el6uek.x86_64 (greporasrv) 	08/29/2017 	_x86_64_	(44 CPU)

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
12:10:01 AM   5377540 258709920     97.96    719080  10775828  83876744     29.00
12:20:01 AM   6131220 257956240     97.68    719504  10721084  82467712     28.51
12:30:01 AM   5623060 258464400     97.87    719700  10720972  83456216     28.85
12:40:01 AM   5606572 258480888     97.88    719836  10779108  83228440     28.77
12:50:01 AM   5783256 258304204     97.81    719860  10848644  82925908     28.67
01:00:01 AM   4151148 259936312     98.43    719888  11589048  84400040     29.18
01:10:01 AM   3717000 260370460     98.59    719904  11534336  84838784     29.33
01:20:01 AM   4282412 259805048     98.38    720164  11480792  84047568     29.06
01:30:01 AM   4473128 259614332     98.31    720184  11483604  83857348     28.99
01:40:01 AM   5113136 258974324     98.06    720256  11528492  83036284     28.71
01:50:01 AM   4971036 259116424     98.12    720284  11587956  82955128     28.68
02:00:01 AM   4026540 260060920     98.48    720344  11663184  86489692     29.90
02:10:01 AM   4312916 259774544     98.37    720380  11678316  83834592     28.98
02:20:01 AM   5058980 259028480     98.08    720408  11624028  82876972     28.65
02:30:01 AM   4609908 259477552     98.25    720556  11541392  83871244     29.00
02:40:01 AM   5020668 259066792     98.10    720592  11574912  82887808     28.66
02:50:01 AM   5175916 258911544     98.04    720748  11619572  82725252     28.60
03:00:01 AM   4701236 259386224     98.22    720780  11687100  83421624     28.84
03:10:01 AM   4757976 259329484     98.20    721204  11648864  83298716     28.80
03:20:01 AM   4485280 259602180     98.30    721248  11719272  83299472     28.80
03:30:01 AM   4267068 259820392     98.38    721264  11794688  83683344     28.93
03:40:01 AM   4080264 260007196     98.45    721404  11856796  83863540     28.99
03:50:01 AM   4864276 259223184     98.16    721676  11975372  82735744     28.60
04:00:01 AM   4427284 259660176     98.32    721696  12056676  83450524     28.85
04:10:01 AM   4868184 259219276     98.16    721736  11863420  82860464     28.65
04:20:01 AM   4711608 259375852     98.22    721760  11877192  83205684     28.77
04:30:01 AM   4452764 259634696     98.31    721928  11945108  83515596     28.87
04:40:01 AM   4800700 259286760     98.18    722072  12015444  82681320     28.58
04:50:01 AM   4796588 259290872     98.18    722212  12075496  82703948     28.59
05:00:01 AM   4320164 259767296     98.36    722372  12164956  83390596     28.83
05:10:01 AM   3350940 260736520     98.73    722488  12120116  84525028     29.22
05:20:01 AM   4200236 259887224     98.41    722628  11965996  83510580     28.87
05:30:01 AM   4028020 260059440     98.47    722640  12019516  83720748     28.94
05:40:01 AM   3929740 260157720     98.51    722720  12069520  83632964     28.91
05:50:01 AM   2719452 261368008     98.97    723460  14408924  83745112     28.95
06:00:01 AM   1530448 262557012     99.42    723644  14943264  84618304     29.25
06:10:01 AM   2925268 261162192     98.89    605748  13363596  84792452     29.31
06:20:02 AM   3235532 260851928     98.77    605916  13811664  83516740     28.87
06:30:01 AM   3265640 260821820     98.76    606072  13848028  83385196     28.83
06:40:01 AM   2102756 261984704     99.20    606232  14745508  83638764     28.92
06:50:01 AM   2386376 261701084     99.10    606644  14821232  83118484     28.74
07:00:01 AM   5343496 258743964     97.98    186908  12019804  84375032     29.17
07:10:01 AM   5073472 259013988     98.08    219044  12597104  83579876     28.90
07:20:01 AM   5380380 258707080     97.96    241300  12600412  83107160     28.73
07:30:01 AM   5063504 259023956     98.08    253984  12653840  83373804     28.82
07:40:01 AM   8241032 255846428     96.88    269960   9772232  83072188     28.72
07:50:01 AM   8549616 255537844     96.76    278472   9853288  82646916     28.57
08:00:01 AM   8185864 255901596     96.90    287296   9938816  83179808     28.76
08:10:01 AM   7797504 256289956     97.05    295856  10029904  83464160     28.86
08:20:01 AM   8813696 255273764     96.66    302620   9930672  82081220     28.38
08:30:01 AM   8574984 255512476     96.75    309156   9880124  82557600     28.54
08:40:01 AM   8010072 256077388     96.97    314804   9912220  83241764     28.78
08:50:01 AM   8791112 255296348     96.67    319568   9980532  81787424     28.28
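
Keep in mind that in this sar output kbmemused is simply total minus free, so %memused still counts buffers and cache as used. A minimal sketch, assuming the 12-hour `sar -r` column layout shown above (time, AM/PM, kbmemfree, kbmemused, %memused, kbbuffers, kbcached), that applies the 13c formula to sar lines; sar_to_13c is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: reads `sar -r` data lines on stdin and prints
# the timestamp plus the 13c-style utilization for each line.
sar_to_13c() {
    awk '{
        free = $3; used = $4; buffers = $6; cached = $7
        total = free + used          # kbmemfree + kbmemused = total memory
        printf "%s %s %.2f\n", $1, $2, 100.0 * (used - buffers - cached) / total
    }'
}

echo "12:10:01 AM   5377540 258709920     97.96    719080  10775828  83876744     29.00" | sar_to_13c
# prints: 12:10:01 AM 93.61
```

With buffers and cache subtracted, the sar sample lands close to the 93.2 shown by OEM 13c.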

OMSPatcher finds that previous patching session is not yet completed – What to do?

Hey all,
As usual, a client reached out with this issue:

OMSPatcher finds that previous patching session is not yet completed.
Please refer log file "/u01/app/oracle/middleware/cfgtoollogs/omspatcher/28018178/omspatcher_2018-07-09_23-44-58PM_deploy.log" 
for the previous session and execute the script "/u01/app/oracle/middleware/.omspatcher_storage/oms_session/scripts_2018-07-09_23-44-39PM/run_script_singleoms_resume.sh"  to complete the previous session. OMSPatcher can proceed to execute new operations only if previous session is completed successfully.

Interesting, right?
This means a patch execution in July failed and it wasn’t noticed.

What to do? The point is, the error itself already says what needs to be done.
You just want to do it properly. How? Here is a quick action plan:

ZERO) Check the deploy log to understand the root cause of the previous patch failure and fix it.

In my case?
Not all required components were down.

A simple “stop oms” stops only the OMS managed server, JVMD engine, and HTTP server but leaves Node Manager and Administration Server running.
However, a “stop oms -all” stops all Enterprise Manager processes including Administration Server, OMS, HTTP Server, Node Manager, Management Server, JVMD engine, and Oracle BI Publisher (if it is configured on the host). This was the fix.

Step-by-Step:

1. Blackout targets to avoid unwanted pages.
– On OEM: Enterprise–>Monitoring–>Blackouts

2. Shut down the OMS and the agent

cd $AGENT_HOME/bin
./emctl stop agent
cd $OMS_HOME/bin
./emctl stop oms -all

3. Resume the failed patching session (with the provided command)
(in my case):

/u01/app/oracle/middleware/.omspatcher_storage/oms_session/scripts_2018-07-09_23-44-39PM/run_script_singleoms_resume.sh

4. Verify patches got installed

$OMS_HOME/OPatch/opatch lsinventory
$OMS_HOME/OMSPatcher/omspatcher lspatches

5. Start the OMS and agent

cd $AGENT_HOME/bin
./emctl start agent
cd $OMS_HOME/bin
./emctl start oms
./emctl status oms -details

6. Sync EMCLI with server changes:

$OMS_HOME/bin/emcli login -username=sysman
Enter password : <-- sysman password
$OMS_HOME/bin/emcli sync
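
For step 3, it is safer to pull the resume script path straight out of the OMSPatcher message than to retype it. A rough sketch, assuming the message wording shown earlier (the path follows the words "execute the script" in double quotes); extract_resume_script is a name I made up:

```shell
#!/bin/sh
# Hypothetical helper: reads the OMSPatcher message on stdin and prints
# the quoted path that follows the words "execute the script".
extract_resume_script() {
    sed -n 's/.*execute the script "\([^"]*\)".*/\1/p'
}

msg='for the previous session and execute the script "/u01/app/oracle/middleware/.omspatcher_storage/oms_session/scripts_2018-07-09_23-44-39PM/run_script_singleoms_resume.sh"  to complete the previous session.'

resume_script=$(printf '%s\n' "$msg" | extract_resume_script)
echo "$resume_script"
# Only after the OMS is fully down (emctl stop oms -all): sh "$resume_script"
```

The script itself is left commented out on purpose, so nothing is resumed until the root cause from the deploy log is fixed.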


OEM: Quickly Ignore ORA Error on Agent Layer

Hey all,
So, I had a very specific situation where I needed to ignore an error coming from an agent. It turns out there is an easy and quick way to make an OEM metric ignore a specific error… How? Using the agent parameter adrAlertLogErrorCodeExcludeRegex.

How to do it? Well, in [AGENT_INST]/sysman/config/emd.properties, add a line with this parameter and the regex that matches the error or message to ignore.

To ignore all ORA-700 errors, for example, it can be done with:

adrAlertLogErrorCodeExcludeRegex=.*700.*

Now, to ignore only ORA-700 [kskvmstatact: excessive swapping observed], for example:

adrAlertLogErrorCodeExcludeRegex=.*kskvmstatact.*

After this, an agent restart is required.

This is also well documented in MOS note EM 12c, 13c: How to Disable or Suppress OEM Alerts for Alert Log Error ORA-700 (Doc ID 2406779.1)
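
Before restarting the agent, it can be worth checking what a pattern actually matches. A small sketch using grep -E as a stand-in (an assumption: the agent's own regex engine and the exact text it matches against may differ, and the sample lines are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: prints the stdin lines that the regex matches in full.
would_exclude() {
    # $1 = candidate value for adrAlertLogErrorCodeExcludeRegex
    grep -E -x -- "$1"
}

# Made-up sample lines, not real alert log output
samples='ORA-700 [kskvmstatact: excessive swapping observed]
ORA-07001 (made-up line to show over-matching)
ORA-600 [kpndbcon-svchpnotNULL]'

printf '%s\n' "$samples" | would_exclude '.*700.*'           # matches the first two lines
printf '%s\n' "$samples" | would_exclude '.*kskvmstatact.*'  # matches only the first line
```

The broad pattern also catches any other code containing "700", so the more specific one is usually the safer choice.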

Hope it helps!

OEM 13C: How to Set Up Out Of Band Notifications

So, after a quiet weekend at a client, I noticed I was not being paged for a reason: the OMS was down! 😀

Ok, so, how to monitor the monitoring easily?
OEM 13c has a feature called Out of Band Notification, which allows configuring an agent with email settings so it can send notifications when it is not able to communicate with the OMS, that is, when the OMS and/or the repository DB are down.

Details of that configuration are in this MOS note: EM 13c, 12c: How to Set Up Out Of Band Email Notification in Enterprise Manager Cloud Control (Doc ID 1472854.1)

How does it work?
The agent on the OMS host checks the status of the ‘OMS and repository’ target (oracle_emrep) by running the metric ‘Response’ which runs the perl script:

[agent_home]/plugins/oracle.sysman.emrep.agent.plugin_12.1.0.n.0/scripts/emrepresp.pl

If the oracle_emrep target is detected as down, then emrepdown.pl, located in the same directory, is called.

The emrepdown.pl script uses the Perl “Net::SMTP” module to send an email using the Out Of Band email information (To Email ID, Email Gateway, From Email ID) defined in the Agent’s /sysman/config/emd.properties configuration file.

Note: this method does not currently support SSL email authentication, an internal ER (Bug 18886316 “WOULD LIKE ABILITY FOR EMREPDOWN.PL TO BE ABLE TO USE SSL” ) has been raised for this.

How to set up?
1) Run the following commands which will set the email parameters in the emd.properties file.
Do this on the chained agent (i.e. the agent on the same machine as the OMS, which monitors the oracle_emrep target).

a) Set the agent ORACLE_HOME

$ export ORACLE_HOME=[agent_home]
$ export PATH=$ORACLE_HOME/bin:$PATH

Example:

$ export ORACLE_HOME=/oracle/12c/12cagent/core/12.1.0.3.0
$ export PATH=$ORACLE_HOME/bin:$PATH

b) Check if any values are currently set for the Out of Band parameters

$ emctl getproperty agent -name emd_email_address
$ emctl getproperty agent -name emd_from_email_address
$ emctl getproperty agent -name emd_email_gateway

 

If this message is returned:

emd_email_address is not a valid configuration property

it means that the property is not yet set up; continue to the next step.

c) Set the Out of Band parameters

emctl setproperty agent -allow_new -name emd_email_address -value [youremailaddress]
emctl setproperty agent -allow_new -name emd_from_email_address -value [senderAddress]
emctl setproperty agent -allow_new -name emd_email_gateway -value [outgoingsmtpserver]

Example:

$ emctl setproperty agent -allow_new -name emd_email_gateway -value smtp.server.hostname
$ emctl setproperty agent -allow_new -name emd_email_address -value noc@grepora.com
$ emctl setproperty agent -allow_new -name emd_from_email_address -value 13cagent@grepora.com

TIP: The value for the emd_email_gateway can be the same as is used for ‘normal’ email notifications via the OMS. This can be accessed via setup/notifications/notification methods.

If you need to use “Use Secure Connection:SSL” normally, then this means that your mail server requires SSL authentication which means that the OOB method will not be suitable. Remember: the OOB method does not support SSL email authentication at this moment in time.

2) Stop and start the agent for these parameters to take effect.
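
The three setproperty calls from step 1c can be wrapped so the values are typed only once. This is just a sketch: oob_commands is a hypothetical helper that only prints the emctl commands (with the property names used above), so they can be reviewed before being run on the chained agent.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the three emctl commands.
oob_commands() {
    # $1 = destination address, $2 = sender address, $3 = SMTP gateway host
    [ $# -eq 3 ] || { echo "usage: oob_commands TO FROM GATEWAY" >&2; return 1; }
    echo "emctl setproperty agent -allow_new -name emd_email_address -value $1"
    echo "emctl setproperty agent -allow_new -name emd_from_email_address -value $2"
    echo "emctl setproperty agent -allow_new -name emd_email_gateway -value $3"
}

oob_commands noc@grepora.com 13cagent@grepora.com smtp.server.hostname
```

Once the printed commands look right, they can be pasted into a shell on the agent host where ORACLE_HOME and PATH are set as in step 1a.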

More information, such as how to test this configuration, can be found in MOS note: EM 13c, 12c: How to Set Up Out Of Band Email Notification in Enterprise Manager Cloud Control (Doc ID 1472854.1)

Hope that helps, cheers!

OEM: The number of hanging transactions are hang_trans is %

Hi all!
So, today is a quick one, just to tie some links together. It seems this message from OEM is not clear enough for some people, especially non-specialists in Oracle: it means something is locked in your database!

If this is the case, contact a DBA.

If you ARE a DBA, you may want to read this post about easily locating and resolving locks: Solving Simple Locks Through @lock2s and @killlocker.

Also, if the session comes from a DBLink, it is always useful to read this: Lock by DBLink – How to locate the remote session?

There is also some additional/specific material about some issues and bugs in this regard here: Tag: LOCK.

I hope it helps!
Cheers!

OEM Information Reports: ORA-00600 [kpndbcon-svchpnotNULL]

Having this error from an Information Report?

ORA-00600 [kpndbcon-svchpnotNULL]
ORA-00600: internal error code, arguments: [kpndbcon-svchpnotNULL], [], [], [], [], [], [], [], [], [], [], []

Don’t worry… Basically, this is not a direct Oracle issue. The error occurs when, while the report is running (it takes 2 or 3 minutes), one of the following happens:

  • The Database Session in the OEM Repository (Database Repository) is killed.
  • The Database Session in the Target Database (where OEM has to connect and get the data) is killed.
  • There are network issues between the OEM Repository and the Target Database, causing timeouts or making the session end abnormally.
  • High workload in one of the databases causes a timeout, making the session end abnormally.

So basically this is a communication problem between the OEM Repository and the database the data is being fetched from.

Keeping reports like this running over database links is something Oracle doesn’t support at all, because any network issue can make the report fail.

Some references about it:

  • ORA-00600 [kpndbcon-svchpnotNULL] Errors (Doc ID 1615517.1)
  • ORA-00600 [kpndbcon-svchpnotNULL] query through dblink (Doc ID 1490700.1)
  • Information Publisher Report fails with Error Rendering Element. Exception: ORA-00600 [kpndbcon-svchpnotNULL] (Doc ID 1930280.1)


So what’s the solution?
The solution here is easy: just re-run it.

Hope it helps. Cheers!

OEM after a Maintenance: A memory component is suspected of causing a fault with a 100% certainty. Component Name : % Fault class : fault.memory.intel.dimm_ce

Hi all!
So, I had this message from a memory component in my Exadata:

Message=A memory component is suspected of causing a fault with a 100% certainty. Component Name : /SYS/MB/P0/D3 Fault class : fault.memory.intel.dimm_ce

But this was right after a maintenance on the server. Checking on ILOM:

-> show /SYS/MB/P0/D3

 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)

Checking on CellCLI alert history:

CellCLI> list alerthistory detail

	 name:                   13_1
	 alertDescription:       "A memory component suspected of causing a fault"
	 alertMessage:           "A memory component is suspected of causing a fault with a 100% certainty.  Component Name : /SYS/MB/P0/D3  Fault class    : fault.memory.intel.dimm_ce  Fault message  : http://support.oracle.com/msg/SPX86A-8002-XM"
	 alertSequenceID:        13
	 alertShortName:         Hardware
	 alertType:              Stateful
	 beginTime:              %
	 endTime:                %
	 examinedBy:             
	 metricObjectName:       /SYS/MB/P0/D3_FAULT
	 notificationState:      1
	 sequenceBeginTime:      %
	 severity:               critical
	 alertAction:            "For additional information, please refer to http://support.oracle.com/msg/SPX86A-8002-XM Automatic Service Request has been notified with Unique Identifier: %.  Diagnostic package is attached. It is also accessible at % It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at %"

Hm… Let’s read the MOS: SPX86A-8002-XM – Memory Correctable ECC (Doc ID 1615285.1)

“Suggested Action for System Administrator

Replace the faulty memory DIMM at the earliest possible convenience.”

Hmm… But as I said, this was right after a maintenance on the server. What if this is related?
OK, one additional piece of information:

-> version 
SP firmware 3.2.9.23 
SP firmware build number: 116695 
SP firmware date: Thu Mar 30 11:38:01 CST 2017 
SP filesystem version: 0.2.10

At the current firmware level (SP firmware 3.2.9.23), the memory correctable error threshold for DIMM replacement is 240 CEs in a 72-hour period.

So, the suggestion is:
– Clear all the error messages after completing the maintenance and check whether the threshold is reached again. If so, we may really need to replace the DIMM.

How to do it? Easy:

ssh root@grepora01-ilom
-> show /SYS/MB/P0/D3
Expected:
[...]
        fault_state = Faulted
[...]
-> set /SYS/MB/P0/D3 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0/D3 (y/n)? y
-> show /SYS/MB/P0/D3
Expected:
 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)
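
To repeat this check over the following days, the show output can be captured and filtered. A minimal sketch, assuming the output format shown above; fault_state_of is a made-up helper that only reads text on stdin and does not talk to ILOM itself:

```shell
#!/bin/sh
# Hypothetical helper: prints the value of the fault_state property
# found in ILOM `show` output supplied on stdin.
fault_state_of() {
    awk -F' = ' '$1 ~ /fault_state$/ { print $2 }'
}

printf '%s\n' \
    ' /SYS/MB/P0/D3' \
    '    Properties:' \
    '        type = DIMM' \
    '        fault_state = OK' \
    '        clear_fault_action = (none)' | fault_state_of
# prints: OK
```

If the value comes back as Faulted again after the fault was cleared, the threshold was reached once more and the DIMM is a real replacement candidate.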

Hope it helps!
Cheers!

Managing AWR Warehouse Repository Database

1. Change Retention Period Of AWR Warehouse Repository Database

The AWR retention on the Repository Database can be changed as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -retention=<New retention period (in years)>
For example: 
[oracle@oem13c oms]$ emcli awrwh_reconfigure -retention=5

2. Change Staging Location Of AWR Dump Files

For the AWR Warehouse, the target database by default creates the dump files in its home directory. So, after adding the target to the AWR Warehouse, we need to reconfigure it from the OEM CLI to change the dump file directory, as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure_src -target_name=<target database name> -target_type=rac_database -src_dir="<directory path>"
For example: 
[oracle@oem13c ]$ ./emcli awrwh_reconfigure_src -target_name=greporadb -target_type=rac_database -src_dir="/awrdata/awrw"

3. Change Upload Interval Of SnapShots In AWR Warehouse Repository Database

This configuration can be changed by the following:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -upload_interval=<New upload interval (in hours)>
For example: 
[oracle@oem13c oms]$ ./emcli awrwh_reconfigure -upload_interval=12

4. List Current Configuration

This can be accomplished by the following:

[oracle@oem13c oms]$ emcli awrwh_reconfigure -list
Upload Interval (hours) = 12
Retention (years) = 5
Dump Location = /awrdata/awrw/
AWR Warehouse reconfigured successfully
[oracle@oem13c oms]$
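
When the same settings are expected in more than one environment, the -list output can be captured and checked against them. A sketch, assuming the output format shown above; awrwh_setting is a hypothetical helper that reads captured output from stdin rather than calling emcli:

```shell
#!/bin/sh
# Hypothetical helper: prints the value for one setting label
# from captured `emcli awrwh_reconfigure -list` output on stdin.
awrwh_setting() {
    # $1 = label exactly as printed, e.g. "Retention (years)"
    awk -F' = ' -v key="$1" '$1 == key { print $2 }'
}

list_output='Upload Interval (hours) = 12
Retention (years) = 5
Dump Location = /awrdata/awrw/'

printf '%s\n' "$list_output" | awrwh_setting "Retention (years)"        # prints 5
printf '%s\n' "$list_output" | awrwh_setting "Upload Interval (hours)"  # prints 12
```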

Reference
EM13c: How To Change Retention Period Of AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247437.1)
EM13c: How To Change Staging Location Of Dump Files On AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247439.1)
EM13c: How To Change Upload Interval Of SnapShots In AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247438.1)

OEM: The ILOM server is currently offline or unreachable on the network.

Hi all!
I just got an alarm from OEM with this message. How to check it?
– The first thing is to be able to connect to the ILOM from a DBNode.
– From there we can test the IPv4 and/or IPv6 interfaces through ping, as shown below.

This is also documented in: Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection – Test IPv4 or IPv6 Network Configuration (CLI)

In my case, it was only a false alarm, as I was able to connect to other DBNodes from this ILOM:

[root@greporasrv01db01 ~]# ssh greporasrv01-ilom.grepora.com
The authenticity of host 'greporasrv01-ilom.grepora.com (10.48.18.64)' can't be established.
RSA key fingerprint is 59:c5:9f:b1:60:59:15:16:94:c8:94:88:7b:4e:52:57.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'greporasrv01-ilom.grepora.com' (RSA) to the list of known hosts.
Password: 

Oracle(R) Integrated Lights Out Manager

Version 3.2.9.23 r116695

Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

Warning: HTTPS certificate is set to factory default.

Hostname: greporasrv01-ilom

-> show /SP/network

 /SP/network
    Targets:
        interconnect
        ipv6
        test

    Properties:
        commitpending = (Cannot show property)
        dhcp_clientid = none
        dhcp_server_ip = none
        ipaddress = 10.50.12.64
        ipdiscovery = static
        ipgateway = 10.50.12.1
        ipnetmask = 255.255.255.0
        macaddress = 00:10:E0:95:73:E6
        managementport = MGMT
        outofbandmacaddress = 00:10:E0:95:73:E6
        pendingipaddress = 10.50.12.64
        pendingipdiscovery = static
        pendingipgateway = 10.50.12.1
        pendingipnetmask = 255.255.255.0
        pendingmanagementport = MGMT
        pendingvlan_id = (none)
        sidebandmacaddress = 00:10:E0:95:73:E7
        state = ipv4-only
        vlan_id = (none)

    Commands:
        cd
        set
        show

-> cd /SP/network/test
/SP/network/test

-> show

 /SP/network/test
    Targets:

    Properties:
        ping = (Cannot show property)
        ping6 = (Cannot show property)

    Commands:
        cd
        set
        show

-> set ping=10.50.12.51       -- DBNode1
Ping of 10.50.12.51 succeeded

-> set ping=10.50.12.52       -- DBNode2
Ping of 10.50.12.52 succeeded