OEM 13C: How to Set Up Out Of Band Notifications

So, after a quiet weekend at a client, I noticed I was not being paged for a reason: the OMS was down! 😀

Ok, so, how to monitor the monitoring easily?
OEM 13c has a feature called Out of Band Notification, which lets you configure an agent with email settings so it can send notifications when it is not able to communicate with the OMS because the OMS or the Repository DB is down.

Details of that configuration are in this MOS note: EM 13c, 12c: How to Set Up Out Of Band Email Notification in Enterprise Manager Cloud Control (Doc ID 1472854.1)

How does it work?
The agent on the OMS host checks the status of the ‘OMS and Repository’ target (oracle_emrep) by running the ‘Response’ metric, which runs the Perl script:

[agent_home]/plugins/oracle.sysman.emrep.agent.plugin_12.1.0.n.0/scripts/emrepresp.pl

If the oracle_emrep target is detected as down, then emrepdown.pl is called from the same directory.

The emrepdown.pl script uses the Perl “Net::SMTP” module to send an email using the Out Of Band email information (To Email ID, Email Gateway, From Email ID) defined in the Agent’s /sysman/config/emd.properties configuration file.

Note: this method does not currently support SSL email authentication. An internal ER (Bug 18886316 “WOULD LIKE ABILITY FOR EMREPDOWN.PL TO BE ABLE TO USE SSL”) has been raised for this.
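
For illustration only, the same kind of plain (non-SSL) SMTP send can be sketched in Python. This is not emrepdown.pl, which is Perl and uses Net::SMTP. The function names, the subject and body wording, and the default port 25 are all assumptions made for the example.

```python
import smtplib
from email.message import EmailMessage


def build_oob_message(to_addr, from_addr, target="oracle_emrep"):
    """Build a plain-text alert from the Out Of Band email settings."""
    msg = EmailMessage()
    msg["To"] = to_addr        # value of emd_email_address
    msg["From"] = from_addr    # value of emd_from_email_address
    # Subject and body wording are illustrative, not what emrepdown.pl sends.
    msg["Subject"] = f"Target {target} is down"
    msg.set_content(f"The agent detected that {target} is down.")
    return msg


def send_oob_message(msg, gateway, port=25):
    """Send over plain SMTP (no SSL/TLS). Port 25 is an assumed default."""
    with smtplib.SMTP(gateway, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

Calling build_oob_message("noc@grepora.com", "13cagent@grepora.com") and passing the result to send_oob_message with your emd_email_gateway value is also a quick way to check that the gateway accepts non-SSL mail before relying on the OOB method.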

How to set up?
1) Run the following commands, which set the email parameters in the emd.properties file.
Do this on the chained agent (i.e., the agent on the same machine as the OMS, which monitors the oracle_emrep target).

a) Set the agent ORACLE_HOME

$ export ORACLE_HOME=[agent_oracle_home]
$ export PATH=$ORACLE_HOME/bin:$PATH

Example:

$ export ORACLE_HOME=/oracle/12c/12cagent/core/12.1.0.3.0
$ export PATH=$ORACLE_HOME/bin:$PATH

b) Check if any values are currently set for the Out of Band parameters

$ emctl getproperty agent -name emd_email_address
$ emctl getproperty agent -name emd_from_email_address
$ emctl getproperty agent -name emd_email_gateway

If the following message is returned:

emd_email_address is not a valid configuration property

it means that the property is not set up yet; continue to the next step.
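
If you want to script this check, here is a minimal sketch. It only looks for the “is not a valid configuration property” message shown above; the helper names are made up for the example, and it assumes emctl is on the PATH as set in step a).

```python
import subprocess

OOB_PROPERTIES = (
    "emd_email_address",
    "emd_from_email_address",
    "emd_email_gateway",
)
# Message shown above when a property has never been set.
NOT_SET_MARKER = "is not a valid configuration property"


def is_unset(output):
    """Return True if the emctl output says the property does not exist."""
    return NOT_SET_MARKER in output


def get_property_output(name):
    """Run 'emctl getproperty agent -name <name>' and return its output."""
    result = subprocess.run(
        ["emctl", "getproperty", "agent", "-name", name],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


def missing_oob_properties(fetch=get_property_output):
    """List the Out of Band properties that are not set yet."""
    return [name for name in OOB_PROPERTIES if is_unset(fetch(name))]
```

The fetch argument is there so the logic can be tried with canned output on a machine that has no agent installed.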

c) Set the Out of Band parameters

$ emctl setproperty agent -allow_new -name emd_email_address -value [youremailaddress]
$ emctl setproperty agent -allow_new -name emd_from_email_address -value [senderAddress]
$ emctl setproperty agent -allow_new -name emd_email_gateway -value [outgoingsmtpserver]

Example:

$ emctl setproperty agent -allow_new -name emd_email_gateway -value smtp.server.hostname
$ emctl setproperty agent -allow_new -name emd_email_address -value noc@grepora.com
$ emctl setproperty agent -allow_new -name emd_from_email_address -value 13cagent@grepora.com

TIP: The value for the emd_email_gateway can be the same as is used for ‘normal’ email notifications via the OMS. This can be accessed via setup/notifications/notification methods.

If you normally need “Use Secure Connection: SSL”, your mail server requires SSL authentication and the OOB method will not be suitable. Remember: the OOB method does not support SSL email authentication at this time.

2) Stop and start the agent for these parameters to take effect.

More information, such as how to test this configuration, can be found in the MOS note: EM 13c, 12c: How to Set Up Out Of Band Email Notification in Enterprise Manager Cloud Control (Doc ID 1472854.1)

Hope that helps, cheers!

OEM: The number of hanging transactions are hang_trans is %

Hi all!
So, today is a quick one, just to connect the links. It seems this message from OEM is not clear enough for some people, especially those who are not Oracle specialists: it means something is locked in your database!

If this is the case, contact a DBA.

If you ARE a DBA, you may want to read this post about easily locating and solving locks: Solving Simple Locks Through @lock2s and @killlocker.

Also, if the session comes from a DBLink, it is always useful to read this: Lock by DBLink – How to locate the remote session?

There is also some additional/specific material about some issues and bugs in this regard here: Tag: LOCK.

I hope it helps!
Cheers!

OEM Information Reports: ORA-00600 [kpndbcon-svchpnotNULL]

Having this error from an Information Report?

ORA-00600 [kpndbcon-svchpnotNULL]
ORA-00600: internal error code, arguments: [kpndbcon-svchpnotNULL], [], [], [], [], [], [], [], [], [], [], []

Don’t worry… Basically, this is not directly an Oracle issue. The cause of this error is that, while the report is running (it takes 2 or 3 minutes), one of the following happens:

  • The Database Session in the OEM Repository (Database Repository) is killed.
  • The Database Session in the Target Database (where OEM has to connect and get the data) is killed.
  • There are network issues between the OEM Repository and the Target Database, causing timeouts or making the session end abnormally.
  • High workload in one of the databases causes a timeout, making the session end abnormally.

So basically this is a communication problem between the OEM Repository and the database the data is being fetched from.

Oracle does not support keeping reports like this running over database links, because any network issue can cause the report to fail. Some references about it:

  • ORA-00600 [kpndbcon-svchpnotNULL] Errors (Doc ID 1615517.1)
  • ORA-00600 [kpndbcon-svchpnotNULL] query through dblink (Doc ID 1490700.1)
  • Information Publisher Report fails with Error Rendering Element. Exception: ORA-00600 [kpndbcon-svchpnotNULL] (Doc ID 1930280.1)


So what’s the solution?
The solution here is easy: just re-run it.

Hope it helps. Cheers!

OEM after a Maintenance: A memory component is suspected of causing a fault with a 100% certainty. Component Name : % Fault class : fault.memory.intel.dimm_ce

Hi all!
So, I had this message from a memory component in my Exadata:

Message=A memory component is suspected of causing a fault with a 100% certainty. Component Name : /SYS/MB/P0/D3 Fault class : fault.memory.intel.dimm_ce

But this was right after maintenance on the server. Checking on ILOM:

-> show /SYS/MB/P0/D3

 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)

Checking on CellCLI alert history:

CellCLI> list alerthistory detail

	 name:                   13_1
	 alertDescription:       "A memory component suspected of causing a fault"
	 alertMessage:           "A memory component is suspected of causing a fault with a 100% certainty.  Component Name : /SYS/MB/P0/D3  Fault class    : fault.memory.intel.dimm_ce  Fault message  : http://support.oracle.com/msg/SPX86A-8002-XM"
	 alertSequenceID:        13
	 alertShortName:         Hardware
	 alertType:              Stateful
	 beginTime:              %
	 endTime:                %
	 examinedBy:             
	 metricObjectName:       /SYS/MB/P0/D3_FAULT
	 notificationState:      1
	 sequenceBeginTime:      %
	 severity:               critical
	 alertAction:            "For additional information, please refer to http://support.oracle.com/msg/SPX86A-8002-XM Automatic Service Request has been notified with Unique Identifier: %.  Diagnostic package is attached. It is also accessible at % It will be retained on the storage server for 7 days. If the diagnostic package has expired, then it can be re-created at %"

Hm… Let’s read the MOS: SPX86A-8002-XM – Memory Correctable ECC (Doc ID 1615285.1)

“Suggested Action for System Administrator

Replace the faulty memory DIMM at the earliest possible convenience.”

Hmm… But as I said, this was right after maintenance on the server. What if this is related?
Ok, some additional piece of information:

-> version 
SP firmware 3.2.9.23 
SP firmware build number: 116695 
SP firmware date: Thu Mar 30 11:38:01 CST 2017 
SP filesystem version: 0.2.10

At the current firmware level of SP firmware 3.2.9.23 the memory correctable error threshold limit for DIMM replacement is 240 CEs in a 72 hrs period.
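
To make that threshold concrete, here is a minimal sketch of the check in Python. The list of correctable-error timestamps is a hypothetical input (ILOM does not hand them over in this form), and treating exactly 240 errors as “reached” is an assumption.

```python
from datetime import datetime, timedelta

CE_THRESHOLD = 240             # correctable errors, as stated above
WINDOW = timedelta(hours=72)   # observation period, as stated above


def threshold_reached(ce_times, now, threshold=CE_THRESHOLD, window=WINDOW):
    """Return True if at least `threshold` CEs occurred in the last `window`."""
    # Assumption: hitting exactly the threshold counts as reached.
    recent = [t for t in ce_times if now - window <= t <= now]
    return len(recent) >= threshold
```

Errors older than 72 hours drop out of the count, which is why a burst of errors during maintenance does not by itself prove that the DIMM is bad.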

So, the suggestion is:
– Clear all the error messages after completing the maintenance and check whether the threshold is reached again. If so, we may really need to replace it.

How to do it? Easy:

ssh root@grepora01-ilom
-> show /SYS/MB/P0/D3
[Expected]
[...]
        fault_state = Faulted
[...]
-> set /SYS/MB/P0/D3 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0/D3 (y/n)? y
-> show /SYS/MB/P0/D3
[Expected]
 /SYS/MB/P0/D3
    Targets:
        PRSNT
        SERVICE

    Properties:
        type = DIMM
        ipmi_name = MB/P0/D3
        fru_name = 16384MB DDR4 SDRAM DIMM
        fru_manufacturer = Samsung
        fru_part_number = %
        fru_rev_level = 01
        fru_serial_number = %
        fault_state = OK
        clear_fault_action = (none)

Hope it helps!
Cheers!

Managing AWR Warehouse Repository Database

1. Change Retention Period Of AWR Warehouse Repository Database

The AWR retention on the Repository Database can be changed as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -retention=<New retention period (in years)>
For example: 
[oracle@oem13c oms]$ emcli awrwh_reconfigure -retention=5

2. Change Staging Location Of AWR Dump Files

For the AWR Warehouse, the target database creates its dump files in the home directory by default. So after adding the target to the AWR Warehouse, we need to reconfigure it from the OEM CLI to change the dump file directory, as follows:

<OMS_HOME>/bin>./emcli awrwh_reconfigure_src -target_name=<target database name> -target_type=rac_database -src_dir="<directory path>"
For example: 
[oracle@oem13c oms]$ ./emcli awrwh_reconfigure_src -target_name=greporadb -target_type=rac_database -src_dir="/awrdata/awrw"

3. Change Upload Interval Of SnapShots In AWR Warehouse Repository Database

This configuration can be changed by the following:

<OMS_HOME>/bin>./emcli awrwh_reconfigure -upload_interval=<New upload interval (in hours)>
For example: 
[oracle@oem13c oms]$ ./emcli awrwh_reconfigure -upload_interval=12

4. List Current Configuration

This can be accomplished by the following:

[oracle@oem13c oms]$ emcli awrwh_reconfigure -list
Upload Interval (hours) = 12
Retention (years) = 5
Dump Location = /awrdata/awrw/
AWR Warehouse reconfigured successfully
[oracle@oem13c oms]$
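
If you want to check these values from a script, the listing above is easy to parse. A minimal sketch, assuming the output keeps the “name = value” layout shown above (the function name is made up for the example):

```python
def parse_awrwh_list(output):
    """Turn 'emcli awrwh_reconfigure -list' output into a dict of strings."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines without ' = ', such as the final status line
            settings[key.strip()] = value.strip()
    return settings


sample = """Upload Interval (hours) = 12
Retention (years) = 5
Dump Location = /awrdata/awrw/
AWR Warehouse reconfigured successfully"""

config = parse_awrwh_list(sample)
print(config["Retention (years)"])
```

Values are kept as strings on purpose; convert them with int() only where you know the setting is numeric.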

Reference
EM13c: How To Change Retention Period Of AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247437.1)
EM13c: How To Change Staging Location Of Dump Files On AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247439.1)
EM13c: How To Change Upload Interval Of SnapShots In AWR Warehouse Repository Database In 13.2 OEM Cloud Control (Doc ID 2247438.1)

OEM 13C Patching Agent: [ERROR- Failed to Update Target Type Metadata]

While applying a patch to OEM 13c, specifically to the OMS agent, I got this error when trying to start it back up:

Collection Status                            : Collections enabled
Heartbeat Status       : OMS responded illegally [ERROR- Failed to Update Target Type Metadata]
Last attempted heartbeat to OMS              : 2018-04-17 12:12:51
Last successful heartbeat to OMS             : (none)
Next scheduled heartbeat to OMS              : 2018-04-17 12:13:21
---------------------------------------------------------------
Agent is Running and Ready
[oracle@oem13c oms]$ ./emctl upload agent
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS version not checked yet. If this issue persists check trace files for ping to OMS related errors. (OMS_DOWN)
[oracle@oem13c oms]$ ./emctl pingOMS
Oracle Enterprise Manager Cloud Control 13c Release 2
Copyright (c) 1996, 2016 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD pingOMS error: OMS sent an invalid response: “ERROR- Failed to Update Target Type Metadata”

Nice, huh?
I found the MOS note EM 13c Agent : pingOMS error: OMS sent an invalid response: “ERROR- Failed to Update Target Type Metadata” (Doc ID 2318564.1), which says:

“This particular issue is caused when any Agent Plugin is upgraded to a higher level than the OMS plugin.”
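
The condition in that quote is a plain version comparison. Here is a minimal sketch of it in Python. It is not how OEM does the check, and the plugin versions in the usage note are made-up examples; in practice you would fill the two dictionaries from your own agent and OMS plugin listings.

```python
def version_tuple(version):
    """Convert a dotted version such as '13.2.1.0.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def plugins_ahead_of_oms(agent_plugins, oms_plugins):
    """Return the plugin IDs whose agent version is higher than the OMS one."""
    ahead = []
    for plugin_id, agent_version in sorted(agent_plugins.items()):
        oms_version = oms_plugins.get(plugin_id)
        if oms_version is None:
            continue  # plugin not deployed on the OMS side, nothing to compare
        if version_tuple(agent_version) > version_tuple(oms_version):
            ahead.append(plugin_id)
    return ahead
```

For example, with an agent at {"oracle.sysman.db": "13.2.2.0.0"} and an OMS at {"oracle.sysman.db": "13.2.1.0.0"}, the function returns ["oracle.sysman.db"], which is the situation the note describes.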

The solution, according to the MOS Doc, is to roll back the Agent plugins that are ahead of the OMS version. Checking it:
Continue reading

Monitoring Your Oracle Database With Grafana

Hi everybody,

Let’s talk about Dashboarding Oracle Databases with Grafana.

I always felt the need for a graphical monitoring tool for basic database things such as volume of archives, backup archives, state of services, offline disks, diskgroup space, UNDO usage, TEMP usage, filesystem space, and the space of every diskgroup in all clusters. OEM seems just too complicated to give a simple online graphical dashboard for this.

So I developed a “collector” that sends the data to InfluxDB, and Grafana generates these graphs from it. Simple as that.
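
To give an idea of what a collector hands to InfluxDB, here is a minimal sketch that formats one sample in InfluxDB line protocol. It is not the collector from this post. The measurement, tag and field names are made up, and it only handles numeric fields, which are written as floats.

```python
def _escape(text, chars):
    """Backslash-escape the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point as 'measurement,tag=value field=value [timestamp]'."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + repr(float(value))
        for key, value in sorted(fields.items())
    )
    line = head + " " + body
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)
    return line


# Hypothetical sample: percentage used in an ASM diskgroup.
print(to_line_protocol(
    "asm_diskgroup",
    {"db": "greporadb", "diskgroup": "DATA"},
    {"used_pct": 87.5},
))
```

Lines like this can then be posted to InfluxDB's write endpoint or passed to a client library; which one to use depends on your InfluxDB version.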

Have a look at how it looks:

[Screenshots: grafana1, grafana2]

OK, but how did I do it?
Here goes a piece of code:
Continue reading

(OSB) Oracle Service Bus 12.2 – LDAP Authorization

Oracle Service Bus 12.2 is now available for download on Oracle Support.

It brings news for middleware admins and new features for developers.

Now, all user privileges and group roles are managed by the ‘EM Console’, so this release can be a little confusing.

I had a little inconvenience the first time I tried to use OSB with LDAP authorization.

After configuring WebLogic with LDAP, the user could log in to the WebLogic Console (/console)

but could not log in to the Service Bus Console (/sbconsole), getting HTTP 401 (Unauthorized) in the browser.

In WL Admin log:

[AdminServer] [ERROR] [ADFC-50017] [oracle.adfinternal.controller.application.AdfcExceptionHandler] [tid: [ACTIVE].ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: maiquel_oliveira] [ecid: fdf185a3-35fd-4171-9b75-4b80a0f40b03-000000cf,0] [APP: service-bus] [partition-name: DOMAIN] [tenant-name: GLOBAL] [DSID: 0000Lu77u7K7q2r6wJAhMG1Pj34E000001] ADFc: While attempting to handle this exception the application's exception handler failed.[[
oracle.adf.controller.security.AuthorizationException: ADFC-0619: Authorization check failed: User 'user_name' does not have 'VIEW' permission on 'jsf.resourcesPageDef'.

Error 401–Unauthorized

From RFC 2068 Hypertext Transfer Protocol — HTTP/1.1:

10.4.2 401 Unauthorized

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.46) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity MAY include relevant diagnostic information. HTTP access authentication is explained in section 11.

Here is the trick:

Continue reading

Oracle ASR: Communication Issues

Hello all,
Are you getting notifications like this one from your ASR?

ALERT: Oracle Auto Service Request (ASR) has detected a heartbeat failure for these assets.

[list with "Serial"; "Hostname"; "Information" of affected targets]

IMPACT: ASR would not be able to create a Service Request (SR) if a fault were to occur.
ACTION: Determine why the heartbeat has failed for these assets and resolve the issue.

This is only a notification saying that ASR is not able to reach a target.
For detailed information on how to troubleshoot, you can access MOS Oracle Auto Service Request (ASR) No Heartbeat Issue – How to Resolve (Doc ID 1346328.2)

In general, the things to test are:
– Access from ASR server to transport.oracle.com, via https, using port 443.
– Access from ASR server to ASR assets, via http, using port 6481.
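
Both checks can be scripted from the ASR server. Here is a minimal sketch in Python. It only tests that a TCP connection can be opened on each port, not that HTTP or HTTPS actually answers, and the asset hostnames you pass in are your own.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_asr_connectivity(asset_hosts):
    """Check the two paths listed above; returns {(host, port): reachable}."""
    results = {
        ("transport.oracle.com", 443): can_connect("transport.oracle.com", 443),
    }
    for host in asset_hosts:
        results[(host, 6481)] = can_connect(host, 6481)
    return results
```

A False for transport.oracle.com usually points to a firewall or proxy between the ASR server and Oracle, while a False for an asset points to the network between the ASR server and that host.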

Continue reading