monitoring – grepOra

Monitoring Your Oracle Database With Grafana

By dbafrasson 11 de May de 2018 11 de May de 2018

Hi everybody,

Let’s talk about Dashboarding Oracle Databases with Grafana.

I always felt the need of a graphical monitoring tool for basic database things such as volume of archives, back-up archives, state of services, offline disks, space of diskgroup, consum of UNDO, consum of TEMP, space of filesystem, space of every diskgroups in all clusters. OEM seems just too much complicated to give a simple online graphical dashboard for this.

So I developed a “collector” of data that sends the data to Influxdb and generate these graphs. Simple like that.

Have a look on how it looks like:

grafana1

grafana2

Ok, but I how did it?
Here it goes a piece of code:
More“Monitoring Your Oracle Database With Grafana”

Enterprise Manager

Oracle ASR: Communication Issues

By matheusdba 30 de August de 2017 15 de September de 2017

Hello all,
Are you having notifications like this one from you ASR?

ALERT: Oracle Auto Service Request (ASR) has detected a heartbeat failure for these assets.

[list with "Serial"; "Hostnam"; "Information" of affected targets]

IMPACT: ASR would not be able to create a Service Request (SR) if a fault were to occur.
ACTION: Determine why the heartbeat has failed for these assets and resolve the issue.

This is only a notification saying that ASR not able to reach a target.
For detailed information on how to troubleshoot, you can access MOS Oracle Auto Service Request (ASR) No Heartbeat Issue – How to Resolve (Doc ID 1346328.2)

In general why, things to test are:
– Access from ASR server to transport.oracle.com, via https, using port 443.
– Access from ASR server to ASR assets, via http, using port 6481.

More“Oracle ASR: Communication Issues”

Enterprise Manager

OEM Alarms: TaskZombieException

By matheusdba 9 de August de 2017 10 de August de 2017

Hello all,
Recently I started to have several alarms like this:

Internal error detected: java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:1017.

Or

Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

In agent log the message:

# $AGENT_INST/sysman/log/gcagent.log:

2017-06-13 12:54:09,232 [355:GC.Executor.14 (oracle_database:DB_DB12:%DB%)] ERROR - oracle_database:DB_DB12:%DB% oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)

To avoid generating pages and new incidents, I changed parameter _zombieCreateIncident to false on agents in related servers, as per described in MOS EM12c: Incident constantly raised for Oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie (Doc ID 2116834.1) regarding MOS Bug 22674258 – Zombie processes created for DB metric collection /workaround do not work.
I also added some other parameters that should help, like increasing the wait, all mentioned in same MOS notes.

Like this, in $AGENT_INST/sysman/config/emd.properties:

#GrepOra magical fixes:
_zombieSuspensions=true
_canceledThreadWait=900
_zombieThreadPercentThreshold=0
_zombieCreateIncident=false

And it solved the issue. At leat, the noise stopped. 🙂

Hope it helps,
Cheers!

Enterprise Manager

OEM 12c+: Prevent OMS from Sending Old Notifications for Events

By matheusdba 2 de August de 2017 2 de August de 2017

Are you receiving old notifications from OEM? Like 2 or 3 days past, mostly already solved or after a blackout?
Yeah, this is annoying, specially when getting floods and floods of notifications.

Ok, so here go a very good tip: You can set grace period for notifications! 🙂
Easy easy, do this way:

cd /bin
emctl set property -name oracle.sysman.core.notification.grace_period -value [provide value in minutes]

The oracle.sysman.core.notification.grace_period OMS parameter has been introduced in 12c and allows the user to configure the grace period within which the notification should be sent. The value is set in minutes.

For example:

emctl set property -name oracle.sysman.core.notification.grace_period -value 1440

With this, OMS sends only those notifications which have been raised in the past 1440 mins (last 24 hours) and ignores all the notifications for events / incidents created prior to this time period.

After this, you’ll need to start OMS:

emctl start oms

The oracle.sysman.core.notification.grace_period parameter applies to all the Notification methods, but if the requirement is to specify the grace period for a particular notification method only, you can use the below parameters accordingly:

oracle.sysman.core.notification.grace_period_connector: For Connectors
oracle.sysman.core.notification.grace_period_email: For email notifications
oracle.sysman.core.notification.grace_period_oscmd: For OS Command notifications
oracle.sysman.core.notification.grace_period_plsql: For PLSQL notifications
oracle.sysman.core.notification.grace_period_snmp: For SNMP Trap notifications
oracle.sysman.core.notification.grace_period_ticket: For ticketing tools

This is weel described as per MOS: 12c Cloud control: How to Prevent OMS from Sending out Old Notifications for Events / Incidents Occurred in the Past? (Doc ID 1605351.1)

Cheers!

Enterprise Manager, RAC | Data Guard

EMagent 12c with high number of threads causing “su: cannot set user id: Resource temporarily unavailable”

By matheusdba 10 de May de 2017 29 de April de 2017

Hello all,
Yeah, these days I got some errors and when validating the server found the fllowing error:

su: cannot set user id: Resource temporarily unavailable

As you can imagine, in order to fix the issue, I adjusted the /etc/security/limits.conf increasing oracle nprocs to:

oracle           soft    nproc           4047
oracle           hard    nproc           20384

Ok, turns out that after a while I got the same errors again…
After some investigating I find that the EM Agent process was with 5020 threads!
Take a look:

More“EMagent 12c with high number of threads causing “su: cannot set user id: Resource temporarily unavailable””

Database, Enterprise Manager, Errors and Bugs

Metric evaluation error start – Get dynamic property error,Dynamic Category property error,Computation of a critical dynamic property failed. Retries Completed.

By matheusdba 19 de April de 2017 19 de April de 2017

Hi all,
Getting this error? OMS 12.1.0.3?
Ok… After an investigation on MOS and also a SR opened with Oracle is was pointed to relation with Bug 17575631 CLUSTER DB HOME PAGE ERRORS- REGIONS FETCH REAL TIME DATA WILL NOT BE DISPLAYED on Database Plugin.
It’s solved in OMS 12.1.0.4 and also in 12.1.0.5.1 EM Database Plugin Bundle patch.

I’ll show you how to workaround it. 🙂

More“Metric evaluation error start – Get dynamic property error,Dynamic Category property error,Computation of a critical dynamic property failed. Retries Completed.”

Enterprise Manager, MySQL, Shell Scripts

Getting today’s Errors and Warnings from MySQL log

By matheusdba 15 de April de 2016 21 de April de 2020

Quick one!

# Warnings

cat /var/log/mysqld.log |grep `date +%y%m%d` | grep "\[Warning\]"

# Errors

cat /var/log/mysqld.log |grep `date +%y%m%d` | grep "\[ERROR\]"

And a Bonus!
To get entries from X days ago:

cat /var/log/mysqld.log |grep `date --date="46 days ago" +%y%m%d`

Matheus.

Database, Enterprise Manager, MySQL

Monitoring MySQL with Nagios – Quick View

By matheusdba 7 de April de 2016 21 de April de 2020

Hi all!
As you know, we have some commercial solutions to monitoring/alerting MySQL, like MySQL Enterprise Monitor or Oracle Grid/Cloud Control.

But, regarding we are using MySQL instead of Oracle Database, we can assume it’s probably a decision taken based on cost. So, considering Open Source solutions, we basically have Nagios, Zabbix, OpenNMS…

MangagedMonitoringConsole

Thinking on Nagios, in my opinion the “supra sumo” is mysql_health_check.pl.
Below whitepaper and presentation:
White Paper
Presentation
Code
Good one by Sheeri Cabral and posted here!

Any way, with theese two we can make lots of magic:

1. check_mysql.pl
– Check status of MySql server (slow queries, etc)
– Queries per second graph

2. check_db_query.pl
– Allowes to run SQL Queries and setting thresholds for warning e critical. Ex:

check_db_query.pl -d database -q query [-w warn] [-c crit] [-C conn_file] [-p placeholder]

Ex for Nagios call:

define command{
command_name    check_db_entries
command_line    /usr/local/bin/perl $USER1$/check_db_query.pl -d "$ARG1$" -q "$ARG2$" $ARG3$
}

So, now it’s just make your queries and implement your free monitoring on MySQL! 🙂
Matheus.