19c+ DBMS_STATS.GATHER_TABLE_STATS Memory Leaking on PGA

Hi all,

So, here is a quick post but which wasn’t so eaasy to be mapped, though.

It happens that right after a 19c upgrade from 18c, we started facing Memory Leak messages related to PGA area on a regular basis. After a while following the v$process_memory allocation, it was possible to map it to an ODI routing with DBMS_STATS.GATHER_TABLE_STATS. The PGA exceeds the limit just a few minutes after.

Nice han, so what is happening?

Long story short: Bug 30846782 : 19C+ FAST/EXCESSIVE PGA GROWTH WHEN USING DBMS_STATS.GATHER_TABLE_STATS.

According to Oracle Docs: No Workaround!

But here is the golden info I want to share after an SR:

EXECUTE IMMEDIATE 'alter session set "_fix_control"='20424684:OFF';

You are welcome!

Cheers!

How to send Telegram messages from SQL Server

Hi folks,
You usually receive notifications from SQLServer at your e-mail, right? A nice way to receive updates from your SQL Server Schedule job, it’s sending them through telegram. To do so, you’ll have to configure one additional step and choose Operating System (CmdExec) as a type of command. And to call the Telegram API, we can use a PowerShell command like the following:

powershell "Invoke-RestMethod -Uri 'https://api.telegram.org/bot{API-KEY}/sendMessage?chat_id={CHAT-ID}&text={TEXT}'"

Just replace the following keys with the proper value (don’t forget to remove the curly brackets too).

API-KEY: Your bot API key
CHAT-ID: The ID of the chat where you’ll receive the message, could be a single user, or a chat group
TEXT: Your beloved message.

What else do you use to achieve this kind of alers? Slack? healthchecks.io? Tell us 😉

Plan Stability: SQLs switching to Bad Plans (HPV) and Load Very High (12c+)

Hi all,

That’s not the first post about this topic. Please read this one first.

Thing is: I access this post soooo often, as I face this sort of issue sooo frequently I decided to make a new one with some updated info. Even because, 3 years ago I wrote this article for 12.1 databases, now we have some relevant changes to 12.2 onwards.

Ok, so, for a little bit of context from the previous post:

  • After some critical issues with Plan Stability (new HSP being created out of nowhere very often) I mapped this to the usage of Adaptive Query Features.
  • At that point, I had a match to the latchfree events and mapped it specifically to latch misses on “Result Cache: RC Latch”.
  • As per MOS at the time, when Automatic Dynamic Sampling is used for the SQL statements, it can decide, based upon these statistics, that a better response time could be achieved by using the result cache for those queries. This can cause heavy usage of the result cache leading to the contention on latch free for “Result Cache: RC Latch”
  • So the villain was the Automatic Dynamic Sampling, as part to the SQL Plan Directives.
  • On the following months and migrations I started seeing more and more issues related to the SQL Plan Directives as part of the Adaptive Query Optimization features. So, I have been following the referred steps more and more.
  • Happens this means setting a few underscore parameters, which we should avoid unless approved/recommended by Oracle, so I’ve been trying to be less intrusive and simply disabling the whole Adaptive Features.
  • How to do it? That’s the tricky part and the thing to keep in mind, as we had changes from 12.1 to 12.2

Some words about the parameters settings:

  • The parameter “optimizer_adaptive_features” is a valid parameter for 12.1 and it’s enabled by default. I recommend leaving it as FALSE if you are facing this sort of issues.
  • This parameter is obsolete con 12.2 onwards though. It was replaced by both optimizer_adaptive_statistics and optimizer_adaptive_plans.
    • 12.1:

adap1

    • 12.2+:

adap2

(images from https://blogs.oracle.com/optimizer/optimizer-adaptive-features-in-oracle-database-12c-release-2)

  • Also, keep one eye on all %optim% parameters, as specific mechanisms can be enabled (for some reason) as I recently faced in a client ( _optimizer_adaptive_plan = TRUE).

When Adaptive Queries are good?

Ok Matheus, but if we should disable it, why did Oracle created in the first place?

Well, if something would be always bad, it would be disabled by default… There are some good usage cases: Dynamic Sampling is typically recommended when you know you are getting a bad execution plan due to complex predicates.

oh, noice, so I’ll change on my database I have a lot of complex queries” . Don’t, I suggest you to try and use an alter session statement to set the value for optimizer_dynamic_sampling parameter as it can be extremely difficult to come up with a system-wide setting. And you can mess up with other things.

When Adaptive Queries are bad?

If the queries compile times need to be as fast as possible, for example, unrepeated OLTP queries where you can’t amortize the additional cost of compilation over many executions. Serial statements are typically short running and any Dynamic Sampling overhead at compile time can have a large impact on overall system performance (if statements are frequently hard parsed).

For systems that match this profile, setting OAF=FALSE is recommended. For Oracle Database 12c Release 2 onwards, using the default OAS=FALSE is recommended.

I hope this helps you!

DBCA “Recovery manager failed to restore datafiles”

Hi all,

If that’s the 5th Blog Post with the same title one open, don’t close, READ THIS ONE!

This one is different.

It’s actually another case about Oracle throwing generic errors for DBCA where 99% of times it’s the same error, so all blogs are different but the same in essence, and none resolve your problem. So, let’s go by parts:

The Error from Client:

DBCA_Error

 

Generic Case (if this is the first blog you open about the subject):

  • Make sure the file $ORACLE_HOME/bin/oracle has privilege set to chmod 6751 in both (ASM and DB) homes. it should look like this:
[oracle@PROD01 bin]$ cd /u01/app/oracle/product/19c/db/bin
[oracle@PROD01 bin]$
[oracle@PROD01 bin]$ ls -ltr oracle*
-rwsr-s--x. 1 oracle asmadmin 441253104 Aug 27 22:29 oracle
  • If you are not sure, set it accordingly:
cd $ORACLE_HOME/bin && chmod 6751 oracle
  • Not yet? Check the disks assigned to ASM privileges and groups:
kfod status=TRUE asm_diskstring='/dev/asm*' disk=ALL
  • Ater all this, still not working? Go for the atypical case below:

 

Atypical Case (Exception)

After some struggle and no success, I started validating everything I could. Some piece of words before the silver bullet here:

Noticed I knew you do have ASM? How come that? Well, most likely the error below the ASM happens at the point the DBCA runs a RMAN restore to create the database base files and metadata. This most likely happens at the time the write is happening on the ASM, once this is the most unstable process involved.

By looking deeper on the installation logs I could see:

[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: restoring datafile 00001 to +DATA
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: reading from backup piece /ora01/app/oracle/product/19c/db/assistants/dbca/templates/Seed_Database.dfb
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=channel ORA_DISK_1: ORA-19870: error while restoring backup piece /ora01/app/oracle/product/19c/db/assistants/dbca/templates/Seed_Database.dfb
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-19504: failed to create file "+DATA"
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-17502: ksfdcre:4 Failed to create file +DATA
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-15001: diskgroup "DATA" does not exist or is not mounted
[Thread-527] [ 2020-08-27 23:50:04.942 PDT ] [RMANUtil$RMANUtilErrorListener.handleError:1386] ERROR=ORA-01017: invalid username/password; logon denied

Bingo, so it’s a password issue?

Well, I’m creating the database and this actually matches with all the chmod 6751 thing…

What then?

Well, after a while going crazy validating passwd files and so one, realized something about the oracle user:

[oracle@PROD01 bin]$ id -a
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),10(wheel),203(dba),503(asmadmin),504(asmoper),525(madhoc) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[oracle@PROD01 bin]$ grep oracle /etc/group
wheel:x:10:oracle
asmadmin:x:503:oracle
asmoper:x:504:oracle
madhoc:x:525:oracle

Can you see the oracle user is part of the oinstall group but not really appearing on /etc/group? Also not in osusergroup dba.

Well, let’s force it?

[oracle@PROD01 bin]$ sudo su -
Last login: Fri Aug 28 14:13:21 PDT 2020 on pts/3
[root@DMSDB1PA ~]# usermod -g oinstall -G oinstall,dba,asmadmin,asmoper,madhoc oracle
[root@PROD01 ~]# id oracle
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),10(wheel),203(dba),503(asmadmin),504(asmoper),525(madhoc)
[root@PROD01 ~]# grep oracle /etc/group
wheel:x:10:oracle
dba:x:203:oracle
asmadmin:x:503:oracle
asmoper:x:504:oracle
oinstall:x:501:oracle
madhoc:x:525:oracle
[root@PROD01 ~]#

Well done!

Now try running DBCA again. That was a very tricky issue to find.

Know something else? At the point I was writing this I decided to have another look and ended up finding this MOS note: “ORA-17502 /ORA-01017: invalid username/password; logon denied ” While Creating 19c Database (Doc ID 2545858.1). We have a bug for it: BUG:29821687 – ORA-17502 /ORA-01017: INVALID USERNAME/PASSWORD; LOGON DENIED ” WHILE CREATING 19C DATABASE

You have the workaround already though. Go champs!

Hope it helps you, cheers!

OPatch Auto Failing to Apply Datapatch for a Removed Database

Hi all,

So, just a few days ago, during a client support, crossed the following case.

A few database creations initially failed with DBCA due to other issues, but it seems the DBCA didn’t cleread all creation steps after the failure and theoretical rollback.

As a consequence, whenever running DBCA with GUI client was seeing an old database. When trying to remove it with DBCA, the removal fail as the database can’t be brought up (creation has failed, remember?). Ok, we can leave with it, right?

Yes until we reached the point where Opatchauto failed with the following:

Verifying SQL patch applicability on home /u01/app/oracle/product/19c/db

"/bin/sh -c 'cd /u01/app/oracle/product/19c/db; ORACLE_HOME=/u01/app/oracle/product/19c/db ORACLE_SID=DB1 /u01/app/oracle/product/19c/db/OPatch/datapatch -prereq -verbose'" command failed with errors. Please refer to logs for more details. SQL changes, if any, can be analyzed by manually retrying the same command.

The reason? See the complete log about the failing step:

Executing command as oracle: 
 /bin/sh -c 'cd /u01/app/oracle/product/19c/db;ORACLE_HOME=/u01/app/oracle/product/19c/db ORACLE_SID=DB1 /u01/app/oracle/product/19c/db/OPatch/datapatch -verbose'
2020-09-02 16:26:56,362 INFO  [438] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - COMMAND Looks like this: /bin/sh -c 'cd /u01/app/oracle/product/19c/db;ORACLE_HOME=/u01/app/oracle/product/19c/db ORACLE_SID=DB1 /u01/app/oracle/product/19c/db/OPatch/datapatch -verbose'
2020-09-02 16:26:57,662 INFO  [438] com.oracle.glcm.patch.auto.db.product.executor.GISystemCall - Is retry required=false
2020-09-02 16:26:57,662 INFO  [438] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - status: 1
2020-09-02 16:26:57,662 INFO  [438] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
COMMAND EXECUTION FAILURE :
SQL Patching tool version 19.8.0.0.0 Production on Wed Sep  2 16:26:57 2020
Copyright (c) 2012, 2020, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/base/cfgtoollogs/sqlpatch/sqlpatch_25218_2020_09_02_16_26_57/sqlpatch_invocation.log

Connecting to database...
Error: prereq checks failed!
Database connect failed with: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Additional information: 4376
Additional information: 1275019259 (DBD ERROR: OCISessionBegin)

Please refer to MOS Note 1609718.1 and/or the invocation log
/u01/app/oracle/base/cfgtoollogs/sqlpatch/sqlpatch_25218_2020_09_02_16_26_57/sqlpatch_invocation.log
for information on how to resolve the above errors.

SQL Patching tool complete on Wed Sep  2 16:26:57 2020

Clearly, the database is still in place.
As per MOS (ORA-01078 Can Not Delete Database From Dbca on Linux (Doc ID 1559634.1)) -> See the /etc/oratab!

Thing is, the doesn’t have the DB1 line. Also, all related files, logs directory structure, passwd, init, etc… all wiped out. What else?

Here is goes what seems Oracle forgot to tell:

[oracle@PRODB01 dbca]$ srvctl status database -d DB1
Database is not running.
[oracle@DMSDB1PA dbca]$ srvctl config database -d DB1
Database unique name: DB1
Database name: DB1
Oracle home: /u01/app/oracle/product/19c/db
Oracle user: oracle
Spfile: 
Password file: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA
Services: 
OSDBA group: oinstall
OSOPER group: 
Database instance: DB1
Ohhh, that took me a while to realize, this was a Standalone server. Once understood, the fix is straight forward:
[oracle@PRODB01 dbca]$ srvctl remove database -d DB1
Remove the database dms? (y/[n]) y
[oracle@PRODB01 dbca]$

I hope this can buy you some minutes of MOS, in case you are googling it first.
Or save you in case you gave up MOS already.

Cheers!

I bet you heard about ORACLE_PDB_SID, right? But do you know how it works?

This one is more on a curious side =) but somehow interesting.

Not too long ago Mike Dietrichde published a post on to connect directly to a PDB without a password.

After it, lots of DBAs started to post their testing showing the same, But how it works? As there is only a bequeath connection available to a CDB?

Well, I was doing some digging for another project and stumbled on a curious thing. A logon trigger called DBMS_SET_PDB

To my surprise, this trigger alters the session´s container to the one specified by the ORACLE_PDB_SID variable. Similar to Tanel Poder´s script called cc.sql or ccr.sql which basically are short cuts to avoid a lot of typing =)

SYS@CDB19c>select
DBMS_Metadata.Get_DDL('TRIGGER',t.trigger_name,t.owner)
From
DBA_Triggers t
Where
TRIGGER_NAME ='DBMS_SET_PDB'; 2 3 4 5 6

DBMS_METADATA.GET_DDL('TRIGGER',T.TRIGGER_NAME,T.OWNER)
--------------------------------------------------------------------------------

CREATE OR REPLACE NONEDITIONABLE TRIGGER "SYS"."DBMS_SET_PDB" after logon on database
WHEN (user = 'SYS' or user = 'SYSTEM') declare
pdb_name varchar2(64);
begin
DBMS_SYSTEM.get_env ('ORACLE_PDB_SID', pdb_name);
if(pdb_name is not null)
then
EXECUTE IMMEDIATE 'alter session set container = ' || '"' || pdb_name
|| '"';
end if;
exception
when others then
NULL;
end dbms_set_pdb;
ALTER TRIGGER "SYS"."DBMS_SET_PDB" ENABLE

Also is mentioned in the article that this is this would only work on versions 18.8 and above. But well, as we have the code for the trigger, should it possible to work on 12c as well?

Let´s try it| =) the oldest container which I have at the moment is 12r2  but that should do it.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SYS@CDB12CR2>select * from v$version;

BANNER CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
PL/SQL Release 12.2.0.1.0 - Production 0
CORE 12.2.0.1.0 Production 0
TNS for Linux: Version 12.2.0.1.0 - Production 0
NLSRTL Version 12.2.0.1.0 - Production 0

SYS@CDB12CR2>CREATE OR REPLACE NONEDITIONABLE TRIGGER "SYS"."DBMS_SET_PDB_EFA" after logon on database
2 WHEN (user = 'SYS' or user = 'SYSTEM') declare
3 pdb_name varchar2(64);
begin
4 5 DBMS_SYSTEM.get_env ('ORACLE_PDB_SID', pdb_name);
6 if(pdb_name is not null)
7 then
8 EXECUTE IMMEDIATE 'alter session set container = ' || '"' || pdb_name
9 || '"';
10 end if;
exception
when others then
NULL;
end DBMS_SET_PDB_EFA; 11 12 13 14
15 /

Trigger created.

SYS@CDB12CR2>ALTER TRIGGER "SYS"."DBMS_SET_PDB_EFA" ENABLE
2 /

Trigger altered.

SYS@CDB12CR2>show pdbs

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB12CR2 READ WRITE NO
SYS@CDB12CR2>exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
[oracle@server01 ~]$ export ORACLE_PDB_SID=PDB12CR2;
[oracle@server01 ~]$ CDB12CR2
The Oracle base remains unchanged with value /oraadm/oracle
LD_LIBRARY_PATH=/oraadm/oracle/product/12.2.0.1/lib
ORACLE_BASE=/oraadm/oracle
ORACLE_HOME=/oraadm/oracle/product/12.2.0.1
ORACLE_PDB_SID=PDB12CR2
ORACLE_SID=CDB12CR2
ORAENV_ASK=NO

[oracle@server01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu Jul 30 11:30:34 2020

Copyright (c) 1982, 2016, Oracle. All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SYS@CDB12CR2>show pdbs

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
3 PDB12CR2 READ WRITE NO
SYS@CDB12CR2>alter session set container=cdb$root;

Session altered.

SYS@CDB12CR2>show pdbs

CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 PDB12CR2 READ WRITE NO
SYS@CDB12CR2>

And as you can see, the trigger works as expected right, it’s not a rocket science as it gets the context environment variable value and alters the session´s container.

Thanks and until next time.

Cheers.

Elisson Almeida

 

Note:

Right after I wrote this post, I went to look for more information on ORACLE_PDB_SID and found another post from Mike Dietrichde which also shows the trigger code and some pitfalls that you must be aware.

Autonomous Health Framework – TFA events, changes and set

Continuing on this series, there are 2 commands which are very useful and we’ll see today: “events” and “changes”. Besides “set”, which we’ll cover too.

On the events command you will be able to drill down to a specific date and time, passing a specific component if needed as well you can check a database or instance as well. Using the last option you can, this is very useful when you want to see a history for the system and lists all important events in the system:

Usage : /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl events [-search <keyword> | -component <ASM|CRS> | -database <db_name> | -instance <db_instance_name> | -source <filename> | -from <time> -to <time> | -json | -fields all|<fields_list> ]
-component [ASM|CRS] Searches all ASM or CRS events
-database Searches all events from a particular database
-instance Searches all events from a particular instance
-source Searches all events from a particular alert file
-json Outputs event information in JSON format
-fields When provided with the -json option, the command will only return the requested fields
-from "Mon/dd/yyyy hh:mm:ss" From <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-to "Mon/dd/yyyy hh:mm:ss" To <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-for "Mon/dd/yyyy" For <date>
or "yyyy-mm-dd"
-last <n><h|d>

[root@servertst01 bin]# ./tfactl events -last 30d -database dbaseprd


Output from host : servertst02
------------------------------

Event Summary:
INFO :7
ERROR :3
WARNING :0

Event Timeline:
[Apr/24/2020 01:43:06.000]: [db.dbaseprd.dbaseprd2]: Reconfiguration started (old inc 21, new inc 23)
[Apr/24/2020 02:09:47.000]: [db.dbaseprd.dbaseprd2]: Reconfiguration started (old inc 23, new inc 25)
[Apr/24/2020 02:12:36.000]: [db.dbaseprd.dbaseprd2]: Instance terminated by USER, pid = 246331
[Apr/24/2020 02:12:36.000]: [db.dbaseprd.dbaseprd2]: Shutting down instance (abort)
[Apr/24/2020 02:28:29.000]: [db.dbaseprd.dbaseprd2]: Starting ORACLE instance (normal)
[Apr/24/2020 02:28:37.000]: [db.dbaseprd.dbaseprd2]: Reconfiguration started (old inc 0, new inc 29)
[Apr/24/2020 02:31:06.000]: [db.dbaseprd.dbaseprd2]: Reconfiguration started (old inc 29, new inc 31)
[Apr/24/2020 03:14:38.000]: [db.dbaseprd.dbaseprd2]: Reconfiguration started (old inc 31, new inc 33)
[Apr/25/2020 11:46:12.000]: [db.dbaseprd.dbaseprd2]: ORA-00060: deadlock detected while waiting for resource
[May/07/2020 04:18:24.000]: [db.dbaseprd.dbaseprd2]: ORA-08103: object no longer exists

Output from host : servertst03
------------------------------

Event Summary:
INFO :7
ERROR :5
WARNING :0

Event Timeline:
[Apr/16/2020 04:12:25.000]: [db.dbaseprd.dbaseprd3]: ORA-08103: object no longer exists
[Apr/23/2020 04:12:17.000]: [db.dbaseprd.dbaseprd3]: ORA-08103: object no longer exists
[Apr/24/2020 01:43:06.000]: [db.dbaseprd.dbaseprd3]: Reconfiguration started (old inc 21, new inc 23)
[Apr/24/2020 02:09:47.000]: [db.dbaseprd.dbaseprd3]: Reconfiguration started (old inc 23, new inc 25)
[Apr/24/2020 02:12:37.000]: [db.dbaseprd.dbaseprd3]: Reconfiguration started (old inc 25, new inc 27)
[Apr/24/2020 02:28:38.000]: [db.dbaseprd.dbaseprd3]: Reconfiguration started (old inc 27, new inc 29)
[Apr/24/2020 02:31:05.000]: [db.dbaseprd.dbaseprd3]: Instance terminated by USER, pid = 6814
[Apr/24/2020 02:31:05.000]: [db.dbaseprd.dbaseprd3]: Shutting down instance (abort)
[Apr/24/2020 03:14:28.000]: [db.dbaseprd.dbaseprd3]: Starting ORACLE instance (normal)
[Apr/24/2020 03:14:37.000]: [db.dbaseprd.dbaseprd3]: Reconfiguration started (old inc 0, new inc 33)
[Apr/28/2020 10:12:44.000]: [db.dbaseprd.dbaseprd3]: ORA-00060: deadlock detected while waiting for resource
[May/10/2020 13:40:21.000]: [db.dbaseprd.dbaseprd3]: ORA-00060: deadlock detected while waiting for resource

Output from host : servertst01
------------------------------

Event Summary:
INFO :8
ERROR :2
WARNING :0

Event Timeline:
[Apr/24/2020 01:43:05.000]: [db.dbaseprd.dbaseprd1]: Instance terminated by USER, pid = 37711
[Apr/24/2020 01:43:05.000]: [db.dbaseprd.dbaseprd1]: Shutting down instance (abort)
[Apr/24/2020 02:09:38.000]: [db.dbaseprd.dbaseprd1]: Starting ORACLE instance (normal)
[Apr/24/2020 02:09:46.000]: [db.dbaseprd.dbaseprd1]: Reconfiguration started (old inc 0, new inc 25)
[Apr/24/2020 02:12:37.000]: [db.dbaseprd.dbaseprd1]: Reconfiguration started (old inc 25, new inc 27)
[Apr/24/2020 02:28:39.000]: [db.dbaseprd.dbaseprd1]: Reconfiguration started (old inc 27, new inc 29)
[Apr/24/2020 02:31:06.000]: [db.dbaseprd.dbaseprd1]: Reconfiguration started (old inc 29, new inc 31)
[Apr/24/2020 03:14:38.000]: [db.dbaseprd.dbaseprd1]: Reconfiguration started (old inc 31, new inc 33)
[Apr/24/2020 14:40:10.000]: [db.dbaseprd.dbaseprd1]: Incident details in: /oraadm/dbaseprd/admin/traces/diag/rdbms/dbaseprd/dbaseprd1/incident/incdir_115657/dbaseprd1_ora_151392_i115657.trc
[Apr/24/2020 14:40:10.000]: [db.dbaseprd.dbaseprd1]: ORA-00600: internal error code, arguments: [pevmexe.c: ENTER: which perc?], [], [], [], [], [], [], [], [], [], [], []

[root@servertst01 bin]#

The “changes” command, this shows the changes in the system, as I did not have any changes made after the TFA was installed was not able to reproduce it here. It lists all changes in the system:

Usage : /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl changes [ -from <time> -to <time> | -for <time> | last <time_length> ]
-from "Mon/dd/yyyy hh:mm:ss" From <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-to "Mon/dd/yyyy hh:mm:ss" To <time>
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-for "Mon/dd/yyyy" For <date>
or "yyyy-mm-dd"
-last <n><h|d>

Another way to see the changes in your environment is to use the search command passing a part to it and a specific component.

[root@servertst01 bin]# ./tfactl analyze -search "ALTER" -comp db
INFO: analyzing db (DB Alert Logs) logs for the last 60 minutes... Please wait...
INFO: analyzing host: servertst01

Report title: DB Alert Logs
Report date range: last ~1 hour(s)
Report (default) time zone: WET - Western European Time
Analysis started at: 11-May-2020 03:04:17 PM WEST
Elapsed analysis time: 2 second(s).
Configuration file: /oraadm/dba/ahf/oracle.ahf/tfa/ext/tnt/conf/tnt.prop
Configuration group: db
Parameter: ALTER
Total message count: 44,397, from 10-Feb-2020 01:19:51 PM WET to 11-May-2020 03:01:46 PM WEST
Messages matching last ~1 hour(s): 18, from 11-May-2020 02:17:23 PM WEST to 11-May-2020 03:01:46 PM WEST
Matching regex: ALTER
Case sensitive: false
Match count: 1
May 11 15:01:46 2020 - generic - ALTER SYSTEM SET open_cursors=6000 SCOPE=BOTH SID='*';

INFO: analyzing db (DB Alert Logs) logs for the last 60 minutes... Please wait...
INFO: analyzing host: servertst02

Report title: DB Alert Logs
Report date range: last ~1 hour(s)
Report (default) time zone: WET - Western European Time
Analysis started at: 11-May-2020 03:04:23 PM WEST
Elapsed analysis time: 0 second(s).
Configuration file: /oraadm/dba/ahf/oracle.ahf/tfa/ext/tnt/conf/tnt.prop
Configuration group: db
Parameter: ALTER
Total message count: 3,598, from 10-Feb-2020 03:03:42 PM WET to 11-May-2020 02:30:16 PM WEST
Messages matching last ~1 hour(s): 5, from 11-May-2020 02:30:05 PM WEST to 11-May-2020 02:30:16 PM WEST
Matching regex: ALTER
Case sensitive: false
Match count: 2
May 11 14:30:05 2020 - generic - ALTER SYSTEM ARCHIVE LOG
May 11 14:30:16 2020 - generic - ALTER SYSTEM ARCHIVE LOG
Thread 2 advanced to log sequence 1096 (LGWR switch)
Current log# 4 seq# 1096 mem# 0: +REDO1/testdba/onlinelog/group_4.274.1032015207
Current log# 4 seq# 1096 mem# 1: +REDO2/testdba/onlinelog/group_4.274.1032015207

INFO: analyzing db (DB Alert Logs) logs for the last 60 minutes... Please wait...
INFO: analyzing host: servertst03

Report title: DB Alert Logs
Report date range: last ~1 hour(s)
Report (default) time zone: WET - Western European Time
Analysis started at: 11-May-2020 03:04:25 PM WEST
Elapsed analysis time: 0 second(s).
Configuration file: /oraadm/dba/ahf/oracle.ahf/tfa/ext/tnt/conf/tnt.prop
Configuration group: db
Parameter: ALTER
Total message count: 1,687, from 19-Mar-2020 06:28:29 PM WET to 11-May-2020 02:30:18 PM WEST
Messages matching last ~1 hour(s): 4, from 11-May-2020 02:30:06 PM WEST to 11-May-2020 02:30:18 PM WEST
Matching regex: ALTER
Case sensitive: false
Match count: 0

And for last the set command which will configure several options for TFA:

[root@servertst01 bin]# ./tfactl set -h

Usage : tfact set ahfnotificationaddress="<space separated email ids>"
ahfnotificationaddress="<email1> <email2>" set AHF Notification Address

Turn ON/OFF or Modify various TFA features

Usage : /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set [ autodiagcollect=<ON|OFF> | trimfiles=<ON|OFF> | tracelevel=<COLLECT|SCAN|INVENTORY|OTHER|ISA|HANDLER|MAIN|CLIENT|CONSOLE>:<FATAL|ERROR|WARN|INFO|DEBUG|TRACE> | reposizeMB=<n> [repositorydir=<dir>] [-force] | repositorydir=<dir> [reposizeMB=<n>] [-force] | logsize=<n> [-local] | logcount=<n> [-local] | maxcorefilesize=<n> [-local] | maxcorecollectionsize=<n> [-local] | maxfilecollectionsize=<n> | autopurge=<ON|OFF> | publicip=<ON|OFF> | minSpaceForRTScan=<n> | rtscan=<ON|OFF> | diskUsageMon=<ON|OFF> | diskUsageMonInterval=<n> | manageLogsAutoPurge=<ON|OFF> | manageLogsAutoPurgeInterval=<n> | manageLogsAutoPurgePolicyAge=<d|h> | minagetopurge=<n> | tfaIpsPoolSize=<n> | tfaDbUtlPurgeAge=<n> | tfaDbUtlPurgeMode=<simple|resource|profile> | tfaDbUtlPurgeThreadDelay=<n> | tfaDbUtlCrsProfileDelay=<n> | indexRecoveryMode=<recreate|restore>] [-c]

autodiagcollect allow for automatic diagnostic collection when an event
is observed (default ON)
trimfiles allow trimming of files during diagcollection (default ON)
tracelevel control the trace level of log files in /oraadm/dba/ahf/oracle.ahf/data/servertst01/diag/tfa
(default INFO for all facilities)
reposizeMB=<n> set the maximum size of diagcollection repository to <n>MB
repositorydir=<dir> set the diagcollection repository to <dir>
-force skip inital checks while changing repository (Not Recommended)
logsize=<n> set the maximum size of each TFA log to <n>MB (default 50 MB)
logcount=<n> set the maximum number of TFA logs to <n> (default 10)
port=<n> set TFA Port to <n>
maxcorefilesize=<n> set the maximum size of Core File to <n>MB (default 20 MB )
maxcorecollectionsize=<n> set the maximum collection size of Core Files to <n>MB (default 200 MB )
maxfilecollectionsize=<n> set the maximum file collection size to <n>MB (default 5 GB )
autopurge allow automatic purging of collections when less space
is observed in repository (default OFF)
publicip allow TFA to run on public network
smtp Update SMTP Configuration
minSpaceForRTScan=<n> Minimun space required to run RT Scanning(default 500)
rtscan allow Alert Log Scanning
diskUsageMon allow Disk Usage Monitoring
diskUsageMonInterval=<n> Time interval between consecutive Disk Usage Snapshot(default 60 minutes)
manageLogsAutoPurge allow Manage Log Auto Purging
manageLogsAutoPurgeInterval=<n> Time interval between consecutive Managelogs Auto Purge(default 60 minutes)
manageLogsAutoPurgePolicyAge=<d|h> Logs older than the time period will be auto purged(default 30 days)
minagetopurge set the age in hours for collections to be skipped by
AutoPurge (default 12 Hours)
tfaIpsPoolSize set the TFA IPS pool size
tfaDbUtlPurgeAge set the TFA ISA Purge Age (in seconds)
tfaDbUtlPurgeMode set the TFA ISA Purge Mode (simple/resource)
tfaDbUtlPurgeThreadDelay set the TFA ISA Purge Thread Delay (in minutes)
tfaDbUtlCrsProfileDelay set the TFA ISA Crs Profile Delay
indexRecoveryMode set the Lucene index recovery mode (recreate/restore)
rediscoveryInterval set the time interval for running lite rediscovery
-c set the value on all nodes (Does not apply to repository
settings)
-local set the value on the local node (if option is not included
the value will be set on all the nodes)
Examples:
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set autodiagcollect=ON
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set autopurge=ON
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set tracelevel=INVENTORY:3
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set reposizeMB=20480
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set logsize=100
/oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl set port=5000

Hope it helps, there are still many scenarios of TFA which I would like to bring but not that much time, I will try to make long posts for the next ones. We´ll see.

Thanks for reading!

Elisson Almeida

Autonomous Health Framework – Managing logs and REST API

Hi all,

We are used to use adrci to manage our database logs right, if you don’t about adrci please check here and here.

But did you know that TFA can cleanup the CRS and database logs as well?

You might be thinking: 2 set of tools for the same job? Yyeah kind off… =)

Adrci is configurable and performs the purges per configured policy and tfactl managelogs will not. Also the tfactl managelogs will also clean the listener logs as well and you can clean all cluster logs from one node without having to connect to each server to execute the purge from adrci for example.

The cleanup can be done by database or GI which comes in handy as well.

You have a dry run option which will show what will be deleted and total freed space without actually deleting any files

Usage : /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl [run] managelogs [ -purge [[-older <n><m|h|d>] | [-gi] | [-database <all|d1,d2..>] | [-dryrun] ]] [ -show [usage|variation] [ [-older <n><d>] | [-gi] | [-database <all|d1,d2..>] ] ] [-node <all|local|node1,node2..>]

Options:
-purge           Purge logs
   -older        Timeperiod for log purge
   -gi           Purge Grid Infrastructure logs(all ADR homes under GIBASE/diag and crsdata(cvu dirs))
   -database     Purge Database logs (Default all else provide list)
   -dryrun       Estimate logs which will be cleared by purge command
-show            Print usage/variation details
   -older         Timeperiod for change in log volume
   -gi           Space utilization under GIBASE
   -database     Space utilization for Database logs (Default all else provide list)


 -older <n><m|h|d>  Files from past 'n' [d]ays or 'n' [h]ours or 'n' [m]inutes


e.g:
   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl managelogs -purge -older 30d -dryrun
   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl managelogs -purge -older 30d
   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl managelogs -show usage

   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl run managelogs -purge -older 30d -dryrun
   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl run managelogs -purge -older 30d
   /oraadm/dba/ahf/oracle.ahf/tfa/bin/tfactl run managelogs -show usage

 

TFA also comes with a REST API which you can use POST and GET http which you need to setup prior usage, it will setup 2 users, tfaadmin and tfarest, and start a ORDS service on the server.

[root@servertst01 bin]# ./tfactl rest status

TFA REST Services is not running

[root@servertst01 bin]# ./tfactl rest start

TFA REST Services is not running

[root@servertst01 bin]# ./tfactl rest -start

Configuring TFA REST Services using ORDS :

This might take couple of minutes. Please be patient.

Adding Dependency Jars to ORDS

Adding users to ORDS :

Password must be at least 8 characters and contain a combination of Upper case and Lower case letters, numbers and one of [!@#%_]

Enter a password for user tfaadmin:
Confirm password for user tfaadmin:

Enter a password for user tfarest:
Confirm password for user tfarest:

Starting TFA REST Services

Successfully started TFA REST Services [PID : 158682]

TFA URL : https://servertest01:9090/ords/tfactl/print/status

COMPLIANCE URL : https://servertest01:9090/ords/tfactl/orachk/version

 

The commands are most like the ones from the CLI but using this REST API, you will receive JSON outputs which you can process and automate the use of TFA

Below are some examples

/tfactl/print/status

[root@servertst01 ~]$ curl -k --user tfaadmin:tfaadmin https://servertst01:9090/ords/tfactl/print/status
[ {
"status" : "CheckOK",
"hostname" : "servertst01",
"pid" : 135195,
"port" : 5000,
"version" : "20.1.3.0.0",
"buildId" : "20130020200429095054",
"inventoryStatus" : "COMPLETE"
}, {
"status" : "CheckOK",
"hostname" : "servertst02",
"pid" : 241723,
"port" : 5000,
"version" : "20.1.3.0.0",
"buildId" : "20130020200429095054",
"inventoryStatus" : "COMPLETE"
} ]

/print/actions

[root@servertst01 ~]$ curl -k --user tfaadmin:tfaadmin https://servertst01:9090/ords/tfactl/print/actions
[ {
"actionName" : "Run inventory",
"hostname" : "servertst01",
"client" : "DiagCollectionThread",
"startTime" : "May 13 15:48:52 WEST",
"endTime" : "May 13 15:49:58 WEST",
"status" : "COMPLETE",
"comments" : "<components><component><name>EMAGENT</name></component><component><name>DBCLIENT</name></component><component><name>ACFS</name></component><component><name>RDBMS</name></component><component><name>OS</name></component><component><name>CRS</name></component><component><name>EM</name></component><component><name>ASMPROXY</name></component><component><name>ASMIO</name></component><component><name>CRSCLIENT</name></component><component><name>OMS</name></component><component><name>DBWLM</name></component><component><name>INSTALL</name></component><component><name>TNS</name></component><component><name>OCM</name></component><component><name>ASM</name></component><component><name>RHP</name></component><component><name>CHA</name></component><component><name>AFD</name></component><component><name>EMPLUGINS</name></component><component><name>CFGTOOLS</name></component><component><name>WLS</name></component></components>"
}, {
"actionName" : "Collect traces",
"hostname" : "servertst01",
"client" : "tfactl",
"startTime" : "May 13 15:48:51 WEST",
"endTime" : "May 13 15:53:21 WEST",
"status" : "COMPLETE",
"comments" : "-database all -rdbms -asm -crsclient -dbclient -dbwlm -tns -rhp -afd -crs -wls -emagent -oms -ocm -emplugins -em -acfs -install -cfgtools -os -sundiag -chmos -monitor -copy -since 12h -node all -z TFA_DEF_ZIP_20200513154851 -user root -tag TFA_DEF_TAG_20200513154851 -logid 20200513154851:20200513154851servertst01"
}, {
"actionName" : "Run inventory",
"hostname" : "servertst01",
"client" : "DiagCollectionThread",
"startTime" : "May 13 15:47:48 WEST",
"endTime" : "May 13 15:48:20 WEST",
"status" : "COMPLETE",
"comments" : "<components><component><name>EMAGENT</name></component><component><name>DBCLIENT</name></component><component><name>ACFS</name></component><component><name>RDBMS</name></component><component><name>OS</name></component><component><name>CRS</name></component><component><name>EM</name></component><component><name>ASMPROXY</name></component><component><name>ASMIO</name></component><component><name>CRSCLIENT</name></component><component><name>OMS</name></component><component><name>DBWLM</name></component><component><name>INSTALL</name></component><component><name>TNS</name></component><component><name>OCM</name></component><component><name>ASM</name></component><component><name>RHP</name></component><component><name>CHA</name></component><component><name>AFD</name></component><component><name>EMPLUGINS</name></component><component><name>CFGTOOLS</name></component><component><name>WLS</name></component></components>"
}, {
"actionName" : "Collect traces",
"hostname" : "servertst01",
"client" : "tfactl",
"startTime" : "May 13 15:47:48 WEST",
"endTime" : "May 13 15:52:29 WEST",
"status" : "COMPLETE",
"comments" : "-database all -rdbms -asm -crsclient -dbclient -dbwlm -tns -rhp -afd -crs -wls -emagent -oms -ocm -emplugins -em -acfs -install -cfgtools -os -sundiag -chmos -monitor -copy -since 12h -node all -z TFA_DEF_ZIP_20200513154748 -user root -tag TFA_DEF_TAG_20200513154748 -logid 20200513154748:20200513154748servertst01"
}, {
"actionName" : "Run inventory",
"hostname" : "servertst02",
"client" : "DiagCollectionThread",
"startTime" : "May 13 15:48:54 WEST",
"endTime" : "May 13 15:49:04 WEST",
"status" : "COMPLETE",
"comments" : "<components><component><name>EMAGENT</name></component><component><name>DBCLIENT</name></component><component><name>ACFS</name></component><component><name>RDBMS</name></component><component><name>OS</name></component><component><name>CRS</name></component><component><name>EM</name></component><component><name>ASMPROXY</name></component><component><name>ASMIO</name></component><component><name>CRSCLIENT</name></component><component><name>OMS</name></component><component><name>DBWLM</name></component><component><name>INSTALL</name></component><component><name>OCM</name></component><component><name>TNS</name></component><component><name>ASM</name></component><component><name>RHP</name></component><component><name>CHA</name></component><component><name>AFD</name></component><component><name>EMPLUGINS</name></component><component><name>CFGTOOLS</name></component><component><name>WLS</name></component></components>"
}, {
"actionName" : "Collect traces",
"hostname" : "servertst02",
"client" : "tfactl",
"startTime" : "May 13 15:48:53 WEST",
"endTime" : "May 13 15:53:27 WEST",
"status" : "COMPLETE",
"comments" : "-timezone Europe/Lisbon -nodelist servertst02_servertst01 -s 05.13.2020.03.48 -e 05.13.2020.15.48 -updateendtime -z TFA_DEF_ZIP_20200513154851.zip -tag TFA_DEF_TAG_20200513154851 -logid 20200513154851 -emagent -dbclient -acfs -rdbms -os -crs -em -asmproxy -asmio -crsclient -oms -dbwlm -install -tns -ocm -asm -rhp -cha -afd -emplugins -cfgtools -wls -chmos -copy -user root -uid 9a79d417-a877-4355-a2f2-210f1d45024a:20200513154851servertst01"
}, {
"actionName" : "Run inventory",
"hostname" : "servertst02",
"client" : "DiagCollectionThread",
"startTime" : "May 13 15:47:50 WEST",
"endTime" : "May 13 15:48:12 WEST",
"status" : "COMPLETE",
"comments" : "<components><component><name>EMAGENT</name></component><component><name>DBCLIENT</name></component><component><name>ACFS</name></component><component><name>RDBMS</name></component><component><name>OS</name></component><component><name>CRS</name></component><component><name>EM</name></component><component><name>ASMPROXY</name></component><component><name>ASMIO</name></component><component><name>CRSCLIENT</name></component><component><name>OMS</name></component><component><name>DBWLM</name></component><component><name>INSTALL</name></component><component><name>OCM</name></component><component><name>TNS</name></component><component><name>ASM</name></component><component><name>RHP</name></component><component><name>CHA</name></component><component><name>AFD</name></component><component><name>EMPLUGINS</name></component><component><name>CFGTOOLS</name></component><component><name>WLS</name></component></components>"
}, {
"actionName" : "Collect traces",
"hostname" : "servertst02",
"client" : "tfactl",
"startTime" : "May 13 15:47:49 WEST",
"endTime" : "May 13 15:52:37 WEST",
"status" : "COMPLETE",
"comments" : "-timezone Europe/Lisbon -nodelist servertst02_servertst01 -s 05.13.2020.03.47 -e 05.13.2020.15.47 -updateendtime -z TFA_DEF_ZIP_20200513154748.zip -tag TFA_DEF_TAG_20200513154748 -logid 20200513154748 -emagent -dbclient -acfs -rdbms -os -crs -em -asmproxy -asmio -crsclient -oms -dbwlm -install -tns -ocm -asm -rhp -cha -afd -emplugins -cfgtools -wls -chmos -copy -user root -uid 94229462-6332-4a7e-849b-9469c1ad86ef:20200513154748servertst01"
} ]

/tfactl/diagcollect

[root@servertst01 ~]$ curl -k --user tfaadmin:tfaadmin https://servertst01:9090/ords/tfactl/diagcollect
{
"collectionId" : "20200513161625servertst01",
"zipName" : "TFA_DEF_ZIP_20200513161625",
"tagName" : "TFA_DEF_TAG_20200513161625",
"message" : [ "Diagcollect request will be processed soon by TFA" ]
}

 

You can download the report using the collectionId as below

curl -k --user tfaadmin:tfaadmin -L0 https://servertst01:9090/ords/tfactl/download/20200513161625servertst01 --output remoteTFA.zip

 % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 196M 0 196M 0 0 195M 0 --:--:-- 0:00:01 --:--:-- 195M

On the next ones I will do a review on a ORACHK and OSWatcher which are also part of the Autonomous Health Framework

I Hope it helps!

Elisson Almeida

Autonomous Health Framework – OSWatcher

Hi all,

Another one for our series about Autonomous Health Framework:

Along with the AHF, we have some tools that we can take advantage of, one of them is OSWatcher. OSwatcher is a utility to capture performance metrics from the operating system using native OS tools for IO, network, CPU, memory, etc.

It gathers a snapshot of your system and stores it in a directory which you can then use it to parse the information there and perform a system wide analysis.

You can stop it if you want, out of the box it will gather OS information every 30 minutes

To see if it is running you can use tfactl toolstatus as below

[root@servertst01 bin]# ./tfactl toolstatus

.------------------------------------------------------------------.
| TOOLS STATUS - HOST : servertst01 |
+----------------------+--------------+--------------+-------------+
| Tool Type | Tool | Version | Status |
+----------------------+--------------+--------------+-------------+
| Development Tools | orachk | 19.3.0.0.0 | DEPLOYED |
| | oratop | 14.1.2 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED |
| | oswbb | 8.3.2 | RUNNING |
| | prw | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities | alertsummary | 19.3.0.0.0 | DEPLOYED |
| | calog | 19.3.0.0.0 | DEPLOYED |
| | dbcheck | 18.3.0.0.0 | DEPLOYED |
| | dbglevel | 19.3.0.0.0 | DEPLOYED |
| | grep | 19.3.0.0.0 | DEPLOYED |
| | history | 19.3.0.0.0 | DEPLOYED |
| | ls | 19.3.0.0.0 | DEPLOYED |
| | managelogs | 19.3.0.0.0 | DEPLOYED |
| | menu | 19.3.0.0.0 | DEPLOYED |
| | param | 19.3.0.0.0 | DEPLOYED |
| | ps | 19.3.0.0.0 | DEPLOYED |
| | pstack | 19.3.0.0.0 | DEPLOYED |
| | summary | 19.3.0.0.0 | DEPLOYED |
| | tail | 19.3.0.0.0 | DEPLOYED |
| | triage | 19.3.0.0.0 | DEPLOYED |
| | vi | 19.3.0.0.0 | DEPLOYED |
'----------------------+--------------+--------------+-------------'

Note :-
DEPLOYED : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING : Configured and Available.

[root@servertst01 bin]#

And to run a simple test you can call tfactl oswbb and it should parse all data in its archive directory BUT and I tried to do so I ran into a java out of memory error
I bumped the memory of the process up to 4G and still gor the same error.

If you want to try to increase your process memory you will need to edit the file oswbb.pm under oracle.ahf/tfa/ext/oswbb

You will see a line like below, and you will need to change the memory valeu to one that your system can handle, I´m not saying it will not work you can try.

system("$java -Xmx512M -jar $oswjar -i $adir @flags");

Here is the error:

[root@servertst01 bin]# ./tfactl oswbb

Starting OSW Analyzer V8.3.0
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2019 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...


Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at u.c(Unknown Source)
at u.a(Unknown Source)
at OSWGraph.OSWGraph.main(Unknown Source)


Analysis results are saved in /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/repository/suptools/servertst01/oswbb/root/oswbb

And there process memory which was a bit ovee 4G when it died

[root@servertst01 ~]$ ps -eo rss,pid,euser,lstart,args:100 --sort %mem | grep -v grep | grep java | awk '{printf $1/1024 "MB"; $1=""; print }'| sort |grep osw
4364.3MB 103531 root Wed May 13 18:09:12 2020 /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/jre/bin/java -Xmx4096M -jar /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/repository/suptools/servertst01/oswbb/root/oswbb/oswbba.jar -i /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/repository/suptools/servertst01/oswbb/oracrs/archive

I had over 4 days of data there, so by using the options -B and -E I was able to workaround the issue. If you are doing some troubleshooting I would advise to read 4 days of data anyway as we could be see averages and a spike could be reduced and not see in the analysis.
So it did not bothered that much.

Here when passing the date range I was able to execute the process as needed.

[root@servertst01 bin]# ./tfactl oswbb -6 -B May 13 09:25:00 2020 -E May 13 09:30:00 2020

Validating times in the archive...


Starting OSW Analyzer V8.3.0
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2019 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...


Parsing file servertst01_pidstat_20.05.13.0900.dat ...

Parsing file servertst01_iostat_20.05.13.0900.dat ...
This directory already exists. Rewriting...

Parsing file servertst01_vmstat_20.05.13.0900.dat ...

Parsing file servertst01_netstat_20.05.13.0900.dat ...

Parsing file servertst01_top_20.05.13.0900.dat ...

Parsing file servertst01_ps_20.05.13.0900.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter 61 to Display Individual OS Process I/O RPS Graphs
Enter 62 to Display Individual OS Process I/O WPS Graphs
Enter 63 to Display Individual OS Process Percent User CPU Graphs
Enter 64 to Display Individual OS Process Percent System CPU Graphs
Enter 65 to Display Individual OS Process Percent Total CPU (User + System) Graphs
Enter 66 to Display Individual OS Process Percent Memory Graphs

Enter GP to Generate Individual Process Profile
Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs

Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard

Enter Q to Quit Program

Please Select an Option:

Hope it helps!

Elisson Almeida

Autonomous Health Framework – TFA data collection

Hi all,

Here I am or another post on the Autonomous Health Framework series!

Based on previous posts, we now have the AHF installed, what can we do with it?
I would say a lot! Everything? I don’t think so.

I will post here what I usually use more on TFA and then I will focus on orachk (which is a monster on its own as well).

Imagine that you have a problem and need to gather different information from the OS and the Oracle Stack from different nodes?

That´s where TFA – Trace file analyzer comes into play. It will collect the data for you and if you run as its root. You can specify the component that you need if you need to drill down to a very specify issue.

Also when you have a Service request with MOS they will probably ask for it to be executed but if they don’t, you should do it and upload any way as it will save a lot of time to troubleshooting and back and forth messages.

And if you have Internet access, you can upload it using the TFA as well, see it here.

[root@servertst01 bin]# ./tfactl diagcollect –h

Invalid Option for diagcollect: –h

Collect logs from across nodes in cluster

Usage : /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect [ [component_name1] [component_name2] ... [component_nameN] | [-srdc ] | [-defips]] [-sr <SR#>] [-node <all|local|n1,n2,..>] [-tag ] [-z ] [-last <m|h|d>| -from  -to  | -for ] [-nocopy] [-notrim] [-silent] [-nocores][-collectalldirs][-collectdir <dir1,dir2..>][-examples]
components:-ips|-database|-asm|-crsclient|-dbclient|-dbwlm|-tns|-rhp|-procinfo|-cvu|-afd|-crs|-cha|-chastore|-wls|-emagent|-oms|-ocm|-emplugins|-em|-acfs|-install|-cfgtools|-os|-ashhtml|-ashtext|-awrhtml|-awrtext|-qos
-srdc Service Request Data Collection (SRDC).
-defips Include in the default collection the IPS Packages for:
ASM, CRS and Databases
-sr Enter SR number to which the collection will be uploaded
-node Specify comma separated list of host names for collection
-tag  The files will be collected into tagname directory inside
repository
-z  The collection zip file will be given this name within the
TFA collection repository
-last <m|h|d> Files from last 'n' [m]inutes, 'n' [d]ays or 'n' [h]ours
-since Same as -last. Kept for backward compatibility.
-from "Mon/dd/yyyy hh:mm:ss" From 
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-to "Mon/dd/yyyy hh:mm:ss" To 
or "yyyy-mm-dd hh:mm:ss"
or "yyyy-mm-ddThh:mm:ss"
or "yyyy-mm-dd"
-for "Mon/dd/yyyy" For .
or "yyyy-mm-dd"
-nocopy Does not copy back the zip files to initiating node from all nodes
-notrim Does not trim the files collected
-silent This option is used to submit the diagcollection as a background
process
-nocores Do not collect Core files when it would normally have been
collected
-collectalldirs Collect all files from a directory marked "Collect All"
flag to true
-collectdir Specify comma separated list of directories and collection will
include all files from these irrespective of type and time constraints
in addition to components specified
-examples Show diagcollect usage examples
For detailed help on each component use:
/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect [component_name1] [component_name2] ... [component_nameN] -help
[root@servertst01 bin]#

[root@servertst01 bin]# ./tfactl diagcollect -examples
Examples:
/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect
Trim and Zip all files updated in the last 12 hours as well as chmos/osw data
from across the cluster and collect at the initiating node
Note: This collection could be larger than required but is there as the
simplest way to capture diagnostics if an issue has recently occurred.

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -last 8h
Trim and Zip all files updated in the last 8 hours as well as chmos/osw data
from across the cluster and collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -database hrdb,fdb -last 1d -z foo
Trim and Zip all files from databases hrdb & fdb in the last 1 day and
collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -crs -os -node node1,node2 -last 6h
Trim and Zip all crs files, o/s logs and chmos/osw data from node1 & node2
updated in the last 6 hours and collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -asm -node node1 -from "May/07/2020" -to "May/07/2020 21:00:00"
Trim and Zip all ASM logs from node1 updated between from and to time and
collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -for "May/07/2020"
Trim and Zip all log files updated on "May/07/2020" and collect at the
collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -for "May/07/2020 21:00:00"
Trim and Zip all log files updated from 09:00 on "May/07/2020" to 09:00 on "May/08/2020"
(i.e. 12 hours before and after the time given) and collect at the initiating node

/oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl diagcollect -crs -collectdir /tmp_dir1,/tmp_dir2
Trim and Zip all crs files updated in the last 12 hours
Also collect all files from /tmp_dir1 and /tmp_dir2 at the initiating node