Autonomous Health Framework – Missing node on TFA

Hi all,

Welcome to the second post of this series!

Yesterday we had one about Autonomous Health Framework basics, and tomorrow we will have another one!

For today: while manually installing AHF – when the root user cannot ssh to the cluster nodes – you may end up with missing communication between the nodes during the installation, so that not all cluster nodes are found by all TFA installations.

In my example, I have a 3-node cluster on which I was doing the manual install. The installations on the first two nodes were able to see each other, but on the 3rd node TFA was not able to see the other 2.

Here are the steps of the manual installation:

Node 1
[root@servertst01 oraadm]# /oraadm/home/oracrs/ahf_setup -ahf_loc /oraadm/dba/ahf -data_dir /oraadm/dba/ahf/data

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_178647_2020_05_08-16_24_07.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

TFA is already installed at : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Installed TFA Version : 192000 Build ID : 20190426041420

AHF Location : /oraadm/dba/ahf/oracle.ahf

AHF Data Directory : /oraadm/dba/ahf/oracle.ahf/data

Shutting down TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Copying TFA Data Files from /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Uninstalling TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Do you want to add AHF Notification Email IDs ? [Y]|N : n

Login using root is disabled in sshd config. Installing AHF only on Local Node

Extracting AHF to /oraadm/dba/ahf/oracle.ahf

Configuring TFA Services

Copying TFA Data Files to AHF

Discovering Nodes and Oracle Resources

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.---------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             |
+-------------+---------------+--------+------+------------+----------------------+
| servertst01 | RUNNING       | 189966 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
'-------------+---------------+--------+------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.--------------------------------------------------------------------.
|                    Summary of AHF Configuration                    |
+-----------------+--------------------------------------------------+
| Parameter       | Value                                            |
+-----------------+--------------------------------------------------+
| AHF Location    | /oraadm/dba/ahf/oracle.ahf                       |
| TFA Location    | /oraadm/dba/ahf/oracle.ahf/tfa                   |
| Orachk Location | /oraadm/dba/ahf/oracle.ahf/orachk                |
| Data Directory  | /oraadm/dba/ahf/oracle.ahf/data                  |
| Repository      | /oraadm/dba/ahf/oracle.ahf/data/repository       |
| Diag Directory  | /oraadm/dba/ahf/oracle.ahf/data/servertst01/diag |
'-----------------+--------------------------------------------------'


Starting orachk daemon from AHF ...

AHF binaries are available in /oraadm/dba/ahf/oracle.ahf/bin

AHF is successfully installed

Moving /tmp/ahf_install_178647_2020_05_08-16_24_07.log to /oraadm/dba/ahf/oracle.ahf/data/servertst01/diag/ahf/

Node 2

[root@servertst02 oraadm]# /oraadm/home/oracrs/ahf_setup -ahf_loc /oraadm/dba/ahf -data_dir /oraadm/dba/ahf/data

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_236841_2020_05_08-16_27_47.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

TFA is already installed at : /oraadm/oracrs/product/19.0.0/tfa/servertst02/tfa_home

Installed TFA Version : 192000 Build ID : 20190426041420

[ERROR] : AHF-00009: AHF Location directory [/oraadm/dba/ahf] not found

[root@servertst02 oraadm]# mkdir -p /oraadm/dba/ahf/data
[root@servertst02 oraadm]# /oraadm/home/oracrs/ahf_setup -ahf_loc /oraadm/dba/ahf -data_dir /oraadm/dba/ahf/data

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_237428_2020_05_08-16_27_59.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

TFA is already installed at : /oraadm/oracrs/product/19.0.0/tfa/servertst02/tfa_home

Installed TFA Version : 192000 Build ID : 20190426041420

AHF Location : /oraadm/dba/ahf/oracle.ahf

AHF Data Directory : /oraadm/dba/ahf/oracle.ahf/data

Shutting down TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst02/tfa_home

Copying TFA Data Files from /oraadm/oracrs/product/19.0.0/tfa/servertst02/tfa_home

Uninstalling TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst02/tfa_home

Do you want to add AHF Notification Email IDs ? [Y]|N : n

Login using root is disabled in sshd config. Installing AHF only on Local Node

Extracting AHF to /oraadm/dba/ahf/oracle.ahf

Configuring TFA Services

Copying TFA Data Files to AHF

Discovering Nodes and Oracle Resources

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.---------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             |
+-------------+---------------+--------+------+------------+----------------------+
| servertst02 | RUNNING       | 249057 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
| servertst01 | RUNNING       | 189966 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
'-------------+---------------+--------+------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.--------------------------------------------------------------------.
|                    Summary of AHF Configuration                    |
+-----------------+--------------------------------------------------+
| Parameter       | Value                                            |
+-----------------+--------------------------------------------------+
| AHF Location    | /oraadm/dba/ahf/oracle.ahf                       |
| TFA Location    | /oraadm/dba/ahf/oracle.ahf/tfa                   |
| Orachk Location | /oraadm/dba/ahf/oracle.ahf/orachk                |
| Data Directory  | /oraadm/dba/ahf/oracle.ahf/data                  |
| Repository      | /oraadm/dba/ahf/oracle.ahf/data/repository       |
| Diag Directory  | /oraadm/dba/ahf/oracle.ahf/data/servertst02/diag |
'-----------------+--------------------------------------------------'


Starting orachk daemon from AHF ...

AHF binaries are available in /oraadm/dba/ahf/oracle.ahf/bin

AHF is successfully installed

Moving /tmp/ahf_install_237428_2020_05_08-16_27_59.log to /oraadm/dba/ahf/oracle.ahf/data/servertst02/diag/ahf/

[root@servertst02 oraadm]# ls /oraadm/dba/ahf/oracle.ahf/data/servertst02/

Node 3

[root@servertst03 oraadm]# /oraadm/home/oracrs/ahf_setup -ahf_loc /oraadm/dba/ahf -data_dir /oraadm/dba/ahf/data

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_129104_2020_05_08-16_31_08.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

TFA is already installed at : /oraadm/oracrs/product/19.0.0/tfa/servertst03/tfa_home

Installed TFA Version : 192000 Build ID : 20190426041420

AHF Location : /oraadm/dba/ahf/oracle.ahf

AHF Data Directory : /oraadm/dba/ahf/oracle.ahf/data

Shutting down TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst03/tfa_home

Copying TFA Data Files from /oraadm/oracrs/product/19.0.0/tfa/servertst03/tfa_home

Uninstalling TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst03/tfa_home

Do you want to add AHF Notification Email IDs ? [Y]|N : n

Login using root is disabled in sshd config. Installing AHF only on Local Node

Extracting AHF to /oraadm/dba/ahf/oracle.ahf

Configuring TFA Services

Copying TFA Data Files to AHF

Discovering Nodes and Oracle Resources

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.---------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             |
+-------------+---------------+--------+------+------------+----------------------+
| servertst03 | RUNNING       | 135279 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
'-------------+---------------+--------+------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.--------------------------------------------------------------------.
|                    Summary of AHF Configuration                    |
+-----------------+--------------------------------------------------+
| Parameter       | Value                                            |
+-----------------+--------------------------------------------------+
| AHF Location    | /oraadm/dba/ahf/oracle.ahf                       |
| TFA Location    | /oraadm/dba/ahf/oracle.ahf/tfa                   |
| Orachk Location | /oraadm/dba/ahf/oracle.ahf/orachk                |
| Data Directory  | /oraadm/dba/ahf/oracle.ahf/data                  |
| Repository      | /oraadm/dba/ahf/oracle.ahf/data/repository       |
| Diag Directory  | /oraadm/dba/ahf/oracle.ahf/data/servertst03/diag |
'-----------------+--------------------------------------------------'


Starting orachk daemon from AHF ...

AHF binaries are available in /oraadm/dba/ahf/oracle.ahf/bin

AHF is successfully installed

Moving /tmp/ahf_install_129104_2020_05_08-16_31_08.log to /oraadm/dba/ahf/oracle.ahf/data/servertst03/diag/ahf/

As you can see, node 2 was able to see the existing installation on node 1, but when it came to node 3, it was not able to.

To resolve the issue, we need to use the syncnodes command:

[root@servertst01 bin]# ./tfactl status

.----------------------------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
+-------------+---------------+--------+------+------------+----------------------+------------------+
| servertst01 | RUNNING       | 189966 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
| servertst02 | RUNNING       | 249057 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
'-------------+---------------+--------+------+------------+----------------------+------------------'
[root@servertst01 bin]# ./tfactl syncnodes

Login using root is disabled in sshd config. Please enable it or

Please copy these files manually to remote node and restart TFA
1. /oraadm/dba/ahf/oracle.ahf/data/servertst01/tfa/server.jks
2. /oraadm/dba/ahf/oracle.ahf/data/servertst01/tfa/client.jks
3. /oraadm/dba/ahf/oracle.ahf/data/servertst01/tfa/internal/ssl.properties

These files must be owned by root and should have 600 permissions.
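Since root could not ssh between the nodes, the copy has to go through a regular user, and the files are then put in place as root on the remote node. A minimal sketch (the oracrs staging user and the /tmp staging area are illustrative; the target path mirrors the remote node's own data directory, so adjust it to your layout):

# On servertst01: stage the three files where a non-root user can read them
cd /oraadm/dba/ahf/oracle.ahf/data/servertst01/tfa
cp server.jks client.jks internal/ssl.properties /tmp/
scp /tmp/server.jks /tmp/client.jks /tmp/ssl.properties oracrs@servertst03:/tmp/

# On servertst03 as root: move the files into place and lock them down
TFA_DIR=/oraadm/dba/ahf/oracle.ahf/data/servertst03/tfa
mv /tmp/server.jks /tmp/client.jks ${TFA_DIR}/
mv /tmp/ssl.properties ${TFA_DIR}/internal/
chown root:root ${TFA_DIR}/server.jks ${TFA_DIR}/client.jks ${TFA_DIR}/internal/ssl.properties
chmod 600 ${TFA_DIR}/server.jks ${TFA_DIR}/client.jks ${TFA_DIR}/internal/ssl.properties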

I copied the files to the specified location, stopped and started TFA, and it was then able to see all nodes in the cluster.

[root@servertst03 bin]# ./tfactl statusahf

.----------------------------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
+-------------+---------------+--------+------+------------+----------------------+------------------+
| servertst03 | RUNNING       | 135279 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
'-------------+---------------+--------+------+------------+----------------------+------------------'

orachk scheduler is running [PID: 135279] [Version: 20.1.3]
[root@servertst03 bin]# ./tfactl stop
Stopping TFA from the Command Line
Stopped OSWatcher
Nothing to do !
Killing TFA running with pid 135279
. . .
Successfully stopped TFA..
[root@servertst03 bin]# ./tfactl start
Starting TFA..
Waiting up to 100 seconds for TFA to be started..
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
[root@servertst03 bin]# ./tfactl status

.----------------------------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
+-------------+---------------+--------+------+------------+----------------------+------------------+
| servertst03 | RUNNING       | 191886 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
| servertst01 | RUNNING       | 189966 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
| servertst02 | RUNNING       | 249057 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
'-------------+---------------+--------+------+------------+----------------------+------------------'

And that solved the issue!

I hope it helps you!

Elisson Almeida

Autonomous Health Framework – Installation and Usage Basics

Hi all,

So, this week we’ll have an Autonomous Health Framework series of posts! One post per day, stay tuned!

Today I’ll review some points of the Autonomous Health Framework and how it can help us with day-to-day tasks as well as with specific problem resolution. This framework is a bundle of tools that were already available, like orachk/exachk and TFA, plus other features that I will try to cover, focusing on practical cases.

So, first of all, you need to install it. If you have a RAC cluster, you most likely already have it, but it is always good to update it, as Oracle security recommendations are also part of the orachk analysis. For reference: in an ideal world you have root access or sudo, as running its daemons under root provides wider log collection capabilities.

  1. Download the latest version from Autonomous Health Framework (AHF) – Including TFA and ORAchk/EXAChk (Doc ID 2550798.1)
  2. Transfer and unzip the file to your server and as root follow the installation as below
[root@servertst01 oracrs]# ./ahf_setup -ahf_loc /oraadm/oracrs/product/19.0.0/ahf -data_dir /oraadm/oracrs/product/19.0.0/ahf/data

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_120962_2020_05_07-22_41_11.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 20.1.3 Build Date: 202004290950

TFA is already installed at : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Installed TFA Version : 192000 Build ID : 20190426041420

AHF Location : /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf

AHF Data Directory : /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data

Shutting down TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Copying TFA Data Files from /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Uninstalling TFA : /oraadm/oracrs/product/19.0.0/tfa/servertst01/tfa_home

Do you want to add AHF Notification Email IDs ? [Y]|N : n

Login using root is disabled in sshd config. Installing AHF only on Local Node

Extracting AHF to /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf

Configuring TFA Services

Copying TFA Data Files to AHF

Discovering Nodes and Oracle Resources

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.---------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             |
+-------------+---------------+--------+------+------------+----------------------+
| servertst01 | RUNNING       | 135195 | 5000 | 20.1.3.0.0 | 20130020200429095054 |
'-------------+---------------+--------+------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.--------------------------------------------------------------------------------------.
|                             Summary of AHF Configuration                             |
+-----------------+--------------------------------------------------------------------+
| Parameter       | Value                                                              |
+-----------------+--------------------------------------------------------------------+
| AHF Location    | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf                       |
| TFA Location    | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/tfa                   |
| Orachk Location | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/orachk                |
| Data Directory  | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data                  |
| Repository      | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/repository       |
| Diag Directory  | /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/servertst01/diag |
'-----------------+--------------------------------------------------------------------'


Starting orachk daemon from AHF ...

AHF binaries are available in /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/bin

AHF is successfully installed

Moving /tmp/ahf_install_120962_2020_05_07-22_41_11.log to /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/servertst01/diag/ahf/

As you can see, since the root user is not allowed to ssh, the installation was done only on the local node. Repeat the process on the remaining ones.

Once you have it completed, you will be able to run tfactl status and tfactl toolstatus and see the processes and tools available.
[root@servertst01 bin]# ./tfactl status

.----------------------------------------------------------------------------------------------------.
| Host        | Status of TFA | PID    | Port | Version    | Build ID             | Inventory Status |
+-------------+---------------+--------+------+------------+----------------------+------------------+
| servertst01 | RUNNING       | 135195 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
| servertst02 | RUNNING       | 241723 | 5000 | 20.1.3.0.0 | 20130020200429095054 | COMPLETE         |
'-------------+---------------+--------+------+------------+----------------------+------------------'
[root@servertst01 bin]#
[root@servertst01 bin]# ./tfactl toolstatus

.------------------------------------------------------------------.
|                TOOLS STATUS - HOST : servertst01                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       | 19.3.0.0.0   | DEPLOYED    |
|                      | oratop       | 14.1.2       | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        | 8.3.2        | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary | 19.3.0.0.0   | DEPLOYED    |
|                      | calog        | 19.3.0.0.0   | DEPLOYED    |
|                      | dbcheck      | 18.3.0.0.0   | DEPLOYED    |
|                      | dbglevel     | 19.3.0.0.0   | DEPLOYED    |
|                      | grep         | 19.3.0.0.0   | DEPLOYED    |
|                      | history      | 19.3.0.0.0   | DEPLOYED    |
|                      | ls           | 19.3.0.0.0   | DEPLOYED    |
|                      | managelogs   | 19.3.0.0.0   | DEPLOYED    |
|                      | menu         | 19.3.0.0.0   | DEPLOYED    |
|                      | param        | 19.3.0.0.0   | DEPLOYED    |
|                      | ps           | 19.3.0.0.0   | DEPLOYED    |
|                      | pstack       | 19.3.0.0.0   | DEPLOYED    |
|                      | summary      | 19.3.0.0.0   | DEPLOYED    |
|                      | tail         | 19.3.0.0.0   | DEPLOYED    |
|                      | triage       | 19.3.0.0.0   | DEPLOYED    |
|                      | vi           | 19.3.0.0.0   | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
DEPLOYED    : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING     : Configured and Available.

[root@servertst01 bin]#

3. Using it, you can ‘query’ all cluster nodes for information from a single point, which can really speed up your information collection. Here is an example of the summary command:

[root@servertst01 bin]# ./tfactl summary -h
---------------------------------------------------------------------------------
Usage : TFACTL [run] summary -help
---------------------------------------------------------------------------------
Command : /oraptp/oracrs/product/19.0.0/ahf/oracle.ahf/tfa/bin/tfactl [run] summary [OPTIONS]
Following Options are supported:
        [no_components] : [Default] Complete Summary Collection
        -overview       : [Optional/Default] Complete Summary Collection - Overview
        -crs            : [Optional/Default] CRS Status Summary
        -asm            : [Optional/Default] ASM Status Summary
        -acfs           : [Optional/Default] ACFS Status Summary
        -database       : [Optional/Default] DATABASE Status Summary
        -exadata        : [Optional/Default] EXADATA Status Summary
                          Not enabled/ignored in Windows and Non-Exadata machine
        -patch          : [Optional/Default] Patch Details
        -listener       : [Optional/Default] LISTENER Status Summary
        -network        : [Optional/Default] NETWORK Status Summary
        -os             : [Optional/Default] OS Status Summary
        -tfa            : [Optional/Default] TFA Status Summary
        -summary        : [Optional/Default] Summary Tool Metadata

        -json           : [Optional] - Prepare json report
        -html           : [Optional] - Prepare html report
        -print          : [Optional] - Display [html or json] Report at Console
        -silent         : [Optional] - Interactive console by defauly
        -history <num>  : [Optional] - View Previous <numberof> Summary Collection History in Interpreter
        -node <node(s)> : [Optional] - local or Comma Separated Node Name(s)
        -help           : Usage/Help.
---------------------------------------------------------------------------------
[root@servertst01 bin]#

[root@servertst01 bin]# ./tfactl summary

  Executing Summary in Parallel on Following Nodes:
    Node : servertst01
    Node : servertst02

LOGFILE LOCATION : /oraadm/oracrs/product/19.0.0/ahf/oracle.ahf/data/repository/suptools/servertst01/summary/root/20200508124309/log/summary_command_20200508124309_servertst01_35756.log

  Component Specific Summary collection :
    - Collecting CRS details ... Done.
    - Collecting ASM details ... Done.
    - Collecting ACFS details ... Done.
    - Collecting DATABASE details ... Done.
    - Collecting PATCH details ... Done.
    - Collecting LISTENER details ... Done.
    - Collecting NETWORK details ... Done.
    - Collecting OS details ... Done.
    - Collecting TFA details ... Done.
    - Collecting SUMMARY details ... Done.

  Remote Summary Data Collection : In-Progress - Please wait ...
  - Data Collection From Node - servertst02 .. Done.

  Prepare Clusterwide Summary Overview ... Done
      cluster_status_summary

  COMPONENT   STATUS    DETAILS
+-----------+---------+---------------------------------------------------------------------------------------------------+
  CRS         PROBLEM   .-----------------------------------------------.
                        | CRS_SERVER_STATUS   : ONLINE                  |
                        | CRS_STATE           : ONLINE                  |
                        | CRS_INTEGRITY_CHECK : FAIL                    |
                        | CRS_RESOURCE_STATUS : OFFLINE Resources Found |
                        '-----------------------------------------------'
  ASM         PROBLEM   .-------------------------------------------------------.
                        | ASM_DISK_SIZE_STATUS : WARNING - Available Size < 20% |
                        | ASM_BLOCK_STATUS     : PASS                           |
                        | ASM_CHAIN_STATUS     : PASS                           |
                        | ASM_INCIDENTS        : PASS                           |
                        | ASM_PROBLEMS         : PASS                           |
                        '-------------------------------------------------------'
  ACFS        OFFLINE   .-----------------------.
                        | ACFS_STATUS : OFFLINE |
                        '-----------------------'
  DATABASE    PROBLEM   .-----------------------------------------------------------------------------------------------.
                        | ORACLE_HOME_DETAILS                                                        | ORACLE_HOME_NAME |
                        +----------------------------------------------------------------------------+------------------+
                        | .------------------------------------------------------------------------. | OraDb11g_home1   |
                        | | PROBLEMS | INCIDENTS | DB_BLOCKS | DATABASE_NAME | STATUS  | DB_CHAINS | |                  |
                        | +----------+-----------+-----------+---------------+---------+-----------+ |                  |
                        | | PROBLEM  | PROBLEM   | PASS      | FSIGNTST      | PROBLEM | PROBLEM   | |                  |
                        | '----------+-----------+-----------+---------------+---------+-----------' |                  |
                        '----------------------------------------------------------------------------+------------------'
  PATCH       OK        .----------------------------------------------.
                        | CRS_PATCH_CONSISTENCY_ACROSS_NODES      : OK |
                        | DATABASE_PATCH_CONSISTENCY_ACROSS_NODES : OK |
                        '----------------------------------------------'
  LISTENER    OK        .-----------------------.
                        | LISTNER_STATUS   : OK |
                        '-----------------------'
  NETWORK     OK        .---------------------------.
                        | CLUSTER_NETWORK_STATUS :  |
                        '---------------------------'
  OS          OK        .-----------------------.
                        | MEM_USAGE_STATUS : OK |
                        '-----------------------'
  TFA         OK        .----------------------.
                        | TFA_STATUS : RUNNING |
                        '----------------------'
  SUMMARY     OK        .-----------------------------------.
                        | SUMMARY_EXECUTION_TIME : 0H:2M:9S |
                        '-----------------------------------'
+-----------+---------+---------------------------------------------------------------------------------------------------+


        ### Entering in to SUMMARY Command-Line Interface ###

tfactl_summary>list

  Components : Select Component - select [component_number|component_name]
        1 => overview
        2 => crs_overview
        3 => asm_overview
        4 => acfs_overview
        5 => database_overview
        6 => patch_overview
        7 => listener_overview
        8 => network_overview
        9 => os_overview
        10 => tfa_overview
        11 => summary_overview


See you tomorrow at the next post of this series!

Elisson Almeida

Monitoring database storage over-committing with an OEM metric extension when using autoextend datafiles

Hi all,

Working on a project, while setting up a new environment for a client, I was asked to stop using autoextend on the database files, to avoid ASM diskgroup space issues.

The concern was that the free space might not be enough to handle the growth of all datafiles, since people often add datafiles with autoextend enabled without checking the database and storage as a whole, which can cause a bigger problem.

I really like autoextend, as it can prevent processes from failing, and it also avoids wasting space: in a multi-tablespace environment each tablespace grows differently, and, as I mentioned, not all validations are done when adding space. We just want to resolve the issue and move on to the next one.

So how can we save space by using autoextend where it is needed, without being at risk of over-committing the storage?

The idea is to monitor the database's possible growth and compare it with the current free space in the ASM diskgroup being used by the database.

So, to accomplish that, I used the column MAXBYTES in dba_data_files and dba_temp_files. In the example below we can see the current size in the BYTES column, and MAXBYTES as the limit to which the file can grow. If MAXBYTES is 0, the file does not have autoextend on.

SYS@db011>select TABLESPACE_NAME,FILE_NAME,BYTES,MAXBYTES from dba_data_files where TABLESPACE_NAME like 'SYS%' or TABLESPACE_NAME='UNDOTBS1';

TABLESPACE_NAME                FILE_NAME                                                                       BYTES     MAXBYTES
------------------------------ -------------------------------------------------------------------------- ---------- ------------
SYSTEM                         +ORA_DATA/db01/datafile/system.264.1032015007                              2147483648  32212254720
SYSAUX                         +ORA_DATA/db01/datafile/sysaux.319.1032015009                              2751463424  32212254720
UNDOTBS1                       +ORA_DATA/db01/datafile/undotbs1.320.103201500                             4294967296            0

So, to gather the max database size, I summed the greatest value between BYTES and MAXBYTES (when available), used v$asm_diskgroup to grab the current available space of the diskgroup set in the db_create_file_dest parameter, and finally subtracted the max database size from the available space:

select dg.free_mb - round((dbf.max + tempf.max), 0) as total_free
  from (select trunc(sum(greatest(maxbytes, bytes)/1024/1024)) as max
          from dba_data_files) dbf,
       (select trunc(sum(greatest(maxbytes, bytes)/1024/1024)) as max
          from dba_temp_files) tempf,
       (select free_mb
          from v$asm_diskgroup
         where name in (select ltrim(value, '+')
                          from v$parameter
                         where name = 'db_create_file_dest')) dg;
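Outside OEM, the same check can be run ad hoc from the shell. A minimal sketch (assumes OS authentication to a local instance and that db_create_file_dest points to an ASM diskgroup):

#!/bin/sh
# Ad-hoc check: prints the MB margin between the diskgroup free space and the
# database's maximum possible size; a negative number means over-committed.
sqlplus -s / as sysdba <<'EOF'
set heading off feedback off pagesize 0
select dg.free_mb - round((dbf.max + tempf.max), 0) as total_free
  from (select trunc(sum(greatest(maxbytes, bytes)/1024/1024)) as max
          from dba_data_files) dbf,
       (select trunc(sum(greatest(maxbytes, bytes)/1024/1024)) as max
          from dba_temp_files) tempf,
       (select free_mb
          from v$asm_diskgroup
         where name in (select ltrim(value, '+')
                          from v$parameter
                         where name = 'db_create_file_dest')) dg;
EOF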

Based on this we can create several ways to avoid the storage over-committing.

The one I used was to create a metric extension:

If the value of the above query is negative, it means that the max database size is greater than the available size; therefore, the capacity plan for that database must be reviewed.

  1. To create the metric extension, go to the OEM menu – Enterprise -> Monitoring -> Metric Extensions -> Actions -> Create -> Metric Extension.
  2. In the General Properties page, select the target type: Cluster Database if it's a RAC, or Database Instance otherwise.
  3. In the Adapter field, select "SQL".
  4. In the Collection Schedule, enable the metric, enable data upload, and set the collection frequency to your environment's needs.
  5. In the Adapter page, add the above query.
  6. In the Columns page, add a column for the result of the query.
    1. To fit my environment, I only set the critical alert, raised in case the value of the metric falls below 0.
  7. Continue filling in the necessary information on the next pages, and test the new metric on a selected target. Afterwards, save the new metric.
  8. Prior to deploying it to any target, you need to save the metric as a deployable draft: select the created metric, go to the Actions menu, and select Save as Deployable Draft.

Now you can deploy the new metric to the targets.

Hope this can help!

Elisson Almeida

Autonomous Linux – Did you hear about it?

Hi all!

Not long ago Oracle launched the Oracle Autonomous Database, which runs on Oracle Cloud Infrastructure. This new service provides a database in which several tasks are automated for you: security patching, performance tuning and others…


So what is this all about?

Now Oracle Linux was added to the Autonomous family as well. This means it is an operating system which runs on OCI, whose goals are to provide a more secure, cost-effective and highly reliable system with fewer manual administrative tasks.

It’s not new that security patches are a must, but who has the downtime available, or the staff, to patch their systems every time a new patch is released?

With Oracle Ksplice you get kernel patches without having to reboot your instance, protection against zero-day vulnerabilities, and an overall reduction in manual sysadmin tasks. But we know that this type of solution does not replace a professional; instead, it frees their time for other, more important tasks.

What are the downsides? In my humble opinion, we don't have much option to deny an update once we have it automated. We can use the Oracle OS Management Service to manage the servers, separating which ones we want to automate and which to control manually.

A note from the Autonomous Linux page is that the Oracle Autonomous Linux image has been moved: it is no longer available on the Oracle Cloud Marketplace or the Oracle Images catalog, and is now available from the Platform Images catalog within the Oracle Cloud Infrastructure console, when creating a compute instance.

I got myself locked out of my OCI account (while studying for the free exams; if you don't know what I'm talking about, please see it here), so I won't be able to show screenshots of how to create an instance using Autonomous Linux, but for sure this will be done very soon.

Stay sharp!

Elisson Almeida

New slots for FREE OCI Certifications exams are OPEN!

Breaking news all!

New slots for the OCI free certification exams are open. The earliest opening that I saw was on July 6th; nevertheless, we now have more time to study, and at no charge at all!

Click on the link to book your exam for free.

Last week all slots for the exams were full; I tried to schedule one for myself but could not find any slot before May 15th, when this program was due to end. Now they are open again, though I don't know until when, so rush and schedule yours while they are available.

If you don’t know what I’m talking about: last March 30th, it was announced that OCI training and certification exams were free for 6 OCI-related programs.

Lots of people are taking advantage of it, as you get quality training (not enough for the certification alone, but I will get to that) and, most importantly, no exam fee.

The trainings are good: video recordings similar to the ones I mentioned here in this post. Why do I say they are not enough? Because you have access only to the videos, not to any of the E-Kit material that complements the training. So if you want to practice and go a bit deeper, which you will need if you want to pass, you will have to do some digging on your own.

I managed to take 2 certifications: Autonomous Database Specialist and Oracle Cloud Infrastructure Foundations Associate. I scheduled 2 more; as they are in July, I will have time to prepare for them.

Also, please read the OCI-related FAQ for this program, which clears up several points; for example, you have one free change per exam, and if you schedule an exam directly through the Pearson VUE website, you won't receive this discount.

Hope it helps and good luck

Elisson Almeida

ALTERNATE Archive log destination configuration that can help you – at least it helped me =)

Hi all,

I was working on a database that was waiting for a backup policy to be configured on the NetBackup side, so basically it had no backups. No news here, right? =)

The dev team did not wait long to hammer down the database, which caused archive log generation to spike.

They were planning to work all weekend long, and I needed to make sure that the database stayed available.

I had a spare diskgroup that I could use when the FRA got full, but I did not want to spend the weekend monitoring space usage in order to switch the archive log destination manually. I also did not want to set up the second destination ahead of time, which would double the archive logs by sending them to both locations, wasting space.

So: one active location at a time, and if/when the 1st location filled up, it would switch to the secondary location.

The configuration that helped me with was the ALTERNATE parameter in the log_archive_dest_n.

The parameter “Specifies an alternate archiving destination to be used when the original destination fails.”

If you want, you can read all the details of the parameter here.

Most configurations using this parameter are related to Data Guard standby setups, but it works in this scenario as well.

This is how I used it:
*I used the noreopen option as I knew that the space issue would not be resolved during the weekend.

alter system set log_archive_dest_1='location=use_db_recovery_file_dest noreopen alternate=log_archive_dest_2' scope=both sid='*';
alter system set log_archive_dest_2='location=+NEW_FLASH' scope=both sid='*';
alter system set log_archive_dest_state_2='ALTERNATE' scope=both sid='*';

On the database, you can see the status of the archive destinations:

select dest_id, dest_name, status from v$archive_dest_status where status <> 'INACTIVE';

   DEST_ID DEST_NAME            STATUS
---------- -------------------- ---------
         1 LOG_ARCHIVE_DEST_1   VALID
         2 LOG_ARCHIVE_DEST_2   UNKNOWN

I saw this error in the alert log when the space on the 1st destination was exhausted (along with some hiccups in the database's services):

ORA-17502: ksfdcre:4 Failed to create file +FLASH
ORA-15041: diskgroup "+FLASH" space exhausted
ORA-16038: log 8 sequence# 1059 cannot be archived
ORA-19504: failed to create file ""

Checking the log_archive_dest parameter status, the 1st one was disabled and only the 2nd was valid; the hiccups were barely noticed and the database kept working the whole time.

select dest_id, dest_name, status from v$archive_dest_status where status <> 'INACTIVE';

   DEST_ID DEST_NAME            STATUS
---------- -------------------- ---------
         1 LOG_ARCHIVE_DEST_1   DISABLED
         2 LOG_ARCHIVE_DEST_2   VALID

When the space issue was resolved, all I needed to do was re-enable the 1st location and set the 2nd one back to alternate.
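Switching back is just a matter of the two state parameters again; a sketch mirroring the setup above:

alter system set log_archive_dest_state_1='ENABLE' scope=both sid='*';
alter system set log_archive_dest_state_2='ALTERNATE' scope=both sid='*';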

select dest_id, dest_name, status from v$archive_dest_status where status <> 'INACTIVE';

   DEST_ID DEST_NAME            STATUS
---------- -------------------- ---------
         1 LOG_ARCHIVE_DEST_1   VALID
         2 LOG_ARCHIVE_DEST_2   UNKNOWN

Thanks, and I hope it helps!

Elisson Almeida

ASM: Process PPAx holds disk from being dropped

So I was trying to remove some no-longer-needed disks from a cluster: first I dismounted the diskgroups on the other cluster nodes, and from the last one I executed DROP DISKGROUP DG_NAME INCLUDING CONTENTS. Nothing new, right? But the SA told me that the disks were still being accessed.

I checked with kfod and kfed, and also with asmcmd lsdsk -g --candidate: the disks were there, available to be used in another diskgroup, but they were not part of any diskgroup.
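For reference, these are the kinds of checks I mean; a sketch (the device path is one of this environment's disks, and the flags are the common ones, so adjust as needed):

# disk header status straight from the device (MEMBER vs FORMER/CANDIDATE)
kfed read /dev/mapper/mp_db5_redos2p1 | grep kfdhdb.hdrsts
# discovery listing, plus the disks not belonging to any diskgroup anymore
kfod disks=all
asmcmd lsdsk -g --candidate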

I tried to check with asmcmd lsod, but it was not returning anything, while with lsof and fuser I was able to see the process that was holding the disks, an ASM process, PPA7:
*the disks that I was looking for were from db5*

ASMCMD> lsod --process ppa
Instance Process                OSPID Path
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_data1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_data3p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_flash1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_redos1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_redos6p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db1_redos7p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db2_data1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db2_data3p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db2_flash1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db2_redos1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db2_redos2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db3_data1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db3_data6p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db3_flash1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db3_redos1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db3_redos2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ocrvoting2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ocrvoting5p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_data1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_data2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_flash1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_redo1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_redo2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_redo3p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_ora_redo4p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_data1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_data6p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_flash1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_flash2p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_redos1p1
1        oracle@server01 (PPA7) 38908 /dev/mapper/mp_db4_redos2p1

[oracrs@server01 ~]$ fuser /dev/mapper/mp_db5_redos2p1
/dev/dm-99: 38908
[oracrs@server01 ~]$ lsof /dev/mapper/mp_db5_redos2p1
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
asm_ppa7_ 38908 oracrs 270u  BLK 253,99      0t0 70663 /dev/mapper/../dm-99

Checking MOS, I found some notes like Database Process Hold Locks Dropped ASM Disk (Doc ID 2159941.1), but they still would not help me, as there was no workaround for the issue.

So basically I needed to kill the process, but could I? It was a production system, and killing ASM processes is not something that one should take lightly.

In the documentation, the PPA process is listed under the Parallel Query Slave Processes, which made my life easier, right? Not really.

How can I make sure that killing a process will not bring the instance down?

There is a script in a blog post from Tanel Poder that also helped me, and it is worth reading; it shows which background processes the instance considers critical, i.e. the ones whose death brings the whole instance down:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SYS@+ASM1>SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX')
  2  FROM x$ksupr
  3  WHERE BITAND(ksuprflg,4) = 4 ORDER BY indx
  4  /

      INDX KSUPRPNM                                         TO_CHAR(KSUPRFLG,
---------- ------------------------------------------------ -----------------
         2 oracle@server01 (PMON)                           E
         3 oracle@server01 (CLMN)                           E
         4 oracle@server01 (PSP0)                           6
         5 oracle@server01 (VKTM)                           6
         6 oracle@server01 (GEN0)                           6
         7 oracle@server01 (MMAN)                           6
        13 oracle@server01 (PMAN)                           6
        15 oracle@server01 (LMON)                           6
        16 oracle@server01 (LMD0)                           6
        17 oracle@server01 (LMS0)                           6
        20 oracle@server01 (LCK1)                           6
        21 oracle@server01 (DBW0)                           6
        22 oracle@server01 (LGWR)                           6
        23 oracle@server01 (CKPT)                           6
        24 oracle@server01 (SMON)                           6
        25 oracle@server01 (LREG)                           6
        27 oracle@server01 (RBAL)                           6
        28 oracle@server01 (GMON)                           6
        31 oracle@server01 (IMR0)                           6
        32 oracle@server01 (LCK0)                           6
        38 oracle@server01 (ASMB)                           6

Cutting the story short: as my process was not in the query output, I did some other testing, and I was comfortable enough to kill it and release the disks to the SA team.
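For reference, the final step itself was nothing fancy; PID 38908 comes from the lsod/fuser output above:

# confirm it is still the same PPA7 process before touching it
ps -fp 38908
kill 38908
# once it is gone, the device should no longer be held
fuser /dev/mapper/mp_db5_redos2p1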

Hope it helps

Elisson Almeida

Oracle Patching with OPlan

Everyone that I’ve worked with knows that I don’t like patching (and sometimes I try to imagine who does), but patches are necessary to correct bugs and improve the stability of the Oracle software.

When you have a single-node server with one database, the patch planning is a no-brainer, but when you have a RAC with multiple nodes, different Oracle homes and so on, the planning and preparation get more complex, and it is easy to miss or overlook a step, which can lead to issues during your patching.

So, to help me with all that, I use OPlan. OPlan is a tool which comes along with OPatch, and you can get its latest version in patch 6880880.

More information on OPlan can be found here: Oracle Software Patching with OPLAN (Doc ID 1306814.1)

OK, so what do I use it most for?

Generating the apply patching steps, which come in very handy:

$ORACLE_HOME/OPatch/oplan/oplan generateApplySteps <bundle patch location>

And my favorite, the rollback steps, which I have used more times than I would like to admit:

$ORACLE_HOME/OPatch/oplan/oplan generateRollbackSteps <bundle patch location>

Also, as an extra rollback safeguard, I take a tar backup of the Oracle binaries being patched beforehand, as there were times when even the rollback did not work :-/
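A minimal sketch of that safeguard (the backup location is illustrative; take it while the home is quiesced on the node being patched):

# cold backup of the Oracle home before patching; the restore is a simple untar
cd $ORACLE_HOME
tar -czf /backup/oracle_home_$(hostname -s)_$(date +%Y%m%d).tgz .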

Both files will be created under the directory below, where you will find HTML and text versions.

$ORACLE_HOME/cfgtoollogs/oplan/<TimeStamp>/

This process is there to help you organize your steps; read it through prior to executing, to make sure it makes sense in your environment.

Oplan has its limitations, from the Oracle note which I mentioned above:

Data Guard configurations are not supported.
OPlan can be used to create patch plans for Oracle home's running Oracle Data Guard configurations, but OPlan does not consider such an environment usable as 'Data Guard Standby-First Patch Apply' alternative. See the following for additional information on 'Data Guard Standby-First Patch Apply'

<Document 1265700.1> Oracle Patch Assurance - Data Guard Standby-First Patch Apply

Shared Oracle Home Configurations are not supported.

Single Instance Databases running in the same configuration are not supported

Even so, I would still use it, as it generates a plan based on your target environment, adding information that you would otherwise have to gather manually if you were only reading the patch README files.

Hope it helps.

Thanks and until next time

Elisson Almeida

Parallel file transfer on Linux

Hi all,

I had a request to copy a ton of files from one file system to another. I know that there are tools that can help with that, like rsync, but due to some requirements, and because I wanted to do some scripting, I put something together to help with this request. This is not the 1st time I've done something like this, but it is the 1st time I share it 🙂

What I’m sharing is not exactly what I did for the request I mentioned, but you will get the idea.

The script will copy a list of files from one server to another. This list I usually create using find, like this:

find /Directory_which_I_want_to_copy -type f > file_list.txt

The script receives the parameters listed below (names reconstructed from the script's positional arguments):

parallel_xfer.ksh <file list> <destination directory> <remote server> <number of parallel sessions>

Also a requirement for this to work is that you can ssh to the target server without a password.
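If that passwordless login is not in place yet, the usual key-based setup looks like this (the user and host are illustrative):

# generate a key pair (accept the defaults) and append the public key
# to the target server's authorized_keys
ssh-keygen -t rsa
ssh-copy-id USER@server02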

It will keep X parallel sessions running at all times while there are new files to start copying. After all copies are started, it will monitor them until completion. The script also assumes that the source and target directory paths are the same, but this is easily changed if needed.

The logging needs to be improved, but it will show each file as its transfer is started, as well as the count of running transfer processes.

Hope it helps

Elisson

#!/bin/ksh
# Parallel file transfer: reads a list of files and keeps up to NUM_SESS
# sftp sessions running against SERVER until every file has been copied.
DBLIST=${1}     # file containing the list of files to transfer
DEST_DIR=${2}   # destination directory (assumed to match the source path)
SERVER=${3}     # remote server to pull the files from
NUM_SESS=${4}   # number of parallel sftp sessions
STARTED_COUNT=0
RUN_COUNT=0

trim() {
    local var=$@
    var="${var#"${var%%[![:space:]]*}"}"   # remove leading whitespace characters
    var="${var%"${var##*[![:space:]]}"}"   # remove trailing whitespace characters
    echo "$var"
}


FILE_COUNT="$(cat ${DEST_DIR}/$DBLIST | wc -l)"
cd ${DEST_DIR}
for FILE in $(cat $DBLIST)
do
 STARTED_COUNT=$((${STARTED_COUNT}+1))
 if [ ${RUN_COUNT} -le ${NUM_SESS} ]
 then
   # start one background transfer for the next file on the list
   sftp -Cq USER@${SERVER}:${FILE} ${DEST_DIR}/. >/dev/null 2>/dev/null &
   echo "`date` - Transferring file ${FILE} to ${DEST_DIR} - ${STARTED_COUNT}/$(trim ${FILE_COUNT})"
   sleep 5
 fi
 echo "\n"

 # count the sftp children of this script (${$} is the script's own PID)
 RUN_COUNT=$(ps -ef | grep " ${$} " | grep sftp | grep -v grep | wc -l)

 # throttle: wait here while the maximum number of sessions is running
 while [ ${RUN_COUNT} -ge ${NUM_SESS} ]
 do
  RUN_COUNT=$(ps -ef | grep " ${$} " | grep sftp | grep -v grep | wc -l)
  echo "`date` - $(trim ${RUN_COUNT}) transfer processes running"
  echo "`date` - Amount of GB transferred `du -sg ${DEST_DIR}`\n"
  sleep 60
 done
done

# all transfers have been started; wait for the remaining sessions to finish
while [ $(ps -ef | grep " ${$} " | grep sftp | grep -v grep | wc -l) -gt 0  ]
do
 RUN_COUNT=$(ps -ef | grep " ${$} " | grep sftp | grep -v grep | wc -l)
 echo "`date` - $(trim ${RUN_COUNT}) transfer processes running"
 echo "`date` - Amount of GB transferred - `du -sg ${DEST_DIR}`\n"
 sleep 60
done
echo "`date` - Transfer completed"
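For reference, a hypothetical invocation: copy the files listed in file_list.txt to /data/target on server02, keeping 4 sftp sessions running (the script expects the list file to live in the destination directory):

./parallel_xfer.ksh file_list.txt /data/target server02 4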

Getting the Oracle version – a new utility on 18c: oraversion

While scripting in an environment with different Oracle versions, I often needed to get the Oracle version of the target database, as usually the SQL to be executed is version dependent.

I used different methods to grab the database version; a couple you can see below:

SQL> select
  2  substr(
  3  substr(
  4  banner,
  5  instr(banner,' ',-(length(banner)-instr(banner,'.')),1)+1,
  6  instr(banner,' ',+instr(banner,'.'),1)-instr(banner,' ',-(length(banner)-instr(banner,'.')),1)-1
  7  ),
  8  1,
  9  instr(
 10  substr(
 11  banner,
 12  instr(banner,' ',-(length(banner)-instr(banner,'.')),1)+1,
 13  instr(banner,' ',+instr(banner,'.'),1)-instr(banner,' ',-(length(banner)-instr(banner,'.')),1)-1
 14  ),
 15  '.'
 16  )-1
 17  ) version
 18  from v$version
 19  where rownum = 1
 20  ;

VERSION
--------------------------------------------------------------------------------
18

Or like this

SQL> select substr(version,1,instr(version,'.')-1) version from product_component_version where product like 'Oracle%';

VERSION
--------------------------------------------------------------------------------
18

But on Oracle 18c there is a new utility called oraversion, with which you can get the same result as the above queries.

[oracle@server01 ~]$ oraversion
This program prints release version information.
These are its possible arguments:
-compositeVersion: Print the full version number: a.b.c.d.e.
-baseVersion: Print the base version number: a.0.0.0.0.
-majorVersion: Print the major version number: a.
-buildStamp: Print the date/time associated with the build.
-buildDescription: Print a description of the build.
-help: Print this message.
[oracle@server1 ~]$

[oracle@server01 ~]$ oraversion -majorVersion
18
[oracle@server01 ~]$
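Since the whole point was scripting across mixed versions, a small wrapper can use oraversion when present and fall back to the dictionary query on older homes. A sketch (assumes OS authentication to a local instance and the usual environment set):

#!/bin/sh
# Hypothetical helper: prints the major Oracle version of the local home/instance
if command -v oraversion >/dev/null 2>&1; then
  # 18c and later homes ship oraversion
  MAJOR=$(oraversion -majorVersion)
else
  # older homes: fall back to the dictionary query shown above
  MAJOR=$(sqlplus -s / as sysdba <<'EOF'
set heading off feedback off pagesize 0
select substr(version, 1, instr(version, '.') - 1)
  from product_component_version
 where product like 'Oracle%';
EOF
)
fi
echo "Major version: ${MAJOR}"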

This could be somewhat useful, so I thought it was worth sharing.

Until next time.

Elisson Almeida