Exadata DNS Change – Pitfalls to Avoid

Hi all, it’s been a while but here I am!

There were some infrastructure changes at the place I work and I was asked to do a DNS change on a somewhat old Exadata X5. I had never done one before, so the idea of this post is to help others who might face the issues I had.

The first thing I did was to look up the documentation and review the steps. Yes, there are blogs about it, but the documentation helps to get at least a first glance of the situation.

Long story short: Exadata has lots of components and the new DNS servers must be configured on all of them.

Here is a summary of the steps.

InfiniBand switches

Connect to each switch, switch to the ilom-admin user and change the DNS:
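For example, connecting first to each switch as root (the switch hostname below is just a placeholder):

ssh root@exa01sw-ib1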

su - ilom-admin
show /SP/clients/dns
set /SP/clients/dns nameserver=192.168.16.1,192.168.16.2,192.168.16.3
show /SP/clients/dns

 

Database nodes

For my image I only needed to change /etc/resolv.conf; if you have a newer one, you will need to use ipconf. That's why you need to go to the documentation first; at least there we hope they mention the pitfalls (well, keep reading and you will see that was not my case).
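As a minimal sketch of the older-image case, the change is just the nameserver entries in the resolver file (the search domain below is a placeholder; the IPs are the ones used throughout this post):

# /etc/resolv.conf on each database node
search mydomain.com
nameserver 192.168.16.1
nameserver 192.168.16.2
nameserver 192.168.16.3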

I also changed the DNS on each database node ILOM, running ipmitool from each node:

ipmitool sunoem cli 'show /SP/clients/dns'
ipmitool sunoem cli 'set /SP/clients/dns nameserver=192.168.16.1,192.168.16.2,192.168.16.3'
ipmitool sunoem cli 'show /SP/clients/dns'


Cell nodes – Here things start to get interesting

For the storage cells there are some points that need to be taken into consideration:

Increase the ASM disk_repair_time. The goal here is to avoid a full rebalance if you finish the work within its timeframe. If you don't know this parameter: ASM will wait for up to the interval specified by DISK_REPAIR_TIME for the disk(s) to come back online. If the disk(s) come back online within this interval, a resync operation will occur, where only the extents that were modified while the disks were offline are written to the disks once back online. If the disk(s) do not come back within this interval, ASM will initiate a forced drop of the disk(s), which will trigger a rebalance.
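As a minimal sketch, this is how the attribute can be checked and raised from the ASM instance before starting the work (the DATA diskgroup name and the 8.5h value are just examples; size the interval to cover your whole change window):

# from a database node, as the grid user, connected to the local ASM instance
sqlplus -s / as sysasm <<'EOF'
select dg.name, a.value
  from v$asm_diskgroup dg, v$asm_attribute a
 where a.group_number = dg.group_number
   and a.name = 'disk_repair_time';

alter diskgroup DATA set attribute 'disk_repair_time' = '8.5h';
exit;
EOF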

On each cell node we need to make sure all disks are OK, inactivate all grid disks, stop all cell services and use ipconf to change the DNS configuration:

#Check that putting the grid disks offline will not cause a problem for Oracle ASM - the 3rd column should say Yes for all disks
cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome

#Inactivate all grid disks on the cell - may take a while to complete
cellcli -e ALTER GRIDDISK ALL INACTIVE


#Confirm the grid disks are offline, it should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, and asmdeactivationoutcome=Yes for all grid disks
cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus,asmdeactivationoutcome

#Confirm that the disks are offline
cellcli -e LIST GRIDDISK

#Shut down the cell services and ocrvottargetd service
cellcli -e ALTER CELL SHUTDOWN SERVICES ALL
service ocrvottargetd stop # on some images this service does not exist

To execute ipconf the old way, we only need to call it and follow the prompts; but if you have a newer image you will need to provide its parameters, as shown in the documentation.
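On my image, the interactive run was simply calling it from the standard cell OS location and answering the prompts:

# run as root on the storage cell and follow the prompts to update the DNS servers
/opt/oracle.cellos/ipconf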

The documentation says that after this we could start the cell services back up, but I would recommend validating the DNS before doing that. Why? Because mine did not work, and I could have had a bigger issue with a cell node without DNS trying to start its services.

So, how to test? Use nslookup, dig and curl:

nslookup dns_domain.com
curl -v 192.168.16.1:53
dig another_server_in_the_network

 

My tests did not work: I was able to ping the DNS servers but not to resolve any name. I had an SR open on MOS but it did not help much either. As this is a production system, I checked whether the firewall was up on the Linux side, and to my surprise it was.

I tried to manually add rules to iptables but it did not work, and then I came across this note: Exadata: New DNS server is not accessible after changing using IPCONF (Doc ID 1581417.1).

And there it was, I needed to restart the cellwall service to recreate the iptables rules.

# Restart cellwall service
service cellwall restart
service cellwall status

One final point: check whether ASM started a rebalance. If it did, do not bring down another cell node until the rebalance is finished, otherwise you may run into deeper issues.
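As a rough sketch, this is what I check before touching the next cell: the grid disks of the cell I just changed must be back to ONLINE, and ASM must show no rebalance running (the commands assume the standard cellcli and grid user environments):

# on the cell that was just changed: bring the grid disks back and wait for ONLINE
cellcli -e ALTER GRIDDISK ALL ACTIVE
cellcli -e LIST GRIDDISK ATTRIBUTES name,asmmodestatus

# on a database node, as the grid user: no rows returned means no rebalance is running
sqlplus -s / as sysasm <<'EOF'
select group_number, operation, state, power, est_minutes from gv$asm_operation;
exit;
EOF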

 

I hope it helps!

Elisson Almeida

AWS AQUA for Redshift

Hi all,

Quick one today. Did you see this new release? I'm a bit behind schedule but trying to catch up with the news. It seems very interesting:

AQUA (Advanced Query Accelerator) for Amazon Redshift is available in preview. AQUA provides a new distributed and hardware-accelerated cache that brings compute to the storage layer for Amazon Redshift and delivers up to 10x faster query performance than other cloud data warehouses.

AQUA is a high-speed cache on top of Redshift Managed Storage that can scale out and process data in parallel across many AQUA nodes. AQUA uses AWS-designed analytics processors that dramatically accelerate data compression, encryption, and data processing on queries that scan, filter, and aggregate large data sets. With this new architecture, customers can run queries quicker than ever before, allowing them to query data directly, even at scale, and giving them more up-to-date dashboards, reducing development time, and making system maintenance easier.

It is available for preview in US East (Ohio), US East (N. Virginia), and US West (Oregon) regions at this point.

For more references:

  • There is a very detailed technical article about it HERE.
  • There is also a very nice TechTalk presenting it HERE.

Let’s keep up to date!

ORA-07445 [kxsPurgeCursor()]

Hi all,
We started getting lots of ORA-07445 errors reported in a client's production Exadata database.
Checking the traces, we can see that someone is running the SQL below, and that is what triggers these errors:
declare
  i number := 0;
begin
  for c1 in (select address, hash_value, sql_id
               from v$sql
              where last_active_time < sysdate - 1/24
                and executions < 2) loop
    begin
      dbms_shared_pool.purge(c1.address || ' ' || to_char(c1.hash_value), 'C');
      i := i + 1;
    exception when others then
      dbms_output.put_line(c1.sql_id);
    end;
  end loop;
  dbms_output.put_line('Cursors purged = ' || to_char(i));
end;
/
I'm not even going to discuss the PL/SQL above: it's purging every cursor with fewer than 2 executions that has not been active in the last hour. I don't really agree with this…
What matters is: in the end, it was caused because the hash value of a non-cursor object was being passed to dbms_shared_pool.purge(). This matches Bug 29281112 – ORA-7445: [kxspurgecursor()+517] [sigsegv] (Doc ID 29281112.8).
There is no actual workaround besides not passing a non-cursor hash to dbms_shared_pool.purge; however, a fix for better handling of these cases is referenced in the bug note above.

Quick Reference: Oracle Exadata default passwords

It goes without saying how important it is to have all these passwords reset, even though on new Exa hardware SSH is usually disabled. However, we often need to access some of the consoles knowing the password was never changed, but unaware of what it should be.

For those cases (until you change it, ASAP), here is a quick reference:

Database Server:

  • root/welcome1
  • oracle/welcome1
  • grid/welcome1
  • grub/sos1Exadata

Exadata Storage Servers:

  • root/welcome1
  • celladmin/welcome1
  • cellmonitor/welcome1

InfiniBand switches:

  • root/welcome1
  • nm2user/changeme

Ethernet switches:

  • admin/welcome1

Power distribution units (PDUs):

  • admin/welcome1
  • root/welcome1

Database server ILOMs:

  • root/welcome1

Exadata Storage Server ILOMs:

  • root/welcome1

InfiniBand ILOMs:

  • ilom-admin/ilom-admin
  • ilom-operator/ilom-operator

Keyboard, video, mouse (KVM):

  • admin/welcome1

Change them!

I hope this was useful for you.

Removing user from SQL Server database

So, have you ever needed to drop a user from a SQL Server database, but got stuck with the following error?

Error: 15138 The database principal owns a schema in the database, and cannot be dropped.

If you don't have access to SSMS to see which schemas or objects the user owns, the following SQL should do the job.

USE [DATABASENAME]
GO
SELECT so.name OBJECT, su.name OWNER
FROM sys.schemas so
INNER JOIN sysusers su ON (so.principal_id = su.uid)
WHERE su.name = 'username'
GO
SELECT so.name OBJECT, su.name OWNER, so.xtype TP
FROM sys.sysobjects so
INNER JOIN sysusers su ON (so.uid = su.uid)
WHERE su.name = 'username'
GO

Once you have the owned objects/schemas, you can change them with the following SQL (schema example):

USE [DATABASENAME]
GO
ALTER AUTHORIZATION ON SCHEMA::[db_datareader] TO [newowner] /* new owner username */
ALTER AUTHORIZATION ON SCHEMA::[db_datawriter] TO [newowner]
GO

Then you're ready to drop the user, as shown below.
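For completeness, the final step, using the same placeholder database and user names as above:

USE [DATABASENAME]
GO
DROP USER [username]
GO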

Bonus SQL: The following will list the schemas and their owner.

SELECT schema_name(schema_id) as SCHEMANAME,
user_name(s.principal_id) as USERNAME
FROM sys.schemas AS s

AWS CloudFormation

Hello everyone,

We are going to start a series of posts showing how to use AWS CloudFormation.

The next posts will cover the creation of an RDS instance and a DMS task using an AWS CloudFormation stack.

First, a quick introduction to AWS CloudFormation.

Let's start by explaining what AWS CloudFormation is:

“AWS CloudFormation provides a common language for you to model and provision AWS and third party application resources in your cloud environment.”

In practice, it's a JSON/YAML file where we describe instructions to create AWS services.
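Just to illustrate the format, here is a minimal YAML sketch of a template; the resource name and property values are made up for the example (the real RDS and DMS scripts will come in the next posts):

AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal example - a single RDS PostgreSQL instance
Resources:
  ExampleDB:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.micro
      AllocatedStorage: '20'
      MasterUsername: exampleuser
      MasterUserPassword: ChangeMe12345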

Let's code!

The first step is to connect to the AWS Console. In the search field, type CloudFormation, as in the picture below.

Click on CloudFormation to open the service console.

 

Click on Stacks.

 

 

After this, click on Create stack and select “With new resources”.

We are going to click on “Create template in designer” and you will be redirected to a page like the one below.

 

Click on Template and the code editor will open.

The next step is to create a script to deploy a service. In this example, we are going to use a DMS script and an RDS PostgreSQL instance. The examples used in this article will be available in the next article.

To execute the script, the first step is to validate it by clicking the highlighted button in the image below.

The return can be OK or an error. If it is OK, you can create the stack by clicking the highlighted button in the image below.

You can check the execution events on the Events page. The output is similar to the image below.
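If you prefer the AWS CLI over the console, the same validate / create / check-events flow can be done like this (the template file name and stack name are placeholders):

aws cloudformation validate-template --template-body file://rds-dms-stack.yaml
aws cloudformation create-stack --stack-name rds-dms-stack --template-body file://rds-dms-stack.yaml
aws cloudformation describe-stack-events --stack-name rds-dms-stack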

In the next articles, we will look at the source code and use the AWS DMS service to replicate data from an Oracle database to RDS PostgreSQL.

 

5 out of 6 OCI Certifications – Impressions and Experience

Hi all,

Not long ago there was a program by Oracle which gave access to OCI training and certification for free. As several people still wanted to take their chance on the exams, there was an extension with more openings. I blogged about it here.

I spent the last few days studying for the free OCI certifications. Well, let's say that I procrastinated a bit, and along with moving countries with my family, COVID, the lockdown and all that comes with it, I managed to pass 5 out of the 6 OCI certifications available. I did not enroll in the Dev one, since I knew that Kubernetes and OCI Functions were the type of beast I was not planning to tackle.

Here they are, in order:
oci_results

And yes, I did the Architect Professional before the Architect Associate; let's call it a scheduling miss on my part, to say the least.

I did all the video trainings, some more than once, and read a lot of the documentation. I also asked colleagues and peers how the exams were as soon as I saw they had passed the ones I was aiming for.

While not working with cloud nor with OCI every day, I knew I needed to put in a few extra hours, but it was not that big of a deal. So if you don't work with cloud and want to get in the game, you sure can do it. Oracle has the free trial and the Always Free tier that you can leverage. Mine expired in the middle of my studies, so I needed to get creative with the demos that are available, and on the last day before my last exam I was able to get access to my corp's tenancy (thanks a lot, guys) to work through some cases I still needed.

The demos from the trainings are not 100%, especially on the Database service side; I was expecting more. There are great demos from Rohit Rahi, Flávio Pereira and others, so I strongly recommend that you follow along. My tip is to increase the video speed to 1.5x; that really helped me to stay focused and move a bit faster to get everything done before each exam.

My scores ranged from 74% up to 92%, and in my opinion the Architect Associate was the most difficult, due to its question types rather than the content itself. Most of the questions were specific to OCI features and services and not related to design, with some trick questions; to me these make the exams less enjoyable. I had no OCI CLI questions, which I thought would also be covered.

I would like to thank my family for putting up with me during the last weeks of testing, and Oracle and the Oracle Cloud teams, which provided the trainings and the exam vouchers.

 

Cheers

Elisson Almeida

Monitoring database storage over-committing with an OEM metric extension when using autoextend datafiles

Hi all,

Working on a project setting up a new environment for a client, I was asked to stop using autoextend on the database files, to avoid ASM diskgroup space issues.

The concern was that the free space might not be enough to handle all the datafile growth, as people often add datafiles with autoextend on without checking the whole database and storage, which could cause a bigger problem.

I really like autoextend, as it can prevent processes from failing, and even more: it avoids wasting space, which happens often in multi-tablespace environments where each tablespace grows differently and, as I mentioned, not all validations are done when adding space. We just want to resolve the issue and move on to the next one.

So how can we save space by using autoextend where it is needed, without the risk of over-committing the storage?

The idea is to monitor the database's possible growth and compare it with the current free space in the ASM diskgroup used by the database.

To accomplish that, I used the MAXBYTES column in dba_data_files and dba_temp_files. In the example below we can see the current size in the BYTES column and MAXBYTES as the limit to which the file can grow. If MAXBYTES is 0, the file does not have autoextend on.

SYS@db011>select TABLESPACE_NAME,FILE_NAME,BYTES,MAXBYTES from dba_data_files where TABLESPACE_NAME like 'SYS%' or TABLESPACE_NAME='UNDOTBS1';

TABLESPACE_NAME FILE_NAME BYTES MAXBYTES
------------------------------ -------------------------------------------------------------------------- ---------- ------------
SYSTEM +ORA_DATA/db01/datafile/system.264.1032015007 2147483648 32212254720
SYSAUX +ORA_DATA/db01/datafile/sysaux.319.1032015009 2751463424 32212254720
UNDOTBS1 +ORA_DATA/db01/datafile/undotbs1.320.103201500 4294967296 0

So, to gather the max database size, I summed the greatest value between BYTES and MAXBYTES for each file, used v$asm_diskgroup to grab the currently available space of the diskgroup set in the db_create_file_dest parameter, and finally subtracted the max database size from the available space:

select dg.FREE_MB - round((dbf.max+tempf.max),0) as total_free
from
(select trunc(sum(greatest(maxbytes,bytes)/1024/1024)) as max
from dba_data_files ) dbf,
(select trunc(sum(greatest(maxbytes,bytes)/1024/1024)) as max
from dba_temp_files) tempf,
(select FREE_MB
from v$asm_diskgroup where name in (select LTRIM( value, '+' ) from v$parameter where name='db_create_file_dest')) dg;

Based on this, we can create several checks to avoid over-committing the storage.

The one I used was to create a metric extension:

If the value of the above query is negative, it means that the max database size is greater than the available space, and therefore the capacity plan for that database must be reviewed.

  1. To create the metric extension, go to the OEM menu: Enterprise -> Monitoring -> Metric Extensions -> Actions -> Create -> Metric Extension.
  2. In the General Properties page, select the target type: Cluster Database if it's RAC, or Database Instance otherwise.
  3. In the Adapter field, select “SQL”.
  4. In the Collection Schedule, enable the metric, enable data upload, and set the collection frequency according to your environment's needs.
  5. In the Adapter page, add the above query.
  6. In the Columns page, add a column for the result of the query.
    1. To fit my environment, I only set the critical alert, raised when the value of the metric is below 0.
  7. Continue filling in the necessary information on the next pages, and test the new metric on a selected target. Afterwards, save the new metric.
  8. Prior to deploying it to any target, you need to save the metric as a deployable draft: select the created metric, go to the Actions menu and select “Save As Deployable Draft”.

Now you can deploy the new metric to the targets.

Hope this can help!

Elisson Almeida

Autonomous Linux – Did you hear about it?

Hi all!

Not long ago Oracle launched the Oracle Autonomous Database, which runs on Oracle Cloud Infrastructure. This new service provides a database in which several tasks are automated for you: security patching, performance tuning and others…


So what is this all about?

Now Oracle Linux has been added to the Autonomous family as well. This means it is an operating system which runs on OCI, and whose goals are to provide a more secure, cost-effective and highly reliable system with fewer manual administrative tasks.

It's not new that security patches are a must, but who has the downtime available, or the staff, to patch their systems every time a new patch is released?

With Oracle Ksplice you get kernel patches without having to reboot your instance, protection against zero-day vulnerabilities, and overall fewer manual sysadmin tasks. But we know that this type of solution does not replace a professional; instead, it frees their time for other, more important tasks.

What are the downsides? In my humble opinion, we don't have much option to deny an update once it is automated. We can use the Oracle OS Management Service to manage the servers, separating which ones we want to automate and which to control manually.

A note from the Autonomous Linux page is that the Oracle Autonomous Linux image has been moved: it is no longer available on the Oracle Cloud Marketplace or the Oracle Images catalog. As of this date, it is available from the Platform Images catalog within the Oracle Cloud Infrastructure console, when creating a compute instance.

I got myself locked out of my OCI account (while studying for the free exams; if you don't know what I'm talking about, please see it here), so I won't be able to show screenshots of how to create an instance using Autonomous Linux, but for sure this will be done very soon.

Stay sharp!

Elisson Almeida

ODA Useful Commands to Manage VMs

1. To restart the vm:

oakcli stop vm [vm-name]
oakcli stop vm [vm-name] -force
oakcli start vm [vm-name]

2. To show the status of all VMs:

oakcli show vm

Example:

[root@server oak]# oakcli show vm

NAME    NODENUM    MEMORY    VCPU    STATE     REPOSITORY
vm1     1          8192      4       ONLINE    fileshare

3. Listing Configured Options for a VM Template

oakcli show vmtemplate ol6linux_64

4. Adding a Network to the VM Template

oakcli modify vmtemplate ol6linux_64 -addnetwork net1

5. Configure CPU, Memory on the Template

oakcli configure vmtemplate ol6linux_64 -vcpu 4 -maxvcpu 8 -cpucap 10 -memory 3000M -maxmemory 6G -os OTHER_LINUX

6. Create a VM by Cloning from Template

oakcli clone vm ol6test -vmtemplate ol6linux_64 -repo repo1 -node oda2

7. Override VM Template Values

oakcli configure vm ol6test -vcpu 6 -memory 4G

8. Configure High Availability and Failover Values

oakcli configure vm ol6test -prefnode oda2 -failover oda1

9. Start a VM

oakcli start vm ol6test

10. Access VM Console for a VM:

oakcli show vmconsole [vm-name]

Hope it helps you!