GGate: STOP request pending. There are open, long-running transactions.

Hey all!
So, the other day I was about to stop the GGate Extract to add a new table to replication and saw this:

GGSCI (greporasrvr) 2> stop extract GREPORA_EXT

Sending STOP request to EXTRACT GREPORA_EXT ...

STOP request pending. There are open, long-running transactions.
Before you stop Extract, make the archives containing data for those transactions available for when Extract restarts.
To force Extract to stop, use the SEND EXTRACT GREPORA_EXT, FORCESTOP command.
Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 616, RBA 135673360.

2018-04-12 15:55:41 WARNING OGG-01742 Command sent to EXTRACT GREPORA_EXT returned with an invalid response.

Interesting, right?
Let's find out what is causing this situation:

GGSCI (greporasrvr) 2> send extract GREPORA_EXT, showtrans duration 20 MIN

Sending showtrans request to EXTRACT GREPORA_EXT ...

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 616, RBA 135673360

------------------------------------------------------------
XID: 92.1.1460254
Items: 1
Extract: GREPORA_EXT
Redo Thread: 1
Start Time: 2018-04-12:08:27:03
SCN: 22.120909138 (94610189650)
Redo Seq: 616
Redo RBA: 135673360
Status: Running

------------------------------------------------------------
XID: 69.7.1771671
Items: 1
Extract: GREPORA_EXT
Redo Thread: 1
Start Time: 2018-04-12:13:27:29
SCN: 22.121387178 (94610667690)
Redo Seq: 629
Redo RBA: 1457168
Status: Running

15:59:12 SYS@GREPDB>select * from gv$transaction where xidusn=92;
1 000000018FEA2170 92 1 1460254 121 548870 24255 48 ACTIVE 04/12/18 08:27:02 120909138 22 168 2
131578 24097 43 0000000198A49D68 7683 NO NO NO NO

0 0 0 0 0 0 0 0 81875 4581610 23108750 1415217 -1.675E+09 197 12-APR-2018 08:27:02 0 0
9.4610E+10 0 5C0001001E481600 0000000000000000 0000000000000000

1 row selected.

16:00:08 SYS@GREPDB>select sid,serial#,event,machine,sql_id,seconds_in_wait,prev_sql_id,module,program,action from gv$session where taddr='000000018FEA2170';
885 55577 db file sequential read feedprocs12 39za7ttrfjbr1 55 6x9pg77q982g3
JDBC Thin Client JDBC Thin Client

1 row selected.

16:00:37 SYS@GREPDB>set verify off
set pagesize 999
col username format a13
col prog format a22
col sql_text format a140
col sid format 999
col child_number format 99999 heading CHILD
col ocategory format a10
col avg_etime format 9,999,999.99
col etime format 9,999,999.99
col execs for 99999,999
select sql_id, child_number, plan_hash_value plan_hash, executions execs, elapsed_time/1000000 etime,
(elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_etime, u.username,
sql_text
from v$sql s, dba_users u
where
sql_id like nvl('&sql_id',sql_id)
and u.user_id = s.parsing_user_id
/
Enter value for sql_id: 39za7ttrfjbr1

SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME
------------- ------ ---------- ---------- ------------- ------------- -------------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------
39za7ttrfjbr1 0 72089250 0 25,434.97 25,434.97 GREPORA
--SQL Name:summary/envision/insertMyTable INSERT INTO mytable ( .......) 

1 row selected.

Nice!
Now we know it's being caused by this long-running insert into mytable, but how should I proceed?
There are basically two options:
1. Kill the session on the DB: a long rollback is predicted, so be aware of the impact and rollback activity, plus application effects if any.
2. Force the Extract to stop with SEND EXTRACT GREPORA_EXT, FORCESTOP: in this case there is no effect on the database and no rollback; HOWEVER, when the Extract starts again it will need to start from this cancelled sequence. First question: is that sequence still available? As you could see above, sequence 616 onward would be needed, so let's check what we have available:

16:01:41 SYS@GREPDB>archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination +FRA_GREPDB
Oldest online log sequence 619
Next log sequence to archive 634
Current log sequence 634

Ouch!
So, in case the force stop is needed, be aware you'll need those archivelogs: the oldest online log sequence is 619, but the Extract needs sequence 616 onward. If the archivelogs were already removed, a restore may be needed.
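If you want to check whether those sequences still exist as archivelogs on disk (or were already deleted), a quick look at v$archived_log helps. A minimal sketch, using the thread and starting sequence from this case:

SYS@GREPDB>select thread#, sequence#, name, deleted
           from v$archived_log
           where thread# = 1 and sequence# >= 616
           order by sequence#;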

There is also a third option: 3. Wait until the transaction completes. 🙂
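For reference, here is a sketch of the commands for options 1 and 2, reusing the SID and serial# returned by the gv$session query above (the values are from this case; yours will differ):

-- Option 1: kill the long-running session on the database (expect rollback)
SQL> alter system kill session '885,55577' immediate;

-- Option 2: force the Extract to stop (no database-side rollback)
GGSCI (greporasrvr) 3> SEND EXTRACT GREPORA_EXT, FORCESTOP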

If option 1 or 2 needs to be chosen, choose wisely. 🙂

Hope this helps.
Cheers!

GGATE ABENDED: ORA-00308: Cannot Open Archived Log

Hi all!
Ok, so this is one of the most common issues in GGate administration. How to solve it? Easy thing.

First let's understand what it means: the redologs no longer have the required information (assuming an integrated Extract) and you have already deleted the archivelogs the Extract needs. Why? Probably because you had already backed up those archivelogs and they were no longer needed by the database.

Unfortunately we don't have any kind of ARCHIVELOG DELETION POLICY to guarantee the Extract has already read them, like we have for Data Guard. So, what can we do?

Restore the missing archivelogs.

But first let's confirm the error. Some examples:

ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, ext1.prm: Opening ASM file +ARCH/2_11055_742755632.dbf in DBLOGREADER mode: (308) ORA-00308: cannot open archived log '+ARCH/2_11055_742755632.dbf' ORA-17503.

or

ERROR OGG-01028 Oracle GoldenGate Capture for Oracle,ext1.prm: Getting attributes for ASM file +ARCH/2_86720_716466928.dbf,

SQL : (15056)

ORA-15056: additional error message ORA-15173: entry '2_86720_716466928.dbf' does not exist in directory '/...


SOLUTION:

Restore all archivelogs starting from the recovery checkpoint up to the current checkpoint, and restart the Extract:
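A minimal sketch of the flow, assuming the Extract is called ext1 and the checkpoints span sequences 616 to 634 (check your own with SHOWCH; the numbers here are illustrative):

GGSCI> INFO EXTRACT ext1, SHOWCH
-- note the sequence reported under "Recovery Checkpoint"

RMAN> RESTORE ARCHIVELOG FROM LOGSEQ=616 UNTIL LOGSEQ=634 THREAD=1;

GGSCI> START EXTRACT ext1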

ORA-00600: internal error code, arguments: [17147]

Hello all!
Having this error? Probably something like this:

ORA-00600: internal error code, arguments: [17147], [0x7F0315C53788], [], [], [], [], [], [], [], [], [], []

This is a known error. The best match on MOS is GoldenGate 12c IntRep Crash W/ ORA-600 [kghfrmrg:nxt] ;ORA-600[kghfrh:ds]; ORA 600 [17114]; [17147] (Doc ID 1666909.1)


ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

Given the ease with which we can go to the future or return to the past using GoldenGate, it becomes increasingly necessary to restore archivelogs from backup, sometimes several days' worth.
To do that we generally need a lot of disk space, and that is when the search for storage disks begins.

After finding a disk, it needs to be mounted, which I did with the default mount options on AIX.
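For illustration, the kind of NFS mount options Oracle expects for database files looks like the line below. Treat it as an assumption to validate against MOS for your exact platform and release (the server name and paths are placeholders), not a definitive setting:

# NFS mount with the Oracle-required options (illustrative server/paths)
mount -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,vers=3,timeo=600 \
      nfsserver:/export/arch /mnt/arch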


ERROR OGG-02636 when creating an integrated extract in GoldenGate 12c on a Pluggable Database 12c

While creating an integrated extract in GoldenGate 12c on a Pluggable Database 12c, I came across the following error, stating that a catalog name is mandatory and was not being provided.

ERROR OGG-02636 Oracle GoldenGate Capture for Oracle, ext1.prm: The TABLE specification 'TABLE table_name' for the source table 'table_name' does not include a catalog name. The database requires a catalog name.

There are two ways to solve this case. The first, though less recommended, is to add the name of the pluggable database (the catalog) before the owner name in the table mappings, for example:

GGSCI (host1.net) 1> edit param ext1

--Tables
TABLE PDB_NAME.SCHEMA_OWNER.TABLE_NAME;

Not really happy with this solution, and after long hours of searching without any other result, our friend Maiquel DC pointed out a parameter that sets the catalog name for all tables in the Extract.

Add the following parameter to the Extract configuration file:

GGSCI (host1.net) 1> edit param ext1

--Parameters
SOURCECATALOG PDB_NAME
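With SOURCECATALOG in place, the TABLE statements that follow it can stay in the usual two-part form, since the parameter supplies the catalog for all of them. A sketch with the same placeholder names:

--Parameters
SOURCECATALOG PDB_NAME
--Tables (no catalog prefix needed anymore)
TABLE SCHEMA_OWNER.TABLE_NAME;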


That's all folks.
Dieison.

nc -l – Starting up a fake service

Hi everyone!

Recently I faced a situation that made me discover a very nice and useful command, which helped me a lot and, I hope, will help you guys as well. It's named:

nc

Situation: we have an environment replicated from one datacenter to another (using GoldenGate), where the ETL happens. So basically it is:

Datacenter 1 (root data)

Replicates to datacenter 2 (transforming the data)

that replicates to datacenter 3 (production itself)

In datacenter 2, we have a Data Guard setup configured. So then came the questions:

  • What if we need to do a switchover to the standby environments?
  • Are we going to have everything we need properly set up for the replication?
  • How are we going to test the ports if nothing is up there yet? Aren't we going to get "connection refused"?

Calm down! There is a very nice workaround for this.

All you need to do is install the nc command as root (if it is not installed already):

yum install nc

Then execute it as follows, on the server you want to test:

nc -l

example:

I want to make sure that on the standby server the port 7809 (the GoldenGate MANAGER port) is open. On the standby server you run:

nc -l 7809

Then, from a remote server, you are going to be able to connect through a simple telnet command:

telnet server.domain port

example:

telnet standby.company.com 7809


IN PRACTICE:

  • Try the telnet from the remote server to the standby:

remoteserver {/home/oracle}: telnet standby.server 7809

Trying 192.168.0.10...

telnet: connect to address 192.168.0.10: Connection refused

  • Then we start the fake service on the standby server!

standby.server {/home/oracle}: nc -l 7809

  • And try the telnet again:

remoteserver {/home/oracle}: telnet standby.server 7809

Trying 192.168.0.10...

Connected to standby.server.

Escape character is '^]'.
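A side note: if telnet is not available on the remote server, nc itself can act as the client. Assuming your nc build supports the -z (scan-only) and -v (verbose) flags:

remoteserver {/home/oracle}: nc -zv standby.server 7809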


Cheers!

Rafael.

EJB Exception in method: ejbPostCreate: java.sql.SQLException: XA error: XAResource.XAER_RMFAIL start() failed on resource 'ggds-datasource_domain': XAER_RMFAIL : Resource manager is unavailable

Some incidents that we face are expected. Usually we expect problems when something changes in an environment.
But sometimes, for no apparent reason and with no systemic change, we encounter errors where our first reaction is: what the f***!?

This time we found a Java exception in a standard domain for GoldenGate Director.

For months the application behaved stable and functional, until it failed for no apparent reason.

When I saw the part of the exception reading "XAER_RMFAIL: Resource manager is unavailable", I went straight to one of the best DBAs I know, Matheus Boesing, to request a check on the resource manager of the database (no problem found).

...then we fell back into a bug: Bug 11672297 - ORA-01092 MAPPED TO XAER_RMERR instead of XAER_RMFAIL (Doc ID 1329800.1).

In version 12.1 this bug is fixed, but as a palliative solution you can do the following:

Increase the value of Maximum Duration of XA Calls in the JTA configuration of the WebLogic domain; the default value is 12000 and, in my case, I adjusted it to 48000.
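This can be changed in the Administration Console (the domain's Configuration > JTA page) or scripted with WLST. A minimal sketch, where the domain name, host and credentials are placeholders:

# WLST sketch: raise MaxXACallMillis in the domain's JTA configuration
connect('weblogic', '<password>', 't3://adminhost:7001')
edit()
startEdit()
cd('/JTA/ggds_domain')            # hypothetical domain name
set('MaxXACallMillis', 48000)     # default is 12000
save()
activate()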

The problem was solved, at least for now.
Dieison.

OGG-0352: Invalid character for character set UTF-8 was found while performing character validation of source column

Almost a month without post!
My bad, December is always a crazy time for DBAs, right?

This post's title error happens when the charset differs between the databases involved in replication through GoldenGate, and it occurs only with alphanumeric columns (CHAR, VARCHAR, VARCHAR2): even when the char length is the same, the data length will be different (like I explained here). Take a look at this example:

sourcedb> select table_name,column_name,data_length,char_length from dba_tab_cols where column_name='NAME' order by 1,2,3;

TABLE_NAME    COLUMN_NAME     DATA_LENGTH CHAR_LENGTH
------------- --------------- ----------- -----------
TABLE_EXAMPLE NAME            25          25

destdb> select table_name,column_name,data_length,char_length from dba_tab_cols where column_name='NAME' order by 1,2,3;

TABLE_NAME    COLUMN_NAME     DATA_LENGTH CHAR_LENGTH
------------- --------------- ----------- -----------
TABLE_EXAMPLE NAME            100         25


There are basically two solutions:
1) Change one of the charsets.
2) Add the "SOURCECHARSET PASSTHRU" clause to the Replicat parameter file.

I usually prefer the second option, just because it's less intrusive than the first.
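A minimal sketch of option 2, with the Replicat name and mapping as placeholders (SOURCECHARSET must come before the MAP statements it affects):

-- Replicat parameter file (edit param rep1)
REPLICAT rep1
USERID gg_admin, PASSWORD *****
SOURCECHARSET PASSTHRU
MAP SCHEMA_OWNER.TABLE_EXAMPLE, TARGET SCHEMA_OWNER.TABLE_EXAMPLE;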


See ya!
Matheus.