I noticed a strange coincidence between a GoldenGate Director monitoring failure and the GoldenGate Manager messages.
During GoldenGate operation, it kept appending the never-ending failure messages below; however, no GoldenGate process changed to "ABENDED" status:
2016-10-26 08:47:28 ERROR OGG-01224 Oracle GoldenGate Command Interpreter for Oracle: Bad file number.
2016-10-26 08:47:29 ERROR OGG-01668 Oracle GoldenGate Command Interpreter for Oracle: PROCESS ABENDING.
It's caused by the size of the GoldenGate log file (ggserr.log), so correct it with this:
grepora-gg@machine oracle$ cp ggserr.log ggserr.log-err-temp-log && > ggserr.log
It's a good idea to stop the Manager process (if possible) before truncating the log file.
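The backup-and-truncate step can be sketched as below. The paths here are hypothetical (a temp directory stands in for the GoldenGate home so the sketch is self-contained); adjust to your environment:

```shell
#!/bin/sh
# Sketch: back up ggserr.log, then empty the live file in place.
# A temp dir stands in for the GoldenGate home in this example.
GG_HOME=$(mktemp -d)
LOG="$GG_HOME/ggserr.log"

printf 'ERROR OGG-01224 ...\n' > "$LOG"   # simulate an existing, grown log

# Keep a dated copy, then truncate the original WITHOUT replacing its
# inode (important: running processes keep the file descriptor open,
# so 'rm + recreate' would not free the handle they write to).
cp "$LOG" "$LOG.$(date +%Y%m%d)" && : > "$LOG"

ls -l "$GG_HOME"
```

Note the `: > file` form: it truncates the existing file rather than unlinking it, which is why it is safe even if you cannot stop Manager first.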
Keep this in mind 😉
Recently I caught a bunch of connections in 'TIME_WAIT' on a MySQL server through 'netstat -antp | grep 3306'…
After some time, we identified this was caused by the environment not using DNS, only fixed IPs (uuugh!)…
As you know, for security measures MySQL maintains a host cache for connections established. From MySQL docs:
“For each new client connection, the server uses the client IP address to check whether the client host name is in the host cache. If not, the server attempts to resolve the host name. First, it resolves the IP address to a host name and resolves that host name back to an IP address. Then it compares the result to the original IP address to ensure that they are the same. The server stores information about the result of this operation in the host cache. If the cache is full, the least recently used entry is discarded.”
— MySQL Reference Manual: "DNS Lookup Optimization and the Host Cache"
For this reason, there was a DNS reverse lookup for each login, and it was hanging these connections.
Right way: add an A record in DNS for the hosts. Use DNS!
Quick way: add the mappings for the connecting hosts to /etc/hosts on the database server, avoiding the DNS lookup.
Quicker way: set the skip-name-resolve variable in /etc/my.cnf. This disables hostname resolution at the database layer for new connections and solves the problem.
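For reference, the "quicker way" boils down to a one-line change in the server section of my.cnf (a sketch; your file layout may differ):

```ini
[mysqld]
# Skip reverse-DNS lookups for new client connections.
# Caveat: grants must then be defined by IP address (or %),
# since host names can no longer be resolved for matching.
skip-name-resolve
```

Keep the caveat in mind before flipping it on a server whose GRANTs use host names.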
This is a good post (in Portuguese) about it: http://wagnerbianchi.com/blog/?p=831
A while ago I ran into some performance issues on AIX, working with instances with different configurations (proc/mem). The root cause was basically an inefficient network configuration for the interconnect (UDP).
As you know, UDP is a connectionless, no-acknowledgment protocol (and, for that reason, carries less metadata and is faster). By default, every server has a buffer pool to send UDP (and TCP) messages and another to receive them.
In my situation, since one instance was 'inferior', its pools were automatically set smaller, and this was causing high interconnect block-send statistics from the other instances. In fact, there were lots of resends caused by overflows on this smaller instance…
Here is one way to evaluate how much UDP loss you are having on your AIX server:
netstat -s | grep 'socket buffer overflows'
If you are seeing a considerable number of overflows, it's recommended to re-evaluate the size of your udp_recvspace. And, of course, keep the related pool sizes calculated consistently.
Oracle recommends, at least:
tcp_recvspace = 65536
tcp_sendspace = 65536
udp_sendspace = ((DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB) but no lower than 65536
udp_recvspace = 655360 (Minimum recommended value is 10x udp_sendspace, parameter value must be less than sb_max)
rfc1323 = 1
sb_max = 4194304
ipqmaxlen = 512
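Plugging numbers into the udp_sendspace formula above is straightforward; here is a small sketch in shell (the block size and multiblock read count are example values — check db_block_size and db_file_multiblock_read_count in your own instance):

```shell
#!/bin/sh
# Example instance parameters (check your own database's values).
DB_BLOCK_SIZE=8192
DB_FILE_MULTIBLOCK_READ_COUNT=16

# udp_sendspace = (DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT) + 4 KB,
# but never lower than 65536.
UDP_SENDSPACE=$(( DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT + 4096 ))
[ "$UDP_SENDSPACE" -lt 65536 ] && UDP_SENDSPACE=65536

# Minimum recommended udp_recvspace is 10x udp_sendspace
# (and it must stay below sb_max).
UDP_RECVSPACE=$(( UDP_SENDSPACE * 10 ))

echo "udp_sendspace=$UDP_SENDSPACE udp_recvspace=$UDP_RECVSPACE"
```

With an 8 KB block size and a multiblock read count of 16, that gives udp_sendspace=135168 and udp_recvspace=1351680 — comfortably under the sb_max of 4194304 above.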
These and other details about configuring RAC on AIX can be found in the note RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1).
I’d recommend you take a look too.
Have a nice day!
Sometimes it's necessary to restart a GoldenGate process, and after starting the GG Extract, it takes a long time in 'In recovery' status.
It's an interesting subject, and more about it can be found here (but first, read below 🙂).
GGSCI (greporagg) 16> send EXT status
EXTRACT EXT (PID 23068830)
Current status: In recovery: Processing data
Current read position:
Redo thread #: 2
Sequence #: 4246
Timestamp: 2016-10-08 07:32:36.000000
Current write position:
Sequence #: 29295
Timestamp: 2016-10-14 17:59:43.476624
Extract Trail: ./dirdat/TR
So let's check how to find the transaction:
GGSCI (greporagg) 17> send EXT showtrans
Sending SHOWTRANS request to EXTRACT EXT ...
Redo Thread: 4
Start Time: 2016-10-08:07:33:51
SCN: 1658.1839293825 (7122895070593)
Redo Seq: 3388
Redo RBA: 224131088
On the database (dark) side:
SQL> select s.sid
from gv$transaction t
inner join gv$session s on t.addr = s.taddr
order by start_time asc;
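The query above can be extended to show when each transaction started and how much undo it still holds, which helps estimate how far back the Extract has to read. This is a sketch using standard gv$ views (column choice is mine, not from the original post):

```sql
-- Sketch: oldest open transactions that Extract may still need to recover.
-- Requires SELECT privilege on gv$transaction / gv$session.
select s.inst_id,
       s.sid,
       s.serial#,
       t.start_time,
       t.used_ublk          -- undo blocks still held by the transaction
  from gv$transaction t
  join gv$session s
    on t.addr    = s.taddr
   and t.inst_id = s.inst_id
 order by t.start_time asc;
```

The oldest start_time here should line up with the "Start Time" reported by SEND EXT SHOWTRANS above.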
It's not necessarily a DBA function, but how often does someone from the business come and ask you what the impact would be of recompiling one procedure or another?
It probably happens because the DBA usually works some magic and has a better understanding of object relationships. It happens especially in cases where there is no code governance…
So, you don't have to handle all the responsibility yourself; you can share some of it with the developers through the DBA_DEPENDENCIES view.
The concept is easy: there are dependent objects and referenced objects. If you change a referenced object, all dependent objects will be impacted.
col owner for a18
col name for a35
col type for a10
col referenced_owner for a18
col referenced_name for a35
col referenced_type for a10
select owner,name,type,dependency_type,referenced_type,referenced_owner,referenced_name from dba_dependencies
where referenced_owner like upper('%&owner%') and referenced_name like upper('%&OBJ_NAME%');
Enter value for owner: GREPORA
Enter value for obj_name: TABLE_EXAMPLE
OWNER              NAME                                TYPE       DEPE REFERENCED REFERENCED_OWNER   REFERENCED_NAME
------------------ ----------------------------------- ---------- ---- ---------- ------------------ -----------------------------------
GREPORA            TOTALANSWEREDQUESTIONS              FUNCTION   HARD TABLE      GREPORA            TABLE_EXAMPLE
GREPORA            USERRESPONSESTATUS                  FUNCTION   HARD TABLE      GREPORA            TABLE_EXAMPLE
GREPORA            VW_INPROGRESSFEEDBACKOPTS           VIEW       HARD TABLE      GREPORA            TABLE_EXAMPLE
GREPORA            EVENTSTARTDT                        FUNCTION   HARD TABLE      GREPORA            TABLE_EXAMPLE
GREPORA            HAVEUSERANSWEREDANYTHING            FUNCTION   HARD TABLE      GREPORA            TABLE_EXAMPLE
This week, I had an unexpected stop on a Weblogic server. After starting the server again, it played a trick: it refused any telnet request on the managed server port, even from localhost; however, it had started successfully.
It's easy to resolve:
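Before resolving it, reproducing the symptom is easy to script. A quick TCP check equivalent to the telnet test can be sketched in bash (host and port below are placeholders for the managed server's listen address):

```shell
#!/bin/bash
# Sketch: check whether a TCP port accepts connections, as telnet would.
# The host/port used here are placeholders, not the real managed server.
check_port() {
  local host=$1 port=$2
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    exec 3>&-          # close the test connection
    echo open
  else
    echo closed
  fi
}

check_port 127.0.0.1 1   # port 1: almost certainly nothing listening
```

The `/dev/tcp` device is a bash built-in feature, handy on servers where telnet or nc is not installed.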
When trying to start or increase the parallelism of the managed recovery process (MRP) on a Data Guard standby:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
ERROR at line 1:
ORA-01153: an incompatible media recovery is active
It simply happens because there is already a recovery process running; let's check:
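A quick way to confirm is from the standby itself: v$managed_standby shows whether an MRP slave is already active and what it is applying (a sketch; the exact columns you care about may vary):

```sql
-- Check for an already-running managed recovery process (MRP0)
-- before starting or changing recovery options.
select process, status, thread#, sequence#
  from v$managed_standby
 where process like 'MRP%';
```

If a row comes back, cancel recovery first (ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;) and then restart it with the options you want.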
Sometimes we need to connect to the database using an unusual schema whose password we don't even know, maybe because it was created through a script in a release, or the previous DBA never stored it anywhere public, or for any other reason (whatever, right? You just need to connect there, end of story). Anyhow, you need to log in using this schema specifically (to create/delete synonyms, dblinks, jobs, etc.). How would you do that without the password?
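One common answer (offered here as a suggestion, not necessarily the technique this post goes on to describe) is proxy authentication: you connect with your own credentials on behalf of the target schema, so its password never enters the picture. The user names below are hypothetical:

```sql
-- Hypothetical users: let DBA user SCOTT_DBA connect as schema APP_OWNER
-- without knowing APP_OWNER's password.
alter user app_owner grant connect through scott_dba;

-- Then connect as:   sqlplus scott_dba[app_owner]/scott_dba_password
-- And revoke it when the work is done:
alter user app_owner revoke connect through scott_dba;
```

The session runs with APP_OWNER's identity and privileges, and the grant is auditable and revocable, unlike temporarily resetting the schema's password.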