OGG-0352: Invalid character for character set UTF-8 was found while performing character validation of source column

Almost a month without post!
My bad, december is always a crazy time to DBAs, right?

This post’s title error happens because the charset is different between databases used on replication thought GoldenGate and occurs only with alphanumerical columns (CHAR, VARCHAR, VARCHAR2), because, even the char length be the same, the data length will be different (like I explained here). Take a look in this example:

sourcedb> select table_name,column_name,data_length,char_length from dba_tab_cols where column_name=’NAME' order by 1,2,3;
 
TABLE_NAME    COLUMN_NAME     DATA_LENGTH CHAR_LENGTH
------------- --------------- ----------- -----------
TABLE_EXAMPLE NAME            25          25
destdb> select table_name,column_name,data_length,char_length from dba_tab_cols where column_name=’NAME' order by 1,2,3;

TABLE_NAME  COLUMN_NAME     DATA_LENGTH CHAR_LENGTH
------------- --------------- ----------- -----------
TABLE_EXAMPLE NAME            100         25

 

There is basically two solutions:
1) Change one of those charsets.
2) Add “SOURCECHARSET PASSTHRU” clause on the replicat file.

I usually prefer the second option, just because it’s less intrusive than number 1.

 

See ya!
Matheus.

Advertisements

Charsets: Single-Byte vs Multibyte Encoding Scheme Issue

Sad history:

IMP-00019: row rejected due to ORACLE error 12899
IMP-00003: ORACLE error 12899 encountered
ORA-12899: value too large for column "SCHEMA"."TABLE"."COLUMN" (actual: 61, maximum: 60)

To understand: It happens when the export/import is being made by different charsets. Usually when the destination is a superset with “multibyting” and the source is a single-byte one. The reason is that as more as the charset is not specific, more bits are used to represent a charcter (c-ç, a-ã, o-õ-ô, for example), this way, the columns that uses as data length byte will be different sized between theese databases.

Of course, as more specific a charset configuration is, much better for performance constraints it’ll be (specially for sequencial reads), because the databases needs to work with less bytes in datasets/datablocks for the same tuples, in a simple way to explain. Otherside, this is a quite specific configuration. The performance issues are mostly related to more simple tunings (sql access plan, indexing, statistics or solution architecture) than this kind of details. But, it’s important to mention if you’re working in a database that is enough well tuned…

For more information, I recommend this (recent) documentation: https://docs.oracle.com/database/121/NLSPG/ch2charset.htm. Please, invest your time to understand the relation between “Single-Byte Encoding Schemes” and “Multibyte Encoding Schemes” in this doc.

The follow image ilustrates in a simple way the difference of byting used to address more characters (a characteristic of supersets):

nls81023

Ok, doke!
And the solution is…

Let’s summarize the problem first: The char (char, varchar) columns uses more bytes to represent the same characters. So the situations where, in the source, the column was used by the maximum lengh, it “explodes” the column lengh in the destination database with a multibyte encoding scheme.
For consideration, I’m not using datapump (expdp/impdp or impdb with networklink) just because it’s a legacy system with long columns. Datapump doesn’t support this “deprecated” type of data.
So, my solution, for this pontual problem occouring during a migration was to change the data lengh of the char columns from “byte” to “char”. This way, the used metric is the charchain rather than bytesize. Here is my “kludge” for you:

select 'ALTER TABLE '||owner||'.'||TABLE_NAME||' MODIFY '||COLUMN_NAME||' CHAR('||data_length||' CHAR);'
from dba_tab_cols where DATA_TYPE='CHAR' and owner='&SCHEMA'
union all
select 'ALTER TABLE '||owner||'.'||TABLE_NAME||' MODIFY '||COLUMN_NAME||' VARCHAR2('||data_length||' CHAR);'
from dba_tab_cols
where DATA_TYPE='VARCHAR2' and owner='&SCHEMA';

 

And it works!
Hugs and see ya!
Matheus.

Acentuação no SQLPLUS com Windows em inglês

Hoje perdi tempão pra fazer funcionar a acentuação no SQLPLUS com Windows em inglês. Resolvi compartilhar.
E não. Não basta escolher o idioma certo na instalação do client. Depende realmente do idioma do SO.

Cenário:
Server Windows com client Oracle 11.2 x64 instalado. Necessário usar esta versão do OracleClient visando reconhecimento de scan listeners. Restrições da situação demandavam utilização do sqlplus, obrigatoriamente.

Convém destacar:
– O escopo deste post é fazer com que os dados sejam ‘escritos’ corretamente no BD, e não corrigir dados que já estão gravados incorretamente.
– O escopo deste post não é fazer a conversão de charset do BD, e sim a configuração de nls_lang do client.
– Antes de reclamar que não tá funcionando, dá uma olhadinha no bônus, lá no final do post.

Basicamente, concluí que a melhor configuração para resolver o problema consiste em:
1) Setar o CHCP para 1252 (código de página)
2) Setar variável NLS_LANG do server para BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252
3) Setar NLS_LANG do Oracle para BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252

OBS: Funciona com outras combinações, mas optei pelo BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252 pois segundo a Oracle (http://docs.oracle.com/html/B13804_02/gblsupp.htm) é o que temos como ‘oficial’ pro “Português (Brazil).”

Vale lembrar que esta configuração pode ser feita diretamente via prompt de comando antes da utilização do sqlplus. Aí o problema já estaria resolvido (:D).
Até aqui o Eduardo Legatti me ajudou 100%. Mas agora pra que a configuração seja permanente no servidor, como faz?

Faz assim:

1)  Configuração CHCP para 1252 por Default:
– Run > regedit

Em [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]:
– Setar o (Default) para c_1252.nls

Default CHCP

–  SetarOEMCP para 1252

set OEMCP

2) Configurar NLS_LANG do Oracle:

Em [HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_]:
– Setar NLS_LANG para BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252

NLS_LANG Oracle
3) Configurar o NLS_LANG como variável de ambiente:

Em [My Computer > Properties > Advanced System Settings > Environment Variables]:
– Setar o NLS_LANG para BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252

set OEMCP
4) Reboot do Servidor:

– Para efetivar mudanças no REGEDIT é necessário reiniciar o server.

Era isso!
Espero que seja útil para mais alguém, assim como foram úteis pra mim os links abaixo:

Nota: Greek Characters Are Not Displayed Correctly In A DOS Window (Doc ID 208902.1)
http://www.leniel.net/2010/03/oracle-nlslang-character-set-encoding.html#sthash.KPuB7Rtg.YxcOtWFd.dpbs
http://docs.oracle.com/html/B13804_02/gblsupp.htm
http://nagaappani.blogspot.com.br/2013/05/setting-nlslang-on-windows-for-oracle.html
http://ss64.com/nt/chcp.html
http://www.fabioprado.net/2012/11/configurando-national-language-support.html
http://tosemopcao.blogspot.com.br/2009/11/bd-oracle-acentos-no-10g-xe.html
http://eduardolegatti.blogspot.com.br/2011/01/habilitando-o-suporte-acentuacao-no.html

Bônus)
Embora as configurações estejam adequadas, a fonte (tipo de letra) do DOS (prompt de comando) pode não suportar os caracteres especiais e acentos. Basta trocar pra “Lucida Console”:

Fonte_Caracteres

Fechado?

Até breve!