Charsets: Single-Byte vs Multibyte Encoding Scheme Issue

Sad history:

IMP-00019: row rejected due to ORACLE error 12899
IMP-00003: ORACLE error 12899 encountered
ORA-12899: value too large for column "SCHEMA"."TABLE"."COLUMN" (actual: 61, maximum: 60)

To understand: It happens when the export/import is being made by different charsets. Usually when the destination is a superset with “multibyting” and the source is a single-byte one. The reason is that as more as the charset is not specific, more bits are used to represent a charcter (c-ç, a-ã, o-õ-ô, for example), this way, the columns that uses as data length byte will be different sized between theese databases.

Of course, as more specific a charset configuration is, much better for performance constraints it’ll be (specially for sequencial reads), because the databases needs to work with less bytes in datasets/datablocks for the same tuples, in a simple way to explain. Otherside, this is a quite specific configuration. The performance issues are mostly related to more simple tunings (sql access plan, indexing, statistics or solution architecture) than this kind of details. But, it’s important to mention if you’re working in a database that is enough well tuned…

For more information, I recommend this (recent) documentation: https://docs.oracle.com/database/121/NLSPG/ch2charset.htm. Please, invest your time to understand the relation between “Single-Byte Encoding Schemes” and “Multibyte Encoding Schemes” in this doc.

The follow image ilustrates in a simple way the difference of byting used to address more characters (a characteristic of supersets):

nls81023

Ok, doke!
And the solution is…

Let’s summarize the problem first: The char (char, varchar) columns uses more bytes to represent the same characters. So the situations where, in the source, the column was used by the maximum lengh, it “explodes” the column lengh in the destination database with a multibyte encoding scheme.
For consideration, I’m not using datapump (expdp/impdp or impdb with networklink) just because it’s a legacy system with long columns. Datapump doesn’t support this “deprecated” type of data.
So, my solution, for this pontual problem occouring during a migration was to change the data lengh of the char columns from “byte” to “char”. This way, the used metric is the charchain rather than bytesize. Here is my “kludge” for you:

select 'ALTER TABLE '||owner||'.'||TABLE_NAME||' MODIFY '||COLUMN_NAME||' CHAR('||data_length||' CHAR);'
from dba_tab_cols where DATA_TYPE='CHAR' and owner='&SCHEMA'
union all
select 'ALTER TABLE '||owner||'.'||TABLE_NAME||' MODIFY '||COLUMN_NAME||' VARCHAR2('||data_length||' CHAR);'
from dba_tab_cols
where DATA_TYPE='VARCHAR2' and owner='&SCHEMA';

 

And it works!
Hugs and see ya!
Matheus.

Advertisements

2 thoughts on “Charsets: Single-Byte vs Multibyte Encoding Scheme Issue

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s