Monday, October 10, 2011

Edit your export dmp-file: never?

Don't!


Every expert in the field will tell you to never, ever edit your dump file.
And rightfully so. Unless you really are caught between a rock and a hard place, and think you know what you are doing. I think I know.

Fair warning: you are about to see stunts that should not be attempted at home. Do not perform these actions without supervision of experienced personnel.
And never use a text editor.

Background.


The scenario is this: you are under a time constraint (deadlines!). You have a dump file, exported from a WE8ISO8859P1 database, using .WE8ISO8859P1 as character set part of the NLS_LANG setting.
You are about to import this into a WE8MSWIN1252 database.
Now, with the latest "smart" additions to exp and imp, you are bound to get "Warning: possible character set conversion" in you log file.
Either because you use WE8ISO8859P1, and imp recognizes the dump file uses that, too, but the database isn't, or because you use WE8MSWIN1252 for NLS_LANG, and imp recognizes your dump file uses WE8ISO8859P1.
Both ways, your Euro ('€') character will become "unmapped" or "unknown": 0xBF (191 decimal): '¿'

Solution


There's one Quick and Dirty way to get rid of this: edit your dump file, and change the header record, that defines the character set used.
Bytes 2 and three define the character set used. Use this to find the values:
select nls_charset_id(value) id,
value Charset_name
from v$nls_valid_values
where parameter = 'CHARACTERSET'
and nls_charset_id(value) is not null
and value like 'WE8%';
That will give you a list, amongst which these entries:
ID CHARSET_NAME
---------- ----------------------------------------------------------------
2 WE8DEC
3 WE8HP
5 WE8EBCDIC37
6 WE8EBCDIC500
7 WE8EBCDIC1140
8 WE8EBCDIC285
9 WE8EBCDIC1146
10 WE8PC850
27 WE8EBCDIC1148
28 WE8PC858
31 WE8ISO8859P1
39 WE8ISO8859P9
46 WE8ISO8859P15
[snipped]
178 WE8MSWIN1252
210 WE8GCOS7

Now, let's see how this could work under MS Windows. The premium hex tool under MS Windows is debug! And as long as your dump file fits, this is feasable:

H:\werk\coters<debug pasfotos.dmp
-d
136F:0100 03 00 1F 45 58 50 4F 52-54 3A 56 30 37 2E 30 33 ...EXPORT:V07.03
136F:0110 2E 30 34 0A 44 43 4F 54-0A 52 54 41 42 4C 45 53 .04.DCOT.RTABLES
136F:0120 0A 32 30 34 38 0A 30 0A-00 20 20 20 20 20 20 20 .2048.0..
136F:0130 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0140 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0150 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0160 20 20 20 20 20 20 20 20-57 65 64 20 44 65 63 20 Wed Dec
136F:0170 31 33 20 31 32 3A 32 38-3A 35 35 20 32 30 30 30 13 12:28:55 2000
-e0102
136F:0102 1F.B2
-w
Writing 0E800 bytes
-q

What you can see here, is opening a (yes, small) export file, called pasfotos.dmp, in debug. I then dump (command: d) 8 lines. Line 1 contains 0x03, 0x00, 0x1f.
Bytes 2 and 3 are the character set identifiers: 0x001f translates into 31 decimal, or: WE8ISO8859P1.
That is correct, this over 10 years old file was dumped from a WE8ISO8859P1 database.
I can also edit this file: e0102 (edit position 0102). debug prompts the old value (1F), and allows me to enter a new (B2) value.
I then write back the file (command w), and quit debug (q). I can open the same file again, to show you the change:

H:\werk\coters>debug pasfotos.dmp
-d
136F:0100 03 00 B2 45 58 50 4F 52-54 3A 56 30 37 2E 30 33 ...EXPORT:V07.03
136F:0110 2E 30 34 0A 44 43 4F 54-0A 52 54 41 42 4C 45 53 .04.DCOT.RTABLES
136F:0120 0A 32 30 34 38 0A 30 0A-00 20 20 20 20 20 20 20 .2048.0..
136F:0130 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0140 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0150 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
136F:0160 20 20 20 20 20 20 20 20-57 65 64 20 44 65 63 20 Wed Dec
136F:0170 31 33 20 31 32 3A 32 38-3A 35 35 20 32 30 30 30 13 12:28:55 2000
-q

Obviously, 0xB2 is 178 decimal, and that is the characterset id for MSWIN1252.

My file is much larger, I don't like debug


or whatever. Luckily, there is this nifty app, that allows for larger files to be hacked. Usage is simple:

dmp2utf8 <filename> [charset id]
Default is UTF8, or charset id 871 (0x0367). There are some othe nice gems on this site, www.dbatools.net.

On Unix, or Linux, you would use hexedit, or even sed.

2 comments:

Anonymous said...

>>"debug prompts the old value (1F), and allows me to enter a new (BF) value."

It seems that it should read "debug prompts the old value (1F), and allows me to enter a new (B2) value."

Nice post Frank (It reminds me of my old and forgotten assembler knowledge).

Cheers.

Carlos.

Frank said...

My only excuse is it was quite late, and then some personal sorrow.
Corrected, thanks for spotting & the thumbs up!