Friday, November 19, 2010

ORA-12641 "Authentication service failed to initialize" on multi homed machines

Update: the TMPDIR environment variable gets picked up; the file will be created in that directory. A technical note will be released, as this is not documented anywhere.
I have been struggeling to get Kerberos Authentication running on multi homes machines at my clients site. I just could not figure out why one package (HP-speak for virtual machines) works just fine, where the next errs off with a dreaded "ORA-12641: Authentication service failed to initialize".
Mind you - both using the same keytab and Kerberos configuration files, and similar sqlnet.ora files.

Go trace, young man, go trace!


Today I had the awakening moment to enable server tracing on the database side: added the following code to sqlnet.ora:
trace_level_server = 32
trace_file_server = svr_trc
trace_directory_client = $TNS_ADMIN/trace

Sure enough, comparing the package with errors against the one that validated me just fine using Kerberos, gave me some idea:

[19-NOV-2010 12:46:38:931] nauk5ra_rcinit: Validating existing replay cache.
[19-NOV-2010 12:46:38:931] snauk5g_open_file: entry
[19-NOV-2010 12:46:38:931] snauk5g_open_file: Setting write lock.
[19-NOV-2010 12:46:38:931] snauk5g_open_file: Opening /var/tmp/kerb5srv.RC.
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: entry
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: Setting exclusive lock.
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: exit
[19-NOV-2010 12:46:38:931] snauk5g_open_file: exit
[19-NOV-2010 12:46:38:931] nauk5rz_validate: entry
[19-NOV-2010 12:46:38:931] nauk5rz_validate: exit
[19-NOV-2010 12:46:38:931] snauk5t_close_file: entry
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: entry
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: Resetting lock.
[19-NOV-2010 12:46:38:931] snauk5k_lock_file: exit
[19-NOV-2010 12:46:38:931] snauk5t_close_file: exit

Now, compare that to the faulty environment:

[19-NOV-2010 12:38:56:805] nauk5ra_rcinit: Validating existing replay cache.
[19-NOV-2010 12:38:56:805] snauk5g_open_file: entry
[19-NOV-2010 12:38:56:805] snauk5g_open_file: Setting write lock.
[19-NOV-2010 12:38:56:805] snauk5g_open_file: Opening /var/tmp/kerb5srv.RC.
[19-NOV-2010 12:38:56:805] snauk5g_open_file: open failed with 13: File I/O error
.
[19-NOV-2010 12:38:56:805] snauk5g_open_file: Returning 203: File not found
.
[19-NOV-2010 12:38:56:805] snauk5g_open_file: exit
[19-NOV-2010 12:38:56:805] nauk5ra_rcinit: Could not open cache.
[19-NOV-2010 12:38:56:805] nauk5ru_create: entry
[19-NOV-2010 12:38:56:805] nauk5ru_create: Creating replay cache.
[19-NOV-2010 12:38:56:806] snauk5g_open_file: entry

This will eventually end in

[19-NOV-2010 12:38:56:807] na_csrd: exit
[19-NOV-2010 12:38:56:807] nacomer: error 12641 received from authentication service

Workaround


Now, I would not write this, if I did not need to remind myself of the following. There is a workaround until this is permanently fixed by Oracle, but it is not without risk.
The problem is on file permissions. Once the file is created, it gets 600, or read-write for owner only permissions. Change that to 666 (read-write to everybody), and you're done.
In fact, the file does not need to have any contents:

rm /var/tmp/kerb5srv.RC
touch /var/tmp/kerb5srv.RC
chmod 666 /var/tmp/kerb5srv.RC

Asof now, all packages on this machine can use Kerberos authentication.
Please note: kerb5srv is the name of the Service Principal chosen by my client overhere - your implementation may use a different name. Use that (or, trace to see what file is used)

By the way, the entry on Privacy was updated, due to the signing of this.

The Catch


Apart from the fact you may have reservations to open up a file to the world like that, the file itself resides in a temporary directory, /var/tmp.
If any system administrator decides to clean that up, the dreaded ORA-12641 will be all over the place.
Just rerun code above, and you're done.

Permanent Solution


There seems no sqlnet variable to cure this; sqlnet.KERBEROS5_CC_NAME brings nothing. Strictly speaking, it is not the Credential Cache, but the service principal cache, so that figures.
What I would like to see is one of:
  • introduce an environment variable that allows to govern location of this file
  • use a suffix, just like in the credentials cache.
Update: the TMPDIR environment variable does point one...

No comments: