Monday, September 21, 2009

Sun breaks gethostbyname(), nscd.

Thanks, Sun!

gethostbyname(3) is supposed to return the canonical hostname in the h_name field of the struct hostent that it returns a pointer to. It is supposed to return any aliases (e.g. CNAMEs in DNS) in the h_aliases field.
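To see which name lands in which field on a given machine, a small C helper along these lines should do. This is a minimal sketch, not anything from Sun: print_hostent and show_lookup are names I made up for illustration, and on Solaris 10 it should need -lnsl at link time.

```c
#include <netdb.h>
#include <stdio.h>

/* Hypothetical helper: print h_name and every h_aliases entry.
 * Returns the number of aliases printed. */
int print_hostent(FILE *out, const struct hostent *he)
{
    int count = 0;

    fprintf(out, "h_name: %s\n", he->h_name ? he->h_name : "(null)");
    for (char **alias = he->h_aliases; alias != NULL && *alias != NULL; alias++) {
        fprintf(out, "alias:  %s\n", *alias);
        count++;
    }
    return count;
}

/* Hypothetical helper: resolve `name` and dump the result.
 * Returns 0 on success, -1 if the lookup fails. */
int show_lookup(const char *name)
{
    struct hostent *he = gethostbyname(name);

    if (he == NULL) {
        fprintf(stderr, "%s: lookup failed (h_errno=%d)\n", name, h_errno);
        return -1;
    }
    print_hostent(stdout, he);
    return 0;
}
```

Calling show_lookup() on a name you know is a CNAME should print the real hostname after "h_name:" and the CNAME after "alias:" on a correctly behaving system.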

Sun recently released a buggy version of /lib/nss_dns.so.1, which is used by nscd. (If nscd is running, all calls to gethostbyname(3) go through it.) The library is part of patch 140391-02 (for SPARC) or 140392-02 (for x86). The -03 versions of these patches also have the problem. The patch is part of the most recent Recommended Patch cluster, and it is included in the Solaris 10 u7 release.

This patch messes up the return from gethostbyname(3), so that when you look up a CNAME, the CNAME goes into the h_name field and the actual canonical name goes into the h_aliases field.

This breaks anything that uses gethostbyname(3) and actually expects the h_name field to contain the canonicalized hostname. (At work, we found the bug because certain software wouldn't start right -- because the start script compares the local hostname to the result of a lookup of a CNAME, and that no longer worked right.)
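Our start script is not C, but the same kind of check can be sketched in C. The version below is an illustration under my own assumptions, not the actual script: hostent_has_name and service_is_local are hypothetical names, and it assumes gethostname() returns the name in the same form (short or fully qualified) that the resolver hands back. Unlike the script, it looks in both h_name and h_aliases, so it gives the same answer whether or not the two fields have been swapped.

```c
#include <netdb.h>
#include <stdio.h>
#include <strings.h>   /* strcasecmp */
#include <unistd.h>    /* gethostname */

/* Hypothetical helper: return 1 if `name` matches h_name or any
 * h_aliases entry (hostnames compare case-insensitively), else 0. */
int hostent_has_name(const struct hostent *he, const char *name)
{
    if (he->h_name != NULL && strcasecmp(he->h_name, name) == 0)
        return 1;
    for (char **alias = he->h_aliases; alias != NULL && *alias != NULL; alias++) {
        if (strcasecmp(*alias, name) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: return 1 if `service_name` (e.g. a CNAME)
 * resolves to this machine, 0 if it does not, -1 on error. */
int service_is_local(const char *service_name)
{
    char local[256];
    struct hostent *he;

    if (gethostname(local, sizeof local) != 0)
        return -1;
    local[sizeof local - 1] = '\0';   /* gethostname may not terminate on truncation */

    he = gethostbyname(service_name);
    if (he == NULL)
        return -1;
    return hostent_has_name(he, local);
}
```

A check that compares only h_name, which is what our script effectively did, is the one that fails under this patch.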

Note that this bug persists even if you have hosts caching turned off in nscd.conf.

The simple workaround is to turn off nscd (by using svcadm disable name-service-cache). This can cause some serious slowdowns if you have a lot of name lookups (e.g. directories that contain lots of different users and groups). I measured a slowdown of a factor of 7.7 doing 'ls -l' on a directory containing 150 files, each owned by a different user and group. (It was a local directory, and I redirected the output to /dev/null, so I believe I limited confounding factors.)

If you don't want to turn off nscd, your only other choice (until a real patch is released) is to ask Sun for their IDR ("Interim Diagnostics and Relief") pseudo-patch for this, which is IDR142516-01 for SPARC or IDR142517-01 for x86. This will require a Sun service contract.

Feh.
