I assist other departments on campus with Mac-related issues fairly regularly, since I’m one of the few sysadmins on campus who really know Mac OS X Server. The issue they were seeing (and had been seeing since they upgraded to 10.6 about 4 months ago) was that any time someone tried to log in to a client, or do really anything as a user that was part of the OD, it would take about 60 seconds to authenticate. If they used their server’s local admin account, however, it worked instantly.
Everything seemed to be running, but it just took a long time. Investigating further, everything seemed to point to Kerberos just not functioning. It was running, but kinit would take about 60 seconds to come back asking for a password. And for some reason, the REALM for the Kerberos server had been set as SERVERNAME.LOCAL, which shouldn’t be an issue in and of itself, but it was certainly not “proper”.
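To put a number on a delay like this, a small timing wrapper is enough. This is only a sketch: the helper name timed is made up for illustration, and the measured time for kinit includes however long you take to type the password, so treat it as a rough before/after comparison rather than a precise benchmark.

```shell
# timed: run a command and record its wall-clock duration in whole seconds.
# (Hypothetical helper, not part of any Apple or MIT tool.)
# The duration is stored in ELAPSED and also reported on stderr.
timed() {
  _start=$(date +%s)
  "$@"
  _status=$?
  ELAPSED=$(( $(date +%s) - _start ))
  echo "'$*' took ${ELAPSED}s (exit status $_status)" >&2
  return "$_status"
}

# Example, run on the server:
#   timed kinit diradmin
```

Running the same command before and after the rebuild gives a simple way to confirm whether the fix actually changed anything.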
So, last night I spent about 4 hours rebuilding their Kerberos setup. I mainly followed this article, but it didn’t really work as described. I’m pretty sure the missing step is that you should reboot after removing all the Kerberos info. Just restarting the services doesn’t seem to be enough.
Anyway, to add to that, for some reason kerberosautoconfig couldn’t write the edu.mit.Kerberos file in /Library/Preferences. On top of that, to really clear things out I had removed the keytab from /etc, and it wasn’t being regenerated.
I solved the first issue with “kerberosautoconfig -f /LDAPv3/127.0.0.1 -o /Users/admin/Desktop/ -r REALM.EXAMPLE.COM -m server.example.com”, which outputs the edu.mit.Kerberos file to the desktop of the admin user. I then manually copied that into /Library/Preferences. Once that was done, I was able to “touch” /etc/krb5.keytab, run “sso_util configure -r REALM.EXAMPLE.COM -f /LDAPv3/127.0.0.1 -a diradmin -v 1 all” from a root shell, and get it to populate the keytab file. A reboot later, and things were nearly working.
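Collected in one place, those steps look roughly like the sketch below. The kerberosautoconfig and sso_util invocations are the ones quoted above. The variable names, the run wrapper, and the DRY_RUN switch are additions for illustration, and the cp line is an assumption about what the manual copy amounted to. By default the script only prints the commands. Set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch of the config and keytab rebuild described above.
# REALM, KDC_HOST, and OUT_DIR are placeholders; substitute your own values.
REALM="${REALM:-REALM.EXAMPLE.COM}"
KDC_HOST="${KDC_HOST:-server.example.com}"
OUT_DIR="${OUT_DIR:-/Users/admin/Desktop}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rebuild_kerberos_config() {
  # 1. Write edu.mit.Kerberos somewhere writable instead of /Library/Preferences.
  run kerberosautoconfig -f /LDAPv3/127.0.0.1 -o "$OUT_DIR/" -r "$REALM" -m "$KDC_HOST"
  # 2. Copy the generated file into place (assumed equivalent of the manual copy).
  run cp "$OUT_DIR/edu.mit.Kerberos" /Library/Preferences/edu.mit.Kerberos
  # 3. Create an empty keytab so sso_util has something to populate.
  run touch /etc/krb5.keytab
  # 4. Repopulate the keytab; this prompts for the diradmin password.
  run sso_util configure -r "$REALM" -f /LDAPv3/127.0.0.1 -a diradmin -v 1 all
}

rebuild_kerberos_config
```

A reboot is still needed afterwards, as noted above. The script deliberately does not do that for you.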
Last step was to touch these two files in /Library/Preferences that didn’t seem to exist: edu.mit.Kerberos.kadmind.launchd and edu.mit.Kerberos.krb5kdc.launchd. Reboot again, and both kadmin and the kdc were running. kinit was instant, kadmin -p diradmin was instant. Logging into a client, or via AFP, or just WGM as diradmin was instant.
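For what it’s worth, that last step can be written as a small function that creates the two files only if they are missing. The function name and the directory argument are made up for illustration. The two file names are the ones mentioned above.

```shell
# ensure_kerberos_launchd_files: create the two launchd marker files
# in the given preferences directory if they do not already exist.
# (Hypothetical helper; the directory is passed in so it can be tried
# against a scratch directory first.)
ensure_kerberos_launchd_files() {
  prefs_dir="$1"
  for name in edu.mit.Kerberos.kadmind.launchd edu.mit.Kerberos.krb5kdc.launchd; do
    if [ ! -e "$prefs_dir/$name" ]; then
      touch "$prefs_dir/$name" || return 1
      echo "created $prefs_dir/$name"
    fi
  done
}

# Example, as root on the server:
#   ensure_kerberos_launchd_files /Library/Preferences
```

Existing files are left untouched, so running it a second time does nothing.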
While it probably was only 2 hours of work, it took me 4 because I really didn’t want to reboot before recreating the Kerberos info, like the article said, yet for some reason, it just doesn’t work right if you don’t throw a reboot in there. =/
Your mileage may vary, but I’ll chalk this one up to a partially failed 10.6 upgrade. While it worked fine upgrading our OD Master, for some reason, it just hosed Kerberos for this department (even though, like I said, it LOOKED fine… it just didn’t work).
I love helping other departments out, and I’m especially glad my boss actually encourages/demands it.
But the real icing on this whole cake was, right as I finished, and was starting to test everything, my Comcast internet connection went out. This was at 12:15am. During this time, I had no net, or phone. They did this to me about a week ago too. It finally came back up around 1:30am. I had to stay up until then so I could actually finish testing. What I really want to know is why the hell Comcast takes its headend down for over an hour at a time when people are more than likely still up. And why did it take over an hour?! Don’t they have the configuration ready to load, and they just reboot the headend after it’s loaded? Or does it really take that long to cycle through and reestablish connections with everyone’s modem after a reset? Further, why on earth couldn’t they let people know they’re doing maintenance ahead of time? Email is not a new thing, guys… and you have location info on my account. Email us when you’re going to take shit down. *grumble*