Logging

Mystery of the failed SSH logins solved by discovery of empty files

I had a repeat of a problem that I’ve probably had many times over the years. I wanted to write down the solution before I forgot it, because it had been long enough since the last time that I’d forgotten the fix.

Let’s set the stage: I wanted to SSH into a remote system. In my case, that remote system was a Mac OS Snow Leopard system. The client was also Mac OS X.

On that remote system, I had enabled sshd. This is done on Snow Leopard with System Preferences > Sharing > Remote Login. (In the Sharing System Preferences, you set up the users that are allowed to remote log in, either allowing all users or setting up an ACL that only allows specific users. If you set up the ACL, it should supercede whatever additional user/group login restrictions you’ve configured in /private/etc/sshd_config.)

Then, I tried to SSH from my client system. Instead of a successful login, I was told the following — before I was prompted for a password.

$ ssh user@server.example.com
Connection closed by 192.168.1.10.

Scratching your head, you add verbosity to the SSH connection attempt.

$ ssh userid@server.example.com
[snip]
debug1: SSH2_MSG_KEXINIT sent
Connection closed by 192.168.1.10.

I searched for other people having the same problem with “debug1: SSH2_MSG_KEXINIT sent,” because other people always have the same problem and some of them wrote about it and some of those solved it. Right?

Well, in this case, the other solutions I found were not helpful to my specific situation. But many people responding to pleas for help did mention the always-good advice to “check the server logs.” Which I could do, so I did.

In the secure.log, I found several groups of lines like this corresponding to the times I had tried to log in:

Nov 5 18:47:20 server sshd[1239]: error: Could not load host key: /etc/ssh_host_key
Nov 5 18:47:20 server sshd[1239]: error: Could not load host key: /etc/ssh_host_rsa_key
Nov 5 18:47:20 server sshd[1239]: error: Could not load host key: /etc/ssh_host_dsa_key
Nov 5 18:47:20 server sshd[1239]: Disabling protocol version 1. Could not load host key

That reminded me of the common problem with SSH host keys on Radmind-managed computers. See, Mac OS X will try to create the host keys if they are missing, but not if they are zero-length. On Radmind-managed computers, it was trivially easy to get zero-length SSH host key files in /private/etc because the tendency was to manage them with negative transcripts. Files listed in negative transcripts would be created if they were missing, but they would be created as empty files (by design).

Empty SSH host key files will prevent you from logging into that system with SSH.

I checked the server and — sure enough — the host key files were zero length. I deleted them, then stopped and restarted Remote Login for good measure. This solved the problem, and I could log in from the client.

Run Drupal cron with Drush at Site5

After starting to use Drush, I wanted to switch my Drupal cron jobs over to it. I’d previously been running these jobs the standard way, loading a URL for each of my Drupal sites with curl. That curl command was run by the cron; Site5 allows editing of your crontab directly or through its Backstage Web interface.

I started with the basics. Drush was installed in my home directory, with an alias under ~/bin in my Site5 account. I replaced my curl command one-for-one with the following:

$ /home/your_site5_account/bin/drush \
—quiet \
—root=/home/your_site5_account/public_html/your_drupal_site/ \
cron

This didn’t work out well. I was getting a huge number of mail messages, indicating problems with the cron jobs. The messages typically contained “tput: No value for $TERM and no -T specified.” Needless to say, this was rather frustrating, so after some trial and error, I modified the command as follows, and that has been working better for me.

$ /usr/local/bin/php \
-d memory_limit=64M \
/home/your_site5_account/drush/drush.php \
—quiet \
—root=/home/your_site5_account/public_html/your_drupal_site/ \
cron

The biggest change is that I specify PHP, rather than Drush, directly. This was done so that I could increase the PHP memory limit for the cron job significantly, to 64M, while running Drush. I’m sure that this increase was needed partly due to the number of modules I have installed on my main site (which has an even larger default PHP memory limit in its php.ini). My research indicated that I needed to do this for Drush since it doesn’t access the Drupal sites in the same way loading a page does.

The other noticeable change is that I provide the path to the drush.php file rather than pointing to the Drush alias.

Local logins succeed but network logins fail on an Active Directory bound Mac OS X Leopard system

I came across an interesting “problem” with Active Directory binding on Mac OS X Leopard. The symptoms were:

  • No Active Directory user accounts could log into the computer from the loginwindow.
  • Some of the attempted logins involved cached mobile accounts from the Active Directory.
  • The account login failures happened even though loginwindow’s “network accounts are available” indicator was green.
  • The login problem persisted it the computer been unbound and rebound to the domain.
  • The same Active Directory users could log in on other Macs.
  • Local users could log in to the affected computer.
  • Using “su” to switch users from a local user to an Active Directory user worked in Terminal.
  • Lookups using “dscl” and other DirectoryService tools worked.

Since I’ve written (what seems like a) a book about Active Directory troubleshooting, I threw the book at this problem. It ended up taking quite some time to troubleshoot, and the answer ended up being very simple. However, it wasn’t on my normal list of culprits.

The biggest clue I found, besides the symptoms above, was that the DirectoryService debug logs yielded this during Active Directory logins from loginwindow:

2010-02-24 21:33:37 EST - T[0xB0103000] - mbrmig - Dispatch - Membership -
is user jaharmi member of group GUID 3BBC71F5-3497-4494-904B-8AC3E25CCA52 =
false

It didn’t seem like a smoking gun, but I’d never come across this “false” response on a bound system before. So, what group was so important to the login process that the DirectoryService debug logs cared enough to note the failure? I was darned if I knew, and I had no other promising clues at that point.

So, I investigated that group further, and found it by its UUID using dsmemberutil:

$ dsmemberutil getid -X 3BBC71F5-3497-4494-904B-8AC3E25CCA52
gid: 200

Well, that helped a little, but the name would have helped a lot more. I had to find which group corresponded to the GID of 200. That GID was not at all familiar to me, but it was under 500, so there was a pretty good chance it came from Mac OS X.

$ dscl /Local/Default -list /Groups PrimaryGroupID | awk ‘$2 == 200 { print
$1; }’

com.apple.access_loginwindow

This was my eureka! moment. I wasn’t entirely sure, but I was pretty confident that the “com.apple.access_loginwindow” group was the access control list group for the loginwindow process. Loginwindow controls all graphical logins to Mac OS X, and is the parent process of each GUI login session.

Looking up the group’s description confirmed that it was the ACL group. I did the lookup in Workgroup Manager, which was set to view the DSLocal directory service. While I was there, I also checked the membership: it listed only the computational group “localaccounts.” The “localaccounts” group is essentially a query that returns all accounts in the local directory service.

Well, that would certainly prevent Active Directory users from logging in with loginwindow. The ACL consulted the membership of the “com.apple.access_loginwindow” group to determine who was allowed to log in via the GUI. Because it contained only the “localaccounts” group, the ACL was preventing all non-local users from logging in.

Not knowing how this group was handled or even what had last edited it, I compared the affected system to a different AD-bound Leopard computer, which also had Workgroup Manager. (It’s handy to have the Mac OS X Server Admin Tools deployed out to your computers even if you don’t have a server to maintain.) The second computer didn’t have the group at all, which perplexed me a bit.

However, that made me reasonably sure I could simply delete that group. I backed it up from the filesystem at the command line, just to make sure, and then deleted it with Workgroup Manager on the affected computer.

After that, logins for all Active Directory accounts I tried proceeded normally at the loginwindow on that system.

With the problem solved, I sought more information about the workings of the “com.apple.access_loginwindow” group. I confirmed that it is created when the “Allow network users to login in at login window”
option is turned on in System Preferences > Accounts > Login Options. This should be turned on by default, and that initial state results in no “com.apple.access_loginwindow” group at all.

Since the option is on by default, the really simple solutions to this kind of problem are:

  1. Don’t turn off the “Allow network users to login in at login window” option in System Preferences > Accounts > Login Options.

     

  2. If “Allow network users to login in at login window” has been turned off, either:
    1. delete the group named above, or
    2. toggle the option back on.

Deleting the “com.apple.access_loginwindow” group removes it completely and reinstates login capability for both local and network user accounts.

Toggling the System Preferences option back on, adds the “netaccounts” group to the “com.apple.access_loginwindow” group, reenabling login for both local and network users. It does not, however, remove the group “com.apple.access_loginwindow,” which remains on the system afterwards.

Here’s what that looks like in Workgroup Manager:

To prevent this on managed clients, I could see a system administrator proactively creating and managing the membership of the “com.apple.access_loginwindow” group. To ensure that managed clients bound to an Active Directory allow both local and network users to log in, make sure the group is populated with the appropriate nested groups: “localaccounts” and “netaccounts.”

List changed files in a Mercurial repository with a custom output style

While trying to troubleshoot what I’d done to mess up the Mercurial repositories managing my Drupal installations last weekend, I really would have liked a way to see what files had changes in specific revisions. Each revision to a Mercurial repository affects some files, of course, but it seems awfully hard to figure what files changed in that check-in.

I have since found a way to do that by customizing the output of Mercurial. To customize output, you can create templates on the command line (with —template) or for more powerful reformatting, create an output style file.

I struggled for a while to figure out how to use style files, and eventually came up with something that works for me so far.

Since I’ve installed Mercurial from Lee Cantey’s standard binary package for Mac OS X Leopard, I created the file “map-cmdline.changedfiles” at the “/Library/Python/2.5/site-packages/mercurial/templates” path. (Where you put the file may vary depending on where Mercurial is installed, and I’m sorry but I don’t know where it gets installed on other systems.) The contents of “map-cmdline.changedfiles” are below, along with my possibly inept description of what each line is doing:

# Get all of the files in the selected revision
# and stringify them, whatever that means
# but do not 'tabindent' or wrap them to 68/76 columns
# Without first setting changeset to the list of files
# you won't get output from subsequent lines
changeset = '{files|stringify}'
# List modified files, one per line
# preceded by M to mimic `hg status`
file = 'M {file}\n'
last_file = 'M {file}\n'
# List added files, one per line
# preceded by A to mimic `hg status`
file_add = 'A {file_add}\n'
last_file_add = 'A {file_add}\n'
# List deleted files, one per line
# preceded by ! to mimic `hg status`
file_del = '! {file_del}\n'
last_file_del = '! {file_del}\n'

I don’t know why the “map-cmdline.” portion of the filename is there, but as long as I have it, I can call the style file from the command line with what follows the period. So, I can call the style with “—style changedfiles” — and that tiny bit of voodoo seems reasonable enough to me. (The other styles in the directory above, many of which end in “.tmpl” extensions, seem related to the Mercurial Web server, hgweb. I tried, but I couldn’t use their names at the command line, with or without their extensions. Plus, their contents looked HTML-ish.)

With the “map-cmdline.changedfiles” style file saved in that location, I can call Mercurial’s “log” command:

$ hg log --style changedfiles -r tip

… which gives me a list of the files changed in the “tip” (or latest revision) of the repository. I could substitute in any revision identifier for “tip.”

I haven’t actually seen the “file_add” and “file_del” keywords in action; every time I’ve used this style file in the manner described, I’ve only seen files marked as “M” — even if I’m looking at a revision where new files were first checked into the repo. I’m confused by that, but I’m not going to let it sour my day at this point.

There might have been an easier way to do this but I didn’t find one last weekend. It took me some time to figure even this bit out, and I hope writing this post saves someone new to Mercurial from future frustration.

Microsoft Entourage error -18576 as it happens

I have had a problem that has repeated since I first began using Entourage as an Exchange client since at least version 11.0. I’ve never been able to solve it, but let me at least explain it here, because I’ve seen other people encounter it as well. It’s error -18576, and I hate it.

The symptoms are that I have calendar events displayed locally in Entourage that do not appear to synchronize to an Exchange mailbox. The Exchange mailbox, or account, does have some but not all of my calendar events.

There doesn’t seem to be any rhyme or reason to events that are missing. Many of them are recurring entries on my calendar. Beyond that, I’ve had events I put on my calendar myself, events that have been created through invitations from others, events with and without categories assigned, events that were created manually or with AppleScript, events that are flagged private, and so on.

The sequence of events I see in a traffic capture for affected events goes something like this:

  1. Entourage tries to LOCK (a WebDAV call?) the event at a URL path ending in /exchange/userid/Calendar/eventname.eml.
  2. Exchange returns an “HTTP/1.1 401 Unauthorized, NTLMSSP_CHALLENGE” response.
  3. Entourage tries to LOCK the event again, at the same URL path.
  4. Exchange returns an “HTTP/1.1 201 Created” response.
  5. Entourage attempts an HTTP PUT of the event at the URL.
  6. Exchange returns an “HTTP/1.1 200 OK” response.
  7. Entourage attempts an HTTP PROPPATCH of the URL.
  8. Exchange returns a “HTTP/1.1 207 Multi-Status” response.
  9. Entourage attempts to UNLOCK the URL path.
  10. Exchange responds with “HTTP/1.1 424 Method Failure.”

That’s it, that’s the extent of the HTTP/WebDAV exchange between Entourage and the Exchange server. The Entourage Error Log shows an -18576 error, but as of Entourage 2008, has no further description. Entourage’s Exchange-related errors are numbered in the -18000s and are typically based on HTTP (see the Entourage Error Page; use the equation 19000 + error number to determine its HTTP equivalent.

In my case, this is an HTTP 424 error, or a “Failed Dependency.” According to RFC 4918 on WebDAV, this means “The property change could not be made because of another property change that failed.”

The end result in Entourage is an event that does not appear in the Exchange calendar, but does appear locally — and there’s that entry in the Entourage Error Log, of course. There’s one such for each attempt to sync an affected event, so they build up over time — but some events do synchronize. It’s more aggravating that I’m left with a feeling of being in limbo, not knowing whether any calendar I’m looking at is correct or not. (Is there a sniglet for that? Update: I hereby propose “calendremma.”)

It doesn’t seem to matter whether Entourage is connecting to a front end or back end Exchange server (at least through Exchange 2003). Rebuilding the Entourage database doesn’t clear up the problem, although seems to occasionally have offered some temporary respite. However, the events that didn’t sync before still don’t sync.

The events which don’t synchronize fail to appear via Exchange ActiveSync on my iPod touch, and aren’t visible in either Outlook Web Access or Outlook for Windows.

Start Directory Service debug logging automatically after the next restart on Mac OS X

There can be times when you need to troubleshoot Directory Services in Mac OS X during system starts up. Directory Services apparently sits above and below the kernel — depending on what each of those components needs to do — so having debugging capability early on in the boot process can help you hone in on problems.

To enable debug logging at startup, run the following at the command line:

$ sudo touch /Library/Preferences/DirectoryService/.DSLogDebugAtStart

… and then, on the next restart, debug logging will be enabled. This is equivalent to running:

$ sudo killall -USR1 DirectoryService

… but the advantage is is that debug logging begins at startup, rather than whenever you can log in and start it manually.

Remove the file from the path above when you want to disable the logging for the subsequent restart. As a file whose name starts with a dot, you won’t see it listed by default in the Finder. (The Finder hides dot files by default.)

$ sudo rm /Library/Preferences/DirectoryService/.DSLogDebugAtStart

For more details, see the Apple article on Mac OS X Server v10.5, 10.6: Enabling Directory Service debug logging. (Note that directory service logging works on the workstation version of Mac OS X, not just Mac OS X Server.)

Radmind server logging and the repo command

Under normal circumstances, the latest Radmind tools that communicate with the server report client status updates in the Radmind server’s system log. These standard messages can include ones like:

May 8 03:14:56 RadmindServerHost radmind[7890]: report radmind-client.example.com 192.168.7.42 - - ktcheck No updates needed
May 15 03:15:25 RadmindServerHost radmind[24531]: report radmind-client.example.com 192.168.7.42 - - ktcheck Updates retrieved
May 15 03:21:48 RadmindServerHost radmind[24534]: report radmind-client.example.com 192.168.7.42 - - lapply Changes applied successfully
May 15 03:31:07 RadmindServerHost radmind[24356]: report radmind-client.example.com 192.168.7.42 CertificateCN - lapply Error, changes made

The Radmind repo, or “report,” tool provides the ability to send arbitrary messages to the Radmind server process. But how are these messages formatted and sent?

$ repo -e "Debug" -h radmindserverhost.example.com -w2 "Test message"

… results in the system log message:

May 15 03:31:56 RadmindServerHost radmind[25236]: report radmind-client.example.com 192.168.7.42 CertificateCN - Debug Test message

Here, we can see that an entry created with repo looks like the standard Radmind log messages above. The client hostname and IP address are reported after the “report” text. The CertificateCN for the client — if the highest authorization level is specified (with the -w2 flag) — is also listed; if not, a dash takes its place. I haven’t seen a case where the second dash is substituted, however.

Finally, where the Radmind command/tool used would normally be, the “event” specified by repo will printed. After that, the message text appears.

The value proposition is that if you’re using Radmind, the repo command can help you send arbitrary messages to the server for logging. As bonus, if you’ve taken the time and effort to build the certificate infrastructure for Radmind, you can send these messages securely between the clients and the server cloaked in SSL.

If you’re using multiple servers, you may want to combine their logs in one location so that you can get all of the clients’ reports in one location. You may also want or need to retain these reports for more time. In either case, determine what policies you should apply to the syslog or Apple System Logger (ASL, for Mac OS X) configuration for your server systems.

Whether or not you use repo, it’s good to know that the tools do some logging. The logging can be followed to try to determine the status of your clients, or whether they are failing their updates.

Unfortunately, the most common client failures I have seen tend to involve the lapply tool, and the default level of detail I’ve seen reported back to the server does not provide an indication of what problem has been encountered. You see only that there was an error. Still, even though you may not get enough detail to remotely resolve the problem, it’s something for you to go by find problems in the first place.

Syndicate content