New mail server postmortem

The new mail server for work went into production last week. Here are some notes and gotchas from putting it together:

RH9/package notes in general:

Be careful when updating glibc. On any reasonably modern box, RH9’s installer uses the i686 package instead of the i386 one. If you accidentally “downgrade” by installing the i386 RPM by hand, you will hose the box. Make sure you pick up the i686 version from the proper directory. If you do “downgrade”, it may be possible to force-install the i686 RPMs, provided you haven’t rebooted and are lucky.
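One way to guard against the accidental downgrade is to compare the architecture in the RPM file name with what is installed before running rpm by hand. The sketch below is only that: the helper names are made up for illustration, and it assumes the usual name-version-release.arch.rpm file naming.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any package.

# Print the architecture field of an RPM file name,
# assuming the usual name-version-release.arch.rpm layout.
rpm_file_arch() {
    base=$(basename "$1" .rpm)
    printf '%s\n' "${base##*.}"
}

# Succeed only if the RPM file matches the installed architecture.
# $1 = installed arch (e.g. i686), $2 = path to the RPM file.
same_arch() {
    [ "$(rpm_file_arch "$2")" = "$1" ]
}

# Typical use on the real box (not run here):
#   installed=$(rpm -q --queryformat '%{ARCH}\n' glibc)
#   same_arch "$installed" /path/to/glibc-update.rpm || echo "wrong arch, stop"
```

If same_arch fails, stop and fetch the package from the i686 directory instead.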

A working SRPM for Cyrus-IMAP 2.1.15 can be found at http://home.teleport.ch/simix. I couldn’t get a modified version of the 2.0.17 spec file to work. Note that the bin files for this RPM go into /usr/lib/cyrus-imapd.

RH9 comes with namazu and relatively recent versions of MHonArc and Mailman. The latter two rebuild easily from updated SRPMs and lightly edited spec files. Note that a bug in Mailman 2.1.2 causes high load, so make sure you use the latest Mailman package.

The perl-DB_File module is needed for SpamAssassin Bayesian learning.

Application notes:

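Sendmail configuration was relatively simple, since we had 8.12 examples to work from. One thing to change when moving from an 8.11 to an 8.12 configuration: make sure define(`confAUTH_OPTIONS', `A p y') has the “p y” at the end. The “p” option refuses plaintext mechanisms unless a security layer such as TLS is active, and “y” refuses anonymous logins, so in practice anyone doing SMTP auth has to use TLS.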

The new box was set up with LDAP authentication, with saslauthd as the intermediary. Cyrus uses SASL2, whereas Sendmail uses SASL1. Make sure the conf files for authentication are in the right place. Make sure saslauthd is running before testing authentication. Right now, we have saslauthd getting to LDAP via PAM, since the RH9 version of saslauthd doesn’t appear to have the LDAP mechanism compiled in, and PAM/LDAP is sufficient for our needs.

One note on PAM/LDAP, vis-a-vis objectClasses. We’re using our own custom objectClass for the mail users, but for some reason the PAM/LDAP configuration in /etc/ldap.conf didn’t pick up the objectClass filter. This is fine for most users, since the default posixAccount applies, but it didn’t work for our mail-only users. The mail-only users have been changed to posixAccount entries in the meantime. This doesn’t affect anything else in our setup; it is just a nuisance.

Mailman 2.1.3 now uses a number of daemon processes instead of a single qrunner process called by cron. This is a definite improvement, as mail no longer gets stuck in the delivery queue because the archiver is crunching through a big file. Mailman migration basically involved copying the $MAILMANDIR/lists directory (along with the data directory for unprocessed messages) and running $MAILMANDIR/bin/update to update the lists. If I recall correctly, this also generates the sendmail aliases for Mailman.
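The copy-then-update step can be sketched roughly as below. This is not the exact procedure used: the function name and both path arguments are illustrative, and it assumes the old tree is mounted somewhere readable and that Mailman is not running during the copy.

```shell
#!/bin/sh
# Sketch only: the old and new paths are passed in, nothing is hard-coded.

# Copy the lists and data directories from the old Mailman tree,
# then run bin/update in the new tree if it is present.
migrate_mailman() {
    old=$1
    new=$2
    for dir in lists data; do
        if [ -d "$old/$dir" ]; then
            mkdir -p "$new/$dir" || return 1
            # The trailing /. copies the directory contents,
            # preserving owners, modes and timestamps.
            cp -a "$old/$dir/." "$new/$dir/" || return 1
        else
            echo "warning: $old/$dir not found" >&2
        fi
    done
    if [ -x "$new/bin/update" ]; then
        "$new/bin/update"
    else
        echo "note: $new/bin/update not found, run it by hand" >&2
    fi
}
```

A call would look like migrate_mailman /mnt/old/var/mailman /var/mailman, where both paths are examples only.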

The Mailman archiver was having problems with MHonArc, where headers weren’t being transmitted for some reason. The solution was to send the message to a trivial shell script:

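#!/bin/sh
# Save the message arriving on stdin to a temp file, then feed it to MHonArc.

LISTNAME=$1
# The shell PID ($$) keeps two messages in the same second from colliding.
TEMPFILE=/tmp/$LISTNAME.$$.`date +"%H%M%S"`

cat > "$TEMPFILE"

/usr/bin/mhonarc -add -mbox "$TEMPFILE" -outdir "/var/mailinglists/lists/$LISTNAME" -rcfile /var/mailman/etc/mhonarc.mrc > /dev/null

rm "$TEMPFILE"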

For some reason, this retains the headers for MHonArc processing. Note that the Mailman external archiver configuration is now a FAQ entry for Mailman.

Namazu worked out of the box with the previous configuration. Not much had changed.

The Cyrus upgrade was relatively painless, too. The main differences had to do with Cyrus file locations. Almost everything except for the IMAP spool itself is now under /var/lib/imap, e.g., /usr/sieve had to be moved into /var/lib/imap/sieve. We took sieve, user, quota and, of course, mailboxes.db. The IMAP spool now has a first-letter partitioning scheme. For example, /var/spool/imap/user/cchen in 2.0.x now goes to /var/spool/imap/c/user/cchen. This was a minor matter of scripting after the files were copied over from the old hard drive.
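The spool rearrangement can be sketched along these lines. It is not the script that was actually run: the function names and the dry-run switch are illustrative, and it only handles user names that start with a letter, since how Cyrus buckets other names should be checked against the Cyrus documentation first.

```shell
#!/bin/sh
# Sketch only: function names and the DRY_RUN switch are illustrative.

# Print the first-letter location for one user,
# e.g. cchen -> $spool/c/user/cchen
hashed_dir() {
    spool=$1
    user=$2
    first=$(printf '%s' "$user" | cut -c1 | tr 'A-Z' 'a-z')
    printf '%s/%s/user/%s\n' "$spool" "$first" "$user"
}

# Move every directory under $spool/user into its first-letter bucket.
# DRY_RUN defaults to 1, which only prints what would be moved.
migrate_spool() {
    spool=$1
    for src in "$spool"/user/*; do
        [ -d "$src" ] || continue
        user=$(basename "$src")
        case $user in
            [A-Za-z]*) ;;
            *)  echo "skipping $user: check how Cyrus hashes this name" >&2
                continue ;;
        esac
        dest=$(hashed_dir "$spool" "$user")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would move $src -> $dest"
        else
            mkdir -p "$(dirname "$dest")" && mv "$src" "$dest"
        fi
    done
}
```

Running migrate_spool /var/spool/imap first with the default dry run, and only then with DRY_RUN=0, gives a chance to review the moves before anything changes on disk.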

Note that copying (cp -a) took about 6 or 7 hours. Make sure people clean out their Spam folders first. A dd would have been faster, but was not possible for this setup.

mdadm is nice. We took the old drive and set it up as a hot spare for the RAID-1 on the system disks. raidhotadd would have been fine, too, but mdadm is more informative.

Server notes:

We got a big performance improvement by making sure the mail and IMAP spools, the Cyrus mailboxes.db and the LDAP database were marked as noatime (chattr -R +A). Load on the hyperthreading box hovered between 1 and 3 for most of the first day or two before we realized that noatime had not been set. After this was done, load dropped to 0.50 or so for normal periods. Possibly, the hyperthreading confused things like top and uptime into reporting a higher-than-normal value (IO load counting for proportionately more than it should have), but that’s just a belief on my part. Certainly, the box was sluggish in the first couple of days of production use, with excessive IO.
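A small wrapper around the chattr call makes it easy to apply the attribute to several trees at once and to preview the commands first. This is a sketch under assumptions: the function name and DRY_RUN switch are illustrative, and of the directories in the usage note only /var/spool/imap and /var/lib/imap come from the notes above.

```shell
#!/bin/sh
# Sketch only: the function name and DRY_RUN switch are illustrative.

# Set the no-atime attribute recursively on each directory given.
# DRY_RUN defaults to 1, which only prints the commands.
set_noatime() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping $dir: not a directory" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "chattr -R +A $dir"
        else
            chattr -R +A "$dir"
        fi
    done
}
```

As root, something like DRY_RUN=0 set_noatime /var/spool/imap /var/lib/imap /var/spool/mail /var/lib/ldap would apply it (the last two paths are guesses at typical locations), and lsattr -d on a directory should then show the A flag.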

Sendmail’s milter timeout had to be tweaked. We started Bayesian scoring a couple of weeks ago, with spamd running on a different box. The default timeout of 10 seconds allowed at least 5% of spam to get by without SpamAssassin providing any scoring. The timeout has now been set to 90 seconds, which allows virtually all email to be scanned and scored.
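Milter timeouts are set per filter in sendmail.mc, through the T= field of INPUT_MAIL_FILTER. As a rough sketch, the helper below just prints such a line with the send (S) and read (R) timeouts raised. That the 10-second default mentioned above refers to these two fields is an assumption, and the function name, the filter name, the socket path and the E value are placeholders rather than the settings actually in use.

```shell
#!/bin/sh
# Sketch only: prints an INPUT_MAIL_FILTER line for sendmail.mc.
# Filter name, socket and the fixed E:5m value are placeholders.

# $1 = filter name, $2 = milter socket, $3 = timeout for S and R (e.g. 90s)
milter_mc_line() {
    name=$1
    socket=$2
    timeout=$3
    # \` yields a literal backtick, which m4 uses as its opening quote.
    printf "INPUT_MAIL_FILTER(\`%s', \`S=%s, F=, T=S:%s;R:%s;E:5m')dnl\n" \
        "$name" "$socket" "$timeout" "$timeout"
}
```

For example, milter_mc_line spamassassin local:/var/run/spamass.sock 90s prints a line that could be pasted into sendmail.mc before rebuilding sendmail.cf. An empty F= means mail is passed through if the filter is unavailable, so a slow spamd still cannot block delivery.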

Notes for the future:

We need to put in something like MIMEDefang to take care of the email social engineering viruses. MIMEDefang requires a set of extra Perl modules, most of which are available from the RPM contrib directories at rpmfind.net. A recent RPM of MIMEDefang itself is also available in any case. MIMEDefang will be configured later; we’ll get by with just SpamAssassin for now.

We have to put in some sort of distributed IMAP store for the NY and CA offices. Presumably, we’ll use Cyrus Murder for this. I’ll also need to read the Sendmail book for this sort of distributed delivery for CA users.
