DNS And E-mail Part 2: Troubleshooting
Volume Number: 21 (2005)
Issue Number: 8
Column Tag: Programming
Mac In The Shell
by Edward Marczak
How And Where To Look For The Source Of Trouble.
You know that something isn't working right. Perhaps you're not receiving mail, or worse, the
users have noticed before you have, and now your phone is ringing. But where do you start looking?
What tools can you use to root out the problem? If you're wondering about the answers, read on.
Hello Again
Last month, we got into DNS and e-mail, and how they relate. This month is part 2, and builds on
part 1. We covered some basic troubleshooting, but there's more to the equation. If you're setting
things up for the first time, you tend to run into DNS-related issues. If things have been up and
running for a while and then suddenly stop, there's another range of issues to look at. But for me,
it all starts in one place.
Sunday Papers
Always look at the logs. I've said it before: logs are the heart of the system. If you're a
system administrator, you should always be monitoring the logs. Always. Partially by eye, and
partially by scripts that alert you to problems. In the case of mail, the logs are going to tell you
one of two things: either the problem is on your own system (the mail server), or it lies
elsewhere, outside of the mail server. In the former case, there's either database corruption or a
simple misconfiguration. In the latter, while it can be several things, I'm betting on DNS as the
culprit.
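The "by script" half of that advice can start very small. Here is a minimal sketch, assuming a Postfix-style /var/log/mail.log; the alert categories and the function names scan_log and report are hypothetical choices for illustration, not part of any Apple or Postfix tool. It flags lines containing status=deferred, status=bounced, fatal: or panic: and warning: messages.

```python
import re
import sys
from collections import Counter

# Hypothetical alert categories for Postfix lines in mail.log; adjust to taste.
ALERT_PATTERNS = {
    "deferred": re.compile(r"status=deferred"),
    "bounced": re.compile(r"status=bounced"),
    "fatal": re.compile(r"\bfatal:"),
    "panic": re.compile(r"\bpanic:"),
    "warning": re.compile(r"\bwarning:"),
}


def scan_log(lines):
    """Return (counts per category, list of matching lines)."""
    counts = Counter()
    hits = []
    for line in lines:
        for name, pattern in ALERT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append(line.rstrip("\n"))
                break  # file each line under the first category it matches
    return counts, hits


def report(path="/var/log/mail.log", show=20):
    """Print the most recent problem lines; return 1 if anything matched, else 0."""
    with open(path, errors="replace") as log:
        counts, hits = scan_log(log)
    for line in hits[-show:]:
        print(line)
    if counts:
        print("ALERT:", dict(counts), file=sys.stderr)
        return 1
    return 0
```

Run it from cron as `sys.exit(report())` and let the non-zero exit status trigger whatever notification you use, preferably one that doesn't depend on the mail server you're watching.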
On The Outside
Let's start with this case: you're getting calls that inbound mail isn't showing up. Or, perhaps
you just set up a server from scratch and noticed this for yourself. What do you do? Check the
logs. Specifically, /var/log/mail.log. In one common scenario, you'll send e-mail from an outside
test account (GMail, I'm looking at you), and the logs will show...nothing. Nada. No movement. So
what's wrong?
This can be one of three things: 1. Postfix just isn't running (this should be unlikely, as
watchdog/launchd take great pains to make sure Postfix is always running; but hey, it can happen),
2. your firewall/router isn't allowing SMTP traffic to reach your mail server, or 3. your
outward-facing DNS isn't correct.
What Can I Do?!?
In the case that Postfix isn't running, start it! It may be that simple. If you've been toying
with the config files by hand (or you have some disk corruption), there may be something preventing
it. How will you know? Check the logs! When you start Postfix, by Server Admin or CLI, watch
/var/log/mail.log. If there's a problem, it'll tell you.
If your firewall isn't configured correctly, I unfortunately can't help you directly; I can only
point it out as a source of problems. There are many, many varieties of firewall out there, along
with many ways to configure a network, and I can't speak to them all. In general, though, port 25
(SMTP) must be able to traverse your firewall and reach your mail server, and your mail server must
be able to reply via the same path - most services aren't cool with asymmetric routing.
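A quick way to confirm that port 25 really gets through is to connect from a machine outside your network and look for the server's "220" greeting, much like the telnet trick from last month. This is a minimal sketch using only Python's standard socket module; smtp_banner is a hypothetical helper name, and the host you pass it is your own mail server.

```python
import socket


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first SMTP greeting line.

    A healthy server answers with a line starting "220". A timeout or a
    refused connection (both raise OSError subclasses) points at the
    network path or a stopped Postfix rather than at DNS.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until the end of the first greeting line, or until the server hangs up.
        while b"\r\n" not in data:
            chunk = sock.recv(512)
            if not chunk:
                break
            data += chunk
        try:
            sock.sendall(b"QUIT\r\n")  # end the SMTP session politely
        except OSError:
            pass
    return data.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

For example, `smtp_banner("mail.yourdomain.com")` (a placeholder name) run from outside should return something starting with "220"; if it only works from inside your LAN, look at the firewall before anything else.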
Then there's the case where the world just doesn't know how to send you mail.
Contact
As I touched on last month, DNS plays an important role in the delivery of e-mail. Specifically,
when a mail server needs to deliver a piece of e-mail somewhere other than its local self, it
queries DNS for the MX record of the recipient's domain.
If someone in the outside world is trying to send you mail, the DNS server they're using had
better be able to give their mail server an MX record for your domain. If it can't, their mail will
sit in their local queue for a while. The tool for this job is dig, the domain information groper.
By default, dig looks up DNS 'A' records, much like the now-deprecated nslookup:
$ dig www.example.com
; <<>> DiG 9.2.2 <<>> www.example.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8947
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;www.example.com. IN A
;; ANSWER SECTION:
www.example.com. 172800 IN A 192.0.34.166
;; AUTHORITY SECTION:
example.com. 21600 IN NS a.iana-servers.net.
example.com. 21600 IN NS b.iana-servers.net.
;; Query time: 4172 msec
;; SERVER: 192.168.100.12#53(192.168.100.12)
;; WHEN: Tue Jul 19 23:56:45 2005
;; MSG SIZE rcvd: 97
dig gives us quite a bit more information than nslookup, too. The quick way to read this is:
- Question section: what did we ask?
- Answer section: the answer to the query, if possible.
- Authority section: which servers are authoritative for the domain in question.
We, however, need to check the MX record for our site. Let's look up a domain that makes for a
nice, clean example: my favorite test subject, GMail:
$ dig MX gmail.com
; <<>> DiG 9.2.2 <<>> MX gmail.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12756
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 4, ADDITIONAL: 4
;; QUESTION SECTION:
;gmail.com. IN MX
;; ANSWER SECTION:
gmail.com. 3600 IN MX 10 gsmtp171.google.com.
gmail.com. 3600 IN MX 10 gsmtp185.google.com.
gmail.com. 3600 IN MX 10 gsmtp171-2.google.com.
gmail.com. 3600 IN MX 10 gsmtp185-2.google.com.
gmail.com. 3600 IN MX 5 gmail-smtp-in.l.google.com.
;; AUTHORITY SECTION:
gmail.com. 86400 IN NS ns3.google.com.
gmail.com. 86400 IN NS ns4.google.com.
gmail.com. 86400 IN NS ns1.google.com.
gmail.com. 86400 IN NS ns2.google.com.
;; ADDITIONAL SECTION:
ns1.google.com. 276050 IN A 216.239.32.10
ns2.google.com. 276050 IN A 216.239.34.10
ns3.google.com. 276050 IN A 216.239.36.10
ns4.google.com. 276050 IN A 216.239.38.10
;; Query time: 46 msec
;; SERVER: 192.168.100.12#53(192.168.100.12)
;; WHEN: Tue Jul 19 23:19:04 2005
;; MSG SIZE rcvd: 306
First thing to point out: with dig, you can simply give it the record type that you're after.
Back to e-mail: notice that in a proper setup, the answer section contains all mail servers, along
with their priorities. Remember: the lower the number, the higher the priority. Also notice that
every name in the answer section ends with the root node - the dot (".").
gmail-smtp-in.l.google.com is the highest-priority mail server, so all mail deliveries should be
attempted there first. It is followed by 4 backup servers that have equal priority. If the main
server is down or unreachable, outside mail servers will choose one of these to deliver to. The
algorithm for this is actually a little more complex than you'd think at first look - it doesn't
simply choose one at random - but that is part of a deeper discussion. Once the main server is back
up, the backup servers will hand off the mail that they've queued up.
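To see the priority ordering at a glance, you can group the ANSWER records by preference, lowest number first. This is a sketch under the assumption that each record is the whitespace-split dig answer line (name, TTL, class, type, preference, host); mx_tiers is a hypothetical helper, and it deliberately does not model how a sending server picks among hosts of equal preference.

```python
from collections import defaultdict


def mx_tiers(answer_records):
    """Group MX hosts by preference, most preferred (lowest number) first.

    Each record looks like:
    ["gmail.com.", "3600", "IN", "MX", "5", "gmail-smtp-in.l.google.com."]
    """
    tiers = defaultdict(list)
    for fields in answer_records:
        if len(fields) >= 6 and fields[3] == "MX":
            preference = int(fields[4])
            host = fields[5].rstrip(".")  # drop the trailing root "."
            tiers[preference].append(host)
    # Hosts inside a tier are sorted only for readable output.
    return [(pref, sorted(hosts)) for pref, hosts in sorted(tiers.items())]
```

For the gmail.com answer above this yields two tiers: preference 5 with gmail-smtp-in.l.google.com alone, then preference 10 with the four backups. A missing tier, or a backup that outranks your main server, is exactly the kind of mistake to look for.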
Naturally, you should have a backup mail server, and it should reside at a separate location on a
separate network. If you can't set this up yourself, your ISP will very often act as a backup MX
for you. I've also helped with some creative arrangements where companies back each other up.
Just be aware that mail queued on a backup server sits there unencrypted, so you need to trust your
backup partner. This is actually a good reason to start encrypting e-mail yourself. Check out the
GPG article in this issue by Emmanuel Stein for instructions on just that. Additionally, I have an
introduction to GPG, digital certificates and why I sign my e-mail on my site at
http://www.radiotope.com/writing/?p=13.
dig is happy to query other DNS servers. In the case of mail, you need to do this. Make sure
you're querying a server outside of your LAN so you're getting the same perspective as the world.
Run it like this:
dig @199.184.165.1 MX yourdomain.com
(199.184.165.1 is an actual DNS server - be kind: don't hammer servers that are open to the
public. Find and use the DNS server provided by your ISP.)
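If you want to script the "ask an outside server" check, a bare-bones MX query can be built by hand following the RFC 1035 message format. This is a minimal sketch, not a replacement for dig: the helper names build_mx_query and ask_server are hypothetical, and it only reports the response code and how many answers came back, not the records themselves.

```python
import random
import socket
import struct

QTYPE_MX = 15
QCLASS_IN = 1


def build_mx_query(domain, query_id):
    """Build a DNS query packet asking for the MX records of domain (RFC 1035)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte (the root).
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_MX, QCLASS_IN)


def ask_server(server, domain, port=53, timeout=5.0):
    """Send an MX query to one DNS server; return (rcode, answer count)."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_mx_query(domain, query_id), (server, port))
        reply, _ = sock.recvfrom(4096)
    if len(reply) < 12:
        raise ValueError("reply too short to be a DNS message")
    reply_id, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if reply_id != query_id:
        raise ValueError("reply ID does not match the query")
    return flags & 0x000F, answers  # RCODE is the low 4 bits of the flags
```

An RCODE of 0 (NOERROR) with a non-zero answer count is what you want to see; 3 is NXDOMAIN and 2 is SERVFAIL. Use the DNS server your ISP provides, as noted above.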
If DNS is the culprit, what's wrong? Did the foray into dig shine a light on the issue? Were
there any surprising or missing entries? Malformed entries? Are the mail server priorities set
correctly? As you can see, there are a lot of 'i's to dot and 't's to cross, and computers can be
sensitive to that kind of thing.
If your DNS is correct, try the telnet trick from last month's column. Either mail is getting
through, or it isn't. If mail can't penetrate a firewall, correct DNS won't make a difference.
Inside
If your setup passes the DNS test, what's next? To the logs! You should still be looking at
/var/log/mail.log - preferably in a terminal using "tail -f" so you're watching it update in
quasi-real time.
This is a quick refresher regarding the e-mail subsystem. If you need more depth, please refer
to part 1 of this article.
The first thing to remember is this: 'e-mail', although typically viewed as one single entity, is
traditionally made up of many separate pieces. In our case, Apple uses Postfix for SMTP (the server
receiving e-mail from the outside world and from your mail client) and Cyrus for POP and IMAP (your
mail client checking mail). All of these components have convenient acronyms to go along with them
(Woo! More acronyms!):
- MTA: Mail Transfer Agent - your SMTP server, as it's responsible for transferring mail.
- MUA: Mail User Agent - this is your e-mail application such as Eudora, Mail.app or Pine. It's
the user interface into the mail store.
- MDA: Mail Delivery Agent - This is a program that is responsible for getting the mail from the
MTA and dropping it in the mail store.
Traditionally, your MTA (such as sendmail) would receive mail (internally or externally) and
know how to get it into the system mail spool all by itself. The system mail spool contained a
mailbox for each user on the system, stored in mbox format. mbox is the traditional Unix mailbox
format: it stores all mail for a single mail folder, like your inbox, in a single, large text file.
As you can imagine, this doesn't scale too well. How many people reading this article have, or know
someone who has, an inbox that's over 500MB? How about 1GB? Yeah, it happens. Keeping all of that
in a single, un-indexed text file can cause you some performance issues. On the other end, an MUA,
like 'mail' or pine, would know how to reach into that same mail store and display mail to the end
user. Cyrus changes all of that, so this discussion will focus on what we are actually dealing with.
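To make the "un-indexed" point concrete: in mbox, each message begins with a line starting "From ", and body lines that happen to start that way are escaped as ">From ". This simplified sketch (count_mbox_messages is a hypothetical helper) shows that even counting messages means reading the entire file; Python's standard mailbox.mbox class does a more careful job of the same scan.

```python
def count_mbox_messages(path):
    """Count messages in an mbox file by scanning for "From " separator lines.

    There is no index, so the whole file must be read just to answer
    "how many messages are in this folder?". Simplified: it does not
    check that each separator is preceded by a blank line.
    """
    count = 0
    with open(path, "rb") as mbox:
        for line in mbox:
            # Body lines beginning with "From " are stored as ">From ",
            # so only real separators match here.
            if line.startswith(b"From "):
                count += 1
    return count
```

On a 1GB inbox, that loop touches every one of those bytes, which is exactly the scaling problem described above.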
Subdivisions
If you've never touched Apple's stock mail config, you should very rarely see problems. Now that
the log rolling bug has been fixed, you almost have to try to destroy Cyrus. See part 1 for ways to
detect and troubleshoot Cyrus issues. Postfix is typically just a rock. But we're techs, right? We
can't leave well enough alone! We can improve it! Or, more likely, your employer or your client
asks you for some functionality that doesn't already exist in Apple's config. Thankfully, 10.4
Server brings us integrated Amavis with ClamAV and SpamAssassin. However, the now most-often
requested item, vacation replies, is something you have to deal with yourself. (This should be
handled by sieve, but it was broken under 10.3, and remains so as of 10.4.2.)
While Postfix has several tables and files, two of them cover the vast majority of configuration:
main.cf and master.cf. These are stored along with the rest of Postfix's configuration files in
/etc/postfix. If you are ever tempted to make changes to these files, back them up first! It'll
take you two seconds to tar them up (tar czvf ~/postfix-`date '+%Y%m%d%H%M'`.tar.gz /etc/postfix),
or simply copy them. You'll be much happier that you did when things stop working and you need to
go back to the way things were.
main.cf is the global configuration file for Postfix. When you edit information in Server Admin
(or Postfix Enabler), you're altering this file. main.cf is primarily responsible for defining the
service's role:
- What domain to receive mail for.
- What destinations to relay mail to.
- How to deliver mail (direct or relay).
- The domain to use in outbound mail.
- Clients to allow relay from.
Apple includes a default main.cf file in /etc/postfix. Server Admin just tacks the parameters
that it changes onto the end of the file. So you'll find two values for certain parameters. The
second will override the first, so Apple's method is OK. In some ways, it's nice that all of the
values it changes are grouped.
The format of main.cf is simple: parameter = value. Like a Unix shell variable, you can use the
value of one parameter later on in another parameter like this: parameter = $other_parameter.
Interestingly, since Postfix does not compute the value until run time, you can use a variable
before you actually assign anything to it. The lesson here is that a mangled main.cf is one reason
Postfix may not start. The one parameter that must be right here is "mydestination": it should
contain all domain names that Postfix will accept as local and hand off to 'deliver'.
Any time you make changes to the configuration files, you must issue a postfix reload for Postfix
to pick up the changes. I've seen people make changes, and then wonder why the changes aren't
working. No postfix reload, no changes.
More likely, you find some neat addition to Postfix, and the instructions ask you to modify
master.cf. master.cf controls Postfix's master process, which in turn controls all Postfix child
processes.
Each line in master.cf defines a service in Postfix. The order doesn't matter; however, a) if
you define a service multiple times, only the last one is honored, and b) it's smart to keep logical
groups together, for your own sanity. Empty lines and comment lines (those beginning with '#') are
ignored. A logical line begins with non-whitespace text in column 1, and a line is continued by
starting the next line with white-space. Each logical line has 8 columns. The man page does a
better job of explaining the basics than I can. Unfortunately, Apple doesn't include this man page,
so go check out http://www.postfix.org/master.5.html.
The part that tends to throw people off is taking generic installation instructions and modifying
them for OS X. Sometimes an installer will try to put a binary or script in a location that is not
in your path. Postfix filters should never be run as root, so you'll need an additional system user
to run filters. However, most generic instructions tell you to add a user with adduser, which is
present on most Unices but not on OS X (not by default, anyway).
There are three ways to get a filter to run: globally, by placing a content_filter= statement in
main.cf, by specifying a content filter for a given path (master.cf), or, by running a filter on the
fly, based on some criteria (altering lookup tables or checks).
The first method is the simplest, and is used by Apple to hook Amavis into Postfix. Note the
last line of main.cf on a Tiger Server:
content_filter = smtp-amavis:[127.0.0.1]:10024
This makes Postfix run all queued mail through a filter called smtp-amavis. How does Postfix
know what "smtp-amavis" is? It must be defined as a service in master.cf. You'll see that
master.cf has this definition (under Tiger):
smtp-amavis unix - - y - 2 smtp
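A content_filter value has the form transport:destination, and the transport half must name a service defined in master.cf. The sketch below checks exactly that; split_filter and check_filter are hypothetical helpers, and the set of service names is assumed to come from column 1 of your master.cf.

```python
def split_filter(value):
    """Split a content_filter value into (transport, destination).

    Only the first ":" separates the two; the destination may contain its own,
    as in "[127.0.0.1]:10024".
    """
    transport, _, destination = value.strip().partition(":")
    return transport, destination


def check_filter(value, master_services):
    """Return a problem description, or None if the transport is defined.

    master_services: a set of service names from column 1 of master.cf.
    """
    transport, _ = split_filter(value)
    if not transport:
        return "content_filter has no transport name"
    if transport not in master_services:
        return f"transport {transport!r} is not defined in master.cf"
    return None
```

For Tiger's setting, `split_filter("smtp-amavis:[127.0.0.1]:10024")` gives the transport "smtp-amavis" and the destination "[127.0.0.1]:10024", and the check passes because master.cf defines smtp-amavis.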
This content_filter line in main.cf will run all mail through the filter "smtp-amavis", and
that's really what we want for an anti-virus filter. However, if for some reason we wanted this
filter to run only for incoming mail, we would have to remove the content_filter statement from
main.cf (that's global, remember: in and out) and redefine the smtp service as two unique
instances: one for inbound and one for outbound. This needs to be done by port or by IP address,
which tends to confuse people who are running with one physical interface.
The easiest way to handle this is to have two IP addresses. You either need two physical
interfaces, or you can multi-home a single interface. Then, you can add a line in master.cf that
applies only to the outbound interface (the outward-facing address that receives mail from the
outside world):
<outbound ip>:smtp inet n - y - - smtpd
    -o content_filter=smtp-amavis:
The "-o" flag overrides a main.cf parameter for this one service. In this case, we'll only filter
mail arriving on the outbound (outward-facing) IP. Also note: there is never any white-space
surrounding a "=" in master.cf. The trailing ":" after the filter name is important! When nothing
follows it, it means 'for all domains'.
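One way to picture "-o" is as a per-service layer over main.cf: the service sees main.cf's values with its own overrides on top, while every other service keeps the originals. This sketch models that idea; effective_settings is a hypothetical helper, and because the command column is split on white-space, a "content_filter = x" written with spaces comes apart into separate tokens and is rejected, which is why the "=" must stand alone without white-space.

```python
def effective_settings(main_params, command_args):
    """Layer a master.cf service's "-o name=value" options over main.cf values.

    main_params: {parameter: value} from main.cf.
    command_args: the service's command column split on white-space, e.g.
        ["smtpd", "-o", "content_filter=smtp-amavis:"]
    Returns a new dict; main_params is left unchanged because the override
    applies to this one service only.
    """
    settings = dict(main_params)
    args = iter(command_args)
    for arg in args:
        if arg == "-o":
            name, sep, value = next(args, "").partition("=")
            if not sep:
                raise ValueError(f"-o option without '=': {name!r}")
            settings[name] = value
    return settings
```

Applied to the smtpd line above, only that listener ends up with content_filter set to "smtp-amavis:"; main.cf's own value, and every other service, is untouched.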
Another way to handle different routes for different filters is to use different ports.
Naturally, this depends on the filter itself and the protocol involved. For example, while
server-to-server SMTP would be tough to move from the default of port 25, you can certainly tailor
your submission port to suit your needs.
The third way is a bit more dynamic. Postfix will let you filter on the fly by using the access
table or header_checks and body_checks rules.
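header_checks rules are usually written as a regexp table: one /pattern/ per line followed by an action such as REJECT or DISCARD, with the first matching line winning and matching case-insensitive unless the "i" flag is given. This sketch imitates that behavior so you can test rules before loading them. It is only an approximation: Postfix uses POSIX (or PCRE) expressions, Python's re module is a close but not identical stand-in, and load_rules and check_header are hypothetical helpers.

```python
import re

RULE_RE = re.compile(r"^/(.*)/([im]*)\s+(.+)$")


def load_rules(text):
    """Parse regexp-table lines of the form /pattern/flags ACTION [text]."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE_RE.match(line)
        if not match:
            raise ValueError(f"unparseable rule: {line!r}")
        pattern, flags, action = match.groups()
        # Case-insensitive by default; the "i" flag toggles that off.
        re_flags = 0 if "i" in flags else re.IGNORECASE
        if "m" in flags:
            re_flags |= re.MULTILINE  # "m" toggles multi-line mode
        rules.append((re.compile(pattern, re_flags), action))
    return rules


def check_header(rules, header):
    """Return the action of the first rule matching header, or None."""
    for regex, action in rules:
        if regex.search(header):
            return action
    return None
```

With a rule like `/^Subject:.*make money fast/ REJECT no thanks`, a header "Subject: MAKE MONEY FAST now" returns "REJECT no thanks", while an ordinary subject returns None and the mail passes.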
A great feature of Postfix is that it performs filtering after mail is in the queue, but before
it gets delivered. This way, if there's a problem with the filter, mail doesn't bounce, but simply
gets queued up. You'll see this in the logs as "status=deferred". Check your mail queue by typing
"mailq" at the command line. You'll see something like this:
# mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
8EF0949EA      710 Tue Jan 13 20:43:31
                                         (deferred transport)
-- 1 Kbytes in 1 Request.
In the case where you see "deferred", you need to take care of the filter, or remove it by
commenting it out of both main.cf and master.cf. Remember from last month, a 'deferred' condition
can also apply to Cyrus' deliver agent not being able to, er....deliver!
If you can fix your filter (a socket-based filter may simply need to be restarted; a pipe-based
filter may need some re-coding), great. You should then only need to run postqueue -f to flush the
queue and have Postfix make re-delivery attempts.
If you've removed the filter (temporarily, right?), you need to re-queue the mail, otherwise it
will keep trying to deliver itself through the old (now non-existent) channel. Do this, as root (or
use sudo), with the postsuper command: postsuper -r ALL, followed by a postfix reload. Watch the
logs and your mail should start flowing freely. Follow that up by another mailq.
Learning To Fly
This isn't the end - we're just learning to fly. Hopefully these troubleshooting guides help you
find some of the trouble in an e-mail system should you run into it. As I say each month: watch the
logs! They are the heartbeat of your system, and they'll let you know immediately when there is a
problem.
Naturally, there can be other problems that crop up in the OS X mail system: Mailman issues,
other custom work or installations that may grab a port on you, etc. The best way to learn how
something works is to watch it fail - and then reinforce that by diving in and fixing it!
Next month, we're going to take a break from e-mail and DNS, and get back to something a little
more straight-forward: the Terminal, and one of my favorite utilities, "screen".
P.S. :
- Johnny Cash - Hello Again
- Joe Jackson - Sunday Papers
- Oingo Boingo - On the Outside
- Ice Cube (not .38 Special) - What can I do?
- Stereolab - Contact
- Buzzcocks - Inside
- Rush - Subdivisions (to all you ACNs that landed at YYZ for camp!)
- Pink Floyd - Learning To Fly
Ed Marczak owns and operates Radiotope, a technology consulting company that
implements mail servers and mail automation. When not typing furiously, he spends time with his
wife and two daughters. Get your mail on at http://www.radiotope.com