Nagios on OS X, Part 2

Volume Number: 22 (2006)
Issue Number: 2
Column Tag: Programming

Patch Panel

Nagios on OS X, Part 2

by John C. Welch

Setting Up Nagios 2.0

Since part one of this article was published, there have been some changes in Nagios, (the reasons behind the delay for part two in fact.) Nagios 2 has reached the release candidate stage, and as such, I felt that this part should deal with what (should) be the current version by the time you read this. Luckily, this doesn't really change anything in part 1 other than the version of Nagios you download and install, and one minor change for the configure step of installing Nagios.

Addendum and Errata

The change involves an additional group, the nagios command group. I use nagioscmd for the name of this group, and you create it as you did the nagios group in part 1. This brings us to a rather obvious error in part one, and one I should have caught. The configure command in part one is incorrect, and if it works at all, will give you an incorrect setup. With Nagios 2 in mind, the correct configure command is:

./configure --with-gd-lib=/opt/local/lib --with-gd-inc=/opt/local/include 
   --prefix=/usr/local/nagios --with-cgiurl=/cgi-bin --with-htmlurl=/ --with-nagios-user=nagios 
   --with-nagios-group=nagios --with-command-group=nagioscmd

That should work correctly for you. The rest of part one should be unchanged for you, it has been for me.

Initial Configuration

One change in Nagios 2, and one that will be welcomed by administrators new to Nagios is the initial configuration. In Nagios 1.2 and later, you had to use multiple configuration files, and getting them set up, and grasping the relationship between them was a little tricky. With Nagios 2, if you're new to Nagios, or you want to play with a smaller setup before you get into mapping the entire Internet, you can now use a much smaller number of config files (around 3 for a minimal configuration). In fact, you have your choice of two templates to work from, one called "minimal.cfg-sample" and one called "bigger.cfg.sample". As the names imply, they are for minimal, and slightly bigger Nagios setups.

In case I haven't done this, let me stress one critical point: This article is not, nor should it be taken as a replacement for the Nagios documentation! That documentation is far more complete, and if this article disagrees with the Nagios docs, the Nagios docs should be assumed to be more correct, unless they are assuming !Mac OS X.

The Config Files

As I have alluded to, Nagios makes use of text config files for its setup and to do its work. They can seem daunting at first, but they do follow a logical flow and they should not intimidate you in the least. When you initially install the config files, they all have the name pattern of <something>.cfg-sample. When you have a file set up as you like, remove the -sample from the end of the name, and Nagios will be able to use it. Note that in general, any changes to a config file will probably require a restart of the Nagios process for those changes to be used.

Basic Nagios config theory

Before we get into specifics of the files, we need to look at the relationship of things in Nagios, so that we might have a better mental picture of what's going on. The basic 'unit' in Nagios is the host. A host is a computer, a router, switch, etc. It's anything that Nagios directly probes. Each host has a series of characteristics, like IP address, and name that define it to Nagios. Since you can have hundreds, if not thousands of hosts in a network, and applying settings and services to them individually would be rather tedious, you have hostgroups. Hostgroups are just what they sound like, a collection of hosts that allow you to more easily work with hosts. So you can apply a probe to a single hostgroup instead of 300 hosts. Easier, no?

Now, since in addition to monitoring, we have to notify people we have contacts. Contacts are the human versions of hosts. You can use email, paging, whatever method you can script Nagios to use to notify contacts. As we'll see, you can also set up working hours and non-working hours for notifications too, so non-critical notifications aren't paging people at 2am. Like hostgroups, we have contactgroups, since in larger organizations, you may have quite a few people who need to receive notifications from the same host or hostgroup.

The actual probes Nagios uses are checkcommands, and that's what determines the information that Nagios checks. However, you don't directly apply the checkcommands, rather you use services, which use the checkcommand definition, and other parameters to probe host(groups). This makes it easier to have many checks per host.

Another basic concept is the dependency. This can be used to make sure that you're not getting spurious alerts. For example, if you have a monitored switch that has 24 monitored hosts connected to it and the switch goes down, you'd potentially get alerts from all 25 hosts and however many services are being monitored on each host. If the switch is down, you can't talk to the hosts anyway, so those alerts are somewhat useless, as what they're really telling you is that the hosts are unreachable, not that they're actually malfunctioning. So, we use dependencies to say "If Switch A is down/unreachable, don't bother me with alerts from its attached hosts." You can have both service and host dependencies, and both are critical to a happy Nagios installation.

Finally we have extended info, for both hosts and services which can be thought of as metadata. This is useful for things like 3-D icons, connecting Nagios to various graphing utilities, etc.

nagios.cfg

The nagios.cfg file is the heart of Nagios. Without it, nothing works. So, understanding it is critical. Luckily, while it's long, it's really pretty simple, and well-documented. (I really have to put in a commendation to the Nagios team for their documentation. It's an excellent example of how to do useful documentation, and far more projects, commercial and open source both could do well to learn from Nagios' example.)

The nagios.cfg file is really a listing of other configuration files and settings that apply to Nagios as a whole. For example this line:

log_file=/usr/local/nagios/var/nagios.log

Tells Nagios where its log file lives. That's the default from a new Nagios installation. Going down nagios.cfg, we see the entries for the checkcommands file, the misccommands file, and the minimal.cfg file, which is the very basic Nagios configuration file, and handy for new users. As you go down the list, we see that you can get quite complex with the configuration files, giving you the flexibility to grow your Nagios installation to as large as you need to be. (At the high end, you can have clusters of Nagios servers all talking to each other. We're not going to get into those here.)

Other entries in nagios.cfg include the user and group Nagios runs as, and whether external commands are to be used. (note that to use the Web interface, you have to set check_external_commands=1 in nagios.cfg) One of the nicest things about Nagios is that pretty much every entry in nagios.cfg is nicely commented, so you don't have to guess at what any of them does. Since this is just a basic intro to Nagios, we only need to enable external commands. We'll leave the rest alone.

minimal.cfg

This file is (obviously) a minimal config file that will let you get Nagios up and running with a minimum of work. While it's not going to be one you'll want to use for large installations, it has everything you need to get started.

The first section defines time periods, (can be separately defined in timeperiods.cfg). While minimal.cfg only defines the "24x7" time period, you can create others to suit your own needs, (working_hours, non_working_hours, weekends, etc). The syntax is pretty self explanatory; a 'define timeperiod' block, containing a name used by other config files, an alias that can be more descriptive, contain spaces, etc, and the days in the timeperiod with hours, in 24hr time, that each day covers in the timeperiod, one per line. You can create more if you like, or just use the 24x7 one.

The next section defines the commands Nagios uses to talk to hosts and hostgroups, (can be separately defined in checkcommands.cfg and misccommands.cfg. misccommands is used for things like notification commands and other commands that don't directly use a nagios command plugin). This is where you define the commands that make up services. Looking at this section, we see the checkcommands.cfg file has some commands already set up. We'll skip down past the first two notification commands, to the check-host-alive command, as it's simpler to explain. The basic syntax is simple:

# Command to check to see if a host is "alive" (up) by pinging it--
a comment for your use
define command{ 
--start the command definition block
   command_name   check-host-alive --the name you refer to the command
-- as in the rest of Nagios
   command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w 99,99% -c 100,100% -p 1 
--the actual command
   }

The command string is pretty straightforward. The $USER1$ macro, defined in resource.cfg, is the path to the plugin directory, normally /usr/local/nagios/libexec. You can use the $USERx$ macros to define all kinds of commands, like SNMP community strings, etc. You can have up to 32 of them, and they make life a lot easier. The check_tcp is the actual command executable name, (in general, Nagios command plugins are of the form check_<functionname>.), followed by various switches. A common one is the "-H" switch, which is the host address. Rather than entering it manually for every host, we use the $HOSTADDRESS$, which is defined in the host entry for minimal.cfg, and we'll see that in a bit. The rest of the switches are command-specific, and are explained by the commands help function. You can bring this up in terminal by running /usr/local/nagios/libexec/commandname -h in Terminal, and this will bring up the command's syntax definitions. I've yet to run into a command where this didn't work. Before using a command on a host or hostgroup, it's a good idea to use -h to make sure you know how the command works and what it is going to tell you.

Next up is the contact definition, (also defined by contacts.cfg), which tells Nagios who to notify when it needs to. Since minimal.cfg is by design a simple setup file, there's only one entry here, although you can put more in if you like. The contact_name is how you refer to the contact in the rest of Nagios, the alias is there for more human-friendly labeling. Note that you have separate host and service notification periods. While they're the same in this default, there are cases where you may want them to be different. For example, a backup service that only runs on the weekends wouldn't need 24x7 monitoring, but the host it runs on would. So you could set up a "backup_admin" contact that only received service notifications during a "weekend" time period.

The service and host notification options lines are nothing but a set of switches defined thusly:

d = send notifications on a DOWN state

u = send notifications on an UNREACHABLE state

r = send notifications on recoveries (OK state)

f = send notifications when the host starts and stops flapping

n = no host notifications will be sent out

w = send notifications on a WARNING state

u = send notifications on an UNKNOWN state

c = send notifications on a CRITICAL state

Note that "u" can have different definitions depending what kind of notifications you have. "Flapping" is the Nagios definition for a host that is changing states too often. To avoid this you can enable and configure "flap detection" in nagios.cfg. Flap detection can be quite useful if you have a balky host or service, or if you're having other problems causing hosts or services to look like they're coming up and down a lot.

The service_notification_commands and host_notification_commands lines are how you need to be notified for service and host alerts, (email in this case), and then you have a line for the email address you wish to use for the notifications.

The Contact Groups section follows, (defined separately in contactgroups.cfg), and is, obviously, where you create contact groups. The syntax is similar to the contacts configuration, (you'll note that Nagios uses as many common terms as possible in its config files, which makes things much easier on you) with the "members" line being a comma-delimited list of contacts that will be notified when that particular contact group is notified.

Next up is the Hosts section, (defined separately in hosts.cfg). This is the section where you tell Nagios what to monitor. The host definition has, by necessity, a largish list of terms, even in minimal.cfg:

define host{
   use                     generic-host; Name of host template to use
   host_name               localhost
   alias                   localhost
   address                 127.0.0.1
   hostgroups              test
   check_command           check-host-alive
   max_check_attempts      10
   notification_interval   120
   notification_period     24x7
   notification_options    d,r
   contact_groups          admins
        }

The use line tells the defintion which template to use, in this case, "generic-host", defined just above the host definitions in minimal.cfg. The templates are handy for defining values that are going to be common to a host or group of hosts, (not a hostgroup), so that your host definitions don't have to be needlessly long.

The host_name is how you refer to the host in the rest of Nagios, the alias is for a more human-friendly label. The address is what is used by the $HOSTADDRESS$ macro we saw in the command definitions earlier. It can be a FQDN DNS name, or an IP address. For servers, I prefer the IP address wherever possible, since that way, the DNS service on my network dropping doesn't kill Nagios' ability to find hosts. One line not in the default, but that I included here is hostgroups. You can, if you like, define a host's hostgroups in the host definition or a separate hostgroup definition. I recommend picking one and sticking with it to avoid confusion. The check_command line is a single command that you can use as the basic "is it up or not command". This isn't where you set up all the services you check on a host, we'll look at that later. This is just a default command for a given host. max_check_attempts is how many times Nagios will retry the check_command if the result is anything other than OK. (This only applies to the check command in the host definition, not all services running against that host).

notification_interval is how many time units, (default is minutes) that Nagios will wait to send out notifications of a host that is still down or unreachable. That's continuously down. If the host goes up and comes back down, that's different. notification_period is the timeperiod that notifications for this host are allowed, and use the timeperiod(s) set by you. notification_options determine the conditions that notifications are sent out. Usually, you want at least d,r, so that you know when a host goes down, and if it comes back up by itself. contact_groups are self-evident, they're the groups who get notified. If you want multiple contact groups, then use a comma-delimited list. Please note that this example is not a complete list of host parameters by any means, and you should consult the Nagios documentation for a full list.

Since we just defined the host, we should next define the host group the host belongs to, and that's the next section, (separately in hostgroups.cfg):

define hostgroup{
   hostgroup_name    test
   alias             Test Servers
   members           localhost
        }

As you can see, hostgroups look a lot like contact groups. The members parameter is a comma-delimited list of hosts if multiple hosts are used.

The final section in minimal.cfg is services, (defined separately in services.cfg). This is where you really get into the meat of Nagios. Services are how you apply commands to multiple hosts with a single entry, and notify multiple contacts or contact groups. Services can be any command Nagios knows about, and you can get quite specific. For example, while there's no specific command to check the KDC status on OS X Server, I was able to do so by using SNMP to check for the KDC process by using the following command definition:

#check_kdc_process_via_snmp command definition
define command{
        command_name    check_kdc_process_via_snmp
        command_line    $USER1$/check_proc_by_snmp $HOSTADDRESS$ $USER3$ $USER9$
        }

and wrapping it in a service:

define service{
   use   generic-service         ; Name of service template to use
        
   host_name   xserve01
   service_description   SNMP KDC Process Check
   is_volatile   0  
   check_period   24x7
   max_check_attempts   3
   normal_check_interval   3
   retry_check_interval   1
   contact_groups   xserve-admins,nt-admins
   notification_interval   120
   notification_period   24x7
   notification_options   w,u,c,r
   check_command                   check_kdc_process_via_snmp
        }

Like host definitions, service definitions have a template option, so you can set common parameters once and apply them to all the services that use this template. Let's take a look at the default "PING" definition in minimal.cfg:

define service{
   use                     generic-service         
; Name of service template to use
   host_name               localhost
   service_description     PING
   is_volatile             0
   check_period            24x7
   max_check_attempts      4
   normal_check_interval   5
   retry_check_interval    1
   contact_groups          admins
   notification_options    w,u,c,r
   notification_interval   960
   notification_period     24x7
   check_command           check_ping!100.0,20%!500.0,60%
        }

If we compare the service to the host definitions, we see that they're quite similar. The use parameter is the template the definition uses. The host_name paramter is a comma-delimited list of hosts this service runs giants. The max_check_attempts, notification_interval, and notification_periods are the same as for the host definition. The is_volatile parameter normally doesn't apply to services, and should be left at its default unless you have a need to change it. The normal_check_interval setting is how many time units to wait between regular checks of a service. By default, this is set to every five minutes. You can increase the frequency of checks, but this will also increase the load on your server and network, and should be done with caution. The retry_check_interval is how long to wait before scheduling a 're-check' which only happens in the case of a non 'OK' return from a check. The contact_groups parameter is exactly the same as for the host definition. The check_command is the name of the command as defined in the checkcommand definition, and any additional parameters the command needs.

That, in a nutshell is minimal.cfg, and almost all the settings you need to get started with Nagios. However, there are still a couple more files to set up before we're ready to start Nagios and get monitoring.

cgi.cfg

This is the config file that controls how Nagios talks to the CGIs and who can access which CGIs. This file isn't too complicated, but it has to be correct for Nagios' web interface to work correctly. Like all Nagios files, it's well commented. Running down the entries, most are self explanatory, so I won't comment on all of them. One of the ones that can catch you off guard is the url_html_path entry. Remember, with Mac OS X Server, that's going to be the root defined for the site in Server Admin, so you want that to point at the path defined by the physical_html_path parameter just above it.

The use_authentication parameter and the access control sections that follow it are critical, and you really, really want to read the Nagios documentation on how they work. If you are using a lot of external commands that can reboot hosts, etc, (all possible with Nagios), properly configuring your CGI access controls is critical to the security of your Nagios installation.

The statuswrl_include parameter is if you want to create a 3-D VRML 'flythrough' view of your network. It's not really any more useful than any of the 2-D views, but it's pretty cool for corner office types. The rest of the options can be left alone for an initial installation.

Testing Your Configuration

Of course you read all the docs and did everything right, but just in case, Nagios gives you a way to test your config, via the -v option for the nagios executable. The syntax is <path to the nagios executable> -v <path to nagios.cfg>. So for our example, we'd use:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If we did everything right, Nagios will tell us, and we can start it up. If there are config file errors, Nagios will do its best to give you the file name and the line number with the error. This info has always been fairly accurate in my experience, so just look where Nagios tells you, and you should be able to find any errors quickly.

If you didn't get any errors, then let's start nagios as a daemon. This is done by using the -d switch as below:

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Once that's done, run top to make sure Nagios is running. If it is, congratulations, you have a working Nagios installation. If not, run Nagios with the -v option to see what you may have missed. Checking system.log can help here as well.

Conclusion

Well, we've gone over a very basic Nagios configuration setup guide, (and corrected some errors from part 1). In the third part of this, we'll take a look at the actual interface, and get an idea of what we're looking at when we check on Nagios, along with some of the email notifications you might get. Thanks!

Bibliography and References

There are two sites that you really must get familiar with to use Nagios. http://www.nagios.org/ is the main Nagios site, and has tons of excellent information for you to use. The other is http://nagiosexchange.org, the biggest collection of Nagios plugins you'll find anywhere.

John Welch jwelch@bynkii.com is Unix/Open Systems administrator for Kansas City Life Insurance, (http://www.kclife.com/) a Technical Strategist for Provar, (http://www.provar.com/) and the "GeekSpeak" segment producer for Your Mac Life, (http://www.yourmaclife.com/). He has over fifteen years of experience at making Macs work with other computer systems. John specializes in figuring out ways in which to make the Mac do what nobody thinks it can, showing that the Mac is a superior administrative platform, and teaching others how to use it in interesting, if sometimes frightening ways. He also does things that don't involve computertry on occasion, or at least that's the rumor.

Software Updates via MacUpdate

Latest Forum Discussions

Combo Quest (Games)

Combo Quest 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: Combo Quest is an epic, time tap role-playing adventure. In this unique masterpiece, you are a knight on a heroic quest to retrieve... | Read more »

Hero Emblems (Games)

Hero Emblems 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: ** 25% OFF for a limited time to celebrate the release ** ** Note for iPhone 6 user: If it doesn't run fullscreen on your device... | Read more »

Puzzle Blitz (Games)

Puzzle Blitz 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Puzzle Blitz is a frantic puzzle solving race against the clock! Solve as many puzzles as you can, before time runs out! You have... | Read more »

Sky Patrol (Games)

Sky Patrol 1.0.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.1 (iTunes) Description: 'Strategic Twist On The Classic Shooter Genre' - Indie Game Mag... | Read more »

The Princess Bride - The Official Game...

The Princess Bride - The Official Game 1.1 Device: iOS Universal Category: Games Price: $3.99, Version: 1.1 (iTunes) Description: An epic game based on the beloved classic movie? Inconceivable! Play the world of The Princess Bride... | Read more »

Frozen Synapse (Games)

Frozen Synapse 1.0 Device: iOS iPhone Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Frozen Synapse is a multi-award-winning tactical game. (Full cross-play with desktop and tablet versions) 9/10 Edge 9/10 Eurogamer... | Read more »

Space Marshals (Games)

Space Marshals 1.0.1 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.1 (iTunes) Description: ### IMPORTANT ### Please note that iPhone 4 is not supported. Space Marshals is a Sci-fi Wild West adventure taking place... | Read more »

Battle Slimes (Games)

Battle Slimes 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: BATTLE SLIMES is a fun local multiplayer game. Control speedy & bouncy slime blobs as you compete with friends and family.... | Read more »

Spectrum - 3D Avenue (Games)

Spectrum - 3D Avenue 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: "Spectrum is a pretty cool take on twitchy/reaction-based gameplay with enough complexity and style to stand out from the... | Read more »

Drop Wizard (Games)

Drop Wizard 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Bring back the joy of arcade games! Drop Wizard is an action arcade game where you play as Teo, a wizard on a quest to save his... | Read more »

Price Scanner via MacPrices.net

Our MacBook Price Trackers will show you the...

Our Apple award-winning MacBook Price Trackers are continually updated with the latest information on prices, bundles, and availability for 16″ and 14″ MacBook Pros along with 13″ and 15″ MacBook... Read more

Amazon is offering a 10% discount on Apple’s...

Don’t pay full price! Amazon has 16-inch M4 Pro MacBook Pros (Silver and Black colors) on sale today for 10% off Apple’s MSRP. Shipping is free. These are the lowest prices currently available for 16... Read more

13-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 13″ M4 MacBook Airs on sale for $150 off MSRP right now, starting at $849. Sale prices apply to most colors and configurations. Be sure to select Amazon as the seller, rather than a... Read more

15-inch M4 MacBook Airs on sale for $150 off...

Amazon has new 15″ M4 MacBook Airs on sale for $150 off Apple’s MSRP, starting at $1049. Be sure to select Amazon as the seller, rather than a third-party: – 15″ M4 MacBook Air (16GB/256GB): $1049, $... Read more

Amazon is offering a $50 discount on Apple’s...

Amazon has Apple’s 11th-generation A16 iPads in stock on sale for $50 (or a little more) off MSRP this week. Shipping is free: – 11″ 11th-generation 128GB WiFi iPads: $299 $50 off MSRP – 11″ 11th-... Read more

Clearance 13-inch M1 MacBook Airs available f...

Walmart has clearance, but new, Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) available online for $649, $360 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBooks for... Read more

iPad minis on sale for $100 off Apple’s MSRP...

Amazon is offering $100 discounts (up to 20% off) on Apple’s newest 2024 WiFi iPad minis, each with free shipping. These are the lowest prices available for new minis among the Apple retailers we... Read more

AirPods Max headphones on sale for $479, $70...

Amazon has AirPods Max with USB-C on sale for $479.99 in all colors. Shipping is free. Their price is $70 off Apple’s MSRP, and it’s the lowest price available today for AirPods Max. Keep an eye on... Read more

14-inch M4 Pro/M4 Max MacBook Pros on sale th...

Don’t pay full price! Get a new 14″ MacBook Pro with an M4 Pro or M4 Max CPU for up to $320 off Apple’s MSRP this weekend at these retailers…they are the lowest prices available for these MacBook... Read more

Get a 15-inch M4 MacBook Air for $150 off App...

A couple of Apple retailers are offering $150 discounts on new 15″ M4 MacBook Airs this weekend. Prices at these retailers start at $1049: (1): Amazon has new 15″ M4 MacBook Airs on sale for $150 off... Read more

Jobs Board

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech

Patch Panel

Nagios on OS X, Part 2

Addendum and Errata

Initial Configuration

The Config Files

Testing Your Configuration

Conclusion

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board