TweetFollow Us on Twitter

Nagios on OS X, Part 2

Volume Number: 22 (2006)
Issue Number: 2
Column Tag: Programming

Patch Panel

Nagios on OS X, Part 2

by John C. Welch

Setting Up Nagios 2.0

Since part one of this article was published, there have been some changes in Nagios, (the reasons behind the delay for part two in fact.) Nagios 2 has reached the release candidate stage, and as such, I felt that this part should deal with what (should) be the current version by the time you read this. Luckily, this doesn't really change anything in part 1 other than the version of Nagios you download and install, and one minor change for the configure step of installing Nagios.

Addendum and Errata

The change involves an additional group, the nagios command group. I use nagioscmd for the name of this group, and you create it as you did the nagios group in part 1. This brings us to a rather obvious error in part one, and one I should have caught. The configure command in part one is incorrect, and if it works at all, will give you an incorrect setup. With Nagios 2 in mind, the correct configure command is:

./configure --with-gd-lib=/opt/local/lib --with-gd-inc=/opt/local/include 
   --prefix=/usr/local/nagios --with-cgiurl=/cgi-bin --with-htmlurl=/ --with-nagios-user=nagios 
   --with-nagios-group=nagios --with-command-group=nagioscmd

That should work correctly for you. The rest of part one should be unchanged for you, it has been for me.

Initial Configuration

One change in Nagios 2, and one that will be welcomed by administrators new to Nagios is the initial configuration. In Nagios 1.2 and later, you had to use multiple configuration files, and getting them set up, and grasping the relationship between them was a little tricky. With Nagios 2, if you're new to Nagios, or you want to play with a smaller setup before you get into mapping the entire Internet, you can now use a much smaller number of config files (around 3 for a minimal configuration). In fact, you have your choice of two templates to work from, one called "minimal.cfg-sample" and one called "bigger.cfg.sample". As the names imply, they are for minimal, and slightly bigger Nagios setups.

In case I haven't done this, let me stress one critical point: This article is not, nor should it be taken as a replacement for the Nagios documentation! That documentation is far more complete, and if this article disagrees with the Nagios docs, the Nagios docs should be assumed to be more correct, unless they are assuming !Mac OS X.

The Config Files

As I have alluded to, Nagios makes use of text config files for its setup and to do its work. They can seem daunting at first, but they do follow a logical flow and they should not intimidate you in the least. When you initially install the config files, they all have the name pattern of <something>.cfg-sample. When you have a file set up as you like, remove the -sample from the end of the name, and Nagios will be able to use it. Note that in general, any changes to a config file will probably require a restart of the Nagios process for those changes to be used.

Basic Nagios config theory

Before we get into specifics of the files, we need to look at the relationship of things in Nagios, so that we might have a better mental picture of what's going on. The basic 'unit' in Nagios is the host. A host is a computer, a router, switch, etc. It's anything that Nagios directly probes. Each host has a series of characteristics, like IP address, and name that define it to Nagios. Since you can have hundreds, if not thousands of hosts in a network, and applying settings and services to them individually would be rather tedious, you have hostgroups. Hostgroups are just what they sound like, a collection of hosts that allow you to more easily work with hosts. So you can apply a probe to a single hostgroup instead of 300 hosts. Easier, no?

Now, since in addition to monitoring, we have to notify people we have contacts. Contacts are the human versions of hosts. You can use email, paging, whatever method you can script Nagios to use to notify contacts. As we'll see, you can also set up working hours and non-working hours for notifications too, so non-critical notifications aren't paging people at 2am. Like hostgroups, we have contactgroups, since in larger organizations, you may have quite a few people who need to receive notifications from the same host or hostgroup.

The actual probes Nagios uses are checkcommands, and that's what determines the information that Nagios checks. However, you don't directly apply the checkcommands, rather you use services, which use the checkcommand definition, and other parameters to probe host(groups). This makes it easier to have many checks per host.

Another basic concept is the dependency. This can be used to make sure that you're not getting spurious alerts. For example, if you have a monitored switch that has 24 monitored hosts connected to it and the switch goes down, you'd potentially get alerts from all 25 hosts and however many services are being monitored on each host. If the switch is down, you can't talk to the hosts anyway, so those alerts are somewhat useless, as what they're really telling you is that the hosts are unreachable, not that they're actually malfunctioning. So, we use dependencies to say "If Switch A is down/unreachable, don't bother me with alerts from its attached hosts." You can have both service and host dependencies, and both are critical to a happy Nagios installation.

Finally we have extended info, for both hosts and services which can be thought of as metadata. This is useful for things like 3-D icons, connecting Nagios to various graphing utilities, etc.

nagios.cfg

The nagios.cfg file is the heart of Nagios. Without it, nothing works. So, understanding it is critical. Luckily, while it's long, it's really pretty simple, and well-documented. (I really have to put in a commendation to the Nagios team for their documentation. It's an excellent example of how to do useful documentation, and far more projects, commercial and open source both could do well to learn from Nagios' example.)

The nagios.cfg file is really a listing of other configuration files and settings that apply to Nagios as a whole. For example this line:

log_file=/usr/local/nagios/var/nagios.log

Tells Nagios where its log file lives. That's the default from a new Nagios installation. Going down nagios.cfg, we see the entries for the checkcommands file, the misccommands file, and the minimal.cfg file, which is the very basic Nagios configuration file, and handy for new users. As you go down the list, we see that you can get quite complex with the configuration files, giving you the flexibility to grow your Nagios installation to as large as you need to be. (At the high end, you can have clusters of Nagios servers all talking to each other. We're not going to get into those here.)

Other entries in nagios.cfg include the user and group Nagios runs as, and whether external commands are to be used. (note that to use the Web interface, you have to set check_external_commands=1 in nagios.cfg) One of the nicest things about Nagios is that pretty much every entry in nagios.cfg is nicely commented, so you don't have to guess at what any of them does. Since this is just a basic intro to Nagios, we only need to enable external commands. We'll leave the rest alone.

minimal.cfg

This file is (obviously) a minimal config file that will let you get Nagios up and running with a minimum of work. While it's not going to be one you'll want to use for large installations, it has everything you need to get started.

The first section defines time periods, (can be separately defined in timeperiods.cfg). While minimal.cfg only defines the "24x7" time period, you can create others to suit your own needs, (working_hours, non_working_hours, weekends, etc). The syntax is pretty self explanatory; a 'define timeperiod' block, containing a name used by other config files, an alias that can be more descriptive, contain spaces, etc, and the days in the timeperiod with hours, in 24hr time, that each day covers in the timeperiod, one per line. You can create more if you like, or just use the 24x7 one.

The next section defines the commands Nagios uses to talk to hosts and hostgroups, (can be separately defined in checkcommands.cfg and misccommands.cfg. misccommands is used for things like notification commands and other commands that don't directly use a nagios command plugin). This is where you define the commands that make up services. Looking at this section, we see the checkcommands.cfg file has some commands already set up. We'll skip down past the first two notification commands, to the check-host-alive command, as it's simpler to explain. The basic syntax is simple:

# Command to check to see if a host is "alive" (up) by pinging it--
a comment for your use
define command{ 
--start the command definition block
   command_name   check-host-alive --the name you refer to the command
-- as in the rest of Nagios
   command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w 99,99% -c 100,100% -p 1 
--the actual command
   }

The command string is pretty straightforward. The $USER1$ macro, defined in resource.cfg, is the path to the plugin directory, normally /usr/local/nagios/libexec. You can use the $USERx$ macros to define all kinds of commands, like SNMP community strings, etc. You can have up to 32 of them, and they make life a lot easier. The check_tcp is the actual command executable name, (in general, Nagios command plugins are of the form check_<functionname>.), followed by various switches. A common one is the "-H" switch, which is the host address. Rather than entering it manually for every host, we use the $HOSTADDRESS$, which is defined in the host entry for minimal.cfg, and we'll see that in a bit. The rest of the switches are command-specific, and are explained by the commands help function. You can bring this up in terminal by running /usr/local/nagios/libexec/commandname -h in Terminal, and this will bring up the command's syntax definitions. I've yet to run into a command where this didn't work. Before using a command on a host or hostgroup, it's a good idea to use -h to make sure you know how the command works and what it is going to tell you.

Next up is the contact definition, (also defined by contacts.cfg), which tells Nagios who to notify when it needs to. Since minimal.cfg is by design a simple setup file, there's only one entry here, although you can put more in if you like. The contact_name is how you refer to the contact in the rest of Nagios, the alias is there for more human-friendly labeling. Note that you have separate host and service notification periods. While they're the same in this default, there are cases where you may want them to be different. For example, a backup service that only runs on the weekends wouldn't need 24x7 monitoring, but the host it runs on would. So you could set up a "backup_admin" contact that only received service notifications during a "weekend" time period.

The service and host notification options lines are nothing but a set of switches defined thusly:

    d = send notifications on a DOWN state

    u = send notifications on an UNREACHABLE state

    r = send notifications on recoveries (OK state)

    f = send notifications when the host starts and stops flapping

    n = no host notifications will be sent out

    w = send notifications on a WARNING state

    u = send notifications on an UNKNOWN state

    c = send notifications on a CRITICAL state

Note that "u" can have different definitions depending what kind of notifications you have. "Flapping" is the Nagios definition for a host that is changing states too often. To avoid this you can enable and configure "flap detection" in nagios.cfg. Flap detection can be quite useful if you have a balky host or service, or if you're having other problems causing hosts or services to look like they're coming up and down a lot.

The service_notification_commands and host_notification_commands lines are how you need to be notified for service and host alerts, (email in this case), and then you have a line for the email address you wish to use for the notifications.

The Contact Groups section follows, (defined separately in contactgroups.cfg), and is, obviously, where you create contact groups. The syntax is similar to the contacts configuration, (you'll note that Nagios uses as many common terms as possible in its config files, which makes things much easier on you) with the "members" line being a comma-delimited list of contacts that will be notified when that particular contact group is notified.

Next up is the Hosts section, (defined separately in hosts.cfg). This is the section where you tell Nagios what to monitor. The host definition has, by necessity, a largish list of terms, even in minimal.cfg:

define host{
   use                     generic-host; Name of host template to use
   host_name               localhost
   alias                   localhost
   address                 127.0.0.1
   hostgroups              test
   check_command           check-host-alive
   max_check_attempts      10
   notification_interval   120
   notification_period     24x7
   notification_options    d,r
   contact_groups          admins
        }

The use line tells the defintion which template to use, in this case, "generic-host", defined just above the host definitions in minimal.cfg. The templates are handy for defining values that are going to be common to a host or group of hosts, (not a hostgroup), so that your host definitions don't have to be needlessly long.

The host_name is how you refer to the host in the rest of Nagios, the alias is for a more human-friendly label. The address is what is used by the $HOSTADDRESS$ macro we saw in the command definitions earlier. It can be a FQDN DNS name, or an IP address. For servers, I prefer the IP address wherever possible, since that way, the DNS service on my network dropping doesn't kill Nagios' ability to find hosts. One line not in the default, but that I included here is hostgroups. You can, if you like, define a host's hostgroups in the host definition or a separate hostgroup definition. I recommend picking one and sticking with it to avoid confusion. The check_command line is a single command that you can use as the basic "is it up or not command". This isn't where you set up all the services you check on a host, we'll look at that later. This is just a default command for a given host. max_check_attempts is how many times Nagios will retry the check_command if the result is anything other than OK. (This only applies to the check command in the host definition, not all services running against that host).

notification_interval is how many time units, (default is minutes) that Nagios will wait to send out notifications of a host that is still down or unreachable. That's continuously down. If the host goes up and comes back down, that's different. notification_period is the timeperiod that notifications for this host are allowed, and use the timeperiod(s) set by you. notification_options determine the conditions that notifications are sent out. Usually, you want at least d,r, so that you know when a host goes down, and if it comes back up by itself. contact_groups are self-evident, they're the groups who get notified. If you want multiple contact groups, then use a comma-delimited list. Please note that this example is not a complete list of host parameters by any means, and you should consult the Nagios documentation for a full list.

Since we just defined the host, we should next define the host group the host belongs to, and that's the next section, (separately in hostgroups.cfg):

define hostgroup{
   hostgroup_name    test
   alias             Test Servers
   members           localhost
        }

As you can see, hostgroups look a lot like contact groups. The members parameter is a comma-delimited list of hosts if multiple hosts are used.

The final section in minimal.cfg is services, (defined separately in services.cfg). This is where you really get into the meat of Nagios. Services are how you apply commands to multiple hosts with a single entry, and notify multiple contacts or contact groups. Services can be any command Nagios knows about, and you can get quite specific. For example, while there's no specific command to check the KDC status on OS X Server, I was able to do so by using SNMP to check for the KDC process by using the following command definition:

#check_kdc_process_via_snmp command definition
define command{
        command_name    check_kdc_process_via_snmp
        command_line    $USER1$/check_proc_by_snmp $HOSTADDRESS$ $USER3$ $USER9$
        }

and wrapping it in a service:

define service{
   use   generic-service         ; Name of service template to use
        
   host_name   xserve01
   service_description   SNMP KDC Process Check
   is_volatile   0  
   check_period   24x7
   max_check_attempts   3
   normal_check_interval   3
   retry_check_interval   1
   contact_groups   xserve-admins,nt-admins
   notification_interval   120
   notification_period   24x7
   notification_options   w,u,c,r
   check_command                   check_kdc_process_via_snmp
        }

Like host definitions, service definitions have a template option, so you can set common parameters once and apply them to all the services that use this template. Let's take a look at the default "PING" definition in minimal.cfg:

define service{
   use                     generic-service         
; Name of service template to use
   host_name               localhost
   service_description     PING
   is_volatile             0
   check_period            24x7
   max_check_attempts      4
   normal_check_interval   5
   retry_check_interval    1
   contact_groups          admins
   notification_options    w,u,c,r
   notification_interval   960
   notification_period     24x7
   check_command           check_ping!100.0,20%!500.0,60%
        }

If we compare the service to the host definitions, we see that they're quite similar. The use parameter is the template the definition uses. The host_name paramter is a comma-delimited list of hosts this service runs giants. The max_check_attempts, notification_interval, and notification_periods are the same as for the host definition. The is_volatile parameter normally doesn't apply to services, and should be left at its default unless you have a need to change it. The normal_check_interval setting is how many time units to wait between regular checks of a service. By default, this is set to every five minutes. You can increase the frequency of checks, but this will also increase the load on your server and network, and should be done with caution. The retry_check_interval is how long to wait before scheduling a 're-check' which only happens in the case of a non 'OK' return from a check. The contact_groups parameter is exactly the same as for the host definition. The check_command is the name of the command as defined in the checkcommand definition, and any additional parameters the command needs.

That, in a nutshell is minimal.cfg, and almost all the settings you need to get started with Nagios. However, there are still a couple more files to set up before we're ready to start Nagios and get monitoring.

cgi.cfg

This is the config file that controls how Nagios talks to the CGIs and who can access which CGIs. This file isn't too complicated, but it has to be correct for Nagios' web interface to work correctly. Like all Nagios files, it's well commented. Running down the entries, most are self explanatory, so I won't comment on all of them. One of the ones that can catch you off guard is the url_html_path entry. Remember, with Mac OS X Server, that's going to be the root defined for the site in Server Admin, so you want that to point at the path defined by the physical_html_path parameter just above it.

The use_authentication parameter and the access control sections that follow it are critical, and you really, really want to read the Nagios documentation on how they work. If you are using a lot of external commands that can reboot hosts, etc, (all possible with Nagios), properly configuring your CGI access controls is critical to the security of your Nagios installation.

The statuswrl_include parameter is if you want to create a 3-D VRML 'flythrough' view of your network. It's not really any more useful than any of the 2-D views, but it's pretty cool for corner office types. The rest of the options can be left alone for an initial installation.

Testing Your Configuration

Of course you read all the docs and did everything right, but just in case, Nagios gives you a way to test your config, via the -v option for the nagios executable. The syntax is <path to the nagios executable> -v <path to nagios.cfg>. So for our example, we'd use:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If we did everything right, Nagios will tell us, and we can start it up. If there are config file errors, Nagios will do its best to give you the file name and the line number with the error. This info has always been fairly accurate in my experience, so just look where Nagios tells you, and you should be able to find any errors quickly.

If you didn't get any errors, then let's start nagios as a daemon. This is done by using the -d switch as below:

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Once that's done, run top to make sure Nagios is running. If it is, congratulations, you have a working Nagios installation. If not, run Nagios with the -v option to see what you may have missed. Checking system.log can help here as well.

Conclusion

Well, we've gone over a very basic Nagios configuration setup guide, (and corrected some errors from part 1). In the third part of this, we'll take a look at the actual interface, and get an idea of what we're looking at when we check on Nagios, along with some of the email notifications you might get. Thanks!

Bibliography and References

There are two sites that you really must get familiar with to use Nagios. http://www.nagios.org/ is the main Nagios site, and has tons of excellent information for you to use. The other is http://nagiosexchange.org, the biggest collection of Nagios plugins you'll find anywhere.


John Welch jwelch@bynkii.com is Unix/Open Systems administrator for Kansas City Life Insurance, (http://www.kclife.com/) a Technical Strategist for Provar, (http://www.provar.com/) and the "GeekSpeak" segment producer for Your Mac Life, (http://www.yourmaclife.com/). He has over fifteen years of experience at making Macs work with other computer systems. John specializes in figuring out ways in which to make the Mac do what nobody thinks it can, showing that the Mac is a superior administrative platform, and teaching others how to use it in interesting, if sometimes frightening ways. He also does things that don't involve computertry on occasion, or at least that's the rumor.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Latest Forum Discussions

See All

Fresh From the Land Down Under – The Tou...
After a two week hiatus, we are back with another episode of The TouchArcade Show. Eli is fresh off his trip to Australia, which according to him is very similar to America but more upside down. Also kangaroos all over. Other topics this week... | Read more »
TouchArcade Game of the Week: ‘Dungeon T...
I’m a little conflicted on this week’s pick. Pretty much everyone knows the legend of Dungeon Raid, the match-3 RPG hybrid that took the world by storm way back in 2011. Everyone at the time was obsessed with it, but for whatever reason the... | Read more »
SwitchArcade Round-Up: Reviews Featuring...
Hello gentle readers, and welcome to the SwitchArcade Round-Up for July 19th, 2024. In today’s article, we finish up the week with the unusual appearance of a review. I’ve spent my time with Hot Lap Racing, and I’m ready to give my verdict. After... | Read more »
Draknek Interview: Alan Hazelden on Thin...
Ever since I played my first release from Draknek & Friends years ago, I knew I wanted to sit down with Alan Hazelden and chat about the team, puzzle games, and much more. | Read more »
The Latest ‘Marvel Snap’ OTA Update Buff...
I don’t know about all of you, my fellow Marvel Snap (Free) players, but these days when I see a balance update I find myself clenching my… teeth and bracing for the impact to my decks. They’ve been pretty spicy of late, after all. How will the... | Read more »
‘Honkai Star Rail’ Version 2.4 “Finest D...
HoYoverse just announced the Honkai Star Rail (Free) version 2.4 “Finest Duel Under the Pristine Blue" update alongside a surprising collaboration. Honkai Star Rail 2.4 follows the 2.3 “Farewell, Penacony" update. Read about that here. | Read more »
‘Vampire Survivors+’ on Apple Arcade Wil...
Earlier this month, Apple revealed that poncle’s excellent Vampire Survivors+ () would be heading to Apple Arcade as a new App Store Great. I reached out to poncle to check in on the DLC for Vampire Survivors+ because only the first two DLCs were... | Read more »
Homerun Clash 2: Legends Derby opens for...
Since launching in 2018, Homerun Clash has performed admirably for HAEGIN, racking up 12 million players all eager to prove they could be the next baseball champions. Well, the title will soon be up for grabs again, as Homerun Clash 2: Legends... | Read more »
‘Neverness to Everness’ Is a Free To Pla...
Perfect World Games and Hotta Studio (Tower of Fantasy) announced a new free to play open world RPG in the form of Neverness to Everness a few days ago (via Gematsu). Neverness to Everness has an urban setting, and the two reveal trailers for it... | Read more »
Meditative Puzzler ‘Ouros’ Coming to iOS...
Ouros is a mediative puzzle game from developer Michael Kamm that launched on PC just a couple of months back, and today it has been revealed that the title is now heading to iOS and Android devices next month. Which is good news I say because this... | Read more »

Price Scanner via MacPrices.net

Amazon is still selling 16-inch MacBook Pros...
Prime Day in July is over, but Amazon is still selling 16-inch Apple MacBook Pros for $500-$600 off MSRP. Shipping is free. These are the lowest prices available this weekend for new 16″ Apple... Read more
Walmart continues to sell clearance 13-inch M...
Walmart continues to offer clearance, but new, Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) online for $699, $300 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBooks... Read more
Apple is offering steep discounts, up to $600...
Apple has standard-configuration 16″ M3 Max MacBook Pros available, Certified Refurbished, starting at $2969 and ranging up to $600 off MSRP. Each model features a new outer case, shipping is free,... Read more
Save up to $480 with these 14-inch M3 Pro/M3...
Apple has 14″ M3 Pro and M3 Max MacBook Pros in stock today and available, Certified Refurbished, starting at $1699 and ranging up to $480 off MSRP. Each model features a new outer case, shipping is... Read more
Amazon has clearance 9th-generation WiFi iPad...
Amazon has Apple’s 9th generation 10.2″ WiFi iPads on sale for $80-$100 off MSRP, starting only $249. Their prices are the lowest available for new iPads anywhere: – 10″ 64GB WiFi iPad (Space Gray or... Read more
Apple is offering a $50 discount on 2nd-gener...
Apple has Certified Refurbished White and Midnight HomePods available for $249, Certified Refurbished. That’s $50 off MSRP and the lowest price currently available for a full-size Apple HomePod today... Read more
The latest MacBook Pro sale at Amazon: 16-inc...
Amazon is offering instant discounts on 16″ M3 Pro and 16″ M3 Max MacBook Pros ranging up to $400 off MSRP as part of their early July 4th sale. Shipping is free. These are the lowest prices... Read more
14-inch M3 Pro MacBook Pros with 36GB of RAM...
B&H Photo has 14″ M3 Pro MacBook Pros with 36GB of RAM and 512GB or 1TB SSDs in stock today and on sale for $200 off Apple’s MSRP, each including free 1-2 day shipping: – 14″ M3 Pro MacBook Pro (... Read more
14-inch M3 MacBook Pros with 16GB of RAM on s...
B&H Photo has 14″ M3 MacBook Pros with 16GB of RAM and 512GB or 1TB SSDs in stock today and on sale for $150-$200 off Apple’s MSRP, each including free 1-2 day shipping: – 14″ M3 MacBook Pro (... Read more
Amazon is offering $170-$200 discounts on new...
Amazon is offering a $170-$200 discount on every configuration and color of Apple’s M3-powered 15″ MacBook Airs. Prices start at $1129 for models with 8GB of RAM and 256GB of storage: – 15″ M3... Read more

Jobs Board

*Apple* Systems Engineer - Chenega Corporati...
…LLC,** a **Chenega Professional Services** ' company, is looking for a ** Apple Systems Engineer** to support the Information Technology Operations and Maintenance Read more
Solutions Engineer - *Apple* - SHI (United...
**Job Summary** An Apple Solution Engineer's primary role is tosupport SHI customers in their efforts to select, deploy, and manage Apple operating systems and Read more
*Apple* / Mac Administrator - JAMF Pro - Ame...
Amentum is seeking an ** Apple / Mac Administrator - JAMF Pro** to provide support with the Apple Ecosystem to include hardware and software to join our team and Read more
Operations Associate - *Apple* Blossom Mall...
Operations Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Cashier - *Apple* Blossom Mall - JCPenney (...
Cashier - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Blossom Mall Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.