Wednesday, August 5, 2015

Nagios: What is it? Why should I use it?

This is the first in a longer series of Nagios-related posts.

Nagios, quite simply, is an IT monitoring application.

What does this mean?  Nagios is a network monitoring tool that can provide you with health status, uptime, availability, and a myriad of other information about all your systems, in one convenient platform.  It gives a quick and concise overview of all your systems, and proactively monitors them for up/down status, along with whatever other checks you choose to configure.  Here are a couple of screenshots of an environment I set up to monitor 2 Linux servers, the NagiosAdmin box, and 15 Windows servers.

And for a list of services being monitored:

You can have it simply display this information on a web page, or you can also have it send out alerts, so that you know when issues arise even if you are away from your desk.

From the screenshots above, we can see that I don't have my entire monitoring environment configured correctly.  On the Windows machines, IIS is being monitored correctly, but most of the machines are not reporting CPU status.  I left that configuration unfinished on purpose, so I could show an example of some checks working while others are not.

We can also see that my Linux boxes are running JBoss.  JBoss is actively being monitored, and if it were to hang, I would get an email and a text message informing me of the issue.  Disk space is also being checked on my Linux boxes.  I configured Nagios to send me a warning at 85% full and to tell me it is critical at 90% full.  linux1 is OK: its root partition is only 82% full, which is below both thresholds.  linux2, however, has an issue.  Its root partition is 87% full, which exceeds the 85% warning threshold, so it is in a warning status.  Normally Nagios would notify me of this every 30 minutes, but since this is not something I want to be constantly bugged about, I have disabled the warning notifications for just the Disk Space Check service.
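
The warning/critical logic described above can be sketched in a few lines of Python.  This is only an illustration, not how the Nagios disk check is actually implemented.  The names `classify_usage` and `status_line` are made up for this post, and I am assuming that a value sitting exactly on a threshold counts as having reached it.  The return codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
# Hypothetical helpers for illustration; not part of Nagios itself.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_usage(percent_used, warn=85.0, crit=90.0):
    """Map a disk-usage percentage to a Nagios-style return code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def status_line(host, percent_used, warn=85.0, crit=90.0):
    """Return (code, text) in the one-line style a check would report."""
    code = classify_usage(percent_used, warn, crit)
    text = f"DISK {LABELS[code]} - {host} root partition {percent_used:.0f}% full"
    return code, text


if __name__ == "__main__":
    # The two Linux boxes from the screenshots: 82% and 87% full.
    for host, used in (("linux1", 82), ("linux2", 87)):
        code, text = status_line(host, used)
        print(code, text)
```

With the thresholds from this post, linux1 (82%) comes back as OK and linux2 (87%) as WARNING, which matches what the services screenshot shows.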

At first glance, Nagios may not seem like the prettiest product out there, and I'll be honest, I've seen things that look a lot better.  That being said, there are a lot of third-party add-ons that can improve the look and feel of the end-user experience, but we won't go into those at this point.

So far we've talked a little about what Nagios is.  Next we will talk about why to use it.

While there are many different applications out there that can do the same or a similar job as Nagios, there are five main reasons why I like it:
  1. Nagios Core is FREE

    While Nagios has a paid version, which comes with vendor support and some additional bells and whistles, it also offers a free version of its software.  This allows small and medium businesses (and even larger ones, if they wish to invest the time and expertise) to have an industry-standard network, application, and server monitoring application without having to fork out tons of money.

  2. Nagios Monitoring is System Agnostic

    Nagios, while designed to run on Linux for the monitoring server, can monitor any type of operating system, whether it be Windows, Linux, or OS X.  This gives an admin the flexibility of a one-stop shop for monitoring and taking care of their systems.

  3. Nagios is Proactive

    What do I mean by "Nagios is Proactive"?  What I mean is that once you have set up and configured Nagios, it will contact you however you have configured it (for myself, I have set up both email and text message) to let you know if there is a problem with a service you are monitoring, and which server is affected.  If the server comes back up, whether on its own or through the work of another admin, it will notify you of that too.  This allows a SysAdmin to go about their other work without having to constantly check application and server status to make sure that things are happy.  I don't know about others, but that is a huge stress relief and time saver for me.

  4. Nagios can Leverage the Systems Being Monitored

    Via add-ons like NRPE (Nagios Remote Plugin Executor), which runs checks on the monitored machine itself, you can monitor system resources that would normally not be visible to a scan over the network.  What I mean by this is that you can monitor the client's:
    • CPU Usage
    • Memory Usage
    • Disk Space
    • Applications and Processes
    • Anything else you can imagine that you can code/script up

  5. Nagios is Modular

    The problem that a lot of people have with Linux and Open Source projects is that they can quickly get very confusing, and a lot of applications have a single one-stop configuration file that can easily become a multi-thousand-line file that is a bear to manage.  The Nagios project has done away with the single configuration file and has modularized the system to allow for multiple configuration files.

    Let's set up the following scenario to see how this is helpful.

    Imagine you are a Systems Administrator, and you monitor 100 servers.  You have 20 Linux boxes and 80 Windows boxes that you take care of.  These servers support 10 different applications, with 2 Linux and 8 Windows boxes for each application.  If you were to put all 100 servers in one configuration file, with 10 lines per server declaration (more details on how this is done will come in a future post), that would quickly give you a 1000-line file to deal with, and managing all your hosts would be difficult.  Plus, that's only the server declarations.  It doesn't include any monitoring definitions, which could easily span many thousands more lines of configuration.

    Instead, Nagios allows the admin to use the /etc/nagios/nagios.cfg file to specify where other config files and config directories are.  This lets us create a file structure that is easy to administer.  We can house our server declarations in multiple files like so:
    • /etc/nagios/server/group1.cfg
    • /etc/nagios/server/group2.cfg
    • /etc/nagios/server/group3.cfg
    • etc...
    And then we can house our monitoring definitions in the same way:
    • /etc/nagios/config/group1/linux.cfg
    • /etc/nagios/config/group1/windows.cfg
    • /etc/nagios/config/group2/linux.cfg
    • /etc/nagios/config/group2/windows.cfg
    • /etc/nagios/config/group3/linux.cfg
    • /etc/nagios/config/group3/windows.cfg
    • etc...
    As we can see, if we need to change something in the declaration of a server in group 1, it is easy to find the file associated with that server group.  If we then need to add a monitoring definition to the group 2 Linux servers, we can very quickly go in and edit the configuration for that group, without running the risk of affecting another group.
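
To make the layout above a little more concrete, here is a small Python sketch that prints the `cfg_file=` lines the main nagios.cfg would need in order to pull in those files.  `cfg_file` is a real nagios.cfg directive, but everything else here is an assumption for illustration: the helper name `render_cfg_lines` is made up, and the group names and paths simply mirror the example paths in this post.

```python
# Sketch only: render_cfg_lines is a made-up helper, not a Nagios tool.
# The directory layout mirrors the example paths used in this post.
from pathlib import PurePosixPath

NAGIOS_ETC = PurePosixPath("/etc/nagios")


def render_cfg_lines(groups, platforms=("linux", "windows")):
    """Return the cfg_file= lines nagios.cfg would need for this layout."""
    lines = []
    # One file of server declarations per group.
    for group in groups:
        path = NAGIOS_ETC / "server" / f"{group}.cfg"
        lines.append(f"cfg_file={path}")
    # One file of monitoring definitions per group and platform.
    for group in groups:
        for platform in platforms:
            path = NAGIOS_ETC / "config" / group / f"{platform}.cfg"
            lines.append(f"cfg_file={path}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_cfg_lines(["group1", "group2", "group3"])))
```

Nagios also accepts a `cfg_dir=` directive, which loads every .cfg file under a directory, so a single `cfg_dir=/etc/nagios/server` line could stand in for the per-file entries if you prefer not to list each file.
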

So to sum things up, Nagios is a network and application monitoring utility that can proactively monitor your environment.  While there are paid versions of Nagios, Nagios Core is free to use and can be configured to monitor just about anything you want.

While there are other utilities out there which perform a similar function to Nagios, and many of them work splendidly, I have decided to focus on Nagios for the time being, as it is free to use for any individual or business, supports cross-platform monitoring, and is modular in its approach to configuration.
