Nagios
I just started playing with Nagios, an open-source monitoring software package (GPL). I used to use monit instead, but there are two limitations of monit that made me switch:
- It can only do port/protocol checks on remote hosts
- It has no tolerance setting for check failures (it sends a warning as soon as there is one failure)
Note: I'm still using monit for process checks, as Nagios can't do that as well as monit does (monit uses the information in the lockfile to see if the process is still in memory, and uses user-defined commands to restart the process if it is not in memory).
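To make the idea concrete, here is a minimal Python sketch of that kind of process check. It is not monit's actual code: the function names are made up for illustration, it assumes the file holds the process ID as plain text, and it assumes a POSIX system where signal 0 only tests whether a process exists.

```python
import os
import subprocess


def process_is_running(pidfile):
    """Return True if the PID stored in pidfile refers to a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pidfile: treat as not running.
        return False
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def ensure_running(pidfile, restart_command):
    """Run restart_command (an argv list) if the process is gone.

    Returns True when a restart was attempted, False otherwise.
    """
    if process_is_running(pidfile):
        return False
    subprocess.run(restart_command, check=False)
    return True
```

A call such as `ensure_running(pidfile, restart_command)` would be run periodically, with the pidfile path and the restart command supplied by the user for each watched service.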
Here is how Nagios works, basically:
- The tools that do the checks are called plugins
- Objects have to be defined (timeperiods, hosts, contacts, services, commands), and groups of objects can be created.
- It uses smart checking algorithms so that your server doesn't do 1000 checks at 13:48 and 3 checks at 13:56
- It can run a command in some situations (restart apache, for example)
- It does several port/protocol checks on remote servers
- It uses nrpe (check_nrpe) to perform "local" checks on remote servers
- When a check fails 4 times, it sends a notification (email)
- When it has notified 3 times and the problem is still not solved or acknowledged, it escalates the notifications (sends to my cell as well).
- For some servers, I allow nagios to wake me up. For some others, they can only send messages to my cell between 7:30 and 22:00 (using timeperiods)
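The notification rules in the last three points can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not how Nagios is implemented or configured: all names are hypothetical, every failed check past the threshold produces a notification (real Nagios re-notifies on its own interval), and recovery notifications are left out.

```python
from dataclasses import dataclass
from datetime import time

MAX_CHECK_ATTEMPTS = 4  # consecutive failures before the first notification
ESCALATE_AFTER = 3      # notifications sent before the cell phone is added


@dataclass
class ServiceState:
    failures: int = 0        # consecutive failed checks
    notifications: int = 0   # notifications sent for the current problem
    acknowledged: bool = False


def in_timeperiod(now, start=time(7, 30), end=time(22, 0)):
    """True if now falls inside the allowed window (07:30 to 22:00 by default)."""
    return start <= now <= end


def handle_check(state, ok, now, cell_always=False):
    """Record one check result and return the contacts to notify."""
    if ok:
        # Recovery: forget the previous problem entirely.
        state.failures = 0
        state.notifications = 0
        state.acknowledged = False
        return []
    state.failures += 1
    if state.failures < MAX_CHECK_ATTEMPTS or state.acknowledged:
        # Still within tolerance, or someone is already on it.
        return []
    state.notifications += 1
    contacts = ["email"]
    escalated = state.notifications > ESCALATE_AFTER
    if escalated and (cell_always or in_timeperiod(now)):
        contacts.append("cell")
    return contacts
```

Here `cell_always=True` stands for the servers that are allowed to wake me up, while the default restricts cell messages to the timeperiod.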
Future plans: Failover
Using nsca, it is possible to have a "standby" Nagios server that detects when the "master" Nagios server is down and performs the checks and notifications during this downtime. nsca lets this "standby" server keep all the information about previous checks.
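As a rough illustration of the standby idea, here is a small Python sketch. It does not use nsca or any Nagios interface. The class, its method names, and the "master is down if it has been silent for too long" rule are all assumptions made for this example.

```python
import time


class StandbyMonitor:
    """Toy model of a standby server fed with results from the master."""

    def __init__(self, max_silence=300.0, clock=time.monotonic):
        self.max_silence = max_silence  # seconds of silence tolerated (assumed value)
        self.clock = clock
        self.last_seen = clock()        # last time the master sent anything
        self.history = {}               # service name -> last status from the master

    def receive_result(self, service, status):
        """Store a check result forwarded by the master."""
        self.history[service] = status
        self.last_seen = self.clock()

    def master_is_down(self):
        """Assume the master is down once it has been silent for too long."""
        return self.clock() - self.last_seen > self.max_silence

    def should_run_checks(self):
        """The standby only checks and notifies while the master is down."""
        return self.master_is_down()
```

Because the standby keeps `history` up to date while the master is alive, it already knows the previous state of every service when it has to take over.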
BTW, I used the rpm packages from Dag Wieers, and they work fine with my CentOS 4 system. I tried installing nagios on a Fedora Core 4 machine, but the rpm packages for nagios in Fedora-extras are confusing...