Tuesday, August 08, 2006

Syndication

I started using syndication recently. I like the idea of reading the news in Thunderbird instead of having to visit 10 websites. Of course, you can read this blog via its Atom feed :): http://lubik.blogspot.com/atom.xml


PfSense follow-up

I have been using PfSense for a month now, so I thought I should post about it... I ran into a few problems that I'd like to share:

  • Unexpected crashes caused by a PSU that was not powerful enough
  • All my devices/computers lost their IP address (DHCP)
I solved the first problem by switching to a 1.5A PSU (I was using a 0.8A PSU). It has crashed only once since then, but my target is zero crashes (a firewall shouldn't crash).

For the second problem, I increased my lease time to 1 week. Logs show nothing about what could have caused this incident (according to the logs, PfSense's DHCP server was still serving clients correctly).

NOTE: RC2 is out; I'm trying it right now...


Nagios

I just started playing with Nagios, an open-source monitoring software package (GPL). I used to use monit instead, but there are two limitations of monit that made me switch:
  • It can only do port/protocol checks on remote hosts
  • It has no tolerance setting for check failures (it sends a warning as soon as there is one failure)
On the other hand, Nagios has tools that allow a Nagios server to perform "local" checks on remote servers over the network (check_snmp, check_nt, check_nrpe and check_by_ssh). As a side effect, it can monitor Windows servers quite well. The web interface is enough for my needs.

Note: I'm still using monit for process checks, as Nagios can't do that as well as monit does (monit uses the PID file to see whether the process is still running, and runs user-defined commands to restart the process if it is not).

Here is how Nagios works, basically:
  • The tools that do the checks are called plugins
  • Objects have to be defined (timeperiods, hosts, contacts, services, commands), and groups of objects can be created.
  • It uses smart checking algorithms so that your server doesn't do 1000 checks at 13:48 and 3 checks at 13:56
  • It can run a command in some situations (restarting Apache, for example)
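To illustrate the idea of spreading checks out over time, here is a minimal sketch in Python. It is not Nagios's actual scheduler, only a simplified illustration of the principle; the function name `spread_checks`, the service names and the 300-second interval are all made up for the example.

```python
def spread_checks(services, interval_seconds):
    """Give each service a start offset so checks are spaced evenly
    over one check interval instead of all firing at the same moment.

    Hypothetical helper for illustration; not part of Nagios.
    """
    if not services:
        return {}
    step = interval_seconds / len(services)
    return {name: round(i * step, 3) for i, name in enumerate(services)}


# Hypothetical service names and a 5-minute check interval.
services = ["web-http", "web-ssh", "mail-smtp", "db-ping"]
schedule = spread_checks(services, 300)

for name, offset in schedule.items():
    print(f"{name}: first check at +{offset:.0f}s, then every 300s")
```

With four services and a 300-second interval, the checks start 75 seconds apart, so the load on the monitoring server stays roughly constant.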
Nagios also has other very nice features... Here is how I configured my server:
  • It does several port/protocol checks on remote servers
  • It uses nrpe (check_nrpe) to perform "local" checks on remote servers
  • When a check fails 4 times, it sends a notification (email)
  • When it has sent 3 notifications and the problem is still not solved or acknowledged, it escalates the notifications (sends to my cell as well).
  • For some servers, I allow Nagios to wake me up. For others, it can only send messages to my cell between 7:30 and 22:00 (using timeperiods)
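The notification rules in the list above can be sketched in a few lines of Python. This is only a model of the decision logic, not Nagios configuration and not how Nagios implements it internally; the function `recipients`, its parameters and the constant names are hypothetical, while the thresholds (4 failures, 3 notifications, 7:30 to 22:00) come from the list.

```python
from datetime import time

MAX_FAILED_CHECKS = 4   # failed checks before the first notification
ESCALATE_AFTER = 3      # notifications sent before the cell is added
CELL_START, CELL_END = time(7, 30), time(22, 0)  # cell-phone timeperiod


def recipients(failed_checks, notifications_sent, acknowledged, now, may_wake_me):
    """Return the list of targets to notify for the current failure.

    Hypothetical model of the rules described above.
    """
    # No notification before the 4th failure, or once acknowledged.
    if failed_checks < MAX_FAILED_CHECKS or acknowledged:
        return []
    targets = ["email"]
    # After 3 notifications without a fix, escalate to the cell phone,
    # but only inside the timeperiod unless this server may wake me up.
    if notifications_sent >= ESCALATE_AFTER:
        in_hours = CELL_START <= now <= CELL_END
        if may_wake_me or in_hours:
            targets.append("cell")
    return targets


print(recipients(4, 0, False, time(12, 0), False))
print(recipients(7, 3, False, time(3, 0), False))
print(recipients(7, 3, False, time(3, 0), True))
```

The first call models the initial email, the second a night-time escalation that is held back by the timeperiod, and the third the same escalation for a server that is allowed to wake me up.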
Nagios' configuration is a lot less painful than I thought. To make it easier, I created one file for each organization whose servers I monitor.

Future plans: Failover

Using nsca, it is possible to have a "standby" Nagios server that detects when the "master" Nagios server is down and performs the checks and notifications during this downtime. nsca allows this "standby" server to have all information about previous checks.
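One way to picture the takeover decision is a simple freshness test: if the standby has not heard from the master for too long, it starts working. The Python sketch below is only an assumption about how such a rule could look; it is not nsca's or Nagios's actual mechanism, and the function name, timestamps and the 600-second threshold are invented for illustration.

```python
def standby_should_take_over(last_master_result_ts, now_ts, max_silence_seconds=600):
    """Return True when the master has been silent for too long.

    Hypothetical rule: the standby records the time of the last check
    result it received from the master and compares it with the clock.
    """
    return (now_ts - last_master_result_ts) > max_silence_seconds


# Hypothetical Unix timestamps (seconds).
last_result = 1_155_000_000
print(standby_should_take_over(last_result, last_result + 120))   # master still fresh
print(standby_should_take_over(last_result, last_result + 900))   # master silent too long
```

Because the standby already holds the previous check results, it can continue from the known state instead of starting from scratch when it takes over.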

BTW, I used the RPM packages from Dag Wieers, and they work fine on my CentOS 4 system. I tried installing Nagios on a Fedora Core 4 machine, but the Nagios RPM packages in Fedora Extras are confusing...
