Linux

check_load2

Description:

A new way to monitor CPU load: no more setting WARN and CRIT levels manually.

Current Version

Last Release Date

June 16, 2009

Compatible With


Project Files
Project Notes
We are currently in the process of distributing a standard set of Nagios monitoring scripts to over 300 client systems. One of the metrics we would like to monitor is the three load averages (or as Dr. Gunther calls them: the LaLaLa triplets). Since these 300 servers aren't all alike, we are bound to run into systems with one, two, four, eight or more processors. That way there is no nice way of making one standard configuration, since you'll have to define separate LA levels for WARN and CRIT. Why? Cause a quad system can take much more load than a single core system. One way to get around this would be by defining separate host groups, based on the amount of processors in a system. You could then define a unique check_load command for each CPU host group. I've gone the other way around though... My work-around for this is by replacing check_load with check_load2. This script takes no command line parameters and works on the basis of standard multipliers. We are of the opinion that the number of processors multiplied by a certain factor (150%? 200%? and so on) is a good enough way to define these WARN and CRIT levels. These multipliers can easily be modified (at the top of the script) to fit what -you- think is a worrying level of activity. This script was tested on Redhat ES3, Solaris 8 and Mac OS X 10.4. It should run on other versions of these OSes as well. . EDIT: Oh! Just like my other recent Nagios scripts, check_load2 comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
Reviews (1) Add a Review
Changes to exiting on warning/criticals
by Joobs888, March 31, 2015

Love this script, if anyone had the same issue I was facing. The script doesn't exit if there was a warning or critical output. Last two lines change from this: [ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_CRITICAL) [ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_WARNING); To this: [ $CRIT -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?; [ $WARN -gt 0 ] && (echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min") && exit $?;



Project Stats
Rating
4.7 (3)
Favorites
0
Views
153,958