Plugin, template and nagvis gadget for checking system stats via SNMP
Version 1.0

If you find any of these useful, drop me an email and/or visit my
employer's website to see some of the cool stuff we make.  :-)

Brent Bice
bbice@sgi.com
http://www.sgi.com/

This plugin can be used to fetch, graph, and visualize (and optionally
warn about) various system performance stats. It works best with systems
that are running net-snmp agents, but it will also do some minimal
monitoring of systems that only support the mib2 host mib (such as modern
versions of Windows).

You can use it with nagios to send notifications about obvious stuff like
Idle CPU usage dropping below certain levels (not especially useful to
me - many of our servers run flat out for days at a time). But you can
also use it to throw alarms about specific sorts of CPU usage such as
System or Nice CPU (i.e., when you trip over an Oracle bug that causes
System CPU to suddenly spike way above the norm of 10-20 percent and you
have mere minutes to log in and diagnose the problem before the system
becomes unusable - grin).

This tarball contains several files:

sgichk_snmp_system.pl - The plugin itself. It requires memcached to be
        running. The plugin uses memcached to store the previous cpu
        counters and the date/time it last ran, so it knows the time
        delta between checks.

check_snmp_system.php - A nagvis template for generating graphs of CPU,
        Memory, Swap, Paging Activity, Swap Activity, and load averages.

snmp-sysperf.php - A nagvis gadget for visualizing the CPU, Memory and
        load average data on a nagvis dashboard. It displays CPU and
        memory usage with bar charts similar to the old xosview program.
        The load averages are printed as text. You must customize this
        script to tell it where to find a truetype font file of your
        choice. Just change the $font= setting near the top of the file
        to the absolute path of a truetype font.
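The reason the plugin needs memcached is that CPU usage over SNMP is
normally exposed as ever-increasing counters, so a percentage only exists
as the difference between two checks. The sketch below illustrates that
idea in Python. It is not the plugin's code (the plugin is Perl), and
everything in it is an assumption made for illustration: the function
name cpu_percentages, the counter names, and a plain dict standing in
for memcached.

```python
import time

# Hypothetical stand-in for memcached: a plain dict keyed by host name.
cache = {}


def cpu_percentages(host, counters, now=None, store=cache):
    """Compare raw CPU counters with the values saved by the previous check.

    Returns (elapsed_seconds, {state: percent}) or None when there is no
    usable previous sample (first run, or counters that did not advance).
    """
    now = time.time() if now is None else now
    previous = store.get(host)

    # Always save the current sample so the next check has a baseline.
    store[host] = {"time": now, "counters": dict(counters)}

    if previous is None:
        return None

    deltas = {
        state: value - previous["counters"].get(state, 0)
        for state, value in counters.items()
    }
    total = sum(deltas.values())
    if total <= 0:
        # Nothing advanced, or the agent restarted and the counters reset.
        return None

    elapsed = now - previous["time"]
    percentages = {state: 100.0 * d / total for state, d in deltas.items()}
    return elapsed, percentages


if __name__ == "__main__":
    demo_store = {}
    cpu_percentages("test-dbserver", {"user": 100, "system": 50, "idle": 850},
                    now=1000.0, store=demo_store)
    print(cpu_percentages("test-dbserver",
                          {"user": 150, "system": 60, "idle": 890},
                          now=1060.0, store=demo_store))
```

The first call for a host only records a baseline, which is why a check
of this kind has nothing to report until it has run twice.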

Here's some example nagios config stuff:

# Check system perf stats via SNMP
define command{
        command_name    check_snmp_system
        command_line    $USER1$/./sgichk_snmp_system.pl -H $HOSTADDRESS$ -C $USER4$ $ARG1$
}

# An example of using the command.
# We always use SNMP v2 or v3 - bulk queries are much faster, so in the
# example below I'm using -2 to specify v2. In this example, I don't want
# any notifications about load, cpu, paging, or swapping. I'm just using
# nagios to fetch/graph/visualize these stats. But see the next example.
#
# Check system perf stats
define service{
        host_name               test-dbserver
        use                     pnp4nag-service,local-service
        normal_check_interval   1
        service_description     Check System Performance
        check_command           check_snmp_system!-2
        contact_groups          dcounix-email
}

# In this example, a Barracuda BMA, I want to know if the Nice CPU usage
# jumps or the load average goes too high. Either is a (happily not too
# frequent) indication that something is amiss on our BMA.
#
# When the BMA has "issues" the NICE cpu usage also spikes
#
# Check system perf stats
define service{
        host_name               bma
        use                     pnp4nag-service,local-service
        normal_check_interval   1
        max_check_attempts      15
        service_description     Check System Performance
        check_command           check_snmp_system!-2 --load-warn=15 --load-crit=30 --nice-warn=15 --nice-crit=20
}

# Last example - a Windows system that supports the host mib but not all
# the cool features of net-snmp.
#
# Check system perf stats
define service{
        host_name               pv-excas1-dc21
        use                     pnp4nag-service,local-service
        normal_check_interval   1
        service_description     Check System Performance
        check_command           check_snmp_system!-2 --use-mib2
}