Description:
Plugin for Nagios NRPE that reports the LSF status and performance data of the local host. It uses the LSF commands bhosts and lsload. The performance data has been tested with pnp4nagios, which produces a time graph for each metric in the lsload listing.
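Here is a rough illustration of how lsload output can become Nagios performance data. It is a minimal sketch, not the plugin's own code. The function name `lsload_to_perfdata` is hypothetical. The sketch assumes the default lsload layout: a header row, then one row per host with HOST_NAME and status followed by the load indices. It converts only the first host row, which is what you get when lsload is run against the local host. Values with an `M` or `G` suffix are rewritten to `MB`, because the Nagios plugin guidelines list units such as `%` and `MB` rather than a bare `M`.

```shell
#!/bin/bash
# Sketch: turn `lsload <host>` output (read on stdin) into Nagios perfdata.
# Assumes line 1 is the header and line 2 is the host row:
#   HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
lsload_to_perfdata() {
  awk '
    # Remember the load-index names from the header (columns 3 onwards).
    NR == 1 { for (i = 3; i <= NF; i++) name[i] = $i; next }
    # Convert only the first host row.
    NR == 2 {
      out = ""
      for (i = 3; i <= NF; i++) {
        v = $i
        if (v == "-") continue              # index not available on this host
        uom = ""
        if (v ~ /%$/)      { uom = "%";  sub(/%$/, "", v) }
        else if (v ~ /M$/) { uom = "MB"; sub(/M$/, "", v) }
        else if (v ~ /G$/) { uom = "MB"; sub(/G$/, "", v); v = v * 1024 }
        rec = name[i] "=" v uom
        out = (out == "" ? rec : out " " rec)
      }
      print out
    }'
}

# Example (requires LSF on the PATH):
#   echo "OK - LSF host $(hostname) | $(lsload "$(hostname)" | lsload_to_perfdata)"
```

Because the parser reads stdin, you can test it against saved lsload output on a machine that has no LSF installation.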
Why does this run on the local host rather than querying each host's status via the LSF master? Because it fits nicely with the pnp4nagios architecture, which produces historical graphs from lsload data, and pnp4nagios works on a per-host basis.
Why is this written as a shell script rather than in Perl, Python or C? We need to run LSF commands, and we need to source the LSF environment variables to find those commands. The commands live in different places depending on the system architecture (Intel/Sun/PowerPC) and on whether it is 32- or 64-bit. A shell script is therefore the most portable option across platforms and architectures (assuming a bash interpreter and a POSIX-standard environment with tools such as awk).
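The point about sourcing the LSF environment can be sketched as follows. LSF ships a `profile.lsf` file for sh-compatible shells, and that file puts the binaries for the local architecture on PATH. Two parts of this sketch are assumptions to adjust for your site: the candidate locations below, including the use of `LSF_ENVDIR`, and the helper `source_lsf_env` itself, which is hypothetical and not part of the plugin.

```shell
#!/bin/bash
# Sketch: locate and source LSF's sh profile so that bhosts/lsload end up on
# PATH regardless of architecture. Candidate paths are hypothetical defaults.
source_lsf_env() {
  local profile
  for profile in "${LSF_ENVDIR:+$LSF_ENVDIR/profile.lsf}" \
                 /usr/share/lsf/conf/profile.lsf \
                 /opt/lsf/conf/profile.lsf; do
    # Skip the empty entry produced when LSF_ENVDIR is unset.
    if [ -n "$profile" ] && [ -r "$profile" ]; then
      . "$profile"    # sets PATH (and LSF_* variables) in the current shell
      return 0
    fi
  done
  return 1
}

# Typical use at the top of the plugin (exit 3 = Nagios UNKNOWN):
#   source_lsf_env || { echo "UNKNOWN - cannot find profile.lsf"; exit 3; }
#   command -v bhosts >/dev/null || { echo "UNKNOWN - bhosts not found"; exit 3; }
```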
I have also written check_lsf_master.sh, which checks the master and populates the performance data with LSF queue information.
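check_lsf_master.sh is not reproduced here, but a queue-level perfdata string could be built from bqueues output along the lines below. This is a hypothetical sketch. The name `bqueues_to_perfdata` and the `<queue>_njobs`/`_pend`/`_run` label scheme are illustrative choices. The columns are located by header name (NJOBS, PEND, RUN) rather than by position, in case the layout differs between LSF versions.

```shell
#!/bin/bash
# Sketch: convert `bqueues` output (read on stdin) into Nagios perfdata with
# one NJOBS/PEND/RUN triple per queue. Label names are hypothetical.
bqueues_to_perfdata() {
  awk '
    # Map header names to column numbers, e.g. col["PEND"] = 9.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    NF > 0 {
      q = $1
      rec = q "_njobs=" $(col["NJOBS"]) " " q "_pend=" $(col["PEND"]) " " q "_run=" $(col["RUN"])
      out = (out == "" ? rec : out " " rec)
    }
    END { print out }'
}

# Example (on the LSF master):
#   echo "OK - LSF queues | $(bqueues | bqueues_to_perfdata)"
```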
V1.2 17 Aug 2011 Performance data output fully compliant with the Nagios standard. Checks for the required LSF daemons before checking the host via LSF.
V1.3 01 Sep 2011 More than one sbatchd can be running; changed the test from ‘eq 1’ to ‘ge 1’.
V1.4 15 Sep 2011 The eauth daemon is not started until it is required, and if it dies, res restarts it, so there is no need to check for it. Changed closed_Full from warning to OK. This depends on how you want to interpret it: we don’t want to know when hosts are full, but you might.
V1.5 21 Sep 2011 Changed closed_Excl from warning to OK.
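The daemon check described in the V1.2 to V1.4 entries might look something like the code below. It is a sketch, not the plugin's code, and rests on three assumptions:

- the required daemons are lim, res and sbatchd, with eauth deliberately omitted as in V1.4;
- each daemon must appear at least once, which is the ‘ge 1’ test from V1.3;
- a missing daemon is CRITICAL.

The process list is read on stdin so the logic can be tested without LSF. In a real plugin you would feed it from `ps -e -o comm=`. The function names are hypothetical.

```shell
#!/bin/bash
# Count lines on stdin whose command name (path stripped) equals $1.
count_procs() {
  awk -v name="$1" '{ c = $1; sub(/.*\//, "", c) } c == name { n++ } END { print n + 0 }'
}

# Read a process list on stdin (one command name per line) and check that the
# daemons this sketch assumes are required are running.
# Prints a Nagios status line; returns 0 (OK) or 2 (CRITICAL).
check_lsf_daemons() {
  local procs missing="" n d
  procs=$(cat)
  for d in lim res sbatchd; do
    n=$(printf '%s\n' "$procs" | count_procs "$d")
    # V1.3: more than one sbatchd may run, so require "at least one".
    [ "$n" -ge 1 ] || missing="$missing $d"
  done
  if [ -n "$missing" ]; then
    echo "CRITICAL - LSF daemon(s) not running:$missing"
    return 2
  fi
  echo "OK - lim, res and sbatchd running"
  return 0
}

# In the plugin:  ps -e -o comm= | check_lsf_daemons || exit $?
```

Stripping any leading path lets the same check work where `ps` reports full executable paths rather than bare command names.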
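The closed_Full and closed_Excl changes in V1.4 and V1.5 amount to a mapping from the bhosts STATUS column to a Nagios state. In this hypothetical sketch, ok and those two states map to OK, as the changelog describes. Two further mappings are assumptions that the changelog does not state: unavail and unreach map to CRITICAL, and every other closed_* state maps to WARNING. The helper names are illustrative.

```shell
#!/bin/bash
# Extract the STATUS column for the first host row of `bhosts <host>` output.
bhosts_status() {
  awk 'NR == 2 { print $2 }'
}

# Map a bhosts STATUS value to a Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL).
bhosts_status_to_code() {
  case "$1" in
    ok|closed_Full|closed_Excl) echo 0 ;;  # per the V1.4/V1.5 changelog
    unavail|unreach)            echo 2 ;;  # assumption
    *)                          echo 1 ;;  # assumption: other closed_* states
  esac
}

# Turn an exit code into the label Nagios expects at the start of the output.
code_to_label() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;
  esac
}

# Example:
#   status=$(bhosts "$(hostname)" | bhosts_status)
#   rc=$(bhosts_status_to_code "$status")
#   echo "$(code_to_label "$rc") - LSF host status $status"; exit "$rc"
```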
Current Version
1.5
Last Release Date
2011-09-21
Compatible With
Owner
Alastair Munro
License
GPL