HP/HPE Smart Array Hardware status plugin for Nagios 1.x/2.x/3.x
Current Version
1.15
Last Release Date
2017-04-28
Owner
Simone
License
GPL
Compatible With
check_cciss 2008/10/06 (v.1.8)
check_cciss 2012/03/06 (v.1.9)
check_cciss 2012/04/04 (v.1.10)
check_cciss 2012/07/16 (v.1.11)
check_cciss 2013/11/20 (v.1.12)
check_cciss 2017/01/23 (v.1.13)
check_cciss 2017/02/25 (v.1.14)
check_cciss 2017/04/28 (v.1.15)
This plugin checks the hardware status of Smart Array controllers (array, controller, cache, disk, battery, etc.) using the HP Array Configuration Utility CLI / HPE Smart Storage Administrator.

Examples:

./check_cciss -v
RAID OK: Smart Array 6i in Slot 0 array A logicaldrive 1 (67.8 GB, RAID 1+0, OK) (Controller Status: OK Cache Status: OK Battery Status: OK)

./check_cciss -v -p
RAID OK: Smart Array 6i in Slot 0 (Embedded) array A logicaldrive 1 (33.9 GB, RAID 1, OK) physicaldrive 2:0 (port 2:id 0 , Parallel SCSI, 36.4 GB, OK) physicaldrive 2:1 (port 2:id 1 , Parallel SCSI, 36.4 GB, OK) physicaldrive 1:5 (port 1:id 5 , Parallel SCSI, 72.8 GB, OK, spare) [Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK]

./check_cciss
RAID OK

Other examples:

RAID CRITICAL - HP Smart Array Failed: Smart Array 6i in Slot 0 array A (failed) logicaldrive 1 (67.8 GB, 1+0, Interim Recovery Mode)
RAID WARNING - HP Smart Array Rebuilding: Smart Array 6i in Slot 0 array A logicaldrive 1 (67.8 GB, 1+0, Rebuilding)
Hello, I have imported this plugin on my EON server. After following the steps, I get this error:

RAID UNKNOWN - /usr/sbin/hpssacli did not execute properly : Error: No controllers detected. Possible causes:
 - The driver for the installed controller(s) is not loaded.
 - On LINUX, the scsi_generic (sg) driver module is not loaded.
See the README file for more details.

I have checked the sudoers entry and the permissions, and everything seems to be fine. Could anyone kindly advise me? Regards,
I've had to make a bunch of hacks to this to keep it running over time. PROGPATH is wrong on some systems, the execute test does not look at the correct owner (since sudo is used), and a failed battery/capacitor can leave the cache "permanently" disabled.

@@ -188 +188,3 @@
-. $PROGPATH/utils.sh
+[[ -x $PROGPATH/utils.sh ]] && . $PROGPATH/utils.sh
+[[ -x /usr/lib/nagios/plugins/utils.sh ]] && . /usr/lib/nagios/plugins/utils.sh
+[[ -x /usr/lib64/nagios/plugins/utils.sh ]] && . /usr/lib64/nagios/plugins/utils.sh
@@ -291 +293 @@
-if [ ! -x $hpacucli ]; then
+if [ ! -f $hpacucli ]; then
@@ -296 +298 @@
- if [ -x $hpssacli ]; then
+ if [ -f $hpssacli ]; then
@@ -410,0 +413 @@
+ check=`echo "$check" | grep -v 'Cache Status: Permanently Disabled'`
Very nice plugin. I can only recommend it. Thanks for your work! I had to make a small fix to get it working on my system (v1.15), on line 296:

from: if [ -x $hpssacli ]; then
to:   if [ -e $hpssacli ]; then

since -x tests that the file exists and that execute (or search) permission is granted. I guess in my case execute permission is not granted at test time.
Hi mamiral, thank you for using my plugin. I'm glad you've adapted it to your system :-) Regards
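The difference between the -x, -f and -e tests discussed above can be reproduced without a Smart Array at all. The following is a minimal sketch, not part of the plugin: describe_file is a hypothetical helper name, and the temporary file merely stands in for a CLI binary that the calling user may not execute (elsewhere in this thread the binaries are said to default to mode 500 root:root).

```shell
# Hypothetical helper: report how the shell's file tests see a path.
describe_file() {
    if [ -x "$1" ]; then
        echo "executable"
    elif [ -f "$1" ]; then
        # The file is there, but the caller has no execute permission on it.
        echo "present-not-executable"
    else
        echo "missing"
    fi
}

# A temporary file without execute bits stands in for the CLI binary.
tmpfile=$(mktemp)
chmod 644 "$tmpfile"
describe_file "$tmpfile"
describe_file "$tmpfile.does-not-exist"
rm -f "$tmpfile"
```

When the binary is later run through sudo, the permission that matters is root's, which is presumably why an existence test is enough for the reviewers above.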
I am having issues getting an alert in Icinga 2: our RAID is critical, but there is no alert in Icinga 2.

RAID CRITICAL - HP Smart Array Failed: Smart Array P410 in Slot 2 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors) Smart Array P410i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK
Hi Colby, thanks for testing my plugin with Icinga 2. Sorry, but I'm not using this alternative to Nagios. You will have to adapt the plugin if needed (for example the "$STATE_*" variables, which are defined in the Nagios "utils.sh"). Regards
Getting this on an HP DL380 G7 with RHEL 5 64-bit. Installed HPACUCLI 9.40 and using check_cciss.sh 1.14. The same thing happened with the older hpacucli 8.70 and cciss.sh 1.9. hpacucli works fine if executed standalone. Yes, sudoers has the correct entry. Any help would be appreciated.
Hi, updated today to 1.15 with debug output and notes about this. The problem may be the file permissions (the defaults are 500 root:root):

# chmod 555 /usr/sbin/hpacucli
# chmod 555 /usr/sbin/hpssacli
Workaround:

$ sudo ln -s /usr/sbin/ssacli /usr/sbin/hpssacli

Also, the script notes that /etc/sudoers should contain:

nagios ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli

For the RHEL7 nrpe package, at least, this should be:

nrpe ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli

If you adapt the script to check for ssacli instead of, or in addition to, hpssacli, you of course also need to change the /etc/sudoers entry. Lastly, it's good practice to use a separate /etc/sudoers.d/ssacli file instead of changing the main /etc/sudoers file.
Notes added to v.1.15 Thanks
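Instead of a symlink, a locally adapted script could look for the CLI under each of the names mentioned in this thread. This is only a sketch under assumptions: find_array_cli is a hypothetical helper that does not exist in check_cciss, and the search order is arbitrary.

```shell
# Hypothetical helper: print the first candidate path that exists.
find_array_cli() {
    local candidate
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate paths are the ones named in this thread; the order is an assumption.
cli=$(find_array_cli /usr/sbin/hpacucli /usr/sbin/hpssacli /usr/sbin/ssacli) \
    || echo "no Smart Array CLI found" >&2
echo "using: ${cli:-none}"
```

Whatever path is selected also has to be listed in the matching sudoers entry.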
When the cache is temporarily disabled because the RAID battery is charging or discharging, the status is misreported as a critical alert instead of a warning. Here is a patch that fixes it:

@@ -346,6 +346,9 @@
 if echo ${check} | egrep Failed >/dev/null; then
     echo "RAID CRITICAL - HP Smart Array Failed: "${check} | egrep Failed
     exit $STATE_CRITICAL
+elif echo ${check} | egrep "Cache Status: Temporarily Disabled" >/dev/null; then
+    echo "RAID WARNING - HP Smart Array Cache Disabled: "${check}
+    exit $STATE_WARNING
 elif echo ${check} | egrep Disabled >/dev/null; then
     echo "RAID CRITICAL - HP Smart Array Problem: "${check} | egrep Disabled
     exit $STATE_CRITICAL
@@ -361,9 +364,6 @@
 elif echo ${check2} | egrep Recover >/dev/null; then
     echo "RAID WARNING - HP Smart Array Recovering: "${check2} | egrep Recover
     exit $STATE_WARNING
-elif echo ${check} | egrep "Cache Status: Temporarily Disabled" >/dev/null; then
-    echo "RAID WARNING - HP Smart Array Cache Disabled: "${check}
-    exit $STATE_WARNING
 elif echo ${check} | egrep FIRMWARE >/dev/null; then
     echo "RAID WARNING - "${check}
     exit $STATE_WARNING
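The point of the patch is ordering: the specific "Temporarily Disabled" pattern must be tested before the generic "Disabled" pattern, otherwise the generic branch wins. A reduced sketch of that ordering follows; classify_status is a hypothetical function, not the plugin's code, and it prints the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL) instead of exiting.

```shell
# Hypothetical reduction of the patched if/elif chain.
classify_status() {
    if echo "$1" | grep -q 'Failed'; then
        echo 2      # CRITICAL: something has failed
    elif echo "$1" | grep -q 'Cache Status: Temporarily Disabled'; then
        echo 1      # WARNING: must be tested before the generic "Disabled"
    elif echo "$1" | grep -q 'Disabled'; then
        echo 2      # CRITICAL: any other "Disabled" state
    else
        echo 0      # OK
    fi
}

classify_status 'Controller Status: OK Cache Status: Temporarily Disabled'
classify_status 'Controller Status: OK Cache Status: Permanently Disabled'
```

Note that a status line that also contains "Failed" (for example a failed battery) is still classified as critical, because that branch comes first.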
Works great locally, but gives an error when called through NRPE:

RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : sudo: sorry, you must have a tty to run sudo

Another poster reported the same problem a year ago, but there was no response from the developer. Is the plugin dead?
Works like a charm. I love how it does everything for you with few options. All you need to decide is whether you want verbose and physical-drive output or not.
Seems silly you have to submit a review in order to communicate with the developer, ah well. It doesn't appear that any of the three driver paths are available on our system:

HPPROC="/proc/driver/cciss/cciss"
HPSCSIPROC="/proc/scsi/scsi"
COMPAQPROC="/proc/driver/cpqarray/ida"

However, the program 'hpacucli' seems to work correctly. I suspect it has something to do with the fact that we don't need the cciss drivers anymore but are using hpsa (/sys/module/hpsa/). If this is something you'll fix, then I'll wait for an update; otherwise I'll patch it locally, perhaps by ripping out the checks.
Hi, your script works great on the local machine, but when I try it from the Nagios server through NRPE, I get this message:

RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : sudo: sorry, you must have a tty to run sudo

Could you help me resolve this?
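The "you must have a tty" message comes from sudo's requiretty option, which some distributions enable by default; NRPE runs the plugin without a terminal. A possible fix, sketched here under assumptions, is to switch the option off for the monitoring user only. The file name below and the user name nrpe are assumptions (on other setups the user is nagios); the command list is the one quoted elsewhere in this thread.

```
# /etc/sudoers.d/check_cciss  (hypothetical file name)
# Let the monitoring user run sudo without a terminal.
Defaults:nrpe !requiretty
nrpe ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli
```

The file can be syntax-checked before use with "visudo -cf /etc/sudoers.d/check_cciss".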
On Debian wheezy with an SL4540 I had to comment out the HP Smart Array presence check in order to make it work. Otherwise I got "RAID UNKNOWN - HP Smart Array not found". I think it's because the system recognizes the array as an internal drive, so you may want to add an option to enable/disable the check.
The -p switch does not seem to work. How can I fix this? I'd like to get the status of the physical disks in the output. I've added -p to my command, but I still get the same result as with -v alone, without the status of the physical disks.

./check_cciss-1.11 -v
RAID CRITICAL - HP Smart Array Failed: Smart Array P400 in Slot 1 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)

./check_cciss-1.11 -v -p
RAID CRITICAL - HP Smart Array Failed: Smart Array P400 in Slot 1 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)
Hello Patrick, use "-p" (detail for physical drives) together with "-v" (status and information about the RAID). Regards
We have a few DL580 G5s with the RAID controller in slot 11, and this check command doesn't work there: the grep statement doesn't look for more than one character in the slot number.

[nrpe@CCNETENGDB2] [/usr/lib64/nagios/plugins] > ./check_cciss
RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : Error: The controller identified by "slot=1" was not detected.

On line 256, I added a + to "Slot \w", making it "Slot \w+". After adding this, the slot is properly identified and everything works fine!

# Get "Slot" & exclude slot needed
if [ "$EXCLUDE_SLOT" = "1" ]; then
    slots=`echo ${check} | egrep -o "Slot \w+" | awk '{print $NF}' | grep -v "$excludeslot"`
else
    slots=`echo ${check} | egrep -o "Slot \w+" | awk '{print $NF}'`
fi

--Joe
Thanks, I don't have a DL580 :-) It will be fixed/included in version 1.12.
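The effect of the slot pattern can be checked on a plain string. In the sketch below, extract_slots is a hypothetical helper and the "Slot 11" line is a made-up sample; the second sample is taken from output quoted elsewhere in this thread. The POSIX class [[:alnum:]] is swapped in for the \w character class, which unlike \w does not match an underscore.

```shell
# Hypothetical helper: print every slot identifier found in a status string.
extract_slots() {
    echo "$1" | grep -Eo 'Slot [[:alnum:]]+' | awk '{print $NF}'
}

# Made-up sample with a two-digit slot number: prints the whole number.
extract_slots 'Smart Array in Slot 11'

# Two controllers in one status string: prints one slot per line.
extract_slots 'Smart Array P410 in Slot 2 Smart Array P410i in Slot 0 (Embedded)'
```

Without the + quantifier the pattern stops after one character, which is how slot 11 was reported as "slot=1".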
Great check. Works like a champ out of the box. I patched it to auto-detect the hpsa driver. We have a mix of cciss and hpsa.

--- check_cciss-1.11 2013-03-27 12:13:13.732582522 -0700
+++ check_cciss 2013-03-27 11:42:54.888555702 -0700
@@ -209,7 +209,7 @@
 done
 # Use HPSA driver (Hewlett Packard Smart Array)
-if [ "$HPSA" = "1" ]; then
+if [ "$HPSA" = "1" -o -d /sys/bus/pci/drivers/hpsa ]; then
     COMPAQPROC="/proc/scsi/scsi"
 fi
Good! ;-)
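The auto-detection in the patch above relies on the hpsa driver directory existing in sysfs. The same test can be isolated into a small function, sketched here with a hypothetical name (hpsa_driver_loaded) and an optional path argument that exists only to make the function easy to try out.

```shell
# Hypothetical helper: succeed if the hpsa driver directory exists.
# The default path is the one used in the patch above.
hpsa_driver_loaded() {
    local driver_dir=${1:-/sys/bus/pci/drivers/hpsa}
    [ -d "$driver_dir" ]
}

if hpsa_driver_loaded; then
    echo "hpsa driver detected"
else
    echo "hpsa driver not detected"
fi
```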
Please change the following lines if your controller has a two-digit slot number:

if [ "$EXCLUDE_SLOT" = "1" ]; then
    # slots=`echo ${check} | egrep -o "Slot \w" | awk '{print $NF}' | grep -v "$excludeslot"`
    slots=`echo ${check} | egrep -o "Slot \w*" | awk '{print $NF}' | grep -v "$excludeslot"`
else
    # slots=`echo ${check} | egrep -o "Slot \w" | awk '{print $NF}'`
    slots=`echo ${check} | egrep -o "Slot \w*" | awk '{print $NF}'`
fi
Thanks. It will be fixed/included with the 1.12 version
Worked "out of the box" on CentOS 5.8 with a Smart Array P400. Thanks a lot!
I can't get this to work on any of my P410i controllers. The output is the following:

username@Server:~$ /usr/lib/nagios/plugins/check_cciss -d
### Check if "HP Smart Array" (/proc/driver/cciss/cciss) is present >>>
cat: /proc/driver/cciss/cciss*: No such file or directory
### Check if "HP Smart Array" (/proc/driver/cpqarray/ida) is present >>>
cat: /proc/driver/cpqarray/ida*: No such file or directory
RAID UNKNOWN - HP Smart Array not found
Hi! Try -s (detect the controller with the HPSA driver, Hewlett Packard Smart Array).
Why doesn't it work on my HP DL385 G7 with CentOS 6.2? When I run the check_cciss command, it tells me "RAID UNKNOWN - HP Smart Array not found". Who can help me? PS: the RAID hardware is a P410i.
I'm sorry, I'm new to Linux and Nagios, so this is probably a dumb question. Can someone point me in the right direction on how to use this plugin to check a server that is being monitored by my Nagios server?
Hi Andrew, the script is simple to set up. You can ask your question on the Nagios Forum (http://support.nagios.com/forum/) or a similar forum/mailing list. Regards
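For readers with the same question, a common arrangement is to install the plugin on the monitored server and call it through NRPE. The lines below are a sketch under assumptions, not instructions from the developer: the plugin path is the one shown elsewhere in this thread, the command name check_cciss is arbitrary, and monitored-host is a placeholder.

```
# On the monitored server, in nrpe.cfg:
command[check_cciss]=/usr/lib/nagios/plugins/check_cciss -v

# On the Nagios server, to test the call by hand:
check_nrpe -H monitored-host -c check_cciss
```

The sudoers entry mentioned in the other reviews must exist on the monitored server for the user that NRPE runs as.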
On failure this runs "exit $STATE_CRITICAL", but $STATE_CRITICAL is not defined, so the return status is always good. Only the Status Information text changes.
Peter, the "$STATE_*" variables are defined in the Nagios "utils.sh" (see the include at line 134 of check_cciss-1.10). The script has worked correctly since 2005 (v.1.0) with the same states!
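If utils.sh cannot be found, the "$STATE_*" variables stay empty and "exit" falls back to the status of the previous command, which matches the symptom described above. A defensive sketch, not taken from the plugin: try the utils.sh locations mentioned in this thread, then fall back to the standard Nagios plugin exit codes.

```shell
# Source utils.sh from the first location where it is readable.
# The two absolute paths are the ones named by another reviewer.
for utils in "${PROGPATH:+$PROGPATH/utils.sh}" \
             /usr/lib/nagios/plugins/utils.sh \
             /usr/lib64/nagios/plugins/utils.sh; do
    if [ -n "$utils" ] && [ -r "$utils" ]; then
        . "$utils"
        break
    fi
done

# Fallback (an addition of this sketch): standard Nagios plugin exit codes.
: "${STATE_OK:=0}"
: "${STATE_WARNING:=1}"
: "${STATE_CRITICAL:=2}"
: "${STATE_UNKNOWN:=3}"

echo "CRITICAL maps to exit code $STATE_CRITICAL"
```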
Updated to check_cciss 1.9 (see www.monitoringexchange.org if not present here):
- Increased debug verbosity
- Added an argument to detect controllers with the HPSA driver (Hewlett Packard Smart Array) (-s)
- Recognize required firmware upgrades
- Don't confuse messages about new firmware with a chassis error
- Check physical drives for predicted failures
- Added an argument to show detail for physical drives (-p)
- Check the state of the cache (a dead battery will turn the cache off)
Happy Nagios ;-)
Updated to 1.10 Happy Nagios! :-)
If you adjust the grep that jisse44 suggests to Fail vs. the Failed he suggests you'll pick up a drive status of "Predictive Failure" as well.
Fixed! Thanks
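The difference between the two patterns is easy to verify on a sample line. The drive line below is constructed for illustration from the output format shown in the plugin description, with "Predictive Failure" as the status; matches is a hypothetical helper.

```shell
# Hypothetical helper: succeed if pattern $1 occurs in text $2.
matches() {
    echo "$2" | grep -q "$1"
}

# Constructed sample line (not real plugin output).
sample='physicaldrive 2:1 (port 2:id 1 , Parallel SCSI, 36.4 GB, Predictive Failure)'

if matches 'Failed' "$sample"; then echo "Failed: match"; else echo "Failed: no match"; fi
if matches 'Fail' "$sample"; then echo "Fail: match"; else echo "Fail: no match"; fi
```

"Fail" is a prefix of both "Failed" and "Failure", so the shorter pattern covers both states.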
Has worked well for my purposes. However, if there's a firmware upgrade, the check fails with:

RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : Error: The controller identified by "chassisname=a" was not detected.

The firmware update text is falsely matching the egrep regex. I made the following change to lines 215 and 217:

original: ... | egrep -v "Slot" | ...
modified: ... | egrep -v -e "Slot" -e "scenario" | ...
Hi, very good plugin. I just added lines to show which physical drive is down or rebuilding, after line 210 of v1.8:

# grep -E so that (Failed|Rebuilding) is treated as an alternation
check2c=`sudo -u root $hpacucli controller slot=$slot physicaldrive all show 2>&1 | grep -E '(Failed|Rebuilding)' | awk '{print $1, $2}'`
# Note: $? here is the exit status of awk, the last command in the pipeline
status=$?
if test ${status} -ne 0; then
    echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2c}
    exit $STATE_UNKNOWN
fi
check2="$check2$check2b -> /! $check2c"
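One caveat with a pipeline like the one above: "$?" reports the status of the last command (awk), so a failure of the CLI itself goes unnoticed. A way around this, sketched with a hypothetical helper (run_and_filter), is to capture the raw output and the exit status first and filter afterwards.

```shell
# Hypothetical helper: run a command, keep its own exit status, then filter.
run_and_filter() {
    local raw status
    raw=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        # The command itself failed: pass its message and status through.
        printf '%s\n' "$raw"
        return "$status"
    fi
    # Keep only failed or rebuilding drives, printing the first two fields.
    printf '%s\n' "$raw" | grep -E '(Failed|Rebuilding)' | awk '{print $1, $2}'
    return 0
}

# Stand-in for the CLI call: echo produces one made-up drive line.
run_and_filter echo 'physicaldrive 2:0 (port 2:id 0 , Parallel SCSI, 36.4 GB, Rebuilding)'
```

In the plugin, the echo stand-in would be replaced by the sudo call to the CLI shown above.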
This one works out of the box. The check_hparray that is just like this one does not work with nagios3. The check_hparray.pl errors out on arrays of slot=0 (all of mine) and didn't give verbose output. This one I just did a -v in nrpe.cfg, simple and detailed.... though it doesn't handle UNKNOWN quite right... was still green rather than other. not a big deal....