HP (Compaq)

check_cciss – HP and Compaq Smart Array Hardware status

Description:

HP/HPE Smart Array Hardware status plugin for Nagios 1.x/2.x/3.x

Current Version

1.15

Last Release Date

2017-04-28

Compatible With

  • Nagios 1.x
  • Nagios 2.x
  • Nagios 3.x

Owner

License

GPL


Project Notes
This plugin checks the hardware status of Smart Array controllers (array, controller, cache, disk, battery, etc.) using the HP Array Configuration Utility CLI / HPE Smart Storage Administrator.

Examples:

  ./check_cciss -v
  RAID OK: Smart Array 6i in Slot 0 array A logicaldrive 1 (67.8 GB, RAID 1+0, OK) (Controller Status: OK Cache Status: OK Battery Status: OK)

  ./check_cciss -v -p
  RAID OK: Smart Array 6i in Slot 0 (Embedded) array A logicaldrive 1 (33.9 GB, RAID 1, OK)
  physicaldrive 2:0 (port 2:id 0 , Parallel SCSI, 36.4 GB, OK)
  physicaldrive 2:1 (port 2:id 1 , Parallel SCSI, 36.4 GB, OK)
  physicaldrive 1:5 (port 1:id 5 , Parallel SCSI, 72.8 GB, OK, spare)
  [Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK]

  ./check_cciss
  RAID OK

Further examples:

  RAID CRITICAL - HP Smart Array Failed: Smart Array 6i in Slot 0 array A (failed) logicaldrive 1 (67.8 GB, 1+0, Interim Recovery Mode)
  RAID WARNING - HP Smart Array Rebuilding: Smart Array 6i in Slot 0 array A logicaldrive 1 (67.8 GB, 1+0, Rebuilding)
Reviews (26)
RAID UNKNOWN
by RBERNARD91BOVIS, August 31, 2022

Hello, I have imported this plugin on my EON server. After following the steps, I get this error:

  RAID UNKNOWN - /usr/sbin/hpssacli did not execute properly : Error: No controllers detected. Possible causes:
  - The driver for the installed controller(s) is not loaded.
  - On LINUX, the scsi_generic (sg) driver module is not loaded.
  See the README file for more details.

I have checked the sudoers entry and the permissions, and everything seems to be fine. Could anyone kindly advise me? Regards,



patch for 1.15
by rstevens, December 31, 2018

I've had to make a bunch of hacks to this to keep it running over time. PROGPATH is wrong on some systems, the execute test does not look at the correct owner (since sudo is used), and a failed battery/capacitor can leave the cache "permanently" disabled.

  @@ -188 +188,3 @@
  -. $PROGPATH/utils.sh
  +[[ -x $PROGPATH/utils.sh ]] && . $PROGPATH/utils.sh
  +[[ -x /usr/lib/nagios/plugins/utils.sh ]] && . /usr/lib/nagios/plugins/utils.sh
  +[[ -x /usr/lib64/nagios/plugins/utils.sh ]] && . /usr/lib64/nagios/plugins/utils.sh
  @@ -291 +293 @@
  -if [ ! -x $hpacucli ]; then
  +if [ ! -f $hpacucli ]; then
  @@ -296 +298 @@
  - if [ -x $hpssacli ]; then
  + if [ -f $hpssacli ]; then
  @@ -410,0 +413 @@
  + check=`echo "$check" | grep -v 'Cache Status: Permanently Disabled'`



Great work!
by mamiral, August 31, 2017

Very nice plugin. I can only recommend it. Thanks for your work! I had to make a small fix for it to work on my system (v1.15), on line 296:

  from: if [ -x $hpssacli ]; then
  to:   if [ -e $hpssacli ]; then

The -x test checks that FILE exists and that execute (or search) permission is granted, and I guess in my case execute permission is not granted at test time.
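The difference between these file tests can be sketched in a few lines of shell. This is only an illustration under assumptions: `describe_path` is a hypothetical helper, not part of check_cciss, and the temporary file stands in for a CLI binary that the account running the plugin may not execute directly (the plugin calls it through sudo, as the reviews note).

```shell
#!/bin/sh
# Hypothetical helper (not part of check_cciss): report which of the
# tests -e, -f and -x succeed for a path, as seen by the current user.
describe_path() {
    path=$1
    result=""
    [ -e "$path" ] && result="${result}exists "
    [ -f "$path" ] && result="${result}regular "
    [ -x "$path" ] && result="${result}executable "
    # Strip the trailing space before printing.
    printf '%s\n' "${result% }"
}

# Demonstration on a throwaway file, first without and then with the
# owner execute bit set.
tmpfile=$(mktemp)
chmod 600 "$tmpfile"
describe_path "$tmpfile"
chmod 700 "$tmpfile"
describe_path "$tmpfile"
rm -f "$tmpfile"
```

A presence test with -e or -f only confirms that the binary is installed, which is all the plugin can usefully check before handing execution to sudo.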



Issue with receiving alerts using icinga2
by Colby, May 31, 2017

I am having issues getting an alert in icinga2: our RAID is critical, but there is no alert in icinga2.

  RAID CRITICAL - HP Smart Array Failed: Smart Array P410 in Slot 2 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors) Smart Array P410i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK



RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : Error: The specified device does not have any logical drives.
by mysticman76, March 31, 2017

Getting this on an HP DL380G7 running RHEL 5 64-bit. I installed hpacucli 9.40 and am using check_cciss.sh 1.14. The same thing happened with the older hpacucli 8.70 and cciss.sh 1.9. hpacucli works fine if executed standalone. Yes, sudoers has the correct entry. Any help would be appreciated.



check_cciss_1.14 looks for /usr/sbin/hpssacli but ssacli installs as /usr/sbin/ssacli
by tt, March 31, 2017

Workaround:

  $ sudo ln -s /usr/sbin/ssacli /usr/sbin/hpssacli

Also, the script notes that /etc/sudoers should contain:

  nagios ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli

For the RHEL7 nrpe package, at least, this should be:

  nrpe ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli

If you adapt the script to check for ssacli instead of or in addition to hpssacli, you of course also need to change the /etc/sudoers entry. Lastly, it's good practice to use a separate /etc/sudoers.d/ssacli file instead of changing the main /etc/sudoers file.
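As an alternative to the symlink, the script could probe several candidate paths and use the first one that is installed. The following is a minimal sketch, assuming a hypothetical helper named `find_array_cli`; the three paths come from the reviews on this page, and their order is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_cciss): print the first
# candidate path that exists as a regular file, or fail if none does.
find_array_cli() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate paths taken from the reviews; the order is an assumption.
cli=$(find_array_cli /usr/sbin/ssacli /usr/sbin/hpssacli /usr/sbin/hpacucli) \
    || echo "no Smart Array CLI found" >&2
```

Whichever path is selected also has to appear in the sudoers entry, as the review points out.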



Battery charge/discharge reporting bug
by pjunod, July 31, 2015

The status for the cache being temporarily disabled while the RAID battery is charging or discharging is misreported as a critical alert instead of a warning. Here is a patch that fixes it:

  @@ -346,6 +346,9 @@
   if echo ${check} | egrep Failed >/dev/null; then
       echo "RAID CRITICAL - HP Smart Array Failed: "${check} | egrep Failed
       exit $STATE_CRITICAL
  +elif echo ${check} | egrep "Cache Status: Temporarily Disabled" >/dev/null; then
  +    echo "RAID WARNING - HP Smart Array Cache Disabled: "${check}
  +    exit $STATE_WARNING
   elif echo ${check} | egrep Disabled >/dev/null; then
       echo "RAID CRITICAL - HP Smart Array Problem: "${check} | egrep Disabled
       exit $STATE_CRITICAL
  @@ -361,9 +364,6 @@
   elif echo ${check2} | egrep Recover >/dev/null; then
       echo "RAID WARNING - HP Smart Array Recovering: "${check2} | egrep Recover
       exit $STATE_WARNING
  -elif echo ${check} | egrep "Cache Status: Temporarily Disabled" >/dev/null; then
  -    echo "RAID WARNING - HP Smart Array Cache Disabled: "${check}
  -    exit $STATE_WARNING
   elif echo ${check} | egrep FIRMWARE >/dev/null; then
       echo "RAID WARNING - "${check}
       exit $STATE_WARNING
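The patch works because the first matching branch wins, so the more specific "Temporarily Disabled" test has to come before the generic "Disabled" test. A simplified sketch of that ordering, assuming a hypothetical function `classify_controller_status` that covers only these branches and not the rest of the plugin's logic:

```shell
#!/bin/sh
# Hypothetical, simplified classifier: the first matching pattern wins,
# so the specific "Temporarily Disabled" case is listed before the
# generic "Disabled" case.
classify_controller_status() {
    case $1 in
        *Failed*)                                echo CRITICAL ;;
        *"Cache Status: Temporarily Disabled"*)  echo WARNING ;;
        *Disabled*)                              echo CRITICAL ;;
        *)                                       echo OK ;;
    esac
}

classify_controller_status "Controller Status: OK Cache Status: Temporarily Disabled"
classify_controller_status "Controller Status: OK Cache Status: Permanently Disabled"
```

Note that a failed battery still reports CRITICAL even when the cache is only temporarily disabled, because the "Failed" branch is checked first.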



Does not work with NRPE
by ljorg, June 30, 2015

Works great locally, but gives an error when called through NRPE:

  RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : sudo: sorry, you must have a tty to run sudo

Another poster reported the same problem a year ago, but there was no response from the developer. Is the plugin dead?
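This message usually indicates that sudo's `requiretty` option is enabled, which blocks sudo for processes without a terminal, such as the NRPE daemon. One possible fix is a sudoers drop-in that turns the option off for the monitoring account only. The sketch below merely prints such a drop-in; `print_sudoers_dropin` is a hypothetical helper, and the account name and binary paths are assumptions to adapt to the local system.

```shell
#!/bin/sh
# Hypothetical helper: print a sudoers drop-in for the given account.
# "requiretty" is a standard sudoers Defaults flag; the command list
# mirrors the sudoers entry quoted in the reviews on this page.
print_sudoers_dropin() {
    account=$1
    cat <<EOF
Defaults:${account} !requiretty
${account} ALL=NOPASSWD: /usr/sbin/hpacucli, /usr/sbin/hpssacli
EOF
}

print_sudoers_dropin nrpe
```

The output would go into a file under /etc/sudoers.d/, and checking it with `visudo -cf <file>` before relying on it guards against syntax errors that could lock sudo.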



Works Like It Should
by dbentley, October 31, 2014

Works like a charm. I love how it does everything for you with few options. All you need to know is whether you want verbose and physical output or not.



HPSA Driver
by isaaclw, October 31, 2014

Seems silly you have to submit a review in order to communicate with the developer, ah well. It doesn't appear that any of the three drivers are available on our system:

  HPPROC="/proc/driver/cciss/cciss"
  HPSCSIPROC="/proc/scsi/scsi"
  COMPAQPROC="/proc/driver/cpqarray/ida"

However, the program 'hpacucli' seems to work correctly. I suspect it has something to do with the fact that we don't need the cciss drivers anymore but are using hpsa (/sys/module/hpsa/). If this is something you'll fix, then I'll wait for an update; otherwise I'll patch it locally, perhaps by ripping out the checks.



Can you help me ?!
by kaly, February 28, 2014

Hi, your script works great on the local machine, but when I try it from the Nagios server with NRPE, I get this message:

  RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : sudo: sorry, you must have a tty to run sudo

Could you help me resolve this?



Very useful
by darkweaver87, August 31, 2013

On Debian wheezy with an SL4540, I had to comment out the HP Smart Array presence check to make it work. Otherwise I got "RAID UNKNOWN - HP Smart Array not found". I think it's because the system recognizes it as an internal drive, so you might add an option to enable/disable the check.



status of physical disks not displayed
by Patrick, May 31, 2013

The -p switch does not seem to work. How can I fix this? I'd like to get the status of the physical disks in the output. I've added -p to my command but still get the same result as with -v, without the status of the physical disks.

  ./check_cciss-1.11 -v
  RAID CRITICAL - HP Smart Array Failed: Smart Array P400 in Slot 1 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)

  ./check_cciss-1.11 -v -p
  RAID CRITICAL - HP Smart Array Failed: Smart Array P400 in Slot 1 Controller Status: OK Cache Status: Temporarily Disabled Battery/Capacitor Status: Failed (Replace Batteries/Capacitors)



Works fine, as long as you have fewer than 10 slots
by godish, April 30, 2013

We have a few DL580 G5s with the RAID controller in slot 11, and this check command doesn't work: the grep statement doesn't look for more than one character.

  [nrpe@CCNETENGDB2] [/usr/lib64/nagios/plugins] > ./check_cciss
  RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : Error: The controller identified by "slot=1" was not detected.

On line 256, I added a + to the slot pattern, changing \w to \w+. After adding this, the slot is properly identified and everything works fine!

  # Get "Slot" & exclude slot needed
  if [ "$EXCLUDE_SLOT" = "1" ]; then
      slots=`echo ${check} | egrep -o "Slot \w+" | awk '{print $NF}' | grep -v "$excludeslot"`
  else
      slots=`echo ${check} | egrep -o "Slot \w+" | awk '{print $NF}'`
  fi

--Joe
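The effect of the quantifier can be checked in isolation. The sketch below assumes slot identifiers are purely numeric and therefore uses `[0-9]+` rather than the reviewer's pattern; `extract_slots` is a hypothetical helper, and `grep -o` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print every slot number found in the controller
# summary text, one per line. The + quantifier lets the pattern match
# slot numbers with more than one digit.
extract_slots() {
    printf '%s\n' "$1" | grep -Eo 'Slot [0-9]+' | awk '{print $NF}'
}

# Made-up summary text in the style of the plugin output on this page.
extract_slots "Smart Array P400 in Slot 11 Smart Array P410i in Slot 0 (Embedded)"
```

A pattern that matches only one character after "Slot" would yield 1 for the first controller here, which is consistent with the "slot=1" error the reviewer saw.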



Excellent - one small patch
by j.mccanta@f5.com, March 31, 2013

Great check. Works like a champ out of the box. I patched it to auto-detect the hpsa driver. We have a mix of cciss and hpsa.

  --- check_cciss-1.11 2013-03-27 12:13:13.732582522 -0700
  +++ check_cciss 2013-03-27 11:42:54.888555702 -0700
  @@ -209,7 +209,7 @@
   done
   # Use HPSA driver (Hewlett Packard Smart Array)
  -if [ "$HPSA" = "1" ]; then
  +if [ "$HPSA" = "1" -o -d /sys/bus/pci/drivers/hpsa ]; then
       COMPAQPROC="/proc/scsi/scsi"
   fi



Script modified for servers where the slot number has more than one digit
by nityanaths, November 30, 2012

Please change the following lines if your controller is in a two-digit slot:

  if [ "$EXCLUDE_SLOT" = "1" ]; then
      # slots=`echo ${check} | egrep -o "Slot \w" | awk '{print $NF}' | grep -v "$excludeslot"`
      slots=`echo ${check} | egrep -o "Slot \w*" | awk '{print $NF}' | grep -v "$excludeslot"`
  else
      # slots=`echo ${check} | egrep -o "Slot \w" | awk '{print $NF}'`
      slots=`echo ${check} | egrep -o "Slot \w*" | awk '{print $NF}'`
  fi



Works great!
by GldRush98, July 31, 2012

Worked "out of the box" on CentOS 5.8 with a Smart Array P400. Thanks a lot!



NOT working on P410i raid controller
by sparkey, June 30, 2012

I can't get this to work on any of my P410i controllers. The output is the following:

  username@Server:~$ /usr/lib/nagios/plugins/check_cciss -d
  ### Check if "HP Smart Array" (/proc/driver/cciss/cciss) is present >>>
  cat: /proc/driver/cciss/cciss*: No such file or directory
  ### Check if "HP Smart Array" (/proc/driver/cpqarray/ida) is present >>>
  cat: /proc/driver/cpqarray/ida*: No such file or directory
  RAID UNKNOWN - HP Smart Array not found



Please help me !
by jgh2008, June 30, 2012

Why doesn't it work on my HP DL385G7 with CentOS 6.2? When I run the check_cciss command, it tells me "RAID UNKNOWN - HP Smart Array not found". Who can help me? P.S. The RAID hardware is a P410i.



Is this just for the local machine?
by Andrew, April 30, 2012

I'm sorry, I'm new to Linux and Nagios, so this is probably a dumb question. Can someone point me in the right direction on how to use this plugin to check a server that is being monitored by my Nagios server?



nasty bug.
by Peter, April 30, 2012

On failure this runs "exit $STATE_CRITICAL", but $STATE_CRITICAL is not defined, so the return status is always good. Only the Status Information text changes.
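The behaviour described here follows from how the shell treats an unset variable: `exit $STATE_CRITICAL` expands to a bare `exit`, which returns the status of the previous command. One defensive option, sketched below as an assumption rather than as the plugin's own code, is to fall back to the conventional Nagios return codes whenever utils.sh did not define them (for example because it could not be sourced). `status_with_unset_state` is a hypothetical demonstration function.

```shell
#!/bin/sh
# Demonstration: with STATE_CRITICAL unset, "exit $STATE_CRITICAL"
# becomes a bare "exit" and returns the status of the preceding echo,
# which succeeded. The function prints the resulting exit status.
status_with_unset_state() {
    (
        unset STATE_CRITICAL
        echo "RAID CRITICAL" >/dev/null
        exit $STATE_CRITICAL
    )
    echo $?
}

# Defensive defaults: assign the conventional Nagios plugin return
# codes only if the variables are unset or empty.
: "${STATE_OK:=0}"
: "${STATE_WARNING:=1}"
: "${STATE_CRITICAL:=2}"
: "${STATE_UNKNOWN:=3}"
: "${STATE_DEPENDENT:=4}"

status_with_unset_state
```

Placed right after the line that sources utils.sh, the defaults leave any values defined there untouched.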



Updated to check_cciss 1.9
by simonerosa, March 31, 2012

Updated to check_cciss 1.9 (see www.monitoringexchange.org if not present here):

  • Increased debug verbosity
  • Added arguments to detect controller with HPSA driver (Hewlett Packard Smart Array) (-s)
  • Recognize required firmware upgrades
  • Don't confuse messages about a new firmware with a chassis-error
  • Check physical drives for predicted failures
  • Added arguments to show detail for physical drives (-p)
  • Check the state of the cache (a dead battery will turn the cache off)

Happy Nagios ;-)



Works for me!
by jbroome, December 31, 2011

If you adjust the grep that jisse44 suggests to Fail vs. the Failed he suggests you'll pick up a drive status of "Predictive Failure" as well.
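The difference between the two patterns is easy to verify on sample text. The drive lines below are made up for illustration, modeled on the -p output format shown on this page, and `count_matching_drives` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of drive status text that match
# a pattern.
count_matching_drives() {
    pattern=$1
    drive_lines=$2
    printf '%s\n' "$drive_lines" | grep -c -- "$pattern"
}

# Made-up sample: one failed drive, one predictive failure, one healthy spare.
sample='physicaldrive 2:0 (port 2:id 0 , Parallel SCSI, 36.4 GB, Failed)
physicaldrive 2:1 (port 2:id 1 , Parallel SCSI, 36.4 GB, Predictive Failure)
physicaldrive 1:5 (port 1:id 5 , Parallel SCSI, 72.8 GB, OK, spare)'

count_matching_drives Failed "$sample"   # prints 1
count_matching_drives Fail "$sample"     # prints 2
```

The shorter pattern matches both "Failed" and "Predictive Failure", at the cost of also matching any other text that happens to contain "Fail".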



Great plugin
by leprasmurf, December 31, 2011

Has worked well for my purposes. However, if there's a firmware upgrade, the check fails with:

  RAID UNKNOWN - /usr/sbin/hpacucli did not execute properly : Error: The controller identified by "chassisname=a" was not detected.

The firmware update text is falsely matching the egrep regex. I made the following change to lines 215 and 217:

  original: ... | egrep -v "Slot" | ...
  modified: ... | egrep -v -e "Slot" -e "scenario" | ...



Good
by jisse44, June 30, 2011

Hi, very good plugin. I just added lines to show which physical drive is down or rebuilding, after line 210 of v1.8:

  check2c=`sudo -u root $hpacucli controller slot=$slot physicaldrive all show 2>&1 | egrep '(Failed|Rebuilding)' | awk '{print $1, $2}'`
  status=$?
  if test ${status} -ne 0; then
      echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2c}
      exit $STATE_UNKNOWN
  fi
  check2="$check2$check2b -> /! $check2c"



best of the three
by jdecello, March 31, 2010

This one works out of the box. The check_hparray that is just like this one does not work with nagios3. The check_hparray.pl errors out on arrays of slot=0 (all of mine) and didn't give verbose output. This one I just did a -v in nrpe.cfg, simple and detailed.... though it doesn't handle UNKNOWN quite right... was still green rather than other. not a big deal....




Project Stats
Rating
4.3 (39)
Favorites
5
Views
212,546