SAN and NAS

check_IBM_DS_health

Description:

Plugin to monitor the health of IBM DS4x00 / DS5x00 storage systems. It is not very sophisticated, but I wanted to share it anyway.
You need to install IBM DS Storage Manager. The plugin uses the SMcli command, usually located at “/opt/IBM_DS/client/SMcli”. The location can be changed with the “COMMAND” variable.

Check that the Nagios User has sufficient rights on “/opt/IBM_DS/client/SMcli” and “/var/opt/SM”, otherwise the check could fail or produce messages like “attempt to update the configuration file was unsuccessful”.
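The permission requirement above can be checked before wiring the plugin into Nagios. The following is only a minimal sketch: the function name check_smcli_access is made up for illustration, and the two default paths are the ones named in this description.

```shell
#!/bin/sh
# Sketch: verify that the current user can execute SMcli and write to
# its state directory. check_smcli_access is a hypothetical helper name;
# both paths default to the locations mentioned in the description.
check_smcli_access() {
    bin="${1:-/opt/IBM_DS/client/SMcli}"
    state_dir="${2:-/var/opt/SM}"
    rc=0

    if [ ! -x "$bin" ]; then
        echo "not executable for $(id -un): $bin"
        rc=1
    fi
    if [ ! -w "$state_dir" ]; then
        echo "not writable for $(id -un): $state_dir"
        rc=1
    fi
    return $rc
}
```

Run it as the user that executes the Nagios checks (for example through sudo -u with that account), since the result depends on who calls it.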

At least one Controller IP must be specified.

Usage: check_IBM_health.sh -a X.X.X.X -b X.X.X.X
-a IP of Controller A
-b IP of Controller B

define command {
    command_name Check_IBM_DS_Health
    command_line $USER1$/check_IBM_DS_health.sh -a $HOSTADDRESS$ -b $ARG1$
}

Tested with DS4300, DS4700, DS4800, DS5020, DS5100 and Storage Manager 10.70, 10.77 and 10.83.

##################
Version 1.1 adds more intelligent filtering of unnecessary SMcli output and differentiation between Critical status for Hardware failures and Warning status for Preferred Path errors.

Version 1.2 patches the SMcli output parsing. Thanks to user “cseres” for the input!

Version 1.3 removes Clock Sync Warnings from the output.

Version 1.4 changes result parsing to fix “Unreadable sector” messages from DS3300/3400 not getting reported correctly. Thanks to user “Deep911” for the input!

Version 1.5 changes result parsing to fix “Battery Canister Expiration” messages not getting reported correctly. Another wildcard entry was also added to the nested “case” statement so that any unrecognized message yields at least an UNKNOWN response. Thanks to user “dedri” for the input!
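The classification described in these release notes can be sketched roughly as follows. This is not the shipped script: the function name classify_result is made up, and the patterns are taken from the code excerpts that users posted in the reviews, so the real 1.5 script may differ in detail.

```shell
#!/bin/sh
# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# classify_result TEXT: print a state label and return the matching code.
# Hypothetical helper; patterns follow the excerpts quoted in the reviews.
classify_result() {
    case "$1" in
        *optimal*)
            echo "OK"; return $STATE_OK ;;
        *failure*)
            # Nested case: hardware problems are critical, path and
            # battery-expiration problems are warnings.
            case "$1" in
                *failed*|*Failed*|*Unreadable*)
                    echo "CRITICAL"; return $STATE_CRITICAL ;;
                *preferred*|*Preferred*|*Expiration*)
                    echo "WARNING"; return $STATE_WARNING ;;
                *)
                    # Wildcard: any other failure message is UNKNOWN
                    echo "UNKNOWN"; return $STATE_UNKNOWN ;;
            esac ;;
        *)
            echo "UNKNOWN"; return $STATE_UNKNOWN ;;
    esac
}
```

With these patterns a message that matches none of the keywords, such as a temperature warning, falls through to UNKNOWN instead of producing no output at all.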

Current Version

1.5

Last Release Date

2013-01-08

Compatible With

  • Nagios 3.x

License

GPL


Reviews (11)
Lite Correction for IBM 3200 and 35xx
by sguibert, November 30, 2014

The plugin works well with the IBM DS 3200 and IBM DS 3512. One small correction: when the plugin retrieves the status, it shows a warning about the DS storage name. Just add the -quick option to the command in check_IBM_DS_HEALTH.sh:

    #RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c "show storageSubsystem healthStatus;")
    RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c 'show storageSubsystem healthStatus;' -quick)

Thanks to Dick Visser: https://wiki.terena.org/display/~federated-user-3/Installing+SMcli+on+Ubuntu+12.04



additional parameters
by steinweb, November 30, 2014

Hi there, thanks for providing this script. It also works on my DS3400 and DS3512 boxes. I recently got a DS3512 which (somehow) requires a monitor/administrator password. I didn't want to provide the password in the script, but rather as parameter on the command line. Thus, I just forward any additional parameters directly to SMcli. Here's my modification to the script:

    # diff -u /scripts/check_IBM_DS_health_1.5.sh-orig /scripts/check_IBM_DS_health_1.5.sh
    --- /scripts/check_IBM_DS_health_1.5.sh-orig 2014-11-04 16:54:45.000000000 +0100
    +++ /scripts/check_IBM_DS_health_1.5.sh 2014-11-04 17:44:38.000000000 +0100
    @@ -27,7 +27,7 @@
     #########################################################
     #SMcli location
    -COMMAND=/opt/IBM_DS/client/SMcli
    +COMMAND="sudo /opt/IBM_DS/client/SMcli"
     # Define Nagios return codes
     #
    @@ -45,12 +45,14 @@
      echo "IBM DS4x00/5x00 Health Check"
      echo "the script requires IP of at least one DS4x00/5x00 Controller, second is optional"
      echo ""
    - echo "Usage check_IBM_health.sh -a X.X.X.X -b X.X.X.X"
    + echo "Usage check_IBM_health.sh -a X.X.X.X -b X.X.X.X [...]"
      echo ""
      echo " -h Show this page"
      echo " -a IP of Controller A"
      echo " -b IP of Controller B"
      echo ""
    + echo " additional parameters are forwarded to SMcli"
    + echo ""
      exit 0
     }
    @@ -78,10 +80,10 @@
      shift
      CTRLB_IP=$1
      ;;
    +# pass unknown commands to SMcli
      *)
    - echo "Unknown argument: $1"
    - print_help
    - exit $STATE_UNKNOWN
    + PAR="$@"
    + break
      ;;
      esac
      shift
    @@ -92,7 +94,7 @@
     #
     ##execute SMcli
    -RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP -c "show storageSubsystem healthStatus;")
    +RESULT=$($COMMAND $CTRLA_IP $CTRLB_IP $PAR -c "show storageSubsystem healthStatus;")
     ##filter unnecessary SMcli output
     RESULT=$(echo $RESULT |sed 's/Performing syntax check...//g' | sed 's/Syntax check complete.//g' | sed 's/Executing script...//g' | sed 's/Script execution complete.//g'| sed 's/SMcli completed successfully.//g' | sed 's/The controller clocks in the storage subsystem are out of synchronization with the storage management station.//g' | sed 's/ Controller in Slot [AB]://g' | sed 's/Storage Management Station://g' | sed 's/\s\s[0-9]{2}s[0-9]{2}:[0-9]{2}:[0-9]{2}s(CEST|CET)s[0-9]{4}//g')
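A variation on the forwarding idea in this review: collecting the extra parameters in a single string loses the boundaries between arguments that contain spaces, such as a password with a blank in it. The sketch below keeps them in "$@" instead. It is an illustration, not part of the plugin; the function name run_health_check is made up, and the -p option in the usage note is only a placeholder for whatever your SMcli version accepts.

```shell
#!/bin/sh
# SMcli location; may contain a prefix such as "sudo", so it is
# deliberately expanded unquoted below.
COMMAND="${COMMAND:-/opt/IBM_DS/client/SMcli}"

# run_health_check -a IP [-b IP] [extra SMcli arguments...]
# Hypothetical wrapper: parses the controller IPs and forwards every
# remaining argument to SMcli unchanged.
run_health_check() {
    ctrl_a=""
    ctrl_b=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -a|-b)
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 3; }
                if [ "$1" = "-a" ]; then ctrl_a="$2"; else ctrl_b="$2"; fi
                shift 2 ;;
            *)
                break ;;   # everything from here on goes to SMcli
        esac
    done
    if [ -z "$ctrl_a" ] && [ -z "$ctrl_b" ]; then
        echo "at least one controller IP is required" >&2
        return 3
    fi
    # ${var:+"$var"} drops an unset controller instead of passing an
    # empty word; "$@" keeps each extra argument as one word.
    $COMMAND ${ctrl_a:+"$ctrl_a"} ${ctrl_b:+"$ctrl_b"} "$@" \
        -c "show storageSubsystem healthStatus;"
}
```

A call such as run_health_check -a 10.0.0.1 -b 10.0.0.2 -p "my secret" then hands the quoted value to SMcli as a single argument. Note that anything passed on the command line is visible in the process list.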



Unknown response from SMcli: " "
by jamesc_syd, October 31, 2014

Works brilliantly at the command prompt, returning:

    [me@server libexec]# ./check_IBM_DS_health_1.5.sh -a 999.999.999.999 -b 999.999.999.999
    Storage Subsystem health status = optimal.
    OK

However, when I try to run it in Nagios 4.0.8 I get:

    Unknown response from SMcli: " "
    UNKNOWN

I'm at a bit of a loss as to how to get it to work. Any suggestions?



"Nominal Temperature Exceeded"
by lufi, May 31, 2014

Hello, I noticed that if SMcli outputs the following message, the check returns UNKNOWN:

    The following failures have been found:
    Nominal Temperature Exceeded
    Storage Subsystem: (XXX)
    Component reporting problem: Thermal sensor
    Status: Nominal temperature exceeded
    Location: Drive enclosure 0
    Component requiring service: Temperature sensor
    Enclosure: Controller/Drive enclosure

I added "*failures*" in line 109 so that this error is also reported and the check becomes critical. Maybe you can consider this for the next version of the check.



Great plugin, small change
by zhopkins, April 30, 2014

Thanks for the extremely useful plugin! I've found a scenario in which the plugin reports a warning/error condition as unknown. The following output is given by the plugin:

    Unkown response from SMcli: "
    The following failures have been found:
    Insufficient Cache Backup Device Capacity
    Storage Subsystem: [[Array Name]]
    Component reporting problem: Not Available
    Status: Not Available
    Location: Controller/Drive enclosure, Controller in slot A
    Component requiring service: Controller in slot A
    Service action (removal) allowed: No
    Service action LED on component: Yes
    "

I think that modifying line 114 to include "Insufficient" under the warning search would be a reasonable change. Would you concur?

Changed code:

    case "$RESULT" in
        *optimal*)
            echo $RESULT
            echo "OK"
            exit $STATE_OK
            ;;
        *failure*)
            case "$RESULT" in
                *failed*|*Failed*|*Unreadable*)
                    echo $RESULT
                    echo "CRITICAL"
                    exit $STATE_CRITICAL
                    ;;
                *preferred*|*Preferred*|*Expiration*|*Insufficient*)
                    echo $RESULT
                    echo "WARNING"
                    exit $STATE_WARNING
                    ;;
                *)
                    echo "Unkown response from SMcli: " $RESULT ""
                    echo "UNKNOWN"
                    exit $STATE_UNKNOWN
                    ;;
            esac
            ;;



error handling
by dedri, November 30, 2012

Very good script, thank you. Further to the topic started by Deep911: the error below in my storage subsystem is also not detected, and the plugin gave me the output (null).

    The following failures have been found:
    Battery Canister Nearing Expiration
    Storage Subsystem: MyCOmpany
    Component reporting problem: Battery
    Status: Near expiration
    Location: Controller enclosure 85, Controller in Slot A
    Smart battery: Yes
    Component requiring service: Controller A
    Service action (removal) allowed: No
    Service action LED on component: No
    Script execution complete.
    SMcli completed successfully.

The strange thing is that the output is empty (null) rather than what is written in the code:

    echo "Unkown response from SMcli: " $RESULT ""
    echo "UNKNOWN"
    exit $STATE_UNKNOWN



Great
by Tasslehoff, November 30, 2012

Very useful plugin! Tested successfully on a DS4300, monitoring from Debian Etch with Storage Manager 10.83.

Install SMcli on Debian:
1) extract SM10.83_Linux_32bit_x86_single-10.83.x5.23.tgz to the filesystem
2) change to the Linux_32bit_x86_10p83_singleLinux folder
3) extract the files with "rpm2cpio SMclient-LINUX-10.83.G5.22-1.noarch.rpm | cpio -vid"
4) copy opt/IBM_DS/client to wherever you want on the filesystem
5) edit the BASEDIR and JAVA_EXEC variables inside the SMcli script (use JRE6 from Sun)

If you want to run this plugin as the nagios user, remember to give execute permission on SMcli and the script itself (chmod 755) and run it as root by editing /etc/sudoers, for example:

    Cmnd_Alias SMCLI = /opt/IBM_DS/client/SMcli
    Cmnd_Alias IBMDS = /usr/lib/nagios/plugins/check_IBM_DS_health_1.3.sh
    nagios ALL=NOPASSWD: SMCLI
    nagios ALL=NOPASSWD: IBMDS

Thanks moep!



Issue clearer?
by Schwabe, October 31, 2012

Hello, would it be possible to make the output clearer? When I remove a power cord, it shows me the following output:

    The following failures have been found:
    Power-Fan CRU/FRU - No Power Input
    Storage Subsystem: DS3512_1
    Component reporting problem: Power supply CRU/FRU (Right)
    Status: No power input
    Location: Controller/Drive expansion enclosure
    Component requiring service: Power supply CRU/FRU (Right)
    Service action (removal) allowed: No
    Service action LED on component: Yes
    Subcomponent affected: Power supply (0)

It would be clearer with: "Failed power supply @ right side"
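One possible way to get a shorter message of the kind requested here is to keep only the reporting component and its status. The sketch below is not part of the plugin: the function name summarize_failure is made up, and it assumes the raw SMcli output with one field per line, before the plugin collapses everything into a single line.

```shell
#!/bin/sh
# summarize_failure: read SMcli failure output on stdin and print
# "<component>: <status>" for each reported failure.
# Hypothetical helper; assumes one "Label: value" field per line.
summarize_failure() {
    awk -F': *' '
        /^[[:space:]]*Component reporting problem:/ { component = $2 }
        /^[[:space:]]*Status:/ { print component ": " $2 }
    '
}
```

For the power cord example above this prints "Power supply CRU/FRU (Right): No power input". Values that themselves contain a colon followed by a space would be cut off at that point.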



Error handling
by Deep911, September 30, 2012

Thank you for writing the plugin. It works with the DS3400 and DS3300, but it did not notice all errors. This error:

    The following failures have been found:
    Unreadable sector(s) detected
    Storage Subsystem: DS3300
    Unreadable sectors detected: 1

gave me the output (null). You must change

    *failure*)
        case "$RESULT" in
            *failed*|*Failed*)
                echo $RESULT
                echo "CRITICAL"
                exit $STATE_CRITICAL
                ;;

to

    *failure*)
        case "$RESULT" in
            *failed*|*Failed*|*failures*)
                echo $RESULT
                echo "CRITICAL"
                exit $STATE_CRITICAL
                ;;

to get the right output, like this:

    The following failures have been found:
    Unreadable sector(s) detected
    Storage Subsystem: DS3300
    Unreadable sectors detected: 1
    CRITICAL



Patch to improve error handling
by cseres, January 31, 2012

Thank you for writing the plugin. It works with the DS3400, but it did not notice all errors, because the case statement failed to handle cases where
- the output contains the string "Failed" with a capital F
- the output doesn't contain the strings "optimal" or "failure"

Here's a patch for 1.1 that should fix it:

    100c100
    *failed*|*Failed*)
    109a110,114
    > *)
    > echo $RESULT
    > echo "UNKNOWN"
    > exit $STATE_UNKNOWN
    > ;;
    112c117,122
    *)
    > echo "Unkown response from SMCLI: " $RESULT ""
    > echo "UNKNOWN"
    > exit $STATE_UNKNOWN
    > ;;
    > esac



tested with DS3400
by mtrento, November 30, 2011

I just tested it successfully with a DS3400. SMcli can be downloaded here: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=ibm/Storage_Disk&product=ibm/Storage_Disk/DS3400&release=All&platform=All&function=all

The script does not care about clock synchronisation; it reports only the status. Here is a sample output:

    ./check_IBM_DS_health.sh -a 10.0.0.1
    The controller clocks in the storage subsystem are out of synchronization with the storage management station.
    Controller in Slot A: Mon Nov 21 22:29:53 CET 2011
    Controller in Slot B: Mon Nov 21 22:27:57 CET 2011
    Storage Management Station: Mon Nov 21 13:41:00 CET 2011
    Storage Subsystem health status = optimal.
    OK
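Clock messages like the ones in this sample can be removed before the status is evaluated. The sketch below is a line-based alternative to the substitutions the plugin applies itself, not a copy of them. The function name filter_smcli_output is made up, and it assumes that each SMcli message arrives on its own line.

```shell
#!/bin/sh
# filter_smcli_output: drop SMcli progress lines and the clock-sync
# notice from stdin, leaving only the health report.
# Hypothetical helper; assumes one message per line.
filter_smcli_output() {
    sed -e '/^[[:space:]]*Performing syntax check/d' \
        -e '/^[[:space:]]*Syntax check complete/d' \
        -e '/^[[:space:]]*Executing script/d' \
        -e '/^[[:space:]]*Script execution complete/d' \
        -e '/^[[:space:]]*SMcli completed successfully/d' \
        -e '/^[[:space:]]*The controller clocks in the storage subsystem are out of synchronization/d' \
        -e '/^[[:space:]]*Controller in Slot [AB]:/d' \
        -e '/^[[:space:]]*Storage Management Station:/d' \
        -e '/^[[:space:]]*$/d'
}
```

Because whole lines are deleted, the timestamps after "Controller in Slot A:" disappear together with their labels and no separate date pattern is needed.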




Project Stats

Rating: 4.9 (14)
Favorites: 2
Views: 113,287