check_iostat – I/O statistics

Description:

This plugin shows the I/O usage of the specified disk, using the external iostat program. It prints three statistics: transactions per second (tps), kilobytes per second read from the disk (KB_read/s), and kilobytes per second written to the disk (KB_written/s).
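As a rough illustration of the three statistics described above, the sketch below pulls them out of iostat-style text. The sample text, its numbers, and the function name `parse_basic_stats` are assumptions made for this example; the real column layout of `iostat -d -k` differs between sysstat versions, so treat the field positions as something to verify on your own system.

```shell
#!/bin/sh
# Illustrative sample only: the numbers are made up and the column layout of
# `iostat -d -k` varies between sysstat versions.
sample_output='Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               12.50       100.25        48.00    1002500     480000'

# parse_basic_stats DISK (hypothetical helper): reads iostat-style text on
# stdin and prints the three statistics for the last line whose first field
# is exactly DISK, so "sda" does not also match "sda1".
parse_basic_stats() {
    awk -v disk="$1" '
        $1 == disk { out = "tps=" $2 " KB_read/s=" $3 " KB_written/s=" $4 }
        END { if (out != "") print out }
    '
}

printf '%s\n' "$sample_output" | parse_basic_stats sda
```

Matching on the first field rather than grepping for the disk name anywhere in the line is a design choice of this sketch, not something taken from the plugin.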

Last Release Date

June 16, 2009

Project Notes
This simple plugin uses iostat to obtain its metrics, parses the output, and uses bc to compare the results with the specified WARNING and CRITICAL levels (since the shell can't compare floating point numbers). Feedback and suggestions are appreciated =)
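The floating point comparison mentioned above can be sketched as follows. The helper name `float_ge` is hypothetical, and the awk fallback is an addition of this sketch (the plugin itself relies on bc only); it is there so the example still runs on systems without bc.

```shell
#!/bin/sh
# float_ge A B (hypothetical helper): exit status 0 when A >= B, with both
# values treated as decimal numbers. The shell's own "-ge" test only accepts
# integers, which is why an external tool is needed.
float_ge() {
    if command -v bc >/dev/null 2>&1; then
        # bc prints 1 for a true comparison and 0 for a false one
        [ "$(echo "$1 >= $2" | bc)" = "1" ]
    else
        # Assumption of this sketch: fall back to awk when bc is missing
        awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
    fi
}

# Example: compare a measured value against a warning level
if float_ge 12.5 10; then
    echo "WARNING"
else
    echo "OK"
fi
```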
Reviews (18)
Fix infinite generating reports in Debian 11
by ao, September 30, 2024

When upgrading to Debian 11, the check stopped working. I found the problem with the help of randomtask's comment. The variable $samples is defined as '2i'. On Debian 11, iostat ignores $samples in the commands below and generates reports indefinitely. That is why the check no longer works: it never reports anything back. The fix is simple: remove the i.

TMPX=`$iostat $disk -x -k -d 5 $samples | grep $disk | tail -1`
TMPD=`$iostat $disk -k -d 5 $samples | grep $disk | tail -1`

----------

#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc to add readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
#
# Version 0.0.9 - Oct/2024
# Changes:
# - Fixed the problem with infinite generating reports with iostat in Debian 11. Changed the $samples from 2i to 2.
#
# by Alex

iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`

function help {
echo -e "
    Usage:

    -d =
    --Device to be checked. Example: \"-d sda\"

    Run only one of i, q, W:

    -i = IO Check Mode
    --Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
    --warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec

    -q = Queue Mode
    --Checks Disk Queue Lengths
    --warning/critical = Average size of requests, Queue length of requests

    -W = Wait Time Mode
    --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
    --warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization

    -w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.

    -p = Provide performance data for later graphing

    -g = Since last reboot for system (more for debugging than nagios use!)

    -h = This help
    "
    exit -1
}

# Ensuring we have the needed tools:
# (an if block instead of a subshell, so that exit really ends the script)
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
    echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
    exit -1
fi

io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0

MSG=""
PERFDATA=""

#------------Argument Set-------------

while getopts "d:w:c:ipqWhg" OPT; do
    case $OPT in
        "d") disk=$OPTARG;;
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "i") io=1;;
        "p") printperfdata=1;;
        "q") queue=1;;
        "W") waittime=1;;
        "g") samples=1;;
        "h") echo "help:" && help;;
        \?) echo "Invalid option: -$OPTARG" >&2
            exit -1
            ;;
    esac
done

# Autofill if parameters are empty
if [ -z "$disk" ]
    then disk=sda
fi

#Checks that only one query type is run
[[ `expr $io+$queue+$waittime` -ne "1" ]] && \
    echo "ERROR: select one and only one run mode" && \
    help

#set warning and critical to insane value if empty, else set the individual values
if [ -z "$warning" ]
    then warning=99999
else
    #TPS with IO, Request size with queue
    warn_1=`echo $warning | cut -d, -f1`
    #Read/s with IO,Queue Length with queue
    warn_2=`echo $warning | cut -d, -f2`
    #Write/s with IO
    warn_3=`echo $warning | cut -d, -f3`
    #KB/s read with IO
    warn_4=`echo $warning | cut -d, -f4`
    #KB/s written with IO
    warn_5=`echo $warning | cut -d, -f5`
    #Crude hack due to integer expression later in the script
    warning=1
fi

if [ -z "$critical" ]
    then critical=99999
else
    #TPS with IO, Request size with queue
    crit_1=`echo $critical | cut -d, -f1`
    #Read/s with IO,Queue Length with queue
    crit_2=`echo $critical | cut -d, -f2`
    #Write/s with IO
    crit_3=`echo $critical | cut -d, -f3`
    #KB/s read with IO
    crit_4=`echo $critical | cut -d, -f4`
    #KB/s written with IO
    crit_5=`echo $critical | cut -d, -f5`
    #Crude hack due to integer expression later in the script
    critical=1
fi
#------------Argument Set End-------------

#------------Parameter Check-------------

#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help

#Checks for sane warning/critical levels
if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then
    if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
        echo "ERROR: critical levels must be higher than warning levels" && help
    elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
        if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
            echo "ERROR: critical levels must be higher than warning levels" && help
        fi
    fi
fi

#------------Parameter Check End-------------

# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device report, 10 is the interval in seconds for the second run, -x adds the extended statistics

TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`

#------------IO Test-------------

if [ "$io" == "1" ]; then

    TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
    #Requests per second:
    tps=`echo "$TMPD" | awk '{print $2}'`
    read_sec=`echo "$TMPX" | awk '{print $4}'`
    written_sec=`echo "$TMPX" | awk '{print $5}'`

    #Kb per second:
    kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
    kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`

    # "Converting" values to float (string replace , with .)
    tps=${tps/,/.}
    read_sec=${read_sec/,/.}
    written_sec=${written_sec/,/.}
    kbytes_read_sec=${kbytes_read_sec/,/.}
    kbytes_written_sec=${kbytes_written_sec/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] ||
             [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] ||
             [ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] ||
             [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] ||
             [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] ||
             [ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] ||
             [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
    PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi

#------------IO Test End-------------

#------------Queue Test-------------
if [ "$queue" == "1" ]; then
    qsize=`echo "$TMPX" | awk '{print $8}'`
    qlength=`echo "$TMPX" | awk '{print $9}'`

    # "Converting" values to float (string replace , with .)
    qsize=${qsize/,/.}
    qlength=${qlength/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] ||
             [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] ||
             [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
    PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi

#------------Queue Test End-------------

#------------Wait Time Test-------------

#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
    avgwait=`echo "$TMPX" | awk '{print $10}'`
    avgrwait=`echo "$TMPX" | awk '{print $11}'`
    avgwwait=`echo "$TMPX" | awk '{print $12}'`
    avgsvctime=`echo "$TMPX" | awk '{print $13}'`
    avgcpuutil=`echo "$TMPX" | awk '{print $14}'`

    # "Converting" values to float (string replace , with .)
    avgwait=${avgwait/,/.}
    avgrwait=${avgrwait/,/.}
    avgwwait=${avgwwait/,/.}
    avgsvctime=${avgsvctime/,/.}
    avgcpuutil=${avgcpuutil/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] ||
             [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] ||
             [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] ||
             [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] ||
             [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
    PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi

#------------Wait Time End-------------

# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then
    echo -n "$PERFDATA"
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
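The script's own comment suggests that a future parser could read the label line to find the right column instead of relying on fixed positions. A minimal sketch of that idea follows. The helper name `column_value` and the sample text are assumptions for illustration; the column names shown are only an example of what an `iostat -x` header may look like and are not copied from any particular sysstat release.

```shell
#!/bin/sh
# Illustrative sample only: made-up numbers, and the header labels differ
# between sysstat releases.
sample_output='Device  r/s  w/s  rkB/s  wkB/s  r_await  w_await  aqu-sz  %util
sda  1.50  3.25  60.00  120.50  0.40  1.10  0.01  0.75'

# column_value DISK LABEL (hypothetical helper): looks LABEL up in the most
# recent header line, then prints that column from the last line for DISK.
# Exits non-zero when the label or the disk is not found.
column_value() {
    awk -v disk="$1" -v label="$2" '
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == label) col = i
            next
        }
        $1 == disk && col > 0 { value = $col }
        END { if (value != "") print value; else exit 1 }
    '
}

printf '%s\n' "$sample_output" | column_value sda w_await
```

Because the lookup fails loudly when a label is missing, a plugin built this way could report UNKNOWN instead of silently reading the wrong column after a sysstat upgrade.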



Fix for latest sysstat in Debian 11 (Bullseye)
by randomtask, December 31, 2021

Great plugin, but I did find one issue. When I upgraded one of my servers to Debian 11, this plugin stopped working. It appears to be an issue with an updated version of iostat. To fix it, just edit the following lines, at around line 234:

TMPX=$($iostat $disk -x -k -d 10 2 $samples | grep $disk | tail -1)
TMPD=$($iostat $disk -k -d 10 2 $samples | grep $disk | tail -1)

All I did was add a 2 after the 10 so that only 2 lines are returned to get the stats out. So far it seems to be working just fine. I hope this helps.
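Since the breakage comes down to what is passed to iostat as the report count, one defensive option is to check that the count is a plain positive integer before building the command. This is a sketch under that assumption; `is_positive_int` is a hypothetical helper, and the final line only prints the command instead of running it.

```shell
#!/bin/sh
# is_positive_int VALUE (hypothetical helper): succeeds only when VALUE
# consists of digits and is greater than zero.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}

samples="2i"    # the value some versions of the script define
if ! is_positive_int "$samples"; then
    samples=2   # two reports: since boot first, then the current interval
fi

# iostat takes the interval and the report count as trailing arguments
echo "would run: iostat sda -x -k -d 10 $samples"
```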



Updates to bash 4.4 mechanisms
by josephw, June 30, 2020

#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc to add readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#
# Version 0.0.8 - Jan/2019
# Changes:
# - Added Warn/Crit thresholds to performance output
#
# by Danny van Zunderd / danny_vz@live.nl
#
# Version 0.0.9 - Jun/2020
# Changes:
# - Updated to use bash 4.4 mechanisms
#
# by Joseph Waggy / joseph.waggy@gmail.com

iostat=$(which iostat 2>/dev/null)
bc=$(which bc 2>/dev/null)

help() {
echo -e "
    Usage:

    -d =
    --Device to be checked. Example: \"-d sda\"

    Run only one of i, q, W:

    -i = IO Check Mode
    --Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
    --warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec

    -q = Queue Mode
    --Checks Disk Queue Lengths
    --warning/critical = Average size of requests, Queue length of requests

    -W = Wait Time Mode
    --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
    --warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization

    -w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.

    -p = Provide performance data for later graphing

    -g = Since last reboot for system (more for debugging than nagios use!)

    -h = This help
    "
}

# Ensuring we have the needed tools:
if [[ ! -f $iostat ]] || [[ ! -f $bc ]]; then
    echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
    exit -1
fi

io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2i
status=0

MSG=""
PERFDATA=""

#------------Argument Set-------------

while getopts "d:w:c:ipqWhg" OPT; do
    case $OPT in
        "d") disk=$OPTARG ;;
        "w") warning=$OPTARG ;;
        "c") critical=$OPTARG ;;
        "i") io=1 ;;
        "p") printperfdata=1 ;;
        "q") queue=1 ;;
        "W") waittime=1 ;;
        "g") samples=1 ;;
        "h")
            echo "help:"
            help
            exit 0
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            help
            exit -1
            ;;
    esac
done

# Autofill if parameters are empty
if [[ -z "$disk" ]]; then
    disk=sda
fi

#Checks that only one query type is run
if [[ $((io+queue+waittime)) -ne "1" ]]; then
    echo "ERROR: select one and only one run mode"
    help
    exit -1
fi

#set warning and critical to insane value if empty, else set the individual values
if [[ -z "$warning" ]]; then
    warning=99999
else
    #TPS with IO, Request size with queue
    warn_1=$(echo $warning | cut -d, -f1)
    #Read/s with IO,Queue Length with queue
    warn_2=$(echo $warning | cut -d, -f2)
    #Write/s with IO
    warn_3=$(echo $warning | cut -d, -f3)
    #KB/s read with IO
    warn_4=$(echo $warning | cut -d, -f4)
    #KB/s written with IO
    warn_5=$(echo $warning | cut -d, -f5)
    #Crude hack due to integer expression later in the script
    warning=1
fi

if [[ -z "$critical" ]]; then
    critical=99999
else
    #TPS with IO, Request size with queue
    crit_1=$(echo $critical | cut -d, -f1)
    #Read/s with IO,Queue Length with queue
    crit_2=$(echo $critical | cut -d, -f2)
    #Write/s with IO
    crit_3=$(echo $critical | cut -d, -f3)
    #KB/s read with IO
    crit_4=$(echo $critical | cut -d, -f4)
    #KB/s written with IO
    crit_5=$(echo $critical | cut -d, -f5)
    #Crude hack due to integer expression later in the script
    critical=1
fi
#------------Argument Set End-------------

#------------Parameter Check-------------

#Checks for sane Disk name:
if [[ ! -b "/dev/$disk" ]]; then
    echo "ERROR: Device incorrectly specified"
    help
    exit -1
fi

#Checks for sane warning/critical levels
if [[ $warning -ne "99999" || $critical -ne "99999" ]]; then
    if [[ "$warn_1" -gt "$crit_1" || "$warn_2" -gt "$crit_2" ]]; then
        echo "ERROR: critical levels must be higher than warning levels"
        help
        exit -1
    elif [[ $io -eq "1" || $waittime -eq "1" ]]; then
        if [[ "$warn_3" -gt "$crit_3" || "$warn_4" -gt "$crit_4" || "$warn_5" -gt "$crit_5" ]]; then
            echo "ERROR: critical levels must be higher than warning levels"
            help
            exit -1
        fi
    fi
fi

#------------Parameter Check End-------------

# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device report, 10 is the interval in seconds for the second run, -x adds the extended statistics

TMPX=$($iostat $disk -x -k -d 10 $samples | grep $disk | tail -1)

#------------IO Test-------------

if [[ "$io" == "1" ]]; then

    TMPD=$($iostat $disk -k -d 10 $samples | grep $disk | tail -1)
    #Requests per second:
    tps=$(echo "$TMPD" | awk '{print $2}')
    read_sec=$(echo "$TMPX" | awk '{print $4}')
    written_sec=$(echo "$TMPX" | awk '{print $5}')

    #Kb per second:
    kbytes_read_sec=$(echo "$TMPX" | awk '{print $6}')
    kbytes_written_sec=$(echo "$TMPX" | awk '{print $7}')

    # "Converting" values to float (string replace , with .)
    tps=${tps/,/.}
    read_sec=${read_sec/,/.}
    written_sec=${written_sec/,/.}
    kbytes_read_sec=${kbytes_read_sec/,/.}
    kbytes_written_sec=${kbytes_written_sec/,/.}

    # Comparing the result and setting the correct level:
    if [[ "$warning" -ne "99999" ]]; then
        if [[ "$(echo "$tps >= $warn_1" | bc)" == "1" ||
              "$(echo "$read_sec >= $warn_2" | bc)" == "1" ||
              "$(echo "$written_sec >= $warn_3" | bc)" == "1" ||
              "$(echo "$kbytes_read_sec >= $warn_4" | bc -q)" == "1" ||
              "$(echo "$kbytes_written_sec >= $warn_5" | bc)" == "1" ]]; then
            STATE="WARNING"
            status=1
        fi
    fi
    if [[ "$critical" -ne "99999" ]]; then
        if [[ "$(echo "$tps >= $crit_1" | bc)" == "1" ||
              "$(echo "$read_sec >= $crit_2" | bc -q)" == "1" ||
              "$(echo "$written_sec >= $crit_3" | bc)" == "1" ||
              "$(echo "$kbytes_read_sec >= $crit_4" | bc -q)" == "1" ||
              "$(echo "$kbytes_written_sec >= $crit_5" | bc)" == "1" ]]; then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
    PERFDATA=" | total_io_sec=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;"
fi

#------------IO Test End-------------

#------------Queue Test-------------
if [[ "$queue" == "1" ]]; then
    qsize=$(echo "$TMPX" | awk '{print $8}')
    qlength=$(echo "$TMPX" | awk '{print $9}')

    # "Converting" values to float (string replace , with .)
    qsize=${qsize/,/.}
    qlength=${qlength/,/.}

    # Comparing the result and setting the correct level:
    if [[ "$warning" -ne "99999" ]]; then
        if [[ "$(echo "$qsize >= $warn_1" | bc)" == "1" ||
              "$(echo "$qlength >= $warn_2" | bc)" == "1" ]]; then
            STATE="WARNING"
            status=1
        fi
    fi
    if [[ "$critical" -ne "99999" ]]; then
        if [[ "$(echo "$qsize >= $crit_1" | bc)" == "1" ||
              "$(echo "$qlength >= $crit_2" | bc)" == "1" ]]; then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
    PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;"
fi

#------------Queue Test End-------------

#------------Wait Time Test-------------

#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [[ "$waittime" == "1" ]]; then
    avgwait=$(echo "$TMPX" | awk '{print $10}')
    avgrwait=$(echo "$TMPX" | awk '{print $11}')
    avgwwait=$(echo "$TMPX" | awk '{print $12}')
    avgsvctime=$(echo "$TMPX" | awk '{print $13}')
    avgcpuutil=$(echo "$TMPX" | awk '{print $14}')

    # "Converting" values to float (string replace , with .)
    avgwait=${avgwait/,/.}
    avgrwait=${avgrwait/,/.}
    avgwwait=${avgwwait/,/.}
    avgsvctime=${avgsvctime/,/.}
    avgcpuutil=${avgcpuutil/,/.}

    # Comparing the result and setting the correct level:
    if [[ "$warning" -ne "99999" ]]; then
        if [[ "$(echo "$avgwait >= $warn_1" | bc)" == "1" ||
              "$(echo "$avgrwait >= $warn_2" | bc -q)" == "1" ||
              "$(echo "$avgwwait >= $warn_3" | bc)" == "1" ||
              "$(echo "$avgsvctime >= $warn_4" | bc -q)" == "1" ||
              "$(echo "$avgcpuutil >= $warn_5" | bc)" == "1" ]]; then
            STATE="WARNING"
            status=1
        fi
    fi
    if [[ "$critical" -ne "99999" ]]; then
        if [[ "$(echo "$avgwait >= $crit_1" | bc)" == "1" ||
              "$(echo "$avgrwait >= $crit_2" | bc -q)" == "1" ||
              "$(echo "$avgwwait >= $crit_3" | bc)" == "1" ||
              "$(echo "$avgsvctime >= $crit_4" | bc -q)" == "1" ||
              "$(echo "$avgcpuutil >= $crit_5" | bc)" == "1" ]]; then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
    PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi

#------------Wait Time End-------------

# now output the official result
echo -n "$MSG"
if [[ "x$printperfdata" == "x1" ]]; then
    echo -n "$PERFDATA"
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
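One thing to watch in the parameter check: the warning/critical ordering is tested with `-gt`, which is an integer comparison, so decimal thresholds such as 1.5 would not be handled there. A possible alternative, sketched below with a hypothetical helper named `thresholds_ordered`, compares the comma-separated lists position by position in awk. This is an illustration of the idea, not part of the posted script.

```shell
#!/bin/sh
# thresholds_ordered WARN_LIST CRIT_LIST (hypothetical helper): succeeds when
# both comma-separated lists have the same number of fields and every warning
# value is <= the critical value in the same position. Decimals are allowed.
thresholds_ordered() {
    awk -v w="$1" -v c="$2" 'BEGIN {
        nw = split(w, warn, ",")
        nc = split(c, crit, ",")
        if (nw != nc) exit 1
        for (i = 1; i <= nw; i++)
            if (warn[i] + 0 > crit[i] + 0) exit 1
        exit 0
    }'
}

# Example with decimal queue-mode thresholds (request size, queue length)
if thresholds_ordered "1.5,2" "1.6,4"; then
    echo "thresholds look sane"
else
    echo "ERROR: critical levels must be higher than warning levels"
fi
```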



Warn/Crit in performance
by dvzunderd, January 31, 2019

I've added the warning/critical thresholds to the performance data. #!/bin/bash #----------check_iostat.sh----------- # # Version 0.0.2 - Jan/2009 # Changes: added device verification # # by Thiago Varela - thiago@iplenix.com # # Version 0.0.3 - Dec/2011 # Changes: # - changed values from bytes to mbytes # - fixed bug to get traffic data without comma but point # - current values are displayed now, not average values (first run of iostat) # # by Philipp Niedziela - pn@pn-it.com # # Version 0.0.4 - April/2014 # Changes: # - Allow Empty warn/crit levels # - Can check I/O, WAIT Time, or Queue # # by Warren Turner # # Version 0.0.5 - Jun/2014 # Changes: # - removed -y flag from call since iostat doesn't know about it any more (June 2014) # - only needed executions of iostat are done now (save cpu time whenever you can) # - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values # - made perfomance data optional (I like to have choice in the matter) # # by Frederic Krueger / fkrueger-dev-checkiostat@holics.at # # Version 0.0.6 - Jul/2014 # Changes: # - Cleaned up argument checking, removed excess iostat calls, steamlined if statements and renamed variables to fit current use # - Fixed all inputs to match current iostat output (Ubuntu 12.04) # - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag. # - added extra comments/whitespace etc to make add readability # # by Ben Field / ben.field@concreteplatform.com # # Version 0.0.7 - Sep/2014 # Changes: # - Fixed performance data for Wait check # # by Christian Westergard / christian.westergard@gmail.com # # Version 0.0.8 - Jan/2019 # Changes: # - Added Warn/Crit thresholds to performance output # # by Danny van Zunderd / danny_vz@live.nl iostat=`which iostat 2>/dev/null` bc=`which bc 2>/dev/null` function help { echo -e " Usage: -d = --Device to be checked. 
Example: "-d sda" Run only one of i, q, W: -i = IO Check Mode --Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec --warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec -q = Queue Mode --Checks Disk Queue Lengths --warning/critial = Average size of requests, Queue length of requests -W = Wait Time Mode --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. --warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization -w,-c = pass warning and critical levels respectively. These are not required, but with out them, all queries will return as OK. -p = Provide performance data for later graphing -g = Since last reboot for system (more for debugging that nagios use!) -h = This help " exit -1 } # Ensuring we have the needed tools: ( [ ! -f $iostat ] || [ ! -f $bc ] ) && ( echo "ERROR: You must have iostat and bc installed in order to run this pluginntuse: apt-get install systat bcn" && exit -1 ) io=0 queue=0 waittime=0 printperfdata=0 STATE="OK" samples=2i status=0 MSG="" PERFDATA="" #------------Argument Set------------- while getopts "d:w:c:ipqWhg" OPT; do case $OPT in "d") disk=$OPTARG;; "w") warning=$OPTARG;; "c") critical=$OPTARG;; "i") io=1;; "p") printperfdata=1;; "q") queue=1;; "W") waittime=1;; "g") samples=1;; "h") echo "help:" && help;; ?) 
echo "Invalid option: -$OPTARG" >&2 exit -1 ;; esac done # Autofill if parameters are empty if [ -z "$disk" ] then disk=sda fi #Checks that only one query type is run [[ `expr $io+$queue+$waittime` -ne "1" ]] && echo "ERROR: select one and only one run mode" && help #set warning and critical to insane value is empty, else set the individual values if [ -z "$warning" ] then warning=99999 else #TPS with IO, Request size with queue warn_1=`echo $warning | cut -d, -f1` #Read/s with IO,Queue Length with queue warn_2=`echo $warning | cut -d, -f2` #Write/s with IO warn_3=`echo $warning | cut -d, -f3` #KB/s read with IO warn_4=`echo $warning | cut -d, -f4` #KB/s written with IO warn_5=`echo $warning | cut -d, -f5` #Crude hack due to integer expression later in the script warning=1 fi if [ -z "$critical" ] then critical=99999 else #TPS with IO, Request size with queue crit_1=`echo $critical | cut -d, -f1` #Read/s with IO,Queue Length with queue crit_2=`echo $critical | cut -d, -f2` #Write/s with IO crit_3=`echo $critical | cut -d, -f3` #KB/s read with IO crit_4=`echo $critical | cut -d, -f4` #KB/s written with IO crit_5=`echo $critical | cut -d, -f5` #Crude hack due to integer expression later in the script critical=1 fi #------------Argument Set End------------- #------------Parameter Check------------- #Checks for sane Disk name: [ ! 
-b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help #Checks for sane warning/critical levels if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then echo "ERROR: critical levels must be higher than warning levels" && help elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then echo "ERROR: critical levels must be higher than warning levels" && help fi fi fi #------------Parameter Check End------------- # iostat parameters: # -m: megabytes # -k: kilobytes # first run of iostat shows statistics since last reboot, second one shows current vaules of hdd # -d is the duration for second run, -x the rest TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1` #------------IO Test------------- if [ "$io" == "1" ]; then TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1` #Requests per second: tps=`echo "$TMPD" | awk '{print $2}'` read_sec=`echo "$TMPX" | awk '{print $4}'` written_sec=`echo "$TMPX" | awk '{print $5}'` #Kb per second: kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'` kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'` # "Converting" values to float (string replace , with .) 
tps=${tps/,/.} read_sec=${read_sec/,/.} written_sec=${written_sec/,/.} kbytes_read_sec=${kbytes_read_sec/,/.} kbytes_written_sec=${kbytes_written_sec/,/.} # Comparing the result and setting the correct level: if [ "$warning" -ne "99999" ]; then if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || [ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then STATE="WARNING" status=1 fi fi if [ "$critical" -ne "99999" ]; then if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || [ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then STATE="CRITICAL" status=2 fi fi # Printing the results: MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec" PERFDATA=" | total_io_sec'=$tps;$warn_1;$crit_1; read_io_sec=$read_sec;$warn_2;$crit_2; write_io_sec=$written_sec;$warn_3;$crit_3; kbytes_read_sec=$kbytes_read_sec;$warn_4;$crit_4; kbytes_written_sec=$kbytes_written_sec;$warn_5;$crit_5;" fi #------------IO Test End------------- #------------Queue Test------------- if [ "$queue" == "1" ]; then qsize=`echo "$TMPX" | awk '{print $8}'` qlength=`echo "$TMPX" | awk '{print $9}'` # "Converting" values to float (string replace , with .) 
qsize=${qsize/,/.} qlength=${qlength/,/.} # Comparing the result and setting the correct level: if [ "$warning" -ne "99999" ]; then if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then STATE="WARNING" status=1 fi fi if [ "$critical" -ne "99999" ]; then if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then STATE="CRITICAL" status=2 fi fi # Printing the results: MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength" PERFDATA=" | qsize=$qsize;$warn_1;$crit_1; queue_length=$qlength;$warn_2;$crit_2;" fi #------------Queue Test End------------- #------------Wait Time Test------------- #Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return if [ "$waittime" == "1" ]; then avgwait=`echo "$TMPX" | awk '{print $10}'` avgrwait=`echo "$TMPX" | awk '{print $11}'` avgwwait=`echo "$TMPX" | awk '{print $12}'` avgsvctime=`echo "$TMPX" | awk '{print $13}'` avgcpuutil=`echo "$TMPX" | awk '{print $14}'` # "Converting" values to float (string replace , with .) 
    avgwait=${avgwait/,/.}
    avgrwait=${avgrwait/,/.}
    avgwwait=${avgwwait/,/.}
    avgsvctime=${avgsvctime/,/.}
    avgcpuutil=${avgcpuutil/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
    PERFDATA=" | avg_io_waittime_ms=$avgwait;$warn_1;$crit_1; avg_r_waittime_ms=$avgrwait;$warn_2;$crit_2; avg_w_waittime_ms=$avgwwait;$warn_3;$crit_3; avg_service_waittime_ms=$avgsvctime;$warn_4;$crit_4; avg_cpu_utilization=$avgcpuutil;$warn_5;$crit_5;"
fi
#------------Wait Time End-------------

# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then
    echo -n "$PERFDATA";
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
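
The problem described in this review comes down to a sample count that is not a plain integer (`2i`). A minimal sketch of a guard is shown below. It assumes a hypothetical helper named `sanitize_samples`, which is not part of the posted plugin, and falls back to a default whenever the value is not a positive integer:

```shell
# Hypothetical helper, not part of check_iostat.sh: print the sample count
# if it is a positive integer, otherwise print a default (2 unless given).
sanitize_samples() {
    case "$1" in
        ''|*[!0-9]*|0*) echo "${2:-2}" ;;   # empty, non-digit characters, or leading zero
        *)              echo "$1" ;;
    esac
}

# The value from the older script versions is reduced to the default.
samples=$(sanitize_samples "2i")
echo "samples=$samples"
```

With such a guard the count handed to iostat is always a number, whatever ends up in the variable.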



Small fix
by savv3, September 30, 2014

Fixed the performance data for the Wait check; it wasn't displaying any data.

#!/bin/bash
#----------check_iostat.sh-----------
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#
# Version 0.0.6 - Jul/2014
# Changes:
# - Cleaned up argument checking, removed excess iostat calls, streamlined if statements and renamed variables to fit current use
# - Fixed all inputs to match current iostat output (Ubuntu 12.04)
# - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag.
# - added extra comments/whitespace etc to add readability
#
# by Ben Field / ben.field@concreteplatform.com
#
# Version 0.0.7 - Sep/2014
# Changes:
# - Fixed performance data for Wait check
#
# by Christian Westergard / christian.westergard@gmail.com
#

iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`

function help {
echo -e "
    Usage:

    -d = --Device to be checked.
    Example: \"-d sda\"

    Run only one of i, q, W:

    -i = IO Check Mode
        --Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
        --warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
    -q = Queue Mode
        --Checks Disk Queue Lengths
        --warning/critical = Average size of requests, Queue length of requests
    -W = Wait Time Mode
        --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
        --warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization

    -w,-c = pass warning and critical levels respectively. These are not required, but without them, all queries will return as OK.
    -p = Provide performance data for later graphing
    -g = Since last reboot for system (more for debugging than nagios use!)
    -h = This help
    "
    exit -1
}

# Ensuring we have the needed tools:
# (paths are quoted and the exit is not in a subshell, so a missing tool really aborts the script)
if [ ! -f "$iostat" ] || [ ! -f "$bc" ]; then
    echo -e "ERROR: You must have iostat and bc installed in order to run this plugin\n\tuse: apt-get install sysstat bc\n"
    exit -1
fi

io=0
queue=0
waittime=0
printperfdata=0
STATE="OK"
samples=2
status=0
MSG=""
PERFDATA=""

#------------Argument Set-------------

while getopts "d:w:c:ipqWhg" OPT; do
    case $OPT in
        "d") disk=$OPTARG;;
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "i") io=1;;
        "p") printperfdata=1;;
        "q") queue=1;;
        "W") waittime=1;;
        "g") samples=1;;
        "h") echo "help:" && help;;
        ?)
            echo "Invalid option: -$OPTARG" >&2
            exit -1
            ;;
    esac
done

# Autofill if parameters are empty
if [ -z "$disk" ]
    then disk=sda
fi

#Checks that only one query type is run
[[ `expr $io+$queue+$waittime` -ne "1" ]] && \
    echo "ERROR: select one and only one run mode" && help

#set warning and critical to insane value if empty, else set the individual values
if [ -z "$warning" ]
    then warning=99999
else
    #TPS with IO, Request size with queue
    warn_1=`echo $warning | cut -d, -f1`
    #Read/s with IO,Queue Length with queue
    warn_2=`echo $warning | cut -d, -f2`
    #Write/s with IO
    warn_3=`echo $warning | cut -d, -f3`
    #KB/s read with IO
    warn_4=`echo $warning | cut -d, -f4`
    #KB/s written with IO
    warn_5=`echo $warning | cut -d, -f5`
    #Crude hack due to integer expression later in the script
    warning=1
fi

if [ -z "$critical" ]
    then critical=99999
else
    #TPS with IO, Request size with queue
    crit_1=`echo $critical | cut -d, -f1`
    #Read/s with IO,Queue Length with queue
    crit_2=`echo $critical | cut -d, -f2`
    #Write/s with IO
    crit_3=`echo $critical | cut -d, -f3`
    #KB/s read with IO
    crit_4=`echo $critical | cut -d, -f4`
    #KB/s written with IO
    crit_5=`echo $critical | cut -d, -f5`
    #Crude hack due to integer expression later in the script
    critical=1
fi

#------------Argument Set End-------------

#------------Parameter Check-------------

#Checks for sane Disk name:
[ ! -b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help

#Checks for sane warning/critical levels
if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then
    if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then
        echo "ERROR: critical levels must be higher than warning levels" && help
    elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then
        if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then
            echo "ERROR: critical levels must be higher than warning levels" && help
        fi
    fi
fi

#------------Parameter Check End-------------

# iostat parameters:
# -m: megabytes
# -k: kilobytes
# first run of iostat shows statistics since last reboot, second one shows current values of hdd
# -d selects the device report, 10 is the interval in seconds for the second report, -x adds the extended statistics

TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1`

#------------IO Test-------------

if [ "$io" == "1" ]; then

    TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1`
    #Requests per second:
    tps=`echo "$TMPD" | awk '{print $2}'`
    read_sec=`echo "$TMPX" | awk '{print $4}'`
    written_sec=`echo "$TMPX" | awk '{print $5}'`

    #Kb per second:
    kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'`
    kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'`

    # "Converting" values to float (string replace , with .)
    tps=${tps/,/.}
    read_sec=${read_sec/,/.}
    written_sec=${written_sec/,/.}
    kbytes_read_sec=${kbytes_read_sec/,/.}
    kbytes_written_sec=${kbytes_written_sec/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] ||
             [ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] ||
             [ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
    PERFDATA=" | total_io_sec=$tps; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;"
fi
#------------IO Test End-------------

#------------Queue Test-------------
if [ "$queue" == "1" ]; then
    qsize=`echo "$TMPX" | awk '{print $8}'`
    qlength=`echo "$TMPX" | awk '{print $9}'`

    # "Converting" values to float (string replace , with .)
    qsize=${qsize/,/.}
    qlength=${qlength/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength"
    PERFDATA=" | qsize=$qsize; queue_length=$qlength;"
fi
#------------Queue Test End-------------

#------------Wait Time Test-------------
#Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return
if [ "$waittime" == "1" ]; then
    avgwait=`echo "$TMPX" | awk '{print $10}'`
    avgrwait=`echo "$TMPX" | awk '{print $11}'`
    avgwwait=`echo "$TMPX" | awk '{print $12}'`
    avgsvctime=`echo "$TMPX" | awk '{print $13}'`
    avgcpuutil=`echo "$TMPX" | awk '{print $14}'`

    # "Converting" values to float (string replace , with .)
    avgwait=${avgwait/,/.}
    avgrwait=${avgrwait/,/.}
    avgwwait=${avgwwait/,/.}
    avgsvctime=${avgsvctime/,/.}
    avgcpuutil=${avgcpuutil/,/.}

    # Comparing the result and setting the correct level:
    if [ "$warning" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$critical" -ne "99999" ]; then
        if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] ||
             [ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] ||
             [ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then
            STATE="CRITICAL"
            status=2
        fi
    fi

    # Printing the results:
    MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil"
    PERFDATA=" | avg_io_waittime_ms=$avgwait; avg_r_waittime_ms=$avgrwait; avg_w_waittime_ms=$avgwwait; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;"
fi
#------------Wait Time End-------------

# now output the official result
echo -n "$MSG"
if [ "x$printperfdata" == "x1" ]; then
    echo -n "$PERFDATA";
fi
echo ""
exit $status
#----------/check_iostat.sh-----------
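
The fix in this review concerns performance data that came out empty for the Wait check. One way to make that kind of mistake visible, sketched here with a hypothetical helper `perf_item` that is not in the posted script, is to build every `label=value;warn;crit;` item through a single function that refuses an empty value. The sample numbers are made up for illustration:

```shell
# Hypothetical helper, not part of check_iostat.sh.
# usage: perf_item LABEL VALUE [WARN] [CRIT]
perf_item() {
    if [ -z "$2" ]; then
        # An empty value usually means a misspelled or unset variable.
        echo "perf_item: empty value for $1" >&2
        return 1
    fi
    printf '%s=%s;%s;%s;' "$1" "$2" "${3:-}" "${4:-}"
}

avgwait="1.25"    # made-up sample values
avgrwait="0.80"
PERFDATA=" | $(perf_item avg_io_waittime_ms "$avgwait" 5 10) $(perf_item avg_r_waittime_ms "$avgrwait")"
echo "$PERFDATA"
```

Warning and critical thresholds are optional in this sketch; when they are omitted the fields are simply left empty between the semicolons.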



Version for Ubuntu 12.04/systat 10.0.3.1
by benjfield, July 31, 2014

I have changed the script to work with the above system and cleaned it up a fair amount. Someone might want to have a look at parsing the inputs using the column names rather than column numbers in the future: #!/bin/bash #----------check_iostat.sh----------- # # Version 0.0.2 - Jan/2009 # Changes: added device verification # # by Thiago Varela - thiago@iplenix.com # # Version 0.0.3 - Dec/2011 # Changes: # - changed values from bytes to mbytes # - fixed bug to get traffic data without comma but point # - current values are displayed now, not average values (first run of iostat) # # by Philipp Niedziela - pn@pn-it.com # # Version 0.0.4 - April/2014 # Changes: # - Allow Empty warn/crit levels # - Can check I/O, WAIT Time, or Queue # # by Warren Turner # # Version 0.0.5 - Jun/2014 # Changes: # - removed -y flag from call since iostat doesn't know about it any more (June 2014) # - only needed executions of iostat are done now (save cpu time whenever you can) # - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values # - made perfomance data optional (I like to have choice in the matter) # # by Frederic Krueger / fkrueger-dev-checkiostat@holics.at # # Version 0.0.6 - Jul/2014 # Changes: # - Cleaned up argument checking, removed excess iostat calls, steamlined if statements and renamed variables to fit current use # - Fixed all inputs to match current iostat output (Ubuntu 12.04) # - Changed to take last ten seconds as default (more useful for nagios usage). Will go to "since last reboot" (previous behaviour) on -g flag. # - added extra comments/whitespace etc to make add readability # # by Ben Field / ben.field@concreteplatform.com iostat=`which iostat 2>/dev/null` bc=`which bc 2>/dev/null` function help { echo -e " Usage: -d = --Device to be checked. 
Example: "-d sda" Run only one of i, q, W: -i = IO Check Mode --Checks Total Transfers/sec, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec --warning/critical = Total Transfers/sec,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec -q = Queue Mode --Checks Disk Queue Lengths --warning/critial = Average size of requests, Queue length of requests -W = Wait Time Mode --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. --warning/critical = Avg I/O Wait Time (ms), Avg Read Wait Time (ms), Avg Write Wait Time (ms), Avg Service Wait Time (ms), Avg CPU Utilization -w,-c = pass warning and critical levels respectively. These are not required, but with out them, all queries will return as OK. -p = Provide performance data for later graphing -g = Since last reboot for system (more for debugging that nagios use!) -h = This help " exit -1 } # Ensuring we have the needed tools: ( [ ! -f $iostat ] || [ ! -f $bc ] ) && ( echo "ERROR: You must have iostat and bc installed in order to run this pluginntuse: apt-get install systat bcn" && exit -1 ) io=0 queue=0 waittime=0 printperfdata=0 STATE="OK" samples=2i status=0 MSG="" PERFDATA="" #------------Argument Set------------- while getopts "d:w:c:ipqWhg" OPT; do case $OPT in "d") disk=$OPTARG;; "w") warning=$OPTARG;; "c") critical=$OPTARG;; "i") io=1;; "p") printperfdata=1;; "q") queue=1;; "W") waittime=1;; "g") samples=1;; "h") echo "help:" && help;; ?) 
echo "Invalid option: -$OPTARG" >&2 exit -1 ;; esac done # Autofill if parameters are empty if [ -z "$disk" ] then disk=sda fi #Checks that only one query type is run [[ `expr $io+$queue+$waittime` -ne "1" ]] && echo "ERROR: select one and only one run mode" && help #set warning and critical to insane value is empty, else set the individual values if [ -z "$warning" ] then warning=99999 else #TPS with IO, Request size with queue warn_1=`echo $warning | cut -d, -f1` #Read/s with IO,Queue Length with queue warn_2=`echo $warning | cut -d, -f2` #Write/s with IO warn_3=`echo $warning | cut -d, -f3` #KB/s read with IO warn_4=`echo $warning | cut -d, -f4` #KB/s written with IO warn_5=`echo $warning | cut -d, -f5` #Crude hack due to integer expression later in the script warning=1 fi if [ -z "$critical" ] then critical=99999 else #TPS with IO, Request size with queue crit_1=`echo $critical | cut -d, -f1` #Read/s with IO,Queue Length with queue crit_2=`echo $critical | cut -d, -f2` #Write/s with IO crit_3=`echo $critical | cut -d, -f3` #KB/s read with IO crit_4=`echo $critical | cut -d, -f4` #KB/s written with IO crit_5=`echo $critical | cut -d, -f5` #Crude hack due to integer expression later in the script critical=1 fi #------------Argument Set End------------- #------------Parameter Check------------- #Checks for sane Disk name: [ ! 
-b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help #Checks for sane warning/critical levels if ( [[ $warning -ne "99999" ]] || [[ $critical -ne "99999" ]] ); then if ( [[ "$warn_1" -gt "$crit_1" ]] || [[ "$warn_2" -gt "$crit_2" ]] ); then echo "ERROR: critical levels must be higher than warning levels" && help elif ( [[ $io -eq "1" ]] || [[ $waittime -eq "1" ]] ); then if ( [[ "$warn_3" -gt "$crit_3" ]] || [[ "$warn_4" -gt "$crit_4" ]] || [[ "$warn_5" -gt "$crit_5" ]] ); then echo "ERROR: critical levels must be higher than warning levels" && help fi fi fi #------------Parameter Check End------------- # iostat parameters: # -m: megabytes # -k: kilobytes # first run of iostat shows statistics since last reboot, second one shows current vaules of hdd # -d is the duration for second run, -x the rest TMPX=`$iostat $disk -x -k -d 10 $samples | grep $disk | tail -1` #------------IO Test------------- if [ "$io" == "1" ]; then TMPD=`$iostat $disk -k -d 10 $samples | grep $disk | tail -1` #Requests per second: tps=`echo "$TMPD" | awk '{print $2}'` read_sec=`echo "$TMPX" | awk '{print $4}'` written_sec=`echo "$TMPX" | awk '{print $5}'` #Kb per second: kbytes_read_sec=`echo "$TMPX" | awk '{print $6}'` kbytes_written_sec=`echo "$TMPX" | awk '{print $7}'` # "Converting" values to float (string replace , with .) 
tps=${tps/,/.} read_sec=${read_sec/,/.} written_sec=${written_sec/,/.} kbytes_read_sec=${kbytes_read_sec/,/.} kbytes_written_sec=${kbytes_written_sec/,/.} # Comparing the result and setting the correct level: if [ "$warning" -ne "99999" ]; then if ( [ "`echo "$tps >= $warn_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_2" | bc`" == "1" ] || [ "`echo "$written_sec >= $warn_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_4" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $warn_5" | bc`" == "1" ] ); then STATE="WARNING" status=1 fi fi if [ "$critical" -ne "99999" ]; then if ( [ "`echo "$tps >= $crit_1" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_2" | bc -q`" == "1" ] || [ "`echo "$written_sec >= $crit_3" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_4" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $crit_5" | bc`" == "1" ] ); then STATE="CRITICAL" status=2 fi fi # Printing the results: MSG="$STATE - I/O stats: Transfers/Sec=$tps Read Requests/Sec=$read_sec Write Requests/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec" PERFDATA=" | total_io_sec'=$tps; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;" fi #------------IO Test End------------- #------------Queue Test------------- if [ "$queue" == "1" ]; then qsize=`echo "$TMPX" | awk '{print $8}'` qlength=`echo "$TMPX" | awk '{print $9}'` # "Converting" values to float (string replace , with .) 
qsize=${qsize/,/.} qlength=${qlength/,/.} # Comparing the result and setting the correct level: if [ "$warning" -ne "99999" ]; then if ( [ "`echo "$qsize >= $warn_1" | bc`" == "1" ] || [ "`echo "$qlength >= $warn_2" | bc`" == "1" ] ); then STATE="WARNING" status=1 fi fi if [ "$critical" -ne "99999" ]; then if ( [ "`echo "$qsize >= $crit_1" | bc`" == "1" ] || [ "`echo "$qlength >= $crit_2" | bc`" == "1" ] ); then STATE="CRITICAL" status=2 fi fi # Printing the results: MSG="$STATE - Disk Queue Stats: Average Request Size=$qsize Average Queue Length=$qlength" PERFDATA=" | qsize=$qsize; queue_length=$qlength;" fi #------------Queue Test End------------- #------------Wait Time Test------------- #Parse values. Warning - svc time will soon be deprecated and these will need to be changed. Future parser could look at first line (labels) to suggest correct column to return if [ "$waittime" == "1" ]; then avgwait=`echo "$TMPX" | awk '{print $10}'` avgrwait=`echo "$TMPX" | awk '{print $11}'` avgwwait=`echo "$TMPX" | awk '{print $12}'` avgsvctime=`echo "$TMPX" | awk '{print $13}'` avgcpuutil=`echo "$TMPX" | awk '{print $14}'` # "Converting" values to float (string replace , with .) 
avgwait=${avgwait/,/.} avgrwait=${avgrwait/,/.} avgwwait=${avgwwait/,/.} avgsvctime=${avgsvctime/,/.} avgcpuutil=${avgcpuutil/,/.} # Comparing the result and setting the correct level: if [ "$warning" -ne "99999" ]; then if ( [ "`echo "$avgwait >= $warn_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $warn_2" | bc -q`" == "1" ] || [ "`echo "$avgwwait >= $warn_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_4" | bc -q`" == "1" ] || [ "`echo "$avgcpuutil >= $warn_5" | bc`" == "1" ] ); then STATE="WARNING" status=1 fi fi if [ "$critical" -ne "99999" ]; then if ( [ "`echo "$avgwait >= $crit_1" | bc`" == "1" ] || [ "`echo "$avgrwait >= $crit_2" | bc -q`" == "1" ] || [ "`echo "$avgwwait >= $crit_3" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_4" | bc -q`" == "1" ] || [ "`echo "$avgcpuutil >= $crit_5" | bc`" == "1" ] ); then STATE="CRITICAL" status=2 fi fi # Printing the results: MSG="$STATE - Wait Time Stats: Avg I/O Wait Time (ms)=$avgwait Avg Read Wait Time (ms)=$avgrwait Avg Write Wait Time (ms)=$avgwwait Avg Service Wait Time (ms)=$avgsvctime Avg CPU Utilization=$avgcpuutil" PERFDATA=" | avg_io_waittime_ms=$avgiotime; avg_r_waittime_ms=$avgiotime; avg_w_waittime_ms=$avgiotime; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;" fi #------------Wait Time End------------- # now output the official result echo -n "$MSG" if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi echo "" exit $status #----------/check_iostat.sh-----------
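
The review above suggests parsing by column name rather than column number. A rough sketch of that idea follows. It assumes a hypothetical function `iostat_field` (not in the posted script) and an `iostat -x` header line that starts with `Device`; the sample report and its numbers are invented for illustration, and the real column set depends on the installed sysstat version:

```shell
# Hypothetical helper, not part of check_iostat.sh.
# usage: iostat_field DEVICE COLUMN   (reads "iostat -x" output on stdin)
iostat_field() {
    awk -v dev="$1" -v col="$2" '
        # Header line: remember which field carries the requested column name.
        $1 ~ /^Device/ { idx = 0; for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        # Device line: keep the value from the most recent report.
        $1 == dev && idx { value = $idx }
        END { if (value == "") exit 1; print value }
    '
}

# Invented sample in the style of an extended device report.
sample='Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.20    0.50    3.40    12.00    48.80    31.18     0.02    4.10    2.00    4.41   1.05   0.41'

printf '%s\n' "$sample" | iostat_field sda await
printf '%s\n' "$sample" | iostat_field sda %util
```

The function exits with a non-zero status when the column or the device is not found, so a renamed column shows up as an error instead of a silently wrong number.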



Many fixes, works again.
by fkrueger, June 30, 2014

Hi,

I had to do a few fixes and some (minor) cleaning up compared to the 0.0.4 version posted here. The plugin works again now. As for SELinux, I will find out once I have created an RPM for our environment and done a test rollout :-)

Regards,
Frederic

----------check_iostat.sh-----------
#!/bin/bash
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# --------------------------------------
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner
#
# Version 0.0.5 - Jun/2014
# Changes:
# - removed -y flag from call since iostat doesn't know about it any more (June 2014)
# - only needed executions of iostat are done now (save cpu time whenever you can)
# - fixed the obvious problems of missing input values (probably because of the now unimplemented "-y") with -x values
# - made performance data optional (I like to have choice in the matter)
#
# by Frederic Krueger / fkrueger-dev-checkiostat@holics.at
#

iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`

function help {
echo -e "
    Usage:

    -d = --Device to be checked. Example: \"-d sda\"

    -i = IO Check Mode
        --Checks Total Disk IO, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
        --warning/critical = Total IO,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
    -q = Queue Mode
        --Checks Disk Queue Lengths
        --warning/critical = Total Queue Length,Read Queue Length,Write Queue Length
    -W = Wait Time Mode
        --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
--warning/critical = Avg I/O Wait Time/ms,Read Wait Time/ms,Write Wait Time/ms -p = Provide performance data for later graphing -h = This help " exit -1 } # Ensuring we have the needed tools: ( [ ! -f $iostat ] || [ ! -f $bc ] ) && ( echo "ERROR: You must have iostat and bc installed in order to run this pluginntuse: apt-get install systat bcn" && exit -1 ) io=0 queue=0 waittime=0 printperfdata=0 STATE="OK" MSG="" PERFDATA="" # Getting parameters: while getopts "d:w:c:io:pqu:Wt:h" OPT; do case $OPT in "d") disk=$OPTARG;; "w") warning=$OPTARG;; "c") critical=$OPTARG;; "i") io=1;; "p") printperfdata=1;; "q") queue=1;; "W") waittime=1;; "h") help;; esac done # Autofill if parameters are empty if [ -z "$disk" ] then disk=sda fi if [ -z "$warning" ] then warning=99999 fi if [ -z "$critical" ] then critical=99999 fi # Adjusting the warn and crit levels: crit_total=`echo $critical | cut -d, -f1` crit_read=`echo $critical | cut -d, -f2` crit_written=`echo $critical | cut -d, -f3` crit_kbytes_read=`echo $critical | cut -d, -f4` crit_kbytes_written=`echo $critical | cut -d, -f5` warn_total=`echo $warning | cut -d, -f1` warn_read=`echo $warning | cut -d, -f2` warn_written=`echo $warning | cut -d, -f3` warn_kbytes_read=`echo $warning | cut -d, -f4` warn_kbytes_written=`echo $warning | cut -d, -f5` ## # Checking parameters: # [ ! 
-b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help # ( [ "$warn_total" == "" ] || [ "$warn_read" == "" ] || [ "$warn_written" == "" ] || # [ "$crit_total" == "" ] || [ "$crit_read" == "" ] || [ "$crit_written" == "" ] ) && # echo "ERROR: You must specify all warning and critical levels" && help # ( [[ "$warn_total" -ge "$crit_total" ]] || # [[ "$warn_read" -ge "$crit_read" ]] || # [[ "$warn_written" -ge "$crit_written" ]] ) && # echo "ERROR: critical levels must be highter than warning levels" && help # iostat parameters: # -m: megabytes # -k: kilobytes # first run of iostat shows statistics since last reboot, second one shows current vaules of hdd # Doing the actual checks: # -d has the total per second, -x the rest TMPD=`$iostat $disk -k -d 2 1 | grep $disk` TMPX=`$iostat $disk -x -d 2 1 | grep $disk` ## IO Check ## if [ "$io" == "1" ] then total=`echo "$TMPD" | awk '{print $2}'` read_sec=`echo "$TMPX" | awk '{print $4}'` written_sec=`echo "$TMPX" | awk '{print $5}'` kbytes_read_sec=`echo "$TMPD" | awk '{print $6}'` kbytes_written_sec=`echo "$TMPD" | awk '{print $7}'` # IO # "Converting" values to float (string replace , with .) 
    total=${total/,/.}
    read_sec=${read_sec/,/.}
    written_sec=${written_sec/,/.}
    kbytes_read_sec=${kbytes_read_sec/,/.}
    kbytes_written_sec=${kbytes_written_sec/,/.}

    # IO # Comparing the result and setting the correct level:
    if [ "$warn_total" -ne "99999" ]
    then
        if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] ||
             [ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_kbytes_read" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $warn_kbytes_written" | bc`" == "1" ] )
        then
            STATE="WARNING"
            status=1
        fi
    fi
    if [ "$crit_total" -ne "99999" ]
    then
        if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] ||
             [ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_kbytes_read" | bc -q`" == "1" ] ||
             [ "`echo "$kbytes_written_sec >= $crit_kbytes_written" | bc`" == "1" ] )
        then
            STATE="CRITICAL"
            status=2
        fi
    fi
    if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ]
    then
        STATE="OK"
        status=0
    fi

    # IO # Printing the results:
    MSG="$STATE - I/O stats: Total IO/Sec=$total Read IO/Sec=$read_sec Write IO/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec"
    PERFDATA=" | total_io_sec=$total; read_io_sec=$read_sec; write_io_sec=$written_sec; kbytes_read_sec=$kbytes_read_sec; kbytes_written_sec=$kbytes_written_sec;"
fi

## QUEUE Check ##
if [ "$queue" == "1" ]
then
    total=`echo "$TMPX" | awk '{print $8}'`
    readq_sec=`echo "$TMPX" | awk '{print $6}'`
    writtenq_sec=`echo "$TMPX" | awk '{print $7}'`

    # QUEUE # "Converting" values to float (string replace , with .)
total=${total/,/.} readq_sec=${readq_sec/,/.} writtenq_sec=${writtenq_sec/,/.} # QUEUE # Comparing the result and setting the correct level: if [ "$warn_total" -ne "99999" ] then if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$readq_sec >= $warn_read" | bc`" == "1" ] || [ "`echo "$writtenq_sec >= $warn_written" | bc`" == "1" ] ) then STATE="WARNING" status=1 fi fi if [ "$crit_total" -ne "99999" ] then if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$readq_sec >= $crit_read" | bc -q`" == "1" ] || [ "`echo "$writtenq_sec >= $crit_written" | bc`" == "1" ] ) then STATE="CRITICAL" status=2 fi fi if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ] then STATE="OK" status=0 fi # QUEUE # Printing the results: MSG="$STATE - Disk Queue Stats: Average Queue Length=$total Read Queue/Sec=$readq_sec Write Queue/Sec=$writtenq_sec" PERFDATA=" | total=$total; read_queue_sec=$readq_sec; write_queue_sec=$writtenq_sec;" fi ## WAIT TIME Check ## if [ "$waittime" == "1" ] then TMP=`$iostat $disk -x -k -d 2 1 | grep $disk` avgiotime=`echo "$TMP" | awk '{print $10}'` avgsvctime=`echo "$TMP" | awk '{print $11}'` avgcpuutil=`echo "$TMP" | awk '{print $12}'` # QUEUE # "Converting" values to float (string replace , with .) 
avgiotime=${avgiotime/,/.} avgsvctime=${avgsvctime/,/.} avgcpuutil=${avgcpuutil/,/.} # WAIT TIME # Comparing the result and setting the correct level: if [ "$warn_total" -ne "99999" ] then if ( [ "`echo "$avgiotime >= $warn_total" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $warn_read" | bc`" == "1" ] || [ "`echo "$avgcpuutil >= $warn_written" | bc`" == "1" ] ) then STATE="WARNING" status=1 fi fi if [ "$crit_total" -ne "99999" ] then if ( [ "`echo "$avgiotime >= $crit_total" | bc`" == "1" ] || [ "`echo "$avgsvctime >= $crit_read" | bc -q`" == "1" ] || [ "`echo "$avgcpuutil >= $crit_written" | bc`" == "1" ] ) then STATE="CRITICAL" status=2 fi fi if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ] then STATE="OK" status=0 fi # WAIT TIME # Printing the results: MSG="$STATE - Wait Time Stats: Avg I/O Wait Time/ms=$avgiotime Avg Service Wait Time/ms=$avgsvctime Avg CPU Utilization=$avgcpuutil" PERFDATA=" | avg_io_waittime_ms=$avgiotime; avg_service_waittime_ms=$avgsvctime; avg_cpu_utilization=$avgcpuutil;" fi # now output the official result echo -n "$MSG" if [ "x$printperfdata" == "x1" ]; then echo -n "$PERFDATA"; fi echo "" exit $status ----------/check_iostat.sh-----------
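
All versions of the script repeat the same pattern: replace a decimal comma with a point, then ask `bc` whether the value has reached the threshold. The sketch below folds both steps into one function. It is an assumption-laden alternative, not the plugin's own code: the name `float_ge` is hypothetical, and it swaps `bc` for `awk`, which the script already depends on:

```shell
# Hypothetical helper, not part of check_iostat.sh.
# usage: float_ge VALUE THRESHOLD   -> exit status 0 when VALUE >= THRESHOLD
float_ge() {
    # Accept "12,5" as well as "12.5".
    fge_a=$(printf '%s' "$1" | tr ',' '.')
    fge_b=$(printf '%s' "$2" | tr ',' '.')
    # Adding 0 forces a numeric comparison; LC_ALL=C keeps the point as decimal separator.
    LC_ALL=C awk -v a="$fge_a" -v b="$fge_b" 'BEGIN { exit !((a + 0) >= (b + 0)) }'
}

value="12,5"    # made-up sample value with a decimal comma
warn="10"
if float_ge "$value" "$warn"; then
    echo "WARNING"
else
    echo "OK"
fi
```

Because the result is an exit status, the long chains of bracketed `bc` calls could in principle be written as `float_ge "$a" "$warn_1" || float_ge "$b" "$warn_2"`.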



Updated Script
by EndlessTundra, April 30, 2014

Hey Everyone, this script was very nice but it also had some weird irritations, so I reworked it and added:

- Allow empty Warning/Critical values
- Added Modes so that you can check Disk IOs, Disk Queue, or Disk Wait Times
- To see the usage information use check_diskio.sh -h

Sorry I don't have this anywhere on the web so I'm just going to paste it here:

#!/bin/bash
#
# Version 0.0.2 - Jan/2009
# Changes: added device verification
#
# by Thiago Varela - thiago@iplenix.com
#
# --------------------------------------
#
# Version 0.0.3 - Dec/2011
# Changes:
# - changed values from bytes to mbytes
# - fixed bug to get traffic data without comma but point
# - current values are displayed now, not average values (first run of iostat)
#
# by Philipp Niedziela - pn@pn-it.com
#
# Version 0.0.4 - April/2014
# Changes:
# - Allow Empty warn/crit levels
# - Can check I/O, WAIT Time, or Queue
#
# by Warren Turner

iostat=`which iostat 2>/dev/null`
bc=`which bc 2>/dev/null`

function help {
echo -e "
    Usage:

    -d = --Device to be checked. Example: \"-d sda\"

    -i = IO Check Mode
        --Checks Total Disk IO, Read IO/Sec, Write IO/Sec, Bytes Read/Sec, Bytes Written/Sec
        --warning/critical = Total IO,Read IO/Sec,Write IO/Sec,Bytes Read/Sec,Bytes Written/Sec
    -q = Queue Mode
        --Checks Disk Queue Lengths
        --warning/critical = Total Queue Length,Read Queue Length,Write Queue Length
    -W = Wait Time Mode
        --Check the time for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
        --warning/critical = Avg I/O Wait Time/ms,Read Wait Time/ms,Write Wait Time/ms
    "
    exit -1
}

# Ensuring we have the needed tools:
( [ ! -f $iostat ] || [ !
-f $bc ] ) && ( echo "ERROR: You must have iostat and bc installed in order to run this pluginntuse: apt-get install systat bcn" && exit -1 ) io=0 queue=0 waittime=0 msg="OK" # Getting parameters: while getopts "d:w:c:io:qu:Wt:h" OPT; do case $OPT in "d") disk=$OPTARG;; "w") warning=$OPTARG;; "c") critical=$OPTARG;; "i") io=1;; "q") queue=1;; "W") waittime=1;; "h") help;; esac done # Autofill if parameters are empty if [ -z "$disk" ] then disk=sda fi if [ -z "$warning" ] then warning=99999 fi if [ -z "$critical" ] then critical=99999 fi # Adjusting the warn and crit levels: crit_total=`echo $critical | cut -d, -f1` crit_read=`echo $critical | cut -d, -f2` crit_written=`echo $critical | cut -d, -f3` crit_kbytes_read=`echo $critical | cut -d, -f4` crit_kbytes_written=`echo $critical | cut -d, -f5` warn_total=`echo $warning | cut -d, -f1` warn_read=`echo $warning | cut -d, -f2` warn_written=`echo $warning | cut -d, -f3` warn_kbytes_read=`echo $warning | cut -d, -f4` warn_kbytes_written=`echo $warning | cut -d, -f5` # # Checking parameters: # [ ! 
-b "/dev/$disk" ] && echo "ERROR: Device incorrectly specified" && help # ( [ "$warn_total" == "" ] || [ "$warn_read" == "" ] || [ "$warn_written" == "" ] || # [ "$crit_total" == "" ] || [ "$crit_read" == "" ] || [ "$crit_written" == "" ] ) && # echo "ERROR: You must specify all warning and critical levels" && help # ( [[ "$warn_total" -ge "$crit_total" ]] || # [[ "$warn_read" -ge "$crit_read" ]] || # [[ "$warn_written" -ge "$crit_written" ]] ) && # echo "ERROR: critical levels must be highter than warning levels" && help # iostat parameters: # -m: megabytes # -k: kilobytes # first run of iostat shows statistics since last reboot, second one shows current vaules of hdd # Doing the actual checks: ## IO Check ## if [ "$io" == "1" ] then total=`$iostat $disk -y -k -d 2 1 | grep $disk | awk '{print $2}'` read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $4}'` written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $5}'` kbytes_read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $6}'` kbytes_written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $7}'` # IO # "Converting" values to float (string replace , with .) 
total=${total/,/.} read_sec=${read_sec/,/.} written_sec=${written_sec/,/.} kbytes_read_sec=${kbytes_read_sec/,/.} kbytes_written_sec=${kbytes_written_sec/,/.} # IO # Comparing the result and setting the correct level: if [ "$warn_total" -ne "99999" ] then if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || [ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $warn_kbytes_read" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $warn_kybtes_written" | bc`" == "1" ] ) then msg="WARNING" status=1 fi fi if [ "$crit_total" -ne "99999" ] then if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || [ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] || [ "`echo "$kbytes_read_sec >= $crit_kbytes_read" | bc -q`" == "1" ] || [ "`echo "$kbytes_written_sec >= $crit_kbytes_written" | bc`" == "1" ] ) then msg="CRITICAL" status=2 fi fi if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ] then msg="OK" status=0 fi # IO # Printing the results: echo "$msg - I/O stats: Total IO/Sec=$total Read IO/Sec=$read_sec Write IO/Sec=$written_sec KBytes Read/Sec=$kbytes_read_sec KBytes_Written/Sec=$kbytes_written_sec | 'Total IO/Sec'=$total; 'Read IO/Sec'=$read_sec; 'Write IO/Sec'=$written_sec; 'KBytes Read/Sec'=$kbytes_read_sec; 'KKBytes_Written/Sec'=$kbytes_written_sec;" fi ## QUEUE Check ## if [ "$queue" == "1" ] then total=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $8}'` read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $2}'` written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $3}'` # QUEUE # "Converting" values to float (string replace , with .) 
total=${total/,/.} read_sec=${read_sec/,/.} written_sec=${written_sec/,/.} # QUEUE # Comparing the result and setting the correct level: if [ "$warn_total" -ne "99999" ] then if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || [ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] ) then msg="WARNING" status=1 fi fi if [ "$crit_total" -ne "99999" ] then if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || [ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] ) then msg="CRITICAL" status=2 fi fi if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ] then msg="OK" status=0 fi # QUEUE # Printing the results: echo "$msg - Disk Queue Stats: Average Queue Length=$total Read Queue/Sec=$read_sec Write Queue/Sec=$written_sec | 'total'=$total; 'Read Queue/Sec'=$read_sec; 'Write Queue/Sec'=$written_sec;" fi ## WAIT TIME Check ## if [ "$waittime" == "1" ] then total=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $10}'` read_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $11}'` written_sec=`$iostat $disk -x -y -k -d 2 1 | grep $disk | awk '{print $12}'` # QUEUE # "Converting" values to float (string replace , with .) 
total=${total/,/.} read_sec=${read_sec/,/.} written_sec=${written_sec/,/.} # WAIT TIME # Comparing the result and setting the correct level: if [ "$warn_total" -ne "99999" ] then if ( [ "`echo "$total >= $warn_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $warn_read" | bc`" == "1" ] || [ "`echo "$written_sec >= $warn_written" | bc`" == "1" ] ) then msg="WARNING" status=1 fi fi if [ "$crit_total" -ne "99999" ] then if ( [ "`echo "$total >= $crit_total" | bc`" == "1" ] || [ "`echo "$read_sec >= $crit_read" | bc -q`" == "1" ] || [ "`echo "$written_sec >= $crit_written" | bc`" == "1" ] ) then msg="CRITICAL" status=2 fi fi if [ "$crit_total" == "99999" ] && [ "$warn_total" == "99999" ] then msg="OK" status=0 fi # WAIT TIME # Printing the results: echo "$msg - Wait Time Stats: Avg I/O Wait Time/ms=$total Avg Read Wait Time/ms=$read_sec Avg Write Wait Time/ms=$written_sec | 'Avg I/O Wait Time/ms'=$total; 'Avg Read Wait Time/ms'=$read_sec; 'Avg Write Wait Time/ms'=$written_sec;" fi exit $status
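The threshold checks in the script all follow one pattern: pipe a "value >= limit" expression to bc because the shell cannot compare decimals itself. A minimal sketch of that comparison as a reusable helper is shown below. It swaps bc for awk so that it runs where bc is not installed; the function name float_ge is hypothetical and not part of the plugin.

```shell
# Hypothetical helper (not in the original plugin): succeeds (exit status 0)
# when $1 >= $2, comparing both arguments as decimal numbers.
# Uses awk instead of bc; the "+ 0" forces a numeric comparison.
float_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) >= (b + 0)) }'
}

# Usage sketch:
#   if float_ge "$total" "$warn_total"; then msg="WARNING"; status=1; fi
```

A helper like this would let each condition read as a plain shell test instead of a nested backtick expression, at the cost of one awk process per comparison.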



Good script - see new version
by chlewis, April 30, 2014

Hi, I have posted an updated version of this script here: http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_iostat--2D-I-2FO-statistics--2D-updated-2014/details The script fixes the bugs mentioned in other posts, adds await (how long the system spends waiting to write to disk) to the output, and adds a pnp4nagios graphing template.



Update
by amateo, June 30, 2013

I have created a patched version between the original version and philippn's one. This patch:
* Runs iostat just once.
* Avoids the conversion between '.' and ',' by running iostat with LANG=C
* Gets actual values, not the ones from last reboot.
* Runs from bash
This is the patch:

Index: check_iostat
===================================================================
--- check_iostat (revisión: 11002)
+++ check_iostat (copia de trabajo)
@@ -1,9 +1,20 @@
-#!/bin/sh
+#!/bin/bash
 #
 # Version 0.0.2 - Jan/2009
 # Changes: added device verification
+#
+# by Thiago Varela - thiago@iplenix.com
 #
-# by Thiago Varela - thiago@iplenix.com
+# --------------------------------------
+#
+# Version 0.0.3 - Dec/2011
+# Changes:
+# - changed values from bytes to mbytes
+# - fixed bug to get traffic data without comma but point
+# - current values are displayed now, not average values (first run of iostat)
+#
+# by Philipp Niedziela - pn@pn-it.com
+#
 iostat=`which iostat 2>/dev/null`
 bc=`which bc 2>/dev/null`
@@ -50,14 +61,19 @@
 echo "ERROR: critical levels must be highter than warning levels" && help
+# iostat parameters:
+# -m: megabytes
+# -k: kilobytes
+# first run of iostat shows statistics since last reboot, second one shows current vaules of hdd
 # Doing the actual check:
-tps=`$iostat $disk | grep $disk | awk '{print $2}'`
-kbread=`$iostat $disk | grep $disk | awk '{print $3}'`
-kbwritten=`$iostat $disk | grep $disk | awk '{print $4}'`
+# We get just 2nd line, which is the actual value
+output=$(LANG=C $iostat $disk -d 1 2 | grep $disk | sed -n '2p')
+tps=$(echo "$output" | awk '{print $2}')
+kbread=$(echo "$output" | awk '{print $3}')
+kbwritten=$(echo "$output" | awk '{print $4}')
-
 # Comparing the result and setting the correct level:
-if ( [ "`echo "$tps >= $crit_tps" | bc`" == "1" ] || [ "`echo "$kbread >= $crit_read" | bc`" == "1" ] ||
+if ( [ "`echo "$tps >= $crit_tps" | bc`" == "1" ] || [ "`echo "$kbread >= $crit_read" | bc -q`" == "1" ] ||
 [ "`echo "$kbwritten >= $crit_written" | bc`" == "1" ] ); then
 msg="CRITICAL"
 status=2
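The core idea of the patch is to run iostat once with two reports and read only the second device line, since the first report covers the time since boot. The sketch below illustrates that selection step on captured text, so it can be tried without iostat; the function name second_report_field is hypothetical, and it matches the device name exactly in column 1 rather than using grep as the patch does.

```shell
# Hypothetical helper (not part of the patch): reads iostat-style text on
# stdin and prints field number $2 from the SECOND line whose first column
# equals the device name $1 (the second report holds the current values).
second_report_field() {
    awk -v dev="$1" -v col="$2" '
        $1 == dev { n++; if (n == 2) { print $col; exit } }
    '
}

# Usage sketch (assumes iostat from sysstat is installed):
#   output=$(LANG=C iostat sda -k -d 1 2)
#   tps=$(printf '%s\n' "$output" | second_report_field sda 2)
#   kbread=$(printf '%s\n' "$output" | second_report_field sda 3)
```

Matching on the first column avoids picking up partitions such as sda1 when the disk is sda, which a plain grep on the device name would also match.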



Selinux
by krzych, February 28, 2013

Has anybody gotten this working with NRPE and SELinux? What type of context should it have?



philippn's version is the only way to go
by GldRush98, July 31, 2012

philippn's changes made this script useful. Without those changes, the averages this check provides by default are fairly worthless.



some hints
by konstantin, May 31, 2012

Hi, I want to add two hints. First, the conversion from comma to point is not needed: just export LANG=C in the script, and the output of iostat will use a decimal point. Second, I would suggest using #!/bin/bash as the interpreter, because /bin/sh is linked to /bin/dash in newer distributions, and this script will not work under dash.
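As a rough illustration of the first hint, the wrapper below runs any command with the C locale so that numbers come out with a decimal point. The function name run_in_c_locale is hypothetical, and the sketch sets LC_ALL rather than LANG, on the assumption that LC_ALL is the safer choice because it overrides LANG and any LC_NUMERIC already set in the environment.

```shell
# Hypothetical wrapper (not in the original plugin): run a command with the
# C locale in a subshell, so the caller's locale settings are left untouched.
run_in_c_locale() {
    (
        LC_ALL=C
        export LC_ALL
        "$@"
    )
}

# Usage sketch (assumes iostat from sysstat is installed):
#   output=$(run_in_c_locale iostat sda -k -d 1 2)
```

With output produced this way, the string replacements of ',' with '.' in the script should no longer be needed.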



Problem running on ubuntu
by darfnader, April 30, 2012

On Ubuntu this has to be run as a bash script. You also need 'bc' installed on the system.
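A missing bc or iostat is easy to detect up front. The sketch below is one way to do it, using the shell's command -v lookup; the function name require_tools is hypothetical, and returning 3 follows the Nagios convention that exit status 3 means UNKNOWN.

```shell
# Hypothetical helper (not in the original plugin): verify that every named
# tool can be found in PATH; report the first missing one and return 3.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "UNKNOWN - required tool '$tool' not found in PATH"
            return 3
        fi
    done
    return 0
}

# Usage sketch at the top of a plugin:
#   require_tools iostat bc || exit 3
```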



Great Plugin
by mguthrie, February 29, 2012

Gave me exactly what I needed, thanks!



Update
by philippn, December 31, 2011

I've changed it a bit to get it working on my server (performance data in MB; showing current read/write, not average values since last restart): http://www.pn-it.com/wp-content/uploads/2011/12/check_iostat / http://www.pn-it.com/linux-ubuntu/nagios-festplatten-mit-check_iostat-uberwachen/



Plugin doesn't return KB/s read and write
by kforbus, June 30, 2010

Very nice plugin. The only change I made was adding "-k" to these lines:
kbread=`$iostat $disk -k | grep $disk | awk '{print $3}'`
kbwritten=`$iostat $disk -k | grep $disk | awk '{print $4}'`
This is because the plugin appears to return blocks read and written per second instead of kilobytes read and written per second. The "-k" option for iostat fixes this.
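To see why the unit matters, the arithmetic can be sketched as below. It assumes the 512-byte block size that the iostat man page describes for block-based output, so it is an approximation to check against your own sysstat version; the function name blocks_to_kb is hypothetical.

```shell
# Hypothetical helper (not in the original plugin): convert a blocks/s figure
# to kB/s, ASSUMING one block is 512 bytes (so 2 blocks = 1 kB).
blocks_to_kb() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}

# Usage sketch:
#   blocks_to_kb 100
```

Under that assumption, thresholds written in kB/s would be off by a factor of two when compared against block counts, which is why passing -k to iostat is the simpler fix.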



Bug with locale France
by apapillon, April 30, 2010

Change these lines:
# Doing the actual check:
tps=`$iostat $disk | grep $disk | awk '{print $2}' | sed -e 's/,/./g'`
kbread=`$iostat $disk | grep $disk | awk '{print $3}' | sed -e 's/,/./g'`
kbwritten=`$iostat $disk | grep $disk | awk '{print $4}' | sed -e 's/,/./g'`
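The sed expression in those lines can be tried on its own, without iostat. The sketch below wraps it in a small function to show what it does to a value printed with a decimal comma; the function name normalize_decimal is hypothetical.

```shell
# Hypothetical helper (not in the original plugin): replace every comma in
# the argument with a dot, using the same sed expression as the fix above.
normalize_decimal() {
    printf '%s\n' "$1" | sed -e 's/,/./g'
}

# Usage sketch:
#   tps=$(normalize_decimal "$tps")
```

Values that already use a decimal point pass through unchanged, so the conversion is safe to apply regardless of the locale iostat ran under.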



Project Stats
Rating
4.3 (22)
Favorites
1
Views
304,325