# config file for check_backup_log
# author: alex harvey
# email: alexh19740110@gmail.com
# version 0.1
#
# It is assumed your backup script is run from a crontab,
# probably the root crontab. In order to configure this plugin
# you'll need to know:
#
# ** the time of day / days on which your backup is run (see your crontab)
# ** the location of gzip & gunzip if your backup logs are compressed (usually
#    /usr/bin/gzip and /usr/bin/gunzip)
# ** the full path to your backup shell script (see your crontab)
# ** the full path to your logfile; you can use $ISODATE if that's
#    in your logfile naming convention
# ** getting trickier: you'll need a list of regular expression fail
#    patterns that this script will scan the logs for. This is important.
#    If you get this list wrong, you won't get the alerts you need.
#    I recommend
#    a. READING the backup script and ensuring that this Nagios
#       plugin will actually match all possible patterns the shell script
#       ITSELF can put into your backup log.
#    b. STUDYING the output of fail scenarios that the actual backup
#       program called by the backup shell script can produce, and making
#       sure this script will match these. For example, if you are using
#       'ufsdump', be aware of the fail patterns this program can produce.
#       If you have a list of such patterns send me an email and I'll
#       add them to documentation for a future version of this.
# ** most tricky: you'll need to know how to write a Korn shell (ksh) function
#    to extract the error code from your backup log. See my note above.
#
# Note: this configuration file is expected to reside in
# /usr/local/nagios/etc/check_backup_log.cfg
#
# Hack the code of check_backup_log if you'd like it to reside elsewhere.
#
# Please email me any bugs or suggestions for improvements and I'll see
# what I can do.
#
# -- Alex Harvey
# 30th November 2006

# location of gzip
GZIP=

# location of gunzip
GUNZIP=

# location of backup script
BACKUPSCRIPT=/usr/local/bin/backup.sh

# location of backup logfile
LOGFILE=/var/adm/logs/backup.${ISODATE}.log

# fail patterns
FAILPATTERNS="ERROR:.Backup.of.+.failed \
error"

# function to extract error code
# Prints the second field of the matching ERROR line, with parentheses
# and colons stripped, or 'undef' if no such line is found. For example,
# a hypothetical log line 'ERROR (3): at least one backup had errors.'
# yields 3.
get_fail_code () {
  logfile=$1
  code=$( \
    egrep '^ERROR .* at least one backup had errors.' "$logfile" | \
    sed -e 's/[():]//g' | \
    awk '{print $2}' \
  )
  echo ${code:-undef}
}

# minimum number of minutes we'll tolerate backup running for before warning
LOWERTHRESHOLD=5

# maximum number of minutes we'll tolerate backup running for before warning
UPPERTHRESHOLD=

# warn about late as well as early finishes
# note: $UPPERTHRESHOLD is ignored if this is disabled.
WARNONLATE=0

# STARTMINS is the backup start time in minutes after midnight.
# You'll need to use a 'case' statement if the backup starts at different
# times on different days. Use -1 in a 'case' statement if the backup
# doesn't run on that day. You may want to enclose $LOWERTHRESHOLD and
# $UPPERTHRESHOLD in the case statement.
#
# The example below matches these crontab entries:
#0 5 * * 2-6 /usr/local/bin/backup.sh
#0 2 * * 0 /usr/local/bin/backup.sh
case $DAY in
  Sun )                 STARTMINS=$(( 2*60 )) ;;
  Mon )                 STARTMINS=-1 ;;
  Tue|Wed|Thu|Fri|Sat ) STARTMINS=$(( 5*60 )) ;;
esac

# end of file
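
# appendix: example of per-day thresholds (commented out)
#
# As noted above, $LOWERTHRESHOLD and $UPPERTHRESHOLD can be set inside
# the 'case' statement when different days need different limits. The
# sketch below reuses the schedule above. The threshold values are
# made-up placeholders, not recommendations, and WARNONLATE=1 is assumed
# to be the value that enables late-finish warnings. Substitute figures
# that match how long your own backups take, then uncomment this and use
# it in place of the 'case' statement above if it suits.
#
# WARNONLATE=1
# case $DAY in
#   Sun )
#     STARTMINS=$(( 2*60 ))
#     LOWERTHRESHOLD=30      # assumed: a longer run on Sundays
#     UPPERTHRESHOLD=240
#     ;;
#   Mon )
#     STARTMINS=-1           # no backup on Mondays
#     ;;
#   Tue|Wed|Thu|Fri|Sat )
#     STARTMINS=$(( 5*60 ))
#     LOWERTHRESHOLD=5
#     UPPERTHRESHOLD=60
#     ;;
# esac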