/usr/lib/netdata/conf.d/health.d
# unmatched lines # the following alarms trigger only when there are enough data. # we assume there are enough data when: # # $1m_total_requests > 120 # # i.e. when there are at least 120 requests during the last minute template: web_log_1m_total_requests on: web_log.requests class: Workload type: Web Server component: Web log lookup: sum -1m unaligned calc: ($this == 0)?(1):($this) units: requests every: 10s info: number of HTTP requests in the last minute template: web_log_1m_unmatched on: web_log.excluded_requests class: Errors type: Web Server component: Web log lookup: sum -1m unaligned of unmatched calc: $this * 100 / $web_log_1m_total_requests units: % every: 10s warn: ($web_log_1m_total_requests > 120) ? ($this > 1) : ( 0 ) delay: up 1m down 5m multiplier 1.5 max 1h summary: Web log unparsed info: Percentage of unparsed log lines over the last minute to: webmaster # ----------------------------------------------------------------------------- # high level response code alarms # the following alarms trigger only when there are enough data. # we assume there are enough data when: # # $1m_requests > 120 # # i.e. when there are at least 120 requests during the last minute template: web_log_1m_requests on: web_log.type_requests class: Workload type: Web Server component: Web log lookup: sum -1m unaligned calc: ($this == 0)?(1):($this) units: requests every: 10s info: number of HTTP requests in the last minute template: web_log_1m_successful on: web_log.type_requests class: Workload type: Web Server component: Web log lookup: sum -1m unaligned of success calc: $this * 100 / $web_log_1m_requests units: % every: 10s warn: ($web_log_1m_requests > 120) ? ($this < (($status >= $WARNING ) ? ( 95 ) : ( 85 )) ) : ( 0 ) crit: ($web_log_1m_requests > 120) ? ($this < (($status == $CRITICAL) ? ( 85 ) : ( 75 )) ) : ( 0 ) delay: up 2m down 15m multiplier 1.5 max 1h summary: Web log successful info: Ratio of successful HTTP requests over the last minute (1xx, 2xx, 304, 401, 429) to: webmaster template: web_log_1m_redirects on: web_log.type_requests class: Workload type: Web Server component: Web log lookup: sum -1m unaligned of redirect calc: $this * 100 / $web_log_1m_requests units: % every: 10s warn: ($web_log_1m_requests > 120) ? ($this > (($status >= $WARNING ) ? ( 1 ) : ( 20 )) ) : ( 0 ) delay: up 2m down 15m multiplier 1.5 max 1h summary: Web log redirects info: Ratio of redirection HTTP requests over the last minute (3xx except 304) to: webmaster template: web_log_1m_bad_requests on: web_log.type_requests class: Errors type: Web Server component: Web log lookup: sum -1m unaligned of bad calc: $this * 100 / $web_log_1m_requests units: % every: 10s warn: ($web_log_1m_requests > 120) ? ($this > (($status >= $WARNING) ? ( 10 ) : ( 30 )) ) : ( 0 ) delay: up 2m down 15m multiplier 1.5 max 1h summary: Web log bad requests info: Ratio of client error HTTP requests over the last minute (4xx except 401 and 429) to: webmaster template: web_log_1m_internal_errors on: web_log.type_requests class: Errors type: Web Server component: Web log lookup: sum -1m unaligned of error calc: $this * 100 / $web_log_1m_requests units: % every: 10s warn: ($web_log_1m_requests > 120) ? ($this > (($status >= $WARNING) ? ( 1 ) : ( 2 )) ) : ( 0 ) crit: ($web_log_1m_requests > 120) ? ($this > (($status == $CRITICAL) ? ( 2 ) : ( 5 )) ) : ( 0 ) delay: up 2m down 15m multiplier 1.5 max 1h summary: Web log server errors info: Ratio of server error HTTP requests over the last minute (5xx) to: webmaster # ----------------------------------------------------------------------------- # web slow # the following alarms trigger only when there are enough data. # we assume there are enough data when: # # $1m_requests > 120 # # i.e. when there are at least 120 requests during the last minute template: web_log_10m_response_time on: web_log.request_processing_time class: Latency type: System component: Web log lookup: average -10m unaligned of avg units: ms every: 30s info: average HTTP response time over the last 10 minutes template: web_log_web_slow on: web_log.request_processing_time class: Latency type: Web Server component: Web log lookup: average -1m unaligned of avg units: ms every: 10s green: 500 red: 1000 warn: ($web_log_1m_requests > 120) ? ($this > $green && $this > ($web_log_10m_response_time * 2) ) : ( 0 ) crit: ($web_log_1m_requests > 120) ? ($this > $red && $this > ($web_log_10m_response_time * 4) ) : ( 0 ) delay: down 15m multiplier 1.5 max 1h summary: Web log processing time info: Average HTTP response time over the last 1 minute options: no-clear-notification to: webmaster # ----------------------------------------------------------------------------- # web too many or too few requests # the following alarms trigger only when there are enough data. # we assume there are enough data when: # # $5m_successful_old > 120 # # i.e. when there were at least 120 requests during the 5 minutes starting # at -10m and ending at -5m template: web_log_5m_successful_old on: web_log.type_requests class: Workload type: Web Server component: Web log lookup: average -5m at -5m unaligned of success units: requests/s every: 30s info: average number of successful HTTP requests for the 5 minutes starting 10 minutes ago template: web_log_5m_successful on: web_log.type_requests class: Workload type: Web Server component: Web log lookup: average -5m unaligned of success units: requests/s every: 30s info: average number of successful HTTP requests over the last 5 minutes template: web_log_5m_requests_ratio on: web_log.type_requests class: Workload type: Web Server component: Web log calc: ($web_log_5m_successful_old > 0)?($web_log_5m_successful * 100 / $web_log_5m_successful_old):(100) units: % every: 30s warn: ($web_log_5m_successful_old > 120) ? ($this > 200 OR $this < 50) : (0) crit: ($web_log_5m_successful_old > 120) ? ($this > 400 OR $this < 25) : (0) delay: down 15m multiplier 1.5 max 1h options: no-clear-notification summary: Web log 5 minutes requests ratio info: Ratio of successful HTTP requests over over the last 5 minutes, \ compared with the previous 5 minutes \ (clear notification for this alarm will not be sent) to: webmaster
.
Edit
..
Edit
adaptec_raid.conf
Edit
apcupsd.conf
Edit
as400.conf
Edit
bcache.conf
Edit
beanstalkd.conf
Edit
boinc.conf
Edit
btrfs.conf
Edit
ceph.conf
Edit
cgroups.conf
Edit
clickhouse.conf
Edit
cockroachdb.conf
Edit
consul.conf
Edit
cpu.conf
Edit
db2.conf
Edit
dbengine.conf
Edit
disks.conf
Edit
dns_query.conf
Edit
dnsmasq_dhcp.conf
Edit
docker.conf
Edit
elasticsearch.conf
Edit
entropy.conf
Edit
exporting.conf
Edit
file_descriptors.conf
Edit
gearman.conf
Edit
geth.conf
Edit
go.d.plugin.conf
Edit
haproxy.conf
Edit
hdfs.conf
Edit
httpcheck.conf
Edit
ioping.conf
Edit
ipc.conf
Edit
ipfs.conf
Edit
ipmi.conf
Edit
isc_dhcpd.conf
Edit
k8sstate.conf
Edit
kubelet.conf
Edit
load.conf
Edit
lvm.conf
Edit
mdstat.conf
Edit
megacli.conf
Edit
memcached.conf
Edit
memory.conf
Edit
ml.conf
Edit
mq.conf
Edit
mysql.conf
Edit
net.conf
Edit
netfilter.conf
Edit
nvme.conf
Edit
pihole.conf
Edit
ping.conf
Edit
plugin.conf
Edit
portcheck.conf
Edit
postgres.conf
Edit
power_supply_capacity.conf
Edit
processes.conf
Edit
proxysql.conf
Edit
python.d.plugin.conf
Edit
qos.conf
Edit
rabbitmq.conf
Edit
ram.conf
Edit
reboot.conf
Edit
redis.conf
Edit
retroshare.conf
Edit
riakkv.conf
Edit
scaleio.conf
Edit
softnet.conf
Edit
storcli.conf
Edit
streaming.conf
Edit
swap.conf
Edit
synchronization.conf
Edit
systemdunits.conf
Edit
tcp_conn.conf
Edit
tcp_listen.conf
Edit
tcp_mem.conf
Edit
tcp_orphans.conf
Edit
tcp_resets.conf
Edit
timex.conf
Edit
udp_errors.conf
Edit
unbound.conf
Edit
upsd.conf
Edit
vcsa.conf
Edit
vernemq.conf
Edit
vsphere.conf
Edit
web_log.conf
Edit
websphere_jmx.conf
Edit
websphere_mp.conf
Edit
websphere_pmi.conf
Edit
whoisquery.conf
Edit
x509check.conf
Edit
zfs.conf
Edit