Alerting rules for custom Prometheus metrics

Date: 2020-04-27 14:49:02

prometheus rules:

groups:
  - name: basic-and-important
    rules:
    - alert: NodeCPUUsage
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m])) * 100) > 80
      for: 10m
      labels:
        severity: critical
      annotations:
        description: "{{ $labels.instance }} CPU usage is above 80% (current value is {{ $value }})"
    - alert: NodeMEMUsage
      expr: ((1 - (node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"}))) * 100) > 80
      for: 10m
      labels:
        severity: critical
      annotations:
        description: "{{ $labels.instance }} memory usage is above 80% (current value is {{ $value }})"
    - alert: NodeDiskUsage
      expr: (1-(node_filesystem_free_bytes{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}))*100 > 80
      for: 10m
      labels:
        severity: critical
      annotations:
        description: "{{ $labels.instance }} disk usage is above 80% (current value is {{ $value }})"
    - alert: API response time per min
      expr: increase(http_server_requests_seconds_sum{uri!="/actuator/health"}[1m])/increase(http_server_requests_seconds_count{uri!="/actuator/health"}[1m])>2
      for: 1m 
      labels:
        severity: critical
      annotations:
        description: "{{ $labels.job }} {{ $labels.uri }} average response time exceeds 2s (current value is {{ $value }}s)"
    - alert: Count of API request times per min
      expr: increase(http_server_requests_seconds_count{uri!="/actuator/health",uri!="/actuator/prometheus",status!="200"}[1m])>1
      for: 1m
      labels:
        severity: critical
      annotations:
        description: "{{ $labels.job }} {{ $labels.uri }} returned {{ $value }} non-200 responses in the last minute"
  - name: rabbitmq-monitoring
    rules:
    - alert: rabbitmq_queue_messages
      expr: rabbitmq_queue_messages{queue!~".*_DL"} > 10
      for: 5m
      labels:
        severity: critical
      annotations:
        description: queue name:{{$labels.queue}} is blocked. current count is {{ $value }}
    - alert: rabbitmq_consumer_error_total
      expr: increase(rabbitmq_consumer_error_total[1m]) > 10
      for: 1m
      labels:
        severity: critical
      annotations:
        description: service name:{{$labels.job}} cannot consume the queues. current count is {{ $value }}
    - alert: rabbitmq_connection_recovery_total
      expr: increase(rabbitmq_connection_recovery_total[1m]) > 10
      for: 1m
      labels:
        severity: critical
      annotations:
        description: service name:{{$labels.job}} connection recovery total is {{ $value }}
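
For these rules to take effect, Prometheus has to load the rule file and know where to send firing alerts. The fragment below is a minimal sketch of the relevant part of prometheus.yml. The rule file path and the Alertmanager address are assumptions for illustration and do not come from the original article.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 1m          # how often alerting rules are evaluated

rule_files:
  # Hypothetical location of the rule file shown above; globs are allowed.
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        # Hypothetical Alertmanager address; adjust to your environment.
        - targets: ['alertmanager:9093']
```

Before reloading Prometheus, the rule file can be validated with `promtool check rules <file>`, which reports YAML and PromQL syntax errors.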

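A rule such as rabbitmq_queue_messages can be exercised offline with promtool's rule unit tests. The sketch below assumes the rules above are saved as a valid rule file named rules.yml next to the test file. That file name and the queue name "orders" are hypothetical. It feeds a constant queue depth of 20 and expects the alert to be firing once the 5m `for` period has elapsed.

```yaml
# rabbitmq_test.yml (hypothetical file name)
rule_files:
  - rules.yml                      # assumed name of the rule file above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Queue depth stays at 20 for samples at 0m..10m (above the threshold of 10).
      - series: 'rabbitmq_queue_messages{queue="orders"}'
        values: '20+0x10'
    alert_rule_test:
      # The condition holds from 0m, so with "for: 5m" the alert is firing at 6m.
      - eval_time: 6m
        alertname: rabbitmq_queue_messages
        exp_alerts:
          - exp_labels:
              severity: critical
              queue: orders
            exp_annotations:
              description: "queue name:orders is blocked. current count is 20"
```

Run it with `promtool test rules rabbitmq_test.yml`. The expected annotation must match the rendered description exactly, so it needs updating whenever the description template in the rule changes.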
Original article: https://www.cnblogs.com/malukang/p/12786507.html