AlertManager

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integrations such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

Configuration

It is configured through the alertmanager.config key of the helm chart's values.yaml, or through the alertmanager.yaml file if you're using docker-compose.

As you can see in the configuration file, it has four main keys (templates is handled separately through alertmanager.config.templateFiles); a skeleton combining them is sketched after the list:

  • global: main SMTP and API configuration, inherited by the other elements.
  • route: Route tree definition.
  • receivers: Notification integrations configuration.
  • inhibit_rules: Alert inhibition configuration.
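
Putting those keys together, a minimal sketch built from the examples in this document (addresses and values are illustrative) could look like:

  config:
    global:
      smtp_smarthost: 'smtp.example.org:587'
      smtp_from: 'alertmanager@example.org'
    route:
      receiver: 'email'
      group_by: [job, alertname, severity]
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
    receivers:
      - name: 'null'
      - name: 'email'
        email_configs:
          - to: 'team@example.org'
    inhibit_rules:
      - target_match:
          alertname: KubeVersionMismatch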

Receivers

Notification receivers are the named configurations of one or more notification integrations.

Null receiver

Useful to discard alerts that shouldn't be inhibited:

receivers:
  - name: 'null'

Email notifications

To configure email notifications, set up the following in your config:

  config:
    global:
      smtp_from: {{ from_email_address }}
      smtp_smarthost: {{ smtp_server_endpoint }}:{{ smtp_server_port }}
      smtp_auth_username: {{ smtp_authentication_username }}
      smtp_auth_password: {{ smtp_authentication_password }}
    receivers:
    - name: 'email'
      email_configs:
        - to: {{ receiver_email }}
          send_resolved: true

If you need to set smtp_auth_username and smtp_auth_password, you should set their values through helm secrets.
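
A minimal sketch of that approach, assuming the helm-secrets plugin with a sops-encrypted values file (the file name, release name and values are illustrative):

  # secrets.yaml, encrypted with: helm secrets enc secrets.yaml
  alertmanager:
    config:
      global:
        smtp_auth_username: 'alerts@example.org'
        smtp_auth_password: 'a-very-secret-password'

Then pass it on install or upgrade, for example with helm secrets upgrade <release> <chart> -f values.yaml -f secrets.yaml.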

send_resolved, false by default, defines whether to notify about resolved alerts.

Rocketchat Notifications

Go to the pavel-kazhavets AlertmanagerRocketChat repo for the updated rules.

In RocketChat:

  • Login as admin user and go to: Administration => Integrations => New Integration => Incoming WebHook.
  • Set "Enabled" and "Script Enabled" to "True".
  • Set the channel, icons, etc. as you need.
  • Paste the contents of the official AlertmanagerIntegrations.js or my version into the Script field.
AlertmanagerIntegrations.js
class Script {
    process_incoming_request({
        request
    }) {
        console.log(request.content);

        // Pick the attachment color from the alert status
        var alertColor = "warning";
        if (request.content.status == "resolved") {
            alertColor = "good";
        } else if (request.content.status == "firing") {
            alertColor = "danger";
        }

        // Build one field per alert, plus extra fields from its annotations
        let finFields = [];
        for (let i = 0; i < request.content.alerts.length; i++) {
            var endVal = request.content.alerts[i];
            var elem = {
                title: "alertname: " + endVal.labels.alertname,
                value: "*instance:* " + endVal.labels.instance,
                short: false
            };

            finFields.push(elem);

            if (!!endVal.annotations.summary) {
                finFields.push({
                    title: "summary",
                    value: endVal.annotations.summary
                });
            }

            // severity is a label, so check labels rather than annotations
            if (!!endVal.labels.severity) {
                finFields.push({
                    title: "severity",
                    value: endVal.labels.severity
                });
            }

            if (!!endVal.annotations.grafana) {
                finFields.push({
                    title: "grafana",
                    value: endVal.annotations.grafana
                });
            }

            if (!!endVal.annotations.prometheus) {
                finFields.push({
                    title: "prometheus",
                    value: endVal.annotations.prometheus
                });
            }

            if (!!endVal.annotations.message) {
                finFields.push({
                    title: "message",
                    value: endVal.annotations.message
                });
            }

            if (!!endVal.annotations.description) {
                finFields.push({
                    title: "description",
                    value: endVal.annotations.description
                });
            }
        }

        return {
            content: {
                username: "Prometheus Alert",
                attachments: [{
                    color: alertColor,
                    title_link: request.content.externalURL,
                    title: "Prometheus notification",
                    fields: finFields
                }]
            }
        };

    }
}
  • Create Integration. The field Webhook URL will appear in the Integration configuration.

In Alertmanager:

  • Create a new receiver or modify the config of an existing one. You'll need to add a webhook_configs entry to it. Small example:
route:
    repeat_interval: 30m
    group_interval: 30m
    receiver: 'rocketchat'

receivers:
    - name: 'rocketchat'
      webhook_configs:
          - send_resolved: false
            url: '${WEBHOOK_URL}'
  • Reload/restart alertmanager.

To test the webhook you can use the following curl command (replace {{ webhook-url }}):

curl -X POST -H 'Content-Type: application/json' --data '
{
  "text": "Example message",
  "attachments": [
    {
      "title": "Rocket.Chat",
      "title_link": "https://rocket.chat",
      "text": "Rocket.Chat, the best open source chat",
      "image_url": "https://rocket.cha t/images/mockup.png",
      "color": "#764FA5"
    }
  ],
  "status": "firing",
  "alerts": [
    {
      "labels": {
        "alertname": "high_load",
        "severity": "major",
        "instance": "node-exporter:9100"
      },
      "annotations": {
        "message": "node-exporter:9100 of job xxxx is under high load.",
        "summary": "node-exporter:9100 under high load."
      }
    }
  ]
}
' {{ webhook-url }}

Route

A route block defines a node in a routing tree and its children. Its optional configuration parameters are inherited from its parent node if not set.

Every alert enters the routing tree at the configured top-level route, which must match all alerts (i.e. not have any configured matchers). It then traverses the child nodes. If continue is set to false, it stops after the first matching child. If continue is true on a matching node, the alert will continue matching against subsequent siblings. If an alert does not match any children of a node (no matching child nodes, or none exist), the alert is handled based on the configuration parameters of the current node.

A basic configuration would be:

route:
  group_by: [job, alertname, severity]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'email'
  routes:
    - match:
        alertname: Watchdog
      receiver: 'null'
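
To illustrate the continue behaviour described above, a variant of that tree (the severity and team labels are just illustrative) could be:

route:
  receiver: 'email'
  routes:
    - matchers:
        - severity = critical
      receiver: 'rocketchat'
      # keep evaluating the following siblings even after this node matches
      continue: true
    - matchers:
        - team = backups
      receiver: 'email'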

Inhibit rules

Inhibit rules define which alerts triggered by Prometheus shouldn't be forwarded to the notification integrations. For example, the Watchdog alert is meant to test that everything works as expected, but it's not meant to be acted on by users. Similarly, if you are using EKS, you'll probably have a KubeVersionMismatch alert, because Kubernetes allows a certain version skew between its components, so the alert is stricter than the Kubernetes policy.

To disable both alerts, set a match rule in config.inhibit_rules:

  config:
    inhibit_rules:
      - target_match:
          alertname: Watchdog
      - target_match:
          alertname: KubeVersionMismatch
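
Inhibit rules can also mute one alert only while another is firing, through source_match and equal. As a sketch (the severity values are illustrative), this mutes warnings while a critical alert fires for the same alertname and instance:

  config:
    inhibit_rules:
      - source_match:
          severity: critical
        target_match:
          severity: warning
        equal: ['alertname', 'instance']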

Inhibit rules between times

To prevent some alerts from being sent during certain hours, you can use the time_intervals Alertmanager configuration.

This can be useful for example if your backup system triggers some alerts that you don't need to act on.

# See route configuration at https://prometheus.io/docs/alerting/latest/configuration/#route
route:
  receiver: 'email'
  group_by: [job, alertname, severity]
  group_wait: 5m
  group_interval: 5m
  repeat_interval: 12h
  routes:
    - receiver: 'email'
      matchers:
        - alertname =~ "HostCpuHighIowait|HostContextSwitching|HostUnusualDiskWriteRate"
        - hostname = backup_server
      mute_time_intervals:
        - night
time_intervals:
  - name: night
    time_intervals:
      - times:
          - start_time: '02:00'
            end_time: '07:00'

If that doesn't work for you, you can use the sleep peacefully guidelines to tackle it at the query level.

Alert rules

Alert rules are a special kind of Prometheus rules that trigger alerts based on PromQL expressions. People have gathered several examples under Awesome prometheus alert rules.

Alerts must be configured in the Prometheus configuration, either through the operator helm chart, under the additionalPrometheusRulesMap key (a map keyed by an arbitrary name), or in the prometheus.yml file. For example:

additionalPrometheusRulesMap:
  custom-rules:  # arbitrary key name
    groups:
      - name: alert-rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 5m
            labels:
              severity: error
            annotations:
              summary: "Blackbox probe failed (instance {{ $labels.target }})"
              description: "Probe failed\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

If you are using prometheus.yml directly, you also need to configure the alerting section:

alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets: [ 'alertmanager:9093' ]
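
When using prometheus.yml, the alert rules themselves live in separate files loaded through rule_files; a minimal sketch (the file name is illustrative):

rule_files:
  - 'rules/alert-rules.yml'

# rules/alert-rules.yml
groups:
  - name: alert-rules
    rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 5m
        labels:
          severity: error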

Silences

To silence an alert with a regular expression, use the matcher alertname=~".*Condition".

References