
Prometheus Troubleshooting

Solutions for problems with Prometheus.

Service monitor not being recognized

The service monitor labels are probably misconfigured. Each Prometheus instance only picks up the service monitors that match its own selector. To see how you need to label your resources, describe the Prometheus instance and look for Service Monitor Selector.

kubectl get prometheus -n monitoring
kubectl describe prometheus prometheus-operator-prometheus -n monitoring

The describe command will return something like:

  Service Monitor Selector:
    Match Labels:
      Release:  prometheus-operator

This means you need to label your service monitors with release: prometheus-operator. Be careful: label keys are case-sensitive and kubectl describe capitalizes the keys it prints, so Release: prometheus-operator won't work.
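As a minimal sketch of the check and the fix, the two helpers below read the selector in its raw (lowercase) form and apply the label to a service monitor. The function names and the my-app service monitor are hypothetical, and the sketch assumes the selector only uses matchLabels.

```shell
# Print the raw matchLabels the Prometheus instance uses to select
# service monitors. Unlike `kubectl describe`, jsonpath output keeps
# the original key casing.
# Usage: show_selector <prometheus-name> <namespace>
show_selector() {
  kubectl get prometheus "$1" -n "$2" \
    -o jsonpath='{.spec.serviceMonitorSelector.matchLabels}'
}

# Add (or overwrite) the lowercase `release` label on a service monitor.
# Usage: label_service_monitor <servicemonitor-name> <namespace> <release>
label_service_monitor() {
  kubectl label servicemonitor "$1" -n "$2" "release=$3" --overwrite
}

# Example calls (my-app is a hypothetical service monitor):
#   show_selector prometheus-operator-prometheus monitoring
#   label_service_monitor my-app monitoring prometheus-operator
```

If the service monitor is managed by a chart or manifest, set the label there as well, otherwise the next deployment may remove it.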

Failed calling webhook

  Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "": Post https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=30s: no endpoints available for service "prometheus-operator-operator"

Since version 0.30 of the operator, there is an admission webhook to prevent malformed rules from being added to the cluster. Without this validation, creating an invalid resource will cause Prometheus not to load it. If the container then restarts, it will go into a crashloop.

For the webhook to work, the control plane needs to be able to reach the webhook service, which requires adding a firewall rule in EKS and GKE deployments. Users have reported success on GKE, while those who couldn't get it working on EKS have opted to disable the webhook.
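Before changing firewall rules, it may help to confirm that the service named in the error has any endpoints at all. The helper below is a sketch with a hypothetical function name. It only tells you whether a ready pod backs the service, not whether the control plane can reach it.

```shell
# Succeed if the given service has at least one ready endpoint address.
# Usage: webhook_has_endpoints <service-name> <namespace>
webhook_has_endpoints() {
  ips=$(kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  # An empty result means no ready pod is backing the service.
  [ -n "$ips" ]
}

# Example call, using the service name from the error message:
#   webhook_has_endpoints prometheus-operator-operator monitoring \
#     && echo "service has endpoints" \
#     || echo "no endpoints: check the operator pod"
```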

To disable it, the following options have to be set:

  • prometheusOperator.admissionWebhooks.enabled=false
  • prometheusOperator.admissionWebhooks.patch.enabled=false
  • prometheusOperator.tlsProxy.enabled=false
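If you deploy with plain helm rather than helmfile, the three options above could be passed as --set flags. This is a sketch: the function name is hypothetical, and the release name, chart reference and namespace are arguments you have to fill in for your own setup.

```shell
# Upgrade a release with the admission webhook, its patch job and the
# TLS proxy disabled, keeping all other previously set values.
# Usage: disable_admission_webhook <release> <chart> <namespace>
disable_admission_webhook() {
  helm upgrade "$1" "$2" \
    --namespace "$3" \
    --reuse-values \
    --set prometheusOperator.admissionWebhooks.enabled=false \
    --set prometheusOperator.admissionWebhooks.patch.enabled=false \
    --set prometheusOperator.tlsProxy.enabled=false
}

# Example call (the chart reference is an assumption):
#   disable_admission_webhook prometheus-operator stable/prometheus-operator monitoring
```

With helmfile, the same keys would go in the values of the release instead.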

If you have deployed your release with the webhook enabled, you also need to remove all the resources that match the following:

kubectl get
kubectl get MutatingWebhookConfiguration

Once they are removed, run helmfile apply again.
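The cleanup of the MutatingWebhookConfiguration resources could be sketched as follows. The function names are hypothetical, and matching the resources by a name pattern (for example the release name) is an assumption: list them first and check the names before deleting anything.

```shell
# List the MutatingWebhookConfiguration resources whose name matches
# the given pattern, in the type/name form that `kubectl delete` accepts.
# Usage: list_release_webhooks <pattern>
list_release_webhooks() {
  kubectl get mutatingwebhookconfiguration -o name | grep -- "$1"
}

# Delete every MutatingWebhookConfiguration matched by the pattern.
# Usage: delete_release_webhooks <pattern>
delete_release_webhooks() {
  list_release_webhooks "$1" | while read -r resource; do
    kubectl delete "$resource"
  done
}

# Example calls:
#   list_release_webhooks prometheus-operator
#   delete_release_webhooks prometheus-operator
```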