Resilience

Autostart services if the system reboots

Use your init system to manage your services, so that they start again automatically when the machine boots.

Get basic metrics traceability and alerts

Set up Prometheus with:

  • The blackbox exporter, to track whether the services are available to your users and to monitor the health of your SSL certificates.
  • The node exporter, to keep track of the resource usage of your machines and to set alerts that notify you when concerning events happen (disks are filling up, CPU usage is too high).
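
To make the kind of conditions worth alerting on more concrete, here is a minimal Python sketch of two of them: a certificate that is close to expiring and a disk that is filling up. It is not how the blackbox or node exporters work internally and it does not replace Prometheus alert rules. The function names and the thresholds (14 days, 90% disk usage) are assumptions chosen for illustration.

```python
import shutil
import ssl
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before a certificate expires (negative if expired).

    `not_after` uses the "Jan  5 09:34:43 2018 GMT" format found in the
    dictionary returned by ssl.SSLSocket.getpeercert().
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def usage_ratio(used: int, total: int) -> float:
    """Return the used fraction of a disk, or 0.0 if the total is unknown."""
    return used / total if total > 0 else 0.0


def concerning_events(
    not_after: str,
    disk_used: int,
    disk_total: int,
    now: Optional[float] = None,
    min_days: float = 14.0,   # assumed threshold, tune to your renewal process
    max_disk: float = 0.9,    # assumed threshold, tune to your disks
) -> List[str]:
    """Return a human-readable message for each condition that crosses its threshold."""
    events = []
    days = days_until_expiry(not_after, now)
    if days < min_days:
        events.append(f"certificate expires in {days:.1f} days")
    ratio = usage_ratio(disk_used, disk_total)
    if ratio > max_disk:
        events.append(f"disk is {ratio:.0%} full")
    return events


if __name__ == "__main__":
    disk = shutil.disk_usage("/")
    # Hypothetical expiry date; a real check would read it from the served certificate.
    print(concerning_events("Jan  5 09:34:43 2018 GMT", disk.used, disk.total))
```

In a real setup these conditions would live in Prometheus alert rules evaluated against the exporters' metrics, so the sketch is only useful to reason about which thresholds make sense for your machines.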

Get basic logs traceability and alerts

Set up Loki and clean up the errors that show up in your system logs.

Improve the resilience of your data

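If you're still using ext4 for your filesystems instead of ZFS, you're missing out on a big improvement: ZFS checksums your data to detect silent corruption, supports snapshots, and can keep redundant copies across disks. To set it up: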

Automatically react to system failures

Future undeveloped improvements

  • Handle system reboots after kernel upgrades.