Resilience
Autostart services if the system reboots
Use your init system's service units to manage your services, so that they start again automatically after a reboot.
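As a minimal sketch of what such a service definition can look like, assuming the init system is systemd (the text does not name one), the helper below renders a unit file. The function name `render_unit`, the description, and the binary path are hypothetical placeholders.

```python
def render_unit(description: str, exec_start: str) -> str:
    """Return the text of a minimal systemd service unit.

    Assumption: the host uses systemd. Other init systems need a
    different service definition.
    """
    return "\n".join(
        [
            "[Unit]",
            f"Description={description}",
            # Wait for the network before starting the service.
            "After=network-online.target",
            "Wants=network-online.target",
            "",
            "[Service]",
            f"ExecStart={exec_start}",
            # Restart the process if it exits with an error.
            "Restart=on-failure",
            "",
            "[Install]",
            # Enabling the unit hooks it into the normal multi-user boot.
            "WantedBy=multi-user.target",
            "",
        ]
    )


if __name__ == "__main__":
    # Hypothetical service name and binary path, for illustration only.
    print(render_unit("Example web app", "/usr/local/bin/myapp --port 8080"))
```

Under the same assumption, the output would be saved as something like `/etc/systemd/system/myapp.service` and activated with `systemctl enable --now myapp.service`; the `[Install]` section is what makes the service come back after a reboot.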
Get basic metrics traceability and alerts
Set up Prometheus with:
- The blackbox exporter to track whether the services are available to your users and to monitor the health of your SSL certificates.
- The node exporter to keep track of the resource usage of your machines and to set alerts that notify you when concerning events happen (disks are filling up, CPU usage is too high).
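To make the kind of signals these alerts rely on more concrete, here is a rough Python sketch of two of them: days left before a certificate expires and the fraction of a disk that is used. This is not how the exporters are implemented, and in practice the thresholds would live in Prometheus alert rules. The function names and the thresholds (14 days, 90% disk usage) are assumptions chosen for illustration.

```python
import shutil
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def days_until_cert_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate expires.

    `not_after` uses the format of the "notAfter" field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan 31 00:00:00 2024 GMT".
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now.timestamp()) / SECONDS_PER_DAY


def disk_used_ratio(path: str = "/") -> float:
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_alert(
    cert_days_left: float,
    disk_ratio: float,
    min_cert_days: float = 14,  # hypothetical threshold
    max_disk_ratio: float = 0.9,  # hypothetical threshold
) -> bool:
    """True when the certificate is close to expiry or the disk is nearly full."""
    return cert_days_left < min_cert_days or disk_ratio > max_disk_ratio


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    days_left = days_until_cert_expiry("Jan 31 00:00:00 2030 GMT", now)
    print(should_alert(days_left, disk_used_ratio("/")))
```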
Get basic logs traceability and alerts
Set up Loki and clean up the errors in your system logs.
Improve the resilience of your data
If you're still using ext4 for your filesystems instead of ZFS, you're missing a big improvement. To set it up:
- Plan your ZFS storage architecture
- Install ZFS
- Create ZFS local and remote backups
- Monitor your ZFS
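For the monitoring step, one possible starting point is a small health check built on `zpool list -H -o name,health`, which prints one tab-separated line per pool. The sketch below assumes the `zpool` command is installed and on the `PATH`; the function names and the pool names in the usage example are hypothetical.

```python
import subprocess


def parse_pool_health(output: str) -> dict[str, str]:
    """Parse the tab-separated output of `zpool list -H -o name,health`."""
    pools = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, health = line.split("\t")
        pools[name] = health
    return pools


def unhealthy_pools(pools: dict[str, str]) -> list[str]:
    """Names of the pools whose health is anything other than ONLINE."""
    return sorted(name for name, health in pools.items() if health != "ONLINE")


def read_pool_health() -> dict[str, str]:
    """Run zpool and return the health of every imported pool.

    Assumption: the `zpool` binary is available to the current user.
    """
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pool_health(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output, so the sketch runs without ZFS installed.
    sample = "tank\tONLINE\nbackup\tDEGRADED\n"
    print(unhealthy_pools(parse_pool_health(sample)))
```

Keeping the parsing separate from the `subprocess` call means the logic can be exercised on machines without ZFS, and the result can feed whatever alerting channel is already in place.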
Automatically react to system failures
Future undeveloped improvements
- Handle the system reboots after kernel upgrades
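As a possible first step for this item, and assuming a Debian or Ubuntu host (where package upgrades that need a restart create the file `/var/run/reboot-required`), a check like the following could tell a scheduler or an alert whether a reboot is pending. Other distributions signal this differently, and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: Debian/Ubuntu convention. The file is created when an
# upgraded package (for example a new kernel) requires a reboot.
DEFAULT_FLAG = Path("/var/run/reboot-required")


def reboot_required(flag: Path = DEFAULT_FLAG) -> bool:
    """True when the reboot flag file exists."""
    return flag.exists()


if __name__ == "__main__":
    if reboot_required():
        print("A reboot is pending")
    else:
        print("No reboot needed")
```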