Watchdog 🐕🦺
Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. - HN
# Raspberry-pi
Expected availability from this design: ~99.9–99.99%
99.99% still means roughly 4.4 minutes of downtime per month.
The design avoids scheduled daily reboots entirely.
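The downtime figure above follows from simple arithmetic. A minimal sketch, assuming an average month of 365.25 / 12 days; the function name `downtime_minutes` is illustrative and not part of the original setup.

```shell
#!/bin/sh
# Minutes of allowed downtime per month for a given availability percentage.
# Assumption: an average month is 365.25 / 12 days (43,830 minutes).
downtime_minutes() {
  awk -v pct="$1" 'BEGIN { printf "%.1f\n", (100 - pct) / 100 * 365.25 / 12 * 24 * 60 }'
}

downtime_minutes 99.9    # three nines
downtime_minutes 99.99   # four nines
```

At 99.9% the monthly budget is about ten times larger than at 99.99%, which is why the target range matters when deciding how aggressive the recovery steps need to be.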
# Hardware Setup
- Raspberry Pi 4 or 5 (Ethernet preferred over Wi-Fi)
- SSD boot (USB SSD) instead of microSD
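To confirm which kind of device the root filesystem is on, the device name can be inspected. This is a rough sketch that relies on the usual Linux naming conventions (`mmcblk*` for SD cards, `sd*` for USB or SATA disks, `nvme*` for NVMe drives); the function name `classify_root_device` is hypothetical.

```shell
#!/bin/sh
# Classify a block device path by its name.
# Assumption: standard Linux device naming (mmcblk*, sd*, nvme*).
classify_root_device() {
  case "$1" in
    /dev/mmcblk*) echo "sd-card" ;;
    /dev/sd*)     echo "usb-or-sata" ;;
    /dev/nvme*)   echo "nvme" ;;
    *)            echo "unknown" ;;
  esac
}

# Usage on the Pi itself (findmnt prints the device backing /):
#   classify_root_device "$(findmnt -n -o SOURCE /)"
classify_root_device /dev/mmcblk0p2
classify_root_device /dev/sda2
```

A result of `sd-card` would mean the system is still booting from microSD rather than the USB SSD.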
# Watchdog
The Pi 5 changed a lot of low-level hardware compared with earlier Pi models, and watchdog support is handled differently than on older BCM2835/BCM2711 systems. - ChatGPT
Check devices (Pi5)
$ ls -l /dev/watchdog*
$ dmesg | grep -i watchdog
[ 0.673520] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
Enable hardware watchdog
$ sudo apt update
$ sudo apt install watchdog
# check it
$ systemctl status watchdog
If one of the conditions below is met, the server will reboot. Make sure the network interface name is correct.
# /etc/watchdog.conf
# If Linux freezes → reboot
watchdog-device=/dev/watchdog
# If system load goes insane → reboot
max-load-1=24
# If networking dies badly enough → reboot
# interface=eth0
# this may be handled by network recovery below to avoid reboot
# ping=8.8.8.8
retry-timeout=60
$ sudo systemctl enable watchdog
$ sudo systemctl start watchdog
# Make SSH self-healing
If SSH crashes:
- wait 5 seconds
- restart automatically
No reboot needed.
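The restart policy in this section has to live in a drop-in file for the SSH unit to take effect. A minimal sketch of one way to install it, assuming the unit is named `ssh.service` (as on Debian-based systems) and the conventional drop-in directory `/etc/systemd/system/ssh.service.d`; the function name `write_ssh_override` is hypothetical.

```shell
#!/bin/sh
# Write a systemd drop-in carrying the restart policy into the given directory.
# Assumption: the default directory matches a unit named ssh.service.
write_ssh_override() {
  dir="${1:-/etc/systemd/system/ssh.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/override.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=0

[Service]
Restart=always
RestartSec=5
EOF
}

# Usage (as root): write_ssh_override
```

Running `sudo systemctl edit ssh` is an interactive alternative that opens an editor on the same kind of drop-in file.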
[Unit]
StartLimitIntervalSec=0
[Service]
Restart=always
RestartSec=5
$ sudo systemctl daemon-reload
$ sudo systemctl restart ssh
# Add network recovery
A dedicated timer restarts networking when connectivity is lost, so the watchdog does not have to reboot the machine for a network problem.
test script
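#!/bin/bash
# /usr/local/bin/network-check.sh
# Restart networking if neither of two pings to 1.1.1.1 gets a reply.
if ! ping -c2 1.1.1.1 >/dev/null; then
    logger "Network unavailable, restarting network"
    # Assumes the ifupdown "networking" unit; NetworkManager-based systems would restart NetworkManager instead.
    systemctl restart networking
fi
service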
# /etc/systemd/system/network-check.service
[Unit]
Description=Network health check
After=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/network-check.sh
timer
# /etc/systemd/system/network-check.timer
[Unit]
Description=Run network check every 5 minutes
[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
[Install]
WantedBy=timers.target
Enable
# make script executable
$ sudo chmod +x /usr/local/bin/network-check.sh
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now network-check.timer
# check status
$ systemctl list-timers
$ journalctl -u network-check.service
# Reduce disk wear
With Storage=volatile, journald keeps logs in RAM (under /run/log/journal) instead of continuously writing them to storage. These logs are lost on reboot.
# /etc/systemd/journald.conf
[Journal]
Storage=volatile
# Caps the in-RAM journal under /run/log/journal
RuntimeMaxUse=50M
Restart
$ sudo systemctl restart systemd-journald
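To double-check which value is actually set in the file, a small helper can read it back. This is a sketch only: the function name `journald_setting` is hypothetical, it reads a single file, and it does not account for drop-ins under `journald.conf.d` that may override the main config.

```shell
#!/bin/sh
# Print the last uncommented value of a key in a journald-style config file.
# Assumption: entries are written as KEY=VALUE with no surrounding spaces.
journald_setting() {
  key="$1"
  file="$2"
  awk -F= -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }' "$file"
}

# Usage on the Pi:
#   journald_setting Storage /etc/systemd/journald.conf
#   journalctl --disk-usage
```

`journalctl --disk-usage` reports how much space the journal currently occupies, which gives a quick sanity check against the configured cap.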