Watchdog 🐕‍🦺

Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted. - HN

# Raspberry-pi ⮺

Expected availability from this design: ~99.9–99.99%

99.99% still means roughly 4.4 minutes of downtime per month.
The design avoids scheduled daily reboots entirely.
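
The downtime figure can be checked with a little arithmetic. A minimal sketch, assuming an average month of 30.44 days; the function name downtime_minutes is made up for illustration.

```shell
# Minutes of downtime per month allowed by an availability percentage.
# Assumes an average month of 30.44 days (365.25 / 12).
downtime_minutes() {
    awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 30.44 * 24 * 60 }'
}

downtime_minutes 99.9    # prints 43.8
downtime_minutes 99.99   # prints 4.4
```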

# Hardware Setup

  • Raspberry Pi 4 or 5 (Ethernet preferred over Wi-Fi)
  • SSD boot (USB SSD) instead of microSD

# Watchdog

The Pi 5 changed a lot of low-level hardware compared with earlier Pi models, and watchdog support is handled differently than on older BCM2835/BCM2711 systems. - ChatGPT

Check devices (Pi 5)

$ ls -l /dev/watchdog*
$ dmesg | grep -i watchdog
[    0.673520] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
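
The same information is usually available through sysfs. A minimal sketch, assuming the kernel exposes watchdog attributes (identity, timeout) under /sys/class/watchdog; list_watchdogs is an illustrative helper name, and its optional argument exists only so the function can be pointed at another directory.

```shell
# Print each watchdog device with its driver identity and timeout (seconds).
# $1: sysfs directory to scan (defaults to /sys/class/watchdog).
list_watchdogs() {
    dir="${1:-/sys/class/watchdog}"
    for wd in "$dir"/watchdog*; do
        [ -d "$wd" ] || continue    # no match: the glob stays literal
        identity="unknown"
        timeout="unknown"
        [ -r "$wd/identity" ] && identity=$(cat "$wd/identity")
        [ -r "$wd/timeout" ] && timeout=$(cat "$wd/timeout")
        echo "$(basename "$wd"): $identity (timeout ${timeout}s)"
    done
}

list_watchdogs
```

The reported timeout is the value currently set in the driver, which is useful to know before tuning the daemon.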

Enable hardware watchdog

$ sudo apt update
$ sudo apt install watchdog

# check it
$ systemctl status watchdog

If any of the conditions below is met, the server reboots. If you enable the interface check, make sure it names the correct network interface.

# /etc/watchdog.conf
# If Linux freezes → reboot
watchdog-device=/dev/watchdog
# If system load goes insane → reboot
max-load-1=24
# If networking dies badly enough → reboot
# interface=eth0
# This may be handled by network recovery below to avoid reboot
# ping=8.8.8.8
retry-timeout=60

$ sudo systemctl enable watchdog
$ sudo systemctl start watchdog
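
Because a wrong interface name is itself a reboot trigger, it may be worth comparing the configured names with the interfaces the kernel actually reports. A rough sketch; check_watchdog_interfaces is a hypothetical helper, and it only looks at uncommented interface lines.

```shell
# Verify that every uncommented "interface=" entry in a watchdog config
# names an existing network interface.
# $1: config file (default /etc/watchdog.conf)
# $2: directory listing interfaces (default /sys/class/net)
check_watchdog_interfaces() {
    conf="${1:-/etc/watchdog.conf}"
    netdir="${2:-/sys/class/net}"
    status=0
    # Drop comments, then keep only the value after "interface =".
    for iface in $(sed -n 's/#.*//; s/^[[:space:]]*interface[[:space:]]*=[[:space:]]*//p' "$conf"); do
        if [ -e "$netdir/$iface" ]; then
            echo "ok: $iface"
        else
            echo "missing: $iface"
            status=1
        fi
    done
    return "$status"
}
```

Called without arguments, it checks /etc/watchdog.conf against /sys/class/net and returns non-zero if any configured interface is missing.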

# Make SSH self-healing

If SSH crashes:

  • wait 5 seconds
  • restart automatically

No reboot needed.

$ sudo systemctl edit ssh

[Unit]
StartLimitIntervalSec=0

[Service]
Restart=always
RestartSec=5

$ sudo systemctl daemon-reload
$ sudo systemctl restart ssh
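
To confirm the override took effect, the unit's restart properties can be queried, and a crash can be simulated by killing the daemon's main process. A sketch, assuming the unit is named ssh as above; show_restart_policy and simulate_crash are illustrative names. It is safer to try this with console access available.

```shell
# Print the restart-related properties systemd has loaded for a unit.
show_restart_policy() {
    systemctl show "${1:-ssh}" -p Restart -p RestartUSec
}

# Kill the unit's main process so that systemd has to restart it.
simulate_crash() {
    unit="${1:-ssh}"
    pid=$(systemctl show -p MainPID --value "$unit")
    # MainPID is 0 when the unit is not running.
    [ "${pid:-0}" -gt 0 ] || { echo "$unit is not running" >&2; return 1; }
    sudo kill -9 "$pid"
}
```

After simulate_crash, systemctl status ssh should show the service active again once the 5 second delay has passed.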

# Add network recovery

A dedicated timer restarts networking when connectivity is lost, rather than letting the watchdog reboot the machine.

Test script

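# /usr/local/bin/network-check.sh
#!/bin/bash

# Two pings, 2-second timeout each; restart only if neither gets a reply.
if ! ping -c2 -W2 1.1.1.1 >/dev/null 2>&1; then
    logger "Network unavailable, restarting network"
    # On systems managed by NetworkManager, restart NetworkManager instead.
    systemctl restart networking
fi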
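
A single unreachable host does not necessarily mean the local network is down. One possible refinement, shown here as a sketch rather than a drop-in replacement, is to probe several targets and restart only when all of them fail. The host list and the PING_CMD override are assumptions made for illustration.

```shell
#!/bin/bash
# Succeeds if at least one host answers a ping.
# PING_CMD can be overridden (for example in tests); it defaults to ping.
any_host_reachable() {
    for host in "$@"; do
        if ${PING_CMD:-ping} -c 2 -W 2 "$host" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Example use inside the check script:
#   if ! any_host_reachable 1.1.1.1 8.8.8.8; then
#       logger "Network unavailable, restarting network"
#       systemctl restart networking
#   fi
```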

Service

# /etc/systemd/system/network-check.service
[Unit]
Description=Network health check
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/network-check.sh

Timer

# /etc/systemd/system/network-check.timer
[Unit]
Description=Run network check every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target

Enable

# make script executable
$ sudo chmod +x /usr/local/bin/network-check.sh

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now network-check.timer

# check status
$ systemctl list-timers
$ journalctl -u network-check.service

# Reduce disk wear

With volatile storage, journald keeps its logs in RAM (under /run/log/journal) instead of continuously writing them to storage. They are lost on reboot.

# /etc/systemd/journald.conf
[Journal]
Storage=volatile
RuntimeMaxUse=50M

Restart

$ sudo systemctl restart systemd-journald
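
To check that the setting is active after the restart, the merged configuration and the current journal size can be inspected. A small sketch; journald_setting is a hypothetical helper that reads configuration text on standard input and prints the last uncommented value assigned to a key, which is the one that takes effect when drop-ins override the main file.

```shell
# Print the last uncommented value assigned to a key in systemd-style
# configuration read from standard input.
journald_setting() {
    awk -F= -v key="$1" '
        { sub(/^[ \t]+/, "", $1) }
        $1 == key { value = $2 }
        END { if (value != "") print value }
    '
}

# Usage on a live system:
#   systemd-analyze cat-config systemd/journald.conf | journald_setting Storage
#   journalctl --disk-usage
```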

# PC-hardware ⮺

Written on July 2, 2026, Last update on
linux-system watchdog pc-hardware raspberry-pi