proxmox-ve/ceph/config-backup.sh

36 lines
1.7 KiB
Bash

#!/bin/sh
HOSTNAME=$(cat /etc/hostname)
tar cfvz proxmox-${HOSTNAME}-config.tgz \
/etc/hosts \
/etc/hostname \
/etc/resolv.conf \
/etc/ceph \
/etc/corosync \
/etc/ssh \
/etc/network \
/var/lib/ceph \
/var/lib/pve-cluster
#
# I managed to restore everything after a very stressful 24hrs!
#
# For those reading in the future, don't bother backing up /etc/pve. Don't listen to what anyone says.
# Useful to have no doubt - so you can cherry pick files you might need, but it's an ineffective disaster recovery strategy.
# It's simply the FUSE mount of the SQLlite database, and you can't even write to it properly while it's mounted - and can't access it at all when it isn't.
# Instead back up the db file /var/lib/pve-cluster/config.db and use that to restore the config.
#
# TLDR: To completely restore all nodes, I made the following backups and copied them to a fresh install:
# /etc/hosts, /etc/hostname, /etc/resolv.conf, /etc/ceph, /etc/corosync, /etc/ssh (particularly the host keys),
# /etc/network, /var/lib/ceph /var/lib/pve-cluster. I stopped PVE first to avoid conflics with "systemctl stop pve-cluster pvedaemon pveproxy pvestatd".
#
# After restoring these files and rebooting PVE, VM CT, storage etc. configs are restored. Ceph required some extra work in my case:
#
# Enable the no-subscription repository and use "pveceph install --repository no-subscription" to install ceph (or use the web ui)
# Manually start and enable the manager and monitor on each node using systemctl start/enable ceph-mgr@<hostname>/ceph-mon@<hostname>
# Check your OSDs are detected by running "ceph-volume lvm list"
# Rejoin the OSDs to the cluster using "ceph-volume lvm activate --all"
# Profit
#