# CEPH Cluster

* CEPH loves jumbo frames! On a 10 Gigabit network, set the MTU for the CEPH network to 9000.

# Installation:

* Proxmox installation
* CEPH network: vmbr1, 10.24.18.11/30
* Install CEPH and select the CEPH network
* Decompile the CRUSH map and change "host" to "osd"

```
ceph osd getcrushmap -o crush.orig
crushtool -d crush.orig -o crush.txt
nano crush.txt
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
```

* Create the OSDs
* Split the pools by hdd or ssd

```
ceph osd crush rule create-replicated replicated_hdd default osd hdd
ceph osd crush rule create-replicated replicated_ssd default osd ssd
```

* Create RAID5-like pools

```
pveceph pool create ceph --erasure-coding k=2,m=1,failure-domain=osd,device-class=hdd
pveceph pool create ssd --erasure-coding k=2,m=1,failure-domain=osd,device-class=ssd
```

* Set the pools to "autoscale mode=on"

## CEPH Dashboard:

https://xxx.xxx.xxx.xxx:8443/

## Ceph erasure-coding (RAID5):

pveceph pool create ceph --erasure-coding k=2,m=1,failure-domain=osd,device-class=hdd

Creates a RAID5-like pool from at least three OSDs/disks, sets the failure domain to "OSD" and the device class to "HDD".

## Clear the error log:

ceph crash archive-all

Clears the permanent error message in Proxmox after a process has crashed.

## Revive a Ceph OSD after a pulled disk:

* Which OSD is down?
  * ceph osd tree
* Show the LVM block path
  * ceph-volume lvm list
* Deactivate the OSD:
  * lvm lvchange -a n /dev/ceph-xyz....
* (Re)activate the OSD:
  * lvm lvchange -a y /dev/ceph-xyz....
* Restart the LVM volume
  * ceph-volume lvm activate (see the output of: ceph-volume lvm list)
* Check the OSDs:
  * ceph osd tree
* If the OSD is still down or out: in the web frontend, Ceph ---> OSD

# Installation

## Proxmox:

* vmbr0: external network
* vmbr1: internal CEPH network
* Patch the Intel network card with ethtool:

```
ethtool -K eno1 gso off gro off tso off tx off rx off

auto eno1
iface eno1 inet manual
        post-up ethtool -K $IFACE gso off gro off tso off tx off rx off

auto enp1s0
iface enp1s0 inet manual
        post-up ethtool -K $IFACE gso off gro off tso off tx off rx off
```

## Edit the CRUSH map

```
ceph osd getcrushmap -o crush.orig   # Get compiled CRUSH Map
crushtool -d crush.orig -o crush.txt # Decompile CRUSH Map
## Edit crush.txt!
crushtool -c crush.txt -o crush.new  # Recompile CRUSH Map
ceph osd setcrushmap -i crush.new    # Set new CRUSH Map
```

# Remove a node from the cluster

Separate a Node Without Reinstalling

Caution: This is not the recommended method, proceed with caution. Use the previous method if you're unsure.

You can also separate a node from a cluster without reinstalling it from scratch. But after removing the node from the cluster, it will still have access to any shared storage. This must be resolved before you start removing the node from the cluster. A Proxmox VE cluster cannot share the exact same storage with another cluster, as storage locking doesn't work over the cluster boundary. Furthermore, it may also lead to VMID conflicts.

It's suggested that you create a new storage, where only the node which you want to separate has access. This can be a new export on your NFS or a new Ceph pool, to name a few examples. It's just important that the exact same storage does not get accessed by multiple clusters. After setting up this storage, move all data and VMs from the node to it. Then you are ready to separate the node from the cluster.
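Before running the actual separation, it can help to double-check what still lives on the node and how the cluster currently looks. A minimal pre-flight sketch using the standard Proxmox VE CLI tools (adjust to your setup):

```
# Run on the node that is about to be separated (assumes default Proxmox VE tooling).
pvecm status   # current cluster membership and quorum state
pvesm status   # storages this node can reach (shared ones must be handled first)
qm list        # VMs still located on this node
pct list       # containers still located on this node
```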
Warning: Ensure that all shared resources are cleanly separated! Otherwise you will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:

```
systemctl stop pve-cluster
systemctl stop corosync
```

Start the cluster file system again in local mode:

```
pmxcfs -l
```

Delete the corosync configuration files:

```
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
```

You can now start the file system again as a normal service:

```
killall pmxcfs
systemctl start pve-cluster
```

The node is now separated from the cluster. You can delete it from any remaining node of the cluster with:

```
pvecm delnode oldnode
```

If the command fails due to a loss of quorum in the remaining node, you can set the expected votes to 1 as a workaround:

```
pvecm expected 1
```

Then repeat the pvecm delnode command.

Now switch back to the separated node and delete all the remaining cluster files on it. This ensures that the node can be added to another cluster again without problems.

```
rm /var/lib/corosync/*
```

As the configuration files from the other nodes are still in the cluster file system, you may want to clean those up too. After making absolutely sure that you have the correct node name, you can simply remove the entire directory recursively from /etc/pve/nodes/NODENAME.

Caution: The node's SSH keys will remain in the authorized_keys file. This means that the nodes can still connect to each other with public key authentication. You should fix this by removing the respective keys from the /etc/pve/priv/authorized_keys file.
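The relevant key lines can usually be spotted by their trailing comment, which by default is root@&lt;nodename&gt;. A small sketch under that assumption, with "oldnode" as a placeholder for the separated node's name:

```
# Keep a backup, then drop every key line whose comment mentions the removed node.
# Assumption: the keys carry the default root@<nodename> comment.
cp /etc/pve/priv/authorized_keys /root/authorized_keys.bak
sed -i '/root@oldnode/d' /etc/pve/priv/authorized_keys
```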