# CEPH Cluster

* CEPH loves jumbo frames! On a 10 Gigabit network, set the MTU for the CEPH network to 9000.

# Installation:

* Proxmox installation
* CEPH network: vmbr1, 10.24.18.11/30
* Install CEPH and select the CEPH network
* Decompile the CRUSH map and change "host" to "osd"

```
ceph osd getcrushmap -o crush.orig
crushtool -d crush.orig -o crush.txt
nano crush.txt
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
```

* Create the OSDs
* Split the pools by hdd or ssd

```
ceph osd crush rule create-replicated replicated_hdd default osd hdd
ceph osd crush rule create-replicated replicated_ssd default osd ssd
```

* Create RAID5-like pools

```
pveceph pool create ceph --erasure-coding k=2,m=1,failure-domain=osd,device-class=hdd
pveceph pool create ssd --erasure-coding k=2,m=1,failure-domain=osd,device-class=ssd
```

* Set the pools to "autoscale mode=on"

## CEPH Dashboard:

https://xxx.xxx.xxx.xxx:8443/

## Ceph erasure-coding (RAID5):

pveceph pool create ceph --erasure-coding k=2,m=1,failure-domain=osd,device-class=hdd

Creates a RAID5-like pool from at least three OSDs/disks, sets the failure domain to "OSD" and the device class to "HDD".

## Clear the error log:

ceph crash archive-all

Clears the permanent error message in Proxmox after a process has crashed.

## Revive a Ceph OSD after a pulled disk:

* Which OSD is down?
  * ceph osd tree
* Show the LVM block path
  * ceph-volume lvm list
* Deactivate the OSD:
  * lvm lvchange -a n /dev/ceph-xyz....
* (Re)activate the OSD:
  * lvm lvchange -a y /dev/ceph-xyz....
* Restart the LVM volume
  * ceph-volume lvm activate (see the output of: ceph-volume lvm list)
* Check the OSDs:
  * ceph osd tree
* If the OSD is still down or out: in the web frontend, Ceph ---> OSD

# Installation

## Proxmox:

* vmbr0: external network
* vmbr1: internal CEPH network
* Patch the Intel network card with ethtool:

```
ethtool -K eno1 gso off gro off tso off tx off rx off

auto eno1
iface eno1 inet manual
        post-up ethtool -K $IFACE gso off gro off tso off tx off rx off

auto enp1s0
iface enp1s0 inet manual
        post-up ethtool -K $IFACE gso off gro off tso off tx off rx off
```

## Edit the CRUSH map

```
ceph osd getcrushmap -o crush.orig   # Get compiled CRUSH Map
crushtool -d crush.orig -o crush.txt # Decompile CRUSH Map
## Edit crush.txt!
crushtool -c crush.txt -o crush.new  # Recompile CRUSH Map
ceph osd setcrushmap -i crush.new    # Set new CRUSH Map
```

# Remove a node from the cluster

Separate a Node Without Reinstalling

Caution: This is not the recommended method, proceed with caution. Use the previous method if you're unsure.

You can also separate a node from a cluster without reinstalling it from scratch. But after removing the node from the cluster, it will still have access to any shared storage. This must be resolved before you start removing the node from the cluster. A Proxmox VE cluster cannot share the exact same storage with another cluster, as storage locking doesn't work over the cluster boundary. Furthermore, it may also lead to VMID conflicts.

It's suggested that you create a new storage, where only the node which you want to separate has access. This can be a new export on your NFS or a new Ceph pool, to name a few examples. It's just important that the exact same storage does not get accessed by multiple clusters. After setting up this storage, move all data and VMs from the node to it. Then you are ready to separate the node from the cluster.
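Before running the actual separation, it can help to double-check what still lives on the node and how the cluster currently looks. A minimal pre-flight sketch using the standard Proxmox VE CLI tools (adjust to your setup):

```
# Run on the node that is about to be separated (assumes default Proxmox VE tooling).
pvecm status   # current cluster membership and quorum state
pvesm status   # storages this node can reach (shared ones must be handled first)
qm list        # VMs still located on this node
pct list       # containers still located on this node
```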
Warning: Ensure that all shared resources are cleanly separated! Otherwise you will run into conflicts and problems.

First, stop the corosync and pve-cluster services on the node:

```
systemctl stop pve-cluster
systemctl stop corosync
```

Start the cluster file system again in local mode:

```
pmxcfs -l
```

Delete the corosync configuration files:

```
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
```

You can now start the file system again as a normal service:

```
killall pmxcfs
systemctl start pve-cluster
```

The node is now separated from the cluster. You can delete it from any remaining node of the cluster with:

```
pvecm delnode oldnode
```

If the command fails due to a loss of quorum in the remaining node, you can set the expected votes to 1 as a workaround:

```
pvecm expected 1
```

Then repeat the pvecm delnode command.

Now switch back to the separated node and delete all the remaining cluster files on it. This ensures that the node can be added to another cluster again without problems.

```
rm /var/lib/corosync/*
```

As the configuration files from the other nodes are still in the cluster file system, you may want to clean those up too. After making absolutely sure that you have the correct node name, you can simply remove the entire directory recursively from /etc/pve/nodes/NODENAME.

Caution: The node's SSH keys will remain in the authorized_keys file. This means that the nodes can still connect to each other with public key authentication. You should fix this by removing the respective keys from the /etc/pve/priv/authorized_keys file.
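The relevant key lines can usually be spotted by their trailing comment, which by default is root@&lt;nodename&gt;. A small sketch under that assumption, with "oldnode" as a placeholder for the separated node's name:

```
# Keep a backup, then drop every key line whose comment mentions the removed node.
# Assumption: the keys carry the default root@<nodename> comment.
cp /etc/pve/priv/authorized_keys /root/authorized_keys.bak
sed -i '/root@oldnode/d' /etc/pve/priv/authorized_keys
```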