Docker Cluster HA Setup

1. Set up a VPN between your servers. Use an odd number of nodes (3, 5, 7, ...) so the cluster can keep quorum. This can be done with Tinc for example, but there are many others you can choose from.

https://www.tinc-vpn.org/
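As a rough sketch of what that can look like with tinc 1.0 on the first node (the network name vpn0, the node names, and the 10.0.0.0/16 addressing are assumptions, not part of this guide):

```
# /etc/tinc/vpn0/tinc.conf
Name = node1
ConnectTo = node2
ConnectTo = node3

# /etc/tinc/vpn0/hosts/node1 (the public key gets appended by `tincd -n vpn0 -K`)
Address = <node1 public IP>
Subnet = 10.0.1.1/32

# /etc/tinc/vpn0/tinc-up
#!/bin/sh
ip link set "$INTERFACE" up
ip addr add 10.0.1.1/16 dev "$INTERFACE"
```

Each node gets its own hosts/ file, and the hosts/ directory is copied to every node so they can authenticate each other.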

2. Installing GlusterFS

# Add the GlusterFS signing key and repository, then install the server
apt-get install gpg -y
curl -fsSL https://download.gluster.org/pub/gluster/glusterfs/11/rsa.pub | gpg --dearmor > /usr/share/keyrings/glusterfs-archive-keyring.gpg
DEBID=$(grep 'VERSION_ID=' /etc/os-release | cut -d '=' -f 2 | tr -d '"')   # e.g. 12
DEBVER=$(grep 'VERSION=' /etc/os-release | grep -Eo '[a-z]+')               # e.g. bookworm
DEBARCH=$(dpkg --print-architecture)                                        # e.g. amd64
echo "deb [signed-by=/usr/share/keyrings/glusterfs-archive-keyring.gpg] https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${DEBID}/${DEBARCH}/apt ${DEBVER} main" | tee /etc/apt/sources.list.d/gluster.list
apt-get update && apt-get install glusterfs-server -y
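The grep pipelines above work on current Debian releases, but /etc/os-release is designed to be sourced directly, which is less fragile. A sketch (the sample file below uses hypothetical Debian 12 values; on a real system you would source /etc/os-release itself):

```shell
# Simulate /etc/os-release with sample values, then derive the same
# DEBID/DEBVER variables by sourcing the file instead of grepping it.
cat > /tmp/os-release-sample <<'EOF'
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
EOF
. /tmp/os-release-sample      # on a real system: . /etc/os-release
DEBID="$VERSION_ID"           # 12
DEBVER="$VERSION_CODENAME"    # bookworm
echo "$DEBID $DEBVER"
```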

3. Edit /etc/glusterfs/glusterd.vol and add the line below.
This binds glusterd to the VPN interface only and prevents GlusterFS from getting exposed to the dangerous interwebs.

option transport.socket.bind-address 10.0.X.1
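For context, the option goes inside the existing volume management block in that file; the surrounding lines below are illustrative and may differ slightly between versions:

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.bind-address 10.0.X.1
    ...
end-volume
```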

rpcbind also listens on the network and we don't need it, so let's get rid of it.

apt-get remove rpcbind -y

4. Enable GlusterFS

systemctl start glusterd && systemctl enable glusterd

5. Peer with the other GlusterFS nodes (run this on the first node, 10.0.1.1)

gluster peer probe 10.0.2.1
gluster peer probe 10.0.3.1

6. Check the peering status

gluster peer status

7. Create the folders for the brick and the mount

mkdir -p /mnt/bricks/docker /mnt/data/docker

8. Create your first volume for Docker

gluster volume create docker replica 3 10.0.1.1:/mnt/bricks/docker 10.0.2.1:/mnt/bricks/docker 10.0.3.1:/mnt/bricks/docker force
gluster volume start docker

force is needed here because the bricks live on the root filesystem, which Gluster would otherwise refuse.
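Before mounting, it's worth confirming the volume is healthy. These are standard gluster CLI subcommands; they only produce meaningful output on a live cluster:

```shell
# Confirm the replica 3 layout and that all three bricks are online
gluster volume info docker
gluster volume status docker
```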

9. Mount your first volume

mount.glusterfs 10.0.X.1:/docker /mnt/data/docker

10. Make the mount boot ready

[Unit]
Description=GlusterFS mount for /mnt/data/docker
Wants=network-online.target glusterd.service
After=network-online.target glusterd.service

[Service]
Type=oneshot
RemainAfterExit=true
User=root
Group=root
ExecStartPre=/bin/sleep 5
ExecStart=/sbin/mount.glusterfs 10.0.X.1:/docker /mnt/data/docker

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/mounts.service

11. Enable the mount service

systemctl daemon-reload && systemctl enable mounts
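Alternatively, instead of a custom unit, a line in /etc/fstab with the _netdev option also defers the mount until the network is up (sketch, with the same X placeholder as above):

```
10.0.X.1:/docker /mnt/data/docker glusterfs defaults,_netdev 0 0
```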

12. You may have to edit the GlusterFS systemd file to prevent a race condition with your VPN.
GlusterFS will fail to start if your VPN isn't running already.

You can do this with

systemctl edit glusterd --full

and adding one line to the [Service] section, before the existing ExecStart:

ExecStartPre=/bin/sh -c 'until ping -c1 10.0.X.1; do sleep 1; done;'

Profit! On the next reboot GlusterFS should start up fine.

13. Install Docker

# Add Docker's official GPG key:
apt-get update
apt-get install ca-certificates curl -y
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
apt-get update && apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y

14. Init the Swarm on the first Node

docker swarm init --advertise-addr 10.0.1.1 --listen-addr=10.0.1.1

--advertise-addr makes sure the swarm is only advertised inside our VPN network

15. Join other Nodes

docker swarm join --token whateverthattokenis 10.0.1.1:2377 --listen-addr=10.0.2.1
docker swarm join --token whateverthattokenis 10.0.1.1:2377 --listen-addr=10.0.3.1

--listen-addr forces swarm to bind to your local VPN address

16. Check the Cluster

docker node ls

17. Promote the other nodes to achieve true HA (with three managers the cluster tolerates the loss of one)

docker node promote node2
docker node promote node3

18. Deploy your first service
In my case it was a ZNC bouncer.
I had to run the container normally once to generate the config files.

docker run -it -v /mnt/data/docker/znc/:/znc-data znc --makeconf

Let's deploy the service.

docker service create --mount type=bind,src=/mnt/data/docker/znc/,dst=/znc-data --publish published=1025,target=1025 --name bouncer znc

The service will get exposed on port 1025 on all nodes.
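The same service can also be captured in a stack file, so it can be redeployed with docker stack deploy. This compose sketch mirrors the docker service create line above; the filename bouncer.yml is an assumption:

```
# bouncer.yml - Swarm stack equivalent of the `docker service create` above
version: "3.8"
services:
  bouncer:
    image: znc
    ports:
      - target: 1025
        published: 1025
    volumes:
      - type: bind
        source: /mnt/data/docker/znc/
        target: /znc-data
```

Deploy it with: docker stack deploy -c bouncer.yml bouncer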

19. Check your containers

Run this on any node:

docker node ps $(docker node ls -q)

It lists the tasks running on every node.

20. When you reboot the node running your container, the service should be restored on another node in about 60 seconds.