High Availability Cluster with Pacemaker Part 4: Basic setup

This blog chapter is about the initial cluster setup.

https://clusterlabs.org/

This solution is also used by SUSE and Redhat in their products:

https://www.suse.com/products/highavailability/

https://www.redhat.com/en/store/high-availability-add#?sku=RH00025

JWB-Systems can provide training and consulting on all kinds of Pacemaker clusters:

SLE321v15 Deploying and Administering SUSE Linux Enterprise High Availability 15

SLE321v12 Deploying and Administering SUSE Linux Enterprise High Availability 12

RHEL 8 / CentOS 8 High Availability

RHEL 7 / CentOS 7 High Availability

Now that we know the basics, it's time to set up the whole thing.

As an example we will create the following cluster:

  • 2 Nodes
  • STONITH via SBD
  • The cluster should deliver an NFS service
  • Storage for the NFS Server will be local on the nodes
  • One node is active and has the master copy of the storage
  • The other node is passive and gets the master copy of the storage replicated via DRBD (Distributed Replicated Block Device; you can think of it as a RAID 1 over two partitions on two nodes)

I will try to cover both ways of installing: SUSE SLES 12 or 15 and RHEL derivatives (when I talk about RHEL here I mean RHEL 7 or 8 and their derivatives). I do not recommend Debian or its derivatives; some of their resource agents are buggy and can cause trouble.

First of all we need 2 nodes installed with SUSE or Red Hat Linux. A minimum install will do; an X server for a GUI is not necessary.

I choose the default disk layout on both distributions. We just have to make sure there is free space on the hard disk for the NFS data later.

Both nodes will be installed exactly the same way.

Network setup:

I mentioned earlier in this blog that network redundancy is important. If this prerequisite is already fulfilled because the nodes are VMs and network redundancy is handled at the hypervisor level, then we don't need to take care of it inside the VMs.

If the nodes are bare metal, we must build the network redundantly:

  • 2 NICs bonded for user traffic: node1 192.168.111.1/24, node2 192.168.111.2/24 (see the bonding sketch below)
  • 2 NICs for cluster communication (they can be bonded at OS level, but they can also be given to Corosync separately so that Corosync handles the redundancy). I choose the bond variant here: node1 172.17.1.1/16, node2 172.17.1.2/16.
  • 2 NICs bonded for DRBD replication: node1 172.17.2.1/16, node2 172.17.2.2/16
  • The traffic for the SBD LUN also goes over the user traffic network; it is minimal.
  • We need an iSCSI target configured that provides one LUN of 10 MB; it is available at 192.168.111.3/24
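If the bonds have to be built manually, a rough sketch with NetworkManager's nmcli (as used on RHEL) could look like the following. The interface names eth0/eth1 and the connection names are just placeholders, adapt them to your hardware; on SLES you can achieve the same with YaST or wicked:

node1:~ # nmcli con add type bond con-name bond0 ifname bond0 bond.options "mode=active-backup,miimon=100"
node1:~ # nmcli con add type ethernet con-name bond0-port1 ifname eth0 master bond0
node1:~ # nmcli con add type ethernet con-name bond0-port2 ifname eth1 master bond0
node1:~ # nmcli con mod bond0 ipv4.method manual ipv4.addresses 192.168.111.1/24
node1:~ # nmcli con up bond0

Repeat the same for the cluster communication and DRBD bonds with their respective addresses, and on node2 with its addresses.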

After you have configured the network on both nodes there are additional things to configure before we get to the cluster (a short sketch follows the list):

  • On both nodes, /etc/hosts entries for the cluster communication network so that both nodes can reach each other by hostname and FQDN.
  • Both nodes must have a reliable time source so that they are time-synced.
  • SSH key-based authentication (PKI) between both nodes.
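A minimal sketch of these prerequisites (the FQDNs, the chrony time service and the key type are just examples, adapt them to your environment):

Append to /etc/hosts on both nodes:

172.17.1.1   node1.example.com   node1
172.17.1.2   node2.example.com   node2

Time synchronisation, for example with chrony:

node1:~ # systemctl enable --now chronyd
node2:~ # systemctl enable --now chronyd

SSH keys (repeat in the other direction from node2):

node1:~ # ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
node1:~ # ssh-copy-id root@node2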

Software installation

After that we can install the cluster software on both nodes:

SLES:

node1:~ # zypper in -t pattern ha_sles
node1:~ # zypper in yast2-iscsi-client
node2:~ # zypper in -t pattern ha_sles
node2:~ # zypper in yast2-iscsi-client

RHEL:

node1:~ # yum install pacemaker pcs resource-agents sbd fence-agents-sbd
node1:~ # yum install iscsi-initiator-utils
node2:~ # yum install pacemaker pcs resource-agents sbd fence-agents-sbd
node2:~ # yum install iscsi-initiator-utils

iSCSI Setup

Now we first need to connect the nodes to the iSCSI LUN; this works the same way on SLES and RHEL. Run the following commands on both nodes:

node1:~ # systemctl enable iscsid.service
node1:~ # systemctl start iscsid.service
node1:~ # iscsiadm -m discovery -t st -p 192.168.111.3

(the IQN output from this command is used as the argument for the next one)

node1:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.sbd1.x8664:sn.sbd1 -p 192.168.111.3 -l
node1:~ # systemctl restart iscsid.service

node2:~ # systemctl enable iscsid.service
node2:~ # systemctl start iscsid.service
node2:~ # iscsiadm -m discovery -t st -p 192.168.111.3

(the IQN output from this command is used as the argument for the next one)

node2:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.sbd1.x8664:sn.sbd1 -p 192.168.111.3 -l
node2:~ # systemctl restart iscsid.service
node1:~ # lsscsi
node2:~ # lsscsi

These lsscsi commands should confirm that the LIO-ORG disk is now available.
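The output should look roughly like the following example; the SCSI address, the volume name and the device node will differ in your setup:

node1:~ # lsscsi
[3:0:0:0]    disk    LIO-ORG  sbd1    4.0   /dev/sdb

If node.startup is not already set to automatic in your iscsid.conf, you can set it per target so the LUN is logged in again automatically after a reboot (same IQN and portal as above):

node1:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.sbd1.x8664:sn.sbd1 -p 192.168.111.3 --op update -n node.startup -v automatic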

Once we can see the LIO-ORG disk on both nodes, we can set up SBD STONITH.

SBD Setup

SBD needs several setup steps:

  • The /etc/sysconfig/sbd config file on both nodes
  • A watchdog kernel module loaded on both nodes
  • The SBD disk formatted with the SBD metadata (done on one node)
  • After cluster initialization, an SBD resource needs to be created in the cluster

/etc/sysconfig/sbd

Create the SBD config file on one node and copy it to the other node.
The config file should have the following content. Please adapt the values in the lines SBD_DEVICE and SBD_OPTS:


SBD_DEVICE="/dev/your-SBD-device-name"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=no
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_OPTS="-W -n node1"

Copy the file to the other node and change the last parameter there:

SBD_OPTS="-W -n node2"

Watchdog

For bare-metal nodes the OS normally detects the fitting watchdog kernel module and loads it. You can check whether a watchdog module is loaded with this command:

lsmod | grep wdt

There should be a loaded watchdog module. If not, you need to find out which watchdog chip is used on your mainboard and load the respective module manually.
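Two harmless checks that may help to identify the right module (no guarantee, this depends on your hardware):

node1:~ # ls -l /dev/watchdog*
node1:~ # dmesg | grep -i -e watchdog -e wdt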

If you have a KVM virtual machine you can emulate the i6300esb watchdog in the VM config and load the module with:

node1:~ # modprobe i6300esb
node2:~ # modprobe i6300esb

A generic watchdog you can use is softdog; it emulates a watchdog in the kernel:

node1:~ # modprobe softdog
node2:~ # modprobe softdog

Don’t forget to make the module load permanent:

node1:~ # echo i6300esb > /etc/modules-load.d/i6300esb.conf
node2:~ # echo i6300esb > /etc/modules-load.d/i6300esb.conf

or

node1:~ # echo softdog > /etc/modules-load.d/softdog.conf
node2:~ # echo softdog > /etc/modules-load.d/softdog.conf

Format the SBD disk

Now format on one node the SBD device:

node1:~ # sbd -d /dev/your-SBD-device-name create
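To verify the result you can dump the SBD header; it should show the watchdog and msgwait timeouts and the number of node slots:

node1:~ # sbd -d /dev/your-SBD-device-name dump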

Also enable the SBD service on both nodes:

node1:~ # systemctl enable sbd
node2:~ # systemctl enable sbd

Now we have finished the SBD preparations and can initialize the cluster.

Initialize the cluster

First we need to set a password for the user hacluster on every node:

node1:~ # echo mypassword | passwd --stdin hacluster
node2:~ # echo mypassword | passwd --stdin hacluster
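Not every distribution's passwd supports --stdin; if yours does not, chpasswd is an alternative that works the same way on both distributions:

node1:~ # echo 'hacluster:mypassword' | chpasswd
node2:~ # echo 'hacluster:mypassword' | chpasswd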

There are several ways to get the initial cluster configuration done, depending on what distribution you use.

Red Hat does the cluster initialization and management with a tool called pcs, run from one node:

node1:~ # systemctl enable pcsd
node1:~ # systemctl start pcsd
node2:~ # systemctl enable pcsd
node2:~ # systemctl start pcsd
node1:~ # pcs cluster auth node1 node2 -u hacluster -p mypassword --force
node1:~ # pcs cluster setup --name mycluster --transport udp --addr0 172.17.1.0 --mcast0 239.255.1.1 node1 node2

This configures the cluster on both nodes on Red Hat.
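These are the pcs 0.9 commands from RHEL/CentOS 7. On RHEL/CentOS 8 pcs 0.10 changed the syntax; a rough equivalent (using the default knet transport and the cluster communication addresses directly) is:

node1:~ # pcs host auth node1 node2 -u hacluster -p mypassword
node1:~ # pcs cluster setup mycluster node1 addr=172.17.1.1 node2 addr=172.17.1.2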

With SUSE you can use the cluster module from YaST:

node1:~ # yast2 cluster

and copy the resulting config file /etc/corosync/corosync.conf to the other node.
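For orientation, a minimal corosync.conf for this two-node setup could look roughly like this (the values are examples and have to match what you entered in the YaST cluster module):

totem {
        version: 2
        cluster_name: mycluster
        transport: udpu
}

nodelist {
        node {
                ring0_addr: 172.17.1.1
                nodeid: 1
        }
        node {
                ring0_addr: 172.17.1.2
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
}

logging {
        to_syslog: yes
}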

After that you are ready to start your cluster and monitor the cluster status:

Redhat:

node1:~ # pcs cluster start –all

SUSE:

node1:~ # systemctl start pacemaker
node2:~ # systemctl start pacemaker
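Whether the cluster stack should also come up automatically at boot is a policy decision (many admins prefer a manual start after a node has been fenced); if you want it, enable the service on both nodes:

node1:~ # systemctl enable pacemaker
node2:~ # systemctl enable pacemaker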

Cluster Status

node2:~ # crm_mon -rnf

Stack: corosync
Current DC: node1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Mar 15 11:42:36 2021
Last change: Mon Feb 22 10:32:46 2021 by hacluster via crmd on node1

2 nodes configured
0 resource instances configured

Node node1: online
Node node2: online

No inactive resources

Migration Summary:
* Node node2:
* Node node1:

This gives you a live-updating command-line view of your current cluster status. Whenever you change your cluster configuration, you should have a terminal window open in parallel to watch the cluster status. It is one of the most frequently used commands with Pacemaker, so you should get used to it.

You can also check the status of Corosync (the communication layer):

node1:~ # corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
       id      = 172.17.1.1
       status  = ring 0 active with no faults
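You can also check the quorum state of the membership; it should report both nodes as members and the partition as quorate:

node1:~ # corosync-quorumtool -s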

Add the SBD resource to the cluster

For a working SBD STONITH we now need to add a resource to the cluster:

Redhat:

node1:~ # pcs property set no-quorum-policy=ignore
node1:~ # pcs property set stonith-enabled=true
node1:~ # pcs property set stonith-timeout=60s
node1:~ # pcs property set stonith-watchdog-timeout=0
node1:~ # pcs stonith create sbd fence_sbd devices="/dev/your-SBD-Device-name"

SUSE:

node1:~ # crm configure property no-quorum-policy=ignore
node1:~ # crm configure property stonith-enabled=true
node1:~ # crm configure property stonith-timeout=60s
node1:~ # crm configure property stonith-watchdog-timeout=0
node1:~ # crm configure primitive SBD-STONITH stonith:external/sbd

Now look at the changed cluster status:

node2:~ # crm_mon -rnf

Stack: corosync
Current DC: node1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Mar 15 11:42:36 2021
Last change: Mon Feb 22 10:32:46 2021 by hacluster via crmd on node1

2 nodes configured
1 resource instance configured

Node node1: online
       sbd     (stonith:fence_sbd):    Started
Node node2: online

No inactive resources

Migration Summary:
* Node node2:
* Node node1:

Test STONITH

Now it’s time to extensively test your STONITH implementation. Remember, the reliable functionality of your cluster strongly depends on your STONITH.

You need to test all possible error cases multiple times and watch the result to make sure the cluster acts correctly in case of an error:

  • Kill each node (expected behaviour: the remaining node shows the killed node as offline/unclean, and after some seconds as offline/clean):

node2:~ # crm_mon -rnfj

node1:~ # echo c > /proc/sysrq-trigger

and vice versa

  • Block the Corosync communication (expected behaviour: the nodes can't see each other, one node will try to STONITH the other node, the remaining node shows the fenced node as offline/unclean, and after some seconds as offline/clean):

node2:~ # crm_mon -rnfj

node1:~ # iptables -A INPUT -p udp --dport 5405 -j DROP

You also have to test other things, like unavailability of the SBD device, killing the Pacemaker and Corosync processes, and so on. Take your time for that. The better you test the STONITH, the more stable the cluster will be.
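Two more checks you can do are to trigger a fence manually, either through the cluster or directly via the SBD device, and watch whether the target node really reboots (the device name is again a placeholder):

node1:~ # stonith_admin --reboot node2
node1:~ # sbd -d /dev/your-SBD-device-name message node2 reset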

In the next chapter of the blog I will explain the cluster functionality in more depth and how to add services to the cluster.

CU soon 🙂