HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2017/12/28 22:17:36

Hey Community,

As I have seen questions about iSCSI in a corosync/pacemaker environment several times, I have decided to create a "short" how-to.
I would appreciate any feedback (including typos).

Prerequisites:
Minimal install of two CentOS 7.4.1708 nodes

Packages I have used:

Code: Select all

# rpm -qa | grep -E "fence|pcs|targetcli|drbd" | sort
drbd90-utils-9.1.0-1.el7.elrepo.x86_64
drbd90-utils-sysvinit-9.1.0-1.el7.elrepo.x86_64
fence-agents-all-4.0.11-66.el7_4.3.x86_64
fence-agents-apc-4.0.11-66.el7_4.3.x86_64
fence-agents-apc-snmp-4.0.11-66.el7_4.3.x86_64
fence-agents-bladecenter-4.0.11-66.el7_4.3.x86_64
fence-agents-brocade-4.0.11-66.el7_4.3.x86_64
fence-agents-cisco-mds-4.0.11-66.el7_4.3.x86_64
fence-agents-cisco-ucs-4.0.11-66.el7_4.3.x86_64
fence-agents-common-4.0.11-66.el7_4.3.x86_64
fence-agents-compute-4.0.11-66.el7_4.3.x86_64
fence-agents-drac5-4.0.11-66.el7_4.3.x86_64
fence-agents-eaton-snmp-4.0.11-66.el7_4.3.x86_64
fence-agents-emerson-4.0.11-66.el7_4.3.x86_64
fence-agents-eps-4.0.11-66.el7_4.3.x86_64
fence-agents-hpblade-4.0.11-66.el7_4.3.x86_64
fence-agents-ibmblade-4.0.11-66.el7_4.3.x86_64
fence-agents-ifmib-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo2-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-moonshot-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-mp-4.0.11-66.el7_4.3.x86_64
fence-agents-ilo-ssh-4.0.11-66.el7_4.3.x86_64
fence-agents-intelmodular-4.0.11-66.el7_4.3.x86_64
fence-agents-ipdu-4.0.11-66.el7_4.3.x86_64
fence-agents-ipmilan-4.0.11-66.el7_4.3.x86_64
fence-agents-kdump-4.0.11-66.el7_4.3.x86_64
fence-agents-mpath-4.0.11-66.el7_4.3.x86_64
fence-agents-rhevm-4.0.11-66.el7_4.3.x86_64
fence-agents-rsa-4.0.11-66.el7_4.3.x86_64
fence-agents-rsb-4.0.11-66.el7_4.3.x86_64
fence-agents-sbd-4.0.11-66.el7_4.3.x86_64
fence-agents-scsi-4.0.11-66.el7_4.3.x86_64
fence-agents-vmware-soap-4.0.11-66.el7_4.3.x86_64
fence-agents-wti-4.0.11-66.el7_4.3.x86_64
fence-virt-0.3.2-12.el7.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64
pcs-0.9.158-6.el7.centos.1.x86_64
targetcli-2.1.fb46-1.el7.noarch
1.Enable 'elrepo' repository on both nodes:

Code: Select all

yum -y install http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
2.Install relevant packages:

Code: Select all

yum -y install fence-agents-all pcs targetcli "*drbd90*" vim-enhanced bash-completion net-tools bind-utils mlocate setroubleshoot-server policycoreutils-{python,devel}
3.Prepare the block device for DRBD (preferably use LVM, as it now supports lvmraid with reshape and takeover).
In my case I'm adding one qcow2 disk of the same size to both machines, with a serial number (generated via 'cat /dev/urandom | tr -cd A-Za-z0-9 | head -c 32 ; echo') defined:

Code: Select all

vgcreate drbd /dev/disk/by-id/virtio-YOUR_SERIAL_NUMBER
lvcreate -l 100%FREE -n drbd0 drbd
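For reference, here is a rough sketch of how such a qcow2 disk with a serial number could be created and attached on a KVM host (the paths, VM names and the 'vdb' target are my assumptions, adjust to your environment):

Code: Select all

# run on the hypervisor, once per VM (repeat with the other VM's name)
qemu-img create -f qcow2 /var/lib/libvirt/images/drbd5-centos-drbd.qcow2 10G
SERIAL=$(cat /dev/urandom | tr -cd A-Za-z0-9 | head -c 32)
cat > /tmp/drbd-disk.xml <<EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/drbd5-centos-drbd.qcow2'/>
  <target dev='vdb' bus='virtio'/>
  <serial>${SERIAL}</serial>
</disk>
EOF
virsh attach-device drbd5-centos /tmp/drbd-disk.xml --persistent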
4.Create your drbd configuration (example on 'man 5 drbd.conf-9.0'):

Code: Select all

resource drbd0 {
                 net {
                      cram-hmac-alg sha1;
                      shared-secret "FooFunFactory";
                 }
                 volume 0 {
                      device    /dev/drbd0;
                      disk      /dev/drbd/drbd0;
                      meta-disk internal;
                 }
                 on drbd5-centos {
                      node-id   0;
                      address   192.168.122.80:7000;
                 }
                 on drbd6-centos {
                      node-id   1;
                      address   192.168.122.81:7000;
                 }
                 connection {
                      host      drbd5-centos  port 7000;
                      host      drbd6-centos  port 7000;
                      net {
                          protocol C;
                      }
                 }
           }
Use the same config on both nodes, as we use the same LV.
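Assuming the default /etc/drbd.conf (which includes /etc/drbd.d/*.res), the resource file can be saved on one node and copied to the other like this (the file name is my choice):

Code: Select all

vi /etc/drbd.d/drbd0.res    # paste the configuration above
scp /etc/drbd.d/drbd0.res drbd6-centos:/etc/drbd.d/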

Prepare local block device for drbd replication (both nodes):

Code: Select all

drbdadm create-md drbd0
drbdadm up drbd0
When you check the status, it should still be in the "Connecting" state, as follows:

Code: Select all

#drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  drbd6-centos connection:Connecting
5.Prepare a firewalld service (on both nodes).
Copy an existing firewalld service definition as a starting point:

Code: Select all

cp /usr/lib/firewalld/services/ssh.xml /etc/firewalld/services/drbd0.xml
Edit it until it looks like this:

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>DRBD0</short>
  <description>DRBD0 service</description>
  <port protocol="tcp" port="7000"/>
</service>
Reload firewalld to pick up the new service definition, permanently add the needed services, and reload again:

Code: Select all

firewall-cmd --reload && firewall-cmd --permanent --add-service={drbd0,high-availability,iscsi-target} && firewall-cmd --reload
Now the DRBD status should look like this:

Code: Select all

drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  drbd6-centos role:Secondary
    peer-disk:Inconsistent
Force one of the nodes to become primary, which will start the initial sync:

Code: Select all

drbdadm --force primary drbd0
Wait until the sync reaches 100%. You can monitor it with 'drbdadm status'.
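To keep an eye on the synchronization progress, something like this will do:

Code: Select all

watch -n2 drbdadm status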

6.Set SELinux to permissive mode. Do not disable SELinux completely, as then we won't have any AVC denials to analyse later.

Code: Select all

setenforce 0
7.Prepare the cluster (I'm using fence_xvm for STONITH).
Enable pcsd on both nodes:

Code: Select all

systemctl enable --now pcsd
Change the password of 'hacluster' user on both nodes:

Code: Select all

echo centos | passwd --stdin hacluster
Auth both pcs daemons:

Code: Select all

pcs cluster auth drbd5-centos drbd6-centos
Note: use the 'hacluster' user with the password from the previous step.

Build the cluster:

Code: Select all

pcs cluster setup --start --enable --name CentOS-DRBD-iSCSI drbd5-centos drbd6-centos --transport udpu --wait_for_all=1 --encryption 1
Note: DNS resolution is required. For the node names, use the output of 'uname -n'.

Build your STONITH. I'm skipping this part.
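For reference, a minimal fence_xvm setup could look roughly like this (it assumes fence_virtd is already configured on the hypervisor and /etc/cluster/fence_xvm.key is distributed to both nodes; the resource names are my assumptions):

Code: Select all

pcs stonith create fence-drbd5 fence_xvm port=drbd5-centos pcmk_host_list=drbd5-centos
pcs stonith create fence-drbd6 fence_xvm port=drbd6-centos pcmk_host_list=drbd6-centos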
Once your STONITH is working, test it via:

Code: Select all

pcs stonith fence NODE_NAME
If you plan not to use a STONITH device (which is HIGHLY discouraged), disable the 'stonith-enabled' property, or no resource will be started.
To check its current value, run:

Code: Select all

pcs property show --all | grep stonith-enabled
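Should you really want to run without STONITH (for testing only), the property can be disabled like this:

Code: Select all

pcs property set stonith-enabled=false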
8.DRBD cluster resource configuration

Code: Select all

pcs cluster cib /root/cluster
pcs -f /root/cluster resource create DRBD0 ocf:linbit:drbd drbd_resource=drbd0
pcs -f /root/cluster resource master MASTER-DRBD0 DRBD0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs cluster cib-push /root/cluster
Note: The master will be promoted once a resource depends on it.

9.Create the iscsi-ip resource in the 'iscsi' group

Code: Select all

pcs resource create iscsi-ip ocf:heartbeat:IPaddr2 ip=192.168.122.244 cidr_netmask=24 --group iscsi
10.Add colocation and order constraints for the 'iscsi' group

Code: Select all

rm -f /root/cluster
pcs cluster cib /root/cluster
pcs -f /root/cluster constraint order promote MASTER-DRBD0 then start iscsi Mandatory id=iscsi-always-after-master-drbd
pcs -f /root/cluster constraint colocation add iscsi with master MASTER-DRBD0 INFINITY id=iscsi-group-where-master-drbd
pcs cluster cib-push /root/cluster
If the IP is not up and no master has been promoted, check whether SELinux is in permissive mode (verify with 'getenforce'); we will fix this properly later.
Test resource migration via:

Code: Select all

pcs node standby && sleep 60 && pcs node unstandby
If your constraints do not work properly, you might end up with the iscsi-ip on the recently 'unstandby'-ed node. Check your colocation constraint!
Also consider setting some stickiness via:

Code: Select all

pcs property set default-resource-stickiness=100
11.Enable target.service on both nodes.
Note: If you skip this, there will be issues with a missing '/sys/kernel/config/target'.

Code: Select all

systemctl enable --now target.service
12.Add the iSCSI target and LUN resources to the 'iscsi' group

Code: Select all

rm -f /root/cluster
pcs cluster cib /root/cluster
pcs -f /root/cluster resource create iscsi-target ocf:heartbeat:iSCSITarget iqn="iqn.2018-01.com.example:centos" allowed_initiators="iqn.2018-01.com.example:kalinsg01"   --group iscsi
pcs -f /root/cluster resource create iscsi-lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn=iqn.2018-01.com.example:centos lun=0 path=/dev/drbd0 --group iscsi
pcs cluster cib-push /root/cluster
Notes:
A) target_iqn must match the iqn defined during the creation of the target resource.
B) If no allowed_initiators are defined for the 'ocf:heartbeat:iSCSITarget' resource, everyone is allowed to access the iSCSI target.

13.Test a service relocation.
The relocation is needed in order to generate AVC denials in /var/log/audit/audit.log.
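One way to trigger the relocation is the same standby/unstandby trick from step 10, for example (the node name is just an example):

Code: Select all

pcs node standby drbd5-centos && sleep 60 && pcs node unstandby drbd5-centos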

14.SELinux policy generation:
Analyse the audit log via:

Code: Select all

sealert -a /var/log/audit/audit.log
You will see the following recommendations:

Code: Select all

setsebool -P domain_kernel_load_modules 1
setsebool -P daemons_enable_cluster_mode 1
Once you execute them (on both nodes), stop the cluster via:

Code: Select all

pcs cluster stop --all
And reboot both nodes simultaneously.
If the cluster doesn't come up, clean up the audit.log, set SELinux to permissive again, and repeat step 14.

15.Now discover the iSCSI target and log in to it from the allowed initiator (for help, see "man iscsiadm"):
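If the initiator is a fresh CentOS 7 machine, you will probably need the client-side tools first (an assumption about your initiator; adjust for other distributions):

Code: Select all

yum -y install iscsi-initiator-utils lsscsi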
Edit your initiator name:

Code: Select all

# cat /etc/iscsi/initiatorname.iscsi 
InitiatorName=iqn.2018-01.com.example:kalinsg01
Restart the iscsi daemon:

Code: Select all

systemctl restart iscsid.service
Discover the HA-iSCSI:

Code: Select all

iscsiadm --mode discoverydb --type sendtargets --portal 192.168.122.244 --discover
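If everything is in place, the discovery should return something along these lines (the exact output may differ):

Code: Select all

192.168.122.244:3260,1 iqn.2018-01.com.example:centos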
Note: Use an IP address, as there were some bugs in RHEL 7.0.

Log in to the HA-iSCSI target:

Code: Select all

iscsiadm --mode node --targetname  iqn.2018-01.com.example:centos --portal 192.168.122.244:3260 --login
Verify your iscsi device via 'lsscsi':

Code: Select all

#lsscsi
[4:0:0:0]    disk    LIO-ORG  iscsi-lun0       4.0   /dev/sdc


Re: HA iSCSI Target with DRBD (2 node cluster how-to)

Post by hunter86_bg » 2018/01/28 21:00:27

After some testing, I noticed that setting 'allowed_initiators' on the iscsi-lun0 resource did not work, so I have modified step 12 to represent a working solution.

If the iSCSI initiators will use iscsi-lun0 as an LVM PV, then we should add the device to the 'global_filter' setting in lvm.conf on both DRBD cluster nodes; otherwise the drbd device will be kept primary on both nodes and this will cause havoc in your cluster.
Here is a short example:

Code: Select all

[root@drbd1-rhel ~]# grep global_filter /etc/lvm/lvm.conf
	# Configuration option devices/global_filter.
	# Use global_filter to hide devices from these LVM system components.
	# global_filter are not opened by LVM.
global_filter = [ "r|/dev/drbd/drbd0|", "r|/dev/drbd0|" ]
	# devices/global_filter.
Note: It is not necessary to rebuild the initramfs via 'dracut', as we do not bring up the drbd device before the cluster is up and running, but it is still nice to keep everything consistent.
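If you do want to keep it consistent, rebuilding the initramfs on both nodes is as simple as:

Code: Select all

dracut -f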
