Pacemaker and stonith fencing problem

Post by brysk » 2018/01/17 06:43:14

Hi CentOS folks,

I'm trying to set up a 2-node cluster for a GFS2 shared filesystem with the fence_scsi fencing agent. My configuration is below:

[root@eapa-dhcp01 ~]# pcs config
Cluster Name: cluster
Corosync Nodes:
 eapa-dhcp01 eapa-dhcp02
Pacemaker Nodes:
 eapa-dhcp01 eapa-dhcp02

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Clone: fs_gfs2-clone
  Meta Attrs: interleave=true
  Resource: fs_gfs2 (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/datavg/datalv directory=/mnt/data fstype=gfs2 options=noatime,nodiratime
   Operations: monitor interval=10s on-fail=fence (fs_gfs2-monitor-interval-10s)
               start interval=0s timeout=60 (fs_gfs2-start-interval-0s)
               stop interval=0s timeout=60 (fs_gfs2-stop-interval-0s)
 Resource: service-vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=10.6.18.3
  Operations: monitor interval=5s (service-vip-monitor-interval-5s)
              start interval=0s timeout=20s (service-vip-start-interval-0s)
              stop interval=0s timeout=20s (service-vip-stop-interval-0s)

Stonith Devices:
 Resource: scsi-shooter (class=stonith type=fence_scsi)
  Attributes: devices=/dev/mapper/datavg-datalv pcmk_host_check=static-list pcmk_host_list="eapa-dhcp01 eapa-dhcp02" pcmk_host_map=eapa-dhcp01:1;eapa-dhcp02:2 pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-shooter-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  Resource Sets:
    set dlm-clone clvmd-clone action=start sequential=true setoptions kind=Mandatory
    set clvmd-clone fs_gfs2-clone action=start sequential=true setoptions kind=Mandatory
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  fs_gfs2-clone with clvmd-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1516168043
 no-quorum-policy: freeze

Cluster is running fine:

[root@eapa-dhcp02 data]# pcs status
Cluster name: cluster
Stack: corosync
Current DC: eapa-dhcp01 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Jan 17 17:28:53 2018
Last change: Wed Jan 17 16:47:23 2018 by hacluster via crmd on eapa-dhcp01

2 nodes configured
8 resources configured

Online: [ eapa-dhcp01 eapa-dhcp02 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 service-vip    (ocf::heartbeat:IPaddr2):       Started eapa-dhcp01
 scsi-shooter   (stonith:fence_scsi):   Started eapa-dhcp01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

SCSI-3 persistent reservations (PR):

[root@eapa-dhcp02 ~]# sg_persist -n -i -k -d /dev/mapper/datavg-datalv
  PR generation=0x4, 2 registered reservation keys follow:
    0x976e0000
[root@eapa-dhcp02 ~]# sg_persist -n -i -r -d /dev/mapper/datavg-datalv
  PR generation=0x4, Reservation follows:
    Key=0x976e0000
    scope: LU_SCOPE,  type: Write Exclusive, registrants only
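
As a quick sanity check on this output: the fence_scsi man page describes each node registering its own key with the device, so comparing the number of keys announced in the header with the number of keys actually listed can be useful. A minimal sketch, assuming the `sg_persist -k` output format shown above; the helper name `pr_key_summary` is hypothetical and not part of any package.

```shell
#!/bin/sh
# pr_key_summary: hypothetical helper. Reads `sg_persist -n -i -k` output on
# stdin and prints how many keys the header announces versus how many key
# lines (lines consisting only of a 0x... value) are actually listed.
pr_key_summary() {
    awk '
        /registered reservation key/ {
            # The count is the first purely numeric field in the header line
            for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { announced = $i; break }
        }
        /^[ \t]*0x[0-9a-fA-F]+[ \t]*$/ { listed++ }
        END { printf "announced=%d listed=%d\n", announced, listed }
    '
}

# Example using the output pasted above
pr_key_summary <<'EOF'
  PR generation=0x4, 2 registered reservation keys follow:
    0x976e0000
EOF
# prints: announced=2 listed=1
```

On a live node the same helper could be fed directly, for example `sg_persist -n -i -k -d /dev/mapper/datavg-datalv | pr_key_summary`. A mismatch may simply mean the paste was truncated, so it is a hint rather than a diagnosis.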

However, fencing doesn't work properly. When I try to fence one node:

[root@eapa-dhcp01 ~]# pcs stonith fence eapa-dhcp02
Node: eapa-dhcp02 fenced

Here's what I got in the log:

Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Requesting fencing (on) of node eapa-dhcp02
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]:   notice: Client crmd.4995.419379c1 wants to fence (on) 'eapa-dhcp02' with device '(any)'
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]:   notice: Requesting peer fencing (on) of eapa-dhcp02
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]:   notice: Couldn't find anyone to fence (on) eapa-dhcp02 with any device
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]:    error: Operation on of eapa-dhcp02 by <no-one> for crmd.4995@eapa-dhcp01.77937140: No such device
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Stonith operation 14/6:37:0:fc9e719b-2848-4d3b-ae86-e4311bba8322: No such device (-19)
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Stonith operation 14 for eapa-dhcp02 failed (No such device): aborting transition.
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:  warning: No devices found in cluster to fence eapa-dhcp02, giving up
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Transition aborted: Stonith failed
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:    error: Unfencing of eapa-dhcp02 by <anyone> failed: No such device (-19)
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Transition 37 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
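
One detail worth noting in the log above is that every request is for the action `on` (unfencing), not `off`. A small sketch for pulling the requested action and target node out of such crmd lines; the helper name `fence_actions` is hypothetical, and the message wording is assumed to match the Pacemaker 1.1.16 lines shown here.

```shell
#!/bin/sh
# fence_actions: hypothetical helper. Reads syslog-style lines on stdin and
# prints "<action> <node>" for each crmd "Requesting fencing" message.
fence_actions() {
    sed -n 's/.*Requesting fencing (\([a-z]*\)) of node \([^ ]*\).*/\1 \2/p'
}

# Example using the first line of the log above
printf '%s\n' \
    'Jan 17 17:39:23 eapa-dhcp01 crmd[4995]:   notice: Requesting fencing (on) of node eapa-dhcp02' \
    | fence_actions
# prints: on eapa-dhcp02
```

To scan a whole log, something like `grep crmd /var/log/messages | fence_actions` should work, assuming the cluster logs to `/var/log/messages`.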

I think I have already tried hundreds of stonith configurations, with and without the devices, pcmk_host_map, and pcmk_reboot_action attributes.

I'm using the latest CentOS packages:

[root@eapa-dhcp02 ~]# rpm -qa | grep pacemaker
pacemaker-cli-1.1.16-12.el7.x86_64
pacemaker-libs-1.1.16-12.el7.x86_64
pacemaker-cluster-libs-1.1.16-12.el7.x86_64
pacemaker-1.1.16-12.el7.x86_64
[root@eapa-dhcp02 ~]# rpm -qa | grep fence | grep scsi
fence-agents-scsi-4.0.11-66.el7.x86_64

My block device does support SCSI-3 PR. The VMs are running on ESXi and the hard disk is configured as a Raw Device Mapping.


Does anyone have any idea what's wrong with my setup, or any suggestions? I'm really at a dead end now.

Re: Pacemaker and stonith fencing problem

Post by hunter86_bg » 2018/01/24 02:59:36

Sadly, none of my attempts (2 in total) to set up this type of fencing has ever worked.
As your VMs are on ESXi, you can use VMware-based fencing, or poison pill (a.k.a. sbd devices) with the 'softdog' module (if your VMware version cannot emulate a watchdog device).
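
For the VMware route, a rough dry-run sketch of what creating such a stonith device might look like with the `fence_vmware_soap` agent. It only prints the `pcs` command rather than running it. Every value here (vCenter address, credentials, VM names, the resource name `vmfence`) is a placeholder, and the parameter names should be checked against `pcs stonith describe fence_vmware_soap` on your own system.

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs command instead of executing it.
# All values below are placeholders, not taken from the original setup.
VCENTER="vcenter.example.com"      # vCenter or ESXi host (assumed)
FENCE_USER="fenceuser"             # account allowed to power VMs on/off (assumed)
FENCE_PASS="CHANGEME"              # placeholder password
# Maps cluster node names to VM names as seen by VMware (VM names assumed)
HOST_MAP="eapa-dhcp01:VM_NAME_1;eapa-dhcp02:VM_NAME_2"

vmware_fence_cmd() {
    # ssl_insecure=1 skips certificate verification; drop it if the
    # vCenter certificate is trusted on the cluster nodes.
    printf 'pcs stonith create vmfence fence_vmware_soap ipaddr=%s login=%s passwd=%s ssl=1 ssl_insecure=1 pcmk_host_map="%s"\n' \
        "$VCENTER" "$FENCE_USER" "$FENCE_PASS" "$HOST_MAP"
}

vmware_fence_cmd
```

Review the printed command, substitute real values, and run it by hand on one cluster node once it looks right.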
