I'm trying to set up a 2-node cluster for a GFS2 shared filesystem with the fence_scsi fencing agent. My configuration is as below:
Code:
[root@eapa-dhcp01 ~]# pcs config
Cluster Name: cluster
Corosync Nodes:
eapa-dhcp01 eapa-dhcp02
Pacemaker Nodes:
eapa-dhcp01 eapa-dhcp02
Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Clone: fs_gfs2-clone
  Meta Attrs: interleave=true
  Resource: fs_gfs2 (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/datavg/datalv directory=/mnt/data fstype=gfs2 options=noatime,nodiratime
   Operations: monitor interval=10s on-fail=fence (fs_gfs2-monitor-interval-10s)
               start interval=0s timeout=60 (fs_gfs2-start-interval-0s)
               stop interval=0s timeout=60 (fs_gfs2-stop-interval-0s)
 Resource: service-vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=10.6.18.3
  Operations: monitor interval=5s (service-vip-monitor-interval-5s)
              start interval=0s timeout=20s (service-vip-start-interval-0s)
              stop interval=0s timeout=20s (service-vip-stop-interval-0s)

Stonith Devices:
 Resource: scsi-shooter (class=stonith type=fence_scsi)
  Attributes: devices=/dev/mapper/datavg-datalv pcmk_host_check=static-list pcmk_host_list="eapa-dhcp01 eapa-dhcp02" pcmk_host_map=eapa-dhcp01:1;eapa-dhcp02:2 pcmk_monitor_action=metadata pcmk_reboot_action=off
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-shooter-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  Resource Sets:
    set dlm-clone clvmd-clone action=start sequential=true setoptions kind=Mandatory
    set clvmd-clone fs_gfs2-clone action=start sequential=true setoptions kind=Mandatory
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  fs_gfs2-clone with clvmd-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster
 dc-version: 1.1.16-12.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1516168043
 no-quorum-policy: freeze
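For reference, here is roughly how the stonith device was created. This is reconstructed from the pcs config dump above, not the exact command I originally ran, so the syntax may differ slightly:

```shell
# Reconstructed from the running config (attribute values as shown
# in "pcs config" above; not the exact original command history):
pcs stonith create scsi-shooter fence_scsi \
    devices=/dev/mapper/datavg-datalv \
    pcmk_host_check=static-list \
    pcmk_host_list="eapa-dhcp01 eapa-dhcp02" \
    pcmk_host_map="eapa-dhcp01:1;eapa-dhcp02:2" \
    pcmk_monitor_action=metadata \
    pcmk_reboot_action=off \
    meta provides=unfencing
```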
Code:
[root@eapa-dhcp02 data]# pcs status
Cluster name: cluster
Stack: corosync
Current DC: eapa-dhcp01 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Jan 17 17:28:53 2018
Last change: Wed Jan 17 16:47:23 2018 by hacluster via crmd on eapa-dhcp01
2 nodes configured
8 resources configured
Online: [ eapa-dhcp01 eapa-dhcp02 ]
Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
     Started: [ eapa-dhcp01 eapa-dhcp02 ]
 service-vip    (ocf::heartbeat:IPaddr2):    Started eapa-dhcp01
 scsi-shooter   (stonith:fence_scsi):        Started eapa-dhcp01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
Code:
[root@eapa-dhcp02 ~]# sg_persist -n -i -k -d /dev/mapper/datavg-datalv
PR generation=0x4, 2 registered reservation keys follow:
0x976e0000
[root@eapa-dhcp02 ~]# sg_persist -n -i -r -d /dev/mapper/datavg-datalv
PR generation=0x4, Reservation follows:
Key=0x976e0000
scope: LU_SCOPE, type: Write Exclusive, registrants only
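So the SCSI-3 persistent reservations themselves look sane to me. For what it's worth, the agent can also be invoked by hand outside Pacemaker to see whether it can talk to the device at all (flags as I understand them from the fence_scsi man page, quoted from memory, so please double-check):

```shell
# Query the agent directly, bypassing stonith-ng
# (--devices / --plug / --action flags assumed from the man page):
fence_scsi --devices=/dev/mapper/datavg-datalv \
           --plug=eapa-dhcp02 \
           --action=status
```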
But fencing doesn't work properly. When I try to fence one node:
[root@eapa-dhcp01 ~]# pcs stonith fence eapa-dhcp02
Node: eapa-dhcp02 fenced
Here's what I got in the logs:
Code:
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: Requesting fencing (on) of node eapa-dhcp02
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]: notice: Client crmd.4995.419379c1 wants to fence (on) 'eapa-dhcp02' with device '(any)'
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]: notice: Requesting peer fencing (on) of eapa-dhcp02
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]: notice: Couldn't find anyone to fence (on) eapa-dhcp02 with any device
Jan 17 17:39:23 eapa-dhcp01 stonith-ng[4991]: error: Operation on of eapa-dhcp02 by <no-one> for crmd.4995@eapa-dhcp01.77937140: No such device
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: Stonith operation 14/6:37:0:fc9e719b-2848-4d3b-ae86-e4311bba8322: No such device (-19)
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: Stonith operation 14 for eapa-dhcp02 failed (No such device): aborting transition.
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: warning: No devices found in cluster to fence eapa-dhcp02, giving up
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: Transition aborted: Stonith failed
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: error: Unfencing of eapa-dhcp02 by <anyone> failed: No such device (-19)
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: Transition 37 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete
Jan 17 17:39:23 eapa-dhcp01 crmd[4995]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
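The "Couldn't find anyone to fence (on)" message makes me think stonith-ng doesn't consider scsi-shooter eligible to unfence eapa-dhcp02. As far as I know, stonith-ng can be asked directly which devices it believes can fence a given node (stonith_admin flags as I understand them; I'm not certain of the exact syntax on this version):

```shell
# List the devices stonith-ng thinks can fence the target node,
# and check the status of the named device (flags from memory):
stonith_admin --list eapa-dhcp02
stonith_admin --query scsi-shooter
```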
I think I have already tried hundreds of stonith configurations: with and without the devices, pcmk_host_map, and pcmk_reboot_action attributes.
I'm using the latest CentOS packages:
Code:
[root@eapa-dhcp02 ~]# rpm -qa | grep pacemaker
pacemaker-cli-1.1.16-12.el7.x86_64
pacemaker-libs-1.1.16-12.el7.x86_64
pacemaker-cluster-libs-1.1.16-12.el7.x86_64
pacemaker-1.1.16-12.el7.x86_64
[root@eapa-dhcp02 ~]# rpm -qa | grep fence | grep scsi
fence-agents-scsi-4.0.11-66.el7.x86_64
Does anyone have any idea what's wrong with my setup, or any suggestions? I'm really at a dead end now.