NFS Cluster getting weekly errors

Post by **hspinks** » 2018/09/06 03:07:59

Greetings All,

I am running an NFS server in a Linux cluster and I am getting the following error from Pacemaker:
pcs status
Cluster name: nfs_cluster1
Stack: corosync
Current DC: lastsfile03 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Sep 5 22:02:11 2018
Last change: Tue Aug 28 22:01:15 2018 by hacluster via crmd on lastsfile03

2 nodes configured
7 resources configured

Online: [ lastsfile02 lastsfile03 ]

Full list of resources:

disk_fencing1 (stonith:fence_scsi): Started lastsfile02
Resource Group: nfsgroup
    my_lvm (ocf::heartbeat:LVM): Started lastsfile02
    nfsshare (ocf::heartbeat:Filesystem): Started lastsfile02
    nfs-daemon (ocf::heartbeat:nfsserver): Started lastsfile02
    nfs-root (ocf::heartbeat:exportfs): Started lastsfile02
    nfs_ip (ocf::heartbeat:IPaddr2): Started lastsfile02
    nfs-notify (ocf::heartbeat:nfsnotify): Started lastsfile02

Failed Actions:
* nfs-daemon_monitor_10000 on lastsfile03 'not installed' (5): call=72, status=complete, exitreason='No init script or systemd unit file detected for nfs server',
last-rc-change='Wed Sep 5 01:00:07 2018', queued=0ms, exec=0ms
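The `'not installed' (5)` part of the Failed Actions line is the resource agent's exit code. The OCF standard calls code 5 `OCF_ERR_INSTALLED`, and Pacemaker treats it as a hard error: the resource is kept off the node where it failed until the failure is cleared (for example with `pcs resource cleanup nfs-daemon`), which fits the failover described below. A small lookup helper for reading these lines, a sketch that only knows the standard codes 0-9 (the function name `ocf_rc_name` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: map an OCF resource-agent exit code (the number in
# parentheses in "Failed Actions") to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}
```

For example, `ocf_rc_name 5` prints `OCF_ERR_INSTALLED`.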

This issue also caused the cluster to fail over. I get this error about every other week. The nodes run in a Microsoft Hyper-V environment on a Microsoft failover cluster, using an EMC SAN storage array for storage.

Has anyone seen this error or know what may be causing it or a solution?

Post by **hunter86_bg** » 2018/09/06 19:50:12

Most probably a missing package is causing it. Of course, you need to debug the resource agent script and check what's going on, which means stopping your NFS service, so some downtime is inevitable.
There are 2 options:
1. Debug the resource by:
- migrate the resource group to the problematic server
- stop only the nfs-daemon resource
- start the resource with "debug-start" and check the output
2. If option 1 doesn't give enough clues, you can get extra debug output via:
- migrate the resource group to the problematic server
- stop only the nfs-daemon resource
- follow the procedure defined here
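Option 1 could look roughly like the sketch below, assuming the pcs 0.9 command set shipped with CentOS 7 (`resource move`, `disable`, `debug-start --full`, `debug-stop`, `enable`, `clear`). The helper names `run`, `debug_nfs_daemon` and `restore_nfs_daemon` are hypothetical, and `DRY_RUN=1` makes them only print the commands so you can review them before a maintenance window. Note that disabling `nfs-daemon` also stops the group members listed after it.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is empty or unset.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Option 1: move the group to the node that reported the failure, stop
# only nfs-daemon, then run the agent's start action by hand with full
# debug output.
debug_nfs_daemon() {
    node="$1"   # e.g. lastsfile03, where the monitor failed
    run pcs resource move nfsgroup "$node" --wait
    run pcs resource disable nfs-daemon --wait
    run pcs resource debug-start nfs-daemon --full
}

# Afterwards: stop the hand-started instance, give nfs-daemon back to
# the cluster and remove the location constraint that "move" created.
restore_nfs_daemon() {
    run pcs resource debug-stop nfs-daemon
    run pcs resource enable nfs-daemon --wait
    run pcs resource clear nfsgroup
}
```

Usage: `DRY_RUN=1 debug_nfs_daemon lastsfile03` to preview, then the same call without `DRY_RUN` to execute, and `restore_nfs_daemon` when you are done.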

Just a side note: why do you start the IP after the NFS server and not before it?
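If you do want the IP to come up before the NFS server, one way is to take `nfs_ip` out of the group and add it back in front of `nfs-daemon`. This is a sketch assuming the `pcs resource group remove` and `pcs resource group add ... --before` syntax of pcs 0.9 on CentOS 7; `run` and `move_ip_before_nfs` are hypothetical helpers, and `DRY_RUN=1` only prints the commands. Because a group stops its members in reverse order, the IP would then also be removed after the NFS server stops.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is empty or unset.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Re-insert nfs_ip into nfsgroup directly before nfs-daemon.
move_ip_before_nfs() {
    run pcs resource group remove nfsgroup nfs_ip
    run pcs resource group add nfsgroup nfs_ip --before nfs-daemon
}
```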

Post by **TrevorH** » 2018/09/06 19:54:24

$ rpm -qf /usr/lib/systemd/system/nfs-server.service
nfs-utils-1.3.0-0.54.el7.x86_64

Is that installed?
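To answer that on each node, and to confirm the unit file from the `rpm -qf` output above is actually on disk, a minimal check could look like this. `check_nfs_unit` and its optional root-directory argument (handy for testing against a copied tree) are hypothetical; the path is the one shown above.

```shell
#!/bin/sh
# Unit file path taken from the rpm -qf output above.
NFS_UNIT=/usr/lib/systemd/system/nfs-server.service

# Report whether the nfs-server unit file exists under a root directory
# (default: the real root). Returns 1 if it is missing.
check_nfs_unit() {
    root="${1:-}"
    if [ -f "$root$NFS_UNIT" ]; then
        echo "present: $root$NFS_UNIT"
    else
        echo "missing: $root$NFS_UNIT"
        return 1
    fi
}
```

Run `check_nfs_unit` on both nodes, together with `rpm -q nfs-utils` and `rpm -V nfs-utils`; the latter prints nothing when the package's files match the RPM database.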

Post by **hspinks** » 2018/11/14 11:32:46

I am still getting this error. I am running CentOS 7 in a Windows Hyper-V environment. I could reduce downtime if I had a list of the required packages. I do have nfs-utils-1.3.0-0.54.el7.x86_64 installed. Any help is much appreciated!

Post by **hunter86_bg** » 2018/11/14 18:42:30

You may enable tracing of the resources in order to get more info.
Another approach is to enable debugging in the script itself (I think it was somewhere in /usr/lib/ocf).
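Resource tracing is usually switched on with the `trace_ra=1` attribute, which the shell library used by the heartbeat agents reads; the trace output then goes to the agent trace directory, by default under `/var/lib/heartbeat/trace_ra/` (assumed for this setup). A sketch, assuming `pcs resource update` accepts the attribute as-is (some pcs versions may ask for `--force`); `run`, `enable_ra_trace` and `disable_ra_trace` are hypothetical helpers, and `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is empty or unset.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Turn on agent tracing for a resource (default: nfs-daemon).
enable_ra_trace() {
    run pcs resource update "${1:-nfs-daemon}" trace_ra=1
}

# An empty value removes the attribute again; traces accumulate on every
# monitor call, so switch this off once the failure has been captured.
disable_ra_trace() {
    run pcs resource update "${1:-nfs-daemon}" trace_ra=
}
```

Leaving tracing on until the next biweekly failure should capture the trace of the failing monitor call, though it writes a file for each action in the meantime.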
The easiest check is to run

rpm -qa | sort

on both nodes and inspect any differences in the installed packages.
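Comparing the two listings by eye is tedious; `comm -3` prints only the lines that are not common to both files. A minimal sketch, where `pkg_diff` is a hypothetical helper that re-sorts both files in the C locale (as `comm` requires) and labels each entry with the file it came from:

```shell
#!/bin/sh
# Print packages present in only one of two "rpm -qa" listings:
# "<" = only in the first file, ">" = only in the second.
pkg_diff() {
    a=$(mktemp) b=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    # comm -3 drops common lines; lines unique to file 2 start with a tab.
    LC_ALL=C comm -3 "$a" "$b" |
        awk -F'\t' '{ if ($1 == "") print "> " $2; else print "< " $1 }'
    rm -f "$a" "$b"
}
```

Usage: collect the listings with `ssh lastsfile02 'rpm -qa' > pkgs.lastsfile02` and the same for lastsfile03, then run `pkg_diff pkgs.lastsfile02 pkgs.lastsfile03`. Empty output means both nodes have identical package sets.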
