autofs server probe timeouts [SOLVED]

jcbollinger
Posts: 4
Joined: 2011/07/29 15:22:19

Post by jcbollinger » 2018/07/16 16:29:50

All my CentOS 6 and CentOS 7 machines recently began manifesting an issue with automounting NFS filesystems exported by a storage system elsewhere in my organization. I present a fuller description elsewhere; here I am focusing on the client side of the problem.

In a nutshell,
  • The server hostname has a large number of distinct resolutions
  • autofs probes each resolution (as it is documented to do), and exactly one of two RPCs it uses for that purpose times out on every resolution of the server hostname
That manifests in the log via a sequence of debug messages like this:
Jul 13 15:48:18 myclient automount[17485]: get_nfs_info: called with host nfs.my.org(10.220.8.68) proto 6 version 0x20
Jul 13 15:48:18 myclient automount[17485]: get_nfs_info: nfs v3 rpc ping time: 0.000290
Jul 13 15:48:18 myclient automount[17485]: get_nfs_info: host nfs.my.org cost 289 weight 0
Jul 13 15:48:18 myclient automount[17485]: get_nfs_info: called with host nfs.my.org(10.220.8.68) proto 17 version 0x20
[... nothing until the beginning of the next probe is logged three seconds later ...]
FWIW, the server is running the Ganesha userspace NFS stack and does not support NFS4. Server administration denies any configuration change within the relevant timeframe, but allows that routine software updates may have been applied. I am confident that no software update or configuration change had been applied on the client side since well before the problem began manifesting.

I can work around the problem by setting use_hostname_for_mounts = "yes" in /etc/autofs.conf, but inasmuch as that skips probing altogether (as I understand it), it seems to give up many of the advantages that a multiheaded NFS server such as ours provides.

Questions:
  • Is there a plausible client-side explanation for the behavior change?
  • Even if not, is there a client-side solution other than turning off probing altogether?
Last edited by jcbollinger on 2018/07/17 16:08:46, edited 1 time in total.

Re: autofs server probe timeouts

Post by jcbollinger » 2018/07/17 16:08:23

It turns out that the "6" and "17" in the log messages are IP transport protocol numbers representing TCP and UDP, respectively. The mount options did not specify a protocol (and NFS traditionally runs over UDP), so autofs was trying both TCP and UDP probes. The latter were timing out.
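For anyone who wants to check the mapping themselves, here is a minimal Python sketch that translates the "proto" numbers from the log into transport names. It relies only on the IPPROTO_* constants in the standard socket module; the helper name proto_name is made up for this illustration and is not part of autofs.

```python
import socket

# IP protocol numbers as exposed by the standard socket module.
PROTO_NAMES = {
    socket.IPPROTO_TCP: "tcp",
    socket.IPPROTO_UDP: "udp",
}

def proto_name(number):
    """Return the transport name for an IP protocol number (hypothetical helper)."""
    return PROTO_NAMES.get(number, "unknown")

# The two values that appear as "proto 6" and "proto 17" in the automount log.
for n in (6, 17):
    print(n, proto_name(n))
```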

The server was always configured to serve NFS only over TCP, but that was not originally a problem. The issue arose when an as-yet uncharacterized change was implemented at the server, or possibly between it and the clients, so that nfs/udp traffic was afterward silently dropped instead of rejected with an ICMP "port unreachable" response. I'm inclined to blame that on a firewall change, but it could also have arisen from an application-layer change on that side.
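The difference between "rejected" and "silently dropped" can be observed from a client with a rough Python sketch like the one below. It is not what autofs does internally, and the function name udp_probe and its one-byte payload are assumptions made for illustration. It sends a single datagram over a connected UDP socket: an ICMP "port unreachable" comes back to the caller as ConnectionRefusedError almost immediately, whereas a dropped datagram only shows up as a timeout. Note that a server that is listening but ignores the malformed payload would also look like "no-reply", so only the "refused" result is conclusive.

```python
import socket

def udp_probe(host, port, timeout=3.0, payload=b"\x00"):
    """Send one UDP datagram and classify the outcome (hypothetical helper).

    Returns "reply" if any datagram comes back, "refused" if the kernel
    reports an ICMP port-unreachable error, or "no-reply" on timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connecting the UDP socket lets the kernel hand ICMP errors back to us.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(1024)
            return "reply"
        except ConnectionRefusedError:
            return "refused"    # rejected with ICMP port unreachable
        except socket.timeout:
            return "no-reply"   # dropped silently, or ignored by the listener
```

Calling something like udp_probe("nfs.my.org", 2049) against each address the hostname resolves to would show which behavior a client currently sees (2049 being the standard NFS port).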

I resolved the issue on the client side by adding "proto=tcp" to the mount options for each affected filesystem.
