NFS: UDP works, TCP times out


Post by NdFeB » 2018/01/18 10:08:36

Hi,

On a fresh install of CentOS 6.9 with all updates, I can mount any NFS export over UDP without a problem, and `rpcinfo -u 10.3.255.234 nfs 3` answers "program 100003 version 3 ready and waiting".

However, over TCP, any mount attempt hangs for 3 minutes and then times out. The same happens with `rpcinfo -t 10.3.255.234 nfs 3`. I have several other servers running CentOS 6.9 with all updates; they mount NFS over TCP without any problem, and both `rpcinfo -t` and `rpcinfo -u` get positive answers. Those servers are not fresh installs.

I can observe the following behaviors on the faulty clients:

1. no layer 4 protocol option: we see that even the UDP attempt times out

Code:

[root@srv-tls-test02 ~]# mount -t nfs 10.3.255.234:/vol/vol_testunix /mnt -o vers=3 -v
mount.nfs: timeout set for Thu Jan 18 10:48:40 2018
mount.nfs: trying text-based options 'vers=3,addr=10.3.255.234'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.3.255.234 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 10.3.255.234 prog 100005 vers 3 prot UDP port 4046
(timeout)

2. with tcp option

Code:

[root@srv-tls-test02 ~]# mount -t nfs 10.3.255.234:/vol/vol_testunix /mnt -o vers=3,tcp -v
mount.nfs: timeout set for Thu Jan 18 10:49:25 2018
mount.nfs: trying text-based options 'vers=3,tcp,addr=10.3.255.234'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 10.3.255.234 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 10.3.255.234 prog 100005 vers 3 prot TCP port 4046
(timeout)

3. with udp option

Code:

[root@srv-tls-test02 ~]# mount -t nfs 10.3.255.234:/vol/vol_testunix /mnt -o vers=3,udp -v
mount.nfs: timeout set for Thu Jan 18 10:49:49 2018
mount.nfs: trying text-based options 'vers=3,udp,addr=10.3.255.234'
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: trying 10.3.255.234 prog 100003 vers 3 prot UDP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 10.3.255.234 prog 100005 vers 3 prot UDP port 4046
10.3.255.234:/vol/vol_testunix on /mnt type nfs (rw,vers=3,udp)


I looked at the traffic. The working client gets an answer from portmapper, then asks to mount "/vol/vol_testunix". If I search for this string in the traffic of the freshly installed server, I don't find it, although the portmapper answer is present.
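
For anyone who wants to repeat this comparison, here is a minimal sketch of the capture. It assumes tcpdump is installed, that the interface is eth0 and that /tmp/nfs-mount.pcap is free to use; none of these come from my actual setup. Port 4046 is the mountd port the filer answers on in the mount traces above. The helper name nfs_capture_filter is made up for illustration.

```shell
# Build a tcpdump filter for the RPC traffic between a client and the filer.
# Ports: 111 (portmapper), 2049 (nfs), 4046 (mountd on the filer, taken from
# the mount -v traces in this thread).
nfs_capture_filter() {
    server=$1
    echo "host $server and (port 111 or port 2049 or port 4046)"
}

# Example usage, as root, on both the working and the faulty client
# (eth0 and the pcap path are assumptions):
#   tcpdump -i eth0 -s 0 -w /tmp/nfs-mount.pcap "$(nfs_capture_filter 10.3.255.234)"
#   ... run the mount in another terminal, then stop the capture ...
#   tcpdump -nn -A -r /tmp/nfs-mount.pcap | grep -c '/vol/vol_testunix'
```

A count of 0 from the last command on the faulty client, against a non-zero count on the working one, would match what I describe above: the export path never goes out on the wire.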

I tried a fresh install of CentOS 6.9 on 2 different servers with different hardware; both time out with NFS over TCP.

This is not a permission problem. If I assign the faulty client's IP address to a working server, I can mount over TCP without any problem.

The NFS server is a NetApp filer, so I don't have full control over it. If I nmap it from the non-working server, I see the NFS and rpcbind ports open.

I suspect the default configuration on the client side. iptables and ip6tables are flushed and disabled, and IPv6 is not used at all. The rpcbind, portmapper, mountd, lockd, statd and nfs services are running. rpcinfo on localhost displays "portmapper", "status" and "mountd" for both tcp and udp.

rpcinfo -p localhost:

Code:

[root@srv-tls-test02 ~]# rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34749  status
    100024    1   tcp  57346  status
    100005    1   udp  51983  mountd
    100005    1   tcp  34221  mountd
    100005    2   udp  51130  mountd
    100005    2   tcp  49610  mountd
    100005    3   udp  60273  mountd
    100005    3   tcp  55271  mountd
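
To compare listings like the one above between a working and a faulty client (or against the filer with `rpcinfo -p 10.3.255.234`), a small summary helps. This is only a sketch; the function name rpc_transports is made up, and it assumes the five-column `rpcinfo -p` layout shown above.

```shell
# Summarise `rpcinfo -p` output read from stdin: one line per service,
# followed by the transports it is registered on (in order of appearance).
rpc_transports() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 {
             key = $5
             if (!(key in seen)) { order[++n] = key; seen[key] = "" }
             # add the transport only if it is not already recorded
             if (index(" " seen[key] " ", " " $3 " ") == 0)
                 seen[key] = (seen[key] == "" ? $3 : seen[key] " " $3)
         }
         END { for (i = 1; i <= n; i++) print order[i] ": " seen[order[i]] }'
}

# Example usage:
#   rpcinfo -p localhost | rpc_transports
```

The header line and any shell prompt lines are skipped because their first field is not a program number.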

ps aux | egrep "nfs|rpc|lock" :

Code:

[root@srv-tls-test02 ~]# ps aux | egrep "nfs|rpc|lock"
root        28  0.0  0.0      0     0 ?        S    10:04   0:00 [kblockd/0]
root        29  0.0  0.0      0     0 ?        S    10:04   0:00 [kblockd/1]
rpcuser   1380  0.0  0.0  23352  1368 ?        Ss   10:04   0:00 rpc.statd
root      1468  0.0  0.0      0     0 ?        S    10:04   0:00 [rpciod/0]
root      1469  0.0  0.0      0     0 ?        S    10:04   0:00 [rpciod/1]
root      1683  0.0  0.0      0     0 ?        S    10:07   0:00 [nfsiod]
root      1919  0.0  0.0  21672   992 ?        Ss   10:21   0:00 rpc.mountd
root      1925  0.0  0.0      0     0 ?        S    10:21   0:00 [lockd]
root      1926  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd4]
root      1927  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd4_callbacks]
root      1928  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1929  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1930  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1931  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1932  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1933  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1934  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1935  0.0  0.0      0     0 ?        S    10:21   0:00 [nfsd]
root      1966  0.0  0.0  25172   632 ?        Ss   10:21   0:00 rpc.idmapd
rpc       2021  0.0  0.0  18980   932 ?        Ss   10:22   0:00 rpcbind
rpcuser   2042  0.0  0.0  23352  1364 ?        Ss   10:39   0:00 rpc.statd --no-notify
root      2225  0.0  0.0  21672   996 ?        Ss   11:15   0:00 rpc.mountd
root      2239  0.0  0.0 103324   900 pts/1    S+   11:25   0:00 egrep nfs|rpc|lock

sysinfo:

Code:

[root@srv-tls-test02 ~]# uname -a
Linux srv-tls-test02 2.6.32-696.18.7.el6.x86_64 #1 SMP Thu Jan 4 17:31:22 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Code:

[root@srv-tls-test02 ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)


Does anyone have an idea? I am really stuck here. Any hint will be appreciated. Thanks!

EDIT: added information about running services.

EDIT2: added sysinfo


Re: NFS: UDP works, TCP times out

Post by NdFeB » 2018/01/31 08:40:00

Hello,

I'm going to answer my own question.

Neither our CentOS host nor our NetApp server was the problem. We have an HP switch (HP1820-48G J9981A) with a security feature that protects against attacks using invalid TCP flags. I don't have time to analyze my captures further right now, so I still don't know exactly what triggers this protection.

All queries reached the NetApp server and all authorizations were granted, but the last sync query from the client was then dropped by the switch.
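
For anyone digging through a similar capture, here is a rough sketch for spotting odd flag combinations. Which combinations this switch actually rejects is not something I know; the three checked below (no flags, SYN+FIN, SYN+RST) are just commonly cited examples and are an assumption. The function names are made up. The input is the TCP flags value in decimal or 0x-prefixed hex, for instance as printed by `tshark -T fields -e tcp.flags`.

```shell
# Decode a TCP flags value into flag names, lowest bit first.
tcp_flag_names() {
    value=$(( $1 ))
    out=""
    bit=1
    for name in FIN SYN RST PSH ACK URG ECE CWR; do
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "${out:-NONE}"
}

# Return 0 if the flags value is one of the assumed "invalid" combinations.
tcp_flags_suspicious() {
    value=$(( $1 ))
    if [ "$value" -eq 0 ]; then return 0; fi                # no flags at all
    if [ $(( value & 0x03 )) -eq 3 ]; then return 0; fi     # SYN+FIN
    if [ $(( value & 0x06 )) -eq 6 ]; then return 0; fi     # SYN+RST
    return 1
}

# Example usage:
#   tcp_flag_names 0x12     # prints SYN,ACK
#   tcp_flags_suspicious 0x03 && echo "SYN+FIN seen"
```

Comparing captures taken on both sides of the switch should then show which packet goes missing, whatever its flags turn out to be.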

I cannot post a pcap capture, as it contains sensitive information about my company. If I find time to analyze it in my free time, I will add more information to this thread.

Regards.