5.9 NFSv3 bug?

jeneral
Posts: 9
Joined: 2013/01/22 00:18:10

5.9 NFSv3 bug?

Post by jeneral » 2013/01/22 00:44:35

I'm connecting FreeBSD 8.3 NFS clients to a CentOS 5.9 NFS server using NFSv3. It worked fine until the upgrade to the latest 5.9 kernel (2.6.18-348.el5). The NFS clients now hang on certain directories: I cannot even 'ls' those directories on the client, although most directories still work. Reverting to the previous kernel (2.6.18-308.24.1.el5) works just fine. What changed in kernel 2.6.18-348.el5 that might affect NFS? Thanks for any suggestions!

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

Re: 5.9 NFSv3 bug?

Post by AlanBartlett » 2013/01/22 02:55:20

Welcome to the [i]CentOS[/i] fora.

[quote]
What changed in kernel 2.6.18-348.el5 that might affect the NFS?
[/quote]
You can check for yourself with:

[code]
rpm -q --changelog kernel-2.6.18-348.el5 | less
[/code]

jeneral
Posts: 9
Joined: 2013/01/22 00:18:10

Re: 5.9 NFSv3 bug?

Post by jeneral » 2013/01/22 15:34:02

That does reveal:
- [fs] nfsd: don't fail unchecked creates of non-special files (J. Bruce Fields) [848666]
- [fs] nfsd4: return nfserr_symlink on v4 OPEN of non-regular file (J. Bruce Fields) [848666]
- [fs] nfs: nfs_d_automount update caller path after do_add_mount (Carlos Maiolino) [834379]
- [fs] nfs: Don't allow multiple mounts on same mntpnt with -o noac (Sachin Prabhu) [839753]
- [fs] nfsd: safer initialization order in find_file() (Harshula Jayasuriya) [800758]
- [fs] nfsd4: remove use of mutex for file_hashtable (Harshula Jayasuriya) [800758]
- [fs] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) (Eric Sandeen) [784191]
- [fs] nfsd: rename 'int access' to 'int may_flags' in nfsd_open() (Eric Sandeen) [784191]
- [fs] nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir() (Eric Sandeen) [784191]
- [fs] nfs: Fix the return value in nfs_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Optimise nfs_vm_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Add debugging facility for nfs aops (Niels de Vos) [818650]
- [fs] nfs: Add the helper nfs_vm_page_mkwrite (Niels de Vos) [818650]
- [fs] knfsd: fix an NFSD bug with full size non-page-aligned reads (J. Bruce Fields) [814626]
- [fs] nfs: allow high priority COMMITs to bypass inode commit lock (Jeff Layton) [773777]
- [fs] nfs: don't skip COMMITs if system under is mem pressure (Jeff Layton) [773777]
- [fs] nfs: nfs_fhget should wait on I_NEW instead of I_LOCK (Sachin Prabhu) [785062]

However, I'm not experienced enough with NFS to know which of these could be causing the problem or, more importantly, what a workaround might be. I guess what matters most is seeing whether others run into this problem. My current configuration has been running for years, so this is either a regression or a new feature that requires NFS configuration options I haven't set (and at this point I wouldn't have a clue which ones). Two different BSD servers hang when accessing the NFS file system, which blows things up. So to me this is a critical bug.

Thanks!

toracat
Site Admin
Posts: 7518
Joined: 2006/09/03 16:37:24
Location: California, US

5.9 NFSv3 bug?

Post by toracat » 2013/01/22 23:20:59

Enable NFS debugging and see if you can catch any useful messages:

echo 1 > /proc/sys/sunrpc/nfs_debug    (turn debugging ON)

echo 0 > /proc/sys/sunrpc/nfs_debug    (turn debugging OFF)

Then check /var/log/messages.

granroth
Posts: 4
Joined: 2013/01/23 01:38:58

Re: 5.9 NFSv3 bug?

Post by granroth » 2013/01/23 01:46:50

I'm seeing similar behavior with SPARC/Solaris 7, 8, & 10 clients connecting with NFSv3. Programs compiled with large file support don't appear to have a problem, but older 32-bit programs compiled without large file support fail to read directories.

jeneral
Posts: 9
Joined: 2013/01/22 00:18:10

Re: 5.9 NFSv3 bug?

Post by jeneral » 2013/01/24 01:12:02

It's a bad bug! I saw no output after enabling debugging, most likely because it doesn't think there are any errors. However, this is why things were hanging: in the directories that have problems, a listing shows multiple files with the same filename. For example, one of the directories *should* show 564 files. With the new kernel it shows 3,124,293!!! And if you 'ls' the directory, it lists all 3M+ files, repeating the same file multiple times. I was also using the NFS share for email, which explains why I was getting multiple copies of emails coming in. In fact, one of my clients texted "per outlook...have one email that appears to be sending 1.5M times. Have received 110,000 copies." Not good.

I'm using NFS in an HA environment, so I can switch between NFS servers. With the "good" kernel the old duplicate files still exist (I'm sure because of caching). A umount/mount fixes that.

The NFS servers are 64-bit. The two tested clients (FreeBSD) are a 64-bit and a 32-bit version.

Thanks!

toracat
Site Admin
Posts: 7518
Joined: 2006/09/03 16:37:24
Location: California, US

Re: 5.9 NFSv3 bug?

Post by toracat » 2013/01/24 04:46:37

You might have hit this bug reported upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=739222

[EDIT] The BZ is for EL6, so may not be applicable here.

granroth
Posts: 4
Joined: 2013/01/23 01:38:58

Re: 5.9 NFSv3 bug?

Post by granroth » 2013/01/31 20:29:49

My problem may not be exactly the same. However, kernels newer than 2.6.18-308.24.1.el5 on CentOS 5.9, or newer than 2.6.32-220.23.1.el6.x86_64 on CentOS 6.3, when providing NFSv3 service to Solaris clients, prevent client programs built without large file support from reading directories, due to a size type conflict. Here is demonstration code:

[code]
/* A program to demonstrate the differences between large file support and
   the lack of large file support in client programs on Solaris when reading
   NFSv3 directories hosted on CentOS 5.9 and 6.3 servers running current
   kernels.

   To use, make two versions of this program on Solaris, one with:

     cc -xc99 -g -errwarn test_largefile.c -o test_smallfile

   and the other with:

     cc -xc99 -g -errwarn $(getconf LFS_CFLAGS) test_largefile.c -o test_largefile

   Then you can try to read directories with each and see which one fails
   (if either).
*/

#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>


int main(int argc, char** argv){

    int nErr = 0;
    DIR* pDir = NULL;
    struct dirent* pEnt = NULL;

    /* Print usage when no directory is given or help is requested */
    if((argc < 2)||(strcmp(argv[1], "-h")==0)||(strcmp(argv[1], "--help")==0)){

        printf(
            "A program to test effects of compiling in large file support.\n"
            " Usage: %s DIR_TO_READ\n"
            "If the program works you will see a list of the directory entries in\n"
            "DIR_TO_READ, else an error message will be printed\n", argv[0]);

        if(argc == 1) return 3;
        else return 0;
    }

    errno = 0;
    if( (pDir = opendir(argv[1])) == NULL){
        nErr = errno;
        fprintf(stderr, "Couldn't open %s, reason: %s\n", argv[1], strerror(nErr));
        return 4;
    }

    printf("%s\n", argv[1]);

    while(true){
        /* readdir() returns NULL both at end of directory and on error,
           so clear errno first to tell the two cases apart */
        errno = 0;
        pEnt = readdir(pDir);
        if(pEnt == NULL){
            if(errno != 0){
                nErr = errno;
                fprintf(stderr, "Couldn't read directory entry from %s, reason: %s\n",
                        argv[1], strerror(nErr));
                return 5;
            }
            break;  /* normal end of directory */
        }

        printf(" |-%s\n", pEnt->d_name);
    }

    errno = 0;
    if(closedir(pDir) != 0){
        nErr = errno;
        fprintf(stderr, "Couldn't close %s, reason: %s\n", argv[1], strerror(nErr));
        return 6;
    }

    return 0;
}
[/code]

Is there a way to generate a binary on CentOS without large file support, currently? That would make a handier demonstration . . .

I've been holding off on kernel updates, but this is a show-stopper for our half a petabyte of space flight data.

toracat
Site Admin
Posts: 7518
Joined: 2006/09/03 16:37:24
Location: California, US

Re: 5.9 NFSv3 bug?

Post by toracat » 2013/02/09 00:04:15

For NFS issues with CentOS 5.9 kernels, please see:

http://bugs.centos.org/view.php?id=6241

https://access.redhat.com/knowledge/solutions/306063

jeneral
Posts: 9
Joined: 2013/01/22 00:18:10

Re: 5.9 NFSv3 bug?

Post by jeneral » 2013/02/13 17:07:55

Just to clarify:
My server is CentOS 5.9, 32-bit.

My clients are FreeBSD 8.3, both 32-bit AND 64-bit versions. If I understand correctly, they have large file support (LFS) by default. It fails on both.

Perhaps this is NOT an LFS issue nor a 32-bit client issue?
