
5.9 NFSv3 bug?

Posted: 2013/01/22 00:44:35
by jeneral
I'm connecting FreeBSD 8.3 NFS clients to a CentOS 5.9 NFS server using NFSv3. It worked fine until the upgrade to the latest kernel in 5.9 (2.6.18-348.el5). The NFS clients now hang on certain directories (I cannot even 'ls' those directories on the client, though many/most directories work). Reverting to the previous kernel (2.6.18-308.24.1.el5) works just fine. What changed in kernel 2.6.18-348.el5 that might affect NFS? Thanks for any suggestions!

Re: 5.9 NFSv3 bug?

Posted: 2013/01/22 02:55:20
by AlanBartlett
Welcome to the [i]CentOS[/i] fora.

[quote]
What changed in kernel 2.6.18-348.el5 that might affect the NFS?
[/quote]
You can check for yourself by --

[code]
[b]rpm -q --changelog kernel-2.6.18-348.el5 | less[/b]
[/code]

Re: 5.9 NFSv3 bug?

Posted: 2013/01/22 15:34:02
by jeneral
That does reveal:
- [fs] nfsd: don't fail unchecked creates of non-special files (J. Bruce Fields) [848666]
- [fs] nfsd4: return nfserr_symlink on v4 OPEN of non-regular file (J. Bruce Fields) [848666]
- [fs] nfs: nfs_d_automount update caller path after do_add_mount (Carlos Maiolino) [834379]
- [fs] nfs: Don't allow multiple mounts on same mntpnt with -o noac (Sachin Prabhu) [839753]
- [fs] nfsd: safer initialization order in find_file() (Harshula Jayasuriya) [800758]
- [fs] nfsd4: remove use of mutex for file_hashtable (Harshula Jayasuriya) [800758]
- [fs] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) (Eric Sandeen) [784191]
- [fs] nfsd: rename 'int access' to 'int may_flags' in nfsd_open() (Eric Sandeen) [784191]
- [fs] nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir() (Eric Sandeen) [784191]
- [fs] nfs: Fix the return value in nfs_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Optimise nfs_vm_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Add debugging facility for nfs aops (Niels de Vos) [818650]
- [fs] nfs: Add the helper nfs_vm_page_mkwrite (Niels de Vos) [818650]
- [fs] knfsd: fix an NFSD bug with full size non-page-aligned reads (J. Bruce Fields) [814626]
- [fs] nfs: allow high priority COMMITs to bypass inode commit lock (Jeff Layton) [773777]
- [fs] nfs: don't skip COMMITs if system under is mem pressure (Jeff Layton) [773777]
- [fs] nfs: nfs_fhget should wait on I_NEW instead of I_LOCK (Sachin Prabhu) [785062]

However, I'm not experienced enough with NFS to know what could be causing the problem or, more importantly, what a work-around might be. I guess what's most important is to see whether others run into this problem. My current configuration has been running for years, so it's either a regression or a new feature that requires some NFS configuration option I need to set (and at this point I wouldn't have a clue which one). Two different BSD servers hang when accessing the NFS file system, which blows things up. So to me this is a critical bug.

Thanks!

Re: 5.9 NFSv3 bug?

Posted: 2013/01/22 23:20:59
by toracat
Enable NFS debugging and see if you can catch useful messages:

[code]
echo 1 > /proc/sys/sunrpc/nfs_debug    (turn debugging ON)

echo 0 > /proc/sys/sunrpc/nfs_debug    (turn debugging OFF)
[/code]

Then check /var/log/messages.
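If the client-side log stays quiet, the server has its own switch in the same sunrpc directory (a sketch; run as root on the CentOS server, and note the value is really a bitmask, with 1 being the lowest flag):

```shell
# Server-side (nfsd) debugging on the CentOS machine; same idea as nfs_debug.
echo 1 > /proc/sys/sunrpc/nfsd_debug    # turn nfsd debugging ON
# ... reproduce the hang from a client, then:
echo 0 > /proc/sys/sunrpc/nfsd_debug    # turn nfsd debugging OFF
```

The messages land in the server's /var/log/messages as well.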

Re: 5.9 NFSv3 bug?

Posted: 2013/01/23 01:46:50
by granroth
I'm seeing similar behavior with SPARC/Solaris 7, 8, & 10 clients connecting with NFSv3. Programs compiled with large file support don't appear to have a problem, but older 32-bit programs compiled without large file support fail to read directories.

Re: 5.9 NFSv3 bug?

Posted: 2013/01/24 01:12:02
by jeneral
It's a bad bug! I saw no output from enabling the debug, most likely because it doesn't think there are any errors. However, this is why things were hanging: on the directories that have problems, it sees multiple files with the same filename. For example, one of the directories *should* show 564 files. With the new kernel it shows 3,124,293!!! And if you 'ls' the directory it lists all 3M+ files, repeating the same file multiple times. I was also using the NFS mount for email, which explains why I was getting multiple copies of emails coming in. In fact, one of my clients texted "per outlook...have one email that appears to be sending 1.5M times. Have received 110,000 copies." Not good.
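A quick way to spot this kind of duplication from any client shell (a sketch; DIR is a placeholder for the problem directory on the mount) is to compare the total entry count against the unique-name count:

```shell
DIR="${DIR:-.}"    # placeholder: point this at the problem directory
total=$(ls -a "$DIR" | wc -l)
unique=$(ls -a "$DIR" | sort -u | wc -l)
echo "entries: $total, unique names: $unique"
# show any names that come back more than once:
ls -a "$DIR" | sort | uniq -d | head
```

On a healthy directory the two counts match; if readdir is looping, the total will be wildly larger.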

I'm using NFS in an HA environment, so I can switch between NFS servers. With the "good" kernel the old duplicate files still exist (I'm sure because of caching). A umount/mount fixes that.

The NFS servers are 64-bit. The two tested clients (FreeBSD) are a 64-bit and a 32-bit version.

Thanks!

Re: 5.9 NFSv3 bug?

Posted: 2013/01/24 04:46:37
by toracat
You might have hit this bug reported upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=739222

[EDIT] The BZ is for EL6, so may not be applicable here.

Re: 5.9 NFSv3 bug?

Posted: 2013/01/31 20:29:49
by granroth
My problem may not be exactly the same. However, with kernels newer than 2.6.18-308.24.1.el5 on CentOS 5.9, or newer than 2.6.32-220.23.1.el6.x86_64 on CentOS 6.3, providing NFSv3 services to Solaris clients, client programs compiled without large-file support cannot read directories, due to a size type conflict. Here is demonstration code:

[code]
/* A program to demonstrate the differences between large file support and
 * the lack of large file support in client programs on Solaris when reading
 * NFSv3 directories hosted on CentOS 5.9 and 6.3 servers running current
 * kernels.
 *
 * To use, make two versions of this program on Solaris, one with:
 *
 *   cc -xc99 -g -errwarn test_largefile.c -o test_smallfile
 *
 * and the other with:
 *
 *   cc -xc99 -g -errwarn $(getconf LFS_CFLAGS) test_largefile.c -o test_largefile
 *
 * Then you can try to read directories with each and see which one fails
 * (if either).
 */

#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>

int main(int argc, char** argv){

    int nErr = 0;
    DIR* pDir = NULL;
    struct dirent* pEnt = NULL;

    if((argc < 2)||(strcmp(argv[1], "-h")==0)||(strcmp(argv[1], "--help")==0)){

        printf(
            "A program to test effects of compiling in large file support.\n"
            " Usage: %s DIR_TO_READ\n"
            "If the program works you will see a list of the directory entries in\n"
            "DIR_TO_READ, else an error message will be printed\n", argv[0]);

        if(argc == 1) return 3;
        else return 0;
    }

    errno = 0;
    if( (pDir = opendir(argv[1])) == NULL){
        nErr = errno;
        fprintf(stderr, "Couldn't open %s, reason: %s\n", argv[1], strerror(nErr));
        return 4;
    }

    printf("%s\n", argv[1]);

    while(true){
        errno = 0;
        pEnt = readdir(pDir);
        if(pEnt == NULL){
            if(errno != 0){
                nErr = errno;
                fprintf(stderr, "Couldn't read directory entry from %s, reason: %s\n",
                        argv[1], strerror(nErr));
                return 5;
            }
            break;
        }

        printf(" |-%s\n", pEnt->d_name);
    }

    errno = 0;
    if(closedir(pDir) != 0){
        nErr = errno;
        fprintf(stderr, "Couldn't close %s, reason: %s\n", argv[1], strerror(nErr));
        return 6;
    }

    return 0;
}
[/code]

Is there currently a way to generate a binary on CentOS without large file support? That would make a handier demonstration . . .

I've been holding off on kernel updates, but this is a show-stopper for our half a petabyte of space flight data.

Re: 5.9 NFSv3 bug?

Posted: 2013/02/09 00:04:15
by toracat
For NFS issues with CentOS 5.9 kernels, please see:

http://bugs.centos.org/view.php?id=6241

https://access.redhat.com/knowledge/solutions/306063

Re: 5.9 NFSv3 bug?

Posted: 2013/02/13 17:07:55
by jeneral
Just to clarify:
My server is CentOS 5.9, 32-bit.

My clients are FreeBSD 8.3, 32- AND 64-bit versions. If I understand correctly, they have LFS by default. It fails on both.

Perhaps this is NOT an LFS issue, nor a 32-bit client issue?