5.9 NFSv3 bug?

Postby jeneral » 2013/01/22 00:44:35

I'm connecting FreeBSD 8.3 NFS clients to a CentOS 5.9 NFS server using NFSv3. It worked fine until the upgrade to the latest 5.9 kernel (2.6.18-348.el5). The NFS clients now hang on certain directories (I cannot even 'ls' those directories on the client, although most directories work). Reverting to the previous kernel (2.6.18-308.24.1.el5) works just fine. What changed in kernel 2.6.18-348.el5 that might affect NFS? Thanks for any suggestions!

Re: 5.9 NFSv3 bug?

Postby AlanBartlett » 2013/01/22 02:55:20

Welcome to the CentOS fora.

What changed in kernel 2.6.18-348.el5 that might affect the NFS?

You can check for yourself by --

Code:
rpm -q --changelog kernel-2.6.18-348.el5 | less

Re: 5.9 NFSv3 bug?

Postby jeneral » 2013/01/22 15:34:02

That does reveal:
- [fs] nfsd: don't fail unchecked creates of non-special files (J. Bruce Fields) [848666]
- [fs] nfsd4: return nfserr_symlink on v4 OPEN of non-regular file (J. Bruce Fields) [848666]
- [fs] nfs: nfs_d_automount update caller path after do_add_mount (Carlos Maiolino) [834379]
- [fs] nfs: Don't allow multiple mounts on same mntpnt with -o noac (Sachin Prabhu) [839753]
- [fs] nfsd: safer initialization order in find_file() (Harshula Jayasuriya) [800758]
- [fs] nfsd4: remove use of mutex for file_hashtable (Harshula Jayasuriya) [800758]
- [fs] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) (Eric Sandeen) [784191]
- [fs] nfsd: rename 'int access' to 'int may_flags' in nfsd_open() (Eric Sandeen) [784191]
- [fs] nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir() (Eric Sandeen) [784191]
- [fs] nfs: Fix the return value in nfs_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Optimise nfs_vm_page_mkwrite() (Niels de Vos) [818650]
- [fs] nfs: Add debugging facility for nfs aops (Niels de Vos) [818650]
- [fs] nfs: Add the helper nfs_vm_page_mkwrite (Niels de Vos) [818650]
- [fs] knfsd: fix an NFSD bug with full size non-page-aligned reads (J. Bruce Fields) [814626]
- [fs] nfs: allow high priority COMMITs to bypass inode commit lock (Jeff Layton) [773777]
- [fs] nfs: don't skip COMMITs if system under is mem pressure (Jeff Layton) [773777]
- [fs] nfs: nfs_fhget should wait on I_NEW instead of I_LOCK (Sachin Prabhu) [785062]

However, I'm not experienced enough with NFS to know what could be causing the problem or, more importantly, what a workaround might be. I guess what's most important is to see whether others run into this problem. My current configuration has been running for years, so it's either a regression or a new feature that requires me to set some new NFS configuration options (which I really wouldn't have a clue about at this point). Two different BSD servers hang when accessing the NFS file system, which blows things up. So to me this is a critical bug.

Thanks!

Re: 5.9 NFSv3 bug?

Postby toracat » 2013/01/22 23:20:59

Enable NFS debugging and see if you can catch any useful messages:

echo 1 > /proc/sys/sunrpc/nfs_debug    (turn debugging ON)

echo 0 > /proc/sys/sunrpc/nfs_debug    (turn debugging OFF)

Then check /var/log/messages.

Re: 5.9 NFSv3 bug?

Postby granroth » 2013/01/23 01:46:50

I'm seeing similar behavior with SPARC/Solaris 7, 8, & 10 clients connecting with NFSv3. Programs compiled with large file support don't appear to have a problem, but older 32-bit programs compiled without large file support fail to read directories.

Re: 5.9 NFSv3 bug?

Postby jeneral » 2013/01/24 01:12:02

It's a bad bug! I saw no output after enabling the debugging, most likely because it doesn't think there are any errors. However, this is why things were hanging: on the directories that have problems, the client sees multiple files with the same filename. For example, one of the directories *should* show 564 files. With the new kernel it shows 3,124,293!!! And if you 'ls' the directory, it lists all 3M+ files, repeating the same file multiple times. I was also using the NFS share for email, which explains why I was getting multiple copies of emails coming in. In fact, one of my clients texted "per outlook...have one email that appears to be sending 1.5M times. Have received 110,000 copies." Not good.

I'm using NFS in an HA environment so I can switch between NFS servers. With the "good" kernel the old duplicate files still exist (I'm sure because of caching). A umount/mount fixes that.

The NFS servers are 64-bit. The two tested clients (FreeBSD) are a 64-bit and a 32-bit version.

Thanks!

Re: 5.9 NFSv3 bug?

Postby toracat » 2013/01/24 04:46:37

You might have hit this bug reported upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=739222

[EDIT] The BZ is for EL6, so may not be applicable here.

Re: 5.9 NFSv3 bug?

Postby granroth » 2013/01/31 20:29:49

My problem may not be exactly the same. However, kernels newer than 2.6.18-308.24.1.el5 on CentOS 5.9, or newer than 2.6.32-220.23.1.el6.x86_64 on CentOS 6.3, prevent non-large-file-capable client programs from reading directories when providing NFSv3 service to Solaris clients, due to a size type conflict. Here is demonstration code:

Code:
/* A program to demonstrate the differences between large file support and
   the lack of large file support in client programs on Solaris when reading NFSv3
   directories hosted on CentOS 5.9 and 6.3 servers running current kernels.

   To use, make two versions of this program on Solaris, one with:
   
   cc -xc99 -g -errwarn test_largefile.c -o test_smallfile
   
   and the other with:
   
   cc -xc99 -g -errwarn $(getconf LFS_CFLAGS) test_largefile.c -o test_largefile
   
   Then you can try to read directories with each and see which one fails
   (if either).
*/

#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>


int main(int argc, char** argv){

   int nErr = 0;
   DIR* pDir = NULL;
   struct dirent* pEnt = NULL;

   
   if((argc < 2)||(strcmp(argv[1], "-h")==0)||(strcmp(argv[1], "--help")==0)){
      
      printf(
"A program to test effects of compiling in large file support.\n"
"  Usage:  %s  DIR_TO_READ\n"
"If the program works you will see a list of the directory entries in\n"
"DIR_TO_READ, else an error message will be printed\n", argv[0]);
      
      if(argc == 1) return 3;
      else return 0;
   }
   
   errno = 0;
   if( (pDir = opendir(argv[1])) == NULL){
      nErr = errno;
      fprintf(stderr, "Couldn't open %s, reason: %s\n", argv[1], strerror(nErr));
      return 4;
   }
   
   printf("%s\n", argv[1]);
   
   while(true){
      errno = 0;
      pEnt = readdir(pDir);
      if(pEnt == NULL){
         if(errno != 0){
            nErr = errno;
            fprintf(stderr, "Couldn't read directory entry from %s, reason: %s\n",
                    argv[1], strerror(nErr));
            return 5;
         }
         break;
      }
      
      printf(" |-%s\n", pEnt->d_name);
   }
   
   errno = 0;
   if(closedir(pDir) != 0){
      nErr = errno;
      fprintf(stderr, "Couldn't close %s, reason: %s\n", argv[1], strerror(nErr));
      return 6;
   }

   return 0;
}


Is there a way to generate a binary on CentOS without large file support, currently? That would make a handier demonstration . . .

I've been holding off on kernel updates, but this is a show-stopper for our half a petabyte of space flight data.

Re: 5.9 NFSv3 bug?

Postby toracat » 2013/02/09 00:04:15


Re: 5.9 NFSv3 bug?

Postby jeneral » 2013/02/13 17:07:55

Just to clarify:
My server is CentOS 5.9, 32-bit.

My clients are FreeBSD 8.3, both 32-bit and 64-bit versions. If I understand correctly, they have LFS by default. It fails on both.

Perhaps this is neither an LFS issue nor a 32-bit client issue?