Posts: 2
Joined: 2005/09/30 07:40:27

Postby justus5 » 2006/08/23 08:29:32

We have an active/passive cluster running on two Dell 2850 servers connected
to SAN storage with a 1.7 TB GFS filesystem, and Samba sharing it as a clustered service.
The 'package' is configured 'by the book' (RHCS Install/Admin guide), and during the test
period it showed no problems (well, GFS seemed to be a _bit_ slow...).

The cluster had been running OK in production for some time until yesterday, when it
started throwing kernel panics. We have ~15 concurrent users connecting to the cluster through
Samba, and file processing is mostly reading/writing .mdb and .jpeg files. The only thing I've
figured out so far is that the panics occur when the server load gets a bit higher (and we're still
talking about no more than 5% CPU load max...).

CentOS (4.3), Cluster Suite and Samba are all updated to the latest 4.3/csgfs versions with no

Could this be some kind of DLM lock issue, since the panics seem to be related to the number of
open files (= locks) in the system?
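One way to test the open-files theory is to watch how many file locks a node holds while users work and see whether a panic lines up with a spike. The Python sketch below is a hypothetical helper, not part of Cluster Suite or Samba: it tallies the entries in the Linux /proc/locks table by lock class and access type. Note that /proc/locks only shows kernel file locks (POSIX/fcntl, flock, leases); GFS's internal DLM lock state is not listed there, so treat the numbers as a rough proxy at best.

```python
from collections import Counter

def count_locks(text: str) -> Counter:
    """Tally /proc/locks entries by (lock class, access), e.g. ("POSIX", "WRITE").
    Blocked waiters, shown as "N: -> ...", are counted as well."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[1] == "->":
            # Drop the arrow so a waiter's fields line up with a held lock's.
            fields = [fields[0]] + fields[2:]
        if len(fields) < 4:
            continue  # blank or unexpected line
        lock_class, access = fields[1], fields[3]
        counts[(lock_class, access)] += 1
    return counts

def sample_proc_locks(path: str = "/proc/locks") -> Counter:
    """Read the live lock table (Linux only) and tally it."""
    with open(path) as f:
        return count_locks(f.read())
```

Calling sample_proc_locks() every few seconds and logging sum(counts.values()) with a timestamp would show whether the lock count climbs just before a panic.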

Posts: 8
Joined: 2006/08/24 18:46:19

Re: cluster suite + samba = kernel panic ..

Postby jgeiger2 » 2006/09/06 02:32:45


I don't have an answer, but I am having the same issue. Since last Friday, our CS/GFS+Samba cluster has been locking up solid. The second node detects it, fences (reboots) the first node, and the first node comes back up. We are using one PE2850 with RHEL 4.3 (the one that dies) and one PE2650 with CentOS 4.3, with the DRAC cards as fence devices. Much like yours, ours is a textbook install.

I can vouch for the _low_ load at which this happens. We can have 5 or 6 users accessing files when it goes belly up. I haven't been able to ascertain exactly what the issue is. Inode usage is fine, there are no full LVols, etc. Most frustrating is the fact that nothing of any value seems to get written to the syslog or the Samba logs.
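For anyone wanting to rule out the same things on their own nodes, the inode and free-space checks amount to running df and df -i on each clustered mount. A minimal Python sketch using os.statvfs; the mount point you pass in and the 90% threshold are arbitrary assumptions, and the percentages are close to, but not identical with, df's Use% column (df accounts for reserved blocks differently):

```python
import os

def fs_usage(mount_point: str) -> dict:
    """Percent of blocks and inodes in use on the filesystem holding mount_point."""
    st = os.statvfs(mount_point)

    def pct_used(total: int, free: int) -> float:
        # Some filesystems report 0 total inodes; treat that as 0% used.
        return 0.0 if total == 0 else 100.0 * (total - free) / total

    return {
        "blocks_used_pct": pct_used(st.f_blocks, st.f_bfree),
        "inodes_used_pct": pct_used(st.f_files, st.f_ffree),
    }

def over_threshold(mount_point: str, threshold: float = 90.0) -> list:
    """Names of the usage figures at or above `threshold`; empty means OK."""
    return [name for name, pct in fs_usage(mount_point).items() if pct >= threshold]
```

Running over_threshold() against each share's mount point (for example a placeholder like /mnt/share) on the active node gives a quick yes/no before digging further.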

I will post back if I find a fix, please do the same if you figure it out. I won't be getting much sleep until this gets fixed.



Posts: 8
Joined: 2006/08/24 18:46:19

Re: cluster suite + samba = kernel panic ..

Postby jgeiger2 » 2006/09/06 07:19:35

Here's what I've found thus far:

I can reproduce the problem 100% of the time by opening 5 or 6 MS Office documents from within a share. It's a guaranteed kernel panic. I've been playing with turning off any and all locking options in smb.conf.sharename, but I still get the same result. I also updated to the latest version of Samba (3.0.10-4E.9). So far that hasn't yielded any benefit.
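If the lock theory from the first post is right, the Office trigger may come down to several files being opened and locked at the same time. The hypothetical Python script below tries to mimic that directly on a node: it creates `count` scratch files in a directory, takes an exclusive fcntl lock on each, holds them all together, then releases and deletes them. It bypasses Samba entirely and uses whole-file locks rather than the share modes and byte-range locks Office issues over SMB, so a clean run does not rule anything out; and it belongs on a test cluster, not a production one.

```python
import fcntl
import os
from contextlib import ExitStack

def hold_locks(directory: str, count: int = 6) -> int:
    """Create `count` scratch files in `directory`, hold an exclusive fcntl
    lock on all of them at once, then release and delete them.
    Returns how many locks were held simultaneously."""
    # Hypothetical file names; pick something that cannot clash with real data.
    paths = [os.path.join(directory, f"locktest_{i}.tmp") for i in range(count)]
    held = 0
    try:
        with ExitStack() as stack:
            for path in paths:
                f = stack.enter_context(open(path, "w+b"))
                # Exclusive whole-file lock; LOCK_NB raises instead of blocking
                # if something else already holds a conflicting lock.
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                held += 1
            # Every lock is held at this point. To watch the nodes while the
            # locks are outstanding, add a pause here (for example input()).
        # Leaving the ExitStack closes the files, which releases the locks.
    finally:
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
    return held
```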

Site Admin
Posts: 867
Joined: 2005/01/03 21:30:54
Location: /country/belgium

cluster suite + samba = kernel panic ..

Postby arrfab » 2006/09/06 09:27:03

Hmm, why are you using GFS? I remember reading somewhere on the Samba mailing list that a Samba cluster on top of GFS has a lot of issues... Why not use a simple ext3 filesystem and control which node can mount it by using heartbeat?

Posts: 8
Joined: 2006/08/24 18:46:19

Re: cluster suite + samba = kernel panic ..

Postby jgeiger2 » 2006/09/12 17:15:35

I got this fixed, or at least worked around it. Arrfab's post got me thinking that with an active/passive config like ours, GFS wasn't really necessary: only one node at a time controls/mounts the filesystem. I scrapped all the GFS volumes and recreated them as ext3. The whole thing runs like a champ now.

A couple of caveats:

Make sure you check Force Unmount when you create the ext3 resources. This ensures everything is unmounted before a node change.
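Force unmount matters because an unmount fails while any process (smbd, for instance) still has a file open, a working directory, or a root on the filesystem. The Python sketch below only lists such processes by scanning /proc/&lt;pid&gt;/cwd, /proc/&lt;pid&gt;/root and /proc/&lt;pid&gt;/fd, roughly what `fuser -m` reports; it is an illustration, not the code rgmanager runs, it deliberately does not kill anything, and it does not check memory-mapped files.

```python
import os

def pids_using(mount_point: str) -> set:
    """PIDs whose cwd, root, or open file descriptors lie under mount_point.
    Processes we cannot inspect (permission denied, already exited) are skipped."""
    mount_point = os.path.realpath(mount_point)
    prefix = mount_point.rstrip(os.sep) + os.sep

    def under(path: str) -> bool:
        return path == mount_point or path.startswith(prefix)

    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join("/proc", entry)
        try:
            links = [os.readlink(os.path.join(base, "cwd")),
                     os.readlink(os.path.join(base, "root"))]
            fd_dir = os.path.join(base, "fd")
            for fd in os.listdir(fd_dir):
                try:
                    links.append(os.readlink(os.path.join(fd_dir, fd)))
                except OSError:
                    continue  # descriptor closed while we were looking
        except OSError:
            continue
        if any(under(link) for link in links):
            pids.add(int(entry))
    return pids
```

Running it against the share's mount point right before a manual relocation shows what a force unmount would have to deal with.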
Many of you probably know this, but it was new to me:
I was having trouble with the uid/gid mappings of Active Directory users/groups when the cluster rolled over to another node. I found the setting "idmap backend = idmap_rid" (from the Samba HOWTO) to be the fix. Now all my Samba boxes have exactly the same uid/gid mappings for all the domain users and groups.
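The reason idmap_rid gives identical mappings everywhere is that it allocates nothing: per the Samba idmap_rid documentation, the Unix ID is computed from the RID at the end of the user's or group's SID as ID = RID - BASE_RID + LOW_RANGE_ID, so any node with the same range configuration gets the same answer. An allocating backend such as idmap_tdb instead hands out IDs in the order each node first sees the SIDs, which is how two nodes can end up disagreeing. A Python sketch of the arithmetic; the domain SID and the 10000-20000 range used below are made up, and the exact smb.conf syntax for declaring ranges differs between Samba versions:

```python
from typing import Optional

def rid_to_id(rid: int, low: int, high: int, base_rid: int = 0) -> Optional[int]:
    """idmap_rid-style mapping: ID = RID - BASE_RID + LOW_RANGE_ID.
    Returns None when the result falls outside the configured range."""
    unix_id = rid - base_rid + low
    return unix_id if low <= unix_id <= high else None

def id_to_rid(unix_id: int, low: int, high: int, base_rid: int = 0) -> Optional[int]:
    """Inverse mapping: RID = ID + BASE_RID - LOW_RANGE_ID."""
    return unix_id + base_rid - low if low <= unix_id <= high else None

def sid_to_id(sid: str, domain_sid: str, low: int, high: int,
              base_rid: int = 0) -> Optional[int]:
    """Split an account SID into domain SID and RID, then map the RID.
    SIDs from other domains are not covered by this range (None)."""
    prefix, _, rid = sid.rpartition("-")
    if prefix != domain_sid or not rid.isdigit():
        return None
    return rid_to_id(int(rid), low, high, base_rid)
```

With a range of 10000-20000 and base_rid 0, a domain account with RID 1105 maps to uid 11105 on every node, because nothing about the result depends on which node asked first.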

Now if only quotas were as simple on ext3 as they were on GFS...