[solved] SELinux setting for Torque's pam_pbssimpleauth.so

-javier
Posts: 5
Joined: 2017/05/23 08:42:53

Postby -javier » 2017/06/15 18:25:50

Hi all,

I have installed the Torque batch system (from EPEL) on a tiny demo cluster (front end + 3 nodes). On the front end I have:

Code: Select all

[root@n0 ~]# rpm -qa |grep torque
torque-4.2.10-10.el7.x86_64
torque-server-4.2.10-10.el7.x86_64
torque-libs-4.2.10-10.el7.x86_64
torque-docs-4.2.10-10.el7.noarch
torque-gui-4.2.10-10.el7.x86_64
torque-scheduler-4.2.10-10.el7.x86_64
torque-client-4.2.10-10.el7.x86_64
[root@n0 ~]#

and on the compute nodes I have:

Code: Select all

[root@n1 log]# rpm -qa |grep torque
torque-4.2.10-10.el7.x86_64
torque-pam-4.2.10-10.el7.x86_64
torque-libs-4.2.10-10.el7.x86_64
torque-mom-4.2.10-10.el7.x86_64
[root@n1 log]#


torque-pam contains just a PAM module and the instructions for installing it:

Code: Select all

[root@n1 log]# rpm -ql torque-pam
/lib64/security/pam_pbssimpleauth.so
/usr/share/doc/torque-pam-4.2.10
/usr/share/doc/torque-pam-4.2.10/README.pam
[root@n1 log]#
[root@n1 log]# cat /usr/share/doc/torque-pam-4.2.10/README.pam
This is a simple PAM module to be used on PBS compute nodes (hosts running
pbs_mom) to authorize users that have a running job. Uid 0 is always allowed.
The optional argument "debug" sends verbose information to syslog.

You'll want something like this in your PAM
conf files:

   account    sufficient   pam_pbssimpleauth.so

The pam_pbssimpleauth module combines nicely with the pam_access module to
allow access to cluster administrators:

   account    sufficient   pam_pbssimpleauth.so
   account    required     pam_access.so

/etc/security/access.conf can then have something like:
  -:ALL EXCEPT root admgroup:ALL

[root@n1 log]#


The setup I'm using is very simple: all 3 nodes are assigned to the next job until it finishes (or times out), in strict FIFO order. To further ensure that user programs run undisturbed, torque-pam can be used to forbid ssh access to the compute nodes (unless the person trying to ssh in is the job owner, root, or a member of admgroup). The Torque manual says "required"
http://docs.adaptivecomputing.com/torqu ... s%7C_____4
(instead of "sufficient") for pam_pbssimpleauth.so. I'm not sure I understand that; I think I do understand "sufficient".

Even with "sufficient", I cannot get it working. On the compute nodes I have added this last line to the otherwise fully commented-out access.conf:

Code: Select all

[root@n1 log]# tail -3 /etc/security/access.conf
# All other users should be denied to get access from all sources.
#- : ALL : ALL
- : ALL EXCEPT root : ALL
[root@n1 log]#

and I think I'm expected to insert the two "account" lines between the two standard ones:

Code: Select all

[root@n1 log]# cat /etc/pam.d/sshd
#%PAM-1.0
auth       required     pam_sepermit.so
auth       substack     password-auth
auth       include      postlogin
# Used with polkit to reauthorize users in remote sessions
-auth      optional     pam_reauthorize.so prepare
account    required     pam_nologin.so
######################################################
account sufficient pam_pbssimpleauth.so debug
account required pam_access.so
######################################################
account    include      password-auth
password   include      password-auth
# pam_selinux.so close should be the first session rule
session    required     pam_selinux.so close
...

because password-auth itself includes two more "account sufficient" modules (plus two more "required" ones... should I manually include pam_unix.so before pam_pbssimpleauth?).
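On the ordering: as far as the PAM documentation goes, a "sufficient" success only ends the stack early if no earlier "required" module has already failed, so pam_pbssimpleauth.so has to come before pam_access.so. A minimal sketch of a check for that, assuming the service file is fed on standard input; the function name pbs_before_access is made up for illustration and the sample lines are copied from the sshd file above:

```shell
# Hypothetical check: among the "account" lines read from stdin, succeed only
# if pam_pbssimpleauth.so is listed before pam_access.so.
pbs_before_access() {
    awk '$1 == "account" && /pam_pbssimpleauth\.so/ && !pbs { pbs = NR }
         $1 == "account" && /pam_access\.so/        && !acc { acc = NR }
         END { exit !(pbs && acc && pbs < acc) }'
}

# Sample input: the account lines from the /etc/pam.d/sshd shown above.
sample='account    required     pam_nologin.so
account sufficient pam_pbssimpleauth.so debug
account required pam_access.so
account    include      password-auth'

if printf '%s\n' "$sample" | pbs_before_access; then
    echo "order OK"
else
    echo "order wrong"
fi
```

On a real node the input would be the file itself, for example pbs_before_access < /etc/pam.d/sshd.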

The "debug" argument (not mentioned in the Torque manual) is great, since it let me discover that it is not working just because of... SELinux:

Code: Select all

[root@n1 log]# less messages
...
Jun 15 18:50:11 n1 pbs_mom: LOG_INFO::create_job_cpuset, creating cpuset for job 15.n0: 0 cpus (), 0 mems ()
Jun 15 18:50:13 n1 pam_pbssimpleauth[1603]: opening /var/lib/torque/mom_priv/jobs
Jun 15 18:50:13 n1 pam_pbssimpleauth[1603]: username javier, known
Jun 15 18:50:13 n1 pam_pbssimpleauth[1603]: opening /var/lib/torque/mom_priv/jobs/15.n0.JB
Jun 15 18:50:13 n1 pam_pbssimpleauth[1603]: error opening job file
Jun 15 18:50:13 n1 pam_pbssimpleauth[1603]: returning failed
Jun 15 18:50:13 n1 dbus-daemon: dbus[692]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)
...
Jun 15 18:50:16 n1 setroubleshoot: SELinux is preventing /usr/sbin/sshd from read access on the file 15.n0.JB. For complete SELinux messages. run sealert -l 8016e71e-8fe1-4368-a3d9-2576a1f630ec
...
Jun 15 18:51:14 n1 pam_pbssimpleauth[1708]: opening /var/lib/torque/mom_priv/jobs/16.n0.JB
...
Jun 15 18:51:16 n1 setroubleshoot: SELinux is preventing /usr/sbin/sshd from read access on the file 16.n0.JB. For complete SELinux messages. run sealert -l 8016e71e-8fe1-4368-a3d9-2576a1f630ec
...
Jun 15 18:52:07 n1 pam_pbssimpleauth[1763]: opening /var/lib/torque/mom_priv/jobs/17.n0.JB
...
Jun 15 18:52:08 n1 setroubleshoot: SELinux is preventing /usr/sbin/sshd from read access on the file 17.n0.JB. For complete SELinux messages. run sealert -l 8016e71e-8fe1-4368-a3d9-2576a1f630ec
...


The SELinux alert is:

Code: Select all

[root@n1 log]# sealert -l 8016e71e-8fe1-4368-a3d9-2576a1f630ec
SELinux is preventing /usr/sbin/sshd from read access on the file 17.n0.JB.

*****  Plugin catchall_labels (83.8 confidence) suggests   *******************

If you want to allow sshd to have read access on the 17.n0.JB file
Then you need to change the label on 17.n0.JB
Do
# semanage fcontext -a -t FILE_TYPE '17.n0.JB'
where FILE_TYPE is one of the following: NetworkManager_etc_rw_t, ... , zebra_tmp_t.
Then execute:
restorecon -v '17.n0.JB'


*****  Plugin catchall (17.1 confidence) suggests   **************************

If you believe that sshd should be allowed read access on the 17.n0.JB file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'sshd' --raw | audit2allow -M my-sshd
# semodule -i my-sshd.pp


Additional Information:
Source Context                system_u:system_r:sshd_t:s0-s0:c0.c1023
Target Context                system_u:object_r:var_lib_t:s0
Target Objects                17.n0.JB [ file ]
Source                        sshd
Source Path                   /usr/sbin/sshd
...
Alert Count                   17
First Seen                    2017-06-15 13:43:50 CEST
Last Seen                     2017-06-15 18:52:07 CEST
Local ID                      8016e71e-8fe1-4368-a3d9-2576a1f630ec

Raw Audit Messages
type=AVC msg=audit(1497545527.21:290): avc:  denied  { read } for  pid=1763 comm="sshd" name="17.n0.JB" dev="sda3" ino=1859787 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file


type=SYSCALL msg=audit(1497545527.21:290): arch=x86_64 syscall=open success=no exit=EACCES a0=7ffd0b9b7cf0 a1=0 a2=0 a3=4000 items=0 ppid=1115 pid=1763 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm=sshd exe=/usr/sbin/sshd subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 key=(null)

Hash: sshd,sshd_t,var_lib_t,file,read
[root@n1 log]#


I think I would like sshd to be allowed read access not specifically to the 17.n0.JB file, but to any file (submitted job) created in /var/lib/torque/mom_priv/jobs. How could I do that? I think I understand the simplest SELinux concept, the file label...

Code: Select all

[root@n1 jobs]# pwd
/var/lib/torque/mom_priv/jobs
[root@n1 jobs]# ls -Z ..
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
-rw-r--r--. root root system_u:object_r:var_lib_t:s0   mom.lock
[root@n1 jobs]# ls -la
total 16
drwxr-xr-x. 2 root   root     54 Jun 15 20:10 .
drwxr-xr-x. 3 root   root     66 Jun 15 18:49 ..
-rw-------. 1 root   root   5264 Jun 15 20:10 18.n0.JB
-rwx------. 1 javier javier   10 Jun 15 20:10 18.n0.SC
-rw-------. 1 root   root   2272 Jun 15 20:10 18.n0.TK
[root@n1 jobs]# ls -Z
-rw-------. root   root   system_u:object_r:var_lib_t:s0   18.n0.JB
-rwx------. javier javier system_u:object_r:var_lib_t:s0   18.n0.SC
-rw-------. root   root   system_u:object_r:var_lib_t:s0   18.n0.TK
[root@n1 jobs]# cat 18.n0.JB
    *   !                   ��BY    18.n0
...
[root@n1 jobs]# cat 18.n0.SC
sleep 100
[root@n1 jobs]# cat 18.n0.TK
                                                                                                            18.n0                                                                                                       ����       
[root@n1 jobs]#

...but then I get lost. I ran ausearch -c 'sshd' --raw and it returns all 17 raw messages (plus some earlier ones complaining about authorized_keys on the NFS-mounted home, I think). I fear I might cause more trouble if I blindly follow the audit2allow instructions.
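One way to limit what audit2allow gets to see is to filter the raw records first. A minimal sketch, assuming the raw AVC records arrive on standard input; the function name only_sshd_var_lib is made up for illustration, the first sample record is copied from the alert above and the second is invented just to show that unrelated records are dropped:

```shell
# Hypothetical helper: keep only AVC denials where sshd_t was refused
# access to a var_lib_t object; every other record is dropped.
only_sshd_var_lib() {
    grep 'type=AVC' \
        | grep 'scontext=system_u:system_r:sshd_t:' \
        | grep 'tcontext=system_u:object_r:var_lib_t:'
}

# Sample input. Record 1 is from the sealert output above; record 2 is an
# invented NFS-related denial used only for this demonstration.
sample='type=AVC msg=audit(1497545527.21:290): avc:  denied  { read } for  pid=1763 comm="sshd" name="17.n0.JB" dev="sda3" ino=1859787 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file
type=AVC msg=audit(1497545000.00:100): avc:  denied  { read } for  pid=1700 comm="sshd" name="authorized_keys" dev="0:40" ino=42 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:nfs_t:s0 tclass=file'

# Prints only the 17.n0.JB record.
printf '%s\n' "$sample" | only_sshd_var_lib
```

On the node the input would come from ausearch -c 'sshd' --raw, and the filtered records could be piped into plain audit2allow (without -M) to read the proposed rules before installing anything.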

Can SELinux allow sshd to read any file in /var/lib/torque/mom_priv/jobs? Thanks in advance for any advice!
Last edited by -javier on 2017/06/22 11:12:27, edited 1 time in total.

poky
Posts: 85
Joined: 2013/03/27 12:18:03

Re: SELinux setting for Torque's pam_pbssimpleauth.so

Postby poky » 2017/06/17 15:56:11

To temporarily change the security context:
chcon -R -v --type=system_u:object_r:usr_t:s0 /var/lib/torque/mom_priv/jobs

Show security context:
ls -alZ /var/lib/torque/mom_priv/jobs/

To go back to the original security context:
restorecon -R -v -F /var/lib/torque/mom_priv/jobs

For a permanent change to the security context:
semanage fcontext -a -t system_u:object_r:usr_t:s0 '/var/lib/torque/mom_priv/jobs(/.*)?'

TrevorH
Forum Moderator
Posts: 21211
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: SELinux setting for Torque's pam_pbssimpleauth.so

Postby TrevorH » 2017/06/17 16:15:14

I think your best bet is to use service auditd rotate to rotate the audit log files, then move or delete all files except the current audit.log from /var/log/audit. That will get rid of all historical SELinux problems from the log.

Now use setenforce 0 to go into permissive mode so that all SELinux denials are logged but will not stop anything from working. Recreate your problem and your current audit.log will then contain all the SELinux denials that would stop things from working. If you attempt this while not in permissive mode, execution stops after the first denial; you fix that and then find the next one. Permissive mode lets you get them all in one go.

Now run grep -i avc /var/log/audit/audit.log and those are the denials you will need to fix. Each avc should have information like this dev="sda3" ino=1859787 in it and that should let you find exactly which file it is complaining about - in this case, find whatever filesystem is mounted using /dev/sda3 and use find /thatfilesystem -inum 1859787 and it will show you the exact file that had the problem.

If all files are within a single path like /var/lib/torque/mom_priv/jobs then you can use the semanage fcontext command to tell it to assign the correct label to files created within that directory. It'll need to be something that sshd can access. The other alternative is to create a policy module that allows sshd_t to access files with their current context but since this appears to be var_lib_t that looks like it might allow more than you want.
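The dev/ino lookup described above can be sketched as follows. This is only an illustration: the helper names avc_dev and avc_ino are made up, the record is the one quoted earlier in the thread, and /thatfilesystem stays a placeholder for wherever the device is mounted.

```shell
# Hypothetical helpers: pull the device name and the inode number out of a
# single AVC record read from stdin.
avc_dev() { sed -n 's/.* dev="\([^"]*\)".*/\1/p'; }
avc_ino() { sed -n 's/.* ino=\([0-9][0-9]*\).*/\1/p'; }

record='type=AVC msg=audit(1497545527.21:290): avc:  denied  { read } for  pid=1763 comm="sshd" name="17.n0.JB" dev="sda3" ino=1859787 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file'

dev=$(printf '%s\n' "$record" | avc_dev)
ino=$(printf '%s\n' "$record" | avc_ino)

# Only print the command; -xdev keeps find from crossing into other filesystems.
echo "device: /dev/$dev"
echo "find /thatfilesystem -xdev -inum $ino"
```

The find only helps while the file still exists, so it has to be run before the job file is cleaned up.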

-javier
Posts: 5
Joined: 2017/05/23 08:42:53

Re: SELinux setting for Torque's pam_pbssimpleauth.so

Postby -javier » 2017/06/22 11:11:54

Sorry for the delay in replying. I was writing the reply while checking your answers, and then got into an endless trial-and-error loop trying to decide whether chcon was enough or I really needed semanage fcontext :-) This is what I was writing [comments added later are in square brackets]

I cannot make the chcon command work. Reading "man chcon" I think "_u" is the user, "_r" the role and "_t" the type, but --type takes the whole label? Hmm, lost again :-) [later I found an example in "man semanage-fcontext" where "*_t" is clearly the type, and -t must be _just_ the type]

Code: Select all

[root@n1 mom_priv]# pwd
/var/lib/torque/mom_priv
[root@n1 mom_priv]# ls
config  jobs  mom.layout
[root@n1 mom_priv]# chcon -R -v --type=system_u:object_r:usr_t:s0 /var/lib/torque/mom_priv/jobs
changing security context of ‘/var/lib/torque/mom_priv/jobs’
chcon: failed to set type security context component to ‘system_u:object_r:usr_t:s0’: Invalid argument
[root@n1 mom_priv]# ls -alZ
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   ..
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
[root@n1 mom_priv]# ls -alZ /var/lib/torque/mom_priv/jobs/
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   .
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   ..
[root@n1 mom_priv]#
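For reference: a context reads user:role:type:level, and both chcon --type and semanage fcontext -t take only the type field, which is why the full label is rejected above. A small sketch that splits a context into its fields (the ctx_* function names are made up for illustration):

```shell
# Hypothetical helpers: split an SELinux context of the form
# user:role:type:level into its fields. The level may itself contain
# colons, so it is everything from the fourth field onwards.
ctx_user()  { printf '%s\n' "$1" | cut -d: -f1; }
ctx_role()  { printf '%s\n' "$1" | cut -d: -f2; }
ctx_type()  { printf '%s\n' "$1" | cut -d: -f3; }
ctx_level() { printf '%s\n' "$1" | cut -d: -f4-; }

ctx_type  'system_u:object_r:var_lib_t:s0'            # prints var_lib_t
ctx_type  'system_u:system_r:sshd_t:s0-s0:c0.c1023'   # prints sshd_t
ctx_level 'system_u:system_r:sshd_t:s0-s0:c0.c1023'   # prints s0-s0:c0.c1023
```

So the value to pass as the type here would be usr_t rather than system_u:object_r:usr_t:s0.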

I was just getting used to systemctl instead of service, and I had never rotated any logs before. Reading "man service" I would have said that the (seemingly undocumented) rotate command should be implemented in some /etc/init.d/auditd script... puzzled :-)

Code: Select all

[root@n1 audit]# systemctl rotate auditd
Unknown operation 'rotate'.
[root@n1 audit]# service auditd
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
[root@n1 audit]# service auditd rotate
Rotating logs:                                             [  OK  ]
[root@n1 audit]# ls /etc/rc.d/init.d/
functions  netconsole  network  README
[root@n1 audit]#

Now I'm submitting jobs to the Torque queue using the qsub command from the front end n0. Results are stored in the NFS-mounted home on n1. PAM on n1 is configured with "account sufficient pam_pbssimpleauth.so" and "account required pam_access.so".

With SELinux Enforcing

Code: Select all

[javier@n0 ~]$ echo "hostname; sleep 60" | qsub
23.n0
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
23.n0                      STDIN            javier                 0 R batch   
[javier@n0 ~]$ ssh n1
Connection closed by 192.168.1.11
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
23.n0                      STDIN            javier                 0 R batch   
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
23.n0                      STDIN            javier          00:00:00 C batch   
[javier@n0 ~]$ ls ST*
STDIN.e23  STDIN.o23
[javier@n0 ~]$ cat ST*
n1
[javier@n0 ~]$ rm ST*
[javier@n0 ~]$

With SELinux Permissive

Code: Select all

[javier@n0 ~]$ echo "hostname; sleep 60" | qsub
24.n0
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
24.n0                      STDIN            javier                 0 R batch   
[javier@n0 ~]$ ssh n1
Last login: Tue Jun 20 20:12:48 2017 from n0
[javier@n1 ~]$ # Worked!!!
[javier@n1 ~]$ exit
logout
Connection to n1 closed.
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
24.n0                      STDIN            javier          00:00:00 C batch   
[javier@n0 ~]$  ssh n1
Connection closed by 192.168.1.11
[javier@n0 ~]$

It works!!! I was able to ssh into n1 as long as I had a running job there (sufficient pam_pbssimpleauth.so)

On the compute node the log says:

Code: Select all

[root@n1 audit]# grep -i avc /var/log/audit/audit.log
type=AVC msg=audit(1498044840.383:313): avc:  denied  { read } for  pid=1977 comm="sshd" name="23.n0.JB" dev="sda3" ino=2090270 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file
type=AVC msg=audit(1498045019.356:330): avc:  denied  { read } for  pid=2060 comm="sshd" name="24.n0.JB" dev="sda3" ino=2090270 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file
type=AVC msg=audit(1498045019.356:330): avc:  denied  { open } for  pid=2060 comm="sshd" path="/var/lib/torque/mom_priv/jobs/24.n0.JB" dev="sda3" ino=2090270 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_lib_t:s0 tclass=file
type=USER_AVC msg=audit(1498045019.522:337): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc:  received setenforce notice (enforcing=0)  exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
[root@n1 audit]#
[root@n1 audit]# mount | grep ^/\\\|:/
/dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
n0:/home on /home type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.11,local_lock=none,addr=192.168.1.10)
[root@n1 audit]#
[root@n1 audit]# find / -inum 2090270
[root@n1 audit]# ls /var/lib/torque/mom_priv/jobs/
[root@n1 audit]#

It's funny that the jobs always get the same inode number :-) They get deleted as they complete their time in the queue (so the find above comes back empty). So in fact I need the "semanage fcontext" solution. Thanks for the explanation (TrevorH) and for the syntax (poky); for some reason, reading the sealert, I couldn't make sense of its instructions. I think I got overwhelmed by the list of FILE_TYPE values... (I think you call them "SELinux file labels", and the ones listed in the sealert are the labels that sshd can access, aren't they?) [no, they're not: I tried cluster_var_lib_t and new, different sealerts showed up]. And anyway I thought it would be a solution for a single job file, not for any job. I didn't imagine the file argument would accept a pattern covering a whole file hierarchy.

Code: Select all

[root@n1 mom_priv]# semanage fcontext -a -t system_u:object_r:usr_t:s0 '/var/lib/torque/mom_priv/jobs(/.*)?'
ValueError: Type system_u:object_r:usr_t:s0 is invalid, must be a file or device type
[root@n1 mom_priv]# man semanage
[root@n1 mom_priv]# man semanage-fcontext
[root@n1 mom_priv]# # Ouch! now I see an example of a -t argument! Retrying chcon
[root@n1 mom_priv]# chcon -R -v --type=usr_t /var/lib/torque/mom_priv/jobs
changing security context of ‘/var/lib/torque/mom_priv/jobs’
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:usr_t:s0       jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
[root@n1 mom_priv]#

Ok, I think I got it. Trying again

Code: Select all

[javier@n0 ~]$ echo "hostname; sleep 60" | qsub
25.n0
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
25.n0                      STDIN            javier                 0 R batch   
[javier@n0 ~]$ ssh n1
Last login: Wed Jun 21 13:36:59 2017 from n0
[javier@n1 ~]$ # Yup!
[javier@n1 ~]$ exit
logout
Connection to n1 closed.
[javier@n0 ~]$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
25.n0                      STDIN            javier          00:00:00 C batch   
[javier@n0 ~]$ ssh n1
Connection closed by 192.168.1.11
[javier@n0 ~]$

It works!!! Now, just to try out all the commands you showed me, and another label from the sealert list (I don't think I need -R -F, but anyway...):

Code: Select all

[root@n1 mom_priv]# restorecon -R -v -F /var/lib/torque/mom_priv/jobs
restorecon reset /var/lib/torque/mom_priv/jobs context system_u:object_r:usr_t:s0->system_u:object_r:var_lib_t:s0
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
-rw-r--r--. root root system_u:object_r:var_lib_t:s0   mom.lock
[root@n1 mom_priv]#
[root@n1 mom_priv]# chcon -R -v --type=cluster_var_lib_t /var/lib/torque/mom_priv/jobs
changing security context of ‘/var/lib/torque/mom_priv/jobs’
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:cluster_var_lib_t:s0 jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
-rw-r--r--. root root system_u:object_r:var_lib_t:s0   mom.lock
[root@n1 mom_priv]#

Ouch! nope! A new sealert explains that...

Code: Select all

[root@n1 log]# sealert -l f1aac4f5-d571-4350-b499-a98c223110b0
SELinux is preventing /usr/sbin/sshd from read access on the directory /var/lib/torque/mom_priv/jobs.

*****  Plugin restorecon (92.2 confidence) suggests   ************************

If you want to fix the label.
/var/lib/torque/mom_priv/jobs default label should be var_lib_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /var/lib/torque/mom_priv/jobs

*****  Plugin catchall_boolean (7.83 confidence) suggests   ******************

If you want to allow daemons to enable cluster mode
Then you must tell SELinux about this by enabling the 'daemons_enable_cluster_mode' boolean.
You can read 'None' man page for more details.
Do
setsebool -P daemons_enable_cluster_mode 1

So I'd better stick to usr_t. I rebooted expecting the cluster_var_lib_t label to go away, but it stayed (?!? so I am definitely misunderstanding the meaning of "temporarily change the security context" and "For permanent change").
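A note on that puzzle, as far as I understand it: chcon writes the new label into the file's own extended attributes, so it survives a reboot; it is "temporary" only in the sense that restorecon or a full relabel resets it to the default recorded in the policy, and that default is what semanage fcontext changes. A small sketch for spotting entries whose type differs from an expected one, assuming the older ls -Z column layout shown in this thread (mode, user, group, context, name); the function name not_of_type is made up for illustration:

```shell
# Hypothetical helper: read `ls -Z` output in the layout
# "mode user group context name" from stdin and print every entry whose
# type (third context field) differs from the expected type in $1.
not_of_type() {
    awk -v want="$1" '{
        split($4, ctx, ":")                 # ctx[3] holds the type
        if (ctx[3] != want) print $5 ": " ctx[3]
    }'
}

# Sample input copied from the listing after the chcon above.
listing='lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:cluster_var_lib_t:s0 jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
-rw-r--r--. root root system_u:object_r:var_lib_t:s0   mom.lock'

printf '%s\n' "$listing" | not_of_type var_lib_t    # prints jobs: cluster_var_lib_t
```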

Then I entered a useless and time-consuming trial-and-error series of reboots, systemctl restart pbs_server/_sched/_mom/ trqauthd/munge and qsub/ssh, trying to find out whether chcon was enough :-S There seem to be a number of problems with the Torque package, and I think I've been running into some of them and incorrectly assuming they were related to SELinux. The fact that I don't understand "temporary change" and/or "permanent change" doesn't help either :-)

I can say this works: stick to usr_t (other fancy labels have caused new sealerts), use semanage fcontext (chcon survives a reboot and yet that's called a "temporary change"; I don't understand SELinux :-), and get used to booting and powering off the cluster with the Torque services stopped, using a script to start and stop them. Then it works reliably. So I guess the final answer to my question was:

Code: Select all

[root@n1 mom_priv]# pwd
/var/lib/torque/mom_priv
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
-rw-r--r--. root root system_u:object_r:var_lib_t:s0   mom.lock
[root@n1 mom_priv]# semanage fcontext -a -t usr_t '/var/lib/torque/mom_priv/jobs(/.*)?'
[root@n1 mom_priv]# man semanage-fcontext
[root@n1 mom_priv]# restorecon -R -v /var/lib/torque/mom_priv/jobs
restorecon reset /var/lib/torque/mom_priv/jobs context system_u:object_r:var_lib_t:s0->system_u:object_r:usr_t:s0
[root@n1 mom_priv]#
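The path specification in that semanage command is a regular expression that has to match the whole path, which is how one rule covers the directory and every job file created in it. A rough sketch of what it matches, using grep -E -x as a stand-in for the SELinux matcher (an approximation, not the real lookup; the function name matches_spec is made up for illustration):

```shell
# The path specification passed to semanage fcontext above.
spec='/var/lib/torque/mom_priv/jobs(/.*)?'

# Approximate the match with grep: -E for extended regular expressions,
# -x because the specification must match the entire path.
matches_spec() { printf '%s\n' "$1" | grep -Eqx "$spec"; }

for p in /var/lib/torque/mom_priv/jobs \
         /var/lib/torque/mom_priv/jobs/25.n0.JB \
         /var/lib/torque/mom_priv/jobs2 \
         /var/lib/torque/mom_priv/mom.lock; do
    if matches_spec "$p"; then
        echo "match:    $p"
    else
        echo "no match: $p"
    fi
done
```

The first two paths match and the last two do not, so neighbouring files such as mom.lock keep their var_lib_t label.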

The change could be undone with:

Code: Select all

[root@n1 mom_priv]# semanage fcontext -d -t usr_t '/var/lib/torque/mom_priv/jobs(/.*)?'
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:usr_t:s0       jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
[root@n1 mom_priv]# restorecon -R -v /var/lib/torque/mom_priv/jobs/
restorecon reset /var/lib/torque/mom_priv/jobs context system_u:object_r:usr_t:s0->system_u:object_r:var_lib_t:s0
[root@n1 mom_priv]# ls -Z
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   config -> /etc/torque/mom/config
drwxr-xr-x. root root system_u:object_r:var_lib_t:s0   jobs
lrwxrwxrwx. root root system_u:object_r:var_lib_t:s0   mom.layout -> /etc/torque/mom/mom.layout
[root@n1 mom_priv]#

Thanks again, poky, TrevorH, I couldn't have got it working without your help!