Configuration of infiniband (mlx4_core) after update

alberto.trj
Posts: 1
Joined: 2014/11/05 18:15:12

Configuration of infiniband (mlx4_core) after update

Post by alberto.trj » 2014/11/05 19:08:35

Hi,

It is a known issue that the mlx4_core module on CentOS 6 limits registrable memory to 32 GB by default, which can be too low for some systems and cause MPI jobs to hang:
https://access.redhat.com/solutions/451223
http://community.mellanox.com/docs/DOC-1120
http://www.open-mpi.org/faq/?category=o ... cked-pages

Adding

options mlx4_core log_num_mtt=22

to /etc/modprobe.d/mlx4.conf solved the problem and everything was running fine.
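
For reference, here is a rough sketch of how log_num_mtt relates to the registrable memory limit, based on the formula in the Open MPI FAQ linked above: max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE. The helper name max_reg_mem_bytes is made up for illustration, and log_mtts_per_seg=3 with 4 KiB pages are assumptions (they happen to reproduce the 32 GB figure for log_num_mtt=20), so check the values on your own system.

```shell
#!/bin/sh
# Sketch only: max_reg_mem_bytes is a hypothetical helper, not part of any tool.
# Formula (Open MPI FAQ):
#   max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
max_reg_mem_bytes() {
    log_num_mtt=$1
    log_mtts_per_seg=$2
    page_size=${3:-4096}   # assumption: 4 KiB pages unless a size is given
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
}

# log_mtts_per_seg=3 is an assumed value used only for illustration.
echo "log_num_mtt=20: $(( $(max_reg_mem_bytes 20 3) / 1073741824 )) GiB"
echo "log_num_mtt=22: $(( $(max_reg_mem_bytes 22 3) / 1073741824 )) GiB"
```

Under those assumptions, log_num_mtt=22 raises the limit from 32 GiB to 128 GiB. The actual page size can be read with getconf PAGE_SIZE.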

However, after updating my system from CentOS 6.5 to 6.6, and libmlx4 from 1.0.5 release 4.el6.1 to 1.0.6 release 6.el6 (both from the base repo), the options for the mlx4_core module no longer seem to be read.

Before the update:

[root@fisica-03 ~]# cat /etc/modprobe.d/mlx4.conf
options mlx4_core log_num_mtt=22
[root@fisica-03 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
22

After the update:

[root@fisica-05 ~]# cat /etc/modprobe.d/mlx4.conf
options mlx4_core log_num_mtt=22
[root@fisica-05 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
0

How do I make the configuration options take effect for mlx4_core?
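
The check above can be wrapped in a small helper for comparing nodes. This is only a sketch: show_mlx4_params is a hypothetical name, and the optional directory argument exists just so the function can be pointed somewhere other than the live sysfs path. It reports what the loaded module exposes and says nothing about why a value differs from the modprobe.d setting.

```shell
#!/bin/sh
# Sketch only: show_mlx4_params is a hypothetical helper.
# It prints the two MTT-related parameters exposed by the loaded module.
show_mlx4_params() {
    dir=${1:-/sys/module/mlx4_core/parameters}
    if [ ! -d "$dir" ]; then
        echo "mlx4_core parameters not found in $dir (module not loaded?)" >&2
        return 1
    fi
    for name in log_num_mtt log_mtts_per_seg; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            echo "$name=unreadable"
        fi
    done
}

# Report the live values; do not fail if the module is absent.
show_mlx4_params || true
```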

outpaddling
Posts: 2
Joined: 2015/01/07 20:09:50

Re: Configuration of infiniband (mlx4_core) after update

Post by outpaddling » 2015/01/07 20:12:16

FYI, I'm seeing the same issue and so far haven't been able to find a solution.

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Configuration of infiniband (mlx4_core) after update

Post by chemal » 2015/01/13 00:45:46

Last I heard, you don't have to tune this manually anymore; it works out of the box. Are you having any real problems?

leereyno
Posts: 2
Joined: 2009/10/06 18:46:09

Re: Configuration of infiniband (mlx4_core) after update

Post by leereyno » 2015/03/05 15:31:21

I'm seeing the same problem, and it is causing trouble: Open MPI is not happy.

I've found a workaround by adding the following to /etc/rc.d/rc.local:

chmod 644 /sys/module/mlx4_core/parameters/log_num_mtt
chmod 644 /sys/module/mlx4_core/parameters/log_mtts_per_seg

echo 24 > /sys/module/mlx4_core/parameters/log_num_mtt
echo 4 > /sys/module/mlx4_core/parameters/log_mtts_per_seg

This should not be necessary, however.

kilian
Posts: 14
Joined: 2015/05/27 01:05:56

Re: Configuration of infiniband (mlx4_core) after update

Post by kilian » 2015/05/27 01:08:15

Did you try to rename your modprobe.d file to /etc/modprobe.d/mlx4_core.conf instead of mlx4.conf?
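
Since modprobe reads every *.conf file under /etc/modprobe.d, it may also help to list all options lines for the module in one go, in case another file sets something different. A minimal sketch, assuming GNU grep; find_module_options is a hypothetical helper name, and it does not account for modprobe treating "-" and "_" in module names as equivalent.

```shell
#!/bin/sh
# Sketch only: find_module_options is a hypothetical helper.
# Prints "file:line" for each "options <module> ..." line in the *.conf
# files of the given directory (default: /etc/modprobe.d).
find_module_options() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    grep -H -E "^[[:space:]]*options[[:space:]]+$module([[:space:]]|\$)" \
        "$dir"/*.conf 2>/dev/null
}

# Show every options line for mlx4_core; no output means none was found.
find_module_options mlx4_core || true
```

Running modprobe --showconfig and searching its output for mlx4_core should show the same information as modprobe itself sees it.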

ssgparker
Posts: 1
Joined: 2015/06/17 19:43:09

Re: Configuration of infiniband (mlx4_core) after update

Post by ssgparker » 2015/06/17 19:52:58

I'm having the same problem and I've tried all suggestions on this page, but none of them worked for me. Currently using the rc.local workaround.

chemal
Posts: 776
Joined: 2013/12/08 19:44:49

Re: Configuration of infiniband (mlx4_core) after update

Post by chemal » 2015/06/20 22:01:40

There is no problem and the workaround is nonsense:

http://www.open-mpi.org/community/lists ... /25090.php
