It is a known issue that the mlx4_core module on CentOS 6 limits registrable memory to 32 GB by default, which can be too low for some systems and cause MPI jobs to hang:
https://access.redhat.com/solutions/451223
http://community.mellanox.com/docs/DOC-1120
http://www.open-mpi.org/faq/?category=o ... cked-pages
Adding the following line to /etc/modprobe.d/mlx4.conf raises the limit:
options mlx4_core log_num_mtt=22
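For reference, the Open MPI FAQ linked above relates the registrable-memory limit to the mlx4_core parameters as shown below. The worked numbers assume 4 KiB pages (PAGE_SIZE = 2^12 bytes) and log_mtts_per_seg = 3; both are assumptions about a typical x86_64 node rather than values taken from this thread. Under those assumptions, log_num_mtt = 20 reproduces the 32 GB figure and log_num_mtt = 22 raises it to 128 GB:

```latex
\[
\text{max\_reg\_mem} = 2^{\text{log\_num\_mtt}} \cdot 2^{\text{log\_mtts\_per\_seg}} \cdot \text{PAGE\_SIZE}
\]
\[
2^{20} \cdot 2^{3} \cdot 2^{12}\,\text{B} = 2^{35}\,\text{B} = 32\ \text{GiB},
\qquad
2^{22} \cdot 2^{3} \cdot 2^{12}\,\text{B} = 2^{37}\,\text{B} = 128\ \text{GiB}
\]
```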
However, after I updated the system from CentOS 6.5 to 6.6, and libmlx4 from 1.0.5-4.el6.1 to 1.0.6-6.el6 (both from the base repo), the mlx4_core module options no longer seem to be read.
Before the update:
[root@fisica-03 ~]# cat /etc/modprobe.d/mlx4.conf
options mlx4_core log_num_mtt=22
[root@fisica-03 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
22
After the update (fisica-05):
[root@fisica-05 ~]# cat /etc/modprobe.d/mlx4.conf
options mlx4_core log_num_mtt=22
[root@fisica-05 ~]# cat /sys/module/mlx4_core/parameters/log_num_mtt
0
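To repeat the comparison above across several nodes, a small helper can print the configured and live values side by side. This is a hypothetical sketch, not something from the thread: `check_mlx4_param` is an invented name, and it assumes the options live in `*.conf` files under /etc/modprobe.d and that the parameter is readable under /sys/module/mlx4_core/parameters, as in the output shown here. It only detects a mismatch; it does not explain why the option was ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an mlx4_core option set in modprobe config
# files with the value the loaded module reports in sysfs.
# Usage: check_mlx4_param NAME [CONF_DIR] [SYS_DIR]
# Returns 0 if they match, 1 on mismatch, 2 if the sysfs value is unreadable.
check_mlx4_param() {
    local name=$1
    local conf_dir=${2:-/etc/modprobe.d}
    local sys_dir=${3:-/sys/module/mlx4_core/parameters}
    local configured live

    # Extract NAME=<digits> from "options mlx4_core ..." lines; keep the last match.
    configured=$(cat "$conf_dir"/*.conf 2>/dev/null |
        sed -n "s/^[[:space:]]*options[[:space:]]\+mlx4_core[[:space:]].*\b${name}=\([0-9]\+\).*/\1/p" |
        tail -n 1)

    # Value reported by the running module (empty if unreadable, e.g. module not loaded).
    live=$(cat "$sys_dir/$name" 2>/dev/null)

    if [ -z "$live" ]; then
        echo "$name: configured=${configured:-unset} live=unavailable"
        return 2
    elif [ "$configured" = "$live" ]; then
        echo "$name: configured=$configured live=$live OK"
        return 0
    else
        echo "$name: configured=${configured:-unset} live=$live MISMATCH"
        return 1
    fi
}

# Check the parameter discussed in this thread on the local node.
if ! check_mlx4_param log_num_mtt; then
    echo "log_num_mtt could not be confirmed as configured" >&2
fi
```

On fisica-05 above, this would report `configured=22 live=0 MISMATCH`. The directory arguments exist only so the function can be exercised against test files instead of the real system paths.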