Need help with modprobe.conf and multipath

wodel.youchi
Posts: 5
Joined: 2015/04/25 18:16:39

Postby wodel.youchi » 2017/06/07 10:36:08

Hi,

We have an HA cluster of two servers and are about to replace the shared storage with a Dell MD3600f, an FC storage array.
We are having problems configuring multipath.
We found a Dell PDF document explaining how to configure multipath to use the scsi_dh_rdac driver on CentOS 5.5.
At some point the documentation says:
3 - Add DM-RDAC driver module parameter rdac_blacklist in /etc/modprobe.conf.local to support
RDAC/MPP coexistence.


Can someone please explain what this means?

Then it says:
4- Rebuild RAMdisk. Enable multipathd daemon using the command:
#chkconfig multipathd on
This command will enable multipathd during the boot up.



I did rebuild the initramfs but I did not add anything to it.

Now the multipath -ll command shows this:
[root@srv2 ~]# multipath -ll
olddata (3600c0ff000d5bb36dcf3f45101000000) dm-2 HP,MSA2012fc
[size=952G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:3:0 sde 8:64 [active][ready]

mpath5 (36f01faf000e25618000003ad5934ddfd) dm-1 DELL,MD36xxf
[size=1.2T][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][enabled]
\_ 1:0:0:2 sdb 8:16 [active][ghost]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:2 sdd 8:48 [active][ready]

mpath4 (36f01faf000e25686000003f55934df9e) dm-0 DELL,MD36xxf
[size=662G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][enabled]
\_ 1:0:1:1 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:1 sda 8:0 [active][ghost]

On the Dell array we have two paths per LUN, but only one is ready and the other is ghost. What does this mean?
When we run commands such as lvs or vgs, they sometimes hang for a while before returning the results.
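
To compare the path states at a glance, the output above can be reduced to one line per path. This is only a rough sketch: the function name summarize_paths is made up for illustration, and it assumes the CentOS 5 style layout shown above (an "alias (wwid) dm-N vendor,product" header followed by "\_ h:c:t:l sdX maj:min [state][state]" path lines). As far as I know, multipath-tools generally reports ghost for a path that responds but leads to the passive (non-owning) controller of an active/passive array; treat that reading as an assumption, since nothing in this thread confirms it for the MD3600f.

```shell
#!/bin/sh
# summarize_paths (hypothetical helper): read `multipath -ll` output on stdin
# and print "MAP DEVICE STATE" for every path line it recognises.
summarize_paths() {
    awk '
        # Map header, e.g. "mpath5 (36f01...) dm-1 DELL,MD36xxf": remember the map name.
        $2 ~ /^\([0-9a-f]+\)$/ && $3 ~ /^dm-[0-9]+$/ { map = $1; next }
        # Path line, e.g. "\_ 1:0:0:2 sdb 8:16 [active][ghost]": field 2 is h:c:t:l.
        $2 ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ { print map, $3, $NF }
    '
}

# Example (as root):
#   multipath -ll | summarize_paths
```

Path-group lines such as "\_ round-robin 0 [prio=100][enabled]" are skipped because their second field is not a SCSI address.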

Here is our multipath.conf file:
[root@srv2 ~]# cat /etc/multipath.conf

defaults {
    udev_dir /dev
    # polling_interval 10
    # selector "round-robin 0"
    # path_grouping_policy failover
    # getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    # prio_callout "/bin/true"
    # path_checker tur
    # rr_min_io 100
    # rr_weight uniform
    # failback immediate
    # no_path_retry 12
    user_friendly_names yes
}

# The blacklist section - use this to blacklist a multipath device based on
# it's wwid ( using wwid ) or device names ( using devnode ) or
# vendor and product id ( using device block).

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^raw[[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*"

    device {
        vendor "*"
        product "Universal Xport"
    }
}

multipaths {

    multipath {
        wwid 3600c0ff000d5bb36dcf3f45101000000
        alias olddata
        no_path_retry 10
    }

}

# The devices section - used to define per storage array model settings

devices {

    device {
        vendor "HP"
        product "MSA2[02]12fc|MSA2012i"
        path_grouping_policy multibus
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        path_selector "round-robin 0"
        rr_weight uniform
        path_checker tur
        hardware_handler "0"
        failback immediate
        no_path_retry 18
        rr_min_io 100
    }

    device {
        vendor "DELL"
        product "MD36xxf"
        path_grouping_policy group_by_prio
        #prio rdac
        #polling_interval 5
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
        prio_callout "/sbin/mpath_prio_rdac /dev/%n"
    }
}

As you can see, prio and polling_interval were taken from the Dell documentation, but they were not recognized by the multipath service.
Here is the link to the Dell documentation; it's not a big file: http://www.dell.com/downloads/global/products/pvaul/en/powervault-md-linux-device-manager-installation.pdf
Thanks in advance
Regards

TrevorH
Forum Moderator
Posts: 22600
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Need help with modprobe.conf and multipath

Postby TrevorH » 2017/06/07 11:33:56

First thing to realise is that CentOS 5 is dead and should not be used. It went End of Life at the end of March 2017 and there will be no more updates to it, ever. It's a dead o/s with no future support - you should be planning to migrate to a supported release ASAP. Since CentOS 6 just entered its final support phase and only critical security bugs will be fixed from now on, that really means "use CentOS 7".

There is no DM-RDAC module, but there is a dm-rdac (Linux is case sensitive, so they are not the same thing). The dm-rdac module doesn't take any parameters. In addition, I'm pretty sure that there is no such thing as /etc/modprobe.conf.local either, though there is an /etc/modprobe.conf.

To blacklist the dm-rdac module you would need to add "blacklist dm-rdac" to /etc/modprobe.conf and then rebuild the initrd file.
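
A minimal sketch of that step, written so it can be run more than once without duplicating the line. The function name add_blacklist is made up for illustration; the mkinitrd call in the comments assumes the stock CentOS 5 initrd naming under /boot, so check the file name against your bootloader entry before overwriting anything.

```shell
#!/bin/sh
# add_blacklist FILE MODULE (hypothetical helper):
# append "blacklist MODULE" to FILE unless that exact line is already there.
add_blacklist() {
    conf="$1"
    module="$2"
    # -x matches the whole line, -F treats the pattern literally, -q stays quiet.
    if grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        echo "already present: blacklist $module"
    else
        echo "blacklist $module" >> "$conf"
        echo "added: blacklist $module"
    fi
}

# Example (as root), followed by the initrd rebuild for the running kernel:
#   add_blacklist /etc/modprobe.conf dm-rdac
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```

Keep a copy of the old initrd before rebuilding, so the box can still boot if the new one turns out to be wrong.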

And don't forget, CentOS 5 is dead. Don't use it.

wodel.youchi
Posts: 5
Joined: 2015/04/25 18:16:39

Re: Need help with modprobe.conf and multipath

Postby wodel.youchi » 2017/06/07 12:25:33

Hi and thanks,

For now we don't have much choice: our application is supported only on CentOS 5.

I added the line to modprobe.conf and rebuilt the initramfs, but we still have the same problem.
I started an mkfs on one volume and it was horrible: the creation of the inode index took an eternity.
In the log I got this:

Jun  7 11:51:48 srv2 kernel: sd 1:0:1:1: rdac: AVT mode detected
Jun  7 11:51:48 srv2 kernel: sd 1:0:0:2: rdac: AVT mode detected
Jun  7 11:52:50 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8adb 2002.
Jun  7 11:54:23 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b04 2002.
Jun  7 11:54:23 srv2 kernel: sd 1:0:0:2: timing out command, waited 60s
Jun  7 11:54:23 srv2 kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Jun  7 11:54:23 srv2 multipathd: dellzmbckp: load table [0 2516582400 multipath 2 pg_init_retries 50 1 rdac 2 1 round-robin 0 1 1 8:16 100 round-robin 0 1 1 8:4
Jun  7 11:54:23 srv2 multipathd: dm-1: add map (uevent)
Jun  7 11:54:23 srv2 multipathd: dm-1: devmap already registered
Jun  7 11:54:23 srv2 multipathd: dm-4: add map (uevent)
Jun  7 11:57:19 srv2 kernel: sd 1:0:0:2: rdac: AVT mode detected
Jun  7 11:58:21 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b5d 2002.
Jun  7 11:59:22 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b66 2002.
Jun  7 11:59:23 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b7a 2002.
Jun  7 11:59:24 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b7b 2002.
Jun  7 11:59:25 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b83 2002.
Jun  7 11:59:25 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b85 2002.
Jun  7 11:59:26 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b88 2002.
Jun  7 11:59:26 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b8d 2002.
Jun  7 11:59:27 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b8e 2002.
Jun  7 11:59:28 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b91 2002.
Jun  7 11:59:29 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b93 2002.
Jun  7 11:59:30 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b97 2002.
Jun  7 11:59:31 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8b9a 2002.
Jun  7 12:00:31 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8ba6 2002.
Jun  7 12:00:32 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8bad 2002.
Jun  7 12:00:32 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8bae 2002.
Jun  7 12:00:33 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8bb3 2002.
Jun  7 12:00:34 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8bb6 2002.
Jun  7 12:00:35 srv2 kernel: qla2xxx 0000:06:00.0: scsi(1:0:2): Abort command issued -- 1 8bb7 2002.
Jun  7 12:00:58 srv2 kernel: INFO: task multipathd:3743 blocked for more than 120 seconds.
Jun  7 12:00:58 srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  7 12:00:58 srv2 kernel: multipathd    D ffff810001014f20     0  3743      1          3744  3742 (NOTLB)
Jun  7 12:00:58 srv2 kernel:  ffff8101facfda28 0000000000000082 ffff8101bc2d4c80 ffffffff80154bc4
Jun  7 12:00:58 srv2 kernel:  ffff8101bc2d4c80 0000000000000001 ffff810229229080 ffff81022ff240c0
Jun  7 12:00:58 srv2 kernel:  0000481db9141b3b 00000000000372db ffff810229229268 0000000280154088
Jun  7 12:00:58 srv2 kernel: Call Trace:
Jun  7 12:00:58 srv2 kernel:  [<ffffffff80154bc4>] cfq_remove_request+0xa8/0x104
Jun  7 12:00:58 srv2 kernel:  [<ffffffff80063171>] wait_for_completion+0x79/0xa2
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8008f4db>] default_wake_function+0x0/0xe
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8014cfed>] blk_execute_rq_nowait+0x7e/0x92
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8014d099>] blk_execute_rq+0x98/0xc0
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8002e2f7>] blk_recount_segments+0x17/0x28
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8015069d>] sg_io+0x258/0x356
Jun  7 12:00:58 srv2 kernel:  [<ffffffff80150c22>] scsi_cmd_ioctl+0x1d2/0x3b5
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8000d02a>] do_lookup+0x8f/0x24b
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8000d58c>] dput+0x2c/0x114
Jun  7 12:00:58 srv2 kernel:  [<ffffffff880aa0a5>] :sd_mod:sd_ioctl+0xa3/0xd2
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8014e6e3>] blkdev_driver_ioctl+0x5d/0x72
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8014ed34>] blkdev_ioctl+0x63c/0x697
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8015b227>] snprintf+0x44/0x4c
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8000f491>] __alloc_pages+0x78/0x308
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8011434d>] sysfs_open_file+0x0/0x1e9
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800e947c>] block_ioctl+0x1b/0x1f
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8004250f>] do_ioctl+0x21/0x6b
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800304ba>] vfs_ioctl+0x457/0x4b9
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800bb2ba>] audit_syscall_entry+0x1a8/0x1d3
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8004c924>] sys_ioctl+0x59/0x78
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8005d29e>] tracesys+0xd5/0xdf
Jun  7 12:00:58 srv2 kernel:
Jun  7 12:00:58 srv2 kernel: INFO: task mkfs.ext3:4072 blocked for more than 120 seconds.
Jun  7 12:00:58 srv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  7 12:00:58 srv2 kernel: mkfs.ext3     D ffff810001036520     0  4072   4035                     (NOTLB)
Jun  7 12:00:58 srv2 kernel:  ffff8101ff5bd358 0000000000000086 0000000000000046 0000000000000046
Jun  7 12:00:58 srv2 kernel:  ffff81022ccc5128 0000000000000007 ffff81020e4e50c0 ffff81022fead080
Jun  7 12:00:58 srv2 kernel:  0000481dd52e99f3 0000000001c3bfef ffff81020e4e52a8 000000060001cc00
Jun  7 12:00:58 srv2 kernel: Call Trace:
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8006eda2>] do_gettimeofday+0x40/0x90
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800637de>] io_schedule+0x3f/0x67
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800292df>] get_request_wait+0xe7/0x130
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800a3f9a>] autoremove_wake_function+0x0/0x2e
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8002e2f7>] blk_recount_segments+0x17/0x28
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8000c19d>] __make_request+0x40a/0x4ce
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001c84a>] generic_make_request+0x211/0x228
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828e415>] :dm_mod:__map_bio+0x4a/0x123
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828ef57>] :dm_mod:__split_bio+0x17d/0x3b7
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828f99c>] :dm_mod:dm_request+0x115/0x124
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001c84a>] generic_make_request+0x211/0x228
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828e415>] :dm_mod:__map_bio+0x4a/0x123
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828ef57>] :dm_mod:__split_bio+0x17d/0x3b7
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828f99c>] :dm_mod:dm_request+0x115/0x124
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001c84a>] generic_make_request+0x211/0x228
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828e415>] :dm_mod:__map_bio+0x4a/0x123
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828ef57>] :dm_mod:__split_bio+0x17d/0x3b7
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8828f99c>] :dm_mod:dm_request+0x115/0x124
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001c84a>] generic_make_request+0x211/0x228
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800238a7>] mempool_alloc+0x31/0xe7
Jun  7 12:00:58 srv2 kernel:  [<ffffffff80033698>] submit_bio+0xe6/0xed
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001b017>] submit_bh+0xf4/0x114
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001ca51>] __block_write_full_page+0x1f0/0x2f4
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800ea5b2>] blkdev_get_block+0x0/0x46
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001d5c0>] mpage_writepages+0x1bf/0x37d
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800e94f7>] blkdev_writepage+0x0/0xf
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8005afe8>] do_writepages+0x20/0x2f
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8002fee9>] __writeback_single_inode+0x1a2/0x31c
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8002159c>] sync_sb_inodes+0x1b7/0x271
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800513b2>] writeback_inodes+0x82/0xd8
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800cd9fb>] balance_dirty_pages_ratelimited_nr+0x159/0x2cb
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800102e9>] generic_file_buffered_write+0x562/0x6a9
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001675a>] __generic_file_aio_write_nolock+0x369/0x3b6
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8008eef1>] __activate_task+0x56/0x6d
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8004727b>] try_to_wake_up+0x472/0x484
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800ca99f>] generic_file_aio_write_nolock+0x20/0x6c
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800cad76>] generic_file_write_nolock+0x8f/0xa8
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800a3f9a>] autoremove_wake_function+0x0/0x2e
Jun  7 12:00:58 srv2 kernel:  [<ffffffff800ea9c9>] blkdev_file_write+0xa0/0xc8
Jun  7 12:00:58 srv2 kernel:  [<ffffffff80016b62>] vfs_write+0xce/0x174
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8001742b>] sys_write+0x45/0x6e
Jun  7 12:00:58 srv2 kernel:  [<ffffffff8005d29e>] tracesys+0xd5/0xdf
Jun  7 12:00:58 srv2 kernel:


I rebooted the server, then disabled one controller of the storage array so that only one path was up. I restarted the mkfs, and this time it went smoothly.
There is something wrong with the multipath, but I can't figure it out.
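
To see which SCSI address the aborts pile up on, the log can be summarised with a short pipeline. This is a rough sketch only: the function name count_aborts is made up, and it assumes the qla2xxx message layout shown in the log above ("scsi(...): Abort command issued"). It only counts messages and does not say anything about why the commands are being aborted.

```shell
#!/bin/sh
# count_aborts (hypothetical helper): read syslog lines on stdin and print
# "COUNT ADDRESS" for each scsi(...) address that has abort messages.
count_aborts() {
    grep 'Abort command issued' |
        # Keep only the text between "scsi(" and ")".
        sed -n 's/.*scsi(\([0-9:]*\)).*/\1/p' |
        sort | uniq -c |
        # Normalise the padded output of uniq -c to "COUNT ADDRESS".
        awk '{ print $1, $2 }'
}

# Example (as root):
#   count_aborts < /var/log/messages
```

Comparing the result between the two-controller run and the single-controller run would show whether the aborts are tied to one path.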

Regards.