First of all, a warning: there are long lists in this post. I'd put them in spoilers if I knew how. My apologies for that. I hope someone will kindly point out how to make spoilers on this forum.
Something really strange is happening with my iSCSI storage host (hereinafter referred to as HOST), and I hope someone here may have some ideas about my issue.
Here we go:
OS:
CentOS 6.6
NIC (4 ports):
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Embedded NIC:
0a:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
0b:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
There are 5 different addresses configured on those NICs (one port unused):
4 ports:
IP1 = 192.168.2.70
IP2 = 192.168.2.71
IP3 = 192.168.4.70
IP4 = 192.168.4.118
IP5 = xxx.xxx.xxx.9
- All my infrastructure servers (10 of them) use HOST to store backups.
- Each server is connected to HOST via a dedicated gigabit Ethernet port (network 192.168.2.0), with multipath to both 2.70 and 2.71 (iSCSI)
- Two more servers are connected to HOST via the 192.168.4.0 network, with multipath to both 4.70 and 4.118 (iSCSI)
- One server is connected to HOST via the xxx.xxx.xxx.0 network, single path (iSCSI)
No bonding, no link aggregation, just a separate IP address per port.
PROBLEM:
All connections except ICMP are periodically refused (they go up and down in no particular order).
It happens to servers in my 192.168.2.0 network sporadically. Mostly only one address (2.70) refuses connections, but sometimes 2.71 refuses too.
Log example:
Jan 27 15:07:54 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:09 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:16 billing iscsid: connection2:0 is operational after recovery (65 attempts)
Jan 27 15:08:16 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:20 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:21 billing iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.70:3260 failed (Connection refused)
Jan 27 15:08:27 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
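To make the pattern in a log like this easier to see, the refused connects can be tallied per portal address. The following is only a minimal sketch: the function name `count_refusals` and the regular expression are assumptions made for illustration, and `SAMPLE` is simply the excerpt above pasted in.

```python
import re
from collections import Counter

# Matches iscsid lines of the form seen in the excerpt, e.g.
#   "... iscsid: connect to 192.168.2.71:3260 failed (Connection refused)"
REFUSED = re.compile(
    r"iscsid: connect to (\d+\.\d+\.\d+\.\d+):\d+ failed \(Connection refused\)"
)

def count_refusals(lines):
    """Return a Counter mapping portal IP -> number of refused connects."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The log excerpt from the post, unchanged.
SAMPLE = """\
Jan 27 15:07:54 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:09 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:16 billing iscsid: connection2:0 is operational after recovery (65 attempts)
Jan 27 15:08:16 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:20 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:21 billing iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.70:3260 failed (Connection refused)
Jan 27 15:08:27 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
"""

if __name__ == "__main__":
    # Print the portals with the most refusals first.
    for ip, n in count_refusals(SAMPLE.splitlines()).most_common():
        print(ip, n)
```

Run against a full /var/log/messages from one of the clients, the same tally would show whether the refusals really concentrate on one portal or are spread across both.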
If I try to ssh via 2.70 during the disconnection, I get refused. However, I can still connect to any of the other 4 IP addresses.
Even if I manage to connect and totally disable iptables, I still cannot connect to 2.70 (or 2.71, it depends).
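The manual ssh test described above could be repeated for every address and port in one go. This is a rough sketch under stated assumptions: the helper names `classify`, `probe` and `probe_all` are made up for illustration, the target list is just the four unmasked addresses from this post, and ports 22 and 3260 are the ssh and iSCSI ports mentioned here.

```python
import socket

# Addresses taken from the post; the masked xxx.xxx.xxx.9 is left out.
TARGETS = ["192.168.2.70", "192.168.2.71", "192.168.4.70", "192.168.4.118"]
PORTS = [22, 3260]  # ssh and the iSCSI portal port

def classify(exc):
    """Turn the outcome of a connect attempt into a short label."""
    if exc is None:
        return "open"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"  # the peer answered with a TCP RST
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"  # no answer at all within the timeout
    return "error: %s" % exc

def probe(host, port, timeout=3.0):
    """Try one TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify(None)
    except OSError as exc:
        return classify(exc)

def probe_all(targets=TARGETS, ports=PORTS):
    """Probe every target/port pair and return {(host, port): label}."""
    return {(h, p): probe(h, p) for h in targets for p in ports}
```

Calling `probe_all()` from a client during an outage would show at a glance which addresses answer "refused" (something replied with a reset) as opposed to "timeout" (nothing replied), which is the distinction the ssh test hints at.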
SELinux is disabled.
Also, nothing appears in /var/log/messages on the servers in the 192.168.4.0 and xxx.xxx.xxx.0 networks.
The thing that bothers me a lot is what I get when I issue:
arp -a
? (192.168.2.71) at 00:1b:21:93:b6:c2 [ether] on eth1
? (192.168.2.70) at 00:1b:21:93:b6:c2 [ether] on eth1
ifconfig
[root@san ~]# ifconfig
eth2      Link encap:Ethernet  HWaddr 00:1B:21:39:A4:FD
          inet addr:192.168.4.118  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:91205133 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21374631 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:115259441252 (107.3 GiB)  TX bytes:7323450864 (6.8 GiB)
          Interrupt:17 Memory:f7dc0000-f7de0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:491981 errors:0 dropped:0 overruns:0 frame:0
          TX packets:491981 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:25007679 (23.8 MiB)  TX bytes:25007679 (23.8 MiB)

q1        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C0
          inet addr:xxx.xxx.xxx.9  Bcast:xxx.xxx.xxx.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:682775102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:618099916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:747456210923 (696.1 GiB)  TX bytes:550706136944 (512.8 GiB)

q2        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C1
          inet addr:192.168.2.70  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13956110706 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2524613095 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18551919920166 (16.8 TiB)  TX bytes:385578680288 (359.0 GiB)

q3        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C2
          inet addr:192.168.2.71  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:114665673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23039418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:139815114569 (130.2 GiB)  TX bytes:10093010318 (9.3 GiB)

q4        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C3
          inet addr:192.168.4.70  Bcast:192.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6038240387 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4837696120 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5801092392535 (5.2 TiB)  TX bytes:6472500258440 (5.8 TiB)
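The comparison between the two outputs above can also be done mechanically: take each IP/MAC pair from `arp -a` and check it against the HWaddr of the interface that actually holds that IP in `ifconfig`. The sketch below is only an illustration; the function names and regular expressions are assumptions, and the sample strings are trimmed copies of the q2 and q3 stanzas and the two ARP entries shown in this post.

```python
import re

ARP_ENTRY = re.compile(r"\((\d+\.\d+\.\d+\.\d+)\) at ([0-9A-Fa-f:]{17})")
HWADDR = re.compile(r"HWaddr\s+([0-9A-Fa-f:]{17})")
INET = re.compile(r"inet addr:(\S+)")

def parse_arp(text):
    """Map IP -> MAC as reported by `arp -a`."""
    return {ip: mac.lower() for ip, mac in ARP_ENTRY.findall(text)}

def parse_ifconfig(text):
    """Map IP -> MAC of the interface that is configured with that IP."""
    owners = {}
    mac = None
    for line in text.splitlines():
        if "Link encap:" in line:
            # A new interface stanza starts; loopback has no HWaddr.
            hw = HWADDR.search(line)
            mac = hw.group(1).lower() if hw else None
        else:
            inet = INET.search(line)
            if inet and mac:
                owners[inet.group(1)] = mac
    return owners

def mismatches(arp, owners):
    """List (ip, mac_seen_in_arp, mac_of_owning_interface) where they differ."""
    return sorted(
        (ip, seen, owners[ip])
        for ip, seen in arp.items()
        if ip in owners and owners[ip] != seen
    )

# Trimmed copies of the outputs from the post.
ARP_SAMPLE = """\
? (192.168.2.71) at 00:1b:21:93:b6:c2 [ether] on eth1
? (192.168.2.70) at 00:1b:21:93:b6:c2 [ether] on eth1
"""
IFCONFIG_SAMPLE = """\
q2 Link encap:Ethernet HWaddr 00:1B:21:93:B6:C1
inet addr:192.168.2.70 Bcast:192.168.2.255 Mask:255.255.255.0
q3 Link encap:Ethernet HWaddr 00:1B:21:93:B6:C2
inet addr:192.168.2.71 Bcast:192.168.2.255 Mask:255.255.255.0
"""

if __name__ == "__main__":
    for ip, seen, owner in mismatches(parse_arp(ARP_SAMPLE),
                                      parse_ifconfig(IFCONFIG_SAMPLE)):
        print("%s: ARP says %s, but the interface has %s" % (ip, seen, owner))
```

With the samples above, the only mismatch reported is 192.168.2.70, which ARP resolves to the MAC of q3 (B6:C2) even though the address is configured on q2 (B6:C1).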
I've already tried replacing the network adapter (that's how I ended up with 4 ports on a single adapter; a month ago I had two embedded and two single-port NICs).
pcie_aspm is set to performance.
No jumbo frames.
Any ideas?
Thank you in advance.