[SOLVED] Sporadical "connection refused", iptables disabled.

paveltide
Posts: 7
Joined: 2014/12/25 13:52:40

Post by paveltide » 2015/02/10 23:04:55

Good day/evening everybody!

First of all, be warned that there are long listings in this post. I'd put them in spoilers if I knew how. My apologies for that. I hope someone will kindly point out how to make spoilers on this forum.

Something really strange is happening with my iSCSI storage host (hereinafter referred to as HOST), and I hope that someone here may have some ideas regarding my issue.

Here we go:

OS:
CentOS 6.6

NIC (4 ports):
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

Embedded NIC:
0a:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
0b:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

There are 5 different addresses configured on those NICs (one port unused):

4 ports:

IP1 = 192.168.2.70
IP2 = 192.168.2.71
IP3 = 192.168.4.70
IP4 = 192.168.4.118
IP5 = xxx.xxx.xxx.9


- All my infrastructure servers (10 of them) use HOST to store backups.
- Each server is connected to HOST via a dedicated gigabit Ethernet port (network 192.168.2.0), with multipath to both 2.70 and 2.71 (iSCSI)
- Two more servers are connected to HOST via the 192.168.4.0 network, with multipath to both 4.70 and 4.118 (iSCSI)
- One server is connected to HOST via the xxx.xxx.xxx.0 network, single path (iSCSI)

No bonding, no link aggregation, just a separate IP address per port.

PROBLEM:

All connections except ICMP are periodically refused (up and down in a disorderly fashion).
It happens sporadically to servers in my 192.168.2.0 network. Mostly only one address (2.70) refuses connections, but sometimes 2.71 refuses too.

Log example:

Jan 27 15:07:54 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:09 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:16 billing iscsid: connection2:0 is operational after recovery (65 attempts)
Jan 27 15:08:16 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:20 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:21 billing iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)
Jan 27 15:08:23 billing iscsid: connect to 192.168.2.70:3260 failed (Connection refused)
Jan 27 15:08:27 billing iscsid: connect to 192.168.2.71:3260 failed (Connection refused)

When a disconnection happens, it affects all hosts for the same period of time. A disconnection may occur at any time and may last from 5 seconds up to an hour or two.
If I try to ssh via 2.70 during a disconnection, I get refused. However, I can still connect to any of the other 4 IP addresses.
Even if I manage to connect and completely disable iptables, I still cannot connect to 2.70 (or 2.71, depending on which one is affected).
SELinux is disabled.

Also, nothing appears in /var/log/messages on the servers in the 192.168.4.0 and xxx.xxx.xxx.0 networks.

The thing that bothers me a lot: when I run

arp -a

on any server in the 192.168.2.0 network, I get output similar to this:

? (192.168.2.71) at 00:1b:21:93:b6:c2 [ether] on eth1
? (192.168.2.70) at 00:1b:21:93:b6:c2 [ether] on eth1

As you can see, both IP addresses appear to be on the same MAC address, which is not true according to ifconfig on my HOST:

[root@san ~]# ifconfig 
eth2      Link encap:Ethernet  HWaddr 00:1B:21:39:A4:FD  
          inet addr:192.168.4.118  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:91205133 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21374631 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:115259441252 (107.3 GiB)  TX bytes:7323450864 (6.8 GiB)
          Interrupt:17 Memory:f7dc0000-f7de0000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:491981 errors:0 dropped:0 overruns:0 frame:0
          TX packets:491981 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:25007679 (23.8 MiB)  TX bytes:25007679 (23.8 MiB)

q1        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C0  
          inet addr:xxx.xxx.xxx.9  Bcast:xxx.xxx.xxx.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:682775102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:618099916 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:747456210923 (696.1 GiB)  TX bytes:550706136944 (512.8 GiB)

q2        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C1  
          inet addr:192.168.2.70  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13956110706 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2524613095 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:18551919920166 (16.8 TiB)  TX bytes:385578680288 (359.0 GiB)

q3        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C2  
          inet addr:192.168.2.71  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:114665673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23039418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:139815114569 (130.2 GiB)  TX bytes:10093010318 (9.3 GiB)

q4        Link encap:Ethernet  HWaddr 00:1B:21:93:B6:C3  
          inet addr:192.168.4.70  Bcast:192.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6038240387 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4837696120 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5801092392535 (5.2 TiB)  TX bytes:6472500258440 (5.8 TiB)
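
The shared-MAC symptom in the arp -a output above can be spotted mechanically. What follows is only a minimal sketch, assuming the Linux net-tools output format quoted in this post ("? (ip) at mac [ether] on iface"); the function name find_shared_macs is made up for illustration.

```shell
#!/bin/sh
# List MAC addresses that `arp -a` style output maps to more than one IP.
# Reads arp output on stdin; field 2 is "(ip)", field 4 is the MAC.
# Entries without a resolved MAC (e.g. "<incomplete>") are skipped.
find_shared_macs() {
    awk '$3 == "at" && $4 ~ /:/ {
             ip = $2
             gsub(/[()]/, "", ip)
             ips[$4] = (ips[$4] == "" ? ip : ips[$4] " " ip)
             count[$4]++
         }
         END {
             for (mac in count)
                 if (count[mac] > 1)
                     print mac ": " ips[mac]
         }'
}

# Sample copied from the arp -a output quoted in this post.
sample='? (192.168.2.71) at 00:1b:21:93:b6:c2 [ether] on eth1
? (192.168.2.70) at 00:1b:21:93:b6:c2 [ether] on eth1'

printf '%s\n' "$sample" | find_shared_macs
# prints: 00:1b:21:93:b6:c2: 192.168.2.71 192.168.2.70
```

On a live client the same function could be fed with "arp -a | find_shared_macs"; an empty result would mean every resolved MAC maps to a single IP.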

I've already tried replacing the network adapter (that's how I ended up with 4 ports on a single adapter; a month ago I had two embedded and two single-port NICs).

pcie_aspm is set to performance.

No jumbo frames.

Any ideas?

Thank you in advance.

TrevorH
Site Admin
Posts: 33215
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Sporadical "connection refused", iptables disabled.

Post by TrevorH » 2015/02/11 08:52:21

My first thought when I read your post was "duplicate ip addresses" somewhere else on the network so you are not talking to the machine you think you are.

My second thought when I read your hardware list was that you have the 2 most problematic Intel NICs in existence and both have hardware bugs that are sort of worked around in the drivers more or less successfully. The most reliable method of fixing those hardware bugs is to turn off pcie_aspm altogether using pcie_aspm=off.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

paveltide
Posts: 7
Joined: 2014/12/25 13:52:40

Re: Sporadical "connection refused", iptables disabled.

Post by paveltide » 2015/02/13 07:57:03

Hi, Trevor.
TrevorH wrote: My first thought when I read your post was "duplicate ip addresses" somewhere else on the network so you are not talking to the machine you think you are.
I'll double-check it.
TrevorH wrote: ...that you have the 2 most problematic Intel NICs in existence...
Which of them? I've heard that the 82574L may misbehave, so I updated the drivers. Or did you mean that both the 82574L and the 82576 are NICs I should stay away from?

TrevorH
Site Admin
Posts: 33215
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Sporadical "connection refused", iptables disabled.

Post by TrevorH » 2015/02/13 09:02:11

The 82574L cards are problematic and the 82576 ones are not much better :-(

paveltide
Posts: 7
Joined: 2014/12/25 13:52:40

Re: Sporadical "connection refused", iptables disabled.

Post by paveltide » 2015/02/13 11:28:32

Today there was another outage on all servers connected via 192.168.2.0; however, my test node, which is in the 192.168.4.0 network, shows no errors or disconnections at all. I'll leave it as it is for a week, and if no errors occur I'll switch it to the 192.168.2.0 network and see what happens.
TrevorH wrote: turn off pcie_aspm altogether using pcie_aspm=off.
Well,

dmesg | grep PCIe

shows that PCIe ASPM is disabled:

[root@san ~]# dmesg|grep PCIe
PCIe ASPM is disabled
igb 0000:05:00.0: eth4: (PCIe:2.5GT/s:Width x4) 
igb 0000:05:00.1: eth5: (PCIe:2.5GT/s:Width x4) 
igb 0000:07:00.0: eth6: (PCIe:2.5GT/s:Width x4) 
igb 0000:07:00.1: eth7: (PCIe:2.5GT/s:Width x4) 

So I don't think that's the cause.

By the way, there are no duplicates, but I've found something regarding the ARP problem, and now arp -a shows different MAC addresses for all IPs. Here are the topics that helped me:

viewtopic.php?t=7775
viewtopic.php?t=8401
http://g33kinfo.com/info/archives/4356

P.S. Which NICs would you recommend instead of those Intels?

paveltide
Posts: 7
Joined: 2014/12/25 13:52:40

Re: Sporadical "connection refused", iptables disabled.

Post by paveltide » 2015/02/16 07:49:24

This post is written for the sake of an update.

As of today no disconnections have occurred, which I consider to be a good sign, since I usually get them once every two or three days. I'm going to keep watching the situation and will report here whether my solution resolved the issue.

TrevorH
Site Admin
Posts: 33215
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Sporadical "connection refused", iptables disabled.

Post by TrevorH » 2015/02/16 13:20:26

paveltide wrote: P.S. Which NICs would you recommend instead of those Intels?
You picked the only 2 Intel cards I would not recommend.

paveltide
Posts: 7
Joined: 2014/12/25 13:52:40

Re: [SOLVED] Sporadical "connection refused", iptables disab

Post by paveltide » 2015/02/25 14:31:34

Finally, I discovered that the issue occurs when I configure two or more NICs with IPs belonging to the same subnet. That's definitely an ARP issue; I believe I need to play around with the sysctl and ARP settings.
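
For anyone arriving here with the same symptom, this is a minimal sketch of the kind of sysctl ARP settings meant above. It assumes the goal is for each interface to answer ARP only for its own address. The two values are the ones commonly suggested for several NICs on one subnet; they are not settings confirmed in this thread, and the function name arp_sysctl_fragment is made up for illustration. The script only prints the fragment and changes nothing on the system.

```shell
#!/bin/sh
# Print (do not apply) candidate ARP-related sysctl settings for a host
# that has more than one NIC with an address in the same subnet.
arp_sysctl_fragment() {
    cat <<'EOF'
# Reply to an ARP request only if the target IP is configured
# on the interface the request arrived on.
net.ipv4.conf.all.arp_ignore = 1
# When sending ARP, always use the best local source address
# for the target instead of any address on the host.
net.ipv4.conf.all.arp_announce = 2
EOF
}

arp_sysctl_fragment
```

To try it, the printed lines could be appended to /etc/sysctl.conf as root and loaded with "sysctl -p". Checking arp -a on a client afterwards should show whether 2.70 and 2.71 now resolve to different MAC addresses.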

I think that's it for now, topic can be marked as "SOLVED".

Thank you for your advice!

Whoever
Posts: 1361
Joined: 2013/09/06 03:12:10

Re: [SOLVED] Sporadical "connection refused", iptables disab

Post by Whoever » 2015/02/26 06:38:29

paveltide wrote: Finally, I discovered that the issue occurs when I configure two or more NICs with IPs belonging to the same subnet. That's definitely an ARP issue; I believe I need to play around with the sysctl and ARP settings.

I think that's it for now, topic can be marked as "SOLVED".

Thank you for your advice!
Instead of having 4 IP addresses, why not have 2 IP addresses, one for each subnet, and use bonding to pair the Ethernet NICs?
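
A rough sketch of what that could look like on CentOS 6, using the ifcfg file format. It assumes q2 and q3 are the two ports on the 192.168.2.0 network and that active-backup mode is acceptable. The bond name, the mode, and the miimon value are assumptions for illustration and do not come from this thread.

```shell
# --- /etc/sysconfig/network-scripts/ifcfg-bond0 (assumed bond name) ---
DEVICE=bond0
IPADDR=192.168.2.70
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# active-backup needs no switch-side configuration; miimon=100 polls
# link state every 100 ms. Both values are assumptions.
BONDING_OPTS="mode=active-backup miimon=100"

# --- /etc/sysconfig/network-scripts/ifcfg-q2 (repeat for q3 with DEVICE=q3) ---
DEVICE=q2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
```

The slave interfaces carry no IP address of their own, so only one address per subnet is ever announced over ARP. The same pattern would apply to the two ports on the 192.168.4.0 network.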
