[SOLVED] IPoIB performance loss

flewis@panasas.com
Posts: 9
Joined: 2015/03/10 15:17:11

[SOLVED] IPoIB performance loss

Post by flewis@panasas.com » 2015/03/10 18:28:18

Hello CentOS Networking folks,

Our company has an InfiniBand Router (IBR) product. The CentOS version on it was upgraded from 6.3 to 6.4.
The Mellanox OFED was upgraded to 2.3-2.0.5 and the Intel 10G (ixgbe) driver was upgraded to 3.23.2.1.

[root@ibr2-1A-24G ~]# ethtool -i bond0
driver: bonding
version: 3.6.0
firmware-version: 2
bus-info:

[root@ibr2-1A-24G ~]# ethtool -i eth2
driver: ixgbe
version: 3.23.2.1
firmware-version: 0x800000cb
bus-info: 0000:05:00.0

[root@ibr2-1A-24G ~]# dmesg | grep Mellan
mlx4_core: Mellanox ConnectX core driver v2.3-2.0.5 (Jan 1 2015)
mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.3-2.0.5 (Jan 1 2015)
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.3-2.0.5 (Jan 1 2015)

An Rx performance reduction of about 20% has been observed on the bonded Ethernet slaves on the 6.4 install versus the 6.3 install. Tx performance looks good on both 6.3 and 6.4. The IBR uses IPoIB.
The 6.3 install uses the OFED v2.3-2.0.5 build for 6.3, while the 6.4 install uses the build for 6.4. I compiled and installed ixgbe 3.23.2.1 separately for both 6.3 and 6.4, with LRO support removed as per the Intel README, since the IBR is used to route traffic between the InfiniBand and Ethernet networks.
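For completeness, here is roughly how I double-check the offload state on the slaves after loading the rebuilt driver (eth2 is one of our slaves; the second slave name is just a placeholder for whatever yours is called):

ethtool -k eth2           # confirm LRO shows as off on each bond slave
ethtool -k eth3           # second slave (placeholder name)
ethtool -K eth2 lro off   # belt and braces: force LRO off at runtime if the build still includes it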

The routers have a quad-rate (QDR) 40 Gb/s InfiniBand interface and two bonded 10 Gb/s Intel Ethernet interfaces.

HPC Cluster <----> ib0[ IBR using IPoIB ]bond0 <====> Ethernet-Network
<================================= Traffic Direction =================

So the performance issue is seen on 6.4 with traffic arriving at the Ethernet slaves in bond0 and exiting ib0. The other direction looks fine.
The issue is not seen on 6.3.

I disabled irqbalance via "killall irqbalance" and manually set the IRQ affinities to
evenly distribute the IRQs across the 8 CPUs in our Westmere-based installation.
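In case anyone wants to reproduce the setup, the pinning was along these lines (the IRQ number and mask below are placeholders; the real vector numbers come from /proc/interrupts on each box):

killall irqbalance
grep -E 'eth2|eth3|mlx4|ib' /proc/interrupts   # list the MSI-X vectors (names vary by driver)
echo 4 > /proc/irq/98/smp_affinity             # e.g. pin IRQ 98 to CPU 2 (mask 0x04)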

I dumped the sysctls and noticed that the default values for net.core.wmem_max and
net.core.rmem_max had changed. I tweaked the settings to be equal across all of the CentOS
systems participating in the test (per the Red Hat 6.4 Technical Notes, Section 5.10 Kernel),
but it made no difference.
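The change itself was just of this form (the 16 MB value is only what we used for the test, not a recommendation):

sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl net.core.rmem_max net.core.wmem_max     # verify on every host in the test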

I monitored the IBR via "mpstat -P ALL 5" and none of the CPUs was totally pinned.
They were generally 75+% idle in the most intensive test (iperf with 8 threads).

"ethtool -S" does not reveal any drops, pause frames (xon/xoff), errors, etc.
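What I ran was roughly the following on each bond slave and on ib0 (the second slave name is a placeholder):

ethtool -S eth2 | egrep -i 'drop|err|pause|xon|xoff'
ethtool -S eth3 | egrep -i 'drop|err|pause|xon|xoff'
cat /proc/net/bonding/bond0                    # slave state and failure counters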

Since the same OFED and ixgbe versions were used on both 6.3 and 6.4, the performance issue is seen only on 6.4, and the configurations (ethtool settings, sysctls, etc.) are otherwise the same,
my hypothesis is that it is due to some difference between the 6.3 and 6.4 kernels.

I've tried tweaking various settings as per the Linux performance tuning guides, with only marginal improvements.

My research has come up empty. Any info will be appreciated.

Thanks,
-Fred
Last edited by flewis@panasas.com on 2015/05/07 14:54:19, edited 2 times in total.

gerald_clark
Posts: 10642
Joined: 2005/08/05 15:19:54
Location: Northern Illinois, USA

Re: ixgbe 10GE bonded Rx performance loss from 6.3 to 6.4

Post by gerald_clark » 2015/03/11 01:35:17

Current CentOS is 6.6.

flewis@panasas.com
Posts: 9
Joined: 2015/03/10 15:17:11

Re: ixgbe 10GE bonded Rx performance loss from 6.3 to 6.4

Post by flewis@panasas.com » 2015/03/11 20:38:13

Hi Gerald, thanks for responding.
I tried 6.5 and 6.6 and saw the same issue. Since the issue first showed up in our application on 6.4, I thought it best to describe it with respect to 6.4.
-Fred

gerald_clark
Posts: 10642
Joined: 2005/08/05 15:19:54
Location: Northern Illinois, USA

Re: ixgbe 10GE bonded Rx performance loss from 6.3 to 6.4

Post by gerald_clark » 2015/03/11 20:42:09

We don't troubleshoot old unsupported versions.
6.6 is your 6.4 with security and bug fixes applied, and is the supported version.

flewis@panasas.com
Posts: 9
Joined: 2015/03/10 15:17:11

Re: ixgbe 10GE bonded Rx performance loss from 6.3 to 6.4

Post by flewis@panasas.com » 2015/03/12 10:41:50

Okay, thanks for that info.

flewis@panasas.com
Posts: 9
Joined: 2015/03/10 15:17:11

Re: ixgbe 10GE bonded Rx performance loss from 6.3 to 6.4

Post by flewis@panasas.com » 2015/04/17 15:01:14

Hi CentOS networking folks,

So here is the latest on this issue. From what I can tell, it appears that IPoIB is causing memcpy/__pskb_pull_tail to get called excessively.
I pulled the following perf trace from a 6.6-based install; the 6.4-based results look very similar. I turned irqbalance off (killall irqbalance), isolated the ib0 Rx/Tx interrupts (via affinities)
to CPU 6 (Intel Westmere), and ran mpstat: the CPU 6 softirq % is up significantly over the 6.3-based install (roughly 17% on 6.3 vs 88% on 6.6). The output of "tc -s qdisc ls" also shows significantly
more requeue events for ib0 after 6.3. Based on the function comment (see below), __pskb_pull_tail is not expected to be called often.
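For reference, the capture steps were roughly as follows (the CPU number and device names match our box; the IRQ number is a placeholder):

killall irqbalance
echo 40 > /proc/irq/75/smp_affinity    # pin the ib0 vector(s) to CPU 6 (mask 0x40); IRQ numbers from /proc/interrupts
mpstat -P ALL 5                        # watch per-CPU softirq % while iperf runs
/usr/bin/perf record -C 6 -g           # sample CPU 6 during the run; Ctrl-C to stop
perf report
tc -s qdisc ls dev ib0                 # requeue counter on the IPoIB qdisc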

If anyone has any ideas, suggestions, etc., I'd be glad to hear of them.
I haven't had any luck in finding any obvious changes to default settings, etc.
It doesn't look like a simple performance tuning issue to me, but I'd very much welcome being
wrong on that. :}

# ========
# captured on: Fri Apr 17 01:57:24 2015
# hostname : ibr2b
# os release : 2.6.32-504.el6.x86_64
# perf version : 2.6.32-358.el6.panasas.x86_64.debug
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
# cpuid : GenuineIntel,6,44,2
# total memory : 6113692 kB
# cmdline : /usr/bin/perf record -C 6 -g
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, id = { 10 }
# sibling cores : 0-7
# sibling threads : 0,4
# sibling threads : 1,5
# sibling threads : 2,6
# sibling threads : 3,7
# node0 meminfo : total = 6282168 kB, free = 5144308 kB
# node0 cpu list : 0-7
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 948277054400
#
# Overhead Command Shared Object Symbol
# ........ .............. ..................... .................................................
#
24.23% :1 [kernel.kallsyms] [k] memcpy
|
--- memcpy
|
|--95.19%-- __pskb_pull_tail
| dev_hard_start_xmit
| sch_direct_xmit
| __qdisc_run
| |
| |--99.98%-- net_tx_action
| | __do_softirq
| | call_softirq
| | do_softirq
| | irq_exit
| | |
| | |--99.98%-- do_IRQ
| | | ret_from_intr
| | | |
| | | |--88.87%-- cpuidle_idle_call
| | | | cpu_idle
| | | | start_secondary
| | | |
| | | |--5.03%-- thread_return
| | | | cpu_idle
| | | | start_secondary
| | | |
| | | |--4.91%-- cpu_idle
| | | | start_secondary
| | | |
| | | |--1.19%-- start_secondary
| | | --0.00%-- [...]
| | --0.02%-- [...]
| --0.02%-- [...]
|
|--4.05%-- ipoib_poll
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| |
| |--99.99%-- do_IRQ
| | ret_from_intr
| | |
| | |--88.15%-- cpuidle_idle_call
| | | cpu_idle
| | | start_secondary
| | |
| | |--5.68%-- thread_return
| | | cpu_idle
| | | start_secondary
| | |
| | |--4.89%-- cpu_idle
| | | start_secondary
| | |
| | --1.27%-- start_secondary
| --0.01%-- [...]
|
|--0.51%-- ip_output
| ip_forward_finish
| ip_forward
| ip_rcv_finish
| ip_rcv
| __netif_receive_skb
| netif_receive_skb
| ipoib_cm_handle_rx_wc
| ipoib_poll
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| irq_exit
| do_IRQ
| ret_from_intr
| |
| |--88.60%-- cpuidle_idle_call
| | cpu_idle
| | start_secondary
| |
| |--5.76%-- cpu_idle
| | start_secondary
| |
| |--4.66%-- thread_return
| | cpu_idle
| | start_secondary
| |
| --0.97%-- start_secondary
          --0.24%-- [...]

<snip>


/* Moves tail of skb head forward, copying data from fragmented part,
* when it is necessary.
* 1. It may fail due to malloc failure.
* 2. It may change skb pointers.
*
* It is pretty complicated. Luckily, it is called only in exceptional cases.
*/
unsigned char *__pskb_pull_tail(struct sk_buff *skb, int delta)
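Since the memcpy calls come in through dev_hard_start_xmit, my working guess is that the forwarded skbs are being linearized before they are handed to ib0. I have not confirmed that this is the mechanism, but if it is, the feature flags on the IPoIB netdev would be worth comparing between 6.3 and 6.4+:

ethtool -k ib0    # scatter-gather / offload state on the IPoIB device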


Any info will be appreciated.

Thanks,
-Fred

flewis@panasas.com
Posts: 9
Joined: 2015/03/10 15:17:11

[SOLVED] IPoIB performance loss

Post by flewis@panasas.com » 2015/05/07 14:52:02

This was resolved by using the "stock" OFED that ships with CentOS 6.6, plus some tweaks to the affinity settings.
Performance was still about 7% lower than on 6.3, but within acceptable limits.
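For anyone who hits the same thing, the switch back to the distro InfiniBand stack was essentially the following (the uninstall script ships with MLNX_OFED and its exact path may differ by release; package and group names are from memory):

/usr/sbin/ofed_uninstall.sh             # remove the Mellanox OFED packages
yum groupinstall "Infiniband Support"   # stock CentOS 6.6 IB stack, including IPoIB
yum install infiniband-diags perftest
chkconfig rdma on
reboot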
