InfiniBand

InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its features include high throughput, low latency, quality of service and failover, and it is designed to be scalable. The InfiniBand architecture specification defines a connection between processor nodes and high-performance I/O nodes such as storage devices.

Kernel
Users of Mellanox hardware MSX6710, MSX8720, MSB7700, MSN2700, MSX1410, MSN2410, MSB7800, MSN2740, and MSN2100 need at least kernel 4.9.

The following kernel options must be activated:
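The exact set of symbols depends on the adapter in use; a sketch for the Mellanox ConnectX family (the mlx4/mlx5 drivers), covering the InfiniBand core, userspace access, and IPoIB with connected mode, might look like:

    CONFIG_INFINIBAND=y
    CONFIG_INFINIBAND_USER_MAD=y
    CONFIG_INFINIBAND_USER_ACCESS=y
    CONFIG_INFINIBAND_IPOIB=y
    CONFIG_INFINIBAND_IPOIB_CM=y
    CONFIG_MLX4_INFINIBAND=y
    CONFIG_MLX5_INFINIBAND=y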

Modules
If built as modules, some of them do not necessarily get loaded automatically. The following file is an example for systemd systems.
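A sketch of such a file, for example /etc/modules-load.d/infiniband.conf; the module list here is an assumption for mlx5 hardware and should be adjusted to whatever was actually built as modules:

    # core InfiniBand and IPoIB modules to load at boot
    mlx5_ib
    ib_umad
    ib_uverbs
    ib_ipoib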

Also make sure the module loading systemd service is enabled:
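On systemd systems the relevant unit is systemd-modules-load.service, which reads /etc/modules-load.d/*.conf at boot; its state can be checked with:

    systemctl status systemd-modules-load.service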

IP over InfiniBand (IPoIB)
Users of InfiniBand can also use it to carry IP networking packets, allowing it to replace Ethernet in some cases. From at least kernel version 4.10 onwards, IP over InfiniBand can be compiled in-kernel ( CONFIG_INFINIBAND_IPOIB ). If the support is instead compiled as a kernel module, the module may not be loaded automatically. The module is named ib_ipoib:
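A quick way to load and verify it by hand, assuming the module support was built as described above:

    # load the IPoIB module and confirm it is present
    modprobe ib_ipoib
    lsmod | grep ib_ipoib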

InfiniBand network interfaces are usually named ib0, ib1, and so on:
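They can be inspected with the usual iproute2 tools, for example:

    # show the first InfiniBand interface, assuming it is named ib0
    ip link show ib0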

Performance tuning
When using IP over InfiniBand, performance is usually low by default. This is because the default MTU of each InfiniBand IP interface is set to a low value: in the default datagram mode the MTU is typically limited to 2044 bytes, while connected mode allows an MTU of up to 65520 bytes.

Automatic
The most convenient way is to change the MTU automatically when the kernel adds the interface to the system. Before the MTU can be changed, the mode of the interface must be changed from 'datagram' to 'connected'. The next example uses udev rules to accomplish that.
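A sketch of such a rule, assuming a rules file such as /etc/udev/rules.d/90-infiniband.rules (the file name and the 65520 byte MTU are choices, not requirements):

    # switch newly added ib* interfaces to connected mode, then raise the MTU
    ACTION=="add", SUBSYSTEM=="net", KERNEL=="ib*", ATTR{mode}="connected", ATTR{mtu}="65520"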

It has been reported that the rule above does not work. Users with this problem may use the following instead:
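One possible alternative is to let udev run the equivalent shell commands instead of writing the sysfs attributes directly; this is only a sketch under the same MTU assumption:

    # fall back to explicit sysfs writes when the ATTR assignments are not applied
    ACTION=="add", SUBSYSTEM=="net", KERNEL=="ib*", RUN+="/bin/sh -c 'echo connected > /sys/class/net/%k/mode && echo 65520 > /sys/class/net/%k/mtu'"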

Manual
Mode and MTU can also be changed manually at runtime. The next commands assume the interface in question is named ib0.
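A sketch of the manual equivalent of the udev rule above:

    # switch ib0 to connected mode, then raise its MTU
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520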

Performance testing
There are several ways to test InfiniBand performance. When IP over InfiniBand has been set up properly, users can use normal network performance testing tools for IP networks like net-misc/iperf. Most users, however, may want to use sys-fabric/qperf since it is capable of testing RDMA performance too. For example, NFS can utilize RDMA in conjunction with IP networking.

sys-fabric/qperf
Start qperf on one of the nodes; this node then acts as a server:
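Running qperf without any arguments starts it in server mode, waiting for clients:

    qperf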

Run tests on any client that has a connection to the node(s) now running in server mode. The following example runs TCP, UDP and some RDMA performance tests and assumes the qperf server node has an IP address of 10.0.10.1:
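A sketch of such a run; the selection of test names is only an example, and other tests listed by qperf's help output can be added:

    # TCP/UDP bandwidth and latency, plus RDMA read/write bandwidth over a reliable connection
    qperf 10.0.10.1 tcp_bw tcp_lat udp_bw udp_lat rc_rdma_write_bw rc_rdma_read_bw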

After the needed tests are complete, stop the server process(es) simply by hitting Ctrl+c on the terminal where the server side runs.

net-misc/iperf
iperf, like qperf, needs a listening side to be started before actual performance testing. The simplest way to start an iperf server process is:
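With no further options, iperf listens on its default port (5001 for the classic iperf 2):

    # start iperf in server mode
    iperf -s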

However, to reduce the amount of statistics output on the server side, users may want to limit the interval of the status updates. The next example sets the interval to 10 seconds instead of the 1 second default. Alternatively, all normal output can be suppressed so that only errors are displayed:
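A sketch of both variants; suppressing output via a shell redirection is a simple approach assumed here rather than a dedicated iperf option:

    # report statistics every 10 seconds instead of the default
    iperf -s -i 10

    # discard normal output; errors still appear on stderr
    iperf -s > /dev/null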

Next, run the client side, again assuming the server has an IP address of 10.0.10.1:
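A sketch matching the 256 gigabyte transfer discussed below; whether -n accepts a G suffix can vary between iperf versions, so the exact spelling of the byte count is an assumption:

    # transfer 256 GB to the server and report the achieved bandwidth
    iperf -c 10.0.10.1 -n 256G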

The command above really puts the IP connection under stress by transferring 256 gigabytes of data, which took over six minutes in this example. Users who just want a quick look at the bandwidth may run the following:
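A sketch of such a shorter run, with the same caveat about the size suffix:

    # transfer 25 GB, printing statistics every 5 seconds
    iperf -c 10.0.10.1 -n 25G -i 5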

... which transfers 25 gigabytes with a 5-second interval between statistics reports.

Refer to the iperf man page for more options:
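Assuming the man page is installed along with the package:

    man iperf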

After the needed tests are complete, stop the server process(es) simply by hitting Ctrl+c on the terminal where the server side runs.