User:Zucca/wip/InfiniBand/resources


Resources for improving the InfiniBand article on the Gentoo wiki

If anyone has the urge or need to improve the article, I list some resources here, both as a reminder for myself and for others to use.

TODOs
This article has some todo items:
  • The pictures section may need proper images of the different ways of securing a cable. There are at least three different physical ways on the CX4 connector alone. CX12 should have the same set, but what about active (optical) cables?
  • Systemd service file
    • Systemd locks up the system if Conflicts= is uncommented. No fix yet...
    • Maybe use awk instead of "while read"?
  • The whole connector section needs a complete rewrite. There's no need to separate active and passive cables. Do active cables even exist?

What is this InfiniBand anyway? And how does it compare to Ethernet?

InfiniBand is a high-speed serial computer bus, intended for both internal and external connections. It is the result of merging two competing designs, Future I/O, developed by Compaq, IBM, and Hewlett-Packard, with Next Generation I/O (ngio), developed by Intel, Microsoft, and Sun Microsystems. From the Compaq side, the roots were derived from Tandem's ServerNet. For a short time before the group came up with a new name, InfiniBand was called System I/O.

Ethernet: A computer network cabling system designed by Xerox in the late 1970s. Originally transmission rates were 3 Megabits per second (Mb/s) over thick coaxial cable. Media today include fiber, twisted-pair (copper), and several coaxial cable types. Rates are up to 10 Gigabits per second or 10,000 Mb/s.[1]

See also: http://www.informatix-sol.com/docs/EthernetvInfiniBand.pdf

Connector and cable types

Passive cables

Passive cables do not contain any electronics, only wires. The maximum length at Double Data Rate (DDR) is 10 meters, but 7 to 7.5 meters is usually the longest being sold. Passive CX-type connectors come in at least three different physical widths: 4X, 8X and 12X, each representing the number of lanes in the connection. Adding more lanes naturally adds more bandwidth, but the cables get bulkier and the connectors bigger, as more wires need to be carried. In addition to the width, there are at least three ways these connectors can be attached to the HCA:

  • Pull-latch
  • Push-latch
  • Thumbscrew
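
As a rough illustration of how lane count scales bandwidth, here is a minimal sketch. The per-lane signalling rates (2.5, 5 and 10 Gb/s for SDR, DDR and QDR) and the 8b/10b encoding overhead come from general InfiniBand documentation, not from this page, and the function name ib_link_rate is made up for this example.

```shell
#!/bin/sh
# Sketch: signalling rate and usable data rate of an InfiniBand link.
# Assumption: SDR/DDR/QDR use 8b/10b encoding, so data rate = 80 % of signalling rate.
# Rates are kept in tenths of Gb/s so plain integer arithmetic is enough.

ib_link_rate() {
    lanes=$1   # link width: 4, 8 or 12
    rate=$2    # sdr, ddr or qdr
    case "$rate" in
        sdr) per_lane_x10=25 ;;    # 2.5 Gb/s per lane
        ddr) per_lane_x10=50 ;;    # 5 Gb/s per lane
        qdr) per_lane_x10=100 ;;   # 10 Gb/s per lane
        *) echo "unknown rate: $rate" >&2; return 1 ;;
    esac
    raw_x10=$((lanes * per_lane_x10))
    data_x10=$((raw_x10 * 8 / 10))
    printf '%sX %s: %d.%d Gb/s signalling, %d.%d Gb/s data\n' \
        "$lanes" "$rate" \
        $((raw_x10 / 10)) $((raw_x10 % 10)) \
        $((data_x10 / 10)) $((data_x10 % 10))
}
```

For example, `ib_link_rate 4 ddr` describes the common 4X DDR link, and `ib_link_rate 12 ddr` shows what the wider connector buys.
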
Active cables
:( TODO

Setting up

Users of Mellanox InfiniBand hardware probably need kernel 4.9 or newer. [2]

https://software.intel.com/en-us/articles/enabling-ip-over-infiniband-on-the-intel-xeon-phi-coprocessor

http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html Setting up IB

http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-5.html IPoIB

http://www.shocksolution.com/2012/12/installing-and-configuring-infiniband-on-a-red-hat-system/

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_IPoIB.html IPoIB

http://www.davidhunt.ie/enabling-infiniband-on-ububtu-10-10/ & http://www.davidhunt.ie/infiniband-at-home-10gb-networking-on-the-cheap/

And of course the forum topic. ;) If you have success stories, tell them there. :)

If you want to contribute a resource link here, paste it on the talk page or on the forums (link above).

Udev rules

FILE /etc/udev/rules.d/2-InfiniBand.rules udev rules for InfiniBand IP networking interfaces
# Rules to set InfiniBand device attributes.

# Change mode to connected and change mtu to maximum for best performance.
# This does not work for some reason. Why?
#ACTION=="add", KERNEL=="ib[0-9]*", SUBSYSTEM=="net", ATTR{mode}="connected", ATTR{mtu}="65520"
# ... instead we'll need some sh hack to pull this off. Strange...
ACTION=="add", KERNEL=="ib[0-9]*", SUBSYSTEM=="net", RUN+="/bin/sh -c 'echo connected > /sys/class/net/%k/mode && echo 65520 > /sys/class/net/%k/mtu'"
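
To check whether the rule above actually took effect, the same sysfs attributes it writes can be read back. This is only a sketch: the function name ipoib_status and its optional directory argument (handy for trying it against a fake tree) are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: print the IPoIB mode and MTU of every ib* interface.
# The optional argument replaces /sys/class/net, e.g. for testing.

ipoib_status() {
    sysfs_net=${1:-/sys/class/net}
    for dev in "$sysfs_net"/ib[0-9]*; do
        # Skip non-matching globs and interfaces without a mode attribute.
        [ -r "$dev/mode" ] || continue
        printf '%s mode=%s mtu=%s\n' \
            "${dev##*/}" "$(cat "$dev/mode")" "$(cat "$dev/mtu")"
    done
}
```

Running `ipoib_status` after boot should list each interface with mode=connected and mtu=65520 if the rule worked. No output means no IPoIB interface was found.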

Needed modules

If the related InfiniBand drivers have been compiled as kernel modules it may happen that not all required modules are loaded.

systemd

Systems with systemd may use this as a starting point. Note, however, that it includes all of the most common modules, so trim it to what your hardware needs.

FILE /etc/modules-load.d/InfiniBand.conf modules-load.d InfiniBand modules
# Some necessary modules for InfiniBand

ib_core
# The next one is for certain Mellanox hardware. Pick the appropriate one for your hardware.
ib_mthca
ib_cm
ib_ipoib
ib_uverbs
# umad is recommended, not mandatory, unless you explicitly use programs that need libibumad.
ib_umad
# For RDMA. You probably want these three if you have InfiniBand hardware, unless using IP-over-IB only.
rpcrdma
rdma_cm
rdma_ucm
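
To see whether everything listed in such a file really got loaded, the list can be compared against /proc/modules. A minimal sketch follows. The function name missing_modules is made up here, and it assumes module names are written with underscores, as they appear in /proc/modules. Drivers built into the kernel never show up there, so they would be reported as missing.

```shell
#!/bin/sh
# Sketch: list modules named in a modules-load.d style file that are not loaded.
# Arguments: the config file, and optionally a stand-in for /proc/modules.

missing_modules() {
    conf=$1
    loaded=${2:-/proc/modules}
    # Drop comment lines (# or ;) and blank lines, then check each module name.
    grep -vE '^[[:space:]]*(#|;|$)' "$conf" | while read -r mod; do
        # /proc/modules has one module per line with the name in column one.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$loaded"; then
            echo "$mod"
        fi
    done
}
```

For example, `missing_modules /etc/modules-load.d/InfiniBand.conf` prints nothing when every listed module is loaded.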

OpenRC

FILE /etc/conf.d/modules OpenRC - InfiniBand modules
modules="ib_core ib_cm ib_ipoib ib_uverbs ib_umad rpcrdma rdma_cm rdma_ucm"

Refer to the systemd section for more information about the modules themselves.

This needs actual testing. I haven't yet rebooted my server so... :P --Zucca (talk) 21:25, 11 May 2017 (UTC)

NFS over RDMA

https://www.openfabrics.org/images/eventpresos/workshops2015/DevWorkshop/Tuesday/tuesday_09_lever.pdf

https://www.openfabrics.org/images/eventpresos/2017presentations/204_LinuxNFS_CLever.pdf

Warning
In some cases NFS-over-RDMA mounts can be mounted only once per boot. This is especially a problem on desktop PCs that are put to sleep rather than powered off normally.

Users of NFS and InfiniBand can significantly reduce CPU load by utilizing NFS over RDMA.

/etc/fstab

Client side
FILE /etc/fstab fstab on client side
# Photo share
10.0.11.1:/DCIM                 /home/zucca/DCIM        nfs     rw,_netdev,rdma,port=20049,noauto,x-systemd.automount

# Public share
10.0.11.1:/pub                  /srv/pub                nfs     ro,_netdev,rdma,port=20049,noauto,x-systemd.automount

Users who don't use systemd may drop the x-systemd.automount option and use their preferred automounter.
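
Before committing an entry to fstab it can help to try the same options with a one-off mount. The sketch below only prints the command instead of running it. The helper name nfs_rdma_mount_cmd is made up for this example, and the options are simply the rdma and port=20049 ones from the fstab lines above.

```shell
#!/bin/sh
# Sketch: print a manual mount command using the RDMA options from fstab.
# Nothing is mounted; copy the printed line and run it as root if it looks right.

nfs_rdma_mount_cmd() {
    server=$1        # e.g. 10.0.11.1
    export_path=$2   # e.g. /pub
    mountpoint=$3    # e.g. /srv/pub
    access=${4:-rw}  # rw (default) or ro
    printf 'mount -t nfs -o %s,rdma,port=20049 %s:%s %s\n' \
        "$access" "$server" "$export_path" "$mountpoint"
}
```

`nfs_rdma_mount_cmd 10.0.11.1 /pub /srv/pub ro` prints the command matching the public share entry.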

Related commands

Troubleshooting

Poweroff and sleep related problems

There are some problems, at least with Mellanox hardware that uses the mthca kernel driver, when putting the system to sleep (suspend or hibernate), powering off or rebooting. The problems are usually caused by the use of libibumad.

InfiniBand stops working after suspend/hibernate

“When running applications that use ND or libibumad (such as OpenSM) the system might get to an unstable state when trying to shutdown/restart/hibernate it.” This is a problem at least on Mellanox-branded HCAs, most probably the ones using the mthca driver (ib_mthca kernel module). The solution would be to shut down all programs using libibumad[3], or to not use any such programs at all and even blacklist the ib_umad kernel module, or alternatively deselect it from the kernel config. To be sure, it would be best to unload all ib_* modules altogether.

Note
Users may also need to exit all programs using libibumad. The kernel may refuse to remove the ib_umad module while libibumad, which itself uses the kernel module, is in use.
A possible systemd way to unload modules
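
One way to see whether a module is still held open is its reference count in sysfs. The sketch below assumes the kernel exposes /sys/module/<name>/refcnt (it does for loadable modules when module unloading is enabled). The function name module_in_use and the optional directory argument are made up for this example. The count includes dependent modules as well as open device handles, so treat it as a hint only.

```shell
#!/bin/sh
# Sketch: report whether a kernel module is still referenced.
# Arguments: module name, and optionally a stand-in for /sys/module.

module_in_use() {
    mod=$1
    sysfs_module=${2:-/sys/module}
    refcnt_file="$sysfs_module/$mod/refcnt"
    if [ ! -r "$refcnt_file" ]; then
        echo "$mod: not loaded (or built in)"
        return 0
    fi
    refcnt=$(cat "$refcnt_file")
    if [ "$refcnt" -gt 0 ]; then
        echo "$mod: in use (refcnt=$refcnt)"
        return 1
    fi
    echo "$mod: loaded but unused"
}
```

`module_in_use ib_umad` returns non-zero while something still holds the module, which is when unloading it would fail.
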
Warning
This is very hacky. Try to avoid unloading modules at all costs.

We can try to unload modules before initiating sleep. This, however, isn't enough in most cases. Terminating programs using the interface may be required to unload modules in the first place.

FILE /etc/systemd/system/modules-unload.service modules-unload systemd service
# /etc/systemd/system/modules-unload.service
[Unit]
Description=Module un-load

After=sleep.target reboot.target halt.target poweroff.target
Before=systemd-suspend.service systemd-hibernate.service systemd-hybrid-sleep.service systemd-halt.service systemd-poweroff.service systemd-reboot.service
#Conflicts=systemd-networkd.service systemd-networkd.socket NetworkManager.service dhcpcd.service

ConditionPathExists=/etc/modules-unload.conf

DefaultDependencies=no
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutSec=10s

Environment=UNLOAD_CONFIG=/etc/modules-unload.conf

# Goes through $UNLOAD_CONFIG, where file names (excl. extension) are listed one per line.
# Reads the corresponding files inside the /etc/modules-load.d/ directory.
# Then unloads all the modules listed in those files.
# Uhh... Yes. Got it?
# Also, we don't abort the whole process if some of the modules cannot be unloaded.
# Hence the ExecStart=- (dash there)
ExecStart=-/bin/sh -c 'grep --no-filename -vE "^[[:space:]]*(#|;|$)" "${UNLOAD_CONFIG}" \
| while read c; do echo "/etc/modules-load.d/$c.conf"; done \
| xargs -r grep --no-filename -vE "^[[:space:]]*(#|;|$)" \
| xargs -r modprobe -var'
# modprobe is being verbose so that we can watch journal to see which modules get unloaded.

# When stopping, it's easy to just use the systemd built-in to reload the previously unloaded modules.
ExecStop=/usr/lib/systemd/systemd-modules-load

[Install]
WantedBy=sleep.target reboot.target shutdown.target halt.target poweroff.target

[4]

Warning
This service has managed to lock systemd up completely on one target, leaving it unable to switch targets afterwards. Forcing a reboot (systemctl --force reboot) has caused filesystem corruption.
Do not try this without backups, and be ready to use them. Uncommenting the Conflicts= line has been causing this.
As a safety measure, ExecStart now also has a prepended "-" (minus) to avoid lock-ups, since systemd has also been locking up if this service unit fails. It's not ideal, but it's the best that has come up so far.
You have been warned.

The service above goes through /etc/modules-unload.conf line by line. On each non-comment line it looks for the base name of a modules-load config file. If it then finds a file under /etc/modules-load.d/ with the same name plus the .conf extension, it unloads all the modules that file lists. So in this case we have these files to load and unload InfiniBand modules:

FILE /etc/modules-unload.conf modules-unload configuration
# Unload all the InfiniBand modules.
InfiniBand
Note
This configuration expects to find /etc/modules-load.d/InfiniBand.conf on the filesystem.

So, to recap briefly: /etc/modules-unload.conf lists the modules-load.d config files whose modules will be unloaded before the sleep, reboot and shutdown targets.
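
To preview what the service would unload without touching any module, the same two-step lookup can be run by hand. This is only a sketch, not the unit's actual command: the function name modules_to_unload is made up, awk does the comment filtering here instead of grep, and both arguments are optional overrides of the paths used above.

```shell
#!/bin/sh
# Sketch: print the module names the unload service would pass to modprobe -r.
# Arguments (optional): the unload config and the modules-load.d directory.

modules_to_unload() {
    unload_config=${1:-/etc/modules-unload.conf}
    load_dir=${2:-/etc/modules-load.d}
    # Step 1: turn every non-comment line into a modules-load.d file path.
    awk -v dir="$load_dir" \
        '!/^[ \t]*(#|;|$)/ { print dir "/" $1 ".conf" }' "$unload_config" |
    while read -r conf; do
        # Step 2: print the modules listed in that file, if the file exists.
        if [ -r "$conf" ]; then
            awk '!/^[ \t]*(#|;|$)/ { print $1 }' "$conf"
        fi
    done
}
```

With the two files shown above, `modules_to_unload` should list every module from InfiniBand.conf, one per line. Names in the unload config that have no matching file are silently skipped.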

Hibernate and Sleep are not functional when user-space is using its resources.[3]

Mellanox HCA issue. No official answer. The solution above should work.

Unsorted resource links

http://www.ietf.org/wg/concluded/ipoib.html

Gallery / pictures

References

  1. answers.yahoo - ”What is the difference between infiniband and ethernet?”, InfiniBand vs. Ethernet - very briefly | The Free Dictionary - Future I/O that evolved with NGIO into InfiniBand | Wikipedia - InfiniBand - History
  2. Phoronix - Mellanox Platform Support Coming In Linux 4.9 - “The x86/platform code enables support for Mellanox Technologies platforms with support for hardware like their MSX6710, MSX8720, MSB7700, MSN2700, MSX1410, MSN2410, MSB7800, MSN2740, and MSN2100, among other products.”
  3. Section 3 - Known issues
  4. asus36JC.service - Arch Linux forums - initially copied from the post