IOMMU SWIOTLB

IOMMU and SWIOTLB Memory Mapping
Today's computers partition memory, and each device, such as a graphics card, PCI device or USB controller, has to have memory mapped before it can be accessed by the device or an application. Traditionally the IOMMU was used for this. It is set up when the system is initialised and cannot be changed dynamically while the system is running, so chip manufacturers such as Intel and AMD developed more advanced memory management methods.

In the Linux kernel we can control I/O memory mapping using SWIOTLB, which has mostly been associated with Intel systems, and using other mechanisms on architectures from AMD. 64-bit systems allow a huge amount of memory to be used in our computers, and this memory needs mapping before it can be used. These terms are used across enterprise computing, particularly in the virtual machine sector, but they apply to anyone running a Linux kernel.

IOMMU
IOMMU stands for Input Output Memory Management Unit. Traditionally this hardware was integrated into the north bridge controller, which sets up the memory and is programmed by the firmware on your mainboard. In recent years manufacturers have stopped building a separate north bridge chip and have integrated these functions into the CPU itself. This is why, if you want to upgrade your memory speed or type, you are now often required to change not only the motherboard but the CPU as well. Regardless, your kernel needs to set up and read the mappings to be able to use your system memory efficiently.

Enabling IOMMU
Enabling IOMMU support in the kernel configuration allows the kernel to control the mappings in the memory mapping controller.
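As a rough way to check which IOMMU-related options a kernel was built with, the sketch below scans kernel configuration text for a few symbols. The symbol names (CONFIG_IOMMU_SUPPORT, CONFIG_AMD_IOMMU, CONFIG_INTEL_IOMMU, CONFIG_SWIOTLB) are the ones commonly found in mainline kernels and are an assumption here, since this page does not list the exact options. The helper name iommu_config_status is hypothetical.

```python
import re

# Symbols assumed to be relevant; adjust the list for your kernel version.
SYMBOLS = (
    "CONFIG_IOMMU_SUPPORT",
    "CONFIG_AMD_IOMMU",
    "CONFIG_INTEL_IOMMU",
    "CONFIG_SWIOTLB",
)


def iommu_config_status(config_text, symbols=SYMBOLS):
    """Return {symbol: value}; value is 'n' when the symbol is unset or absent."""
    status = {name: "n" for name in symbols}
    for line in config_text.splitlines():
        # Set options look like "CONFIG_FOO=y"; unset ones are comments.
        match = re.match(r"^(CONFIG_\w+)=(\S+)$", line.strip())
        if match and match.group(1) in status:
            status[match.group(1)] = match.group(2)
    return status


if __name__ == "__main__":
    sample = (
        "CONFIG_IOMMU_SUPPORT=y\n"
        "CONFIG_AMD_IOMMU=y\n"
        "# CONFIG_INTEL_IOMMU is not set\n"
        "CONFIG_SWIOTLB=y\n"
    )
    for name, value in iommu_config_status(sample).items():
        print(f"{name}: {value}")
```

On a real system the configuration text would typically come from a file such as /boot/config-<version>, or from /proc/config.gz where the kernel exposes it. Which of these exists depends on the distribution.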

IOMMU for AMD64 systems
Once IOMMU support is enabled in the kernel for your system, you can control aspects of the memory mapping using kernel boot parameters on the boot command line.

You can edit your GRUB configuration files as you see fit to pass the available options.
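After editing the boot loader configuration and rebooting, it is worth confirming which parameters actually reached the kernel. The following is a minimal sketch, assuming the parameter names iommu=, amd_iommu=, intel_iommu= and swiotlb= as documented for mainline kernels (this page does not list its own set). The helper name mapping_params is hypothetical, and values containing quoted spaces are not handled.

```python
# Boot parameters assumed to be of interest for memory mapping.
WATCHED = ("iommu", "amd_iommu", "intel_iommu", "swiotlb")


def mapping_params(cmdline, watched=WATCHED):
    """Collect watched key=value parameters from a kernel command line.

    Comma-separated values are split, and repeated keys are accumulated.
    """
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in watched:
            found.setdefault(key, []).extend(value.split(","))
    return found


if __name__ == "__main__":
    # Example string only; on a live system read /proc/cmdline instead.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro iommu=soft swiotlb=32768 quiet"
    print(mapping_params(example))
```

On a running system the input string would be the contents of /proc/cmdline.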

IOMMU for Intel systems
Intel generally follows an "always enable it if the hardware supports it" rule, so most options exist to turn off or disable the function.

SWIOTLB (Software Input Output Translation Lookaside Buffer)
This is a software technique, long associated with Intel systems, which stands in for the IOMMU and allows for a much more configurable memory management interface. Without going into the deep complexity of how it works, the kernel reserves a region of physical memory that devices are able to address and uses it as a staging area for mappings. This technology is also referred to as a bounce buffer, because I/O is bounced between the device and the physical memory it is ultimately meant for by way of this buffer.

Because the buffer is reserved in advance, a mapping can be set up quickly, and usable memory is available to the device much sooner than if it had to be created in RAM at the time of the transfer and then presented to the system. The price is an extra copy of the data for each bounced transfer.
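The bounce idea can be illustrated with a toy model. This is only a sketch under stated assumptions and not how the kernel implements it: memory is a Python bytearray, the imaginary device can only address offsets below DEVICE_LIMIT, and the names ToyBounce, map_for_device and unmap are invented for illustration.

```python
DEVICE_LIMIT = 64   # toy device can only address offsets below this
SLOT_SIZE = 16      # size of each pre-reserved bounce slot


class ToyBounce:
    """Toy bounce buffer: slots are reserved up front in reachable memory."""

    def __init__(self, mem_size=256, slots=2):
        self.mem = bytearray(mem_size)
        # Reserve the slots at the start of the device-reachable region.
        self.free_slots = [i * SLOT_SIZE for i in range(slots)]
        self.copies = 0

    def map_for_device(self, addr, length):
        """Return an address the device can reach for the buffer at addr."""
        if addr + length <= DEVICE_LIMIT:
            return addr  # already reachable, no bounce needed
        if length > SLOT_SIZE or not self.free_slots:
            raise MemoryError("bounce buffer full or request too large")
        slot = self.free_slots.pop()
        # Copy the data down into the slot: this is the "bounce".
        self.mem[slot:slot + length] = self.mem[addr:addr + length]
        self.copies += 1
        return slot

    def unmap(self, dev_addr, addr, length, device_wrote=False):
        """Release a slot, copying data back if the device wrote to it."""
        if dev_addr == addr:
            return  # the buffer was mapped directly, nothing to release
        if device_wrote:
            self.mem[addr:addr + length] = self.mem[dev_addr:dev_addr + length]
            self.copies += 1
        self.free_slots.append(dev_addr)


if __name__ == "__main__":
    toy = ToyBounce()
    toy.mem[200:204] = b"data"              # buffer the device cannot reach
    dev_addr = toy.map_for_device(200, 4)   # bounced into a low slot
    toy.mem[dev_addr:dev_addr + 4] = b"DONE"  # pretend the device wrote here
    toy.unmap(dev_addr, 200, 4, device_wrote=True)
    print(dev_addr < DEVICE_LIMIT, bytes(toy.mem[200:204]), toy.copies)
```

The model shows why the slots must be reserved ahead of time and why each bounced transfer costs at least one copy. It also shows the failure mode when the reserved area is too small: the mapping request fails once all slots are in use.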

SWIOTLB for high input output such as Graphics
For decades the problem has been how to get data in and out of the CPU and RAM quickly and efficiently, especially for high-throughput devices such as storage and graphics cards. The system has to deal not only with that I/O but with many tasks at the same time. Your CPU and RAM may be very fast, but if you can't get the data out over the network, USB, a storage device or onto a screen via a graphics card, having such a fast multiprocessing system is a waste.

Normally the system holds 4 MB for normal operation and allows the rest to be used by other devices. The problem is that if one device's mapping overlaps or overflows into another's, the system panics and cannot recover. Many newer devices, such as Nvidia graphics cards and SCSI controllers, now have drivers that allow you to set the IOMMU values they can use.

This value cannot be set automatically because of the diversity of hardware configurations on the market. The end user therefore has to design and build their system and work out the best setting for it.

If one sets a large SWIOTLB, one also needs to instruct the device's driver to use the larger memory mapping buffer. Some hardware controls this physically in the BIOS, while other hardware provides no control over it at all. For the most part, newer high-end hardware allows the user to control this from the kernel options. Some drivers try to control this automatically but, as mentioned above, this can cause stability issues or even a kernel panic.

So just setting a large SWIOTLB does not mean you will get faster I/O; you also need to instruct your hardware to use it. As a rule of thumb, if 64 MB is available then set the maximum remap I/O for the driver to 4 MB less, which is 60 MB; if 128 MB is available then the maximum remap for the driver would be 124 MB, and so on.
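The rule of thumb above is simple arithmetic and can be sketched as follows. The sketch also converts a size in MB into a slab count, on the assumption that the swiotlb= boot parameter takes a number of I/O TLB slabs of 2 KiB each, as in mainline kernels; check the documentation for your kernel version before relying on it. The function names are hypothetical.

```python
SLAB_BYTES = 2048   # assumed size of one I/O TLB slab (2 KiB)
RESERVED_MB = 4     # headroom the rule of thumb leaves for the system


def swiotlb_slabs(size_mb):
    """Number of slabs needed for a SWIOTLB of size_mb megabytes."""
    return size_mb * 1024 * 1024 // SLAB_BYTES


def max_driver_remap_mb(swiotlb_mb, reserved_mb=RESERVED_MB):
    """Largest remap size to give a driver, leaving reserved_mb spare."""
    if swiotlb_mb <= reserved_mb:
        raise ValueError("SWIOTLB must be larger than the reserved headroom")
    return swiotlb_mb - reserved_mb


if __name__ == "__main__":
    for size_mb in (64, 128):
        print(
            f"{size_mb} MB: swiotlb={swiotlb_slabs(size_mb)}, "
            f"driver max remap {max_driver_remap_mb(size_mb)} MB"
        )
```

How the resulting limit is passed to a driver is specific to that driver, so consult its own documentation for the parameter name.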