Exposing Infiniband (or any PCI device) to the VMs on an OpenStack Cloud

For a recent project we (me and Wuming) had to provide an OpenStack cloud in which the raw infiniband protocol would be available to the VMs running on the cloud.

The installation was done on Ubuntu 14.04.3 with vanilla OpenStack. Exposing infiniband requires quite a few steps and reading/googling quite a bit so I will document it here in case somebody needs to do the same.

To expose the native hardware interfaces to the VM:

  • The BIOS of the computer has to support it (you may need to activate Intel VT-d (or AMD I/O Virtualization Technology), as with virtualization extensions, it may be off by default). Explore the BIOS of your servers to activate it if necessary;
  • The Infiniband cards themselves have to support it. Look for SR-IOV in an “lspci -v” output as below:

$ sudo lspci -v |grep -A40 Mellanox
04:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
Subsystem: Hewlett-Packard Company Device 22f5
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at 96000000 (64-bit, non-prefetchable) [size=1M]
Memory at 94000000 (64-bit, prefetchable) [size=32M]
Capabilities: [40] Power Management version 3
Capabilities: [48] Vital Product Data
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [148] Device Serial Number 24-be-05-ff-ff-b6-e3-40
Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
Capabilities: [154] Advanced Error Reporting
Capabilities: [18c] #19
Kernel driver in use: mlx4_core

 

  • The kernel in linux needs to be configured by passing the option intel_iommu=on. You can do it by editing the file /etc/default/grub so that it contains the option GRUB_CMDLINE_LINUX_DEFAULT=”intel_iommu=on” and running update-grub;
  • The Infiniband cards need to be configured to expose the VFs. Edit the file /etc/modprobe.d/mlx4_core.conf to contain options like: options mlx4_core num_vfs=16 (or as high as your card supports. One VM will take one VF so this could be the limiting factor as of how many VMs can be deployed per compute node). You can find more documentation for the mellanox cards in the Mellanox Linux User Manual Mellanox_OFED_Linux_User_Manual_v3.10 or here since you may need to enable this option (in our case it was enabled already);
  • Nova has to be configured to allow pci passthrough. Follow the documentation in here on “How to prepare the environment”
    • Configure the nova (find the Virtualized Interfaces vendor_id and product_id for your case by running an lspci -vn)
    • Create a flavor that automatically adds the interface to the VM

And now… Just launch a VM and log in! You should see the Infiniband card!