How To Configure NFS over RDMA

Starting with OneFS 9.2.0.0, NFSv3 over RDMA is introduced for better performance. Refer to Chapter 6 of the OneFS NFS white paper for the technical details. This blog provides guidance on using the NFSv3 over RDMA feature with your OneFS clusters. The feature has a hard requirement that clients have RoCEv2 capabilities, so configuration is also required on the client side.

OneFS Cluster configuration

To use NFSv3 over RDMA, your OneFS cluster hardware must meet the following requirements:

  • Node type: All Gen6 (F800/F810/H600/H500/H400/A200/A2000), F200, F600, F900
  • Front-end network: Mellanox ConnectX-3 Pro, ConnectX-4, and ConnectX-5 network adapters that deliver 25/40/100 GigE speeds.
  1. Check which of your cluster's network interfaces have RoCEv2 capability by running the following command and finding the interfaces whose flags contain SUPPORTS_RDMA_RRoCE. This check is only available through the CLI.

# isi network interfaces list -v
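The listing can be long on a large cluster, so it helps to filter it for the RDMA flag. The snippet below is a sketch: the sample text is a hypothetical, simplified excerpt of the command's output (the real field layout may differ); on a real cluster you would pipe `isi network interfaces list -v` itself into grep.

```shell
# Hypothetical excerpt of `isi network interfaces list -v` output,
# for illustration only; the real output has more fields per interface.
sample="        Name: 40gige-1
       Flags: ACCESS_ZONE,SUPPORTS_RDMA_RRoCE
        Name: 10gige-1
       Flags: ACCESS_ZONE"

# Keep only the Flags lines that advertise RoCEv2 support, plus the
# preceding line so the interface name is visible.
printf '%s\n' "$sample" | grep -B1 'SUPPORTS_RDMA_RRoCE'
```

On a cluster, the equivalent one-liner would be `isi network interfaces list -v | grep -B1 SUPPORTS_RDMA_RRoCE`, adjusting the `-B` context to match the actual output layout.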

  2. Create an IP pool that contains RoCEv2-capable network interfaces.

(CLI)

# isi network pools create --id=groupnet0.40g.40gpool1 --ifaces=1:40gige-1,1:40gige-2,2:40gige-1,2:40gige-2,3:40gige-1,3:40gige-2,4:40gige-1,4:40gige-2 --ranges=172.16.200.129-172.16.200.136 --access-zone=System --nfsv3-rroce-only=true

(WebUI) Cluster management > Network configuration
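To confirm the pool was created with the RoCEv2 restriction, the pool can be viewed afterwards. This is a cluster CLI fragment (it only runs on OneFS); the pool ID matches the create command above, and the exact field label for the RRoCE setting may vary by OneFS release.

```shell
# View the pool just created; the output should show the
# NFSv3 RDMA RRoCE-only setting as enabled.
isi network pools view groupnet0.40g.40gpool1
```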

  3. Enable the NFSv3 over RDMA feature by running the following command.

(CLI)

# isi nfs settings global modify --nfsv3-enabled=true --nfsv3-rdma-enabled=true

(WebUI) Protocols > UNIX sharing (NFS) > Global settings

  4. Enable the OneFS cluster NFS service by running the following command.

(CLI)

# isi services nfs enable

(WebUI) See step 3

  5. Create an NFS export by running the following command. The --map-root-enabled=false option disables root squash on the export for testing purposes, which allows the root user to access OneFS cluster data via NFS.

(CLI)

# isi nfs exports create --paths=/ifs/export_rdma --map-root-enabled=false

(WebUI) Protocols > UNIX sharing (NFS) > NFS exports
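Before moving on to the client, the export can be sanity-checked from any NFS client that can reach the cluster. This is a fragment to run on the client, not the cluster; the IP address is taken from the pool range configured above.

```shell
# List the exports the cluster advertises to this client;
# /ifs/export_rdma should appear in the list.
showmount -e 172.16.200.129
```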

NFSv3 over RDMA client configuration

Note: As client OS versions and Mellanox NICs vary between environments, refer to your client OS documentation and the Mellanox documentation for accurate and detailed configuration steps. This section only demonstrates an example configuration using our in-house lab equipment.

To use the NFSv3 over RDMA service of a OneFS cluster, your NFSv3 client hardware must meet the following requirements:

  • RoCEv2-capable NICs: Mellanox ConnectX-3 Pro, ConnectX-4, ConnectX-5, and ConnectX-6
  • NFS over RDMA drivers: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) or the OS-distributed inbox driver. Installing the Mellanox OFED driver is recommended for best performance.

If you just want to run a functional test of the NFSv3 over RDMA feature, you can set up Soft-RoCE on your client.

Set up an RDMA-capable client on a physical machine

In the following steps, we are using a Dell PowerEdge R630 physical server with CentOS 7.9 and a Mellanox ConnectX-3 Pro installed.

  1. Check the OS version by running the following command:

[root]hopisdtmesrv177# cat /etc/redhat-release

CentOS Linux release 7.9.2009 (Core)

 

  2. Check the network adapter model and information. From the output, we can see that the ConnectX-3 Pro is installed and that its network interfaces are named 40gig1 and 40gig2.

[root]hopisdtmesrv177# lspci | egrep -i --color 'network|ethernet'
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

[root]hopisdtmesrv177# lshw -class network -short
H/W path             Device     Class          Description
==========================================================
/0/102/2/0           40gig1     network        MT27520 Family [ConnectX-3 Pro]
/0/102/3/0                      network        82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/3/0.1                    network        82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/102/1c.4/0        1gig1      network        I350 Gigabit Network Connection
/0/102/1c.4/0.1      1gig2      network        I350 Gigabit Network Connection
/3                   40gig2     network        Ethernet interface

  3. Find a suitable Mellanox OFED driver version on the Mellanox website. As of MLNX_OFED v5.1, the ConnectX-3 Pro is no longer supported and must use the MLNX_OFED LTS version instead. See Figure 3. If you are using ConnectX-4 or above, you can use the latest Mellanox OFED version.
  • MLNX_OFED LTS Download

An important note: the NFSoRDMA module was removed in Mellanox OFED version 4.0-2.0.0.1 and added back in version 4.7-3.2.9.0. Refer to the Release Notes Change Log History for details.

  4. Download the MLNX_OFED 4.9-2.2.4.0 driver for the ConnectX-3 Pro to your client.
  5. Extract the driver package and run the "mlnxofedinstall" script to install the driver. As of MLNX_OFED v4.7, the NFSoRDMA driver is no longer installed by default. To install it over a supported kernel, add the "--with-nfsrdma" option to the "mlnxofedinstall" script. The firmware update is skipped in this example; update your firmware as needed.

[root]hopisdtmesrv177# ./mlnxofedinstall --with-nfsrdma --without-fw-update
Logs dir: /tmp/MLNX_OFED_LINUX.19761.logs
General log file: /tmp/MLNX_OFED_LINUX.19761.logs/general.log
Verifying KMP rpms compatibility with target kernel...
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Uninstalling the previous version of MLNX_OFED_LINUX
rpm --nosignature -e --allmatches --nodeps mft

Starting MLNX_OFED_LINUX-4.9-2.2.4.0 installation ...
Installing mlnx-ofa_kernel RPM
Preparing...                          ########################################
Updating / installing...
mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.r########################################
Installing kmod-mlnx-ofa_kernel 4.9 RPM
...
Preparing...                          ########################################
mpitests_openmpi-3.2.20-e1a0676.49224 ########################################
Device (03:00.0):
03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
Link Width: x8
PCI Link Speed: 8GT/s

Installation finished successfully.

Preparing...                          ################################# [100%]
Updating / installing...
1:mlnx-fw-updater-4.9-2.2.4.0      ################################# [100%]

Added 'RUN_FW_UPDATER_ONBOOT=no' to /etc/infiniband/openib.conf

Skipping FW update.

To load the new driver, run:
/etc/init.d/openibd restart

  6. Load the new driver by running the following command. Unload all in-use modules when prompted.

[root]hopisdtmesrv177# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]

  7. Check the driver version to ensure the installation was successful.

[root]hopisdtmesrv177# ethtool -i 40gig1
driver: mlx4_en
version: 4.9-2.2.4
firmware-version: 2.36.5080
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
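In addition to ethtool, the verbs layer can be queried to confirm that an RDMA device is actually registered. This fragment requires the libibverbs-utils package and root is not needed; device names such as mlx4_0 vary by NIC and driver.

```shell
# List RDMA devices known to the verbs stack.
ibv_devices

# Show each device and its link layer; for RoCE the
# link_layer field should report "Ethernet".
ibv_devinfo | grep -E 'hca_id|link_layer'
```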

  8. Check that the NFSoRDMA module is also installed. If you are using a driver downloaded from a server vendor's website (such as Dell's, for PowerEdge servers) rather than from the Mellanox website, the NFSoRDMA module may not be included in the driver package. In that case, obtain the NFSoRDMA module from the Mellanox driver package and install it.

[root]hopisdtmesrv177# yum list installed | grep nfsrdma
kmod-mlnx-nfsrdma.x86_64             5.0-OFED.5.0.2.1.8.1.g5f67178.rhel7u8

  9. Mount the NFS export with the RDMA protocol.

[root]hopisdtmesrv177# mount -t nfs -vo nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma
mount.nfs: timeout set for Tue Feb 16 21:47:16 2021
mount.nfs: trying text-based options 'nfsvers=3,proto=rdma,port=20049,addr=172.16.200.29'
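To make the mount persist across reboots, an equivalent /etc/fstab entry can be added. This is a config fragment; the paths and address are taken from the mount command above.

```shell
# /etc/fstab entry equivalent to the manual mount above:
# 172.16.200.29:/ifs/export_rdma  /mnt/export_rdma  nfs  nfsvers=3,proto=rdma,port=20049  0 0

# After mounting, confirm RDMA is actually in use: the mount
# options shown should include proto=rdma.
mount | grep export_rdma
```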


Set up Soft-RoCE client for functional test only

Soft-RoCE (also known as RXE) is a software implementation of RoCE that allows RoCE to run on any Ethernet network adapter, whether or not it offers hardware acceleration. Soft-RoCE is included in upstream kernels 4.8 and above. It is intended for users who wish to test RDMA in software over any third-party adapter.

In the following example configuration, we are using a CentOS 7.9 virtual machine to configure Soft-RoCE. Since Red Hat Enterprise Linux 7.4, the Soft-RoCE driver has been merged into the kernel.

  1. Install required software packages.

[root@swsawiklugv1c ~]# yum install -y nfs-utils rdma-core libibverbs-utils

  2. Start Soft-RoCE.

[root@swsawiklugv1c ~]# rxe_cfg start

  3. Get the status, which displays the Ethernet interfaces.

[root@swsawiklugv1c ~]# rxe_cfg status
rdma_rxe module not loaded
Name   Link  Driver  Speed  NMTU  IPv4_addr        RDEV  RMTU
ens33  yes   e1000          1500  192.168.198.129

  4. Verify that the RXE kernel module is loaded by running the following command, and ensure that you see rdma_rxe in the list of modules.

[root@swsawiklugv1c ~]# lsmod | grep rdma_rxe
rdma_rxe              114188  0
ip6_udp_tunnel         12755  1 rdma_rxe
udp_tunnel             14423  1 rdma_rxe
ib_core               255603  13 rdma_cm,ib_cm,iw_cm,rpcrdma,ib_srp,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_rxe,rdma_ucm,ib_ipoib,ib_isert

  5. Create a new RXE device/interface by running rxe_cfg add <interface from rxe_cfg status>.

# rxe_cfg add ens33

  6. Check the status again and make sure that rxe0 was added under RDEV (rxe device).

[root@swsawiklugv1c ~]# rxe_cfg status
Name   Link  Driver  Speed  NMTU  IPv4_addr        RDEV  RMTU
ens33  yes   e1000          1500  192.168.198.129  rxe0  1024  (3)

  7. Mount the NFS export with the RDMA protocol.

[root@swsawiklugv1c ~]# mount -t nfs -o nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma

You can refer to Red Hat Enterprise Linux configuring Soft-RoCE for more details.
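On newer distributions the rxe_cfg script has been removed from rdma-core; there, the Soft-RoCE device is managed with the iproute2 rdma tool instead. The fragment below requires root, and the interface name ens33 matches the example above (substitute your own).

```shell
# Load the Soft-RoCE kernel module and attach an rxe device
# to the Ethernet interface.
modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens33

# Confirm the rxe device exists and check its link state.
rdma link show
```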
