How To Configure NFS over RDMA

Starting with OneFS 9.2.0.0, NFSv3 over RDMA is introduced for better performance. Refer to Chapter 6 of the OneFS NFS white paper for the technical details. This article provides guidance on using the NFSv3 over RDMA feature with your OneFS clusters. Note that the OneFS NFSv3 over RDMA functionality requires RoCEv2-capable clients, so client-side configuration is also needed.

OneFS Cluster configuration

To use NFSv3 over RDMA, your OneFS cluster hardware must meet the following requirements:

  • Node type: All Gen6 (F800/F810/H600/H500/H400/A200/A2000), F200, F600, F900
  • Front-end network: Mellanox ConnectX-3 Pro, ConnectX-4, or ConnectX-5 network adapters that deliver 25/40/100 GbE speeds.

1. Check that your cluster network interfaces have RoCEv2 capability by running the following command and noting the interfaces that report ‘SUPPORTS_RDMA_RRoCE’. This check is only available on the CLI.

# isi network interfaces list -v
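The verbose output can be long on larger clusters. As a convenience, you can filter it for the RDMA capability flag; this is just a grep sketch, and the number of context lines (-B 10) is an assumption about how far above the flag the interface name appears in your OneFS version, so adjust it as needed.

# isi network interfaces list -v | grep -B 10 'SUPPORTS_RDMA_RRoCE'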

2. Create an IP pool that contains RoCEv2-capable network interfaces.

(CLI)

# isi network pools create --id=groupnet0.40g.40gpool1 --ifaces=1:40gige-1,1:40gige-2,2:40gige-1,2:40gige-2,3:40gige-1,3:40gige-2,4:40gige-1,4:40gige-2 --ranges=172.16.200.129-172.16.200.136 --access-zone=System --nfsv3-rroce-only=true

(WebUI) Cluster management -> Network configuration
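To confirm that the pool was created with the RDMA-only restriction, you can view it afterwards. The command below uses the pool ID from this example; the exact field names in the output depend on your OneFS version, but you should see the interfaces, the IP range, and the NFSv3 RRoCE-only setting you just configured.

# isi network pools view groupnet0.40g.40gpool1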

3. Enable the NFSv3 over RDMA feature by running the following command.

(CLI)

# isi nfs settings global modify --nfsv3-enabled=true --nfsv3-rdma-enabled=true

(WebUI) Protocols -> UNIX sharing (NFS) -> Global settings
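To double-check that both flags took effect, you can view the global NFS settings and look for the NFSv3 and NFSv3-over-RDMA fields in the output.

# isi nfs settings global view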

4. Enable the OneFS cluster NFS service by running the following command.

(CLI)

# isi services nfs enable

(WebUI) See step 3
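One way to confirm that the NFS service is now running is to list the cluster services and filter for nfs; the -l option is assumed to print the service list on your OneFS version. The WebUI Global settings page from step 3 shows the same state.

# isi services -l | grep -i nfs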

5. Create an NFS export by running the following command. The --map-root-enabled=false option disables root-squash on the export for testing purposes, which allows the root user to access OneFS cluster data via NFS.

(CLI)

# isi nfs exports create --paths=/ifs/export_rdma --map-root-enabled=false

(WebUI) Protocols -> UNIX sharing (NFS) -> NFS exports
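Once the export exists, a quick sanity check from any NFS client is to list the exports advertised by one of the pool addresses created in step 2 (172.16.200.129 in this example):

# showmount -e 172.16.200.129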

NFSv3 over RDMA client configuration

Note: As the client OS and Mellanox NICs may vary in your environment, refer to your client OS documentation and the Mellanox documentation for accurate and detailed configuration steps. This section only demonstrates an example configuration using our in-house lab equipment.

To use the NFSv3 over RDMA service of a OneFS cluster, your NFSv3 client hardware must meet the following requirements:

  • RoCEv2-capable NICs: Mellanox ConnectX-3 Pro, ConnectX-4, ConnectX-5, and ConnectX-6
  • NFS over RDMA drivers: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) or the OS-distributed inbox driver. Installing the Mellanox OFED driver is recommended for the best performance.

If you just want to run a functional test of the NFSv3 over RDMA feature, you can set up Soft-RoCE on your client.

Set up an RDMA-capable client on a physical machine

In the following steps, we are using a Dell PowerEdge R630 physical server with CentOS 7.9 and a Mellanox ConnectX-3 Pro NIC installed.

1. Check the OS version by running the following command:
# cat /etc/redhat-release

CentOS Linux release 7.9.2009 (Core)

 

2. Check the network adapter model and information. From the output, we can see that a ConnectX-3 Pro is installed and that its network interfaces are named 40gig1 and 40gig2.

# lspci | egrep -i --color 'network|ethernet'

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

# lshw -class network -short

H/W path             Device     Class          Description

==========================================================

/0/102/2/0           40gig1     network        MT27520 Family [ConnectX-3 Pro]

/0/102/3/0                      network        82599ES 10-Gigabit SFI/SFP+ Network Connection

/0/102/3/0.1                    network        82599ES 10-Gigabit SFI/SFP+ Network Connection

/0/102/1c.4/0        1gig1      network        I350 Gigabit Network Connection

/0/102/1c.4/0.1      1gig2      network        I350 Gigabit Network Connection

/3                   40gig2     network        Ethernet interface
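Before moving on, it is worth confirming that the ConnectX interface you plan to use is up and has an IP address that can reach the pool addresses configured on the cluster. This is a generic check with the standard ip tool, using the 40gig1 interface name reported above:

# ip addr show 40gig1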

3. Find the suitable Mellanox OFED driver version on the Mellanox website. As of MLNX_OFED v5.1, ConnectX-3 Pro is no longer supported; use the MLNX_OFED LTS version for it instead. If you are using ConnectX-4 or above, you can use the latest Mellanox OFED version.

  • MLNX_OFED LTS Download

An important note: the NFSoRDMA module was removed from Mellanox OFED 4.0-2.0.0.1 and added back in Mellanox OFED 4.7-3.2.9.0. Refer to the Release Notes Change Log History for details.

4. Download the MLNX_OFED 4.9-2.2.4.0 driver for ConnectX-3 Pro to your client.

5. Extract the driver package and run the “mlnxofedinstall” script to install the driver. As of MLNX_OFED v4.7, the NFSoRDMA driver is no longer installed by default. To install it over a supported kernel, add the “--with-nfsrdma” option to the “mlnxofedinstall” script. The firmware update is skipped in this example; update it as needed.

#  ./mlnxofedinstall --with-nfsrdma --without-fw-update

Logs dir: /tmp/MLNX_OFED_LINUX.19761.logs

General log file: /tmp/MLNX_OFED_LINUX.19761.logs/general.log

Verifying KMP rpms compatibility with target kernel...

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.

Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Uninstalling the previous version of MLNX_OFED_LINUX

rpm --nosignature -e --allmatches --nodeps mft

Starting MLNX_OFED_LINUX-4.9-2.2.4.0 installation ...

Installing mlnx-ofa_kernel RPM

Preparing...                          ########################################

Updating / installing...

mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.r########################################

Installing kmod-mlnx-ofa_kernel 4.9 RPM
...

Preparing...                          ########################################

mpitests_openmpi-3.2.20-e1a0676.49224 ########################################

Device (03:00.0):

03:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

Link Width: x8

PCI Link Speed: 8GT/s

Installation finished successfully.

Preparing...                          ################################# [100%]

Updating / installing...

1:mlnx-fw-updater-4.9-2.2.4.0      ################################# [100%]

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Skipping FW update.

To load the new driver, run:

# /etc/init.d/openibd restart

6. Load the new driver by running the following command. If the command reports that any modules are in use, unload them as prompted.

# /etc/init.d/openibd restart

Unloading HCA driver:                                      [  OK  ]

Loading HCA driver and Access Layer:                       [  OK  ]

7. Check the driver version to ensure that the installation was successful.

# ethtool -i 40gig1

driver: mlx4_en

version: 4.9-2.2.4

firmware-version: 2.36.5080

expansion-rom-version:

bus-info: 0000:03:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes
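Beyond the driver version, you can also confirm that the adapter is registered with the RDMA stack and that its ports report Ethernet as the link layer, which is a prerequisite for RoCE. ibv_devinfo is part of libibverbs and is installed with MLNX_OFED; filtering the output is just a convenience.

# ibv_devinfo | grep -E 'hca_id|link_layer'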

8. Check that the NFSoRDMA module is also installed. If you are using a driver downloaded from a server vendor's website (for example, for a Dell PowerEdge server) rather than from the Mellanox website, the NFSoRDMA module may not be included in the driver package. In that case, obtain the NFSoRDMA module from the Mellanox driver package and install it.

# yum list installed | grep nfsrdma

kmod-mlnx-nfsrdma.x86_64             5.0-OFED.5.0.2.1.8.1.g5f67178.rhel7u8

9. Mount the NFS export with the RDMA protocol.

# mount -t nfs -vo nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma

mount.nfs: timeout set for Tue Feb 16 21:47:16 2021

mount.nfs: trying text-based options 'nfsvers=3,proto=rdma,port=20049,addr=172.16.200.29'
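After the mount returns, you can confirm that the RDMA transport is actually in use by inspecting the mount options recorded by the NFS client; proto=rdma and port=20049 should appear for /mnt/export_rdma.

# nfsstat -m

# mount | grep export_rdma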


Set up a Soft-RoCE client for functional testing only

Soft-RoCE (also known as RXE) is a software implementation of RoCE that allows RoCE to run on any Ethernet network adapter, whether or not it offers hardware acceleration. Soft-RoCE is released as part of upstream kernel 4.8 (or above). It is intended for users who wish to test RDMA in software over any third-party adapter.

In the following example configuration, we are using a CentOS 7.9 virtual machine to configure Soft-RoCE. Since Red Hat Enterprise Linux 7.4, the Soft-RoCE driver has been merged into the kernel.

1. Install required software packages.

# yum install -y nfs-utils rdma-core libibverbs-utils

2. Start Soft-RoCE.

# rxe_cfg start

3. Get the status, which displays the Ethernet interfaces.

# rxe_cfg status

rdma_rxe module not loaded

Name   Link  Driver  Speed  NMTU  IPv4_addr        RDEV  RMTU

ens33  yes   e1000          1500  192.168.198.129

4. Verify that the RXE kernel module is loaded by running the following command, and ensure that rdma_rxe appears in the list of modules.

# lsmod | grep rdma_rxe

rdma_rxe              114188  0

ip6_udp_tunnel         12755  1 rdma_rxe

udp_tunnel             14423  1 rdma_rxe

ib_core               255603  13 rdma_cm,ib_cm,iw_cm,rpcrdma,ib_srp,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_rxe,rdma_ucm,ib_ipoib,ib_isert

5. Create a new RXE device/interface by using rxe_cfg add <interface from rxe_cfg status>.

# rxe_cfg add ens33

6. Check the status again and make sure that rxe0 was added under RDEV (rxe device).

# rxe_cfg status

Name   Link  Driver  Speed  NMTU  IPv4_addr        RDEV  RMTU

ens33  yes   e1000          1500  192.168.198.129  rxe0  1024  (3)
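You can also confirm that the new rxe0 device is registered with the RDMA verbs layer by listing the devices with ibv_devices (installed with libibverbs-utils in step 1); rxe0 should appear in the device list.

# ibv_devices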

7. Mount the NFS export with the RDMA protocol.

# mount -t nfs -o nfsvers=3,proto=rdma,port=20049 172.16.200.29:/ifs/export_rdma /mnt/export_rdma

You can refer to the Red Hat Enterprise Linux documentation on configuring Soft-RoCE for more details.
