OneFS NFS Locking and Reporting – Part 2

In the previous article in this series we took a look at the new NFS locks and waiters reporting CLI command set and API endpoints in OneFS 9.5 and later. Next, we turn our attention to some additional context, caveats, and NFSv3 lock removal.

Prior to the NFS locking enhancements in OneFS 9.5, the legacy CLI commands were somewhat inefficient. Their output also included other advisory domain locks, such as SMB, which made it harder to parse. The table below maps the new 9.5 CLI commands (and corresponding handlers) to the old NLM syntax. Note that the ‘isi_classic nfs locks’ and ‘isi_classic nfs waiters’ CLI commands have also been deprecated in OneFS 9.5.

Type / Command set | OneFS 9.5 and later   | OneFS 9.4 and earlier
Locks              | isi nfs locks         | isi nfs nlm locks
Sessions           | isi nfs nlm sessions  | isi nfs nlm sessions
Waiters            | isi nfs locks waiters | isi nfs nlm locks waiters
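
For scripts that need to run against clusters on either side of the 9.5 boundary, the mapping in the table above can be captured in a small lookup. The following is a minimal Python sketch rather than anything shipped with OneFS: the `lock_command` helper and the version-tuple convention are hypothetical, and only the command strings themselves come from the table.

```python
# Command sets taken from the table above, keyed by type and release era.
COMMANDS = {
    "locks": {
        "new": ["isi", "nfs", "locks"],
        "legacy": ["isi", "nfs", "nlm", "locks"],
    },
    "sessions": {
        "new": ["isi", "nfs", "nlm", "sessions"],
        "legacy": ["isi", "nfs", "nlm", "sessions"],
    },
    "waiters": {
        "new": ["isi", "nfs", "locks", "waiters"],
        "legacy": ["isi", "nfs", "nlm", "locks", "waiters"],
    },
}


def lock_command(kind, onefs_version):
    """Return the base command (as an argv list) for a given OneFS version.

    `onefs_version` is a tuple such as (9, 5) or (9, 4, 0); this convention
    is an assumption made for the sketch.
    """
    era = "new" if tuple(onefs_version) >= (9, 5) else "legacy"
    return list(COMMANDS[kind][era])


print(" ".join(lock_command("waiters", (9, 4))))
print(" ".join(lock_command("waiters", (9, 5))))
```

Subcommands such as ‘list’ and any filtering options would still need to be appended by the caller.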

When upgrading to OneFS 9.5 or later from a prior release, the legacy platform API handlers continue to function both during and after the upgrade, so any legacy scripts and automation are protected from this lock reporting deprecation. Additionally, while the new platform API handlers will work during a rolling upgrade in mixed mode, they will only return results for the nodes that have already been upgraded (‘high nodes’).

Be aware that the NFS locking CLI framework does not support partial responses. As such, if a node is down or the cluster has a rolling upgrade in progress, the alternative is to query the equivalent platform API endpoint instead.

Performance-wise, on very large, busy clusters, the lock and waiter CLI commands may be slow to return their output. In such instances, the ‘--timeout’ flag can be used to increase the command timeout window. Output filtering can also be used to reduce the number of locks reported.

When a lock is in a transition state, there is a chance that it may not have, or report, a version. In these instances, the ‘Version’ field will be represented as ‘-’. For example:

# isi nfs locks list -v
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295164422
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:03:52
Version: -
---------------------------------------------------------------
Total: 1

This behavior should be experienced very infrequently. However, if it is encountered, just execute the CLI command again and the lock version should be reported correctly.
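
When this check needs to be automated, the verbose output can be parsed and any locks lacking a version flagged for a re-query. The following Python sketch works only from the sample output shown above. It assumes each record begins with a ‘Client:’ line and that records may be separated by a dashed line, which may not hold for all OneFS releases. The `parse_locks` and `missing_version` helpers are hypothetical names.

```python
# Sample copied from the verbose listing above.
SAMPLE = """\
Client: 1/TMECLI1:487722/10.22.10.250
Client ID: 487722351064074
LIN: 4295164422
Path: /ifs/locks/nfsv3/10.22.10.250_1
Lock Type: exclusive
Range: 0, 92233772036854775807
Created: 2023-08-18T08:03:52
Version: -
---------------------------------------------------------------
Total: 1
"""


def parse_locks(text):
    """Split 'Key: value' lines into one dict per lock record."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the echoed command prompt
        if line.startswith("---"):
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key == "Total":
            continue
        if key == "Client" and current:
            # Assumption: a new 'Client' line starts a new record.
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def missing_version(records):
    """Return the records whose Version field is absent or shown as '-'."""
    return [r for r in records if r.get("Version") in (None, "", "-")]


locks = parse_locks(SAMPLE)
for lock in missing_version(locks):
    print("No version reported yet, re-run the query for:", lock["Path"])
```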

When it comes to troubleshooting NFSv3/NLM issues, if an NFSv3 client is consistently experiencing ‘NLM_DENIED’ or other lock management issues, this is often the result of incorrectly configured firewall rules. For example, take the following packet capture (PCAP) excerpt from a Linux client:

   21 08:50:42.173300992  10.22.10.100 → 10.22.10.200 NLM 106    V4 LOCK Reply (Call In 19) NLM_DENIED

Often the assumption is that only the lockd or statd ports on the server side of the firewall need to be opened, and that the client always initiates the connection. However, this is not the case. Instead, the server will frequently respond with a ‘let me get back to you’ and then, later, connect back to the client. As such, if the firewall blocks access to rpcbind, lockd, or statd on the client, connection failures will likely occur.
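
A quick way to test the server-to-client direction is to attempt a TCP connection to the client's rpcbind port from a host on the cluster side of the firewall. The Python sketch below is a rough illustration only. The `tcp_port_open` helper is a hypothetical name and the client address in the usage line is a placeholder. The check covers TCP only, whereas NLM traffic may also use UDP. Port 111 is the standard rpcbind port, but the lockd and statd ports on a client are typically assigned dynamically and would need to be looked up (for example with ‘rpcinfo -p’ on the client) before being tested the same way.

```python
import socket

RPCBIND_PORT = 111  # well-known rpcbind/portmapper port


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    client = "10.22.10.250"  # placeholder: the NFSv3 client's address
    state = "reachable" if tcp_port_open(client, RPCBIND_PORT) else "blocked or closed"
    print(f"rpcbind on {client}: {state}")
```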

Occasionally it does become necessary to remove NLM locks and waiters from the cluster. Traditionally, the ‘isi_classic nfs clients rm’ command was frequently used, but that command has limitations, and is fully deprecated in OneFS 9.5 and later. Instead, the preferred method is to use the ‘isi nfs nlm sessions’ CLI utility, in conjunction with various other ancillary OneFS CLI commands, to clear problematic locks and waiters.

Note that the ‘isi nfs nlm sessions’ CLI command, available in all current OneFS versions, is zone-aware. This is reflected in its output for the client holding the lock, which now shows the zone ID number at the beginning. For example:

4/tme-linux1/10.22.10.250

The above represents:

Zone ID / Client name / IP address of cluster node holding the connection.
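
For scripting purposes, this three-part string can be split into its components. The following is a small Python sketch, assuming the zone ID is always the first field and the cluster IP is always the last, so that a client name containing other punctuation (such as ‘TMECLI1:487722’) is kept intact. The `NlmSession` type and `parse_session` helper are hypothetical names.

```python
from typing import NamedTuple


class NlmSession(NamedTuple):
    zone_id: int
    client_name: str
    cluster_ip: str


def parse_session(entry):
    """Split 'zone/client/ip' into its three fields."""
    zone, rest = entry.strip().split("/", 1)         # first field: zone ID
    client_name, cluster_ip = rest.rsplit("/", 1)    # last field: cluster IP
    return NlmSession(int(zone), client_name, cluster_ip)


session = parse_session("4/tme-linux1/10.22.10.250")
print(session.zone_id, session.client_name, session.cluster_ip)
```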

A basic procedure to remove NFSv3 NLM locks and waiters from a cluster is as follows:

1. First, list the NFS locks and search for the pertinent filename.

In OneFS 9.5 and later, the locks list can be filtered using the ‘--path’ argument.

# isi nfs locks list --path=<path> | grep <filename>

Be aware that the full path must be specified, starting with /ifs. There is no partial matching or substitution for paths in this command set.

For OneFS 9.4 and earlier, the following CLI syntax can be used:

# isi_for_array -sX 'isi nfs nlm locks list | grep <filename>'

2. Next, list the lock waiters associated with the same filename.

For OneFS 9.5 and later, the waiters list can also be filtered using the ‘--path’ syntax:

# isi nfs locks waiters --path=<path> | grep <filename>

With OneFS 9.4 and earlier, the following CLI syntax can be used:

# isi_for_array -sX 'isi nfs nlm locks waiters | grep -i <filename>'

3. The next step is to confirm the client and LIN (logical inode number) being waited upon. This can be accomplished by querying the ‘efs.advlock.failover.lock_waiters’ sysctl:

# isi_for_array -sX 'sysctl efs.advlock.failover.lock_waiters'

For example:

[truncated output]
...
client = { '4/tme-linux1/10.20.10.200', 0x26593d37370041 }
...
resource = 2:df86:0218

Note that for sanity checking, the ‘isi get -L’ CLI utility can be used to confirm the path of a file from its LIN:

# isi get -L <LIN>
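
The client and resource fields can also be pulled out of the sysctl output programmatically. The Python sketch below works only from the truncated excerpt shown above, so the real output will contain additional fields that it ignores. It also assumes that the ‘resource’ value is the LIN written as colon-separated hexadecimal. That assumption should be verified with ‘isi get -L’ before acting on it. The helper names are hypothetical.

```python
import re

# Excerpt copied from the truncated sample output above.
SAMPLE = """\
client = { '4/tme-linux1/10.20.10.200', 0x26593d37370041 }
resource = 2:df86:0218
"""

CLIENT_RE = re.compile(r"client\s*=\s*\{\s*'([^']+)'")
RESOURCE_RE = re.compile(r"resource\s*=\s*([0-9a-fA-F:]+)")


def waiter_clients(text):
    """Return every quoted client string found in the sysctl output."""
    return CLIENT_RE.findall(text)


def waiter_resources(text):
    """Return every resource identifier found in the sysctl output."""
    return RESOURCE_RE.findall(text)


def resource_to_int(resource):
    """Assumption: the resource is a LIN in colon-separated hex."""
    return int(resource.replace(":", ""), 16)


for client in waiter_clients(SAMPLE):
    print("waiting client:", client)
for resource in waiter_resources(SAMPLE):
    print("resource:", resource, "-> decimal (if a LIN):", resource_to_int(resource))
```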

4. The penultimate step involves the actual removal of the unwanted locks which are causing waiters to stack up.

Keep in mind that the ‘isi nfs nlm sessions’ command syntax is access zone-aware. The following syntax will list the access zones and their IDs:

# isi zone zones list -v | grep -iE "Zone ID|name"

Once the desired zone ID has been determined, the ‘isi_run -z’ CLI utility can be used to specify the appropriate zone in which to run the ‘isi nfs nlm sessions’ commands:

# isi_run -z 4 -l root

Next, the ‘isi nfs nlm sessions delete’ CLI command will remove the specific lock waiter which is causing the issue. The command syntax requires specifying the client hostname and node IP of the node holding the lock.

# isi nfs nlm sessions delete --zone <AZ_zone_ID> <hostname> <cluster-ip>

For example:

# isi nfs nlm sessions delete --zone 4 tme-linux1 10.20.10.200
Are you sure you want to delete all NFSv3 locks associated with client tme-linux1 against cluster IP 10.20.10.200? (yes/[no]): yes

5. Finally, repeat the commands in steps 1 and 2 above to confirm that the desired NLM locks and waiters have been successfully culled.
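
To reduce the chance of a typo when assembling the delete command in the procedure above, the arguments can be derived directly from the zone-prefixed client string reported by the sessions or sysctl output. The following Python sketch only builds and prints the command line and does not execute anything. The `build_delete_command` helper is a hypothetical name, and the argument order follows the syntax shown in step 4.

```python
import shlex


def build_delete_command(session_entry):
    """Turn 'zone/hostname/cluster-ip' into the delete command's argv list."""
    zone, rest = session_entry.strip().split("/", 1)
    hostname, cluster_ip = rest.rsplit("/", 1)
    return [
        "isi", "nfs", "nlm", "sessions", "delete",
        "--zone", str(int(zone)),  # int() rejects a non-numeric zone ID
        hostname,
        cluster_ip,
    ]


argv = build_delete_command("4/tme-linux1/10.20.10.200")
# Print for review; the command still prompts for confirmation when run.
print(shlex.join(argv))
```

Printing the command for review, rather than running it automatically, keeps a human in the loop for what is a disruptive operation against a client's locks.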
