OneFS Performance Statistics

There have been several recent inquiries on how to effectively gather performance statistics on a cluster, so it seemed like a useful topic to dig into further in a blog article:

First, two cardinal rules… Before planning or undertaking any performance tuning on a cluster, or its attached clients:

1)  Record the original cluster settings before making any configuration changes to OneFS or its data services.

2)  Measure and analyze how the various workloads in your environment interact with and consume storage resources.

Performance measurement is done by gathering statistics about the common file sizes and I/O operations, including CPU and memory load, network traffic utilization, and latency. To obtain key metrics and wall-clock timing data for delete, renew lease, create, remove, set userdata, get entry, and other file system operations, connect to a node via SSH and run the following command as root to enable the vopstat system control:

# sysctl efs.util.vopstats.record_timings=1

After enabling vopstats, they can be viewed by running the ‘sysctl efs.util.vopstats’ command as root: Here is an example of the command’s output:

# sysctl efs.util.vopstats.

efs.util.vopstats.ifs_snap_set_userdata.initiated: 26 

efs.util.vopstats.ifs_snap_set_userdata.fast_path: 0 

efs.util.vopstats.ifs_snap_set_userdata.read_bytes: 0

efs.util.vopstats.ifs_snap_set_userdata.read_ops: 0

efs.util.vopstats.ifs_snap_set_userdata.raw_read_bytes: 0

efs.util.vopstats.ifs_snap_set_userdata.raw_read_ops: 0

efs.util.vopstats.ifs_snap_set_userdata.raw_write_bytes: 6432768

efs.util.vopstats.ifs_snap_set_userdata.raw_write_ops: 2094

efs.util.vopstats.ifs_snap_set_userdata.timed: 0

efs.util.vopstats.ifs_snap_set_userdata.total_time: 0

efs.util.vopstats.ifs_snap_set_userdata.total_sqr_time: 0

efs.util.vopstats.ifs_snap_set_userdata.fast_path_timed: 0

efs.util.vopstats.ifs_snap_set_userdata.fast_path_total_time: 0   

efs.util.vopstats.ifs_snap_set_userdata.fast_path_total_sqr_time: 0

The time data captures the number of operations that cross the OneFS clock tick, which is 10 milliseconds. Independent of the number of events, the total_sq_time provides no actionable information because of the granularity of events.

To analyze the operations, use the total_time value instead. The following example shows only the total time records in the vopstats:

# sysctl efs.util.vopstats | grep –e "total_ time: [ ^0]"

efs.util.vopstats.access_rights.total_time: 40000

efs.util.vopstats.lookup.total_time:  30001

efs.util.vopstats.unlocked_ write_mbuf.total_time:  340006

efs.util.vopstats.unlocked_write_mbuf.fast_path_total_time: 340006

efs.util.vopstats.commit.total_time: 3940137

efs.util.vopstats.unlocked_getattr.total_time: 280006

efs.util.vopstats.unlocked_getattr.fast_path_total_time:   50001 efs.util.vopstats.inactive.total_time: 100004

efs.util.vopstats.islocked.total_time:   30001 efs.util.vopstats.lock1.total_time:   280005

efs.util.vopstats.unlocked_read_mbuf.total_time: 11720146 efs.util.vopstats.readdir.total_time: 20000

efs.util.vopstats.setattr.total_time: 220010 efs.util.vopstats.unlock.total_time: 20001

efs.util.vopstats.ifs_snap_delete_resume.timed: 77350

efs.util.vopstats.ifs_snap_delete_resume.total_time: 720014

efs.util.vopstats.ifs_snap_delete_resume.total_sqr_time: 7200280042

The ‘isi statistics’ CLI command is a great tool for the task here, plus its output is current (ie. real time). It’s a versatile utility, providing real time stats with the following subcommand-level syntax

Statistics

Category

Details
Client Display cluster usage statistics organized according to cluster hosts and users
Drive Show performance by drive
Heat Identify the most accessed files/directories
List List valid arguments to given option
Protocol Display cluster usage statistics organized by communication protocol
Pstat Generate detailed protocol statistics along with CPU, OneFS, network & disk stats
Query Query for specific statistics. There are current and history modes
System Display general cluster statistics (Op rates for protocols, network & disk traffic (kB/s)

Full command syntax and a description of the options can be accessed via isi statistics –help or via the man page (man isi-statistics).

The ‘isi statistics pstat’ command provides statistics per protocol operation, client connections, and the file system. For example, for  NFSv3:

# isi statistics pstat --protocol=nfs3

The ‘isi statistics client’ CLI command  provides I/O and timing data by client name and/or IP address, depending on options – plus the username if it can be determined. For example, to generate a list of the top NFSv3 clients on a cluster, the following command can be used:

# isi statistics client --protocols=nfs3 –-format=top

Or, for SMB clients:

# isi statistics client --protocols=smb2 –-format=top

The extensive list of protocols for which client data can be displayed includes:

nfs3 | smb1 | nlm | ftp | http | siq | smb2 | nfs4 | papi | jobd | irp | lsass_in | lsass_out | hdfs | all | internal | external

SMB2 and SMB3 current connections are both displayed in the following stats command:

# isi statistics query current --stats node.clientstats.active.smb2

Or for SMB2 + 3 historical connection data:

# isi statistics query history --stats node.clientstats.active.smb2

The following command will total by users, as opposed to node. This can be helpful when investigating HPC workloads, or other workflows involving compute clusters:

# isi statistics client --protocols=nfs3 –-format=top --numeric --totalby=username --sort=ops,timemax

This ‘heat’ command option can be useful for viewing the files that are being utilized:

# isi statistics heat --long --classes=read,write,namespace_read,namespace_write | head -10

This shows the amount of contention where parallel user(s) operations are targeting the same object.

# isi statistics heat --long --classes=read,write,namespace_read,namespace_write --events=blocked,contended,deadlocked | head -10

It can also be useful to constrain statistics reporting to a single node. For example, the following command will show the fifteen hottest files on node 4.

# isi statistics heat --limit=15 --nodes=4

It’s worth noting that isi statistics doesn’t directly tie a client to a file or directory path. Both isi statistics heat and isi statistics client provide some of this information, but not together. The only directory/file related statistics come from the ‘heat’ stats, which track the hottest accesses in the filesystem.

The system and drive statistics can also be useful for performance analysis and troubleshooting purposes. For example:

# isi statistics system -–nodes=all --oprates --nohumanize

This output will give you the per-node OPS over protocol, network and disk. On the disk side, the sum of DiskIn (writes)  and DIskOut (reads) gives the total IOPS for all the drives per node.

For the next level of granularity, the drive statistics command provides individual disk info.

# isi statistics drive -nall -–long --type=sata --sort=busy | head -20

If most or all the drives have high busy percentages, this indicates a uniform resource constraint, and there is a strong likelihood that the cluster is spindle bound. If, say, the top five drives are much busier than the rest, this would suggest a workflow hot-spot.

OneFS Time Synchronization & NTP

OneFS provides a network time protocol (NTP) service to ensure that all nodes in a cluster can easily be synchronized to the same time source. This service automatically adjusts a cluster’s date and time settings to that of one or more external NTP servers.

NTP configuration on a cluster is performed by using the ‘isi ntp’ command line (CLI) utility, rather than modifying the nodes’ /etc/ntp.conf files manually. The syntax for this command is divided into two parts: servers and settings. For example:

# isi ntp settings

Description:

    View and modify cluster NTP configuration.

Required Privileges:

    ISI_PRIV_NTP

Usage:

    isi ntp settings <action>

        [--timeout <integer>]

        [{--help | -h}]

Actions:

    modify    Modify cluster NTP configuration.

    view      View cluster NTP configuration.

Options:

  Display Options:

    --timeout <integer>

        Number of seconds for a command timeout (specified as 'isi --timeout NNN

        <command>').

    --help | -h

        Display help for this command.


There is also an isi_ntp_config CLI command available in OneFS that provides a richer configuration set and combines the server and settings functionality :

Usage: isi_ntp_config COMMAND [ARGUMENTS ...]

Commands:

    help

      Print this help and exit.

    list

      List all configured info.

    add server SERVER [OPTION]

      Add SERVER to ntp.conf.  If ntp.conf is already

      configured for SERVER, the configuration will be replaced.

      You can specify any server option. See NTP.CONF(5)


    delete server SERVER

      Remove server configuration for SERVER if it exists.


    add exclude NODE [NODE...]

      Add NODE (or space separated nodes) to NTP excluded entry.

      Excluded nodes are not used for NTP communication with external

      NTP servers.


    delete exclude NODE [NODE...]

      Delete NODE (or space separated Nodes) from NTP excluded entry.


    keyfile KEYFILE_PATH

      Specify keyfile path for NTP auth. Specify "" to clear value.

      KEYFILE_PATH has to be a path under /ifs.


    chimers [COUNT | "default"]

      Display or modify the number of chimers NTP uses.

      Specify "default" to clear the value.

By default, if the cluster has more than three nodes, three of the nodes are selected as ‘chimers’. Chimers are nodes which can contact the external NTP servers. If the cluster comprises three nodes or less, only one node will be selected as a chimer. If no external NTP server is set, they will use the local clock instead. The other non-chimer nodes will use the chimer nodes as their NTP servers. The chimer nodes are selected by the lowest node number which is not excluded from chimer duty.

If a node is configured as a chimer. its /etc/ntp.conf entry will resemble:

# This node is one of the 3 chimer nodes that can contact external NTP

# servers. The non-chimer nodes will use this node as well as the other

# chimers as their NTP servers.

server time.isilon.com

# The other chimer nodes on this cluster:

server 192.168.10.150 iburst

server 192.168.10.151 iburst

# If none or bad connection to external servers this node may become

# the time server for this cluster. The system clock will be a time

# source and run at a high stratum

In addition to managing NTP servers and authentication, individual nodes can also be excluded from communicating with external NTP servers.

The local clock of the node is set as an NTP server at a high stratum level. In NTP, a server with lower stratum number is preferred, so if an external NTP server is set the system will prefer an external time server if configured. The stratum level for the chimer is determined by the chimer number. The first chimer is set to stratum 9, the second to stratum 11, and the others continue to increment the stratum number by 2. This is so the non-chimer nodes will prefer to get the time from the first chimer if available.

For a non-chimer node, its /etc/ntp.conf entry will resemble:

# This node is _not_ one of the 3 chimer nodes that can contact external

# NTP servers. These are the cluster's chimer nodes:

server 192.168.10.149 iburst true

server 192.168.10.150 iburst true

server 192.168.10.151 iburst true


When configuring NTP on a cluster, more than one NTP server can be specified to synchronize the system time from. This allows for full redundancy of ysnc targets. The cluster periodically contacts these server(s) and adjusts the time and/or date as necessary, based on the information it receives.

The ‘isi_ntp_config’ CLI command can be used to configure which NTP servers a cluster will reference. For example, the following syntax will add the server ‘time.isilon.com’:

# isi_ntp_config add server time.isilon.com

Alternatively, the NTP configuration can also be managed from the WebUI by browsing to Cluster Management > General Settings > NTP.

NTP also provides basic authentication-based security via symmetrical keys, if desired.

If no NTP servers are available, Windows Active Directory (AD) can synchronize domain members to a primary clock running on the domain controller(s). If there are no external NTP servers configured and the cluster is joined to AD, OneFS will use the Windows domain controller as the NTP time server. If the cluster and domain time become out of sync by more than 4 minutes, OneFS generates an event notification.

Be aware though, that if the cluster and Active Directory drift out of time sync by more than 5 minutes, AD authentication will cease to function.

If neither an NTP server or domain controller are available, the cluster’s time, date, and time zone can also be set manually using the ‘isi config’ CLI command.  For example:

1.     Run the ‘isi config’ command. The command-line prompt changes to indicate that you are in the isi config subsystem:

# isi config

Welcome to the Isilon IQ configuration console.

Copyright (c) 2001-2017 EMC Corporation. All Rights Reserved.

Enter 'help' to see list of available commands.

Enter 'help <command>' to see help for a specific command.

Enter 'quit' at any prompt to discard changes and exit.

        Node build: Isilon OneFS v8.2.2 B_8_2_2(RELEASE)Node serial number: JWXER170300301

>>>


2.     Specify the current date and time by running the date command. For example, the following command sets the cluster time to 9:20 AM on April 23, 2020:

>>> date 2020/04/23 09:20:00

Date is set to 2020/04/23 09:20:00

3.     The ‘help timezone’ command will list the available timezones. For example:

>>> help timezone


timezone [<timezone identifier>]


Sets the time zone on the cluster to the specified time zone.

Valid time zone identifiers are:

        Greenwich Mean Time

        Eastern Time Zone

        Central Time Zone

        Mountain Time Zone

        Pacific Time Zone

        Arizona

        Alaska

        Hawaii

        Japan

        Advanced

4.     To verify the currently configured time zone, run the ‘timezone’ command. For example:

>>> timezone

The current time zone is: Greenwich Mean Time

5.     To change the time zone, enter the timezone command followed by one of the displayed options. For example, the following command changes the time zone to Alaska:

>>> timezone Alaska

Time zone is set to Alaska

A message confirming the new time zone setting displays. If your desired time zone did not display when you ran the help timezone command, enter ‘timezone Advanced’. After a warning screen, you will proceed to a list of regions. When you select a region, a list of specific time zones for that region appears. Select the desired time zone (you may need to scroll), then enter OK or Cancel until you return to the isi config prompt.

6.     When done, run the commit command to save your changes and exit isi config.

>>> commit

Commit succeeded.

Alternatively, these time and date parameters can also be managed from the WebUI by browsing to Cluster Management > General Settings > Date and Time.