OneFS NFS Performance Resource Monitoring

Another feature of the recent OneFS 9.3 release is enhanced performance resource monitoring for the NFS protocol, adding path tracking for NFS and bringing it up to parity with the SMB protocol resource monitoring.

But first, a quick refresher. The performance resource monitoring framework enables OneFS to track and report the use of transient system resources (ie. resources that only exists at a given instant), providing insight into who is consuming what resources, and how much of them. Examples include CPU time, network bandwidth, IOPS, disk accesses, and cache hits (but not currently disk space or memory usage). OneFS performance resource monitoring initially debuted in OneFS 8.0.1 and is an ongoing project, which ultimately will provide both insights and control. This will allow prioritization of work flowing through the system, prioritization and protection of mission critical workflows, and the ability to detect if a cluster is at capacity.

Since identification of work is highly subjective, OneFS performance resource monitoring provides significant configuration flexibility, allowing cluster admins to define exactly how they wish to track workloads. For example, an administrator might want to partition their work based on criterial like which user is accessing the cluster, the export/share they are using, which IP address they’re coming from – and often a combination of all three.

So why not just track it all, you may ask? It would simply generate too much data (potentially requiring a cluster just to monitor your cluster!).

OneFS has always provided client and protocol statistics, however they were typically front-end only. Similarly, OneFS provides CPU, cache and disk statistics, but they did not display who was consuming them. Partitioned performance unites these two realms, tracking the usage of the CPU, drives and caches, and spanning the initiator/participant barrier.

Under the hood, OneFS collects the resources consumed, grouped into distinct workloads, and the aggregation of these workloads comprise a performance dataset.

Item Description Example
Workload ·         A set of identification metrics and resources used {username:nick, zone_name:System} consumed {cpu:1.5s, bytes_in:100K, bytes_out:50M, …}
Performance Dataset ·         The set of identification metrics to aggregate workloads by

·         The list of workloads collected matching that specification

 

{usernames, zone_names}
Filter ·         A method for including only workloads that match specific identification metrics. Filter{zone_name:System}

ü  {username:nick, zone_name:System}

ü  {username:jane, zone_name:System}

×         {username:nick, zone_name:Perf}

The following metrics are tracked:

Category Items
Identification Metrics ·         Username / UID / SID

·         Primary Groupname / GID / GSID

·         Secondary Groupname / GID / GSID

·         Zone Name

·         Local/Remote IP Address/Range

·         Path*

·         Share / Export ID

·         Protocol

·         System Name*

·         Job Type

Transient Resources ·         CPU Usage

·         Bytes In/Out

·         IOPs

·         Disk Reads/Writes

·         L2/L3 Cache Hits

Performance Statistics ·         Read/Write/Other Latency
Supported Protocols ·         NFS

·         SMB

·         S3

·         Jobs

·         Background Services

With the exception of the system dataset, performance datasets must be configured before statistics are collected. This is typically performed via the ‘isi performance’ CLI command set, but can also be done via the platform API:

 https://<node_ip>:8080/platform/performance

Once a performance dataset has been configured, it will continue to collect resource usage statistics every 30 seconds until it is deleted. These statistics can be viewed via the ‘isi statistics’ CLI interface.

This is as simple as, first, creating a dataset specifying which identification metrics you wish to partition work by:

# isi performance dataset create --name ds_test1 username protocol export_id share_name

Next, waiting 30 seconds for data to collect.

Finally, viewing the performance dataset:

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId  ShareName  WorkloadType

---------------------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1          -             -

  1.2ms    10.0K     20.0M  56.0   40.0     0.0  9.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3          -             -

  1.0ms    18.3M      17.0  10.0    0.0    47.0  0.0  0.0        0.0us       100.2us         0.0us      jane      smb2         -       home             -

 31.4us     15.1      11.7   0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us       nick      nfs4         4          -             -

166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -      Excluded

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -        System

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -          -       Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -    Additional

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          - Overaccounted

---------------------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Be aware that OneFS can only harvest a limited number of workloads per dataset. As such, it keeps track of the highest resource consumers, which it considers top workloads, and outputs as many of those as possible, up to a limit of either 1024 top workloads or 1.5MB memory per sample (whichever is reached first).

When you’re done with a performance dataset, it can be easily removed with the ‘isi performance dataset delete <dataset_name> syntax. For example:

# isi performance dataset delete ds_test1

Are you sure you want to delete the performance dataset.? (yes/[no]): yes

Deleted performance dataset 'ds_test1' with ID number 1.

Performance resource monitoring includes the five aggregate workloads below for special cases, and OneFS can output up to 1024 additional pinned workloads. Plus, a cluster administrator can configure up to four custom datasets, or special workloads, which can aggregate special case workloads.

Aggregate Workload Description
Additional ·         Not in the ‘top’ workloads
Excluded ·         Does not match the dataset definition (ie. missing a required metric such as export_id)

·         Does not match any applied filter (See later in the slide deck)

Overaccounted ·         Total of work that has appeared in multiple workloads within the dataset

·         Can happen for datasets using path and/or group metrics

System ·         Background system/kernel work
Unknown ·         OneFS couldn’t determine where this work came from

The ‘system’ dataset is created by default and cannot be deleted or renamed. Plus, it is the only dataset that includes the OneFS services and job engine’s resource metrics:

OneFS Resource Details
System service ·         Any process started by isi_mcp / isi_daemon

·         Any protocol (likewise) service

Jobs ·         Includes job ID, job type, and phase

Non-protocol work only includes CPU, cache and drive statistics, but no bandwidth usage, op counts, or latencies. Plus, protocols (ie. S3) that haven’t been fully integrated into the resource monitoring framework yet also get these basic statistics, and only in the system dataset

If no dataset is specified in an ‘isi statistics workload’ command, the ‘system’ dataset statistics are displayed by default:

# isi statistics workload

Performance workloads can also be ‘pinned’, allowing a workload to be tracked even when it’s not a ‘top’ workload, and regardless of how many resources it consumes. This can be configured with the following CLI syntax:

# isi performance workloads pin <dataset_name/id> <metric>:<value>

Note that all metrics for the dataset must be specified. For example:

# isi performance workloads pin ds_test1 username:jane protocol:nfs3 export_id:3

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType

-----------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1              -

  1.2ms    10.0K     20.0M  56.0   40.0     0.0  0.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3         Pinned <- Always Visible

 31.4us     15.1      11.7   0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us       jim      nfs4         4              -    Workload

166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -       Excluded

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional <- Unpinned workloads

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted    that didn’t make it

-----------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Workload filters can also be configured to restrict output to just those workloads which match a specified criteria. This allows for more finely-tuned tracking, which can be invaluable when dealing with large number of workloads. Configuration is greedy, since a workload will be included if it matches any applied filter. Filtering can be implemented with the following CLI syntax:

# isi performance dataset create <all_metrics> --filters <filtered_metrics>

# isi performance filter apply <dataset_name/id> <metric>:<value>

For example:

# isi performance dataset create --name ds_test1 username protocol export_id --filters username,protocol

# isi performance filters apply ds_test1 username:nick protocol:nfs3

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType

-----------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1              - <- Matches Filter

 13.0ms     1.4M     600.4   2.5    0.0   200.7  0.0  0.0      405.0us       638.8us         8.2ms       nick      nfs3         7              - <- Matches Filter

167.5ms    10.0K     20.0M  56.1   40.0     0.1  0.0  0.0      349.3us         0.0us         0.0us         -         -         -       Excluded <- Aggregate of not

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System    matching filter

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted

-----------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Be aware that the metric to filter on must be specified when creating the dataset. A dataset with a filtered metric, but no filters applied, will output an empty dataset.

As mentioned previously, NFS path tracking is the principal enhancement to OneFS 9.3 performance resource monitoring, and this can be easily enabled as follows (for both the NFS and SMB protocols:

# isi performance dataset create --name ds_test1 username path --filters=username,path

The statistics can then be viewed with the following CLI syntax (in this case for dataset ‘ds_test1’ with ID 1):

# isi statistics workload list --dataset 1

When it comes to path tracking, there are a couple of caveats and restrictions that should be considered. Firstly, with OneFS 9.3, it is now available on the NFS and SMB protocols only. Also, path tracking is expensive and OneFS cannot track every single path. The desired paths to be tracked must be listed first, and can be specified by either pinning a workload or applying a path filter

If the resource overhead of path tracking is deemed too costly, consider a possible equivalent alternative such as tracking by NFS Export ID or SMB Share, as appropriate.

Users can have thousands of secondary groups, often simply too many to track. Only the primary group will be tracked until groups are specified, so any secondary groups to be tracked must be specified first, again either by applying a group filter or pinning a workload.

Note that some metrics only apply to certain protocols, For example, ‘export_id’ only applies to NFS. Similarly, ‘share_name’ only applies to SMB. Also, note that there is no equivalent metric (or path tracking) implemented for the S3 object protocol yet. This will be addressed in a future release.

These metrics can be used individually in a dataset or combined. For example, a dataset configured with both ‘export_id’ and metrics will list both NFS and SMB workloads with either ‘export_id’ or ‘share_name’. Likewise, a dataset with only the ‘share_name’ metric will only list SMB workloads, and a dataset with just the ‘export_id’ metric will only list NFS, excluding any SMB workloads. However, if a workload is excluded it will be aggregated into the special ‘Excluded’ workload.

When viewing collected metrics, the ‘isi statistics workload’ CLI command displays only the last sample period in table format with the statistics normalized to a per-second granularity:

# isi statistics workload [--dataset <dataset_name/id>]

Names are resolved wherever possible, such as UID to username, IP address to hostname, etc, and lookup failures are reported via an extra ‘error’ column. Alternatively, the ‘—numeric’ flag can also be included to prevent any lookups:

# isi statistics workload [--dataset <dataset_name/id>] –numeric

Aggregated cluster statistics are reported by default, but adding the ‘—nodes’ argument will provide per node statistics as viewed from each initiator. More specifically, the ‘–nodes=0’ flag will display the local node only, whereas ‘–nodes=all’ will report all nodes.

The ‘isi statistics workload’ command also includes standard statistics output formatting flags, such as ‘—sort’, ‘—totalby’, etc.

    --sort (CPU | (BytesIn|bytes_in) | (BytesOut|bytes_out) | Ops | Reads |

      Writes | L2 | L3 | (ReadLatency|latency_read) |

      (WriteLatency|latency_write) | (OtherLatency|latency_other) | Node |

      (UserId|user_id) | (UserSId|user_sid) | UserName | (Proto|protocol) |

      (ShareName|share_name) | (JobType|job_type) | (RemoteAddr|remote_address)

      | (RemoteName|remote_name) | (GroupId|group_id) | (GroupSId|group_sid) |

      GroupName | (DomainId|domain_id) | Path | (ZoneId|zone_id) |

      (ZoneName|zone_name) | (ExportId|export_id) | (SystemName|system_name) |

      (LocalAddr|local_address) | (LocalName|local_name) |

      (WorkloadType|workload_type) | Error)

        Sort data by the specified comma-separated field(s). Prepend 'asc:' or

        'desc:' to a field to change the sort order.







    --totalby (Node | (UserId|user_id) | (UserSId|user_sid) | UserName |

      (Proto|protocol) | (ShareName|share_name) | (JobType|job_type) |

      (RemoteAddr|remote_address) | (RemoteName|remote_name) |

      (GroupId|group_id) | (GroupSId|group_sid) | GroupName |

      (DomainId|domain_id) | Path | (ZoneId|zone_id) | (ZoneName|zone_name) |

      (ExportId|export_id) | (SystemName|system_name) |

      (LocalAddr|local_address) | (LocalName|local_name))

        Aggregate per specified fields(s).

In addition to the CLI, the same output can be accessed via the platform API:

# https://<node_ip>:8080/platform/statistics/summary/workload

Raw metrics can also be viewed using the ‘isi statistic query’ command:

# isi statistics query <current|history> --format=json --keys=<node|cluster>.performance.dataset.<dataset_id>

Options are either ‘current’, which will provide most recent sample, or ‘history’, which will report samples collected over the last 5 minutes. Names are not looked up and statistics are not normalized – the sample period is included in the output. The command syntax must also include the ‘–format=json’ flag, since no other output formats are currently supported. Additionally, there are two distinct types of keys:

Key Type Description
cluster.performance.dataset.<dataset_id> Cluster-wide aggregated statistics
node.performance.dataset.<dataset_id> Per node statistics from the initiator perspective

Similarly, this raw metrics can also be obtained via the platform API as follows:

https://<node_ip>:8080/platform/statistics/<current|history>?keys=node.performance.dataset.0

Leave a Reply

Your email address will not be published.