PowerScale & Isilon – Page 17 – Unstructured Data Quick Tips

OneFS Firewall Management and Troubleshooting

In the final article in this series, we’ll focus on step five of the OneFS firewall provisioning process and turn our attention to some of the management and monitoring considerations and troubleshooting tools associated with the firewall.

Management and monitoring of the firewall in OneFS 9.5 can be performed via the CLI, or platform API, or WebUI. Since data security threats come from inside an environment as well as out, such as from a rogue IT employee, a good practice is to constrain the use of all-powerful ‘root’, ‘administrator’, and ‘sudo’ accounts as much as possible. Instead of granting cluster admins full rights, a preferred approach is to use OneFS’ comprehensive authentication, authorization, and accounting framework.

OneFS role-based access control (RBAC) can be used to explicitly limit who has access to configure and monitor the firewall. A cluster security administrator selects the desired multi-tenant access zone, creates a zone-aware role within it, assigns privileges, and then assigns members. For example, from the WebUI under Access > Membership and roles > Roles:

When these members login to the cluster via a configuration interface (WebUI, Platform API, or CLI) they inherit their assigned privileges.

Accessing the firewall from the WebUI and CLI in OneFS 9.5 requires the new ISI_PRIV_FIREWALL administration privilege.

# isi auth privileges -v | grep -i -A 2 firewall

         ID: ISI_PRIV_FIREWALL

Description: Configure network firewall

       Name: Firewall

   Category: Configuration

 Permission: w

This privilege can be assigned one of four permission levels for a role, including:

Permission Indicator	Description
–	No permission.
R	Read-only permission.
X	Execute permission.
W	Write permission.

By default, the built-in ‘SystemAdmin’ roles is granted write privileges to administer the firewall, while the built-in ‘AuditAdmin’ role has read permission to view the firewall configuration and logs.

With OneFS RBAC, an enhanced security approach for a site could be to create two additional roles on a cluster, each with an increasing realm of trust. For example:

An IT ops/helpdesk role with ‘read’ access to the snapshot attributes would permit monitoring and troubleshooting the firewall, but no changes:

RBAC Role	Firewall Privilege	Permission
IT_Ops	ISI_PRIV_FIREWALL	Read

For example:

# isi auth roles create IT_Ops

# isi auth roles modify IT_Ops --add-priv-read ISI_PRIV_FIREWALL

# isi auth roles view IT_Ops | grep -A2 -i firewall

             ID: ISI_PRIV_FIREWALL

     Permission: r

2. A Firewall Admin role would provide full firewall configuration and management rights:

RBAC Role	Firewall Privilege	Permission
FirewallAdmin	ISI_PRIV_FIREWALL	Write

For example:

# isi auth roles create FirewallAdmin

# isi auth roles modify FirewallAdmin –add-priv-write ISI_PRIV_FIREWALL

# isi auth roles view FirewallAdmin | grep -A2 -i firewall

ID: ISI_PRIV_FIREWALL

Permission: w

Note that when configuring OneFS RBAC, remember to remove the ‘ISI_PRIV_AUTH’ and ‘ISI_PRIV_ROLE’ privilege from all but the most trusted administrators.

Additionally, enterprise security management tools such as CyberArk can also be incorporated to manage authentication and access control holistically across an environment. These can be configured to frequently change passwords on trusted accounts (ie. every hour or so), require multi-Level approvals prior to retrieving passwords, as well as track and audit password requests and trends.

OneFS Firewall Limits

When working with the OneFS Firewall, there are some upper bounds to the configurable attributes to keep in mind. These include:

Name	Value	Description
MAX_INTERFACES	500	Maximum number of L2 interfaces including Ethernet, VLAN, LAGG interfaces on a node.
MAX _SUBNETS	100	Maximum number of subnets within a OneFS cluster
MAX_POOLS	100	Maximum number of network pools within a OneFS cluster
DEFAULT_MAX_RULES	100	Default value of maximum rules within a firewall policy
MAX_RULES	200	Upper limit of maximum rules within a firewall policy
MAX_ACTIVE_RULES	5000	Upper limit of total active rules across the whole cluster
MAX_INACTIVE_POLICIES	200	Maximum number of policies which are not applied to any network subnet or pool. They will not be written into ipfw table.

Firewall performance

Be aware that, while the OneFS firewall can greatly enhance the network security of a cluster, by nature of its packet inspection and filtering activity, it does come with a slight performance penalty (generally less than 5%).

Firewall and hardening mode

If OneFS STIG Hardening (ie. via ‘isi hardening apply’) is applied to a cluster with the OneFS Firewall disabled, the firewall will be automatically activated. On the other hand, if the firewall is already enabled, then there will be no change and it will remain active.

Firewall and user-configurable ports

Some OneFS services allow the TCP/UDP ports on which the daemon listens to be changed. These include:

Service	CLI Command	Default Port
NDMP	isi ndmp settings global modify –port	10000
S3	isi s3 settings global modify –https-port	9020, 9021
SSH	isi ssh settings modify –port	22

The default ports for these services are already configured in the associated global policy rules. For example, for the S3 protocol:

# isi network firewall rules list | grep s3

default_pools_policy.rule_s3                  55     Firewall rule on s3 service                                                             allow

# isi network firewall rules view default_pools_policy.rule_s3

          ID: default_pools_policy.rule_s3

        Name: rule_s3

       Index: 55

 Description: Firewall rule on s3 service

    Protocol: TCP

   Dst Ports: 9020, 9021

Src Networks: -

   Src Ports: -

      Action: allow

Note that the global policies, or any custom policies, do not auto-update if these ports are reconfigured. This means that the firewall policies must be manually updated when changing ports. For example, if the NDMP port is changed from 10000 to 10001:

# isi ndmp settings global view

                       Service: False

                          Port: 10000

                           DMA: generic

          Bre Max Num Contexts: 64

MSB Context Retention Duration: 300

MSR Context Retention Duration: 600

        Stub File Open Timeout: 15

             Enable Redirector: False

              Enable Throttler: False

       Throttler CPU Threshold: 50

# isi ndmp settings global modify --port 10001

# isi ndmp settings global view | grep -i port

                          Port: 10001

The firewall’s NDMP rule port configuration must also be reset to 10001:

# isi network firewall rule list | grep ndmp

default_pools_policy.rule_ndmp                44     Firewall rule on ndmp service                                                           allow

# isi network firewall rule modify default_pools_policy.rule_ndmp --dst-ports 10001 --live

# isi network firewall rule view default_pools_policy.rule_ndmp | grep -i dst

   Dst Ports: 10001

Note that the ‘–live’ flag is specified to enact this port change immediately.

Firewall and source-based routing

Under the hood, OneFS source-based routing (SBR) and the OneFS Firewall both leverage ‘ipfw’. As such, SBR and the firewall share the single ipfw table in the kernel. However, the two features use separate ipfw table partitions.

This allows SBR and the firewall to be activated independently of each other. For example, even if the firewall is disabled, SBR can still be enabled and any configured SBR rules displayed as expected (ie. via ‘ipfw set 0 show’).

Firewall and IPv6

Note that the firewall’s global default policies have a rule allowing ICMP6 by default. For IPv6 enabled networks, ICMP6 is critical for the functioning of NDP (Neighbor Discovery Protocol). As such, when creating custom firewall policies and rules for IPv6-enabled network subnets/pools, be sure to add a rule allowing ICMP6 to support NDP. As discussed in a previous article, an alternative (and potentially easier) approach is to clone a global policy to a new one and just customize its ruleset instead.

Firewall and FTP

The OneFS FTP service can work in two modes: Active and Passive. Passive mode is the default, where FTP data connections are created on top of random ephemeral ports. However, since the OneFS firewall requires fixed ports to operate, it only supports the FTP service in active mode. Attempts to enable the firewall with FTP running in passive mode will generate the following warning:

# isi ftp settings view | grep -i active

          Active Mode: No

# isi network firewall settings modify --enabled yes

FTP service is running in Passive mode. Enabling network firewall will lead to FTP clients having their connections blocked. To avoid this, please enable FTP active mode and ensure clients are configured in active mode before retrying. Are you sure you want to proceed and enable network firewall? (yes/[no]):

In order to activate the OneFS firewall in conjunction with the FTP service, first ensure the FTP service is running in active mode before enabling the firewall. For example:

# isi ftp settings view | grep -i enable

  FTP Service Enabled: Yes

# isi ftp settings view | grep -i active

          Active Mode: No

# isi ftp setting modify –active-mode true

# isi ftp settings view | grep -i active

          Active Mode: Yes

# isi network firewall settings modify --enabled yes

Note: Verify FTP active mode support and/or firewall settings on the client side, too.

Firewall monitoring and troubleshooting

When it comes to monitoring the OneFS firewall, the following logfiles and utilities provide a variety of information and are a good source to start investigating an issue:

Utility	Description
/var/log/isi_firewall_d.log	Main OneFS firewall log file, which includes information from firewall daemon.
/var/log/isi_papi_d.log	Logfile for platform AP, including Firewall related handlers.
isi_gconfig -t firewall	CLI command that displays all firewall configuration info.
ipfw show	CLI command which displays the ipfw table residing in the FreeBSD kernel.

Note that the above files and command output are automatically included in logsets generated by the ‘isi_gather_info’ data collection tool.

The isi_gconfig command can be run with the ‘-q’ flag to identify any values that are not at their default settings. For example, the stock (default) isi_firewall_d gconfig context will not report any configuration entries:

# isi_gconfig -q -t firewall

[root] {version:1}

The firewall can also be run in the foreground for additional active rule reporting and debug output. For example, first shut down the isi_firewall_d service:

# isi services -a isi_firewall_d disable

The service 'isi_firewall_d' has been disabled.

Next, start up the firewall with the ‘-f’ flag.

# isi_firewall_d -f

Acquiring kevents for flxconfig

Acquiring kevents for nodeinfo

Acquiring kevents for firewall config

Initialize the firewall library

Initialize the ipfw set

ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.

cmd:/sbin/ipfw set enable 0 normal termination, exit code:0

isi_firewall_d is now running

Loaded master FlexNet config (rev:312)

Update the local firewall with changed files: flx_config, Node info, Firewall config

Start to update the firewall rule...

flx_config version changed!                             latest_flx_config_revision: new:312, orig:0

node_info version changed!                              latest_node_info_revision: new:1, orig:0

firewall gconfig version changed!                               latest_fw_gconfig_revision: new:17, orig:0

Start to update the firewall rule for firewall configuration (gconfig)

Start to handle the firewall configure (gconfig)

Handle the firewall policy default_pools_policy

ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.

32043 allow tcp from any to any 10000 in

cmd:/sbin/ipfw add 32043 set 8 allow TCP from any  to any 10000 in  normal termination, exit code:0

ipfw: Rule added by ipfw is for temporary use and will be auto flushed soon. Use isi firewall instead.

32044 allow tcp from any to any 389,636 in

cmd:/sbin/ipfw add 32044 set 8 allow TCP from any  to any 389,636 in  normal termination, exit code:0

Snip...

If the OneFS firewall is enabled and some network traffic is blocked, either this or the ‘ipfw show’ CLI command will often provide the first clues.

Please note that the ‘ipfw’ command should NEVER be used to modify the OneFS firewall table!

For example, say a rule is added to the default pools policy denying traffic on port 9876 from all source networks (0.0.0.0/0):

# isi network firewall rules create default_pools_policy.rule_9876 --index=100 --dst-ports 9876 --src-networks 0.0.0.0/0 --action deny –live

# isi network firewall rules view default_pools_policy.rule_9876

          ID: default_pools_policy.rule_9876

        Name: rule_9876

       Index: 100

 Description:

    Protocol: ALL

   Dst Ports: 9876

Src Networks: 0.0.0.0/0

   Src Ports: -

      Action: deny

Running ‘ipfw show’ and grepping for the port will show this new rule:

# ipfw show | grep 9876

32099            0               0 deny ip from any to any 9876 in

The ‘ipfw show’ command output also reports the statistics of how many IP packets have matched each rule This can be incredibly useful when investigating firewall issues. For example, a telnet session is initiated to the cluster on port 9876 from a client:

# telnet 10.224.127.8 9876

Trying 10.224.127.8...

telnet: connect to address 10.224.127.8: Operation timed out

telnet: Unable to connect to remote host

The connection attempt will time out since the port 9876 ‘deny’ rule will silently drop the packets. At the same time, the ‘ipfw show’ command will increment its counter to report on the denied packets. For example:

# ipfw show | grep 9876

32099            9             540 deny ip from any to any 9876 in

If this behavior is not anticipated or desired, the rule name can be found searching the rules list for the port number, in this case port 9876:

# isi network firewall rules list | grep 9876

default_pools_policy.rule_9876                100                                                                deny

The offending rule can then be reverted to ‘allow’ traffic on port 9876:

# isi network firewall rules modify default_pools_policy.rule_9876 --action allow --live

Or easily deleted, if preferred:

# isi network firewall rules delete default_pools_policy.rule_9876 --live

Are you sure you want to delete firewall rule default_pools_policy.rule_9876? (yes/[no]): yes

OneFS Firewall Configuration – Part 2

In the previous article in this OneFS firewall series, we reviewed the upgrade, activation, and policy selection components of the firewall provisioning process.

Now, we turn our attention to the firewall rule configuration step of the process.

As stated previously, role-based access control (RBAC) explicitly limits who has access to manage the OneFS firewall. So ensure that the user account which will be used to enable and configure the OneFS firewall belongs to a role with the ‘ISI_PRIV_FIREWALL’ write privilege.

Configuring Firewall Rules

Once the desired policy is created, the next step is to configure the rules. Clearly, the first step here is decide what ports and services need securing or opening, beyond the defaults.

The following CLI syntax will return a list of all the firewall’s default services, plus their respective ports, protocols, and aliases, sorted by ascending port number:

# isi network firewall services list

Service Name     Port  Protocol  Aliases

---------------------------------------------

ftp-data         20    TCP       -

ftp              21    TCP       -

ssh              22    TCP       -

smtp             25    TCP       -

dns              53    TCP       domain

                       UDP

http             80    TCP       www

                                 www-http

kerberos         88    TCP       kerberos-sec

                       UDP

rpcbind          111   TCP       portmapper

                       UDP       sunrpc

                                 rpc.bind

ntp              123   UDP       -

dcerpc           135   TCP       epmap

                       UDP       loc-srv

netbios-ns       137   UDP       -

netbios-dgm      138   UDP       -

netbios-ssn      139   UDP       -

snmp             161   UDP       -

snmptrap         162   UDP       snmp-trap

mountd           300   TCP       nfsmountd

                       UDP

statd            302   TCP       nfsstatd

                       UDP

lockd            304   TCP       nfslockd

                       UDP

nfsrquotad       305   TCP       -

                       UDP

nfsmgmtd         306   TCP       -

                       UDP

ldap             389   TCP       -

                       UDP

https            443   TCP       -

smb              445   TCP       microsoft-ds

hdfs-datanode    585   TCP       -

asf-rmcp         623   TCP       -

                       UDP

ldaps            636   TCP       sldap

asf-secure-rmcp  664   TCP       -

                       UDP

ftps-data        989   TCP       -

ftps             990   TCP       -

nfs              2049  TCP       nfsd

                       UDP

tcp-2097         2097  TCP       -

tcp-2098         2098  TCP       -

tcp-3148         3148  TCP       -

tcp-3149         3149  TCP       -

tcp-3268         3268  TCP       -

tcp-3269         3269  TCP       -

tcp-5667         5667  TCP       -

tcp-5668         5668  TCP       -

isi_ph_rpcd      6557  TCP       -

isi_dm_d         7722  TCP       -

hdfs-namenode    8020  TCP       -

isi_webui        8080  TCP       apache2

webhdfs          8082  TCP       -

tcp-8083         8083  TCP       -

ambari-handshake 8440  TCP       -

ambari-heartbeat 8441  TCP       -

tcp-8443         8443  TCP       -

tcp-8470         8470  TCP       -

s3-http          9020  TCP       -

s3-https         9021  TCP       -

isi_esrs_d       9443  TCP       -

ndmp             10000 TCP       -

cee              12228 TCP       -

nfsrdma          20049 TCP       -

                       UDP

tcp-28080        28080 TCP       -

---------------------------------------------

Total: 55

Similarly, the following CLI command will generate a list of existing rules and their associated policies, sorted in alphabetical order. For example, to show the first 5 rules:

# isi network firewall rules list –-limit 5

ID                                            Index  Description                                                                             Action

----------------------------------------------------------------------------------------------------------------------------------------------------

default_pools_policy.rule_ambari_handshake    41     Firewall rule on ambari-handshake service                                               allow

default_pools_policy.rule_ambari_heartbeat    42     Firewall rule on ambari-heartbeat service                                               allow

default_pools_policy.rule_catalog_search_req  50     Firewall rule on service for global catalog search requests                             allow

default_pools_policy.rule_cee                 52     Firewall rule on cee service                                                            allow

default_pools_policy.rule_dcerpc_tcp          18     Firewall rule on dcerpc(TCP) service                                                    allow

----------------------------------------------------------------------------------------------------------------------------------------------------

Total: 5

Both the ‘isi network firewall rules list’ and ‘isi network firewall services list’ commands also have a ‘-v’ verbose option, plus can return their output in csv, list, table, or json formats with the ‘–flag’.

The detailed info for a given firewall rule, in this case the default SMB rule, can be viewed with the following CLI syntax:

# isi network firewall rules view default_pools_policy.rule_smb

          ID: default_pools_policy.rule_smb

        Name: rule_smb

       Index: 3

 Description: Firewall rule on smb service

    Protocol: TCP

   Dst Ports: smb

Src Networks: -

   Src Ports: -

      Action: allow

Existing rules can be modified and new rules created and added into an existing firewall policy with the ‘isi network firewall rules create’ CLI syntax. Command options include:

Option	Description
–action	Allow, which mean pass packets. Deny, which means silently drop packets. Reject which means reply with ICMP error code.
id	Specifies the ID of the new rule to create. The rule must be added to an existing policy. The ID can be up to 32 alphanumeric characters long and can include underscores or hyphens, but cannot include spaces or other punctuation. Specify the rule ID in the following format: <policy_name>.<rule_name> The rule name must be unique in the policy.
–index	the rule index in the pool. the valid value is between 1 and 99. the lower value has the higher priority. if not specified, automatically go to the next available index (before default rule 100).
–live	The live option must only be used when a user issues a command to create/modify/delete a rule in an active policy. Such changes will take effect immediately on all network subnets and pools associated with this policy. Using the live option on a rule in an inactive policy will be rejected, and an error message will be returned.
–protocol	Specify the protocol matched for the inbound packets. Available value are tcp,udp,icmp,all. if not configured, the default protocol all will be used.
–dst-ports	Specify the network ports/services provided in storage system which is identified by destination port(s). The protocol specified by –protocol will be applied on these destination ports.
–src-networks	Specify one or more IP addresses with corresponding netmasks that are to be allowed by this firewall policy. The correct format for this parameter is address/netmask, similar to “192.0.2.128/25”. Multiple address/netmask pairs should be separated with commas. Use the value 0.0.0.0/0 for “any”.
–src-ports	Specify the network ports/services provided in storage system which is identified by source port(s). The protocol specified by –protocol will be applied on these source ports.

Note that, unlike for firewall policies, there is no provision for cloning individual rules.

The following CLI syntax can be used to create new firewall rules. For example, to add ‘allow’ rules for the HTTP and SSH protocols, plus a ‘deny’ rule for port TCP 9876, into firewall policy fw_test1:

# isi network firewall rules create  fw_test1.rule_http  --index 1 --dst-ports http --src-networks 10.20.30.0/24,20.30.40.0/24 --action allow

# isi network firewall rules create  fw_test1.rule_ssh  --index 2 --dst-ports ssh --src-networks 10.20.30.0/24,20.30.40.0/16 --action allow

# isi network firewall rules create fw_test1.rule_tcp_9876 --index 3 --protocol tcp --dst-ports 9876  --src-networks 10.20.30.0/24,20.30.40.0/24 -- action deny

When a new rule is created in a policy, if the index value is not specified, it will automatically inherit the next available number in the series (ie. index=4 in this case).

# isi network firewall rules create fw_test1.rule_2049  --protocol udp -dst-ports 2049 --src-networks 30.1.0.0/16 -- action deny

For a more draconian approach, a ‘deny’ rule could be created using the match-everything ‘*’ wildcard for destination ports and a 0.0.0.0/0 network and mask, which would silently drop all traffic:

# isi network firewall rules create fw_test1.rule_1234  --index=100--dst-ports * --src-networks 0.0.0.0/0 --action deny

When modifying existing firewall rules, the following CLI syntax can be used, in this case to change the source network of an HTTP allow rule (index 1) in firewall policy fw_test1:

# isi network firewall rules modify fw_test1.rule_http --index 1  --protocol ip --dst-ports http --src-networks 10.1.0.0/16 -- action allow

Or to modify an SSH rule (index 2) in firewall policy fw_test1, changing the action from ‘allow’ to ‘deny’:

# isi network firewall rules modify fw_test1.rule_ssh --index 2 --protocol tcp --dst-ports ssh --src-networks 10.1.0.0/16,20.2.0.0/16 -- action deny

Also, to re-order the custom TCP 9876 rule form the earlier example from index 3 to index 7 in firewall policy fw_test1.

# isi network firewall rules modify fw_test1.rule_tcp_9876 --index 7

Note that all rules equal or behind index 7 will have their index values incremented by one.

When deleting a rule from a firewall policy, any rule reordering is handled automatically. If the policy has been applied to a network pool, the ‘–live’ option can be used to force the change to take effect immediately. For example, to delete the HTTP rule from the firewall policy ‘fw_test1’:

# isi network firewall policies delete fw_test1.rule_http --live

Firewall rules can also be created, modified and deleted within a policy from the WebUI by navigating to Cluster management > Firewall Configuration > Firewall Policies. For example, to create a rule that permits SupportAssist and Secure Gateway traffic on the 10.219.0.0/16 network:

Once saved, the new rule is then displayed in the Firewall Configuration page:

Firewall management and monitoring.

In the next and final article in this series, we’ll turn our attention to managing, monitoring, and troubleshooting the OneFS firewall (step 5).

OneFS Firewall Configuration – Part 1

The new firewall in OneFS 9.5 enhances the security of the cluster and helps prevent unauthorized access to the storage system. When enabled, the default firewall configuration allows remote systems access to a specific set of default services for data, management, and inter-cluster interfaces (network pools).

The basic OneFS firewall provisioning process is as follows:

Note that role-based access control (RBAC) explicitly limits who has access to manage the OneFS firewall. In addition to the ubiquitous ‘root’, the cluster’s built-in SystemAdmin role has write privileges to configure and administer the firewall.

Upgrade to OneFS 9.5

First, the cluster must be running OneFS 9.5 in order to provision the firewall.

If upgrading from an earlier release, the OneFS 9.5 upgrade must be committed before enabling the firewall.

Also, be aware that configuration and management of the firewall in OneFS 9.5 requires the new ISI_PRIV_FIREWALL administration privilege. This can be granted to a role with either read-only or read-write privileges.

# isi auth privilege | grep -i firewall

ISI_PRIV_FIREWALL                   Configure network firewall

This privilege can be granted to a role with either read-only or read-write permissions. By default, the built-in ‘SystemAdmin’ roles is granted write privileges to administer the firewall:

# isi auth roles view SystemAdmin | grep -A2 -i firewall

             ID: ISI_PRIV_FIREWALL

     Permission: w

Additionally, the built-in ‘AuditAdmin’ role has read permission to view the firewall configuration and logs, etc:

# isi auth roles view AuditAdmin | grep -A2 -i firewall

             ID: ISI_PRIV_FIREWALL

     Permission: r

Ensure that the user account which will be used to enable and configure the OneFS firewall belongs to a role with the ‘ISI_PRIV_FIREWALL’ write privilege.

Activate Firewall

As mentioned previously, the OneFS firewall can be either ‘enabled’ or ‘disabled’, with the latter as the default state. The following CLI syntax will display the firewall’s global status – in this case ‘disabled’ (the default):

# isi network firewall settings view

Enabled: False

Firewall activation can be easily performed from the CLI as follows:

# isi network firewall settings modify --enabled true

# isi network firewall settings view

Enabled: True

Or from the WebUI under Cluster management > Firewall Configuration > Settings:

Note that the firewall is automatically enabled when STIG Hardening applied to a cluster.

Pick policies

A cluster’s existing firewall policies can be easily viewed from the CLI with the following command:

# isi network firewall policies list

ID        Pools                    Subnets                   Rules
-----------------------------------------------------------------------------
fw_test1  groupnet0.subnet0.pool0  groupnet0.subnet1         test_rule1
-----------------------------------------------------------------------------
Total: 1

Or from the WebUI under Cluster management > Firewall Configuration > Firewall Policies:

The OneFS firewall offers four main strategies when it comes to selecting a firewall policy. These include:

Retaining the default policy
Reconfiguring the default policy
Cloning the default policy and reconfiguring
Creating a custom firewall policy

We’ll consider each of these strategies in order:

a. Retaining the default policy

In many cases, the default OneFS firewall policy value will provide acceptable protection for a security conscious organization. In these instances, once the OneFS firewall has been enabled on a cluster, no further configuration is required, and the cluster administrators can move on to the management and monitoring phase.

The firewall policy for all front-end cluster interfaces (network pool) is ‘default’. While the default policy can be modified, be aware that this default policy is global. As such, any change against it will impact all network pools using this default policy.

The following table describes the default firewall policies that are assigned to each interface:

Policy

Description

Default pools policy

Contains rules for the inbound default ports for TCP and UDP services in OneFS.

Default subnets policy

Contains rules for:

· DNS port 53

· Rule for ICMP

· Rule for ICMP6

These can be viewed from the CLI as follows:

# isi network firewall policies view default_pools_policy

            ID: default_pools_policy

          Name: default_pools_policy

   Description: Default Firewall Pools Policy

Default Action: deny

     Max Rules: 100

         Pools: groupnet0.subnet0.pool0, groupnet0.subnet0.testpool1, groupnet0.subnet0.testpool2, groupnet0.subnet0.testpool3, groupnet0.subnet0.testpool4, groupnet0.subnet0.poolcava

       Subnets: -

         Rules: rule_ldap_tcp, rule_ldap_udp, rule_reserved_for_hw_tcp, rule_reserved_for_hw_udp, rule_isi_SyncIQ, rule_catalog_search_req, rule_lwswift, rule_session_transfer, rule_s3, rule_nfs_tcp, rule_nfs_udp, rule_smb, rule_hdfs_datanode, rule_nfsrdma_tcp, rule_nfsrdma_udp, rule_ftp_data, rule_ftps_data, rule_ftp, rule_ssh, rule_smtp, rule_http, rule_kerberos_tcp, rule_kerberos_udp, rule_rpcbind_tcp, rule_rpcbind_udp, rule_ntp, rule_dcerpc_tcp, rule_dcerpc_udp, rule_netbios_ns, rule_netbios_dgm, rule_netbios_ssn, rule_snmp, rule_snmptrap, rule_mountd_tcp, rule_mountd_udp, rule_statd_tcp, rule_statd_udp, rule_lockd_tcp, rule_lockd_udp, rule_nfsrquotad_tcp, rule_nfsrquotad_udp, rule_nfsmgmtd_tcp, rule_nfsmgmtd_udp, rule_https, rule_ldaps, rule_ftps, rule_hdfs_namenode, rule_isi_webui, rule_webhdfs, rule_ambari_handshake, rule_ambari_heartbeat, rule_isi_esrs_d, rule_ndmp, rule_isi_ph_rpcd, rule_cee, rule_icmp, rule_icmp6, rule_isi_dm_d




# isi network firewall policies view default_subnets_policy

            ID: default_subnets_policy

          Name: default_subnets_policy

   Description: Default Firewall Subnets Policy

Default Action: deny

     Max Rules: 100

         Pools: -

       Subnets: groupnet0.subnet0

         Rules: rule_subnets_dns_tcp, rule_subnets_dns_udp, rule_icmp, rule_icmp6

Or from the WebUI under Cluster Management > Firewall Configuration > Firewall Policies:

b. Reconfiguring the default policy

Depending on an organization’s threat levels or security mandates, there may be a need to restrict access to certain additional IP addresses and/or management service protocols.

If the default policy is deemed insufficient, reconfiguring the default firewall policy can be a good option if only a small number of rule changes are required. The specifics of creating, modifying, and deleting individual firewall rules is covered later in this article (step 3 below).

Note that if new rule changes behave unexpectedly, or configurating the firewall generally goes awry, OneFS does provide a ‘get out of jail free’ card. In a pinch, the global firewall policy can be quickly and easily restored to its default values. This can be achieved with the following CLI syntax:

# isi network firewall reset-global-policy

This command will reset the global firewall policies to the original system defaults. Are you sure you want to continue? (yes/[no]):

Alternatively, the default policy can also be easily reverted from the WebUI too, by clicking the ‘Reset default policies’ button:

c. Cloning the default policy and reconfiguring

Another option is cloning, which can be useful when batch modification or a large number of changes to the current policy are required. By cloning the default firewall policy, an exact copy of the existing policy and its rules is generated, but with a new policy name. For example:

# isi network firewall policies clone default_pools_policy clone_default_pools_policy

# isi network firewall policies list | grep -i clone

clone_default_pools_policy -

Cloning can also be initiated from the WebUI under Firewall Configuration > Firewall Policies > More Actions > Clone Policy:

Enter the desired name of the clone in the ‘Policy Name’ field in the pop-up window and click ‘Save’:

Once cloned, the policy can then be easily reconfigured to suit. For example, to modify the policy ‘fw_test1’ and change its default-action from deny-all to allow-all:

# isi network firewall policies modify fw_test1 --default--action allow-all

When modifying a firewall policy, the ‘–live’ option CLI option can be used to force it take effect immediately. Note that the ‘—live’ option is only valid when issuing a command to modify or delete an active custom policy and to modify default policy. Such changes will take effect immediately on all network subnets and pools associated with this policy. Using the live option on an inactive policy will be rejected, and an error message returned.

Options for creating or modifying a firewall policy include:

Option	Description
–default-action	Automatically add one rule to ‘deny all’ or ‘allow all’ to the bottom of the rule set for this created policy (Index = 100).
—max-rule-num	By default, each policy when created could have maximum 100 rules (include one default rule), so user could config maximum 99 rules. User could expand the maximum rule number to a specified value. Currently this value is limited to 200 (and user could config maximum 199 rules).
–add-subnets	Specify the network subnet(s) to add to policy, separated by a comma.
–remove-subnets	Specify the networks subnets to remove from policy and fall back to global policy.
–add-pools	Specify the network pool(s) to add to policy, separated by a comma.
–remove-pools	Specify the networks pools to remove from policy and fall back to global policy.

When modifying firewall policies, OneFS prints the following warning to verify the changes and help avoid the risk of a self-induced denial-of-service:

# isi network firewall policies modify --pools groupnet0.subnet0.pool0 fw_test1

Changing the Firewall Policy associated with a subnet or pool may change the networks and/or services allowed to connect to OneFS. Please confirm you have selected the correct Firewall Policy and Subnets/Pools. Are you sure you want to continue? (yes/[no]): yes

Once again, having the following CLI command handy, plus console access to the cluster is always a prudent move:

# isi network firewall reset-global-policy

So adding network pools or subnets to a firewall policy will cause the previous policy to be removed from them. Similarly, adding network pools or subnets to the global default policy will revert any custom policy configuration they might have. For example, to apply the firewall policy fw_test1 to IP Pool groupnet0.subnet0.pool0 and groupnet0.subnet0.pool1:

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall

      Firewall Policy: default_pools_policy

# isi network firewall policies modify fw_test1 --add-pools groupnet0.subnet0.pool0, groupnet0.subnet0.pool1

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall

      Firewall Policy: fw_test1

Or to apply the firewall policy fw_test1 to IP Pool groupnet0.subnet0.pool0 and groupnet0.subnet0:

# isi network firewall policies modify fw_test1 --apply-subnet groupnet0.subnet0.pool0, groupnet0.subnet0

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall

 Firewall Policy: fw_test1

# isi network subnets view groupnet0.subnet0 | grep -i firewall

 Firewall Policy: fw_test1

To reapply global policy at any time, either add the pools to the default policy:

# isi network firewall policies modify default_pools_policy --add-pools groupnet0.subnet0.pool0, groupnet0.subnet0.pool1

# isi network pools view groupnet0.subnet0.pool0 | grep -i firewall

 Firewall Policy: default_subnets_policy

# isi network subnets view groupnet0.subnet1 | grep -i firewall

 Firewall Policy: default_subnets_policy

Or remove the pool from the custom policy:

# isi network firewall policies modify fw_test1 --remove-pools groupnet0.subnet0.pool0 groupnet0.subnet0.pool1

Firewall policies can also be managed on the desired network pool in the OneFS WebUI by navigating to Cluster configuration > Network configuration > External network > Edit pool details. For example:

Be aware that cloning is also not limited to the default policy, as clones can be made of any custom policies too. For example:

# isi network firewall policies clone clone_default_pools_policy fw_test1

d. Creating a custom firewall policy

Alternatively, a custom firewall policy can also be created from scratch. This can be accomplished from the CLI using the following syntax, in this case to create a firewall policy named ‘fw_test1’:

# isi network firewall policies create fw_test1 --default-action deny

# isi network firewall policies view fw_test1

            ID: fw_test1

          Name: fw_test1

   Description:

Default Action: deny

     Max Rules: 100

         Pools: -

       Subnets: -

         Rules: -

Note that if a ‘default-action’ is not specified in the CLI command syntax, it will automatically default to deny.

Firewall policies can also be configured via the OneFS WebUI by navigating to Cluster management > Firewall Configuration > Firewall Policies > Create Policy:

However, in contrast to the CLI, if a ‘default-action’ is not specified when creating a policy in the WebUI, it will automatically default to ‘Allow’ instead, since the drop-down list works alphabetically.

If and when a firewall policy is no longer required, it can be swiftly and easily removed. For example, the following CLI syntax will delete the firewall policy ‘fw_test1’, clearing out any rules within this policy container:

# isi network firewall policies delete fw_test1

Are you sure you want to delete firewall policy fw_test1? (yes/[no]): yes

Note that the default global policies cannot be deleted.

# isi network firewall policies delete default_subnets_policy

Are you sure you want to delete firewall policy default_subnets_policy? (yes/[no]): yes

Firewall policy: Cannot delete default policy default_subnets_policy.

Configuring Firewall Rules

In the next article in this series, we’ll turn our attention to configuring the OneFS firewall rule(s) (step 4).

OneFS Firewall

Among the array of security features introduced in OneFS 9.5 is a new host-based firewall. This firewall allows cluster administrators to configure policies and rules on a PowerScale cluster in order to meet the network and application management needs and security mandates of an organization.

The OneFS firewall protects the cluster’s external, or front-end, network and operates as a packet filter for inbound traffic. It is available upon installation or upgrade to OneFS 9.5, but is disabled by default in both cases. However, the OneFS STIG hardening profile automatically enables the firewall and the default policies, in addition to manual activation.

The firewall generally manages IP packet filtering in accordance with the OneFS Security Configuration Guide, especially in regards to the network port usage. Packet control is governed by firewall policies, which are comprised of one or more individual rules.

Item	Description	Match	Action
Firewall Policy	Each policy is a set of firewall rules.	Rules are matched by index in ascending order	Each policy has a default action.
Firewall Rule	Each rule specifies what kinds of network packets should be matched by Firewall engine and what action should be taken upon them.	Matching criteria includes protocol, source ports, destination ports, source network address)	Options are ‘allow’, ‘deny’ or ‘reject’.

A security best practice is to enable the OneFS firewall using the default policies, with any adjustments as required. The recommended configuration process is as follows:

Step	Details
1. Access	Ensure that the cluster uses a default SSH or HTTP port before enabling. The default firewall policies block all nondefault ports until you change the policies.
2. Enable	Enable the OneFS firewall.
3. Compare	Compare your cluster network port configurations against the default ports listed in Network port usage.
4. Configure	Edit the default firewall policies to accommodate any non-standard ports in use in the cluster. NOTE: The firewall policies do not automatically update when port configurations are changed.
5. Constrain	Limit access to the OneFS Web UI to specific administrator terminals

Under the hood, the OneFS firewall is built upon the ubiquitous ‘ipfirewall’, or ‘ipfw’, which is FreeBSD’s native stateful firewall, packet filter and traffic accounting facility.

Firewall configuration and management is via the CLI, or platform API, or WebUI and OneFS 9.5 introduces a new Firewall Configuration page to support this. Note that the firewall is only available once a cluster is already running OneFS 9.5 and the feature has been manually enabled, activating the isi_firewall_d service. The firewall’s configuration is split between gconfig, which handles the settings and policies, and the ipfw table, which stores the rules themselves.

The firewall gracefully handles any SmartConnect dynamic IP movement between nodes since firewall policies are applied per network pool. Additionally, being network pool based allows the firewall to support OneFS multi-tenant access zones and shared/multitenancy models.

The individual firewall rules, which are essentially simplified wrappers around ipfw rules, work by matching packets via the 5-tuples that uniquely identify an IPv4 UDP or TCP session:

Source IP address
Source port
Destination IP address
Destination port
Transport protocol

The rules are then organized within a firewall policy, which can be applied to one or more network pools.

Note that each pool can only have a single firewall policy applied to it. If there is no custom firewall policy configured for a network pool, it automatically uses the global default firewall policy.

When enabled, the OneFS firewall function is cluster wide, and all inbound packets from external interfaces will go through either the custom policy or default global policy before reaching the protocol handling pathways. Packets passed to the firewall are compared against each of the rules in the policy, in rule-number order. Multiple rules with the same number are permitted, in which case they are processed in order of insertion. When a match is found, the action corresponding to that matching rule is performed. A packet is checked against the active ruleset in multiple places in the protocol stack, and the basic flow is as follows:

Get the logical interface for incoming packets
Find all network pools assigned to this interface
Compare these network pools one by one with destination IP address to find the matching pool (either custom firewall policy, or default global policy).
Compare each rule with service (protocol & destination ports) & source IP address in this pool from in order of lowest index value. If matched, perform actions according to the associated rule.
If no rule matches, go to the final rule (deny all or allow all) which is specified upon policy creation.

The OneFS firewall automatically reserves 20,000 rules in the ipfw table for its custom and default policies and rules. By default, each policy can gave a maximum of 100 rules, including one default rule. This translates to an effective maximum of 99 user-defined rules per policy, because the default rule is reserved and cannot be modified. As such, a maximum of 198 policies can be applied to pools or subnets since the default-pools-policy and default-subnets-policy are reserved and cannot be deleted.

Additional firewall bounds and limits to keep in mind include:

Name	Value	Description
MAX_INTERFACES	500	Maximum number of Layer 2 interfaces per node (including Ethernet, VLAN, LAGG interfaces).
MAX _SUBNETS	100	Maximum number of subnets within a OneFS cluster
MAX_POOLS	100	Maximum number of network pools within a OneFS cluster
DEFAULT_MAX_RULES	100	Default value of maximum rules within a firewall policy
MAX_RULES	200	Upper limit of maximum rules within a firewall policy
MAX_ACTIVE_RULES	5000	Upper limit of total active rules across the whole cluster
MAX_INACTIVE_POLICIES	200	Maximum number of policies which are not applied to any network subnet or pool. They will not be written into ipfw table.

The firewall default global policy is ready to use out of box and, unless a custom policy has been explicitly configured, all network pools use this global policy. Custom policies can be configured by either cloning and modifying an existing policy or creating one from scratch.

Component	Description
Custom policy	A user-defined container with a set of rules. A policy can be applied to multiple network pools, but a network pool can only apply one policy.
Firewall rule	An ipfw-like rule which can be used to restrict remote access. Each rule has an index which is valid within the policy. Index values range from 1 to 99, with lower numbers having higher priority. Source networks are described by IP and netmask, and services can be expressed either by port number (ie. 80) or service name (ie. http,ssh,smb). The ‘*‘ wildcard can also be used to denote all services. Supported actions include ‘allow’, ‘drop’ and ‘reject’.
Default policy	A global policy to manage all default services, used for maintaining OneFS minimum running and management. While ‘Deny any‘ is the default action of the policy, the defined service rules have a default action to ‘allow all remote access’. All packets not matching any of the rules are automatically dropped. Two default policies: · default-pools-policy · default-subnets-policy Note that these two default policies cannot be deleted, but individual rule modification is permitted in each.
Default services	The firewall’s default pre-defined services include the usual suspects, such as: DNS, FTP, HDFS, HTTP, HTTPS, ICMP, NDMP, NFS, NTP, S3, SMB, SNMP, SSH, etc. A full listing is available via the ‘isi network firewall services list’ CLI command output.

For a given network pool, either the global policy or a custom policy is assigned and takes effect. Additionally, all configuration changes to either policy type are managed by gconfig and are persistent across cluster reboots.

In the next article in this series we’ll take a look at the configuration and management of the OneFS firewall.

OneFS Snapshot Security

In this era of elevated cyber-crime and data security threats, there is increasing demand for immutable, tamper-proof snapshots. Often this need arises as part of a broader security mandate, ideally proactively, but oftentimes as a response to a security incident. OneFS addresses this requirement in the following ways:

On-cluster

Off-cluster

· Read-only snapshots

· Snapshot locks

· Role-based administration

· SyncIQ snapshot replication

· Cyber-vaulting

Read-only snapshots

At its core, OneFS SnapshotIQ generates read-only, point-in-time, space efficient copies of a defined subset of a cluster’s data.

Only the changed blocks of a file are stored when updating OneFS snapshots, ensuring efficient storage utilization. They are also highly scalable and typically take less than a second to create, while generating little performance overhead. As such, the RPO (recovery point objective) and RTO (recovery time objective) of a OneFS snapshot can be very small and highly flexible, with the use of rich policies and schedules.

OneFS Snapshots are created manually, via a scheduled, or automatically generated by OneFS to facilitate system operations. But whatever the generation method, once a snapshot has been taken, its contents cannot be manually altered.

Snapshot Locks

In addition to snapshot contents immutability, for an enhanced level of tamper-proofing, SnapshotIQ also provides the ability to lock snapshots with the ‘isi snapshot locks’ CLI syntax. This prevents snapshots from being accidentally or unintentionally deleted.

For example, a manual snapshot, ‘snaploc1’ is taken of /ifs/test:

# isi snapshot snapshots create /ifs/test --name snaploc1

# isi snapshot snapshots list | grep snaploc1

79188 snaploc1                                     /ifs/test

A lock is then placed on it (in this case lock ID=1):

# isi snapshot locks create snaplock1

# isi snapshot locks list snaploc1

ID

----

1

----

Total: 1

Attempts to delete the snapshot fails because the lock prevents its removal:

# isi snapshot snapshots delete snaploc1

Are you sure? (yes/[no]): yes

Snapshot "snaploc1" can't be deleted because it is locked

The CLI command ‘isi snapshot locks delete <lock_ID>’ can be used to clear existing snapshot locks, if desired. For example, to remove the only lock (ID=1) from snapshot ‘snaploc1’:

# isi snapshot locks list snaploc1

ID

----

1

----

Total: 1

# isi snapshot locks delete snaploc1 1

Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes

# isi snap locks view snaploc1 1

No such lock

Once the lock is removed, the snapshot can then be deleted:

# isi snapshot snapshots delete snaploc1

Are you sure? (yes/[no]): yes

# isi snapshot snapshots list| grep -i snaploc1 | wc -l

       0

Note that a snapshot can have up to a maximum of sixteen locks on it at any time. Also, lock numbers are continually incremented and not recycled upon deletion.

Like snapshot expiry, snapshot locks can also have an expiry time configured. For example, to set a lock on snapshot ‘snaploc1’ that expires at 1am on April 1^st April, 2024:

# isi snap lock create snaploc1 --expires '2024-04-01T01:00:00'

# isi snap lock list snaploc1

ID

----

36

----

Total: 1

# isi snap lock view snaploc1 33

     ID: 36

Comment:

Expires: 2024-04-01T01:00:00

  Count: 1

Note that if the duration period of a particular snapshot lock expires but others remain, OneFS will not delete that snapshot until all the locks on it have been deleted or expired.

The following table provides an example snapshot expiration schedule, with monthly locked snapshots to prevent deletion:

Snapshot Frequency	Snapshot Time	Snapshot Expiration	Max Retained Snapshots
Every other hour	Start at 12:00AM End at 11:59AM	1 day	27
Every day	At 12:00AM	1 week
Every week	Saturday at 12:00AM	1 month
Every month	First Saturday of month at 12:00AM	Locked

3. Roles-based Access Control

Read-only snapshots plus locks present physically secure snapshots on a cluster. However, if you are able to login to the cluster and have the required elevated administrator privileges to do so, you can still remove locks and/or delete snapshots.

Since data security threats come from inside an environment as well as out, such as from a disgruntled IT employee or other internal bad actor, another key to a robust security profile is to constrain the use of all-powerful ‘root’, ‘administrator’, and ‘sudo’ accounts as much as possible. Instead, of granting cluster admins full rights, a preferred security best practice is to leverage the comprehensive authentication, authorization, and accounting framework that OneFS natively provides.

OneFS role-based access control (RBAC) can be used to explicitly limit who has access to manage and delete snapshots. This granular control allows administrative roles to be crafted which can create and manage snapshot schedules, but prevent their unlocking and/or deletion. Similarly, lock removal and snapshot deletion can be isolated to a specific security role (or to root only).

A cluster security administrator selects the desired multi-tenant access zone, creates a zone-aware role within it, assigns privileges, and then assigns members.

For example, from the WebUI under Access > Membership and roles > Roles:

When these members login to the cluster via a configuration interface (WebUI, Platform API, or CLI) they inherit their assigned privileges.

The specific privileges that can be used to segment OneFS snapshot management include:

Privilege	Description
ISI_PRIV_SNAPSHOT_ALIAS	Aliasing for snapshots
ISI_PRIV_SNAPSHOT_LOCKS	Locking of snapshots from deletion
ISI_PRIV_SNAPSHOT_PENDING	Upcoming snapshot based on schedules
ISI_PRIV_SNAPSHOT_RESTORE	Restoring directory to a particular snapshot
ISI_PRIV_SNAPSHOT_SCHEDULES	Scheduling for periodic snapshots
ISI_PRIV_SNAPSHOT_SETTING	Service and access settings
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT	Manual snapshots and locks
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY	Snapshot summary and usage details

Each privilege can be assigned one of four permission levels for a role, including:

Permission Indicator	Description
–	No permission.
R	Read-only permission.
X	Execute permission.
W	Write permission.

The ability for a user to delete a snapshot is governed by the ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ privilege. Similarly, the ‘ISI_PRIV_SNAPSHOT_LOCKS’ governs lock creation and removal.

In the following example, the ‘snap’ role has ‘read’ rights for the ‘ISI_PRIV_SNAPSHOT_LOCKS’ privilege, allowing a user associated with this role to view snapshot locks:

# isi auth roles view snap | grep -I -A 1 locks

             ID: ISI_PRIV_SNAPSHOT_LOCKS

     Permission: r

--

# isi snapshot locks list snaploc1

ID

----

1

----

Total: 1

However, attempts to remove the lock ‘ID 1’ from the ‘snaploc1’ snapshot fail without write privileges:

# isi snapshot locks delete snaploc1 1

Privilege check failed. The following write privilege is required: Snapshot locks (ISI_PRIV_SNAPSHOT_LOCKS)

Write privileges are added to ‘ISI_PRIV_SNAPSHOT_LOCKS’ in the ‘’snaploc1’ role:

# isi auth roles modify snap –-add-priv-write ISI_PRIV_SNAPSHOT_LOCKS

# isi auth roles view snap | grep -I -A 1 locks

             ID: ISI_PRIV_SNAPSHOT_LOCKS

     Permission: w

--

This allows the lock ‘ID 1’ to be successfully deleted from the ‘snaploc1’ snapshot:

# isi snapshot locks delete snaploc1 1

Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes

# isi snap locks view snaploc1 1

No such lock

Using OneFS RBAC, an enhanced security approach for a site could be to create three OneFS roles on a cluster, each with an increasing realm of trust:

a. First, an IT ops/helpdesk role with ‘read’ access to the snapshot attributes would permit monitoring and troubleshooting, but no changes:

Snapshot Privilege	Permission
ISI_PRIV_SNAPSHOT_ALIAS	Read
ISI_PRIV_SNAPSHOT_LOCKS	Read
ISI_PRIV_SNAPSHOT_PENDING	Read
ISI_PRIV_SNAPSHOT_RESTORE	Read
ISI_PRIV_SNAPSHOT_SCHEDULES	Read
ISI_PRIV_SNAPSHOT_SETTING	Read
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT	Read
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY	Read

b. Next, a cluster admin role, with ‘read’ privileges for ‘ISI_PRIV_SNAPSHOT_LOCKS’ and ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ would prevent snapshot and lock deletion, but provide ‘write’ access for schedule configuration, restores, etc..

Snapshot Privilege	Permission
ISI_PRIV_SNAPSHOT_ALIAS	Write
ISI_PRIV_SNAPSHOT_LOCKS	Read
ISI_PRIV_SNAPSHOT_PENDING	Write
ISI_PRIV_SNAPSHOT_RESTORE	Write
ISI_PRIV_SNAPSHOT_SCHEDULES	Write
ISI_PRIV_SNAPSHOT_SETTING	Write
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT	Read
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY	Write

c. Finally, a cluster security admin role (root equivalence) would provide full snapshot configuration and management, lock control, and deletion rights:

Snapshot Privilege	Permission
ISI_PRIV_SNAPSHOT_ALIAS	Write
ISI_PRIV_SNAPSHOT_LOCKS	Write
ISI_PRIV_SNAPSHOT_PENDING	Write
ISI_PRIV_SNAPSHOT_RESTORE	Write
ISI_PRIV_SNAPSHOT_SCHEDULES	Write
ISI_PRIV_SNAPSHOT_SETTING	Write
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT	Write
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY	Write

Note that when configuring OneFS RBAC, remember to remove the ‘ISI_PRIV_AUTH’ and ‘ISI_PRIV_ROLE’ privilege from all but the most trusted administrators.

While this article focuses exclusively on OneFS snapshots, the expanded use of RBAC granular privileges for enhanced security is germane to most key areas of cluster management and data protection, such as SyncIQ replication, etc.

Snapshot replication

In addition to utilizing snapshots for its own checkpointing system, SyncIQ, the OneFS data replication engine, supports snapshot replication to a target cluster.

OneFS SyncIQ replication policies contain an option for triggering a replication policy when a snapshot of the source directory is completed. Additionally, at the onset of a new policy configuration, when the “Whenever a Snapshot of the Source Directory is Taken” option is selected, a checkbox appears to enable any existing snapshots in the source directory to be replicated. More information is available in this SyncIQ paper.

Cyber-vaulting

File data is arguably the most difficult to protect, because:

It is the only type of data where potentially all employees have a direct connection to the storage (with the other type of storage it’s via an application)
File data is linked (or mounted) to the operating system of the client. This means that it’s sufficient to gain file access to the OS to get access to potentially critical data.
Users are the largest breach points that happen.

The Cyber Security Framework (CSF) from the National Institute of Standards and Technology (NIST) categorizes the threat through recovery process:

Within the ‘Protect’ phase, there are two core aspects:

Applying all the core protection features available on the OneFS platform, namely:

Feature	Description
Access control	Where the core data protection functions are being executed. Assess who actually needs write access.
Immutability	Having immutable snapshots, replica versions, etc. Augmenting backup strategy with an archiving strategy with SmartLock WORM.
Encryption	Encrypting both data in-flight and data at rest.
Anti-virus	Integrating with anti-virus/anti-malware protection that does content inspection.
Security advisories	Dell Security Advisories (DSA) inform about fixes to common vulnerabilities and exposures.

Data isolation provides a last resort copy of business critical data, and can be achieved by using an air gap to isolate the cyber vault copy of the data. The vault copy is logically separated from the production copy of the data. Data syncing happens only intermittently by closing the airgap after ensuring there are no known issues.

The combination of OneFS snapshots and SyncIQ replication allows for granular data recovery. This means that only the affected files are recovered, while the most recent changes are preserved for the unaffected data. While an on-prem air-gapped cyber vault can still provide secure network isolation, in the event of an attack, the ability to failover to a fully operational ‘clean slate’ remote site provides additional security and peace of mind.

We’ll explore PowerScale cyber protection and recovery in more depth in a future article.

OneFS SmartQoS Monitoring and Troubleshooting

The previous articles in this series have covered the SmartQoS architecture, configuration, and management. Now, we’ll turn out attention to monitoring and troubleshooting.

The ‘isi statistics workload’ CLI command can be used to monitor the dataset’s performance. The ‘Ops’ column displays the current protocol operations per second. In the following example, OPs stabilize around 9.8, which is just below the configured limit value of 10 Ops.

# isi statistics workload --dataset ds1 & data

Similarly, this next example from the SmartQoS WebUI shows a small NFS workflow performing 497 protocol OPS in a pinned workload with a limit of 500 OPS:

Multiple paths and protocols can be pinned by selecting ‘Pin Workload’ option for a given Dataset. Here, four directory path workloads are each configured with different Protocol OPs limits:

When it comes to troubleshooting SmartQoS, there are a few areas that are worth checking right away, including the SmartQoS Ops limit configuration, isi_pp_d and isi_stats_d daemons, and the protocol service(s).

For suspected Ops limit configuration issues, first confirm that the SmartQoS limits feature is enabled:

# isi performance settings view
Top N Collections: 1024
Time In Queue Threshold (ms): 10.0
Target read latency in microseconds: 12000.0
Target write latency in microseconds: 12000.0
Protocol Ops Limit Enabled: Yes

Next, verify that the workload level protocols_ops limit is correctly configured:

# isi performance workloads view <workload>

Check whether any errors are reported in the isi_tardis_d configuration log:

# cat /var/log/isi_tardis_d.log

To investigating isi_pp_d, first check that the service is enabled

# isi services –a isi_pp_d

Service 'isi_pp_d' is enabled.

If necessary, the isi_pp_d service can be restarted as follows:

# isi services isi_pp_d disable

Service 'isi_pp_d' is disabled.

# isi services isi_pp_d enable

Service 'isi_pp_d' is enabled.

There’s also an isi_pp_d debug tool, which can be helpful in a pinch:

# isi_pp_d -h

Usage: isi_pp_d [-ldhs]

-l Run as a leader process; otherwise, run as a follower. Only one leader process on the cluster will be active.

-d Run in debug mode (do not daemonize).

-s Display pp_leader node (devid and lnn)

-h Display this help.

Debugging can be enabled on the isi_pp_d log file with the following command syntax:

# isi_ilog -a isi_pp_d -l debug, /var/log/isi_pp_d.log

For example, the following log snippet shows a typical isi_ppd_d.log message communication between isi_pp_d leader and isi_pp_d followers:

/ifs/.ifsvar/modules/pp/comm/SETTINGS

[090500b000000b80,08020000:0000bfddffffffff,09000100:ffbcff7cbb9779de,09000100:d8d2fee9ff9e3bfe,090001 00:0000000075f0dfdf]      

100,,,,20,1658854839  < in the format of <workload_id, cputime, disk_reads, disk_writes, protocol_ops, timestamp>

Here, the extract from the /var/log/isi_pp_d.log logfiles from nodes 1 and 2 of a cluster illustrate the different stages of Protocol Ops limit enforcement and usage:

To investigate the isi_stats_d, first confirm that the isi_pp_d service is enabled:

# isi services -a isi_stats_d
Service 'isi_stats_d' is enabled.

If necessary, the isi_stats_d service can be restarted as follows:

# isi services isi_stats_d disable

# isi services isi_stats_d enable

The workload level statistics can be viewed with the following command:

# isi statistics workload list --dataset=<name>

Debugging can be enabled and on the isi_stats_d log file with the following command syntax:

# isi_stats_tool --action set_tracelevel --value debug

# cat /var/log/isi_stats_d.log

To investigate protocol issues, the ‘isi services’ and ‘lwsm’ CLI commands can be useful. For example, to check the status of the S3 protocol:

# /usr/likewise/bin/lwsm list | grep -i protocol
hdfs                       [protocol]    stopped
lwswift                    [protocol]    running (lwswift: 8393)
nfs                        [protocol]    running (nfs: 8396)
s3                         [protocol]    stopped
srv                        [protocol]    running (lwio: 8096)

# /usr/likewise/bin/lwsm status s3
stopped

# /usr/likewise/bin/lwsm info s3
Service: s3
Description: S3 Server
Categories: protocol
Path: /usr/likewise/lib/lw-svcm/s3.so
Arguments:
Dependencies: lsass onefs_s3 AuditEnabled?flt_audit_s3
Container: s3

The above CLI output confirms that the S3 protocol is inactive. The S3 service can be started as follows:

# isi services -a | grep -i s3
s3                   S3 Service                               Enabled

Similarly, the S3 service can be restarted as follows:

# /usr/likewise/bin/lwsm restart s3
Stopping service: s3
Starting service: s3

To investigate further, the protocol’s log level verbosity can be increase. For example, to set the s3 log to ‘debug’:

# isi s3 log-level view
Current logging level is 'info'

# isi s3 log-level modify debug

# isi s3 log-level view
Current logging level is 'debug'

Next, view and monitor the appropriate protocol log. For example, for the S3 protocol:

# cat /var/log/s3.log

# tail -f /var/log/s3.log

Beyond the above, /var/log/messages can also be monitored for pertinent errors, since the main partition performance (PP) modules log to this file. Debug level logging can be enabled for the various PP modules as follows

Dataset:

# sysctl ilog.ifs.acct.raa.syslog=debug+ 
ilog.ifs.acct.raa.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

Workload:

# sysctl ilog.ifs.acct.rat.syslog=debug+
ilog.ifs.acct.rat.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

Actor work:

# sysctl ilog.ifs.acct.work.syslog=debug+
ilog.ifs.acct.work.syslog: error,warning,notice (inherited) -> error,warning,notice,info,debug

When finished, the default logging levels for the above modules can be restored as follows:

# sysctl ilog.ifs.acct.raa.syslog=notice+

# sysctl ilog.ifs.acct.rat.syslog=notice+

# sysctl ilog.ifs.acct.work.syslog=notice+

OneFS SmartQoS Configuration and Setup

In the previous article in this series, we looked at the underlying architecture and management of SmartQoS in OneFS 9.5. Next, we’ll step through an example SmartQoS configuration via the CLI and WebUI.

After an initial set up, configuring a SmartQoS protocol Ops limit comprises four fundamental steps. These are:

Step	Task	Description	Example
1	Identify Metrics of interest	Used for tracking, to enforce an Ops limit	Uses ‘path’ and ‘protocol’ for the metrics to identify the workload.
2	Create a Dataset	For tracking all of the chosen metric categories	Create the dataset ‘ds1’ with the metrics identified.
3	Pin a Workload	To specify exactly which values to track within the chosen metrics	path: /ifs/data/client_exports protocol: nfs3
4	Set a Limit	To limit Ops based on the dataset, metrics (categories), and metric values defined by the workload	Protocol_ops limit: 100

Step 1:

First, select a metric of interest. For this example we’ll use the following:

Protocol: NFSv3
Path: /ifs/test/expt_nfs

If not already present, create and verify an NFS export – in this case at /ifs/test/expt_nfs:

# isi nfs exports create /ifs/test/expt_nfs

# isi nfs exports list

ID Zone Paths Description

------------------------------------------------

1 System /ifs/test/expt_nfs

------------------------------------------------

Or from the WebUI, under Protocols UNIX sharing (NFS) > NFS exports:

Step 2:

The ‘dataset’ designation is used to categorize workload by various identification metrics including:

ID Metric	Details
Username	UID or SID
Primary groupname	Primary GID or GSID
Secondary groupname	Secondary GID or GSID
Zone name
IP address	Local or remote IP address or IP address range
Path	Except for S3 protocol
Share	SMB share or NFS export ID
Protocol	NFSv3, NFSv4, NFSoRDMA, SMB, or S3

SmartQoS in OneFS 9.5 only allows protocol OPs as the transient resources used for configuring a limit ceiling.

For example, the following CL I command can be used to create a dataset ‘ds1’, specifying protocol and path as the ID metrics:

# isi performance datasets create --name ds1 protocol path

Created new performance dataset 'ds1' with ID number 1.

Note: Resource usage tracking by ‘path’ metric is only supported by SMB and NFS.

The following command will display any configured datasets:

# isi performance datasets list

Or, from the WebUI by navigating to Cluster management > Smart QoS:

Step 3:

After the dataset has been created, a workload can be pinned to it by specifying the metric values. For example:

# isi performance workloads pin ds1 protocol:nfs3 path: /ifs/test/expt_nfs

Pinned performance dataset workload with ID number 100.

Or from the WebUI by browsing to Cluster management > Smart QoS > Pin workload:

After pinning a workload, the entry will show in the ‘Top Workloads’ section of the WebUI page. However, wait at least 30 seconds to start receiving updates.

To list all the pinned workloads from a specified dataset, use the following command:

# isi performance workloads list ds1

The prior command’s output indicates that there are currently no limits set for this workload.

By default, a protocol ops limit exists for each workload. However it is set to the maximum (the maximum value of a 64-bit unsigned integer). This is represented in the CLI output by a dash (“-“) if a limit has not been explicitly configured:

# isi performance workloads list ds1

ID   Name  Metric Values           Creation Time       Cluster Resource Impact  Client Impact  Limits

--------------------------------------------------------------------------------------

100  -     path:/ifs/test/expt_nfs 2023-02-02T12:06:05  -          -             -

           protocol:nfs3

--------------------------------------------------------------------------------------

Total: 1

Step 4:

For a pinned workload in dataset, a limit for the protocol ops limit can be configured from the CLI using the following syntax:

# isi performance workloads modify <dataset> <workload ID> --limits protocol_ops:<value>

When configuring SmartQoS, always be aware that it is a powerful performance throttling tool which can be applied to significant areas of a cluster’s data and userbase. For example, protocol OPs limits can be configured for metrics such as ‘path:/ifs’, which would affect the entire /ifs filesystem, or ‘zone_name:System’ which would limit the System multi-tenant access zone and all users within it. While such configurations are entirely valid, they would have a significant, system-wide impact. As such, caution should be exercised when configuring SmartQoS to avoid any inadvertent, unintended or unexpected performance constraints.

In the following example, the dataset is ‘ds1’, the workload ID is ‘100’, and the protocol OPs limit is set to value ‘10’:

# isi performance workloads modify ds1 100 --limits protocol_ops:10

protocol_ops: 18446744073709551615 -> 10

Or from the WebUI by browsing to Cluster management > Smart QoS > Pin and throttle workload:

The ‘isi performance workloads’ command can be used in ‘list’ mode to show details of the workload ‘ds1’. In this case, ‘Limits’ is set to protocol_ops = 10.

# isi performance workloads list test

ID   Name  Metric Values           Creation Time       Cluster Resource Impact  Client Impact  Limits

--------------------------------------------------------------------------------------

100  -     path:/ifs/test/expt_nfs 2023-02-02T12:06:05  -  -  protocol_ops:10

           protocol:nfs3

--------------------------------------------------------------------------------------

Total: 1

Or in ‘view’ mode:

# isi performance workloads view ds1 100

                     ID: 100

                   Name: -

          Metric Values: path:/ifs/test/expt_nfs, protocol:nfs3

          Creation Time: 2023-02-02T12:06:05

Cluster Resource Impact: -

          Client Impact: -

                 Limits: protocol_ops:10

Or from the WebUI by browsing to Cluster management > Smart QoS:

The limit value of a pinned workload can be easily modified with the following CLI syntax. For example, to set the limit to 100 OPs:

# isi performance workloads modify ds1 100 --limits protocol_ops:100

Or from the WebUI by browsing to Cluster management > Smart QoS > Edit throttle:

Similarly, the following CLI command can be used to easily remove a protocol ops limit for a pinned workload:

# isi performance workloads modify ds1 100 --no-protocol-ops-limit

Or from the WebUI by browsing to Cluster management > Smart QoS > Remove throttle:

OneFS SmartQoS Architecture and Management

The SmartQoS Protocol Ops limits architecture, introduced in OneFS 9.5, involves three primary capabilities:

Resource tracking
Resource limit distribution
Throttling

Under the hood, the OneFS protocol heads (NFS, SMB and S3) identify and track how many protocol operations are being processed through a specific export or share. The existing partitioned performance (PP) reporting infrastructure is leveraged for cluster wide resource usage collection, limit calculation and distribution, along with new OneFS 9.5 functionality to support pinned workload protocol Ops limits.

The protocol scheduling module (LwSched) has an inbuilt throttling capability that allows the execution of individual operations to be delayed by temporarily pausing them, or ‘sleeping’. Additionally, in OneFS 9.5, the partitioned performance kernel modules have also been enhanced to calculate ‘sleep time’ based on operation count resource information (requested, average usage etc.) – both within the current throttling window, and for a specific workload.

The fundamental SmartQoS workflow can be characterized as follows:

Configuration via CLI, pAPI, or WebUI.
Statistics gatherer obtains Op/s data from the partitioned performance (PP) kernel.
Stats gatherer communicates Op/s data to PP leader service.
Leader queries config manager for per-cluster rate limit.
Leader calculates per-node limit.
PP follower service is notified of per-node Op/s limit.
Kernel is informed of new per-node limit.
Work is scheduled with rate-limited resource.
Kernel returns sleep time, if needed.

When an admin configures a per-cluster protocol Ops limit, the statistics gathering service, isi_stats_d, begins collecting workload resource information every 30 seconds by default from the partitioned performance (PP) kernel on each node in the cluster and notifies the isi_pp_d leader service of this resource info. Next, the leader gets the per-cluster protocol Ops limit plus additional resource consumption metrics from the isi_acct_cpp service via isi_tardis_d, the OneFS cluster configuration service and calculates the protocol Ops limit of each node for the next throttling window. It then instructs the isi_pp_d follower service on each node to update the kernel with the newly calculated protocol Ops limit, plus a request to reset throttling window.

Upon receipt of a scheduling request for a work item from the protocol scheduler (LwSched), the kernel calculates the required ‘sleep time’ value, based on the current node protocol Ops limit and resource usage in the current throttling window. If insufficient resources are available, the thread for work item execution thread is put to sleep for a specific interval returned from PP kernel. If resources are available, or the thread is reactivated from sleeping, it executes the work item and reports the resource usages statistics back to PP, releasing any scheduling resources it may own.

SmartQoS can be configured through either the CLI, platform API, or WebUI, and OneFS 9.5 introduces a new SmartQoS WebUI page to support this. Note that SmartQoS is only available once an upgrade to OneFS 9.5 has been committed, and any attempt to configure or run the feature prior to upgrade commit will fail with the following message:

# isi performance workloads modify DS1 -w WS1 --limits protocol_ops:50000

 Setting of protocol ops limits not available until upgrade has been committed

Once a cluster is running OneFS 9.5 and the release is committed, the SmartQoS feature is enabled by default. This, and the current configuration, can be confirmed using the following CLI command:

 # isi performance settings view

                   Top N Collections: 1024

        Time In Queue Threshold (ms): 10.0

 Target read latency in microseconds: 12000.0

Target write latency in microseconds: 12000.0

          Protocol Ops Limit Enabled: Yes

In OneFS 9.5, the ‘isi performance settings modify’ CLI command now includes a ‘protocol-ops-limit-enabled’ parameter to allow the feature to be easily disabled (or re-enabled) across the cluster. For example:

# isi performance settings modify --protocol-ops-limit-enabled false

protocol_ops_limit_enabled: True -> False

Similarly, the ‘isi performance settings view’ CLI command has been extended to report the protocol OPs limit state:

# isi performance settings view *

Top N Collections: 1024

Protocol Ops Limit Enabled: Yes

In order to set a protocol OPs limit on workload from the CLI, the ‘isi performance workload pin’ and ‘isi performance workload modify’ commands now accept an optional ‘–limits’ parameter. For example, to create a pinned workload with the ‘protocol_ops’ limit set to 10000:

# isi performance workload pin test protocol:nfs3 --limits

protocol_ops:10000

Similarly, to modify an existing workload’s ‘protocol_ops’ limit to 20000:

# isi performance workload modify test 101 --limits protocol_ops:20000

protocol_ops: 10000 -> 20000

When configuring SmartQoS, always be cognizant of the fact that it is a powerful throttling tool which can be applied to significant areas of a cluster’s data and userbase. For example, protocol OPs limits can be configured for metrics such as ‘path:/ifs’, which would affect the entire /ifs filesystem, or ‘zone_name:System’ which would limit the System multi-tenant access zone and all users within it.

While such configurations are entirely valid, they would have a significant, system-wide impact. As such, caution should be exercised when configuring SmartQoS to avoid any inadvertent, unintended or unexpected performance constraints.

To clear a protocol Ops limit on workload, the ‘isi performance workload’ modify CLI command has been extended to accept an optional ‘–noprotocol-ops-limit’ argument. For example:

# isi performance workload modify test 101 --no-protocol-ops-limit

protocol_ops: 20000 -> 18446744073709551615

Note that the value of ‘18446744073709551615’ in the command output above represents ‘NO_LIMIT’ set.

A workload’s protocol Ops limit can be viewed using the ‘isi performance workload list’ and ‘isi performance workload view’ CLI commands, which have been modified in OneFS 9.5 to display the limits appropriately. For example:

# isi performance workload list test

ID Name Metric Values Creation Time Impact Limits

---------------------------------------------------------------------

101 - protocol:nfs3 2023-02-02T22:35:02 - protocol_ops:20000

---------------------------------------------------------------------



# isi performance workload view test 101

ID: 101

Name: -

Metric Values: protocol:nfs3

Creation Time: 2023-02-02T22:35:02

Impact: -

Limits: protocol_ops:20000

In the next article in this series, we’ll step through an example SmartQoS configuration and verification from both the CLI and WebUI.

OneFS SmartQoS

Built atop the partitioned performance (PP) resource monitoring framework, OneFS 9.5 introduces a new SmartQoS performance management feature. SmartQoS allows a cluster administrator to set limits on the maximum number of protocol operations per second (Protocol Ops) that individual pinned workloads can consume, in order to achieve desired business workload prioritization. Among the benefits of this new QoS functionality are:

Enabling IT infrastructure teams to achieve performance SLAs.
Allowing throttling of rogue or low priority workloads and hence prioritization of other business critical workloads.
Helping minimize data unavailability events due to overloaded clusters.

This new SmartQoS feature in OneFS 9.5 supports the NFS, SMB and S3 protocols, including mixed traffic to the same workload.

But first, a quick refresher. The partitioned performance resource monitoring framework, which initially debuted in OneFS 8.0.1, enables OneFS to track and report the use of transient system resources (resources that only exist at a given instant), providing insight into who is consuming what resources, and how much of them. Examples include CPU time, network bandwidth, IOPS, disk accesses, and cache hits, etc.

OneFS partitioned performance is an ongoing project which, in OneFS 9.5 now provides control as well as insights. This allows control of work flowing through the system, prioritization and protection of mission critical workflows, and the ability to detect if a cluster is at capacity.

Since identification of work is highly subjective, OneFS partitioned performance resource monitoring provides significant configuration flexibility, allowing cluster admins to craft exactly how they wish to define, track, and manage workloads. For example, an administrator might want to partition their work based on criterial like which user is accessing the cluster, the export/share they are using, which IP address they’re coming from – and often a combination of all three.

OneFS has always provided client and protocol statistics, however they were typically front-end only. Similarly, OneFS provides CPU, cache and disk statistics, but they did not display who was consuming them. Partitioned performance unites these two realms, tracking the usage of the CPU, drives and caches, and spanning the initiator/participant barrier.

OneFS collects the resources consumed, grouped into distinct workloads, and the aggregation of these workloads comprise a performance dataset.

Item	Description	Example
Workload	A set of identification metrics and resources used	{username:nick, zone_name:System} consumed {cpu:1.5s, bytes_in:100K, bytes_out:50M, …}
Performance Dataset	The set of identification metrics to aggregate workloads by The list of workloads collected matching that specification	{usernames, zone_names}
Filter	A method for including only workloads that match specific identification metrics.	Filter{zone_name:System} · {username:nick, zone_name:System} · {username:jane, zone_name:System} · {username:nick, zone_name:Perf}

The following metrics are tracked by partitioned performance resource monitoring:

Category	Items
Identification Metrics	· Username / UID / SID · Primary Groupname / GID / GSID · Secondary Groupname / GID / GSID · Multi-tenant access zone Name · Local/Remote IP Address/Range · Path · Share / Export ID · Protocol · System Name · Job Type
Transient Resources	· CPU Usage · Bytes In/Out – Net traffic minus TCP headers · IOPs – Protocol OPs · Disk Reads – Blocks read from disk · Disk Writes – Block written to the journal, including protection · L2 Hits – Blocks read from L2 cache · L3 Hits – Blocks read from L3 cache · Latency – Sum of time taken from start to finish of OP o ReadLatency o WriteLatency o OtherLatency
Performance Statistics	· Read/Write/Other Latency
Supported Protocols	· NFS · SMB · S3 · Jobs · Background Services

Be aware that, in OneFS 9.5, SmartQoS currently does not support the following Partitioned Performance criteria:

Unsupported Group	Unsupported Items
Metrics	• System Name • Job Type
Workloads	• Top workloads (as they are dynamically and automatically generated by kernel) • Workloads belonging to the ‘system’ dataset
Protocols	• Jobs • Background services

When pinning a workload to a dataset, note that the more metrics there are in that dataset, the more parameters need to be defined when pinning to it. For example:

Dataset = zone_name, protocol, username

To set a limit on this dataset, you’d need to pin the workload by also specifying the multi-tenant access zone name, protocol, and username.

When using the remote_address and/or local_address metrics, you can also specify a subnet. For example: 10.123.456.0/24

With the exception of the system dataset, performance datasets must be configured before statistics are collected.

For SmartQoS in OneFS 9.5, limits can be defined and configured as a maximum number of protocol operations (Protocol Ops) per second across the following protocols:

NFSv3
NFSv4
NFSoRDMA
SMB
S3

A Protocol Ops limit can be applied to up to 4 custom datasets. All pinned workloads within a dataset can have a limit configured, up to a maximum of 1024 workloads per dataset. If multiple workloads happen to share a common metric value with overlapping limits, the lowest limit that is configured would be enforced

Note that, on upgrading to OneFS 9.5, SmartQoS is activated only once the new release has been successfully committed.

In the next article in this series, we’ll take a deeper look at SmartQoS’ underlying architecture and workflow.

OneFS SmartPools Transfer Limits Configuration and Management

In the first article in this series, we looked at the architecture and considerations of the new OneFS 9.5’s SmartPools Transfer Limits. Now, we turn our attention to the configuration and management of this feature.

From the control plane side, OneFS 9.5 contains several WebUI and CLI enhancements to reflect the new SmartPools Transfer Limits functionality. Probably the most obvious change is in the ‘local storage usage status’ histogram, where tiers and their child nodepools have been aggregated, for a more logical grouping. Also blue limit-lines have been added above each of the storagepools, and a red warning status displayed for any pools that have exceeded the transfer limit.

Similarly, the storage pools status page now includes transfer limit details, with the 90% limit displayed for any storagepools using the default setting.

From the CLI, the ‘isi storagepool nodepools view’ command reports the transfer limit status and percentage for a pool. The used SSD and HDD bytes percentages, in the command output indicate where the pool utilization is relative to the transfer limit.

# isi storagepool nodepools view h5600_200tb_6.4tb-ssd_256gb
ID: 42
Name: h5600_200tb_6.4tb-ssd_256gb
Nodes: 77, 78, 79, 80, 81, 82, 83, 84
Node Type IDs: 10
Protection Policy: +2d:1n
Manual: No
L3 Enabled: Yes
L3 Migration Status: l3
Tier: -
Transfer Limit: 90%
Transfer Limit State: default
Usage
Avail Bytes: 1.13P
Avail SSD Bytes: 0.00
Avail HDD Bytes: 1.13P
Balanced: No
Free Bytes: 1.18P
Free SSD Bytes: 0.00
Free HDD Bytes: 1.18P
Total Bytes: 1.41P
Total SSD Bytes: 0.00
Total HDD Bytes: 1.41P
Used Bytes: 229.91T (17%)
Used SSD Bytes: 0.00 (0%)
Used HDD Bytes: 229.91T (17%)
Virtual Hot Spare Bytes: 56.94T

The storage transfer limit can be easily configured from the CLI as for either a specific pool, as a default, or disabled, using the new –transfer-limit and –default-transfer-limit flags.

The following CLI command can be used to set the transfer limit for a specific storagepool:

# isi storagepool nodepools/tier modify --transfer-limit={0-100, default, disabled}

For example, to set a limit of 80% on an A200 nodepool:

# isi storagepool a200_30tb_1.6tb-ssd_96gb modify --transfer-limit=80

Or to set the default limit of 90% on tier ‘perf1’:

# isi storagepool perf1 --transfer-limit=default

Note that setting the transfer limit of a tier automatically applies to all its child nodepools, regardless of any prior child limit configurations.

The global ‘isi storage settings view’ CLI command output shows the default transfer limit, which is 90%, but can be configured between 0 to 100% if desired.

# isi storagepool settings view

     Automatically Manage Protection: files_at_default

Automatically Manage Io Optimization: files_at_default

Protect Directories One Level Higher: Yes

       Global Namespace Acceleration: disabled

       Virtual Hot Spare Deny Writes: Yes

        Virtual Hot Spare Hide Spare: Yes

      Virtual Hot Spare Limit Drives: 2

     Virtual Hot Spare Limit Percent: 0

             Global Spillover Target: anywhere

                   Spillover Enabled: Yes

              Default Transfer Limit: 90%

        SSD L3 Cache Default Enabled: Yes

                     SSD Qab Mirrors: one

            SSD System Btree Mirrors: one

            SSD System Delta Mirrors: one

This default limit can be reconfigured from the CLI with the following syntax::

# isi storagepool settings modify --default-transfer-limit={0-100, disabled}

For example, to set a new default transfer limit of 85%:

# isi storagepool settings modify --default-transfer-limit=85

And the same changes can be made from the SmartPools WebUI, too, by navigating to Storage pools > SmartPools settings:

Once a SmartPools job has completed in OneFS 9.5, the job report contains a new field that reports any ‘files not moved due to transfer limit exceeded’.

# isi job reports view 1056

...

...

Policy/testpolicy/Access changes skipped 0

Policy/testpolicy/ADS containers matched 'head’ 0

Policy/testpolicy/ADS containers matched 'snapshot’ 0

Policy/testpolicy/ADS streams matched 'head’ 0

Policy/testpolicy/ADS streams matched 'snapshot’ 0

Policy/testpolicy/Directories matched 'head’ 0

Policy/testpolicy/Directories matched 'snapshot’ 0

Policy/testpolicy/File creation templates matched 0

Policy/testpolicy/Files matched 'head’ 0

Policy/testpolicy/Files matched 'snapshot’ 0

Policy/testpolicy/Files not moved due to transfer limit exceeded 0 

Policy/testpolicy/Files packed 0

Policy/testpolicy/Files repacked 0

Policy/testpolicy/Files unpacked 0

Policy/testpolicy/Packing changes skipped 0

Policy/testpolicy/Protection changes skipped 0

Policy/testpolicy/Skipped files already in containers 0

Policy/testpolicy/Skipped packing non-regular files 0

Policy/testpolicy/Skipped packing regular files 0

Additionally, the ‘SYS STORAGEPOOL FILL LIMIT EXCEEDED’ alert is triggered when a storagepool’s usage has exceeded its transfer limit. Raised at the INFO level. Each hour, CELOG fires off a monitor helper script which will measure how full each storagepool is relative to its transfer limit. The usage is gathered by reading from the diskpool database, and the transfer limits are stored in gconfig. If a nodepool has a transfer limit of 50% and usage of 75%, the monitor helper will report a measurement of 150%, triggering an alert.

# isi event view 126

ID: 126

Started: 11/29 20:32

Causes Long: storagepool: vonefs_13gb_4.2gb-ssd_6gb:hdd usage: 33.4, transfer limit: 30.0

Lnn: 0

Devid: 0

Last Event: 2022-11-29T20:32:16

Ignore: No

Ignore Time: Never

Resolved: No

Resolve Time: Never

Ended: --

Events: 1

Severity: information

And from the WebUI:

And there you have it: Transfer Limits, and the first step in the evolution towards a smarter SmartPools.