OneFS Metadata Overview

OneFS uses two principal data structures to enable information about each object, or metadata, within the file system to be searched, managed and stored efficiently and reliably. These structures are:

  • Inodes
  • B-trees

OneFS uses inodes to store file attributes and pointers to file data locations on disk, and each file, directory, link, etc, is represented by an inode.

Within OneFS, inodes come in two sizes – either 512B or 8KB. The size that OneFS uses is determined primarily by the physical and logical block formatting of the drives in a disk pool.

All OneFS inodes have both static and dynamic sections.  The static section space is limited and valuable since it can be accessed in a single I/O, and does not require a distributed lock to access. It holds fixed-width, commonly used attributes like POSIX mode bits, owner, and size.

In contrast, the dynamic portion of an inode allows new attributes to be added, if necessary, without requiring an inode format update. This can be done by simply adding a new type value with code to serialize and de-serialize it. Dynamic attributes are stored in the stream-style type-length-value (TLV) format, and include protection policies, OneFS ACLs, embedded b-tree roots, domain membership info, etc.
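To illustrate the TLV idea, here's a minimal Python sketch of encoding and decoding such a stream. The type codes and field widths used here are invented purely for illustration and are not OneFS's actual on-disk format:

import struct

# Hypothetical attribute type codes -- purely illustrative, not OneFS's real values.
ATTR_PROTECTION_POLICY = 1
ATTR_ACL = 2

def tlv_encode(records):
    """Serialize (type, value-bytes) pairs as a type-length-value stream."""
    out = bytearray()
    for attr_type, value in records:
        out += struct.pack("<HI", attr_type, len(value))  # 2-byte type, 4-byte length
        out += value
    return bytes(out)

def tlv_decode(blob):
    """Walk a TLV stream, yielding (type, value-bytes) pairs."""
    offset = 0
    while offset < len(blob):
        attr_type, length = struct.unpack_from("<HI", blob, offset)
        offset += 6
        yield attr_type, blob[offset:offset + length]
        offset += length

# A new attribute type can be added later without changing the stream format itself.
stream = tlv_encode([(ATTR_PROTECTION_POLICY, b"+2d:1n"), (ATTR_ACL, b"...acl blob...")])
for attr_type, value in tlv_decode(stream):
    print(attr_type, value)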

If necessary, OneFS can also use extension blocks, which are 8KB blocks, to store any attributes that cannot fully fit into the inode itself. Additionally, OneFS data services such as SnapshotIQ also commonly leverage inode extension blocks.

Inodes are dynamically created and stored in locations across all the cluster’s drives, and OneFS uses b-trees (actually B+ trees) for their indexing and rapid retrieval. The general structure of a OneFS b-tree includes a top-level block, known as the ‘root’. B-tree blocks that reference other b-tree blocks are referred to as ‘inner blocks’, and the blocks at the bottom of the tree are called ‘leaf blocks’.

Only the leaf blocks actually contain metadata, whereas the root and inner blocks provide a balanced index of addresses allowing rapid identification of and access to the leaf blocks and their metadata.
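As a purely conceptual illustration of this layout (not OneFS’s actual on-disk structures), the following Python sketch shows a lookup descending from a root block, through inner blocks, down to a leaf block – the only block type that holds the actual records:

from bisect import bisect_right

# Simplified in-memory model: inner blocks hold separator keys and child pointers;
# leaf blocks hold the (key, value) records themselves.
class Leaf:
    def __init__(self, records):
        self.records = dict(records)          # only leaves store metadata records

class Inner:
    def __init__(self, keys, children):
        self.keys = keys                      # separator keys guiding the descent
        self.children = children              # child blocks (inner or leaf)

def lookup(block, key):
    """Descend from the root, through inner blocks, to the leaf that may hold the key."""
    while isinstance(block, Inner):
        block = block.children[bisect_right(block.keys, key)]
    return block.records.get(key)

# Tiny example tree: one root block indexing two leaf blocks.
root = Inner([100], [Leaf([(10, "inode@A"), (42, "inode@B")]),
                     Leaf([(100, "inode@C"), (250, "inode@D")])])
print(lookup(root, 42))    # -> inode@B
print(lookup(root, 250))   # -> inode@D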

A LIN, or logical inode, is accessed every time a file, directory, or b-tree is accessed.  The function of the LIN tree is to store the mapping between a unique LIN number and its inode mirror addresses.

The LIN is represented as a 64-bit number, displayed in hexadecimal.  Each file is assigned a single LIN and, since LINs are never reused, it is unique for the cluster’s lifespan.  For example, the file /ifs/data/test/f1 has the following LIN:

# isi get -D /ifs/data/test/f1 | grep LIN:

*  LIN:                1:2d29:4204

Similarly, its parent directory, /ifs/data/test, has:

# isi get -D /ifs/data/test | grep LIN:

*  LIN:                1:0353:bb59

*  LIN:                1:0009:0004

*  LIN:                1:2d29:4204

The LIN tree entry for the file above includes the mapping between its LIN and its three mirrored inode disk addresses.

# isi get -D /ifs/data/test/f1 | grep "inode"

* IFS inode: [ 92,14,524557565440:512, 93,19,399535074304:512, 95,19,610321964032:512 ]

Taking the first of these inode addresses, 92,14,524557565440:512, the following can be inferred, reading from left to right:

  • It’s on node 92.
  • Stored on drive lnum 14.
  • At block address 524557565440.
  • And is a 512 byte inode.
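For illustration, the following small Python helper (a hypothetical function, not part of any OneFS toolset) simply splits the ‘node,drive,block:size’ notation shown above into its component fields:

def parse_inode_address(addr):
    """Split a 'node,drive,block:size' inode address string into its fields."""
    node, drive, block_and_size = addr.split(",")
    block, size = block_and_size.split(":")
    return {
        "node": int(node),            # cluster node number
        "drive_lnum": int(drive),     # logical drive number on that node
        "block_addr": int(block),     # block address on the drive
        "inode_size": int(size),      # inode size in bytes (512 or 8192)
    }

print(parse_inode_address("92,14,524557565440:512"))
# {'node': 92, 'drive_lnum': 14, 'block_addr': 524557565440, 'inode_size': 512}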

The file’s parent LIN can also be easily determined:

# isi get -D /ifs/data/test/f1 | grep -i "Parent Lin"

*  Parent Lin          1:0353:bb59

In addition to the LIN tree, OneFS also uses b-trees to support file and directory access, plus the management of several other data services. The three principal b-trees that OneFS employs are:

Category B+ Tree Name Description
Files Metatree or Inode Format Manager (IFM B-tree) •       Stores a mapping of Logical Block Number (LBN) to protection group.

•       Is responsible for storing the physical location of file blocks on disk.

Directories Directory Format Manager (DFM B-tree) •       Stores directory entries (file names and directories/sub-directories).

•       Includes the full /ifs namespace and everything under it.

System System B-tree (SBT) •       Standardized B+ tree implementation that stores records for OneFS internal use, typically related to a particular feature, including: Diskpool DB, IFS Domains, WORM, and Idmap.  Quota (QDB) and Snapshot Tracking Files (STF) are separate/unique B+ tree implementations.

OneFS also relies heavily on several other metadata structures, including:

  • Shadow Store – Dedupe/clone metadata structures including SINs
  • QDB – Quota Database structures
  • System B+ Tree Files
  • STF – Snapshot Tracking Files
  • WORM
  • IFM Indirect
  • Idmap
  • System Directories
  • Delta Blocks
  • Logstore Files

Both inodes and b-tree blocks are mirrored on disk.  Mirror-based protection is used exclusively for all OneFS metadata because it is simple and lightweight, thereby avoiding the additional processing of erasure coding.  Since metadata typically only consumes around 2% of the overall cluster’s capacity, the mirroring overhead for metadata is minimal.

The number of inode mirrors (minimum 2x up to 8x) is determined by the nodepool’s achieved protection policy and the metadata type. Below is a mapping of the default number of mirrors for all metadata types.

Protection Level Metadata Type Number of Mirrors
+1n File inode 2 inodes per file
+2d:1n File inode 3 inodes per file
+2n File inode 3 inodes per file
+3d:1n File inode 4 inodes per file
+3d:1n1d File inode 4 inodes per file
+3n File inode 4 inodes per file
+4d:1n File inode 5 inodes per file
+4d:2n File inode 5 inodes per file
+4n File inode 5 inodes per file
2x->8x File inode Same as protection level. I.e. 2x == 2 inode mirrors
+1n Directory inode 3 inodes per directory
+2d:1n Directory inode 4 inodes per directory
+2n Directory inode 4 inodes per directory
+3d:1n Directory inode 5 inodes per directory
+3d:1n1d Directory inode 5 inodes per directory
+3n Directory inode 5 inodes per directory
+4d:1n Directory inode 6 inodes per directory
+4d:2n Directory inode 6 inodes per directory
+4n Directory inode 6 inodes per directory
2x->8x Directory inode +1 protection level. I.e. 2x == 3 inode mirrors
All LIN root/master 8x
All LIN inner/leaf Variable – per-entry protection
All IFM/DFM b-tree Variable – per-entry protection
All Quota database b-tree (QDB) 8x
All System b-tree (SBT) Variable – per-entry protection
All Snapshot tracking files (STF) 8x

Note that, by default, directory inodes are mirrored at one level higher than the achieved protection policy, since directories are more critical and make up the OneFS single namespace.  The root of the LIN Tree is the most critical metadata type and is always mirrored at 8x.
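The defaults in the table above follow a simple pattern, which the following Python sketch captures for illustration (a simplification: OneFS determines mirror counts internally, and the directory behavior assumes the default ‘protect directories one level higher’ setting discussed later):

def default_inode_mirrors(protection, is_directory=False):
    """Default inode mirror count per the table above: FEC policies give the
    leading failure count plus one (e.g. '+2d:1n' -> 3), mirrored policies
    ('2x'..'8x') give the mirror count itself, and directory inodes get one
    extra mirror."""
    if protection.endswith("x"):            # mirrored data protection, e.g. '2x'
        mirrors = int(protection[:-1])
    else:                                   # FEC protection, e.g. '+2d:1n', '+3n'
        mirrors = int(protection[1]) + 1    # leading digit = failures tolerated
    return mirrors + 1 if is_directory else mirrors

print(default_inode_mirrors("+2d:1n"))                      # 3
print(default_inode_mirrors("+2d:1n", is_directory=True))   # 4
print(default_inode_mirrors("3x"))                          # 3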

The OneFS SSD strategy governs where metadata is placed, and how much of it resides on SSD versus HDD.  There are five SSD strategies, which can be configured via OneFS’ file pool policies:

SSD Strategy Description
L3 Cache The SSDs in a node pool are used as a read-only eviction cache for L2 cache.  Currently used data and metadata will fill the entire capacity of the SSD drives in this mode.  Note:  L3 mode does not guarantee all metadata will be on SSD, so this may not be the most performant mode for metadata-intensive workflows.
Metadata Read One metadata mirror is placed on SSD.  All other mirrors will be on HDD for hybrid and archive models.  This mode can boost read performance for metadata-intensive workflows.
Metadata Write All metadata mirrors are placed on SSD. This mode can boost both read and write performance when there is significant demand on metadata I/O.  Note:  It is important to understand the SSD capacity requirements needed to support metadata strategies; a metadata reporting script can assist with SSD metadata sizing activities.
Data Place data on SSD.  This is not a widely used strategy, as Hybrid and Archive nodes have limited SSD capacities, and metadata should take priority on SSD for best performance.
Avoid Avoid using SSD for a specific path.  This is not a widely used strategy but could be handy if you had archive workflows that did not require SSD and wanted to dedicate your SSD space for other more important paths/workflows.

Fundamentally, OneFS metadata placement is determined by the following attributes:

  • The model of the nodes in each node pool (F-series, H-series, A-series).
  • The current SSD strategy configured on the node pool, via the default file pool policy and any custom administrator-created file pool policies.
  • The cluster’s global storage pool settings.

The following CLI commands can be used to verify the current SSD strategy and metadata placement details on a cluster. For example, in order to check whether L3 Mode is enabled on a specific node pool:

# isi storagepool nodepool list

ID     Name                       Nodes  Node Type IDs  Protection Policy  Manual

----------------------------------------------------------------------------------

1      h500_30tb_3.2tb-ssd_128gb  1      1              +2d:1n             No

In the output above, there is a single H500 node pool reported with an ID of ‘1’. The details of this pool can be displayed as follows:

# isi storagepool nodepool view 1

                 ID: 1

               Name: h500_30tb_3.2tb-ssd_128gb

              Nodes: 1, 2, 3, 4, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40

      Node Type IDs: 1

  Protection Policy: +2d:1n

             Manual: No

         L3 Enabled: Yes

L3 Migration Status: l3

               Tier: -

              Usage

                Avail Bytes: 321.91T

            Avail SSD Bytes: 0.00

                   Balanced: No

                 Free Bytes: 329.77T

             Free SSD Bytes: 0.00

                Total Bytes: 643.13T

            Total SSD Bytes: 0.00

    Virtual Hot Spare Bytes: 7.86T

Note that if, as in this case, L3 is enabled on a node pool, any changes to this pool’s SSD strategy configuration via file pool policies, etc., will not be honored until L3 cache has been disabled and the SSDs have been reformatted for use as metadata mirrors.

The following CLI syntax can be used to check the cluster’s default file pool policy configuration:

# isi filepool default-policy view

          Set Requested Protection: default

               Data Access Pattern: concurrency

                  Enable Coalescer: Yes

                    Enable Packing: No

               Data Storage Target: anywhere

                 Data SSD Strategy: metadata

           Snapshot Storage Target: anywhere

             Snapshot SSD Strategy: metadata

                        Cloud Pool: -

         Cloud Compression Enabled: -

          Cloud Encryption Enabled: -

              Cloud Data Retention: -

Cloud Incremental Backup Retention: -

       Cloud Full Backup Retention: -

               Cloud Accessibility: -

                  Cloud Read Ahead: -

            Cloud Cache Expiration: -

         Cloud Writeback Frequency: -

      Cloud Archive Snapshot Files: -

                                ID: -

And to list all FilePool Policies configured on a cluster:

# isi filepool policies list

To view a specific FilePool Policy:

# isi filepool policies view <Policy Name>

OneFS also provides global storagepool configuration settings which control additional metadata placement. For example:

# isi storagepool settings view

     Automatically Manage Protection: files_at_default

Automatically Manage Io Optimization: files_at_default

Protect Directories One Level Higher: Yes

       Global Namespace Acceleration: disabled

       Virtual Hot Spare Deny Writes: Yes

        Virtual Hot Spare Hide Spare: Yes

      Virtual Hot Spare Limit Drives: 2

     Virtual Hot Spare Limit Percent: 0

             Global Spillover Target: anywhere

                   Spillover Enabled: Yes

        SSD L3 Cache Default Enabled: Yes

                     SSD Qab Mirrors: one

            SSD System Btree Mirrors: one

            SSD System Delta Mirrors: one

The CLI output below includes descriptions of the relevant metadata options available.

# isi storagepool settings modify -h | egrep -i options -A 30

Options:

    --automatically-manage-protection (all | files_at_default | none)

        Set whether SmartPools manages files' protection settings.

    --automatically-manage-io-optimization (all | files_at_default | none)

        Set whether SmartPools manages files' I/O optimization settings.

    --protect-directories-one-level-higher <boolean>

        Protect directories at one level higher.

    --global-namespace-acceleration-enabled <boolean>

        Global namespace acceleration enabled.

    --virtual-hot-spare-deny-writes <boolean>

        Virtual hot spare: deny new data writes.

    --virtual-hot-spare-hide-spare <boolean>

        Virtual hot spare: reduce amount of available space.

    --virtual-hot-spare-limit-drives <integer>

        Virtual hot spare: number of virtual drives.

    --virtual-hot-spare-limit-percent <integer>

        Virtual hot spare: percent of total storage.

    --spillover-target <str>

        Spillover target.

    --spillover-anywhere

        Set global spillover to anywhere.

    --spillover-enabled <boolean>

        Spill writes into pools within spillover_target as needed.

    --ssd-l3-cache-default-enabled <boolean>

        Default setting for enabling L3 on new Node Pools.

    --ssd-qab-mirrors (one | all)

        Controls number of mirrors of QAB blocks to place on SSDs.

    --ssd-system-btree-mirrors (one | all)

        Controls number of mirrors of system B-tree blocks to place on SSDs.

    --ssd-system-delta-mirrors (one | all)

        Controls number of mirrors of system delta blocks to place on SSDs.

OneFS defaults to protecting directories one level higher than the configured protection policy and retaining one mirror of system b-trees on SSD.  For optimal performance on hybrid platform nodes, the recommendation is to place all metadata mirrors on SSD, assuming the capacity is available.  Be aware, however, that the metadata SSD mirroring options only become active if L3 Mode is disabled.

Additionally, global namespace acceleration (GNA) is a legacy option that allows nodes without SSD to place their metadata on nodes with SSD.  All currently shipping PowerScale node models include at least one SSD drive.

 

OneFS Neighborhoods

Heterogeneous PowerScale clusters can be built with a wide variety of node styles and capacities, in order to meet the needs of a varied data set and a wide spectrum of workloads. Nodes are broken into several classes, or tiers, according to their functionality. These node styles encompass several hardware generations, and fall loosely into four main tiers.

OneFS neighborhoods add another level of resilience into the OneFS failure domain concept.

As we saw in the previous article, disk pools represent the smallest unit within the storage pools hierarchy. OneFS provisioning works on the premise of dividing similar nodes’ drives into sets, or disk pools, with each pool representing a separate failure domain. These are protected by default at +2d:1n (or the ability to withstand two disk failures or one entire node failure). In Gen6 chassis, disk pools are laid out across all five sleds in each node. For example, a node with three drives per sled will have the following disk pool configuration:

Node pools are groups of disk pools, spread across similar, or compatible, OneFS storage nodes. Multiple groups of different node types can work together in a single, heterogeneous cluster.

In OneFS, a failure domain is the portion of a dataset that can be negatively impacted by a specific component failure. A disk pool comprises a group of drives spread across multiple compatible nodes, and a node usually has drives in multiple disk pools which share the same node boundaries. Since each piece of data or metadata is fully contained within a single disk pool, OneFS considers the disk pool as its failure domain.

PowerScale chassis-based hybrid and archive nodes utilize sled protection, where each drive in a sled is automatically located in a different disk pool. This ensures that if a sled is removed, rather than a failure domain losing four drives, the affected failure domains each only lose one drive.

OneFS neighborhoods help organize and limit the width of a disk pool. A neighborhood contains all the disk pools within a certain set of nodes, so disk pool boundaries align with neighborhood boundaries. As such, a node will often have drives in multiple disk pools, but a node will only ever be in a single neighborhood. Fundamentally, neighborhoods, node pools, and tiers are all layers on top of disk pools, and node pools and tiers are used for organizing neighborhoods and disk pools.

So the primary function of neighborhoods is to improve OneFS reliability in general, and to guard against data unavailability. With the PowerScale all-flash F-series nodes, OneFS has an ideal neighborhood size of 20 nodes, and a maximum size of 39 nodes. On the addition of the 40th node, the nodes automatically divide, or split, into two neighborhoods of twenty nodes each.

Neighborhood F-series Nodes H-series and A-series Nodes
Smallest Size 3 4
Ideal Size 20 10
Maximum Size 39 19
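The split behavior can be approximated with the simplified model below (for illustration only; the actual OneFS provisioning logic is more involved): a pool remains a single neighborhood until twice the ideal size is reached, and thereafter gains another neighborhood at each further multiple of the ideal size.

# Ideal neighborhood sizes from the table above.
IDEAL = {"F-series": 20, "Gen6 (H/A-series)": 10}

def neighborhoods(node_count, ideal_size):
    """Approximate neighborhood layout: one neighborhood until 2x the ideal size
    is reached, then one more at each further multiple of the ideal size, with
    nodes distributed as evenly as possible."""
    count = max(1, node_count // ideal_size)
    base, extra = divmod(node_count, count)
    return [base + 1] * extra + [base] * (count - extra)

print(neighborhoods(39, IDEAL["F-series"]))          # [39]         -- still one neighborhood
print(neighborhoods(40, IDEAL["F-series"]))          # [20, 20]     -- split on the 40th node
print(neighborhoods(60, IDEAL["F-series"]))          # [20, 20, 20]
print(neighborhoods(20, IDEAL["Gen6 (H/A-series)"])) # [10, 10]     -- split on the 20th node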

In contrast, the Gen6 chassis-based platforms, such as the PowerScale H-series and A-series, have an ideal neighborhood size of 10 nodes, and an automatic split occurs on the addition of the 20th node, or 5th chassis. This smaller neighborhood size helps the Gen6 hardware protect against simultaneous node-pair journal failures and full chassis failures. With the Gen6 platform and partner node protection, partner nodes will be placed in different neighborhoods – and hence different failure domains – where possible. Partner node protection becomes possible once the cluster reaches five full chassis (20 nodes) when, after the first neighborhood split, OneFS places partner nodes in different neighborhoods:

Partner node protection increases reliability because the partner nodes reside in different failure domains: if both nodes go down, each of their failure domains only suffers the loss of a single node.

With chassis-level protection, when possible, each of the four nodes within a chassis will be placed in a separate neighborhood. Chassis protection becomes possible at 40 nodes, as the neighborhood split at 40 nodes enables every node in a chassis to be placed in a different neighborhood. As such, when a 38 node Gen6 cluster is expanded to 40 nodes, the two existing neighborhoods will be split into four 10-node neighborhoods:

Chassis-level protection ensures that if an entire chassis failed, each failure domain would only lose one node.

The distribution of nodes and drives in pools is governed by gconfig values, such as the ‘pool_ideal_size’ parameter which indicates the preferred number of nodes in a pool. For example:

# isi_gconfig smartpools | grep -i ideal

smartpools.diskpools.pool_ideal_size (int) = 20

The most common causes of a neighborhood split are:

  1. Nodes were added to the node pool and the neighborhood must be split to accommodate them, for example the nodepool went from 39 to 40 (20+20) or from 59 to 60 (20+20+20).
  2. Nodes were removed from a nodepool into a manual nodepool.
  3. Compatibility settings were changed, which made some existing nodes incompatible.

After a split, typically the Smartpools/SetProtectPlus and AutoBalance jobs run, restriping files so that the new disk pools are balanced.

The following CLI command can be used to identify the correlation between the cluster’s nodes and OneFS neighborhoods, or failure domains:

# sysctl efs.lin.lock.initiator.coordinator_weights

The command output reports the node composition of each neighborhood (failure_domain), as well as the active nodes (up_nodes) in each, and their relative weighting (weights).

With larger clusters, neighborhoods also help facilitate OneFS’ parallel cluster upgrade option. A parallel upgrade provides upgrade efficiency within node pools on larger clusters, allowing one node per neighborhood to be upgraded simultaneously until the cluster is complete. By doing this, the upgrade duration is dramatically reduced, while ensuring that end users still have full access to their data.

During a parallel upgrade, the upgrade framework selects one node from each neighborhood to run the upgrade job on simultaneously. So in this case, node 13 from neighborhood 1, node 2 from neighborhood 2, node 27 from neighborhood 3, and node 40 from neighborhood 4 will be upgraded at the same time. Since they are all in different neighborhoods, or failure domains, upgrading them simultaneously will not impact the currently running workload.  After the first pass completes, the upgrade framework selects another node from each neighborhood and upgrades them, and so on until the cluster is fully upgraded.

For example, consider a hundred node PowerScale H700 cluster. With an ideal layout, there would be 10 neighborhoods, each containing ten nodes. The equation for estimating a parallel upgrade’s completion time is as follows:

Estimated time = (per-node upgrade duration) × (maximum number of nodes per neighborhood)

Assuming an upgrade time of 20 minutes per node, this would be:

20 × 10 = 200 minutes

So the estimated duration of the hundred node parallel upgrade is 200 minutes, or just under 3 ½ hours. This is in contrast to a rolling upgrade, which would be an order of magnitude greater at 2000 minutes, or almost a day and a half.
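Here’s that estimate expressed as a small, illustrative Python helper, comparing the parallel approach with a one-node-at-a-time rolling upgrade:

import math

def parallel_upgrade_minutes(node_count, neighborhoods, per_node_minutes=20):
    """Estimated time = per-node upgrade duration x max nodes in any one neighborhood."""
    max_nodes_per_neighborhood = math.ceil(node_count / neighborhoods)
    return per_node_minutes * max_nodes_per_neighborhood

def rolling_upgrade_minutes(node_count, per_node_minutes=20):
    """A rolling upgrade touches one node at a time across the whole cluster."""
    return per_node_minutes * node_count

print(parallel_upgrade_minutes(100, 10))   # 200 minutes for the 100-node example above
print(rolling_upgrade_minutes(100))        # 2000 minutes for the same cluster, rolling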

OneFS Tiers and Pools

The term ‘tiering’ has been broadly used in data management jargon since the early days of hierarchical storage management (HSM) and information lifecycle management (ILM). Typically, a tier represents a different class of storage hardware. You could have a certain storage array with faster Fibre Channel drives for active data, and a slower array with large capacity SATA drives for older, infrequently accessed data. The same philosophy holds true with OneFS. However, since SmartPools terminology can prove a touch perplexing at times, this seemed like it would make for a useful blog topic.

The different hardware types within OneFS live within the same cluster as distinct groups of nodes – or ‘node pools’. As for ‘tiers’, these are actually an optional concept in OneFS. Tiers can be built from two or more node pools, allowing similar but ‘non-compatible’ nodes to coexist in the same container.

Within OneFS, storage pools (and the ‘isi storagepool’ command set) provide a series of abstracted layers for defining these subsets of hardware within a single cluster. This allows data to be optimally aligned with specific sets of nodes by creating data movement rules, or file pool policies. The hierarchy is as follows:

Disk pools are the smallest unit within the Storage Pools hierarchy, with each pool representing a separate failure domain. Each drive may only belong to one disk pool and data protection stripes or mirrors don’t extend across pools. Disk pools are managed by OneFS and are not user configurable.

Above this, node pools are groups of disk pools, spread across similar PowerScale storage nodes (compatibility classes). Multiple groups of different node types can work together in a single, heterogeneous cluster.

Each node pool only contains disk pools from the same type of storage nodes and a disk pool may belong to exactly one node pool. Today, a minimum of 3 nodes are required per node pool.

Once node pools have been created, they can be easily modified to adapt to changing requirements. Individual nodes can be reassigned from one node pool to another.  Node pool associations can also be discarded, releasing member nodes so they can be added to new or existing pools. Node pools can also be renamed at any time without changing any other settings in the Node Pool configuration.

Any new node added to a cluster is automatically allocated to a node pool, and its drives are then subdivided into disk pools without any additional configuration steps, inheriting the SmartPools configuration properties of that node pool. This means the configuration of disk pool data protection, layout, and cache settings only needs to be completed once per node pool, and can be done at the time the node pool is first created.

Automatic allocation is determined by the shared attributes of the new nodes with the closest matching node pool.  If the new node is not a close match to the nodes of any existing node pool, it will remain un-provisioned until the minimum node pool compatibility is met.

# isi storagepool health

SmartPools Health

Name                  Health  Type Prot   Members          Down          Smartfailed

--------------------- ------- ---- ------ ---------------- ------------- -------------

h400_30tb_1.6tb-      OK   

ssd_64gb

 h400_30tb_1.6tb-     OK    HDD  +2d:1n 37-38,40-41,43-5 Nodes:        Nodes:

ssd_64gb:47                               5:bay5,8,11,14,1 Drives:       Drives:

                                          7, 39,42:bay8,11

                                          ,14,17

a2000_200tb_800gb-    OK   

ssd_16gb

 a2000_200tb_800gb-   OK    HDD  +2d:1n 57-73:bay5,9,13, Nodes:        Nodes:

ssd_16gb:69                               17,21,           Drives:       Drives:

                                          56:bay5,13,17,21



OK = Ok, U = Too few nodes, M = Missing drives,

D = Some nodes or drives are down, S = Some nodes or drives are smartfailed,

R = Some nodes or drives need repair

When a new node pool is created and nodes are added, SmartPools associates those nodes with an ID. That ID is also used in file pool policies and file attributes to dictate file placement within a specific disk pool.

By default, a file which is not covered by a specific File Pool policy will go to the default node pool(s) identified during set up.  If no default is specified, SmartPools will write that data to the pool with the most available capacity.

Tiers are groups of node pools combined into a logical superset to optimize data storage, according to OneFS platform type:

For example, H Series node pools are often combined into a single tier, as above, in this case including H600, H500, and H400 hardware. Similarly, the archive tier combines A200 and A2000 node pools into a single, logical bucket.

This is a significant benefit because it allows customers who consistently purchase the highest capacity nodes available to consolidate a variety of node styles within a single group, or tier, and manage them as one logical group.

SmartPools users typically deploy between two and four tiers, and the maximum recommended number of tiers is five per cluster. The fastest tier usually comprises all-flash F-series nodes for the most performance-demanding portions of a workflow, while the lowest, capacity-optimized tier comprises A-series chassis with large SATA drives.

The following CLI command creates the ‘archive’ tier above and adds two node pools, A200 and A2000, to this tier:

# isi storagepool tiers create archive --children a2000_200tb_800gb-ssd_16gb --children a200_30tb_800gb-ssd_16gb

Additional node pools can be easily and transparently added to a tier. For example, to add the H400 pool above to the ‘archive’ tier:

# isi storagepool nodepools modify h400_30tb_1.6tb-ssd_64gb --tier archive

Or from the WebUI:

Once the appropriate node pools and tiers have been defined and configured, file pool policies can be crafted to govern where data is placed, protected, accessed, and how it moves among the node pools and Tiers. SmartPools file pool policies can be used to broadly control the four principal attributes of a file:

Attribute Description Options
Location Where a file resides ·         Tier

·         Node pool

I/O The file performance profile (I/O optimization setting) ·         Sequential

·         Concurrent

·         Random

·         SmartCache write caching

Protection The protection level of a file ·         Parity protection (+1n to +4n, +2d:1n, etc.)

·         Mirroring (2x – 8x)

SSD Strategy The SSD strategy for a file ·         Metadata-read

·         Metadata-write

·         Data & metadata

·         Avoid SSD

A file pool policy is configured based upon a file attribute the policy can match.  These attributes include File Name, Path, File Type, File Size, Modified Time, Create Time, Metadata Change Time, Access Time or User Attributes.

Once the desired attribute is selected in a file pool policy, action can be taken on the matching subset of files. For example, if the configured attribute is File Size, additional logic is available to dictate thresholds (all files bigger than… smaller than…). Next, actions are applied: move to node pool x, set to y protection level and lay out for z access setting.

Consider a common file pools use case: An organization wants its active data to reside on its hybrid nodes in Tier 1 (SAS + SSD), and any data not accessed for 6 months to be moved to the cost-optimized (SATA) archive Tier 2.

This can be easily achieved via a single SmartPools file pool policy, which can be configured to act either against a tier or nodepool. For example, from the WebUI by navigating to File System > Storage Pools > File Pool Policies:

Or from the CLI using the following syntax:

# isi filepool policies create "Six month archive" --description "Move all files older than 6 months to archive tier" --data-storage-target Archive1 --begin-filter --file-type=file --and --changed-time=6M --operator=gt --end-filter

The newly created file pool policy is applied when the next scheduled SmartPools job runs.

By default, the SmartPools job is scheduled to run once a day. However, the job can also be kicked off manually. For example, via the CLI:

# isi job jobs start SmartPools

Started job [55]

The running SmartPools job can be listed and queried as follows:

# isi job jobs list

ID   Type       State   Impact  Policy  Pri  Phase  Running Time

-----------------------------------------------------------------

55   SmartPools Running Low     LOW     6    1/2    -

-----------------------------------------------------------------

Total: 1
# isi job jobs view 55

               ID: 55

             Type: SmartPools

            State: Running

           Impact: Low

           Policy: LOW

              Pri: 6

            Phase: 1/2

       Start Time: 2022-03-01T22:15:22

     Running Time: 1m 51s

     Participants: 1, 2, 3

         Progress: Visited 495464 LINs (37 processed), and approx. 93 GB:  467292 files, 28172 directories; 0 errors

                   LIN Estimate based on LIN count of 2312 done on Feb 23 23:02:17 2022

                   LIN Based Estimate: N/A Remaining (>99% Complete)

                   Block Based Estimate: 11m 18s Remaining (14% Complete)




Waiting on job ID: -

      Description:

       Human Desc:

 

As can be seen above, the Job Engine ‘view’ output provides a LIN count-based progress report on the SmartPools job execution status.

OneFS CAVA Configuration and Management

In the previous article, we looked at an overview of CAVA on OneFS. Next, we’ll focus our attention on how to set it up. In a nutshell, the basic procedure for installing CAVA on a PowerScale cluster can be summarized as follows:

  • Configure CAVA servers in OneFS
  • Create an IP address pool
  • Establish a dedicated access zone for CAVA
  • Associate an Active Directory authentication provider with the access zone
  • Update the AV application’s user role.

There are also a few pre-requisites to address before starting the installation, and these include:

Pre-requisite Description
SMB service Ensure the OneFS SMB service is enabled, to allow AV applications to retrieve files from the cluster for scanning.
SmartConnect Service IP The SSIP should be configured at the subnet level. CAVA uses SmartConnect to balance scanning requests across all the nodes in the IP pool.
AV application and CEE Refer to the CEE installation and usage guide and the vendor’s AV application documentation.
Active Directory OneFS CAVA requires that both the cluster and the AV application reside in the same AD domain.
IP Addressing All connections from AV applications are served by a dedicated cluster IP pool, and these IP addresses are used to configure the IP ranges in that pool. The best practice is to use exclusive IP addresses that are only available to the AV applications.

 

  1. During CEE configuration, a domain user account is created for the Windows ‘EMC CAVA’ service. This account is used to access the hidden ‘CHECK$ ‘ SMB share in order to retrieve the files for scanning. In the following example, the user account is ‘LAB\cavausr’.
  2. Once the anti-virus servers have been installed and configured, their corresponding CAVA entries are created on the cluster. This can be done via the following CLI syntax:
# isi antivirus cava servers create --server-name=av1 --server-uri=10.1.2.3 --enabled=1

Or from the WebUI:

Multiple CAVA servers may be added in order to meet the desired server ratio for a particular PowerScale cluster. The recommended sizing formula is:

CAVA servers = number of cluster nodes / 4

Before performing the following steps, ensure the CAVA ‘Service Enabled’ configuration option is set to ‘No’.

# isi antivirus cava settings view

       Service Enabled: No

     Scan Access Zones: System

               IP Pool: -

         Report Expiry: 8 weeks, 4 days

          Scan Timeout: 1 minute

Cloudpool Scan Timeout: 1 minute

     Maximum Scan Size: 0.00kB

 

  3. Next, an IP pool is created for the anti-virus applications to connect to the cluster. This dedicated IP pool should only be used by the anti-virus applications. As such, the recommendation is to ensure the IP ranges in this pool are exclusive and only available for use by the CAVA servers.

Avoid mixing the IP ranges in this dedicated IP pool with those used for regular SMB client connections.

The antivirus traffic is load balanced by the SmartConnect zone in this IP pool. Since this is a dedicated IP pool for CAVA servers, all the AV scanning should be evenly distributed within the pool. This can be accomplished with the following CLI syntax:

# isi network pools create groupnet0.subnet0.pool1 --ranges=10.1.2.3-10.1.2.13 --sc-dns-zone "cava1.lab.onefs.com" --ifaces=1:ext-1

In this example, the IP pool is ‘groupnet0.subnet0.pool1’, with address range ‘10.1.2.3 – 10.1.2.13’, the SmartConnect Zone name is ‘cava1.lab.onefs.com’, and the assigned network interface is node 1’s ext-1. Ensure the appropriate DNS delegation is created.

  4. Once the IP pool is created, it can be associated with the CAVA configuration via the following CLI command:
# isi antivirus cava settings modify --ip-pool="groupnet0.subnet0.pool1"

This action will make the IP Pool unavailable to all other users except antivirus servers. Do you want to continue? (yes/[no]): yes

"

IP Pool groupnet0.subnet0.pool1 added to CAVA antivirus.

Note: The access zone of IP Pool groupnet0.subnet0.pool1 has been changed to AvVendor.

"

Or from the WebUI:

Be sure to create the DNS delegation for the zone name associated with this IP pool.

At this point, the IP pool is associated with the ‘AvVendor’ access zone, and the IP pool is exclusively available to the CAVA servers.

  5. Next, a dedicated access zone, ‘AvVendor’, associated with the IP pool, is automatically created when the CAVA service is enabled on the cluster. The CAVA service can be enabled via the following CLI command:
# isi antivirus cava settings modify --service-enabled=1

View the CAVA settings and verify that the ‘Service Enabled’ field is set to ‘Yes’:

# isi antivirus cava settings view

Service Enabled: Yes

Scan Access Zones: System

IP Pool: groupnet0.subnet0.pool1

Report Expiry: 8 weeks, 4 days

Scan Timeout: 1 minute

 Cloudpool Scan Timeout: 1 minute

Maximum Scan Size: 0.00kB

Confirm that the ‘AvVendor’ access zone has been successfully created:

# isi zone zones list

Name     Path
--------------

System /ifs

AvVendor /ifs

--------------

Total: 2
  6. If using Active Directory, OneFS CAVA requires the cluster and all the AV application servers to reside in the same AD domain.

The output of the following CLI command will display the cluster’s current authentication provider status:

# isi auth status

Evaluate which AD domain you wish to use for access. This domain should contain the account that will be used by the service on the CEE server to connect to the cluster.

If the cluster is not already joined to the desired AD domain, the following CLI syntax can be used to create an AD machine account for the cluster – in this example joining the ‘lab.onefs.com’ domain:

# isi auth ads create lab.onefs.com --user administrator

Note that a local user account can also be used in place of an AD account, if preferred.

  7. Next, the auth provider needs to be added to the ‘AvVendor’ access zone. This can be accomplished from either the WebUI or CLI. For example:
# isi zone zones modify AvVendor --add-auth-providers=lsa-activedirectory-provider:lab.onefs.com
  8. The AV software, running on a Windows server, accesses the cluster’s data via a hidden ‘CHECK$’ share. Add the AV software’s user account to the role holding the ‘ISI_PRIV_AV_VENDOR’ privilege in order to grant access to the CHECK$ share. For example, the following CLI command assigns the ‘LAB\cavausr’ user account to the ‘AvVendor’ role in the ‘AvVendor’ access zone:
# isi auth roles modify AvVendor --zone=AvVendor --add-user lab\\cavausr
  9. At this point, the configuration for the CAVA service on the cluster is complete. The following CLI syntax confirms that the ‘System Status’ is reported as ‘RUNNING’:
# isi antivirus cava status

System Status: RUNNING

Fault Message: -

CEE Version: 8.7.7.0

DTD Version: 2.3.0

AV Vendor: Symantec
  10. Some configuration is also required on the CAVA side. The existing CAVA documentation works well for other products, but with PowerScale there are some integration points which are not obvious.

The CAVA Windows service should be modified to use the AD domain or local account that was created/used in step 6 above. This user account must be added to the ‘Local Administrators’ group on the CEE server, in order to allow the CAVA process to scan the system process list and find the AV engine process:

Note that the CAVA service requires a restart after reconfiguring the log-in information.

Also ensure that inbound TCP port 12228 is available, in the case of a firewall or other packet-filtering device.

Note that, if using Microsoft Defender, the ‘Real Time Scan’ option should be set to ‘enabled’.

  11. Finally, the CAVA job can be scheduled to run periodically. In this case, the job ‘av1’ is configured to scan all of /ifs, including any CloudPools files, daily at 11am, and with a ‘medium’ impact policy:
# isi antivirus cava jobs create av1 -e Yes --schedule 'every day at 11:00' --impact MEDIUM --paths-to-include /ifs --enabled yes --scan-cloudpool-files yes

# isi antivirus cava jobs list

Name  Include Paths  Exclude Paths  Schedule                 Enabled

---------------------------------------------------------------------

av1   /ifs           -              every 1 days at 11:00 am Yes

---------------------------------------------------------------------

Total: 1

This can also be configured from the WebUI by navigating to Data protection > Antivirus > CAVA and clicking the ‘Add job’ button:

Additionally, CAVA antivirus filters can be managed per access zone for on-demand or protocol access using the ‘isi antivirus cava filters’ command, per below. Be aware that the ISI_PRIV_ANTIVIRUS privilege is required in order to manage CAVA filters.

# isi antivirus cava filters list

Zone   Enabled  Open-on-fail  Scan-profile  Scan Cloudpool Files

-----------------------------------------------------------------

System Yes      Yes           standard      No

zone1  Yes      Yes           standard      No

zone2  Yes      Yes           standard      No

zone3  Yes      Yes           standard      No

zone4  Yes      Yes           standard      No

-----------------------------------------------------------------

Total: 5




# isi antivirus cava filters view

Zone: System

Enabled: Yes

Open-on-fail: Yes

File Extensions: *

File Extension Action: include

Scan If No Extension: No

Exclude Paths: -

Scan-profile: standard

Scan-on-read: No

Scan-on-close: Yes

Scan-on-rename: Yes

Scan Cloudpool Files: No


Note that blocking access, repair, and quarantine are all deferred to the specific CAVA AV vendor’s application, which makes all decisions for these actions. This is not configurable in OneFS for CAVA.

OneFS and CAVA Antivirus Scanning

When it comes to antivirus protection, OneFS provides two options. The first, and legacy, solution uses ICAP (internet content adaptation protocol), which we featured in a previous blog article. The other, which debuted in OneFS 9.1, is the CAVA (common antivirus agent) solution. Typically providing better performance than ICAP, CAVA employs a Windows server running third-party AV software to identify and eliminate known viruses before they infect files on the system.

OneFS CAVA leverages the Dell Common Event Enabler – the same off-cluster agent that’s responsible for OneFS audit – and which provides compatibility with prominent AV software vendors. These currently include:

Product Latest Supported Version
Computer Associates eTrust 6.0
F-Secure ESS 12.12
Kaspersky Security 10 for Windows Server 10.1.2
McAfee VirusScan 8.8i Patch13
McAfee Endpoint Protection 10.7.0 Update July 2020
Microsoft SCEP 4.10.209.0
Microsoft Defender 4.18.2004
Sophos Endpoint Security Control 10.8
Symantec Protection Engine 8.0
Symantec Endpoint Protection 14.2
TrendMicro ServerProtect for Storage 6.00 Patch 1

The Common Event Enabler, or CEE, agent resides on an off-cluster server, and OneFS sends HTTP requests to it as clients trigger the AV scanning workflow. The antivirus application then retrieves the pertinent files from the cluster via a hidden SMB share. These files are then checked by the AV application, after which CEE returns the appropriate response to the cluster.

OneFS Antivirus provides three different CAVA scanning methods:

AV Scan Type Description
Individual File Single file scan, triggered via the CLI/PAPI. Typically provides increased performance and reduced cluster CPU and memory utilization compared to ICAP.
On-access Triggered by SMB file operations (i.e. read and close), depending on the scan profile:

·         Standard profile: Captures SMB close and rename operations and triggers a scan of the corresponding file.

·         Strict profile: Captures SMB read, close, and rename operations and triggers a scan of the corresponding file.

Policy Scheduled or manual directory tree-based scans, executed by the OneFS Job Engine.

An individual file scan can be initiated from the CLI or WebUI. The scanning target files are manually selected, and the lwavscand daemon immediately sends an HTTP scanning request to the CEE/CAVA agent, at which point OneFS places a lock on the file and the AV application retrieves it via the hidden CHECK$ SMB share. After the corresponding content has been downloaded by the CAVA server, the AV engine performs a virus detection scan, after which CEE sends the results back to the OneFS lwavscand daemon and the file lock is released. All the scanning attributes are recorded under /ifs/.ifsvar/modules/avscan/isi_avscan.db.

For on-access scanning, depending on which scan profile is selected, any SMB read or close requests are captured by the OneFS I/O manager, which passes the details to the AVscan filter. This filter can be configured per access zone, and filters by the following criteria (see the sketch after this list):

  • File extension to include
  • File extension to exclude
  • File path to exclude
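The following Python fragment sketches how such a filter decision might look conceptually. It is a hypothetical illustration, not the actual lwavscand logic, and the ‘scan if no extension’ toggle mirrors the filter setting shown later in this article:

import fnmatch

def should_scan(path, include_exts, exclude_exts, exclude_paths, scan_if_no_extension=False):
    """Hypothetical on-access filter decision: path exclusions first, then the
    extension exclude list, then the extension include list ('*' matches any)."""
    if any(path.startswith(prefix) for prefix in exclude_paths):
        return False
    name = path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if not ext:
        return scan_if_no_extension
    if any(fnmatch.fnmatch(ext, pat) for pat in exclude_exts):
        return False
    return any(fnmatch.fnmatch(ext, pat) for pat in include_exts)

print(should_scan("/ifs/data/docs/report.docx", ["*"], ["tmp"], ["/ifs/data/scratch"]))   # True
print(should_scan("/ifs/data/scratch/build.log", ["*"], ["tmp"], ["/ifs/data/scratch"]))  # False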

For any file matching all the filtering criteria, the lwavscand daemon sends the HTTP scanning request to the CEE/CAVA agent. Simultaneously, OneFS locks the file and the AV application downloads a copy via the hidden SMB share (CHECK$). After the corresponding content is downloaded to the CAVA server, it runs the scan with the anti-virus engine, and CEE sends the scan results and response back to the lwavscand process. At this stage, several scanning attributes are written to the file and the lock is released. The scanning attributes are listed below:

  • Scan time
  • Scan Result
  • Anti-virus signature timestamp
  • Scan current

If the file is not infected, the original SMB workflow then continues. Otherwise, access to the file is denied. If errors occur during the scan and the scan profile is strict, the ‘Open on fail’ setting determines the next action.

Policy Scan

A policy scan is triggered by the job engine. Like other OneFS jobs, the job impact policy, priority, and schedule can be configured as desired. In this case, filtering includes:

  • File extension to include
  • File extension to exclude
  • File path to exclude
  • File path to include

There are two types of connections between the CAVA servers and the cluster’s nodes: SMB connections, which are used to fetch files or file contents for scanning via the hidden CHECK$ share, and CEE’s HTTP connections, used for scan requests, scan responses, and other functions. CAVA SMB connections use a dedicated IP pool and access zone to separate this traffic from other workloads. Within this IP pool, SmartConnect ensures all SMB connections are evenly spread across all the nodes.

The anti-virus applications use the SMB protocol to fetch a file, or a portion of a file, for scanning from a PowerScale cluster. From the anti-virus perspective, the hidden SMB share CHECK$ is used for this purpose and is accessed by every anti-virus application server. This share allows access to all files on a PowerScale cluster under /ifs. SmartConnect and a dedicated access zone are introduced in this process to ensure that all the connections from the anti-virus applications are fully distributed and load-balanced among all the configured nodes in the IP pool. A hidden role, AVVendor, is created by the CAVA anti-virus service to map the CAVA service account to OneFS.

Upon completion of a scan, a report is available containing a variety of data and statistics. The details of scan reports are stored in a database under /ifs/.ifsvar/modules/avscan/isi_avscan.db, and the reports can be viewed from either the CLI or WebUI. OneFS also generates a report every 24 hours that includes all on-access scans that occurred during the previous day. These antivirus scan reports contain the following information and metrics:

  • Scan start time.
  • Scan end time.
  • Number of files scanned.
  • Total size of the files scanned.
  • Scanning network bandwidth utilization.
  • Network throughput consumed by scanning.
  • Scan completion.
  • Infected file detection count.
  • Infected file names.
  • Threats associated with infected files.
  • OneFS response to detected threats.

CAVA is broadly compatible with the other OneFS data services, but with the following notable exceptions:

Data Service Compatibility
ICAP Compatible
Protocols SMB support
SmartLock Incompatible. OneFS cannot set scanning attributes on SmartLock WORM-protected files as they are read-only. As such, the AV application cannot clean them.
SnapshotIQ Incompatible
SyncIQ SyncIQ is unable to replicate AV scan file attributes to a target cluster.

 When it comes to designing and deploying a OneFS CAVA environment, the recommendation is to adopt the following general sizing guidance:

  • Use Windows servers with at least 16 GB memory and two-core processors.
  • Provision a minimum of two CEE/CAVA servers for redundancy.

For the Common Event Enabler, connectivity limits and sizing rules include:

  • Maximum connections per CAVA server = 20
  • Number of different CAVA servers a cluster node can connect to = 4
  • The nth cluster node starts from the nth CAVA server

The formula is as follows:

Maximum connections = maximum connections per CAVA server × number of CAVA servers = 20 × 5 = 100

Additionally, the following formula can help determine the appropriate number of CAVA servers for a particular cluster:

CAVA servers = number of cluster nodes / 4

For example, imagine a forty-two node PowerScale H700 cluster. Using the above formula, the recommended number of CAVA servers would be:

42 / 4 = 10.5

So, with upward rounding, eleven CAVA servers would be an appropriate ratio for this cluster configuration. Note that this approach, based on the cluster’s node count, is applicable for both on-access and policy-based scanning.
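These sizing rules are simple enough to capture in a few lines of illustrative Python:

import math

MAX_CONNECTIONS_PER_CAVA_SERVER = 20
NODES_PER_CAVA_SERVER = 4

def recommended_cava_servers(cluster_nodes):
    """CAVA servers = number of cluster nodes / 4, rounded up."""
    return math.ceil(cluster_nodes / NODES_PER_CAVA_SERVER)

def maximum_connections(cava_servers):
    """Maximum connections = connections per CAVA server x number of CAVA servers."""
    return MAX_CONNECTIONS_PER_CAVA_SERVER * cava_servers

print(recommended_cava_servers(42))   # 11  -- the example above
print(maximum_connections(5))         # 100 -- the five-server example above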

For environments where not all of a cluster’s nodes have network connectivity (NANON), CAVA is supported with the following caveats:

Scan Type Requirements
Individual file Supported when the node which triggers the scan has front-end connectivity to the CAVA servers. Otherwise, the CELOG ‘SW_AVSCAN_CAVA_SERVER_OFFLINE’ alert is fired.
Policy-based Works by default. OneFS 9.1 and later automatically detect any nodes without network connectivity and prevent the Job Engine from allocating scanning tasks to them. Only nodes with front-end connectivity to the CAVA servers will participate in running scheduled scans.
On-access Supported when the node which triggers the scan has front-end connectivity to the CAVA servers. Otherwise, the CELOG ‘SW_AVSCAN_CAVA_SERVER_OFFLINE’ alert is fired.

 

OneFS SmartLock Configuration and Management

In the previous article, we looked at the architecture of SmartLock. Now, we’ll turn our attention to its configuration, compatibility, and use.

The following CLI procedure can be used to commit a file into WORM state without having to remove the “write” permissions of end users using the chmod -w <file> command. This avoids re-enabling write permission after the files have been released from WORM retention. These commands are applicable for both Enterprise and Compliance SmartLock modes:

  1. Create and verify a new SmartLock domain. Note that if you specify the path of an existing directory, the directory must be empty. The following command creates an enterprise directory with a default retention period of two years, a minimum retention period of one year, and a maximum retention period of three years:
# isi worm domains create /ifs/smartlk --default-retention 2Y --min-retention 1Y --max-retention 3Y --mkdir

# isi worm domains list

ID     Path         Type

------------------------------

656128 /ifs/smartlk enterprise

------------------------------

Total: 1




# isi worm domains view 656128

               ID: 656128

             Path: /ifs/smartlk

             Type: enterprise

              LIN: 4760010888

Autocommit Offset: -

    Override Date: -

Privileged Delete: off

Default Retention: 2Y

    Min Retention: 1Y

    Max Retention: 3Y

   Pending Delete: False

       Exclusions: -


Alternatively, a WORM domain can also be configured from the WebUI, by navigating to Cluster management > SmartLock and clicking on the ‘Create domain’ button:

In addition to SmartLock Domains, OneFS also supports SnapRevert, SyncIQ, and writable snapshot domains.  A list of all configured domains on a cluster can be viewed with the following CLI syntax:

# isi_classic domain list -l

ID | Root Path | Type          | Overrid | Def. | Min.  | Max.  | Autocomm | Priv.

---+-------------+-------------+---------+------+-------+-------+----------+------

65>| /ifs/sync1>| SyncIQ        | None    | None | None  | None  | None     | Off

65>| /ifs/smartlk>| SmartLock     | None    | None | None  | None  | None     | Off

65>| /ifs/snap1| Writable,Snap>| None    | None | None  | None  | None     | Off
  2. Next, create a file:
# date >> /ifs/smartlk/wormfile1
  3. View the file’s permission bits and confirm that the owner has write permission:
    # ls -lsia /ifs/smartlk
total 120

4760010888 32 drwx------     2 root  wheel   27 Feb  3 23:19 .

         2 64 drwxrwxr-x +   8 root  wheel  170 Feb  3 23:11 ..

4760931018 24 -rw-------     1 root  wheel   29 Feb  3 23:19 wormfile1
  4. Examine the wormfile1 file’s contents and verify that it has not been WORM committed:
# cat /ifs/smartlk/wormfile1

cat /ifs/smartlk/wormfile1

Thu Feb  3 23:19:09 GMT 2022


# isi worm files view !$

isi worm files view /ifs/smartlk/wormfile1

WORM Domains

ID     Root Path

-------------------

656128 /ifs/smartlk


WORM State: NOT COMMITTED

   Expires: -

5. Commit the file into WORM. The ‘chmod’ CLI command can be used to manually commit a file with write permission into WORM state. For example:

# chmod a-w /ifs/smartlk/wormfile1

Or:

# chmod 444 /ifs/smartlk/wormfile1

The ‘chflags’ command can also be used:

# chflags dos-readonly /ifs/smartlk/wormfile1

Similarly, a writable file can be committed from an SMB client’s GUI by checking the ‘Read-only’ attribute within the file’s ‘Properties’ tab. For example:

  6. Verify the file is committed and the permission bits are preserved:
# isi worm files view /ifs/smartlk/wormfile1

WORM Domains

ID     Root Path

-------------------

656128 /ifs/smartlk




WORM State: COMMITTED

   Expires: 2024-02-03T23:23:45




# ls -lsia /ifs/smartlk

total 120

4760010888 32 drwx------     2 root  wheel   27 Feb  3 23:19 .

         2 64 drwxrwxr-x +   8 root  wheel  170 Feb  3 23:11 ..

4760931018 24 -rw-------     1 root  wheel   29 Feb  3 23:19 wormfile1

 

  7. Override the retention period expiration date for all WORM-committed files in the SmartLock directory:
# isi worm domains modify /ifs/smartlk --override-date 2024-08-03

# isi worm domains view 656128

               ID: 656128

             Path: /ifs/smartlk

             Type: enterprise

              LIN: 4760010888

Autocommit Offset: -

    Override Date: 2024-08-03T00:00:00

Privileged Delete: off

Default Retention: 2Y

    Min Retention: 1Y

    Max Retention: 3Y

   Pending Delete: False

       Exclusions: /ifs/smartlk/wormdir1


  8. Create a new directory under the domain and configure it for exclusion from WORM:
# isi worm domains modify --exclude /ifs/smartlk/notwormdir1 656128

To remove an existing exclusion domain on a directory, first delete the directory and all of its constituent files.

  9. Verify that the exclusion has been configured:
# isi worm domains view 656128

               ID: 656128

             Path: /ifs/smartlk

             Type: enterprise

              LIN: 4760010888

Autocommit Offset: -

    Override Date: -

Privileged Delete: off

Default Retention: 2Y

    Min Retention: 1Y

    Max Retention: 3Y

   Pending Delete: False

       Exclusions: /ifs/smartlk/notwormdir1

10. Delete the file from its enterprise WORM domain before the expiration date via the privileged delete option:

# rm -f /ifs/smartlk/wormfile1

rm: /ifs/smartlk/wormfile1: Read-only file system

# isi worm files delete /ifs/smartlk/wormfile1

Are you sure? (yes/[no]): yes

Operation not permitted.  Please verify that privileged delete is enabled.

# isi worm domains modify /ifs/smartlk --privileged-delete true

# isi worm domains view /ifs/smartlk

               ID: 656128

             Path: /ifs/smartlk

             Type: enterprise

              LIN: 4760010888

Autocommit Offset: -

    Override Date: 2024-08-03T00:00:00

Privileged Delete: on

Default Retention: 2Y

    Min Retention: 1Y

    Max Retention: 3Y

   Pending Delete: False

       Exclusions: /ifs/smartlk/wormdir1

# isi worm files delete /ifs/smartlk/wormfile1

Are you sure? (yes/[no]): yes

# ls -lsia /ifs/smartlk/wormfile1

ls: /ifs/smartlk/wormfile1: No such file or directory

 

  11. Delete the SmartLock domain.

For enterprise-mode domains, ensure the domain is empty first, then remove with ‘rmdir’:

# rmdir /ifs/smartlk/notwormdir1

# ls -lsia /ifs/smartlk

total 96

4760010888 32 drwx------     2 root  wheel    0 Feb  4 00:06 .

         2 64 drwxrwxr-x +   8 root  wheel  170 Feb  3 23:11 ..

# isi worm domains list

ID     Path         Type

------------------------------

656128 /ifs/smartlk enterprise

------------------------------

Total: 1

# rmdir /ifs/smartlk

# isi worm domains list

ID Path Type

------------

------------

Total: 0

 Note that SmartLock’s ‘pending delete’ option can only be used for compliance-mode directories:

# isi worm domains modify --set-pending-delete 656128

You have 1 warnings:

Marking a domain for deletion is irreversible. Once marked for deletion:
  1. No new files may be created, hardlinked or renamed into the domain.
  2. Existing files may not be committed or have their retention dates extended.
  3. SyncIQ will fail to sync to and from the domain.
Are you sure? (yes/[no]): yes

Cannot mark non-compliance domains for deletion.

In the following table, the directory default retention offset is configured for one year in both scenarios A and B. This means that any file committed to that directory without a specific expiry date (i.e. scenario A) automatically inherits a one year expiry from the date it is committed. As such, WORM protection for any file committed on 2/1/2022 lasts until 2/1/2023, based on the default one year setting. In scenarios B and C, the file retention date of 3/1/2023 takes precedence over the directory default retention offset period. In scenario D, the override retention date, configured at the directory level, ensures that all data in that directory is automatically protected through a minimum of 1/31/2023. This can be useful for organizations needing to satisfy litigation holds and other blanket data retention requirements.

Scenario A: No file-retention date
Scenario B: File-retention date > directory offset
Scenario C: Directory offset > file-retention date
Scenario D: Override retention date

                                  Scenario A   Scenario B   Scenario C   Scenario D
File-retention date               N/A          3/1/2023     3/1/2023     3/1/2023
Directory-offset retention date   1 year       1 year       2 years      1 year
File-committed date               2/1/2022     2/1/2022     2/1/2022     2/1/2022
Expiration date                   2/1/2023     3/1/2023     3/1/2023     1/31/2023

In general, SmartLock plays nicely with OneFS and the other data services. For example, SnapshotIQ can take snapshots of data in a WORM directory. Similarly, SmartLock retention settings are retained across NDMP backups, avoiding the need to recommit files after a data restore. Be aware, though, that NDMP backups of SmartLock compliance data do not satisfy the regulatory requirements of SEC 17a-4(f).

For CloudPools, WORM protection of SmartLink stub files is permitted in OneFS 8.2 and later, but only in enterprise mode. Stubs can be moved into an enterprise mode directory, preventing their modification or deletion, and, once committed, they can still be recalled from the cloud to the cluster.

SyncIQ interoperability with SmartLock has more complexity and caveats, and the compatibility between the different directory types on the replication source and target can be characterized as follows:

Source dir   Target dir   SyncIQ failover                                             SyncIQ failback
Non-WORM     Non-WORM     Yes                                                         Yes, unless files are WORM committed on the target. Retention is not enforced.
Non-WORM     Enterprise   Yes                                                         No
Non-WORM     Compliance   No                                                          Yes, but files do not have WORM status.
Enterprise   Non-WORM     Yes: replication type allowed, but retention not enforced   Yes: newly committed WORM files are included.
Enterprise   Enterprise   Yes                                                         No
Enterprise   Compliance   No                                                          No
Compliance   Non-WORM     No                                                          No
Compliance   Enterprise   No                                                          No
Compliance   Compliance   Yes                                                         Yes: newly committed WORM files are included.

When using SmartLock with SyncIQ replication, configure Network Time Protocol (NTP) peer mode on both the source and target cluster to ensure that cluster clocks are synchronized. Where possible, also run the same OneFS version across replication pairs and create a separate SyncIQ policy for each SmartLock directory.

OneFS SmartLock and WORM Data Retention

Amongst the plethora of OneFS data services sits SmartLock, which provides immutable data storage for the cluster, guarding critical data against accidental, malicious or premature deletion or alteration. Based on a write once, read many (WORM) locking capability, SmartLock offers tamper-proof archiving for regulatory compliance and disaster recovery purposes, etc. Configured at the directory-level, SmartLock delivers secure, simple to manage data containers that remain locked for a configurable duration or indefinitely. Additionally, SmartLock satisfies the regulatory compliance demands of stringent corporate and federal data retention policies.

Once SmartLock is licensed and activated on a cluster, it can be configured to run in one of two modes:

Mode         Description
Enterprise   Upon SmartLock license activation, the cluster automatically becomes enterprise mode enabled, permitting the creation of enterprise directories and the committing of WORM files with a specified retention period.
Compliance   A SmartLock licensed cluster can optionally be put into compliance mode, allowing data to be protected in compliance directories in accordance with U.S. Securities and Exchange Commission rule 17a-4(f) regulations.

Note that SmartLock’s configuration is global, so enabling it in either enterprise or compliance mode is a cluster-wide action.

Under the hood, SmartLock uses both a system clock, which is common to both modes, and a compliance clock, which is exclusive to compliance mode. The latter updates the time in a protected system B-tree and, unlike the system clock, cannot be manually modified by either the ‘root’ or ‘compadmin’ accounts.

SmartLock employs the OneFS job engine framework to run its WormQueue job, which routinely scans the SmartLock queue for LINs that need to be committed. By default, WormQueue is scheduled to run every day at 2am, with a ‘LOW’ impact policy and a relative job priority of ‘6’.
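
The job's current schedule, impact policy, and priority can be confirmed via the job engine CLI. For example (a minimal check, assuming the standard ‘isi job’ command set):

# isi job types view WormQueue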

SmartLock also leverages the OneFS IFS domains ‘restricted writer’ infrastructure to enforce its immutability policies. Within OneFS, a domain defines a set of behaviors for a collection of files under a specified directory tree. More specifically, a protection domain is a marker which prevents a configured subset of files and directories from being deleted or modified.

If a directory has a protection domain applied to it, that domain also affects all of the files and subdirectories under that top-level directory. A cluster’s WORM domains can be reported with the ‘isi worm domains list’ CLI command.

As we’ll see, in some instances, OneFS creates protection domains automatically, but they can also be configured manually via the ‘isi worm domains create’ CLI syntax.
SmartLock domains are assigned to WORM directories to prevent committed files from being modified or deleted, and OneFS automatically sets up a SmartLock domain when a WORM directory is created. That said, domain governance does come with some boundary caveats. For example, SmartLock root directories cannot be nested, either in enterprise or compliance mode, and hard links cannot cross SmartLock domain boundaries. Note that a SmartLock domain cannot be manually deleted. However, upon removal of a top level SmartLock directory, OneFS automatically deletes the associated SmartLock domain.

Once a file is WORM committed, or SmartLocked, it is indefinitely protected from modification or moves. When its expiry, or ‘committed until’ date is reached, the only actions that can be performed on a file are either deletion or extension of its ‘committed until’ date.

Enterprise mode permits storing data in enterprise directories in a non-rewriteable, non-erasable format, protecting it from deletion or modification; enterprise directories can be created in both enterprise and compliance modes. If a file in an enterprise directory is committed to a WORM state, it is protected from accidental deletion or modification until its retention period has expired. In enterprise mode, regular directories can also be created, which are not subject to SmartLock’s retention requirements. A cluster operating in enterprise mode therefore provides the best of both worlds, delivering WORM security while retaining root access and full administrative control.

Any directory under /ifs is a potential SmartLock candidate, and it does not have to be empty before being designated as an enterprise directory. SmartLock and regular directories can happily coexist on the same cluster and once a directory is converted to a SmartLock directory, it is immediately ready to protect files that are placed there. SmartLock also automatically protects any subdirectories in a domain, and they inherit all the WORM settings of the parent directory – unless a specific exclusion is configured.

The following table indicates which type of files (and directories) can be created in each of the cluster modes:

Directory                                             Enterprise mode   Compliance mode
Regular (non-WORM) directory                          Y                 Y
Enterprise directory (governed by system clock)       Y                 Y
Compliance directory (governed by compliance clock)   N                 Y

Both enterprise and compliance modes also permit the creation of non-WORM files and directories, which obviously are free from any retention requirements. Also, while an existing, empty enterprise directory can be upgraded to a compliance directory, it cannot be reverted back to an enterprise directory. Writes are permitted during a directory’s conversion to a SmartLock directory, but files cannot be committed until the transformation is complete.

Attribute Enterprise directory Compliance directory
Customizable file-retention dates Y Y
Post-commit write protection Y Y
SEC 17a-4(f) compliance N Y
Privileged delete On | Off | Disabled Disabled
Tamper-proof clock N Y
Root account Y N
Compadmin account N Y

Regular users and apps are not permitted to move, delete, or change SmartLock-committed files. However, SmartLock includes the notion of a ‘privileged user’ with elevated rights (i.e. root access) and the ability to delete WORM protected files. Privileged deletes can only be performed on the cluster itself, not over the network, which adds an additional layer of control over privileged functions. The privileged user exists only in enterprise mode, and a privileged delete can also be performed by non-root users that have been assigned the ‘ISI_PRIV_IFS_WORM_DELETE’ RBAC privilege. The privileged delete capability is disabled by default, but can easily be enabled for enterprise directories (note that there is no privileged delete for compliance directories). It can also be permanently disabled to guard against deletion or modification from admin accounts.
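
As an illustrative sketch (the role name ‘wormadmins’ and user ‘jane’ below are hypothetical, and the exact syntax may vary by release), this privilege can be granted to a non-root administrator via a custom RBAC role:

# isi auth roles create wormadmins

# isi auth roles modify wormadmins --add-priv ISI_PRIV_IFS_WORM_DELETE --add-user jane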

Files in a SmartLock directory can be committed to a WORM state simply by removing their read-write privileges. A specific retention expiry date can be set on the file, which can be increased but not reduced. Any files that have been committed to a WORM state cannot be moved, even after their retention period has expired.

A file’s retention period expiration date can be set by modifying its access time (-atime), for example by using the CLI ‘touch’ command. However, note that simply accessing a file will not set the retention period expiration date.

If a file is ‘touched’ in a SmartLock directory without specifying a WORM release date, the retention period is automatically set to the default period specified for the SmartLock directory when the file is committed. If a default has not been set, the file is assigned a retention period of zero seconds. As such, the clear recommendation is to specify a minimum retention period for all SmartLock directories.
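
For example, the following sketch (using a hypothetical file and date) sets a retention expiry of 1 March 2025 by adjusting the file’s atime, commits the file by removing its write permissions, and then checks its WORM state (assuming the ‘isi worm files view’ syntax):

# touch -at 202503010000 /ifs/smartlk/wormfile2

# chmod a-w /ifs/smartlk/wormfile2

# isi worm files view /ifs/smartlk/wormfile2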

After a directory has been marked as a SmartLock directory, any files committed to it are immutable until their retention time expires and cannot be deleted, moved, or changed. The administrator sets retention dates on files and can extend them, but not shorten them. When a file’s retention period expires, it becomes a normal file which can be managed or removed as desired.

Any uncommitted files in a SmartLock directory can be altered or moved at will, up until they are WORM committed, at which point they become immutable. Files can be committed to a SmartLock directory either locally via the cluster CLI, or from an NFS or SMB client.

The OneFS CLI chmod command can also be used to commit a file into the WORM state without removing the write permissions. This option alleviates the need for cluster admins to re-enable the permissions for users to modify the files after they have been released from WORM retention.

Be aware that, when using the cp -p command to copy a file with a read-only flag from the OneFS CLI to WORM domains, the target file will immediately be committed to WORM state. The read-only flag can be removed from source files before copying them to WORM domains if this is undesirable.
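
For instance, a hedged sketch (hypothetical paths, and assuming the read-only state is simply the absence of write permission on the source file):

# chmod u+w /ifs/data/staging/file1

# cp -p /ifs/data/staging/file1 /ifs/smartlk/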

File retention times can be set in two ways: on a per-file basis, or using the directory default setting, with the file’s retention date overriding the directory default retention date. For any existing WORM files, the retention date can be extended but not reduced. Even after their retention date has expired, WORM-protected files cannot be altered while they reside in a SmartLock directory. Instead, they must be moved to a non-WORM directory before modification is permitted. Note that, in an enterprise SmartLock directory with ‘privileged delete’ enabled, a WORM state file can still be removed within its retention period.

A directory level ‘override retention date’ option is also available, which can be used to automatically extend the retention date of files. Note that any files in the directory whose retention times were already set beyond the scope of the override are unaffected by it.

OneFS 8.2 and later permits the exclusion of a directory inside a WORM domain from WORM retention policies and protection. Any content that is created later in the excluded directory is not SmartLock protected. 8.2 also introduced a ‘pending delete’ flag, which can be set on a compliance-mode WORM domain in order to delete the domain and the directories and files in it. Note that ‘pending delete’ cannot be configured on an enterprise-mode WORM domain.

Marking a domain for deletion is an irreversible action, after which no new files may be created, hard linked, or renamed into the domain. Existing files may not be committed or have their retention dates extended. Additionally, any SyncIQ replication tasks to and from the domain will fail.

In contrast to enterprise mode, and in order to maintain an elevated level of security and immutability, SmartLock compliance mode enforces some stringent administrative restrictions. Most notably, the root account (UID 0) loses its elevated privileges. Instead, clusters operating in compliance mode can use the ‘compadmin’ account to run some restricted commands with root privileges via ‘sudo’. These commands are specified in the /usr/local/etc/sudoers file.

Given the administrative restrictions of compliance mode and its potential to affect both compliance data and enterprise data, it is strongly advised to only use compliance mode if mandated to do so under SEC rule 17a-4(f). In most cases, enterprise mode, with privileged delete disabled, offers a level of security that is more than adequate for the vast majority of environments. Some fundamental differences between the two modes include:

Enterprise mode Compliance mode
Governed by single system clock. Governed by both system clock and compliance clock.
Data committed to WORM state only for specified retention period. WORM state file can have privileged delete capability within retention period. Data written to compliance directories, when committed, can never be altered.
Root/administrator access retains full administrative control. Root/administrator access is disabled.

A directory cannot be upgraded to a SmartLock Compliance directory until the WORM compliance clock has been set on the cluster. This can be done from the CLI using the ‘isi worm cdate set’ syntax. Be aware that setting the compliance clock is a one-time deal, after which it cannot be altered.
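
For example (a brief sketch; remember the compliance clock can only be set once):

# isi worm cdate set

# isi worm cdate view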

The ComplianceStoreDelete job, introduced in OneFS 8.2, automatically tracks and removes expired files from the compliance store that were placed there as a result of SyncIQ conflict resolution. By default, this job is scheduled to run once per month at ‘low’ impact and priority level 6, but it can also be run manually on-demand.
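
For example, assuming the standard OneFS job engine CLI, an on-demand run can be kicked off and monitored as follows:

# isi job jobs start ComplianceStoreDelete

# isi job jobs list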

If the decision is made to configure a cluster for compliance mode, some tips to make the transition smoother include verifying that the cluster time is correct before putting the PowerScale cluster into compliance mode. Plan on using RBAC for cluster access to perform administrative operations and data management, and be aware that, from the CLI perspective, the ‘compadmin’ account represents a regular data user. For any root-owned data, perform all ownership or permission changes before upgrading to compliance mode, and avoid changing ownership of any system files. Review the permissions and ownership of any files that exclusively permit the root account to manage or write data to them: after upgrading to compliance mode, if the OneFS configuration limits the relevant POSIX access permissions to specific directories or files, writing data or changing ownership of these objects will be blocked. Finally, ensure that no SMB shares have ‘run-as-root’ configured before putting the PowerScale cluster into compliance mode.

In the next article in this series, we’ll take a look at SmartLock’s configuration, compatibility, and use.

OneFS HDFS ACLs Support

The Hadoop Distributed File System (HDFS) permissions model for files and directories has much in common with the ubiquitous POSIX model. HDFS access control lists, or ACLs, are Apache’s implementation of POSIX.1e ACLs, and each file and directory is associated with an owner and a group. The file or directory has separate permissions for the user that is the owner, for other users that are members of the group, and for all other users.

However, in addition to the traditional POSIX permissions model, HDFS also supports POSIX ACLs. ACLs are useful for implementing permission requirements that differ from the natural organizational hierarchy of users and groups. An ACL provides a way to set different permissions for specific named users or named groups, not only the file’s owner and the file’s group. HDFS ACLs also support extended ACL entries, which allow multiple users and groups to configure different permissions for the same HDFS directories and files.
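
From the Hadoop client side, these extended entries are managed with the stock Apache Hadoop ACL commands. For example (a sketch using a hypothetical path and user):

# hdfs dfs -setfacl -m user:sheila:rw- /tmp/project1

# hdfs dfs -getfacl /tmp/project1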

Compared to regular OneFS ACLs, HDFS ACLs differ in both structure and access check algorithm. The most significant difference is that the OneFS ACL algorithm is cumulative, so if a user requests read-write-execute access, this can be granted by three different OneFS ACEs (access control entries). In contrast, HDFS ACLs have a strict, predefined ordering of ACEs for the evaluation of permissions, and there is no accumulation of permission bits. As such, OneFS uses a translation algorithm to bridge this gap. For instance, here’s an example of the mappings between HDFS ACEs and OneFS ACEs:

HDFS ACE          OneFS ACE
user::rw-         allow <owner-sid> std-read std-write
                  deny <owner-sid> std-execute
user:sheila:rw-   allow <sheilas-sid> std-read std-write
                  deny <sheilas-sid> std-execute
group::r--        allow <group-sid> std-read
                  deny <group-sid> std-write std-execute
mask::rw-         posix-mask <everyone-sid> std-read std-write
                  deny <owner-sid> std-execute
other::--x        allow <everyone-sid> std-execute
                  deny <everyone-sid> std-read std-write

The HDFS Mask ACEs (derived from POSIX.1e) are special access control entries which apply to every named user and group, and represent the maximum permissions that a named user or any group can have for the file. They were introduced essentially to extend traditional read-write-execute (RWX) mode bits to support the ACLs model. OneFS translates these Mask ACEs to a ‘posix-mask’ ACE, and the access checking algorithm in the kernel applies the mask permissions to all appropriate trustees.

On translation of an HDFS ACL to a OneFS ACL and back to an HDFS ACL, OneFS is guaranteed to return the same ACL. However, translating a OneFS ACL to an HDFS ACL and back to a OneFS ACL can be unpredictable. OneFS ACLs are richer than HDFS ACLs, and information can be lost when translating to HDFS ACLs that contain multiple named groups where a trustee is a member of more than one of them. For example, if a user has RWX in one group, RW in another, and R in a third group, the result will be as expected and RWX access will be granted. However, if a user has W in one group, X in another, and R in a third group, in these rare cases the ACL translation algorithm will prioritize security and produce a more restrictive ‘read-only’ ACL.

Here’s the full set of ACE mappings between HDFS and OneFS internals:

HDFS ACE permission   Apply to    OneFS internal ACE permission
rwx                   Directory   allow dir_gen_read, dir_gen_write, dir_gen_execute, delete_child
                                  deny
                      File        allow file_gen_read, file_gen_write, file_gen_execute
                                  deny
rw-                   Directory   allow dir_gen_read, dir_gen_write, delete_child
                                  deny traverse
                      File        allow file_gen_read, file_gen_write
                                  deny execute
r-x                   Directory   allow dir_gen_read, dir_gen_execute
                                  deny add_file, add_subdir, dir_write_ext_attr, delete_child, dir_write_attr
                      File        allow file_gen_read, file_gen_execute
                                  deny file_write, append, file_write_ext_attr, file_write_attr
r--                   Directory   allow dir_gen_read
                                  deny add_file, add_subdir, dir_write_ext_attr, traverse, delete_child, dir_write_attr
                      File        allow file_gen_read
                                  deny file_write, append, file_write_ext_attr, execute, file_write_attr
-wx                   Directory   allow dir_gen_write, dir_gen_execute, delete_child, dir_read_attr
                                  deny list, dir_read_ext_attr
                      File        allow file_gen_write, file_gen_execute, file_read_attr
                                  deny file_read, file_read_ext_attr
-w-                   Directory   allow dir_gen_write, delete_child, dir_read_attr
                                  deny list, dir_read_ext_attr, traverse
                      File        allow file_gen_write, file_read_attr
                                  deny file_read, file_read_ext_attr, execute
--x                   Directory   allow dir_gen_execute, dir_read_attr
                                  deny list, add_file, add_subdir, dir_read_ext_attr, dir_write_ext_attr, delete_child
                      File        allow file_gen_execute, file_read_attr
                                  deny file_read, file_write, append, file_read_ext_attr, file_write_ext_attr, file_write_attr
---                   Directory   allow std_read_dac, std_synchronize, dir_read_attr
                                  deny list, add_file, add_subdir, dir_read_ext_attr, dir_write_ext_attr, traverse, delete_child, dir_write_attr
                      File        allow std_read_dac, std_synchronize, file_read_attr
                                  deny file_read, file_write, append, file_read_ext_attr, file_write_ext_attr, execute, file_write_attr

Enabling HDFS ACLs is typically performed from the OneFS CLI, and can be configured at a per access zone granularity. For example:

# isi hdfs settings modify --zone=System --hdfs-acl-enabled=true

# isi hdfs settings view --zone=System

                  Service: Yes

       Default Block Size: 128M

    Default Checksum Type: none

      Authentication Mode: simple_only

           Root Directory: /ifs/data

          WebHDFS Enabled: Yes

            Ambari Server:

          Ambari Namenode:

              ODP Version:

     Data Transfer Cipher: none

 Ambari Metrics Collector:

         HDFS ACL Enabled: Yes

Hadoop Version 3 Or Later: Yes

Note that the ‘hadoop-version-3-or-later’ parameter should be set to ‘false’ if the HDFS client(s) are running Hadoop 2 or earlier. For example:

# isi hdfs settings modify --zone=System --hadoop-version-3-or-later=false

# isi hdfs settings view --zone=System

                  Service: Yes

       Default Block Size: 128M

    Default Checksum Type: none

      Authentication Mode: simple_only

           Root Directory: /ifs/data

          WebHDFS Enabled: Yes

            Ambari Server:

          Ambari Namenode:

              ODP Version:

     Data Transfer Cipher: none

 Ambari Metrics Collector:

         HDFS ACL Enabled: Yes

Hadoop Version 3 Or Later: No

Other useful ACL configuration options include:

# isi auth settings acls modify --calcmode-group=group_only

Specifies how to approximate group mode bits. Options include:

Option Description
group_access Approximates group mode bits using all possible group ACEs. This causes the group permissions to appear more permissive than the actual permissions on the file.
group_only Approximates group mode bits using only the ACE with the owner ID. This causes the group permissions to appear more accurate, in that you see only the permissions for a particular group and not the more permissive set. Be aware that this setting may cause access-denied problems for NFS clients, however.

 

# isi auth settings acls modify --calcmode-traverse=require

Specifies whether or not traverse rights are required in order to traverse directories with existing ACLs.

# isi auth settings acls modify --group-owner-inheritance=parent

Specifies how to handle inheritance of group ownership and permissions. If you enable a setting that causes the group owner to be inherited from the creator’s primary group, you can override it on a per-folder basis by running the chmod command to set the set-gid bit. This inheritance applies only when the file is created. The following options are available:

Option Description
native Specifies that if an ACL exists on a file, the group owner will be inherited from the file creator’s primary group. If there is no ACL, the group owner is inherited from the parent folder.
parent Specifies that the group owner be inherited from the file’s parent folder.
creator Specifies that the group owner be inherited from the file creator’s primary group.

Once configured, any of these settings can then be verified with the following CLI syntax:

# isi auth settings acls view

Standard Settings

      Create Over SMB: allow

                Chmod: merge

    Chmod Inheritable: no

                Chown: owner_group_and_acl

               Access: windows

Advanced Settings

                        Rwx: retain

    Group Owner Inheritance: parent

                  Chmod 007: default

             Calcmode Owner: owner_aces

             Calcmode Group: group_only

           Synthetic Denies: remove

                     Utimes: only_owner

                   DOS Attr: deny_smb

                   Calcmode: approx

          Calcmode Traverse: require

If and when it comes to troubleshooting HDFS ACLs, the /var/log/hdfs.log file can be invaluable. Setting the HDFS log level to ‘debug’ will generate log entries detailing ACL and ACE parsing and configuration. This can easily be accomplished with the following CLI command:

# isi hdfs log-level modify --set debug

A file’s detailed security descriptor configuration can be viewed from OneFS using the ‘ls -led’ command. Specifically, the ‘-e’ argument in this command will print all the ACLs and associated ACEs. For example:

# ls -led /ifs/hdfs/file1

-rwxrwxrwx +     1 yarn  hadoop  0 Jan 26 22:38 file1

OWNER: user:yarn

GROUP: group:hadoop

0: everyone posix_mask file_gen_read,file_gen_write,file_gen_execute

1: user:yarn allow file_gen_read,file_gen_write,std_write_dac

2: group:hadoop allow std_read_dac,std_synchronize,file_read_attr

3: group:hadoop deny file_read,file_write,append,file_read_ext_attr, file_write_ext_attr,execute,file_write_attr

The access rights to a file for a specific user can also be viewed using the ‘isi auth access’ CLI command as follows:

# isi auth access --user=admin /ifs/hdfs/file1

               User

                 Name: admin

                  UID: 10

                  SID: SID:S-1-22-1-10


               File

                  Owner

                 Name: yarn

                   ID: UID:5001

                  Group

                 Name: hadoop

                   ID: GID:5000

       Effective Path: /ifs/hdfs/file1

     File Permissions: file_gen_read

        Relevant Aces: group:admin allow file_gen_read

                              group:admin deny file_write,append,file_write_ext_attr,execute,file_write_attr

                              Everyone allow file_gen_read,file_gen_write,file_gen_execute

        Snapshot Path: No

          Delete Child: The parent directory allows delete_child for this user, the user may delete the file.

When using SyncIQ or NDMP with HDFS ACLs, be aware that replicating or restoring data from a OneFS 9.3 cluster with HDFS ACLs enabled to a target cluster running an earlier version of OneFS can result in loss of ACL detail. Specifically, the new ‘mask’ ACE type cannot be replicated or restored on target clusters running prior releases, since the ‘mask’ ACE was only introduced in 9.3. Instead, OneFS generates two versions of the ACL, with and without ‘mask’, which maintains the same level of security.

OneFS NFS Performance Resource Monitoring

Another feature of the recent OneFS 9.3 release is enhanced performance resource monitoring for the NFS protocol, adding path tracking for NFS and bringing it up to parity with the SMB protocol resource monitoring.

But first, a quick refresher. The performance resource monitoring framework enables OneFS to track and report the use of transient system resources (i.e. resources that only exist at a given instant), providing insight into who is consuming which resources, and how much of them. Examples include CPU time, network bandwidth, IOPS, disk accesses, and cache hits (but not, currently, disk space or memory usage). OneFS performance resource monitoring initially debuted in OneFS 8.0.1 and is an ongoing project, which will ultimately provide both insight and control. This will allow prioritization of work flowing through the system, prioritization and protection of mission critical workflows, and the ability to detect if a cluster is at capacity.

Since identification of work is highly subjective, OneFS performance resource monitoring provides significant configuration flexibility, allowing cluster admins to define exactly how they wish to track workloads. For example, an administrator might want to partition their work based on criteria such as which user is accessing the cluster, the export/share they are using, and which IP address they’re coming from, and often a combination of all three.

So why not just track it all, you may ask? It would simply generate too much data (potentially requiring a cluster just to monitor your cluster!).

OneFS has always provided client and protocol statistics, however they were typically front-end only. Similarly, OneFS provides CPU, cache and disk statistics, but they did not display who was consuming them. Partitioned performance unites these two realms, tracking the usage of the CPU, drives and caches, and spanning the initiator/participant barrier.

Under the hood, OneFS collects the resources consumed, grouped into distinct workloads, and the aggregation of these workloads comprise a performance dataset.

Item                  Description                                                                                                                      Example
Workload              A set of identification metrics and the resources consumed                                                                      {username:nick, zone_name:System} consumed {cpu:1.5s, bytes_in:100K, bytes_out:50M, …}
Performance Dataset   The set of identification metrics to aggregate workloads by, plus the list of workloads collected matching that specification   {usernames, zone_names}
Filter                A method for including only workloads that match specific identification metrics                                                Filter {zone_name:System} matches {username:nick, zone_name:System} and {username:jane, zone_name:System}, but not {username:nick, zone_name:Perf}

The following metrics are tracked:

Category                 Items
Identification Metrics   Username / UID / SID; Primary Groupname / GID / GSID; Secondary Groupname / GID / GSID; Zone Name; Local/Remote IP Address/Range; Path*; Share / Export ID; Protocol; System Name*; Job Type
Transient Resources      CPU Usage; Bytes In/Out; IOPs; Disk Reads/Writes; L2/L3 Cache Hits
Performance Statistics   Read/Write/Other Latency
Supported Protocols      NFS; SMB; S3; Jobs; Background Services

With the exception of the system dataset, performance datasets must be configured before statistics are collected. This is typically performed via the ‘isi performance’ CLI command set, but can also be done via the platform API:

 https://<node_ip>:8080/platform/performance
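
For example, the performance namespace can be queried with an authenticated GET via curl (a sketch: substitute real credentials and a node IP; ‘-k’ skips certificate validation, so only use it against a trusted cluster, and the exact resource paths under this namespace vary by release):

# curl -k -u <username>:<password> https://<node_ip>:8080/platform/performance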

Once a performance dataset has been configured, it will continue to collect resource usage statistics every 30 seconds until it is deleted. These statistics can be viewed via the ‘isi statistics’ CLI interface.

This is as simple as, first, creating a dataset specifying which identification metrics you wish to partition work by:

# isi performance dataset create --name ds_test1 username protocol export_id share_name

Next, waiting 30 seconds for data to collect.

Finally, viewing the performance dataset:

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId  ShareName  WorkloadType

---------------------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1          -             -

  1.2ms    10.0K     20.0M  56.0   40.0     0.0  9.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3          -             -

  1.0ms    18.3M      17.0  10.0    0.0    47.0  0.0  0.0        0.0us       100.2us         0.0us      jane      smb2         -       home             -

 31.4us     15.1      11.7   0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us       nick      nfs4         4          -             -

166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -      Excluded

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -        System

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -          -       Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          -    Additional

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -          - Overaccounted

---------------------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Be aware that OneFS can only harvest a limited number of workloads per dataset. As such, it keeps track of the highest resource consumers, which it considers top workloads, and outputs as many of those as possible, up to a limit of either 1024 top workloads or 1.5MB memory per sample (whichever is reached first).

When you’re done with a performance dataset, it can be easily removed with the ‘isi performance dataset delete <dataset_name>’ syntax. For example:

# isi performance dataset delete ds_test1

Are you sure you want to delete the performance dataset.? (yes/[no]): yes

Deleted performance dataset 'ds_test1' with ID number 1.

Performance resource monitoring includes the five aggregate workloads below for special cases, and OneFS can output up to 1024 additional pinned workloads. Plus, a cluster administrator can configure up to four custom datasets to aggregate special case workloads.

Aggregate Workload   Description
Additional           Not in the ‘top’ workloads
Excluded             Does not match the dataset definition (i.e. missing a required metric such as export_id), or does not match any applied filter (see below)
Overaccounted        Total of work that has appeared in multiple workloads within the dataset; can happen for datasets using path and/or group metrics
System               Background system/kernel work
Unknown              OneFS could not determine where this work came from

The ‘system’ dataset is created by default and cannot be deleted or renamed. Plus, it is the only dataset that includes the OneFS services and job engine’s resource metrics:

OneFS Resource    Details
System services   Any process started by isi_mcp / isi_daemon; any protocol (likewise) service
Jobs              Includes job ID, job type, and phase

Non-protocol work only includes CPU, cache, and drive statistics, but no bandwidth usage, op counts, or latencies. Plus, protocols (e.g. S3) that haven’t yet been fully integrated into the resource monitoring framework also get only these basic statistics, and only in the system dataset.

If no dataset is specified in an ‘isi statistics workload’ command, the ‘system’ dataset statistics are displayed by default:

# isi statistics workload

Performance workloads can also be ‘pinned’, allowing a workload to be tracked even when it’s not a ‘top’ workload, and regardless of how many resources it consumes. This can be configured with the following CLI syntax:

# isi performance workloads pin <dataset_name/id> <metric>:<value>

Note that all metrics for the dataset must be specified. For example:

# isi performance workloads pin ds_test1 username:jane protocol:nfs3 export_id:3

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType

-----------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1              -

  1.2ms    10.0K     20.0M  56.0   40.0     0.0  0.0  0.0        0.0us         0.0us         0.0us      jane      nfs3         3         Pinned <- Always Visible

 31.4us     15.1      11.7   0.1    0.0     0.0  0.0  0.0      349.3us         0.0us         0.0us       jim      nfs4         4              -    Workload

166.3ms      0.0       0.0   0.0    0.0     0.1  0.0  0.0        0.0us         0.0us         0.0us         -         -         -       Excluded

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional <- Unpinned workloads

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted    that didn’t make it

-----------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Workload filters can also be configured to restrict output to just those workloads which match specified criteria. This allows for more finely-tuned tracking, which can be invaluable when dealing with a large number of workloads. Configuration is greedy, since a workload will be included if it matches any applied filter. Filtering can be implemented with the following CLI syntax:

# isi performance dataset create <all_metrics> --filters <filtered_metrics>

# isi performance filters apply <dataset_name/id> <metric>:<value>

For example:

# isi performance dataset create --name ds_test1 username protocol export_id --filters username,protocol

# isi performance filters apply ds_test1 username:nick protocol:nfs3

# isi statistics workload --dataset ds_test1

    CPU  BytesIn  BytesOut   Ops  Reads  Writes   L2   L3  ReadLatency  WriteLatency  OtherLatency  UserName  Protocol  ExportId   WorkloadType

-----------------------------------------------------------------------------------------------------------------------------------------------

 11.0ms     2.8M     887.4   5.5    0.0   393.7  0.3  0.0      503.0us       638.8us         7.4ms       nick      nfs3         1              - <- Matches Filter

 13.0ms     1.4M     600.4   2.5    0.0   200.7  0.0  0.0      405.0us       638.8us         8.2ms       nick      nfs3         7              - <- Matches Filter

167.5ms    10.0K     20.0M  56.1   40.0     0.1  0.0  0.0      349.3us         0.0us         0.0us         -         -         -       Excluded <- Aggregate of not

 31.6ms      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -         System    matching filter

 70.2us      0.0       0.0   0.0    0.0     3.3  0.1  0.0        0.0us         0.0us         0.0us         -         -         -        Unknown

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -     Additional

  0.0us      0.0       0.0   0.0    0.0     0.0  0.0  0.0        0.0us         0.0us         0.0us         -         -         -  Overaccounted

-----------------------------------------------------------------------------------------------------------------------------------------------

Total: 8

Be aware that the metric to filter on must be specified when creating the dataset. A dataset with a filtered metric, but no filters applied, will output an empty dataset.

As mentioned previously, NFS path tracking is the principal enhancement to performance resource monitoring in OneFS 9.3, and it can easily be enabled as follows (for both the NFS and SMB protocols):

# isi performance dataset create --name ds_test1 username path --filters=username,path

The statistics can then be viewed with the following CLI syntax (in this case for dataset ‘ds_test1’ with ID 1):

# isi statistics workload list --dataset 1

When it comes to path tracking, there are a couple of caveats and restrictions to consider. Firstly, with OneFS 9.3, it is available for the NFS and SMB protocols only. Also, path tracking is expensive, and OneFS cannot track every single path. The desired paths must be listed first, either by pinning a workload or by applying a path filter.
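
For example, continuing the ‘ds_test1’ example above, a specific (hypothetical) project path could be tracked by applying a path filter:

# isi performance filters apply ds_test1 username:nick path:/ifs/data/proj1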

If the resource overhead of path tracking is deemed too costly, consider a possible equivalent alternative such as tracking by NFS Export ID or SMB Share, as appropriate.

Users can have thousands of secondary groups, often simply too many to track. Only the primary group will be tracked until groups are specified, so any secondary groups to be tracked must be specified first, again either by applying a group filter or pinning a workload.

Note that some metrics only apply to certain protocols. For example, ‘export_id’ only applies to NFS. Similarly, ‘share_name’ only applies to SMB. Also, note that there is no equivalent metric (or path tracking) implemented for the S3 object protocol yet. This will be addressed in a future release.

These metrics can be used individually in a dataset or combined. For example, a dataset configured with both the ‘export_id’ and ‘share_name’ metrics will list both NFS and SMB workloads, with either ‘export_id’ or ‘share_name’ populated. Likewise, a dataset with only the ‘share_name’ metric will only list SMB workloads, and a dataset with just the ‘export_id’ metric will only list NFS workloads, excluding any SMB workloads. Any workload that is excluded is aggregated into the special ‘Excluded’ workload.

When viewing collected metrics, the ‘isi statistics workload’ CLI command displays only the last sample period in table format with the statistics normalized to a per-second granularity:

# isi statistics workload [--dataset <dataset_name/id>]

Names are resolved wherever possible, such as UID to username, IP address to hostname, etc, and lookup failures are reported via an extra ‘error’ column. Alternatively, the ‘--numeric’ flag can also be included to prevent any lookups:

# isi statistics workload [--dataset <dataset_name/id>] --numeric

Aggregated cluster statistics are reported by default, but adding the ‘--nodes’ argument will provide per node statistics as viewed from each initiator. More specifically, the ‘--nodes=0’ flag will display the local node only, whereas ‘--nodes=all’ will report all nodes.

The ‘isi statistics workload’ command also includes standard statistics output formatting flags, such as ‘--sort’, ‘--totalby’, etc.

    --sort (CPU | (BytesIn|bytes_in) | (BytesOut|bytes_out) | Ops | Reads |

      Writes | L2 | L3 | (ReadLatency|latency_read) |

      (WriteLatency|latency_write) | (OtherLatency|latency_other) | Node |

      (UserId|user_id) | (UserSId|user_sid) | UserName | (Proto|protocol) |

      (ShareName|share_name) | (JobType|job_type) | (RemoteAddr|remote_address)

      | (RemoteName|remote_name) | (GroupId|group_id) | (GroupSId|group_sid) |

      GroupName | (DomainId|domain_id) | Path | (ZoneId|zone_id) |

      (ZoneName|zone_name) | (ExportId|export_id) | (SystemName|system_name) |

      (LocalAddr|local_address) | (LocalName|local_name) |

      (WorkloadType|workload_type) | Error)

        Sort data by the specified comma-separated field(s). Prepend 'asc:' or

        'desc:' to a field to change the sort order.

    --totalby (Node | (UserId|user_id) | (UserSId|user_sid) | UserName |

      (Proto|protocol) | (ShareName|share_name) | (JobType|job_type) |

      (RemoteAddr|remote_address) | (RemoteName|remote_name) |

      (GroupId|group_id) | (GroupSId|group_sid) | GroupName |

      (DomainId|domain_id) | Path | (ZoneId|zone_id) | (ZoneName|zone_name) |

      (ExportId|export_id) | (SystemName|system_name) |

      (LocalAddr|local_address) | (LocalName|local_name))

        Aggregate per the specified field(s).
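
For example, to rank a dataset’s workloads by CPU consumption, or to aggregate them per user (a sketch reusing the ‘ds_test1’ dataset from earlier):

# isi statistics workload --dataset ds_test1 --sort=desc:CPU

# isi statistics workload --dataset ds_test1 --totalby=UserName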

In addition to the CLI, the same output can be accessed via the platform API:

# https://<node_ip>:8080/platform/statistics/summary/workload

Raw metrics can also be viewed using the ‘isi statistics query’ command:

# isi statistics query <current|history> --format=json --keys=<node|cluster>.performance.dataset.<dataset_id>

Options are either ‘current’, which provides the most recent sample, or ‘history’, which reports the samples collected over the last five minutes. Names are not looked up and statistics are not normalized; the sample period is included in the output. The command syntax must also include the ‘--format=json’ flag, since no other output formats are currently supported. Additionally, there are two distinct types of keys:

Key Type Description
cluster.performance.dataset.<dataset_id> Cluster-wide aggregated statistics
node.performance.dataset.<dataset_id> Per node statistics from the initiator perspective
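
For example, a current sample of the cluster-wide aggregated statistics for dataset ID 1 (the ‘ds_test1’ dataset above) can be pulled as follows:

# isi statistics query current --format=json --keys=cluster.performance.dataset.1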

Similarly, these raw metrics can also be obtained via the platform API as follows:

https://<node_ip>:8080/platform/statistics/<current|history>?keys=node.performance.dataset.0

OneFS Writable Snapshots Coexistence and Caveats

In the final article in this series, we’ll take a look at how writable snapshots co-exist in OneFS, and their integration and compatibility with the various OneFS data services.

Starting with OneFS itself, support for writable snaps was introduced in OneFS 9.3, and the functionality is enabled after committing an upgrade to OneFS 9.3. Non-disruptive upgrade to OneFS 9.3 and later releases is fully supported. However, as we’ve seen over this series of articles, writable snaps in 9.3 do have several caveats and recommended practices. These include observing the default OneFS limit of 30 active writable snapshots per cluster (or, at least, not attempting to delete more than 30 writable snapshots at any one time if the max_active_wsnaps limit has been increased for some reason).

There are also certain restrictions governing where a writable snapshot’s mount point can reside in the file system: it cannot be placed at an existing directory, below a source snapshot path, or under a SmartLock or SyncIQ domain. Also, while the contents of a writable snapshot retain the permissions they had in the source, ensure the parent directory tree has appropriate access permissions for the users of the writable snapshot.

The OneFS job engine and restriping jobs also support writable snaps and, in general, most jobs can be run from inside a writable snapshot’s path. However, be aware that jobs involving tree-walks will not perform copy-on-read for LINs under writable snapshots.

The PermissionsRepair job is unable to fix files under a writable snapshot which have not yet been copied-on-read. To prevent this, prior to starting a PermissionsRepair job, the ‘find’ CLI command (which searches for files in a directory hierarchy) can be run on the writable snapshot’s root directory in order to populate the writable snapshot’s namespace.
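
For example, a simple tree-walk is enough to trigger the copy-on-read and populate the namespace before PermissionsRepair is run (the writable snapshot path below is illustrative):

# find /ifs/wsnap1 -ls > /dev/null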

The TreeDelete job works for subdirectories under a writable snapshot. TreeDelete, run on or above a writable snapshot, will not remove the root, or head, directory of the writable snapshot (unless it is scheduled through the writable snapshot library).

The ChangeList, FileSystemAnalyze, and IndexUpdate jobs are unable to see files in a writable snapshot. As such, the FilePolicy job, which relies on IndexUpdate, cannot manage files in a writable snapshot.

Writable snapshots also work as expected with OneFS access zones. For example, a writable snapshot can be created in a different access zone than its source snapshot:

# isi zone zones list

Name     Path

------------------------

System   /ifs

zone1    /ifs/data/zone1

zone2    /ifs/data/zone2

------------------------

Total: 2

# isi snapshot snapshots list

118224 s118224              /ifs/data/zone1

# isi snapshot writable create s118224 /ifs/data/zone2/wsnap1

# isi snapshot writable list

Path                   Src Path          Src Snapshot

------------------------------------------------------

/ifs/data/zone2/wsnap1 /ifs/data/zone1   s118224

------------------------------------------------------

Total: 1

Writable snaps are supported on any cluster architecture that’s running OneFS 9.3, and this includes clusters using data encryption with SED drives, which are also fully compatible with writable snaps. Similarly, InsightIQ and DataIQ both support and accurately report on writable snapshots, as expected.

Writable snaps are also compatible with SmartQuotas, and use directory quota capacity reporting to track both physical and logical space utilization. This can be viewed using the ‘isi quota quotas list/view’ CLI commands, in addition to the ‘isi snapshot writable view’ command.

Regarding data tiering, writable snaps co-exist with SmartPools, and configuring SmartPools above writable snapshots is supported. However, in OneFS 9.3, SmartPools filepool tiering policies will not apply to a writable snapshot path. Instead, the writable snapshot data will follow the tiering policies which apply to the source of the writable snapshot. Also, SmartPools is frequently used to house snapshots on a lower performance, capacity optimized tier of storage. In this case, the performance of a writable snap that has its source snapshot housed on a slower pool will likely be negatively impacted. Also, be aware that CloudPools is incompatible with writable snaps in OneFS 9.3 and CloudPools on a writable snapshot destination is currently not supported.

On the data immutability front, a SmartLock WORM domain cannot be created at or above a writable snapshot under OneFS 9.3. Attempts will fail with the following message:

# isi snapshot writable list

Path                  Src Path          Src Snapshot

-----------------------------------------------------

/ifs/test/rw-head     /ifs/test/head1   s159776

-----------------------------------------------------

Total: 1

# isi worm domain create -d forever /ifs/test/rw-head/worm

Are you sure? (yes/[no]): yes

Failed to enable SmartLock: Operation not supported

Creating a writable snapshot inside a directory with a WORM domain is also not permitted.

# isi worm domains list

ID      Path           Type

---------------------------------

2228992 /ifs/test/worm enterprise

---------------------------------

Total: 1

# isi snapshot writable create s32106 /ifs/test/worm/wsnap

Writable Snapshot cannot be nested under WORM domain 22.0300: Operation not supported

Regarding data reduction and storage efficiency, the writable snapshots story in OneFS 9.3 is as follows: OneFS in-line compression works with writable snapshot data, but in-line deduplication is not supported, and existing files under writable snapshots will be ignored by in-line dedupe. However, in-line dedupe can occur on any new files created fresh on the writable snapshot.

Post-process deduplication of writable snapshot data is not supported and the SmartDedupe job will ignore the files under writable snapshots.

Similarly, at the per-file level, attempts to clone data within a writable snapshot (cp -c) are also not permitted and will fail with the following error:

# isi snapshot writable list

Path                  Src Path          Src Snapshot

-----------------------------------------------------

/ifs/wsnap1           /ifs/test1        s32106

-----------------------------------------------------

Total: 1

# cp -c /ifs/wsnap1/file1 /ifs/wsnap1/file1.clone

cp: file1.clone: cannot clone from 1:83e1:002b::HEAD to 2:705c:0053: Invalid argument

Additionally, in a small file packing archive workload the files under a writable snapshot will be ignored by the OneFS small file storage efficiency (SFSE) process, and there is also currently no support for data inode inlining within a writable snapshot domain.

Turning attention to data availability and protection, there are also some writable snapshot caveats in OneFS 9.3 to bear in mind. Regarding SnapshotIQ:

Writable snaps cannot be created from a source snapshot of the /ifs root directory. They also cannot currently be locked or changed to read-only. However, the read-only source snapshot will be locked for the entire life cycle of a writable snapshot.

Writable snaps cannot be refreshed from a newer read-only source snapshot. However, a new writable snapshot can be created from a more current source snapshot in order to include subsequent updates to the replicated production dataset. Taking a read-only snapshot of a writable snap is also not permitted and will fail with the following error message:

# isi snapshot snapshots create /ifs/wsnap2

snapshot create failed: Operation not supported

Writable snapshots cannot be nested in the namespace under other writable snapshots, and such operations will return ENOTSUP.

Only IFS domains-based snapshots are permitted as the source of a writable snapshot. This means that any snapshots taken on a cluster prior to OneFS 8.2 cannot be used as the source for a writable snapshot.

Snapshot aliases cannot be used as the source of a writable snapshot, even if using the alias target ID instead of the alias target name. The full name of the snapshot must be specified.

# isi snapshot snapshots view snapalias1

               ID: 134340

             Name: snapalias1

             Path: /ifs/test/rwsnap2

        Has Locks: Yes

         Schedule: -

  Alias Target ID: 106976

Alias Target Name: s106976

          Created: 2021-08-16T22:18:40

          Expires: -

             Size: 90.00k

     Shadow Bytes: 0.00

        % Reserve: 0.00%

     % Filesystem: 0.00%

            State: active

# isi snapshot writable create 134340 /ifs/testwsnap1

Source SnapID(134340) is an alias: Operation not supported

The creation of SnapRevert domain is not permitted at or above a writable snapshot. Similarly, the creation of a writable snapshot inside a directory with a SnapRevert domain is not supported. Such operations will return ENOTSUP.

Finally, the SnapshotDelete job has no interaction with writable snaps; the TreeDelete job handles writable snapshot deletion instead.

Regarding NDMP backups, since NDMP uses read-only snapshots for checkpointing, it is unable to back up writable snapshot data in OneFS 9.3.

Moving on to replication, SyncIQ is unable to copy or replicate the data within a writable snapshot in OneFS 9.3. More specifically:

Replication Condition Description
Writable snapshot as SyncIQ source Replication fails because snapshot creation on the source writable snapshot is not permitted.
Writable snapshot as SyncIQ target Replication job fails as snapshot creation on the target writable snapshot is not supported.
Writable snapshot one or more levels below in SyncIQ source Data under a writable snapshot will not get replicated to the target cluster. However, the rest of the source will get replicated as expected
Writable snapshot one or more levels below in SyncIQ target If the state of a writable snapshot is ACTIVE, the writable snapshot root directory will not get deleted from the target, so replication will fail.

Attempts to replicate the files within a writable snapshot will fail with the following SyncIQ job error:

“SyncIQ failed to take a snapshot on source cluster. Snapshot initialization error: snapshot create failed. Operation not supported.”

Since SyncIQ does not allow its snapshots to be locked, OneFS cannot create writable snapshots based on SyncIQ-generated snapshots. This includes all read-only snapshots with a ‘SIQ-*’ naming prefix. Any attempts to use snapshots with an SIQ* prefix will fail with the following error:

# isi snapshot writable create SIQ-4b9c0e85e99e4bcfbcf2cf30a3381117-latest /ifs/rwsnap

Source SnapID(62356) is a SyncIQ related snapshot: Invalid argument

A common use case for writable snapshots is in disaster recovery testing. For DR purposes, an enterprise typically has two PowerScale clusters configured in a source/target SyncIQ replication relationship. Many organizations have a requirement to conduct periodic DR tests to verify the functionality of their processes and tools in the event of a business continuity interruption or disaster recovery event.

Given the writable snapshot and SyncIQ compatibility caveats described above, a writable snapshot of a production dataset replicated to a target DR cluster can be created as follows:

  1. On the source cluster, create a SyncIQ policy to replicate the source directory (/ifs/prod/head) to the target cluster:
# isi sync policies create --name=ro-head sync --source-rootpath=/ifs/prod/head --target-host=10.224.127.5 --targetpath=/ifs/test/ro-head

# isi sync policies list

Name  Path              Action  Enabled  Target

---------------------------------------------------

ro-head /ifs/prod/head  sync    Yes      10.224.127.5

---------------------------------------------------

Total: 1
  2. Run the SyncIQ policy to replicate the source directory to /ifs/test/ro-head on the target cluster:
# isi sync jobs start ro-head --source-snapshot s14


# isi sync jobs list

Policy Name  ID   State   Action  Duration

-------------------------------------------

ro-head        1    running run     22s

-------------------------------------------

Total: 1


# isi sync jobs view ro-head

Policy Name: ro-head

         ID: 1

      State: running

     Action: run

   Duration: 47s

 Start Time: 2021-06-22T20:30:53


Target:
  3. Take a read-only snapshot of the replicated dataset on the target cluster:
# isi snapshot snapshots create /ifs/test/ro-head

# isi snapshot snapshots list

ID   Name                                        Path

-----------------------------------------------------------------

2    SIQ_HAL_ro-head_2021-07-22_20-23-initial /ifs/test/ro-head

3    SIQ_HAL_ro-head                          /ifs/test/ro-head

5    SIQ_HAL_ro-head_2021-07-22_20-25         /ifs/test/ro-head

8    SIQ-Failover-ro-head-2021-07-22_20-26-17 /ifs/test/ro-head

9    s106976                                   /ifs/test/ro-head

-----------------------------------------------------------------
  4. Using the non-SIQ snapshot of the replicated dataset above (s106976) as the source, create a writable snapshot on the target cluster at /ifs/test/head:
# isi snapshot writable create s106976 /ifs/test/head
  5. Confirm the writable snapshot has been created on the target cluster:
# isi snapshot writable list

Path              Src Path         Src Snapshot

----------------------------------------------------------------------

/ifs/test/head  /ifs/test/ro-head  s106976

----------------------------------------------------------------------

Total: 1




# du -sh /ifs/test/ro-head

 21M    /ifs/test/ro-head
  6. Export and/or share the writable snapshot data under /ifs/test/head on the target cluster using the protocol(s) of choice (a hedged sharing sketch follows this procedure), then mount the export or share on the client systems and perform DR testing and verification as appropriate.
  7. When DR testing is complete, delete the writable snapshot on the target cluster:
# isi snapshot writable delete /ifs/test/head
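For step 6 above, as an illustrative sketch only, the writable snapshot path could be shared over SMB or exported over NFS roughly as follows. The share name is hypothetical, and exact option syntax can vary by OneFS release, so consult the CLI reference for your version:

# isi smb shares create drtest /ifs/test/head

# isi nfs exports create /ifs/test/head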

Note that writable snapshots cannot be refreshed from a newer read-only source snapshot. A new writable snapshot would need to be created from the newer source snapshot in order to reflect any subsequent updates to the production dataset on the target cluster.
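Since there is no in-place refresh, a hedged sketch of that workflow on the target cluster would be: delete the existing writable snapshot, take (or identify) a more recent read-only snapshot of the replicated path, and create a fresh writable snapshot from it. The <new-snapshot-name> placeholder below stands for whatever name the new snapshot receives:

# isi snapshot writable delete /ifs/test/head

# isi snapshot snapshots create /ifs/test/ro-head

# isi snapshot snapshots list | grep ro-head

# isi snapshot writable create <new-snapshot-name> /ifs/test/head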

So there you have it: the introduction of writable snapshots v1 in OneFS 9.3 delivers the much-anticipated ability to create fast, simple, and efficient copies of datasets by presenting a writable view of a regular snapshot at a target directory, accessible to clients across the full range of supported NAS protocols.