Nick Trimbee – Unstructured Data Quick Tips

OneFS HTTPS Support for CAVA Antivirus

OneFS 9.14 introduced HTTPS support for the Common AntiVirus Agent (CAVA), enhancing the existing scanning solution with secure communication between the PowerScale cluster and Dell Common Event Enabler (CEE) server and enabling operation in hardening mode.

CAVA is a service that runs on the CEE server, often referred to as CEE‑CAVA. It receives file access requests from the CAVA client interface on OneFS (also known as OneFS‑CAVA) and attempts to access the specified file over SMB. During this process, the antivirus engine residing on the CEE server scans the file. If CEE‑CAVA can successfully access and scan the file without detecting any threats, it returns a response indicating that the file is not infected. Conversely, if a threat is identified or access conditions indicate an issue, it responds with a message indicating that the file is infected.

Prior to OneFS 9.14, communication between the OneFS‑CAVA agent and the CEE‑CAVA service was conducted over HTTP, which did not provide secure transport. As a result, CAVA was automatically disabled on clusters operating in hardening mode. However, customers have expressed a strong requirement to enable CAVA while maintaining compliance with hardening mode security standards.

CAVA HTTPS support requires CEE version 9.1.2 or later in conjunction with OneFS 9.14 or newer, while existing antivirus licensing continues to be sufficient.

Configuring and enabling CAVA with HTTPS entails the following high-level process:

Note that the CAVA over HTTPS capability only becomes available after the upgrade to OneFS 9.14 has been fully committed. Additionally, upgrading to OneFS 9.14 does not impact existing legacy scanning workflows that continue to use standard HTTP transport.

Setup CA certificates

From the OneFS CLI, import the CA certificate(s), specifying the full path to each certificate file on the cluster:

# isi antivirus cava certificates ca import --name=ca_unit --certificate-path=/ifs/certs/ca.pem

Imported certificate: 4c6d3f0d1c3128ed09e02a78a1d9e3104edbfeb9c327c5e735afb3b48cd5bdbc

Once done, verify that the CA certificate was imported successfully:

# isi antivirus cava certificates ca list

ID Name Not Before Not After Status

--------------------------------------------------------------------------------------------------------------------------

4c6d3f0d1c3128ed09e02a78a1d9e3104edbfeb9c327c5e735afb3b48cd5bdbc ca_unit 2025-08-28T07:39:22 2125-08-04T07:39:22 valid

--------------------------------------------------------------------------------------------------------------------------

Total: 1

# isi antivirus cava certificates ca view 4c6d3f0d1c3128ed09e02a78a1d9e3104edbfeb9c327c5e735afb3b48cd5bdbc

ID: 4c6d3f0d1c3128ed09e02a78a1d9e3104edbfeb9c327c5e735afb3b48cd5bdbc

Name: ca_unit

Status: valid

Not Before: 2025-08-28T07:39:22

Not After: 2125-08-04T07:39:22

Description:

Fingerprints

Type: SHA1

Value: f1:0e:34:d0:0c:65:99:b8:86:58:f3:b3:36:21:9a:64:7a:e6:b5:03

Type: SHA256

Value: 4c:6d:3f:0d:1c:31:28:ed:09:e0:2a:78:a1:d9:e3:10:4e:db:fe:b9:c3:27:c5:e7:35:af:b3:b4:8c:d5:bd:bc

Subject: C=IN, ST=DCU, L=Gotham, O=Wayne Enterprises, OU=Batman, CN=batmanCA, emailAddress=batmanCA@bat.man

Issuer: C=IN, ST=DCU, L=Gotham, O=Wayne Enterprises, OU=Batman, CN=batmanCA, emailAddress=batmanCA@bat.man
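In the sample output above, the certificate ID matches the SHA256 fingerprint with the colons removed. The following Python sketch reproduces that relationship offline. It assumes the ID is the SHA-256 digest of the certificate's DER encoding, which is the usual fingerprint convention but is an inference from the sample output rather than documented OneFS behavior. The function names are illustrative only.

```python
import hashlib
import ssl


def fingerprint_to_id(fingerprint: str) -> str:
    """Convert a colon-separated fingerprint into the compact hex form."""
    return fingerprint.replace(":", "").lower()


def sha256_id_from_pem(pem_text: str) -> str:
    """Return the SHA-256 hex digest of a PEM certificate's DER bytes.

    Assumption: the ID shown by 'isi antivirus cava certificates ca list'
    is this digest, as the sample output suggests.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


# Values copied from the 'ca view' output above.
listed_id = "4c6d3f0d1c3128ed09e02a78a1d9e3104edbfeb9c327c5e735afb3b48cd5bdbc"
sha256_fp = ("4c:6d:3f:0d:1c:31:28:ed:09:e0:2a:78:a1:d9:e3:10:"
             "4e:db:fe:b9:c3:27:c5:e7:35:af:b3:b4:8c:d5:bd:bc")
print(fingerprint_to_id(sha256_fp) == listed_id)
```

Running sha256_id_from_pem() against the contents of the CA file before import gives a value that can be compared with the ID reported by the cluster.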

Or from the WebUI under Data Protection > Antivirus > CAVA, where OneFS 9.14 introduces a new Certificates section which includes three tabs: Authority, Identity, and Settings:

Users with the ‘ISI_PRIV_ANTIVIRUS’ RBAC privilege can manage CA and identity certificates, including performing operations such as ‘import’, ‘edit’, ‘replace’, and ‘delete’.

Import the certificate authority (CA) under Data Protection > Antivirus > CAVA > Certificates > Authority:

Setup Identity Certificates

Next, import the identity certificate(s) as follows, specifying the full path to the cert and key files:

# isi antivirus cava certificates identity import --name=onefs_cava --certificate-path=/ifs/certs/signed_id_cert.pem --certificate-key-path=/ifs/certs/id_cert.key --skip-certificate-passphrase

Imported certificate: 70b6734399785ff5d512a789836bdf45e4ee35cf464e6da36d16cda5db6ef353

Once complete, verify that the identity certificate was imported successfully:

# isi antivirus cava certificates identity list

ID Name Not Before Not After Status

---------------------------------------------------------------------------------------------------------------------------

70b6734399785ff5d512a789836bdf45e4ee35cf464e6da36d16cda5db6ef353 onefs_cava 2026-01-21T13:51:12 2125-12-28T13:51:12 valid

---------------------------------------------------------------------------------------------------------------------------

Total: 1

# isi antivirus cava certificates identity view 70b6734399785ff5d512a789836bdf45e4ee35cf464e6da36d16cda5db6ef353

ID: 70b6734399785ff5d512a789836bdf45e4ee35cf464e6da36d16cda5db6ef353

Name: onefs_cava

Status: valid

Not Before: 2026-01-21T13:51:12

Not After: 2125-12-28T13:51:12

Description:

Fingerprints

Type: SHA1

Value: 79:68:ab:a2:5d:95:31:53:6c:90:a7:5c:99:d5:f7:95:a7:db:99:22

Type: SHA256

Value: 70:b6:73:43:99:78:5f:f5:d5:12:a7:89:83:6b:df:45:e4:ee:35:cf:46:4e:6d:a3:6d:16:cd:a5:db:6e:f3:53

Subject: C=IN, ST=DCU, L=Gotham, O=Wayne, OU=Batman, CN=robinOnefs, emailAddress=robinOnefs@bat.man

Issuer: C=IN, ST=DCU, L=Gotham, O=Wayne Enterprises, OU=Batman, CN=batmanCA, emailAddress=batmanCA@bat.man

Or from the WebUI under Data Protection > Antivirus > CAVA > Certificates > Identity:

Add CAVA server

Next, create the CAVA server configuration on the cluster:

# isi antivirus cava servers create CAVA1 https://10.10.20.50:12443/cee --enabled 1

Once done, verify that the CAVA server was added successfully:

# isi antivirus cava servers list

Server Name Server URI Enabled Server Type

------------------------------------------------------------------

CAVA1 https://10.10.20.50:12443/cee Yes CEE/CAVA

------------------------------------------------------------------

Total: 1



# isi antivirus cava servers view CAVA1

Server Name: CAVA1

Server URI: https://10.10.20.50:12443/cee

Enabled: Yes

Server Type: CEE/CAVA

Or from the WebUI under Data Protection > Antivirus > CAVA > Servers:

Note that, when adding the server, the URI must specify the https:// protocol and include the port number.
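As a small illustration of the URI requirement, the following Python sketch checks a candidate server URI before it is passed to the CLI. The helper name is hypothetical, and the checks cover only the two points called out in the text (HTTPS scheme and explicit port) plus the presence of a host.

```python
from urllib.parse import urlsplit


def check_cava_https_uri(uri: str) -> list:
    """Return a list of problems found in a CAVA server URI (empty if none)."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    try:
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        port = None
    if port is None:
        problems.append("an explicit port number is required")
    if not parts.hostname:
        problems.append("a host name or IP address is required")
    return problems


print(check_cava_https_uri("https://10.10.20.50:12443/cee"))
print(check_cava_https_uri("http://10.10.20.50/cee"))
```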

If needed, a CAVA server can also be removed with the following CLI syntax:

# isi antivirus cava servers delete CAVA1

Configure identity certificate ID and Enable TLS in CAVA global settings

The ‘--enforce-tls’ global configuration parameter is set to ‘false’ by default, permitting CAVA to use regular HTTP transport. By setting this flag to ‘true’, CAVA over HTTPS communication is enabled.

From the CLI:

# isi antivirus cava settings modify --enforce-tls=true --certificate-id=7ba9188a5f6bb120c8ea3e499b662b38319587804492cf444df02b33b1d4a86c

# isi antivirus cava settings view

Service Enabled: Yes

Scan Access Zones: System

IP Pool: groupnet0.subnet0.cavapool

Report Expiry: 8 weeks, 4 days

Scan Timeout: 1 minute

Cloudpool Scan Timeout: 1 minute

Maximum Scan Size: 0.00kB

Certificate ID: 7ba9188a5f6bb120c8ea3e499b662b38319587804492cf444df02b33b1d4a86c

Enforce TLS: Yes

Note that omitting the client certificate ID will result in the following error:

# isi antivirus cava settings modify --enforce-tls=true

TLS cannot be enabled as no client certificate ID is configured.
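The dependency between the two settings can be expressed as a simple pre-check, for example in a script that prepares the 'settings modify' command. The Python sketch below is a hypothetical helper, not OneFS code. The first error string is taken from the CLI output above, and the 64-character hex check is an assumption based on the sample certificate IDs shown in this article.

```python
import re
from typing import Optional

# Assumption: certificate IDs are 64 lowercase hex characters, as in the samples.
_CERT_ID = re.compile(r"^[0-9a-f]{64}$")


def tls_precheck(enforce_tls: bool, certificate_id: Optional[str]) -> Optional[str]:
    """Return an error message if the requested settings are inconsistent."""
    if not enforce_tls:
        return None  # plain HTTP transport needs no identity certificate
    if not certificate_id:
        return "TLS cannot be enabled as no client certificate ID is configured."
    if not _CERT_ID.match(certificate_id):
        return "certificate ID does not look like a 64-character hex string"
    return None


print(tls_precheck(True, None))
```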

This can also be configured from the WebUI under Data Protection > Antivirus > CAVA > Settings. TLS enforcement is disabled by default, and an identity certificate must be selected before it can be activated:

Note that attempts to activate TLS enforcement on a cluster without any identity certificates will generate the following pop-up warning:

This warning includes a link, which opens the identity certificate import and configuration portal.

Optionally add an OCSP server URI and set revocation mode:

# isi antivirus cava certificates settings view

OCSP Responder URI:

OCSP Revocation Mode: None

Strict Hostname Check: No

# isi antivirus cava certificates settings modify --ocsp-responder-uri=http://10.1.100.20 --ocsp-revocation-mode=Strict

CEE configuration

On the common event enabler (CEE) server side, there are a few prerequisites and registry configuration changes that are required to support CAVA over HTTPS.

a. First, the server needs to be running CEE version 9.2.1.0 or later.

b. The CEE port configuration needs to be set to use HTTPS (tcp/12443) in the server’s Windows Registry. This can be achieved by running the following command on the server:

reg add HKEY_LOCAL_MACHINE\SOFTWARE\EMC\CEE\Configuration\Security\Https /v ServerEnabled /t REG_DWORD /d 1 /f

c. Upload and configure the following CA and identity certificates on the CEE server.

CA certificates:

Identity Certificates

These certificates should be copied from the PowerScale cluster to a configured path on the CEE server, for example C:\CEE_CERTS\ as in the configuration example below:

First, configure the paths for CEE identity certificate and key in the CEE server’s registry:

reg add HKEY_LOCAL_MACHINE\SOFTWARE\EMC\CEE\Configuration\Security\Https /v PrivateKey /t REG_SZ /d "C:\CEE_CERTS\cee_id_cert.key" /f

reg add HKEY_LOCAL_MACHINE\SOFTWARE\EMC\CEE\Configuration\Security\Https /v Certificate /t REG_SZ /d "C:\CEE_CERTS\cee_signed_id_cert.pem" /f

Next, install the CA certificate into the Windows certificate store:

> certutil -addstore "Root" "C:\CEE_CERTS\ca.pem"

d. Finally, confirm the CEE registry settings have been applied successfully from the Windows registry editor (regedit):

Verify CAVA status.

Once all the configuration steps above are completed, check the CAVA status and confirm ‘Good Heartbeats’ are being reported:

# isi antivirus cava status

System Status: RUNNING

Fault Message: -

CEE Version: 9.2.0.0

DTD Version: 2.5.3

AV Vendor: MS Forefront

Last Signature Update: Fri Feb 13 02:10:19 2026


# isi antivirus cava status --servers

Server Name   Server State Good Heartbeats     Heartbeat RTT Scan RTT     Scan Requests Connections

-------------------------------------------------------------------------------------

TLS           Active        235                  14.0ms        0.0ms         0             6

--------------------------------------------------------------------------------------


# /usr/likewise/bin/lw-av active-servers

Active Anti-virus Servers:

Number of servers: 1

Server Name: CAVA1

Server Type: CAVA

Server Enabled: Yes

Server State: Active

Server URI: https://cava.lab.com:12443/cee

Good Heartbeats: 235

Heartbeat Round-Trip-Time (ms): 14

Number of Scan Requests: 0

Average Scan Round-Trip-Time (ms): 0

Target Connection Count: 6

Connection Count from Node: 6
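For scripted health checks, the key/value output above is straightforward to parse. The following Python sketch is one possible approach. It assumes the output format shown here, handles a single server block only, and uses hypothetical function names.

```python
def parse_key_values(text: str) -> dict:
    """Parse 'Key: Value' lines, skipping headings that have no value."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and value.strip():
            result[key.strip()] = value.strip()
    return result


def server_is_healthy(info: dict) -> bool:
    """True if the server is enabled, active, on HTTPS, and has heartbeats."""
    return (
        info.get("Server Enabled") == "Yes"
        and info.get("Server State") == "Active"
        and info.get("Server URI", "").startswith("https://")
        and int(info.get("Good Heartbeats", "0")) > 0
    )


sample = """Active Anti-virus Servers:
Number of servers: 1
Server Name: CAVA1
Server Enabled: Yes
Server State: Active
Server URI: https://cava.lab.com:12443/cee
Good Heartbeats: 235
"""
print(server_is_healthy(parse_key_values(sample)))
```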

If needed, CAVA HTTPS enforcement can also be easily disabled from the CLI via the following global settings configuration change:

# isi antivirus cava settings modify --enforce-tls=false

In addition to WebUI and CLI configuration options, OneFS 9.14 also sees the introduction and modification of the following CAVA platform API (pAPI) endpoints related to HTTPS support:

Function	API Endpoint	Status
CA certificates	/25/avscan/cava/certificates/ca /25/avscan/cava/certificates/ca/<ID>	New
Identity certificates	/25/avscan/cava/certificates/identity /25/avscan/cava/certificates/identity/<ID>	New
Certificate settings	/25/avscan/cava/certificates/settings	New
CAVA global settings	/25/avscan/cava/settings	Modified
CAVA server management	/25/avscan/servers	Modified
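When scripting against these endpoints, the request URL is typically built from the cluster address and the endpoint path. The Python sketch below only constructs URLs and issues no requests. It assumes the platform API is reached at port 8080 under the /platform prefix, which is the usual OneFS convention but should be confirmed for a given cluster. The dictionary keys and function name are illustrative.

```python
from typing import Optional
from urllib.parse import quote

# Endpoint paths as listed in the table above.
CAVA_HTTPS_ENDPOINTS = {
    "ca_certificates": "/25/avscan/cava/certificates/ca",
    "identity_certificates": "/25/avscan/cava/certificates/identity",
    "certificate_settings": "/25/avscan/cava/certificates/settings",
    "cava_settings": "/25/avscan/cava/settings",
    "cava_servers": "/25/avscan/servers",
}


def papi_url(cluster: str, name: str, resource_id: Optional[str] = None,
             port: int = 8080) -> str:
    """Build a platform API URL for one of the CAVA HTTPS endpoints."""
    path = CAVA_HTTPS_ENDPOINTS[name]
    if resource_id is not None:
        path = f"{path}/{quote(resource_id, safe='')}"
    return f"https://{cluster}:{port}/platform{path}"


print(papi_url("cluster.example.com", "cava_settings"))
```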

OneFS CELOG Bulk Event Resolution

Another feature introduced in the OneFS 9.14 release is CELOG Bulk Event Resolution. But before we get into the details, first, a quick refresher. The OneFS Cluster Event Log (or CELOG) provides a single source for the logging of events that occur on a PowerScale cluster. Events are used to communicate a picture of cluster health for various components. CELOG provides a single point from which notifications about the events are generated, including sending alert emails and SNMP traps.

Cluster events can be easily viewed from the WebUI by browsing to Cluster Management > Events and Alerts > Events group history. For example:

Or from the CLI, using the ‘isi event events view’ syntax:

# isi event events view 2.370158

           ID: 2.370158

Eventgroup ID: 271428

   Event Type: 600010001

      Message: The snapshot daemon failed to create snapshot 'Hourly - prod' in schedule 'Hourly @ Every Day': error: Name collision

        Devid: 2

          Lnn: 2

         Time: 2026-05-08T17:01:33

     Severity: warning

        Value: 0.0

In the above instance, CELOG communicates on behalf of SnapshotIQ that it’s failed to create a scheduled hourly snapshot because of an issue with the naming convention.

At a high level, processes that monitor conditions on the cluster or log important events during the course of their operation communicate directly with the CELOG system. CELOG receives event messages from other processes via a well-defined API.

A CELOG event often contains the following elements:

Element	Definition
Event	Events are generated by the system and may be communicated in various ways (email, snmp traps, etc), depending upon the configuration.
Specifier	Specifiers are strings containing extra information, which can be used to coalesce events and construct meaningful, readable messages.
Attachment	Extra chunks of information, such as parts of log files or sysctl output, added to email notifications to provide additional context about an event.

For example, in the SnapshotIQ event above, we can see the event text contains a specifier and attachment that have been mostly derived from the corresponding syslog message:

# grep "Hourly - prod" /var/log/messages* | grep "2026-05-08T17:01:33"

2026-05-08T17:01:33-04:00 <3.3> a200-2 isi_snapshot_d[5631]: create_schedule_snapshot: snapshot schedule (Hourly @ Every Day) pattern created a snapshot name collision (Hourly - prod); scheduled create failed.

CELOG is a large, complex system, which can be envisioned as a large pipeline. It gathers events and statistics info on one end from isi_stats_d and isi_celog_monitor, plus directly from other applications such as SmartQuotas, SyncIQ, etc. These events are passed from one functional block to another, with a database at the end of the pipe. Along the way, attachments may be generated, notifications sent, and events passed to a coalescer.

On the front end, there are two dispatchers, which pass communication from the UNIX socket and network to their corresponding handlers. As events are processed, they pass through a series of coalescers. At any point they may be intercepted by the appropriate coalescer, which creates a coalescing event and which will accept other related events.

As events drop out the bottom of the coalescer stack, they’re deposited in add, modify and delete queues in the backend database infrastructure. The coalescer thread then moves onto pushing things into the local database, forwarding them along to the master coalescer, and queueing events to have notifications sent and/or attachments generated.

The processes of safely storing events, analyzing them, deciding on what alerts to send, and sending them are separated into four modules within the pipeline:

The following table provides a description of each of these CELOG modules:

Component	Definition
Capture	The first stage in the processing pipeline, Event Capture is responsible for reading event occurrences from the kernel queue, storing them safely on persistent local storage, generating attachments, and queueing them by priority for analysis.
Analysis	Extra chunks of information (log file extracts, sysctl output, etc) are added to alert notifications to provide additional context about an event.
Reporter	The Reporter is the third stage in the processing pipeline, and runs on only one node in the cluster. It periodically queries Event Analysis for changes and generates alert requests for any relevant conditions.
Alerter	The Alerter is the final stage in the processing pipeline, responsible for actually delivering the alerts requested by the reporter. There is a single sender for each enabled channel on the cluster.

CELOG local and backend database redundancy ensures reliable event storage and guards against bottlenecks.

By default, OneFS provides the following event group categories, each of which contains a variety of conditions, or ‘event group causes’, which will trigger an event if their conditions are met:

Event Group Category	Event Series Number
System disk events	1000*****
Node status events	2000*****
Reboot events	3000*****
Software events	4000*****
Quota events	5000*****
Snapshot events	6000*****
Windows networking events	7000*****
Filesystem events	8000*****
Hardware events	9000*****
CloudPools events	11000*****

Prior to OneFS 9.14 and the introduction of CELOG bulk event resolution, cluster events were handled individually, resulting in excessive parallel API calls, frequent timeouts, repeated database commits, and increased lock contention, which constrained system scalability under high event volumes and degraded the user experience through delays, failures, and additional manual verification. Bulk Event Resolution addresses these limitations by processing multiple events within a single atomic transaction in which all changes either succeed or fail together, ensuring data integrity while minimizing redundant API activity and providing consistent rollback on failure. This capability significantly reduces database lock contention, improves stability and throughput during high event loads, and accelerates the completion of administrative operations. The feature is fully integrated across the WebUI, PAPI, and CLI, enabling consistent interactive, programmatic, and script‑driven workflows, and delivering a more reliable and efficient operational experience with reduced need for manual validation.

Under the hood, CELOG has the following high level architecture:

The CELOG bulk resolve workflow itself operates as follows:

Both the WebUI and CLI route bulk event requests through the isi_papi_d service, where the event occurrences handler orchestrates the overall operation. The handler begins by authenticating the request and validating the input, then verifies all provided event group IDs and filters out any that are invalid. Only confirmed IDs are allowed to proceed, and these are deterministically grouped and executed together within a single transaction. The workflow is initiated through a request to the bulk endpoint, and the remaining valid events are processed as part of one automated transaction. If all steps complete successfully, the API returns a success response; if any step fails, the transaction is rolled back and an appropriate error is returned. No additional configuration is required, as bulk resolution leverages the existing PUT API and follows the same authentication and permission model. When multiple event IDs are supplied as an array, the backend automatically processes them in bulk mode, without requiring feature flags, database changes, or service configuration updates.
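The validate, filter, and commit-together pattern described above can be illustrated with a small stand-alone model. The Python sketch below uses an in-memory SQLite table as a stand-in for the event group store. It is not the OneFS implementation, and the table, column, and function names are invented for the example.

```python
import sqlite3


def make_demo_db(ids):
    """Create an in-memory table of unresolved event groups (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eventgroups (id INTEGER PRIMARY KEY, "
        "resolved INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO eventgroups (id) VALUES (?)", [(i,) for i in ids])
    conn.commit()
    return conn


def bulk_resolve(conn, eventgroup_ids):
    """Resolve all valid IDs in one transaction; return (resolved, skipped)."""
    known = {row[0] for row in conn.execute("SELECT id FROM eventgroups")}
    valid = [i for i in eventgroup_ids if i in known]
    skipped = [i for i in eventgroup_ids if i not in known]
    # The connection context manager commits if the block succeeds and
    # rolls back every update if any statement raises.
    with conn:
        conn.executemany(
            "UPDATE eventgroups SET resolved = 1 WHERE id = ?",
            [(i,) for i in valid],
        )
    return valid, skipped


conn = make_demo_db([55, 56, 57])
print(bulk_resolve(conn, [55, 56, 250]))
```

Invalid IDs are separated out before the transaction starts, so a single unknown ID leads to a skip rather than a rollback of the valid set.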

Bulk event actions can be managed from the OneFS WebUI under ‘Cluster management > Events and alerts > Event group history’:

Multiple event groups can be selected and resolved or ignored as a single collective action:

Or from the OneFS CLI using the following syntax:

# isi event groups bulk --resolved true --eventgroup_ids=<x,x,x,...>

After the action is confirmed, the WebUI submits the bulk request and processes it asynchronously in the background:

Upon successful completion of the bulk resolution operation, the WebUI displays a success banner to confirm the result:

If the request includes invalid event IDs, those IDs are filtered out during validation, and the response clearly identifies which events were successfully resolved and which were skipped. For example, from the CLI:

# isi event groups bulk --resolved true --eventgroup_ids=55,56,250

Resolved event-groups: 55,56

Skipped event-groups: 250

#

By consolidating processing into a single atomic operation, redundant requests are eliminated, performance is improved, and consistent, reliable outcomes are maintained even under high‑load conditions. This approach delivers a faster experience for cluster administrators, increases overall system resilience, and simplifies end‑to‑end automation.

If a bulk request fails, begin by confirming that the payload includes a valid array of event IDs and that the appropriate permissions are in place. Next, review the API log messages at /var/log/isi_papi_d.log. Because execution is atomic for the validated set, any failure during processing triggers a rollback of the entire transaction, with the API response and logs providing clear indicators of the underlying cause.

OneFS SMB Durable Handles

Introduced in OneFS 9.14, Durable Handles are a feature of the SMB2 and SMB3 protocols in which, when a client opens a file, it receives an opaque file handle that can be marked as durable. A durable handle allows the open file state to survive a temporary client disconnect, such as a brief network glitch, transient cluster interruption, wi‑fi connectivity drop, or client sleep event, enabling the SMB client to reconnect and reclaim the same handle within a server-defined grace period. From the application’s perspective, the file remains open and normal I/O continues without errors or forced reopens. Durable handles are requested by the client at open time using specific CREATE request contexts, such as SMB2_CREATE_DURABLE_HANDLE_REQUEST and the corresponding reconnect variants.

Durable file handle support in OneFS 9.14 and later provides the following attributes and benefits:

Attribute	Details
Availability	Supported with SmartConnect static IP pools in OneFS 9.14.
Client	Supports v2 Durable Handles (SMB 3+ dialects).
Context	Client can reconnect to the same file without losing its context.
Cost	Lightweight, avoiding the performance overhead of full Continuous Availability.
Persistence	Allows the cluster to keep the SMB file handle alive briefly.

Under the hood, OneFS durable handles employ the following fundamental architecture:

At the protocol level, a durable handle is established when the client sends an SMB CREATE request that includes a durable handle create context. If the server and share configuration permit durable handles, the cluster’s SMB server marks the open accordingly and returns a persistent handle identifier along with a reconnect token, such as a create GUID, which can later be used to reclaim the handle. During normal operation, the client performs READ, WRITE, and locking operations using that handle over its SMB session.

If the client disconnects unexpectedly and the underlying TCP connection and SMB session are lost, the cluster retains the durable handle’s state, including open and lock information, for a defined timeout period rather than closing it immediately. When the client reconnects, it reestablishes an SMB session and tree connection and issues a new CREATE request with a durable handle reconnect context that includes the original token:

If the cluster still holds the handle and the reconnect request matches, it rebinds the durable handle to the new session and I/O seamlessly resumes. If the client does not reconnect before the timeout expires, the cluster closes the open, releases any associated locks, and the handle can no longer be reclaimed.
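The open, disconnect, and reconnect sequence can be modeled in a few lines. The Python sketch below is a toy state table, not the OneFS or SMB server implementation. The class and method names are invented, time is passed in explicitly as seconds, and the 120 second default mirrors the share timeout default mentioned later in this article.

```python
import uuid

DEFAULT_TIMEOUT_S = 120  # assumption: matches the default share timeout


class DurableHandleTable:
    """Toy model of durable handle retention across a client disconnect."""

    def __init__(self, timeout_s=DEFAULT_TIMEOUT_S):
        self.timeout_s = timeout_s
        self._handles = {}  # create GUID -> handle state

    def open(self, session, path):
        """Record a durable open and return its reconnect token."""
        guid = str(uuid.uuid4())
        self._handles[guid] = {"path": path, "session": session, "expires_at": None}
        return guid

    def disconnect(self, session, now):
        """Session lost: keep its handles, but start the grace period."""
        for handle in self._handles.values():
            if handle["session"] == session:
                handle["session"] = None
                handle["expires_at"] = now + self.timeout_s

    def reconnect(self, new_session, guid, now):
        """Rebind a retained handle if the token matches and time remains."""
        handle = self._handles.get(guid)
        if handle is None or handle["session"] is not None:
            return False
        if now > handle["expires_at"]:
            del self._handles[guid]  # expired: open closed, locks released
            return False
        handle["session"] = new_session
        handle["expires_at"] = None
        return True


table = DurableHandleTable()
token = table.open("session-1", "/ifs/data/report.docx")
table.disconnect("session-1", now=0)
print(table.reconnect("session-2", token, now=60))
```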

As such, durable handles are intended to protect against short client-side outages and assume that the SMB server instance remains available. They do not survive a full-on SMB server and/or node failure unless additional high-availability mechanisms are in place.

One such mechanism is SMB Continuous Availability (CA), a related and complementary feature of the SMB3 protocol, which extends this model to provide high availability for clustered SMB file servers by allowing open file state and I/O to survive planned and unplanned server or node failovers.

Focus	SMB Durable Handles	SMB Continuous Availability
What	Best-effort resilience to transient client connectivity issues while assuming the server remains running.	Explicitly designed to tolerate server or node failures in a clustered environment.
How	Stores open state locally for a limited time and is negotiated per file open.	Requires share-level configuration and supporting cluster infrastructure to persist state across nodes.
Where	General file access scenarios where brief network disruptions are expected, such as user desktops or laptops.	CA shares are intended for critical workloads that demand uninterrupted access through failover events.
Version	OneFS 9.14 onwards.	OneFS 8.0 onwards.
Type	Lighter weight, without the write-stability requirements of CA	Uses persistent handles.
Performance	Much lower performance impact.	Higher performance cost due to stable write requirement.
Realm	Granted only when client connects through a static IP pool in OneFS 9.14.	Works with both dynamic and static IP pools.
Status	Enabled by default.	Disabled by default.

PowerScale has supported Continuous Availability since OneFS 8.0, and it is enabled at the share level by marking a share as continuously available, relying on persistent handle and lease state that is stored or replicated in a highly available, cluster-consistent manner. This allows the SMB server resource to move to another node during a failover while clients transparently reconnect and continue I/O without application disruption. CA combines persistent or v2 durable handles with clustering and witness mechanisms to coordinate reconnection and ensure strict data consistency semantics across nodes.

In practice, durable handles and SMB3 continuous availability differ in scope and guarantees. Durable handles provide best-effort resilience to transient client connectivity issues while assuming the server remains running, whereas CA is explicitly designed to tolerate server or node failures in a clustered environment. Durable handles store open state locally for a limited time and are negotiated per file open, while CA requires share-level configuration and supporting cluster infrastructure to persist state across nodes. As a result, durable handles are commonly used for general file access scenarios where brief network disruptions are expected, such as user desktops or laptops, while CA shares are intended for critical workloads like Hyper‑V or SQL Server over SMB that demand uninterrupted access through failover events.

On a PowerScale cluster running OneFS 9.14 or later, durable handles are supported as part of standard SMB2 and SMB3 operation to help clients recover from short connectivity disruptions. SMB3 continuous availability is provided through CA-enabled SMB shares, where OneFS ensures that file, handle, and share state are protected across service or node failovers within the cluster, allowing appropriately capable SMB3 clients to resume I/O transparently. As such, all CA shares make use of durable or persistent handle semantics internally, but durable handles alone do not imply continuous availability; CA represents a share-level, cluster-integrated high-availability capability, whereas durable handles are a file open resiliency mechanism.

Enabled by default in OneFS 9.14, durable file handles are only granted to clients that request them via an SMB session established from a static-IP SmartConnect network pool. Note that durable file handles on dynamic pools will be supported in a future OneFS release.

Durable handles can be configured on a per-share basis from the CLI, WebUI and platform API, as follows:

During creation:

# isi smb shares create --durable-handle-enabled <true | false>

Or from the OneFS WebUI under Protocols > Windows sharing (SMB) > SMB shares > Create a SMB share:

Durable handle support can also be modified on an existing share:

# isi smb shares modify --durable-handle-enabled <true | false>

A timeout can be configured with the ‘--ca-timeout’ flag, and the default duration is 120 seconds:

# isi smb shares create --durable-handle-enabled true --ca-timeout

As noted previously, SMB durable handles are enabled by default in OneFS 9.14:

# isi smb settings shares view | grep -i dura

Durable Handle Enabled: Yes

They can also be easily disabled globally as follows:

# isi smb settings shares modify --durable-handle-enabled 0

# isi smb settings shares view | grep -i dura

Durable Handle Enabled: No

Or from the WebUI under Protocols > Windows sharing (SMB) > Default share settings:

Additionally, Continuous Availability can also now be enabled or disabled on an existing share as follows:

# isi smb shares modify --continuously-available <true | false>

When enabling CA on a share, the following confirmation popup is displayed, advising of the potential write performance implications when activating Continuous Availability:

Note that, because IP allocation entries are cached and expire on a timer, a change in IP allocation method from dynamic to static, or vice versa, can result in a delay of up to five minutes before durable handle configuration changes (enable or disable) take effect.

When investigating and troubleshooting durable handles, /var/log/lwiod.log is a good place to start. Beyond that, network packet captures can also be extremely helpful for understanding and verifying SMB sessions at the protocol request level. When examining a pcap of an SMB session with a network sniffing tool (e.g. Wireshark), the presence of the ‘SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2’ request with a ‘Persistent Handle’ flag of value zero (“0”) indicates that this is a Durable Handle request:

Alternatively, if the persistent handle flag contains a value of one (“1”), this indicates that the request is for a persistent handle, with the likelihood that SMB CA is involved.
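The same distinction can be made programmatically on the raw create context payload. The Python sketch below assumes the SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2 data layout as described in the MS-SMB2 specification: a 4-byte timeout in milliseconds, a 4-byte flags field in which bit 0x00000002 requests a persistent handle, 8 reserved bytes, and a 16-byte create GUID. The layout should be verified against the specification before relying on it, and the function name is hypothetical.

```python
import struct
import uuid

SMB2_DHANDLE_FLAG_PERSISTENT = 0x00000002  # per MS-SMB2, as assumed above


def classify_dh2q(blob: bytes) -> dict:
    """Decode a 32-byte durable handle v2 request payload (little-endian)."""
    if len(blob) != 32:
        raise ValueError("expected a 32-byte payload")
    timeout_ms, flags, _reserved, guid = struct.unpack("<IIQ16s", blob)
    persistent = bool(flags & SMB2_DHANDLE_FLAG_PERSISTENT)
    return {
        "timeout_ms": timeout_ms,
        "persistent": persistent,
        "kind": "persistent handle (CA likely)" if persistent else "durable handle",
        "create_guid": uuid.UUID(bytes_le=guid),
    }


# Synthetic payload for illustration: flags = 0, so a plain durable handle.
demo = struct.pack("<IIQ16s", 120000, 0, 0, uuid.UUID(int=1).bytes_le)
print(classify_dh2q(demo)["kind"])
```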

OneFS S3 Bucket Lifecycle Configuration and Use

As we saw in the previous article, OneFS 9.14 adds S3 Lifecycle Management, which allows administrators to define policies that automate object management within PowerScale S3 buckets. These policies enable the automatic deletion of objects based on criteria such as age, size, or key prefix and are applied uniformly to both existing and newly created objects in a bucket. Lifecycle processing is handled by the OneFS Job Engine, which runs daily to evaluate configured rules and generates per‑bucket tasks that traverse bucket directories and remove objects that meet the defined conditions.

To support this new functionality, the S3 API support in OneFS 9.14 and later now includes the following endpoints:

API Endpoint	Description
PutBucketLifecycleConfiguration	Sets the lifecycle configuration for the bucket and replaces any existing one. User must be the bucket owner to create the lifecycle configuration.
GetBucketLifecycleConfiguration	Returns the current lifecycle configuration for the bucket. User must be the bucket owner to get the lifecycle configuration. Will return a ‘NoSuchLifecycleConfiguration’ error if a configuration is not found.
DeleteBucketLifecycle	Deletes the lifecycle configuration for the bucket. User must be the bucket owner to delete the lifecycle configuration.
AbortIncompleteMultipartUpload · DaysAfterInitiation	Number of days after initiation at which the system aborts an incomplete multipart upload (MPU).
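A lifecycle configuration is submitted as an XML document. The Python sketch below builds a single-rule document using the element names from the standard Amazon S3 lifecycle schema. Which elements OneFS accepts beyond those named in the table above is not stated here, so treat the structure as an assumption to validate against the cluster. The function name, rule ID, and prefix are illustrative.

```python
import xml.etree.ElementTree as ET


def lifecycle_xml(rule_id, prefix, expire_days, abort_mpu_days=None):
    """Build a one-rule S3 lifecycle configuration as an XML string."""
    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    # Restrict the rule to keys that begin with the given prefix.
    ET.SubElement(ET.SubElement(rule, "Filter"), "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    # Delete objects once they reach the given age in days.
    ET.SubElement(ET.SubElement(rule, "Expiration"), "Days").text = str(expire_days)
    if abort_mpu_days is not None:
        abort = ET.SubElement(rule, "AbortIncompleteMultipartUpload")
        ET.SubElement(abort, "DaysAfterInitiation").text = str(abort_mpu_days)
    return ET.tostring(root, encoding="unicode")


print(lifecycle_xml("expire-logs", "logs/", 30, abort_mpu_days=7))
```

The resulting document can be saved to a file and submitted with any S3 client that implements PutBucketLifecycleConfiguration.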

Plus, the following S3 endpoints are also updated in 9.14 and require the following read and write permissions:

S3 Endpoint	Read Permission	Write Permission
CompleteMultiPartUpload	x	x
CopyObject	x	x
GetObject	x
HeadObject	x
PutObject	x	x

In this second article in the series, we’ll walk through a simple example demonstrating how to configure and validate S3 bucket lifecycle management in OneFS 9.14 and later releases.

To configure the feature, an S3 command must be sent to the cluster. A simple way to accomplish this without writing code is by using a utility like the ‘s3cmd’ tool. This tool provides a Python script which can be executed directly on a PowerScale cluster.

The s3cmd tool’s zip file can be downloaded (or copied) to a directory on the cluster and unpacked with the following CLI command:

# unzip s3cmd-2.4.0.zip

Once unzipped, a new subdirectory named ‘s3cmd-2.4.0’ (with the corresponding version-specific suffix) is created. The working directory should be changed to this new subdirectory so that the ‘s3cmd’ script itself can be executed. The contents of the directory are as follows:

# ls
s3cmd-2.4.0     s3cmd-2.4.0.zip
# cd s3cmd-2.4.0
# ls
INSTALL.md      NEWS            S3              s3cmd.egg-info
LICENSE         PKG-INFO        s3cmd           setup.cfg
MANIFEST.in     README.md       s3cmd.1         setup.py

Next, a test bucket is configured on the cluster, and an access key and secret are generated for that bucket. In this example, the ‘root’ user and the ‘System’ multi-tenant access zone are used for access. This process begins by creating the test directory, verifying that the S3 service is enabled, and disabling HTTPS-only access.

# mkdir -p /ifs/s3lifecycle
# isi s3 settings global modify --service=true --https-only=false

Next, the bucket is created, in conjunction with the access key and secret. For example:

# isi s3 buckets create --name=s3life --path=/ifs/s3lifecycle --owner=root
# isi s3 keys create --user=root --force --show-key > /ifs/root-s3.keys
# isi s3 buckets list
Bucket Name  Path             Owner  Object ACL Policy  Object Lock Enabled  Lock Protection Mode  Description
---------------------------------------------------------------------------------------------------------------
s3life       /ifs/s3lifecycle root   replace            No                   -
---------------------------------------------------------------------------------------------------------------
Total: 1
# cat /ifs/root-s3.keys
       Access ID: 1_root_accid
      Secret Key: 0t1O0URz0H5pef6Wn6P6L9BKc8Ad
       Timestamp: 2026-05-14T14:27:36
  Old Secret Key: ****************************
Old Key Timestamp: 2026-05-12T17:25:02
  Old Key Expiry: 2026-05-14T14:37:36

After obtaining the access ID and secret, s3cmd can be configured with these credentials and the appropriate endpoint parameters to simplify command execution. Site-specific configuration parameters that will need to be specified include:

Access Key
Secret Key
S3 Endpoint
DNS-style bucket+hostname:port

If HTTPS is preferred, the S3 endpoint port should be changed from 9020 to 9021, and ‘Y’ should be selected when prompted to use the HTTPS protocol. HTTP may be used instead when packet-level debugging is required, since unencrypted traffic can be captured and read directly with network sniffing tools such as Wireshark.

# python3 s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: 1_root_accid
Secret Key: 0t1O0URz0H5pef6Wn6P6L9BKc8Ad
Default Region [US]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: 127.0.0.1:9020

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: s3://%(bucket)

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: no

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
Access Key: 1_root_accid
Secret Key: 0t1O0URz0H5pef6Wn6P6L9BKc8Ad
Default Region: US
S3 Endpoint: 127.0.0.1:9020
DNS-style bucket+hostname:port template for accessing a bucket: s3://%(bucket)
Encryption password:
Path to GPG program: None
Use HTTPS protocol: False
HTTP Proxy server name:
HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'

Utilities such as the ubiquitous ‘dd’ CLI command are useful for easily and rapidly generating some test data files. Similarly, the ‘touch’ CLI command can be used to alter the ‘atime’ (last access) and ‘mtime’ (last modified) timestamps of these files. For example:

# dd if=/dev/zero of=/ifs/s3lifecycle/smlfl1 bs=1k count=1
# dd if=/dev/zero of=/ifs/s3lifecycle/smlfl2 bs=1k count=1
# touch -A -400000 /ifs/s3lifecycle/smlfl*
# dd if=/dev/zero of=/ifs/s3lifecycle/bigfl1 bs=1M count=1
# dd if=/dev/zero of=/ifs/s3lifecycle/bigfl2 bs=1M count=1
# touch -A -250000 /ifs/s3lifecycle/big*
# dd if=/dev/zero of=/ifs/s3lifecycle/notrmv1 bs=1M count=1
# dd if=/dev/zero of=/ifs/s3lifecycle/notrmv2 bs=1M count=1

# ls -l /ifs/s3lifecycle/

total 4147
-rw-------     1 root  wheel  1048576 May 13 13:48 bigfl1
-rw-------     1 root  wheel  1048576 May 13 13:48 bigfl2
-rw-------     1 root  wheel  1048576 May 14 14:48 notrmv1
-rw-------     1 root  wheel  1048576 May 14 14:48 notrmv2
-rw-------     1 root  wheel     1024 May 12 22:48 smlfl1
-rw-------     1 root  wheel     1024 May 12 22:48 smlfl2
# date
Thu May 14 14:49:26 GMT 2026

A total of six test files are created: two 1 MiB files with modification times more than one day old (bigfl1 and bigfl2), two 1 MiB files modified within the last day (notrmv1 and notrmv2), and two small 1 KiB files with modification times roughly 40 hours old (smlfl1 and smlfl2).
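
The ‘touch -A’ adjustments above use the FreeBSD [-][[hh]mm]SS format, so -400000 moves the timestamps back 40 hours and -250000 moves them back 25 hours. The following Python sketch checks that arithmetic against the listing. The helper name ‘parse_touch_adjust’ is purely illustrative, and the creation time is read off the ‘ls -l’ output rather than from the cluster:

```python
from datetime import datetime, timedelta


def parse_touch_adjust(value: str) -> timedelta:
    """Convert a 'touch -A' value of the form [-][[hh]mm]SS into a timedelta."""
    sign = -1 if value.startswith("-") else 1
    digits = value.lstrip("-")
    if not digits.isdigit() or len(digits) not in (2, 4, 6):
        raise ValueError(f"unsupported adjustment: {value!r}")
    digits = digits.zfill(6)  # pad omitted hh/mm fields with zeros
    hours, minutes, seconds = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    return sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Creation time of the test files, taken from the 'ls -l' listing (assumption).
created = datetime(2026, 5, 14, 14, 48)
small_mtime = created + parse_touch_adjust("-400000")
big_mtime = created + parse_touch_adjust("-250000")
print(small_mtime)  # 2026-05-12 22:48:00
print(big_mtime)    # 2026-05-13 13:48:00
```

Both results match the timestamps reported by ‘ls -l’ for the ‘smlfl’ and ‘bigfl’ files.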

Next, add a lifecycle configuration in XML, such as the example below, to a text file on the cluster. In the following example, the file is named /ifs/lifecycle.xml:

<LifecycleConfiguration>
 <Rule>
   <Filter>
      <ObjectSizeGreaterThan>10000</ObjectSizeGreaterThan>
   </Filter>
   <Status>Enabled</Status>
   <Expiration>
     <Days>1</Days>
   </Expiration>
 </Rule>
</LifecycleConfiguration>
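
For anyone who would rather generate the policy than type it by hand, the same document can be produced with Python’s standard library. This is only a sketch: the function name ‘build_lifecycle_xml’ and its parameters are illustrative, and it covers just the single-rule case shown above.

```python
import xml.etree.ElementTree as ET


def build_lifecycle_xml(min_size_bytes: int, days: int) -> str:
    """Build a one-rule lifecycle policy: size filter plus Days expiration."""
    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    rule_filter = ET.SubElement(rule, "Filter")
    ET.SubElement(rule_filter, "ObjectSizeGreaterThan").text = str(min_size_bytes)
    ET.SubElement(rule, "Status").text = "Enabled"
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(days)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


xml_body = build_lifecycle_xml(min_size_bytes=10000, days=1)
print(xml_body)
```

The resulting string can then be written to a file such as /ifs/lifecycle.xml for use with ‘s3cmd’.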

Once the XML request body has been added to the lifecycle.xml file, ‘s3cmd’ can be run to ‘put’ the policy on the bucket. Once done, run the ‘getlifecycle’ command to confirm that the policy has been applied correctly.

# python s3cmd setlifecycle /ifs/lifecycle.xml s3://s3life
s3://s3life/: Lifecycle Policy updated
# python s3cmd getlifecycle s3://s3life
<?xml version="1.0" ?>
<LifecycleConfiguration>
       <Rule>
               <Status>Enabled</Status>
               <Expiration>
                       <Days>1</Days>
               </Expiration>
               <Filter>
                       <ObjectSizeGreaterThan>10000</ObjectSizeGreaterThan>
               </Filter>
       </Rule>
</LifecycleConfiguration>

Next, compare the bucket contents before and after an S3Lifecycle job run:

# ls /ifs/s3lifecycle
bigfl1        bigfl2        notrmv1       notrmv2       smlfl1       smlfl2
# isi job start S3Lifecycle
Started job [25]
# ls /ifs/s3lifecycle
notrmv1       notrmv2       smlfl1       smlfl2

After the S3Lifecycle job is initiated and allowed to run for a short period, the results show that the two large, older files (bigfl1 and bigfl2) have been deleted while the remaining files are unchanged. This behavior is expected and reflects the expiration and filter criteria defined in the XML policy, under which only objects larger than 10,000 bytes and older than one day qualify for automatic deletion. The ‘notrmv’ files are large enough but too recent, and the ‘smlfl’ files are old enough but too small.
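
The outcome can be sanity-checked with a few lines of Python that apply the same two criteria to the six test files. This is a simplified model rather than the job’s actual evaluation logic: it assumes an object qualifies once its modification time is at least one full day old, and the sizes and timestamps are copied from the ‘ls -l’ and ‘date’ output earlier in the example.

```python
from datetime import datetime, timedelta

NOW = datetime(2026, 5, 14, 14, 49, 26)  # from the 'date' output
MIN_SIZE = 10000                          # ObjectSizeGreaterThan
MAX_AGE = timedelta(days=1)               # Expiration/Days

# (name, size in bytes, modification time) from the 'ls -l' listing
FILES = [
    ("bigfl1", 1048576, datetime(2026, 5, 13, 13, 48)),
    ("bigfl2", 1048576, datetime(2026, 5, 13, 13, 48)),
    ("notrmv1", 1048576, datetime(2026, 5, 14, 14, 48)),
    ("notrmv2", 1048576, datetime(2026, 5, 14, 14, 48)),
    ("smlfl1", 1024, datetime(2026, 5, 12, 22, 48)),
    ("smlfl2", 1024, datetime(2026, 5, 12, 22, 48)),
]


def qualifies(size: int, mtime: datetime, now: datetime = NOW) -> bool:
    """True when the object passes both the size filter and the age test."""
    return size > MIN_SIZE and (now - mtime) >= MAX_AGE


deleted = [name for name, size, mtime in FILES if qualifies(size, mtime)]
print(deleted)  # ['bigfl1', 'bigfl2']
```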

When investigating and troubleshooting S3 lifecycle management, the following issues and possible resolutions may be useful:

Issue	Background / Potential Resolution
Unable to create lifecycle policy on bucket	Bucket owner must be the same as directory owner
Objects marked for deletion have not been deleted	Ensure object or bucket does not have lock protection enabled
	Job for deletion has not been run
	Job for deletion is still in progress

Beyond this, S3 operations are logged in the S3 log file ‘/var/log/s3.log’. Similarly, S3 Job Engine job operations and deleted objects are logged in ‘/var/log/isi_job_d_s3_lifecycle.log’.

OneFS S3 Bucket Lifecycle Management

Introduced in OneFS 9.14, the PowerScale S3 Lifecycle Management feature enables policy‑driven object management within PowerScale S3 buckets by supporting the automated deletion of objects based on administrator‑defined criteria such as object age, size, or key prefix. These lifecycle policies are applied consistently to both existing and newly created objects within a bucket. Backend processing is managed by the OneFS Job Engine, which performs daily evaluations of configured lifecycle rules and creates per‑bucket tasks to traverse bucket directories and remove objects that meet the specified conditions.

Newly introduced S3 API support in OneFS 9.14 and later includes the following endpoints:

API Endpoint	Description
PutBucketLifecycleConfiguration	Sets the lifecycle configuration for the bucket and replaces any existing one. User must be the bucket owner to create the lifecycle configuration.
GetBucketLifecycleConfiguration	Returns the current lifecycle configuration for the bucket. User must be the bucket owner to get the lifecycle configuration. Will return a ‘NoSuchLifecycleConfiguration’ error if a configuration is not found.
DeleteBucketLifecycle	Deletes the lifecycle configuration for the bucket. User must be the bucket owner to delete the lifecycle configuration.

Additionally, the following S3 endpoints are also updated in the OneFS 9.14 release and require the following read and write permissions:

S3 Endpoint	Read Permission	Write Permission
CompleteMultiPartUpload	x	x
CopyObject	x	x
GetObject	x
HeadObject	x
PutObject	x	x

Note that all the above operations have an updated ‘x-amz-expiration’ response header if ‘objectexpiration’ has been configured. For example:

HTTP/1.1 200 OK
…
x-amz-expiration: expiry-date="Wed, 30 Apr 2027 00:00:00 GMT",rule-id="3024"
Content-Length: 434234
Content-Type: text/plain
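
On the client side, this header value can be split into its two fields with the Python standard library. The sketch below assumes the value always takes the key="value" form shown above, and the function name ‘parse_expiration_header’ is illustrative:

```python
import re
from email.utils import parsedate_to_datetime


def parse_expiration_header(value: str) -> dict:
    """Split an x-amz-expiration value into its expiry date and rule ID."""
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    return {
        "expiry_date": parsedate_to_datetime(fields["expiry-date"]),
        "rule_id": fields["rule-id"],
    }


header = 'expiry-date="Wed, 30 Apr 2027 00:00:00 GMT",rule-id="3024"'
info = parse_expiration_header(header)
print(info["expiry_date"].date(), info["rule_id"])
```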

S3 bucket lifecycle behavior in OneFS 9.14 enables the bucket owner to define lifecycle management policies that are enforced using root-level credentials. Lifecycle processing respects object lock protections, ensuring that immutable objects are preserved and not subject to deletion. Lifecycle rules may define expiration based on either a relative time period measured in days since the object’s last modification or an absolute date and timestamp, and can incorporate filtering criteria based on object size and key prefix, including support for backdated expiration rules. Each bucket supports a maximum of 1,000 lifecycle rules.

Note that the OneFS CLI, WebUI, and platform API do not currently support bucket lifecycle configuration, which can only be performed through the S3 API. Additionally, object tag-based filtering is not supported in OneFS 9.14, nor are object transition policies, ‘ExpiredObjectDeleteMarker’ configurations, noncurrent version expiration or transition rules, or versioned objects.

Under the hood, the core S3 bucket lifecycle management architecture operates as follows. At a high level, an S3 client sends a request containing the desired lifecycle rule(s), and the cluster’s S3 protocol head saves these into OneFS’ Tardis configuration database. The OneFS Job Engine then retrieves the lifecycle configuration and walks the bucket and file structure, deleting files based upon the expressed rules.

The lifecycle processing is handled by the ‘S3Lifecycle’ job, which, by default, runs daily at 1:00 AM. Scheduling and priority for this job are optionally configurable through the OneFS CLI by a privileged local user:

# isi job types view S3Lifecycle
         ID: S3Lifecycle
Description: Manage S3 object lifecycle per bucket lifecycle policy.
    Enabled: Yes
     Policy: LOW
   Schedule: every day at 1:00am
   Priority: 6

During execution, the job logs all deleted objects and generates an S3Lifecycle job report, which can be viewed with the following OneFS CLI command:

# isi job reports view <job id>

Job report output is along the lines of the following:

S3Lifecycle[22] phase 1 (2026-04-30T11:18:06)
---------------------------------------------
Files 3
Directories 1
Apparent size 5300
Physical size 52224
Objects Deleted 2
Objects Evaluated 3
Objects No Action 1
JE/Error Count 0
JE/Time elapsed 3 seconds
JE/Time working 3 seconds

S3Lifecycle[22] Job Summary
---------------------------
Final Job State Succeeded
Phase Executed

As above, this report summarizes the number of objects deleted and skipped, and logs detailed information about all deleted objects to /var/log/isi_job_d_s3_lifecycle.log.

The following table outlines the parameters that are supported for use within the lifecycle XML configuration body.

Configuration	Description
ID	Unique identifier for the rule.
Status	If ‘Enabled’, the rule is currently being applied. If ‘Disabled’, the rule is not currently being applied. This is a mandatory field.
Expiration
Date	Specifies the expiration of the object as a date and timestamp.
Days	Specifies the expiration of the object as a number of days.
Filter
ObjectSizeGreaterThan	Minimum object size to which the rule applies.
ObjectSizeLessThan	Maximum object size to which the rule applies.
Prefix	Prefix identifying one or more objects to which the rule applies.
And	Applies a logical ‘AND’ to two or more filter criteria inside the element.

The ‘PutBucketLifecycleConfiguration’ request takes the following form:

PUT /?lifecycle HTTP/1.1
Host: Bucket.s3.amazonaws.com
x-amz-expected-bucket-owner: ExpectedBucketOwner

<?xml version="1.0" encoding="UTF-8"?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Rule>
        <Status>Enabled</Status>
        …
    </Rule>
    <Rule>
        <Status>Enabled</Status>
        …
    </Rule>
</LifecycleConfiguration>

As for the rules themselves, these can be temporal in nature. For example, expressing expiration as an absolute date and/or a number of days:

<Rule>
    <Expiration>
        <Date>2029-03-02T12:30:00</Date>
        <Days>365</Days>
    </Expiration>
</Rule>

The ‘Date’ element specifies a timestamp, so objects that exist in the bucket at that point in time (for instance, 12:30 on March 2nd, 2029 in the example above) will be deleted. The ‘Date’ element can also specify a date in the past. In addition to an explicit date, a rule can include a ‘Days’ element, which, in the example above, targets objects that haven’t been modified in 365 days, marking them for deletion.

Rules can also specify a minimum and/or maximum object size in bytes. For example, one rule targeting objects greater than 1 KB (1024 bytes) and another targeting objects less than 50 KB (51200 bytes):

<Rule>
    <Filter>
        <ObjectSizeGreaterThan>1024</ObjectSizeGreaterThan>
    </Filter>
</Rule>
<Rule>
    <Filter>
        <ObjectSizeLessThan>51200</ObjectSizeLessThan>
    </Filter>
</Rule>

Rule filters can also specify data locality by key prefix, optionally combined with size criteria. For example:

<Rule>
    <Filter>
        <And>
            <ObjectSizeGreaterThan>18124</ObjectSizeGreaterThan>
            <ObjectSizeLessThan>92686</ObjectSizeLessThan>
            <Prefix>/data/path/</Prefix>
        </And>
    </Filter>
</Rule>

Note that, when specifying more than one filter, they must be wrapped in the <And> element, as above.
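
The wrapping rule is easy to get wrong when policies are generated programmatically, so a small helper can take care of it. The following Python sketch (the ‘build_filter’ name and its keyword arguments are illustrative, not part of any OneFS tooling) adds the ‘And’ element only when more than one criterion is supplied:

```python
import xml.etree.ElementTree as ET


def build_filter(prefix=None, size_greater_than=None, size_less_than=None):
    """Return a <Filter> element, wrapping multiple criteria in <And>."""
    criteria = []
    if size_greater_than is not None:
        criteria.append(("ObjectSizeGreaterThan", str(size_greater_than)))
    if size_less_than is not None:
        criteria.append(("ObjectSizeLessThan", str(size_less_than)))
    if prefix is not None:
        criteria.append(("Prefix", prefix))

    rule_filter = ET.Element("Filter")
    # A single criterion sits directly under <Filter>; several need <And>.
    parent = ET.SubElement(rule_filter, "And") if len(criteria) > 1 else rule_filter
    for tag, text in criteria:
        ET.SubElement(parent, tag).text = text
    return rule_filter


combined = build_filter(prefix="/data/path/", size_greater_than=18124, size_less_than=92686)
single = build_filter(prefix="/data/path/")
print(ET.tostring(combined, encoding="unicode"))
print(ET.tostring(single, encoding="unicode"))
```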

A bucket’s lifecycle configuration can be queried with the ‘GetBucketLifecycleConfiguration’ request. For example:

GET /?lifecycle HTTP/1.1
Host: Bucket.s3.amazonaws.com
x-amz-expected-bucket-owner: ExpectedBucketOwner

A lifecycle configuration can be removed with the ‘DeleteBucketLifecycle’ request. For example:

DELETE /?lifecycle HTTP/1.1
Host: Bucket.s3.amazonaws.com
x-amz-expected-bucket-owner: ExpectedBucketOwner

In addition to bucket lifecycles, OneFS 9.14 also provides lifecycle management for incomplete S3 MPU operations. Specifically, this is the ability to craft lifecycle rules which clean up incomplete multipart uploads, helping reclaim space from abandoned large object transfers.

Configuration	Description
AbortIncompleteMultipartUpload
DaysAfterInitiation	Number of days after initiation at which the system aborts an incomplete MPU.

Rule parameters include the number of days after MPU initiation and the data path prefix. For example:

<Rule>
    <AbortIncompleteMultipartUpload>
        <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
    <Filter>
        <Prefix>/path/to/data/</Prefix>
    </Filter>
</Rule>

Note that the ‘DaysAfterInitiation’ parameter is limited in scope to incomplete multipart uploads (MPUs) and does not apply to standard objects or completed MPU uploads, while the lifecycle expiration rules similarly exclude incomplete MPUs.

In the next article in this series, we’ll look at some practical examples of how to configure, use, and validate S3 bucket lifecycle management in OneFS 9.14 and later releases.

OneFS Inline Compression Versus Incompressible Data

PowerScale OneFS inline compression is a native data reduction feature that operates in the write path to improve storage efficiency by compressing data before it is written to disk. This helps OneFS drive data reduction efficiencies to support PowerScale’s contract-backed 2:1 data reduction ratio (DRR) guarantee. In practice, many workloads and deployments achieve substantially higher ratios, including up to 6.5:1 compression for EDA and up to 4:1 for life sciences.

As part of a multi-stage pipeline that includes zero-block elimination and inline deduplication, OneFS evaluates data in 128 KB regions and selectively compresses those that achieve useful space savings, storing them as compact encoded containers while leaving incompressible data unmodified.

Because this process occurs inline, it not only influences capacity utilization, but also CPU consumption, write latency, and on-disk data layout. When used with suitably compressible workloads, inline compression can significantly reduce physical I/O and increase effective cluster capacity. However, its impact on performance and data placement calls for understanding and careful consideration, particularly for workloads with low compressibility or frequent small updates.

Inline compression is often a beneficial optimization, with modest additional CPU overhead reducing physical I/O demands while increasing effective storage capacity. For many workloads, this expectation holds true. However, for incompressible data sets, enabling OneFS inline compression can negatively impact performance as compared to running with no compression at all, and the penalty doesn’t abate instantly upon disabling the feature.

So for data such as encrypted backups, already-compressed media, highly random content, or when randomly overwriting data that was previously written in a compressed layout, performance can degrade in ways that are predictable once the underlying on-disk activities are understood. The crux is that inline compression is not just a logical efficiency feature. It changes how data is laid out and how writes are handled. That means the impact is not limited to capacity savings, since it can also influence write throughput, overwrite behavior, and the cost of small updates into existing files.

Without inline compression, OneFS stores data in 8 KB blocks, grouped into protection groups (e.g. 4+2 on a six node cluster/pool with a +2n FEC protection policy).

A 128 KB logical range (stripe unit) comprises 16 × 8 KB blocks laid out as plain data or parity blocks. Conversely, with inline data reduction enabled, every incoming write flows through a multi-stage pipeline before it ever touches disk:

Order	Stage	Description
1	Zero block removal	Fully-zero 8 KB blocks are detected and stored as sparse references rather than real blocks.
2	Inline dedupe	8 KB blocks are fingerprinted and matched against an in-memory hash table; duplicates are collapsed to shared references.
3	Inline compression	Surviving 128 KB regions are handed to the compression engine (zlib or lz4 depending on platform), which decides per-region whether the result is small enough to keep.
4	Protection and layout	Whatever comes out the other side (compressed container, deduped reference, or plain blocks) is then protected with FEC and laid out across the cluster.
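
To make the first two stages more concrete, the following Python sketch classifies a stream of 8 KB blocks as zero, duplicate, or unique. It is a toy model only: the SHA-256 fingerprint and the in-memory Python set are stand-ins chosen for illustration, not the fingerprinting method or hash table that OneFS actually uses.

```python
import hashlib

BLOCK_SIZE = 8 * 1024
ZERO_BLOCK = bytes(BLOCK_SIZE)


def classify_blocks(blocks, seen=None):
    """Label each 8 KB block as 'zero', 'duplicate', or 'unique'."""
    seen = set() if seen is None else seen
    labels = []
    for block in blocks:
        if block == ZERO_BLOCK:
            labels.append("zero")       # stage 1: zero block removal
            continue
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint in seen:
            labels.append("duplicate")  # stage 2: inline dedupe match
        else:
            seen.add(fingerprint)
            labels.append("unique")     # survives to the compression stage
    return labels


block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
print(classify_blocks([ZERO_BLOCK, block_a, block_a, block_b]))
# ['zero', 'unique', 'duplicate', 'unique']
```

Only the blocks labeled ‘unique’ would continue on to the compression stage in this model.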

Note that every stage in this pipeline consumes cluster resources (CPU, etc.) and adds write-path latency on every write, before OneFS knows whether it will deliver a benefit. Two things change on disk as a result:

Data is chunked into 128 KB compression regions. Each region maps to 16 logical 8 KB blocks, and this is what the compression engine actually works on.
Compressible regions are stored as compressed containers. When a 128 KB region compresses well enough, OneFS encodes it into a smaller bytestream using a node-specific algorithm (zlib, lz4, etc., fixed per platform and OneFS release). That region is no longer stored as 16 individual blocks, instead becoming a single compressed container.

Compressed regions are reported via the ‘isi get -DDO’ CLI command, as follows:

lbn 160: 10+2/2
...
5,5,1403920384:8192[DIRTY,COMPRESSED]#2
(sparse)[DIRTY,COMPRESSED]#3
...
{ebns={160,161},lbns={160,161,162,163,164},
encoded_size=10762,encoded_offset=0,
decoded_blocks=5,format=zlib}

The ‘encoded_size’ parameter returns the compressed byte count while ‘decoded_blocks’ reports how many logical 8 KB blocks live inside. The file looks identical from the outside, and the physical layout is where things get interesting. OneFS doesn’t know that data is incompressible until it attempts to compress it, so even for data that will never compress, the engine still:

Takes the 128 KB chunk.
Runs it through the compression algorithm.
Checks whether compressing the chunk saves at least one full 8 KB block.
Falls back to writing it as a chunk of normal, uncompressed 8 KB blocks when it does not.
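
The decision in the steps above can be sketched in Python using the standard ‘zlib’ module as a stand-in for the platform’s compression engine. The function names and the test data are illustrative assumptions; only the ‘save at least one full 8 KB block’ threshold comes from the description above.

```python
import random
import zlib

BLOCK_SIZE = 8 * 1024
REGION_SIZE = 128 * 1024


def blocks_needed(num_bytes: int) -> int:
    """Number of 8 KB blocks needed to hold num_bytes (ceiling division)."""
    return -(-num_bytes // BLOCK_SIZE)


def try_compress(region: bytes):
    """Keep the compressed form only if it saves at least one whole block."""
    encoded = zlib.compress(region)
    if blocks_needed(len(encoded)) < blocks_needed(len(region)):
        return "COMPRESSED", encoded
    return "INCOMPRESSIBLE", region  # fall back to plain blocks


# Repetitive text compresses well; pseudo-random bytes do not.
text_region = (b"OneFS inline compression " * 6000)[:REGION_SIZE]
rng = random.Random(0)
random_region = bytes(rng.getrandbits(8) for _ in range(REGION_SIZE))

print(try_compress(text_region)[0])
print(try_compress(random_region)[0])
```

Note that the compression work is done in both cases, which is exactly the overhead discussed here. The same block arithmetic explains the earlier ‘isi get -DDO’ example: blocks_needed(10762) is 2, so five logical blocks occupy two physical blocks, matching the two ‘ebns’ entries.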

On disk, those blocks show up tagged as ‘INCOMPRESSIBLE’. For example:

lbn 0: 10+2/2
...
6,2,1176543232:8192[DIRTY,INCOMPRESSIBLE]#16
...
1,5,1075085312:8192[DIRTY,INCOMPRESSIBLE]#12

No {encoded_size=…, decoded_blocks=…, format=…} stanza — just plain blocks. But the cluster still expended CPU and latency as a cost of trying. That overhead is real, and on CPU-constrained A-series archive nodes, it can show up directly in throughput.

That said, OneFS inline compression can be easily disabled entirely via the following CLI syntax:

# isi compression settings modify --enabled=False

With the ‘--enabled’ parameter set to ‘False’, new writes automatically skip the compression stage completely. Each 128 KB region goes straight to disk as plain 8 KB blocks: no attempt, no fallback, no overhead. The on-disk layout looks identical to the incompressible case above (without the INCOMPRESSIBLE tag), but the ‘attempted compression’ overhead never affects the write path.

However, note that disabling compression is not retroactive. Regions that were already written in compressed form remain compressed on disk, since disabling compression doesn’t decompress or rewrite (re-lay-out) them in the background. That’s why you can disable compression cluster-wide and still see what looks like ‘compressed behavior’ on overwrites to older data. These regions can be identified by their ‘[COMPRESSED]’ tags and the compression metadata block in output from the ‘isi get -DDO’ CLI command.

With compression enabled, every write still flows through the full pipeline (zero-block removal > dedupe > compression) regardless of whether the data ends up compressed or uncompressed on disk. Hence, the latter case incurs CPU overhead and write-path latency for no additional space savings.

On platforms like the PowerScale A310, this translates to a measurable throughput drop for workloads dominated by encrypted backups, already-compressed files, or just very random data. Disabling compression and re-running against fresh data will often result in write throughput increasing noticeably.

Note that the penalty for incompressible data is often more significant than for genuinely compressible or dedupe-able data. When data compresses or deduplicates well, inline data reduction actually reduces the amount of physical I/O hitting the drives: fewer blocks to write, less parity to compute, and less backend traffic. The CPU overhead is more than offset by the I/O savings, resulting in better throughput and lower latency than without data reduction at all. In contrast, incompressible data yields none of that I/O relief, only the overhead.

For overwrites of existing compressed data, each compressed region is a 128 KB atomic unit. This means that even a 4 KB update in the middle of a region will force OneFS to:

Read the full 128 KB compressed container.
Decompress it in memory.
Merge the changed bytes.
Re-encode and rewrite the full 128 KB region.
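
The four steps above can be illustrated with a short Python sketch, again using ‘zlib’ as a stand-in for the platform’s compression engine. The function name and sample data are illustrative; the point is that a 4 KB change still touches the whole 128 KB region.

```python
import zlib

REGION_SIZE = 128 * 1024


def overwrite_compressed_region(container: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Apply a small update to a compressed region and return the new container."""
    region = bytearray(zlib.decompress(container))      # read + decompress
    if offset < 0 or offset + len(new_bytes) > len(region):
        raise ValueError("update falls outside the region")
    region[offset:offset + len(new_bytes)] = new_bytes  # merge the changed bytes
    return zlib.compress(bytes(region))                 # re-encode the full region


original = (b"log line\n" * 20000)[:REGION_SIZE]
container = zlib.compress(original)
update = b"X" * 4096                                    # a 4 KB update

new_container = overwrite_compressed_region(container, 64 * 1024, update)
decoded = zlib.decompress(new_container)
print(len(update), "bytes changed;", len(decoded), "bytes decompressed and re-encoded")
```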

This read-decompress-modify-rewrite (R-DM-W) penalty applies even with inline compression disabled. If the region on disk was written compressed, the overwrite path still must treat it that way until the full file is physically rewritten with an uncompressed layout.

Even overwriting regions that are ‘incompressible’ incurs an overhead. Because a file written with compression enabled may contain a mix of compressed and uncompressed (incompressible) regions, OneFS has to route partial-chunk overwrites through the compression stage to stay consistent across the 128 KB boundary. It can’t know ahead of time whether a given 128 KB range is fully uncompressed, so it has to treat the whole chunk conservatively.

This means that even ‘incompressible’ regions (i.e. data that never compressed in the first place) still carry a partial-overwrite penalty if compression was enabled when the file was initially written. Only a full 128 KB chunk replacement can safely bypass the compression engine entirely for that region, which is usually achieved by overwriting the file completely. This often manifests itself as:

New writes returning to full speed immediately upon disabling compression.
Overwrites remaining sluggish until the underlying data is physically rewritten as uncompressed — regardless of whether the on-disk blocks are tagged ‘[COMPRESSED]’ or ‘[INCOMPRESSIBLE]’.

Investigating and troubleshooting unexpected performance with inline compression typically involves:

First, confirming whether the data is actually compressible by verifying data reduction statistics at the cluster or path level:

# isi statistics data-reduction view

# isi compression stats view

# isi_storage_efficiency /ifs/path/to/file

If compression is ‘enabled’ but little to no space savings are evident, the data almost certainly isn’t compressible.

Next, check the actual on-disk layout:

# isi get -DDO /ifs/path/to/file

Specifically, look at the ratio of ‘[COMPRESSED]’ to ‘[INCOMPRESSIBLE]’ tags in the ‘PROTECTION GROUPS’ section, and the ‘Metatree logical blocks:’ summary at the end.
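
For large files, counting the tags by eye quickly becomes impractical. The Python sketch below tallies them from captured ‘isi get -DDO’ output. It assumes the bracketed tag format shown in the earlier examples and counts tagged lines rather than individual blocks, so treat the result as a rough indicator only. The function name is illustrative.

```python
import re
from collections import Counter

TAG_PATTERN = re.compile(r"\[([A-Z_,]+)\]")


def count_layout_tags(isi_get_output: str) -> Counter:
    """Count lines tagged COMPRESSED or INCOMPRESSIBLE in 'isi get -DDO' output."""
    counts = Counter()
    for line in isi_get_output.splitlines():
        match = TAG_PATTERN.search(line)
        if not match:
            continue
        tags = match.group(1).split(",")
        if "COMPRESSED" in tags:
            counts["COMPRESSED"] += 1
        elif "INCOMPRESSIBLE" in tags:
            counts["INCOMPRESSIBLE"] += 1
    return counts


# Sample lines copied from the examples earlier in this article.
sample = """\
lbn 160: 10+2/2
5,5,1403920384:8192[DIRTY,COMPRESSED]#2
(sparse)[DIRTY,COMPRESSED]#3
6,2,1176543232:8192[DIRTY,INCOMPRESSIBLE]#16
1,5,1075085312:8192[DIRTY,INCOMPRESSIBLE]#12
"""
counts = count_layout_tags(sample)
print(dict(counts))
```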

Separate new writes from overwrites when testing:

This is the most important thing to get right when benchmarking. Use fresh test files written after any compression settings reconfiguration, and run tests in two distinct phases:

Phase 1: Initial write (e.g. --overwrite=0 in fio)
Phase 2: Overwrite / update (e.g. --overwrite=1 with random updates into the same files)

The following is a simple example ‘fio’ test overwriting an existing file at random offsets:

fio --name=random-update \
    --filename=data.bin \
    --rw=randwrite \
    --bs=1M \
    --size=10G \
    --direct=1 \
    --overwrite=1 \
    --ioengine=libaio \
    --iodepth=64

Then compare:

Test	What it reveals
Initial writes: compression on vs. off	The cost of the “try to compress” overhead
Overwrites into files written with compression on	The R-DM-W penalty on legacy compressed regions
Overwrites into files written with compression off	Your true baseline uncompressed overwrite performance

Rule out unrelated bottlenecks.

Before assuming a compression issue, validate that the cluster is not hitting CPU, disk, or network limits for other reasons:

# isi statistics system list --nodes=all --format=top

# isi statistics drive list --format=top --sort=Queued

# isi statistics client list --format=top

# isi statistics protocol list --format=top

The goal is to distinguish hitting the limits of the compression design from saturating the hardware. Once it is confirmed that a legacy compressed layout is hurting performance, disabling compression and then physically rewriting the data is generally the cleanest approach. The following options can be used to ‘rehydrate’ the data back to a regular uncompressed layout.

1. Disable compression globally (or via file pool policy)

# isi compression settings modify --enabled=False

2. Rewrite the data via intra-cluster copy using the ‘cp’ CLI command:

# cp -a /ifs/source/path /ifs/target/path

3. Use SyncIQ or SmartSync to replicate/copy to another /ifs location within the cluster.

Once rewritten, the ‘isi get -DDO’ CLI will show only uncompressed blocks, and overwrites follow the efficient block-level path rather than a read-decompress-modify-write (R-DM-W) operation.

4. If rewriting from the application is impractical, the SmartPoolsTree job can be used instead. Note that this method requires SmartPools to be licensed across the cluster, and the job settings configured as follows:

a) Set the SmartPoolsTree job to a ‘retune’ restriping strategy via gconfig:

# isi_gconfig -t job-config jobs.types.smartpoolstree.find_goal=retune

# isi_gconfig -t job-config jobs.types.smartpoolstree.restripe_goal=retune

b) Then run against the target path

# isi job start SmartPoolsTree --paths=/ifs/path/to/data

5. Alternatively, if SmartPools is unlicensed or using the Job Engine is undesirable, the ‘isi set’ CLI command offers a more direct approach:

# isi set -r -g retune /ifs/path/to/data

Note the case sensitivity of the ‘isi set’ ‘-r’ and ‘-R’ flags: The lowercase ‘-r’ flag forces an immediate restripe of the file using the specified goal (-g retune), effectively decompressing and re-laying-out the data without going through the job engine. The uppercase ‘-R’ flag applies the goal recursively to an entire directory tree, which can be an extremely time-consuming operation with little visibility or control over speed and duration. Scripting ‘isi set -r -g retune’ to iterate over a list of hot files or a specific subdirectory is typically preferred, providing improved control. Because this operation trades background I/O and capacity (by giving back compression savings), it’s best targeted at specific uncompressible or otherwise hot datasets rather than an entire archive tier.
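
As a sketch of the scripted approach, the following Python wrapper iterates over a list of files and issues the ‘isi set -r -g retune’ command from above for each one. The file paths are hypothetical placeholders, and the script defaults to a dry run that only prints the commands so they can be reviewed first.

```python
import shlex
import subprocess


def build_retune_commands(paths):
    """Return one 'isi set -r -g retune <path>' command per file."""
    return [["isi", "set", "-r", "-g", "retune", path] for path in paths]


def retune(paths, dry_run=True):
    """Print the commands (dry run) or execute them one file at a time."""
    for command in build_retune_commands(paths):
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)  # stop on the first failure


# Hypothetical list of hot files; replace with real paths before use.
hot_files = ["/ifs/path/to/data/hot1.bin", "/ifs/path/to/data/hot 2.bin"]
retune(hot_files)
```

Running the files sequentially keeps the restripe load bounded, and the list can be split into batches to be run during quieter periods.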

Inline data reduction in OneFS can be highly beneficial for the appropriate workloads, but its underlying on-disk behavior can lead to unexpected outcomes for those unfamiliar with its operation. Key takeaways include:

Incompressible data still incurs a compression cost when compression is enabled: CPU resources are consumed and write-path latency increases even when no actual space savings occur. In some cases, this overhead can be more impactful than with compressible data. When data compresses or deduplicates effectively, reduced I/O often offsets CPU usage and can even improve performance. With incompressible data, however, there are no such benefits—only added overhead.
Both compressed and incompressible regions are subject to read-decompress-modify-write (R-DM-W) overhead on every partial overwrite, regardless of the current compression setting.
Turning off compression benefits new writes right away, but it does not change the layout of data that was already written in a compressed format.
Data originally written as incompressible continues to experience 128 KB partial-chunk overhead during partial overwrites until those areas are either fully overwritten or the entire file is replaced.
A solid understanding of how OneFS implements data reduction—and a clear view of your workload characteristics before enabling it—helps avoid these pitfalls entirely.

As with many things in life, compression usage is a cost-benefit conundrum. Once the underlying mechanics are understood, their behavior is consistent and predictable. The key is to evaluate, measure, and quantify, and apply the knowledge and findings proactively, rather than reacting after performance issues arise in production.

OneFS S3 Multi Part Upload Completion Status Tracking

The previous article in this series provided an overview of the S3 multipart upload (MPU) functionality in PowerScale OneFS. Now, we’ll turn our attention to the extended MPU functionality introduced in OneFS 9.13, which allows PowerScale S3 users to track the status and progress of an MPU completion in real time.

MPU Completion is an existing standard S3 API that has been extended to return three additional response headers, Completion‑Id, Fast‑Path, and Completion‑State‑Info, when MPU completion progress tracking is enabled. MPU Completion Progress is a newly added extended S3 API that reports real‑time progress information in the response, including whether the fast path is used, the total number of parts and recovery parts, and the counts of completed and recovered parts.

Disabled by default, MPU completion progress activation on a PowerScale cluster is controlled by the boolean ‘MPUCompletionProgressEnable’ gconfig parameter, which must be set to value ‘1’, followed by a restart of the S3 service, for it to take effect. The CLI syntax for this procedure is as follows:

# isi_gconfig registry.Services.lwio.Parameters.Drivers.s3.MPUCompletionProgressEnable=1

# isi services s3 disable

# isi services s3 enable

Starting with OneFS 9.13, multipart upload (MPU) completion can follow either a fast or a slow path, depending on part integrity and ordering. Fast‑path completion occurs when all of the following conditions are met:

All parts are the same size, except possibly the last.
Parts are retransmitted only in the event of failure, and retain their original sizes.
Every uploaded part is included in the final object.
Part numbers are contiguous, beginning at 1 with no gaps.

When these conditions are satisfied, OneFS performs a highly optimized assembly process. If any condition is violated, OneFS falls back to the slow path, which requires more processing time.
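As a rough illustration, the size and numbering conditions above can be checked on the client side before an upload is completed. The following is a minimal sketch rather than OneFS's internal logic: the function name and the part-number-to-size mapping are hypothetical, and it cannot detect retransmissions or parts omitted from the completion request.

```python
from typing import Dict, Tuple


def fast_path_candidate(parts: Dict[int, int]) -> Tuple[bool, str]:
    """Check planned parts against the size and numbering conditions.

    `parts` maps part number -> part size in bytes (hypothetical input shape).
    Returns (eligible, reason).
    """
    if not parts:
        return False, "no parts"
    numbers = sorted(parts)
    if numbers[0] != 1:
        return False, "part numbers do not start at 1"
    if numbers != list(range(1, len(numbers) + 1)):
        return False, "part numbers are not contiguous"
    first_size = parts[1]
    # Every part except the last must match the size of the first part.
    for n in numbers[:-1]:
        if parts[n] != first_size:
            return False, f"part {n} differs in size from part 1"
    return True, "meets the size and numbering conditions"


MIB = 1024 * 1024
print(fast_path_candidate({1: 64 * MIB, 2: 64 * MIB, 3: 10 * MIB}))
print(fast_path_candidate({1: 64 * MIB, 3: 64 * MIB}))
```

A check like this is mostly useful for catching a gap in part numbering or an uneven part size before the completion request is sent, since either would push the operation onto the slow path.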

OneFS 9.13 and later also include the following completion progress API endpoint:

GET /{bucket}/{object-key}?completionId=<completionId>&uploadId=<uploadId>

When progress tracking is enabled, the ‘CompleteMultipartUpload’ response includes a completion ID. This is especially valuable for very large uploads, where completion can take a noticeable amount of time. With it, clients can determine whether the operation is using the fast or slow path, view the total number of parts involved, and retrieve real‑time completion metrics through the ‘Completion‑State‑Info’ data.
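To make the endpoint format more concrete, here is a small sketch that builds the progress request URL from its components. The query parameter names come from the endpoint shown above, while the helper name, host, port, and ID values are placeholders for illustration. A real request would normally also need to be signed like any other S3 call, which is omitted here.

```python
from urllib.parse import quote, urlencode


def mpu_progress_url(endpoint: str, bucket: str, key: str,
                     completion_id: str, upload_id: str) -> str:
    """Build the GET URL for an MPU completion progress request.

    `endpoint` (scheme, host, port) is a placeholder supplied by the caller.
    """
    # Percent-encode the bucket and key, keeping '/' separators inside the key.
    path = f"/{quote(bucket, safe='')}/{quote(key, safe='/')}"
    query = urlencode({"completionId": completion_id, "uploadId": upload_id})
    return f"{endpoint.rstrip('/')}{path}?{query}"


# Illustrative values only.
print(mpu_progress_url("https://cluster.example.com", "bkt", "dir/big file.bin",
                       "82d4597e-cb87-8a21-2087-a8c6163632b1", "1_1000000038001_1"))
```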

The components marked in green in the following diagram are introduced in OneFS 9.13, with ‘CompleteMultipartUpload’ used to assemble all the uploaded parts, and ‘GetMPUCompletionProgress’ to check the progress.

Component	Action
S3 Protocol Head	S3 Protocol Head processes requests from S3 clients.
Likewise Iomgr	Likewise Iomgr provides IO APIs. Both MPU requests need to call the Iomgr in order to read and write progress status file(s).
FS Layer	File system layer performs IO requests from upper layers.
Gconfig	Gconfig is in charge of storing user configuration to enable or disable this feature.
Other	The other components are mainly responsible for processing the two requests from S3 clients.

The following table details the various completion states:

Scenario	Message	Progress
Super fast path	Super fast path with parts 1..N sequential, will finish quickly and completionID will be useless.	No need to check
Fast path	Fast path with parts 1..N sequential, but x parts need to be recovered. May fallback to slow path in case of errors.	To check recovery progress
Slow path: only one part	Not a fast path due to only one part in total and this part needs to be recovered.	To check slow path progress
Slow path: part number not contiguous	Not a fast path due to non-contiguous part numbers. Please verify intent, or abort and restart the session.	To check slow path progress
Slow path: part number not from 1	Not a fast path due to part numbers not starting from 1. Please verify intent, or abort and restart the session.	To check slow path progress
Slow path: different part size	Not a fast path due to the size of non-last parts differs from the 1st part size. Please verify intent, or abort and restart the session.	To check slow path progress

For example, the following shows the HTTP 200 response headers from a fast path MPU operation on a large file:

Pertinent information returned includes:

Attribute	Example Output
completion-state-info:	Fast path with parts 1..N sequential, but 10 parts need to be recovered. May fallback to slow path in case of errors.
completion-id:	82d4597e-cb87-8a21-2087-a8c6163632b1
fast-path:	true
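On the client side, these three headers can be pulled out of the completion response and kept for later progress queries. The sketch below assumes the headers arrive as a plain name-to-value mapping and that the fast-path flag is the text ‘true’ or ‘false’, as in the example above. The type and function names are hypothetical.

```python
from typing import Mapping, NamedTuple, Optional


class CompletionInfo(NamedTuple):
    completion_id: Optional[str]
    fast_path: Optional[bool]
    state_info: Optional[str]


def parse_completion_headers(headers: Mapping[str, str]) -> CompletionInfo:
    """Extract the three extended headers from a CompleteMultipartUpload response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    raw_fast = lowered.get("fast-path")
    # Assumption: the flag is rendered as the text "true" or "false".
    fast = None if raw_fast is None else raw_fast.lower() == "true"
    return CompletionInfo(
        completion_id=lowered.get("completion-id"),
        fast_path=fast,
        state_info=lowered.get("completion-state-info"),
    )
```

If progress tracking is disabled, the headers are absent and every field comes back as None, so callers can use that to decide whether polling is possible at all.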

For non‑MPU objects, OneFS can compute MD5-based ETags when configured (use Md5 for Etag and Validate Content Md5).
Note that objects created via multipart upload do not use a simple MD5 hash as the ETag.
As with AWS S3, MPU ETags may be composite values, and applications requiring MD5 semantics must account for this.
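For applications that depend on MD5 semantics, one defensive approach is to check whether an ETag even has the shape of a plain MD5 digest before comparing it against a locally computed hash. The sketch below does only that shape check. It makes no assumption about the exact format of OneFS composite ETags, which is not described here, and the function name is hypothetical.

```python
import re

# A plain MD5 digest is exactly 32 hexadecimal characters.
_PLAIN_MD5 = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_plain_md5(etag: str) -> bool:
    """Return True if the ETag is exactly 32 hex digits once quotes are stripped."""
    return _PLAIN_MD5.fullmatch(etag.strip().strip('"')) is not None


print(looks_like_plain_md5('"d41d8cd98f00b204e9800998ecf8427e"'))
print(looks_like_plain_md5('"d41d8cd98f00b204e9800998ecf8427e-3"'))
```

An ETag that fails this check should not be treated as a content MD5, and the application would need another way to validate integrity.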

When it comes to investigating and troubleshooting S3 MPU issues, the following error response codes can be helpful:

Status	Description	Troubleshooting
400 Bad Request	Unexpected upload Id; Unexpected completion Id format	Check that the upload ID is correct. Check that the completion ID is correct and not blank (it is returned in the CompleteMPU response header).
403 Forbidden	No correct permission.	Only the bucket owner, the MPU completion initiator, or users who have been granted READ or FULL_CONTROL permissions on the bucket can perform this operation.
404 NoSuchCompletionId	Failed to find MPU complete progress.	Check whether the completion has already finished. Check that the completion ID is correct (it is returned in the CompleteMPU response header).

It’s also worth noting that multipart upload is distinct from SigV4 streaming (chunked upload, available in OneFS 9.3 and later), which controls how payloads are signed and transmitted rather than how objects are partitioned. SigV4 chunked transfer encoding can be used with or without MPU.

Comparison:

Characteristic	Multipart Upload	SigV4 Chunked Upload
Purpose	Breaks object into logical parts	Breaks HTTP payload into signed chunks
Identifiers	uploadId, part numbers	SigV4 per‑chunk signatures
Relationship	Independent mechanisms	Can be used together

So, in summary, OneFS S3 MPU follows the same client-facing operation and semantics as AWS S3:

CreateMultipartUpload → UploadPart / UploadPartCopy → CompleteMultipartUpload / AbortMultipartUpload, plus listing operations.

Internally, OneFS stores multipart upload segments in ‘.isi_s3_parts_<uploadId>’, assembles those parts into the final object upon completion, and removes the part files after either completion or abort. OneFS 9.13 and later add MPU enhancements such as fast‑ or slow‑path determination for completion and real‑time progress monitoring via ‘GetMPUCompletionProgress’, including response headers like ‘Completion-Id’ and ‘Fast-Path’.

OneFS S3 Multipart Upload

Within the ubiquitous AWS S3 protocol spec, multipart upload (MPU) allows a large object to be transferred more efficiently by splitting it into smaller parts that are uploaded independently. It is used to improve reliability and performance by enabling parallel uploads and retrying only failed parts instead of restarting the entire operation. MPU is typically employed for objects larger than 5 GB, and supports objects up to 5 TB in size and 10,000 parts. The process involves initiating a multipart upload to obtain an upload ID, uploading individual parts (which can occur in any order), and completing the upload so the service assembles the parts into a single object, with the option to abort the upload to discard any uploaded parts.

The PowerScale S3 protocol implementation has supported Multipart Upload (MPU) since OneFS 9.0, leveraging the ‘HTTP 100-continue’ header during upload initiation. MPU enables OneFS to ingest or copy large objects in discrete sections, which improves performance, resilience, and workflow flexibility.

Using MPU provides several advantages, including increased throughput by allowing multiple parts to be uploaded in parallel, reduced recovery time because only failed parts must be retransmitted after a network interruption, and the ability to pause and resume uploads over extended periods. There is no automatic expiration for an MPU, so it must be explicitly completed or aborted by the client. MPU also enables upload workflows in which the final object size is not yet known, allowing applications to begin transmitting data as it is generated.

When operating over a stable, high‑bandwidth network, multipart upload maximizes bandwidth utilization by distributing parts across parallel upload threads. On less reliable networks, MPU improves resilience by isolating network failures to individual parts, avoiding the need to restart the entire upload operation. OneFS S3 Multipart Upload allows clients to transfer large objects as a series of independent parts that are later combined into a final object. To support this workflow, OneFS implements the full set of standard S3 MPU operations, including:

Operation	Definition
CreateMultipartUpload	Initiates a new multipart upload and returns an ‘uploadId’ that uniquely identifies the MPU session. The client must reference this ‘uploadId’ for all subsequent part upload and completion operations.
UploadPart	Uploads a single part of the object. The client specifies a ‘part number’ (1–10,000) and the ‘uploadId’. Each part is stored independently until the MPU is completed or aborted.
UploadPartCopy	Creates a part by copying a range of bytes from an existing object instead of sending new data. The resulting copied part becomes part of the MPU associated with the specified ‘uploadId’ and ‘part number’.
ListParts	Returns metadata for the parts that have already been uploaded for a given MPU. Useful for resuming interrupted uploads or verifying which parts have been received.
CompleteMultipartUpload	Finalizes the MPU. The client submits an ordered list of ‘part numbers’ and associated ‘ETags’. The service assembles the parts into the final object and removes temporary part storage.
AbortMultipartUpload	Cancels an in‑progress MPU and discards all previously uploaded parts associated with the ‘uploadId’. After aborting, the MPU cannot be resumed.
ListMultipartUploads	Returns a list of all in‑progress multipart uploads within a bucket. Useful for monitoring active sessions or identifying abandoned uploads.

OneFS S3 MPU also adheres to the standard S3 limits which include the following:

Item	Limit
Maximum number of multipart uploads returned in a list multipart uploads request	1000
Maximum number of parts per upload	10,000
Maximum number of parts returned for a list parts request	1000
Maximum object size	5 TiB
Part numbers	1 to 10,000 (inclusive)
Part size	5 MB to 5 GB. There is no minimum size limit on the last part of a multipart upload.
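These limits interact: a small part size on a very large object can exceed the 10,000-part ceiling. The sketch below works through that arithmetic. It assumes the ‘MB’ and ‘GB’ figures in the table are binary units (MiB and GiB), which is an interpretation rather than something stated above, and the function names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART = 5 * MIB     # assumption: "5 MB" read as 5 MiB
MAX_PART = 5 * GIB     # assumption: "5 GB" read as 5 GiB
MAX_PARTS = 10_000
MAX_OBJECT = 5 * TIB


def plan_parts(object_size: int, part_size: int) -> int:
    """Return the number of parts needed, or raise ValueError if a limit is exceeded."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    count = (object_size + part_size - 1) // part_size  # integer ceiling division
    if count > MAX_PARTS:
        raise ValueError(f"{count} parts needed, limit is {MAX_PARTS}")
    return count


def smallest_part_size(object_size: int) -> int:
    """Smallest part size, in whole MiB, that keeps the object within 10,000 parts."""
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    whole_mib = (needed + MIB - 1) // MIB
    return max(MIN_PART, whole_mib * MIB)


print(plan_parts(1 * TIB, 128 * MIB))
print(smallest_part_size(5 * TIB) // MIB)
```

Under these assumptions, a 1 TiB object with 128 MiB parts needs 8,192 parts, while the same object with 64 MiB parts would need 16,384 and is rejected.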

Under the hood, OneFS S3 MPU operates as follows:

Component	Action
S3 Protocol Head	S3 Protocol Head processes requests from S3 clients.
Likewise Iomgr	Likewise Iomgr provides IO APIs.
FS Layer	File system layer performs IO requests from upper layers.
Gconfig	Gconfig is in charge of storing user configuration parameters.
Other	The other components are mainly responsible for processing requests from S3 clients.

When an S3 multipart upload is initiated, OneFS creates a hidden ‘dot’ directory to store uploaded parts. The naming convention for this hidden directory is as follows:

.isi_s3_parts_<uploadId>

The hidden directory is placed under the bucket’s backing directory within the /ifs filesystem. For example:

# ls -lh .isi_s3_parts_1_1000000038001_1
total 276961
-rwx------ +   1 root  wheel   595M May 29 07:20 #31214989
-rwx------ +   1 root  wheel   1.0G May 29 07:27 #52428800
-rwx------ +   1 root  wheel     0B May 29 07:15 .1
-rwx------ +   1 root  wheel     0B May 29 07:27 .2
-rwx------ +   1 root  wheel     0B May 29 07:20 .3
-rwx------ +   1 root  wheel    50M May 29 07:37 1

Each uploaded part is saved as an individual ‘dot’ file within this directory and is keyed by its part number. During UploadPart or UploadPartCopy, the part is written to .isi_s3_parts_<uploadId>, associated with its part number (1–10,000), and the ‘uploadId’ returned by ‘CreateMultipartUpload’. Parts remain in this directory until the client completes the MPU, at which point they are assembled into the final object, or the MPU is aborted, which removes the part files and releases the associated space.

From the S3 client’s perspective, the MPU workflow operates as follows:

Action	HTTP Request	Details
Initiate MPU	POST /bucket/object-key?uploads	OneFS returns an uploadId and creates .isi_s3_parts_<uploadId> internally.
Upload Parts	PUT /bucket/object-key?partNumber=N&uploadId=<uploadId>	Each part is written as a file inside the corresponding parts directory.
Optional Operations	GET /bucket/object-key?uploadId=<uploadId> GET /bucket?uploads	ListParts and/or ListMultipartUploads
Complete MPU	POST /bucket/object-key?uploadId=<uploadId>	The client provides an XML list of part numbers and ETags. OneFS assembles the final object and removes the .isi_s3_parts_<uploadId> directory and its contents.
Abort MPU	DELETE /bucket/object-key?uploadId=<uploadId>	OneFS deletes the stored parts and frees the associated space.
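The XML list sent in the Complete MPU step follows the standard S3 CompleteMultipartUpload body layout. As a minimal sketch, the following builds that body from the ETags a client collected during its UploadPart calls. The function name and input shape are hypothetical, and the ETag values in the example are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def complete_mpu_body(etags: Dict[int, str]) -> bytes:
    """Build a CompleteMultipartUpload request body from {part number: ETag}."""
    root = ET.Element("CompleteMultipartUpload")
    # Parts are listed in ascending part-number order.
    for number in sorted(etags):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etags[number]
    return ET.tostring(root, encoding="utf-8")


# Placeholder ETags, as returned by earlier UploadPart responses.
print(complete_mpu_body({2: '"etag-of-part-2"', 1: '"etag-of-part-1"'}).decode("utf-8"))
```

In practice, an S3 SDK or CLI typically generates this body automatically. The sketch is only meant to show what the client actually submits.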

In the next article in this series, we’ll take a look at the MPU status tracking and reporting functionality that was introduced in OneFS 9.13.

PowerScale InsightIQ 6.3 Features – Part 2

In this final article in the InsightIQ 6.3 series, we’ll dig into the details of the additional functionality that debuts in this new IIQ release. This includes:

Support for monitoring virtual clusters deployed on AWS or Azure, allowing InsightIQ to monitor environments regardless of where the application itself is hosted.
Increased performance visibility for file and object workloads with support for granular protocol operations, enabling metrics to be analyzed and broken down by individual file and/or object actions to streamline troubleshooting of protocol-related issues.
Enhanced filtering capabilities, allowing multiple values per category, such as IP addresses, hosts, nodes, and protocols, making it easier to compare performance across multiple entities within the same time range.
Strengthened security and operational integration with Single Sign-On (SSO) support using SAML-based authentication through Microsoft ADFS or Azure Entra ID.
Direct, in-place upgrades from versions 6.1 and 6.2, simplifying the upgrade process for existing Scale and Simple deployments.

Granular Protocol Operations Breakouts

InsightIQ 6.3 introduces enhanced visibility into granular protocol-level operations through the addition of a new breakout for protocol operations. This capability is now available across all performance graphs that support operation class breakouts and includes detailed operation name breakouts for actions across both file and object, such as ‘get bucket’, ‘get object’, ‘get bucket ACL’, and related S3 operations. This is equivalent to the set of operation statistics provided by the following OneFS command:

# isi statistics pstat list --protocol s3

With this enhancement, users can navigate directly to performance graphs and select operation name breakouts to analyze workload behavior at a granular level. This enables identification of specific operations contributing to elevated latency or bandwidth consumption, as well as determining which operations occur most frequently. Such insights can inform operational decisions, including selectively throttling specific operations at the PowerScale layer when required.

Previously, supported graphs provided protocol-level and operation class-level breakouts. InsightIQ 6.3 extends this functionality by adding operation name-level (Op Name) visibility, allowing users to see the exact operations being executed while maintaining consistency with existing views.

These operation name breakouts are also supported within cluster performance reports, enabling the same level of analysis in both interactive graphs and generated reports.

With IIQ 6.3, InsightIQ alerts can also be configured using operation name (Op Name) filters.

When defining alert rules, users may apply filters based on protocol, operation class, or specific operation names. For example, if an environment experiences a high frequency of access to a particular bucket or object, an alert can be configured specifically for the ‘get bucket’ operation to proactively notify administrators of anomalous or excessive activity.

This added granularity provides customers with significantly improved transparency into storage workloads. By exposing detailed operational metrics, including frequency, latency, and bandwidth consumption, cluster admins can more effectively identify performance bottlenecks, understand the root causes of slowdowns, and correlate workload behavior to observed performance impacts within the PowerScale cluster.

Operation names function as a subset of operation classes, which themselves are a subset of protocol performance data. This hierarchical relationship allows users to combine filters and breakouts to progressively refine analysis. For example, to analyze NFS workloads, a user may apply an NFS protocol filter and review operation class breakouts to determine whether read or write operations dominate performance time. Each operation class can then be further decomposed into individual operation names—such as specific read or object access operations—to gain deeper insight into workload behavior.

By combining protocol filters, operation class breakouts, and operation name breakouts, users can construct a highly detailed performance view that pinpoints which operations, protocols, or workload patterns contribute most significantly to latency or resource utilization.

As with existing protocol and operation class breakouts, operation name filters cannot be used in conjunction with interface-level filters. Additionally, operation name filtering is not supported with client-level filters due to limitations in OneFS telemetry data. Consequently, the system cannot report which specific client is responsible for a given operation, such as identifying which client initiated a particular ‘get object’ request.

The data presented through InsightIQ aligns with existing PowerScale CLI capabilities, such as output from the ‘isi statistics pstat list --protocol <protocol>’ command. However, while the CLI provides operation rates, InsightIQ extends this by presenting operation rates alongside bandwidth and latency metrics within a unified visualization. This delivers a more comprehensive and actionable view of protocol-level performance than was previously available through CLI data alone.

Multi-Value Breakouts

InsightIQ 6.3 introduces support for multi‑value selection within a single filter, enabling users to analyze multiple data sources simultaneously within a unified view. This enhancement allows multi‑line visualizations and aggregated insights to be presented together, simplifying side‑by‑side comparisons without requiring users to switch between views.

Multi‑value filter selection is supported for the following filter types: protocol, client node, node pool, and tier. For example:

When multi‑value filtering is enabled, the ‘Breakout By’ option is automatically disabled, and the heat map view is hidden. Both features are restored when the user switches back to single‑value filter mode.

In table‑based reports, column‑level filter icons are also hidden while multi‑value filtering is active and reappear when the user reverts to single‑value selection. Download functionality supports both aggregated and multi‑value data, ensuring consistency between the UI and exported results.

When selecting multiple filter values, InsightIQ displays up to five selection ‘pills’, followed by a ‘More’ option. Selecting ‘More’ opens a pop‑up displaying all selected values, where individual entries can be removed using the corresponding remove icon. If more than five values are selected, the ‘Show Multiline Graph’ option becomes unavailable, as this feature supports only two to five filter values. Additionally, InsightIQ enforces a constraint allowing multi‑value selection on only one filter at a time; other filters must remain single‑select.
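The selection rules described above can be summarized as a small model. This is purely an illustration of the documented behavior, not InsightIQ code: the function name, the dictionary-based input, and the returned keys are all hypothetical.

```python
from typing import Dict, List

MAX_PILLS = 5                    # values shown before the 'More' option
MULTILINE_RANGE = range(2, 6)    # 'Show Multiline Graph' supports 2 to 5 values


def filter_view(selections: Dict[str, List[str]]) -> Dict[str, object]:
    """Summarize how a set of filter selections would be presented.

    `selections` maps a filter name to its selected values (hypothetical shape).
    """
    multi = [name for name, values in selections.items() if len(values) > 1]
    if len(multi) > 1:
        raise ValueError("only one filter may hold multiple values at a time")
    values = selections[multi[0]] if multi else []
    return {
        "multi_value": bool(multi),
        "pills": values[:MAX_PILLS],
        "more": len(values) > MAX_PILLS,
        "multiline_available": len(values) in MULTILINE_RANGE,
    }


print(filter_view({"node": ["1", "2", "3"], "protocol": ["nfs3"]}))
```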

Once the filter is applied in an aggregated view, the chart presents combined metrics on a single line, with the ‘Breakout By’ option disabled and the heat map hidden.

When exporting data from this aggregated view, the resulting CSV includes a column representing the aggregate of the selected filter values, ensuring alignment between the displayed visualization and exported data.

In the multi‑line scenario, users may select between two and five values (e.g., multiple nodes) and enable the ‘Show Multiline Graph’ option. The resulting visualization renders a separate line for each selected value. In this mode, while the multi‑line display is preserved in the chart, the ‘Show Multiline Graph’ setting is not retained when saving filters or exporting CSV data. The exported file contains separate columns for each selected filter value, including corresponding minimum and maximum metrics, facilitating side‑by‑side comparison and offline analysis.

When viewing reports that include tabular data, such as the ‘Client Performance’ report, column‑level filter controls for attributes like address, node, and node protocol are hidden while multi‑value filtering is active. These controls are restored once the multi‑value filter is removed, allowing single‑value filtering directly from the table.

In reports where the selected filter is already part of the ‘Breakout By’ configuration (such as ‘Filesystem Cache Performance’), attempting to apply multi‑value filtering results in a notification indicating that the graph does not support this mode. This behavior is expected, as these visualizations already present multi‑line data. However, if multi‑value filtering is applied to a filter that is not used in the breakout configuration, the multi‑line chart remains available and functions as expected.

In summary, InsightIQ 6.3 preserves the existing behavior for single‑value filtering while introducing multi‑value filtering capabilities that support both aggregated analysis and multi‑line comparisons. These enhancements provide increased analytical flexibility while maintaining consistent behavior across visualizations, reports, and exported data.

Virtual Cluster Support

InsightIQ 6.3 introduces support for monitoring virtual OneFS clusters deployed on public cloud platforms such as AWS and Azure. Historically, InsightIQ monitoring capabilities have been focused on physical PowerScale clusters. However, with the increasing adoption of cloud‑hosted virtual OneFS deployments, extending InsightIQ support to these environments has become essential.

Virtual OneFS clusters differ from physical PowerScale clusters primarily in their licensing model. While PowerScale clusters require feature‑specific licenses—such as SmartQuotas, SmartDedupe, or SmartLock—virtual OneFS clusters rely solely on a OneFS capacity license. This capacity license enables all supported OneFS features without the need for additional feature‑specific licenses.

In a physical PowerScale cluster, licensing information typically reflects multiple dynamically applied, feature‑specific licenses. By contrast, virtual OneFS clusters hosted in AWS or Azure display only the OneFS capacity license, which implicitly covers all supported features. InsightIQ 6.3 now fully understands and accounts for these licensing differences, ensuring accurate license interpretation, proper cluster type detection, and correct enablement of feature‑dependent reporting for cloud‑hosted virtual OneFS clusters.

As a result, InsightIQ now provides expanded monitoring support for customers deploying virtual OneFS clusters in public cloud environments. This enhancement ensures parity in monitoring functionality between on‑premises PowerScale clusters and cloud‑hosted virtual clusters.

Physical OneFS cluster	Virtual AWS/Azure based OneFS clusters
Feature-specific dynamic licensing (SmartQuotas, SmartDedupe, etc.)	OneFS capacity license only; no separate feature licenses

To illustrate this capability, consider a comparison of two clusters added to an InsightIQ instance. The first cluster is a standard physical PowerScale deployment. Cluster metadata obtained through CLI commands indicates that it comprises three nodes and is identified as a non‑virtual cluster. Examination of the license information shows multiple feature‑specific licenses, including SmartQuotas, SmartDedupe, and SmartLock. Certain InsightIQ reports—such as quota‑related reports—require a valid feature license to be enabled. In this case, the SmartQuotas license is active, allowing quota reports to be displayed.

The second cluster is a virtual OneFS deployment hosted in the cloud. Cluster metadata identifies it as a virtual cluster consisting of four nodes. License information for this cluster shows only the OneFS capacity license. Despite the absence of individual feature licenses, the capacity license enables full access to all supported OneFS capabilities.

Both clusters can be added to InsightIQ using the same workflow, including credential configuration and cluster registration. Once added, InsightIQ correctly interprets the licensing model for each cluster type. For example, when viewing quota reports, InsightIQ displays the reports for the physical PowerScale cluster based on the presence of a valid SmartQuotas license. When switching to the virtual OneFS cluster, the same quota reports remain available, as the OneFS capacity license inherently enables this functionality.
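The licensing decision described here boils down to a simple rule, sketched below. This is a conceptual model only, not InsightIQ's implementation: the class, the function, and the ‘OneFS’ label used for the capacity license are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Cluster:
    name: str
    virtual: bool
    licenses: Set[str] = field(default_factory=set)


def report_enabled(cluster: Cluster, required_feature: str) -> bool:
    """Decide whether a feature-dependent report should be shown."""
    if cluster.virtual:
        # Virtual clusters: the capacity license covers all supported features.
        return "OneFS" in cluster.licenses  # hypothetical license label
    # Physical clusters: the specific feature license must be present.
    return required_feature in cluster.licenses


physical = Cluster("onprem", False, {"SmartQuotas", "SmartDedupe", "SmartLock"})
virtual = Cluster("cloud", True, {"OneFS"})
print(report_enabled(physical, "SmartQuotas"), report_enabled(virtual, "SmartQuotas"))
```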

With this enhancement, InsightIQ 6.3 ensures that reporting behavior remains consistent across physical and virtual deployments, regardless of underlying licensing differences. This capability significantly expands InsightIQ’s monitoring coverage, enabling comprehensive observability for both on‑premises PowerScale clusters and cloud‑hosted virtual OneFS clusters running on AWS or Azure.

SSO Support

InsightIQ 6.3 introduces support for Microsoft Active Directory Federation Services (ADFS) as a new identity provider, enabling Single Sign-On (SSO) for centralized authentication and simplified access management.

In this SSO architecture, InsightIQ functions as a Service Provider (SP) and is provisioned with the required identity claims during deployment. The platform supports full lifecycle management of identity providers, allowing administrators to create, update, delete, and retrieve IdP configurations, upload ADFS federation metadata, perform test connections to validate the integration, and enable or disable the IdP through the access control interface. InsightIQ maintains a consolidated view of all provisioned identity providers along with their operational status. When at least one identity provider is enabled, an SSO login option is automatically displayed on the InsightIQ home page.

The Launch OneFS workflow has been enhanced to support SSO-based access. When a user authenticates to InsightIQ using SSO and the same SSO configuration is present on the target PowerScale cluster, selecting the ‘Launch OneFS’ option opens the PowerScale dashboard directly without additional authentication prompts.

If SSO is not configured on the target cluster, the user is redirected to the PowerScale login page. This behavior change applies only to SSO-based authentication and does not affect existing local, Active Directory, or LDAP login mechanisms.

InsightIQ access control remains group-based and relies on Active Directory group membership. Active Directory administrators are responsible for assigning users from the same or trusted forests to the appropriate groups to grant InsightIQ access. The ADFS administrator must configure the identity provider to integrate correctly with an internal or external LDAP or Active Directory server so that accurate group membership information can be included in authentication claims and passed to InsightIQ for authorization decisions.

Several prerequisites must be met to enable SSO with ADFS in InsightIQ 6.3. InsightIQ version 6.3 must be installed on a supported Simple or Scale system, and the End User License Agreement must be accepted. An LDAP or Active Directory authentication provider must be configured and enabled in InsightIQ with appropriate group and role mappings defined. Windows Active Directory and DNS infrastructure must be properly configured and operational. Additionally, ADFS must be configured and synchronized with the same LDAP or Active Directory service used by InsightIQ to ensure consistent user and group resolution. Any mismatch in directory configuration between InsightIQ and ADFS can result in SSO authentication failures.

InsightIQ 6.3 also adds Single Sign-On (SSO) support using Azure Entra ID as a new identity provider option, enabling centralized authentication and streamlined access management in PowerScale for Azure Cloud deployments.

In this configuration, InsightIQ functions as a Service Provider (SP) and is provisioned with the required identity claims during deployment. The platform supports full lifecycle management of identity provider configurations, allowing administrators to create, update, delete, and retrieve IdP definitions, upload Azure Entra ID federation metadata, and validate the integration through test connections. Identity providers can be enabled or disabled through the access control interface, and InsightIQ displays all provisioned IdPs along with their current status. When at least one identity provider is enabled, an SSO login option is automatically displayed on the InsightIQ home page. As part of this release, Azure Entra ID is available as a newly introduced IdP type during identity provider configuration.

Several prerequisites must be satisfied to enable SSO integration with Azure Entra ID. InsightIQ version 6.3 must be installed on a supported Simple or Scale system, and the End User License Agreement must be accepted. An LDAP or Active Directory authentication provider must be configured and enabled in InsightIQ, with appropriate group and role mappings defined. Windows Active Directory and DNS infrastructure must be properly configured and operational. Additionally, Azure Entra ID must be configured and synchronized with the same LDAP or Active Directory service that is configured in InsightIQ to ensure accurate user and group synchronization for authentication and authorization.

Partitioned Performance Alignment

InsightIQ has aligned its partition-level performance aggregation logic with PowerScale’s native workload summary calculations. Previously, certain performance graphs in InsightIQ displayed values derived using methods that differed from those used by OneFS CLI tools or the native PowerScale UI, which could result in discrepancies when customers compared InsightIQ metrics with cluster-reported values.

With this update, non-latency metrics, such as IOPS, throughput, CPU reads, and CPU writes, are now computed using the same methodology as OneFS. As a result, InsightIQ metrics closely match those reported directly by the cluster. Latency metrics were already consistent with PowerScale calculations and remain unchanged.

Additionally, a naming update has been introduced in InsightIQ 6.3 to improve clarity. The ‘Workload IOPS’ graph has been renamed to ‘Workload IO Operations’ to more accurately reflect the data represented by the visualization. This change is limited to labeling and does not affect underlying functionality or calculations.

From a support perspective, this enhancement directly addresses previous customer reports regarding inconsistencies between InsightIQ metrics and PowerScale cluster statistics. With the updated aggregation logic, InsightIQ graphs should now closely align with native PowerScale reporting, reducing confusion and improving confidence in performance analysis.

So, in summary, InsightIQ 6.3 offers the following attributes and functionality:

Function	Attribute	Description
Scope	Monitoring scope	Up to 20 clusters and 504 nodes
Ecosystem	OS support	RHEL 8.10, RHEL 9.4, RHEL 10.0, and SLES 15 SP4
Platform	Resources	Reduced CPU, memory, and disk requirements
		Scale option requires just one node
	Size	Smaller package size: OVA package < 5GB
Install and upgrade	Installation	Installation time: < 12 mins
	Migration	Direct migration from 4.x
		Online migration from InsightIQ 6.3 Simple (OVA) to InsightIQ 6.3 Scale
Resilience	Data collection	Resilient data collection – no data loss
OS Support	Simple ecosystem support	InsightIQ Simple 6.3 can be deployed on the following platforms: · VMware virtual machine running ESXi version 8.0U3 or 9.0.1. · VMware Workstation 17 (free version). · OpenStack RHOSP 21 with RHEL 9.6. InsightIQ Simple 6.3 can monitor PowerScale clusters running OneFS versions 9.7 through 9.14.
	Scale ecosystem support	InsightIQ Scale 6.3 can be deployed on Red Hat Enterprise Linux versions 8.10 or 9.4 (English language versions) and SUSE Linux Enterprise Server (SLES) 15 SP4. InsightIQ Scale 6.3 can monitor PowerScale clusters running OneFS versions 9.7 through 9.14.
Upgrade	In-place upgrade from InsightIQ 5.1.x to 6.x	The upgrade script supports in-place upgrades from InsightIQ 5.1.x to 6.x.
Reporting	Maximum and minimum ranges on all reports	All live Performance Reports display a light blue zone that indicates the range of values for a metric within the sample length. The light blue zone is shown regardless of whether any filter is applied. With this enhancement, users can observe trends in values on filtered graphs.
	Graphing and report visualization	Reports are designed to maximize the number of graphs that can appear on each page. · Excess white space is eliminated. · The report parameters section collapses when the report is run. The user can expand it manually. · Graph heights are decreased when possible. · Page scrolling occurs while the collapsed parameters section remains fixed at the top.
User interface	What’s New dialog	All InsightIQ users can view a brief introduction to new functionality in the latest release of InsightIQ. Access the dialog from the banner area of the InsightIQ web application. Click About > What’s New.
	Compact cluster performance view on the Dashboard	The IIQ dashboard provides the following: · Summary information for six clusters in the initial dashboard view, with a sectional scrollbar controlling the view for additional clusters. · A capacity section with its own scrollbar. · A navigation side bar that collapses into space-saving icons. Use the << icon at the bottom of the side bar to collapse it.

PowerScale InsightIQ 6.3 Features

In this second article in the InsightIQ 6.3 series, we’ll dig into the details of the additional functionality that debuts in this new IIQ release.

The upgrade process for the new InsightIQ 6.3 release is largely consistent with that of previous releases, such as InsightIQ 6.2.

The specific deployment options and hardware requirements for installing and running InsightIQ 6.x are as follows:

Attribute	InsightIQ 6.3 Simple	InsightIQ 6.3 Scale
Scalability	Up to 10 clusters or 252 nodes	Up to 20 clusters or 504 nodes
Deployment	On VMware, using OVA template	RHEL, SLES, or Ubuntu with deployment script
Hardware requirements	VMware v15 or higher: · CPU: 8 vCPU · Memory: 16GB · Storage: 1.5TB (thin provisioned); Or 500GB on NFS server datastore	Up to 10 clusters and 252 nodes: · CPU: 8 vCPU or Cores · Memory: 16GB · Storage: 500GB Up to 20 clusters and 504 nodes: · CPU: 12 vCPU or Cores · Memory: 32GB · Storage: 1TB
Networking requirements	1 static IP on the PowerScale cluster’s subnet	1 static IP on the PowerScale cluster’s subnet
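
As a rough illustration of how the table above maps to a deployment choice, the following sketch picks a tier from the number of clusters and nodes to be monitored. The function name 'iiq_tier' is hypothetical and not part of InsightIQ, and the sketch assumes that both the cluster limit and the node limit must be satisfied for a tier to apply.

```shell
#!/bin/sh
# Hypothetical helper (not InsightIQ tooling): choose the smallest tier
# from the table above that covers the environment.
# Assumption: the cluster limit AND the node limit must both hold.
iiq_tier() {
    clusters=$1
    nodes=$2
    if [ "$clusters" -le 10 ] && [ "$nodes" -le 252 ]; then
        echo "Simple (OVA), or Scale at 8 vCPU / 16GB / 500GB"
    elif [ "$clusters" -le 20 ] && [ "$nodes" -le 504 ]; then
        echo "Scale at 12 vCPU / 32GB / 1TB"
    else
        echo "Exceeds InsightIQ 6.3 limits"
        return 1
    fi
}

iiq_tier 8 200     # fits the smaller tier
iiq_tier 15 300    # needs the larger Scale tier
```

Anything beyond 20 clusters or 504 nodes returns a non-zero status, so the function can also be used as a guard in a larger script.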

To initiate the upgrade to 6.3, the system must be running an InsightIQ 6.1 or 6.2 Scale or Simple deployment and have a minimum of 40 GB of available disk space.
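
These two prerequisites can be sanity-checked by hand before starting. The sketch below is not the installer's own pre-check: the function names are hypothetical, the installed version is passed in manually, 40 GB is treated as 40 x 1024 x 1024 KB, and which filesystem the installer actually measures is an assumption (the current directory is used here).

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; names are illustrative only.

# Succeeds only for source versions 6.1.x or 6.2.x.
version_ok() {
    case "$1" in
        6.1|6.1.*|6.2|6.2.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when the available space (in KB) is at least 40 GB.
space_ok() {
    [ "$1" -ge $((40 * 1024 * 1024)) ]
}

# Available KB on the filesystem holding the given path.
# 'df -Pk' prints one POSIX-format line per filesystem; column 4 is "Available".
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example: the version string is supplied by hand (assumption), and the
# current directory stands in for the filesystem the installer will use.
if version_ok "6.2.0" && space_ok "$(avail_kb .)"; then
    echo "prerequisites met"
else
    echo "prerequisites not met"
fi
```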

Once these prerequisites are satisfied, the upgrade process begins by extracting the InsightIQ 6.3 installer package, followed by extraction of the upgrade bundle. The upgrade is then initiated by executing the ‘upgrade-iiq.sh’ script.

Upgrade progress can be monitored using the ‘showupg’ status commands to view upgrade locks and overall status. For more detailed information, including lock details and intermediate steps, administrators can review the insightiq_upgrade.log file.

The InsightIQ upgrade workflow consists of five distinct stages:

During the pre-check stage, the installer verifies the availability of required Docker commands, validates the existing InsightIQ version, checks for sufficient disk space, confirms that all InsightIQ services are running, and ensures operating system compatibility.

In the pre-upgrade stage, the installer verifies acceptance of the EULA and extracts the required InsightIQ images. The currently running InsightIQ services are then stopped, necessary directories are created, and optional containers are updated as needed.

The upgrade stage includes updating resource limits, upgrading add-on services, installing the CIM component, and upgrading the remaining InsightIQ services. The EULA is updated, followed by a final health check to confirm that all InsightIQ services are running correctly.

During the post-upgrade stage, additional steps are performed depending on the source version. For systems upgrading from InsightIQ 6.1, the Docker network is updated first. InsightIQ metadata is then updated.

Finally, the cleanup stage replaces outdated scripts, removes obsolete Docker images, and deletes temporary upgrade and backup directories to complete the upgrade process.

Phase	Details
Pre-check	• Docker command • InsightIQ version check 6.1.0 or 6.2.0 • Free disk space • InsightIQ services status • OS compatibility
Pre-upgrade	• EULA accepted • Extract the IIQ images • Stop IIQ • Create necessary directories • Update optional containers
Upgrade	• Update resource limits • Upgrade add-on services • Upgrade IIQ services • Upgrade EULA • Status check
Post-upgrade	• Update network (if 6.1.0) • Update IIQ metadata
Cleanup	• Replace scripts • Remove old docker images • Remove upgrade and backup folders

Specific steps in the upgrade process are as follows:

Download and uncompress the bundle:

# tar xvf iiq-install-6.3.0.tar.gz

From within the InsightIQ directory, un-tar the upgrade scripts as follows:

# cd InsightIQ

# tar xvf upgrade.tar.gz

Enter the resulting ‘upgrade’ directory which contains the scripts:

# cd upgrade/

Initiate the IIQ upgrade. Note that the usage is the same for both the Simple and Scale InsightIQ deployments.

# ./upgrade-iiq.sh -m <admin_email>

Upon successful upgrade completion, InsightIQ will be accessible via the primary node’s IP address.

Quick and easy upgrade progress checks include:

Check	Command syntax
Check the latest 100 lines of upgrade log	showupg -l or showupg -log
Check the latest 100 lines of upgrade status	showupg -s or showupg -status
Check detailed logs	cat /usr/share/storagemonitoring/logs/upgrade/log/insightiq_upgrade.log
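
When the detailed log is long, it can help to pull out only the lines that look like problems. The following is a minimal sketch using the log path from the table above. The function name 'upgrade_errors' is hypothetical, and the assumption that failures are logged with the words "error" or "fail" is not confirmed by the InsightIQ documentation.

```shell
#!/bin/sh
# Log path as given in the table above.
UPGRADE_LOG=/usr/share/storagemonitoring/logs/upgrade/log/insightiq_upgrade.log

# Print the most recent lines mentioning "error" or "fail" (case-insensitive).
# Arguments: log file (default: $UPGRADE_LOG), number of lines (default: 20).
# Assumption: the log uses these keywords; adjust the pattern as needed.
upgrade_errors() {
    log=${1:-$UPGRADE_LOG}
    count=${2:-20}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -iE 'error|fail' "$log" | tail -n "$count"
}
```

Running 'upgrade_errors' with no arguments checks the default log, while 'upgrade_errors /path/to/copy.log 5' limits the output to the last five matching lines of a copied log.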

AI-based Assistant

InsightIQ 6.3 introduces a new AI‑based Assistant. This intelligent, document‑aware AI companion is designed to help users quickly find answers, understand product capabilities, and troubleshoot issues related to InsightIQ and PowerScale. The assistant draws its responses from supported documentation, including InsightIQ and PowerScale documentation, release notes, and knowledge base articles.

To enable the AI Assistant, several prerequisites must be met. An AI-enabled InsightIQ deployment requires an additional 8 vCPUs or cores and 12 GB of RAM above the general IIQ 6.3 spec. A separate AI Assistant package must also be installed. This package is available in the Download Center and is distinct from the standard InsightIQ Scale and Simple packages.
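
To make the arithmetic concrete, the sketch below adds the AI Assistant overhead to the base CPU and memory figures from the hardware requirements table earlier in this article. The function name 'ai_sizing' is hypothetical and purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: add the AI Assistant overhead (8 vCPU, 12 GB RAM)
# to a base InsightIQ 6.3 sizing taken from the hardware requirements table.
ai_sizing() {
    base_cpu=$1
    base_mem_gb=$2
    echo "$((base_cpu + 8)) vCPU, $((base_mem_gb + 12)) GB RAM"
}

ai_sizing 8 16     # Simple, or Scale up to 10 clusters -> 16 vCPU, 28 GB RAM
ai_sizing 12 32    # Scale up to 20 clusters            -> 20 vCPU, 44 GB RAM
```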

Note that this feature is not available in the Greater China region due to legal and regulatory restrictions, as it relies on AI models that are not permitted in that geography. Consequently, the option to enable the AI Assistant will not appear if the system is configured for the China region.

To activate the AI Assistant, users must first download the AI Assistant tar package (iiq-ai.tar.gz) from the Download Center and run the AI Assistant prerequisite command to install all required dependencies.

Note that the IIQ server resources must be updated to include the additional CPU and memory requirements as described above, after which the AI Assistant option can be enabled.

The AI Assistant prerequisite installation script ‘run-ai-assistant-prereqs’ comprises four main stages:

Stage	Description	Location
1	Push docker images	· Local registry
2	Extract models	· /usr/share/storagemonitoring/common-components/ai_models/models/reranker · /usr/share/storagemonitoring/common-components/ai_models/models/sentence-transformer
3	Extract Llama model	· /usr/share/storagemonitoring/common-components/llm
4	Extract chunk data	· /usr/share/storagemonitoring/common-components/custom_spell_terms.json
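
Once the prerequisite script has finished, the file system locations in the table can be spot-checked. The following is a sketch rather than a supported tool: the function name 'check_ai_prereqs' and its optional prefix argument are hypothetical, and it only covers stages 2 through 4, since stage 1 pushes images to the local registry rather than to a path.

```shell
#!/bin/sh
# Hypothetical post-check: confirm that the locations listed in the table
# above exist. An optional prefix argument allows a dry run against a
# scratch directory instead of the real root filesystem.
check_ai_prereqs() {
    prefix=${1:-}
    base="$prefix/usr/share/storagemonitoring/common-components"
    missing=0
    for path in \
        "$base/ai_models/models/reranker" \
        "$base/ai_models/models/sentence-transformer" \
        "$base/llm" \
        "$base/custom_spell_terms.json"
    do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling 'check_ai_prereqs' with no arguments checks the real paths and returns a non-zero status if any of them is absent.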

IIQ validates that all prerequisites are satisfied before launching the required containers and their respective services, including the Large Language Model (LLM), the vector database, and the InsightIQ AI Assistant controller. Once these services are running, the chatbot becomes available for use.

Next, the AI Assistant can be enabled from the InsightIQ masthead.

A popup window is displayed, reiterating the prerequisites and prompting to ‘Enable AI Assistant’.

At this point, assuming the prerequisites have been met, an AI Assistant button is added to the UI masthead.

Clicking this button opens the AI Assistant chat window with a ‘How can I help you today?’ prompt.

A warning is displayed noting “You are interacting with an AI system, not a human. Responses should be reviewed for accuracy.” A link to Dell’s Privacy Statement is also provided.

At this point, natural language questions can be entered into the text box. For example: “how to enable dedupe in OneFS?”

Upon clicking ‘Send’, the AI system parses the instruction and provides its best response, in this case by providing a four-step procedure for enabling OneFS deduplication, plus supporting documentation references and links.

In the next article in the InsightIQ 6.3 series, we’ll focus on the additional functionality that debuts in this new IIQ release, including:

Support for monitoring virtual clusters deployed on AWS or Azure.
Increased performance visibility for file and object workloads.
Enhanced filtering capabilities with multiple values per category.
Single Sign-On (SSO) support via Microsoft ADFS or Azure Entra ID.
Direct, in-place upgrades from versions 6.1 and 6.2.