PowerScale OneFS 9.12

Dell PowerScale is already powering up the summer with the launch of the innovative OneFS 9.12 release, which shipped today (14th August 2025). This new 9.12 release has something for everyone, introducing PowerScale innovations in security, serviceability, reliability, protocols, and ease of use.

OneFS 9.12 represents the latest version of PowerScale’s common software platform for on-premises and cloud deployments. This makes it an excellent choice for traditional file shares and home directories; vertical workloads such as M&E, healthcare, life sciences, and financial services; plus generative and agentic AI and other ML/DL and analytics applications.

PowerScale’s scale-out architecture can be deployed on-site, in co-lo facilities, or as customer-managed PowerScale for Amazon AWS and Microsoft Azure deployments, providing core to edge to cloud flexibility, plus the scale and performance needed to run a variety of unstructured workflows on-prem or in the public cloud.

With data security, detection, and monitoring being paramount in this era of unprecedented cyber threats, OneFS 9.12 brings an array of new features and functionality to keep your unstructured data and workloads more available, manageable, and secure than ever.

Protocols

On the S3 object protocol front, OneFS 9.12 sees the debut of new security and immutability functionality. S3 Object Lock extends the standard AWS S3 Object Lock model with PowerScale’s own ‘Bucket-Lock’ protection mode semantics. Object Lock capabilities operate on a per-access-zone and per-bucket basis, using the cluster’s compliance clock to evaluate an object’s retention date and time. Additionally, S3 protocol access logging and bucket logging are also enhanced in this new 9.12 release.
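Since the OneFS S3 service is AWS-compatible, object retention can be exercised with standard S3 tooling. As an illustrative sketch (the endpoint, bucket, and object names below are hypothetical, and the bucket is assumed to have been created with Object Lock enabled), the AWS CLI can apply and then verify a retention date on an object:

```shell
# Apply a compliance-mode retention date to an object on the cluster's
# S3 endpoint (hostname, bucket, and key are hypothetical; 9021 is the
# default OneFS S3 HTTPS port).
aws s3api put-object-retention \
    --endpoint-url https://cluster.example.com:9021 \
    --bucket locked-bkt --key report.pdf \
    --retention '{"Mode": "COMPLIANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'

# Read back the retention settings on the same object.
aws s3api get-object-retention \
    --endpoint-url https://cluster.example.com:9021 \
    --bucket locked-bkt --key report.pdf
```

Until the retain-until date (as measured by the cluster’s compliance clock) has passed, attempts to delete or overwrite the locked object version should be rejected.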

Networking

As part of PowerScale’s seamless protocol failover experience for customers, OneFS 9.12 sees SmartConnect’s default IP allocation method for new pools move to ‘dynamic’. While SMB2 and SMB3 are the primary focus, all protocols benefit from this enhancement, including NFS, S3, and HDFS. Legacy pools remain unchanged upon upgrade to 9.12, but any new pools will automatically be provisioned as dynamic (unless manually configured as ‘static’).
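Existing pools can be moved over manually if desired. As a sketch (the pool name below is hypothetical), the allocation method of a pool can be inspected and changed from the OneFS CLI:

```shell
# Check the current IP allocation method for a pool
# (groupnet/subnet/pool names are hypothetical).
isi network pools view groupnet0.subnet0.pool0 | grep -i alloc

# Switch the pool to dynamic allocation, matching the new 9.12
# default for freshly created pools.
isi network pools modify groupnet0.subnet0.pool0 --alloc-method dynamic
```

With dynamic allocation, a failed node’s pool IPs move to the surviving nodes, which is what enables the seamless failover behavior described above.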

Security

In the interests of increased security and ransomware protection, OneFS 9.12 includes new Secure Snapshots functionality. Secure Snapshots provide true snapshot immutability, as well as protection for snapshot schedules, in order to protect against alteration or deletion, either accidentally or by a malicious actor.

Secure snapshots are built upon Multi-party Authorization (MPA), also introduced in OneFS 9.12. MPA prevents an individual administrator from executing privileged operations, such as configuration changes on snapshots and snapshot schedules, by requiring two or more trusted parties to sign off on a requested change for the privileged actions within a PowerScale cluster.

OneFS 9.12 also introduces support for common access cards (CAC) and personal identity verification (PIV) smart cards, providing physical multi-factor authentication (MFA), allowing users to SSH to a PowerScale cluster using the same security badge that grants them access into their office. In addition to US Federal mandates, CAC/PIV integration is a requirement for many security conscious organizations across the public and private sectors.

Upgrade

One-click upgrades in OneFS 9.12 allow a cluster to automatically display and download available trusted upgrade packages from Dell Support, which can be easily applied via ‘one click installation’ from the OneFS WebUI or CLI. Upgrade package versions are automatically managed by Dell in accordance with a cluster’s telemetry data.

Support

OneFS 9.12 introduces an auto-healing capability, where the cluster detects problems using the HealthCheck framework and automatically executes a repair action for known issues and failures. This helps to increase cluster availability and durability, while reducing the time to resolution and the need for technical support engagements. Furthermore, additional repair-actions can be added at any point, outside of the general OneFS release cycle.

Hardware Innovation

On the platform hardware front, OneFS 9.12 also introduces an HDR InfiniBand front-end connectivity option for the PowerScale PA110 performance and backup accelerator. Plus, 9.12 also brings a fast reboot enhancement to the high-memory PowerScale F-series nodes.

In summary, OneFS 9.12 brings the following new features and functionality to the Dell PowerScale ecosystem:

Networking:
·         SmartConnect dynamic allocation as the default.

Platform:
·         PowerScale PA110 accelerator front-end InfiniBand support.
·         Conversion of front-end Ethernet to InfiniBand support for F710 & F910.
·         F-series fast reboots.

Protocol:
·         S3 Object Lock.
·         S3 Immutable SmartLock bucket for tamper-proof objects.
·         S3 protocol access logging.
·         S3 bucket logging.

Security:
·         Multi-party authorization for privileged actions.
·         CAC/PIV smartcard SSH access.
·         Root lockdown mode.
·         Secure Snapshots with MPA override to protect data while the retention period has not expired.

Support:
·         Cluster-level inventory request API.
·         In-field support for back-end NIC changes.

Reliability:
·         Auto Remediation self-diagnosis and healing capability.

Upgrade:
·         One-click upgrade.

We’ll be taking a deeper look at OneFS 9.12’s new features and functionality in future blog articles over the course of the next few weeks.

Meanwhile, the new OneFS 9.12 code is available on the Dell Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.

For existing clusters running a prior OneFS release, the recommendation is to open a Service Request to schedule an upgrade. To provide a consistent and positive upgrade experience, Dell is offering assisted upgrades to OneFS 9.12 at no cost to customers with a valid support contract. Please refer to Knowledge Base article KB544296 for additional information on how to initiate the upgrade process.

ObjectScale 4.1

Hot off the press comes ObjectScale version 4.1 – a major release of Dell’s enterprise-grade object storage platform. As a foundational component of the Dell AI Data Platform, ObjectScale 4.1 delivers enhanced scalability, performance, and resilience that’s engineered to meet the evolving demands of AI-driven workloads and modern data ecosystems.

This release is available as a software upgrade for existing ECS and ObjectScale environments, and the core new features and functionality introduced in this ObjectScale 4.1 release include:

Storage Efficiency and Operational Experience

On the storage efficiency and operation experience front, ObjectScale 4.1 introduces support for multiple compression modes including LZ4, Zstandard, Deflate, and Snappy, configurable via both the UI and API. This flexibility allows admins to fine-tune compression strategies to balance performance, cost, and workload characteristics.

Post-upgrade to ObjectScale 4.1, the default algorithms are updated to LZ4 for AFA appliances (EXF900 and XF960) and Zstandard for HDD appliances (EX300, EX3000, EX500, EX5000, X560). Storage admins can change the algorithm at any time via the UI or API, based on workload or use case.

Improved garbage collection throughput enables faster reclamation of deleted capacity. Enhanced monitoring, alerting, and logging tools provide greater visibility into background processes, contributing to overall cluster stability.

An updated dashboard offers refined views of used, available, and reserved capacity. Automated alerts notify administrators when usage exceeds 90%, at which point the affected Virtual Data Center (VDC) transitions to read-only mode.

New port-level bandwidth controls for replication traffic allow for more predictable performance and optimized resource allocation across distributed environments.

Security and Data Protection

Within the security and data protection realm, ObjectScale now provides support for Self-Encrypting Drives (SEDs) with local key management via Dell iDRAC. This ensures hardware-level encryption and secure, appliance-local key handling for enhanced data protection.

TLS 1.3, the latest version of the Transport Layer Security protocol, is also supported in ObjectScale 4.1. This upgrade delivers stronger encryption, faster handshakes, and the removal of legacy algorithms, improving both control and data path security.
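As a quick sanity check, OpenSSL’s s_client can confirm that an endpoint now negotiates TLS 1.3 (the hostname and port below are hypothetical; the -tls1_3 option requires OpenSSL 1.1.1 or later):

```shell
# Force a TLS 1.3-only handshake against the object endpoint and
# look for "Protocol  : TLSv1.3" in the session output.
openssl s_client -connect objectscale.example.com:443 -tls1_3 < /dev/null
```

If the handshake fails with a protocol error, the endpoint is still limited to an older TLS version.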

Expanded Capabilities for Modern Workloads

ObjectScale 4.1 now offers up to 3x faster object listing performance in multi-VDC environments. This enhancement improves data browsing and discovery, with better handling of deleted metadata and validation of Untrusted Listing Keys.

Through webhook-based APIs, ObjectScale can now push real-time notifications to external applications when events such as object creation, deletion, or modification occur—enabling responsive, event-driven architectures.

Support for S3FS in 4.1 allows users to mount S3 buckets on Linux systems as local file systems. This simplifies access and management, particularly for legacy applications that rely on traditional file system operations.
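For illustration, the widely used s3fs-fuse client mounts a bucket roughly as follows; the bucket name, endpoint URI, and credentials below are hypothetical, and exact options may vary with the s3fs version in use:

```shell
# Store the bucket credentials in s3fs's expected passwd-file format.
echo 'ACCESS_KEY_ID:SECRET_ACCESS_KEY' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket as a local file system (endpoint and bucket are
# hypothetical; path-style requests are typical for non-AWS endpoints).
s3fs mybucket /mnt/mybucket \
    -o passwd_file=${HOME}/.passwd-s3fs \
    -o url=https://objectscale.example.com \
    -o use_path_request_style
```

Once mounted, legacy applications can read and write objects in the bucket using ordinary file system calls.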

On the integration front, ObjectScale 4.1 is compatible with the latest AWS SDK v2.29, so Java developers can immediately use new S3 features and performance fixes in their applications, and build cloud-native applications with full access to modern AWS features and APIs.

The following hardware platforms are supported by the new ObjectScale 4.1 release:

Gen 2 systems: U480E, U400T, U400E, U4000, U400, U2800, U2000, D6200, D5600, D4500
Gen 3 systems: EX3000, EX300, EXF900, EX5000, EX500
Gen 4 systems: X560, XF960

Note that upgrading to ObjectScale 4.1 is only supported from ECS 3.8.x and 4.0.x releases.

In summary, ObjectScale 4.1 represents a strategic advancement in Dell’s commitment to delivering intelligent, secure, and scalable storage solutions for the AI era. Whether upgrading existing infrastructure or deploying new systems, this new 4.1 release empowers organizations to meet the challenges of data growth, complexity, and innovation with confidence.

OneFS SmartSync Backup-to-Object Management and Troubleshooting

As we saw in the previous articles in this series, SmartSync in OneFS 9.11 enjoys the addition of backup-to-object functionality, which delivers high performance, full-fidelity incremental replication to ECS, ObjectScale, Wasabi, and AWS S3 & Glacier IR object stores.

This new SmartSync backup-to-object functionality supports the full spectrum of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes.

In addition to the standard ‘isi dm’ command set, the following CLI utility can also come in handy for tasks such as verifying the dataset ID for restoration:

# isi_dm browse

For example, to query the SmartSync accounts and datasets:

# isi_dm browse

<no account>:<no dataset> $ list-accounts
000000000000000100000000000000000000000000000000 (tme-tgt)
ec2a72330e825f1b7e68eb2352bfb09fea4f000000000000 (DM Local Account)
fd0000000000000000000000000000000000000000000000 (DM Loopback Account)
<no account>:<no dataset> $ connect-account 000000000000000100000000000000000000000000000000
tme-tgt:<no dataset> $ list-datasets
1       2025-07-22T10:23:33+0000        /ifs/data/zone3
2       2025-07-22T10:23:33+0000        /ifs/data/zone4
1025    2025-07-22T10:25:01+0000        /ifs/data/zone3
2049    2025-07-22T10:30:04+0000        /ifs/data/zone4
tme-tgt:<no dataset> $ connect-dataset 2
tme-tgt:2 </ifs/data/zone4:> $ ls
home                           [dir]
zone2_sync1753179349           [dir]
tme-tgt:2 </ifs/data/zone4:> $ cd zone2_sync1753179349
tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ ls
home                           [dir]
tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $

Or for additional detail:

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings output-to-file-on /tmp/out.txt
tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings verbose-on
tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ list-datasets
1       2025-07-22T10:23:33+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=1 } }
2       2025-07-22T10:23:33+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=2 } }
1025    2025-07-22T10:25:01+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=3 } }
2049    2025-07-22T10:30:04+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=4 } }

But when it comes to monitoring and troubleshooting SmartSync, there are a variety of diagnostic tools available. These include:

Logging (general SmartSync info and triage):
·         /var/log/isi_dm.log
·         /var/log/messages
·         /ifs/data/Isilon_Support/datamover/transfer_failures/baseline_failures_<jobid>

Accounts (authentication, trust, and encryption):
·         isi dm accounts list/view

CloudCopy (cloud access and connectivity):
·         S3 Browser (e.g. CloudBerry), Microsoft Azure Storage Explorer

Dataset (dataset creation and health):
·         isi dm dataset list/view

File system (inspect replicated files and objects):
·         isi get

Jobs (job and task execution, auto-pausing, completion, control, and transfer):
·         isi dm jobs list/view
·         isi_datamover_job_status -jt

Network (network connectivity and throughput):
·         isi dm throttling bw-rules list/view
·         isi_dm network ping/discover

Policies (copy and dataset policy execution and transfer):
·         isi dm policies list/view
·         isi dm base-policies list/view

Service (daemon configuration and control):
·         isi services -a isi_dm_d <enable/disable>

Snapshots (snapshot execution and access):
·         isi snapshot snapshots list/view

System (CPU load and system performance):
·         isi dm throttling settings

SmartSync info and errors are typically written to /var/log/isi_dm.log and /var/log/messages, while DM jobs transfer failures generate a log specific to the job ID under /ifs/data/Isilon_Support/datamover/transfer_failures.

Once a policy is running, the job status is reported via ‘isi dm jobs list’. Once complete, job histories are available by running ‘isi dm historical-jobs list’. More details for a specific job can be gleaned from the ‘isi dm jobs view’ command, using the pertinent job ID from the list output above. Additionally, the ‘isi_datamover_job_status’ command with the job ID as an argument will also supply detailed information about a specific job.

Once running, a DM job can be further controlled via the ‘isi dm jobs modify’ command, and available actions include cancel, partial-completion, pause, or resume.
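Putting the commands above together, a quick triage pass for a running or recently completed job might look like the following sketch (the job ID is hypothetical):

```shell
# Show currently running DataMover jobs and note the job ID of interest.
isi dm jobs list

# Drill into that specific job (ID is hypothetical).
isi dm jobs view 12345

# Completed jobs move to the historical list.
isi dm historical-jobs list

# Low-level task-level detail for the same job ID.
isi_datamover_job_status 12345
```

If the job shows as paused or failed, the transfer_failures log for that job ID under /ifs/data/Isilon_Support/datamover/transfer_failures is the next place to look.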

If a certificate authority (CA) is not correctly configured on a PowerScale cluster, the SmartSync daemon will not start, even though accounts and policies can still be configured. Be aware that the failed policies will not be reported via ‘isi dm jobs list’ or ‘isi dm historical-jobs list’ since they never started. Instead, an improperly configured CA is reported in the /var/log/isi_dm.log as follows:

Certificates not correctly installed, Data Mover service sleeping: At least one CA must be installed: No such file or directory from dm_load_certs_from_store (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:197 ) from dm_tls_init (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:279 ): Unable to load certificate information
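On the cluster side, trust is established by installing the peer’s certificate authority. As a hedged sketch (the file path and CA name below are hypothetical; consult the OneFS CLI reference for the exact syntax on your release), the cluster’s trusted CAs can be listed and a missing CA imported:

```shell
# List the certificate authorities currently trusted by the cluster.
isi certificate authority list

# Import the peer's CA so the SmartSync daemons can establish trust
# (certificate path and name are hypothetical).
isi certificate authority import --certificate-path /ifs/smartsync-ca.pem --name smartsync-ca
```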

Once a CA and identity are correctly configured, the SmartSync service automatically activates. Next, SmartSync attempts a handshake with the target. If the CA or identity is mis-configured, the handshake process fails, and generates an entry in /var/log/isi_dm.log. For example:

2025-07-30T12:38:17.864181+00:00 GEN-HOP-NOCL-RR-1(id1) isi_dm_d[52758]: [0x828c0a110]: /b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/acct_mon.cpp:dm_acc tmon_try_ping:348: [Fiber 3778] ping for account guid: 0000000000000000c4000000000000000000000000000000, result: dead

Note that the full handshake error detail is logged if the SmartSync service (isi_dm_d) is set to log at the ‘info’ or ‘debug’ level using isi_ilog:

# isi_ilog -a isi_dm_d --level info+

Valid ilog levels include:

fatal error err notice info debug trace

error+ err+ notice+ info+ debug+ trace+

A copy or repeat-copy policy requires an available dataset for replication before running. If a dataset has not been successfully created prior to the copy or repeat-copy policy job starting for the same base path, the job is paused. In the following example, the base path of the copy policy is not the same as that of the dataset policy, hence the job fails with a “path doesn’t match…” error.

# ls -l /ifs/data/Isilon_Support/datamover/transfer_failures
total 9
-rw-rw----   1 root  wheel  679  Jul 20 10:56 baseline_failure_10

# cat /ifs/data/Isilon_Support/datamover/transfer_failures/baseline_failure_10
Task_id=0x00000000000000ce, task_type=root task ds base copy, task_state=failed-fatal path doesn’t match dataset base path: ‘/ifs/test’ != ‘/ifs/data/repeat-copy’:
from bc_task_initialize_dsh (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy
from dmt_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy_root_task
from dm_txn_execute_internal (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cp
from dm_txn_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cpp:2274)
from dmp_task_spark_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/task_runner.

Once any errors for a policy have been resolved, the ‘isi dm jobs modify’ command can be used to resume the job.

OneFS SmartSync Backup-to-Object Configuration

As we saw in the previous article in this series, SmartSync in OneFS 9.11 sees the addition of backup-to-object, which provides high performance, full-fidelity incremental replication to ECS, ObjectScale, Wasabi, and AWS S3 & Glacier IR object stores.

This new SmartSync backup-to-object functionality supports the full spectrum of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes. Specifically:

Copy-to-object (OneFS 9.10 & earlier):

·         One-time file system copy to object
·         Baseline replication only, no support for incremental copies
·         Browsable/accessible filesystem-on-object representation
·         Certain object limitations:
o   No support for sparse regions and hardlinks
o   Limited attribute/metadata support
o   No compression

Backup-to-object (OneFS 9.11):

·         Full-fidelity file system baseline & incremental replication to object:
o   Supports ADS, special files, symlinks, hardlinks, sparseness, POSIX/NT attributes, and encoding
o   Any file size and any path length
·         Fast incremental copies
·         Compact file system snapshot representation in native cloud
·         Object representation:
o   Grouped by target base-path in policy configuration
o   Further grouped by Dataset ID, Global File ID
SmartSync backup-to-object operates on user-defined datasets, which are essentially OneFS file system snapshots with additional properties.

A dataset creation policy takes a snapshot and creates a dataset from it. Copy and repeat-copy policies then transfer that dataset to another system. The execution of these two policy types can be linked and scheduled separately: one schedule for dataset creation, say a dataset every hour on a particular path, and another for distribution of the copies themselves. For example, a dataset could be copied every hour to a hot DR cluster in data center A, but also copied every month to a deep archive cluster in data center B. All of this is possible without increasing snapshot bloat on the system, since datasets can now be shared between policies.

Currently, SmartSync does not have a WebUI presence, so all its configuration is either via the command-line or platform API.

Here’s the procedure for crafting a baseline replication config. Essentially, create the replication account (which in OneFS 9.11 will target either Dell ECS or Amazon AWS), then configure the dataset creation policy, run it, and, if desired, create a repeat-copy policy. The specific steps, with their CLI syntax, are:

  1. Create a replication account:
# isi dm account create --account-type [AWS_S3 | ECS_S3]
  2. Configure a dataset creation policy:
# isi dm policies create [Policy Name] --policy-type CREATION
  3. Run the dataset creation policy:
# isi dm policies list
# isi dm policies modify [Creation policy id] --run-now=true
# isi dm jobs list
# isi dm datasets list
  4. Create a repeat-copy policy:
# isi dm policies create [Policy Name] --policy-type='REPEAT_COPY'
  5. Run the repeat-copy policy:
# isi dm policies list
# isi dm policies modify [Repeat-copy policy id] --run-now=true
  6. View the data replication job status:
# isi dm jobs list

Similarly, for an incremental replication config, note that the dataset creation and repeat-copy policies were already created in the baseline replication configuration above, so the creation steps can be skipped. Incremental replication simply re-runs those existing policies:

  1. Run the dataset creation policy:
# isi dm policies list
# isi dm policies modify [Creation policy id] --run-now=true
# isi dm jobs list
# isi dm datasets list
  2. Run the repeat-copy policy:
# isi dm policies list
# isi dm policies modify [Repeat-copy policy id] --run-now=true
  3. View the data replication incremental job status:
# isi dm jobs list

And here’s the basic procedure for creating and running a partial or full restore:

Note that the replication account was already created on the original cluster, in which case the creation step can be skipped; replication account creation is only required if restoring the dataset to a new cluster.

Additionally, partial restoration involves a subset of the directory structure, specified via the ‘source path’, whereas full restoration invokes a restore of the entire dataset.

The process includes creating the replication account if needed, finding the ID of the dataset to be restored, creating and running the partial or full restoration policy, and checking the job status to verify it ran successfully.

  1. Create a replication account:
# isi dm account create --account-type [AWS_S3 | ECS_S3]

For example:

# isi dm account create --account-type ECS_S3 --name [Account Name] --access-id [access-id] --uri [URI with bucket-name] --auth-mode CLOUD --secret-key [secret-key] --storage-class=[For AWS_S3 only: STANDARD or GLACIER_IR]
  2. Verify the dataset ID for restoration:
# isi_dm browse

Checking the following attributes:

  • list-accounts
  • connect-account [Source Account ID created in step 1]
  • list-datasets
  • connect-dataset [Dataset id]
  3. Create a partial or full restoration policy:
# isi dm policies create [Policy Name] --policy-type='COPY'
  4. Run the partial or full restoration policy:
# isi dm policies modify [Restoration policy id] --run-now=true
  5. View the data restoration job status:
# isi dm jobs list

OneFS 9.11 also introduces recovery point objective (RPO) alerts for SmartSync, but note that these are for repeat-copy policies only. RPO alerts can be configured through the replication policy by adding the desired time value to the ‘repeat-copy-rpo-alert’ parameter. If this configured threshold is exceeded, an RPO alert is triggered, and it is automatically resolved after the next successful policy job run.

Also be aware that the default time value for a repeat copy RPO is zero, which instructs SmartSync to not generate RPO alerts for that policy.

The following CLI syntax can be used to create a replication policy, with the ‘--repeat-copy-rpo-alert’ flag set to the desired time:

# isi dm policies create [Policy Name] --policy-type='REPEAT_COPY' --enabled='true' --priority='NORMAL' --repeat-copy-source-base-path=[Source Path] --repeat-copy-base-base-account-id=[Source account id] --repeat-copy-base-source-account-id=[Source account id] --repeat-copy-base-target-account-id=[Target account id] --repeat-copy-base-new-tasks-account=[Source account id] --repeat-copy-base-target-dataset-type='FILE_ON_OBJECT_BACKUP' --repeat-copy-base-target-base-path=[Bucket Name] --repeat-copy-rpo-alert=[time]

And similarly to change the RPO alert configuration on an existing replication policy:

# isi dm policies modify [Policy id] --repeat-copy-rpo-alert=[time]

An alert is triggered and corresponding CELOG event created if the specified RPO for the policy is exceeded. For example:

# isi event list
ID   Started     Ended       Causes Short                     Lnn  Events  Severity
--------------------------------------------------------------------------------------
1898 07/15 00:00 07/15 00:00 SW_CELOG_HEARTBEAT               1    1       information
2012 07/15 06:03 --          SW_DM_RPO_EXCEEDED               2    1       warning
--------------------------------------------------------------------------------------

And then once the RPO alert has been resolved after a successful replication policy job run:

# isi event list
ID   Started     Ended       Causes Short                     Lnn  Events  Severity
--------------------------------------------------------------------------------------
1898 07/15 00:00 07/15 00:00 SW_CELOG_HEARTBEAT               1    1       information
2012 07/15 06:03 07/15 06:12 SW_DM_RPO_EXCEEDED               2    2       warning
--------------------------------------------------------------------------------------

OneFS SmartSync Backup-to-Object

Another significant beneficiary of new functionality in the recent OneFS 9.11 release is SmartSync. As you may recall, SmartSync allows multiple copies of a dataset to be copied, replicated, and stored across locations and regions, both on and off-prem, providing increased data resilience and the ability to rapidly recover from catastrophic events.

In addition to fast, efficient, scalable protection with granular recovery, SmartSync allows organizations to utilize lower-cost object storage as the target for backups, reducing data protection complexity and cost by eliminating the need for separate backup applications. Plus, disaster recovery options include restoring a dataset to its original state or cloning a new cluster.

SmartSync sees the following enhancements in OneFS 9.11:

  • Automated incremental-forever replication to object storage.
  • Unparalleled scalability and speed, with seamless pause/resume for robust resiliency and control.
  • End-to-end encryption for security of data-in-flight and at rest.
  • Complete data replication, including soft/hard links, full file paths, and sparse files.
  • Object storage targets: AWS S3, AWS Glacier IR, Dell ECS/ObjectScale, and Wasabi (with the addition of Azure and GCP support in a future release).

But first, a bit of background. Introduced back in OneFS 9.4, SmartSync operates in two distinct modes:

  • Regular push-and-pull transfer of file data between PowerScale clusters.
  • CloudCopy, copying of file-to-object data from a source cluster to a cloud object storage target.

CloudCopy copy-to-object in OneFS 9.10 and earlier releases is strictly a one-time copy tool, rather than a replication utility. So, after a copy, viewing the bucket contents from the AWS console or an S3 browser yielded an object-format, tree-like representation of the OneFS file system. However, there were a number of significant shortcomings, such as no native support for attributes like ACLs or certain file types like character files, and no reasonable way to represent hard links. OneFS had to work around these things by expanding hard links and redirecting objects with overly long paths. The other major limitation was that it really was just a one-and-done copy: after creating and running a policy, once the job had completed the data was in the cloud, and that was it. OneFS had no provision for incrementally transferring any subsequent changes to the cloud copy when the source data changed.

In order to address these limitations, SmartSync in OneFS 9.11 sees the addition of backup-to-object functionality. This includes a full-fidelity file system baseline, plus fast incremental replication to Dell ECS and ObjectScale, Wasabi, and AWS S3 and Glacier IR object stores.

This new backup-to-object functionality supports the full range of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes.

Copy-to-object (OneFS 9.10 & earlier):

·         One-time file system copy to object
·         Baseline replication only, no support for incremental copies
·         Browsable/accessible filesystem-on-object representation
·         Certain object limitations:
o   No support for sparse regions and hardlinks
o   Limited attribute/metadata support
o   No compression

Backup-to-object (OneFS 9.11):

·         Full-fidelity file system baseline & incremental replication to object:
o   Supports ADS, special files, symlinks, hardlinks, sparseness, POSIX/NT attributes, and encoding
o   Any file size and any path length
·         Fast incremental copies
·         Compact file system snapshot representation in native cloud
·         Object representation:
o   Grouped by target base-path in policy configuration
o   Further grouped by Dataset ID, Global File ID

Architecturally, SmartSync incorporates the following concepts:

Account:
•      References to systems that participate in jobs (PowerScale clusters, cloud hosts)
•      Made up of a name, a URI, and auth info

Dataset:
•      Abstraction of a filesystem snapshot; the entity copied between systems
•      Identified by dataset IDs

Global File ID:
•      Conceptually a global LIN that references a specific file on a specific system

Policy:
•      A dataset creation policy creates a dataset
•      Copy/repeat-copy policies take an existing dataset and put it on another system
•      Policy execution can be linked and scheduled

Push/Pull, Cascade/Reconnect:
•      Clusters syncing to each other in sequence (A>B>C)
•      Clusters can skip baseline copy and directly perform incremental updates (A>C)
•      Clusters can both request and send datasets

Transfer resiliency:
•      Small errors don’t need to halt a policy’s progress

Under the hood, SmartSync uses this concept of a data set, which is fundamentally an abstraction of a OneFS file system snapshot – albeit with some additional properties attached to it.

Each data set is identified by a unique ID. With this notion of data sets, OneFS can now also perform an A to B replication and an A to C replication: two replications of the same data set to two different targets. Moreover, with these new data sets, B and C can now also reference each other and perform incremental replication amongst themselves, assuming they share a common ancestor snapshot.

A SmartSync data set creation policy takes a snapshot and creates a data set from it. Copy and repeat-copy policies are then used to transfer that data set to another system. The execution of these two policy types can be linked and scheduled separately: one schedule for data set creation, say every hour on a particular path, and another for the actual copies. For example, a data set could be copied hourly to a hot DR cluster in data center A, and also monthly to a deep archive cluster in data center B – all without proliferating snapshots on the source, since a single data set can be shared by multiple copy policies.
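The separation between dataset creation and copy policies can be sketched as a small model. This is purely illustrative (the class and method names are invented, not the OneFS implementation): one dataset, created once, feeds two copy policies on different schedules, leaving B and C with a shared ancestor.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    dataset_id: int
    path: str

class Cluster:
    def __init__(self, name: str):
        self.name = name
        self.datasets = {}  # dataset_id -> Dataset

    def create_dataset(self, dataset_id: int, path: str) -> Dataset:
        # Dataset-creation policy: snapshot the path and wrap it as a dataset.
        ds = Dataset(dataset_id, path)
        self.datasets[dataset_id] = ds
        return ds

    def copy_to(self, target: "Cluster", dataset_id: int) -> None:
        # Copy policy: transfer an existing dataset to another system.
        target.datasets[dataset_id] = self.datasets[dataset_id]

a, b, c = Cluster("A"), Cluster("B"), Cluster("C")
a.create_dataset(100, "/ifs/data")
a.copy_to(b, 100)  # e.g. hourly schedule -> hot DR cluster
a.copy_to(c, 100)  # e.g. monthly schedule -> deep archive cluster
# B and C now hold the same dataset ID as a common ancestor, so they can
# replicate incrementally between themselves without involving A.
print(sorted(b.datasets) == sorted(c.datasets))
```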

Additionally, SmartSync in 9.11 also introduces the foundational concept of a global file ID (GFID), which is essentially a global LIN that represents a specific file on a particular system. OneFS can now use this GFID, in combination with a data set, to reference a file anywhere and guarantee that it means the same thing across every cluster.

Security-wise, each SmartSync daemon has an identity certificate that acts as both a client and server certificate depending on the direction of the data movement. This identity certificate is signed by a non-public certificate authority. To establish trust between two clusters, they must have each other’s CAs. These CAs may be the same. Trust groups (daemons that may establish connections to each other) are formed by having shared CAs installed.
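The trust-group rule reduces to a simple set check: two SmartSync daemons may establish a connection only if they have at least one CA in common. A minimal sketch, with made-up CA names:

```python
def can_connect(cas_a: set, cas_b: set) -> bool:
    # Daemons form a trust group by having shared CAs installed.
    return bool(cas_a & cas_b)

print(can_connect({"corp-ca"}, {"corp-ca", "lab-ca"}))  # True: shared CA
print(can_connect({"corp-ca"}, {"lab-ca"}))             # False: no common CA
```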

There are no usernames or passwords; authentication is authorization for V1. All cluster-to-cluster communication is performed via TLS-encrypted traffic. If absolutely necessary, encryption (but not authorization) can be disabled by setting a ‘NULL’ encryption cipher for specific use cases that require unencrypted traffic.

The SmartSync daemon supports checking certificate revocation status via the Online Certificate Status Protocol (OCSP). If the cluster is hardened and/or in FIPS-compliant mode, OCSP checking is forcibly enabled and set to the Strict stringency level, where any failure in OCSP processing results in a failed TLS handshake. Otherwise, OCSP checking can be totally disabled or set to a variety of values corresponding to desired behavior in cases where the responder is unavailable, the responder does not have information about the cert in question, and where information about the responder is missing entirely. Similarly, an override OCSP responder URI is configurable to support cases where preexisting certificates do not contain responder information.
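The stringency behavior described above can be modeled as a small decision function. The level names and result strings here are assumptions for illustration, not the exact OneFS configuration values:

```python
def handshake_ok(stringency: str, ocsp_result: str) -> bool:
    # ocsp_result: "good", "revoked", "responder_unavailable", "unknown_cert"
    if ocsp_result == "revoked":
        return False                  # a known-revoked cert always fails
    if stringency == "disabled":
        return True                   # OCSP checking turned off entirely
    if stringency == "strict":
        return ocsp_result == "good"  # any OCSP failure fails the TLS handshake
    # lenient levels: tolerate responder problems, accept otherwise
    return True

print(handshake_ok("strict", "responder_unavailable"))   # False
print(handshake_ok("lenient", "responder_unavailable"))  # True
```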

SmartSync also supports a ‘strict hostname check’ option which mandates that the common name and/or subject alternative name fields of the peer certificate match the URI used to connect to that peer. This option, along with strict OCSP checking and disabling the null cipher option, are forcibly set when the cluster is operating in a hardened or FIPS-compliant mode.

For object storage connections, SmartSync uses ‘isi_cloud_api’ just as CloudPools does. As such, the considerations that apply to CloudPools also apply to SmartSync.

In the next article in this series, we’ll turn our attention to the core architecture and configuration of SmartSync backup-to-object.

PowerScale H and A-series Journal Mirroring and Hardware Resilience

The last couple of articles generated several questions for the field around durability and resilience in the newly released PowerScale H710/0 and A310/0 nodes. In this article, we’ll take a deeper look at the OneFS journal and boot drive mirroring functionality in these H and A-series platforms.

PowerScale chassis-based hardware, such as the new H710/7100 and A310/3100, stores the local filesystem journal and its mirror on persistent, battery-backed flash media within each node, with a 4RU PowerScale chassis housing four nodes. These nodes comprise a ‘compute node enclosure’ for the CPU, memory, and network cards, plus associated drive containers, or sleds, for each node.

The PowerScale H and A-series employ a node-pair architecture to dramatically increase system reliability, with each pair of nodes residing within a chassis power zone. This means that if a node’s PSU fails, the peer PSU supplies redundant power. It also drives a minimum cluster or node pool size of four nodes (one chassis) for the PowerScale H and A-series platforms, pairwise node population, and the need to scale the cluster two nodes at a time.

A node’s file system journal is protected against sudden power loss or hardware failure by OneFS’ journal vault functionality – otherwise known as ‘powerfail memory persistence’, or PMP. PMP automatically stores both the local journal and journal mirror on a separate flash drive across both nodes in a node pair:

This journal de-staging process is known as ‘vaulting’, during which the journal is protected by a dedicated battery in each node until it’s safely written from DRAM to SSD on both nodes in a node-pair. With PMP, constant power isn’t required to protect the journal in a degraded state since the journal is saved to M.2 flash and mirrored on the partner node.

So the mirrored journal comprises both hardware and software components, including the following constituent parts:

Journal Hardware Components

  • System DRAM
  • M.2 vault flash
  • Battery Backup Unit (BBU)
  • Non-Transparent Bridge (NTB) PCIe link to partner node
  • Clean copy on disk

Journal Software Components

  • Power-fail Memory Persistence (PMP)
  • Mirrored Non-volatile Interface (MNVI)
  • IFS Journal + Node State Block (NSB)
  • Utilities

Asynchronous DRAM Refresh (ADR) preserves RAM contents when the operating system is not running. ADR is important for preserving RAM journal contents across reboots, and it does not require any software coordination to do so.

The journal vaulting functionality encompasses the hardware, firmware, and operating system, ensuring that the journal’s contents are preserved across power failure. The mechanism is similar to the software journal mirroring employed on the PowerScale F-series nodes, albeit using a PCIe-based NTB on the chassis-based platforms, instead of the back-end network as with the all-flash nodes.

On power failure, the PMP vaulting functionality is responsible for copying both the local journal and the local copy of the partner node’s journal to persistent flash. On restoration of power, PMP is responsible for restoring the contents of both journals from flash to RAM, and notifying the operating system.

A single dedicated 480GB NVMe flash device (nvd0) is attached via an M.2 slot on the motherboard of the H710/0 and A310/0 node’s compute module, residing under the battery backup unit (BBU) pack.

This is in contrast to the prior H and A-series chassis generations, which used a 128GB SATA M.2 device (/dev/ada0).

For example, the following CLI commands show the NVMe M.2 flash device in an A310 node:

# isi_hw_status | grep -i prod
Product: A310-4U-Single-96GB-1x1GE-2x25GE SFP+-60TB-1638GB SSD-SED

# nvmecontrol devlist
 nvme0: Dell DN NVMe FIPS 7400 RI M.2 80 480GB
    nvme0ns1 (447GB)

# gpart show | grep nvd0
=>       40  937703008  nvd0  GPT  (447G)

# gpart show -l nvd0
=>       40  937703008  nvd0  GPT  (447G)
         40       2008        - free -  (1.0M)
       2048   41943040     1  isilon-pmp  (20G)
   41945088  895757960        - free -  (427G)

In the above, the ‘isilon-pmp’ partition on the M.2 flash device is used by the file system journal for its vaulting activities.
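As a quick sanity check, the PMP partition size can be derived from the sector count in the `gpart` output above (GPT sectors are 512 bytes here). The parsing below is illustrative:

```python
# Lines reproduced from the `gpart show -l nvd0` output shown above.
gpart = """\
=>       40  937703008  nvd0  GPT  (447G)
         40       2008        - free -  (1.0M)
       2048   41943040     1  isilon-pmp  (20G)
   41945088  895757960        - free -  (427G)"""

for line in gpart.splitlines():
    if "isilon-pmp" in line:
        sectors = int(line.split()[1])
        print(sectors * 512 // 2**30, "GiB")  # 512-byte sectors -> 20 GiB
```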

The NVMe M.2 device is housed on the node compute module’s riser card, and its firmware is managed by the OneFS DSP (drive support package) framework:

Note that the entire compute module must be removed in order for its M.2 flash to be serviced. If the M.2 flash does need to be replaced for any reason, it will be properly partitioned and the PMP structure will be created as part of arming the node for vaulting.

For clusters using data-at-rest encryption (DARE), an encrypted M.2 device is used, in conjunction with SED data drives, to provide full FIPS compliance.

The battery backup unit (BBU), when fully charged, provides enough power to vault both the local and partner journal during a power failure event:

A single battery is utilized in the BBU, which also supports back-to-back vaulting:

On the software side, the journal’s Power-fail Memory Persistence (PMP) provides an equivalent to the NVRAM controller‘s vault/restore capabilities to preserve the journal. The PMP partition on the M.2 flash drive provides an interface between the OS and firmware.

If a node boots and its primary journal is found to be invalid for whatever reason, it has three paths for recourse:

  • Recover journal from its M.2 vault.
  • Recover journal from its disk backup copy.
  • Recover journal from its partner node’s mirrored copy.

The mirrored journal must guard against rolling back to a stale copy of the journal on reboot. This necessitates storing information about the state of journal copies outside the journal. As such, the Node State Block (NSB) is a persistent disk block that stores local and remote journal status (clean/dirty, valid/invalid, etc), as well as other non-journal information. NSB stores this node status outside the journal itself, and ensures that a node does not revert to a stale copy of the journal upon reboot.
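The stale-copy guard the NSB enables can be sketched as follows. The field names are hypothetical; the point is that only a copy the state block marks valid may be restored:

```python
from dataclasses import dataclass

@dataclass
class NodeStateBlock:
    # Hypothetical fields: the real NSB records clean/dirty and valid/invalid
    # status for the local and remote journal copies, outside the journal itself.
    local_valid: bool
    remote_valid: bool

def pick_journal(nsb: NodeStateBlock) -> str:
    # Never restore a copy the NSB marks invalid: that is the stale-copy guard.
    if nsb.local_valid:
        return "local"
    if nsb.remote_valid:
        return "partner-mirror"
    return "unrecoverable"

print(pick_journal(NodeStateBlock(local_valid=False, remote_valid=True)))
```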

Here’s the detail of an individual node’s compute module:

Of particular note is the ‘journal active’ LED, which is displayed as a white ‘hand icon’:

When this white hand icon is illuminated, it indicates that the mirrored journal is actively vaulting, and it is not safe to remove the node!

There is also a blue ‘power’ LED, and a yellow ‘fault’ LED per node. If the blue LED is off, the node may still be in standby mode, in which case it may still be possible to pull debug information from the baseboard management controller (BMC).

The flashing yellow ‘fault’ LED has several state indication frequencies:

Blink Speed Blink Frequency Indicator
Fast blink 4 Hz BIOS
Medium blink 1 Hz Extended POST
Slow blink ¼ Hz Booting OS
Off Off OS running

The mirrored non-volatile interface (MNVI) sits below /ifs and above RAM and the NTB, providing the abstraction of a reliable memory device to the /ifs journal. MNVI is responsible for synchronizing journal contents to peer node RAM, at the direction of the journal, and persisting writes to both systems while in a paired state. It upcalls into the journal on NTB link events, and notifies the journal of operation completion (mirror sync, block IO, etc). For example, when rebooting after a power outage, a node automatically loads the MNVI. It then establishes a link with its partner node and synchronizes its journal mirror across the PCIe Non-Transparent Bridge (NTB).

The Non-transparent Bridge (NTB) connects node pairs for OneFS Journal Replica:

The NTB link itself is PCIe Gen3 x8, but there is no guarantee of NTB interoperability between different CPU generations. As such, the H710/0 and A310/0 use version 4 of the NTB driver, whereas the previous hardware generation uses NTBv3. This means mixed-generation node pairs are unsupported.

Prior to mounting the /ifs file system, OneFS locates a valid copy of the journal from one of the following locations in order of preference:

Order Journal Location Description
1st Local disk A local copy that has been backed up to disk
2nd Local vault A local copy of the journal restored from Vault into DRAM
3rd Partner node A mirror copy of the journal from the partner node

Assuming the node was shut down cleanly, it will boot using a local disk copy of the journal. The journal will be restored into DRAM and /ifs will mount. On the other hand, if the node suffered a power disruption, the journal will be restored into DRAM from the M.2 vault flash instead (the PMP copies the journal into the M.2 vault during a power failure).

In the event that OneFS is unable to locate a valid journal on either the hard drives or M.2 flash on a node, it will retrieve a mirrored copy of the journal from its partner node over the NTB.  This is referred to as ‘Sync-back’.

Note: Sync-back state only occurs when attempting to mount /ifs.
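The preference order above can be expressed as a short selection routine. The helper name and flags are illustrative, not OneFS internals:

```python
def locate_journal(disk_ok: bool, vault_ok: bool, partner_ok: bool) -> str:
    if disk_ok:
        return "local disk"    # clean shutdown: backed-up copy on disk
    if vault_ok:
        return "local vault"   # power loss: PMP copy restored from M.2 flash
    if partner_ok:
        return "partner node"  # sync-back of the mirror copy over the NTB
    raise RuntimeError("no valid journal copy found")

# A node that lost power but vaulted successfully restores from M.2:
print(locate_journal(disk_ok=False, vault_ok=True, partner_ok=True))
```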

On booting, if a node detects that its journal mirror on the partner node is out of sync (invalid), but the local journal is clean, /ifs will continue to mount.  Subsequent writes are then copied to the remote journal in a process known as ‘sync-forward’.

Here’s a list of the primary journal states:

Journal State Description
Sync-forward State in which writes to a journal are mirrored to the partner node.
Sync-back Journal is copied back from the partner node. Only occurs when attempting to mount /ifs.
Vaulting Storing a copy of the journal on M.2 flash during power failure. Vaulting is performed by PMP.

During normal operation, writes to the primary journal and its mirror are managed by the MNVI device module, which writes through local memory to the partner node’s journal via the NTB. If the NTB is unavailable for an extended period, write operations can still be completed successfully on each node. For example, if the NTB link goes down in the middle of a write operation, the local journal write operation will complete. Read operations are processed from local memory.
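The write-through behavior can be sketched like this (class and method names are hypothetical): every journal write lands in local memory, and is mirrored to the partner only while the NTB link is up, so a link outage never blocks local writes.

```python
class MirroredJournal:
    def __init__(self):
        self.local = []
        self.remote = []   # model of the partner node's copy
        self.ntb_up = True

    def write(self, block: str) -> None:
        self.local.append(block)       # local write always completes
        if self.ntb_up:
            self.remote.append(block)  # mirrored via the NTB while paired

j = MirroredJournal()
j.write("txn-1")
j.ntb_up = False   # NTB link drops mid-stream
j.write("txn-2")   # still succeeds locally
print(j.local, j.remote)
```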

Additional journal protection for PowerScale chassis-based platforms is provided by OneFS’ powerfail memory persistence (PMP) functionality, which guards against PCI bus errors that can cause the NTB to fail.  If an error is detected, the CPU requests a ‘persistent reset’, during which the memory state is protected and the node rebooted. When back up again, the journal is marked as intact and no further repair action is needed.

If a node loses power, the hardware notifies the BMC, initiating a memory persistent shutdown.  At this point the node is running on battery power. The node is forced to reboot and load the PMP module, which preserves its local journal and its partner’s mirrored journal by storing them on M.2 flash.  The PMP module then disables the battery and powers itself off.

Once power is back on and the node restarted, the PMP module first restores the journal before attempting to mount /ifs.  Once done, the node then continues through system boot, validating the journal, setting sync-forward or sync-back states, etc.

The mirrored journal has the following CLI commands, although these should seldom be needed during normal cluster operation:

  • isi_save_journal
  • isi_checkjournal
  • isi_testjournal
  • isi_pmp

A node’s journal can be checked and confirmed healthy as follows:

# isi_testjournal
Checking One external batteries Health...
Batteries good
Checking PowerScale Journal integrity...
Mounted DRAM journal check: good
IFS is mounted.

During boot, isi_checkjournal and isi_testjournal will invoke isi_pmp. If the M.2 vault devices are unformatted, isi_pmp will format the devices.

On clean shutdown, isi_save_journal stashes a backup copy of the /dev/mnv0 device on the root filesystem, just as it does for the NVRAM journals in previous generations of hardware.

If a mirrored journal issue is suspected, or notified via cluster alerts, the best place to start troubleshooting is to take a look at the node’s log events. The journal logs to /var/log/messages, with entries tagged as ‘journal_mirror’.
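A simple way to isolate those entries is to filter the log for the ‘journal_mirror’ tag. The sample lines below are made up for illustration; in practice the input would be read from /var/log/messages:

```python
sample = [
    "Aug 14 10:01:02 node1 kernel: journal_mirror: sync-forward complete",  # fabricated example line
    "Aug 14 10:01:03 node1 sshd[123]: session opened",                      # unrelated noise
]

for line in sample:
    if "journal_mirror" in line:
        print(line)
```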

Additionally, the following sysctls also provide information about the state of the journal mirror itself and the MNVI connection respectively:

# sysctl efs.journal.mirror_state
efs.journal.mirror_state:
{
    Journal state: valid_protected
    Journal Read-only: false
    Need to inval mirror: false
    Sync in progress: false
    Sync error: 0
    Sync noop in progress: false
    Mirror work queued: false
    Local state:
    {
        Clean: dirty
        Valid: valid
    }
    Mirror state:
    {
        Connection: up
        Validity: valid
    }
}

And the MNVI connection state:

# sysctl hw.mnv0.state
hw.mnv0.state.iocnt: 0
hw.mnv0.state.cb_active: 0
hw.mnv0.state.io_gate: 0
hw.mnv0.state.state: 3

OneFS provides the following CELOG events for monitoring and alerting about mirrored journal issues:

CELOG Event Description
HW_GEN6_NTB_LINK_OUTAGE Non-transparent bridge (NTB) PCIe link is unavailable
FILESYS_JOURNAL_VERIFY_FAILURE No valid journal copy found on node

Another OneFS reliability optimization for the PowerScale chassis-based platforms is boot partition mirroring. OneFS boot and other OS partitions are stored on a node’s internal drives, and these partitions are mirrored (with the exception of crash dump partitions). The two mirrors protect against disk sled removal. Since each drive in a disk sled belongs to a separate disk pool, both elements of a mirror cannot live on the same sled.
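The sled-separation constraint can be checked with a trivial lookup. The device-to-sled mapping below follows the A310 drive listing shown later in this article (da10 in sled C, da11 in sled D) and is otherwise illustrative:

```python
# device partition -> sled letter (assumed mapping for illustration)
drive_sled = {"da10p3": "C", "da11p3": "D"}

def mirror_placement_ok(leg_a: str, leg_b: str) -> bool:
    # Both elements of a boot-partition mirror must live in different sleds,
    # so pulling a single sled cannot take out both copies.
    return drive_sled[leg_a] != drive_sled[leg_b]

print(mirror_placement_ok("da10p3", "da11p3"))  # True: survives a sled pull
```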

With regard to the nodes’ internal drives, the boot disk reservation size has increased to 18GB on these new platforms from 8GB on the previous generation. Plus partition sizes have also been expanded on these new platforms in OneFS 9.11, as follows:

Partition H71x and A31x H70x and A30x
hw 1GB 500MB
journal backup 8197MB 8GB
kerneldump 5GB 2GB
keystore 64MB 64MB
root 4GB 2GB
var 4GB 2GB
var-crash 7GB 3GB

OneFS automatically rebalances these mirrors in anticipation of, and in response to, service events. Mirror rebalancing is triggered by drive events such as suspend, softfail and hard loss.

The ‘isi_mirrorctl verify’ and ‘gmirror status’ CLI commands can be used to confirm that boot mirroring is working as intended. For example, on an A310 node:

# gmirror status
Name Status Components
mirror/root0 COMPLETE da10p3 (ACTIVE)
da11p3 (ACTIVE)
mirror/mfg COMPLETE da15p7 (ACTIVE)
da12p6 (ACTIVE)
mirror/kernelsdump COMPLETE da15p6 (ACTIVE)
mirror/kerneldump COMPLETE da15p5 (ACTIVE)
mirror/var-crash COMPLETE da15p3 (ACTIVE)
da9p3 (ACTIVE)
mirror/journal-backup COMPLETE da14p5 (ACTIVE)
da12p5 (ACTIVE)
mirror/jbackup-peer COMPLETE da14p3 (ACTIVE)
da12p3 (ACTIVE)
mirror/keystore COMPLETE da12p7 (ACTIVE)
da10p10 (ACTIVE)
mirror/root1 COMPLETE da11p7 (ACTIVE)
da10p7 (ACTIVE)
mirror/var0 COMPLETE da11p6 (ACTIVE)
da10p6 (ACTIVE)
mirror/hw COMPLETE da10p9 (ACTIVE)
da7p5 (ACTIVE)
mirror/var1 COMPLETE da10p8 (ACTIVE)
da7p3 (ACTIVE)

Or:

# isi_mirrorctl verify
isi.sys.distmirror - INFO - Mirror root1: has an ACTIVE consumer of da11p5
isi.sys.distmirror - INFO - Mirror root1: has an ACTIVE consumer of da10p7
isi.sys.distmirror - INFO - Mirror var1: has an ACTIVE consumer of da13p5
isi.sys.distmirror - INFO - Mirror var1: has an ACTIVE consumer of da16p5
isi.sys.distmirror - INFO - Mirror journal-backup: has an ACTIVE consumer of da12p5
isi.sys.distmirror - INFO - Mirror journal-backup: has an ACTIVE consumer of da16p6
isi.sys.distmirror - INFO - Mirror jbackup-peer: has an ACTIVE consumer of da12p3
isi.sys.distmirror - INFO - Mirror jbackup-peer: has an ACTIVE consumer of da14p3
isi.sys.distmirror - INFO - Mirror var-crash: has an ACTIVE consumer of da10p6
isi.sys.distmirror - INFO - Mirror var-crash: has an ACTIVE consumer of da11p3
isi.sys.distmirror - INFO - Mirror kerneldump: has an ACTIVE consumer of da14p5
isi.sys.distmirror - INFO - Mirror root0: has an ACTIVE consumer of da10p3
isi.sys.distmirror - INFO - Mirror root0: has an ACTIVE consumer of da13p6
isi.sys.distmirror - INFO - Mirror var0: has an ACTIVE consumer of da13p3
isi.sys.distmirror - INFO - Mirror var0: has an ACTIVE consumer of da16p3
isi.sys.distmirror - INFO - Mirror kernelsdump: has an ACTIVE consumer of da14p6
isi.sys.distmirror - INFO - Mirror mfg: has an ACTIVE consumer of da13p9
isi.sys.distmirror - INFO - Mirror mfg: has an ACTIVE consumer of da16p7
isi.sys.distmirror - INFO - Mirror hw: has an ACTIVE consumer of da10p8
isi.sys.distmirror - INFO - Mirror hw: has an ACTIVE consumer of da13p8
isi.sys.distmirror - INFO - Mirror keystore: has an ACTIVE consumer of da13p10
isi.sys.distmirror - INFO - Mirror keystore: has an ACTIVE consumer of da16p8

The A310 node’s disks in the output above are laid out as follows:

# isi devices drive list
Lnn  Location  Device    Lnum  State   Serial       Sled
---------------------------------------------------------
128  Bay  1    /dev/da1  15    L3      X3X0A0JFTMSJ N/A
128  Bay  2    -         N/A   EMPTY                N/A
128  Bay  A0   /dev/da4  12    HEALTHY WQB0QKBR     A
128  Bay  A1   /dev/da3  13    HEALTHY WQB0QHV4     A
128  Bay  A2   /dev/da2  14    HEALTHY WQB0QHN3     A
128  Bay  B0   /dev/da7  9     HEALTHY WQB0QH4S     B
128  Bay  B1   /dev/da6  10    HEALTHY WQB0QGY3     B
128  Bay  B2   /dev/da5  11    HEALTHY WQB0QJWE     B
128  Bay  C0   /dev/da10 6     HEALTHY WQB0QJ26     C
128  Bay  C1   /dev/da9  7     HEALTHY WQB0QHYW     C
128  Bay  C2   /dev/da8  8     HEALTHY WQB0QK6Q     C
128  Bay  D0   /dev/da13 3     HEALTHY WQB0QJES     D
128  Bay  D1   /dev/da12 4     HEALTHY WQB0QHGG     D
128  Bay  D2   /dev/da11 5     HEALTHY WQB0QKH5     D
128  Bay  E0   /dev/da16 0     HEALTHY WQB0QHFR     E
128  Bay  E1   /dev/da15 1     HEALTHY WQB0QJWD     E
128  Bay  E2   /dev/da14 2     HEALTHY WQB0QKGB     E
---------------------------------------------------------

When it comes to SmartFailing nodes, there are a couple of additional caveats to be aware of with mirrored journal and the PowerScale chassis-based platforms:

  • When SmartFailing one node in a pair, there is no compulsion to smartfail its partner node too.
  • A node will still run indefinitely with its partner absent. However, this significantly increases the window of risk since there is no journal mirror to rely on (in addition to lack of redundant power supply, etc).
  • If a single node in a pair is SmartFailed, the other node’s journal is still protected by the vault and powerfail memory persistence.

PowerScale A310 and A3100 Platforms

In this article, we’ll examine the new PowerScale A310 and A3100 hardware platforms that were released a couple of weeks back.

These A310 and A3100 comprise the latest generation of PowerScale A-series ‘archive’ platforms:

The PowerScale A-series systems are designed for cooler, infrequently accessed data. Use cases include active archive workflows for the A310, such as regulatory compliance data, medical imaging archives, financial records, and legal documents, plus deep archive/cold storage for the A3100, including surveillance video archives, backup, and DR repositories.

Representing the archive tier, the A310 and A3100 both utilize a single-socket Xeon processor with 96GB of memory and fifteen (A310) or twenty (A3100) hard drives per node, plus SSDs for metadata/caching – with four nodes residing within a 4RU chassis. From an initial four-node (one chassis) starting point, A310 and A3100 clusters can be easily and non-disruptively scaled two nodes at a time, up to a maximum of 252 nodes (63 chassis) per cluster.

The A31x modular platform is based on Dell’s ‘Infinity’ chassis. Each node’s compute module contains a single eight-core Intel Sapphire Rapids CPU running at 1.8GHz with 22.5MB of cache, plus 96GB of DDR5 DRAM. Front-end networking options include 10/25 GbE, with either Ethernet or InfiniBand selectable for the back-end network.

As such, the new A31x core hardware specifications are as follows:

Hardware Class PowerScale A-Series (Archive)
Model A310 A3100
OS version Requires OneFS 9.11 or above; NFP 13.1 or greater; BIOS based on Dell’s PowerBIOS Requires OneFS 9.11 or above; NFP 13.1 or greater; BIOS based on Dell’s PowerBIOS

Platform Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior gens. Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior gens.
CPU 8 Cores @ 1.8GHz, 22.5MB Cache 8 Cores @ 1.8GHz, 22.5MB Cache
Memory 96GB DDR5 DRAM 96GB DDR5 DRAM
Journal M.2: 480GB NVMe with 3-cell battery backup (BBU) M.2: 480GB NVMe with 3-cell battery backup (BBU)
Depth Standard 36.7 inch chassis Deep 42.2 inch chassis
Cluster size Max of 63 chassis (252 nodes) per cluster. Max of 63 chassis (252 nodes) per cluster.
Storage Drives 60 per chassis     (15 per node) 80 per chassis     (20 per node)
HDD capacities 2TB, 4TB, 8TB, 12TB, 16TB, 20TB, 24TB 12TB, 16TB, 20TB, 24TB
SSD (cache) capacities 0.8TB, 1.6TB, 3.2TB, 7.68TB 0.8TB, 1.6TB, 3.2TB, 7.68TB
Max raw capacity 1.4PB per chassis 1.9PB per chassis
Front-end network 10/25 Gb Ethernet 10/25 Gb Ethernet
Back-end network Ethernet or Infiniband Ethernet or Infiniband
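The maximum raw capacity figures in the table follow directly from the drive counts and the largest (24TB) HDD option, as this quick arithmetic check shows:

```python
# drives per node x nodes per chassis x largest HDD capacity (TB)
a310_tb = 15 * 4 * 24   # A310: 60 drives per chassis
a3100_tb = 20 * 4 * 24  # A3100: 80 drives per chassis
print(a310_tb, a3100_tb)  # 1440 TB (~1.4PB) and 1920 TB (~1.9PB) per chassis
```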

These node hardware attributes can be easily viewed from the OneFS CLI via the ‘isi_hw_status’ command. For example, from an A3100:

# isi_hw_status
  SerNo: CF2BC243400025
 Config: H6R28
ChsSerN:
ChsSlot: 1
FamCode: A
ChsCode: 4U
GenCode: 10
PrfCode: 3
   Tier: 3
  Class: storage
 Series: n/a
Product: A3100-4U-Single-96GB-1x1GE-2x25GE SFP+-240TB-6554GB SSD
  HWGen: PSI
Chassis: INFINITY (Infinity Chassis)
    CPU: GenuineIntel (1.80GHz, stepping 0x000806f8)
   PROC: Single-proc, Octa-core
    RAM: 103079215104 Bytes
   Mobo: INFINITYPIFANO (Custom EMC Motherboard)
  NVRam: INFINITY (Infinity Memory Journal) (4096MB card) (size 4294967296B)
 DskCtl: LSI3808 (LSI 3808 SAS Controller) (8 ports)
 DskExp: LSISAS35X36I (LSI SAS35x36 SAS Expander - Infinity)
PwrSupl: Slot1-PS0 (type=ACBEL POLYTECH, fw=03.01)
PwrSupl: Slot2-PS1 (type=ACBEL POLYTECH, fw=03.01)
  NetIF: bge0,lagg0,mce0,mce1,mce2,mce3
 BEType: 25GigE
 FEType: 25GigE
 LCDver: IsiVFD2 (Isilon VFD V2)
 Midpln: NONE (No Midplane Support)
Power Supplies OK
Power Supply Slot1-PS0 good
Power Supply Slot2-PS1 good
CPU Operation (raw 0x882C0800)  = Normal
CPU Speed Limit                 = 100.00%
Fan0_Speed                      = 12360.000
Fan1_Speed                      = 12000.000
Slot1-PS0_In_Voltage            = 212.000
Slot2-PS1_In_Voltage            = 209.000
SP_CMD_Vin                      = 12.100
CMOS_Voltage                    = 3.120
Slot1-PS0_Input_Power           = 290.000
Slot2-PS1_Input_Power           = 290.000
Pwr_Consumption                 = 590.000
SLIC0_Temp                      = na
SLIC1_Temp                      = na
DIMM_Bank0                      = 42.000
DIMM_Bank1                      = 40.000
CPU0_Temp                       = -43.000
SP_Temp0                        = 40.000
MP_Temp0                        = na
MP_Temp1                        = 29.000
Embed_IO_Temp0                  = 51.000
Hottest_SAS_Drv                 = -45.000
Ambient_Temp                    = 29.000
Slot1-PS0_Temp0                 = 47.000
Slot1-PS0_Temp1                 = 40.000
Slot2-PS1_Temp0                 = 47.000
Slot2-PS1_Temp1                 = 40.000
Battery0_Temp                   = 38.000
Drive_IO0_Temp                  = 43.000

Also note that the A310 and A3100 are only available in a 96GB memory configuration.

On the front of each chassis is an LCD front panel with back-lit buttons and four LED light bar segments, one per node. These LEDs typically display blue for normal operation or yellow to indicate a node fault. The LCD display articulates, allowing it to be swung clear of the drive sleds for non-disruptive HDD replacement, etc.

The rear of the chassis houses the compute modules for each node, which contain CPU, memory, networking, cache SSDs, and power supplies. Specifically, an individual compute module contains a multi-core Sapphire Rapids CPU, memory, M.2 flash journal, up to two SSDs for L3 cache, six DIMM channels, front-end 10/25 Gb Ethernet, back-end 40/100 or 10/25 Gb Ethernet or InfiniBand, an Ethernet management interface, plus power supply and cooling fans:

As shown above, the field replaceable components are indicated via colored ‘touchpoints’. Two touchpoint colors, orange and blue, indicate respectively which components are hot swappable versus replaceable via a node shutdown.

Touchpoint Detail
Blue Cold (offline) field serviceable component
Orange Hot (Online) field serviceable component

The serviceable components within a PowerScale A310 or A3100 chassis are as follows:

Component Hot Swap CRU FRU
Drive sled Yes Yes Yes
·         Hard drives (HDDs) Yes Yes Yes
Compute node No Yes Yes
·         Compute module No No No
o   M.2 journal flash No No Yes
o   CPU complex No No No
o   DIMMs No No Yes
o   Node fans No No Yes
o   NICs/HBAs No No Yes
o   HBA riser No No Yes
o   Battery backup unit (BBU) No No Yes
o   DIB No No No
·         Flash drives (SSDs) Yes Yes Yes
·         Power supply with fan Yes Yes Yes
Front panel Yes No Yes
Chassis No No Yes
Rail kits No No Yes
Mid-plane Replace entire chassis

Nodes are paired for resilience and durability, with each pair sharing a mirrored journal and two power supplies.

Storage-wise, each of the four nodes within a PowerScale A310/3100 chassis has five associated drive containers, or sleds. These sleds occupy bays in the front of each chassis, with a node’s drive sleds stacked vertically. For example:

Nodes are numbered 1 through 4, left to right looking at the front of the chassis, while the drive sleds are labeled A  through E, with A at the top.

The drive sled is the tray which slides into the front of the chassis. Within each sled, the 3.5” SAS hard drives are numbered sequentially starting from drive zero, which is the HDD adjacent to the air dam.

Each bay in a drive sled has a yellow ‘drive fault’ LED associated with each drive:

Even when a sled is removed from its chassis and its power source, these fault LEDs will remain active for 10+ minutes. LED viewing holes are also provided so the sled’s top cover does not need to be removed.

The A3100’s 42.2 inch chassis accommodates four HDDs per sled, compared to three drives for the standard (36.7 inch) depth A310 shown above. As such, the A3100 requires a deep rack, such as the Dell Titan cabinet, whereas the A310 can reside in a regular 17” data center cabinet.

The A310 and A3100 platforms support a range of HDD capacities, currently including 2TB, 4TB, 8TB, 12TB, 16TB, 20TB, and 24TB, in either regular ISE (instant secure erase) or self-encrypting drive (SED) formats.

A node’s drive details can be queried with OneFS CLI utilities such as ‘isi_radish’ and ‘isi_drivenum’. For example, the command output from an A3100 node:

# isi_drivenum

Bay  1   Unit 6      Lnum 20    Active      SN:GXNG0X800253     /dev/da1
Bay  2   Unit 7      Lnum 21    Active      SN:GXNG0X800263     /dev/da2
Bay  A0   Unit 19     Lnum 16    Active      SN:ZRT1A5JR         /dev/da6
Bay  A1   Unit 18     Lnum 17    Active      SN:ZRT1A4SE         /dev/da5
Bay  A2   Unit 17     Lnum 18    Active      SN:ZRT1A42D         /dev/da4
Bay  A3   Unit 16     Lnum 19    Active      SN:ZRT19494         /dev/da3
Bay  B0   Unit 25     Lnum 12    Active      SN:ZRT18NEY         /dev/da10
Bay  B1   Unit 24     Lnum 13    Active      SN:ZRT1FJCJ         /dev/da9
Bay  B2   Unit 23     Lnum 14    Active      SN:ZRT18N7F         /dev/da8
Bay  B3   Unit 22     Lnum 15    Active      SN:ZRT1FDJL         /dev/da7
Bay  C0   Unit 31     Lnum 8     Active      SN:ZRT1FJ0T         /dev/da14
Bay  C1   Unit 30     Lnum 9     Active      SN:ZRT1F6BF         /dev/da13
Bay  C2   Unit 29     Lnum 10    Active      SN:ZRT1FJMS         /dev/da12
Bay  C3   Unit 28     Lnum 11    Active      SN:ZRT18NE6         /dev/da11
Bay  D0   Unit 37     Lnum 4     Active      SN:ZRT18N9P         /dev/da18
Bay  D1   Unit 36     Lnum 5     Active      SN:ZRT18N8V         /dev/da17
Bay  D2   Unit 35     Lnum 6     Active      SN:ZRT18NBE         /dev/da16
Bay  D3   Unit 34     Lnum 7     Active      SN:ZRT1FR62         /dev/da15
Bay  E0   Unit 43     Lnum 0     Active      SN:ZRT1FDJ4         /dev/da22
Bay  E1   Unit 42     Lnum 1     Active      SN:ZRT1FR86         /dev/da21
Bay  E2   Unit 41     Lnum 2     Active      SN:ZRT1EJ4H         /dev/da20
Bay  E3   Unit 40     Lnum 3     Active      SN:ZRT1E9MS         /dev/da19

The first two lines of output above (bays 1 & 2) reference the cache SSD drives, contained within the compute modules. The remaining ‘bay’ locations indicate both the sled (A to E) and drive (0 to 3). The presence above of four HDDs per sled (i.e. bay numbers 0 to 3) indicates this is an A3100 node, rather than an A310 with only three HDDs per sled.
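The sled layout can be verified directly from this output. The following one-liner is a hypothetical sketch, assuming the bay naming convention shown above (sled letter plus drive number), that tallies HDDs per sled to distinguish an A3100 (four per sled) from an A310 (three per sled):

```shell
# Tally HDDs per sled (A-E) from 'isi_drivenum' output. Bays named with a
# letter+digit (e.g. A0, E3) are sled-resident HDDs; the numeric bays are the
# compute-module cache SSDs and are skipped by the pattern.
isi_drivenum | awk '/^Bay  [A-E]/ {count[substr($2,1,1)]++}
  END {for (s in count) print "Sled " s ": " count[s] " HDDs"}'
```

On an A3100 node, each sled reports four HDDs; on an A310, three.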

With regard to the nodes’ internal drives, the boot disk reservation size has increased to 18GB on these new platforms, up from 8GB on the previous generation. Partition sizes have also been expanded in OneFS 9.11, as follows:

Partition        A310 / A3100   A300 / A3000
hw               1GB            500MB
journal backup   8197MB         8GB
kerneldump       5GB            2GB
keystore         64MB           64MB
root             4GB            2GB
var              4GB            2GB
var-crash        7GB            3GB

The PowerScale A310 and A3100 platforms are available in the following networking configurations, with a 10/25Gb Ethernet front-end and either Ethernet or Infiniband back-end:

Model               A310                     A3100
Front-end network   10/25 GigE               10/25 GigE
Back-end network    10/25 GigE, Infiniband   10/25 GigE, Infiniband

These NICs and their PCI bus addresses can be determined via the ‘pciconf’ CLI command, as follows:

# pciconf -l | grep mlx

mlx5_core0@pci0:16:0:0: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core1@pci0:16:0:1: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core2@pci0:65:0:0: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core3@pci0:65:0:1: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
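Note that the ‘chip’ field packs two 16-bit identifiers: the lower half is the PCI vendor ID (0x15b3, Mellanox/NVIDIA) and the upper half is the device ID (0x101f, the ConnectX-6 Lx). A quick shell sketch to split them apart:

```shell
# Split pciconf's packed 'chip' value into its PCI device and vendor IDs:
# upper 16 bits = device, lower 16 bits = vendor.
chip=0x101f15b3
printf 'device=0x%04x vendor=0x%04x\n' $(( chip >> 16 )) $(( chip & 0xffff ))
```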

Similarly, the NIC hardware details and drive firmware versions can be viewed as follows:

# mlxfwmanager

Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  pci0:16:0:0
  Base GUID:        58a2e10300e22a24
  Base MAC:         58a2e1e22a24
  Versions:         Current        Available
     FW             26.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  pci0:65:0:0
  Base GUID:        58a2e10300e22bf4
  Base MAC:         58a2e1e22bf4
  Versions:         Current        Available
     FW             26.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Compared to their A30x predecessors, the A310 and A3100 see a number of generational hardware upgrades. These include a shift to DDR5 memory, a Sapphire Rapids CPU, and an up-spec’d power supply.

In terms of performance, the new A31x nodes provide a significant increase over the prior generation, as shown in the following streaming read and write comparison chart for the A3100 and A3000:

OneFS node compatibility provides the ability to have similar node types and generations within the same node pool. In OneFS 9.11 and later, compatibility between the A310 and A3100 nodes and the previous generation platform is supported. Specifically, this node pool compatibility includes:

OneFS Node Pool Compatibility:

Gen6    MLK       New
A200    A300/L    A310/L
A2000   A3000/L   A3100/L
H400    A300      A310

Node pool compatibility checking includes drive capacities, for both data HDDs and SSD cache. This pool compatibility permits the addition of A310 node pairs to an existing node pool comprising four or more A300s, if desired, rather than creating a new A310 node pool. A similar compatibility exists for A3100/A3000 nodes.

Note that, while the A31x is node pool compatible with the A30x, the A31x nodes are effectively throttled to match the performance envelope of the A30x nodes. Regarding storage efficiency, support for OneFS inline data reduction on mixed A-series diskpools is as follows:

Gen6     MLK        New        Data Reduction Enabled
A200     A300/L     A310/L     False
A2000    A3000/L    A3100/L    False
H400     A300       A310       False
A200     -          A310       False
-        A300       A310       True
H400     -          A310       False
A2000    -          A3100      False
-        A3000      A3100      True

To summarize, in combination with OneFS 9.11, these new PowerScale hybrid A31x platforms deliver a compelling value proposition in terms of efficiency, density, flexibility, scalability, and affordability.

PowerScale H710 and H7100 Platforms

In this article, we’ll take a more in-depth look at the new PowerScale H710 and H7100 hardware platforms that were released last week. Here’s where these new systems sit in the current hardware hierarchy:

As such, the PowerScale H710 and H7100 are the workhorses of the PowerScale portfolio. Built for general-purpose workloads, the H71x platforms offer flexibility and scalability for a broad range of applications, including home directories, file shares, generative AI, editing and post-production media workflows, and medical PACS and genomic data with efficient tiering.

Representing the mid-tier, the H710 and H7100 both utilize a single-socket Xeon processor with 384GB of memory and fifteen (H710) or twenty (H7100) hard drives per node, plus SSDs for metadata/caching – with four nodes residing within a 4RU chassis. From an initial 4-node (1 chassis) starting point, H710 and H7100 clusters can be easily and non-disruptively scaled two nodes at a time, up to a maximum of 252 nodes (63 chassis) per cluster.
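A quick sanity check of those scaling figures: four nodes per chassis across 63 chassis, grown in two-node increments from the initial 4-node cluster:

```shell
echo $(( 63 * 4 ))        # nodes in a fully populated 63-chassis cluster
echo $(( (252 - 4) / 2 )) # two-node expansions needed to grow from 4 to 252 nodes
```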

The H71x modular platform is based on Dell’s ‘Infinity’ chassis. Each node’s compute module contains a single 16-core Intel Sapphire Rapids CPU running at 2.0GHz with 30MB of cache, plus 384GB of DDR5 DRAM. Front-end networking options include 10/25/40/100 GbE, with either 100Gb Ethernet or Infiniband selectable for the back-end network.

As such, the new H71x core hardware specifications are as follows:

Hardware Class          PowerScale H-Series (Hybrid)
Model                   H710                                    H7100
OS version              Requires OneFS 9.11 or above, and NFP 13.1 or greater; BIOS based on Dell’s PowerBIOS (both models)
Platform                Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior gens (both models)
CPU                     16 cores @ 2.0GHz, 30MB cache           16 cores @ 2.0GHz, 30MB cache
Memory                  384GB DDR5 DRAM                         384GB DDR5 DRAM
Journal                 M.2: 480GB NVMe with 3-cell battery backup (BBU), both models
Depth                   Standard 36.7 inch chassis              Deep 42.2 inch chassis
Cluster size            Max of 63 chassis (252 nodes) per cluster (both models)
Storage drives          60 per chassis (15 per node)            80 per chassis (20 per node)
HDD capacities          2TB, 4TB, 8TB, 12TB, 16TB, 20TB, 24TB   12TB, 16TB, 20TB, 24TB
SSD (cache) capacities  0.8TB, 1.6TB, 3.2TB, 7.68TB             0.8TB, 1.6TB, 3.2TB, 7.68TB
Max raw capacity        1.4PB per chassis                       1.9PB per chassis
Front-end network       10/25/40/100 GigE                       10/25/40/100 GigE
Back-end network        100 GigE, Infiniband                    100Gb Ethernet or Infiniband
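The ‘max raw capacity’ rows above follow directly from the per-chassis drive counts and the largest (24TB) HDD option:

```shell
echo $(( 60 * 24 ))   # H710: 1440TB, i.e. ~1.4PB per chassis
echo $(( 80 * 24 ))   # H7100: 1920TB, i.e. ~1.9PB per chassis
```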

These node hardware attributes, plus a variety of additional info and environmentals, can be easily viewed from the OneFS CLI via the ‘isi_hw_status’ command. For example, from an H710:

# isi_hw_status
  SerNo: CF25J243000005
 Config: 1WVXW
ChsSerN:
ChsSlot: 2
FamCode: H
ChsCode: 4U
GenCode: 10
PrfCode: 7
   Tier: 3
  Class: storage
 Series: n/a
Product: H710-4U-Single-192GB-1x1GE-2x100GE QSFP28-240TB-3277GB SSD-SED
  HWGen: PSI
Chassis: INFINITY (Infinity Chassis)
    CPU: GenuineIntel (2.00GHz, stepping 0x000806f8)
   PROC: Single-proc, 16-HT-core
    RAM: 206152138752 Bytes
   Mobo: INFINITYPIFANO (Custom EMC Motherboard)
  NVRam: INFINITY (Infinity Memory Journal) (8192MB card) (size 8589934592B)
 DskCtl: LSI3808 (LSI 3808 SAS Controller) (8 ports)
 DskExp: LSISAS35X36I (LSI SAS35x36 SAS Expander - Infinity)
PwrSupl: Slot1-PS0 (type=ARTESYN, fw=02.30)
PwrSupl: Slot2-PS1 (type=ARTESYN, fw=02.30)
  NetIF: bge0,lagg0,mce0,mce1,mce2,mce3
 BEType: 100GigE
 FEType: 100GigE
 LCDver: IsiVFD2 (Isilon VFD V2)
 Midpln: NONE (No Midplane Support)
Power Supplies OK
Power Supply Slot1-PS0 good
Power Supply Slot2-PS1 good
CPU Operation (raw 0x882D0800)  = Normal
CPU Speed Limit                 = 100.00%
Fan0_Speed                      = 12000.000
Fan1_Speed                      = 11880.000
Slot1-PS0_In_Voltage            = 208.000
Slot2-PS1_In_Voltage            = 207.000
SP_CMD_Vin                      = 12.100
CMOS_Voltage                    = 3.080
Slot1-PS0_Input_Power           = 280.000
Slot2-PS1_Input_Power           = 270.000
Pwr_Consumption                 = 560.000
SLIC0_Temp                      = na
SLIC1_Temp                      = na
DIMM_Bank0                      = 40.000
DIMM_Bank1                      = 41.000
CPU0_Temp                       = -43.000
SP_Temp0                        = 37.000
MP_Temp0                        = na
MP_Temp1                        = 29.000
Embed_IO_Temp0                  = 48.000
Hottest_SAS_Drv                 = -26.000
Ambient_Temp                    = 29.000
Slot1-PS0_Temp0                 = 58.000
Slot1-PS0_Temp1                 = 38.000
Slot2-PS1_Temp0                 = 55.000
Slot2-PS1_Temp1                 = 35.000
Battery0_Temp                   = 36.000
Drive_IO0_Temp                  = 42.000
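Note that the ‘RAM’ field above is reported in raw bytes. A quick conversion to gibibytes confirms this example node’s memory size (just under 192GiB, integer division rounding down):

```shell
# Convert the isi_hw_status 'RAM' byte count to GiB.
echo $(( 206152138752 / 1024 / 1024 / 1024 ))   # -> 191
```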

Note that the H710 and H7100 are only available in a 384GB memory configuration.

Starting at the business end of the chassis, the articulating front panel display allows the user to join the nodes to a cluster, etc:

The chassis front panel includes an LCD display with 9 cap-touch back-lit buttons. Four LED Light bar segments, 1 per node, illuminate blue to indicate normal operation or yellow to alert of a node fault. The front panel display is hinge mounted so it can be moved clear of the drive sleds, with a ribbon cable running down the length of the chassis to connect the display to the midplane.

As with all PowerScale nodes, the front panel display provides some useful information for the four nodes, such as the ‘outstanding alerts’ status shown above, etc.

For storage, each of the four nodes within a PowerScale H710/H7100 chassis has five associated drive containers, or sleds. These sleds occupy bays in the front of each chassis, with a node’s drive sleds stacked vertically:

Nodes are numbered 1 through 4, left to right looking at the front of the chassis, while the drive sleds are labeled A through E, with sleds A occupying the top row of the chassis.

The drive sled is the tray which slides into the front of the chassis. Within each sled, the 3.5” SAS hard drives it contains are numbered sequentially starting from drive zero, which is the HDD adjacent to the air dam.

The H7100 uses a longer 42.2 inch chassis, allowing it to accommodate four HDDs per sled, compared to three drives for the H710, which is 36.7 inches in depth. This also means that the H710 can reside in a standard depth data center rack or cabinet, whereas the H7100 requires a deep rack, such as the Dell Titan cabinet.

The H710 and H7100 platforms support a range of HDD capacities, currently 2TB, 4TB, 8TB, 12TB, 16TB, 20TB, and 24TB, in both regular ISE (instant secure erase) and self-encrypting drive (SED) formats.

Each drive sled has a white ‘not safe to remove’ LED on its front top left, as well as a blue power/activity LED, and an amber fault LED.

The compute modules for each node are housed in the rear of the chassis, and contain CPU, memory, networking, and SSDs, as well as power supplies. Nodes 1 & 2 are a node pair, as are nodes 3 & 4. Each node-pair shares a mirrored journal and two power supplies:

Here’s the detail of an individual compute module, which contains a multi-core Sapphire Rapids CPU, memory, M.2 flash journal, up to two SSDs for L3 cache, six DIMM channels, front-end 40/100 or 10/25 Gb Ethernet, back-end 40/100 or 10/25 Gb Ethernet or Infiniband, an Ethernet management interface, plus power supply and cooling fans:

Of particular note is the ‘journal active’ LED, which is displayed as a white ‘hand icon’. When this is illuminated, it indicates that the mirrored journal is actively vaulting.

Note that a node’s compute module should not be removed from the chassis while this LED is lit!

On the front of each chassis is an LCD front panel control with back-lit buttons and 4 LED Light Bar Segments – 1 per Node. These LEDs typically display blue for normal operation or yellow to indicate a node fault. This LCD display is hinged so it can be swung clear of the drive sleds for non-disruptive HDD replacement, etc.

Details can be queried with OneFS CLI drive utilities such as ‘isi_radish’ and ‘isi_drivenum’. For example, the command output from an H710 node:

tme-1# isi_drivenum

Bay  1   Unit 6      Lnum 15    Active      SN:7E30A02K0F43     /dev/da1
Bay  2   Unit N/A    Lnum N/A   N/A         SN:N/A              N/A
Bay  A0   Unit 1      Lnum 12    Active      SN:ZRS1HP4G         /dev/da4
Bay  A1   Unit 17     Lnum 13    Active      SN:ZR7105GY         /dev/da3
Bay  A2   Unit 16     Lnum 14    Active      SN:ZRS1HNZG         /dev/da2
Bay  B0   Unit 24     Lnum 9     Active      SN:ZRS1PHFG         /dev/da7
Bay  B1   Unit 23     Lnum 10    Active      SN:ZRS1HEA1         /dev/da6
Bay  B2   Unit 22     Lnum 11    Active      SN:ZRS1PHFX         /dev/da5
Bay  C0   Unit 30     Lnum 6     Active      SN:ZR5EFV0D         /dev/da10
Bay  C1   Unit 29     Lnum 7     Active      SN:ZR5FE3Z8         /dev/da9
Bay  C2   Unit 28     Lnum 8     Active      SN:ZR5FE311         /dev/da8
Bay  D0   Unit 36     Lnum 3     Active      SN:ZR5FE3DA         /dev/da13
Bay  D1   Unit 35     Lnum 4     Active      SN:ZRS1PHEF         /dev/da12
Bay  D2   Unit 34     Lnum 5     Active      SN:ZRS1HP6T         /dev/da11
Bay  E0   Unit 42     Lnum 0     Active      SN:ZRS1PHEM         /dev/da16
Bay  E1   Unit 41     Lnum 1     Active      SN:ZRS1PHDV         /dev/da15
Bay  E2   Unit 40     Lnum 2     Active      SN:ZRS1HPAT         /dev/da14

The ‘bay’ locations indicate the drive location in the chassis. ‘Bay 1’ references the cache/metadata SSD, located within the node’s compute module, whereas the HDDs are referenced by their respective sled (A to E) and drive slot (0 to 2) – for example, drive ‘E1’:

The H710 and H7100 platforms are available in the following networking configurations, with a 10/25/40/100Gb Ethernet front-end and 10/25/40/100Gb Ethernet or 100Gb Infiniband back-end:

Model               H710                             H7100
Front-end network   10/25/40/100 GigE                10/25/40/100 GigE
Back-end network    10/25/40/100 GigE, Infiniband    10/25/40/100 GigE, Infiniband

These NICs and their PCI bus addresses can be determined via the ‘pciconf’ CLI command, as follows:

# pciconf -l | grep mlx

mlx4_core0@pci0:59:0:0: class=0x020000 card=0x028815b3 chip=0x100315b3 rev=0x00 hdr=0x00
mlx5_core0@pci0:216:0:0:        class=0x020000 card=0x001615b3 chip=0x101515b3 rev=0x00 hdr=0x00
mlx5_core1@pci0:216:0:1:        class=0x020000 card=0x001615b3 chip=0x101515b3 rev=0x00 hdr=0x00

Similarly, the NIC hardware details and drive firmware versions can be viewed as follows:

# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX3
  Part Number:      105-001-013-00_Ax
  Description:      Mellanox 40GbE/56G FDR VPI card
  PSID:             EMC0000000004
  PCI Device Name:  pci0:59:0:0
  Port1 MAC:        1c34dae19e31
  Port2 MAC:        1c34dae19e32
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A
  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX4LX
  Part Number:      020NJD_0MRT0D_Ax
  Description:      Mellanox 25GBE 2P ConnectX-4 Lx Adapter
  PSID:             DEL2420110034
  PCI Device Name:  pci0:216:0:0
  Base MAC:         1c34da4492e8
  Versions:         Current        Available
     FW             14.32.2004     N/A
     PXE            3.6.0502       N/A
     UEFI           14.25.0018     N/A
  Status:           No matching image found

Compared with their H70x predecessors, the H710 and H7100 see a number of hardware performance upgrades. These include a move to DDR5 memory, Sapphire Rapids CPU, and an upgraded power supply.

In terms of performance, the new H71x nodes provide a solid improvement over the prior generation. For example, streaming reads and writes on both the H7100 and H7000:

OneFS node compatibility provides the ability to have similar node types and generations within the same node pool. In OneFS 9.11 and later, compatibility between the H710 and H7100 nodes and the previous generation platform is supported. Specifically, this node pool compatibility includes:

PowerScale H-series Node Pool Compatibility:

Gen6     MLK      New
H500     H700     H710
H5600    H7000    H7100
H600     -        -

Node pool compatibility checking includes drive capacities for both data HDDs and SSD cache. This pool compatibility permits the addition of H710 node pairs to an existing node pool comprising four or more H700s, if desired, rather than creating an entirely new 4-node H710 node pool. Plus, there’s a similar compatibility between the H7100 and H7000 nodes.

Note that, while the H71x is node pool compatible with the H70x, it does require a performance compromise, since the H71x nodes are effectively throttled to match the performance envelope of the H70x nodes.

Apropos storage efficiency, OneFS inline data reduction support on mixed H-series diskpools is as follows:

Gen6     MLK      New      Data Reduction Enabled
H500     H700     H710     False
H500     -        H710     False
-        H700     H710     True
H5600    H7000    H7100    True
H5600    -        H7100    True
-        H7000    H7100    True

In the next article in this series, we’ll turn our attention to the PowerScale A310 and A3100 platforms.

PowerScale H710, H7100, A310, and A3100 Platform Nodes

Hot on the heels of the recent OneFS 9.11 release comes the launch of four new PowerScale hybrid and archive series hardware offerings. Between them, these new H710, H7100, A310 and A3100 spinning-disk-based nodes add significant blended capacity to the PowerScale stable.

Built atop the latest generation of Dell’s PowerScale chassis-based architecture, these new H-series and A-series platforms each boast a range of HDD capacities, paired with SSD for cache, a Sapphire Rapids CPU, a generous helping of DDR5 memory, and ample network connectivity – with four paired nodes all housed within a modular, power-efficient 4RU form factor chassis.

Here’s where these new platforms sit in the current PowerScale hardware hierarchy:

These new platforms will replace the PowerScale H700, H7000, A300, and A3000 systems, and further extend PowerScale’s price-density envelope.

The PowerScale H710, H7100, A310, and A3100 nodes offer an evolution from previous generations, while also focusing on environmental sustainability, reducing power consumption and carbon footprint. Housed in a 4RU chassis with balanced airflow and enhanced cooling, these new platforms offer significantly greater density than their predecessors – plus are ready to support Seagate’s 32TB HAMR HDDs when those drives become available later this year.

These new nodes all require OneFS 9.11 (or later) and also include in-line compression and deduplication by default – further increasing their capacity headroom, effective density, and power efficiency. Additionally, incorporating Intel’s 4th gen Xeon Sapphire Rapids CPUs and the latest DDR5 DRAM delivers greater processing horsepower and improved performance per watt.

Scalability-wise, both platforms require a minimum of four nodes (1 chassis) to form a cluster (or node pool). From here, they can be simply and non-disruptively scaled two nodes at a time up to a maximum of 252 nodes (63 chassis) per cluster. The basic specs for these new platforms are as follows:

Hardware Class          PowerScale H-Series (Hybrid)            PowerScale A-Series (Archive)
Model                   H710            H7100                   A310            A3100
OneFS version           Requires OneFS 9.11 or above.           Requires OneFS 9.11 or above.
CPU                     16 cores @ 2.0GHz, 30MB cache           8 cores @ 1.8GHz, 22.5MB cache
Memory                  384GB DDR5 DRAM                         96GB DDR5 DRAM
Platform                Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior generations (all models)
Depth                   Standard 36.7 inch chassis (H710, A310); deep 42.2 inch chassis (H7100, A3100)
Max cluster size        Maximum of 63 chassis (252 nodes) per cluster (all models)
Storage drives          60 per chassis, 15 per node (H710, A310); 80 per chassis, 20 per node (H7100, A3100)
HDD capacities          2TB, 4TB, 8TB, 12TB, 16TB, 20TB, 24TB (all models)
SSD (cache) capacities  0.8TB, 1.6TB, 3.2TB, 7.68TB (all models)
Max raw capacity        1.4PB per chassis (H710, A310); 1.9PB per chassis (H7100, A3100)
Front-end network       10/25/40/100 GigE                       10/25 GigE
Back-end network        10/25/40/100 GigE, Infiniband           10/25 GigE, Infiniband

In concert with the generational CPU and DRAM upgrades in the new PowerScale chassis platforms, OneFS 9.11 software advancements also help deliver a nice performance bump for the H71x and A31x hybrid platforms – particularly for sequential reads and writes.

The PowerScale H-series platforms are designed for general-purpose workloads, offering flexibility and scalability for a wide range of applications including file shares and home directories, editing and post-production media workflows, generative AI, and PACS and genomic data with efficient tiering.

In contrast, the A-series platforms are designed for cooler, infrequently accessed data use cases. These include active archive workflows for the A310, such as regulatory compliance data, medical imaging archives, financial records, and legal documents. And deep archive/cold storage for the A3100 platform, including surveillance video archives, backup, and DR repositories.

Over the next couple of articles, we’ll dig into the technical details of each of the new platforms. But, in summary, when combined with OneFS 9.11, the new PowerScale hybrid H71x and A31x platforms quite simply deliver on efficiency, flexibility, performance, scalability, and affordability!

OneFS SyncIQ Temporary Directory Hashing

SyncIQ receives an update in OneFS 9.11 with the default enablement of its Temporary Directory Hashing feature, which can help improve replication directory delete performance on target clusters.

But first, some background. For several years now, OneFS has included functionality, commonly referred to as temporary directory hashing, which addresses some of the challenges that SyncIQ can potentially encounter with large incremental replication tasks. Specifically, if a cluster contains an extra-wide directory, with many different replication threads trying to write to it simultaneously, OneFS file system performance can be impacted due to contention over lock requests on that very wide directory.

When SyncIQ performs an incremental transfer, it frequently uses a temporary working directory in cases where a file has been created but its parent doesn’t yet exist due to LIN-order processing – or where files are removed and their parents are not available, etc. SyncIQ uses this temporary working directory as a place to stash these files until it can put them in their correct location. In some incremental replication workflows, this can result in an extra-wide temporary working directory, potentially containing millions or billions of directory entries. When there are hundreds of SyncIQ workers all trying to link and unlink files from that same directory, performance can become impacted.

To address this, temporary directory hashing introduces support for subdirectories within a large temp working directory, based on a directory cookie. This allows SyncIQ to split that monolithic directory into many smaller ones, so workers no longer contend with every other worker when linking and unlinking files within the temporary directory – only with the workers in their particular subdirectory. This can provide a significant performance boost under heavy concurrent access.
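Conceptually, this works like bucketing: each entry is assigned to one of N subdirectories, so no individual directory grows excessively wide. The following is an illustrative sketch only – SyncIQ actually derives the bucket from a directory cookie, and the directory and file names here are hypothetical:

```shell
# Illustrative only: spread entries across 16 hashed subdirectories of a
# temp working dir, so concurrent workers contend on small buckets rather
# than one monolithic directory.
N=16
name="file_12345"                                   # hypothetical entry name
bucket=$(( $(printf '%s' "$name" | cksum | cut -d' ' -f1) % N ))
echo "tmp-working-dir/$bucket/$name"
```

Workers operating on entries in different buckets never compete for the same directory lock.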

In OneFS 9.11, temporary directory hashing functionality now becomes the default configuration and behavior.

SyncIQ’s temporary directory hashing functionality has actually existed within OneFS since 8.2, but prior to OneFS 9.11 it had to be manually enabled on a per-policy basis for any desired replication workflows.

When installing or upgrading a cluster to OneFS 9.11 or later, temporary directory hashing becomes the default configuration, so any new SyncIQ policies will automatically have temporary directory hashing enabled. However, this default change is not applied retroactively to any legacy policies that were configured prior to the OneFS 9.11 upgrade.

That said, any pre-existing policies can be easily configured to use temporary directory hashing from the SyncIQ source cluster with the following CLI syntax:

# echo '{"enable_hash_tmpdir": true}' | isi_papi_tool PUT /7/sync/policies/<policy name>

For example:

# echo '{"enable_hash_tmpdir": true}' | isi_papi_tool PUT /7/sync/policies/remote_zone1

Type request body, press enter, and CTRL-D:
204
Content-type: text/plain
Allow: GET, PUT, DELETE, HEAD

The configuration can be verified as ‘enabled’ with the following command:

# isi_papi_tool GET /7/sync/policies/remote_zone1 | grep "enable_hash_tmpdir"

"enable_hash_tmpdir" : true,

Under the hood, temporary directory hashing places any directories within a SyncIQ policy which need to be deleted into subdirectories under the ./tmp-working-dir/ directory, instead of at its root. This lowers contention on the root tmp-working-dir by moving exclusive locking requests to those subdirectories.

Performance-wise, the benefit and efficiency of SyncIQ temporary directory hashing will vary by cluster constitution, environment, and workflow. However, environments with thousands of directory deletions per policy run have seen improvements of between 2x-20x faster delete performance. To determine whether this feature is proving beneficial for a specific policy, view the SyncIQ job reports and compare the ‘STF_PHASE_CT_DIR_DELS’ job phase start and end times. This will indicate how much time those jobs have spent in this temporary directory delete phase, and can be accomplished from the replication source cluster with the following CLI syntax:

# isi sync reports view <policy_name> <job_id> | grep -C 3 "CT_DIR_DELS"

For example:

# isi sync reports view remote_zone1 31 | grep -C 3 "CT_DIR_DELS"

                            Phase: STF_PHASE_CT_DIR_DELS
                       Start Time: 2025-06-06T16:12:39
                         End Time: 2025-06-06T16:10:47
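The phase duration is simply the difference between the two timestamps. A small sketch of the calculation (illustrative timestamp values; GNU ‘date’ assumed for the ‘-d’ parsing):

```shell
# Compute the CT_DIR_DELS phase duration in seconds from the report's
# start and end timestamps (illustrative values).
start="2025-06-06T16:10:47"
end="2025-06-06T16:12:39"
echo "$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) )) seconds"
```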

Note that for some SyncIQ policies which routinely move wide and shallow directories from one directory to another, temporary directory hashing may actually adversely impact those moves. In such instances, the feature can be disabled for each individual SyncIQ replication policy as follows:

# echo '{"enable_hash_tmpdir": false}' | isi_papi_tool PUT /7/sync/policies/<policy name>

Note that the above command should be run on the replication source cluster, using the root user authenticating to the PAPI service, replacing <policy name> with the appropriate value.

For example:

# echo '{"enable_hash_tmpdir": false}' | isi_papi_tool PUT /7/sync/policies/remote_zone1

Type request body, press enter, and CTRL-D:
204
Content-type: text/plain
Allow: GET, PUT, DELETE, HEAD

Once disabled, the feature remains off for all subsequent runs of that policy.

Similarly, the configuration can be verified as follows:

# isi_papi_tool GET /7/sync/policies/remote_zone1 | grep "enable_hash_tmpdir"

"enable_hash_tmpdir" : false,

For clusters running OneFS 8.2 through OneFS 9.10, where SyncIQ temporary directory hashing is disabled by default, it can be activated on a per-policy basis as follows:

# echo '{"enable_hash_tmpdir": true}' | isi_papi_tool PUT /7/sync/policies/<policy name>

As such, the next time SyncIQ runs the specified policy, temporary directory hashing will be enabled for this and future job runs.

So, in summary, SyncIQ temporary directory hashing can improve directory deletion performance for many policies with wide directory cases. While in OneFS 9.10 and earlier it had to be manually configured on an individual per-policy basis, in OneFS 9.11 and later, temporary directory hashing is enabled by default on all new SyncIQ policies.