PowerScale Cybersecurity Suite

The Dell PowerScale Cybersecurity Suite represents a forward-thinking, integrated approach to addressing the growing challenges in cybersecurity and disaster recovery.

It aligns with Gartner’s definition of Cyberstorage, a category of solutions specifically designed to secure data storage systems against modern threats. These threats include ransomware, data encryption attacks, and theft, and the emphasis of Cyberstorage is on active prevention, early detection, and the ability to block attacks before they cause damage. Recovery capabilities are also tailored to the unique demands of data storage environments, making the protection of data itself a central layer of defense.

Unlike traditional solutions that often prioritize post-incident recovery – leaving organizations exposed during the critical early stages of an attack – Dell’s PowerScale Cybersecurity Suite embraces the Cyberstorage paradigm by offering a comprehensive set of capabilities that proactively defend data and ensure operational resilience. These include:

Capability Details
Active Defense at the Data Layer The suite integrates AI-driven tools to detect and respond to threats in real-time, analyzing user behavior and unauthorized data access attempts. Bidirectional threat intelligence integrates seamlessly into SIEM, SOAR, and XDR platforms for coordinated protection at every layer.
Automated Threat Prevention Includes features like automated snapshots, operational air-gapped vaults, and immediate user lockouts to stop attacks before damage spreads. Attack simulation tools also ensure that the defenses are always optimized for emerging threats.
NIST Framework Alignment Adheres to the National Institute of Standards and Technology (NIST) cybersecurity framework, providing a structured approach to identifying, protecting, detecting, responding to, and recovering from threats. This comprehensive protection eliminates vulnerabilities overlooked by traditional backup and security tools, enabling organizations to stay ahead of today’s evolving cyber risks while ensuring business continuity.
Rapid Recovery and Resilience With secure backups and precision recovery, organizations can rapidly restore specific files or entire datasets without losing unaffected data. Recovery is accelerated by integrated workflows that minimize downtime.

By embedding detection, protection, and response directly into the data layer, the Dell PowerScale Cybersecurity Suite adopts a proactive and preventive approach to safeguarding enterprise environments:

Approach Details
Identification & Detection Detecting potential incursions in real time using AI-driven behavioral analytics.
Protection Protecting data at its source with advanced security measures and automated threat monitoring.
Response Responding decisively with automated remediation to minimize damage and accelerate recovery, ensuring seamless continuity.
Recovery Providing recovery tools and forensic data and recovery tools to quickly restore clean data in the event of a breach, minimizing business disruption.

It begins with identification and detection, where data is protected at its source through advanced security measures and continuous automated threat monitoring. Protection is achieved by identifying potential incursions in real time using AI-driven behavioral analytics, allowing organizations to act before threats escalate. When a threat is detected, the suite responds with automated remediation processes that minimize damage and accelerate recovery, ensuring uninterrupted operations.

This integrated approach enables Dell to address the full lifecycle of security and recovery within the PowerScale platform, delivering exceptional resilience across IT environments.

Released globally on August 28, 2025, the Dell PowerScale Cybersecurity Suite is available in three customizable bundles tailored to meet diverse operational and regulatory needs.

Bundle Details
Cybersecurity Bundle Leverages AI-driven threat detection, Zero Trust architecture, and automated risk mitigation to fortify security.
Airgap Vault Bundle Extends the capabilities of the Cybersecurity Bundle by adding isolated, secure backups for robust ransomware protection. This bundle requires the Cybersecurity Bundle and the Disaster Recovery Bundle..
Disaster Recovery Bundle Prioritizes rapid recovery with near-zero Recovery Point Objectives (RPO), Recovery Time Objectives (RTO), and seamless failover capabilities.

The Cybersecurity Bundle leverages AI-driven threat detection, Zero Trust architecture, and automated risk mitigation to strengthen data protection. The Airgap Vault Bundle builds on this by adding isolated, secure backups for enhanced ransomware defense, and requires both the Cybersecurity and Disaster Recovery bundles for full deployment. The Disaster Recovery Bundle focuses on rapid recovery, offering rapid Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO), along with seamless failover capabilities to ensure business continuity.

Customers can choose specific bundles based on their requirements, with the only prerequisite being that the Airgap Vault bundle must be deployed alongside both the Cybersecurity and Disaster Recovery bundles to ensure full functionality and integration.

Built upon the venerable PowerScale platform, the suite is engineered to protect unstructured data and maintain operational availability in today’s increasingly complex threat landscape. As such, it offers a comprehensive set of tools, techniques, and architectural flexibility to deliver multilayered security and responsive recovery.

This enables organizations to design robust solutions that align with their specific security priorities—from advanced threat detection to seamless disaster recovery.

Among its key benefits, the suite includes automated threat response capabilities that swiftly mitigate risks such as malicious data encryption and exfiltration.

Feature Details
Automated threat response The suite features automated responses to cybersecurity threats, such as data encryption, and the prevention of data exfiltration, helping mitigate risks swiftly and effectively.
Secure Operational Airgap Vault Data is protected within an isolated operational airgap vault, and will only be transferred to the operational airgap vault if the production storage environment is not under attack, ensuring critical assets remain secure and inaccessible to unauthorized actors.
Ecosystems Integration Seamlessly integrates with leading endpoint protection and incident response software, automating and simplifying operations during a cyberattack to ensure a coordinated and efficient response.
DoD-Certified Hardware Integration Designed to enhance PowerScale’s DoD APL certified hardware, meeting rigorous cybersecurity standards, and providing customers with a trusted platform on which to build their defenses. The suite’s advanced capabilities, robust protection, and proven hardware deliver a comprehensive cyber and DR solution tailored to meet today’s complex security challenges.

It also features a secure operational airgap vault, which ensures that data is only transferred when the production environment is verified to be safe, keeping critical assets isolated and protected from unauthorized access. Integration with leading endpoint protection and incident response platforms allows for coordinated and efficient responses during cyberattacks, streamlining operations and reducing complexity.

The suite is also designed to complement PowerScale’s DoD APL-certified hardware, meeting stringent cybersecurity standards and providing a trusted foundation for enterprise defense strategies. Its advanced capabilities, combined with proven hardware, deliver a comprehensive cybersecurity and disaster recovery solution tailored to modern security challenges.

The Dell PowerScale Cybersecurity Suite is engineered to support petabyte-scale environments containing billions of files distributed across multiple PowerScale clusters. There is no hard-coded limit on the volume of data it can manage. However, actual throughput and failover times are influenced by factors such as SyncIQ bandwidth, the degree of policy parallelism, and the overall readiness of the environment. To help forecast recovery timelines, the suite includes an Estimated Failover Time engine that uses real metrics to generate policy-specific projections.

Superna software, which underpins the suite, interacts with PowerScale systems via REST APIs. These APIs provide access to inventory data, facilitate the orchestration of snapshot creation and deletion, and enable user lockout and restoration of access to shared resources. Specifically, the suite utilizes the file system API and the system configuration API to perform these operations. Meanwhile, configuration and management can be performed via the comprehensive, intuitive WebUI:

The performance impact of running the Data Security Edition—which includes the Cybersecurity Bundle with Ransomware Defender and Easy Auditor—is primarily tied to the frequency and volume of API calls made to PowerScale. These calls can be managed and tuned by administrators and support teams during deployment and system configuration. One known scenario that may affect cluster performance occurs when user permissions are broadly configured to allow all users access to all shares. In such cases, default settings combined with enabled snapshots can lead to excessive API endpoint activity, as the system attempts to create and delete snapshots across all shares. To mitigate this, Dell recommends disabling default snapshots during deployment and instead configuring snapshots only for critical data paths. Outside of this specific configuration, Dell has not identified any significant performance impacts under normal operating conditions.

The Dell-branded software is built on the same codebase as the standard Superna 2.x release, with branding determined by licensing. This branding helps  ensure consistency, simplify user interactions, and reinforce the alignment between the suite and Dell Technologies’ broader product portfolio.

With regard to release cadence, there is typically a 60-day lead time between Superna releasing a new version and Dell launching its branded equivalent, allowing for additional QA, regression, and longevity testing.

With the Cybersecurity suite, Dell Technologies will directly manage implementation services and provide first-call support. This approach ensures a seamless and customer-focused experience, with faster response times and streamlined service delivery. It also guarantees that customers receive consistent and integrated support throughout their deployment and operational lifecycle.

It is important to note that Dell-branded and Superna-branded software cannot be mixed within the same PowerScale cluster. Currently, the Dell PowerScale Cybersecurity Suite is intended exclusively for new deployments and is not compatible with existing Superna-branded environments. Migration from Superna to Dell-branded software is not supported at this time, unless an entire Superna Eyeglass solution is being renewed. However, Dell is actively working to expand migration options in future releases.

PowerScale InsightIQ 6.1

It’s been a sizzling summer for Dell PowerScale to date! Hot on the heels of the OneFS 9.12 launch comes the unveiling of the innovative new PowerScale InsightIQ 6.1 release.

InsightIQ provides powerful performance and health monitoring and reporting functionality, helping to maximize PowerScale cluster efficiency. This includes advanced analytics to optimize applications, correlate cluster events, and the ability to accurately forecast future storage needs.

So what new goodness does this InsightIQ 6.1 release add to the PowerScale metrics and monitoring mix?

Additional functionality includes:

Feature New IIQ 6.1 Functionality
Ecosystem support ·         InsightIQ is qualified on Ubuntu
Flexible Alerting ·         Defining custom alerts on the most used set of granular metrics.

·         Nine new KPIs for a total of 16 KPIs.

·         Increased granularity of alerting and many more

Online Migration from Simple to Scale ·         Customers with InsightIQ 6.0.1 Simple (OVA) can now migrate data and functionalities to InsightIQ 6.1.0 Scale.
Self-service Admin Password Reset ·         Administrators can reset their own password through a simple, secure flow with reduced IT dependency

InsightIQ 6.1 continues to offer the same two deployment models as its predecessors:

Deployment Model Description
InsightIQ Scale Resides on bare-metal Linux hardware or virtual machine.
InsightIQ Simple Deploys on a VMware hypervisor (OVA).

The InsightIQ Scale version resides on bare-metal Linux hardware or virtual machine, whereas InsightIQ Simple deploys via OVA on a VMware hypervisor.

InsightIQ v6.x Scale enjoys a substantial breadth-of-monitoring scope, with the ability to encompass 504 nodes across up to 20 clusters.

Additionally, InsightIQ v6.x Scale can be deployed on a single Linux host. This is in stark contrast to InsightIQ 5’s requirements for a three Linux node minimum installation platform.

Deployment:

The deployment options and hardware requirements for installing and running InsightIQ 6.x are as follows:

Attribute InsightIQ 6.1 Simple InsightIQ 6.1 Scale
Scalability Up to 10 clusters or 252 nodes Up to 20 clusters or 504 nodes
Deployment On VMware, using OVA template RHEL, SLES, or Ubuntu with deployment script
Hardware requirements VMware v15 or higher:

·         CPU: 8 vCPU

·         Memory: 16GB

·         Storage: 1.5TB (thin provisioned);

Or 500GB on NFS server datastore

Up to 10 clusters and 252 nodes:

·         CPU: 8 vCPU or Cores

·         Memory: 16GB

·         Storage: 500GB

Up to 20 clusters and 504 nodes:

·         CPU: 12 vCPU or Cores

·         Memory: 32GB

·         Storage: 1TB

Networking requirements 1 static IP on the PowerScale cluster’s subnet 1 static IP on the PowerScale cluster’s subnet

Ecosystem support:

The InsightIQ ecosystem itself is also expanded in version 6.1 to also include Ubuntu 24.04 Online deployment and OpenStack RHOSP 21 with RHEL 9.6, in addition to SLES 15 SP4 and Red Hat Enterprise Linux (RHEL) versions 9.6 and 8.10. This allows customers who have standardized on Ubuntu Linux to now run an InsightIQ 6.1 Scale deployment on a v24.04 host to monitor the latest OneFS versions.

Qualified on InsightIQ 6.0 InsightIQ 6.1
OS (IIQ Scale Deployment) RHEL 8.10, RHEL 9.4, and SLES 15 SP4 RHEL 8.10, RHEL 9.6, and SLES 15 SP4
PowerScale OneFS 9.4 to 9.11 OneFS 9.5 to 9.12
VMware ESXi ESXi v7.0U3 and ESXi v8.0U3 ESXi v8.0U3
VMware Workstation Workstation 17 Free Version Workstation 17 Free Version
Ubuntu Ubuntu 24.04 Online deployment
OpenStack RHOSP 17 with RHEL 9.4 RHOSP 21 with RHEL 9.6

Similarly, in addition to deployment on VMware ESXi 8, the InsightIQ Simple version can also be installed for free on VMware Workstation 17, providing the ability to stand up InsightIQ in a non-production or lab environment for trial or demo purposes, without incurring a VMware licensing charge.

Additionally, the InsightIQ OVA template is now reduced in size to under 5GB, and with an installation time of less than 12 minutes.

Online Upgrade

The IIQ upgrade in 6.1 is a six step process:

First, the installer checks the current Insight IQ version, verifies there’s sufficient free disk space, and confirms that setup is ready. Next, IIQ is halted and dependencies met, followed by the installation of the 6.1 infrastructure and a migration of legacy InsightIQ configuration and historical report data to the new platform. The cleanup phase removes the old configuration files, etc, followed by the final phase which upgrades alerts and removes the lock, leaving InsightIQ 6.1 ready to roll.

Phase Details
Pre-check •       docker command

•        IIQ version check 6.0.1

•       Free disk space

•       IIQ services status

•       OS compatibility

Pre-upgrade •       EULA accepted

•       Extract the IIQ images

•       Stop IIQ

•       Create necessary directories

Upgrade •       Upgrade addons services

•       Upgrade IIQ services except alerts

•       Upgrade EULA

•       Status Check

Post-upgrade •       Update admin email

•       Update IIQ metadata

Cleanup •       Replace scripts

•       Remove old docker images

•       Remove upgrade and backup folders

Upgrade Alerts and Unlock •       Trigger alert upgrade

•       Clean lock file

The prerequisites for upgrading to InsightIQ 6.1 are either a Simple or Scale deployment with 6.0.1 installed, and with a minimum of 40GB free disk space.

The actual upgrade is performed by the ‘upgrade-iiq.sh’ script:

Specific steps in the upgrade process are as follows:

  • Download and uncompress the bundle
# tar xvf iiq-install-6.1.0.tar.gz
  • Enter InsightIQ folder and un-tar upgrade scripts
# cd InsightIQ
# tar xvf upgrade.tar.gz
  • Enter upgrade scripts folder
# cd upgrade/
  • Start upgrade. Note that the usage is same for both the Simple and Scale InsightIQ deployments.
# ./upgrade-iiq.sh -m <admin_email>

Upon successful upgrade completion, InsightIQ will be accessible via the primary node’s IP address.

Online Simple-to-Scale Migration

The Online Simple-to-Scale Migration feature enables seamless migration of data and functionalities from InsightIQ version 6.0.1 to version 6.1. This process is specifically designed to support migrations from InsightIQ 6.0.1 Simple (OVA) deployments to InsightIQ 6.1 Scale deployments.

Migration is supported only from InsightIQ version 6.0.1. To proceed, the following prerequisites must be met:

  • An InsightIQ 6.0.1 Simple deployment running IIQ 6.0.1.
  • An InsightIQ Scale deployment running IIQ 6.1.0 installed and the EULA accepted.

The ‘iiq_data_migration’ script can be run as follows to initiate a migration:

# cd /usr/share/storagemonitoring/online_migration
# bash iiq_data_migration.sh

Additionally, detailed logs are available at the following locations for monitoring and verifying the migration process:

Logfile Location
Metadata Migration Log /usr/share/storagemonitoring/logs/online_migration/insightiq_online_migration.log
Cluster Data Migration Log /usr/share/storagemonitoring/logs/clustermanagement/insightiq_cluster_migration.log

 Self-service Admin Password Reset

InsightIQ 6.1 introduces a streamlined self-service password reset feature for administrators. This secure process allows admins to reset their own passwords without IT intervention.

Key features include one-time password (OTP) verification, ensuring only authorized users can reset passwords. Timeout enforcement means OTPs expire after 5 minutes for added security, and accounts are locked after five failed attempts to prevent brute-force attacks.

Note that SMTP must be configured in order to receive OTPs via email.

Flexible Alerting

InsightIQ 6.1 enhances alerting capabilities with 16 total KPIs/metrics, including 9 new ones. Key improvements include:

  • Greater granularity (beyond cluster-level alerts)
  • Support for sub-filters and breakouts
  • Multiple operators and unit-based thresholding
  • Aggregator and extended duration support

Several metrics have been transformed and/or added in version 6.1. For example:

IIQ 6.1 Metric IIQ.6.0 Metric
Active Clients ·         Active Clients NFS

·         Active Clients SMB1

·         Active Clients SMB2

Average Disk Hardware Latency
Average Disk Operation Size
Average Pending Disk Operation Count ·         Pending Disk Operation Count
Capacity ·         Drive Capacity

·         Cluster Capacity

·         Node Capacity

·         Nodepool Capacity

Connected Clients ·         Connected Clients NFS

·         Connected Clients SMB

CPU Usage ·         CPU Usage
Disk Activity
Disk Operations Rate ·         Pending Disk Operation Count
Disk Throughput Rate
External Network Errors Rate
External Network Packets Rate
External Network Throughput Rate ·         Network Throughput Equivalency
File System Throughput Rate
Protocol Operations Average Latency ·         Protocol Latency NFS

·         Protocol Latency SMB

Also, clusters can now be directly associated with alert rules:

The generated alerts page sees the addition of a new ‘Metric’ field:

For example, an alert can now be generated at the nodepool level for the metric ‘External Network Throughput Rate’:

IIQ 6.1 also includes an updated email format, as follows:

Alert Migration

The alerting system has transitioned from predefined alerting to flexible alerting. During this migration, all alert policies, associated rules, resources, notification rules, and generated alerts are automatically migrated—no additional steps are required.

Key differences include:

IIQ 6.1 Flexible Alerting IIQ 6.0 Predefined Alerting
·         Each alert rule is associated with only one cluster (1:1 mapping). ·         Alert rules and resources are tightly coupled with alert policies.
·         A policy can still have multiple rules, but resources are now linked directly to rules, not policies.

·         This results in N × M combinations of alert rules and clusters (N = resources, M = rules).

·         A single policy can be linked to multiple rules and resources.

For example, imagine the following scenario:

  • Pre-upgrade (Predefined Alerting):

An IIQ 6.0 1 instance has a policy (Policy1), which is associated with two rules (CPU Rule & Capacity Ruleand 4 clusters (Cluster1-4).

  • Post-Upgrade (Flexible Alerting):

Since only one resource can be associated with one alert rule, a separate alert rule will be created for each cluster. So, after upgrading to IIQ 6.1, Policy1 will now have four individual cluster CPU Alert rules and four individual cluster Capacity Alert rules:

If an IIQ 6.1 upgrade happens to fail due to alert migration, a backup of the predefined alerting database is automatically created. To retry the migration, run:

# bash /usr/share/storagemonotoring/scripts/retrigger_alerts_upgrade.sh

Plus, for additional context and troubleshooting, the alert migration logs can be found at:

 /usr/share/storagemonitoring/logs/alerts/alerts_migration.log

Durable Data Collection

Data collection and processing in IIQ 6.x provides both performance and fault tolerance, with the following decoupled architecture:

Component Role
Data Processor Responsible for processing and storing the data in TimescaleDB for display by Reporting service.
Temporary Datastore Stores historical statistics fetched from PowerScale cluster, in-between collection and processing.
Message Broker Facilitates inter-service communication. With the separation of data collection and data processing, this allows both services to signal to each other when their respective roles come up.
Timescale DB New database storage for the time-series data. Designed for optimized handling of historical statistics.

InsightIQ TimescaleDB database stores long-term historical data via the following retention strategy:

Telemetry data is summarized and stored in the following cascading levels, each with a different data retention period:

Level Sample Length Data Retention Period
Raw table Varies by metric type. Raw data sample lengths range from 30s to 5m. 24 hours
5m summary 5 minutes 7 days
15m summary 15 minutes 4 weeks
3h summary 3 hours Infinite

Note that the actual raw sample length may vary by graph/data type – from 30 seconds for CPU % Usage data up to 5 minutes for cluster capacity metrics.

Meanwhile, the new InsightIQ v6.1 code is available for download on the Dell Support site, allowing both the installation of and upgrade to this new release.

ObjectScale XF960

Fresh off the launch of the new ObjectScale 4.1 release, Dell Technologies has announced the general availability of the ObjectScale XF960 platform, a next-generation all-flash object storage appliance designed to meet the performance demands of AI, analytics, and unstructured data workloads. The XF960 is now ready to ship, offering a compelling blend of speed, scalability, and efficiency.

Built to support performance-driven workloads, the XF960 enables organizations to unlock the full potential of their data. Whether training complex AI models, managing large datasets, or deploying cloud-native applications, the XF960 provides the object storage substrate needed to drive innovation.

As Dell’s highest-performing object storage platform to-date, the new XF960 delivers up to 300% more read throughput, 42% more write throughput, plus 75% lower read and 42% lower write response times than the previous-generation EXF900.

The XF960 scales effortlessly from small clusters to large enterprise deployments, maintaining performance and manageability throughout. It also introduces advanced data efficiency features, including five user-configurable compression modes—LZ4, Zstandard, and Deflate among them—allowing for up to a 9:1 compression ratio on certain workloads.

Enhancing its S3 protocol compatibility, the XF960 supports push-based event notifications, up to three times faster object listing, S3FS file system mounting, and seamless integration with the latest AWS SDKs, improving both data access and developer productivity.

Designed for flexible integration, the XF960 accommodates the expansion of existing ECS environments, and is initially supported with the new ObjectScale 4.1 code which dropped last week (8/12/25).

As compared to the EXF900, the XF960 features up-rev’d hardware, including a 2RU PowerEdge R760 chassis, dual Intel Sapphire Rapids CPUs with 32 cores, 256GB DDR5 memory, and support for NVMe drives ranging from 7.68TB to 61.44TB. It also includes 100GbE front-end and back-end NICs, dual 1400W power supplies, and S5448 switches.

  EXF900 XF960
CPU Dual Intel Cascade Lake 24 Cores (165 W) Dual Intel Sapphire Rapids 32 Cores (270W)
RAM 192GB RAM per node, installed as 12x16GB DDR4 RDIMMs 256GB RAM per node, installed as 16x16G DDR5 RDIMMs
SSDs

(NVMe)

3.84TB ISE

7.68TB ISE

15.36TB TLC ISE

61.44TB QLC ISE

12 and 24 drive configurations

7.68TB TLC ISE

15.36TB TLC SED FIPS

30TB QLC ISE

61.44TB QLC ISE

6, 12 and 24 drive configurations

Front End NIC 25GbE, 100GbE 100GbE
Back End NIC 25GbE, 100GbE 100GbE
Power Dual 1100W PSUs Dual 1400W PSUs
Back/front-end Switches S5248 S5448

For existing ObjectScale all-flash customers, while it is technically possible to intermix EXF900 and XF960 nodes, it is not recommended due to performance limitations. Mixed clusters will operate at EXF900 performance levels. That said, the  next ObjectScale release, v4.2, will introduce improvements for mixed environments.

PowerScale OneFS 9.12

Dell PowerScale is already powering up the summer with the launch of the innovative OneFS 9.12 release, which shipped today (14th August 2025). This new 9.12 release has something for everyone, introducing PowerScale innovations in security, serviceability, reliability, protocols, and ease of use.

OneFS 9.12 represents the latest version of PowerScale’s common software platform for on-premises and cloud deployments. This can make it a excellent choice for traditional file shares and home directories, vertical workloads like M&E, healthcare, life sciences, financial services, plus generative and agentic AI, and other ML/DL and analytics applications.

PowerScale’s scale-out architecture can be deployed on-site, in co-lo facilities, or as customer-managed PowerScale for Amazon AWS and Microsoft Azure deployments, providing core to edge to cloud flexibility, plus the scale and performance needed to run a variety of unstructured workflows on-prem or in the public cloud.

With data security, detection, and monitoring being paramount in this era of unprecedented cyber threats, OneFS 9.12 brings an array of new features and functionality to keep your unstructured data and workloads more available, manageable, and secure than ever.

Protocols

On the S3 object protocol front, OneFS 9.12 sees the debut of new security and immutability functionality. S3 Object Lock extends the standard AWS S3 Object Lock model with PowerScale’s own ‘Bucket-Lock’ protection mode semantics. Object Lock capabilities can operate on a per-zone basis and per-bucket, using the cluster’s compliance clock for the date and time evaluation of object’s retention. Additionally, S3 protocol access logging and bucket logging are also enhanced in this new 9.12 release.

Networking

As part of PowerScale’s seamless protocol failover experience for customers, OneFS 9.12 sees SmartConnect’s default IP allocation method for new pools move to ‘dynamic’. While SMB2 and SMB3 are the primary focus, all protocols benefit from this enhancement, including SMB, NFS, S3, and HDFS. Legacy pools will remain unchanged upon upgrade to 9.12, but any new pools will automatically be provisioned as dynamic (unless manually configured as ‘static’).

Security

In the interests of increased security and ransomware protection, OneFS 9.12 includes new Secure Snapshots functionality. Secure Snapshots provide true snapshot immutability, as well as protection for snapshot schedules, in order to protect against alteration or deletion, either accidentally or by a malicious actor.

Secure snapshots are built upon Multi-party Authorization (MPA), also introduced in OneFS 9.12. MPA prevents an individual administrator from executing privileged operations, such as configuration changes on snapshots and snapshot schedules, by requiring two or more trusted parties to sign off on a requested change for the privileged actions within a PowerScale cluster.

OneFS 9.12 also introduces support for common access cards (CAC) and personal identity verification (PIV) smart cards, providing physical multi-factor authentication (MFA), allowing users to SSH to a PowerScale cluster using the same security badge that grants them access into their office. In addition to US Federal mandates, CAC/PIV integration is a requirement for many security conscious organizations across the public and private sectors.

Upgrade

One-click upgrades in OneFS 9.12 allow a cluster to automatically display and download available  trusted upgrade packages from Dell Support, which can be easily applied via ‘one click installation’ from the OneFS WebUI or CLI. Upgrade package versions are automatically managed by Dell in accordance with a cluster’s telemetry data.

Support

OneFS 9.12 introduces an auto-healing capability, where the cluster detects problems using the HealthCheck framework and automatically executes a repair action for known issues and failures. This helps to increase cluster availability and durability, while reducing the time to resolution and the need for technical support engagements. Furthermore, additional repair-actions can be added at any point, outside of the general OneFS release cycle.

Hardware Innovation

On the platform hardware front, OneFS 9.12 also introduces an HDR Infiniband front-end connectivity option for the PowerScale PA110 performance and backup accelerator. Plus, 9.12 also brings a fast reboot enhancement to the high-memory PowerScale F-series nodes.

In summary, OneFS 9.12 brings the following new features and functionality to the Dell PowerScale ecosystem:

Area Feature
Networking ·         SmartConnect dynamic allocation as the default.
Platform ·         PowerScale PA110 accelerator front-end Infiniband support.

·         Conversion of front-end Ethernet to Infiniband support for F710 & F910.

·         F-series fast reboots.

Protocol ·         S3 Object Lock.

·         S3 Immutable SmartLock bucket for tamper-proof objects.

·         S3 protocol access logging.

·         S3 bucket logging.

Security ·         Multi-party authorization for privileged actions.

·         CAC/PIV smartcard SSH access.

·         Root lockdown mode.

·         Secure Snapshots with MPA override to protect data when retention period has not expired.

Support ·         Custer-level inventory request API.

·         In-field support for back-end NIC changes.

Reliability ·         Auto Remediation self-diagnosis and healing capability.
Upgrade ·         One-click upgrade.

We’ll be taking a deeper look at OneFS 9.12’s new features and functionality in future blog articles over the course of the next few weeks.

Meanwhile, the new OneFS 9.12 code is available on the Dell Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.

For existing clusters running a prior OneFS release, the recommendation is to open a Service Request with to schedule an upgrade. To provide a consistent and positive upgrade experience, Dell is offering assisted upgrades to OneFS 9.12 at no cost to customers with a valid support contract. Please refer to Knowledge Base article KB544296 for additional information on how to initiate the upgrade process.

ObjectScale 4.1

Hot off the press comes ObjectScale version 4.1 – a major release of Dell’s enterprise-grade object storage platform. As a foundational component of the Dell AI Data Platform, ObjectScale 4.1 delivers enhanced scalability, performance, and resilience that’s engineered to meet the evolving demands of AI-driven workloads and modern data ecosystems.

This release is available as a software upgrade for existing ECS and ObjectScale environments, and the core new features and functionality introduced in this ObjectScale 4.1 release include:

Storage Efficiency and Operational Experience

On the storage efficiency and operation experience front, ObjectScale 4.1 introduces support for multiple compression modes including LZ4, Zstandard, Deflate, and Snappy, configurable via both the UI and API. This flexibility allows admins to fine-tune compression strategies to balance performance, cost, and workload characteristics.

Post-upgrade to ObjectScale 4.1, the default algorithms are updated to LZ4 for AFA appliances (EXF900 and XF960) and Zstandard for HDD appliances (EX300, EX3000, EX500, EX5000, X560). Storage admins can change the algorithm at any time via the UI or API, based on workload or use case.

Improved garbage collection throughput enables faster reclamation of deleted capacity. Enhanced monitoring, alerting, and logging tools provide greater visibility into background processes, contributing to overall cluster stability.

An updated dashboard offers refined views of user, available, and reserved capacity. Automated alerts notify administrators when usage exceeds 90%, indicating a transition to Read-Only mode for the affected Virtual Data Center (VDC).

New port-level bandwidth controls for replication traffic allow for more predictable performance and optimized resource allocation across distributed environments.

Security and Data Protection

Within the security and data protection realm, ObjectScale now provides support for Self-Encrypting Drives (SEDs) with local key management via Dell iDRAC. This ensures hardware-level encryption and secure, appliance-local key handling for enhanced data protection.

TLS 1.3, the latest version of the Transport Layer Security protocol, is also supported in ObjectScale 4.1. This upgrade delivers stronger encryption, faster handshakes, and the removal of legacy algorithms, improving both control and data path security.

Expanded Capabilities for Modern Workloads

ObjectScale 4.1 now offers up to 3x faster object listing performance in multi-VDC environments. This enhancement improves data browsing and discovery, with better handling of deleted metadata and validation of Untrusted Listing Keys.

Through webhook-based APIs, ObjectScale can now push real-time notifications to external applications when events such as object creation, deletion, or modification occur—enabling responsive, event-driven architectures.

Support for S3FS in 4.1 allows users to mount S3 buckets on Linux systems as local file systems. This simplifies access and management, particularly for legacy applications that rely on traditional file system operations.

On the integration front, ObjectScale 4.1 is compatible with the latest AWS SDK v2.29, so Java developers can immediately use new S3 features and performance fixes in their applications, and build cloud-native applications with full access to modern AWS features and APIs.

The following hardware platforms are supported by the new ObjectScale 4.1 release:

Gen 2 systems U480E, U400T, U400E, U4000, U400, U2800, U2000, D6200, D5600, D4500
Gen 3 systems EX3000, EX300, EXF900, EX5000, EX500
Gen 4 systems X560, XF960

Note that upgrading to ObjectScale 4.1 is only supported from ECS 3.8.x and 4.0.x releases.

In summary, ObjectScale 4.1 represents a strategic advancement in Dell’s commitment to delivering intelligent, secure, and scalable storage solutions for the AI era. Whether upgrading existing infrastructure or deploying new systems, this new 4.1 release empowers organizations to meet the challenges of data growth, complexity, and innovation with confidence.

OneFS SmartSync Backup-to-Object Management and Troubleshooting

As we saw in the previous articles in this series, SmartSync in OneFS 9.11 enjoys the addition of backup-to-object functionality, which delivers high performance, full-fidelity incremental replication to ECS, ObjectScale, Wasabi, and AWS S3 & Glacier IR object stores.

This new SmartSync backup-to-object functionality supports the full spectrum of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes.

In addition to the standard ‘isi dm’ command set, the following CLI utility can also come in handy for tasks such as verifying the dataset ID for restoration, etc:

# isi_dm browse

For example, to query the SmartSync accounts and datasets:

# isi_dm browse

<no account>:<no dataset> $ list-accounts

000000000000000100000000000000000000000000000000 (tme-tgt)

ec2a72330e825f1b7e68eb2352bfb09fea4f000000000000 (DM Local Account)

fd0000000000000000000000000000000000000000000000 (DM Loopback Account)

<no account>:<no dataset> $ connect-account 000000000000000100000000000000000000000000000000

tme-tgt:<no dataset> $ list-datasets

1       2025-07-22T10:23:33+0000        /ifs/data/zone3

2       2025-07-22T10:23:33+0000        /ifs/data/zone4

1025    2025-07-22T10:25:01+0000        /ifs/data/zone3

2049    2025-07-22T10:30:04+0000        /ifs/data/zone4

tme-tgt:<no dataset> $ connect-dataset 2

tme-tgt:2 </ifs/data/zone4:> $ ls

home                           [dir]

zone2_sync1753179349           [dir]

tme-tgt:2 </ifs/data/zone4:> $ cd zone2_sync1753179349

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ ls

home                           [dir]

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $

Or for additional detail:

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings output-to-file-on /tmp/out.txt

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings verbose-on

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ list-datasets

1       2025-07-22T10:23:33+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=1 } }

2       2025-07-22T10:23:33+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=2 } }

1025    2025-07-22T10:25:01+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=3 } }

2049    2025-07-22T10:30:04+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=4 } }

But when it comes to monitoring and troubleshooting SmartSync, there are a variety of diagnostic tools available. These include:

Component Tools Issue
Logging ·         /var/log/isi_dm.log

·         /var/log/messages

·         ifs/data/Isilon_Support/datamover/transfer_failures/baseline_failures_ <jobid>

General SmartSync info and  triage.
Accounts ·         isi dm accounts list / view Authentication, trust and encryption.
CloudCopy ·         S3 Browser (ie. Cloudberry), Microsoft Azure Storage Explorer Cloud access and connectivity.
Dataset ·         isi dm dataset list/view Dataset creation and health.
File system ·         isi get Inspect replicated files and objects.
Jobs ·         isi dm jobs list/view

·         isi_datamover_job_status -jt

Job and task execution, auto-pausing, completion, control, and transfer.
Network ·         isi dm throttling bw-rules list/view

·         isi_dm network ping/discover

Network connectivity and throughput.
Policies ·         isi dm policies list/view

·         isi dm base-policies list/view

Copy and dataset policy execution and transfer.
Service ·         isi services -a isi_dm_d <enable/disable> Daemon configuration and control.
Snapshots ·         isi snapshot snapshots list/view Snapshot execution and access.
System ·         isi dm throttling settings CPU load and system performance.

SmartSync info and errors are typically written to /var/log/isi_dm.log and /var/log/messages, while DM jobs transfer failures generate a log specific to the job ID under /ifs/data/Isilon_Support/datamover/transfer_failures.

Once a policy is running, the job status is reported via ‘isi dm jobs list’. Once complete, job histories are available by running ‘isi dm historical jobs list’. More details for a specific job can be gleaned from the ‘isi dm job view’ command, using the pertinent job ID from the list output above. Additionally, the ‘isi_datamover_job_status’ command with the job ID as an argument will also supply detailed information about a specific job.

Once running, a DM job can be further controlled via the ‘isi dm jobs modify’ command, and available actions include cancel, partial-completion, pause, or resume.

If a certificate authority (CA) is not correctly configured on a PowerScale cluster, the SmartSync daemon will not start, even though accounts and policies can still be configured. Be aware that the failed policies will not be reported via ‘isi dm jobs list’ or ‘isi dm historical-jobs list’ since they never started. Instead, an improperly configured CA is reported in the /var/log/isi_dm.log as follows:

Certificates not correctly installed, Data Mover service sleeping: At least one CA must be installed: No such file or directory from dm_load_certs_from_store (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:197 ) from dm_tls_init (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:279 ): Unable to load certificate information

Once a CA and identity are correctly configured, the SmartSync service automatically activates. Next, SmartSync attempts a handshake with the target. If the CA or identity is mis-configured, the handshake process fails, and generates an entry in /var/log/isi_dm.log. For example:

2025-07-30T12:38:17.864181+00:00 GEN-HOP-NOCL-RR-1(id1) isi_dm_d[52758]: [0x828c0a110]: /b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/acct_mon.cpp:dm_acc tmon_try_ping:348: [Fiber 3778] ping for account guid: 0000000000000000c4000000000000000000000000000000, result: dead

Note that the full handshake error detail is logged if the SmartSync service (isi_dm_d) is set to log at the ‘info’ or ‘debug’ level using isi_ilog:

# isi_ilog -a isi_dm_d --level info+

Valid ilog levels include:

fatal error err notice info debug trace

error+ err+ notice+ info+ debug+ trace+

A copy or repeat-copy policy requires an available dataset for replication before running. If a dataset has not been successfully created prior to the copy or repeat-copy policy job starting for the same base path, the job is paused. In the following example, the base path of the copy policy is not the same as that of the dataset policy, hence the job fails with a “path doesn’t match…” error.

# ls -l /ifs/data/Isilon_support/Datamover/transfer_failures

Total 9

-rw-rw----   1 root  wheel  679  July 20 10:56 baseline_failure_10

# cat /ifs/data/Isilon_support/Datamover/transfer_failures/baseline_failure_10

Task_id=0x00000000000000ce, task_type=root task ds base copy, task_state=failed-fatal path doesn’t match dataset base path: ‘/ifs/test’ != /ifs/data/repeat-copy’:

from bc_task)initialize_dsh (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy

from dmt_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy_root_task

from dm_txn_execute_internal (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cp

from dm_txn_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cpp:2274)

from dmp_task_spark_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/task_runner.

Once any errors for a policy have been resolved, the ‘isi dm jobs modify’ command can be used to resume the job.

OneFS SmartSync Backup-to-Object Configuration

As we saw in the previous article in this series, SmartSync in OneFS 9.11 sees the addition of backup-to-object, which provides high performance, full-fidelity incremental replication to ECS, ObjectScale, Wasabi, and AWS S3 & Glacier IR object stores.

This new SmartSync backup-to-object functionality supports the full spectrum of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes. Specifically:

Copy-to-object (OneFS 9.10 & earlier) Backup-to-object (OneFS 9.11)
·         One-time file system copy to object

·         Baseline replication only, no support for incremental copies

·         Browsable/accessible filesystem-on-object representation

·         Certain object limitations

o   No support for sparse regions and hardlinks

o   Limited attribute/metadata support

o   No compression

·         Full-fidelity file system baseline & incremental replication to object

o   Supports ADS, special files, symlinks, hardlinks, sparseness, POSIX/NT attributes, and encoding

o   Any file size and any path length

·         Fast incremental copies

·         Compact file system snapshot representation in native cloud

·         Object representation

o   Grouped by target base-path in policy configuration

o   Further grouped by Dataset ID, Global File ID

SmartSync backup-to-object operates on user-defined data set, which are essentially OneFS file system snapshots with plus additional properties.

A data set creation policy takes snapshots and creates a data set out of it. Additionally, there are also copy and repeat copy policies which are the policies that will transfer that data set to another system. And the execution of these two policy types can be linked and scheduled separately. So you can have one schedule for your data set creation, say to create a data set every hour on a particular path. And you can have a tiered or different distribution system for the actual copy itself. For example, to copy every hour to a hot DR cluster in data center A. But also copy every month to a deep archive cluster in data center B. So all these things are possible now, without increasing the bloat of snapshots on the system, since they’re now able to be shared.

Currently, SmartSync does not have a WebUI presence, so all its configuration is either via the command-line or platform API.

Here’s the procedure for crafting a baseline replication config:

Essentially, create the replication account, which in OneFS 9.11 will be either Dell ECS or Amazon AWS. Then configure that dataset creation policy, run it, and, if desired, create a repeat-copy policy. These specific steps with their CLI syntax include:

  1. Create a replication account:
# isi dm account create --account-type [AWS_S3 | ECS_S3]
  1. Configure a dataset creation policy
# isi dm policies create [Policy Name] --policy-type CREATION
  1. Run the dataset creation policy:
# isi dm policies list

# isi dm policies modify [Creation policy id] –-run-now=true

# isi dm jobs list

# isi dm datasets list
  1. create a repeat-copy policy
# isi dm policies create [Policy Name] --policy-type=' REPEAT_COPY'
  1. Run the repeat-copy policy:
# isi dm policies list

# isi dm policies modify [Repeat-copy policy id] –-run-now=true
  1. View the data replication job status
# isi dm jobs list

Similarly for an incremental replication config:

Note that the dataset creation policy and repeat-copy policy are already created in the baseline replication configure and can be ignored.

Incremental replication using the dataset create and repeat-copy policies from the previous slide’s baseline config.

  1. Run the dataset creation policy
# isi dm policies list

# isi dm policies modify [Creation policy id] –-run-now=true

# isi dm jobs list

# isi dm datasets list
  1. Run the repeat-copy policy:
# isi dm policies list

# isi dm policies modify [Repeat-copy policy id] –-run-now=true
  1. View the data replication incremental job status
# isi dm jobs list

And here’s the basic procedure for creating and running a partial or full restore:

Note that the replication account is already created on the original cluster and the creation step can be ignored.  Replication account creation is only required if restoring the dataset to a new cluster.

Additionally, partial restoration involves a subset of the directory structure, specified via the ‘source path’ , whereas full restoration invokes a restore of the entire dataset.

The process includes creating the replication account if needed, finding the ID of the dataset to be restored, creating and running the partial or full restoration policy, and checking the job status to verify it ran successfully.

  1. Create a replication account:
# isi dm account create --account-type [AWS_S3 | ECS_S3]

For example:

# isi dm account create --account-type ECS_S3 --name [Account Name] --access-id [access-id] --uri [URI with bucket-name] --auth-mode CLOUD --secret-key [secret-key] --storage-class=[For AWS_S3 only: STANDARD or GLACIER_IR]
  1. Verify the dataset ID for restoration:
# isi_dm browse

Checking the following attributes:

  • list-accounts
  • connect-account [Source Account ID created in step 1]
  • list-datasets
  • connect-dataset [Dataset id]
  1. Create a partial or full restoration policy
# isi dm policies create [Policy Name] --policy-type='COPY'
  1. Run the partial or full restoration policy:
# isi dm policies modify [Restoration policy id] –-run-now=true
  1. View the data restoration job status
# isi dm jobs list

OneFS 9.11 also introduces recovery point objective or RPO alerts for SmartSync, but note that these are for repeat-copy policies only. These RPO alerts can be configured through the replication policy by adding the desired time value to the ‘repeat-copy-rpo-alert’ parameter. If this configured threshold is exceeded, an RPO alert is triggered. This RPO alert is automatically resolved after the next successful policy job run.

Also be aware that the default time value for a repeat copy RPO is zero, which instructs SmartSync to not generate RPO alerts for that policy.

The following CLI syntax can be used to create a replication policy, with the ‘–repeat-copy-rpo-alert’ flag set for the desired time:

# isi dm policies create [Policy Name] --policy-type=' REPEAT_COPY' --enabled='true' --priority='NORMAL' --repeat-copy-source-base-path=[Source Path] --repeat-copy-base-base-account-id=[Source account id] --repeat-copy-base-source-account-id=[Source account id] --repeat-copy-base-target-account-id=[Target account id] --repeat-copy-base-new-tasks-account=[Source account id] --repeat-copy-base-target-dataset-type='FILE_ON_OBJECT_BACKUP' --repeat-copy-base-target-base-path=[Bucket Name] --repeat-copy-rpo-alert=[time]

And similarly to change the RPO alert configuration on an existing replication policy:

# isi dm policies modify [Policy id] --repeat-copy-rpo-alert=[time]

An alert is triggered and corresponding CELOG event created if the specified RPO for the policy is exceeded. For example:

# isi event list

ID   Started     Ended       Causes Short                     Lnn  Events  Severity

--------------------------------------------------------------------------------------

1898 07/15 00:00 07/15 00:00 SW_CELOG_HEARTBEAT               1    1       information

2012 07/15 06:03 --          SW_DM_RPO_EXCEEDED               2    1       warning

--------------------------------------------------------------------------------------

And then once RPO alert has been resolved after a successful replication policy job run:

# isi event list

ID   Started     Ended       Causes Short                     Lnn  Events  Severity

--------------------------------------------------------------------------------------

1898 07/15 00:00 07/15 00:00 SW_CELOG_HEARTBEAT               1    1       information

2012 07/15 06:03 07/15 06:12 SW_DM_RPO_EXCEEDED               2    2       warning

--------------------------------------------------------------------------------------

OneFS SmartSync Backup-to-Object

Another significant benefactor of new functionality in the recent OneFS 9.11 release is SmartSync. As you may recall, SmartSync allows multiple copies of a dataset to be copied, replicated, and stored across locations and regions, both on and off-prem, providing increased data resilience and the ability to rapidly recover from catastrophic events.

In addition to fast, efficient, scalable protection with granular recovery, SmartSync allows organizations to utilize lower cost object storage as the target for backups, reduce data protection complexity and cost by eliminating the need for separate backup applications. Plus disaster recovery options include restoring a dataset to its original state, or cloning a new cluster.

SmartSync sees the following enhancements in OneFS 9.11:

  • Automated incremental-forever replication to object storage.
  • Unparalleled scalability and speed, with seamless pause/resume for robust resiliency and control.
  • End-to-end encryption for security of data-in-flight and at rest.
  • Complete data replication, including soft/hard links, full file paths, and sparse files
  • Object storage targets: AWS S3, AWS Glacier IR, Dell ECS/ObjectScale, and Wasabi (with the addition of Azure and GCP support in a future release).

But first, a bit of background. Introduced back in OneFS 9.4, SmartSync operates in two distinct modes:

  • Regular push-and-pull transfer of file data between PowerScale clusters.
  • CloudCopy, copying of file-to-object data from a source cluster to a cloud object storage target.

CloudCopy copy-to-object in OneFS 9.10 and earlier releases is strictly a one-time copy tool, rather than a replication utility. So, after a copy, viewing the bucket contents from AWS, console or S3 browser yielded an object format tree-like representation of the OneFS file system. However, there were a number of significant shortcomings, such as no native support for attributes like ACLs, or certain file types like character files, and no method to represent hard links in a reasonable way. So OneFS had to work around these things by expanding hard links, and redirecting objects that had too long of a path. The other major limitation was that it really had just been a one-and-done copy. After creating and running a policy, once the job had completed the data was in the cloud, and that was it. OneFS had no provision for any incremental transfer of any subsequent changes to the cloud copy when the source data changed.

In order to address these limitations, SmartSync in OneFS 9.11 sees the addition of backup-to-object functionality. This includes a full-fidelity file system baseline, plus fast incremental replication to Dell ECS and ObjectScale, Wasabi, and AWS S3 and Glacier IR object stores.

This new backup-to-object functionality supports the full range of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes.

Copy-to-object (OneFS 9.10 & earlier):

  • One-time file system copy to object.
  • Baseline replication only, with no support for incremental copies.
  • Browsable/accessible filesystem-on-object representation.
  • Certain object limitations:
      o   No support for sparseness and hardlinks.
      o   Limited attribute/metadata support.
      o   No compression.

Backup-to-object (OneFS 9.11):

  • Full-fidelity file system baseline and incremental replication to object:
      o   Supports ADS, special files, symlinks, hardlinks, sparseness, POSIX/NT attributes, and encoding.
      o   Any file size and any path length.
  • Fast incremental copies.
  • Compact file system snapshot representation in native cloud.
  • Object representation:
      o   Grouped by target basepath in policy configuration.
      o   Further grouped by Dataset ID and Global File ID.

Architecturally, SmartSync incorporates the following concepts:

Concept Description
Account References to systems that participate in jobs (PowerScale clusters, cloud hosts). Made up of a name, a URI, and auth info.
Dataset Abstraction of a filesystem snapshot; the entity that gets copied between systems. Identified by a Dataset ID.
Global File ID Conceptually a global LIN that references a specific file on a specific system.
Policy A dataset creation policy creates a dataset. Copy/repeat-copy policies take an existing dataset and put it on another system. Policy execution can be linked and scheduled.
Push/Pull, Cascade/Reconnect Clusters syncing to each other in sequence (A>B>C). Clusters can skip the baseline copy and directly perform incremental updates (A>C). Clusters can both request and send datasets.
Transfer resiliency Small errors don’t need to halt a policy’s progress.

Under the hood, SmartSync uses the concept of a dataset, which is fundamentally an abstraction of a OneFS file system snapshot, albeit with some additional properties attached to it.

Each dataset is identified by a unique ID. With this notion of datasets, OneFS can perform both an A-to-B replication and an A-to-C replication, that is, two replications of the same dataset to two different targets. B and C can then also reference each other and perform incremental replication between themselves, provided they share a common ancestor snapshot.

A SmartSync dataset creation policy takes a snapshot and creates a dataset from it. Additionally, there are copy and repeat-copy policies, which are used to transfer that dataset to another system. The execution of these two policy types can be linked and scheduled separately. One schedule can govern dataset creation, say creating a dataset every hour on a particular path, while another schedule drives the actual copies. For example, a dataset could be copied hourly to a hot DR cluster in data center A and monthly to a deep archive cluster in data center B, all without proliferating snapshots on the source system, since the datasets can now be shared.
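For reference, SmartSync is driven from the OneFS CLI via the 'isi dm' (Datamover) command set. The following is a rough sketch only; sub-commands, options, and output columns vary by release. For example, to list the accounts (participating clusters and cloud targets):

# isi dm accounts list

To view existing datasets, plus the policies that create and copy them:

# isi dm datasets list
# isi dm policies list

And to check on policy execution:

# isi dm jobs list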

SmartSync in OneFS 9.11 also introduces the foundational concept of a global file ID (GFID), which is essentially a global LIN that represents a specific file on a particular system. OneFS can use a GFID, in combination with a dataset, to reference a file anywhere and guarantee that it means the same thing on every cluster.

Security-wise, each SmartSync daemon has an identity certificate that acts as both a client and server certificate depending on the direction of the data movement. This identity certificate is signed by a non-public certificate authority. To establish trust between two clusters, they must have each other’s CAs. These CAs may be the same. Trust groups (daemons that may establish connections to each other) are formed by having shared CAs installed.
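For example, the cluster's TLS certificate store can be reviewed from the CLI. Treat the following as a general illustration only; depending on the release, the SmartSync daemon's identity certificates and CAs may be surfaced under a Datamover-specific namespace rather than the generic 'isi certificate' commands shown here:

# isi certificate authority list
# isi certificate server list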

There are no usernames or passwords; authentication is authorization for V1. All cluster-to-cluster communication is performed via TLS-encrypted traffic. If absolutely necessary, encryption (but not authorization) can be disabled by setting a ‘NULL’ encryption cipher for specific use cases that require unencrypted traffic.

The SmartSync daemon supports checking certificate revocation status via the Online Certificate Status Protocol (OCSP). If the cluster is hardened and/or in FIPS-compliant mode, OCSP checking is forcibly enabled and set to the Strict stringency level, where any failure in OCSP processing results in a failed TLS handshake. Otherwise, OCSP checking can be totally disabled or set to a variety of values corresponding to desired behavior in cases where the responder is unavailable, the responder does not have information about the cert in question, and where information about the responder is missing entirely. Similarly, an override OCSP responder URI is configurable to support cases where preexisting certificates do not contain responder information.

SmartSync also supports a ‘strict hostname check’ option which mandates that the common name and/or subject alternative name fields of the peer certificate match the URI used to connect to that peer. This option, along with strict OCSP checking and disabling the null cipher option, are forcibly set when the cluster is operating in a hardened or FIPS-compliant mode.

For object storage connections, SmartSync uses ‘isi_cloud_api’ just as CloudPools does. As such, the considerations that apply to CloudPools also apply to SmartSync.
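For instance, because the underlying plumbing is shared, the familiar CloudPools CLI can serve as a useful sanity check for settings SmartSync will inherit, such as configured cloud accounts and any HTTP proxies. This is purely illustrative; SmartSync's own object accounts are configured via the Datamover rather than CloudPools:

# isi cloud accounts list
# isi cloud proxies list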

In the next article in this series, we’ll turn our attention to the core architecture and configuration of SmartSync backup-to-object.

PowerScale H and A-series Journal Mirroring and Hardware Resilience

The last couple of articles generated several questions from the field around durability and resilience in the newly released PowerScale H710/0 and A310/0 nodes. In this article, we’ll take a deeper look at the OneFS journal and boot drive mirroring functionality in these H and A-series platforms.

PowerScale chassis-based hardware, such as the new H710/7100 and A310/3100, stores the local filesystem journal and its mirror on persistent, battery-backed flash media within each node, with a 4RU PowerScale chassis housing four nodes. Each node comprises a ‘compute node’ enclosure for the CPU, memory, and network cards, plus associated drive containers, or sleds.

The PowerScale H and A-series employ a node-pair architecture to dramatically increase system reliability, with each pair of nodes residing within a chassis power zone. This means that if a node’s PSU fails, the peer PSU supplies redundant power. It also drives a minimum cluster or node pool size of four nodes (one chassis) for the PowerScale H and A-series platforms, pairwise node population, and the need to scale the cluster two nodes at a time.

A node’s file system journal is protected against sudden power loss or hardware failure by OneFS’ journal vault functionality – otherwise known as ‘powerfail memory persistence’, or PMP. PMP automatically stores both the local journal and journal mirror on a separate flash drive across both nodes in a node pair:

This journal de-staging process is known as ‘vaulting’, during which the journal is protected by a dedicated battery in each node until it’s safely written from DRAM to SSD on both nodes in a node-pair. With PMP, constant power isn’t required to protect the journal in a degraded state since the journal is saved to M.2 flash and mirrored on the partner node.

The mirrored journal comprises both hardware and software components, including the following constituent parts:

Journal Hardware Components

  • System DRAM
  • M.2 vault flash
  • Battery Backup Unit (BBU)
  • Non-Transparent Bridge (NTB) PCIe link to partner node
  • Clean copy on disk

Journal Software Components

  • Power-fail Memory Persistence (PMP)
  • Mirrored Non-volatile Interface (MNVI)
  • IFS Journal + Node State Block (NSB)
  • Utilities

Asynchronous DRAM Refresh (ADR) preserves RAM contents when the operating system is not running. ADR is important for preserving RAM journal contents across reboots, and it does not require any software coordination to do so.

The journal vaulting functionality encompasses the hardware, firmware, and operating system, ensuring that the journal’s contents are preserved across power failure. The mechanism is similar to the software journal mirroring employed on the PowerScale F-series nodes, albeit using a PCIe-based NTB on the chassis based platforms, instead of using the back-end network as with the all-flash nodes.

On power failure, the PMP vaulting functionality is responsible for copying both the local journal and the local copy of the partner node’s journal to persistent flash. On restoration of power, PMP is responsible for restoring the contents of both journals from flash to RAM, and notifying the operating system.

A single dedicated 480GB NVMe flash device (nvd0) is attached via an M.2 slot on the motherboard of the H710/0 and A310/0 node’s compute module, residing under the battery backup unit (BBU) pack.

This is in contrast to the prior H and A-series chassis generations, which used a 128GB SATA M.2 device (/dev/ada0).

For example, the following CLI commands show the NVMe M.2 flash device in an A310 node:

# isi_hw_status | grep -i prod
Product: A310-4U-Single-96GB-1x1GE-2x25GE SFP+-60TB-1638GB SSD-SED

# nvmecontrol devlist
 nvme0: Dell DN NVMe FIPS 7400 RI M.2 80 480GB
    nvme0ns1 (447GB)

# gpart show | grep nvd0
=>       40  937703008  nvd0  GPT  (447G)

# gpart show -l nvd0
=>       40  937703008  nvd0  GPT  (447G)
         40       2008        - free -  (1.0M)
       2048   41943040     1  isilon-pmp  (20G)
   41945088  895757960        - free -  (427G)

In the above, the ‘isilon-pmp’ partition on the M.2 flash device is used by the file system journal for its vaulting activities.

The NVMe M.2 device is housed on the node compute module’s riser card, and its firmware is managed by the OneFS DSP (drive support package) framework:

Note that the entire compute module must be removed in order for its M.2 flash to be serviced. If the M.2 flash does need to be replaced for any reason, it will be properly partitioned and the PMP structure will be created as part of arming the node for vaulting.
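As a quick check, the firmware levels of devices under DSP management can be listed from the CLI. Note that whether the M.2 journal device itself appears in this listing depends on the platform and DSP version, so treat this as illustrative:

# isi devices drive firmware list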

For clusters using data-at-rest encryption (DARE), an encrypted M.2 device is used, in conjunction with SED data drives, to provide full FIPS compliance.

The battery backup unit (BBU), when fully charged, provides enough power to vault both the local and partner journal during a power failure event:

A single battery is utilized in the BBU, which also supports back-to-back vaulting:

On the software side, the journal’s Power-fail Memory Persistence (PMP) provides an equivalent of the NVRAM controller’s vault/restore capabilities to preserve the journal. The PMP partition on the M.2 flash drive provides an interface between the OS and firmware.

If a node boots and its primary journal is found to be invalid for whatever reason, it has three paths for recourse:

  • Recover journal from its M.2 vault.
  • Recover journal from its disk backup copy.
  • Recover journal from its partner node’s mirrored copy.

The mirrored journal must guard against rolling back to a stale copy of the journal on reboot, which necessitates storing information about the state of journal copies outside the journal itself. As such, the Node State Block (NSB) is a persistent disk block that stores local and remote journal status (clean/dirty, valid/invalid, etc), as well as other non-journal information, ensuring that a node does not revert to a stale journal copy upon reboot.

Here’s the detail of an individual node’s compute module:

Of particular note is the ‘journal active’ LED, which is displayed as a white ‘hand icon’:

When this white hand icon is illuminated, it indicates that the mirrored journal is actively vaulting, and it is not safe to remove the node!

There is also a blue ‘power’ LED, and a yellow ‘fault’ LED per node. If the blue LED is off, the node may still be in standby mode, in which case it may still be possible to pull debug information from the baseboard management controller (BMC).

The flashing yellow ‘fault’ LED has several state indication frequencies:

Blink Speed Blink Frequency Indicator
Slow blink ¼ Hz BIOS
Medium blink 1 Hz Extended POST
Fast blink 4 Hz Booting OS
Off Off OS running

The mirrored non-volatile interface (MNVI) sits below /ifs and above RAM and the NTB, providing the abstraction of a reliable memory device to the /ifs journal. MNVI is responsible for synchronizing journal contents to peer node RAM, at the direction of the journal, and persisting writes to both systems while in a paired state. It upcalls into the journal on NTB link events, and notifies the journal of operation completion (mirror sync, block IO, etc). For example, when rebooting after a power outage, a node automatically loads the MNVI. It then establishes a link with its partner node and synchronizes its journal mirror across the PCIe Non-Transparent Bridge (NTB).

The Non-transparent Bridge (NTB) connects node pairs for OneFS Journal Replica:

The NTB Link itself is PCIe Gen3 X8, but there is no guarantee of NTB interoperability between different CPU generations. As such, the H710/0 and A310/0 use version 4 of the NTB driver, whereas the previous hardware generation uses NTBv3. This therefore means mixed-generation node pairs are unsupported.

Prior to mounting the /ifs file system, OneFS locates a valid copy of the journal from one of the following locations in order of preference:

Order Journal Location Description
1st Local disk A local copy that has been backed up to disk
2nd Local vault A local copy of the journal restored from Vault into DRAM
3rd Partner node A mirror copy of the journal from the partner node

Assuming the node was shut down cleanly, it will boot using a local disk copy of the journal. The journal will be restored into DRAM and /ifs will mount. On the other hand, if the node suffered a power disruption, the journal will be restored into DRAM from the M.2 vault flash instead (the PMP copies the journal into the M.2 vault during a power failure).

In the event that OneFS is unable to locate a valid journal on either the hard drives or M.2 flash on a node, it will retrieve a mirrored copy of the journal from its partner node over the NTB.  This is referred to as ‘Sync-back’.

Note: Sync-back state only occurs when attempting to mount /ifs.

On booting, if a node detects that its journal mirror on the partner node is out of sync (invalid), but the local journal is clean, /ifs will continue to mount.  Subsequent writes are then copied to the remote journal in a process known as ‘sync-forward’.

Here’s a list of the primary journal states:

Journal State Description
Sync-forward State in which writes to a journal are mirrored to the partner node.
Sync-back Journal is copied back from the partner node. Only occurs when attempting to mount /ifs.
Vaulting Storing a copy of the journal on M.2 flash during power failure. Vaulting is performed by PMP.

During normal operation, writes to the primary journal and its mirror are managed by the MNVI device module, which writes through local memory to the partner node’s journal via the NTB. If the NTB is unavailable for an extended period, write operations can still be completed successfully on each node. For example, if the NTB link goes down in the middle of a write operation, the local journal write operation will complete. Read operations are processed from local memory.

Additional journal protection for PowerScale chassis-based platforms is provided by OneFS’ powerfail memory persistence (PMP) functionality, which guards against PCI bus errors that can cause the NTB to fail. If an error is detected, the CPU requests a ‘persistent reset’, during which the memory state is protected and the node rebooted. When back up again, the journal is marked as intact and no further repair action is needed.

If a node loses power, the hardware notifies the BMC, initiating a memory persistent shutdown.  At this point the node is running on battery power. The node is forced to reboot and load the PMP module, which preserves its local journal and its partner’s mirrored journal by storing them on M.2 flash.  The PMP module then disables the battery and powers itself off.

Once power is back on and the node restarted, the PMP module first restores the journal before attempting to mount /ifs.  Once done, the node then continues through system boot, validating the journal, setting sync-forward or sync-back states, etc.

The mirrored journal has the following CLI commands, although these should seldom be needed during normal cluster operation:

  • isi_save_journal
  • isi_checkjournal
  • isi_testjournal
  • isi_pmp

A node’s journal can be checked and confirmed healthy as follows:

# isi_testjournal
Checking One external batteries Health...
Batteries good
Checking PowerScale Journal integrity...
Mounted DRAM journal check: good
IFS is mounted.

During boot, isi_checkjournal and isi_testjournal will invoke isi_pmp. If the M.2 vault devices are unformatted, isi_pmp will format the devices.

On clean shutdown, isi_save_journal stashes a backup copy of the /dev/mnv0 device on the root filesystem, just as it does for the NVRAM journals in previous generations of hardware.

If a mirrored journal issue is suspected, or notified via cluster alerts, the best place to start troubleshooting is to take a look at the node’s log events. The journal logs to /var/log/messages, with entries tagged as ‘journal_mirror’.
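For example, recent journal mirror activity can be surfaced with a simple grep (the exact message text varies):

# grep journal_mirror /var/log/messages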

Additionally, the following sysctls also provide information about the state of the journal mirror itself and the MNVI connection respectively:

# sysctl efs.journal.mirror_state
efs.journal.mirror_state:
{
    Journal state: valid_protected
    Journal Read-only: false
    Need to inval mirror: false
    Sync in progress: false
    Sync error: 0
    Sync noop in progress: false
    Mirror work queued: false
    Local state:
    {
        Clean: dirty
        Valid: valid
    }
    Mirror state:
    {
        Connection: up
        Validity: valid
    }
}

And the MNVI connection state:

# sysctl hw.mnv0.state
hw.mnv0.state.iocnt: 0
hw.mnv0.state.cb_active: 0
hw.mnv0.state.io_gate: 0
hw.mnv0.state.state: 3

OneFS provides the following CELOG events for monitoring and alerting about mirrored journal issues:

CELOG Event Description
HW_GEN6_NTB_LINK_OUTAGE Non-transparent bridge (NTB) PCIe link is unavailable
FILESYS_JOURNAL_VERIFY_FAILURE No valid journal copy found on node
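These events surface through the standard OneFS event framework, so they can also be checked from the CLI. For example (illustrative only; output columns vary by release):

# isi event events list | grep -i -e ntb -e journal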

Another OneFS reliability optimization for the PowerScale chassis-based platforms is boot partition mirroring. OneFS boot and other OS partitions are stored on a node’s internal drives, and these partitions are mirrored (with the exception of crash dump partitions). The mirroring protects against disk sled removal: since each drive in a disk sled belongs to a separate disk pool, both halves of a mirror cannot live on the same sled.

With regard to the nodes’ internal drives, the boot disk reservation size has increased to 18GB on these new platforms, up from 8GB on the previous generation. Partition sizes have also been expanded on these new platforms in OneFS 9.11, as follows:

Partition H71x and A31x H70x and A30x
hw 1GB 500MB
journal backup 8197MB 8GB
kerneldump 5GB 2GB
keystore 64MB 64MB
root 4GB 2GB
var 4GB 2GB
var-crash 7GB 3GB

OneFS automatically rebalances these mirrors in anticipation of, and in response to, service events. Mirror rebalancing is triggered by drive events such as suspend, softfail and hard loss.

The ‘isi_mirrorctl verify’ and ‘gmirror status’ CLI commands can be used to confirm that boot mirroring is working as intended. For example, on an A310 node:

# gmirror status
Name Status Components
mirror/root0 COMPLETE da10p3 (ACTIVE)
da11p3 (ACTIVE)
mirror/mfg COMPLETE da15p7 (ACTIVE)
da12p6 (ACTIVE)
mirror/kernelsdump COMPLETE da15p6 (ACTIVE)
mirror/kerneldump COMPLETE da15p5 (ACTIVE)
mirror/var-crash COMPLETE da15p3 (ACTIVE)
da9p3 (ACTIVE)
mirror/journal-backup COMPLETE da14p5 (ACTIVE)
da12p5 (ACTIVE)
mirror/jbackup-peer COMPLETE da14p3 (ACTIVE)
da12p3 (ACTIVE)
mirror/keystore COMPLETE da12p7 (ACTIVE)
da10p10 (ACTIVE)
mirror/root1 COMPLETE da11p7 (ACTIVE)
da10p7 (ACTIVE)
mirror/var0 COMPLETE da11p6 (ACTIVE)
da10p6 (ACTIVE)
mirror/hw COMPLETE da10p9 (ACTIVE)
da7p5 (ACTIVE)
mirror/var1 COMPLETE da10p8 (ACTIVE)
da7p3 (ACTIVE)

Or:

# isi_mirrorctl verify
isi.sys.distmirror - INFO - Mirror root1: has an ACTIVE consumer of da11p5
isi.sys.distmirror - INFO - Mirror root1: has an ACTIVE consumer of da10p7
isi.sys.distmirror - INFO - Mirror var1: has an ACTIVE consumer of da13p5
isi.sys.distmirror - INFO - Mirror var1: has an ACTIVE consumer of da16p5
isi.sys.distmirror - INFO - Mirror journal-backup: has an ACTIVE consumer of da12p5
isi.sys.distmirror - INFO - Mirror journal-backup: has an ACTIVE consumer of da16p6
isi.sys.distmirror - INFO - Mirror jbackup-peer: has an ACTIVE consumer of da12p3
isi.sys.distmirror - INFO - Mirror jbackup-peer: has an ACTIVE consumer of da14p3
isi.sys.distmirror - INFO - Mirror var-crash: has an ACTIVE consumer of da10p6
isi.sys.distmirror - INFO - Mirror var-crash: has an ACTIVE consumer of da11p3
isi.sys.distmirror - INFO - Mirror kerneldump: has an ACTIVE consumer of da14p5
isi.sys.distmirror - INFO - Mirror root0: has an ACTIVE consumer of da10p3
isi.sys.distmirror - INFO - Mirror root0: has an ACTIVE consumer of da13p6
isi.sys.distmirror - INFO - Mirror var0: has an ACTIVE consumer of da13p3
isi.sys.distmirror - INFO - Mirror var0: has an ACTIVE consumer of da16p3
isi.sys.distmirror - INFO - Mirror kernelsdump: has an ACTIVE consumer of da14p6
isi.sys.distmirror - INFO - Mirror mfg: has an ACTIVE consumer of da13p9
isi.sys.distmirror - INFO - Mirror mfg: has an ACTIVE consumer of da16p7
isi.sys.distmirror - INFO - Mirror hw: has an ACTIVE consumer of da10p8
isi.sys.distmirror - INFO - Mirror hw: has an ACTIVE consumer of da13p8
isi.sys.distmirror - INFO - Mirror keystore: has an ACTIVE consumer of da13p10
isi.sys.distmirror - INFO - Mirror keystore: has an ACTIVE consumer of da16p8

The A310 node’s disks in the output above are laid out as follows:

# isi devices drive list
Lnn  Location  Device    Lnum  State   Serial       Sled
---------------------------------------------------------
128  Bay  1    /dev/da1  15    L3      X3X0A0JFTMSJ N/A
128  Bay  2    -         N/A   EMPTY                N/A
128  Bay  A0   /dev/da4  12    HEALTHY WQB0QKBR     A
128  Bay  A1   /dev/da3  13    HEALTHY WQB0QHV4     A
128  Bay  A2   /dev/da2  14    HEALTHY WQB0QHN3     A
128  Bay  B0   /dev/da7  9     HEALTHY WQB0QH4S     B
128  Bay  B1   /dev/da6  10    HEALTHY WQB0QGY3     B
128  Bay  B2   /dev/da5  11    HEALTHY WQB0QJWE     B
128  Bay  C0   /dev/da10 6     HEALTHY WQB0QJ26     C
128  Bay  C1   /dev/da9  7     HEALTHY WQB0QHYW     C
128  Bay  C2   /dev/da8  8     HEALTHY WQB0QK6Q     C
128  Bay  D0   /dev/da13 3     HEALTHY WQB0QJES     D
128  Bay  D1   /dev/da12 4     HEALTHY WQB0QHGG     D
128  Bay  D2   /dev/da11 5     HEALTHY WQB0QKH5     D
128  Bay  E0   /dev/da16 0     HEALTHY WQB0QHFR     E
128  Bay  E1   /dev/da15 1     HEALTHY WQB0QJWD     E
128  Bay  E2   /dev/da14 2     HEALTHY WQB0QKGB     E
---------------------------------------------------------

When it comes to SmartFailing nodes, there are a couple of additional caveats to be aware of with mirrored journal and the PowerScale chassis-based platforms:

  • When SmartFailing one node in a pair, there is no compulsion to smartfail its partner node too.
  • A node will still run indefinitely with its partner absent. However, this significantly increases the window of risk since there is no journal mirror to rely on (in addition to lack of redundant power supply, etc).
  • If a single node in a pair is SmartFailed, the other node’s journal is still protected by the vault and powerfail memory persistence.

PowerScale A310 and A3100 Platforms

In this article, we’ll examine the new PowerScale A310 and A3100 hardware platforms that were released a couple of weeks back.

The A310 and A3100 comprise the latest generation of PowerScale A-series ‘archive’ platforms:

The PowerScale A-series systems are designed for cooler, infrequently accessed data use cases. These include active archive workflows for the A310, such as regulatory compliance data, medical imaging archives, financial records, and legal documents. And deep archive/cold storage for the A3100 platform, including surveillance video archives, backup, and DR repositories.

Representing the archive tier, the A310 and A3100 both utilize a single-socket Xeon processor with 96GB of memory, plus fifteen (A310) or twenty (A3100) hard drives per node and SSDs for metadata/caching, with four nodes residing within a 4RU chassis. From an initial four-node (one chassis) starting point, A310 and A3100 clusters can be easily and non-disruptively scaled two nodes at a time, up to a maximum of 252 nodes (63 chassis) per cluster.

The A31x modular platform is based on Dell’s ‘Infinity’ chassis. Each node’s compute module contains a single 8-core Intel Sapphire Rapids CPU running at 1.8 GHz with 22.5MB of cache, plus 96GB of DDR5 DRAM. Front-end networking options include 10/25 GbE, with either Ethernet or InfiniBand selectable for the back-end network.

As such, the new A31x core hardware specifications are as follows:

Hardware Class PowerScale A-Series (Archive)
Model A310 A3100
OS version Requires OneFS 9.11 or above, NFP 13.1 or greater; BIOS based on Dell’s PowerBIOS Requires OneFS 9.11 or above, NFP 13.1 or greater; BIOS based on Dell’s PowerBIOS
Platform Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior gens Four nodes per 4RU chassis; upgradeable per pair; node-compatible with prior gens
CPU 8 Cores @ 1.8GHz, 22.5MB Cache 8 Cores @ 1.8GHz, 22.5MB Cache
Memory 96GB DDR5 DRAM 96GB DDR5 DRAM
Journal M.2: 480GB NVMe with 3-cell battery backup (BBU) M.2: 480GB NVMe with 3-cell battery backup (BBU)
Depth Standard 36.7 inch chassis Deep 42.2 inch chassis
Cluster size Max of 63 chassis (252 nodes) per cluster Max of 63 chassis (252 nodes) per cluster
Storage Drives 60 per chassis (15 per node) 80 per chassis (20 per node)
HDD capacities 2TB, 4TB, 8TB, 12TB, 16TB, 20TB, 24TB 12TB, 16TB, 20TB, 24TB
SSD (cache) capacities 0.8TB, 1.6TB, 3.2TB, 7.68TB 0.8TB, 1.6TB, 3.2TB, 7.68TB
Max raw capacity 1.4PB per chassis 1.9PB per chassis
Front-end network 10/25 Gb Ethernet 10/25 Gb Ethernet
Back-end network Ethernet or Infiniband Ethernet or Infiniband

These node hardware attributes can be easily viewed from the OneFS CLI via the ‘isi_hw_status’ command. For example, from an A3100:

# isi_hw_status
  SerNo: CF2BC243400025
 Config: H6R28
ChsSerN:
ChsSlot: 1
FamCode: A
ChsCode: 4U
GenCode: 10
PrfCode: 3
   Tier: 3
  Class: storage
 Series: n/a
Product: A3100-4U-Single-96GB-1x1GE-2x25GE SFP+-240TB-6554GB SSD
  HWGen: PSI
Chassis: INFINITY (Infinity Chassis)
    CPU: GenuineIntel (1.80GHz, stepping 0x000806f8)
   PROC: Single-proc, Octa-core
    RAM: 103079215104 Bytes
   Mobo: INFINITYPIFANO (Custom EMC Motherboard)
  NVRam: INFINITY (Infinity Memory Journal) (4096MB card) (size 4294967296B)
 DskCtl: LSI3808 (LSI 3808 SAS Controller) (8 ports)
 DskExp: LSISAS35X36I (LSI SAS35x36 SAS Expander - Infinity)
PwrSupl: Slot1-PS0 (type=ACBEL POLYTECH, fw=03.01)
PwrSupl: Slot2-PS1 (type=ACBEL POLYTECH, fw=03.01)
  NetIF: bge0,lagg0,mce0,mce1,mce2,mce3
 BEType: 25GigE
 FEType: 25GigE
 LCDver: IsiVFD2 (Isilon VFD V2)
 Midpln: NONE (No Midplane Support)
Power Supplies OK
Power Supply Slot1-PS0 good
Power Supply Slot2-PS1 good
CPU Operation (raw 0x882C0800)  = Normal
CPU Speed Limit                 = 100.00%
Fan0_Speed                      = 12360.000
Fan1_Speed                      = 12000.000
Slot1-PS0_In_Voltage            = 212.000
Slot2-PS1_In_Voltage            = 209.000
SP_CMD_Vin                      = 12.100
CMOS_Voltage                    = 3.120
Slot1-PS0_Input_Power           = 290.000
Slot2-PS1_Input_Power           = 290.000
Pwr_Consumption                 = 590.000
SLIC0_Temp                      = na
SLIC1_Temp                      = na
DIMM_Bank0                      = 42.000
DIMM_Bank1                      = 40.000
CPU0_Temp                       = -43.000
SP_Temp0                        = 40.000
MP_Temp0                        = na
MP_Temp1                        = 29.000
Embed_IO_Temp0                  = 51.000
Hottest_SAS_Drv                 = -45.000
Ambient_Temp                    = 29.000
Slot1-PS0_Temp0                 = 47.000
Slot1-PS0_Temp1                 = 40.000
Slot2-PS1_Temp0                 = 47.000
Slot2-PS1_Temp1                 = 40.000
Battery0_Temp                   = 38.000
Drive_IO0_Temp                  = 43.000

Also note that the A310 and A3100 are only available in a 96GB memory configuration.

On the front of each chassis is an LCD front panel control with back-lit buttons and four LED light bar segments, one per node. These LEDs typically display blue for normal operation or yellow to indicate a node fault. The LCD display is articulated, allowing it to be swung clear of the drive sleds for non-disruptive HDD replacement, etc.

The rear of the chassis houses the compute modules for each node, which contain the CPU, memory, networking, cache SSDs, and power supplies. Specifically, an individual compute module contains a multi-core Sapphire Rapids CPU, memory, an M.2 flash journal, up to two SSDs for L3 cache, six DIMM channels, front-end 10/25 Gb Ethernet, back-end 40/100 or 10/25 Gb Ethernet or InfiniBand, an Ethernet management interface, and a power supply and cooling fans:

As shown above, the field replaceable components are indicated via colored ‘touchpoints’. Two touchpoint colors, orange and blue, indicate respectively which components are hot swappable versus replaceable via a node shutdown.

Touchpoint Detail
Blue Cold (offline) field serviceable component
Orange Hot (Online) field serviceable component

The serviceable components within a PowerScale A310 or A3100 chassis are as follows:

Component Hot Swap CRU FRU
Drive sled Yes Yes Yes
·         Hard drives (HDDs) Yes Yes Yes
Compute node No Yes Yes
·         Compute module No No No
o   M.2 journal flash No No Yes
o   CPU complex No No No
o   DIMMs No No Yes
o   Node fans No No Yes
o   NICs/HBAs No No Yes
o   HBA riser No No Yes
o   Battery backup unit (BBU) No No Yes
o   DIB No No No
·         Flash drives (SSDs) Yes Yes Yes
·         Power supply with fan Yes Yes Yes
Front panel Yes No Yes
Chassis No No Yes
Rail kits No No Yes
Mid-plane Replace entire chassis

Nodes are paired for resilience and durability, with each pair sharing a mirrored journal and two power supplies.

Storage-wise, each of the four nodes within a PowerScale A310 or A3100 chassis has five associated drive containers, or sleds. These sleds occupy bays in the front of each chassis, with a node’s drive sleds stacked vertically. For example:

Nodes are numbered 1 through 4, left to right looking at the front of the chassis, while the drive sleds are labeled A  through E, with A at the top.

The drive sled is the tray which slides into the front of the chassis. Within each sled, the 3.5-inch SAS hard drives are numbered sequentially starting from drive zero, which is the HDD adjacent to the air dam.

Each drive bay in a sled has an associated yellow ‘drive fault’ LED:

Even when a sled is removed from its chassis and its power source, these fault LEDs will remain active for 10+ minutes. LED viewing holes are also provided so the sled’s top cover does not need to be removed.

The A3100’s 42.2 inch chassis accommodates four HDDs per sled, compared to three drives for the standard (36.7 inch) depth A310 shown above. As such, the A3100 requires a deep rack, such as the Dell Titan cabinet, whereas the A310 can reside in a regular 17” data center cabinet.

The A310 and A3100 platforms support a range of HDD capacities, currently including 2TB, 4TB, 8TB, 12TB, 16TB, 20TB, and 24TB, in both regular ISE (instant secure erase) and self-encrypting drive (SED) formats.

A node’s drive details can be queried with OneFS CLI utilities such as ‘isi_radish’ and ‘isi_drivenum’. For example, the command output from an A3100 node:

# isi_drivenum

Bay  1   Unit 6      Lnum 20    Active      SN:GXNG0X800253     /dev/da1
Bay  2   Unit 7      Lnum 21    Active      SN:GXNG0X800263     /dev/da2
Bay  A0   Unit 19     Lnum 16    Active      SN:ZRT1A5JR         /dev/da6
Bay  A1   Unit 18     Lnum 17    Active      SN:ZRT1A4SE         /dev/da5
Bay  A2   Unit 17     Lnum 18    Active      SN:ZRT1A42D         /dev/da4
Bay  A3   Unit 16     Lnum 19    Active      SN:ZRT19494         /dev/da3
Bay  B0   Unit 25     Lnum 12    Active      SN:ZRT18NEY         /dev/da10
Bay  B1   Unit 24     Lnum 13    Active      SN:ZRT1FJCJ         /dev/da9
Bay  B2   Unit 23     Lnum 14    Active      SN:ZRT18N7F         /dev/da8
Bay  B3   Unit 22     Lnum 15    Active      SN:ZRT1FDJL         /dev/da7
Bay  C0   Unit 31     Lnum 8     Active      SN:ZRT1FJ0T         /dev/da14
Bay  C1   Unit 30     Lnum 9     Active      SN:ZRT1F6BF         /dev/da13
Bay  C2   Unit 29     Lnum 10    Active      SN:ZRT1FJMS         /dev/da12
Bay  C3   Unit 28     Lnum 11    Active      SN:ZRT18NE6         /dev/da11
Bay  D0   Unit 37     Lnum 4     Active      SN:ZRT18N9P         /dev/da18
Bay  D1   Unit 36     Lnum 5     Active      SN:ZRT18N8V         /dev/da17
Bay  D2   Unit 35     Lnum 6     Active      SN:ZRT18NBE         /dev/da16
Bay  D3   Unit 34     Lnum 7     Active      SN:ZRT1FR62         /dev/da15
Bay  E0   Unit 43     Lnum 0     Active      SN:ZRT1FDJ4         /dev/da22
Bay  E1   Unit 42     Lnum 1     Active      SN:ZRT1FR86         /dev/da21
Bay  E2   Unit 41     Lnum 2     Active      SN:ZRT1EJ4H         /dev/da20
Bay  E3   Unit 40     Lnum 3     Active      SN:ZRT1E9MS         /dev/da19

The first two lines of output above (bays 1 and 2) reference the cache SSDs, which are contained within the compute module. The remaining ‘bay’ locations indicate both the sled (A to E) and drive (0 to 3). The presence of four HDDs per sled (i.e. bay numbers 0 to 3) indicates that this is an A3100 node, rather than an A310 with only three HDDs per sled.

With regard to the nodes’ internal drives, the boot disk reservation size has increased to 18GB on these new platforms, up from 8GB on the previous generation. Partition sizes have also been expanded on these new platforms in OneFS 9.11, as follows:

Partition A310 / A3100 A300 / A3000
hw 1GB 500MB
journal backup 8197MB 8GB
kerneldump 5GB 2GB
keystore 64MB 64MB
root 4GB 2GB
var 4GB 2GB
var-crash 7GB 3GB

The PowerScale A310 and A3100 platforms are available in the following networking configurations, with a 10/25Gb Ethernet front-end and either Ethernet or Infiniband back-end:

Model A310 A3100
Front-end network 10/25 GigE 10/25 GigE
Back-end network 10/25 GigE, Infiniband 10/25 GigE, Infiniband

These NICs and their PCI bus addresses can be determined via the ’pciconf’ CLI command, as follows:

# pciconf -l | grep mlx
mlx5_core0@pci0:16:0:0: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core1@pci0:16:0:1: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core2@pci0:65:0:0: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00
mlx5_core3@pci0:65:0:1: class=0x020000 card=0x002015b3 chip=0x101f15b3 rev=0x00 hdr=0x00

Similarly, the NIC hardware details and firmware versions can be viewed as follows:

# mlxfwmanager

Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  pci0:16:0:0
  Base GUID:        58a2e10300e22a24
  Base MAC:         58a2e1e22a24
  Versions:         Current        Available
     FW             26.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX6LX
  Part Number:      06XJXK_0R5WK9_Ax
  Description:      NVIDIA ConnectX-6 LX Dual Port 25 GbE SFP Network Adapter
  PSID:             DEL0000000031
  PCI Device Name:  pci0:65:0:0
  Base GUID:        58a2e10300e22bf4
  Base MAC:         58a2e1e22bf4
  Versions:         Current        Available
     FW             26.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Compared to their A30x predecessors, the A310 and A3100 see a number of generational hardware upgrades. These include a shift to DDR5 memory, a Sapphire Rapids CPU, and an up-spec’d power supply.

In terms of performance, the new A31x nodes provide a significant increase over the prior generation, as shown in the following streaming read and writes comparison chart for the A3100 and A3000:

OneFS node compatibility provides the ability to have similar node types and generations within the same node pool. In OneFS 9.11 and later, compatibility between the A310 and A3100 nodes and the previous generation platform is supported. Specifically, this node pool compatibility includes:

OneFS Node Pool Compatibility Gen6 MLK New
A200 A300/L A310/L
A2000 A3000/L A3100/L
H400 A300 A310

Node pool compatibility checking includes drive capacities, for both data HDDs and SSD cache. This pool compatibility permits the addition of A310 node pairs to an existing node pool comprising four or more A300s if desired, rather than creating a new A310 node pool. A similar compatibility exists for A3100/A3000 nodes.
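After adding new node pairs, pool membership can be confirmed from the CLI. For example (a simple check; any explicit compatibility settings are managed under the 'isi storagepool' namespace, whose sub-commands vary by release):

# isi storagepool nodepools list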

Note that, while the A31x is node pool compatible with the A30x, the A31x nodes are effectively throttled to match the performance envelope of the A30x nodes. Regarding storage efficiency, support for OneFS inline data reduction on mixed A-series diskpools is as follows:

Node Pool Mix Data Reduction Enabled
A200 + A300/L + A310/L False
A2000 + A3000/L + A3100/L False
H400 + A300 + A310 False
A200 + A310 False
A300 + A310 True
H400 + A310 False
A2000 + A3100 False
A3000 + A3100 True
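Where inline data reduction is enabled on a pool, its effectiveness can be gauged from the cluster-wide reporting CLI. For example (illustrative; the report layout varies by release):

# isi statistics data-reduction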

To summarize, in combination with OneFS 9.11, these new PowerScale A31x archive platforms deliver a compelling value proposition in terms of efficiency, density, flexibility, scalability, and affordability.