PowerScale All-flash F710 and F210 Platform Nodes

Hot on the heels of the recent OneFS 9.7 release comes the launch of two new PowerScale F-series hardware offerings. Between them, these new F710 and F210 all-flash nodes add some major horsepower to the PowerScale stable.

Built atop the latest generation of Dell’s PowerEdge R660 platform, the F710 and F210 each boast a range of Gen4 NVMe SSD capacities, paired with a Sapphire Rapids CPU, a generous helping of DDR5 memory, and PCIe Gen5 100GbE front-end and back-end network connectivity – all housed within a compact, power-efficient 1RU form factor chassis.

Here’s where these new nodes sit in the current hardware hierarchy:

As illustrated in the greyed-out region of the chart above, these new nodes refresh the current F600 and F200 platforms, further extending PowerScale’s price-performance envelope.

The PowerScale F210 and F710 nodes offer a substantial hardware evolution from previous generations, while also focusing on environmental sustainability by reducing power consumption and carbon footprint. Housed in a 1RU ‘Smart Flow’ chassis for balanced airflow and enhanced cooling, both new platforms offer greater density than their F600 and F200 predecessors: the F710 now accommodates ten NVMe SSDs per node, for 25% greater density, while the F210 adds a 15.36 TB NVMe drive option, doubling the F200’s maximum density. Both platforms also include in-line compression and deduplication by default, further increasing their capacity headroom and effective density.

Under the hood, Intel’s 4th generation Xeon Sapphire Rapids CPUs deliver 19% lower cycles-per-instruction, PCIe Gen5 quadruples throughput over Gen3, and the latest DDR5 DRAM offers greater speed and bandwidth – all netting up to 90% higher performance per watt. Additionally, the F710 and F210 debut a new 32 GB Software Defined Persistent Memory (SDPM) file system journal in place of the NVDIMM-n used in prior platforms, thereby saving a DIMM slot on the motherboard too.

On the OneFS side, the recently launched 9.7 release delivers a dramatic performance bump – particularly for the all-flash platforms. OneFS 9.7 benefits from latency-improving enhancements to its locking infrastructure and protocol heads – plus ‘direct write’ non-cached IO, which we will explore in a future article.

This combination of generational hardware upgrades plus OneFS 9.7 software advancements results in dramatic performance gains for the F710 and F210 – particularly for streaming reads and writes, which see a 2x or greater improvement over the prior F600 and F200 platforms. This makes the F710 and F210 ideal candidates for demanding workloads such as M&E content creation and rendering; high-concurrency, low-latency workloads such as chip design (EDA) and high-frequency trading; and all phases of generative AI workflows.

Scalability-wise, both platforms require a minimum of three nodes to form a cluster (or node pool), and scale to a maximum of 252 nodes. The basic specs for the new nodes include:

| Component | PowerScale F710 | PowerScale F210 |
| CPU | Dual-socket Intel Sapphire Rapids, 2.6GHz, 24C | Single-socket Intel Sapphire Rapids, 2GHz, 12C |
| Memory | 512GB DDR5 DRAM | 128GB DDR5 DRAM |
| SSDs per node | 10 x NVMe SSDs | 4 x NVMe SSDs |
| Raw capacity per node | 38.4TB to 307TB | 7.7TB to 61TB |
| Drive options | 3.84TB and 7.68TB TLC; 15.36TB and 30.72TB QLC | 1.92TB, 3.84TB, and 7.68TB TLC; 15.36TB QLC |
| Front-end network | 2 x 100GbE or 25GbE | 2 x 100GbE or 25GbE |
| Back-end network | 2 x 100GbE | 2 x 100GbE or 25GbE |

Note that, while the F210 can coexist with the F200 in the same node pool, the F710 does not currently have any node pool compatibility peers.
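For reference, a cluster’s node pools and their member nodes can be viewed with the following CLI syntax:

# isi storagepool nodepools list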

Over the next couple of articles, we’ll dig into the technical details of each of the new platforms. But, in summary, when combined with OneFS 9.7, the new PowerScale all-flash F710 and F210 platforms quite simply deliver on efficiency, flexibility, performance, and scalability.

OneFS and Externally Managed Network Pools – Management and Monitoring

In the first article in this series, we took a look at the overview and architecture of the OneFS 9.7 externally managed network pools feature. Now, we’ll turn our focus to its management and monitoring.

From a cluster security point of view, the externally managed IP service opens up a potential new attack vector, whereby a rogue DHCP server could provide bad data. As such, the recommendation is to configure a firewall around this new OneFS DHCP service to ensure that the cluster is protected. While the OneFS firewall could in theory provide this protection, in order to know what the DHCP server is, the cluster first has to discover and talk to the DHCP server and obtain its IP. It would be somewhat paradoxical (and insecure) to create a firewall rule after having already talked to, and trusted, the DHCP server.

The following table contains recommended configuration settings for the AWS firewall.

| Setting | Value |
| Name | E.g. ‘DHCP’ |
| Type | ‘ingress’ |
| From Port | 67 |
| To Port | 68 |
| Protocol | UDP |
| CIDR Blocks | <cluster_gateway>/32 |
| IPv6 CIDR Blocks | [] |
| Security Group ID | Customer specific |
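As a rough sketch, an equivalent ingress rule can be created with the standard AWS CLI – note that the security group ID below is a placeholder, and the gateway address needs substituting with the environment-specific value:

# aws ec2 authorize-security-group-ingress --group-id <security_group_id> --protocol udp --port 67-68 --cidr <cluster_gateway>/32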

Note that, as mentioned in the first article in this series, there are currently a few instances of networking functionality that are unsupported in the APEX file services for AWS offering, as compared to on-prem OneFS. These include:

  • IPv6 support
  • VLANs
  • Link aggregation
  • NFS over RDMA

For externally managed network pools, these settings are read-only, since the associated interfaces and IPs are managed by the cloud provider.

Externally managed network pools can only be created by the system in OneFS 9.7, and pools cannot be manually reconfigured either to or from the externally managed method – even by root.

In general, manual IP configuration is protected in order to guard against accidental misconfiguration. However, cluster admins may occasionally need to manually configure the IPs in a network pool, which can be performed with ‘isi network pools modify’ plus the inclusion of the ‘--force’ flag:

# isi network pools modify subnet0.pool0 --ranges <ip_add_range> --force

Note that AWS has a maximum threshold for the number of IPs that can be configured per network interface, based on the EC2 instance type. If this limit is exceeded, AWS will prevent the IP address from being configured, resulting in a potential data unavailability event. OneFS 9.7 now prevents most instances of IP oversubscription at configuration time, in order to ensure availability during an outage of up to one third of the cluster.
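For reference, the per-interface IPv4 address limit for a given instance type can be queried directly from AWS – a quick sketch using the standard AWS CLI, with ‘m5d.large’ purely as an example:

# aws ec2 describe-instance-types --instance-types m5d.large --query 'InstanceTypes[].NetworkInfo.Ipv4AddressesPerInterface'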

While OneFS accounts for externally managed, static, and dynamic IPs, plus SSIPs, it is unable to account for unevenly allocated dynamic IPs, and therefore cannot prevent every instance of oversubscription.

OneFS also displays an informative error message if such a configuration is attempted. For example, on an EC2 instance type of ‘m5d.large’:

# isi network pools modify subnet0.pool0 --ranges 10.20.30.203-10.20.30.254

AWS only allows node 2 (instance type AWS=m5d.large) to have a maximum of 10 IPv4 addresses configured. In a degraded state, the requested configuration will result in node 2 attempting to configure 28 addresses, which will leave 18 address(es) unavailable. To resolve this, consider increasing the number of nodes in dynamic pools or reducing the number of IPv4 addresses.

When it comes to troubleshooting externally managed pools, there are two log files which are useful to check. Namely:

  • /var/log/dhclient.log
  • /var/log/isi_smartconnect

The first of these is a dedicated dhclient.log file for the new dhclient instance that OneFS 9.7 introduces. In contrast, the IP Merger and IP Reporter modules will output to the isi_smartconnect log.
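A quick way to scan the latter across all nodes of a cluster is via the ‘isi_for_array’ utility – for example:

# isi_for_array 'tail /var/log/isi_smartconnect'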

There are also a handful of relevant system files worth being aware of, and these include:

  • /var/db/dhclient/lease.ena1
  • /ifs/.ifsvar/modules/flexnet/ip_reports/DHCP/node.
  • /ifs/.ifsvar/modules/flexnet/pool_members/groupnet.1.subnet.1.pool.1
  • /ifs/.ifsvar/modules/smartconnect/resource/workers/ip_merger

The first of these, lease.ena1, is an append log maintained by dhclient, so the most recent lease within it is the one that SmartConnect is looking at. Note that there may be other lease files on the system, but only the lease files in /var/db/dhclient are relevant and viewed by SmartConnect. OneFS has a special configuration for dhclient to ensure this.
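Since the lease file is append-only, a simple ‘tail’ will show the most recent lease – for example:

# tail /var/db/dhclient/lease.ena1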

The IP reports live in the /ifs/.ifsvar/modules/flexnet directory. The pool_members directory has been present in OneFS for a number of years now, and OneFS now coordinates the IP merger via the file under the ./smartconnect/resource/workers/ directory.

As for useful CLI commands, these include the following:

# isi_smartconnect_client action -a wake-ip-reporter

The ‘isi_smartconnect_client’ CLI utility, which can be used to interact with the SmartConnect daemon, gains an additional ‘wake-ip-reporter’ action in OneFS 9.7. Under normal circumstances, IP Reporter only checks the contents of the lease file every five minutes. However, ‘wake-ip-reporter’ instructs IP Reporter to check the lease file immediately. So if dhclient has restarted for some reason, IP Reporter can be awoken and forced to read the lease, rather than waiting for its next scheduled check.

Additionally, the following ‘log_level’ command arguments can be used to change the logging level of SmartConnect to the desired verbosity:

# isi_smartconnect_client log_level [-l | -r]

Note that, unlike prior releases, in OneFS 9.7 this does not require changing the Flexnet config file. Instead, the log level is reset when the process dies or the ‘-r’ argument is passed. It’s also worth noting that this command does not operate cluster-wide – it just affects the current instance of SmartConnect running on the local node.

Another thing to be aware of when a cluster is using externally managed pools is that networking is dependent on, and can be impacted by, the availability of AWS’ DHCP servers. While the leased IP never changes, the leases themselves expire after an hour. As such, if OneFS is unable to reach the DHCP server to renew a lease, it may lose its primary IPs. While this is often outside the realm of the cluster’s control, the OneFS CELOG event service will fire a critical warning alert (SW_SC_DHCP_LEASE_REBIND) before a primary IP expires. This alert contains the following event description:

DHCP server has not responded to requests to renew lease on <interface>. Attempting to contact other DHCP servers. If we are unable to renew the lease, the IP address <ip_address> will be removed at expiry.

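Active and historical events can also be checked from the CLI. For example, a quick (illustrative) filter for DHCP-related events:

# isi event events list | grep -i dhcp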
In addition to the above alert, there are several log messages that give a good indication of what may be amiss. These, and their resolution info, are summarized in the following table:

| Log Message | Description | Resolution |
| Unable to merge IP 1.2.3.4 on ext-1 from devid 1 – no matching pool found | The IP is not configured in any network pool. | Add the IP to the primary IP pool. |
| Unable to parse lease on NIC: ena1. Attempting to retrieve new lease | The lease file generated by dhclient could not be read. | None should be required. OneFS will automatically back up the old lease file and restart dhclient. |
| Lease on NIC: ena1 not found | The lease file does not exist for the specified interface. | OneFS will automatically restart dhclient. |
| Unexpected error comparing IP Reports. Attempting rewrite | OneFS tries to dedupe writes by comparing the newly generated IP report with what is on disk. In the event of a failure, it will simply overwrite. | None should be required. |
| No IP Report received from DHCP External Manager | OneFS is unable to determine its IP from the DHCP leases. It will continue retrying, but is currently unable to report an IP. | If the issue persists, check dhclient to ensure it is operating correctly. |
| Failed to write IP Report node. for DHCP to disk: | OneFS is unable to report its IP to /ifs, so the IP merger cannot update Flexnet/IP assignments with this information. | Check why SmartConnect is unable to write to /ifs. Is it read-only? |


OneFS and Externally Managed Network Pools

Tucked amongst the array of new functionality within OneFS 9.7’s payload is the debut of a networking feature called externally managed network pools. In layman’s terms, this is essentially the introduction of a front-end Dynamic Host Configuration Protocol (DHCP) client for the PowerScale cluster.

The context and motivation behind this new functionality is that cloud networking differs substantially from on-prem infrastructure, largely because the cloud hyperscalers typically require a primary IP to be configured on a specific interface that they dictate. Normally, systems operating within an off-prem environment obtain their network configuration via DHCP. But until OneFS 9.7, DHCP has not been supported on a cluster’s front-end network. To support APEX file services for AWS, OneFS 9.6 implemented a manual work-around for this, which had its limitations. With OneFS 9.7, the desire was to make the system smarter by adding proper support for IPv4 primary IP addresses on AWS deployments, thereby negating the need for manual work-arounds, with their inherent risks.

This new externally managed IP addresses feature is automatically enabled upon committing an upgrade to OneFS 9.7. To support it, a new network allocation method, ‘externally managed’, has been added to OneFS’ network pools. Such a pool is managed by an external service, such as AWS, which dictates where the primary IPs live – so the external service, rather than the cluster’s Flexnet or SmartConnect services, is in charge of IP allocation. It’s worth noting that OneFS 9.7 only includes (and enforces) limited DHCP support, strictly for cloud deployments. That said, on-prem DHCP support may be added in a future release, although it is currently not on the near-term roadmap. Additional work was also included in OneFS 9.7 to prevent IP oversubscription.

So let’s take a look under the hood… Architecturally, there are three main components to the externally managed IP addresses feature:

  • DHCP Service
  • IP Reporter Module
  • IP Merger Module

OneFS 9.7 talks DHCP by leveraging the FreeBSD ‘dhclient’ implementation. Dhclient is modified so that it does not actually configure the network interfaces as it normally would, in order to avoid conflicts with the OneFS Flexnet network config daemon. Instead, dhclient just persists the leases to /var/db/dhclient/, and SmartConnect ultimately merges the resulting IP information into the following files:

  • /ifs/.ifsvar/modules/flexnet/flx_config.xml
  • /ifs/.ifsvar/modules/flexnet/pool_members/groupnet.1.subnet.1.pool.1

Additionally, SmartConnect sees the addition of two new modules, IP Reporter and IP Merger.

DHCP service – adds a new MCP-controlled DHCP service, dhclient-ext-1, which:

  • Uses a modified FreeBSD dhclient implementation
  • Does not configure network interfaces
  • Persists leases to /var/db/dhclient/

IP Merger – adds a new cluster-wide module to SmartConnect, which:

  • Coordinates ownership of the role by taking locks on files on /ifs
  • Loads all files from the IP Reports directory
  • Verifies the network pool is configured correctly and generates IP assignments
  • Updates the following files:
      ▪ /ifs/.ifsvar/modules/flexnet/flx_config.xml
      ▪ /ifs/.ifsvar/modules/flexnet/pool_members/groupnet.1.subnet.1.pool.1

IP Reporter – adds a new module to each node’s SmartConnect service, which:

  • Parses DHCP leases
  • Converts them to a generic format
  • Saves them to /ifs/.ifsvar/modules/flexnet/ip_reports/DHCP/node.

These modules are still part of the overarching isi_smartconnect_d daemon – just new components within it. The IP Reporter module parses the above lease files and then saves the information to /ifs/.ifsvar/modules/flexnet/ip_reports/DHCP/node.

In contrast, the IP Merger is a single cluster-wide instance that loads the files from the IP Reports directory, verifies the network pool configuration, generates the IP assignments, and updates the config files. The ip_merger file contains the devID of the node that has been elected as responsible for IP merging. The full path is as follows:

/ifs/.ifsvar/modules/smartconnect/resource/workers/ip_merger

For example, the following CLI syntax can be used to determine which node is acting as the merger:

# isi_for_array 'grep "Taking ownership of the IPMerger role" /var/log/isi_smartconnect'

TME-4:  2024-02-07T16:26:20.946863+00:00 <3.6> GLaDOS-4(id4) isi_smartconnect_d[3626]: Taking ownership of the IPMerger role

In this case, the command output indicates that node ID4 has taken ownership of the IPMerger role.

The underlying process is very similar to how OneFS manages SSIPs, in that all nodes attempt to lock a file under /ifs, and once granted that lock, a node owns that responsibility. OneFS then takes the files from under /ifs/.ifsvar/modules/flexnet/ip_reports and merges the IP information into the Flexnet config and the pool members file.

Data flows through the system from the cloud provider’s DHCP server, to dhclient, and then into isi_smartconnect_d. This modular, extensible architecture requires only a small portion of OneFS to be made aware of this new type of network pool. Everything happens off to the side until the data is merged into the Flexnet config and the associated state files, so it is low risk to everything else.

In OneFS 9.7, this new DHCP allocation method is set as ‘externally managed’ for subnet0.pool0. This can be seen even on network pools that have been upgraded from an earlier OneFS release. Additionally, the CLI output also reports the type of external manager for the network pool – for instance, AWS.
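As a quick illustration, a pool’s configuration – including its allocation method and external manager – can be viewed with the following CLI syntax:

# isi network pools view subnet0.pool0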

The ‘isi network interfaces’ CLI syntax is also updated in OneFS 9.7 to allow filtering by externally managed pools, with the output again reporting AWS as the owner.

As a quick reminder, there are currently a few instances of networking functionality that are unsupported in the APEX file services for AWS offering, as compared to on-prem OneFS. These include:

  • IPv6 support
  • VLANs
  • Link aggregation
  • NFS over RDMA

In the next article in this series, we’ll turn our attention to the management, monitoring, and security of OneFS 9.7 externally managed network pools.

OneFS SmartSync Configuration for Google Cloud

As we saw in the previous blog in this series, with the inclusion of Google Cloud (GCP) in OneFS 9.7, SmartSync Cloud Copy now supports all three of the principal public cloud hyperscalers.

Object data replication to Google Cloud (GCP) can be configured in OneFS 9.7 via the ‘isi dm accounts create’ CLI command. Required information includes the regular account configuration parameters plus the following GCP-specific settings:

  • GCP account type
  • GCP URI
  • Access ID
  • Secret key

Or, more specifically:

| Parameter | Description |
| Object store type | GCP (or AWS_S3, Azure, ECS_S3, etc.) |
| URI | {http,https}://hostname:port/bucketname |
| Auth | Access ID, secret key |
| Proxy | Optional proxy information |

For example:

# isi dm accounts create --account-type GCP --name [Account Name] --access-id [GCP access-id] --uri [GCP URI with bucket-name] --auth-mode CLOUD --secret-key [GCP secret-key]
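As an illustrative sketch, a filled-in version might look like the following – the account name and bucket here are purely hypothetical, the angle-bracket placeholders denote the GCP HMAC credentials, and storage.googleapis.com is GCP’s standard storage endpoint:

# isi dm accounts create --account-type GCP --name gcp-smartsync-1 --access-id <gcp_access_id> --uri https://storage.googleapis.com:443/smartsync-bucket1 --auth-mode CLOUD --secret-key <gcp_secret_key>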

Once created, the new account can be verified with the following command:

# isi dm accounts list

Additionally, the next steps for SmartSync configuration and policy creation are covered in detail in the following blog article.

SmartSync Cloud Copy supports both push and pull replication, permitting a dataset copied to GCP with a push to be copied back to the cluster via a corresponding pull.

Be aware that a dataset must be available before a policy runs, or the policy will fail.

Also note that, while multiple GCP URIs and credentials are supported by SmartSync, they are not supported on the same account. Instead, multiple accounts, each with corresponding policies, would need to be created.

Other SmartSync features and functionality includes:

| Feature | Details |
| Bandwidth throttling | Set of netmask rules. Limits are per-node. |
| CPU throttling | Allowed and back-off CPU percentages. |
| Base policies | Template providing common values to groups of related policies (schedule, source base path, enable/disable, etc.). Disabling a base policy affects all linked concrete policies. |
| Concrete policy | Predefined set of fields from the base policy. |
| Unconnected nodes (NANON) | Active accounts are monitored by each node. No work is allocated to nodes without network access. |
| Snapshot locking | Avoids accidental snapshot deletion, with subsequent re-base-lining. |

Behind the scenes, dataset creation leverages a SnapshotIQ snapshot, which can be inspected via the ‘isi snapshot snapshots list’ command. These DM dataset snapshots are easily recognizable by their ‘isi_dm’ naming prefix.
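For example, a quick way to filter for these Datamover snapshots by their prefix:

# isi snapshot snapshots list | grep isi_dm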

The SmartSync Cloud Copy format provides regular file representation, browsability, and usability of file system data in the cloud. In addition to replicating the actual data, SmartSync also preserves common file attributes, including Windows ACLs, POSIX permissions and attributes, creation times, extended attributes, etc. However, there are certain considerations and limitations to be aware of, such as the lack of incremental copy support. These include:

| CloudCopy Caveat | Details |
| ADS files | Skipped when encountered. |
| Hardlinks | An object is created for each link (i.e. links are not preserved). |
| Symlinks | Skipped when encountered. |
| Directories | An object is created for each directory. |
| Special files | Skipped when encountered. |
| Metadata | Only POSIX mode bits, UID, GID, atime, mtime, and ctime are preserved. |
| Filename encodings | Converted to UTF-8. |
| Paths | The path relative to the root copy directory is used as the object key. |
| Large files | An error is returned for files larger than the cloud provider’s maximum object size. |
| Long filenames | File names exceeding 256 bytes are compressed. |
| Long paths | Junction points are created to redirect where objects are stored when paths exceed 1024 bytes. |
| Sparse files | Sparse sections are not preserved and are written out fully as zeros. |

SmartSync allows subsequent incremental data movement by managing and re-transferring failed file transfers. Similarly, dataset reconnect enables systems with common base datasets to establish instant incremental syncs. SmartSync also proactively locks the SnapshotIQ snapshots it uses, providing better separation between Datamover and other snapshots.
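The datasets themselves can also be viewed from the CLI – a quick sketch, assuming the ‘isi dm datasets’ command namespace that accompanies ‘isi dm accounts’:

# isi dm datasets list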

Performance-wise, SmartSync is powered by a scalable run-time engine which spans the cluster, spinning up threads (fibers) on demand and using asynchronous IO to process replication tasks (chunks). Batch operations are used for efficient small file, attribute, and data block transfer. Namespace contention avoidance, efficient snapshot utilization, and separation of dataset creation from transfer are salient design features of both the baseline and incremental sync algorithms.