PowerScale & Isilon – Unstructured Data Quick Tips

OneFS and Front-end Infiniband Networking

A number of scientific computing and commercial HPC customers use Infiniband as their principal network transport, and to date have used Infiniband bridges and gateways to connect their IB-attached compute server fleets to their PowerScale clusters. To address this, OneFS 9.10 introduces support for low latency HDR Infiniband front-end network connectivity on the F710 and F910 all-flash platforms, providing up to 200Gb/s of bandwidth with sub-microsecond latency. This can directly benefit generative AI and machine learning environments, plus other workloads – particularly those involving highly concurrent streaming reads and writes.

In conjunction with the OneFS multipath driver, plus GPUDirect support, the choice of either HDR Infiniband or 200Gig Ethernet can satisfy the networking and data requirements of demanding technical workloads, such as autonomous driving model training, seismic analysis, complex transformer-based AI workloads, and deep learning systems.

Specifically, this new functionality expands cluster front-end connectivity to include Infiniband, in addition to Ethernet, and enables the use of IP over IB and NFSoRDMA over IB.

With its debut in OneFS 9.10, front-end IB support is currently offered on both the PowerScale F710 and F910 all-flash nodes:

These two platforms can now host a Mellanox CX-6 VPI NIC configured for HDR Infiniband in the primary front-end PCI slot, plus a CX-6 DX NIC configured for Ethernet in the secondary front-end PCI slot. This is worth noting, since in an Ethernet-only node, the primary slot is typically used for Gb Ethernet, while the secondary slot is unpopulated. For performance reasons, this configuration ensures that the primary interface makes use of the full bandwidth of the Gen5 PCIe slot, while the secondary slot offers PCIe Gen4 connectivity, which limits it to 100Gb. Additionally, in OneFS 9.10, the node’s backend network, used for intra-cluster communication, can also now be configured for either HDR Infiniband or 200Gb Ethernet.

Prior to OneFS 9.10, several components within OneFS assumed that Infiniband was synonymous with the backend network. This included the platform support interface (PSI), which is used to get information on the running platform, and which contained a single ‘network.interfaces.infiniband’ key for Infiniband in OneFS 9.9 and earlier releases. To rectify this, the psi.conf in OneFS 9.10 now includes a pair of ‘network.interfaces.infiniband.frontend’ and ‘network.interfaces.infiniband.backend’ keys, mirroring the corresponding key pair for the Ethernet front and backend networks.

Here’s the rear view of both the F710 and F910 platforms, showing the slot locations of the front and back-end NICs in the Infiniband configuration. Note that both F710 and F910 nodes configured with Infiniband frontend support are automatically shipped with both an HDR IB card in the primary front-end slot (red) and a 100Gb Ethernet NIC in the secondary front-end slot.

F710

F910

Note that the PowerScale F210 does not support either front-end Infiniband, or 200Gig Ethernet at this point. However, this is not a platform constraint, but rather a qualification effort limitation.

However, OneFS 9.10 does provide support for either backend HDR Infiniband or 200Gb Ethernet on all the current F-series platforms, including the F210 node.

Backend HDR Infiniband support is again using the venerable CX-6 VPI 200Gb NIC, paired with the Quantum QM8790 IB switch, and using supplied 200Gb HDR cables.

Note though, that the QM 8790 switch only provides 40 HDR ports, and breakout cables are not currently supported. So, for now, an F-series cluster with an IB backend running OneFS 9.10 will be limited to a maximum of forty nodes. It’s also worth mentioning that the QM 8790 switch does support lower IB data rates such as QDR and FDR, and has also been qualified with the legacy CX3 VPI NICs and IB cables for legacy IB cluster compatibility and migration purposes.

From the CLI, the isi_hw_status command can be used to report on a node’s front and back-end networking types. For example, the following output from an F710 shows the front-end network ‘FEType’ parameter as ‘Infiniband’.

# isi_hw_status | grep Type
BEType: 200GigE
FEType: Infiniband

However, the back-end network on this F710 is 200Gb Ethernet, as reported by the ‘BEType’ parameter.
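For scripted checks, output of this shape is simple to parse. The following is a minimal sketch that splits 'Key: Value' lines into a dictionary. The sample text is hard-coded for illustration, and the helper name parse_hw_types is hypothetical rather than part of OneFS.

```python
def parse_hw_types(output: str) -> dict:
    """Parse 'Key: Value' lines (as printed by `isi_hw_status | grep Type`) into a dict."""
    types = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that has no colon
            types[key.strip()] = value.strip()
    return types

# Sample text for illustration only; on a cluster this would be the command's output.
sample = """BEType: 200GigE
FEType: Infiniband"""

types = parse_hw_types(sample)
print("front-end:", types.get("FEType"), "| back-end:", types.get("BEType"))
```

Keeping the parser tolerant of lines without a colon means it can be fed the full, unfiltered command output as well.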

In the next article in this series, we’ll look at the configuration and management of front-end Infiniband in OneFS 9.10.

OneFS OpenSSL 3 and TLS 1.3 Support

Secure Sockets Layer (SSL) protocols use cryptographic algorithms to encrypt data, reducing the potential for unauthorized individuals or bad actors to intercept or tamper with the data. This is achieved through three principal and complementary methods:

When using either the OneFS WebUI or platform API (pAPI), all communication sessions are encrypted using SSL and the related Transport Layer Security (TLS). As such, SSL and TLS play a critical role in PowerScale’s Zero Trust architecture by enhancing security via encryption, validation, and digital signing.

In OneFS 9.10, OpenSSL has been upgraded from version 1.0.2 to version 3.0.14. This makes use of the newly validated OpenSSL 3.0.9 FIPS module, which is the latest version that has been blessed by the OpenSSL upstream project, and which is supported through September 2026.

Architecturally, SSL comprises four fundamental layers:

These reside within the stack as follows:

The basic handshake process begins with a client requesting an HTTPS WebUI session to the cluster. OneFS then returns the SSL certificate and public key. The client creates a session key, encrypted with the public key it’s received from OneFS. At this point, only the client knows the session key, and it sends this encrypted session key to the cluster, which decrypts it using the private key. Now both the client and OneFS know the session key, so the session, encrypted via the symmetric key, can be established. OneFS automatically defaults to the best supported version of SSL, based on the client request.
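The key transport step in that flow can be illustrated with a toy example. This is a sketch only: it uses textbook RSA with tiny primes so the arithmetic is visible, it mirrors the simplified flow described above rather than the (EC)DHE key agreement that TLS 1.3 uses, and none of the variable names come from OneFS.

```python
import secrets

# Toy RSA key pair (tiny primes, illustration only; real keys are 2048 bits or more).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, published alongside the certificate
d = pow(e, -1, phi)            # private exponent, never leaves the server

# Client: pick a random session key and encrypt it with the server's public key.
session_key = secrets.randbelow(n - 2) + 2
encrypted = pow(session_key, e, n)

# Server: recover the session key with its private key.
recovered = pow(encrypted, d, n)

print("both sides share the session key:", recovered == session_key)
```

From this point on, both sides would switch to a symmetric cipher keyed with the shared value, which is far cheaper than public-key operations for bulk data.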

As part of the OneFS 9.10 SSL upgrade, there’s a new implementation of FIPS mode that is compatible with OpenSSL 3, which all of the OneFS daemons make use of. But probably the most significant enhancement in the OpenSSL 3 upgrade is the addition of library support for the TLS 1.3 ciphers, which are designed to meet the stringent Federal data-in-flight security requirements. The OpenSSL 3 upgrade also deprecates and removes some legacy algorithms, so those will no longer be supported and can be removed entirely from OneFS in the future. More detail is available in the OpenSSL 3 Migration Guide, which contains an exhaustive list of every change that was made in OpenSSL 3.

In OneFS 9.10 the TLS 1.2 cipher configuration remains the same as in OneFS 9.9, except that three TLS 1.3 ciphers are added:

TLS_AKE_WITH_AES_256_GCM_SHA384
TLS_AKE_WITH_CHACHA20_POLY1305_SHA256
TLS_AKE_WITH_AES_128_GCM_SHA256

Similarly, if FIPS mode is enabled, the same TLS 1.2 ciphers are available plus two TLS 1.3 ciphers are added:

TLS_AKE_WITH_AES_256_GCM_SHA384
TLS_AKE_WITH_AES_128_GCM_SHA256

There are no changes to the data path Apache HTTPD ciphers, so no addition of TLS 1.3 there – it still uses the same TLS 1.2 ciphers.
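When checking client compatibility, it can help to compare these cipher names against what a client’s own TLS library offers. The sketch below does that with Python’s standard ssl module. It assumes that the TLS_AKE_WITH_* spelling used above corresponds to the names that IANA and OpenSSL spell as TLS_* (for example TLS_AES_256_GCM_SHA384); the helper to_openssl_name is hypothetical.

```python
import ssl

def to_openssl_name(name: str) -> str:
    """Map the 'TLS_AKE_WITH_*' spelling to the IANA/OpenSSL 'TLS_*' spelling (assumed mapping)."""
    return name.replace("TLS_AKE_WITH_", "TLS_", 1)

onefs_tls13 = [
    "TLS_AKE_WITH_AES_256_GCM_SHA384",
    "TLS_AKE_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_AKE_WITH_AES_128_GCM_SHA256",
]

# TLS 1.3 suites that this client's OpenSSL build would offer by default.
ctx = ssl.create_default_context()
local_tls13 = {c["name"] for c in ctx.get_ciphers() if c["protocol"] == "TLSv1.3"}

for name in onefs_tls13:
    openssl_name = to_openssl_name(name)
    print(f"{openssl_name}: offered locally = {openssl_name in local_tls13}")
```

The result depends entirely on the OpenSSL build behind the local Python interpreter, so it describes the client rather than the cluster.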

OneFS 9.10 also contains some changes to the SSH cryptography. With FIPS mode disabled, the encryption algorithms, host key algorithms, and message authentication code algorithms all remain the same as in OneFS 9.9. However, support for the following four key exchange algorithms has been removed in 9.10:

diffie-hellman-group-exchange-sha256
diffie-hellman-group16-sha512
diffie-hellman-group18-sha512
diffie-hellman-group14-sha256

Similarly, with FIPS mode enabled, there are also no changes to the encryption algorithms, host key algorithms, or message authentication codes. But support is removed for the following two key exchange algorithms:

diffie-hellman-group-exchange-sha256
diffie-hellman-group14-sha256

Note that the sha512 algorithms weren’t previously supported by FIPS mode anyway.
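Before upgrading, it may be worth checking whether any SSH clients are pinned to one of the removed key exchange algorithms. The following sketch encodes the two lists above and flags matches in a client’s configured list. The client list shown is hypothetical, and the function says nothing about which other algorithms the cluster does accept.

```python
# Key exchange algorithms removed in OneFS 9.10, as listed above.
REMOVED_KEX = {
    False: {  # FIPS mode disabled
        "diffie-hellman-group-exchange-sha256",
        "diffie-hellman-group16-sha512",
        "diffie-hellman-group18-sha512",
        "diffie-hellman-group14-sha256",
    },
    True: {   # FIPS mode enabled
        "diffie-hellman-group-exchange-sha256",
        "diffie-hellman-group14-sha256",
    },
}

def removed_kex(client_kex: list[str], fips: bool) -> list[str]:
    """Return the entries of the client's list, in order, that 9.10 no longer supports."""
    return [k for k in client_kex if k in REMOVED_KEX[fips]]

# Hypothetical client preference list, for illustration only.
client = ["curve25519-sha256", "diffie-hellman-group14-sha256"]
print(removed_kex(client, fips=False))
```

If every entry in a client’s list comes back as removed, that client would need its configuration updated before it could connect to a 9.10 cluster.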

Moving on to TLS 1.3 phase one, OneFS 9.10 adds TLS 1.3 support for the WebUI and KMIP key management servers. Plus 9.10 also verifies that 1.3 is supported for the LDAP provider, for CELOG alert emails, for audit event and syslog forwarding, for the platform API and WebUI single sign-on, and also for SyncIQ.

Here’s a list of the capabilities:

Note that the OneFS components that aren’t explicitly called out in the table above likely won’t support TLS 1.3 currently, but are candidates to be uprev’d in a future phase of OneFS TLS 1.3 enablement.

The TLS 1.3 phase 1 enhancement in OneFS 9.10 allows the above components to negotiate either a TLS 1.2 or TLS 1.3 connection. The negotiated TLS version depends on the configuration of the environment. So if clients supporting both TLS 1.2 and 1.3 are present, the cluster will automatically negotiate and use TLS 1.3 where possible, but it will fall back to 1.2 for clients that only support that level. Similarly, the cluster will use TLS 1.3 exclusively in environments where all clients support 1.3. For the curious or paranoid, it’s worth noting that the only way to verify which version of TLS is being used is via packet inspection. So if you really need to know, grabbing and analyzing packet captures will be your friend here.

There are a couple of other idiosyncrasies with TLS 1.3 support in OneFS 9.10 that also bear mentioning.

It’s not always possible to explicitly specify the minimum TLS protocol version currently since OneFS 9.10 does not expose these configuration options. This means that clients and servers on OneFS will decide automatically which version to use, and they should prefer 1.3.
OneFS 9.10 does not allow customers to disable TLS 1.3 ciphers, but this should not be an issue since all the 1.3 ciphers are still considered very secure.
OneFS also does not provide diagnostic information about which protocol version of TLS is in use. So in order to verify for certain that the cluster and/or client(s) are using a specific version of TLS, it will likely require taking and analyzing packet captures.
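Where a packet capture is impractical, a client can at least report what it negotiated for its own session. The sketch below uses Python’s standard ssl module for that. It only covers connections the client initiates, so it does not replace a capture for traffic that the cluster originates, such as LDAP or syslog forwarding. The host name is a placeholder, port 8080 is assumed to be the WebUI port, and a cluster with a self-signed certificate would need that certificate trusted first for the default verification to pass.

```python
import socket
import ssl

def negotiated_tls_version(host: str, port: int, timeout: float = 5.0) -> str:
    """Connect to host:port and return the TLS version string the client negotiated."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # allow 1.2 fallback, refuse anything older
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()  # for example "TLSv1.3" or "TLSv1.2"

# Example (hypothetical host name; 8080 is assumed to be the WebUI port):
# print(negotiated_tls_version("cluster.example.com", 8080))
```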

OneFS S3 Protocol and Concurrent Object Access

Among the array of PowerScale’s core unstructured data protocols lies the AWS S3 API – arguably the gold standard for object protocols. This enables the PowerScale data lake to natively support workloads which both write data via file protocols such as NFS, HDFS or SMB, and then read that same data as S3 objects, and vice versa.

Since OneFS S3 objects and buckets are essentially files and directories at the file system level, the same PowerScale data services, such as snapshots, replication, WORM immutability, tiering, etc, are all seamlessly integrated. So too are identity and permissions management and access controls across both the file and object realms.

This means that applications and workloads have multiple access options – across both file and object, and with the same underlying dataset, semantics, and services. This has the considerable benefit of eliminating the need for replication or migration of data for different access requirements, thereby vastly simplifying data and workload management. OneFS supports HTTPS/TLS, to meet organizations’ security, in-flight encryption, and compliance needs. Additionally, since S3 is integrated into OneFS as a top-tier protocol, it offers a high level of performance, similar to that of the SMB protocol.

By default, the S3 service listens on port 9020 for HTTP and 9021 for HTTPS, although both these ports are easily configurable. Within a PowerScale cluster, OneFS runs on and across all nodes equally, so no one node controls or ‘masters’ the cluster and all nodes are true peers. Looking from a high-level at the components within each node, the I/O stack is split into a top layer, or initiator, and a bottom layer, or participant. This division is used as a logical model for the analysis of OneFS’ read and write paths.

At a physical-level, the CPUs and memory cache within the nodes simultaneously handle both initiator and participant tasks for I/O taking place throughout the cluster.

For clarity’s sake, the level of detail that includes the caches and distributed lock manager has been omitted from the above.

When a client connects to a node’s protocol head to perform a write, it is interacting with the logical ‘top half’, or initiator, of that node. Any files or objects that are written by the client are broken into smaller logical chunks, or stripes, before being written to the logical ‘bottom half’, or participant, of a node, where the storage drives reside. Failure-safe buffering (write coalescer and journal) ensures that writes are efficient and read-modify-write operations are avoided. OneFS stripes data across all nodes and protects the files, directories and associated metadata via software erasure-code or mirroring.

File and object locking allows multiple users or processes to access data via a variety of protocols concurrently and safely. Since all nodes in a PowerScale cluster operate on the same single-namespace file system simultaneously, it requires mutual exclusion mechanisms to function correctly. For reading data, this is a fairly straightforward process involving shared locks. With writes, however, things become more complex and require exclusive locking, since data must be kept consistent.

Under the hood, the ‘bottom half’ locks OneFS uses to provide consistency inside the file system (internal) are separate from the ‘top half’ protocol locks that manage concurrency across applications (external). This allows OneFS to move a file’s metadata and data blocks around while the file itself is locked by an application. This is the premise of OneFS auto-balancing, reprotecting and tiering, where the restriper does its work behind the scenes in small chunks to minimize disruption.

The OneFS distributed lock manager (DLM) marshals locks across all the nodes in a storage cluster, allowing for multiple lock types to support both file system locks as well as cluster-coherent protocol-level locks. The DLM distributes the lock data across all the nodes in the cluster. In a mixed cluster, the DLM also balances memory utilization so that the lower-power nodes are not bullied.

Every node in a cluster is a coordinator for locking resources. A coordinator is assigned to lockable resources based on a hashing algorithm, designed so that the coordinator almost always ends up on a different node than the initiator of the request. When a lock is requested for a file/object, it could be either a shared or exclusive lock. Read requests are typically serviced by shared locks, allowing multiple users to simultaneously access the resource, whereas exclusive locks constrain to just one user at any given moment, typically for writes.
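To see why a hash-based assignment tends to place the coordinator on a different node than the initiator, consider the following sketch. The actual OneFS hashing algorithm is not described here, so SHA-256 modulo the node count is purely a stand-in, and the resource paths are hypothetical.

```python
import hashlib

def coordinator_for(resource_id: str, node_count: int) -> int:
    """Map a lockable resource to a coordinator node (1-based) via a stable hash."""
    digest = hashlib.sha256(resource_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count + 1

nodes = 4
resources = [f"/ifs/data/file{i}" for i in range(1000)]  # hypothetical paths
initiator = 1
remote = sum(coordinator_for(r, nodes) != initiator for r in resources)
print(f"{remote / len(resources):.0%} of coordinators are on a node other than the initiator")
```

With a uniform hash, roughly (N-1)/N of resources land on another node, so the share of remote coordinators grows with cluster size.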

Here’s an example of how different nodes could request a lock from the coordinator:

Thread 1 from node 4 and thread 2 from node 3 simultaneously request a shared lock on a file from the coordinator on node 2.
Since no exclusive locks exist, node 2 grants shared locks, and nodes 3 and 4 read the requested file.
Thread 3 from node 1 requests an exclusive lock for the same file that’s being read by nodes 3 and 4.
Nodes 3 and 4 are still reading, so the coordinator (node 2) asks thread 3 from node 1 to wait.
Thread 3 from node 1 blocks until the exclusive lock is granted by the coordinator (node 2) and then completes its write operation.
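The sequence above can be modeled with a small, single-process toy. This is a sketch of shared versus exclusive lock semantics only, not the OneFS DLM: the class name is hypothetical, a denied shared request is simply refused rather than queued, and giving a waiting writer priority over new readers is a design choice made for the example.

```python
from collections import deque

class LockCoordinator:
    """Toy single-resource coordinator: many shared holders or one exclusive holder."""

    def __init__(self):
        self.shared = set()      # threads currently holding a shared lock
        self.exclusive = None    # thread holding the exclusive lock, if any
        self.waiting = deque()   # queued exclusive requests

    def request_shared(self, thread):
        # Grant only if no writer holds or is waiting for the lock.
        if self.exclusive is None and not self.waiting:
            self.shared.add(thread)
            return True
        return False

    def request_exclusive(self, thread):
        if self.exclusive is None and not self.shared:
            self.exclusive = thread
            return True
        self.waiting.append(thread)   # caller blocks until granted
        return False

    def release(self, thread):
        self.shared.discard(thread)
        if self.exclusive == thread:
            self.exclusive = None
        # Hand the lock to the next waiting writer once all holders are gone.
        if self.exclusive is None and not self.shared and self.waiting:
            self.exclusive = self.waiting.popleft()

coord = LockCoordinator()                        # plays the coordinator on node 2
print(coord.request_shared("node4/thread1"))     # True
print(coord.request_shared("node3/thread2"))     # True
print(coord.request_exclusive("node1/thread3"))  # False: readers still active
coord.release("node4/thread1")
coord.release("node3/thread2")
print(coord.exclusive)                           # node1/thread3
```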

As such, an S3 client can access, read, and write to an object using HTTP GET and PUT requests, while other file protocol and/or S3 clients also access the same resource. OneFS supports two methods of specifying buckets and objects in a URL:

Path-style requests, using the first slash-delimited component of the request-URI path. For example:

https://tme1.isilon.com:9021/bkt01/lab/object1.pdf

Virtual hosted-style requests, specifying a bucket via the HTTP Host header. I.e.:

https://bkt01.tme.isilon.com:9021/lab/object1.pdf
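The difference between the two styles comes down to where the bucket name is carried. As a minimal sketch, the following function extracts the bucket and object key from either form. The function name and the endpoint_host parameter are illustrative, and the endpoint host names are taken from the two example URLs above.

```python
from urllib.parse import urlsplit

def parse_s3_url(url: str, endpoint_host: str) -> tuple[str, str]:
    """Return (bucket, key) for a path-style or virtual hosted-style URL.

    endpoint_host is the cluster's S3 host name without any bucket prefix.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")
    if host == endpoint_host:
        # Path-style: the bucket is the first path component.
        bucket, _, key = path.partition("/")
    elif host.endswith("." + endpoint_host):
        # Virtual hosted-style: the bucket is the leading host label(s).
        bucket = host[: -len(endpoint_host) - 1]
        key = path
    else:
        raise ValueError(f"{host!r} does not match endpoint {endpoint_host!r}")
    return bucket, key

print(parse_s3_url("https://tme1.isilon.com:9021/bkt01/lab/object1.pdf", "tme1.isilon.com"))
print(parse_s3_url("https://bkt01.tme.isilon.com:9021/lab/object1.pdf", "tme.isilon.com"))
```

Both calls resolve to the same bucket and key, which is the point: the two request styles address the same object.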

Additionally, the principal API operations that OneFS supports include:

Essentially, this includes the basic bucket and object create, read, update, delete, or CRUD, operations, plus multipart upload.

As for client access, from the cluster side the general OneFS S3 operation flow can be characterized as follows:

First, an S3 client or application establishes a connection to the cluster, with SmartConnect resolving the IP address with bucket name.
OneFS creates a socket/listener with the appropriate TLS handling, as required.
Next, OneFS (libLwHttp) receives and unmarshals the HTTP request/stream to determine the S3 request payload.
Authentication and authorization are performed for bucket and object access.
Next, the S3 request is queued for LwSched, which dispatches the work with the appropriate threading mode.
The S3 protocol driver handles the operational logic and calls the IO manager (Lwio).
Lwio manages any audit filter driver activity before and after the operation, while FSD (file system driver) handles the file system layer access.
Finally, the S3 protocol driver creates an HTTP response with its operation result, which is returned to the S3 client via libLwHttp.
Then back to step 3 for the next HTTP request, etc.

If a client HTTP request is invalid, or goes awry, OneFS follows the general AWS S3 error codes format – albeit with modifications to remove any AWS-specific info. The OneFS S3 implementation also includes some additional error codes for its intrinsic behaviors. These include:

So how do things work when clients try to simultaneously access the same file/object on a cluster via both file and object protocols? Here’s the basic flow describing OneFS cross-protocol locking:

OneFS FIPS-compliant SDPM Journal

The new OneFS 9.10 release delivers an important data-at-rest encryption (DARE) security enhancement for the PowerScale F-series platforms. Specifically, the OneFS software defined persistent memory (SDPM) journal now supports self-encrypting drives, or SEDs, satisfying the criteria for FIPS 140-3 compliance.

SEDs are secure storage devices which transparently encrypt all on-disk data using an internal key and a drive access password. OneFS uses nodes populated with SED drives to provide data-at-rest encryption, thereby preventing unauthorized data access.

All data that is written to a DARE PowerScale cluster is automatically encrypted the moment it is written and decrypted when it is read. Securing on-disk data with cryptography ensures that the data is protected from theft, or other malicious activity, in the event drives or nodes are removed from a cluster.

The OneFS journal is among the most critical components of a PowerScale node. When OneFS writes to a drive, the data goes straight to the journal, allowing for a fast reply. OneFS uses journaling to ensure consistency across disks locally within a node and also disks across nodes.

Here’s how the journal fits into the general OneFS caching hierarchy:

Block writes go to the journal prior to being written to disk, and a transaction must be marked as ‘committed’ in the journal before returning success to the file system operation. Once the transaction is committed, the change is guaranteed to be stable. If the node happened to crash or lose power, the changes would still be applied from the journal at mount time via a ‘replay’ process. As such, the journal is battery-backed in order to be available after a catastrophic node event such as a data center power outage.
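The commit-then-replay idea can be shown with a toy write-ahead journal. This is a conceptual sketch, not the OneFS journal format: the class and method names are hypothetical, and a Python list stands in for the battery-backed persistent storage.

```python
class Journal:
    """Toy write-ahead journal: blocks are committed here before reaching 'disk'."""

    def __init__(self):
        self.entries = []   # stands in for battery-backed persistent storage

    def commit(self, block_no: int, data: bytes) -> None:
        # Success is returned to the caller only after this append completes.
        self.entries.append((block_no, data))

    def flush_to(self, disk: dict, count: int) -> None:
        # Apply the oldest `count` committed entries to disk, then retire them.
        for block_no, data in self.entries[:count]:
            disk[block_no] = data
        del self.entries[:count]

    def replay(self, disk: dict) -> None:
        # At mount time, re-apply everything still in the journal, in order.
        self.flush_to(disk, len(self.entries))

disk = {}
journal = Journal()
journal.commit(7, b"inode update")
journal.commit(8, b"data block")
journal.flush_to(disk, 1)        # only the first write reaches disk...
# ...then the node loses power. The journal contents survive.
journal.replay(disk)
print(disk)
```

After the replay, both blocks are on disk even though only one had been flushed before the simulated outage.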

Operating primarily at the physical level, the journal stores changes to physical blocks on the local node. This is necessary because all initiators in OneFS have a physical view of the file system, and therefore issue physical read and write requests to remote nodes. The OneFS journal supports both 512-byte and 8KiB block sizes, for storing written inodes and blocks respectively. By design, the contents of a node’s journal are only needed in a catastrophe, such as when memory state is lost.

Under the hood, the current PowerScale F-series nodes use an M.2 SSD in conjunction with OneFS’ SDPM solution to provide persistent storage for the file system journal.

This is in contrast to previous generation platforms, which used NVDIMMs.

The SDPM itself comprises two main elements:

While the BBU is self-contained, the M.2 NVMe vault is housed within a VOSS module, and both components are easily replaced if necessary.

This new OneFS 9.10 functionality enables an encrypted, FIPS compliant, M.2 SSD to be used as the back-end storage for the journal’s persistent memory region. This M.2 drive is also referred to as the ‘vault’ drive, and it sits atop the ‘vault optimized storage subsystem’ or VOSS module, along with the journal battery, etc.

This new functionality enables the transparent use of the M.2 FIPS drive, securing it in tandem with the other FIPS data PCIe drives in the node. This feature also paves the way for requiring specifically FIPS 140-3 drives across the board.

So looking a bit deeper, this new SDPM enhancement for SED nodes uses a FIPS 140-3 certified M.2 SSD within the VOSS module, providing the persistent memory for the PowerScale all-flash F-series platforms.

It also builds upon BIOS features and functions, with the BIOS coordinating with iDRAC, and OneFS coordinating with iDRAC itself through the host interface, or HOSA.

And secondarily, this FIPS SDPM feature is instrumental in delivering the SED-3 security level for the F-series nodes. The redefined OneFS SED FIPS framework was discussed at length in the previous article in this series, and the SED-3 level requires FIPS 140-3 compliant drives across the board (i.e. for both data storage and journal).

Under the hood, the VOSS drive is secured by iDRAC. iDRAC itself has both a local key manager, or iLKM, function and the secure enterprise key manager, or SEKM. OneFS communicates with these key managers via the HOSA and the Redfish passthrough interfaces, and configures the VOSS drive during node configuration time. Beyond that, OneFS also tears down the VOSS drive the same way it would tear down the storage drives during a node reformat operation.

As we saw in the previous blog article, there are now three levels of self-encrypting drives or SEDs in OneFS 9.10, in addition to the standard ISE (instant secure erase) drives:

SED level 1, previously known as SED non-FIPS.
SED level 2, which was formerly FIPS 140-2.
SED level 3, which denotes FIPS 140-3 compliance.

Beyond that, the existing behavior around node security with regard to drive capability remains, as does the existing OneFS logic that prevents lesser security nodes from joining higher security clusters. This basic restriction is not materially changed in 9.10; a new, higher tier of security, the SED-3 tier, is simply added. So a cluster comprising SED-3 nodes running OneFS 9.10 would disallow any lesser security nodes from joining.

Specifically, the SED-3 designation requires FIPS 140-3 data drives as well as a FIPS 140-3 VOSS drive within the SDPM VOSS module. The presence of incorrect drives would result in ‘wrong type’ errors, the same as with pre-9.10 behavior. So if a node is built with the incorrect VOSS drive, or OneFS is unable to secure it, that node will fail a journal healthcheck during node boot and be automatically blocked from joining the cluster.
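The tiering and join rules can be summarized in a few lines. The sketch below is a simplified model, not OneFS logic: the names are hypothetical, placing ISE below SED-1 in the ordering is an assumption, and treating a node as only as secure as its weakest drive is a modeling shortcut for the ‘wrong type’ behavior described above.

```python
from enum import IntEnum

class DriveSecurity(IntEnum):
    """Security tiers, ordered low to high. Placing ISE below SED-1 is an assumption."""
    ISE = 0
    SED_1 = 1   # formerly SED non-FIPS
    SED_2 = 2   # formerly FIPS 140-2
    SED_3 = 3   # FIPS 140-3

def node_level(data_drives: DriveSecurity, voss_drive: DriveSecurity) -> DriveSecurity:
    """Model a node as only as secure as its weakest drive, VOSS (journal) drive included."""
    return min(data_drives, voss_drive)

def may_join(node: DriveSecurity, cluster: DriveSecurity) -> bool:
    """Lesser-security nodes may not join a higher-security cluster."""
    return node >= cluster

mismatched = node_level(DriveSecurity.SED_3, DriveSecurity.SED_2)
print(mismatched.name, "node may join SED-3 cluster:", may_join(mismatched, DriveSecurity.SED_3))
```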

SED-3 compliance requires the drives to be not only secure, but also actively monitored. OneFS uses its hardware monitor utility (isi_hwmon) to monitor a node’s drives for correct security state, as well as checking for any unexpected state transitions. If the hardware monitor detects any of these, it will trigger a CELOG alert and bring the node down into read-only state. So if a SED-3 node is in read-write state, this indicates it’s fully functional and all is good.

The ‘isi status’ CLI command has a ‘SED compliance level’ parameter which reports the node’s level, such as SED-3. Alternatively, the ‘isi_psi_tool’ CLI utility can provide more detail on the required compliance level of the data and VOSS drives themselves, as well as node type, etc.

The OneFS hardware monitor CLI utility (isi_hwmon) can be used to check the encryption state of the VOSS drive, and the encryption state values are:

Unlocked: Safe state, properly secured. Unlocked indicates that iDRAC has authenticated/unlocked VOSS for SDPM read/write usage.
Locked: Degraded state. Secured but not accessible. VOSS drive is not available for SDPM usage.
Unencrypted: Degraded state. Not secured.
Foreign: Degraded state. iLKM is unable to authenticate. Missing Key/PIN or secured by foreign entity.

As such, only ‘unlocked’ represents a healthy state. The other three states (locked, unencrypted, and foreign) indicate an issue, and will result in a read-only node.
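For monitoring scripts, the four states reduce to a simple mapping. The following sketch encodes the states listed above and the resulting node mode. The enum and function names are hypothetical, and how the state string is actually obtained from isi_hwmon is outside its scope.

```python
from enum import Enum

class VossState(Enum):
    UNLOCKED = "Unlocked"        # secured and available for SDPM read/write
    LOCKED = "Locked"            # secured but not accessible
    UNENCRYPTED = "Unencrypted"  # not secured
    FOREIGN = "Foreign"          # key/PIN missing or secured by a foreign entity

def node_mode(state: VossState) -> str:
    """Only 'Unlocked' is healthy; every other state results in a read-only node."""
    return "read-write" if state is VossState.UNLOCKED else "read-only"

for state in VossState:
    print(f"{state.value:<12} -> {node_mode(state)}")
```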

OneFS Data-at-rest Encryption and FIPS Compliance

On the security front, the new OneFS 9.10 release’s payload includes a refinement of the compliance levels for self-encrypting drives within a PowerScale cluster. But before we get into it, first a quick refresher on OneFS Data-at-Rest Encryption, or DARE, and FIPS compliance.

Within the IT industry, compliance with the Federal Information Processing Standards (FIPS) denotes that a product has been certified to meet all the necessary security requirements, as defined by the National Institute of Standards and Technology, or NIST.

A FIPS certification is not only mandated by federal agencies and departments, but is recognized globally as a hallmark of security certification. For organizations that store sensitive data, a FIPS certification may be required based on government regulations or industry standards. As companies opt for drives with a FIPS certification, they are assured that the drives meet stringent regulatory requirements. FIPS certification is provided through the Cryptographic Module Validation Program (CMVP), which ensures that products conform to the FIPS 140 security requirements.

Data-At-Rest Encryption (DARE) is a requirement for federal and industry regulations ensuring that data is encrypted when it is stored. Dell PowerScale OneFS provides DARE through self-encrypting drives (SEDs) and a key management system. The data on a SED is encrypted, preventing a drive’s data from being accessed if the SED is stolen or removed from the cluster.

Data at rest is inactive data that is physically stored on persistent storage. Encrypting data at rest with cryptography ensures that the data is protected from theft if drives or nodes are removed from a PowerScale cluster. Compared to data in motion, which must be reassembled as it traverses network hops, data at rest is of interest to malicious parties because the data is a complete structure. The files have names and require less effort to understand when compared to smaller packetized components of a file.

However, because of the way OneFS lays out data across nodes, extracting data from a drive that’s been removed from a PowerScale cluster is not a straightforward process – even without encryption. Each data stripe is composed of data bits. Reassembling a data stripe requires all the data bits and the parity bit.

PowerScale implements DARE by using self-encrypting drives (SEDs) and AES 256-bit encryption keys. The algorithm and key strength meet the National Institute of Standards and Technology (NIST) standard and FIPS compliance. The OneFS management and system requirements of a DARE cluster are no different from standard clusters.

Note that the recommendation is for a PowerScale DARE cluster to solely comprise self-encrypting drive (SED) nodes. However, a cluster mixing SED nodes and non-SED nodes is supported during its transition to an all-SED cluster.

Once a cluster contains a SED node, only SED nodes can then be added to the cluster. While a cluster contains both SED and non-SED nodes, there is no guarantee that any particular piece of data on the cluster will, or will not, be encrypted. If a non-SED node must be removed from a cluster that contains a mix of SED and non-SED nodes, it should be replaced with an SED node to continue the evolution of the cluster from non-SED to SED. Adding non-SED nodes to an all-SED node cluster is not supported. Mixing SED and non-SED drives in the same node is not supported.

A SED drive provides full-disk encryption through onboard drive hardware, removing the need for any additional external hardware to encrypt the data on the drive. As data is written to the drive, it is automatically encrypted, and data read from the drive is decrypted. A chipset in the drive controls the encryption and decryption processes. An onboard chipset allows for a transparent encryption process. System performance is not affected, providing enhanced security and eliminating dependencies on system software.

Controlling access by the drive’s onboard chipset provides security if there is theft or a software vulnerability because the data remains accessible only through the drive’s chipset. At initial setup, an SED creates a unique and random key for encrypting data during writes and decrypting data during reads. This data encryption key (DEK) ensures that the data on the drive is always encrypted. Each time data is written to the drive or read from the drive, the DEK is required to encrypt and decrypt the data. If the DEK is not available, all data on the SED is unreadable.

The standard SED encryption is augmented by wrapping the DEK for each SED in an authentication key (AK). As such, the AKs for each drive are placed in a key manager (KM) which is stored securely in an encrypted database, the key manager database (KMDB), further preventing unauthorized access. The KMDB is encrypted with a 256-bit universal key (UK) as follows:

OneFS also supports an external key manager by using a key management interoperability protocol (KMIP)-compliant key manager server. In this case, the universal key (UK) is stored in a KMIP-compliant server.

Note, however, that PowerScale OneFS releases prior to OneFS 9.2 retain the UK internally on the node.

Further protecting the KMDB, OneFS 9.5 and later releases also provide the ability to rekey the UK – either on-demand or per a configured schedule. This applies to both UKs that are stored on-cluster or on an external KMIP server.

The authentication key (AK) is unique to each SED, and this ensures that OneFS never knows the DEK. If there is a drive theft from a PowerScale node, the data on the SED is useless because the trifecta of the UK, AK, and the DEK, are all required to unlock the drive. If an SED is removed from a node, OneFS automatically deletes the AK. Conversely, when a new SED is added to a node, OneFS automatically assigns a new AK.
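The relationship between the three keys can be sketched with a toy model. To be clear about the assumptions: the XOR-with-hash wrapping below is NOT real cryptography and merely stands in for the AES-based wrapping an actual drive and key manager would use, the per-entry encryption of the KMDB is a simplification, and all names are hypothetical.

```python
import hashlib
import secrets

def toy_wrap(key: bytes, payload: bytes) -> bytes:
    """XOR payload with a SHA-256-derived pad. NOT real cryptography; a stand-in
    for AES-based wrapping. Applying it twice with the same key undoes it."""
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(payload, pad))

dek = secrets.token_bytes(32)   # models the key generated inside the SED
ak = secrets.token_bytes(32)    # per-drive authentication key
uk = secrets.token_bytes(32)    # universal key protecting the KMDB

wrapped_dek = toy_wrap(ak, dek)          # held on the drive
kmdb = {"bay0": toy_wrap(uk, ak)}        # AK stored encrypted under the UK

# Unlock path: UK -> AK -> DEK. All three pieces are needed.
ak_out = toy_wrap(uk, kmdb["bay0"])
print("drive unlocks:", toy_wrap(ak_out, wrapped_dek) == dek)

# Drive pulled from the node: the AK is deleted, so the DEK stays wrapped.
del kmdb["bay0"]
print("AK still available:", "bay0" in kmdb)
```

Once the AK entry is gone, nothing left in the key manager can unwrap the DEK held on the removed drive, which is why a stolen SED is of no use on its own.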

With the PowerScale H and A-series chassis-based platforms, the KMDB is stored in the node’s NVRAM, and a copy is also placed in the partner node’s NVRAM. For PowerScale F-series nodes, the KMDB is stored in the trusted platform module (TPM). Using the KM and AKs ensures that the DEKs never leave the SED boundary, as required for FIPS compliance. In contrast, legacy Gen 5 Isilon nodes store the KMDB on both compact flash drives in each node.

The key manager uses FIPS-validated cryptography when the STIG hardening profile is applied to the cluster.

The KM and KMDB are entirely secure and cannot be compromised because they are not accessible by any CLI command or script. The KMDB only stores the local drives’ AKs in Gen 5 nodes, and buddy node drives in Gen 6 nodes. On PowerEdge based nodes, the KMDB only stores the AKs of local drives. The KM also uses its own encryption so that the AKs are never stored in plain text.

OneFS external key management operates by storing the 256-bit universal key (UK) in a key management interoperability protocol (KMIP)-compliant key manager server.

In order to store the UK on a KMIP server, a PowerScale cluster requires the following:

OneFS 9.2 (or later) cluster with SEDs
KMIP-compliant server:
KMIP 1.2 or later
KMIP storage array 1.0 or later with SEDS profile
KMIP server host/port information
X.509 PKI for TLS mutual authentication
Certificate authority bundle
Client certificate and private key
Administrator privilege: ISI_PRIV_KEY_MANAGER
Network connectivity from each node in the cluster to the KMIP server using an interface in a statically assigned network pool; for SED drives to be unlocked, each node in the cluster contacts the KMIP server at bootup to obtain the UK from the KMIP server, or the node bootup fails
Not All Nodes On Network (NANON) and Not all Nodes On All Networks (NANOAN) clusters are not supported

As mentioned earlier, the drive encryption levels are clarified in OneFS 9.10. There are three levels of self-encrypting drives, each now designated with a ‘SED-’ prefix, in addition to the standard ISE (instant secure erase) drives.

These OneFS 9.10 designations include SED level 1, previously known as SED non-FIPS; SED level 2, formerly FIPS 140-2; and SED level 3, which denotes FIPS 140-3 compliance.

A node’s SED level can be verified via the ‘isi status’ CLI command output. For example, the following F710 node output indicates full SED level 3 (FIPS 140-3) compliance:

# isi status --node 1

Node LNN:         1

Node ID:          1

Node Name:       tme-f710-1-1

Node IP Address: 10.1.10.21

Node Health:            OK

Node Ext Conn:    C

Node SN:          DT10004

SED Compliance Level: SED-3

Similarly, the SED compliance level can be queried for individual drives with the following CLI syntax:

# isi device drive view [drive_bay_number] | grep -i compliance

Additionally, the ‘isi_psi_tool’ CLI utility can provide more detail on the required compliance level of the data and journal drives, as well as node type, etc. For example, the SED-3 SSDs in this F710 node:

# /usr/bin/isi_hwtools/isi_psi_tool -v

{

"DRIVES": [

"DRIVES_10x3840GB(pcie_ssd_sed3)"

],

"JOURNAL": "JOURNAL_SDPM",

"MEMORY": "MEMORY_DIMM_16x32GB",

"NETWORK": [

"NETWORK_100GBE_PCI_SLOT1",

"NETWORK_100GBE_PCI_SLOT3",

"NETWORK_1GBE_PCI_LOM"

],

"PLATFORM": "PLATFORM_PE",

"PLATFORM_MODEL": "MODEL_F710",

"PLATFORM_TYPE": "PLATFORM_PER660"

}

So, for example, a OneFS 9.10 cluster comprising SED-3 nodes would prevent any lower-security nodes (i.e. SED-2 or below) from joining.

In addition to FIPS 140-3 data drives, the OneFS SED-3 designation also requires FIPS 140-3 compliant flash media for the OneFS filesystem journal. The presence of any incorrect drives (data or journal) in a node will result in ‘wrong type’ errors, the same as with pre-OneFS 9.10 behavior. Additionally, FIPS 140-3 (SED-3) not only requires a node’s drives to be secure, but also actively monitored. ‘Hardware mon’ is used within OneFS to monitor drive state, checking for correct security state as well as any unexpected state transitions. If the hardware monitor detects an incorrect security state or an unexpected transition, it will trigger a CELOG alert and bring the node into read-only state. This will be covered in more detail in the next blog post in this series.

OneFS MetadataIQ Monitoring and Management

The previous article in this series focused on provisioning and configuring MetadataIQ. Now, in this final article in this series, we turn our attention to the tools and utilities that are helpful in monitoring and troubleshooting MetadataIQ.

The actual metadata that MetadataIQ provides to the ElasticSearch database can be queried from the cluster CLI with the following command syntax:

# isi_metadataiq_transfer --show-mappings

MetadataIQ uses the Job Engine’s ChangelistCreate job and checkpoint (CCP) files to both track work progress and recover from unexpected termination. The ChangelistCreate job is run with its default impact and priority settings, and job progress can be monitored via the ‘isi job jobs view’ CLI command. Additionally, the job can also be modified, paused, and resumed. Once complete or terminated, a report of the job’s execution and progress can be accessed with the ‘isi job reports’ CLI syntax.

If needed, a ChangelistCreate job instance that is found to be in a paused or indeterminate state can be cancelled as follows. In this example, the culprit is job ID 51:

# isi job list

ID   Type             State       Impact  Policy  Pri  Phase  Running Time

---------------------------------------------------------------------------

51   ChangelistCreate User Paused Low     LOW     10   4/4    2s

---------------------------------------------------------------------------

Total: 1

# isi job cancel 51

Another useful MetadataIQ management lever is the ability to constrain MetadataIQ to a subset of cluster resources, along with the associated impact control ramifications. The ‘excluded_lnns’ option allows cluster admins to explicitly define which nodes MetadataIQ can run on. For example, the following CLI command will configure a MetadataIQ exclusion on the nodes with LNNs 3 and 5:

# isi metadataiq settings modify --excluded-lnns 3,5

# isi metadataiq settings view | grep -i lnn

         Excluded Lnns: 3, 5

On excluded nodes, the MetadataIQ consumer daemon will typically log entries of the form:

2024-09-29T18:39:15.109639+00:00 <3.3> TME-1(id1) isi_metadataiq_consumer_d[72402]: isi_metadataiq_consumer_d daemon is not allowed to run on current node, exiting..

Note that, in a PowerScale cluster, a node’s LNN (logical node number) is not always the same as its node ID, as reported by utilities like ‘isi status’. The ‘isi_nodes %{id} , %{lnn}’ CLI command can be used to correlate the two node numbering schemes.

For a cluster where not all the nodes are connected to the front-end network (NANON), MetadataIQ automatically bypasses the unconnected node(s), so an LNN exclusion does not need to be manually configured in this case.

The ‘isi metadataiq’ CLI command also provides a ‘reset’ option, which can be used to remove any existing configuration settings, including the platform API parameters:

# isi metadataiq reset

The following configuration and log files can be helpful when investigating MetadataIQ issues:

While MetadataIQ is running, the following actions can be performed periodically to confirm proper operation or troubleshoot an issue:

Regularly monitor the ElasticSearch database with queries, ensuring the number of entries is in line with expectations.
Check the health of the ChangelistCreate job(s).

# isi job jobs list | grep -i ChangelistCreate

ID Type             State     Impact Pri    Phase  Running Time

-----------------------------------------------------------------

3 ChangelistCreate Running   Low    5      2/4    9s

-----------------------------------------------------------------

Monitor SnapshotIQ to ensure that there’s no buildup of MetadataIQ snapshots.

# isi snapshot snapshots list | grep -i metadataiq

3278 MetadataIQ_1730914225                        /ifs/data

3290 MetadataIQ_1730915287                        /ifs/data

...

Note that, in a healthy environment, if there are more than two MetadataIQ snapshots, the inactive snapshot(s) should automatically be removed during the next producer cycle.

Optionally, dump the checkpoint file:

# cat /ifs/.ifsvar/isi_metadata_index/checkpoint.json

The ChangelistCreate job report can be a particularly useful place to investigate. For example:

# isi job jobs list | grep 110

110  ChangelistCreate   Waiting Low     LOW     5    2/4    3h 35m

# isi job reports view 110

ChangelistCreate[110] phase 1 (2024-09-30T00:29:16)

---------------------------------------------------

Elapsed time  7384 seconds (2H3m4s)

Working time  187 seconds (3m7s)

Errors        0

Older snapid  152

Newer snapid  336

Mode          1

Entries found 2258681

Entries added 489031


ChangelistCreate[110] phase 2 (2024-10-01T19:24:04)

---------------------------------------------------

Elapsed time  154487 seconds (1D18H54m47s)

Working time  12729 seconds (3H32m9s)

Errors        0

Older snapid  152

Newer snapid  336

Mode          1

Entries found 0

Entries added 8716044


ChangelistCreate[110] Job Summary

---------------------------------

Final Job State  System Cancelled

Phase Executed   2
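The elapsed and working times in the report are given both in seconds and in a compact day/hour/minute/second notation. As a quick sanity check when reading these reports, the conversion can be sketched in Python. The ‘format_duration’ helper below is hypothetical and simply mirrors the notation seen in the report above; it is not part of OneFS.

```python
def format_duration(total_seconds: int) -> str:
    """Render seconds in the compact form seen in the job report, e.g. 2H3m4s."""
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)

    # Leading units that are zero are omitted, matching the report's style.
    parts = []
    if days:
        parts.append(f"{days}D")
    if days or hours:
        parts.append(f"{hours}H")
    if days or hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{seconds}s")
    return "".join(parts)


# Values taken from the phase 1 and phase 2 sections of the report above.
for value in (7384, 187, 154487, 12729):
    print(value, format_duration(value))
```

Comparing the two figures for phase 2 shows that the job spent most of its elapsed time waiting or paused rather than working.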

Similarly, a ChangelistCreate job’s status (in this case job ID 110 above) is also reported in the MetadataIQ producer log, in the form of:

2024-10-01T18:48:45.933223+00:00 <3.7> TME-1 (id1) isi_metadataiq_producer_d[83055]: Job id: 110, status : 200, body  { "jobs" :  [  { "control_state" : "paused_priority", "create_time" : 1727648385, "current_phase" : 2, "description" : "", "human_desc" : "", "id" : 110, "impact" : "Low", "participants" :  [ 1, 2, 3, 4 ], "policy" : "LOW", "priority" : 5, "progress" : "Task results: found 0, added 8716044, 0 errors", "retries_remaining" : 0, "running_time" : 12916, "start_time" : 1727648770, "state" : "paused_priority", "total_phases" : 4, "type" : "ChangelistCreate" } ] }

2024-10-01T18:48:45.933317+00:00 <3.7> TME-1(id1) isi_metadataiq_producer_d[83055]: Job 110 state Waiting
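Since the ‘body’ portion of these producer log entries is JSON, it can be parsed when scripting health checks. The following is a minimal sketch, assuming the entry layout matches the example above (the word ‘body’ followed by a JSON object). The ‘parse_job_status’ function name is hypothetical, and the sample line is an abbreviated copy of the entry above.

```python
import json


def parse_job_status(log_line: str) -> dict:
    """Extract the first job record from a producer log entry.

    Assumes the entry contains the word 'body' followed by a JSON object,
    as in the example log line.
    """
    _, _, body = log_line.partition("body")
    payload = json.loads(body.strip())
    return payload["jobs"][0]


# Abbreviated copy of the producer log entry.
sample = (
    '2024-10-01T18:48:45.933223+00:00 <3.7> TME-1 (id1) '
    'isi_metadataiq_producer_d[83055]: Job id: 110, status : 200, body  '
    '{ "jobs" :  [  { "control_state" : "paused_priority", "current_phase" : 2, '
    '"id" : 110, "progress" : "Task results: found 0, added 8716044, 0 errors", '
    '"state" : "paused_priority", "total_phases" : 4, '
    '"type" : "ChangelistCreate" } ] }'
)

job = parse_job_status(sample)
print(job["id"], job["state"], f'{job["current_phase"]}/{job["total_phases"]}')
```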

The changelist that is created by the producer daemon should automatically be cleaned up after the subsequent metadata transfer cycle completes. The following CLI command can be used to report a cluster’s changelists:

# isi_changelist_mod -l

If a number of changelists are reported, the consumer daemon may be unable to transfer fully or otherwise keep up with changelist generation. Alternatively, some other cluster process may also be generating changelists.

If, for some reason, a MetadataIQ cycle fails in its entirety, an error of the following form will be reported:

Too many errors (...) encountered in this cycle. Failing this cycle and waiting for the next scheduled run

As described, no administrative intervention is required and MetadataIQ will automatically resume during its next scheduled run.

Similarly, Job Engine issues will typically result in the ChangelistCreate job being retried four times by default. After four failures, the following error is reported and job execution withheld until the next scheduled MetadataIQ cycle run.

ChangelistCreate job failed 4 times; giving up until next cycle

Note that the preferred job retry threshold can be specified with the ‘isi metadataiq settings modify --changelist-job-retries <integer>’ CLI syntax.

For ElasticSearch server issues, such as the [500 document(s) failed to index] error, the following OneFS CLI utility can be used to verify the configuration and connectivity:

# isi_metadataiq_transfer --check

If for some reason a cluster’s /ifs file system has been placed into read-only mode, MetadataIQ will be unable to function and the following error message will be reported:

Error: Failed to open leader lock file (...): Read-only file system

If necessary, the MetadataIQ configuration can easily be completely removed. This can be required in the unlikely event that the following error message is reported:

In unrecoverable state. Please restart the MetadataIQ service or run a reset/resync

To start from scratch, first run a reset. For example:

# isi metadataiq reset

Once reset, the desired MetadataIQ configuration settings can be (re)applied via the ‘modify’ option:

# isi metadataiq settings modify --verify-certificate <boolean> --ca-certificate-path <string> --api-key <string> --hostname <string> --host-port <integer> --path <string> --schedule <string>

Using OneFS MetadataIQ

The previous article in this series focused on provisioning and configuring MetadataIQ. Now, we turn our attention to actually using it.

After the initial cluster setup and configuration, each additional MetadataIQ job execution will populate the remote ElasticSearch database with updated metadata from the configured dataset.

Periodic synchronizations keep the database updated with new metadata changes, and the recommendation is to configure a dataset-appropriate schedule for the OneFS MetadataIQ job. For example, the following CLI syntax entry will configure a schedule to run the metadata checkpointing job every five minutes:

# isi metadataiq settings modify --schedule "every day every 5 minutes"

With the producer services enabled, the first MetadataIQ cycle will start as soon as a valid schedule has been configured.

Once installed, the basic steps for using the ElasticSearch database and Kibana UI are as follows:

First, from a browser, navigate to the URL for the Kibana instance. For example:

 http://<ip_to_host>:5601/app/home#/

Next, log in with the appropriate credentials. For example: Username ‘elastic’ and the corresponding password.

Optionally, create a ‘dataview’ first by clicking on ‘create’ under the ‘discover’ tab and pasting ‘isi_metadata_index’ or ‘isi*’ as the index pattern.
Use the ‘discover’ option to enter and execute search queries. The ‘discover’ link is typically located in the drop down menu on the top left of the GUI. From this drop down menu, click on ‘discover’ to open a search screen.
Finally, queries can be entered to analyze the data as appropriate.

The following query syntax illustrates how basic ElasticSearch searches of the OneFS metadata can be expressed. For example:

To find regular files, residing in pool 3 on a particular cluster (<cluster_name>):

file_type equals regular and doc.metadata_pool equals 3 and doc.cluster equals <cluster_name>

Or to find files on a particular cluster with a modification time (-mtime) at or after Monday, October 21, 2024 9:00:00 PM, expressed as an epoch value:

doc.cluster equals <cluster_name> and doc.mtime.sec >= 1729544400
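The epoch value in the query above corresponds to 9:00:00 PM on October 21, 2024 when interpreted as UTC. When building similar time-based queries, the conversion in either direction can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the timestamps are expressed in UTC.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day, hour=0, minute=0, second=0) -> int:
    """Return the Unix epoch (seconds) for a UTC date and time."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def from_epoch(epoch: int) -> str:
    """Return an ISO 8601 UTC string for a Unix epoch value."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


print(to_epoch(2024, 10, 21, 21))   # October 21, 2024 9:00:00 PM UTC
print(from_epoch(1729544400))
```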

Additionally, the Kibana ‘dashboard’ can be used to create data visualizations, if desired.

From the Kibana UI, the ‘Discover’ page notifies of a new data source once the ‘isi metadataiq’ utility has been successfully configured and executed on the cluster. For example:

Clicking on the ‘Create data view’ button brings up the ‘Create data view’ page, where the new metadata index (for example, ‘isi_metadata_index’) is recognized and listed as a matching data source. For example:

Creating a data view is as simple as entering ‘isi_metadata_index’ in the ‘Index pattern’ search field and clicking the ‘Save data view to Kibana’ button.

The dropdown menu also provides options to manage or add fields to this data view:

Under the ‘Analytics’ tab, the ‘Discover’ mode enables data queries to be easily crafted and executed:

One or more filters can be easily created to display a desired subset of metadata entries:

A list of the available fields is displayed on the left of the Kibana page. For example:

(Incomplete list)

Filters can be configured by clicking on the blue ‘plus’ icon to bring up the ‘Add filter’ pane:

From the ‘Add filter’ pane, first add the desired field by searching and selecting from the dropdown list:

In this case the ‘doc.path’ filter is selected:

Next, select an operator from the list:

In this case, the ‘is’ operator is selected. Next, add the dataset’s path under /ifs:

Finally, click the ‘Add filter’ button and the new filter(s) will search for the matching newly generated database entries under the path, in this case ./home/demo2.

This returns no new entries yet since the MetadataIQ services and ChangeList job are still running:

Refreshing the Kibana window with the ‘doc.path’ filter shows the new metadata entries, in this case 391 entries, under /ifs/home/demo2:

Once the next scheduled MetadataIQ cycle has successfully completed, refreshing the Kibana UI reports any changes in the number of ‘doc.path’ entries, or ‘hits’, in this case from 391 to 480.

The standard MetadataIQ configuration captures all of the ChangeList output into each document, so this can be either queried directly or represented graphically.

Within the Kibana UI, under the ‘Analytics’ tab, moving from ‘Discover’ mode to ‘Dashboard’ allows rich custom visualizations to be created:

Kibana provides multiple data presentation options. Bar charts can be useful for representing the ‘file and physical size distributions’ data:

Pie charts can be helpful with illustrating metadata fields from multiple clusters. Up-leveled data can be collated and represented in the Kibana dashboard as interactive charts, the details of which can easily be drilled down into by clicking on the desired region. For example, the ‘file type and cluster source’ distribution metrics below:

The following chart displays the ‘entry path changed’ field from the ‘top 10 values of change_types’. Additional context for a chart region or field can be viewed with Kibana’s mouse ‘hover-over’ functionality:

In the final article in this series, we’ll examine the tools and options available for monitoring and troubleshooting MetadataIQ.

OneFS MetadataIQ ElasticSearch and Kibana Configuration

The previous article in this series focused on provisioning and configuring a PowerScale cluster to support MetadataIQ. Now, we turn our focus to the server side deployment and setup.

MetadataIQ is introduced in OneFS 9.10 and requires a cluster to be running a fresh install or committed upgrade of 9.10 or later in order to run.

Either a physical Linux client or a virtual machine instance with sufficient disk space (20+GB is recommended) and virtual memory (minimum 256MB) is required to run the ElasticSearch database, which houses the off-cluster metadata, and Kibana visualization dashboard.

OneFS MetadataIQ has been verified in-house with the following components and versions:

Install Docker on the client system using the Linux distribution’s native package manager.

For example, to install the latest version of Docker using the Yum package manager:

# sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

While the above command installs Docker and creates a ‘docker’ group, it does not add any users to the group by default.

Once successfully installed, the Docker container service can be started as follows:

# sudo systemctl start docker

If needed, detailed installation and management instructions are available in the Docker installation guide.

OneFS MetadataIQ requires an off-cluster ElasticSearch database for the metadata store.

ElasticSearch can be installed on a Linux client via the following procedure. Note that this procedure assumes that Docker has already been successfully installed on the Linux client.

First, install ElasticSearch as follows:

# docker network create elastic

# docker run --name es01 --net elastic -p 9200:9200 -e "http.publish_host=<x.x.x.x>" -it docker.elastic.co/elasticsearch/elasticsearch:8.12.2

Where <x.x.x.x> is the Linux client’s IP address.

Note that the ElasticSearch database typically uses TCP port 9200 by default.

Additional instructions and information can be found in the ‘getting started’ section of the ElasticSearch configuration guide.

The next step is to install and configure the Kibana dashboard.

The Kibana binaries can be installed as follows:

# docker pull docker.elastic.co/kibana/kibana:8.12.2

# docker run --name kibana --net elastic -p 5601:5601 docker.elastic.co/kibana/kibana:8.12.2

Note that Kibana typically runs by default on TCP port 5601.

Additional instructions and information can be found in the Kibana installation guide.

Once the ElasticSearch database is up and running, the next step is to generate a new security API token. This can be performed from the Kibana web interface, by navigating to Dev Tools > Console and using the following JSON ‘POST’ syntax:

POST /_security/api_key 
{ 
  "name": "mdidx-api-key", 
  "role_descriptors": {
    "role-a": { 
      "cluster": ["all"], 
      "indices": [ 
        { 
          "names": ["isi_metadata_index"], 
          "privileges": ["all"] 
        } 
      ] 
    } 
  }, 
  "metadata": { 
    "application": "my-application", 
    "environment": { 
       "level": 1, 
       "trusted": true, 
       "tags": ["dev", "staging"] 
    } 
  } 
}

The JSON request (above) is entered into the left hand pane of the Kibana dev tools console, and the output is displayed in the right hand pane. For example:

A successful request returns a JSON structure containing the API key, plus its name, ID, and encoding. Output is of the form:

{

  "id": "w8J6G44BSEF85VOlyec4",

  "name": "mdidx-api-key",

  "api_key": "Zji7NkTHTkmjaMIeu5lSXg",

  "encoded": "dzhKNkc0NEJTRUY4NVZPbHllYzQ6WmppN05rVEhUa21qYU1JZXU1bFNYZw=="

}

Instructions describing this API key generation process in detail can be found in the ElasticSearch getting started guide.
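In the example response above, the ‘encoded’ value is the Base64 encoding of the ‘id’ and ‘api_key’ fields joined by a colon, and it is this encoded form that is passed in an ‘Authorization: ApiKey’ HTTP header. The following Python sketch reproduces the encoded value from the example output. The function names are hypothetical helpers for illustration only.

```python
import base64


def encode_api_key(key_id: str, api_key: str) -> str:
    """Base64-encode 'id:api_key', the form carried in the 'encoded' field."""
    raw = f"{key_id}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def auth_header(encoded: str) -> dict:
    """Build the HTTP Authorization header for ApiKey authentication."""
    return {"Authorization": f"ApiKey {encoded}"}


# Values taken from the example response.
encoded = encode_api_key("w8J6G44BSEF85VOlyec4", "Zji7NkTHTkmjaMIeu5lSXg")
print(encoded)
print(auth_header(encoded))
```

Since the encoded string can be reversed trivially, it should be treated with the same care as the API key itself.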

The bulk API is used to update the ElasticSearch database. Note that the role does require the following privileges:

Create
Index
Write index

More information on using the ElasticSearch bulk API can be found in the ElasticSearch configuration guide.

An SSL certificate file is required in order for the cluster to securely connect to the ElasticSearch database. This SSL certificate file must be copied from the Linux client to the PowerScale cluster.

On the Linux client, the SSL certificate can be copied from the Docker container to the cluster as follows:

# docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt ./

# scp ./http_ca.crt <x.x.x.x>:/ifs

Where <x.x.x.x> is the cluster’s IP address.

Authentication between the ElasticSearch instance and the PowerScale cluster can be verified from OneFS as follows:

# curl --cacert /ifs/http_ca.crt -H "Authorization: ApiKey <encoded value from the API key response above>" https://<x.x.x.x>:9200/

Once the file transfer is complete, the certificate can be validated by confirming a matching checksum on the client and cluster using the ‘md5sum’ command for Linux and ‘md5’ command for PowerScale OneFS respectively. For example:

# md5 /ifs/http_ca.crt

MD5 (http_ca.crt) = 8d8f1ffe34812df1011d6c0c3652e9eb
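Where a scripted comparison is preferred over running ‘md5sum’ and ‘md5’ by hand, the same check can be sketched with Python’s standard hashlib module. The function names and the temporary demonstration file below are hypothetical; in practice the path would be the copied certificate and the digest would be the value reported on the other system.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(local_path: str, reported_digest: str) -> bool:
    """Compare a local file's MD5 against a digest reported elsewhere."""
    return md5_of_file(local_path) == reported_digest.strip().lower()


# Demonstration with a throwaway file standing in for the certificate.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example certificate bytes")
    demo_path = tmp.name

demo_digest = md5_of_file(demo_path)
matched = checksums_match(demo_path, demo_digest.upper())
print(demo_digest, matched)
os.remove(demo_path)
```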

More information on how to configure the database security can be found in the ElasticSearch configuration guide.

In the next article in this series, we’ll examine the process involved for using and managing an ElasticSearch instance and Kibana visualization portal.

OneFS MetadataIQ Cluster-side Configuration

In the prior article in this series, we took an in-depth look at MetadataIQ’s architecture and operation. Now, we turn our focus to its configuration and deployment.

MetadataIQ is introduced in OneFS 9.10 and requires a cluster to be running a fresh install or committed upgrade of 9.10 or later in order to run. The installation package for the OneFS 9.10 release is available at the Dell Support site.

Once the download has completed, the cluster can be upgraded and committed to OneFS 9.10 either from the CLI using the ‘isi upgrade cluster’ command, or via the WebUI by navigating to Cluster management > Upgrade.

Post upgrade to, or installation of, OneFS 9.10, the /ifs/.ifsvar/modules/metadataiq directory is created, which houses various MetadataIQ components and logs.

In addition to a PowerScale cluster running OneFS 9.10, MetadataIQ also requires that the following dependencies are met:

MetadataIQ requires the OneFS ISI_PRIV_SNAPSHOT privilege in order to run.

Confirm that SnapshotIQ is licensed across the cluster and that the snapshot service is enabled.

# isi license view snapshotiq | grep Status

Status: Evaluation

# isi services -a | grep -i snapshot

isi_snapshot_d       Snapshot Daemon                          Enabled

Verify that the ElasticSearch packages are installed on the cluster by running the following CLI command:

# python -m pip list | grep -i elasticsearch

elasticsearch        6.3.1

elasticsearch-midx   8.14.0

MetadataIQ configuration and management in OneFS 9.10 is currently limited to the command line (CLI) or the platform API (pAPI) endpoints.

On the PowerScale cluster, the ‘isi metadataiq’ CLI command execution syntax is as follows:

Usage:

isi metadataiq {<action> | <subcommand>}

[--timeout <integer>]

[{--help | -h}]


Actions:

reset       Reset MetadataIQ to defaults.

resync      Rebuild metadata index from initial state.


Subcommands:

settings    Manage MetadataIQ settings.

As such, the following options are available for the ‘isi metadataiq’ CLI command:

OneFS MetadataIQ requires initialization and configuration before it can be successfully deployed. To this end, the ‘isi metadataiq settings modify’ CLI command can be used to edit the desired configuration fields, and the syntax is as follows:

Usage:

isi metadataiq settings modify

[--max-threads <integer>]

[--excluded-lnns <integer> | --clear-excluded-lnns | --add-excluded-lnns

<integer> | --remove-excluded-lnns <integer>]

[--nshards <integer>]

[--fetch-size <integer>]

[--work-queue-size <integer>]

[--verify-certificate <boolean>]

[--hostname <string>]

[--host-port <integer>]

[--path <string>]

[--schedule <string>]

[--changelist-job-retries <integer>]

[--changelist-job-tolerable-pause-hours <integer>]

[--changelist-job-tolerable-state-request-failures <integer>]

[--ca-certificate-path <string>]

[--api-key <string>]

[{--verbose | -v}]

[{--help | -h}]

When configuring MetadataIQ, the following client system credentials and parameters are required:

API ID
API Key
ElasticSearch database hostname
ElasticSearch port (typically 9200)
CA certificate path

For example:

# isi metadataiq settings modify --verify-certificate <boolean> --ca-certificate-path <string> --api-key <string> --hostname <string> --host-port <integer> --path <string> --schedule <string>

The ‘path’ parameter must be a path to a directory under the cluster’s /ifs filesystem. If left unspecified, the metadata path defaults to /ifs.

Note also that the ‘host-port’ value for the ElasticSearch database is typically TCP port 9200.
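When the same settings need to be applied repeatedly, for example after a reset, it can be convenient to keep them in one place and render the command line from them. The following Python sketch does this using only the flags listed in the usage output above. The setting values are placeholders, the ‘build_modify_command’ helper is hypothetical, and the ‘true’/‘false’ rendering of booleans is an assumption about what the CLI accepts.

```python
import shlex


def build_modify_command(settings: dict) -> str:
    """Render a settings dict as an 'isi metadataiq settings modify' command line."""
    args = ["isi", "metadataiq", "settings", "modify"]
    for name, value in settings.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(value, bool):
            # Assumption: booleans are passed as the strings true/false.
            value = "true" if value else "false"
        args += [flag, str(value)]
    # shlex.join quotes any argument containing spaces.
    return shlex.join(args)


# Placeholder values for illustration only.
example_settings = {
    "verify_certificate": True,
    "ca_certificate_path": "/ifs/http_ca.crt",
    "api_key": "EXAMPLE_ENCODED_KEY",
    "hostname": "es.example.com",
    "host_port": 9200,
    "path": "/ifs/data",
    "schedule": "every day every 2 hours",
}

command = build_modify_command(example_settings)
print(command)
```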

These configuration parameters are stored within gconfig on each node in the /etc/gconfig/metadataiq_config.gc file.

The OneFS platform API (pAPI) also offers an equivalent set of MetadataIQ configuration endpoints to the CLI, accessible under /platform/21/metadataiq/settings/.

For example:

# curl --insecure --basic --user <uname:passwd> https://<cluster_ip>:8080/platform/21/metadataiq/settings/

{
  "settings" :
  {
    "consumer" :
    {
      "database_info" :
      {
        "api_key" : "A key is configured",
        "certificate_path" : "fa0e6e93f47c8d0074832c47bffd630ab1faad065f3d129b32e0aa7ae8de8595",
        "database_type" : "ELK database",
        "host_port" : 9200,
        "hostname" : "https://10.224.101.214:9200",
        "verify_certificate" : true
      },
      "excluded_lnns" : [],
      "fetch_size" : 2048,
      "max_threads" : 8,
      "number_shards" : 8,
      "work_queue_size" : 16
    },
    "producer" :
    {
      "changelist_job_retries" : 2,
      "changelist_job_tolerable_pause_hours" : 24,
      "changelist_job_tolerable_state_request_failures" : 720,
      "path" : "/ifs/data",
      "schedule" : "every day every 2 hours"
    }
  }
}

The following CLI commands can be used to control the operation of the MetadataIQ services:

Service	Daemon/Utility	Enable/disable/view Commands
Producer	isi_metadataiq_producer_d	isi services isi_metadataiq_producer_d enable; isi services isi_metadataiq_producer_d disable; isi services isi_metadataiq_producer_d
Consumer	isi_metadataiq_consumer_d	isi services isi_metadataiq_consumer_d enable; isi services isi_metadataiq_consumer_d disable; isi services isi_metadataiq_consumer_d
Transfer	isi_metadataiq_transfer	isi_metadataiq_transfer; isi_metadataiq_transfer --check; isi_metadataiq_transfer --consumer-checkpoint; isi_metadataiq_transfer --map-version; isi_metadataiq_transfer --show-mappings

For example, the two MetadataIQ services can be enabled from the CLI as follows:

# isi services -a isi_metadataiq_producer_d enable

The service 'isi_metadataiq_producer_d' has been enabled.

# isi services -a isi_metadataiq_consumer_d enable

The service 'isi_metadataiq_consumer_d' has been enabled.

In the next article in this series, we’ll examine the process involved in deploying and configuring an ElasticSearch instance and Kibana visualization portal.

OneFS MetadataIQ Architecture and Operation

In this second article in the series, we’ll take an in-depth look at MetadataIQ’s architecture and operation.

The OneFS MetadataIQ framework is based on the following core components:

A ‘MetadataIQ cycle’ is the complete sequence of steps run by the MetadataIQ service daemons, from determining the changes between two snapshots through to updating the ElasticSearch database.

On the cluster side there are three core MetadataIQ components that are added in OneFS 9.10: The Producer Service, Consumer Service, and Transfer agent.

The producer service daemon, isi_metadataiq_producer_d, is responsible for running a metadata scan of the specified OneFS file system path, according to a configured schedule.

When first started, or in response to a configuration change, the producer daemon first loads its configuration, which instructs it on parameters such as the file system path to use, what schedule to run on, etc. Once the producer has found a valid schedule configuration string, it will start its first execution process, or ‘producer cycle’, performing the following actions:

A new snapshot of the configured file system path is taken.
Next, a ChangelistCreate job is started between the previous snapshot and the newly-taken snapshot checkpoints.
This ChangelistCreate job instance is monitored and, if necessary, restarted per a configurable number of retry attempts.
A consumer checkpoint file (CCP) is generated.
Finally, cleanup is performed and the old snapshot removed.

It’s worth noting that, in this initial version of MetadataIQ, only a single path may be configured.

Internally, the consumer checkpoint (CCP) is a JSON file containing a system b-tree created by the ChangelistCreate job, providing a delta between two input snapshots. These CCP files are created under the /ifs/.ifsvar/modules/metadataiq/cp/ directory with a ‘Checkpoint’ nomenclature followed by an incrementing ID. For example:

# ls /ifs/.ifsvar/modules/metadataiq/cp/

Checkpoint_0_2.json

To aid identification, the producer daemon names its snapshots with a ‘MetadataIQ’ prefix followed by the creation timestamp, and sets an expiration value of one year. For example:

# isi snapshot snapshots list | grep -i metadata

3278 MetadataIQ_1730914225                        /ifs/data
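The numeric suffix in the snapshot name appears to be a Unix epoch value, so the creation time can be recovered from the name alone. The following Python sketch assumes that interpretation (epoch seconds, rendered in UTC); the ‘snapshot_created’ helper is hypothetical.

```python
from datetime import datetime, timezone

PREFIX = "MetadataIQ_"


def snapshot_created(name: str) -> datetime:
    """Return the creation time encoded in a MetadataIQ snapshot name.

    Assumes the suffix after the prefix is a Unix epoch in seconds.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"not a MetadataIQ snapshot name: {name}")
    return datetime.fromtimestamp(int(name[len(PREFIX):]), tz=timezone.utc)


created = snapshot_created("MetadataIQ_1730914225")
print(created.isoformat())
```

A helper like this can be useful for spotting stale MetadataIQ snapshots that have outlived several producer cycles.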

During a producer cycle, once a CCP file is successfully generated, the old snapshot from the on-going cycle gets cleaned up. This snapshot deletion actually occurs in two phases: While the producer daemon initiates snapshot removal, the actual deletion is performed by the Job Engine’s SnapshotDelete job. As such, the contents of a ‘deleted’ snapshot may still exist until a SnapshotDelete job has run to completion and actually cleaned it up.

If a prior MetadataIQ execution cycle has not already completed, the old snapshot ID will automatically be set to HEAD (i.e. ID=0), and only the new snapshot will be used to report the current metadata state under the configured path.

Next, the MetadataIQ consumer service takes over the operations.

The consumer service relies on a set of database configuration parameters, including the CA certificate attributes, in order to securely connect to the remote ElasticSearch instance. These include:

Note that, in OneFS 9.10, MetadataIQ only supports the ElasticSearch database as its off-cluster metadata store.

Additionally, the consumer daemon, isi_metadataiq_consumer_d, also has a couple of configurable parameters to control and tune its behavior. These are:

The consumer daemon checks the queue for the arrival of a new checkpoint (CCP) file. When a CCP arrives, the daemon instructs the transfer agent to upload the metadata to the Elasticsearch database. The consumer daemon also continuously monitors the successful execution of the transfer agent, restarting it if needed.

If there happens to be more than one CCP in the queue, the consumer daemon will always select the file with the oldest timestamp.
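The oldest-first selection can be illustrated with a short Python sketch. This is not the consumer daemon’s actual code: it assumes that ‘oldest timestamp’ means the file modification time, and the ‘oldest_checkpoint’ helper and the second checkpoint file name are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def oldest_checkpoint(queue_dir: str) -> Optional[Path]:
    """Return the checkpoint file with the oldest modification time, if any."""
    candidates = sorted(
        Path(queue_dir).glob("Checkpoint_*.json"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[0] if candidates else None


# Demonstration using a temporary directory in place of the CCP queue.
with tempfile.TemporaryDirectory() as queue:
    for name, age_seconds in (("Checkpoint_0_2.json", 200),
                              ("Checkpoint_2_4.json", 100)):
        path = Path(queue) / name
        path.write_text("{}")
        stamp = time.time() - age_seconds
        os.utime(path, (stamp, stamp))  # backdate the file
    selected = oldest_checkpoint(queue).name

print(selected)
```

Processing the oldest file first keeps the changes applied to the database in the order in which they were captured.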

The actual mechanics of uploading the consumer checkpoint (CCP) file to the remote database are handled by the transfer agent.

The transfer agent (isi_metadataiq_transfer) is a Python script that the consumer daemon spawns on demand. Its workflow is as follows:

The transfer script is invoked with the path to a CCP file specifying the target changelist.
Next, the transfer script attempts to take an advisory lock on the CCP file to prevent more than one instance of the transfer script working on the same CCP file at a given time. This advisory lock is released whenever the transfer script completes or terminates.
After acquiring the advisory lock, the transfer script validates that both the CCP file and changelist exist, and that the ElasticSearch database connection and mapping are valid. It will configure the index mappings if the index does not already exist.
If these validation checks pass, the transfer script starts its ‘draining loop’, fetching and batching changelist entries and allocating them to a worker thread pool for data processing and transfer.
Once the changelist is fully transferred to the ElasticSearch database, the transfer script removes the CCP file and changelist.
Finally, the transfer script releases its advisory lock and exits normally.
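The workflow above can be sketched in Python as follows. This is not the isi_metadataiq_transfer source. It is a simplified illustration of the lock, validate, drain, and clean-up sequence, in which the function names, the callables passed in (validate, read_changelist, upload_batch), and the batch and pool sizes are all hypothetical.

```python
import fcntl
import itertools
import os
from concurrent.futures import ThreadPoolExecutor


def batched(entries, size):
    """Yield lists of up to `size` entries from an iterable."""
    it = iter(entries)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def transfer(ccp_path, read_changelist, upload_batch,
             validate=lambda: True, batch_size=500, workers=4):
    """Drain one changelist; return False if another instance holds the lock.

    All callables are hypothetical stand-ins: read_changelist(ccp_path)
    yields entries, upload_batch(batch) sends one batch to the database.
    """
    with open(ccp_path, "r") as ccp:
        try:
            # Advisory, non-blocking lock: one transfer per CCP file.
            fcntl.flock(ccp, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        if not validate():
            raise RuntimeError("validation failed; CCP left in place")
        # Draining loop: batch the entries and hand them to worker threads.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() surfaces any upload exception before clean-up runs.
            list(pool.map(upload_batch,
                          batched(read_changelist(ccp_path), batch_size)))
        os.remove(ccp_path)  # only reached after a full transfer
        return True
    # The advisory lock is released when the file is closed.
```

Because the lock is tied to the open file, it is released on both normal exit and failure, and a failed upload leaves the CCP file in place so that a restarted transfer can pick it up again.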

In the event of a failure, the transfer agent is automatically restarted by the consumer daemon.

After the initial cluster setup and config, each additional MetadataIQ job run populates the remote ElasticSearch database with updated metadata from the configured dataset.

The ElasticSearch database and Kibana visualization portal reside on an off-cluster Linux host.

By default, ElasticSearch uses TCP port 9200 for communication and for receiving metadata updates from the PowerScale cluster(s), while Kibana listens on TCP port 5601.
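Basic reachability of these default ports can be checked with a short Python sketch such as the one below. It only tests that a TCP connection can be opened and does not validate certificates or credentials. The function name and the DEFAULT_PORTS dictionary are illustrative, and the ports should be adjusted if the off-cluster host uses non-default values.

```python
import socket

# Default ports as described above; adjust if the deployment differs.
DEFAULT_PORTS = {"elasticsearch": 9200, "kibana": 5601}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, calling port_open() with the name of the Linux host and DEFAULT_PORTS["elasticsearch"] returns False if a firewall is blocking the connection between the client and the database host.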

Periodic synchronization is needed to keep the database current with new metadata changes, so the recommendation is to configure a dataset-appropriate schedule for the MetadataIQ job. With the services enabled, the first MetadataIQ cycle begins as soon as a valid schedule has been configured.

In the next article in this series, we’ll examine the process involved in standing up and configuring a MetadataIQ environment.