OneFS Job Engine and Parallel Restriping

One of the cluster's functional areas that sees some enhancement love in the new OneFS 9.7 release is the Job Engine: specifically, the ability to run multiple restriping jobs concurrently.

As you're probably aware, the Job Engine is a OneFS service, or daemon, that runs cluster housekeeping jobs, storage services, plus a variety of user-initiated data management tasks. As such, the Job Engine performs a diverse and not always complementary set of roles. On one hand, it attempts to keep the cluster healthy and balanced while mitigating performance impact; on the other, it still allows customers to perform on-demand tasks such as large parallel, cluster-wide deletes, full-tree permissions management, data tiering, etc.

At a high level, this new OneFS 9.7 parallel restriping feature enables the Job Engine to run multiple restriping jobs at the same time. Restriping in OneFS is the process whereby filesystem blocks are moved around for repair, balance, tiering, etc. These restriping jobs include FlexProtect, MediaScan, AutoBalance, MultiScan, SmartPools, etc.

As such, an example of parallel restriping could be running SmartPools alongside MultiScan, helping to unblock a data tiering workflow that was stuck behind an important cluster maintenance job. The following OneFS 9.7 example shows the FlexProtectLin, MediaScan, and SmartPools restriping jobs running concurrently:

# isi job jobs list

ID   Type           State   Impact  Policy  Pri  Phase  Running Time

---------------------------------------------------------------------

2273 MediaScan      Running Low     LOW     8    1/8    7h 57m

2275 SmartPools     Running Low     LOW     6    1/2    9m 44s

2305 FlexProtectLin Running Medium  MEDIUM  1    1/4    10s

---------------------------------------------------------------------

Total: 3

By way of contrast, in releases prior to OneFS 9.7, only a single restriping job could run at any point in time, and any additional restriping jobs were automatically placed in a 'waiting' state. But before getting into the details of the parallel restriping feature, a quick review of the Job Engine, and its structure and function, could be useful.

In OneFS, the Job Engine runs across the entire cluster and is responsible for dividing and conquering large storage management and protection tasks. To achieve this, it reduces a task into smaller work items and then allocates, or maps, these portions of the overall job to multiple worker threads on each node. Progress is tracked and reported on throughout job execution and a detailed report and status is presented upon completion or termination.

A comprehensive check-pointing system allows jobs to be paused and resumed, in addition to stopped and started. Additionally, the Job Engine also includes an adaptive impact management system, CPU and drive-sensitive impact control, and the ability to run up to three jobs at once.
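For example, a running job can typically be paused and later resumed directly from the CLI. The following is a minimal sketch, assuming the MediaScan job ID of 2273 from the earlier example (verify the exact syntax against your OneFS release):

# isi job jobs pause 2273

# isi job jobs resume 2273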

Jobs are executed as background tasks across the cluster, using spare or specially reserved capacity and resources, and can be categorized into three primary classes:

Category Description
File System Maintenance Jobs These jobs perform background file system maintenance, and typically require access to all nodes. These jobs are required to run in default configurations, and often in degraded cluster conditions. Examples include file system protection and drive rebuilds.
Feature Support Jobs The feature support jobs perform work that facilitates some extended storage management function, and typically only run when the feature has been configured. Examples include deduplication and anti-virus scanning.
User Action Jobs These jobs are run directly by the storage administrator to accomplish some data management goal. Examples include parallel tree deletes and permissions maintenance.

Although the file system maintenance jobs are run by default, either on a schedule or in reaction to a particular file system event, any Job Engine job can be managed by configuring both its priority level (in relation to other jobs) and its impact policy.
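For instance, a job type's default priority and impact policy can be adjusted via the 'isi job types modify' CLI command. A hedged example, setting the SmartPools job to priority 6 with the default LOW impact policy (confirm the flag names on your particular release):

# isi job types modify SmartPools --priority 6 --policy LOW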

Job Engine jobs often comprise several phases, each of which is executed in a pre-defined sequence. For instance, jobs like TreeDelete comprise a single phase, whereas more complex jobs like FlexProtect and MediaScan have multiple distinct phases.

A job phase must be completed in entirety before the job can progress to the next phase. If any errors occur during execution, the job is marked “failed” at the end of that particular phase and the job is terminated.

Each job phase is composed of a number of work chunks, or Tasks. Tasks, which consist of multiple individual work items, are divided up and load balanced across the nodes within the cluster. Successful execution of a work item produces an item result, which might contain a count of the number of retries required to repair a file, plus any errors that occurred during processing.

When a Job Engine job needs to work on a large portion of the file system, there are four main methods available to accomplish this. The most straightforward access method is via metadata, using a Logical Inode (LIN) Scan. In addition to being simple to access in parallel, LINs also provide a useful way of accurately determining the amount of work required.

A directory tree walk is the traditional access method since it works similarly to common UNIX utilities, such as find – albeit in a far more distributed way. For parallel execution, the various job tasks are each assigned a separate subdirectory tree. Unlike LIN scans, tree walks may prove to be heavily unbalanced, due to varying sub-directory depths and file counts.

Disk drives provide excellent linear read access, so a drive scan can deliver orders of magnitude better performance than a directory tree walk or LIN scan for jobs that don’t require insight into file system structure. As such, drive scans are ideal for jobs like MediaScan, which linearly traverses each node’s disks looking for bad disk sectors.

A fourth class of Job Engine jobs utilize a ‘changelist’, rather than LIN-based scanning. The changelist approach analyzes two snapshots to find the LINs which changed (delta) between the snapshots, and then dives in to determine the exact changes.
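The resulting changelists can typically be listed and examined from the CLI with the isi_changelist_mod utility. A brief, hedged example (assuming at least one changelist already exists on the cluster; check the utility's help output for the full option set):

# isi_changelist_mod -l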

Architecturally, the Job Engine is based on a delegation hierarchy comprising coordinator, director, manager, and worker processes.

There are other threads, not included in the diagram above, which relate to internal functions such as communication between the various Job Engine daemons and the collection of statistics. Also, with three jobs running simultaneously, each node would have three manager processes, each with its own number of worker threads.

Once the work is initially allocated, the job engine uses a shared work distribution model in order to execute the work, and each job is identified by a unique Job ID. When a job is launched, whether it’s scheduled, started manually, or responding to a cluster event, the Job Engine spawns a child process from the isi_job_d daemon running on each node. This job engine daemon is also known as the parent process.

The entire Job Engine's orchestration is handled by the coordinator, which is a process that runs on one of the nodes in a cluster. Any node can act as the coordinator, and its principal responsibilities include:

  • Monitoring workload and the constituent nodes’ status
  • Controlling the number of worker threads per-node and cluster-wide
  • Managing and enforcing job synchronization and checkpoints

While the actual work item allocation is managed by the individual nodes, the coordinator node takes control, divides up the job, and evenly distributes the resulting tasks across the nodes in the cluster. For example, if the coordinator needs to communicate with a manager process running on node five, it first sends a message to node five’s director, which then passes it on down to the appropriate manager process under its control. The coordinator also periodically sends messages, via the director processes, instructing the managers to increment or decrement the number of worker threads.

The coordinator is also responsible for starting and stopping jobs, and for processing work results as they are returned during the execution of a job. Should the coordinator process die for any reason, the coordinator responsibility automatically moves to another node.

The coordinator node can be identified via the following CLI command:

# isi job status --verbose | grep Coordinator

Each node in the cluster has a job engine director process, which runs continuously and independently in the background. The director process is responsible for monitoring, governing and overseeing all job engine activity on a particular node, constantly waiting for instruction from the coordinator to start a new job. The director process serves as a central point of contact for all the manager processes running on a node, and as a liaison with the coordinator process across nodes. These responsibilities include:

  • Manager process creation
  • Delegating to and requesting work from other peers
  • Sending and receiving status messages

The manager process is responsible for arranging the flow of tasks and task results throughout the duration of a job. The manager processes request and exchange work with each other and supervise the worker threads assigned to them. At any point in time, each node in a cluster can have up to three manager processes, one for each job currently running.

Each manager controls and assigns work items to multiple worker threads working on items for the designated job. Under direction from the coordinator and director, a manager process maintains the appropriate number of active threads for a configured impact level, and for the node’s current activity level. Once a job has completed, the manager processes associated with that job, across all the nodes, are terminated. And new managers are automatically spawned when the next job is moved into execution.

The manager processes on each node regularly send updates to their respective node’s director, which, in turn, informs the coordinator process of the status of the various worker tasks.

Each worker thread is given a task, if available, which it processes item-by-item until the task is complete or the manager un-assigns the task. The status of the nodes’ workers can be queried by running the CLI command “isi job statistics view”. In addition to the number of current worker threads per node, a sleep to work (STW) ratio average is also provided, giving an indication of the worker thread activity level on the node.

Towards the end of a job phase, the number of active threads decreases as workers finish up their allotted work and become idle. Nodes which have completed their work items just remain idle, waiting for the last remaining node to finish its work allocation. When all tasks are done, the job phase is considered to be complete and the worker threads are terminated.

As jobs are processed, the coordinator consolidates the task status from the constituent nodes and periodically writes the results to checkpoint files. These checkpoint files allow jobs to be paused and resumed, either proactively, or in the event of a cluster outage. For example, if the node on which the Job Engine coordinator was running went offline for any reason, a new coordinator would be automatically started on another node. This new coordinator would read the last consistency checkpoint file, job control and task processing would resume across the cluster from where it left off, and no work would be lost.

Job engine checkpoint files are stored in ‘results’ and ‘tasks’ subdirectories under the path ‘/ifs/.ifsvar/modules/jobengine/cp/<job_id>/’ for a given job. On large clusters and/or with a job running at high impact, there can be many checkpoint files accessed from all nodes, which may result in contention. Checkpoints are split into sixteen subdirectories under both tasks and results to alleviate this bottleneck.
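For example, the checkpoint layout for a particular job can be viewed directly from any node. A hypothetical listing, assuming a job ID of 2273:

# ls /ifs/.ifsvar/modules/jobengine/cp/2273/

results  tasks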

PowerScale OneFS 9.7

Dell PowerScale is already powering up the holiday season with the launch of the innovative OneFS 9.7 release, which shipped today (13th December 2023). This new 9.7 release is an all-rounder, introducing PowerScale innovations in cloud, performance, security, and ease of use.

Enhancements to APEX File Storage for AWS

After the debut of APEX File Storage for AWS earlier this year, OneFS 9.7 extends and simplifies the PowerScale offering in the public cloud, delivering more features on more instance types across more regions.

In addition to providing the same OneFS software platform on-prem and in the cloud, and customer-managed for full control, APEX File Storage for AWS in OneFS 9.7 sees a 60% capacity increase, providing linear capacity and performance scaling up to six SSD nodes and 1.6 PiB per namespace/cluster, and up to 10GB/s reads and 4GB/s writes per cluster. This can make it a solid fit for traditional file shares and home directories, vertical workloads like M&E, healthcare, life sciences, finserv, and next-gen AI, ML and analytics applications.

PowerScale's scale-out architecture can be deployed on customer-managed AWS EC2 and EBS infrastructure, providing the scale and performance needed to run a variety of unstructured workflows in the public cloud. Plus, with OneFS 9.7, there's an 'easy button' for streamlined AWS infrastructure provisioning and deployment.

Once in the cloud, existing PowerScale investments can be further leveraged by accessing and orchestrating your data through the platform’s multi-protocol access and APIs.

This includes the common OneFS control plane (CLI, WebUI, and platform API), and the same enterprise features: Multi-protocol, SnapshotIQ, SmartQuotas, Identity management, etc.

With OneFS 9.7, APEX File Storage for AWS also sees the addition of support for HDFS and FTP protocols, in addition to NFS, SMB, and S3. Plus granular performance prioritization and throttling is also enabled with SmartQoS, allowing admins to configure limits on the maximum number of protocol operations that NFS, S3, SMB, or mixed protocol workloads can consume on an APEX File Storage for AWS cluster.

Security

With data integrity and protection being top of mind in this era of unprecedented cyber threats, OneFS 9.7 brings a bevy of new features and functionality to keep your unstructured data and workloads more secure than ever. These new OneFS 9.7 security enhancements help address US Federal and DoD mandates, such as FIPS 140-2 and DISA STIGs – in addition to general enterprise data security requirements. Included in the new OneFS 9.7 release is a simple cluster configuration backup and restore utility, address space layout randomization, and single sign-on (SSO) lookup enhancements.

Data mobility

On the data replication front, SmartSync sees the introduction of GCP as an object storage target in OneFS 9.7, in addition to ECS, AWS and Azure. The SmartSync data mover allows flexible data movement and copying, incremental resyncs, push and pull data transfer, and one-time file to object copy.

Performance improvements

Building on the streaming read performance delivered in a prior release, OneFS 9.7 also unlocks dramatic write performance enhancements, particularly for the all-flash NVMe platforms – plus infrastructure support for future node hardware platform generations. A sizable boost in throughput to a single client helps deliver performance for the most demanding GenAI workloads, particularly for the model training and inferencing phases. Additionally, the scale-out cluster architecture enables performance to scale linearly as GPUs are increased, allowing PowerScale to easily support AI workflows from small to large.

Cluster support for InsightIQ 5.0

The new InsightIQ 5.0 software expands PowerScale monitoring capabilities, including a new user interface, automated email alerts and added security. InsightIQ 5.0 is available today for all existing and new PowerScale customers at no additional charge. These innovations are designed to simplify management, expand scale and security and automate operations for PowerScale performance monitoring for AI, GenAI and all other workloads.

In summary, OneFS 9.7 brings the following new features and functionality to the Dell PowerScale ecosystem:

Feature Info
Cloud
  • APEX File Storage for AWS 60% capacity increase
  • Streamlined and automated APEX provisioning and deployment
  • HDFS, FTP, and SmartQoS support
Simplicity
  • Job Engine Restripe Parallelization
  • Cluster support for InsightIQ 5.0
  • SmartSync GCP support
Performance
  • Write performance improvements for NVMe-based all-flash platforms
  • Infrastructure support for next generation all-flash node hardware platforms
Security
  • Cluster configuration backup and restore
  • Address space layout randomization
  • Single sign-on (SSO) lookup enhancements

We’ll be taking a deeper look at these new features and functionality in blog articles over the course of the next few weeks.

Meanwhile, the new OneFS 9.7 code is available on the Dell Online Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.

OneFS and Client Bandwidth Measurement with iPerf

Sometimes in a storage admin’s course of duty there’s a need to quickly and easily assess the bandwidth between a PowerScale cluster and client. The ubiquitous iPerf tool is a handy utility for taking active measurements of the maximum achievable bandwidth between a PowerScale cluster and client, across the node’s front-end IP network(s).

iPerf was developed by NLANR/DAST as a modern alternative for measuring maximum TCP and UDP bandwidth performance. iPerf is a flexible tool, allowing the tuning of various parameters and UDP characteristics, and reporting network performance stats including bandwidth, delay jitter, datagram loss, etc.

In contrast to the classic iPerf (typically version 2.x), a newer and more feature-rich iPerf3 version is also available. Unlike the classic incarnation, iPerf3 is primarily developed and maintained by ESnet and the Lawrence Berkeley National Laboratory, and is made available under a BSD license. Note that iPerf3 neither shares code nor provides backwards compatibility with the classic iPerf.

Additional optional features of iPerf3 include:

  • CPU affinity setting
  • IPv6 flow labeling
  • SCTP
  • TCP congestion algorithm settings
  • Sendfile / zerocopy
  • Socket pacing
  • Authentication

Both iPerf and iPerf3 are available preinstalled on OneFS, and can be useful for measuring and verifying anticipated network performance prior to running any performance benchmark. The standard ‘iperf’ CLI command automatically invokes the classic (v2) version:

# iperf -v

iperf version 2.0.4 (7 Apr 2008) pthreads

Within OneFS, the iPerf binary can be found in the /usr/local/bin/ directory on each node:

# whereis iperf

iperf: /usr/local/bin/iperf /usr/local/man/man1/iperf.1.gz

Whereas the enhanced iPerf version 3 uses the ‘iperf3’ CLI syntax, and also lives under /usr/local/bin:

# iperf3 -v

iperf 3.4 (cJSON 1.5.2)

# whereis iperf3

iperf3: /usr/local/bin/iperf3 /usr/local/man/man1/iperf3.1.gz

For Linux and Windows clients, iPerf binaries can also be downloaded and installed from the following location:

https://iperf.fr/

The iPerf source code is also available at Sourceforge for those ‘build-your-own’ aficionados among us:

http://sourceforge.net/projects/iperf/

Under the hood, iPerf allows the configuration and tuning of a variety of buffering and timing parameters across both TCP and UDP, and with support for IPv4 and IPv6 environments. For each test, iPerf reports the maximum bandwidth, loss, and other salient metrics.

More specifically, iPerf supports the following features:

Attribute Details
TCP
  • Measure bandwidth
  • Report MSS/MTU size and observed read sizes
  • Supports SCTP multi-homing and redundant paths for reliability and resilience
UDP
  • Client can create UDP streams of specified bandwidth
  • Measure packet loss
  • Measure delay jitter
  • Supports multicast
Platform support
  • Windows, Linux, MacOS, BSD UNIX, Solaris, Android, VxWorks
Concurrency
  • Client and server can support multiple simultaneous connections (-P flag)
  • iPerf3 server accepts multiple simultaneous connections from the same client
Duration
  • Can be configured to run for a specified time (-t flag), in addition to a set amount of data (-n and -k flags)
  • Server can be run as a daemon (-D flag)
Reporting
  • Can display periodic, intermediate bandwidth, jitter, and loss reports at configurable intervals (-i flag)

When it comes to running iPerf, the most basic use case is testing a single connection from a client to a node on the cluster. This can be initiated as follows:

On the cluster node, the following CLI command will initiate the iPerf server:

# iperf -s

Similarly, on the client, the following CLI syntax will target the iPerf server on the cluster node:

# iperf -c <server_IP>

For example, with a freeBSD client with IP address 10.11.12.9 connecting to a cluster node at 10.10.11.12:

# iperf -c 10.10.11.12

------------------------------------------------------------

Client connecting to 10.10.11.12, TCP port 5001

TCP window size:   131 KByte (default)

------------------------------------------------------------

[  3] local 10.11.12.9 port 65001 connected with 10.10.11.12 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  31.8 GBytes  27.3 Gbits/sec

And from the cluster node:

# iperf -s

------------------------------------------------------------

Server listening on TCP port 5001

TCP window size:   128 KByte (default)

------------------------------------------------------------

[  4] local 10.10.11.12 port 5001 connected with 10.11.12.9 port 65001

[ ID] Interval       Transfer     Bandwidth

[  4]  0.0-10.0 sec  31.8 GBytes  27.3 Gbits/sec

As indicated in the above output, iPerf uses a default window size of 128KB. Also note that the classic iPerf (v2) uses TCP port 5001 by default on OneFS. As such, this port must be open on any and all firewalls and/or packet filters situated between client and node for the above to work. Similarly, iPerf3 defaults to TCP 5201, and the same open port requirements between clients and cluster apply.

Here’s the output from the same configuration but using iPerf3:

For example, from the server:

# iperf3 -s

-----------------------------------------------------------

Server listening on 5201

-----------------------------------------------------------

Accepted connection from 10.11.12.9, port 12543

[  5] local 10.10.11.12 port 5201 connected to 10.11.12.9 port 55439

[ ID] Interval           Transfer     Bitrate

[  5]   0.00-1.00   sec  3.22 GBytes  27.7 Gbits/sec

[  5]   1.00-2.00   sec  3.59 GBytes  30.9 Gbits/sec

[  5]   2.00-3.00   sec  3.52 GBytes  30.3 Gbits/sec

[  5]   3.00-4.00   sec  3.95 GBytes  33.9 Gbits/sec

[  5]   4.00-5.00   sec  4.07 GBytes  34.9 Gbits/sec

[  5]   5.00-6.00   sec  4.10 GBytes  35.2 Gbits/sec

[  5]   6.00-7.00   sec  4.14 GBytes  35.6 Gbits/sec

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bitrate

[  5]   0.00-7.00   sec  27.8 GBytes  34.1 Gbits/sec                  receiver

iperf3: the client has terminated

-----------------------------------------------------------

Server listening on 5201

-----------------------------------------------------------

And from the client:

# iperf3 -c 10.10.11.12

Connecting to host 10.10.11.12, port 5201

[  5] local 10.11.12.9 port 55439 connected to 10.10.11.12 port 5201

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd

[  5]   0.00-1.00   sec  3.22 GBytes  27.7 Gbits/sec    0    316 KBytes

[  5]   1.00-2.00   sec  3.59 GBytes  30.9 Gbits/sec    0    316 KBytes

[  5]   2.00-3.00   sec  3.52 GBytes  30.3 Gbits/sec    0    504 KBytes

[  5]   3.00-4.00   sec  3.95 GBytes  33.9 Gbits/sec    2    671 KBytes

[  5]   4.00-5.00   sec  4.07 GBytes  34.9 Gbits/sec    0    671 KBytes

[  5]   5.00-6.00   sec  4.10 GBytes  35.2 Gbits/sec    1    664 KBytes

[  5]   6.00-7.00   sec  4.14 GBytes  35.6 Gbits/sec    0    664 KBytes

^C[  5]   7.00-7.28   sec  1.17 GBytes  35.6 Gbits/sec    0    664 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bitrate         Retr

[  5]   0.00-7.28   sec  27.8 GBytes  32.8 Gbits/sec    3             sender

[  5]   0.00-7.28   sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf3: interrupt - the client has terminated
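As noted earlier, iPerf and iPerf3 listen on TCP ports 5001 and 5201 respectively by default. If those ports are blocked or already in use, an alternate port can be specified on both ends with the '-p' flag. For example, a minimal sketch using an arbitrary port of 5202:

# iperf3 -s -p 5202

# iperf3 -c 10.10.11.12 -p 5202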

Regarding iPerf CLI syntax, the following options are available in each version of the tool:

Options Description iPerf iPerf3
<none> Default settings X
--authorized-users-path Path to the configuration file containing authorized users credentials to run iperf tests (if built with OpenSSL support) X
-A Set the CPU affinity, if possible (Linux, FreeBSD, and Windows only). X
-b Set target bandwidth/bitrate  to n bits/sec (default 1 Mbit/sec). Requires UDP (-u). X X
-B Bind to <host>, an interface or multicast address X X
-c Run in client mode, connecting to <host> X X
-C Compatibility; for use with older versions – does not send extra msgs X
-C Set the congestion control algorithm (Linux and FreeBSD only) X
--cport Bind data streams to a specific client port (for TCP and UDP only, default is to use an ephemeral port) X
--connect-timeout Set timeout for establishing the initial control connection to the server, in milliseconds. Default behavior is the OS' timeout for TCP connection establishment. X
-d Simultaneous bi-directional bandwidth X
-d Emit debugging output X
-D Run the server as a daemon X X
--dscp Set the IP DSCP bits X
-f Format to report: Kbits/Mbits/Gbits/Tbits X
-F Input the data to be transmitted from a file X X
--forceflush Force flushing output at every interval, to avoid buffering when sending output to pipe. X
--fq-rate Set a rate to be used with fair-queueing based socket-level pacing, in bits per second. X
--get-server-output Get the output from the server. The output format is determined by the server (i.e. JSON '-J') X
-h Help X X
-i Interval: Pause n seconds between periodic bandwidth reports. X X
-I Input the data to be transmitted from stdin X
-I Write a file with the process ID X
-J Output in JSON format X
-k Number of blocks (packets) to transmit (instead of -t or -n) X
-l Length of buffer to read or write.  For TCP tests, the default value is 128KB.  With UDP, iperf3 tries to dynamically determine a reasonable sending size based on the path MTU; if that cannot be determined it uses 1460 bytes as a sending size. For SCTP tests, the default size is 64KB. X
-L Set length read/write buffer (defaults to 8 KB) X
-L Set the IPv6 flow label X
--logfile Send output to a log file. X
-m Print TCP maximum segment size (MTU – TCP/IP header) X
-M Set TCP maximum segment size (MTU – 40 bytes) X X
-n number of bytes to transmit (instead of -t) X X
-N Set TCP no delay, disabling Nagle’s Algorithm X X
--nstreams Set number of SCTP streams. X
-o Output the report or error message to a specified file X
-O Omit the first n seconds of the test, to skip past the TCP slow-start period. X
-p Port: set server port to listen on/connect to X X
-P Number of parallel client threads to run X X
--pacing-timer Set pacing timer interval in microseconds (default 1000 microseconds, or 1 ms). This controls iperf3's internal pacing timer for the -b/--bitrate option. X
-r Bi-directional bandwidth X
-R Reverse the direction of a test, so that the server sends data to the client X
--rsa-private-key-path Path to the RSA private key (not password-protected) used to decrypt authentication credentials from the client (if built with OpenSSL support). X
--rsa-public-key-path Path to the RSA public key used to encrypt authentication credentials (if built with OpenSSL support) X
-s Run iPerf in server mode X X
-S Set the IP type of service. X
--sctp Use SCTP rather than TCP (FreeBSD and Linux) X
-t Time in seconds to transmit for (default 10 secs) X X
-T Time-to-live, for multicast (default 1) X
-T Prefix every output line with this title string X
-u Use UDP rather than TCP. X X
-U Run in single threaded UDP mode X
--username Username to use for authentication to the iperf server (if built with OpenSSL support). The password will be prompted for interactively when the test is run. X
-v Print version information and quit X X
-V Set the domain to IPv6 X
-V Verbose – give more detailed output X
-w TCP window size (socket buffer size) X X
-x Exclude C(connection), D(data), M(multicast), S(settings), V(server) reports X
-X Bind SCTP associations to a specific subset of links using sctp_bindx X
-y If set to C or c, report results as CSV (comma separated values) X
-Z Set TCP congestion control algorithm (Linux only) X
-Z Use a ‘zero copy’ method of sending data, such as sendfile instead of the usual write. X
-1 Handle one client connection, then exit. X
-4 Only use IPv4 X
-6 Only use IPv6 X
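Beyond the default TCP tests, UDP bandwidth, jitter, and loss can also be measured using the '-u' flag from the table above, together with a target bitrate ('-b'). A minimal sketch, with the server running on a cluster node and the client generating a 1 Gbit/sec UDP stream:

# iperf -s -u

# iperf -c 10.10.11.12 -u -b 1000M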

To run the iPerf server across all nodes in a cluster, it can be initiated in conjunction with the OneFS ‘isi_for_array’ CLI utility, as follows:

# isi_for_array iperf -s

Bidirectional testing can also sometimes be a useful sanity-check, with OneFS acting as the client pointing to a client OS running the server instance of iPerf. For example:

# iperf -c 10.10.11.205 -i 5 -t 60 -P 4

Start the iperf client on a Linux client connecting to one of the PowerScale nodes.

# iperf -c 10.10.1.100

For a Windows client, the same CLI syntax, issued from the command shell (cmd.exe), can be used to start the iperf client and connect to a PowerScale node. For example:

C:\Users\pocadmin\Downloads\iperf-2.0.9-win64\iperf-2.0.9-win64>iperf.exe -c 10.10.0.196

iPerf Write Testing

When it comes to write performance testing, the following CLI syntax can be used on the client to execute a write speed (Client -> Cluster) test:

# iperf -P 8 -c <clusterIP>

Note that the '-P' flag designates parallel client threads, allowing the iPerf threads to be matched up with the number of physical CPU cores (not hyper-threads) available to the client.
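On a Linux client, the physical core and socket counts can be confirmed before choosing a '-P' value, for example:

# lscpu | grep -E 'Socket|Core'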

Similarly, the following CLI command can be used on the client to initiate a read speed (Client <- Cluster) test:

# iperf -P 8 -R -c <clusterIP>

Below is an example command from a Linux VM to a single PowerScale node. Testing was repeated from each Linux client to each node in the cluster to validate results and verify consistent network performance. Using the cluster nodes as the server, the bandwidth tested at approximately 7.2 Gbps per VM (note that, in this case, the VM limit is 8.0 Gbps):

# iperf -c onefs-node1 -i 5 -t 60 -P 4

------------------------------------------------------------

Client connecting to isilon-node1, TCP port 5001

TCP window size: 94.5 KByte (default)

------------------------------------------------------------

[  4] local 10.10.0.205 port 44506 connected with 172.16.0.5 port 5001

[SUM]  0.0-60.0 sec  50.3 GBytes  7.20 Gbits/sec

Two Linux VMs were also tested running iPerf in parallel to maximize the ExpressRoute network link. This test involved dual iPerf writes from the Linux clients to separate cluster nodes.

[admin@Linux64GB16c-3 ~]$ iperf -c onefs-node3 -i 5 -t 40 -P 4

[SUM]  0.0-40.0 sec  22.5 GBytes  4.83 Gbits/sec 

[admin@linux-vm2 ~]$ iperf -c onefs-node2 -i 5 -t 40 -P 4

[SUM]  0.0-40.0 sec  22.1 GBytes  4.75 Gbits/sec

As can be seen from the results of the iPerf tests, writes appear to split evenly from the Linux clients to the cluster nodes, while saturating the bandwidth of the Azure ExpressRoute link.

OneFS HTTP Services and Security

To facilitate granular HTTP security configuration, OneFS provides an option to disable nonessential HTTP components selectively. Disabling a specific component’s service still allows other essential services on the cluster to continue to run unimpeded. In OneFS 9.4 and later, the following nonessential HTTP services may be disabled:

Service Description
PowerScaleUI The OneFS WebUI configuration interface.
Platform-API-External External access to the OneFS platform API endpoints.
Rest Access to Namespace (RAN) RESTful access via HTTP to a cluster's /ifs namespace.
RemoteService Remote Support and In-Product Activation.
SWIFT (deprecated) Deprecated object access to the cluster via the SWIFT protocol. This has been replaced by the S3 protocol in OneFS.

Each of these services may be enabled or disabled independently via the CLI or platform API by a user account with the ISI_PRIV_HTTP RBAC privilege.

The 'isi http services' CLI command set can be used to view and modify these nonessential HTTP services:

# isi http services list

ID                    Enabled

------------------------------

Platform-API-External Yes

PowerScaleUI          Yes

RAN                   Yes

RemoteService         Yes

SWIFT                 No

------------------------------

Total: 5

For example, remote HTTP access to the OneFS /ifs namespace can easily be disabled as follows:

 # isi http services modify RAN --enabled=0

You are about to modify the service RAN. Are you sure? (yes/[no]): yes
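The service can subsequently be re-enabled in the same manner:

# isi http services modify RAN --enabled=1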

Similarly, a subset of the HTTP configuration settings can also be viewed and edited via the WebUI by navigating to Protocols > HTTP settings:

That said, the implications and impact of disabling each of these services are as follows:

Service Disabling Impacts
WebUI The WebUI is completely disabled, and access attempts (default TCP port 8080) are denied with the following warning:

“Service Unavailable. Please contact Administrator.”

If the WebUI is re-enabled, the external platform API service (Platform-API-External) is also started if it is not running. Note that disabling the WebUI does not affect the PlatformAPI service.

Platform API External API requests to the cluster are denied, and the WebUI is disabled, since it uses the Platform-API-External service.

Note that the Platform-API-Internal service is not impacted if/when the Platform-API-External is disabled, and internal pAPI services continue to function as expected.

If the Platform-API-External service is re-enabled, the WebUI will remain inactive until the PowerScaleUI service is also enabled.

RAN If RAN is disabled, the WebUI components for File System Explorer and File Browser are also automatically disabled.

From the WebUI, attempts to access the OneFS file system explorer (File System > File System Explorer) fail with the following warning message:

“Browse is disabled as RAN service is not running. Contact your administrator to enable the service.”

This same warning is also displayed when attempting to access any other WebUI components that require directory selection.

RemoteService If RemoteService is disabled, the WebUI components for Remote Support and In-Product Activation are disabled.

In the WebUI, going to Cluster Management > General Settings and selecting the Remote Support tab displays the following message:

“The service required for the feature is disabled. Contact your administrator to enable the service.”

In the WebUI, going to Cluster Management > Licensing and scrolling to the License Activation section displays the following message: The service required for the feature is disabled. Contact your administrator to enable the service.

SWIFT Deprecated object protocol and disabled by default.

OneFS HTTP configuration can be displayed from the CLI via the ‘isi http settings view’ command:

# isi http settings view

            Access Control: No

      Basic Authentication: No

    WebHDFS Ran HTTPS Port: 8443

                       Dav: No

         Enable Access Log: Yes

                     HTTPS: No

 Integrated Authentication: No

               Server Root: /ifs

                   Service: disabled

           Service Timeout: 8m20s

          Inactive Timeout: 15m

           Session Max Age: 4H

Httpd Controlpath Redirect: No

Similarly, HTTP configuration can be managed and changed using the ‘isi http settings modify’ CLI syntax.

For example, to reduce the maximum session age from 4 to 2 hours:

# isi http settings view | grep -i age

           Session Max Age: 4H

# isi http settings modify --session-max-age=2H

# isi http settings view | grep -i age

           Session Max Age: 2H

The full set of configuration options for ‘isi http settings’ include:

Option Description
--access-control <boolean> Enable Access Control Authentication for HTTP service. Access Control Authentication requires at least one type of authentication to be enabled.
--basic-authentication <boolean> Enable Basic Authentication for HTTP service.
--webhdfs-ran-https-port <integer> Configure Data Services Port for HTTP service.
--revert-webhdfs-ran-https-port Set value to system default for --webhdfs-ran-https-port.
--dav <boolean> Comply with Class 1 and 2 of the DAV specification (RFC 2518) for HTTP service. All DAV clients must go through a single node. DAV compliance is NOT met if you go through SmartConnect, or via 2 or more node IPs.
--enable-access-log <boolean> Enable writing to a log when the HTTP server is accessed for HTTP service.
--https <boolean> Enable HTTPS transport protocol for HTTP service.
--integrated-authentication <boolean> Enable Integrated Authentication for HTTP service.
--server-root <path> Document root directory for HTTP service. Must be within /ifs.
--service (enabled | disabled | redirect | disabled_basicfile) Enable/disable HTTP Service or redirect to WebUI or disabled BasicFileAccess.
--service-timeout <duration> Amount of time (seconds) the server will wait for certain events before failing a request. A value of 0 indicates that the service timeout value is the Apache default.
--revert-service-timeout Set value to system default for --service-timeout.
--inactive-timeout <duration> Get the HTTP RequestReadTimeout directive from both WebUI and HTTP service.
--revert-inactive-timeout Set value to system default for --inactive-timeout.
--session-max-age <duration> Get the HTTP SessionMaxAge directive from both WebUI and HTTP service.
--revert-session-max-age Set value to system default for --session-max-age.
--httpd-controlpath-redirect <boolean> Enable or disable WebUI redirection to HTTP service.
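As the table above shows, each tunable also has a corresponding 'revert' flag that restores its system default. For example, to return the session max age to its default value:

# isi http settings modify --revert-session-max-age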

Note that, while the OneFS S3 service uses HTTP, it is considered a tier-1 protocol, and as such is managed via its own 'isi s3' CLI command set and corresponding WebUI area. For example, the following CLI command will force the cluster to only accept encrypted HTTPS/SSL traffic on TCP port 9999 (rather than the default TCP port 9021):

# isi s3 settings global modify --https-only 1 --https-port 9999

# isi s3 settings global view

         HTTP Port: 9020

        HTTPS Port: 9999

        HTTPS only: Yes

S3 Service Enabled: Yes

Additionally, the S3 service can be disabled entirely with the following CLI syntax:

# isi services s3 disable

The service 's3' has been disabled.

Or from the WebUI under Protocols > S3 > Global settings:
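Conversely, when needed, the S3 service can be re-enabled from the CLI:

# isi services s3 enable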

 

OneFS Additional Security Hardening – Part 3

As mentioned in previous articles in this series, applying a hardening profile is one of multiple tasks that are required in order to configure a STIG-compliant PowerScale cluster. These include:

Component Tasks
Audit Configure remote syslog servers for auditing.
Authentication Configure secure auth provider, SecurityAdmin account, and default restricted shell.
CELOG Create event channel for security officers and system admin to monitor /root and /var partition usage, audit service, security verification, and account creation.
MFA & SSO Enable and configure multi-factor authentication and single sign-on.
NTP Configure secure NTP servers with SHA256 keys.
SMB Configure SMB global settings and defaults.

Enable SMB encryption on shares.

SNMP Enable SNMP and configure SNMPv3 settings.
SyncIQ Configure SyncIQ to use CA certificates so both the source and target clusters (primary and secondary DSCs) have both Server Authentication and Client Authentication set in their Extended Key Usages fields.

In this final article in the series, we’ll cover the security configuration details for SyncIQ replication using the OneFS CLI.

SyncIQ Setup

SyncIQ supports over-the-wire, end-to-end encryption for data replication, protecting and securing in-flight data between clusters. A global setting enforces encryption on all incoming and outgoing SyncIQ policies.
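This global setting can be viewed, and enabled if necessary, from the CLI. A hedged example (the exact flag name may vary slightly by release):

# isi sync settings view | grep -i encryption

# isi sync settings modify --encryption-required true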

  1. First, on the source cluster, which is also the primary DSC (Digital Signature Certificate), add the CA (Certificate Authority) certificate(s) to the certificate store.
# isi certificate authority import [ca certificate path]

Where:

Item Description
[ca certificate path] The path to the CA certificate file.

Note that SyncIQ certificates for both the source and target clusters (aka primary and secondary DSC respectively) must have both ‘Server Authentication’ and ‘Client Authentication’ set in their ‘Extended Key Usages’ fields.

Repeat as necessary, and include root and intermediate CA certificates for both the source and target, plus the OCSP (Online Certificate Status Protocol) issuer:

  • source cluster
  • target cluster
  • OCSP issuer

To prevent unauthorized access to the private key/certificate, ensure the certificate and private key files are deleted/removed once all necessary import steps have been successfully completed.

 

  2. Next, on the source cluster (primary DSC), add the source cluster certificate to the SyncIQ server certificate store. This can be accomplished with the following CLI syntax:
# isi sync certificates server import [source certificate path] [source certificate key path]

Where:

Item Description
[source certificate path] The path to the source certificate file (in PEM or DER format).
[source certificate key path] The path to the source certificate private key file.

Once again, to prevent unauthorized access to the private key/certificate, remove the certificate and private key files once import has been completed successfully.

 

  3. On the source cluster (primary DSC), set the cluster certificate to the certificate imported above.

Find certificate ID:

# isi certificate server list -v

Then configure cluster certificate ID:

# isi sync settings modify --cluster_certificate_id [certificate_id]

Where:

Item Description
[certificate id] The ID of the cluster certificate.

 

  4. On the source cluster (primary DSC), add the target cluster's (secondary DSC) certificate as a peer certificate.
# isi sync certificates peer import [target certificate path]

Where:

Item Description
[target certificate path] The path to the target cluster/secondary DSC certificate file.

To prevent unauthorized access to the private key/certificate, remove certificate and private key files once done with all necessary import steps.

 

  5. On the source cluster (primary DSC), configure the global Open Certificate Status Protocol (OCSP) ID and address settings.
# isi sync settings modify

 --ocsp-issuer-certificate-id=[ocsp issuer certificate id]

 --ocsp-address=[OCSP server URI]

Where:

Item Description
[ocsp issuer certificate id] The ID of the certificate as registered in the PowerScale certificate manager.
[OCSP server URI] The URI of the OCSP responder.

To find the OCSP issuer certificate ID:

# isi certificate authority list -v

This assumes that the OCSP issuer certificate file has already been successfully imported into the PowerScale certificate manager.

 

  6. On the target cluster (secondary DSC), add the CA certificate(s) to the certificate store.
# isi certificate authority import [ca certificate path]

Where:

Item Description
[ca certificate path] The path to the CA certificate file.

Repeat as necessary, including the root and intermediate CA certificates for:

  • source cluster
  • target cluster
  • OCSP issuer

To prevent unauthorized access to the private key/certificate, remove certificate and private key files once done with all necessary import steps.

On the target cluster (secondary DSC), add the target cluster certificate to the SyncIQ server certificate store.

# isi sync certificates server import [target certificate path] [target certificate key path]

Where:

Item Description
[target certificate path] The path to the target certificate file (in PEM or DER format).
[target certificate key path] The path to the target certificate private key file.

To prevent unauthorized access to the private key/certificate, remove the certificate and private key files once done with all necessary import steps.

 

  7. On the target cluster (secondary DSC), set the cluster certificate to the certificate imported above.

First, retrieve the certificate ID:

# isi certificate server list -v

Next, configure the cluster certificate ID:

# isi sync settings modify --cluster_certificate_id [certificate_id]

Where:

Item Description
[certificate id] The ID of the cluster certificate

 

  8. On the target cluster (secondary DSC), add the source cluster's (primary DSC) certificate as a peer certificate.
# isi sync certificates peer import [source certificate path]

Where:

Item Description
[source certificate path] The path to the source cluster/primary DSC certificate file.

To prevent unauthorized access to the private key/certificate, remove certificate and private key files once done with all necessary import steps.

On the target cluster (secondary DSC), configure the global Open Certificate Status Protocol (OCSP) settings.

# isi sync settings modify

 --ocsp-issuer-certificate-id=[ocsp issuer certificate id]

 --ocsp-address=[OCSP server URI]

Where:

Item Description
[ocsp issuer certificate id] The ID of the certificate as registered in the PowerScale certificate manager.
[OCSP server URI] The URI of the OCSP responder.

To find the OCSP issuer certificate ID:

# isi certificate authority list -v

This assumes that the OCSP issuer certificate file has already been imported into the PowerScale certificate manager.

  9. Finally, for any pre-existing policies, configure the following OCSP settings on the source cluster (primary DSC).
# isi sync policies modify [policy name]

 --ocsp-issuer-certificate-id=[ocsp issuer certificate id]

 --ocsp-address=[OCSP server URI]

Where:

Item Description
[ocsp issuer certificate id] The ID of the certificate as registered in the PowerScale certificate manager.

To find the OCSP issuer certificate ID:

# isi certificate authority list -v

At this point, the SyncIQ certificate configuration work should be complete.
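As a final sanity check, the imported certificates and replication settings can be reviewed on each cluster from the CLI, for example (subcommand availability may vary by release):

# isi sync certificates peer list

# isi sync settings view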

OneFS Additional Security Hardening – Part 2

As mentioned in previous articles in this series, applying a hardening profile is one of multiple tasks that are required in order to configure a STIG-compliant PowerScale cluster. These include:

Component Tasks
Audit Configure remote syslog servers for auditing.
Authentication Configure secure auth provider, SecurityAdmin account, and default restricted shell.
CELOG Create event channel for security officers and system admin to monitor /root and /var partition usage, audit service, security verification, and account creation.
MFA & SSO Enable and configure multi-factor authentication and single sign-on.
NTP Configure secure NTP servers with SHA256 keys.
SMB Configure SMB global settings and defaults.

Enable SMB encryption on shares.

SNMP Enable SNMP and configure SNMPv3 settings.
SyncIQ Configure SyncIQ to use CA certificates so both the source and target clusters (primary and secondary DSCs) have both Server Authentication and Client Authentication set in their Extended Key Usages fields.

In this article, we’ll cover the specific configuration requirements and details of the NTP, SMB, SNMP components using the OneFS CLI.

NTP Setup

  1. When implementing a secure configuration for the OneFS NTP service, create an NTP key file and populate it with NTP server key hashes.

To add secure NTP servers to the OneFS configuration, first create an NTP keys file. This can be accomplished via the following CLI syntax:

# echo "[key index] sha256 [SHA hash]" > [keyfile]

Where:

Item Description
[key index] The index (increasing from 1) of the key hash.
[SHA hash] The SHA256 hash identifying the NTP server.
[keyfile] The path to the NTP key file.

Append as many additional key entries as are necessary. The ntp.keys(5) man page provides detailed information on the NTP key file format.

  2. Next, configure OneFS to use this NTP key file.
# isi ntp settings modify --key-file /ifs/ntp.keys
  3. The following CLI syntax can be used to configure NTP servers.
# isi ntp servers create [server hostname/IP] --key [key index]

Where:

Item Description
[server hostname/IP] The fully qualified domain name (FQDN) or IP address of the NTP server.
[key index] The key used by this particular server in the NTP keys file configured above.

Note that STIG requirements explicitly state that more than one (1) NTP server is required for compliance.
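For example, a hedged sketch adding two keyed NTP servers (the hostnames here are placeholders):

# isi ntp servers create ntp1.example.com --key 1

# isi ntp servers create ntp2.example.com --key 2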

SMB setup

  1. Deploying SMB in a hardened environment typically involves enabling SMB3 encryption, security signatures, and disabling unencrypted access to shares. To accomplish this, first configure the global settings and defaults as follows.
# isi smb settings global modify --support-smb3-encryption true
 --enable-security-signatures true --require-security-signatures true
 --reject-unencrypted-access true


# isi_gconfig registry.Services.lwio.Parameters.Drivers.srv.SupportSmb1=0


# isi_gconfig registry.Services.lwio.Parameters.Drivers.rdr.Smb1Enabled=0
  2. Next, update the per-share SMB settings to enable SMB encryption.
# isi smb shares modify [share_name] --smb3-encryption-enabled true

SNMP Setup

  1. The following CLI command can be used to enable the OneFS SNMP v3 service and configure its settings and password.
# isi snmp settings modify --service=true --snmp-v3-access=true --snmp-v3-password=[password]

In the next and final article in this series, we'll focus on the remaining topic in the list: secure SyncIQ configuration.

OneFS Additional Security Hardening – Part 1

When configuring security hardening on OneFS 9.5 or later, one thing to note is that, even with the STIG profile activated, not all the rules are automatically marked as ‘applied’. Specifically:

# isi hardening report view STIG | grep "Not Applied"

check_stig_celog_alerts                        Cluster   Not Applied Military Unique Deployment Guide manually configured CELOG settings.

check_synciq_default_ocsp_settings             Cluster   Not Applied /sync/settings/:cluster_certificate_id

check_synciq_policy_ocsp_settings              Cluster   Not Applied /sync/policies/:ocsp_issuer_certificate_id

check_multiple_ntp_servers_configured          Cluster   Not Applied /protocols/ntp/servers:total

set_auth_webui_sso_mfa_idp                     Cluster   Not Applied auth/providers/saml-services/idps/System

set_auth_webui_sso_mfa_sp_host                 Cluster   Not Applied auth/providers/saml-services/sp?zone=System:hostname

Applying a hardening profile is one of multiple tasks that are required in order to configure a STIG-compliant PowerScale cluster. These include:

Component Tasks
Audit Configure remote syslog servers for auditing.
Authentication Configure secure auth provider, SecurityAdmin account, and default restricted shell.
CELOG Create event channel for security officers and system admin to monitor /root and /var partition usage, audit service, security verification, and account creation.
MFA & SSO Enable and configure multi-factor authentication and single sign-on.
NTP Configure secure NTP servers with SHA256 keys.
SMB Configure SMB global settings and defaults.

Enable SMB encryption on shares.

SNMP Enable SNMP and configure SNMPv3 settings.
SyncIQ Configure SyncIQ to use CA certificates so both the source and target clusters (primary and secondary DSCs) have both Server Authentication and Client Authentication set in their Extended Key Usages fields.

Over the course of the next two blog articles, we’ll cover the specific configuration requirements and details of each of these components via the OneFS CLI.

In this article, we'll focus on the first four of these tasks: audit, authentication, CELOG, and MFA/SSO setup.

Audit Setup

  1. To set up secure auditing, first configure the remote syslog server(s). Note that, while the configuration differentiates between configuration, protocol, and system auditing, these can be sent to the same central syslog server(s). When complete, these syslog servers can be added to the OneFS audit configuration via the following CLI syntax:
# isi audit settings global modify --config-syslog-servers=[server FQDN/IP] --protocol-syslog-servers=[server FQDN/IP] --system-syslog-servers=[server FQDN/IP]
  2. Also consider adding the cluster certificate to the audit settings for mutual Transport Layer Security (TLS) authentication.
# isi audit certificates syslog import [certificate_path] [key_path]

To prevent unauthorized access to the private key/certificate, the recommendation is to remove certificate and private key files once the necessary import steps have been completed.

Authentication Setup

  1. Set the default shell for any new users created in the Local Provider.
# isi auth local modify System --login-shell=/usr/local/restricted_shell/bin/restricted_shell.py
  2. Next, configure the remote authentication provider. This could be Kerberos, LDAP, or Active Directory. For more information, see the OneFS 9.5 CLI Administration Guide.

Note that all Active Directory users must have an e-mail address configured for them for use with ADFS multi-factor authentication (MFA).

Every Active Directory user must have a home directory created on the cluster, containing the correct public key in ~/.ssh/authorized_keys for the certificate presented by SSH clients (SecureCRT, PuTTY-CAC, etc).

If using Active Directory, the recommendation is to enable LDAP encryption, commonly referred to as ‘LDAP sign and seal’. For example:

# isi auth ads modify [provider-name] --ldap-sign-and-seal true

Additionally, the ‘machine password lifespan’ should be configured to a value of 60 days or less:

# isi auth ads modify [provider-name] --machine-password-lifespan=60D

Where [provider-name] is the name of the chosen Active Directory provider.

  3. Finally, identify a remote-authenticated user and assign them administrative privileges.
# isi auth roles modify SecurityAdmin --add-user [username]

# isi auth roles modify SystemAdmin --add-user [username]

Where [username] is the name of the chosen administrative user.

CELOG Setup

  1. For CELOG security setup, create an event channel for the required ISSO/SA alerts and configure appropriate event thresholds.

The following events need to send alerts on a channel monitored by an organization’s Information Systems Security Officers (ISSOs) or System Administrators (SAs):

Event ID Event
100010001 The /var partition is near capacity.
100010002 The /var/crash partition is near capacity.
100010003 The root partition is near capacity.
400160002 Audit system cannot provide service.
400160005 Audit daemon failed to persist events.
400200001 Security verification check failed.
400200002 Security verification successfully ran.
400260000 User account(s) created/updated/removed.

The event channel can be created as follows:

# isi event channels create [channel name] [type] [options]

Next, the thresholds for the above event IDs can be set:

# isi event thresholds modify 100010001 --info 74 --warn 75

# isi event thresholds modify 100010002 --warn 75

# isi event thresholds modify 100010003 --warn 75

# isi event alerts create [event name 1] NEW [channel name]

 --eventgroup 100010001  --eventgroup 100010002 --eventgroup 100010003

 --eventgroup 400160002 --eventgroup 400160005 --eventgroup 400200001

 --eventgroup 400200002 --eventgroup 400260000



# isi event alerts create [event name 2] SEVERITY_INCREASE [channel name]

 --eventgroup 100010001 --eventgroup 100010002 --eventgroup 100010003

 --eventgroup 400160002 --eventgroup 400160005 --eventgroup 400200001

 --eventgroup 400200002 --eventgroup 400260000

Where:

Item Description
[channel name] The name of the newly configured event channel.
[event name 1] and [event name 2] The names of the events that will trigger alerts when a new event occurs or when an event increases in severity, respectively.

Multi-Factor Authentication (MFA)/Single Sign-On (SSO) Setup

  1. First, configure the SSO service provider. This can be achieved as follows:
# isi auth sso sp modify --hostname=[node IP or cluster FQDN]

Where [node IP or cluster FQDN] is the IP address of a node in the PowerScale cluster or the fully qualified domain name (FQDN) of the PowerScale cluster.

  2. Next, configure the Identity Provider (IdP) as follows:
# isi auth sso idps create [name] [options]
  3. Enable MFA/SSO.
# isi auth sso settings modify --sso-enabled=true
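As a brief illustration, assuming a cluster FQDN of 'cluster.example.com' and an IdP definition named 'adfs-idp' (both hypothetical, with the IdP creation options dependent on your identity provider's metadata):

# isi auth sso sp modify --hostname=cluster.example.com

# isi auth sso idps create adfs-idp [options]

# isi auth sso settings modify --sso-enabled=true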

At this point, we’ve covered the configuration and setup of the first four components in the list.

In the next article in this series, we’ll focus on the remaining topics: secure NTP, SMB, SNMP, and SyncIQ configuration.

 

OneFS Security Hardening – Management and Troubleshooting

In the previous article, we took a look at the preparation and activation of OneFS security hardening. Now we turn our attention to its management and troubleshooting.

Once the STIG profile has been successfully activated, the bulk of the administrative attention is typically then focused on monitoring. However, applying a hardening profile is one of multiple steps needed to configure a truly STIG-compliant PowerScale cluster. These additional components include:

Component Tasks
Audit Configure remote syslog servers for auditing.
Authentication Configure secure auth provider, SecurityAdmin account, and default restricted shell.
CELOG Create an event channel for security officers and system administrators to monitor root and /var partition usage, the audit service, security verification, and account creation.
MFA Enable and configure multi-factor authentication.
NTP Configure secure NTP servers with SHA256 keys.
SMB Configure SMB global settings and defaults, and enable SMB encryption on shares.
SNMP Enable SNMP and configure SNMPv3 settings.
SSO Configure single sign-on.
SyncIQ Configure SyncIQ to use CA certificates so both the source and target clusters (primary and secondary DSCs) have both Server Authentication and Client Authentication set in their Extended Key Usages fields.

We will cover the above topics and related tasks in detail in the next article in this series.

When hardening is activated, the security hardening engine reads the STIG configuration from its config files. Sets of rules, or config items, are applied to the hardening configuration to increase security and/or ensure STIG compliance. These rules are grouped by profile, which contain collections of named rules. Profiles are now stored in separate .xml files under /etc/isi_hardening/profiles.

Currently there is just one profile available (STIG), but the infrastructure is in place to support additional profiles as and when they are required.

Similarly, the individual rules are stored in separate .xml files under /etc/isi_hardening/rules.

In OneFS 9.5 and later, these rules are grouped by the functional area affected (rather than by release), and can now apply to platform API configuration ‘collections’. For example, a rule can be applied to all NFS exports or all SyncIQ policies. In addition to actionable rules, ‘check-only’ rules are supported which apply no changes.

As may be apparent by now, OneFS security hardening is currently managed from the CLI and platform API only. There is currently no WebUI area for hardening configuration.

Note that it is strongly advised not to perform administrative actions until STIG security profile activation has completed across all nodes in the cluster.

When it comes to troubleshooting the enablement and operation of a STIG hardening profile on a cluster, there are a couple of useful places to look.

The first step is to check the hardening report. In OneFS 9.5 and later, the hardening engine reporting infrastructure enables detailed reports to be generated that indicate which hardening rules are applied or not, as well as the cluster’s overall compliance status. For example, the following report is from a non-compliant cluster:

# isi hardening reports create

...............Hardening operation complete.

# isi hardening reports list

Name  Applied  Status        Creation Date            Report Age

-----------------------------------------------------------------

STIG  No       Not Compliant Sat Apr 22 04:28:40 2023 2m1s

-----------------------------------------------------------------

Total: 1

# isi hardening reports view STIG | more

Name                              Location  Status      Setting

----------------------------------------------------------------------------------------------

logout_zsh_clear_screen           Node 1    Not Applied /etc/zlogout

logout_profile_clear_screen       Node 1    Not Applied /etc/profile

logout_csh_clear_screen           Node 1    Not Applied /etc/csh.logout

require_password_single_user_mode Node 1    Not Applied /etc/ttys

set_password_min_length_pam_01    Node 1    Not Applied /etc/pam.d/system

set_password_min_length_pam_02    Node 1    Not Applied /etc/pam.d/other

set_password_min_length_pam_03    Node 1    Not Applied /etc/pam.d/passwd

set_password_min_length_pam_04    Node 1    Not Applied /etc/pam.d/passwd

disable_apache_proxy              Node 1    Not Applied /etc/mcp/templates/isi_data_httpd.conf

disable_apache_proxy              Node 1    Not Applied /etc/mcp/templates/isi_data_httpd.conf

disable_apache_proxy              Node 1    Not Applied /etc/mcp/templates/isi_data_httpd.conf

set_shell_timeout_01              Node 1    Not Applied /etc/profile

set_shell_timeout_02              Node 1    Applied     /etc/zshrc

set_shell_timeout_03              Node 1    Not Applied /etc/zshrc

set_shell_timeout_04              Node 1    Not Applied /etc/csh.cshrc

set_dod_banner_02                 Node 1    Not Applied symlink:/etc/issue

check_node_default_umask          Node 1    Applied     umask

--More--(byte 2185)

As indicated in the truncated output above, the vast majority of the hardening elements on node 1 of this cluster have not been successfully applied. Note that these reports can be generated regardless of cluster hardening status.

Next, there are a couple of logfiles that can often yield useful clues and information:

  • /var/log/hardening.log
  • /var/log/hardening_engine.log
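For example, both logs can be tailed while re-running an apply, or searched for failures, using standard shell commands:

# tail -f /var/log/hardening.log /var/log/hardening_engine.log

# grep -iE 'error|fail' /var/log/hardening.log /var/log/hardening_engine.log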

After scrutinizing these logfiles, manually running the hardening worker process and monitoring its output (stdout and stderr) is typically the next course of action.

The hardening service can be shut down if it’s running as follows:

# isi services -a | grep -i hard

   hardening_service    Hardening Service                        Enabled

# isi services -a hardening_service disable

# ps -auxw | grep -i hard

#

The following syntax can be used to manually run the hardening worker process on a node:

# /usr/bin/isi_hardening/hardening_worker.py --profile STIG --action <action>

In the above, ‘<action>’ can be one of ‘apply’, ‘defaults’, or ‘report_create’.
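For example, substituting the ‘report_create’ action into the syntax above generates a compliance report directly from the worker on that node:

# /usr/bin/isi_hardening/hardening_worker.py --profile STIG --action report_create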

De-activating Hardening

After applying the STIG hardening profile to a OneFS 9.5 or later cluster, it is possible to re-apply the default (non-hardened) configuration with the following CLI syntax, which will undo the changes that hardening invoked. For example:

# isi hardening disable STIG

.........Hardening operation complete.

Note that with OneFS 9.5 and later, the ‘disable’ operation attempts to undo the effects of hardening, but does not guarantee a full restore of the prior cluster configuration.

This differs from OneFS 9.4 and earlier releases, which use the ‘isi hardening revert’ CLI command and process:

# isi hardening revert

Revert Started

This may take several minutes

……

Reverting Hardening profile successful

#

# isi hardening status

Cluster Name:  TME1

Hardening Status:  Not Hardened

Note that OneFS 9.5 security hardening does not support CEE (or ICAP and a number of other protocols). Specifically, CEE currently does not support TLS, and the STIG hardening profile disables non-TLS communications.

The STIG security profile in OneFS 9.5 and later automatically enables the host-based firewall, enforcing the specific ports and protocols required for the various cluster services. Ensure that the default OneFS port numbers and protocols do not conflict with any custom port or protocol configuration.

After the STIG security profile is activated on a cluster, certain STIG rules do not apply. For example, when a new user is added to the cluster, some values are system defaults rather than STIG defaults. After configuration changes are complete, reapply the STIG security profile to restore the STIG defaults. The recommendation is to generate and check a hardening report frequently, especially after running administrative commands, to detect whether any parameters are no longer STIG compliant.
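For example, after completing an administrative change, the profile can be re-applied and a fresh report generated; piping the report through grep is simply one convenient way to surface any non-compliant rules:

# isi hardening apply STIG

# isi hardening reports create

# isi hardening reports view STIG | grep -i 'not applied'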

Additionally, the following table lists the components and protocols that are not currently supported by OneFS 9.5 STIG hardening or do not meet FIPS compliance requirements:

Protocol / Component Detail
SMB without encryption Without SMB3 encryption enabled, the SMB protocol relies on weak cryptography, and does not meet FIPS requirements.
NFS without krb5p The NFS protocol without Kerberos (krb5p) relies on weak cryptography, and does not meet FIPS requirements.
HDFS The Hadoop HDFS protocol relies on weak cryptography, and does not meet FIPS requirements.
S3 via HTTP The S3 protocol over HTTP relies on weak cryptography, and does not meet FIPS requirements.
CEE Dell Common Event Enabler (CEE) relies on weak cryptography, and does not meet FIPS requirements.
ICAP/CAVA ICAP and CAVA antivirus protocols rely on weak cryptography, and do not meet FIPS requirements.
NIS NIS protocol relies on weak cryptography, and does not meet FIPS requirements.
SFTP SFTP protocol lacks file audit capabilities.
SmartLock A cluster configured for SmartLock compliance mode is incompatible with STIG hardening.

OneFS Security Hardening – Application and Activation

In the first article in this series, we took a look at the architecture and enhancements to security hardening in OneFS 9.5. Now we turn our attention to its preparation, configuration, and activation.

Applying a hardening profile is just one of multiple steps required in order to configure a STIG-compliant PowerScale cluster.

OneFS 9.5 security hardening comes pre-installed on a cluster, but not activated by default. Hardening is a licensed feature, and there are no changes to the licensing requirements or structure for OneFS 9.5 and later.

As such, the general process to apply and activate security hardening on a OneFS 9.5 or later cluster is as follows:

The specifics for each step are covered below:

  1. Revert hardening on a cluster running OneFS 9.4 or earlier prior to upgrade.

Upgrading from a STIG-hardened OneFS 9.4 or earlier cluster to OneFS 9.5 and later is not supported:

Cluster Type Upgrade Details
Non-hardened cluster Upgrade to OneFS 9.5 on non-STIG hardened clusters is straightforward.
Hardened cluster Upgrade from a STIG-hardened pre-OneFS 9.5 cluster to OneFS 9.5 is not supported. Revert cluster to a non-hardened state prior to upgrade to OneFS 9.5.

As such, if the cluster currently has hardening enabled, this must be reverted before upgrading to OneFS 9.5 or later.

To accomplish this, first log in to the cluster’s CLI using an account that holds the ‘ISI_PRIV_HARDENING’ RBAC privilege.

OneFS security hardening requires a license in order to be activated. If it is licensed, hardening can be applied as follows:

# isi hardening apply STIG
Apply Started
This may take several minutes
……
Applied Hardening profile successfully.
#

Once applied, hardening can be verified as follows:

# isi hardening status
Cluster Name:  TME1
Hardening Status:  Hardened
Profile:  STIG
Following is the nodewise status:
TME1-1 :  Enabled
TME1-2 :  Enabled
TME1-3 :  Enabled
TME1-4 :  Enabled

Hardening can be easily removed on clusters running OneFS 9.4 or earlier:

# isi hardening revert
Revert Started
This may take several minutes
……
Reverting Hardening profile successful
#
# isi hardening status
Cluster Name:  TME1
Hardening Status:  Not Hardened
  2. Upgrade the cluster to OneFS 9.5 or later.

The cluster must be running OneFS 9.5 or later in order to activate STIG hardening. If upgrading from an earlier release, the OneFS 9.5 or later upgrade must be committed before enabling hardening.

Upgrading a cluster on which security hardening has not been activated to OneFS 9.5 or later is straightforward and can be accomplished either by a simultaneous or rolling reboot strategy.

For example, to start a rolling upgrade, which is the default, run:

# isi upgrade cluster start <upgrade_image>

Similarly, the following CLI syntax will initiate a simultaneous upgrade:

# isi upgrade cluster start --simultaneous <upgrade_image>

Since OneFS supports rolling back to the previous version, an upgrade must be committed in order to complete it:

# isi upgrade cluster commit

The isi upgrade view CLI command can be used to monitor how the upgrade is progressing:

# isi upgrade view

Or, for an interactive session:

# isi upgrade view --interactive
  3. Install the hardening license.

To enable STIG hardening, first check that hardening is licensed on the cluster:

# isi license list | grep -i harden
HARDEN      4 Nodes     4 Nodes     Evaluation

A hardening license can be added as follows:

# isi license add <path_to_license_file>

Alternatively, a 90-day trial license can be activated on a lab/test cluster to evaluate STIG hardening:

# isi license add --evaluation HARDENING

If a current OneFS hardening license is not available when attempting to activate security hardening on a cluster, the following warning will be returned:

# isi hardening apply STIG

The HARDENING application is not currently installed. Please contact your Isilon account team for more information on evaluating and purchasing HARDENING.
  4. Configure a compliant password hash.

Before activating security hardening with the STIG profile, the password hash type should be set to use SHA512. For example:

# isi auth file modify System --password-hash-type=SHA512

NTLM support and authentication for all file protocols has been disabled for this provider due to change of password hash type.

# isi auth local modify System --password-hash-type=SHA512

Next, the account of last resort (ALR), which is ‘root’ on a PowerScale cluster, should be set to use this updated password hash type.

# isi auth users change-password root

If this step is skipped, attempts to apply hardening will fail with the following warning:

The hardening request was not accepted:

Account of last resort does not have a password set with a supported hash type (SHA256, SHA512): root.

The hardening profile was not applied.

Please see the Security Configuration Guide for guidance on how to set compatible account passwords.

The SSH key exchange algorithms should also be updated at this time with the following CLI syntax:

# isi ssh settings modify --kex-algorithms 'diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,ecdh-sha2-nistp384'

Finally, update the SSH ciphers as follows:

# isi ssh settings modify --ciphers 'aes256-ctr,aes256-gcm@openssh.com'
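If desired, the resulting key exchange and cipher configuration can then be reviewed; this assumes the ‘isi ssh settings view’ syntax is available in your release:

# isi ssh settings view | grep -iE 'kex|cipher'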
  5. Activate STIG hardening.

The next step involves actually applying the STIG hardening profile. This can be accomplished as follows:

# isi hardening apply STIG
..............Hardening operation complete.

Note that password restrictions are only enforced for password changes that occur after applying hardening.

After applying the STIG hardening profile, it is possible to re-apply the default (non-hardened) configuration with the following CLI syntax, which will undo the changes that hardening invoked. For example:

# isi hardening disable STIG
.........Hardening operation complete.

Note that with OneFS 9.5 and later, the ‘disable’ operation attempts to undo the effects of hardening, but does not guarantee a full restore of the prior cluster configuration. This differs from the ‘isi hardening revert’ CLI command and process used in OneFS 9.4 and earlier, described in step 1 above.

  6. Verify the hardening configuration.

Finally, verify that the STIG hardening configuration was successful. This will be indicated by a status of ‘Applied’. For example:

# isi hardening list
Name  Description                       Status
-----------------------------------------------
STIG  Enable all STIG security settings Applied
-----------------------------------------------
Total: 1

Additionally, a report can be generated that provides a detailed listing of all the individual rules and their per-node status. For example:

# isi hardening reports view STIG
logout_zsh_clear_screen           Node 1   Applied     /etc/zlogout                       
logout_profile_clear_screen       Node 1   Applied     /etc/profile                       
logout_csh_clear_screen           Node 1   Applied     /etc/csh.logout                    
require_password_single_user_mode Node 1   Applied     /etc/ttys                           
set_password_min_length_pam_01    Node 1   Applied     /etc/pam.d/system                  
set_password_min_length_pam_02    Node 1   Applied     /etc/pam.d/other                   
set_password_min_length_pam_03    Node 1   Applied     /etc/pam.d/passwd                  
set_password_min_length_pam_04    Node 1   Applied     /etc/pam.d/passwd                  
disable_apache_proxy              Node 1   Applied     /etc/mcp/templates/isi_data_httpd.conf
disable_apache_proxy              Node 1   Applied     /etc/mcp/templates/isi_data_httpd.conf
disable_apache_proxy              Node 1   Applied     /etc/mcp/templates/isi_data_httpd.conf
set_shell_timeout_01              Node 1   Applied     /etc/profile                       
set_shell_timeout_02              Node 1   Applied     /etc/zshrc                         
set_shell_timeout_03              Node 1   Applied     /etc/zshrc                         
set_shell_timeout_04              Node 1   Applied     /etc/csh.cshrc                      
set_dod_banner_02                 Node 1   Applied     symlink:/etc/issue                 
check_node_default_umask          Node 1   Applied     umask                              
set_celog_snmp_use_fips                        Cluster   Applied     N/A                   
disable_supportassist                          Cluster   Applied     -                     
disable_usb_ports                              Cluster   Applied     /security/settings:usb_ports_disabled
disable_ndmpd                                  Cluster   Applied     /protocols/ndmp/settings/global:service
enable_smtp_ssl                                Cluster   Applied     /1/cluster/email:smtp_auth_security
enable_onefs_cli                               Cluster   Applied     /security/settings:restricted_shell_enabled
set_min_password_percent_of_characters_changed Cluster   Applied     /16/auth/providers/local:password_percent_changed
set_ads_ldap_sign_and_seal                     Cluster   Applied     -                     
set_ads_ldap_sign_and_seal_default             Cluster   Applied     registry.Services.lsass.Parameters.Providers.ActiveDirectory.LdapSignAndSeal
set_ads_machine_password_changes               Cluster   Applied     -                     
limit_ads_machine_password_lifespan            Cluster   Applied     -                     
enable_firewall                                Cluster   Applied     /network/firewall/settings:enabled
disable_audit_log_delete                       Cluster   Applied     /ifs/.ifsvar/audit/log_delete
set_audit_retention_period                     Cluster   Applied     /audit/settings/global:retention_period
disable_webui_access_ran                       Cluster   Applied     webui_ran_access      
set_ssh_config_client_alive_interval           Cluster   Applied     client_alive_interval 
set_ssh_config_client_alive_count              Cluster   Applied     client_alive_count_max
set_nfs_security_flavors                       Cluster   Applied     /protocols/nfs/exports:security_flavors
set_nfs_security_flavors                       Cluster   Applied     /protocols/nfs/exports:security_flavors
set_nfs_security_flavors                       Cluster   Applied     /protocols/nfs/exports:security_flavors
set_nfs_security_flavors                       Cluster   Applied     /protocols/nfs/exports:security_flavors
set_nfs_security_flavors                       Cluster   Applied     /protocols/nfs/exports:security_flavors
set_nfs_default_security_flavors               Cluster   Applied     /protocols/nfs/settings/export:security_flavors
set_nfs_default_security_flavors               Cluster   Applied     /protocols/nfs/settings/export:security_flavors
set_nfs_default_security_flavors               Cluster   Applied     /protocols/nfs/settings/export:security_flavors
set_nfs_default_security_flavors               Cluster   Applied     /protocols/nfs/settings/export:security_flavors
set_nfs_default_security_flavors               Cluster   Applied     /protocols/nfs/settings/export:security_flavors
set_s3_https_only                              Cluster   Applied     /protocols/s3/settings/global:https_only
check_ipmi_enabled                             Cluster   Applied     -                      
set_cnsa_crypto_http                           Cluster   Applied     cipher_suites         
set_cnsa_crypto_webui                          Cluster   Applied     cipher_suites         
disable_hdfs                                   Cluster   Applied     registry.Services.lsass.Parameters.Zones.System.HdfsEnabled
disable_webhdfs                                Cluster   Applied     registry.Services.lsass.Parameters.Zones.System.WebHdfsEnabled
disable_http_basic_authentication              Cluster   Applied     /protocols/http/settings:basic_authentication
disable_http_dav                               Cluster   Applied     /protocols/http/settings:dav
enable_http_integrated_authentication          Cluster   Applied     /protocols/http/settings:integrated_authentication
set_apache_loglevel                            Cluster   Applied     log_level              
set_apache_inactive_timeout                    Cluster   Applied     /protocols/http/settings:inactive_timeout
set_apache_session_max_age                     Cluster   Applied     /protocols/http/settings:session_max_age
disable_cee                                    Cluster   Applied     /audit/settings/global:cee_server_uris
check_stig_celog_alerts                        Cluster   Not Applied Military Unique Deployment Guide manually configured CELOG settings.
set_auth_concurrent_session_limit              Cluster   Applied     16/auth/settings/global:concurrent_session_limit
set_ldap_tls_revocation_check_level            Cluster   Applied     -                     
set_ldap_default_tls_revocation_check_level    Cluster   Applied     /auth/settings/global:default_ldap_tls_revocation_check_level
set_synciq_require_encryption                  Cluster   Applied     14/sync/settings:encryption_required
check_synciq_default_ocsp_settings             Cluster   Not Applied /sync/settings/:cluster_certificate_id
check_synciq_policy_ocsp_settings              Cluster   Not Applied /sync/policies/:ocsp_issuer_certificate_id
check_daemon_user_disabled                     Cluster   Applied     /auth/users/USER:daemon/:enabled
check_multiple_ntp_servers_configured          Cluster   Not Applied /protocols/ntp/servers:total
check_celog_smtp_channels_use_tls              Cluster   Applied     -                     
set_apache_service_timeout                     Cluster   Applied     /protocols/http/settings:service_timeout
set_dm_tls_revocation_check_level              Cluster   Applied     /datamover/certificates/settings/:revocation_setting
check_one_account_of_last_resort               Cluster   Applied     Number of UID:0 accounts configured
set_krb5_default_tgs_enctypes                  Cluster   Applied     /auth/settings/krb5/defaults:default_tgs_enctypes
set_krb5_default_tkt_enctypes                  Cluster   Applied     /auth/settings/krb5/defaults:default_tkt_enctypes
set_krb5_permitted_enctypes                    Cluster   Applied     /auth/settings/krb5/defaults:permitted_enctypes
set_krb5_preferred_enctypes                    Cluster   Applied     /auth/settings/krb5/defaults:preferred_enctypes
set_local_lockouts_duration                    Cluster   Applied     /auth/providers/local/:lockout_duration
set_local_lockouts_threshold                   Cluster   Applied     /auth/providers/local/:lockout_threshold
set_local_lockouts_window                      Cluster   Applied     /auth/providers/local/:lockout_window
set_local_max_password_age                     Cluster   Applied     /auth/providers/local/:max_password_age
set_local_min_password_age                     Cluster   Applied     /auth/providers/local/:min_password_age
set_local_password_chars_changed               Cluster   Applied     /auth/providers/local/:min_password_length
set_local_max_inactivity                       Cluster   Applied     /auth/providers/local/:max_inactivity_days
set_global_failed_login_delay                  Cluster   Applied     /auth/settings/global:failed_login_delay_time
set_ldap_require_secure_connection             Cluster   Applied     -                     
set_ldap_do_not_ignore_tls_errors              Cluster   Applied     -                     
set_ldap_tls_protocol_min_version              Cluster   Applied     -                     
set_ldap_ntlm_support                          Cluster   Applied     -                     
disable_nis                                    Cluster   Applied     -                     
disable_duo                                    Cluster   Applied     /auth/providers/duo/:enabled
set_ntlm_support_file                          Cluster   Applied     /auth/providers/file/:ntlm_support
check_password_hashes                          Cluster   Applied     lsa-file-provider:System:root password hash
set_file_enabled                               Cluster   Applied     /auth/users/<USER>:enabled
set_local_disabled_when_inactive               Cluster   Applied     /auth/users/<USER>:disabled_when_inactive
set_local_disabled_when_inactive_default       Cluster   Applied     registry.Services.lsass.Parameters.Providers.Local.DefaultDisableWhenInactive
set_auth_webui_sso_mfa_enabled                 Cluster   Applied     auth/providers/saml-services/settings?zone=System:sso_enabled
set_auth_webui_sso_mfa_idp                     Cluster   Not Applied auth/providers/saml-services/idps/System
set_auth_webui_sso_mfa_sp_host                 Cluster   Not Applied auth/providers/saml-services/sp?zone=System:hostname
set_auth_webui_sso_mfa_required                Cluster   Applied     authentication_mode   
disable_remotesupport                          Cluster   Applied     /auth/users/USER:remotesupport/:enabled
enable_audit_1                                 Cluster   Applied     /audit/settings/global:protocol_auditing_enabled
enable_audit_2                                 Cluster   Applied     /audit/settings:syslog_forwarding_enabled
disable_vsftpd                                 Cluster   Applied     /protocols/ftp/settings:service
disable_snmpv1_v2                              Cluster   Applied     5/protocols/snmp/settings:snmp_v1_v2c_access
set_snmp_v3_auth_protocol_sha                  Cluster   Applied     5/protocols/snmp/settings:snmp_v3_auth_protocol
disable_srs                                    Cluster   Applied     /esrs/status:enabled  
set_password_min_length                        Cluster   Applied     /auth/providers/local/:min_password_length
set_min_password_complexity                    Cluster   Applied     /auth/providers/local/:password_complexity
set_password_require_history                   Cluster   Applied     /auth/providers/local/:password_history_length
disable_coredump_minidump                      Cluster   Applied     /etc/mcp/templates/sysctl.conf
set_dod_banner_01                              Cluster   Applied     /cluster/identity:motd_header
set_listen_on_ip_controlpath                   Cluster   Applied     listen_on_ip           
set_listen_on_ip_datapath                      Cluster   Applied     listen_on_ip          
enable_fips_mode                               Cluster   Applied     /security/settings:fips_mode_enabled
disable_kdb                                    Cluster   Applied     /etc/mcp/templates/sysctl.conf
disable_basic_auth                             Cluster   Applied     auth_basic            
disable_cava                                   Cluster   Applied     /avscan/settings:service_enabled
require_smb3_encryption                        Cluster   Applied     /protocols/smb/settings/global:support_smb1

OneFS Security Hardening

Over the course of the last few months, the topics for these blog articles have primarily focused on cluster security, and the variety of supporting features and enhancements that OneFS 9.5 introduced to this end. These include:

Component Enhancement
Cryptography FIPS 140-2 data-in-flight encryption for major protocols, FIPS 140-2 data at rest through SEDs, SEDs master key rekey, and TLS 1.2 support.
Public Key Infrastructure Common Public Key Infrastructure (PKI) library, providing digital signature and encryption capabilities.
Certificates PKI to issue, maintain, and revoke public key certificates.
Firewall Host-based firewall, permitting restriction of the management interface to a dedicated subnet and hosts to specified IP pools.
Audit OneFS system configuration auditing via CEE.
Authentication Multifactor authentication (MFA), single sign-on (SSO) through SAML for the WebUI, and PKI-based authentication.
HTTP HTTP Service Separation.
IPv6 IPV6-only network support for the USGv6R1 standard.
Restricted Shell Secure shell with limited access to cluster command line utilities. Eliminates areas where commands and scripts could be run and files modified maliciously and unaudited.

While these features and tools can be activated, configured, and controlled manually, they can also be enabled automatically by a OneFS security policy, under the purview of the OneFS Hardening Engine.

While security hardening has been a salient part of OneFS since 7.2.1, the underlying infrastructure saw a significant redesign and augmentation in OneFS 9.5. A primary motivation for this overhaul was to comply with the current stringent US Federal security mandates and ready PowerScale for inclusion in the Department of Defense Information Networks (DoDIN) Approved Product List (APL). Specifically, compliance with the applicable DoD Security Requirements Guides (SRGs) and Security Technical Implementation Guides (STIGs).

While retaining its legacy functionality, the security hardening engine in OneFS 9.5 expands its scope, scale, and accountability. The basic hardening architecture is as follows:

When hardening is activated, the security hardening engine reads the STIG configuration from its config files. Sets of rules, or config items, are applied to the hardening configuration to increase security and/or ensure STIG compliance. These rules are grouped by profile, which contain collections of named rules. Profiles are now stored in separate .xml files under /etc/isi_hardening/profiles.

# ls /etc/isi_hardening/profiles

profile_stig.xml

Currently there is just one profile available (STIG), but the infrastructure is in place to support additional profiles as and when they are required. As of OneFS 9.5, the STIG profile contains over 100 rules.

Similarly, the individual rules are stored in separate .xml files under /etc/isi_hardening/rules.

# ls /etc/isi_hardening/rules

rules_apache.xml        rules_celog.xml         rules_pki_ocsp.xml

rules_audit.xml         rules_fips.xml          rules_shell_timeout.xml

rules_auth.xml          rules_misc.xml          rules_umask.xml

rules_banners.xml       rules_password.xml

In OneFS 9.5 and later, these rules are grouped by functional area affected (as opposed to by release in earlier versions), and can now apply to platform API configuration ‘collections’. For example, a rule can be applied to all NFS exports or all SyncIQ policies. In addition to actionable rules, ‘check-only’ rules are supported which apply no changes.

The new OneFS 9.5 rules are also smarter, allowing comparator logic in addition to the previous simple equality checks. For example, a rule can evaluate whether a string is empty or non-empty, or whether a given timeout is greater than or equal to the required value.

Examples of STIG hardening rules include:

Functional Area Rule Description
Firewall Enables the OneFS firewall.
WebUI Forces the OneFS WebUI to listen on a specific IP address.
Restricted Shell Enforces the use of the restricted shell.
WebDAV Disables WebDAV HTTP filesystem access.
SyncIQ Enables encrypted transport for all SyncIQ replication policies.

For example:

# cat /etc/isi_hardening/profiles/profile_stig.xml

<?xml version="1.0" encoding="UTF-8"?>

<Profiles version="1">

    <Profile>

        <Name>STIG</Name>

        <Description>Enable all STIG security settings</Description>

        <Rule>set_celog_snmp_use_fips</Rule>

        <Rule>disable_supportassist</Rule>

        <Rule>disable_usb_ports</Rule>

        <Rule>disable_ndmpd</Rule>

        <Rule>enable_smtp_ssl</Rule>

        <Rule>enable_onefs_cli</Rule>

        <Rule>set_min_password_percent_of_characters_changed</Rule>

        <Rule>set_ads_ldap_sign_and_seal</Rule>

        <Rule>set_ads_ldap_sign_and_seal_default</Rule>

        <Rule>set_ads_machine_password_changes</Rule>

        <Rule>limit_ads_machine_password_lifespan</Rule>

        <Rule>enable_firewall</Rule>

        <Rule>disable_audit_log_delete</Rule>

        <Rule>set_audit_retention_period</Rule>

        <Rule>disable_webui_access_ran</Rule>

        <Rule>set_ssh_config_client_alive_interval</Rule>

        <Rule>set_ssh_config_client_alive_count</Rule>

        <Rule>set_nfs_security_flavors</Rule>

<Snip>

Several enhancements have been made to the hardening engine in OneFS 9.5, the most notable of which is a significant increase in the number of rules permitted. The hardening engine also now includes a reporting component, allowing detailed reports to be generated that indicate which hardening rules are applied or not, as well as overall compliance status.  For example:

# isi hardening reports create

...............Hardening operation complete.

# isi hardening reports list

Name  Applied  Status        Creation Date            Report Age

-----------------------------------------------------------------

STIG  No       Not Compliant Sat Apr 22 04:28:40 2023 2m1s

-----------------------------------------------------------------

Total: 1

# isi hardening reports view STIG | more

Name                              Location  Status      Setting

----------------------------------------------------------------------------------------------

logout_zsh_clear_screen           Node 8    Applied /etc/zlogout

logout_profile_clear_screen       Node 8    Applied /etc/profile

logout_csh_clear_screen           Node 8    Applied /etc/csh.logout

require_password_single_user_mode Node 8    Not Applied /etc/ttys

set_password_min_length_pam_01    Node 8    Not Applied /etc/pam.d/system

set_password_min_length_pam_02    Node 8    Not Applied /etc/pam.d/other

set_password_min_length_pam_03    Node 8    Not Applied /etc/pam.d/passwd

set_password_min_length_pam_04    Node 8    Not Applied /etc/pam.d/passwd

disable_apache_proxy              Node 8    Not Applied /etc/mcp/templates/isi_data_httpd.conf

disable_apache_proxy              Node 8    Not Applied /etc/mcp/templates/isi_data_httpd.conf

disable_apache_proxy              Node 8    Not Applied /etc/mcp/templates/isi_data_httpd.conf

set_shell_timeout_01              Node 8    Not Applied /etc/profile

set_shell_timeout_02              Node 8    Applied     /etc/zshrc

set_shell_timeout_03              Node 8    Not Applied /etc/zshrc

set_shell_timeout_04              Node 8    Not Applied /etc/csh.cshrc

set_dod_banner_02                 Node 8    Not Applied symlink:/etc/issue

check_node_default_umask          Node 8    Applied     umask

logout_zsh_clear_screen           Node 40   Not Applied /etc/zlogout

logout_profile_clear_screen       Node 40   Not Applied /etc/profile

logout_csh_clear_screen           Node 40   Not Applied /etc/csh.logout

require_password_single_user_mode Node 40   Not Applied /etc/ttys

--More--(byte 2185)

These reports can be generated regardless of cluster hardening status.

OneFS security hardening comes pre-installed, but not activated by default, on a PowerScale cluster, and hardening cannot be uninstalled. Hardening is a licensed feature, and there are no changes to the licensing requirements or structure for OneFS 9.5 and later.

If a current license is not available, the following warning will be returned when attempting to activate security hardening on a cluster:

# isi hardening apply STIG

The HARDENING application is not currently installed. Please contact your Isilon account team for more information on evaluating and purchasing HARDENING.

Applying a hardening profile is one of multiple steps required in order to configure a STIG-compliant PowerScale cluster. In the next article in this series we’ll cover the configuration and activation of OneFS security hardening.