PowerScale F910 Platform

In this article, we’ll take a quick peek at the new PowerScale F910 hardware platform that was released last week. Here’s where this new node sits in the current hardware hierarchy:

The PowerScale F910 is the high-end all-flash platform that utilizes a dual-socket 4th gen Zeon processor with 512GB of memory and twenty four NVMe drives, all contained within a 2RU chassis. Thus, the F910 offers a generational hardware evolution, while also focusing on environmental sustainability, reducing power consumption and carbon footprint, and delivering blistering performance. This makes the F910 and ideal candidate for demanding workloads such as M&E content creation and rendering, high concurrency and low latency workloads such as chip design (EDA), high frequency trading, and all phases of generative AI workflows, etc.

An F910 cluster can comprise between 3 and 252 nodes. Inline data reduction, which incorporates compression, dedupe, and single instancing, is also included as standard to further increase the effective capacity.

The F910 is based on the 2U R760 PowerEdge server platform, with dual socket Intel Sapphire Rapids CPUs. Front-End networking options include 100/25 GbE and with 100 GbE for the Back-End network. As such, the F910’s core hardware specifications are as follows:

Attribute F910 Spec
Chassis 2RU Dell PowerEdge R760
CPU Dual socket, 24 core Intel Sapphire Rapids 6442Y @2.6GHz
Memory 512GB Dual rank DDR5 RDIMMS (16 x 32GB)
Journal 1 x 32GB SDPM
Front-end network 2 x 100GbE or 25GbE
Back-end network 2 x 100GbE
Management port LOM (LAN on motherboard)
PCI bus PCIe v5
Drives 24 x 2.5” NVMe SSDs
Power supply Dual redundant 1400W 100V-240V, 50/60Hz

These node hardware attributes can be easily viewed from the OneFS CLI via the ‘isi_hw_status’ command. Also note that, at the current time, the F910 is only available in a 512GB memory configuration.

Starting at the business end of the node, the front panel allows the user to join an F910 to a cluster and displays the node’s name once it has successfully joined:

As with all PowerScale nodes, the front panel display provides some useful current node environmentals telemetry. The ‘check’ button activates the panel and the ‘arrow’ buttons scroll to navigate, with the initial options being ‘Setup’ or View’, as below:

After selecting ‘View’, the menu presents ‘Power’ or ‘Thermal’:

Available thermal stats include BTU/hour:

Node temperature:

Air flow in cubic ft per minute (CFM):

Removing the top cover, the internal layout of the F910 chassis is as follows:

The Dell ‘Smart Flow’ chassis is specifically designed for balanced airflow, and enhanced cooling is primarily driven by four dual-fan modules. These fan modules can be easily accessed and replaced as follows:

Additionally, the redundant power supplies (PSUs) also contain their own air flow apparatus and can be easily replaced from the rear without opening the chassis. In the event of a power supply failure, the iDRAC LED on the rear panel of the node will turn orange:

Additionally, the front panel LCD display will indicate a PSU or power cable issue:

And the amber fault light on the front panel will illuminate at the end corresponding to the faulty PSU:

For storage, each PowerScale F910 node contains ten NVMe SSDs, which are currently available in the following capacities and drive styles:

Standard drive capacity SED-FIPS drive capacity SED-non-FIPS drive capacity
3.84 TB TLC 3.84 TB TLC
7.68 TB TLC 7.68 TB TLC
15.36 TB QLC Future availability 15.36 TB QLC
30.72 TB QLC Future availability 30.72 TB QLC

Note that 15.36TB and 30.72TB SED-FIPS drive options are planned for future release.

Drive subsystem-wise, the PowerScale F910 2RU chassis is fully populated with twenty four NVMe SSDs. These are housed in drive bays spread across the front of the node as follows:

The NVMe drive connectivity is across PCIe lanes, and these drives use the NVMe and NVD drivers. The NVD is a block device driver that exposes an NVMe namespace like a drive and is what most OneFS operations act upon, and each NVMe drive has a /dev/nvmeX, /dev/nvmeXnsX and /dev/nvdX device entry  and the locations are displayed as ‘bays’. Details can be queried with OneFS CLI drive utilities such as ‘isi_radish’ and ‘isi_drivenum’. For example:

# isi_drivenum

Bay  0   Unit 15     Lnum 9     Active      SN:S61DNE0N702037   /dev/nvd5

Bay  1   Unit 14     Lnum 10    Active      SN:S61DNE0N702480   /dev/nvd4

Bay  2   Unit 13     Lnum 11    Active      SN:S61DNE0N702474   /dev/nvd3

Bay  3   Unit 12     Lnum 12    Active      SN:S61DNE0N702485   /dev/nvd2

<snip>

Moving to the back of the chassis, the rear of the F910 contains the power supplies, network, and management interfaces, which are arranged as follows:

The F910 nodes are available in the following networking configurations, with a 25/100Gb ethernet front-end and 100Gb ethernet back-end:

Front-end NIC Back-end NIC F910 NIC Support
100GbE 100GbE Yes
100GbE 25GbE No
25GbE 100GbE Yes
25GbE 25GbE No

Note that, like the F710 and F210, an Infiniband backend is not supported on the F910 at the current time. Although this option will be added in due course.

These NICs and their PCI bus addresses can be determined via the ’pciconf’ CLI command, as follows:

# pciconf -l | grep mlx

mlx5_core0@pci0:23:0:0: class=0x020000 card=0x005815b3 chip=0x101d15b3 rev=0x00 hdr=0x00

mlx5_core1@pci0:23:0:1: class=0x020000 card=0x005815b3 chip=0x101d15b3 rev=0x00 hdr=0x00

mlx5_core2@pci0:111:0:0:        class=0x020000 card=0x005815b3 chip=0x101d15b3 rev=0x00 hdr=0x00

mlx5_core3@pci0:111:0:1:        class=0x020000 card=0x005815b3 chip=0x101d15b3 rev=0x00 hdr=0x00

Similarly, the NIC hardware details and drive firmware versions can be view as follows:

# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6DX
  Part Number:      0F6FXM_08P2T2_Ax
  Description:      Mellanox ConnectX-6 Dx Dual Port 100 GbE QSFP56 Network Adapter
  PSID:             DEL0000000027
  PCI Device Name:  pci0:23:0:0
  Base GUID:        a088c20300052a3c
  Base MAC:         a088c2052a3c
  Versions:         Current        Available
     FW             22.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Device #2:
----------

  Device Type:      ConnectX6DX
  Part Number:      0F6FXM_08P2T2_Ax
  Description:      Mellanox ConnectX-6 Dx Dual Port 100 GbE QSFP56 Network Adapter
  PSID:             DEL0000000027
  PCI Device Name:  pci0:111:0:0
  Base GUID:        a088c2030005194c
  Base MAC:         a088c205194c
  Versions:         Current        Available
     FW             22.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A

  Status:           No matching image found

Compared with its F900 predecessor, the F910 sees a number of hardware performance upgrades. These include a move to PCI Gen5, Gen 4 NVMe, DDR5 memory, Sapphire Rapids CPU, and a new software-defined persistent memory file system journal ((SPDM). Also the 1GbE management port has moved to Lan-On-Motherboard (LOM), whereas the DB9 serial port is now on a RIO card. Firmware-wise, the F910 and OneFS 9.8 require a minimum of NFP 12.0.

In terms of performance, the new F910 provides a considerable leg up on the previous generation F900. This is particularly apparent with NFSv3 streaming writes, as can be seen here:

OneFS node compatibility provides the ability to have similar node types and generations within the same node pool. In OneFS 9.8 and later, compatibility between the F910 nodes and the previous generation F900 platform is supported.

Component F900 F910
Platform R740 R760
Drives 24 x 2.5” NVMe SSD 24 x 2.5” NVMe SSD
CPU Intel Xeon 6240R (Cascade Lake) 2.4GHz, 24C Intel Xeon 6442Y (Sapphire Rapids) 2.6GHz, 24C
Memory 736GB DDR4 512GB DDR5

This compatibility facilitates the addition of individual F910 nodes to an existing node pool comprising three of more F900s if desired, rather than creating a F910 new node.

In compatibility mode with F900 nodes containing the 1.92TB drive option, the F910’s 3.84TB drives will be short stroke formatted, resulting in a 1.92TB capacity per drive.​ Also note that, while the F910 is node pool compatible with the F900, a performance degradation is experienced where the F910 is effectively throttled to match the performance envelope of the F900s. ​

PowerScale All-flash F910 Debut

Building on the success of the recent PowerScale F710 and F210 and OneFS 9.8 releases comes the widely anticipated launch of the new high-end PowerScale F-series hardware platform. This new F910 all-flash node adds significant density, capacity, and horsepower to the PowerScale all-flash family.

Based on the latest generation of Dell’s PowerEdge R760 platform, the F910 boasts a range of Gen4 NVMe SSD capacities, paired with a Sapphire Rapids CPU, a generous helping of DDR5 memory, and PCI Gen5 100GbE front and back-end network connectivity – all housed within a compact, power-efficient 2RU form factor chassis.

Here’s where these new nodes sit in the current hardware hierarchy:

This new F910 node will supersede the F900, rounding out the all-flash platform refresh, and further extending PowerScale’s price-performance and price-density envelopes.

The PowerScale F910 node offers a substantial hardware evolution from the previous generation, while also focusing on environmental sustainability, reducing power consumption and carbon footprint. Housed in a 2RU ‘Smart Flow’ chassis for balanced airflow and enhanced cooling, the F910 offers twenty four NVMe drives with 3.85 TB or 7.68 TB TLC and 15.36 TB or 31 TB QLC SSD options.

The F910 also includes in-line compression and deduplication by default, further increasing its capacity headroom and effective density. Plus, using Intel’s 4th gen Xeon ‘Sapphire Rapids’ CPUs results in 19% lower cycles-per-instruction, while PCIe Gen 5 quadruples throughput over Gen 3, and the latest DDR5 DRAM offers greater speed and bandwidth – all netting up to 90% higher performance per watt. Additionally, like the F710 and F210, the new F910 includes the new 32 GB Software Defined Persistent Memory (SDPM) file system journal, in place of NVDIMM-n in prior platforms, thereby saving a DIMM slot on the motherboard too.

On the OneFS side, the recently launched 9.8 release delivers a dramatic performance bump – particularly for the all-flash platforms. OneFS 9.8 benefits from latency-improving sharding and parallel thread handling enhancements to its locking infrastructure and protocol heads – on top of the ‘direct write’ non-cached IO boost that 9.7 delivered for the all-flash NVMe platforms.

This combination of generational hardware upgrades plus OneFS software advancements results in dramatic performance gains for the F910 – particularly for streaming reads and writes, which see a 2x or greater improvement over the prior F900 platform. This makes the F910 an ideal candidate for demanding workloads such as M&E content creation and rendering, high concurrency and low latency HPC workloads such as chip design (EDA), high frequency trading, and all phases of generative AI workflows, etc.

Scalability-wise, the F910 requires a minimum of three nodes to form a cluster (or node pool), with up to a maximum of 252 nodes, and the basic specs for the new platform includes:

Component PowerScale F910
CPU Dual–socket Intel Sapphire Rapids, 2.6GHz, 24C
Memory 512GB DDR5 DRAM
SSDs per node 24 x NVMe SSDs
Raw capacities per node 92TB to 737TB
Drive options 3.84TB, 7.68TB TLC and 15.36TB, 30.72TB QLC
Front-end network 2 x 100GbE or 25GbE
Back-end network 2 x 100 GbE

Note that the F910 also has node compatibility with its predecessor and can therefore coexist with legacy F900s within the same node pool.

In the next article, we’ll dig into the technical details of the new platform. But, in summary, when combined with OneFS 9.8, the new PowerScale all-flash F910 platform quite simply delivers on density, efficiency, flexibility, performance, scalability, and value!

OneFS SmartLog Configuration and Management

As we saw in the previous article, OneFS 9.8 introduces new SmartLog functionality to help simply and streamline PowerScale’s issue investigation and time to resolution. SmartLog optimizes the log gathering process, while also integrating with OneFS health-checking, and CELOG events and alerting. Specifically:

Activity Description
Gather • Scope of gathers can be limited by specifying one or more functional groups.

• Extends time-based gather functionality (both shorthand, ex. 2h, and timestamp)

• Allows for gathering of small and highly optimized gathers

Healthcheck • Gathers can be triggered via ‘isi healthcheck evaluations gather’ CLI command.

• Healthcheck gathers cannot be triggered for passing evaluations

CELOG • Gathers can now be triggered via `isi event groups gather `

• CELOG gathers can only be triggered for Critical and Emergency events

In addition to the OneFS command line options in support of this new functionality, the WebUI diagnostics section has also seen a significant overhaul. This can be accessed by navigating to Cluster management > Diagnostics > Gather logs.

A gather can be easily started either by clicking the WebUI ‘Start Gather’ button below:

Or via the following CLI command:

# isi diagnostics gather start

Gather started.

Finished gathers can be found in: /ifs/data/Isilon_Support/pkg

The WebUI status monitor indicates when a gather is currently underway:

Or via the CLI:

# isi diagnostics gather status

Gather is running.

Finished gathers can be found in: /ifs/data/Isilon_Support/pkg

A running gather can also be easily terminated, either by clicking the ‘Stop Gather’ button:

Or via the following CLI command:

# isi diagnostics gather stop

Gather stopped.

When complete, SmartLog writes its gather tarfile to the /ifs/data/Isilon_Support/pkg/ directory by default. These gather files can be identified by their ‘IsilonLogs’ prefix. For example:

# ls -lsia /ifs/data/Isilon_Support/pkg/IsilonLogs*

6952453633 3124592 -rw-r--r--     1 ese  ese  2838789143 May  1 16:26 /ifs/data/Isilon_Support/pkg/IsilonLogs-HAL-9000-New1-20240501-162000-b8b6755a-eb48-467d-a5e3-3f6f650ae0d1.tgz

Note that the WebUI will display a warning recommendation to download gather log tarfiles great than 20MB in size via CLI, rather than using the WebUI option. For example:

When done, the gather file can be easily removed via the WebUI ‘Delete’ Actions button above, and successful deletion is confirmed:

The ‘Gather settings’ WebUI page remains largely unchanged in OneFS 9.8, with the choice of both a full or incremental gather, and the auto upload and various transport protocol options available:

Successful changes to the gather settings, in this case to incremental gather mode, are confirmed by a WebUI popup:

With SmartLog in OneFS 9.8, the three new options for initiating a more granular gather now include:

Gather Option Description CLI syntax
Group Gather based on the feature group(s). Ie: protocol, data service, auth, security, cloud, etc. isi_gather_info –group  <g1,g2,…,gn>
Time interval Past Gather based on duration. Time Range specified as interval (hours, days, weeks). isi_gather_info –gather-past <nw/nd/nh>
Timestamp Gather based on the beginning timestamp. isi_gather_info –gather-begin <YYYY-MM-DD [HH:MM]>

Gather based on the timestamp.

The WebUI ‘Start Gather’ page’s ‘Time Range’ option allows timestamp-based log gathers to be specified:

Timestamp-based gathers can also be initiated from the CLI with the following syntax:

# isi diagnostics gather start --gather-begin <YYYY-MM-DD [HH:MM]>

Past Gather based on duration.

Similarly, the  ‘Gather Past’ option on the WebUI ‘Start Gather’ page allows past duration log gathers to be specified:

Past-duration-based gathers can also be initiated from the CLI with the following syntax:

# isi diagnostics gather start --gather-past <nw/nd/nh>

Gather based on the feature group.

Upon initiating a gather via the WebUI, when the ‘Gather Group’ mode is selected, the full array of feature groups are displayed:

The full list of valid gather feature groups can also be displayed with the following CLI command:

# isi diagnostics gather groups

Valid components are 'abr, acct, acct_sensitive, admin, antivirus, application, auth, backup, bootmessages, celog, cloud, cloudpools, cluster, datamover, eth_backend, firmware, fs, hardware, hdfs, http, ib, iceage, job_engine, logs, messages, ndmp, network, nfs, node, performance, protocol, quotas, s3, security, smartpools, smb, snapshots, storage, synciq, usage'

For the more curious among us, the ‘isi_gather_info -l’ CLI command will list all the gather commands that SmartLog can run, plus also indicate which feature group(s) each command is a member of. For example:

# isi_gather_info -l | more

Known commands are listed by name first with important attributes nested under the commands name.

    brand_data:

        full_command_text=`cd /etc && tar -c -f /ifs/data/Isilon_Support/2024-05-02T16:47:52.717194/brand_data.tar brand`

        timeout=`300`

        is_default=True

    isi_gconfig:

        full_command_text=`/usr/bin/isi_gconfig`

        timeout=`150`

        is_default=True

        groups=[auth, celog, cloudpools, fs, hdfs, job_engine, nfs, protocol, s3, smb]

    isi_fputil_leds:

        full_command_text=`/usr/bin/isi_hwtools/isi_fputil -g`

        timeout=`150`

        is_default=True

        groups=[hardware]

    upgrade_local:

        full_command_text=`cd / && tar -c -f /ifs/data/Isilon_Support/2024-05-02T16:47:52.717194/upgrade_local.tar --exclude '/var/ifs/upgrade/AgentPersistent.db*' var/ifs/upgrade`

        timeout=`150`

        is_default=True

        groups=[admin]

    efs.lbm.drive_space:

        full_command_text=`/sbin/sysctl efs.lbm.drive_space`

        timeout=`150`

        is_default=True

        groups=[usage]

< snip >

The desired feature group(s) can be selected by clicking on their associated checkbox and then using the right arrow button to add them to the active groups column. In the following example, NFS, network, S3 and SMB have been selected, and the clicking the ‘Start Gather’ button will activate the job:

Similarly, the corresponding selected feature groups gather can be initiated from the CLI as follows:

# isi diagnostics gather start --group nfs,network,s3,smb

Gather started.

Finished gathers can be found in: /ifs/data/Isilon_Support/pkg

As of OneFS 9.5 and later, the ‘Edit gather settings’ page defaults to FTPS as the transport, with the associated radio buttons and text boxes for its configuration. These settings can also be viewed and/or modified via the CLI:

# isi diagnostics gather settings view

                Upload: Yes

                  ESRS: Yes

         Supportassist: Yes

           Gather Mode: full

  HTTP Insecure Upload: No

      HTTP Upload Host:

      HTTP Upload Path:

     HTTP Upload Proxy:

HTTP Upload Proxy Port: -

            Ftp Upload: Yes

       Ftp Upload Host: ftp.isilon.com

       Ftp Upload Path: /incoming

      Ftp Upload Proxy:

 Ftp Upload Proxy Port: -

       Ftp Upload User: anonymous

   Ftp Upload Ssl Cert:

   Ftp Upload Insecure: No

                 Group:

          Gather Begin:

           Gather Past:

While FTPS is the default and (highly) recommended transport, the legacy plaintext FTP upload method is still available, if necessary. As such, Dell’s log server, ftp.isilon.com, also supports both encrypted FTPS and plaintext FTP, so will not impact older (pre-OneFS 9.5) release FTP log upload behavior.

However, a warning is displayed if cluster admin elects to continue using non-secure FTP as the transport for the SmartLog:

Similarly from the CLI, if the ‘—ftp-upload-insecure’ option is configured, the following message is displayed, informing the user that plain text FTP upload is being used, and that the connection and data stream will not be encrypted:

# isi diagnostics gather start --ftp-upload-insecure

You are performing plain text FTP logs upload.

This feature is deprecated and will be removed

in a future release. Please consider the possibility

of using FTPS for logs upload. For further information,

please contact PowerScale support

...

Once a logfile gather arrives at Dell, it is automatically unpacked by a support process and analyzed using the ‘logviewer’ tool.

Note that the ‘isi diagnostics gather’ is a limited scope wrapper for the underlying ‘isi_gather_info’ utility. For example, the following two CLI commands can be used interchangeably:

# isi diagnostics gather start --group nfs,network,s3,smb

Or:

# isi_gather_info --group nfs,network,s3,smb

For reference, the comprehensive ‘isi_gather_info’ CLI utility in OneFS 9.8 includes the following options:

Option Description
–upload <boolean> Enable gather upload.
–esrs <boolean> Use ESRS for gather upload.
–noesrs Do not attempt to upload via ESRS.
–supportassist Attempt SupportAssist upload.
–nosupportassist Do not attempt to upload via SupportAssist.
–gather-mode (incremental | full) Type of gather: incremental, or full.
–gather-begin <YYYY-MM-DD [HH:MM]> Time to begin the gather.
–gather-past <nw/nd/nh> How far in the past to gather logs.
–group <g1,g2,…,gn> Which feature group(s) to gather logs for.
–http-insecure <boolean> Enable insecure HTTP upload on completed gather.
–http -host <string> HTTP Host to use for HTTP upload.
–http -path <string> Path on HTTP server to use for HTTP upload.
–http -proxy <string> Proxy server to use for HTTP upload.
–http -proxy-port <integer> Proxy server port to use for HTTP upload.
–ftp <boolean> Enable FTP upload on completed gather.
–noftp Do not attempt FTP upload.
–set-ftp-password Interactively specify alternate password for FTP.
–ftp -host <string> FTP host to use for FTP upload.
–ftp -path <string> Path on FTP server to use for FTP upload.
–ftp-port <string> Specifies alternate FTP port for upload.
–ftp-proxy <string> Proxy server to use for FTP upload.
–ftp -proxy-port <integer> Proxy server port to use for FTP upload.
–ftp-mode <value> Mode of FTP file transfer. Valid values are: both, active, passive
–ftp -user <string> FTP user to use for FTP upload.
–ftp-pass <string> Specify alternative password for FTP.
–ftp -ssl-cert <string> Specifies the SSL certificate to use in FTPS connection.
–ftp-upload-insecure <boolean> Whether to attempt a plain text FTP upload.
–ftp-upload-pass <string> FTP user to use for FTP upload password.
–set-ftp-upload-pass Specify the FTP upload password interactively.

 

HealthCheck Enhancements

Failing HealthCheck evaluations also now support small gathers in OneFS 9.8. HealthCheck evaluation gathers are automatically sent to Dell Support, per the cluster’s SmartLog transport configuration (‘isi diagnostics gather settings’):

From the CLI, the corresponding healthcheck gather syntax is as follows:

# isi healthcheck evaluations gather --id <evaluation id>

Note that for dark sites with no external routing, SmartLog also offers the ability to download the log gather locally:

CELOG Enhancements

CELOG event groups also support SmartLog small gathers in OneFS 9.8. However, the event severity must be either Emergency or Critical severity for the gather option to be available. For example:

Additionally, the corresponding CELOG event group gather CLI syntax is as follows:

# isi event group gather --id <event group id>

Similar to healthchecks, SmartLog also offers the ability to download the log gather locally for dark sites with no external routing:

OneFS SmartLog

Within OneFS, diagnostics gathering, either via the WebUI interface or directly using ‘isi_gather_info’ CLI utility, is the primary method for collecting and uploading a PowerScale cluster’s configuration and context. The output package is typically used help Dell Support identify and resolve bugs and issues. OneFS diagnostics gathers operate by:

  • Executing multiple commands, scripts, and utilities on a cluster, and saving their results.
  • Collating (gathering) all these files into a single ‘gzipped’ package.
  • Optionally transmitting this log gather package back to Dell via a choice of transport methods.

As part of the ongoing drive to simply and streamline PowerScale’s issue investigation and time to resolution, OneFS 9.8 introduces a new SmartLog enhancement. SmartLog refines the log gathering process, and integrates it with OneFS health-checking, and events and alerting as follows:

Activity Description
Gather • Scope of gathers can be limited by specifying one or more functional groups.

• Extends time-based gather functionality (both shorthand, ex. 2h, and timestamp)

• Allows for gathering of small and highly optimized gathers

Healthcheck • Gathers can be triggered via ‘isi healthcheck evaluations gather’ CLI command.

• Healthcheck gathers cannot be triggered for passing evaluations

CELOG • Gathers can now be triggered via `isi event groups gather `

• CELOG gathers can only be triggered for Critical and Emergency events

By default, a log gather tarfile is written to the /ifs/data/Isilon_Support/pkg/ directory. Prior to OneFS 9.8, this was an all-or-nothing operation. However, with 9.8 and SmartLog, the size and scope of this log set can be granularly controlled, both by time period or functional group. These groups span functional areas such as core OneFS protocols, data services, job engine, cloud, performance, security, authentication, networking, hardware, etc. One or many of these groups can be selected to concentrate a log gather on the area of investigation. Similarly, the desired time period can also be used to constrain the scope of a gather.

Once coalesced and zipped, a log gather can also be automatically uploaded to Dell via the following means:

Upload Mechanism Description TCP Port OneFS Release Support
SupportAssist / ESRS Uses Dell Secure Remote Support (SRS) for gather upload. 443/8443 Any
FTP Use FTP to upload completed gather. 21 Any
FTPS Use SSH-based encrypted FTPS to upload gather. 22 Default in OneFS 9.5 and later
HTTP Use HTTP to upload gather. 80/443 Any

Clearly, the ability to narrow the scope of a gather can drastically reduce the quantity of data generated and time taken to upload to Dell Support.

As indicated in the table above, FTPS is the current default option for FTP upload, thereby protecting the upload of cluster configuration and logs with an encrypted transmission session.

Under the hood, the log gather process comprises an eight phase workflow, with transmission comprising the penultimate ‘upload’ phase:

The details of each phase are as follows:

Phase Description
1.       Setup Reads from the arguments passed in, as well as any config files on disk, and sets up the config dictionary. Most of the code for this step is contained in isilon/lib/python/gather/igi_config/configuration.py. This is also the step where the program is most likely to exit, if some config arguments end up being invalid.
2.       Run local Executes all the cluster commands, which are run on the same node that is starting the gather. All these commands run in parallel (up to the current parallelism value). This is typically the second longest running phase.
3.       Run nodes Executes the node commands across all of the cluster’s nodes. This runs on each node, and while these commands run in parallel (up to the current parallelism value), they do not run in parallel with the local step.
4.       Collect Ensures all of the results end up on the overlord node (the node that started gather). If gather is using /ifs, it is very fast, but if it’s not, it needs to SCP all the node results to a single node.
5.       Generate Extra Files Generates nodes_info and package_info.xml. These are two files that are present in every single gather, and tell us some important metadata about the cluster
6.       Packing Packs (tars and gzips) all the results. This is typically the longest running phase, often by an order of magnitude
7.       Upload Transports the tarfile package to its specified destination via SupportAssist, ESRS, FTPS, FTP, HTTP, etc. Depending on the geographic location, this phase might also be a lengthy duration.
8.       Cleanup Cleanups any intermediary files that were created on cluster. This phase will run even if gather fails, or is interrupted.

Since SmartLog and its underlying isi_gather_info tool is primarily intended for troubleshooting clusters with issues, it runs as root (or compadmin in compliance mode), as it needs to be able to execute under degraded conditions (eg. without GMP, during upgrade, and under cluster splits, etc). Given these atypical requirements, isi_gather_info is built as a stand-alone utility, rather than using the platform API for data collection.

While FTPS is the default and (highly) recommend transport, the legacy plaintext FTP upload method is still available, if necessary. As such, Dell’s log server, ftp.isilon.com, also supports both encrypted FTPS and plaintext FTP, so will not impact older (pre-OneFS 9.5) release FTP log upload behavior.

However, a warning is displayed if cluster admin elects to continue using non-secure FTP as the transport for the SmartLog:

Similarly from the CLI, if the ‘–ftp-insecure’ option is configured, the following message is displayed, informing the user that plain text FTP upload is being used, and that the connection and data stream will not be encrypted:

# isi_gather_info --ftp-insecure

You are performing plain text FTP logs upload.

This feature is deprecated and will be removed

in a future release. Please consider the possibility

of using FTPS for logs upload. For further information,

please contact PowerScale support

...

Once a logfile gather arrives at Dell, it is automatically unpacked by a support process and analyzed using the ‘logviewer’ tool.

In the next article in this series, we’ll take a look at the various SmartLog configuration options available in OneFS 9.8 that can be used to target the focus of a log gather.