OneFS and Dell Technologies Connectivity Services

Among the plethora of new functionality in the OneFS 9.10 release payload is support for Dell Technologies Connectivity Services, or DTCS, Dell's remote connectivity system.

DTCS assists with quickly identifying, triaging, and fixing cluster issues, and boosts productivity by replacing manual routines with automated support. Its predictive issue detection and proactive remediation help accelerate resolution, or avoid issues completely. Best of all, DTCS is included with all Dell PowerScale support plans, although the specific features available may vary based on the level of service contract.

Within OneFS, Dell Technologies Connectivity Services is intended for transmitting events, logs, and telemetry from PowerScale to Dell support. As such, it provides a full replacement for the legacy ESRS, as well as rebranding of the former SupportAssist services.

Delivering a consistent remote support experience across the storage portfolio, DTCS is intended for all sites that can send telemetry off-cluster to Dell over the internet. Dell Connectivity Services integrates the Dell Embedded Service Enabler (ESE) into PowerScale OneFS along with a suite of daemons to allow its use on a distributed system.

Dell Technologies Connectivity Services (formerly SupportAssist)   ESRS
Dell's next generation remote connectivity solution.               Being phased out of service.
Can either connect directly, or via supporting gateways.           Can only use gateways for remote connectivity.
Uses Connectivity Hub to coordinate support.                       Uses ServiceLink to coordinate support.
Requires access key and PIN, or hardware key, to enable.           Uses customer username and password to enable.

Dell Technologies Connectivity Services uses Connectivity Hub and can interact either directly or through a Secure Connect gateway.

DTCS comprises a variety of components that gather and transmit various pieces of OneFS data and telemetry to Dell Support via the Embedded Service Enabler (ESE). These workflows include CELOG events, in-product activation (IPA) information, CloudIQ telemetry data, isi_gather_info (IGI) log sets, plus the provisioning, configuration, and authentication data exchanged between ESE and the various backend services.

Workflow details:

  • CELOG: DTCS can be configured to send CELOG events and attachments, via ESE, to CLM. CELOG has a 'Dell Connectivity Services' channel that, when active, creates an EVENT task for DTCS to propagate.
  • License activation: The 'isi license activation start' CLI command uses DTCS to connect.
  • Provisioning: DTCS must register with the backend services, in a process known as provisioning. This process must be executed before the Embedded Service Enabler (ESE) will respond on any of its other available API endpoints. Provisioning can only successfully occur once per installation, and subsequent provisioning tasks will fail. DTCS must be configured via the CLI or WebUI before provisioning. The provisioning process uses authentication information that was stored in the key manager upon first boot.
  • Diagnostics: The OneFS 'isi diagnostics gather' and 'isi_gather_info' logfile collation and transmission commands have a '--connectivity' option.
  • Healthchecks: Healthcheck definitions are updated using DTCS.
  • Telemetry: CloudIQ telemetry data is sent using DTCS.
  • Remote support: Remote support uses DTCS and Connectivity Hub to assist customers with their clusters.

Several pieces of PowerScale and OneFS functionality require licenses, and the cluster must communicate with the Dell backend services in order to register and activate them. In OneFS 9.10, DTCS is the preferred mechanism for sending those license activations, via the Embedded Service Enabler (ESE), to the Dell backend. License information can be generated via the 'isi license generate' CLI command, and then activated via the 'isi license activation start' syntax.
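
For instance, a minimal sketch of this activation workflow from the CLI (the exact arguments vary by release and license type, so check each command's '--help' output on the cluster before running):

# isi license generate
# isi license activation start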

DTCS requires an access key and PIN, or a hardware key, in order to be enabled, with most customers likely using the access key and PIN method. Secure keys are held in the key manager under the RICE domain.

In addition to the transmission of data from the cluster to Dell, Connectivity Hub also allows inbound remote support sessions to be established for remote cluster troubleshooting.

In the next article in this series, we’ll take a deeper look at the Dell Technologies Connectivity Services architecture and operation.

OneFS Healthcheck Enhancements

As the name suggests, PowerScale OneFS healthchecks enable a storage administrator to quickly and easily evaluate the status of specific software and hardware components of the cluster and its environment.

The OneFS 9.10 release includes several healthcheck enhancements, which aid cluster administrators in quickly understanding the health of the system and offer resolution guidance in the event of a failure. In a nutshell, these include:

Function         Details
Dashboard        Current healthcheck results are displayed on the landing page, indicating the real-time health of the system.
Export           The ability to export results in CSV or JSON format.
Grouping         Grouping of healthchecks by category and frequency.
History          Historical healthchecks presented as a separate category.
Links            Links provided to relevant knowledge base (KB) articles instead of plain text.
Troubleshooting  Detailed information on the failure, plus troubleshooting guidance.

The healthcheck landing page in OneFS 9.10, accessible under Cluster Management > Healthchecks, displays navigation tabs for three pages.

Of these, the 'evaluations' and 'healthchecks' views are enhanced in the new release, with 'Evaluations' being the default landing page.

In earlier versions, the 'healthcheck' page, under Cluster Management > Healthchecks > Healthchecks, displayed two separate tables – one for the checklists themselves and another for their contents, the checklist items. Moreover, there was no clearly presented relationship between the checklists and their items.

To address this, OneFS 9.10 condenses these into a single table view, where each checklist row can be expanded to make its associated items visible. For example, the expanded CELOG checklist and contents in the following:

Moving to a single table format has also enabled the addition of keyword search functionality. As the desired search string is entered into the search box, the WebUI automatically expands and collapses rows to make the matching content visible. This allows the admin to quickly drill down into their checks of interest, and then easily run the full checklist – or just individual items themselves. For example, searching for ‘quota’ reveals the following related items within the ‘auth’ and ‘basic’ checklists:

Additionally, the email settings button for each healthcheck is now more apparent, intuitive, and accessible, offering either default or custom distribution list options:

For ‘evaluations’, the enhanced Healthcheck dashboard in OneFS 9.10 clearly displays the current healthcheck status and results on the landing page. As such, navigating to Cluster Management > Healthchecks now provides a single screen synopsis of the real-time health of the cluster. For example:

In addition to a keyword search option, this view can also be filtered by the ‘latest’ evaluation, or ‘all’ evaluations.

Under the ‘Actions’ field, the ‘More’ dropdown allows logs to be easily gathered and/or downloaded:

If a log gather is selected, its progress is reported in the 'status' field for the associated check. For example:

Clicking the ‘view details’ button for a particular failed checklist opens up a pane with both ‘passed’ and ‘failed’ items:

The ‘passed items’ tab provides details on the specific check(s) that were successfully completed (or unsupported) in the evaluation run.

Similarly, the ‘failed items’ tab displays the unsuccessful check(s) with their error description. For example, the following job engine healthcheck, notifying of LIN-based jobs and suggesting remediation steps:

In this case, even though 260 of the checklist items have passed and only 1 has failed, the overall status for the ‘basic’ checklist is ‘failed’.

The ‘export’ drop-down allows the healthcheck error details to be exported for further analysis as either a CSV or JSON file. For example:

Similarly, the OneFS 9.10 CLI also has a ‘format’ option for exporting healthcheck evaluations. However, unlike the WebUI, the command line options include a list and table format, in addition to CSV and JSON. As such, the 9.10 Healthcheck export options can be summarized as follows:

Export Format   CLI   WebUI
CSV             x     x
JSON            x     x
List            x     -
Table           x     -

The CLI syntax for specifying the export format is as follows:

# isi healthcheck evaluations view <id> --format <csv | json | list | table>

For example, to limit the view to one basic evaluation, in table format, and without the header and footer:

# isi healthcheck evaluations view basic20250304T1105 --format table --limit 1 --no-header --no-footer

basic20250304T1105 basic -    Completed Fail

WARNING    75  [NODE   5] port_flapping
 * Network port flapping has been detected at some point in the
   last 24 hours on the following ports mce0, mce1. This can cause
   issues such as memory leaks if not addressed. Contact Dell
   Technologies Support if you are experiencing network issues.

Note that the default output contains failing items for that evaluation only. However, the '--verbose' flag can be included to display all the pass and fail items for that evaluation.
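
For example, to display both the passing and failing items for the evaluation shown above:

# isi healthcheck evaluations view basic20250304T1105 --verbose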

On the platform API (pAPI) front, the following new v21 endpoints have been added in OneFS 9.10:

/21/healthcheck/evaluations

This now includes the ‘format_for_csv_download’ option, and is used to enable CSV download of a healthcheck evaluation.
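
For instance, a hedged sketch of pulling an evaluation in CSV form via this endpoint, assuming 'format_for_csv_download' is supplied as a query argument against an individual evaluation (the credentials and evaluation ID below are placeholders):

# curl -k "https://<name>:<passwd>@localhost:8080/platform/21/healthcheck/evaluations/<id>?format_for_csv_download=true"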

There’s also a new endpoint to track the status of a log gather in progress:

/21/cluster/diagnostics/gather/status

For example:

# curl -k https://<name>:<Passwd>@localhost:8080/platform/21/cluster/diagnostics/gather/status
{
   "gather" :
   {
      "item" : null,
      "path" : "/ifs/data/Isilon_Support/pkg",
      "status" :
      {
         "Active_Status" : "RUNNING",
         "Last_Status" : "NOT_RUNNING"
      }
   }
}
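
This status endpoint also lends itself to simple scripted polling. For example, a minimal shell sketch that waits for a running gather to finish (placeholder credentials; the grep pattern assumes the pretty-printed field spacing shown above, so adjust it to match the actual response formatting):

# while curl -sk "https://<name>:<passwd>@localhost:8080/platform/21/cluster/diagnostics/gather/status" | grep -q '"Active_Status" : "RUNNING"'; do sleep 30; done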

OneFS Front-end Infiniband Configuration

In the previous article in this series, we examined the ‘what’ and ‘why’ of front-end Infiniband on a PowerScale cluster. Now we turn our attention to the ‘how’ – i.e. the configuration and management of front-end IB.

The networking portion of the OneFS WebUI has seen some changes in 9.10, and cluster admins now have the flexibility to create either Ethernet or InfiniBand (IB) subnets. Depending on the choice, the interface list, pool and rule details automatically adjust to match the selected link layer type. This means that if Infiniband (IB) is selected, the interface list and pool details will update to reflect settings specific to IB, including a new ‘green pill’ icon to indicate the presence of IB subnets and pools on the external network table. For example:

Similarly, the subnet1 view from the CLI, with the ‘interconnect’ field indicating ‘Infiniband’:

# isi network subnets view subnet1
              ID: groupnet0.subnet1
            Name: subnet1
        Groupnet: groupnet0
           Pools: pool-infiniband
     Addr Family: ipv4
       Base Addr: 10.205.228.0
            CIDR: 10.205.228.0/23
     Description: Initial subnet
         Gateway: 10.205.228.1
Gateway Priority: 10
    Interconnect: Infiniband
             MTU: 2044
       Prefixlen: 24
         Netmask: 255.255.254.0
SC Service Addrs: 1.2.3.4
 SC Service Name: cluster.tme.isilon.com
    VLAN Enabled: False
         VLAN ID: -

Alternatively, if Ethernet is chosen, the relevant subnet, pool, and rule options for that topology are displayed.

This dynamic adjustment ensures that only the relevant options and settings for the configured network type are displayed, making the configuration process more intuitive and streamlined.

For example, to create an IB subnet under Cluster management > Network configuration > External network > Create subnet:

Or from the CLI:

# isi network subnets create groupnet0.subnet1 ipv4 255.255.254.0 --gateway 10.205.228.1 --gateway-priority 10 --linklayer infiniband

Similarly, editing an Infiniband subnet:

Note that an MTU configuration option is not available when configuring an Infiniband subnet. Also, the WebUI displays a banner warning that NFS over Infiniband will operate at a reduced speed if NFS over RDMA has not already been enabled.
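
As an aside, whether NFS over RDMA is already enabled on the cluster can be quickly checked from the CLI. For example (grep is used here because the exact field name varies between OneFS releases):

# isi nfs settings global view | grep -i rdma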

In contrast, editing an Ethernet subnet provides the familiar MTU frame-size configuration options:

A front-end network IP pool can be easily created under a subnet. For example, from the CLI, using the '<groupnet>.<subnet>.<pool>' notation:

# isi network pools create groupnet0.infiniband1.ibpool1

Or via the WebUI:

Adding an Infiniband subnet is permitted on any cluster, regardless of its network configuration. However, a warning message will be displayed if attempting to create a pool under an Infiniband subnet on a cluster or node without any configured front-end IB interfaces.
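
On a cluster that does have configured front-end IB interfaces, the pool creation can also specify the member interfaces and an IP address range in the same command. A minimal sketch, with placeholder addresses and interface name (check 'isi network pools create --help' for the exact options in a given release, and 'isi network interfaces list' for the node's actual front-end IB interface names):

# isi network pools create groupnet0.infiniband1.ibpool1 --ranges 10.205.228.50-10.205.228.60 --ifaces 1-4:<ib-interface>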

From the CLI, the 'isi_hw_status' utility can be used to easily verify a node's front-end and back-end networking link layer types. For example, on the following F710 configuration, the 'isi_hw_status' output confirms the front-end network 'FEType' parameter, in this case as 'Infiniband':

# isi_hw_status
  SerNo: FD7LRY3
 Config: PowerScale F710
ChsSerN: FD7LRY3
ChsSlot: n/a
FamCode: F
ChsCode: 1U
GenCode: 10
PrfCode: 7
   Tier: 16
  Class: storage
 Series: n/a
Product: F710-1U-Dual-512GB-2x1GE-2x100GE QSFP28-2x200GE QSFP56-38TB SSD
  HWGen: PSI
Chassis: POWEREDGE (Dell PowerEdge)
    CPU: GenuineIntel (2.60GHz, stepping 0x000806f8)
   PROC: Dual-proc, 24-HT-core
    RAM: 549739036672 Bytes
   Mobo: 071PXR (PowerScale F710)
  NVRam: NVDIMM (SDPM VOSS Module) (8192MB card) (size 8589934592B)
 DskCtl: NONE (No disk controller) (0 ports)
 DskExp: None (No disk expander)
PwrSupl: PS1 (type=AC, fw=00.1D.9C)
PwrSupl: PS2 (type=AC, fw=00.1D.9C)
  NetIF: bge0,bge1,lagg0,mce0,mce1,mce2,mce3,mce4
 BEType: 200GigE
 FEType: Infiniband
 LCDver: IsiVFD3 (Isilon VFD V3)
 Midpln: NONE (No FCB Support)
Power Supplies OK

In contrast, the back-end network on this F710 is 200Gb Ethernet, as reported by the ‘BEType’ parameter.

From the node cabling perspective, the interface assignments on the rear of the F710 are as follows:

Additionally, the ‘mlxfwmanager’ CLI utility can be helpful for gleaning considerably more detail on a node’s NICs, including firmware versions, MAC address, GUID, part number, etc. For example:

# mlxfwmanager

Querying Mellanox devices firmware ...

Device #1:
----------
  Device Type:      ConnectX6
  Part Number:      ORRM24_Ax
  Description:      Nvidia ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE; dual-port QSFP56; PCIe4.0 x16
  PSID:             DEL0000000052
  PCI Device Name:  pci0:13:0:0
  Base MAC:         59a2e18dfdac
  Versions:         Current        Available
     FW             20.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

Device #2:
----------
  Device Type:      ConnectX6DX
  Part Number:      OF6FXM_08P2T2_Ax
  Description:      Mellanox ConnectX-6 Dual Port 100 GbE QSFP56 Network Adapter
  PSID:             DEL0000000027
  PCI Device Name:  pci0:139:0:0
  Base GUID:        e8ebd30300060684
  Base MAC:         e8ebd3060684
  Versions:         Current        Available
     FW             22.36.1010     N/A
     PXE            3.6.0901       N/A
     UEFI           14.29.0014     N/A
  Status:           No matching image found

Device #3:
----------
  Device Type:      ConnectX6
  Part Number:      ORRM24_Ax
  Description:      Nvidia ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE; dual-port QSFP56; PCIe4.0 x16
  PSID:             DEL0000000052
  PCI Device Name:  pci0:181:0:0
  Base MAC:         a088c2ec499e
  Base GUID:        a088c20300ec499a
  Versions:         Current        Available
     FW             22.39.1002     N/A
     PXE            3.7.0201       N/A
     UEFI           14.32.0012     N/A
  Status:           No matching image found

In the example above, ‘Device #1’ is the back-end NIC, ‘Device #2’ is the 100Gb Ethernet ConnectX6 DX NIC in the PCIe4 slot, and ‘Device #3’ is the front-end Infiniband ConnectX6 VPI NIC in the primary PCIe5 slot.

There are a few caveats to be aware of when using front-end Infiniband on F710 and F910 node pools:

  • Upon upgrade to OneFS 9.10, any front-end Infiniband interfaces will only be enabled once the new release is committed.
  • Network pools created within Infiniband subnets will have their default ‘aggregation mode’ set to ‘unset’. Furthermore, this parameter will not be modifiable.
  • Since VLANs are not supported on Infiniband, OneFS includes validation logic to prevent VLAN configuration on IB subnets.