OneFS SupportAssist Management and Troubleshooting

In this final article in the OneFS SupportAssist series, we turn our attention to management and troubleshooting.

Once the provisioning process is complete, the ‘isi supportassist settings view’ CLI command reports the status and health of SupportAssist operations on the cluster.

# isi supportassist settings view
        Service enabled: Yes
       Connection State: enabled
      OneFS Software ID: xxxxxxxxxx
          Network Pools: subnet0:pool0
        Connection mode: direct
           Gateway host: -
           Gateway port: -
    Backup Gateway host: -
    Backup Gateway port: -
  Enable Remote Support: Yes
Automatic Case Creation: Yes
       Download enabled: Yes
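For scripted health checks, the 'Key: value' output above can be parsed into a dictionary. The following is a minimal Python sketch rather than a supported tool: the function names are hypothetical, and treating 'Service enabled: Yes' plus 'Connection State: enabled' as the healthy state is an assumption drawn only from the sample output shown here.

```python
def parse_view_output(text):
    """Parse 'Key: value' lines, such as those printed by
    'isi supportassist settings view', into a dict."""
    settings = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # 'subnet0:pool0' are kept intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def supportassist_problems(settings):
    """Return a list of findings. The 'healthy' values checked here are
    assumptions based on the sample output in this article."""
    problems = []
    if settings.get("Service enabled") != "Yes":
        problems.append("service is not enabled")
    if settings.get("Connection State") != "enabled":
        problems.append(
            "connection state is %r" % settings.get("Connection State")
        )
    return problems


SAMPLE = """\
        Service enabled: Yes
       Connection State: enabled
      OneFS Software ID: xxxxxxxxxx
          Network Pools: subnet0:pool0
        Connection mode: direct
"""

if __name__ == "__main__":
    print(supportassist_problems(parse_view_output(SAMPLE)))
```

In practice, the text would come from capturing the command's output on the cluster, and an empty list would indicate that neither check found a problem.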

The SupportAssist settings can also be viewed from the WebUI by navigating to Cluster management > General settings > SupportAssist.

There are some caveats and considerations to keep in mind when upgrading to OneFS 9.5 and enabling SupportAssist, including:

  • SupportAssist is disabled when STIG hardening is applied to the cluster.
  • Using SupportAssist on a hardened cluster is not supported.
  • Clusters with the OneFS network firewall enabled (‘isi network firewall settings’) may need to allow outbound traffic on port 9443.
  • SupportAssist is supported on a cluster that’s running in Compliance mode.
  • Secure keys are held in Key manager under the RICE domain.

Also, note that ESRS can no longer be used after SupportAssist has been provisioned on a cluster.

SupportAssist has a variety of components that gather and transmit various pieces of OneFS data and telemetry to Dell Support and backend services through the Embedded Service Enabler (ESE). These workflows include CELOG events; In-Product Activation (IPA) information; CloudIQ telemetry data; isi_gather_info (IGI) logsets; and provisioning, configuration, and authentication data to ESE and the various backend services.

Activity             Information
--------------------------------------------------------------------------------
Events and alerts    SupportAssist can be configured to send CELOG events.
Diagnostics          The ‘isi diagnostics gather’ and ‘isi_gather_info’ logfile collation and transmission commands have a SupportAssist option.
Healthchecks         HealthCheck definitions are updated using SupportAssist.
License Activation   The ‘isi license activation start’ command uses SupportAssist to connect.
Remote Support       Remote Support uses SupportAssist and the Connectivity Hub to assist customers with their clusters.
Telemetry            CloudIQ telemetry data is sent using SupportAssist.

CELOG

Once SupportAssist is up and running, it can be configured to send CELOG events and attachments via ESE to CLM. This is managed with the ‘isi event channels’ CLI command set. For example:

# isi event channels list
ID   Name                Type          Enabled
-----------------------------------------------
1    RemoteSupport       connectemc    No
2    Heartbeat Self-Test heartbeat     Yes
3    SupportAssist       supportassist No
-----------------------------------------------
Total: 3

# isi event channels view SupportAssist
     ID: 3
   Name: SupportAssist
   Type: supportassist
Enabled: No
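When auditing several clusters, it can help to confirm programmatically whether the SupportAssist channel is enabled. The sketch below parses the tabular ‘isi event channels list’ output shown above. It assumes the four-column layout from this example (ID, Name, Type, Enabled), and the function names are hypothetical.

```python
import re

# One data row: numeric ID, a name that may contain spaces,
# a single-word type, and a Yes/No enabled flag.
_ROW = re.compile(r"^(\d+)\s+(.+?)\s+(\S+)\s+(Yes|No)\s*$")


def parse_channels(text):
    """Return a list of channel dicts parsed from
    'isi event channels list' style output."""
    channels = []
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            chan_id, name, chan_type, enabled = match.groups()
            channels.append({
                "id": int(chan_id),
                "name": name,
                "type": chan_type,
                "enabled": enabled == "Yes",
            })
    return channels


def channel_enabled(text, name):
    """Return True/False for the named channel, or None if it is absent."""
    for channel in parse_channels(text):
        if channel["name"] == name:
            return channel["enabled"]
    return None


SAMPLE = """\
ID   Name                Type          Enabled
-----------------------------------------------
1    RemoteSupport       connectemc    No
2    Heartbeat Self-Test heartbeat     Yes
3    SupportAssist       supportassist No
-----------------------------------------------
Total: 3
"""

if __name__ == "__main__":
    print(channel_enabled(SAMPLE, "SupportAssist"))
```

Header, separator, and ‘Total’ lines do not start with a numeric ID, so they are skipped without special handling.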

The event channels can also be viewed from the WebUI.

CloudIQ Telemetry

In OneFS 9.5, SupportAssist provides an option to send telemetry data to CloudIQ. This can be enabled from the CLI as follows:

# isi supportassist telemetry modify --telemetry-enabled 1 --telemetry-persist 0

# isi supportassist telemetry view
        Telemetry Enabled: Yes
        Telemetry Persist: No
        Telemetry Threads: 8
Offline Collection Period: 7200

Telemetry can also be configured via the SupportAssist WebUI.

Diagnostics Gather

Also in OneFS 9.5, the ‘isi diagnostics gather’ and ‘isi_gather_info’ CLI commands both now include a ‘--supportassist’ upload option for log gathers, which also allows them to continue to function when the cluster is unhealthy via a new ‘Emergency mode’. For example, to start a gather from the CLI that will be uploaded via SupportAssist:

# isi diagnostics gather start --supportassist 1

Similarly, for the isi_gather_info utility:

# isi_gather_info --supportassist

Or, to explicitly avoid using SupportAssist for the isi_gather_info log upload:

# isi_gather_info --nosupportassist

This can also be configured from the WebUI via Cluster management > General configuration > Diagnostics > Gather.

License Activation

A cluster’s product licenses can also be managed through SupportAssist in OneFS 9.5.

PowerScale License Activation (previously known as In-Product Activation) facilitates the management of the cluster’s entitlements and licenses by communicating directly with Software Licensing Central via SupportAssist.

To activate OneFS product licenses through the SupportAssist WebUI, navigate to Cluster management > Licensing.

For example, on a new cluster without any signed licenses, click the ‘Update & Refresh’ button in the License Activation section. In the ‘Activation File Wizard’, select the desired software modules.

Next select ‘Review changes’, review, click ‘Proceed’, and finally ‘Activate’.

Note that it can take up to 24 hours for the activation to occur.

Alternatively, cluster license activation codes (LACs) can also be added manually.

Troubleshooting

When it comes to troubleshooting SupportAssist, the basic process flow is as follows:

The OneFS components and services involved in this flow are:

Component             Info
--------------------------------------------------------------------------------
ESE                   Embedded Service Enabler.
isi_rice_d            Remote Information Connectivity Engine (RICE).
isi_crispies_d        Coordinator for RICE Incidental Service Peripherals including ESE Start.
Gconfig               OneFS centralized configuration infrastructure.
MCP                   Master Control Program – starts, monitors, and restarts OneFS services.
Tardis                Configuration service and database.
Transaction journal   Task manager for RICE.

Of these, ESE, isi_crispies_d, isi_rice_d, and the Transaction Journal are new in OneFS 9.5 and exclusive to SupportAssist. In contrast, Gconfig, MCP, and Tardis are all legacy services that are used by multiple other OneFS components.

For its connectivity, SupportAssist elects a single leader node within the subnet pool, and NANON nodes are automatically avoided. Ports 443 and 8443 must be open for bi-directional communication between the cluster and Connectivity Hub, and port 9443 is used for communicating with a gateway. The SupportAssist ESE component communicates with a number of Dell backend services, including SRS, Connectivity Hub, ELMS licensing, CloudIQ, ESE, etc.

As such, debugging backend issues may involve one or more services, and Dell Support can assist with this process.
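Before escalating, it can be worth confirming basic TCP reachability on the ports mentioned above. The following Python sketch does only that: it attempts a TCP connection and reports success or failure. The host names in the usage example are hypothetical placeholders rather than real Dell endpoints. The check also reflects connectivity from wherever the script is run, which is not necessarily the elected leader node, and it does not validate TLS or authentication.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds
    within the timeout, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port number to its reachability result."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical targets; substitute the endpoints that apply
    # to the environment being debugged.
    print(check_ports("connectivity-hub.example.com", [443, 8443]))
    print(check_ports("gateway.example.com", [9443]))
```

A False result narrows the problem to the network path (firewall, routing, or DNS), whereas a True result suggests looking further up the stack, in the SupportAssist logs described below.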

The main log files for investigating and troubleshooting SupportAssist issues are isi_rice_d.log and isi_crispies_d.log. There is also an ESE log, which can be useful too. These can be found at:

Component   Logfile Location                   Info
--------------------------------------------------------------------------------
Rice        /var/log/isi_rice_d.log            Per node
Crispies    /var/log/isi_crispies_d.log        Per node
ESE         /ifs/.ifsvar/ese/var/log/ESE.log   Cluster-wide for single instance ESE

Debug level logging can be configured from the CLI as follows:

# isi_for_array isi_ilog -a isi_crispies_d --level=debug+

# isi_for_array isi_ilog -a isi_rice_d --level=debug+

Note that the OneFS log gathers (such as the output from the isi_gather_info utility) will capture all the above log files, plus the pertinent SupportAssist Gconfig contexts and Tardis namespaces, for later analysis.

If needed, the Rice and ESE configurations can also be viewed as follows:

# isi_gconfig -t ese
[root] {version:1}
ese.mode (char*) = direct
ese.connection_state (char*) = disabled
ese.enable_remote_support (bool) = true
ese.automatic_case_creation (bool) = true
ese.event_muted (bool) = false
ese.primary_contact.first_name (char*) =
ese.primary_contact.last_name (char*) =
ese.primary_contact.email (char*) =
ese.primary_contact.phone (char*) =
ese.primary_contact.language (char*) =
ese.secondary_contact.first_name (char*) =
ese.secondary_contact.last_name (char*) =
ese.secondary_contact.email (char*) =
ese.secondary_contact.phone (char*) =
ese.secondary_contact.language (char*) =
(empty dir ese.gateway_endpoints)
ese.defaultBackendType (char*) = srs
ese.ipAddress (char*) = 127.0.0.1
ese.useSSL (bool) = true
ese.srsPrefix (char*) = /esrs/{version}/devices
ese.directEndpointsUseProxy (bool) = false
ese.enableDataItemApi (bool) = true
ese.usingBuiltinConfig (bool) = false
ese.productFrontendPrefix (char*) = platform/16/supportassist
ese.productFrontendType (char*) = webrest
ese.contractVersion (char*) = 1.0
ese.systemMode (char*) = normal
ese.srsTransferType (char*) = ISILON-GW
ese.targetEnvironment (char*) = PROD

And for Rice:

# isi_gconfig -t rice
[root] {version:1}
rice.enabled (bool) = false
rice.ese_provisioned (bool) = false
rice.hardware_key_present (bool) = false
rice.supportassist_dismissed (bool) = false
rice.eligible_lnns (char*) = []
rice.instance_swid (char*) =
rice.task_prune_interval (int) = 86400
rice.last_task_prune_time (uint) = 0
rice.event_prune_max_items (int) = 100
rice.event_prune_days_to_keep (int) = 30
rice.jnl_tasks_prune_max_items (int) = 100
rice.jnl_tasks_prune_days_to_keep (int) = 30
rice.config_reserved_workers (int) = 1
rice.event_reserved_workers (int) = 1
rice.telemetry_reserved_workers (int) = 1
rice.license_reserved_workers (int) = 1
rice.log_reserved_workers (int) = 1
rice.download_reserved_workers (int) = 1
rice.misc_task_workers (int) = 3
rice.accepted_terms (bool) = false
(empty dir rice.network_pools)
rice.telemetry_enabled (bool) = true
rice.telemetry_persist (bool) = false
rice.telemetry_threads (uint) = 8
rice.enable_download (bool) = true
rice.init_performed (bool) = false
rice.ese_disconnect_alert_timeout (int) = 14400
rice.offline_collection_period (uint) = 7200

The ‘-q’ flag can also be used in conjunction with the isi_gconfig command to identify any values that are not at their default settings. For example, the stock (default) Rice gconfig context will not report any configuration entries:

# isi_gconfig -q -t rice

[root] {version:1}
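When comparing configurations across nodes, clusters, or points in time, the ‘key (type) = value’ lines printed by isi_gconfig can be loaded into a typed dictionary. The following Python sketch assumes only the line format visible in the examples above. The function names are hypothetical, and the type handling covers just the bool, int, uint, and char* types that appear in this article.

```python
import re

# Matches lines such as: rice.task_prune_interval (int) = 86400
_LINE = re.compile(r"^(\S+)\s+\(([^)]+)\)\s*=\s*(.*)$")


def parse_gconfig(text):
    """Parse isi_gconfig style output into {key: typed value}.
    Header lines and '(empty dir ...)' lines are skipped."""
    config = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        key, type_name, raw = match.groups()
        raw = raw.strip()
        if type_name == "bool":
            value = raw == "true"
        elif type_name in ("int", "uint"):
            value = int(raw) if raw else None
        else:
            # char* and any unrecognized type are kept as strings.
            value = raw
        config[key] = value
    return config


def changed_keys(before, after):
    """Return {key: (before, after)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


SAMPLE = """\
[root] {version:1}
rice.enabled (bool) = false
rice.eligible_lnns (char*) = []
rice.instance_swid (char*) =
rice.task_prune_interval (int) = 86400
(empty dir rice.network_pools)
rice.telemetry_threads (uint) = 8
"""

if __name__ == "__main__":
    print(parse_gconfig(SAMPLE))
```

Calling changed_keys() on two captures, for example one taken before provisioning and one after, gives a quick view of what changed, which complements the ‘-q’ flag's comparison against default values.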
