OneFS SmartLog

Within OneFS, diagnostics gathering, either via the WebUI interface or directly using ‘isi_gather_info’ CLI utility, is the primary method for collecting and uploading a PowerScale cluster’s configuration and context. The output package is typically used help Dell Support identify and resolve bugs and issues. OneFS diagnostics gathers operate by:

  • Executing multiple commands, scripts, and utilities on a cluster, and saving their results.
  • Collating (gathering) all these files into a single ‘gzipped’ package.
  • Optionally transmitting this log gather package back to Dell via a choice of transport methods.

As part of the ongoing drive to simply and streamline PowerScale’s issue investigation and time to resolution, OneFS 9.8 introduces a new SmartLog enhancement. SmartLog refines the log gathering process, and integrates it with OneFS health-checking, and events and alerting as follows:

Activity Description
Gather • Scope of gathers can be limited by specifying one or more functional groups.

• Extends time-based gather functionality (both shorthand, ex. 2h, and timestamp)

• Allows for gathering of small and highly optimized gathers

Healthcheck • Gathers can be triggered via ‘isi healthcheck evaluations gather’ CLI command.

• Healthcheck gathers cannot be triggered for passing evaluations

CELOG • Gathers can now be triggered via `isi event groups gather `

• CELOG gathers can only be triggered for Critical and Emergency events

By default, a log gather tarfile is written to the /ifs/data/Isilon_Support/pkg/ directory. Prior to OneFS 9.8, this was an all-or-nothing operation. However, with 9.8 and SmartLog, the size and scope of this log set can be granularly controlled, both by time period or functional group. These groups span functional areas such as core OneFS protocols, data services, job engine, cloud, performance, security, authentication, networking, hardware, etc. One or many of these groups can be selected to concentrate a log gather on the area of investigation. Similarly, the desired time period can also be used to constrain the scope of a gather.

Once coalesced and zipped, a log gather can also be automatically uploaded to Dell via the following means:

Upload Mechanism Description TCP Port OneFS Release Support
SupportAssist / ESRS Uses Dell Secure Remote Support (SRS) for gather upload. 443/8443 Any
FTP Use FTP to upload completed gather. 21 Any
FTPS Use SSH-based encrypted FTPS to upload gather. 22 Default in OneFS 9.5 and later
HTTP Use HTTP to upload gather. 80/443 Any

Clearly, the ability to narrow the scope of a gather can drastically reduce the quantity of data generated and time taken to upload to Dell Support.

As indicated in the table above, FTPS is the current default option for FTP upload, thereby protecting the upload of cluster configuration and logs with an encrypted transmission session.

Under the hood, the log gather process comprises an eight phase workflow, with transmission comprising the penultimate ‘upload’ phase:

The details of each phase are as follows:

Phase Description
1.       Setup Reads from the arguments passed in, as well as any config files on disk, and sets up the config dictionary. Most of the code for this step is contained in isilon/lib/python/gather/igi_config/configuration.py. This is also the step where the program is most likely to exit, if some config arguments end up being invalid.
2.       Run local Executes all the cluster commands, which are run on the same node that is starting the gather. All these commands run in parallel (up to the current parallelism value). This is typically the second longest running phase.
3.       Run nodes Executes the node commands across all of the cluster’s nodes. This runs on each node, and while these commands run in parallel (up to the current parallelism value), they do not run in parallel with the local step.
4.       Collect Ensures all of the results end up on the overlord node (the node that started gather). If gather is using /ifs, it is very fast, but if it’s not, it needs to SCP all the node results to a single node.
5.       Generate Extra Files Generates nodes_info and package_info.xml. These are two files that are present in every single gather, and tell us some important metadata about the cluster
6.       Packing Packs (tars and gzips) all the results. This is typically the longest running phase, often by an order of magnitude
7.       Upload Transports the tarfile package to its specified destination via SupportAssist, ESRS, FTPS, FTP, HTTP, etc. Depending on the geographic location, this phase might also be a lengthy duration.
8.       Cleanup Cleanups any intermediary files that were created on cluster. This phase will run even if gather fails, or is interrupted.

Since SmartLog and its underlying isi_gather_info tool is primarily intended for troubleshooting clusters with issues, it runs as root (or compadmin in compliance mode), as it needs to be able to execute under degraded conditions (eg. without GMP, during upgrade, and under cluster splits, etc). Given these atypical requirements, isi_gather_info is built as a stand-alone utility, rather than using the platform API for data collection.

While FTPS is the default and (highly) recommend transport, the legacy plaintext FTP upload method is still available, if necessary. As such, Dell’s log server, ftp.isilon.com, also supports both encrypted FTPS and plaintext FTP, so will not impact older (pre-OneFS 9.5) release FTP log upload behavior.

However, a warning is displayed if cluster admin elects to continue using non-secure FTP as the transport for the SmartLog:

Similarly from the CLI, if the ‘–ftp-insecure’ option is configured, the following message is displayed, informing the user that plain text FTP upload is being used, and that the connection and data stream will not be encrypted:

# isi_gather_info --ftp-insecure

You are performing plain text FTP logs upload.

This feature is deprecated and will be removed

in a future release. Please consider the possibility

of using FTPS for logs upload. For further information,

please contact PowerScale support

...

Once a logfile gather arrives at Dell, it is automatically unpacked by a support process and analyzed using the ‘logviewer’ tool.

In the next article in this series, we’ll take a look at the various SmartLog configuration options available in OneFS 9.8 that can be used to target the focus of a log gather.

Leave a Reply

Your email address will not be published. Required fields are marked *