OneFS CELOG Alerts and Events WebUI

The OneFS 9.2 release introduced a number of OneFS usability enhancements for managing cluster events and alerts. This new functionality makes it considerably simpler to filter events chronologically, categorize by their status, filter by the severity, search the event history, resolve, suppress or ignore bulk events, and manage scheduled maintenance windows.

For example, you can easily categorize, identify, and filter events by using the following criteria:

Action Detail
Show ·         Show events for:

–    Today

–    This week

–    This month

–    Custom range/

–    All

Categorize ·         Categorize events by their status:

–    Active

–    Ignored

–    Resolved

–    All

Filter ·         Filter events by severity:

–    Emergency

–    Critical

–    Warning

–    Information

Search ·         Search for specific event(s) in the event history
Resolve ·         Resolve bulk events
Ignore ·         Ignore bulk events

The new WebUI page for event group history can be accessed by navigating to Cluster management > Events and Alerts. For example:

With OneFS 9.2, CELOG maintenance mode can also easily be manually enabled and disabled. During a maintenance window, the system will continue to log events but not generate alerts. However, all events that occurred during the maintenance window can then be reviewed upon disabling maintenance mode. Active event groups will automatically resume generating alerts when the scheduled maintenance period ends. For example, to enable CELOG maintenance mode, in the OneFS WebUI select Cluster Management > Events and Alerts > Alert management tab and click on the ‘Enable CELOG maintenance mode’ button. In the prompt window, select ‘Enable CELOG maintenance mode’ as follows:

Create an Alert channel. Either SMTP or SNMP can be configured for the alert channel communication, and can be created by selecting ‘Create channel’:

To create an alert rule, click the tab Alerting rule and then click the button Create alert rule. In the prompt window, fill the Rule name, set the Rule condition to NEW, apply it to all the Alert categories, and attached it to the channel you have just created.

Events can be created using CLI syntax similar to the following:

# /usr/bin/isi_celog/celog_send_events.py -o 940100002

Heap looptimes: [-1]

running -1 [940100002]

1612871342 :: Sending eventids [940100002] with specifier None

940100002 message is OneFS {version} is currently running on unsupported nodes (devid(s) {devids}). {msg}.

1.195 (70368744177859) corresponds to eventid 940100002

Out of events to run. Exiting.
# /usr/bin/isi_celog/celog_send_events.py -o 940100001

Heap looptimes: [-1]

running -1 [940100001]

1612872343 :: Sending eventids [940100001] with specifier None

940100001 message is OneFS {version} is currently running and is not supported on this hardware: {msg}.

1.196 (70368744177860) corresponds to eventid 940100001

Out of events to run. Exiting.

During maintenance mode, OneFS will still show the event but there will be no associated alert. In this example, there is no SMTP alert email triggered.

The following CLI syntax can also be used to filter all the events which happened while the cluster was in CELOG maintenance mode:

# isi event groups list --maintenance-mode=true

ID   Started     Ended       Causes Short                           Lnn  Events  Severity

------------------------------------------------------------------------------------------

16   02/09 11:49 --          HW_CLUSTER_ONEFS_VERSION_NOT_SUPPORTED 1    1       critical

17   02/09 12:05 02/09 12:19 HW_ONEFS_VERSION_NOT_SUPPORTED         1    1       critical

------------------------------------------------------------------------------------------

Click the “Disable CELOG maintenance mode’ button. and select one of the following from the display window:

    1. View event details
    2. Ignore event
    3. Resolve event

In this example, the event HW_ONEFS_VERSION_NOT_SUPPORTED is marked resolved by clicking Action and Resolve event.

After the CELOG maintenance mode is disabled, you will get the email notification only for HW_CLUSTER_ONEFS_VERSION_NOT_SUPPORTED. The event which has been marked resolved will not trigger any notification.

When an event type is suppressed, it prevents an event from alerting on all configured CELOG channels. However, the event will still be displayed in the event group history.

To suppress an event type, click the button Suppress for a specific event under Event type ID tab. In this example, both 930100006 and 930100005 have been suppressed.

Create several events, for example by using CLI commands such as the following:

# /usr/bin/isi_celog/celog_send_events.py -o 930100006

Heap looptimes: [-1]

running -1 [930100006]

1612873812 :: Sending eventids [930100006] with specifier None

930100006 message is {sensor} out of spec in chassis {chassis} slot {slot}.

1.200 (70368744177864) corresponds to eventid 930100006

Out of events to run. Exiting.
# /usr/bin/isi_celog/celog_send_events.py -o 930100005

Heap looptimes: [-1]

running -1 [930100005]

1612873817 :: Sending eventids [930100005] with specifier None

930100005 message is {sensor} out of spec in chassis {chassis} slot {slot}.

1.201 (70368744177865) corresponds to eventid 930100005

Out of events to run. Exiting.

To list all the events in the suppressed list, use the following CLI syntax:

# isi event suppress list

ID        Name

—————————-

930100005 HWMON_ANY_DISCRETE

930100006 HWMON_ANY_METERS

—————————-

These suppressed events will only show in the event history and will not trigger any notification in any channels.

The desired event types can be un-suppressed by clicking pertinent the Un-suppress button(s).

 

Leave a Reply

Your email address will not be published. Required fields are marked *