OneFS Environmentals Reporting

The Energy Star for storage initiative is a SNIA-defined criteria, in conjunction with the EPA and DoE, to evaluate the energy efficiency of a storage system. While earlier OneFS versions adhered to Energy Star 1.1 from 2014, OneFS 9.2 and later support the latest Energy Star 2.0 version. This allows a compliance engineer or administrator to easily query inlet temperature and power consumption via a simple, convenient interface. It also enables third-party datacenter management software to access environmental data via a standard network connection and take appropriate actions if an anomaly is reported. Energy star information is available via the CLI on clusters running in both enterprise and compliance mode

Under the hood, the power and temperature data is retrieved through the IPMI interface. Command support is at the node level, and values are cached for up to five seconds after retrieval.

The CLI syntax involves the ‘isi_estar’ command, which can be used to obtain energy star data from the following supported PowerScale platforms:

  • F-series: F200, F600, F900, F800, F810
  • H-series: H400, H500, H5600, H600, H700, H7000
  • A-series: A200, A2000, A300, A3000
  • Accelerators: P100, B100

For example, the output from an F600 node reports:

# isi_for_array -s isi_estar

f600prime-1: Input power:             374.000

f600prime-1: Inlet air temperature:   19.000

Note that all previous generation platforms are unsupported and will return the following error:

 “ERROR – Error reading psi sensor name, platform not supported.”

However, for nodes that don’t support the ‘isi_estar’ CLI command, similar output can be obtained using the ‘isi_hw_status -wt’ CLI command syntax. For example:

# isi_hw_status -wt | grep -e Ambient_Temp -e Pwr_Consumption

Pwr_Consumption                 = 424.000

Ambient_Temp                    = 26.000

Since the ‘isi_estar’ command is node-local, in order to report on all the nodes in the cluster it can be run via the isi_for_array utility. upplies.

# isi_for_array -s isi_estar

f600prime-1: Input power:             374.000

f600prime-1: Inlet air temperature:   19.000

f600prime-2: Input power:             374.000

f600prime-2: Inlet air temperature:   20.000

f600prime-3: Input power:             374.000

f600prime-3: Inlet air temperature:   21.000

The ‘input power’ metric displays the combined power consumption in watts for both of a node’s power supplies.

To continuously sample isi_estar output at 30 second intervals, use the following syntax:

# while true; do date; isi_estar; sleep 30; done

Another useful utility for easily verifying the status of the nodes’ journal batteries in a cluster is the ‘isi batterystatus’ CLI command set. For example:

# isi batterystatus list

Lnn  Status1                           Status2  Result1  Result2

-----------------------------------------------------------------

1    Ready, enabled, and fully charged N/A      passed   N/A

2    Ready, enabled, and fully charged N/A      passed   N/A

3    Ready, enabled, and fully charged N/A      passed   N/A

4    Ready, enabled, and fully charged N/A      passed   N/A

Detailed status for a particular nodes is also available:

# isi batterystatus view

            Lnn: 1

        Status1: Ready and enabled

        Status2: N/A

        Result1: passed

        Result2: N/A

Last Test Time1: Mon Sep 12 23:59:50 2022

Next Test Time1: Fri Sep 23 09:59:50 2022

Last Test Time2: N/A

Next Test Time2: N/A

      Supported: Yes

        Present: Yes

For more detailed hardware and environmental data, the ‘isi_hw_status’ CLI command can also be useful with its plethora of information:

# isi_hw_status

  SerNo: JACNT194340156

 Config: 110-385-400B-04

ChsSerN: JACNN194420707

ChsSlot: 4

FamCode: H

ChsCode: 4U

GenCode: 0

PrfCode: 5

   Tier: 3

  Class: storage

 Series: n/a

Product: H500-4U-Single-128GB-1x1GE-2x10GE SFP+-30TB-1638GB SSD

  HWGen: PSI

Chassis: INFINITY (Infinity Chassis)

    CPU: GenuineIntel (2.20GHz, stepping 0x000406f1)

   PROC: Single-proc, 10-HT-core

    RAM: 137226121216 Bytes

   Mobo: INFINITY (Custom EMC Motherboard)

  NVRam: INFINITY (Infinity Memory Journal) (4096MB card) (size 4294967296B)

 DskCtl: PMC8074 (PMC 8074) (8 ports)

 DskExp: PMC8056I (PMC-Sierra PM8056 - Infinity)

PwrSupl: Slot3-PS0 (type=ARTESYN, fw=02.14)

PwrSupl: Slot4-PS1 (type=ARTESYN, fw=02.14)

  NetIF: bge0,mlxen0,mlxen1,mlxen2,mlxen3

 IBType: Unknown (None)

 BEType: 40GigE

 FEType: 10GigE

 LCDver: IsiVFD2 (Isilon VFD V2)

 Midpln: NONE (No Midplane Support)

Power Supplies OK

Power Supply Slot3-PS0 good

Power Supply Slot4-PS1 good

CPU Operation (raw 0x88290000)  = Normal

CPU Speed Limit                 = 100.00%

Fan0_Speed                      = 6600.000

Fan1_Speed                      = 6600.000

Slot3-PS0_In_Voltage            = 209.000

Slot4-PS1_In_Voltage            = 209.000

SP_CMD_Vin                      = 12.200

CMOS_Voltage                    = 3.080

Slot3-PS0_Input_Power           = 176.000

Slot4-PS1_Input_Power           = 216.000

Pwr_Consumption                 = 400.000

DIMM_Bank0                      = 46.000

DIMM_Bank1                      = 47.000

CPU0_Temp                       = -36.000

SP_Temp0                        = 40.000

MP_Temp0                        = 27.000

Hottest_HDD_Temp                = 20.000

Ambient_Temp                    = 27.000

Slot3-PS0_Temp0                 = 64.000

Slot3-PS0_Temp1                 = 37.000

Slot4-PS1_Temp0                 = 66.000

Slot4-PS1_Temp1                 = 40.000

Battery0_Temp                   = 41.000

Altitude                        = 120.000

Note that, unlike the ‘isi_estar’ command, ‘isi_hw_status’ will run on all PowerScale and earlier generation Isilon nodes.

Options for the ‘isi_hw_status’ CLI command include:

Flag Description
-S pct set CPU speed throttling to given percentage
-C clear CPU overtemp indicator
-L use list-format
-T use table-format
-H suppress headers in table-format mode
-HH shows _only_ the headers in table-format mode
-c show system components
-d show debug-level system components (-dd for more)
-F show system component flags
-i show system identification
-s show system state
-f show fan speeds
-v show voltages
-a show currents
-w show watts
-t Show temperatures
-m Show meters
-I Show miscellaneous inputs
-b Show NVRAM status
-V Turns on verbose output
-g Turns on debug output
-q Turns on quiet mode (suppesses extra verbose/debug output)
-A Show all output
-r Disable IPMI command cache reads on IPMI-based platforms
-P Probe and update hardware state info in PSI

For example, the ‘isi_hw_status’ command run with the ‘-sfvt’ flags can be useful to show a node’s environmental data:

# isi_hw_status -sfvt

Power Supplies OK

Power Supply Slot3-PS0 good

Power Supply Slot4-PS1 good

CPU Operation (raw 0x88290000)  = Normal

CPU Speed Limit                 = 100.00%

Fan0_Speed                      = 6600.000

Fan1_Speed                      = 6600.000

Slot3-PS0_In_Voltage            = 212.000

Slot4-PS1_In_Voltage            = 213.000

SP_CMD_Vin                      = 12.200

CMOS_Voltage                    = 3.080

DIMM_Bank0                      = 43.000

DIMM_Bank1                      = 44.000

CPU0_Temp                       = -36.000

SP_Temp0                        = 38.000

MP_Temp0                        = 26.000

Hottest_HDD_Temp                = 20.000

Ambient_Temp                    = 26.000

Slot3-PS0_Temp0                 = 58.000

Slot3-PS0_Temp1                 = 33.000

Slot4-PS1_Temp0                 = 62.000

Slot4-PS1_Temp1                 = 37.000

Battery0_Temp                   = 38.000

Note that the ‘isi_hw_status’ command output can also be viewed in table format with the ’-T’ flag, in order to aid readability.

Additionally, the OneFS stats engine records virtually all of the sensor data, a list of which can be obtained by running:

# isi statistics list keys list | grep sensor.temp

For example, the temperature sensors on an H700 include:

node.sensor.temp.celsius.Ambient_Temp

node.sensor.temp.celsius.Battery0_Temp

node.sensor.temp.celsius.CPU0_Temp

node.sensor.temp.celsius.DIMM_Bank0

node.sensor.temp.celsius.DIMM_Bank1

node.sensor.temp.celsius.Drive_IO0_Temp

node.sensor.temp.celsius.Embed_IO_Temp0

node.sensor.temp.celsius.Hottest_SAS_Drv

node.sensor.temp.celsius.MP_Temp0

node.sensor.temp.celsius.MP_Temp1

node.sensor.temp.celsius.SLIC0_Temp

node.sensor.temp.celsius.SLIC1_Temp

node.sensor.temp.celsius.SP_Temp0

node.sensor.temp.celsius.Slot1-PS0_Temp0

node.sensor.temp.celsius.Slot1-PS0_Temp1

node.sensor.temp.celsius.Slot2-PS1_Temp0

node.sensor.temp.celsius.Slot2-PS1_Temp1

OneFS keeps in-memory a recent history of all of the stats that the engine collects.

For a node’s HDD and SSD drives, the ‘isi devices drive’ CLI command syntax can be used view status:

# isi devices drive list

Lnn  Location  Device    Lnum  State   Serial       Sled

---------------------------------------------------------

8    Bay  1    /dev/da1  15    L3      9VNX0JA00844 N/A

8    Bay  2    -         N/A   EMPTY                N/A

8    Bay  A0   /dev/da9  7     HEALTHY ZC23CH5P     A

8    Bay  A1   /dev/da2  14    HEALTHY ZC23CGX9     A

8    Bay  A2   /dev/da10 6     HEALTHY ZC23CGRE     A

8    Bay  B0   /dev/da3  13    HEALTHY ZC23C7WL     B

8    Bay  B1   /dev/da11 5     HEALTHY ZC23CH5Q     B

8    Bay  B2   /dev/da4  12    HEALTHY ZC23C8LQ     B

8    Bay  C0   /dev/da12 4     HEALTHY ZC23C8P0     C

8    Bay  C1   /dev/da5  11    HEALTHY ZC23C8C3     C

8    Bay  C2   /dev/da13 3     HEALTHY ZC23C8L1     C

8    Bay  D0   /dev/da6  10    HEALTHY ZC23C8FD     D

8    Bay  D1   /dev/da14 2     HEALTHY ZC23C7Z3     D

8    Bay  D2   /dev/da7  9     HEALTHY ZC23C874     D

8    Bay  E0   /dev/da15 1     HEALTHY ZC23C84D     E

8    Bay  E1   /dev/da8  8     HEALTHY ZC23C8LG     E

8    Bay  E2   /dev/da16 0     HEALTHY ZC23BC3X     E

---------------------------------------------------------

In this case, since it’s a chassis-based node, the drive location includes both bay and sled.

For more extensive drive information, the ‘isi_radish’ CLI command is also useful for querying a variety of drive heath and performance metrics, including drive and airflow temperatures. The ‘-T’ flag can be used to specifically report drive threshold violations:

# isi_radish -T

As mentioned previously, OneFS 9.0 and later releases also support the Intelligent Platform Management Interface protocol (IPMI), and, amongst other things, use this to gather the ‘isi_estar’ environmental data.

IPMI allows out-of-band console access and remote power control across a dedicated ethernet interface via Serial over LAN (SoL). As such, IMPI provides true lights-out management for PowerScale and Isilon Gen6 nodes without the need for additional rs-232 serial port concentrators or PDU rack power controllers.

For example, IPMI enables individual nodes, or the entire cluster, to be powered on after maintenance, or gracefully powered down after a power outage if the cluster is operating on limited backup power.

Similarly, IPMI facilitates performing a Hard/Cold Reboot/Power Cycle, for example, if a node is unresponsive to OneFS.

IPMI is enabled, configured, and operated from the CLI via the ‘isi ipmi’ command set and a cluster’s console can easily be accessed using the IPMItool utility. IPMItool is available as part of most Linux distributions, or accessible through other proprietary tools.

For the PowerScale F900, F600, F200, stand-alone nodes, the Dell iDRAC remote console option can also be accessed via an https web browser session to the default port 443 at a node’s IPMI address.

Note that in order to run the OneFS IPMI commands, the administrative account being used must have the ‘RBAC ISI_PRIV_IPMI’ privilege.

The following CLI syntax can be used to enable IPMI for DHCP:

# isi ipmi settings modify --enabled=True --allocation-type=dhcp 35 426 IPMI

Similarly, to enable IPMI for a static IP address:

# isi ipmi settings modify --enabled=True --allocation-type=static

To enable IPMI for a range of IP addresses use:

# isi ipmi network modify --gateway=[gateway IP] --prefixlen= --ranges=[IP Range]

The power control and Serial over LAN features can be configured and viewed using the following CLI command syntax. For example:

# isi ipmi features list

ID            Feature Description           Enabled

----------------------------------------------------

Power-Control Remote power control commands Yes

SOL           Serial over Lan functionality Yes

----------------------------------------------------

To enable the power control feature:

# isi ipmi features modify Power-Control --enabled=True

To enable the Serial over LAN (SoL) feature:

# isi ipmi features modify SOL --enabled=True

The following CLI commands can be used to configure a single username and password to perform IPMI tasks across all nodes in a cluster. Note that usernames can be up to 16 characters in length, while the associated passwords must be 17-20 characters in length.

To configure the username and password, run the CLI command:

# isi ipmi user modify --username [Username] --set-password

To confirm the username configuration, use:

# isi ipmi user view

Username: admin

On the client side, the ‘ipmitool’ command utility is ubiquitous in the Linux and UNIX world, and is included natively as part of most distributions. If not, it can easily be installed using the appropriate package manager, such as ‘yum’.

The ipmitool usage syntax is as follows:

[Linux Host:~]$ ipmitool -I lanplus -H [Node IP] -U [Username] -L OPERATOR -P [password]

For example, to execute power control commands:

ipmitool -I lanplus -H [Node IP] -U [Username] -L OPERATOR -P [password] power [command]

The ‘power’ command options above include status, on, off, cycle, and reset.

Leave a Reply

Your email address will not be published. Required fields are marked *