The previous post examined some of the general cluster performance metrics. In this article we’ll focus in on the disk subsystem and take a quick look at some of the drive statistics counters. As we’ll see, OneFS offers several tools to inspect and report on both drive health and performance.
Let’s start with some drive failure and wear reporting tools.
The following cluster-wide command will indicate any drives that are marked as smartfail, empty, stalled, or down:
# isi_for_array -sX 'isi devices list | egrep -vi "healthy|L3"'
Usually, any node that requires a drive replacement will have an amber warning light on the front display panel. Also, the drive that needs swapping out will typically be marked by a red LED.
Alternatively, isi_drivenum shows the drive bay location of each drive, along with a variety of other disk-related information:
# isi_for_array -sX 'isi_drivenum -A'
This next command provides drive wear information for each node’s flash (SSD) boot drives:
# isi_for_array -sSX "isi_radish -a /dev/da* | grep -e FW: -e 'Percent Life' | grep -v Used"
However, the output is in hex. This can be converted to a decimal percent value using the following shell command, where <value> is the raw hex output (note that bc expects hex digits in uppercase):
# echo "ibase=16; <value>" | bc
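If bc is not to hand, the shell's own printf can do the same conversion. The following is a minimal sketch rather than a OneFS utility: the function name hex_to_dec is made up for illustration, and it assumes the raw value is a plain hex string with or without a leading 0x.

```shell
# Hypothetical helper (not part of OneFS): convert a raw hex value to decimal.
# Accepts the value with or without a leading 0x, in either case.
hex_to_dec() {
    value="${1#0x}"          # strip a lowercase 0x prefix, if present
    value="${value#0X}"      # strip an uppercase 0X prefix, if present
    printf '%d\n' "0x${value}"
}

hex_to_dec 5f      # prints 95
hex_to_dec 0x64    # prints 100
```

Unlike bc, printf accepts lowercase hex digits, so the raw value does not need to be uppercased first.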
Alternatively, the following perl script will also translate the isi_radish command output from hex into comprehensible ‘life remaining’ percentages:
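#!/usr/bin/perl
use strict;
use warnings;

# Boot drives to query on every node.
my @drives = ('ada0', 'ada1');

foreach my $drive (@drives) {
    print "$drive:\n";
    # Run isi_radish across the cluster and read its output through a pipe.
    open(my $cmd, '-|', "isi_for_array -s isi_radish -vt /dev/$drive")
        or die "Failed to open pipe!\n";
    while (<$cmd>) {
        # Capture the leading field and the raw hex value, then print the value as a decimal percentage.
        if (m/^(\S+).*(Life Remaining|Lifetime Left).*\(raw\s+([^)]+)/i) {
            print "$1 " . hex($3) . "%\n";
        }
    }
    close($cmd);
}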
The following drive statistics can be useful for both performance analysis and troubleshooting purposes.
General disk activity stats are available via the isi statistics command.
For example:
# isi statistics system --nodes=all --oprates --nohumanize
This output gives the per-node OPS over protocol, network, and disk. On the disk side, the sum of DiskIn (writes) and DiskOut (reads) gives the total IOPS for all the drives per node.
For the next level of granularity, the following drive statistics command provides individual SATA drive info. The sum of OpsIn and OpsOut is the total IOPS per drive in the cluster.
# isi statistics drive -nall --long --type=sata --sort=busy | head -20
And the same info for SSDs:
# isi statistics drive -nall --long --type=ssd --sort=busy | head -20
The primary counters of interest in the drive stats data are often ‘TimeInQ’, ‘Queued’, ‘OpsIn’, ‘OpsOut’, and ‘IO’, plus the ‘Busy’ percentage of each disk. If most or all of the drives have high busy percentages, this indicates a uniform resource constraint, and there is a strong likelihood that the cluster is spindle bound. If, say, the top five drives are much busier than the rest, this suggests a workflow hot-spot.
# isi statistics pstat
The read and write mix, plus metadata operations, for a particular protocol can be gleaned from the output of the isi statistics pstat command. In addition to disk statistics, CPU and network stats are also provided. The --protocol parameter is used to specify the core NAS protocols such as NFSv3, NFSv4, SMB1, SMB2, HDFS, etc. Additionally, OneFS-specific protocol stats, including job engine (jobd), platform API (papi), IRP, etc., are also available.
For example, the following will show NFSv3 stats in a ‘top’ format, refreshed every 6 seconds by default:
# isi statistics pstat --protocol nfs3 --format top
The uptime command provides the system load average over 1, 5, and 15 minute intervals, which comprises both CPU queue and disk queue stats.
# isi_for_array -s 'uptime'
It’s worth noting that this command’s output does not take CPU quantity into account. As such, a load average of 1 on a single CPU means the node is pegged. However, that same load average of 1 on a dual CPU system means the CPUs are 50% idle.
The following command will give the CPU count:
# isi statistics query current --nodes all --degraded --stats node.cpu.count
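With the CPU count in hand, the load average can be normalized to a per-CPU figure, which makes nodes with different CPU counts directly comparable. This is only a sketch: load_per_cpu is a hypothetical helper name, and both arguments are typed in by hand from the uptime and CPU count outputs rather than parsed from them.

```shell
# Hypothetical helper (not part of OneFS): divide a load average by the CPU count.
# $1 = load average as reported by uptime, $2 = number of CPUs in the node.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) exit 1            # refuse to divide by zero
        printf "%.2f\n", load / cpus
    }'
}

load_per_cpu 1 1    # prints 1.00: a single CPU is pegged
load_per_cpu 1 2    # prints 0.50: a dual CPU node is half idle
```

A result near or above 1.00 suggests the node has no headroom left, whatever its CPU count.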
The sum of disk ops across a cluster per node is available via the following syntax:
# isi statistics query current --nodes=all --stats=node.disk.xfers.rate.sum
There are a whole slew of more detailed drive metrics that OneFS makes available for query.
Disk time in queue provides an indication as to how long an operation is queued on a drive. This indicator is key if a cluster is disk-bound. A time in queue value of 10 to 50 milliseconds is concerning, whereas a value of 50 to 100 milliseconds indicates a potential problem.
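These thresholds are easy to encode as a quick triage helper. The sketch below rests on a few assumptions that are not part of the guidance above: the function name classify_timeinq is invented, values under 10 ms are labeled "ok", a value of exactly 50 ms is placed in the higher band, and anything above 100 ms is treated the same as the 50 to 100 ms range.

```shell
# Hypothetical helper (not part of OneFS): label a time in queue value, in milliseconds.
# Assumptions: below 10 ms is "ok"; 50 ms and above (including over 100 ms)
# is reported as a potential problem.
classify_timeinq() {
    awk -v ms="$1" 'BEGIN {
        if (ms < 10)      print "ok"
        else if (ms < 50) print "concerning"
        else              print "potential problem"
    }'
}

classify_timeinq 5     # prints ok
classify_timeinq 25    # prints concerning
classify_timeinq 75    # prints potential problem
```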
The following CLI syntax can be used to obtain the maximum, minimum, and average values for disk time in queue for SATA drives in this case:
# isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/ {v=$8+0; sum+=v; n++; if (n==1 || v>max) max=v; if (n==1 || v<min) min=v} END {if (n) {print "Min = ",min; print "Max = ",max; print "Average = ",sum/n}}'
The following command displays the time in queue for 30 drives sorted highest-to-lowest:
# isi statistics drive list -n all --sort=timeinq | head -n 30
Queue depth indicates how many operations are queued on drives. A queue depth of 5 to 10 is considered heavy queuing.
The following CLI command can be used to obtain the maximum, minimum, and average values for disk queue depth of SATA drives. If there’s a big delta between the maximum number and average number in the queue, it’s worth investigating further to determine whether an individual drive is working excessively.
# isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/ {v=$9+0; sum+=v; n++; if (n==1 || v>max) max=v; if (n==1 || v<min) min=v} END {if (n) {print "Min = ",min; print "Max = ",max; print "Average = ",sum/n}}'
For information on SAS or SSD drives, you can substitute SAS or SSD for SATA in the above syntax.
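Since the time in queue and queue depth commands differ only in the drive type and the column number, the awk logic can be wrapped in a small reusable function. This is a sketch under stated assumptions: drive_col_stats is a hypothetical name, it reads drive statistics rows on stdin, and the column numbers (8 for time in queue, 9 for queue depth) are simply the ones used in the commands above and may differ between OneFS releases.

```shell
# Hypothetical helper (not part of OneFS): min, max, and average of one column
# for rows matching a drive type. Reads rows on stdin.
# $1 = drive type to match (e.g. SATA, SAS, SSD), $2 = 1-based column number.
drive_col_stats() {
    awk -v type="$1" -v col="$2" '
        $0 ~ type {
            v = $col + 0                     # force a numeric value
            sum += v; n++
            if (n == 1 || v > max) max = v
            if (n == 1 || v < min) min = v
        }
        END {
            if (n == 0) { print "no matching drives"; exit 1 }
            printf "Min = %s\nMax = %s\nAverage = %s\n", min, max, sum / n
        }'
}

# Example with three made-up rows (drive, type, value), using column 3:
printf '%s\n' '1:1 SATA 4' '1:2 SATA 10' '1:3 SSD 100' | drive_col_stats SATA 3
```

On a cluster, the same function could be fed from the drive statistics output, for example by piping `isi statistics drive --nodes=all --degraded --no-header --no-footer` into `drive_col_stats SATA 9`. Only rows of the requested type are counted, so the average is not skewed by other drive types.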
To display queue depth for twenty drives sorted highest-to-lowest, run the following command:
# isi statistics drive list -n all --sort=queued | head -n 20
Note that the TimeAvg metric, as reported by the isi statistics drive command, represents the latency at the disk, excluding the scheduler wait time (TimeInQ). It is therefore a measure of disk access time (i.e., send the op, wait, receive the response). The total time at the disk is the sum of the access time (TimeAvg) and the scheduler time (TimeInQ).
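As a worked example of that sum, the following sketch adds the two values for a single drive. The function name total_disk_time is hypothetical, and it assumes both values have been read off the drive statistics output in the same unit (for instance, milliseconds).

```shell
# Hypothetical helper (not part of OneFS): total time at the disk for one drive.
# $1 = TimeAvg (access time), $2 = TimeInQ (scheduler time), same unit for both.
total_disk_time() {
    awk -v avg="$1" -v inq="$2" 'BEGIN { printf "%.2f\n", avg + inq }'
}

total_disk_time 2.5 7.5    # prints 10.00
```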
The disk percent busy metric can be useful to determine if a drive is getting pegged. However, it does not indicate how much extra work may be in the queue. To obtain the maximum, minimum, and average disk busy values for SATA drives, run the following command. For information on SAS or SSD drives, substitute SAS or SSD for SATA.
# isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/ {v=$10+0; sum+=v; n++; if (n==1 || v>max) max=v; if (n==1 || v<min) min=v} END {if (n) {print "Min = ",min; print "Max = ",max; print "Average = ",sum/n}}'
To display disk percent busy for twenty drives, sorted highest-to-lowest, run the following command:
# isi statistics drive -nall --sort=busy | head -n 20