OneFS FilePolicy Job – Unstructured Data Quick Tips

Traditionally, OneFS has used the SmartPools jobs to apply its file pool policies. To accomplish this, the SmartPools job visits every file, and the SmartPoolsTree job visits a tree of files. However, the scanning portion of these jobs can result in significant random impact to the cluster and lengthy execution times, particularly in the case of SmartPools job. To address this, OneFS also provides the FilePolicy job, which offers a faster, lower impact method for applying file pool policies than the full-blown SmartPools job.

But first, a quick Job Engine refresher…

As we know, he Job Engine is OneFS’ parallel task scheduling framework, and is responsible for the distribution, execution, and impact management of critical jobs and operations across the entire cluster.

The OneFS Job Engine schedules and manages all the data protection and background cluster tasks: creating jobs for each task, prioritizing them and ensuring that inter-node communication and cluster wide capacity utilization and performance are balanced and optimized. Job Engine ensures that core cluster functions have priority over less important work and gives applications integrated with OneFS – Isilon add-on software or applications integrating to OneFS via the OneFS API – the ability to control the priority of their various functions to ensure the best resource utilization.

Each job, for example the SmartPools job, has an “Impact Profile” comprising a configurable policy and a schedule which characterizes how much of the system’s resources the job will take – plus an Impact Policy and an Impact Schedule. The amount of work a job has to do is fixed, but the resources dedicated to that work can be tuned to minimize the impact to other cluster functions, like serving client data.

Here’s a list of the specific jobs that are directly associated with OneFS SmartPools:

Job	Description
SmartPools	Job that runs and moves data between the tiers of nodes within the same cluster. Also executes the CloudPools functionality if licensed and configured.
SmartPoolsTree	Enforces SmartPools file policies on a subtree.
FilePolicy	Efficient changelist-based SmartPools file pool policy job.
IndexUpdate	Creates and updates an efficient file system index for FilePolicy job.
SetProtectPlus	Applies the default file policy. This job is disabled if SmartPools is activated on the cluster.

In conjunction with the IndexUpdate job, FilePolicy improves job scan performance for by using a ‘file system index’, or changelist, to find files needing policy changes, rather than a full tree scan.

Avoiding a full treewalk dramatically decreases the amount of locking and metadata scanning work the job is required to perform, reducing impact on CPU and disk – albeit at the expense of not doing everything that SmartPools does. The FilePolicy job enforces just the SmartPools file pool policies, as opposed to the storage pool settings. For example, FilePolicy does not deal with changes to storage pools or storage pool settings, such as:

Restriping activity due to adding, removing, or reorganizing node pools.
Changes to storage pool settings or defaults, including protection.
Packing small files into shadow store containers.

However, the majority of the time SmartPools and FilePolicy perform the same work. Disabled by default, FilePolicy supports the full range of file pool policy features, reports the same information, and provides the same configuration options as the SmartPools job. Since FilePolicy is a changelist-based job, it performs best when run frequently – once or multiple times a day, depending on the configured file pool policies, data size and rate of change.

The FilePolicy job can be invoked from the OneFS CLI with the following configuration options:

Config Option	Details
–directory-only	Skip processing of regular files.
–ingest	Fast mode for quickly setting policies on directories. Alias for –directory-only –policy-only.
–nop	Dry run, calculate what would be done.
–policy-only	Apply policies but skip restriping.

Job schedules can easily be configured from the OneFS WebUI by navigating to Cluster Management > Job Operations, highlighting the desired job and selecting ‘View\Edit’. The following example illustrates configuring the IndexUpdate job to run every six hours at a LOW impact level with a priority value of 5:

When enabling and using the FilePolicy and IndexUpdate jobs, the recommendation is to continue running the SmartPools job as well, but at a reduced frequency (monthly).

In addition to running on a configured schedule, the FilePolicy job can also be executed manually.

FilePolicy requires access to a current index. In the event that the IndexUpdate job has not yet been run, attempting to start the FilePolicy job will fail with the error below. Instructions in the error message will be displayed, prompting to run the IndexUpdate job first. Once the index has been created, the FilePolicy job will run successfully. The IndexUpdate job can be run several times daily (ie. every six hours) to keep the index current and prevent the snapshots from getting large.

Consider using the FilePolicy job with the job schedules below for workflows and datasets with the following characteristics:

Data with long retention times
Large number of small files
Path-based File Pool filters configured
Where FSAnalyze job is already running on the cluster (InsightIQ monitored clusters)
There is already a SnapshotIQ schedule configured
When the SmartPools job typically takes a day or more to run to completion at LOW impact

For clusters without the characteristics described above, the recommendation is to continue running the SmartPools job as usual and to not activate the FilePolicy job.

The following table provides a suggested job schedule when deploying FilePolicy:

Job	Schedule	Impact	Priority
FilePolicy	Every day at 22:00	LOW	6
IndexUpdate	Every six hours, every day	LOW	5
SmartPools	Monthly – Sunday at 23:00	LOW	6

Since no two clusters are the same, so this suggested job schedule may require additional tuning to meet the needs of a specific environment.

Note that when clusters running older OneFS versions and the FSAnalyze job are upgraded to OneFS 8.2.x or later, the legacy FSAnalyze index and snapshots are removed and replaced by new snapshots the first time that IndexUpdate is run. The new index stores considerably more file and snapshot attributes than the old FSA index. Until the IndexUpdate job effects this change, FSA keeps running on the old index and snapshots.

Leave a Reply Cancel reply