OneFS Job Contention

Got asked the following question from the field recently:

“I kicked off a job on my cluster which seemed to be running happily. When I went back into the UI to check it had stopped and was marked waiting and wouldn’t restart.”

Situations like this occur when a running lower priority job is trumped by higher priority job(s). Since the OneFS Job Engine only allows up to three jobs to be run simultaneously, if a fourth job with a higher priority is started, the lowest of the currently executing jobs will be paused. For example:

# isi job start fsanalyze

Started job [583]

# isi job status

The job engine is running.

Running and queued jobs:

ID   Type       State Impact  Pri  Phase Running Time

---------------------------------------------------------

578  SmartPools Running Low     6   1/2   11s

581  Collect    Running Low     4   1/3   16s

583  FSAnalyze  Running Low     5   1/10  1s

---------------------------------------------------------

Total: 3



In this case, we have three jobs running: SmartPools with a priority of 6, MultiScan with priority 4, and FSAnalyze with priority 5.

Next, we go ahead and start a deduplication job, with a priority value of 4:

# isi job start dedupe

Started job [584]

If we now take a look at the cluster’s job status, we can see that SmartPools job has been put into a waiting state (paused), because of its relative priority. A value of ‘1’ indicates the highest priority job level that OneFS supports, with ‘10’ being the lowest.

# isi job status

The job engine is running.

Running and queued jobs:

ID   Type       State Impact  Pri  Phase Running Time

---------------------------------------------------------

578  SmartPools Waiting Low     6  1/2   11s

581  Collect    Running Low     4  1/3   1m 4s

583  FSAnalyze  Running Low     5  9/10  43s

584  Dedupe     Running Low     4  1/1    -

---------------------------------------------------------

Total: 4



Once the FSAnalyze job has completed, the SmartPools job is automatically restarted again:

# isi job status

The job engine is running.

Running and queued jobs:

ID   Type       State Impact  Pri  Phase Running Time

---------------------------------------------------------

578  SmartPools Running Low     6   1/2  23s

581  Collect    Running Low     4   1/3  5m 9s

584  Dedupe     Running Low     4   1/1  1m 2s

---------------------------------------------------------

Total: 3

Let’s look at this in a bit more detail. So the Job Engine’s concurrent job execution is governed by the following criteria:

  • Job Priority
  • Exclusion Sets – jobs which cannot run together (ie, FlexProtect and AutoBalance)
  • Cluster health – most jobs cannot run when the cluster is in a degraded state.

In addition to the per-job impact controls described above, additional impact management is also provided by the notion of job exclusion sets. For multiple concurrent job execution, exclusion sets, or classes of similar jobs, determine which jobs can run simultaneously. A job is not required to be part of any exclusion set, and jobs may also belong to multiple exclusion sets.

Currently, there are two exclusion sets that jobs can be part of: Restriping and Marking:

Here’s a list of the basic concurrent job combinations that OneFS supports:

  • 1 Restripe Job + 1 Mark Job + 1 Other Job
  • 1 Restripe Job + 2 Other Jobs
  • 1 Mark Job + 2 Other Jobs
  • 1 Mark and Restripe Job + 2 Other Jobs
  • 3 Other Jobs

OneFS marks blocks that are actually in use by the file system. IntegrityScan, for example, traverses the live file system, marking every block of every LIN in the cluster to proactively detect and resolve any issues with the structure of data in a cluster. The jobs that comprise the marking exclusion set are:

  • Collect
  • IntegrityScan
  • MultiScan

OneFS protects data by writing file blocks across multiple drives on different nodes in a process known as ‘restriping’. The Job Engine defines a restripe exclusion set that contains these jobs which involve file system management, protection and on-disk layout. The restripe exclusion set contains the following jobs:

  • AutoBalance
  • AutoBalanceLin
  • FlexProtect
  • FlexProtectLin
  • MediaScan
  • MultiScan
  • SetProtectPlus
  • ShadowStoreProtect
  • SmartPools
  • Upgrade

The restriping exclusion set is per-phase instead of per job. This helps to more efficiently parallelize restripe jobs when they don’t need to lock down resources.

Restriping jobs only block each other when the current phase may perform restriping. This is most evident with MultiScan, whose final phase only sweeps rather than restripes. Similarly, MediaScan, which rarely ever restripes, is usually able to run to completion more without contending with other restriping jobs.

For example, below the two restripe jobs, MediaScan and AutoBalanceLin, are both running their respective first job phases. ShadowStoreProtect, also a restriping job, is in a ‘waiting’ state, blocked by AutoBalanceLin.

Running and queued jobs:

ID    Type               State       Impact  Pri  Phase  Running Time

----------------------------------------------------------------------

26850 AutoBalanceLin     Running     Low     4    1/3    20d 18h 19m 

26910 ShadowStoreProtect Waiting     Low     6    1/1    -           

28133 MediaScan          Running     Low     8    1/8    1d 15h 37m  

----------------------------------------------------------------------

MediaScan restripes in phases 3 and 5 of the job, and only if there are disk errors (ECCs) which require data reprotection. If MediaScan reaches phase 3 with ECCs, it will pause until AutoBalanceLin is no longer running. If MediaScan’s priority were in the range 1-3, it would cause AutoBalanceLin to pause instead.

If two jobs happen to reach their restriping phases simultaneously and the jobs have different priorities, the higher priority job (ie. priority value closer to “1”) will continue to run, and the other will pause. If the two jobs have the same priority, the one already in its restriping phase will continue to run, and the one newly entering its restriping phase will pause.

Jobs may also belong to both exclusion sets. An example of this is MultiScan, since it includes both AutoBalance and Collect.

The majority of the jobs do not belong to an exclusion set, as illustrated in the following graphic. These are typically the feature support jobs, as described above, and they can coexist and contend with any of the other jobs.

Exclusion sets do not change the scope of the individual jobs themselves, so any runtime improvements via parallel job execution are the result of job management and impact control. The Job Engine monitors node CPU load and drive I/O activity per worker thread every twenty seconds to ensure that maintenance jobs do not cause cluster performance problems.

If a job affects overall system performance, Job Engine reduces the activity of maintenance jobs and yields resources to clients. Impact policies limit the system resources that a job can consume and when a job can run. You can associate jobs with impact policies, ensuring that certain vital jobs always have access to system resources.

Looking at our previous example again, where the SmartPools job is paused when FSAnalyze is running:

# isi job list

ID   Type       State Impact  Pri  Phase Running Time

---------------------------------------------------------

578  SmartPools Waiting Low     6  1/2    15s

581  Collect    Running Low     4  1/3    37m

584  Dedupe     Running Low     4  1/1    33m

586  FSAnalyze  Running Low     5  1/10   9s

---------------------------------------------------------

Total: 4


If this is undesirable, FSAnalyze can be manually paused, to allow SmartPools to run unimpeded:

# isi job pause FSAnalyze

# isi job list

ID   Type       State       Impact Pri  Phase  Running Time

-------------------------------------------------------------

578  SmartPools Waiting     Low     6   1/2    15s

581  Collect    Running     Low     4   1/4    38m

584  Dedupe     Running     Low     4   1/1    34m

586  FSAnalyze  User Paused Low     5   1/10   20s

-------------------------------------------------------------

Total: 4

Alternatively, the priority of the SmartPools job can also be elevated to value ‘4’ (or the priority of FSAnalyze lowered to value ‘7’) to permanently prioritize it over the FSAnalyze job. For example:

# isi job types modify SmartPools --priority 4

Are you sure you want to modify the job type SmartPools? (yes/[no]): yes

# isi job types view SmartPools

         ID: SmartPools

Description: Enforce SmartPools file policies. This job requires a SmartPools license.

    Enabled: Yes

     Policy: LOW

   Schedule: every day at 22:00

   Priority: 4

Or, via the WebUI:

Navigate to Job Operations > Job Type and configure the desired priority value edit the appropriate job type’s details:

Leave a Reply

Your email address will not be published. Required fields are marked *