OneFS Job Engine and Parallel Restriping – Part4

In the final article in this series, we take a look at the configuration and management of parallel restriping. To support this, OneFS 9.7 includes a new ‘isi job settings’ CLI command set, allowing the parallel restriper configuration to be viewed and modified. By default, no changes are made to the Job Engine upon upgrade to 9.7, so the legacy behavior allowing only a single restripe job to run at any point in time is preserved. This is reflected in the new ‘isi job settings’ CLI syntax:

# isi job settings view

Parallel Restriper Mode: Off

However, once a OneFS 9.7 upgrade has been committed, the parallel restriper can be configured in one of three modes:

  Mode Description
Off Default: Legacy restripe exclusion set behavior, with only one restripe job permitted.
Partial FlexProtect/FlexProtectLin runs alone, but all other restripers can be run together.
All No restripe job exclusions, beyond the overall limit of three concurrently running jobs.

For example, the following CLI command can be used to configure ‘partial’ parallel restriping support:

# isi job settings modify --parallel_restriper_mode=partial

# isi job settings view

Parallel Restriper Mode: Partial

As such, restriping jobs can run in parallel in ‘partial’ mode. For example, SmartPools and MultiScan, as in the following cluster‘s CLI output:

# isi job jobs list

ID   Type       State   Impact  Policy  Pri  Phase  Running Time

-----------------------------------------------------------------

3166 MultiScan  Running Low     LOW     4    1/4    17d 8h 5m

3790 SmartPools Running Low     LOW     6    1/2    5d 17h 16m

-----------------------------------------------------------------

Total: 2

However, if the FlexProtect is started when a cluster is in ‘partial’ mode, all other restriping jobs are automatically paused. For example:

# isi job jobs start FlexProtect

Started job [4088]

# isi job jobs start FlexProtect

Started job [4114]

# isi job jobs list

ID   Type        State              Impact  Policy  Pri  Phase  Running Time

-----------------------------------------------------------------------------

3790 SmartPools  Waiting            Low     LOW     6    1/2    36s

3166 MultiScan   Running -> Waiting Low     LOW     4    1/4    28s

4114 FlexProtect Waiting            Medium  MEDIUM  1    1/6    -

-----------------------------------------------------------------------------

Total: 3

# isi job jobs list

ID   Type        State   Impact  Policy  Pri  Phase  Running Time

------------------------------------------------------------------

3166 MultiScan   Waiting Low     LOW     4    1/4    17d 8h 7m

3790 SmartPools  Waiting Low     LOW     6    1/2    5d 17h 17m

4088 FlexProtect Running Medium  MEDIUM  1    1/6    2s

------------------------------------------------------------------

Total: 3

Similarly, no restripe job exclusions can be implemented with the following CLI syntax:

# isi job settings modify --parallel_restriper_mode=all

This allows any of the restriping jobs, including FlexProtect, to run in parallel up to the Job Engine limit of three concurrent jobs. For example, MultiScan and SmartPools are both running below:

# isi job jobs list

ID   Type       State   Impact  Policy  Pri  Phase  Running Time

-----------------------------------------------------------------

3166 MultiScan  Waiting Low     LOW     4    1/4    17d 8h 7m

3790 SmartPools Waiting Low     LOW     6    1/2    5d 17h 17m

-----------------------------------------------------------------

Total: 2

# isi job settings view

Parallel Restriper Mode: All

If the FlexProtect job is then started, all three restriping jobs are allowed to run concurrently:

# isi job jobs start FlexProtect

Started job [4089]

# isi job jobs list

ID   Type        State   Impact  Policy  Pri  Phase  Running Time

------------------------------------------------------------------

3166 MultiScan   Running Low     LOW     4    1/4    17d 8h 8m

3790 SmartPools  Running Low     LOW     6    1/2    5d 17h 18m

4089 FlexProtect Running Medium  MEDIUM  1    1/6    3s

------------------------------------------------------------------

Total: 3

Furthermore, the restripe jobs, including FlexProtect, can be run with the desired priority and impact settings. For example:

# isi job jobs start Flexprotect --policy LOW --priority 6

Started job [4100]

# isi job jobs list

ID   Type        State   Impact  Policy  Pri  Phase  Running Time

------------------------------------------------------------------

4097 SmartPools  Running Medium  MEDIUM  1    1/2    1m 42s

4098 MultiScan   Running Medium  MEDIUM  1    1/4    1m 13s

4100 FlexProtect Running Low     LOW     6    1/6    -

------------------------------------------------------------------

Total: 3

If necessary, the Job Engine can always be easily reverted to its default restripe exclusion set behavior, with only one restripe job permitted, as follows:

 # isi job settings modify --parallel_restriper_mode=off

Note that a user account with the PRIV_JOB_ENGINE RBAC role is required to configure the parallel restripe settings.

Similar to other Job Engine configuration, the parallel restripe settings are stored in gconfig under the core.parallel_restripe_mode tree.

Like any multi-threaded or parallel architecture, contending restriping jobs may lock LINs for long periods due to bigger range locks. Also, since by nature restriping jobs are moving blocks around, they tend to be quite hard on drives. So, multiple restripers running in parallel have the potential to impact cluster performance and potentially client I/O (protocol throughput, etc) – especially if the contending restripe jobs are run at a MEDIUM impact.

Also note that the new parallel restripe mode only applies to waiting jobs, or jobs transitioning between phases. Typically, if you attempt to start a second job with restripe exclusion enabled, that second job will be placed into a ‘waiting’ state. If parallel restripe is then enabled, the second job will be re-evaluated, and promoted to a ‘running’ state. However, if both jobs are running and parallel restripe is then disabled, the second job will not automatically be paused. Instead, intervention from a cluster admin would be needed to manually pause that job, if desired.

Note too that restripe exclusion is on a per-job-phase basis. For example, the MultiScan job has four phases. The first three can restripe, while the fourth does not. As such, a different restriping job (e.g. SmartPools or FlexProtect) will not conflict with MultiScan’s fourth phase. There’s also no need to run AutoBalance and a restriping MultiScan at the same time since they do exactly the same thing.
Additionally, unless there’s a really valid reason to, a good practice is to avoid running AutoBalance or MultiScan while FlexProtect is running. Re-protecting the cluster is usually of considerably more importance than correct balance, so allowing FlexProtect to consume any available resources while it’s running is typically a prudent move.

When troubleshooting the parallel restriper, the Job Engine coordinator logs to both isi_job_d.log and /var/log/messages, writing both the initial value and the subsequent configuration change. This can be a good thing to check if unexpectedly high drive load is encountered. Maybe someone inadvertently enabled parallel restripe, or at least forgot to disable it again after an intended short term configuration change.

Leave a Reply

Your email address will not be published. Required fields are marked *