The basic ability to export a cluster’s configuration, which can then be used to perform a config restore, has been available since OneFS 9.2. However, OneFS 9.7 brings an evolution of the cluster configuration backup and restore architecture, plus a significant expansion in the breadth of supported OneFS components, which now include authentication, networking, multi-tenancy, replication, and tiering.
Configuration export and import can be performed via either the OneFS CLI or the platform API (a CLI sketch follows the table below), and encompass the following OneFS components for configuration backup and restore:
| Component | Configuration / Action | Release |
|-----------|------------------------|---------|
| Auth | Roles: Backup / Restore; Users: Backup / Restore; Groups: Backup / Restore | OneFS 9.7 |
| Filepool | Default-policy: Backup / Restore; Policies: Backup / Restore | OneFS 9.7 |
| HTTP | Settings: Backup / Restore | OneFS 9.2+ |
| NDMP | Users: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| Network | Groupnets: Backup / Restore; Subnets: Backup / Restore; Pools: Backup / Restore; Rules: Backup / Restore; DNScache: Backup / Restore; External: Backup / Restore | OneFS 9.7 |
| NFS | Exports: Backup / Restore; Aliases: Backup / Restore; Netgroup: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| Quotas | Quotas: Backup / Restore; Quota notifications: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| S3 | Buckets: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| SmartPools | Nodepools: Backup; Tiers: Backup; Settings: Backup / Restore | OneFS 9.7 |
| SMB | Shares: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| Snapshots | Schedules: Backup / Restore; Settings: Backup / Restore | OneFS 9.2+ |
| SmartSync | Accounts: Backup / Restore; Certificates: Backup; Base-policies: Backup / Restore; Policies: Backup / Restore; Throttling: Backup / Restore | OneFS 9.7 |
| SyncIQ | Policies: Backup / Restore; Certificates: Backup; Rules: Backup; Settings: Backup / Restore | OneFS 9.7 |
| Zone | Zones: Backup / Restore | OneFS 9.7 |
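By way of illustration, a minimal CLI sketch of an export and a subsequent import might look like the following. The `--components` flag and the export ID handling shown here are assumptions, so verify the exact syntax against the `isi cluster config` help output on your cluster:

```
# Export the configuration of selected components:
# isi cluster config exports create --components=smb,nfs

# List completed exports to obtain the export ID:
# isi cluster config exports list

# Restore a prior export by its ID on the target cluster:
# isi cluster config imports create <export-id>
```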
In addition to the expanded component support above, the principal feature enhancements added to cluster configuration backup and restore in OneFS 9.7 include:
- Addition of a daemon to manage backup/restore jobs.
- The ability to lock the configuration during a backup.
- Support for custom rules when restoring subnet IP addresses.
Let’s first take a look at the overall architecture. The legacy cluster configuration backup and restore infrastructure in OneFS 9.6 and earlier was as follows:
By way of contrast, OneFS 9.7 adds a new configuration manager daemon, introducing a fifth layer to the stack and also increasing security and guaranteeing configuration consistency and idempotency:
The various layers in this OneFS 9.7 architecture can be characterized as follows:
| Architectural Layer | Description |
|---------------------|-------------|
| User Interface | Allows users to submit operations via multiple interfaces, such as the platform API or CLI. |
| pAPI Handler | Performs different actions according to the incoming requests. |
| Config Manager Daemon | New daemon in OneFS 9.7 that manages backup and restore jobs. |
| Config Manager | Core layer that executes the various jobs called by the pAPI handlers. |
| Database | Lightweight database that manages the asynchronous jobs, tracking state and recording task data. |
The new configuration management (ConfigMgr) daemon receives job requests from the platform API export and import handlers, and launches the corresponding backup and restore jobs as required. These jobs call each component’s pAPI handler in order to export or import its configuration data. The exported configuration data itself is saved under /ifs/data/Isilon_Support/config_mgr/backup/, while the job information and context are saved to a SQLite job information database that resides at /ifs/.ifsvar/modules/config_mgr/config.sqlite.
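Since the job database is a regular SQLite file, it can be inspected directly with the stock `sqlite3` shell if ever needed for troubleshooting. Its schema is internal and undocumented, so the following is purely illustrative:

```
# sqlite3 /ifs/.ifsvar/modules/config_mgr/config.sqlite
sqlite> .tables
sqlite> .schema
```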
Enabled by default, the ConfigMgr daemon runs as a OneFS service, and can be viewed and managed as such:
```
# isi services -a | grep -i config_mgr
   isi_config_mgr_d     Config mgr Daemon                        Enabled
```
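If the daemon ever needs to be stopped or restarted manually, the same `isi services` CLI can be used, with the caveat that disabling the service prevents any configuration backup or restore jobs from running:

```
# isi services isi_config_mgr_d disable
# isi services isi_config_mgr_d enable
```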
This isi_config_mgr_d daemon is managed by MCP, OneFS’ main utility for distributed service control across a cluster.
MCP is responsible for starting, monitoring, and restarting failed services on a cluster. It also monitors configuration files and acts upon configuration changes, propagating local file changes to the rest of the cluster. MCP actually comprises three different processes, one for each of its modes:
- The ‘Master’ is the central MCP process and does the bulk of the work. It monitors files and services, including the failsafe process, and delegates actions to the forker process.
- The ‘Forker’ receives command-line actions from the master over a UNIX domain socket, executes them, and returns the resulting exit codes. If the forker is inadvertently or intentionally killed, it is automatically restarted by the master process. If necessary, MCP will keep trying to restart the forker at an increasing interval; after around ten minutes of unsuccessful attempts, it fires off a CELOG alert and continues trying, with a second alert sent after thirty minutes.
- The ‘Failsafe’ acts as a watchdog for the master process, restarting it in the event of a failure, while the master in turn monitors the failsafe.
MCP ensures the correct state of the service on a node, and since isi_config_mgr_d is marked ‘enabled’ by default, it will run the start action until the PID confirms the daemon is running. MCP monitors services by observing their PID files (under /var/run), plus the process table itself, to determine whether a process is already running, comparing this state against the ‘enabled/disabled’ configuration for the service and determining whether any start or stop actions are required.
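For instance, the daemon’s liveness can be verified by hand in much the same manner that MCP checks it. The PID file name below is an assumption based on the service name:

```
# cat /var/run/isi_config_mgr_d.pid
# ps -p $(cat /var/run/isi_config_mgr_d.pid)
```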
In the event of an abnormal termination of a configuration restore job, the job status will be updated in the job info database, and MCP will attempt to restart the daemon. If a configuration backup job fails, the daemon will also assist in freeing the configuration lock. While a backup job is running, it locks the configuration to prevent changes until the backup is complete, guarding against any potential race-induced inconsistencies in the configuration data. Backup job execution is typically swift, so the locking effect on the cluster is minimal. Also, config locking does not impact in-progress POST, PUT, and DELETE changes. Once successfully completed, the backup job automatically relinquishes its configuration lock(s). Additionally, the ‘isi cluster config lock’ CLI command set can be used both to view lock state and to manually modify (enable or disable) the configuration locks, as sketched below.
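The subcommand names in this sketch are assumptions based on the view/enable/disable operations described above, so verify them against the CLI help on your cluster:

```
# isi cluster config lock view
# isi cluster config lock enable
# isi cluster config lock disable
```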
The other main enhancement to configuration backup and restore in OneFS 9.7 is the ability to create custom rules for restoring subnet IP addresses. This allows different IP addresses from those in the backup to be assigned when restoring the network config on a target cluster. As such, a network configuration restore will not attempt to overwrite the IP addresses of any existing subnets and pools, thereby avoiding potential connectivity disruption.
In the next article in this series we’ll take a look at the operation and management of cluster configuration backup and restore.