OneFS SmartSync Configuration for Google Cloud – Unstructured Data Quick Tips

As we saw in the previous blog in this series, with the inclusion of Google Cloud (GCP) in OneFS 9.7, SmartSync Cloud Copy now supports all three of the principal public cloud hyperscalers.

Object data replication to Google Cloud (GCP) can be configured in OneFS 9.7 via the ‘isi dm accounts create’ CLI command. Required information includes the regular account configuration parameters plus the following GCP-specific settings:

GCP account type
GCP URI
Access ID
Secret key

Or, more specifically:

Parameter	Description
Object store type	GCP (or AWS_S3, Azure, ECS_S3, etc)
URI	{http,https}://hostname:port/bucketname
Auth	Access ID, Secret Key
Proxy	Optional proxy information

For example:

# isi dm account create --account-type GCP --name [Account Name] --access-id [GCP access-id] --uri [GCP URI with bucket-name] --auth-mode CLOUD --secret-key [GCP secret-key]

Once created, the new account can be verified with the following command:

# isi dm accounts list

Additionally, the next steps for SmartSync configuration and policy creation are covered in detail in the following blog article.

SmartSync Cloud Copy supports both push and pull replication, permitting the same dataset that is copied to GCP with a push to be copied back to the cluster via a corresponding pull.

Be aware that a dataset must be available before a policy runs, or the policy will fail.

Also note that, while multiple GCP URIs and credentials are supported by SmartSync, they are not supported on the same account. Multiple accounts and multiple corresponding policies would need to be created for SmartSync.

Other SmartSync features and functionality includes:

Feature	Details
Bandwidth throttling	Set of netmask rules. Limits are per-node.
CPU throttling	Allowed and Back-off CPU percentages.
Base policies	Template providing common values to groups of related policies (schedule, source base path, enable/disable, etc). Ie. Disabling base policy affects all linked concrete policies.
Concrete policy	Predefined set of fields from the base policy
Unconnected nodes (NANON)	Active accounts are monitored by each node. No work allocation to nodes without network access.
Snapshot locking	Avoids accidental snapshot deletion, with subsequent re-base-lining.

Behind the scenes, dataset creation leverages a SnapshotIQ snapshot, which can be inspected via the ‘isi snapshot list’ command. These DM dataset snapshots are easily recognizable due to their ‘isi_dm’ prefixed naming convention.

The SmartSync Cloud Copy format provides both regular file representation, browsability and usability of file system data in the cloud. In addition to the replication of the actual data, SmartSync also preserves the common file attributes including Windows ACLs, POSIX permissions and attributes, creation times, extended attributes, etc. However, there are certain considerations and limitations to be aware of, such as no incremental copy. These also include:

CloudCopy Caveats	Details
ADS files	Skipped when encountered.
Hardlinks	An object will be created for each link (ie. links are not preserved).
Symlinks	Skipped when encountered.
Directories	An object is created for each directory.
Special files	Skipped when encountered.
Metadata	Only POSIX mode bits, UID, GID, atime, mtime, ctime are preserved.
Filename encodings	Converted to UTF-8.
Path	Path relative to root copy directory is used as object key.
Large files	An error is returned for files larger than the cloud providers maximum object size.
Long filenames	File names exceeding 256 bytes are compressed.
Long paths	Junction points are created when paths exceed 1024 bytes to redirect where objects are being stored
Sparse files	Sparse sections are not preserved and are written out fully as zeros.

SmartSync allows subsequent incremental data movement by managing and re-transferring failed file transfers. Similarly, Dataset reconnect enables systems with common base datasets to establish instant incremental syncs. SmartSync also proactively locks the SnapshotIQ snapshots it uses, providing better separation between Datamover and other snapshots.

Performance-wise, SmartSync is powered by a scalable run-time engine, spanning the cluster, and which spins up threads (fibers) on demand and uses asynchronous IO to process replication tasks (chunks). Batch operations are used for efficient small file, attribute, and data block transfer. Namespace contention avoidance, efficient snapshot utilization, and separation of dataset creation from transfer are salient design features of the both the baseline and incremental sync algorithms.

Leave a Reply Cancel reply