OneFS SmartSync Backup-to-Object Management and Troubleshooting

As we saw in the previous articles in this series, SmartSync in OneFS 9.11 adds backup-to-object functionality, which delivers high-performance, full-fidelity incremental replication to ECS, ObjectScale, Wasabi, and AWS S3 & Glacier IR object stores.

This new SmartSync backup-to-object functionality supports the full spectrum of OneFS path lengths, encodings, and file sizes up to 16TB – plus special files and alternate data streams (ADS), symlinks and hardlinks, sparse regions, and POSIX and SMB attributes.

In addition to the standard ‘isi dm’ command set, the following CLI utility can also come in handy for tasks such as verifying the dataset ID for a restore:

# isi_dm browse

For example, to query the SmartSync accounts and datasets:

# isi_dm browse

<no account>:<no dataset> $ list-accounts

000000000000000100000000000000000000000000000000 (tme-tgt)

ec2a72330e825f1b7e68eb2352bfb09fea4f000000000000 (DM Local Account)

fd0000000000000000000000000000000000000000000000 (DM Loopback Account)

<no account>:<no dataset> $ connect-account 000000000000000100000000000000000000000000000000

tme-tgt:<no dataset> $ list-datasets

1       2025-07-22T10:23:33+0000        /ifs/data/zone3

2       2025-07-22T10:23:33+0000        /ifs/data/zone4

1025    2025-07-22T10:25:01+0000        /ifs/data/zone3

2049    2025-07-22T10:30:04+0000        /ifs/data/zone4

tme-tgt:<no dataset> $ connect-dataset 2

tme-tgt:2 </ifs/data/zone4:> $ ls

home                           [dir]

zone2_sync1753179349           [dir]

tme-tgt:2 </ifs/data/zone4:> $ cd zone2_sync1753179349

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ ls

home                           [dir]

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $

Or for additional detail:

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings output-to-file-on /tmp/out.txt

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ settings verbose-on

tme-tgt:2 </ifs/data/zone4:zone2_sync1753179349/> $ list-datasets

1       2025-07-22T10:23:33+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=1 } }

2       2025-07-22T10:23:33+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=2 } }

1025    2025-07-22T10:25:01+0000        /ifs/data/zone3 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=1 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=3 } }

2049    2025-07-22T10:30:04+0000        /ifs/data/zone4 { dmdi_tree_id={ dmdti_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdti_local_unid=2 } dmdi_revision={ dmdr_system_guid={dmg_guid=0060486e3954c1b470687f084aa83df6c07d} dmdr_local_unid=4 } }
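When choosing a dataset ID for a restore, it can help to post-process the ‘list-datasets’ output rather than read it by eye. The following is a minimal Python sketch, assuming the output has been captured to a text file (for example with ‘settings output-to-file-on’) and that each row is laid out as ID, timestamp, and base path, as in the listings above. The names Dataset, parse_datasets, and latest_for_path are hypothetical helpers, not part of OneFS, and base paths containing spaces are not handled.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional


class Dataset(NamedTuple):
    dataset_id: int
    created: str
    base_path: str


# One row: numeric ID, timestamp, base path, then optional "{ ... }" verbose detail.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(/\S*)(?:\s+\{.*)?$")


def parse_datasets(text: str) -> List[Dataset]:
    """Return one Dataset per row that matches the listing layout."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(Dataset(int(m.group(1)), m.group(2), m.group(3)))
    return rows


def latest_for_path(datasets: List[Dataset], base_path: str) -> Optional[Dataset]:
    """Return the most recently created dataset for base_path, or None."""
    matches = [d for d in datasets if d.base_path == base_path]
    return max(
        matches,
        key=lambda d: datetime.strptime(d.created, "%Y-%m-%dT%H:%M:%S%z"),
        default=None,
    )


# Sample rows copied from the non-verbose listing in this article.
SAMPLE = """\
1       2025-07-22T10:23:33+0000        /ifs/data/zone3
2       2025-07-22T10:23:33+0000        /ifs/data/zone4
1025    2025-07-22T10:25:01+0000        /ifs/data/zone3
2049    2025-07-22T10:30:04+0000        /ifs/data/zone4
"""

datasets = parse_datasets(SAMPLE)
newest = latest_for_path(datasets, "/ifs/data/zone4")
print(newest.dataset_id)  # prints 2049
```

The same parser accepts the verbose rows, since anything from the opening brace onward is ignored.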

For monitoring and troubleshooting SmartSync, a variety of diagnostic tools are available. These include:

| Component | Tools | Issue |
|---|---|---|
| Logging | /var/log/isi_dm.log; /var/log/messages; /ifs/data/Isilon_Support/datamover/transfer_failures/baseline_failures_<jobid> | General SmartSync info and triage. |
| Accounts | isi dm accounts list/view | Authentication, trust, and encryption. |
| CloudCopy | S3 browser (e.g., Cloudberry), Microsoft Azure Storage Explorer | Cloud access and connectivity. |
| Dataset | isi dm dataset list/view | Dataset creation and health. |
| File system | isi get | Inspect replicated files and objects. |
| Jobs | isi dm jobs list/view; isi_datamover_job_status -jt | Job and task execution, auto-pausing, completion, control, and transfer. |
| Network | isi dm throttling bw-rules list/view; isi_dm network ping/discover | Network connectivity and throughput. |
| Policies | isi dm policies list/view; isi dm base-policies list/view | Copy and dataset policy execution and transfer. |
| Service | isi services -a isi_dm_d <enable/disable> | Daemon configuration and control. |
| Snapshots | isi snapshot snapshots list/view | Snapshot execution and access. |
| System | isi dm throttling settings | CPU load and system performance. |

SmartSync info and errors are typically written to /var/log/isi_dm.log and /var/log/messages, while DM job transfer failures generate a log specific to the job ID under /ifs/data/Isilon_Support/datamover/transfer_failures.

Once a policy is running, its job status is reported via ‘isi dm jobs list’. Once complete, job histories are available by running ‘isi dm historical-jobs list’. More details for a specific job can be gleaned from the ‘isi dm jobs view’ command, using the pertinent job ID from the list output. Additionally, the ‘isi_datamover_job_status’ command, with the job ID as an argument, also supplies detailed information about a specific job.

Once running, a DM job can be further controlled via the ‘isi dm jobs modify’ command, and available actions include cancel, partial-completion, pause, or resume.

If a certificate authority (CA) is not correctly configured on a PowerScale cluster, the SmartSync daemon will not start, even though accounts and policies can still be configured. Be aware that the failed policies will not be reported via ‘isi dm jobs list’ or ‘isi dm historical-jobs list’, since they never started. Instead, an improperly configured CA is reported in /var/log/isi_dm.log as follows:

Certificates not correctly installed, Data Mover service sleeping: At least one CA must be installed: No such file or directory from dm_load_certs_from_store (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:197 ) from dm_tls_init (/b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/rpc/dm_tls.cpp:279 ): Unable to load certificate information

Once a CA and identity are correctly configured, the SmartSync service automatically activates. Next, SmartSync attempts a handshake with the target. If the CA or identity is misconfigured, the handshake fails and generates an entry in /var/log/isi_dm.log. For example:

2025-07-30T12:38:17.864181+00:00 GEN-HOP-NOCL-RR-1(id1) isi_dm_d[52758]: [0x828c0a110]: /b/mnt/src/isilon/lib/isi_dm/isi_dm_remote/src/acct_mon.cpp:dm_acctmon_try_ping:348: [Fiber 3778] ping for account guid: 0000000000000000c4000000000000000000000000000000, result: dead
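Both of the log signatures above are easy to search for programmatically. The following is a minimal Python sketch, assuming the message text matches the two excerpts in this article. The triage function and its return shape are hypothetical, and only the ‘dead’ ping result is checked because it is the only result value shown here.

```python
import re

# Message fragments taken from the isi_dm.log excerpts in this article.
CA_MISSING = "At least one CA must be installed"
PING = re.compile(r"ping for account guid: ([0-9a-fA-F]+), result: (\w+)")


def triage(lines):
    """Return (ca_missing, dead_account_guids) for an iterable of log lines."""
    ca_missing = False
    dead_accounts = set()
    for line in lines:
        if CA_MISSING in line:
            ca_missing = True
        m = PING.search(line)
        if m and m.group(2) == "dead":
            dead_accounts.add(m.group(1))
    return ca_missing, sorted(dead_accounts)


# Sample lines, shortened from the excerpts above for readability.
LOG_LINES = [
    "Certificates not correctly installed, Data Mover service sleeping: "
    "At least one CA must be installed: No such file or directory",
    "2025-07-30T12:38:17.864181+00:00 GEN-HOP-NOCL-RR-1(id1) isi_dm_d[52758]: "
    "[Fiber 3778] ping for account guid: "
    "0000000000000000c4000000000000000000000000000000, result: dead",
]

ca_missing, dead = triage(LOG_LINES)
print(ca_missing)  # prints True
print(dead)
```

On a cluster, the same function could be fed an open file handle for /var/log/isi_dm.log in place of the sample list, and the account GUIDs it returns can be compared against the ‘list-accounts’ output.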

Note that the full handshake error detail is logged if the SmartSync service (isi_dm_d) is set to log at the ‘info’ or ‘debug’ level using isi_ilog:

# isi_ilog -a isi_dm_d --level info+

Valid ilog levels include:

fatal error err notice info debug trace

error+ err+ notice+ info+ debug+ trace+

A copy or repeat-copy policy requires an available dataset for replication before running. If a dataset has not been successfully created prior to the copy or repeat-copy policy job starting for the same base path, the job is paused. In the following example, the base path of the copy policy is not the same as that of the dataset policy, hence the job fails with a “path doesn’t match…” error.

# ls -l /ifs/data/Isilon_Support/datamover/transfer_failures

total 9

-rw-rw----   1 root  wheel  679  Jul 20 10:56 baseline_failure_10

# cat /ifs/data/Isilon_Support/datamover/transfer_failures/baseline_failure_10

Task_id=0x00000000000000ce, task_type=root task ds base copy, task_state=failed-fatal path doesn’t match dataset base path: ‘/ifs/test’ != ‘/ifs/data/repeat-copy’:

from bc_task_initialize_dsh (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy

from dmt_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/ds_base_copy_root_task

from dm_txn_execute_internal (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cp

from dm_txn_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm_base/src/txn.cpp:2274)

from dmp_task_spark_execute (/b/mnt/src/isilon/lib/isi_dm/isi_dm/src/task_runner.
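The first line of a transfer failure record carries the task ID, the task state, and the two mismatched paths, so it can be summarized before reading the stack trace. The following is a minimal Python sketch, assuming the record layout shown in the example above. The parse_failure function and the ‘left_path’ and ‘right_path’ keys are hypothetical names, and the sketch does not assume which side of the ‘!=’ belongs to the policy and which to the dataset.

```python
import re

TASK_ID = re.compile(r"[Tt]ask_id=(0x[0-9a-fA-F]+)")
TASK_STATE = re.compile(r"task_state=([\w-]+)")
# Paths may be wrapped in straight or typographic single quotes.
QUOTE = r"['\u2018\u2019]?"
MISMATCH = re.compile(
    r"base path:\s*" + QUOTE + r"([^'\u2018\u2019\s]+)" + QUOTE
    + r"\s*!=\s*" + QUOTE + r"([^'\u2018\u2019\s:]+)" + QUOTE
)


def parse_failure(record: str) -> dict:
    """Extract the task ID, task state, and mismatched paths from a record."""
    result = {}
    m = TASK_ID.search(record)
    if m:
        result["task_id"] = int(m.group(1), 16)
    m = TASK_STATE.search(record)
    if m:
        result["task_state"] = m.group(1)
    m = MISMATCH.search(record)
    if m:
        result["left_path"] = m.group(1)
        result["right_path"] = m.group(2)
    return result


# First line of the failure record from the example above (straight quotes).
RECORD = (
    "Task_id=0x00000000000000ce, task_type=root task ds base copy, "
    "task_state=failed-fatal path doesn't match dataset base path: "
    "'/ifs/test' != '/ifs/data/repeat-copy':"
)

info = parse_failure(RECORD)
print(info["task_state"], info["left_path"], info["right_path"])
```

A path mismatch like this one points at the policy configuration rather than at the network or the target, so the two paths are usually the first thing to compare against the dataset listing.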

Once any errors for a policy have been resolved, the ‘isi dm jobs modify’ command can be used to resume the job.