rsync in Practice: Backups and Mirroring

What You'll Learn

  • How to use rsync differential sync for real backups and mirroring
  • The correct use of --delete, --link-dest, and --exclude - and how to avoid disasters
  • A safe workflow that prevents "I synced and it deleted everything" and "out of space" failures

Quick Summary

  • One-way refresh -> rsync -av src/ dst/
  • Full mirror (delete extras too) -> rsync -av --delete src/ dst/
  • Keep generations (incremental) -> hard-link the previous run with --link-dest
  • Always run --dry-run before anything destructive

Assumptions (target environment)

  • OS: Ubuntu (rsync 3.x)
  • Both local and remote (over SSH) transfers
  • For the basics, see scp / rsync Basics

What is rsync differential sync?

Conclusion: rsync compares files already on the destination and transfers only what changed. The fast second run is exactly why it fits backups and mirroring.

rsync is not just a copy tool - it is a sync tool that transfers only the difference between source and destination. The first run is a full copy, but every later run sends only changed files (and only changed blocks), so even large datasets refresh quickly.

That property is what makes it ideal for scheduled backups (refresh the same directory daily) and mirroring (keep two locations identical).
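A minimal sketch of that behavior, run in a throwaway directory from mktemp (the file names are illustrative only). It adds --itemize-changes (-i), which prints one line per item rsync updates, so an unchanged tree should list no files on the second run.

```shell
# Throwaway sandbox; nothing outside $tmp is touched
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
echo "hello" > "$tmp/src/a.txt"
echo "world" > "$tmp/src/b.txt"

# First run: both files are itemized (full copy)
first=$(rsync -ai "$tmp/src/" "$tmp/dst/")
echo "first run:";  echo "$first"

# Second run: sizes and mtimes match, so no file is listed
second=$(rsync -ai "$tmp/src/" "$tmp/dst/")
echo "second run:"; echo "$second"

# Change one file; only that file is listed on the third run
echo "hello again" > "$tmp/src/a.txt"
third=$(rsync -ai "$tmp/src/" "$tmp/dst/")
echo "third run:";  echo "$third"
```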

Why is differential transfer fast?

Conclusion: By default rsync skips files whose size and modification time both match. Only changed files - and within them, only changed blocks - are sent, so total transfer is small.

rsync's speed comes from two levels of difference detection.

  1. Per-file skip check: by default rsync compares each file's size and modification time (mtime); if both match, it treats the file as unchanged and skips it.
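  2. Block-level delta transfer: for files marked as changed, a rolling checksum finds matching blocks and sends only the changed portions (the rsync algorithm). This applies to transfers over a network; when both paths are local, rsync defaults to --whole-file and copies changed files in full.

# First run: full copy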
$ rsync -av src/ /backup/dst/

# Later runs: only the delta (near-instant if nothing changed)
$ rsync -av src/ /backup/dst/

When mtimes are unreliable (clock skew, just after a filesystem migration), use --checksum (below) to compare actual file contents.
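To see the block-level delta at work, one option is to read the "Literal data" and "Matched data" lines that --stats prints. The sketch below is an illustration in a temporary directory, not a benchmark; it passes --no-whole-file because a purely local copy would otherwise skip the delta algorithm.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"

# 1 MiB of random data, then an initial full copy
head -c 1048576 /dev/urandom > "$tmp/src/big.bin"
rsync -a "$tmp/src/" "$tmp/dst/"

# Append a few bytes so the size changes and the quick check flags the file
echo "tail" >> "$tmp/src/big.bin"

# --no-whole-file forces the delta algorithm even for a local copy;
# --stats reports bytes sent literally vs. bytes matched on the destination
stats=$(rsync -a --no-whole-file --stats "$tmp/src/" "$tmp/dst/")
echo "$stats" | grep -E '^(Literal|Matched) data'
```

For an append like this, Matched data should be close to the original file size and Literal data should be small.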

How do you write the basic backup form?

Conclusion: rsync -av src/ dst/ is the baseline. Archive mode (-a) copies recursively while preserving permissions, ownership, timestamps, and symlinks.

Real backups start with -a, which preserves attributes.

$ rsync -av /home/user/data/ /backup/data/

-a (archive mode) bundles these options.

Included   Meaning
-r         Recursive (into subdirectories)
-l         Preserve symlinks as symlinks
-p         Preserve permissions
-t         Preserve timestamps
-g / -o    Preserve group / owner
-D         Preserve device and special files
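The effect of -t inside -a can be checked directly. This is a small sketch in a temporary directory (the file name and the 2020 timestamp are arbitrary) comparing a plain -r copy with an -a copy.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
echo "data" > "$tmp/src/f.txt"
touch -t 202001010000 "$tmp/src/f.txt"   # give the source an old mtime

rsync -r "$tmp/src/" "$tmp/plain/"       # recursive only
rsync -a "$tmp/src/" "$tmp/archive/"     # archive mode

# -nt is true when the left file has a newer mtime than the right one
if [ "$tmp/plain/f.txt" -nt "$tmp/src/f.txt" ]; then
    echo "-r: destination mtime is the copy time"
fi
if [ ! "$tmp/archive/f.txt" -nt "$tmp/src/f.txt" ] &&
   [ ! "$tmp/src/f.txt" -nt "$tmp/archive/f.txt" ]; then
    echo "-a: destination mtime matches the source"
fi
```

Without preserved mtimes, the size-and-mtime skip check cannot match on later runs, which is one more reason to start from -a.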

Common companions.

  • -v: show what is transferred (-vv for more detail)
  • -z: compress during transfer (good on slow links; can be slower on LANs)
  • -h: human-readable sizes (pairs well with --progress)

The trailing-slash trap (most important)

rsync -av src/ dst/   # put the CONTENTS of src into dst
rsync -av src  dst/   # put the src DIRECTORY itself into dst (dst/src/...)

A trailing / on the source changes the result. Most "my backup tree is off by one level" incidents come from this.
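The difference is easy to reproduce in a sandbox. In this sketch the directory names with-slash and no-slash are made up for the demonstration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
echo "x" > "$tmp/src/a.txt"

rsync -a "$tmp/src/" "$tmp/with-slash/"   # copies the contents of src
rsync -a "$tmp/src"  "$tmp/no-slash/"     # copies the src directory itself

# Lists with-slash/a.txt and no-slash/src/a.txt
find "$tmp/with-slash" "$tmp/no-slash" -type f
```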

How do you mirror a directory? (--delete)

Conclusion: For a true mirror, add --delete to remove destination files missing from the source. It is destructive, so always run --dry-run first.

A plain -a backup only adds and updates. Files you delete from the source stay on the destination forever. To make source and destination identical (mirroring), use --delete.

# Always dry-run first to see WHAT gets deleted
$ rsync -av --delete --dry-run src/ /mirror/dst/

# If the output looks right, run for real
$ rsync -av --delete src/ /mirror/dst/

In dry-run output, files to be removed appear on deleting ... lines.

sending incremental file list
deleting old-report.csv
deleting cache/tmp.dat
./
new-report.csv
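Those deleting lines can also be extracted for review before the real run. The helper below, list_deletions, is a hypothetical name for this sketch; it only wraps the dry-run command shown above and filters its output.

```shell
# list_deletions SRC DST: print what a --delete run WOULD remove (hypothetical helper)
list_deletions() {
    rsync -av --delete --dry-run "$1" "$2" | sed -n 's/^deleting //p'
}

# Sandbox: one file only on the source, one only on the destination
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dst"
echo "new" > "$tmp/src/new-report.csv"
echo "old" > "$tmp/dst/old-report.csv"

pending=$(list_deletions "$tmp/src/" "$tmp/dst/")
echo "would delete: $pending"

# The dry run changed nothing: the old file is still there
ls "$tmp/dst"
```

rsync also offers --max-delete=NUM, which caps how many files a real run may delete.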

You can control when deletion happens.

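  • --delete-after: delete on the destination only after the transfer has finished, instead of while it runs (less likely to damage the destination on a mid-run failure)
  • --delete-excluded: also delete destination files that match an --exclude pattern (by default, excluded files on the destination are left alone)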
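How --delete and --exclude interact is easiest to see side by side. A small sketch in a temporary directory, with made-up file names:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dst"
echo "keep" > "$tmp/src/data.txt"
echo "junk" > "$tmp/dst/scratch.tmp"    # exists only on the destination
echo "gone" > "$tmp/dst/stale.txt"      # exists only on the destination

# --delete removes stale.txt, but the excluded scratch.tmp is protected
rsync -a --delete --exclude='*.tmp' "$tmp/src/" "$tmp/dst/"
after_delete=$(ls "$tmp/dst")
echo "after --delete:";          echo "$after_delete"

# --delete-excluded removes the excluded file as well
rsync -a --delete --delete-excluded --exclude='*.tmp' "$tmp/src/" "$tmp/dst/"
after_excluded=$(ls "$tmp/dst")
echo "after --delete-excluded:"; echo "$after_excluded"
```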

How do you make generational (incremental) snapshots?

Conclusion: --link-dest hard-links unchanged files from a previous backup, so each generation looks like a full backup but only consumes disk for the differences.

A backup that overwrites the same directory every day cannot answer "restore the state from three days ago." To keep generations, use --link-dest.

--link-dest=DIR writes the destination by creating hard links to unchanged files in DIR (usually the previous backup). Identical files do not consume disk twice, so you can keep many generations efficiently.

# Make a dated snapshot directory each run
$ SRC=/home/user/data/
$ DEST=/backup/snapshots
$ TODAY=$(date +%F)          # e.g. 2026-06-05
$ LATEST=$DEST/latest        # symlink to the previous snapshot

$ rsync -av --delete \
    --link-dest="$LATEST" \
    "$SRC" "$DEST/$TODAY/"

# Point latest at the newest snapshot
$ ln -sfn "$DEST/$TODAY" "$LATEST"

Now /backup/snapshots/2026-06-05/ shows every file, but files unchanged since yesterday share hard links with yesterday's run, so the extra real disk used is only the delta.
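One way to confirm the sharing is to check whether two snapshot paths point at the same inode. The sketch below uses a temporary directory and the made-up generation names day1 and day2 rather than real dates.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/snap"
echo "stable" > "$tmp/src/stable.txt"
echo "v1"     > "$tmp/src/changing.txt"

# Day 1: plain full copy
rsync -a "$tmp/src/" "$tmp/snap/day1/"

# Day 2: one file changes; unchanged files are hard-linked to day1
echo "v2 with more text" > "$tmp/src/changing.txt"
rsync -a --link-dest="$tmp/snap/day1" "$tmp/src/" "$tmp/snap/day2/"

# -ef is true when both names refer to the same inode (same data on disk)
[ "$tmp/snap/day1/stable.txt" -ef "$tmp/snap/day2/stable.txt" ] &&
    echo "stable.txt: shared between day1 and day2"
[ "$tmp/snap/day1/changing.txt" -ef "$tmp/snap/day2/changing.txt" ] ||
    echo "changing.txt: separate copy in day2"

# day1 still holds the old content
cat "$tmp/snap/day1/changing.txt"
```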

  • A relative --link-dest path is resolved against the destination directory, not the current working directory. Use an absolute path to avoid surprises
  • To delete a generation, just rm -rf its dated directory. Because of hard links, files still referenced by other generations are not actually removed
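Pruning can be scripted along the same lines. This is a sketch under the assumption that snapshots are named YYYY-MM-DD as in the example above; prune_snapshots and its KEEP argument are hypothetical, not part of rsync.

```shell
# prune_snapshots DEST KEEP: hypothetical helper; keeps the newest KEEP dated snapshots
prune_snapshots() {
    dest=$1
    keep=$2
    # The glob expands in sorted order, and YYYY-MM-DD names sort chronologically
    set -- "$dest"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/
    [ -d "$1" ] || return 0            # no snapshots matched the pattern
    while [ "$#" -gt "$keep" ]; do
        echo "removing ${1%/}"
        rm -rf -- "${1%/}"
        shift
    done
}

# Demo in a sandbox: four generations, keep the newest two
tmp=$(mktemp -d)
mkdir -p "$tmp/2026-06-02" "$tmp/2026-06-03" "$tmp/2026-06-04" "$tmp/2026-06-05"
ln -sfn "$tmp/2026-06-05" "$tmp/latest"
prune_snapshots "$tmp" 2
ls "$tmp"
```

The latest symlink does not match the date pattern, so it is left alone.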

How do you exclude unwanted files? (--exclude)

Conclusion: --exclude=PATTERN skips targets; when there are many, collect them in --exclude-from=FILE. Excluding caches, logs, and temp files makes backups lighter.

Including caches and temp files wastes both space and transfer time. Skip them with --exclude.

$ rsync -av --delete \
    --exclude='*.tmp' \
    --exclude='cache/' \
    --exclude='node_modules/' \
    src/ /backup/dst/

For many patterns, put them in a file.

# Contents of .rsync-exclude (one pattern per line)
# *.tmp
# cache/
# node_modules/
# .git/

$ rsync -av --delete --exclude-from='.rsync-exclude' src/ /backup/dst/

A leading / in a pattern anchors it to the root of the transfer (the source directory), not to the filesystem root.

  • --exclude='/cache': exclude only cache directly under the source
  • --exclude='cache/': exclude cache/ at any depth

Use --dry-run to confirm you are not excluding more than intended.
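The anchored and unanchored forms can be compared on a small tree. In this sketch the directory layout and the destination names anchored and anywhere are invented for the demonstration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src/cache" "$tmp/src/app/cache"
echo "top"    > "$tmp/src/cache/top.dat"
echo "nested" > "$tmp/src/app/cache/nested.dat"
echo "keep"   > "$tmp/src/app/main.txt"

# Anchored: only src/cache is skipped; app/cache is still copied
rsync -a --exclude='/cache' "$tmp/src/" "$tmp/anchored/"

# Unanchored directory pattern: every cache/ directory is skipped
rsync -a --exclude='cache/' "$tmp/src/" "$tmp/anywhere/"

echo "anchored:"; (cd "$tmp/anchored" && find . -type f)
echo "anywhere:"; (cd "$tmp/anywhere" && find . -type f)
```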

How do you control bandwidth, resume, and progress?

Conclusion: -P (--partial --progress) gives resume and progress; --bwlimit caps bandwidth. Both are essential for large data, slow links, or transfers during business hours.

Sending large data to a remote host can saturate the link or fail partway and start over. Control it with these options.

# Progress + keep partial files on interruption (resume continues)
$ rsync -avP src/ user@server:/backup/dst/

# Cap bandwidth at 10 MB/s (good for backups during peak hours)
$ rsync -av --bwlimit=10M src/ user@server:/backup/dst/

# Non-standard SSH port
$ rsync -av -e "ssh -p 2222" src/ user@server:/backup/dst/

Option              Effect
-P                  --partial (keep partial files) + --progress
--partial           Keep partially transferred files so the next run can resume from them
--bwlimit=RATE      Upper transfer rate (e.g. 10M = 10 MiB/s; a bare number is read as KiB/s)
-e "ssh -p PORT"    Specify the remote shell (non-standard port, etc.)

-z (compression) costs CPU. On a LAN or with already-compressed data (video, images, archives), dropping -z is often faster. It pays off on slow WAN links.
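Because --partial lets a rerun pick up where the last one stopped, an unattended transfer can simply be retried. The wrapper below is a sketch, not an rsync feature: retry and the RETRY_DELAY variable are hypothetical names, and user@server and the paths in the usage line are placeholders.

```shell
# retry MAX CMD...: run CMD until it succeeds, at most MAX times (hypothetical helper).
# RETRY_DELAY (seconds between attempts) is an assumed knob, default 30.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        status=$?                      # exit status of the failed command
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts (exit $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit $status); retrying" >&2
        sleep "${RETRY_DELAY:-30}"
        attempt=$((attempt + 1))
    done
}

# Usage (placeholders):
# retry 5 rsync -a --partial --bwlimit=10M src/ user@server:/backup/dst/
```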

When should you use checksum comparison?

Conclusion: --checksum (-c) compares file contents instead of size and mtime. Use it when timestamps cannot be trusted (clock skew, post-migration verification). Avoid it for routine syncs - it is slow.

The default "size + mtime" check is fast but can miss changes when timestamps are unreliable. --checksum computes a checksum for every file and compares by content - reliable but slow.

# Strict content-based diff (migration / integrity verification)
$ rsync -avc src/ /backup/dst/

When to use it.

  • Verify a copy is complete after a filesystem or server migration
  • Guarantee identical content between source and destination (mtimes may differ)

--checksum reads every file on both sides, so it is very slow on large datasets. Use the default mtime check for routine sync and add --checksum only for verification.
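The case the default check misses can be reproduced deliberately: same size, same mtime, different content. A minimal sketch in a temporary directory, where touch -r copies the destination's timestamp back onto the modified source file:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf 'AAAA' > "$tmp/src/f.bin"
rsync -a "$tmp/src/" "$tmp/dst/"

# Change the content but keep the same size, then copy the old mtime back
printf 'BBBB' > "$tmp/src/f.bin"
touch -r "$tmp/dst/f.bin" "$tmp/src/f.bin"

# Default check: size and mtime match, so the change is missed
rsync -a "$tmp/src/" "$tmp/dst/"
quick=$(cat "$tmp/dst/f.bin")
echo "after default run:    $quick"

# --checksum compares content and transfers the file
rsync -ac "$tmp/src/" "$tmp/dst/"
strict=$(cat "$tmp/dst/f.bin")
echo "after --checksum run: $strict"
```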

Checklist to prevent disasters (summary)

Conclusion: Make --dry-run mandatory for any rsync with --delete, and check trailing slashes and empty path variables every time. That alone prevents almost every serious rsync accident.
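The path checks can be automated before any --delete run. The function below is one possible sketch; check_sync_paths and its rules are assumptions for illustration, not an rsync feature. It targets the failure where an unset variable turns "$SRC/" into /, and the one where an empty source (for example an unmounted disk) would make --delete wipe the destination.

```shell
# check_sync_paths SRC DST: hypothetical pre-flight check for a --delete run.
# Returns non-zero (and says why) when the paths look dangerous.
check_sync_paths() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "refusing: empty source or destination (unset variable?)" >&2
        return 1
    fi
    if [ "$dst" = "/" ]; then
        echo "refusing: destination is /" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "refusing: source is not a directory: $src" >&2
        return 1
    fi
    # An empty source plus --delete would wipe the destination
    if [ -z "$(ls -A "$src")" ]; then
        echo "refusing: source is empty (unmounted disk?): $src" >&2
        return 1
    fi
    case $src in
        */) ;;
        *)  echo "note: no trailing slash; '$src' itself will be created under '$dst'" >&2 ;;
    esac
    return 0
}

# Usage: preview only after the paths pass the check
# check_sync_paths "$SRC" "$DEST" && rsync -av --delete --dry-run "$SRC" "$DEST"
```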

Next Reading