gzip vs bzip2 vs xz vs zstd - Choosing a Compression Format
Which of the Four Should You Pick?
Conclusion: When in doubt, use zstd - the best balance of speed, ratio, and threading. Use gzip for compatibility, xz for maximum ratio; bzip2 has almost no reason for new use.
When compressing files on Linux, gzip, bzip2, xz, and zstd are the four standard choices. All pair with tar, but they differ greatly in compression ratio, compression speed, decompression speed, and parallel support.
Quick reference table
| Format | Extension | Ratio | Compress | Decompress | In one line |
|---|---|---|---|---|---|
| gzip | .gz | Low | Fast | Fast | The compatibility king |
| bzip2 | .bz2 | Medium | Slow | Slow | Older gen, fading |
| xz | .xz | High | Slowest | Medium | Ratio-focused |
| zstd | .zst | Med-High | Fast | Fastest | The modern default |
Assumptions (target environment)
- A common Linux distribution (Ubuntu / RHEL family, etc.)
- zstd may not be installed on older systems (apt install zstd / dnf install zstd)
What Makes Each Format Different?
Conclusion: gzip uses DEFLATE (fast, ubiquitous), bzip2 uses BWT (medium ratio but slow), xz uses LZMA2 (highest ratio), and zstd is fast with a wide tuning range.
gzip - The Compatibility Standard
gzip uses DEFLATE (LZ77 + Huffman coding). It has existed since 1992 and is installed almost everywhere. Its ratio is the lowest of the four, but it is fast and decompresses anywhere. It is the de facto standard for distribution formats, starting with HTTP's Content-Encoding: gzip.
bzip2 - The BWT Old-Timer
bzip2 is block-sorting compression based on the Burrows-Wheeler Transform (BWT). It compresses better than gzip, but both compression and decompression are slow. It used to be the "smaller than gzip" option, yet today it loses to xz and zstd in both ratio and speed, so there is almost no reason to choose it for new work. It is mostly for decompressing existing .bz2 files.
xz - The Ratio Champion
xz uses the LZMA2 algorithm and delivers the highest compression ratio of the four. In exchange, it is the slowest to compress and uses a lot of memory at high levels. It fits "compress once, distribute many times" use cases (kernel sources, distro packages, etc.).
zstd - The Modern Default
zstd (Zstandard) is a relatively new format with an excellent balance of speed and ratio. It reaches a higher ratio than gzip at gzip-like speeds, and at high levels it approaches xz. Its very fast decompression is a major advantage, and it is increasingly adopted by the Linux kernel, btrfs, and various package managers.
How Do Ratio and Speed Compare?
Conclusion: Ratio is xz >= zstd(high level) > bzip2 > gzip. Decompression speed is zstd > gzip > xz > bzip2. zstd alone covers "fast yet reasonably small."
Compression fundamentally runs on the "smaller means slower" trade-off. The general tendencies:
- Ratio: xz is highest. zstd approaches xz at high levels. bzip2 is medium, gzip is lowest
- Compression speed: gzip and zstd (low-to-mid levels) are fast. bzip2 is slow, xz is the slowest
- Decompression speed: zstd is fastest. gzip is also fast. xz is medium, bzip2 is the slowest
What matters most is decompression speed. You compress once, but decompression runs many times at the destination. If you repeatedly extract on many servers or in CI, zstd's fast decompression pays off directly.
Actual numbers vary widely by data type (text / binary / already-compressed) and CPU. The only correct answer is to benchmark on your own representative data. Measure with time and ls -l.
$ for c in gzip bzip2 xz zstd; do \
    echo "== $c =="; \
    time $c -k -9 -f sample.dat; \
    ls -l sample.dat.* ; rm -f sample.dat.{gz,bz2,xz,zst}; \
  done
How Should You Choose?
Conclusion: Pick gzip for compatibility, xz when disk savings come first, and zstd for most everything else. Use bzip2 only to decompress existing files.
The decision flow is simple.
- Must it decompress reliably on the other side? (old systems, sharing with others) -> gzip (.gz extracts anywhere)
- Do you want to shave off every last byte? (archives, long-term storage, many downloads) -> xz (slow to compress but smallest)
- Everything else (backups, logs, most daily work) -> zstd (fast, compresses well, fastest to decompress)
- You received a .bz2 or have legacy assets -> decompress with bzip2 (do not use it for new compression)
One-line guidance
- When in doubt, zstd
- "Handing it to someone" -> gzip
- "As small as possible" -> xz
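The decision flow can also be written down as a tiny shell helper. This is only a sketch: the function name pick_compressor and the keywords compat, size, and legacy are invented for illustration and do not belong to any real tool.

```shell
#!/bin/sh
# Hypothetical helper: map a priority keyword to the format recommended above.
# The keywords (compat, size, legacy) are illustrative assumptions.
pick_compressor() {
    case "$1" in
        compat) echo gzip ;;   # must extract anywhere
        size)   echo xz ;;     # every last byte counts
        legacy) echo bzip2 ;;  # only for decompressing existing .bz2 files
        *)      echo zstd ;;   # everything else, and "when in doubt"
    esac
}

pick_compressor compat
pick_compressor size
pick_compressor backup
```

Any keyword the helper does not recognize falls through to zstd, which mirrors the "when in doubt" rule.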
What Are the Basic Commands?
Conclusion: All four share the same pattern: cmd file to compress, cmd -d file.ext to decompress, and -k to keep the original.
Single-file compression and decompression follow nearly identical conventions across all commands.
# Compress (note: the original is removed)
$ gzip file.txt     # -> file.txt.gz
$ bzip2 file.txt    # -> file.txt.bz2
$ xz file.txt       # -> file.txt.xz
$ zstd file.txt     # -> file.txt.zst (original kept)

# Compress while keeping the original (-k = keep)
$ gzip -k file.txt
$ xz -k file.txt

# Decompress (-d = decompress)
$ gzip -d file.txt.gz
$ xz -d file.txt.xz
$ zstd -d file.txt.zst

# Dedicated decompression commands exist too
$ gunzip file.txt.gz
$ bunzip2 file.txt.bz2
$ unxz file.txt.xz
$ unzstd file.txt.zst
gzip, bzip2, and xz delete the original by default. Add -k (keep) to retain it. zstd does the opposite - it keeps the original by default, so add --rm if you want it removed. The behavior is reversed, so be careful.
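The reversed default is easy to check on a throwaway file. The following sketch assumes gzip 1.6 or later (for -k) and guards the zstd part because zstd may not be installed; demo.txt is just a scratch file in a temporary directory.

```shell
#!/bin/sh
# Sketch: observe which tools remove the input file by default.
# demo.txt is a throwaway file created in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello\n' > demo.txt
gzip demo.txt                      # replaces demo.txt with demo.txt.gz
[ -e demo.txt ] || echo "gzip: original removed"

gzip -d demo.txt.gz                # restore demo.txt
gzip -k demo.txt                   # -k keeps demo.txt next to demo.txt.gz
[ -e demo.txt ] && echo "gzip -k: original kept"

# zstd may not be installed, so guard the call.
if command -v zstd >/dev/null 2>&1; then
    zstd -q demo.txt               # original is kept by default
    [ -e demo.txt ] && echo "zstd: original kept"
    zstd -q --rm -f demo.txt       # --rm removes it after success
    [ -e demo.txt ] || echo "zstd --rm: original removed"
fi
```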
To inspect contents without writing a decompressed file, use each command's -dc (decompress to stdout) or zcat / bzcat / xzcat / zstdcat.
$ zcat access.log.gz | grep 500
$ zstdcat backup.tar.zst | tar tf -
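When logs arrive in mixed formats, a small wrapper can pick the decompressor from the file extension. This is a sketch under assumptions: cat_any is a made-up name, and it trusts the extension rather than checking the file's actual contents.

```shell
#!/bin/sh
# Hypothetical helper: print a (possibly compressed) file to stdout,
# choosing the decompressor from the file extension.
cat_any() {
    case "$1" in
        *.gz)  gzip  -dc -- "$1" ;;
        *.bz2) bzip2 -dc -- "$1" ;;
        *.xz)  xz    -dc -- "$1" ;;
        *.zst) zstd  -dc -- "$1" ;;
        *)     cat -- "$1" ;;        # treated as uncompressed
    esac
}
```

It would be used like the commands above, for example cat_any access.log.gz | grep 500.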
What About Levels and Multithreading?
Conclusion: All accept -1 to -9 for levels (higher means smaller but slower), and zstd extends the range to -19. xz and zstd support -T0 to compress in parallel across all CPU cores, cutting time substantially.
Compression levels
Higher numbers compress more but run slower. Typical defaults:
- gzip: -1 to -9, default -6
- bzip2: -1 to -9, default -9 (block size)
- xz: -0 to -9, default -6
- zstd: -1 to -19, default -3. You can push further to the maximum with --ultra -22
$ gzip -9 file            # max compression
$ xz -9 file              # high ratio (slow, more memory)
$ zstd -19 file           # zstd's normal maximum
$ zstd --ultra -22 file   # zstd's absolute maximum
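To see what a level change buys on a given input, compare the output sizes directly. The sketch below uses gzip only and generates its own sample with seq, which is an assumption made for the sake of a self-contained example; real decisions should be based on your own data.

```shell
#!/bin/sh
# Sketch: compare output size across gzip levels on generated sample data.
# The sample (output of seq) is a stand-in; substitute representative data.
sample=$(mktemp)
seq 1 200000 > "$sample"

for level in 1 6 9; do
    # -c writes to stdout, so the sample file itself is left untouched.
    size=$(gzip -c "-$level" "$sample" | wc -c)
    echo "gzip -$level: $size bytes"
done

rm -f "$sample"
```

The same loop works for the other tools by swapping the command and level list, since all four accept -c and numeric levels.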
Multithreading (parallelism)
Parallelism helps on large files.
$ xz -T0 big.tar     # compress using all cores
$ zstd -T0 big.tar   # compress using all cores (0 = auto)
gzip and bzip2 themselves do not support parallelism, but compatible parallel implementations exist. Install pigz (parallel gzip) and pbzip2 / lbzip2 to use all cores while keeping the .gz / .bz2 format.
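A script that has to run on machines with and without pigz can choose at runtime. This is a minimal sketch, assuming only that pigz, when present, accepts the same -c option as gzip; the variable name gz is arbitrary.

```shell
#!/bin/sh
# Sketch: prefer pigz when it is installed, otherwise fall back to gzip.
# Both write the same .gz format, so the output stays compatible.
if command -v pigz >/dev/null 2>&1; then
    gz=pigz      # uses all available cores by default
else
    gz=gzip      # single-threaded fallback
fi

# Round trip: compress with whichever tool was chosen, decompress with gzip.
printf 'sample data\n' | "$gz" -c | gzip -dc
```

With GNU tar, the chosen program can likewise be passed through --use-compress-program.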
How Do You Combine With tar?
Conclusion: tar has shortcuts -z (gzip), -j (bzip2), and -J (xz). For zstd, use --zstd, or the handy -a (caf) that auto-detects from the extension.
Bundling multiple files (tar) and compression are separate steps. Specify the compression format via tar options.
# Create (c = create, f = file)
$ tar czf archive.tar.gz dir/           # gzip
$ tar cjf archive.tar.bz2 dir/          # bzip2
$ tar cJf archive.tar.xz dir/           # xz
$ tar --zstd -cf archive.tar.zst dir/   # zstd

# On extraction, the format is auto-detected
$ tar xf archive.tar.gz
$ tar xf archive.tar.zst
Auto-detect by extension (-a)
With tar's -a (--auto-compress), tar picks the format from the output file's extension. You can switch formats without remembering the per-format options (z / j / J).
$ tar caf archive.tar.zst dir/   # .zst -> zstd
$ tar caf archive.tar.xz dir/    # .xz -> xz
$ tar caf archive.tar.gz dir/    # .gz -> gzip
Older tar (especially non-GNU) may not support --zstd or -a. In that case, use a pipe.
$ tar cf - dir/ | zstd -T0 -o archive.tar.zst
$ zstd -dc archive.tar.zst | tar xf -
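The two approaches can be combined so that a script works with either kind of tar. The helper below is a sketch, not a tested portability layer: make_tar_zst is a hypothetical name, and it simply tries --zstd first and falls back to the pipe if that attempt fails for any reason.

```shell
#!/bin/sh
# Hypothetical helper: create a .tar.zst, falling back to a pipe when
# tar itself does not understand --zstd.
# Usage: make_tar_zst ARCHIVE DIR
make_tar_zst() {
    archive=$1
    dir=$2
    # First attempt: let tar drive zstd directly.
    if tar --zstd -cf "$archive" "$dir" 2>/dev/null; then
        return 0
    fi
    # Fallback: compress outside of tar (-f overwrites a partial archive).
    tar cf - "$dir" | zstd -q -T0 -f -o "$archive"
}
```

Because the first attempt hides its error output, a genuine problem such as a missing directory only becomes visible when the fallback fails as well.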