GNU parallel: Running Jobs in Parallel from the Shell
What You'll Learn
- Run many commands at once across CPU cores to cut total time
- Avoid the interleaved, out-of-order output you get from xargs -P
- Use --joblog and --resume to resume from where you left off, skipping completed jobs
Quick Summary
- Many heavy, independent tasks (convert, download, test) → parallel
- Need clean output / input order preserved → -k
- Want to resume an interrupted run → --joblog + --resume
Assumptions (target environment)
- Ubuntu / Debian family (the ideas apply to other distros too)
- This is GNU parallel (by Ole Tange). The unrelated parallel shipped in moreutils is not compatible
What is GNU parallel?
Conclusion: It takes a list (from stdin or command-line args) and runs a command on each item in parallel. By default it runs one job per CPU core.
GNU parallel takes the xargs idea of "build a command from a list" and specializes it for parallel execution and clean output. Two basic forms:
# 1) arguments after :::
parallel echo ::: a b c
# 2) from standard input
seq 1 3 | parallel echo
a
b
c
Each of a, b, c launches echo as a separate process at the same time. The default concurrency is the CPU core count, so more inputs than cores are fed into slots as they free up.
By default parallel runs one command per input item, whereas xargs packs as many arguments as fit into each invocation. To pack several arguments into one invocation, use -N (for example, -N2 passes two inputs at a time).
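As a minimal sketch of argument packing, assuming GNU parallel is installed: -N2 hands two inputs to each echo invocation. The -k flag is added here only so the output order is reproducible.

```shell
# Pack two arguments into each echo invocation instead of one.
# -k keeps the output in input order so the result is predictable.
parallel -k -N2 echo ::: a b c d
# a b
# c d
```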
How do I install it and run the first command?
Conclusion: Install with apt install parallel. On first use it prints a citation notice; run parallel --citation once to record your acknowledgement.
sudo apt update
sudo apt install parallel
parallel --version
GNU parallel asks to be cited in academic work and prints a citation request on first use. If that gets in the way of scripts or CI, run the following once to record acknowledgement (it creates ~/.parallel/will-cite and silences the notice).
parallel --citation
On some distros the moreutils package provides a different parallel. Always check that the first line of parallel --version reads GNU parallel. If it does not, it is not the GNU tool.
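The version check described above could be scripted roughly as follows. The helper name is_gnu_parallel is hypothetical, introduced only for this sketch; it relies on nothing beyond the first line of parallel --version.

```shell
# Succeeds only when the parallel on PATH identifies itself as GNU parallel.
# (is_gnu_parallel is an illustrative helper name, not part of any tool.)
is_gnu_parallel() {
    parallel --version 2>/dev/null | head -n 1 | grep -q '^GNU parallel'
}

if is_gnu_parallel; then
    echo "GNU parallel detected"
else
    echo "parallel is missing or is not the GNU tool" >&2
fi
```

A script or CI job can call such a helper up front and stop early instead of failing later with confusing option errors.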
Why use parallel instead of xargs?
Conclusion: xargs -P can run in parallel too, but its output interleaves line by line. parallel groups output per job and can keep input order with -k.
xargs -P 4 is handy, but the standard output of concurrent jobs tends to interleave one line at a time. parallel buffers each job's output internally and emits it as a whole once the job finishes, so it never mixes.
| Aspect | xargs -P | parallel |
|---|---|---|
| Output interleaving | Likely | Grouped per job |
| Output in input order | Not guaranteed | Guaranteed with -k |
| Placeholder power | {} only | {} {.} {/} etc. |
| Run log / resume | None | --joblog --resume |
| Progress display | None | --bar --eta |
For raw speed alone, xargs -P is often enough. parallel earns its keep when you must not corrupt output or you want to resume.
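To see the per-job grouping in practice, here is a small sketch. It assumes GNU parallel and a sleep that accepts fractional seconds (as GNU coreutils does). Each job prints two lines with a pause in between, yet the lines of different jobs should not mix.

```shell
# Each job prints a "start" line, pauses briefly, then prints an "end" line.
# parallel emits each job's output as one unit, so every "start" line
# is immediately followed by the matching "end" line.
parallel -j3 'echo "start {}"; sleep 0.{}; echo "end {}"' ::: 3 1 2
```

The order of the three groups depends on which job finishes first, so it is not shown here; only the pairing within each group is the point.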
How do the placeholders work?
Conclusion: {} is the input itself. {.} strips the extension, {/} is the basename, {//} the directory, {#} the job number. They are key to building output names.
Placeholders tell parallel where to insert each input. If you omit them, a {} is appended at the end.
# Convert each *.wav to a same-named .mp3 ({.} strips the extension)
parallel ffmpeg -i {} {.}.mp3 ::: *.wav
The main placeholders:
| Syntax | Meaning | Example (input dir/file.txt) |
|---|---|---|
| {} | The input itself | dir/file.txt |
| {.} | Extension removed | dir/file |
| {/} | Basename (directory removed) | file.txt |
| {//} | Directory part | dir |
| {/.} | Basename without extension | file |
| {#} | Sequential job number | 1, 2, ... |
| {%} | Job slot number | 1..(up to concurrency) |
# Prefix each input with its job number
parallel 'echo job {#}: {}' ::: alpha beta gamma
job 1: alpha
job 2: beta
job 3: gamma
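The path placeholders can be checked the same way. This sketch, assuming GNU parallel, expands all of them for a single input; dir/file.txt is treated as a plain string, so the file does not need to exist. The labels (input=, noext=, and so on) are only there for readability.

```shell
# Show how each path placeholder expands for one input string.
parallel echo 'input={} noext={.} base={/} dir={//} stem={/.}' ::: dir/file.txt
# input=dir/file.txt noext=dir/file base=file.txt dir=dir stem=file
```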
How do I control job count and output order?
Conclusion: Set concurrency with -j. -j0 runs as many as possible, -j 200% is twice the core count. Add -k to emit output in input order.
# 4 jobs at a time
parallel -j 4 ./convert.sh ::: *.dat
# Twice the core count (good for I/O-bound work)
parallel -j 200% curl -O ::: "${urls[@]}"
# No concurrency limit (use with care)
parallel -j0 echo ::: {1..100}
With parallel execution, output arrives in completion order, breaking the input-to-output mapping. To emit in input order, add -k (--keep-order).
seq 1 5 | parallel -k 'sleep $((RANDOM % 3)); echo {}'
1
2
3
4
5
Before going live, add --dry-run to print just the command lines parallel would run. Catch placeholder-expansion mistakes here.
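A quick sketch of such a preview, assuming GNU parallel. The names a.wav and b.wav are hypothetical and need not exist, and ffmpeg is never actually started because --dry-run only prints the command lines.

```shell
# Print the commands that would run, without executing ffmpeg.
# -k keeps the printed lines in input order.
parallel -k --dry-run ffmpeg -i {} {.}.mp3 ::: a.wav b.wav
# ffmpeg -i a.wav a.mp3
# ffmpeg -i b.wav b.mp3
```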
How do I combine multiple inputs?
Conclusion: Multiple ::: produce the cartesian product. --link pairs items by position. For multi-column lines, use --colsep and reference {1} {2}.
# Cartesian product: prints "a 1", "a 2", "b 1", "b 2", "c 1", "c 2" (6 jobs)
parallel echo ::: a b c ::: 1 2
# --link: pair by position -> prints "a 1", "b 2", "c 3"
parallel --link echo ::: a b c ::: 1 2 3
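The positional placeholders {1} and {2} also work with ::: sources, which makes the difference between the two modes easy to see. A small sketch, assuming GNU parallel, with -k added so the order is reproducible:

```shell
# Cartesian product: {1} comes from the first :::, {2} from the second.
parallel -k echo '{1}-{2}' ::: a b ::: 1 2
# a-1
# a-2
# b-1
# b-2

# --link pairs the two lists by position instead.
parallel -k --link echo '{1}-{2}' ::: a b ::: 1 2
# a-1
# b-2
```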
To read inputs from a file, use :::: (or -a). To split columns like a CSV, use --colsep.
# Use each line of hosts.txt as an argument
parallel ping -c1 {} :::: hosts.txt
# Split "user,host" into columns
parallel --colsep ',' ssh {2} -l {1} uptime :::: targets.csv
How do I show progress, log, and re-run failures?
Conclusion: --bar shows a progress bar, --joblog records each job's result. Add --resume to carry only the unfinished jobs into the next run.
# Show a progress bar
parallel --bar ./task.sh ::: {1..50}
# Record a run log (exit code and duration per job)
parallel --joblog run.log ./task.sh ::: *.dat
With a --joblog recorded, --resume skips every job already listed in the log, whether it succeeded or failed, and continues with the rest. To also re-run the jobs that failed, use --resume-failed instead.
# After an interruption/failure, add --resume to the same command
parallel --joblog run.log --resume ./task.sh ::: *.dat
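The following sketch shows the resume behaviour on a toy job list, assuming GNU parallel. It uses a throwaway directory from mktemp -d, and echo stands in for a real task. The second run appends one input at the end, so the sequence numbers of the earlier jobs stay the same.

```shell
# Work in a throwaway directory so the demo log cannot clash with a real one.
workdir=$(mktemp -d)
log="$workdir/run.log"

# First run: two jobs, both recorded in the job log.
parallel --joblog "$log" echo 'finished {}' ::: a b

# Second run with one more input: a and b are already in the log,
# so only the job for c is started.
parallel --joblog "$log" --resume echo 'finished {}' ::: a b c
# finished c

# The log is tab-separated; column 7 is the exit value, column 9 the command.
awk -F'\t' 'NR > 1 { print $7, $9 }' "$log"

rm -rf "$workdir"
```

Filtering the same log on a non-zero value in column 7 is a simple way to list the jobs that failed.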
To stop early on an error, use --halt.
# Stop starting new jobs after one failure, letting running jobs finish
# (use now,fail=1 instead to kill running jobs immediately)
parallel --halt soon,fail=1 ./task.sh ::: *.dat
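A toy run makes the effect of --halt visible. In this sketch, which assumes GNU parallel, jobs run one at a time and each exits with the status given as its input. The second job fails, so the third should never start.

```shell
# -j1 runs the jobs sequentially; the inputs 0 1 0 are used as exit codes.
# With soon,fail=1 no new job is started once the second job has failed,
# so "job 3" should not appear in the output.
parallel -j1 --halt soon,fail=1 'echo "job {#}"; exit {}' ::: 0 1 0
echo "parallel exit status: $?"
```

parallel also reports the failing command on stderr, which helps when the job list is long.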
--resume assumes the same --joblog file and the same command. Changing the command or inputs breaks correct resumption.
What is the --pipe mode for splitting stdin?
Conclusion: --pipe splits the standard input stream itself into blocks and feeds each block to a parallel command. It suits aggregating huge logs.
So far we parallelized an argument list. --pipe instead splits a single input stream for parallel processing.
# Split a huge file into 10MB blocks and grep each in parallel
cat huge.log | parallel --pipe --block 10M grep ERROR
--block sets the size of each block. parallel splits on newline boundaries so lines are never cut in half.
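A self-contained way to watch the splitting, assuming GNU parallel: generate a stream with seq, cut it into blocks of roughly 100 kB, and count the lines of each block in its own wc process. Because blocks end on line boundaries, the per-block counts add up to the original total.

```shell
# Generate 100000 lines, split the stream into roughly 100 kB blocks,
# and count the lines of each block in a separate wc process.
seq 1 100000 | parallel --pipe --block 100k wc -l

# The per-block counts add up to the original line count.
seq 1 100000 | parallel --pipe --block 100k wc -l | awk '{ s += $1 } END { print s }'
# 100000
```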
Practical recipes
Conclusion: Bulk image conversion, mass downloads, and running a command across many hosts are the staples. Confirm with --dry-run before going live.
Copy-paste templates
# Bulk-resize images (output named {.}_small.jpg)
parallel convert {} -resize 50% {.}_small.jpg ::: *.jpg
# Download a list of URLs with 8 jobs
parallel -j8 wget -q :::: urls.txt
# Same command across many hosts (output grouped per host)
parallel -k --tag ssh {} 'uptime' :::: hosts.txt
# Preview first; drop --dry-run once it looks right
parallel --dry-run ./batch.sh {} ::: input/*
Adding --tag prefixes each output line with its input (such as the host name), making it easy to tell which job produced which result.
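The --tag prefix can be tried without any remote hosts. In this sketch, which assumes GNU parallel, web1 and web2 are placeholder names and echo stands in for the ssh call. The tag and the job's output are separated by a tab.

```shell
# Each output line is prefixed with the input that produced it, then a tab.
parallel -k --tag echo 'checked {}' ::: web1 web2
# web1	checked web1
# web2	checked web2
```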