cut, paste, and tr Basics: Column Extraction and Character Conversion
What You'll Learn
- Extract specific columns from CSV/TSV files using cut
- Merge multiple files side by side using paste
- Translate, delete, and squeeze characters using tr
- Combine all three with pipes for practical text processing
Quick Summary
| Command | Primary Use |
|---|---|
| cut | Extract specific fields or character ranges |
| paste | Merge files horizontally (column-by-column) |
| tr | Translate or delete characters one-for-one |
Environment
- OS: Ubuntu (GNU coreutils)
- macOS ships BSD versions of cut and tr; minor behavioral differences may apply
What does cut do?
cut slices each line of text to extract specific fields or character positions. It is the go-to tool for pulling columns out of CSV/TSV files or fixed-width logs.
Field extraction (-f / -d)
Use -d to specify the delimiter and -f to select the field number(s).
    # Extract the 2nd field of a comma-separated line
    $ echo "Alice,30,Tokyo" | cut -d, -f2
    30

    # Extract fields 1 and 2 from a TSV (default delimiter is tab)
    $ cut -f1,2 data.tsv

    # Extract from field 3 to end of line
    $ cut -f3- data.tsv
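One behavior of -f is easy to miss: cut prints the selected fields in input order, whatever order the list names them in. The sketch below uses an invented sample line, and the awk alternative is only one possible way to reorder columns.

```shell
# Sample record, invented for this example
line="Alice,30,Tokyo"

# Asking for fields 3 and 1 still prints field 1 first:
# cut selects fields, it does not reorder them
picked=$(printf '%s\n' "$line" | cut -d, -f3,1)
printf '%s\n' "$picked"      # Alice,Tokyo

# To actually reorder columns, awk is one option
swapped=$(printf '%s\n' "$line" | awk -F, '{ print $3 "," $1 }')
printf '%s\n' "$swapped"     # Tokyo,Alice
```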
Character position extraction (-c)
Slice by character position — useful for fixed-width log formats.
    # Extract the first 10 characters of each line
    $ cut -c1-10 access.log

    # Extract from character 5 to end of line
    $ cut -c5- access.log
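-c selects by character and -b selects by byte. In GNU coreutils, however, -c has traditionally been implemented the same as -b, so on Ubuntu a multi-byte UTF-8 character may be split in the middle. BSD cut (macOS) counts characters according to the current locale. For ASCII-only fixed-width data the two options behave identically.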
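To check how a given system treats multi-byte text, a small probe like the following may help. It is a sketch, not a definitive test: the word is an invented sample written with octal escapes so the script itself stays ASCII, and the result of the last command depends on the cut implementation and locale.

```shell
# "cafe" with an accented e, encoded as UTF-8: the last letter is
# two bytes (octal 303 251), so the word is 4 characters but 5 bytes
word=$(printf 'caf\303\251')

# wc -c counts bytes, so this reports 5 rather than 4
nbytes=$(printf '%s' "$word" | wc -c | tr -d ' ')
printf 'bytes: %s\n' "$nbytes"

# -b always slices bytes: the first three bytes are plain ASCII
first3=$(printf '%s\n' "$word" | cut -b1-3)
printf '%s\n' "$first3"      # caf

# Probe -c: count the bytes that cut -c1-4 returns (including the newline).
# 5 means cut counted bytes; 6 means it counted characters.
printf '%s\n' "$word" | cut -c1-4 | wc -c
```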
What are the common pitfalls with cut?
Field numbers start at 1 (not 0). Empty fields still count — a::c in a colon-delimited string produces an empty field 2.
    # Field numbering starts at 1
    $ echo "a:b:c" | cut -d: -f1   # → a
    $ echo "a:b:c" | cut -d: -f2   # → b

    # Empty field still counts
    $ echo "a::c" | cut -d: -f2    # → (empty line)
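A related surprise is that a line containing no delimiter at all is passed through unchanged unless -s is given. The following sketch shows the difference on two invented lines.

```shell
# Two invented lines: one with the delimiter, one without
input=$(printf 'key:value\nno delimiter here\n')

# By default, a line with no delimiter is printed whole
default_out=$(printf '%s\n' "$input" | cut -d: -f2)
printf '%s\n' "$default_out"
# value
# no delimiter here

# -s suppresses lines that contain no delimiter
strict_out=$(printf '%s\n' "$input" | cut -s -d: -f2)
printf '%s\n' "$strict_out"
# value
```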
cut cannot treat multiple consecutive delimiters as a single delimiter (e.g., multiple spaces). Use awk for that.
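The limitation can be seen on a line padded with a varying number of spaces. This is a minimal sketch with an invented sample; squeezing with tr -s before cut is one workaround, and awk is the other.

```shell
# Columns separated by a varying number of spaces (invented sample)
line="alice   42  tokyo"

# Plain cut sees an empty field between each pair of adjacent spaces,
# so field 2 is empty here
naive=$(printf '%s\n' "$line" | cut -d' ' -f2)
printf '[%s]\n' "$naive"     # []

# Squeezing the spaces first makes the field numbers line up
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)
printf '%s\n' "$squeezed"    # 42

# awk splits on runs of whitespace by default
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')
printf '%s\n' "$with_awk"    # 42
```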
What does paste do?
paste joins files line by line, placing each file's content in a separate column. Think of cat as vertical (append rows) versus paste as horizontal (append columns).
Basic form
    # Merge two files side by side (default delimiter is tab)
    $ paste names.txt scores.txt
    Alice	95
    Bob	87
    Carol	72

    # Use comma as delimiter
    $ paste -d, names.txt scores.txt
    Alice,95
    Bob,87
    Carol,72
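When the inputs have different line counts, paste keeps going until the longest file ends and leaves the missing columns empty. The sketch below builds two throwaway files in a temporary directory; the file names and contents are illustrative only.

```shell
# Build two sample files of unequal length in a scratch directory
tmp=$(mktemp -d)
printf 'Alice\nBob\nCarol\n' > "$tmp/names.txt"
printf '95\n87\n' > "$tmp/scores.txt"

# The third row has a name but no score, so its second field is empty
merged=$(paste -d, "$tmp/names.txt" "$tmp/scores.txt")
printf '%s\n' "$merged"
# Alice,95
# Bob,87
# Carol,

rm -r "$tmp"
```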
Collapse a file into one line (-s)
The -s flag processes each file serially, turning its lines into a single row.
    $ cat items.txt
    apple
    banana
    cherry
    $ paste -s -d, items.txt
    apple,banana,cherry
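With more than one file, -s produces one output row per file rather than one row in total. The sketch below shows that, plus a related idiom not covered above: passing - twice makes paste read stdin for both columns, which folds a single stream into pairs. All file names and data are invented.

```shell
tmp=$(mktemp -d)
printf 'Alice\nBob\nCarol\n' > "$tmp/names.txt"
printf '95\n87\n72\n' > "$tmp/scores.txt"

# -s with two files: each file becomes its own row
serial=$(paste -s -d, "$tmp/names.txt" "$tmp/scores.txt")
printf '%s\n' "$serial"
# Alice,Bob,Carol
# 95,87,72

# Two "-" operands: consecutive stdin lines fill columns 1 and 2
pairs=$(printf '1\n2\n3\n4\n' | paste -d, - -)
printf '%s\n' "$pairs"
# 1,2
# 3,4

rm -r "$tmp"
```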
What does tr do?
tr translates characters one-for-one between two sets. It reads only from standard input and takes no file arguments, so feed it through a pipe or input redirection.
Character translation
    # Lowercase to uppercase
    $ echo "hello world" | tr 'a-z' 'A-Z'
    HELLO WORLD

    # Replace spaces with underscores
    $ echo "foo bar baz" | tr ' ' '_'
    foo_bar_baz
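"One-for-one" means the Nth character of the first set maps to the Nth character of the second. The sketch below illustrates that mapping and also uses the POSIX character classes [:lower:] and [:upper:], which are often suggested as a more locale-independent alternative to a-z and A-Z. The sample strings are invented.

```shell
# Character classes instead of explicit ranges
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"       # HELLO WORLD

# Positional mapping: e -> i and l -> p, every other character untouched
mapped=$(printf 'hello\n' | tr 'el' 'ip')
printf '%s\n' "$mapped"      # hippo
```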
Character deletion (-d)
    # Delete all digits
    $ echo "abc123def456" | tr -d '0-9'
    abcdef

    # Remove newlines (join all lines into one)
    $ tr -d '\n' < multiline.txt
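Two deletion patterns come up often in practice and can be sketched as follows: stripping carriage returns from Windows-style line endings, and keeping only a chosen set by combining -d with -c (complement), an option not covered above. The inputs are invented samples.

```shell
# Strip carriage returns from CRLF line endings
unix_text=$(printf 'one\r\ntwo\r\n' | tr -d '\r')
printf '%s\n' "$unix_text"
# one
# two

# -c complements the set: delete everything EXCEPT digits and newline
digits=$(printf 'tel: 03-1234-5678\n' | tr -cd '0-9\n')
printf '%s\n' "$digits"      # 0312345678
```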
Squeezing repeated characters (-s)
-s collapses runs of the same character into a single one.
    # Collapse multiple spaces into one
    $ echo "foo   bar   baz" | tr -s ' '
    foo bar baz

    # Collapse multiple newlines into one
    $ tr -s '\n' < file.txt
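When -s is given two sets, tr translates first and then squeezes the resulting characters. One use, sketched here on an invented string, is splitting a line into one word per line no matter how many spaces separate the words.

```shell
# Translate spaces to newlines, then squeeze repeated newlines
words=$(printf 'foo  bar   baz\n' | tr -s ' ' '\n')
printf '%s\n' "$words"
# foo
# bar
# baz

# Count the resulting lines (tr -d strips padding that some wc versions add)
count=$(printf '%s\n' "$words" | wc -l | tr -d ' ')
printf '%s\n' "$count"       # 3
```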
tr does not support regex. For pattern-based substitution, use sed or awk.
How do you combine them in a pipeline?
Each command is narrow in scope but powerful when chained.
    # Extract column 2 from CSV, then uppercase it
    $ cut -d, -f2 users.csv | tr 'a-z' 'A-Z'

    # Extract field 1 from TSV, deduplicate, then join with commas
    # (the trailing - makes paste read stdin; BSD paste requires it, GNU accepts it)
    $ cut -f1 data.tsv | sort -u | paste -s -d, -

    # Extract client IPs from an access log and count requests per IP, most frequent first
    $ cut -d' ' -f1 access.log | sort | uniq -c | sort -rn
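To try these pipelines without real data, the following self-contained sketch generates a tiny log first. The log layout ("IP method path") and its contents are invented for illustration, and the final awk step is an addition that only tidies the padded count printed by uniq -c.

```shell
tmp=$(mktemp -d)

# Invented log lines in a simplified "IP method path" layout
printf '%s\n' \
  '10.0.0.1 GET /index.html' \
  '10.0.0.2 GET /about.html' \
  '10.0.0.1 POST /login' > "$tmp/access.log"

# Requests per IP, most frequent first, reshaped to "ip,count"
report=$(cut -d' ' -f1 "$tmp/access.log" | sort | uniq -c | sort -rn |
  awk '{ print $2 "," $1 }')
printf '%s\n' "$report"
# 10.0.0.1,2
# 10.0.0.2,1

# Distinct IPs joined into one comma-separated line
ips=$(cut -d' ' -f1 "$tmp/access.log" | sort -u | paste -s -d, -)
printf '%s\n' "$ips"         # 10.0.0.1,10.0.0.2

rm -r "$tmp"
```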
Command comparison summary
| Goal | Command |
|---|---|
| Extract a CSV/TSV column | cut -f N -d DELIM |
| Extract a character range | cut -c N-M |
| Merge files side by side | paste file1 file2 |
| Collapse file lines into one row | paste -s |
| Translate characters | tr SET1 SET2 |
| Delete specific characters | tr -d SET |
| Squeeze repeated characters | tr -s SET |
| Multi-delimiter or regex logic | awk |
Common mistakes to avoid
- Treating cut field numbers as 0-based (they start at 1)
- Passing a regex to tr (it only accepts character sets, not patterns)
- Merging files with different line counts using paste (produces empty fields)
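The tr mistake is easiest to see when it is used as if it replaced words. In this sketch (sample text invented), tr maps individual characters wherever they occur, while sed replaces the matched string only.

```shell
text="cat bat"

# tr maps c->d, a->o, t->g everywhere, so "bat" is changed too
with_tr=$(printf '%s\n' "$text" | tr 'cat' 'dog')
printf '%s\n' "$with_tr"     # dog bog

# sed substitutes the string "cat" and leaves "bat" alone
with_sed=$(printf '%s\n' "$text" | sed 's/cat/dog/')
printf '%s\n' "$with_sed"    # dog bat
```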
Copy-paste templates
    # Extract column 2 from CSV
    cut -d, -f2 file.csv

    # Extract first 5 characters per line
    cut -c1-5 file.txt

    # Merge two files as CSV columns
    paste -d, file1.txt file2.txt

    # Lowercase to uppercase
    tr 'a-z' 'A-Z'

    # Squeeze multiple spaces into one
    tr -s ' '