cut, paste, and tr Basics: Column Extraction and Character Conversion

What You'll Learn

  • Extract specific columns from CSV/TSV files using cut
  • Merge multiple files side by side using paste
  • Translate, delete, and squeeze characters using tr
  • Combine all three with pipes for practical text processing

Quick Summary

Command   Primary Use
cut       Extract specific fields or character ranges
paste     Merge files horizontally (column-by-column)
tr        Translate or delete characters one-for-one

Environment

  • OS: Ubuntu (GNU coreutils)
  • macOS ships BSD versions of cut, paste, and tr; minor behavioral differences may apply

What does cut do?

cut slices each line of text to extract specific fields or character positions. It is the go-to tool for pulling columns out of CSV/TSV files or fixed-width logs.

Field extraction (-f / -d)

Use -d to specify the delimiter and -f to select the field number(s).

# Extract the 2nd field of a comma-separated line
$ echo "Alice,30,Tokyo" | cut -d, -f2
30

# Extract fields 1 and 2 from a TSV (default delimiter is tab)
$ cut -f1,2 data.tsv

# Extract from field 3 to end of line
$ cut -f3- data.tsv
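
The field list accepts single numbers, comma-separated lists, and ranges. The sketch below uses a small made-up TSV (the column names and values are illustrative, not from a real file) to show each form, plus one behavior that often surprises people: cut emits fields in input order, whatever order you list them in.

```shell
# Sample TSV held in a variable (hypothetical data for illustration).
tsv=$(printf 'id\tname\tcity\tage\n1\tAlice\tTokyo\t30\n')

# Fields 1 and 3: a comma-separated list selects individual fields.
fields_1_3=$(printf '%s\n' "$tsv" | cut -f1,3)

# Fields 2 through 3: a closed range.
fields_2_3=$(printf '%s\n' "$tsv" | cut -f2-3)

# Listing the fields as 3,1 does NOT reorder them; cut keeps input order.
fields_3_1=$(printf '%s\n' "$tsv" | cut -f3,1)

printf '%s\n' "$fields_1_3" "$fields_2_3" "$fields_3_1"
```

If you need to reorder columns, awk is the usual tool, for example awk -F'\t' '{print $3 "\t" $1}'.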

Character position extraction (-c)

Slice by character position — useful for fixed-width log formats.

# Extract the first 10 characters of each line
$ cut -c1-10 access.log

# Extract from character 5 to end of line
$ cut -c5- access.log

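POSIX defines -c as counting characters and -b as counting bytes. GNU cut, however, has historically treated -c exactly like -b (its manual says so), so on Ubuntu a multi-byte UTF-8 character may be split in the middle. BSD cut on macOS counts multi-byte characters under -c. For ASCII-only input the two options give the same result.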
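
How -c treats multi-byte text depends on the cut implementation and the active locale, so it is worth checking on your own system. The following is a minimal check, assuming UTF-8 input; the variable names are illustrative. It measures how many bytes each option returns for a line that starts with a two-byte character.

```shell
# "é" is two bytes in UTF-8 (0xC3 0xA9), written here as octal escapes.
line=$(printf '\303\251x')

# -b1 returns only the first byte, plus the newline: 2 bytes in total.
bytes_b=$(printf '%s\n' "$line" | cut -b1 | wc -c | tr -d ' ')

# -c1 returns 2 bytes if cut treats -c like -b,
# or 3 bytes (whole character + newline) if it is multi-byte aware.
bytes_c=$(printf '%s\n' "$line" | cut -c1 | wc -c | tr -d ' ')

printf 'cut -b1: %s bytes, cut -c1: %s bytes\n' "$bytes_b" "$bytes_c"
```

If both numbers are 2, your cut is byte-based under -c, and a character-aware tool such as awk (in a UTF-8 locale) is a safer choice for non-ASCII text.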

What are the common pitfalls with cut?

Field numbers start at 1 (not 0). Empty fields still count — a::c in a colon-delimited string produces an empty field 2.

# Field numbering starts at 1
$ echo "a:b:c" | cut -d: -f1   # → a
$ echo "a:b:c" | cut -d: -f2   # → b

# Empty field still counts
$ echo "a::c" | cut -d: -f2    # → (empty line)

cut cannot treat multiple consecutive delimiters as a single delimiter (e.g., multiple spaces). Use awk for that.
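
To make the consecutive-delimiter problem concrete, here is a small sketch with a made-up line whose columns are separated by runs of spaces. It compares plain cut with two common workarounds.

```shell
# Columns separated by runs of spaces (hypothetical sample line).
line='alice    42   tokyo'

# cut sees an empty field between every pair of adjacent spaces,
# so field 2 is empty rather than "42".
naive=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Workaround 1: squeeze runs of spaces first, then cut.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

# Workaround 2: awk splits on runs of whitespace by default.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'naive=[%s] squeezed=[%s] awk=[%s]\n' "$naive" "$squeezed" "$with_awk"
```

The tr -s approach still leaves an empty first field when a line begins with spaces; awk ignores leading whitespace, which is one reason it is usually the more robust option.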

What does paste do?

paste joins files line by line, placing each file's content in a separate column. Think of cat as vertical (append rows) versus paste as horizontal (append columns).

Basic form

# Merge two files side by side (default delimiter is tab)
$ paste names.txt scores.txt
Alice   95
Bob     87
Carol   72

# Use comma as delimiter
$ paste -d, names.txt scores.txt
Alice,95
Bob,87
Carol,72
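
The -d option also accepts a list of delimiter characters, which paste applies in turn between columns. The sketch below builds three throwaway files (their names and contents are illustrative) to show the difference between one delimiter and a list.

```shell
# Work in a temporary directory so nothing is left behind.
dir=$(mktemp -d)
printf 'Alice\nBob\n'   > "$dir/names.txt"
printf '95\n87\n'       > "$dir/scores.txt"
printf 'Tokyo\nOsaka\n' > "$dir/cities.txt"

# A single delimiter character is reused between every pair of columns.
same=$(paste -d, "$dir/names.txt" "$dir/scores.txt" "$dir/cities.txt")

# A delimiter list is applied in order: ',' first, then ':'.
mixed=$(paste -d',:' "$dir/names.txt" "$dir/scores.txt" "$dir/cities.txt")

printf '%s\n' "$same" "$mixed"
rm -r "$dir"
```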

Collapse a file into one line (-s)

The -s flag processes each file serially, turning its lines into a single row.

$ cat items.txt
apple
banana
cherry

$ paste -s -d, items.txt
apple,banana,cherry
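
"Serially" matters once more than one file is involved: with -s, each input file becomes its own output row. A short sketch with illustrative file names and contents:

```shell
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/fruits.txt"
printf 'carrot\nleek\n'          > "$dir/vegetables.txt"

# With -s, each file is collapsed into one row, in argument order.
rows=$(paste -s -d, "$dir/fruits.txt" "$dir/vegetables.txt")

# Reading standard input: the explicit "-" operand is the portable spelling.
from_stdin=$(printf '1\n2\n3\n' | paste -s -d+ -)

printf '%s\n' "$rows" "$from_stdin"
rm -r "$dir"
```

The second command is a handy way to build an arithmetic expression; piping 1+2+3 into bc would give the sum.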

What does tr do?

tr translates characters one-for-one between two sets. It takes no file operands and reads only from standard input, so it is always used with a pipe or input redirection.

Character translation

# Lowercase to uppercase
$ echo "hello world" | tr 'a-z' 'A-Z'
HELLO WORLD

# Replace spaces with underscores
$ echo "foo bar baz" | tr ' ' '_'
foo_bar_baz
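
Because the two sets are matched position by position, the order of characters in each set is what defines the mapping. The sketch below shows two variations on the examples above: POSIX character classes for case conversion, and a rotated alphabet that yields ROT13.

```shell
# POSIX character classes avoid depending on how a-z is ordered in the locale.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# Position-by-position mapping: A->N, B->O, ..., N->A, and so on.
rot13=$(printf 'Hello\n' | tr 'A-Za-z' 'N-ZA-Mn-za-m')

# Applying ROT13 twice returns the original text.
back=$(printf '%s\n' "$rot13" | tr 'A-Za-z' 'N-ZA-Mn-za-m')

printf '%s\n' "$upper" "$rot13" "$back"
```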

Character deletion (-d)

# Delete all digits
$ echo "abc123def456" | tr -d '0-9'
abcdef

# Remove newlines (join all lines into one)
$ tr -d '\n' < multiline.txt
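
Two practical notes on deletion, sketched with made-up input: tr -d '\r' is a common way to convert Windows (CRLF) line endings, and tr -d '\n' removes every newline, including the final one.

```shell
# A two-line file with CRLF line endings, built with printf.
crlf=$(mktemp)
printf 'one\r\ntwo\r\n' > "$crlf"

# Deleting \r converts CRLF to LF; the byte count drops from 10 to 8.
before=$(wc -c < "$crlf" | tr -d ' ')
after=$(tr -d '\r' < "$crlf" | wc -c | tr -d ' ')

# Deleting \n leaves no trailing newline: "a\nb\n" becomes the 2 bytes "ab".
joined_bytes=$(printf 'a\nb\n' | tr -d '\n' | wc -c | tr -d ' ')

printf 'before=%s after=%s joined=%s\n' "$before" "$after" "$joined_bytes"
rm -f "$crlf"
```

If you want the lines joined but the output still terminated by a newline, paste -s is usually a better fit than tr -d '\n'.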

Squeezing repeated characters (-s)

-s collapses runs of the same character into a single one.

# Collapse multiple spaces into one
$ echo "foo   bar   baz" | tr -s ' '
foo bar baz

# Collapse runs of newlines into one (drops empty lines)
$ tr -s '\n' < file.txt

tr does not support regex. For pattern-based substitution, use sed or awk.
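
The difference is easy to see side by side. In this sketch (the sample text is made up), tr maps each character individually, while sed replaces the whole string.

```shell
text='cat attic'

# tr maps characters one by one: c->d, a->o, t->g, wherever they occur.
with_tr=$(printf '%s\n' "$text" | tr 'cat' 'dog')

# sed replaces the string "cat" as a unit and leaves other letters alone.
with_sed=$(printf '%s\n' "$text" | sed 's/cat/dog/g')

printf 'tr:  %s\nsed: %s\n' "$with_tr" "$with_sed"
```

With tr, "attic" is rewritten as well because it contains the letters a, t, and c; with sed it is left unchanged.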

How do you combine them in a pipeline?

Each command is narrow in scope but powerful when chained.

# Extract column 2 from CSV, then uppercase it
$ cut -d, -f2 users.csv | tr 'a-z' 'A-Z'

# Extract field 1 from TSV, deduplicate, then join with commas
# (the trailing - tells paste to read stdin; BSD paste requires it)
$ cut -f1 data.tsv | sort -u | paste -s -d, -

# Extract IPs from an access log and count requests per IP, busiest first
$ cut -d' ' -f1 access.log | sort | uniq -c | sort -rn
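
To try these pipelines without real data, the sketch below builds a tiny access log in a temporary file. The log format (client IP as the first space-separated field) and its contents are assumptions made for illustration.

```shell
# Hypothetical access log: client IP is the first space-separated field.
log=$(mktemp)
printf '%s\n' \
  '10.0.0.1 GET /index.html' \
  '10.0.0.2 GET /about.html' \
  '10.0.0.1 GET /contact.html' > "$log"

# Unique IPs joined on one line.
unique_ips=$(cut -d' ' -f1 "$log" | sort -u | paste -s -d, -)

# IP with the most requests: count per IP, sort by count, take the top row.
# awk picks the IP column because uniq -c pads its counts with spaces.
busiest=$(cut -d' ' -f1 "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

printf 'unique: %s\nbusiest: %s\n' "$unique_ips" "$busiest"
rm -f "$log"
```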

Command comparison summary

Goal                               Command
Extract a CSV/TSV column           cut -f N -d DELIM
Extract a character range          cut -c N-M
Merge files side by side           paste file1 file2
Collapse file lines into one row   paste -s
Translate characters               tr SET1 SET2
Delete specific characters         tr -d SET
Squeeze repeated characters        tr -s SET
Multi-delimiter or regex logic     awk

Common mistakes to avoid

  • Treating cut field numbers as 0-based (they start at 1)
  • Passing a regex to tr (it only accepts character sets, not patterns)
  • Merging files with different line counts using paste (produces empty fields)
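
The last mistake is easy to miss because paste reports no error. A minimal sketch with illustrative files shows the resulting empty field and one cheap guard, comparing line counts before merging.

```shell
dir=$(mktemp -d)
printf 'Alice\nBob\nCarol\n' > "$dir/names.txt"
printf '95\n87\n'            > "$dir/scores.txt"

# The shorter file runs out first, so the last row ends with an empty field.
merged=$(paste -d, "$dir/names.txt" "$dir/scores.txt")
last_row=$(printf '%s\n' "$merged" | tail -n 1)

# Guard: compare line counts and warn on stderr if they differ.
n1=$(wc -l < "$dir/names.txt" | tr -d ' ')
n2=$(wc -l < "$dir/scores.txt" | tr -d ' ')
[ "$n1" -eq "$n2" ] || echo "line counts differ: $n1 vs $n2" >&2

printf '%s\n' "$merged"
rm -r "$dir"
```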

Copy-paste templates

# Extract column 2 from CSV
cut -d, -f2 file.csv

# Extract first 5 characters per line
cut -c1-5 file.txt

# Merge two files as CSV columns
paste -d, file1.txt file2.txt

# Lowercase to uppercase
tr 'a-z' 'A-Z'

# Squeeze multiple spaces into one
tr -s ' '
