Text Stream Filters: cat, sort, uniq, wc, head, tail

What You Will Achieve

  • Build filter pipelines that process text received from standard input
  • Aggregate logs accurately by combining sort / uniq
  • Safely inspect only the needed part of large files with head / tail
  • Extract fields, translate characters, and number lines with cut / tr / nl
  • Handle frequent aggregation tasks with chained one-liners

This is the core of LPIC-1 objective 103.2 "Process text streams using filters". Filters read standard input, transform it, and write to standard output; chaining them with pipes makes them powerful.
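
For example, three single-purpose filters chained with pipes answer a question none of them answers alone, "which login shells are configured, and how often?" (a preview of the steps below; counts vary by system):

cut -d: -f7 /etc/passwd | sort | uniq -c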

Which Filter, When

Goal                     Filter        Key options
Reorder lines            sort          -n numeric / -r reverse / -k key
Deduplicate/aggregate    uniq          -c count / -d duplicates only
Count items              wc            -l lines / -w words / -c bytes
View only head or tail   head / tail   -n count / tail -f follow
Extract columns          cut           -d delimiter / -f field
Replace/delete chars     tr            -d delete / -s squeeze
Add line numbers         nl / cat -n   nl -b a numbers all lines

uniq only collapses adjacent duplicates, so it is almost always combined with sort. This is the most frequent pattern in both the exam and real work.
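
A quick demonstration of the adjacency constraint, using printf to fabricate a three-line stream:

printf 'a\nb\na\n' | uniq -c          # the two a's are not adjacent, so nothing collapses
      1 a
      1 b
      1 a
printf 'a\nb\na\n' | sort | uniq -c   # sorting makes them adjacent first
      2 a
      1 b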

Steps

Step 1: Concatenate files to standard output

cat access.log
cat -n script.sh
cat file1 file2 > merged.txt
     1  #!/bin/bash
     2  echo "start"
     3  exit 0

cat concatenates multiple files. -n adds line numbers, and -A makes invisible characters visible: tabs appear as ^I, carriage returns as ^M, and each line end as $. To merely view a single file, less handles large files better.
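
A minimal check for Windows-style line endings, fabricating a tab and a CR with printf:

printf 'a\tb\r\n' | cat -A
a^Ib^M$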

Step 2: Sort and aggregate duplicates

sort access.log | uniq -c | sort -nr | head -n 5
    143 GET /index.html
     97 GET /login
     61 POST /api/data
     28 GET /favicon.ico
     12 GET /robots.txt

"sort → uniq -c to count → reverse sort by count → top 5" is the standard access-aggregation idiom. uniq -c assumes the previous step already sorted the input.

Step 3: Count lines, words, and bytes

wc -l access.log
wc -lwc README.md
  10234 access.log
   120  856 5421 README.md

-l is lines, -w is words, -c is bytes (-m is characters). Placed at the end of a pipe, it directly yields "how many items matched".
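
For example, piping who into wc -l counts login sessions without reading the output yourself:

who | wc -l   # number of current login sessions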

Step 4: Slice head and tail

head -n 20 large.csv
tail -n 50 syslog
tail -f /var/log/nginx/access.log
2026-05-17 10:01:22 INFO  start
2026-05-17 10:01:23 INFO  ready

tail -f displays appended lines in real time, the foundation of log monitoring. Combining head and tail extracts ranges such as "M lines starting at line N":
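
One way to slice such a range is tail's -n +N form, which starts output at line N:

tail -n +100 large.csv | head -n 20   # 20 lines starting at line 100 (lines 100-119)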

Step 5: Column extraction and character translation

cut -d: -f1,7 /etc/passwd
echo "Hello World" | tr 'a-z' 'A-Z'
cat data.txt | tr -s ' ' | tr -d '\r'
root:/bin/bash
daemon:/usr/sbin/nologin
HELLO WORLD

cut -d: -f1,7 extracts the first and seventh :-delimited fields, here the user name and login shell. tr translates or deletes characters; -s squeezes a run of repeated characters into one and -d deletes the specified characters. tr -d '\r' is the common way to strip Windows-origin carriage returns.
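
nl, listed in the options table above, numbers lines with more control than cat -n; -b a numbers every line, including blank ones:

nl -b a script.sh
     1  #!/bin/bash
     2  echo "start"
     3  exit 0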

Why Chain Filters

Each filter follows the Unix philosophy of "do one thing well". sort only reorders; uniq only handles adjacent duplicates. Being single-purpose is exactly what makes them freely composable through pipes, achieving aggregation and extraction without writing a huge dedicated tool.

uniq handles only adjacent duplicates because it streams line by line, remembering nothing beyond the previous line. To handle duplicates across the whole input, identical lines must first be made adjacent by sorting. Understanding this constraint makes writing sort | uniq reflexive.

Troubleshooting

Symptom: uniq -c does not aggregate duplicates

Cause: The input is not sorted

Check:

sort file | uniq -c

Fix: Always put sort before uniq. Or use sort -u to sort and deduplicate at once (but then you lose the -c counts).
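
The two forms are equivalent whenever the counts are not needed:

sort file | uniq    # sort, then collapse adjacent duplicates
sort -u file        # same result in one step, but no per-line counts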

Symptom: sort -n does not order as expected

Cause: The numeric column contains spaces or unit characters, or the key position is unspecified

Check:

sort -k2 -n data.txt

Fix: Specify the sort field with -k and the delimiter with -t. For human-readable sizes (1K, 2M) use sort -h.
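
For instance, sorting /etc/passwd numerically by its third :-delimited field, the UID:

sort -t: -k3 -n /etc/passwd | head -n 3   # root (UID 0) comes first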

Symptom: tail -f stops following updates

Cause: The log rotated to a different inode

Check:

tail -F /var/log/syslog

Fix: -f (lowercase) keeps reading the original file descriptor, so it goes quiet once the log rotates to a new inode. -F (uppercase) follows the file by name, re-opening it after rotation.

Completion Checklist

  • [ ] Ran the aggregation one-liner sort | uniq -c | sort -nr
  • [ ] Used wc -l at the end of a pipe to count items
  • [ ] Inspected only the needed part of a large file with head / tail
  • [ ] Extracted fields with cut -d -f
  • [ ] Verified character translation and \r removal with tr

Summary

Scenario    Command                     Purpose
Aggregate   sort | uniq -c | sort -nr   Frequency ranking
Count       wc -l                       Line count
Head/tail   head -n / tail -f           Range slice / follow
Column      cut -d: -f1                 Field extraction
Translate   tr 'a-z' 'A-Z'              Per-character replace/delete

Chaining filters is the fundamental text-processing pattern. More complex pattern matching needs regular expressions and grep.

Next Reading