Text Stream Filters: cat, sort, uniq, wc, head, tail
What You Will Achieve
- Build filter pipelines that process text received from standard input
- Aggregate logs accurately by combining sort/uniq
- Safely inspect only the needed part of large files with head/tail
- Extract fields, translate characters, and number lines with cut/tr/nl
- Handle frequent aggregation tasks with chained one-liners
This is the core of LPIC-1 objective 103.2 "Process text streams using filters". Filters read standard input, transform it, and write to standard output; chaining them with pipes makes them powerful.
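As a minimal sketch of such a pipeline (the input is synthesized with printf so the example runs anywhere):

```shell
# Rank lines by frequency: sort makes duplicates adjacent,
# uniq -c counts each run, sort -nr orders counts descending.
printf 'apple\nbanana\napple\ncherry\napple\n' |
  sort | uniq -c | sort -nr
```

The top line of the output is the most frequent entry, here apple with a count of 3.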
Which Filter, When
| Goal | Filter | Key options |
|---|---|---|
| Reorder lines | sort | -n numeric / -r reverse / -k key |
| Deduplicate/aggregate | uniq | -c count / -d duplicates only |
| Count items | wc | -l lines / -w words / -c bytes |
| See head/tail only | head / tail | -n count / tail -f follow |
| Extract columns | cut | -d delimiter / -f field |
| Replace/delete chars | tr | -d delete / -s squeeze |
| Add line numbers | nl / cat -n | -b a number all lines |
uniq only collapses adjacent duplicates, so it is almost always combined with sort. This is the most frequent pattern in both the exam and real work.
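A two-line demonstration of the adjacency rule, again with printf-synthesized input:

```shell
# uniq alone only collapses the adjacent 'a a' pair,
# so the first 'a' survives: prints a, b, a (one per line).
printf 'a\nb\na\na\n' | uniq
# Sorting first makes all duplicates adjacent, so they
# collapse fully: prints a, b.
printf 'a\nb\na\na\n' | sort | uniq
```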
Steps
Step 1: Concatenate files to standard output
    cat access.log
    cat -n script.sh
    cat file1 file2 > merged.txt

Output of cat -n script.sh:

         1  #!/bin/bash
         2  echo "start"
         3  exit 0
cat concatenates multiple files to standard output. -n adds line numbers, and -A makes invisible characters visible, marking each line end with $ and tabs with ^I. To merely view a single file, less handles large files better.
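A quick way to see -A in action (the -A option is GNU coreutils; BSD cat uses -e/-t instead):

```shell
# GNU cat -A shows the tab as ^I and the line end as $,
# exposing whitespace that is otherwise invisible.
printf 'one\ttwo\n' | cat -A          # prints: one^Itwo$
```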
Step 2: Sort and aggregate duplicates
    sort access.log | uniq -c | sort -nr | head -n 5

        143 GET /index.html
         97 GET /login
         61 POST /api/data
         28 GET /favicon.ico
         12 GET /robots.txt
"sort → uniq -c to count → reverse sort by count → top 5" is the standard access-aggregation idiom. uniq -c assumes the previous step already sorted the input.
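The same idiom can rank a single field instead of whole lines by extracting it first with cut. The log lines below are synthesized for illustration, assuming a space-delimited "METHOD /path" format:

```shell
# Rank only the request paths (2nd space-delimited field),
# then count and order as before.
printf 'GET /index.html\nGET /login\nGET /index.html\n' |
  cut -d' ' -f2 | sort | uniq -c | sort -nr
```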
Step 3: Count lines, words, and bytes
    wc -l access.log
    wc -lwc README.md

    10234 access.log
      120  856 5421 README.md
-l counts lines, -w counts words, and -c counts bytes (-m counts characters, which differs from -c in multibyte encodings). Placed at the end of a pipe, wc -l directly answers "how many items matched".
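A typical end-of-pipe count, filtering with grep first (synthetic log lines):

```shell
# grep passes through matching lines; wc -l counts them.
printf 'INFO ok\nERROR disk\nINFO ok\nERROR net\n' |
  grep ERROR | wc -l                  # prints: 2
```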
Step 4: Slice head and tail
    head -n 20 large.csv
    tail -n 50 syslog
    tail -f /var/log/nginx/access.log

    2026-05-17 10:01:22 INFO start
    2026-05-17 10:01:23 INFO ready
tail -f displays newly appended lines in real time, the foundation of log monitoring. Combining head and tail extracts ranges such as "M lines starting at line N".
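The range-extraction pattern works by taking the first N+M-1 lines with head, then keeping only the last M with tail. A sketch with a 6-line stream from seq:

```shell
# "3 lines starting at line 3": head keeps lines 1-5,
# tail keeps the last 3 of those, i.e. lines 3-5.
seq 1 6 | head -n 5 | tail -n 3       # prints: 3 4 5 (one per line)
```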
Step 5: Column extraction and character translation
    cut -d: -f1,7 /etc/passwd
    echo "Hello World" | tr 'a-z' 'A-Z'
    tr -s ' ' < data.txt | tr -d '\r'

    root:/bin/bash
    daemon:/usr/sbin/nologin
    HELLO WORLD
cut -d: -f1,7 extracts the first and seventh :-delimited fields (here, the user name and login shell). tr translates or deletes characters: -s squeezes runs of a repeated character into one, and -d deletes the specified characters, the common way to strip Windows-origin \r.
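One way to verify the \r removal is to compare byte counts with wc -c, using CRLF input synthesized by printf:

```shell
# Two CRLF-terminated lines: 10 bytes before, 8 after
# tr -d '\r' strips the two carriage returns.
printf 'one\r\ntwo\r\n' | wc -c                  # prints: 10
printf 'one\r\ntwo\r\n' | tr -d '\r' | wc -c     # prints: 8
```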
Why Chain Filters
Each filter follows the Unix philosophy of "do one thing well". sort only reorders; uniq only handles adjacent duplicates. Being single-purpose is exactly what makes them freely composable through pipes, achieving aggregation and extraction without writing a huge dedicated tool.
uniq handles only adjacent duplicates because it processes the stream line by line without holding state. To handle whole-input duplicates, identical lines must first be made adjacent by sorting. Understanding this constraint makes writing sort | uniq reflexive.
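The equivalence (and its limit) can be sketched in two lines:

```shell
# Both forms yield the same deduplicated output: a, b.
printf 'b\na\nb\n' | sort | uniq
printf 'b\na\nb\n' | sort -u
```

Only the uniq form, however, can also count occurrences (-c) or isolate duplicates (-d), which is why the two-command pipeline remains the workhorse.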
Troubleshooting
Symptom: uniq -c does not aggregate duplicates
Cause: The input is not sorted
Check:

    sort -c file   # exits non-zero and reports the first out-of-order line

Fix: Always put sort before uniq: sort file | uniq -c. Alternatively, sort -u sorts and deduplicates in one step (but then you lose the -c counts).
Symptom: sort -n does not order as expected
Cause: The numeric column contains spaces or unit characters, or the key position is unspecified
Check:

    sort -k2 -n data.txt
Fix: Specify the sort field with -k and the delimiter with -t. For human-readable sizes (1K, 2M) use sort -h.
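A concrete sketch of -t and -k together, on synthesized :-delimited records:

```shell
# Sort numerically by the 2nd :-delimited field.
# prints: carol:9, bob:30, alice:120 (one per line)
printf 'carol:9\nalice:120\nbob:30\n' | sort -t: -k2 -n
```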
Symptom: tail -f stops following updates
Cause: The log rotated to a different inode
Check:

    tail -F /var/log/syslog
Fix: -f (lowercase) keeps reading the file it originally opened, so after rotation it is still attached to the old inode and sees nothing new. Switch to -F (uppercase), which follows the file by name and reopens it when it is replaced.
Completion Checklist
- [ ] Ran the aggregation one-liner sort | uniq -c | sort -nr
- [ ] Used wc -l at the end of a pipe to count items
- [ ] Inspected only the needed part of a large file with head/tail
- [ ] Extracted fields with cut -d -f
- [ ] Verified character translation and \r removal with tr
Summary
| Scenario | Command | Purpose |
|---|---|---|
| Aggregate | sort \| uniq -c \| sort -nr | Frequency ranking |
| Count | wc -l | Line count |
| Head/tail | head -n / tail -f | Range slice / follow |
| Column | cut -d: -f1 | Field extraction |
| Translate | tr a-z A-Z | Per-character replace/delete |
Chaining filters is the fundamental text-processing pattern. More complex pattern matching needs regular expressions and grep.