Getting Started with sort and uniq: Sorting Data and Removing Duplicates
What You'll Learn
- How to sort lines with sort (alphabetical, numeric, reverse)
- How to remove duplicate lines with uniq, and why it implicitly requires sort
- How to write the classic frequency ranking pipeline sort | uniq -c | sort -rn
- Why beginners get stuck on "uniq doesn't remove duplicates" and "numbers come out in a weird order"
Quick Summary
- Want to sort? → sort
- Want to sort and dedupe? → sort -u
- Want to count occurrences? → sort | uniq -c | sort -rn
Environment
- OS: Ubuntu / typical Linux
- GNU coreutils sort/uniq (BSD versions on macOS differ in some option details)
1. What Does "Sorting Lines" Mean?
Putting the lines of a file into a defined order is what sort is for. sort filename reads the file line by line and prints the sorted result. It doesn't modify the file; it only prints to the screen, so you can experiment safely.
Let's prepare a sample file:
$ cat fruits.txt
banana
apple
cherry
apple
banana
date
1-1. Basic: Alphabetical Order
$ sort fruits.txt
apple
apple
banana
banana
cherry
date
Key points
- sort defaults to alphabetical (dictionary) order
- Uppercase and lowercase are typically treated as different (in the C locale uppercase comes first; other locales such as en_US.UTF-8 may order case differently)
- The original file is not modified; sort only prints to the screen
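Whether uppercase really sorts first depends on the locale. The following is a minimal sketch, assuming GNU sort and a hypothetical three-word input that is not one of the article's sample files. Setting LC_ALL=C forces plain byte order, in which uppercase letters sort before lowercase ones:

```shell
# Hypothetical sample: one capitalized word among lowercase ones.
words=$(printf '%s\n' apple Banana cherry)

# LC_ALL=C forces byte-order comparison: "B" (0x42) sorts before "a" (0x61).
byte_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$byte_order"
# Banana
# apple
# cherry
```

In locales such as en_US.UTF-8 the same input typically comes out as apple, Banana, cherry, so it can be worth pinning the locale when a script depends on the exact order.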
1-2. Reverse (Descending) Order: -r
$ sort -r fruits.txt
date
cherry
banana
banana
apple
apple
-r stands for reverse.
2. The Numeric Sort Trap
$ cat scores.txt
100
3
25
9
1000
$ sort scores.txt
100
1000
25
3
9
100 comes before 25, and 3 and 9 are at the end. Is this a bug? No: by default sort compares lines as strings, character by character from the left, so lines starting with 1 come before lines starting with 2 or 3. To sort as numbers, pass -n.
2-1. Numeric Sort: -n
$ sort -n scores.txt
3
9
25
100
1000
-n stands for numeric.
Beginner pitfall
- Forgetting -n when sorting sizes, counts, or any numeric column produces the wrong order
- Rule of thumb: "if the column looks like a number, add -n"
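To see the difference side by side, here is a small sketch that feeds the same values as scores.txt through both modes. The variable names are only illustrative:

```shell
# Same values as scores.txt, one per line.
scores=$(printf '%s\n' 100 3 25 9 1000)

# Default: lines are compared as text, character by character.
as_text=$(printf '%s\n' "$scores" | sort)

# -n: lines are compared by numeric value.
as_numbers=$(printf '%s\n' "$scores" | sort -n)

printf 'text:    %s\n' "$(printf '%s' "$as_text" | tr '\n' ' ')"
printf 'numeric: %s\n' "$(printf '%s' "$as_numbers" | tr '\n' ' ')"
# text:    100 1000 25 3 9
# numeric: 3 9 25 100 1000
```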
2-2. Numbers in Descending Order
$ sort -nr scores.txt
1000
100
25
9
3
-n and -r combine freely. This combination appears in nearly every ranking task.
3. Sort + Deduplicate in One Shot: sort -u
$ sort -u fruits.txt
apple
banana
cherry
date
apple and banana appear only once each.
-u stands for unique. It sorts and strips duplicates in a single command. When you just want "the unique values, sorted," this one option does it all.
In real work, "give me the unique values" is one of the most common requests. sort -u is the shortcut.
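A common follow-up is "how many distinct values are there?". One way to answer it, sketched here with the same six lines as fruits.txt and assuming GNU wc (which prints a bare number when reading from a pipe), is to count the lines that sort -u emits:

```shell
# Same six lines as fruits.txt.
fruits=$(printf '%s\n' banana apple cherry apple banana date)

# Count all lines, then count the lines that survive sort -u.
total=$(printf '%s\n' "$fruits" | wc -l)
distinct=$(printf '%s\n' "$fruits" | sort -u | wc -l)

printf '%d lines, %d distinct values\n' "$total" "$distinct"
# 6 lines, 4 distinct values
```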
4. uniq: The Deduplication Specialist
4-1. Basics
$ uniq fruits.txt
banana
apple
cherry
apple
banana
date
apple and banana are still duplicated! This is the catch with uniq. It only removes adjacent duplicates; duplicates separated by other lines are kept.
The fix is to run sort first. Once sort puts identical lines next to each other, uniq can collapse them properly.
4-2. The sort | uniq Pattern
$ sort fruits.txt | uniq
apple
banana
cherry
date
Rule of thumb
- uniq always goes after sort
- Use uniq alone only when you already know the input is sorted
- If "sort and dedupe" is all you want, sort -u is shorter
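The three rules can be checked directly. This sketch uses the same lines as fruits.txt and compares uniq alone, sort | uniq, and sort -u. The variable names are only illustrative:

```shell
# Same six lines as fruits.txt; no two identical lines are adjacent.
fruits=$(printf '%s\n' banana apple cherry apple banana date)

# uniq alone: nothing is adjacent, so nothing is removed.
unsorted_result=$(printf '%s\n' "$fruits" | uniq)

# sort first: duplicates become neighbors and uniq collapses them.
piped_result=$(printf '%s\n' "$fruits" | sort | uniq)

# sort -u: the same result in one command.
short_result=$(printf '%s\n' "$fruits" | sort -u)

if [ "$unsorted_result" = "$fruits" ]; then
    echo "uniq alone changed nothing"
fi
if [ "$piped_result" = "$short_result" ]; then
    echo "sort | uniq and sort -u agree"
fi
```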
4-3. Counting Occurrences: uniq -c
$ sort fruits.txt | uniq -c
2 apple
2 banana
1 cherry
1 date
-c stands for count — each line gets its occurrence count prepended. Extremely useful for aggregation.
4-4. Duplicates Only / Singletons Only
# Show only lines that appear more than once
$ sort fruits.txt | uniq -d
apple
banana
# Show only lines that appear exactly once
$ sort fruits.txt | uniq -u
cherry
date
| Option | Meaning | Use case |
|---|---|---|
| -c | Prepend count | Aggregation |
| -d | Duplicates only | Find duplicated items |
| -u | Singletons only | Extract values seen exactly once |
| -i | Case-insensitive compare | Merge case variants |
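One way to read -d and -u is that together they split the distinct values into two groups: those seen more than once and those seen exactly once. A small sketch with the fruits.txt lines (variable names are illustrative):

```shell
# Sorted copy of the fruits.txt lines, so duplicates are adjacent.
sorted=$(printf '%s\n' banana apple cherry apple banana date | sort)

repeated=$(printf '%s\n' "$sorted" | uniq -d)    # values seen two or more times
singletons=$(printf '%s\n' "$sorted" | uniq -u)  # values seen exactly once

printf 'repeated:   %s\n' "$(printf '%s' "$repeated" | tr '\n' ' ')"
printf 'singletons: %s\n' "$(printf '%s' "$singletons" | tr '\n' ' ')"
# repeated:   apple banana
# singletons: cherry date
```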
5. The Real-World Workhorse: Frequency Ranking
For frequency ranking, sort | uniq -c | sort -rn is the standard idiom. Memorize it.
Sample log:
$ cat access.log
192.168.1.10
192.168.1.20
192.168.1.10
192.168.1.30
192.168.1.10
192.168.1.20
Frequency ranking:
$ sort access.log | uniq -c | sort -rn
3 192.168.1.10
2 192.168.1.20
1 192.168.1.30
Pipeline breakdown
| Stage | Command | What it does |
|---|---|---|
| 1 | sort | Brings identical lines next to each other |
| 2 | uniq -c | Collapses adjacent duplicates with a count |
| 3 | sort -rn | Sorts by count (numeric) in descending order |
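When a pipeline like this misbehaves, it can help to capture each stage separately and look at the intermediate output. A sketch using the same six addresses as access.log (the stage1 to stage3 variable names are only illustrative):

```shell
# Same six addresses as access.log.
log=$(printf '%s\n' 192.168.1.10 192.168.1.20 192.168.1.10 \
                    192.168.1.30 192.168.1.10 192.168.1.20)

stage1=$(printf '%s\n' "$log" | sort)         # identical lines become adjacent
stage2=$(printf '%s\n' "$stage1" | uniq -c)   # each distinct line gets a count
stage3=$(printf '%s\n' "$stage2" | sort -rn)  # highest count first

printf 'after sort:\n%s\n' "$stage1"
printf 'after uniq -c:\n%s\n' "$stage2"
printf 'after sort -rn:\n%s\n' "$stage3"
```

The counts are right-aligned with leading spaces; sort -n skips that padding when it reads the number.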
5-1. Top N Only
$ sort access.log | uniq -c | sort -rn | head -n 3
head -n 3 keeps the top 3 entries. Combining with head is the everyday pattern.
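One detail matters once head cuts the list: the order of entries with equal counts. With GNU sort, a global -r also reverses the final tie-break, so ties come out in reverse alphabetical order. The sketch below, using the fruits.txt lines, shows one possible way to get descending counts with ties in ascending order by putting the flags on the key instead:

```shell
# Counts for the fruits.txt lines: apple and banana tie at 2.
counts=$(printf '%s\n' banana apple cherry apple banana date | sort | uniq -c)

# Global -r: the tie-break between equal counts is reversed too.
plain=$(printf '%s\n' "$counts" | sort -rn | head -n 3)

# Per-key flags: field 1 numeric descending, then the rest of the line ascending.
tie_sorted=$(printf '%s\n' "$counts" | sort -k1,1nr -k2 | head -n 3)

printf 'sort -rn:\n%s\n' "$plain"
printf 'sort -k1,1nr -k2:\n%s\n' "$tie_sorted"
```

With the first form banana is listed before apple; with the second, apple comes first.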
6. Advanced: Sort by a Specific Column with -k
For CSV or whitespace-separated data, -k chooses which field to sort by.
$ cat sales.txt
apple 120
banana 80
cherry 200
date 50
# Sort by the 2nd column (numeric) in descending order
$ sort -k2 -nr sales.txt
cherry 200
apple 120
banana 80
date 50
- -k2 starts the sort key at the second field; the key runs to the end of the line unless you limit it with -k2,2
- Use -n whenever the chosen column is numeric
- To change the delimiter, use -t, (comma-separated), -t:, etc.
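As an illustration of -t and -k together, here is a sketch that assumes a hypothetical comma-separated version of sales.txt. It limits the key to the second field and attaches the numeric and reverse flags to that key:

```shell
# Hypothetical comma-separated version of sales.txt.
csv=$(printf '%s\n' apple,120 banana,80 cherry,200 date,50)

# -t,   use a comma as the field delimiter
# -k2,2 key starts and ends at field 2; n = numeric, r = descending
by_sales=$(printf '%s\n' "$csv" | sort -t, -k2,2nr)

printf '%s\n' "$by_sales"
# cherry,200
# apple,120
# banana,80
# date,50
```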
7. Common Beginner Pitfalls
7-1. uniq Didn't Remove the Duplicates
Cause: forgot to sort first.
# BAD: non-adjacent duplicates are not removed
$ uniq fruits.txt
# GOOD
$ sort fruits.txt | uniq
$ sort -u fruits.txt
7-2. Numbers Came Out in a Weird Order
Cause: forgot -n. sort is doing string comparison.
$ sort -n scores.txt # Sort as numbers
7-3. The Original File Wasn't Modified
sort only prints to the screen — it never modifies the input file. To save the sorted result, redirect explicitly:
$ sort fruits.txt > fruits-sorted.txt
Never do this
# BAD: this empties the file
$ sort fruits.txt > fruits.txt
> truncates the destination before the command runs, so sort reads an empty file. To sort in-place safely, use sort -o:
# GOOD: -o writes only after reading is finished
$ sort -o fruits.txt fruits.txt
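Both behaviors can be tried safely in a throwaway directory. This sketch assumes mktemp is available and uses hypothetical file names (safe.txt, clobbered.txt) so that no real data is at risk:

```shell
# Work in a scratch directory so nothing important can be lost.
workdir=$(mktemp -d)
printf '%s\n' banana apple cherry > "$workdir/safe.txt"
cp "$workdir/safe.txt" "$workdir/clobbered.txt"

# BAD (scratch copy only): the shell truncates the file before sort opens it.
sort "$workdir/clobbered.txt" > "$workdir/clobbered.txt"

# GOOD: sort reads all of its input before it writes the -o file.
sort -o "$workdir/safe.txt" "$workdir/safe.txt"

clobbered_size=$(wc -c < "$workdir/clobbered.txt")
safe_content=$(cat "$workdir/safe.txt")

printf 'clobbered.txt: %s bytes\n' "$clobbered_size"
printf 'safe.txt:\n%s\n' "$safe_content"

rm -r "$workdir"
```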
7-4. Upper/Lowercase Are Treated as Different
$ cat names.txt
Alice
bob
Alice
BOB
$ sort -u names.txt
Alice
BOB
bob
To ignore case, add -f (fold case):
$ sort -uf names.txt
Alice
bob
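The same idea carries over to counting. A sketch, assuming the four names from names.txt: sort -f places case variants next to each other, and uniq -i then treats them as one value while -c counts them.

```shell
# Same four lines as names.txt.
names=$(printf '%s\n' Alice bob Alice BOB)

# sort -f groups case variants; uniq -i compares them case-insensitively,
# and -c prepends how many lines fell into each group.
counts=$(printf '%s\n' "$names" | sort -f | uniq -ci)

printf '%s\n' "$counts"
```

This prints two lines, each with a count of 2. Which spelling represents the bob group (bob or BOB) depends on which variant ends up first after sorting, and that can vary with the locale.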
8. Mini Exercises
Exercise 1: Print the unique words from this file.
$ cat << 'EOF' > words.txt
apple
banana
apple
cherry
banana
EOF
Hint: There's one option that does "sort and dedupe" in a single step.
Answer:
$ sort -u words.txt
apple
banana
cherry
Exercise 2: Count how many times each word appears.
Hint: Two-stage pipe: sort → uniq -c.
Answer:
$ sort words.txt | uniq -c
2 apple
2 banana
1 cherry
Exercise 3: Sort the counts in descending order and show only the top 2.
Hint: Sort the count column numerically in reverse → keep 2 lines with head.
Answer:
$ sort words.txt | uniq -c | sort -rn | head -n 2
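2 banana
2 apple
apple and banana are tied at 2. With GNU sort, -r also reverses the tie-break between equal counts, so banana is printed before apple.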
9. Copy-Paste Templates
Patterns to keep handy
# Sort alphabetically
sort file.txt
# Sort and deduplicate
sort -u file.txt
# Sort numerically (ascending / descending)
sort -n file.txt
sort -nr file.txt
# Count occurrences per line
sort file.txt | uniq -c
# Frequency ranking (most frequent first)
sort file.txt | uniq -c | sort -rn
# Top 10 frequency ranking
sort file.txt | uniq -c | sort -rn | head -n 10
# Sort by 2nd column, descending numeric
sort -k2 -nr file.txt
# Case-insensitive unique values
sort -uf file.txt
# Sort in place safely (avoids the > self-truncation bug)
sort -o file.txt file.txt