find, grep, awk Exercises - Troubleshooting and Practical Skill Building

This is the final installment of the series. It covers skill-validation exercises, fixes for common issues, and a roadmap for further growth: the last polish on your way to becoming a Linux power user.

What You'll Learn

  • Typical problems with find, grep, and awk you hit in production
  • Exercises and challenges to validate your skills
  • A learning roadmap for further growth
  • A final wrap-up on your way to becoming a Linux power user

Common Issues and Fixes

Conclusion: Know the typical find, grep and awk pitfalls and their fixes upfront.

Anyone who uses these commands in production eventually runs into the issues below, so it pays to have the fixes at hand before they happen.

find Command Issues

A flood of "Permission denied" errors

Symptoms:

find: '/root': Permission denied
find: '/proc/1': Permission denied

Fixes:

# Option 1: Suppress error output
find / -name "*.txt" 2>/dev/null

# Option 2: Search only locations you have access to
find /home /var /tmp -name "*.txt"

# Option 3: Run with sudo (use carefully)
sudo find / -name "*.txt"

Errors when filenames contain spaces

Symptom: when find output is piped to plain xargs, "My Document.txt" is split on the space and passed as two arguments, "My" and "Document.txt".

Fixes:

# Use -print0 with xargs -0
find /home -name "*.txt" -print0 | xargs -0 rm

# Use -exec with +
find /home -name "*.txt" -exec rm {} +
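
To see the difference without deleting anything, the following sketch counts how many arguments xargs receives in each case. It works in a scratch directory created with mktemp (an assumption for the demo, not part of the original fix) and uses echo in place of rm.

```shell
# Hypothetical scratch directory; nothing outside it is touched
demo=$(mktemp -d)
touch "$demo/My Document.txt" "$demo/notes.txt"

# Unsafe: xargs splits on whitespace, so "My Document.txt" becomes two arguments
unsafe_count=$(find "$demo" -name "*.txt" | xargs -n 1 echo | wc -l)

# Safe: NUL-delimited names reach xargs intact
safe_count=$(find "$demo" -name "*.txt" -print0 | xargs -0 -n 1 echo | wc -l)

echo "unsafe: $unsafe_count arguments, safe: $safe_count arguments"
rm -rf "$demo"
```

With two files on disk, the unsafe pipeline should report three arguments and the safe one two.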

Searches are too slow

Fixes:

  • Skip unwanted directories with -path ... -prune
  • Limit depth with -maxdepth
  • Filter files with -type f

# -prune has to be tested against the directory itself, so it comes before -type f
find /var -maxdepth 3 -path "*/node_modules" -prune -o -type f -name "*.log" -print
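
The position of -prune in the expression matters: it only takes effect when it is evaluated for the directory to be skipped. A minimal sketch, assuming a throwaway directory tree built with mktemp (the file names are made up for the demo):

```shell
demo=$(mktemp -d)
mkdir -p "$demo/app/node_modules/pkg" "$demo/app/logs"
touch "$demo/app/node_modules/pkg/debug.log" "$demo/app/logs/app.log"

# -prune is evaluated on the node_modules directory, so find never descends into it
pruned=$(find "$demo" -maxdepth 4 -path "*/node_modules" -prune -o -type f -name "*.log" -print)

# Without -prune, the log inside node_modules is found as well
unpruned=$(find "$demo" -maxdepth 4 -type f -name "*.log" -print | sort)

printf 'with -prune:\n%s\n\nwithout -prune:\n%s\n' "$pruned" "$unpruned"
rm -rf "$demo"
```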

grep Command Issues

Multibyte characters (e.g., Japanese) not searched correctly

Fixes:

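# Check the current locale, then set a UTF-8 locale (LC_ALL, if set, overrides LANG)
locale
export LANG=en_US.UTF-8
grep "error" logfile.txt

# Force grep to treat the file as text instead of classifying it as binary
grep -a "error" logfile.txt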

Regex doesn't behave as expected

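Common issues: in grep's default basic regular expressions (BRE), +, ?, {}, | and () are literal characters unless backslash-escaped, so quantifiers, alternation, and grouping appear not to work.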

Fixes:

# Use -E for extended regex
grep -E "colou?r" file.txt
grep -E "(http|https)://" file.txt

# egrep is a deprecated equivalent of grep -E (GNU grep 3.8+ prints a warning); prefer grep -E in new scripts
egrep "colou?r" file.txt
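
The difference is easy to check on a few sample lines. This sketch feeds three made-up strings to grep with and without -E:

```shell
sample=$(printf 'color\ncolour\ncolou?r\n')

# Default (BRE): ? is a literal character, so only the line containing "?" matches
bre=$(printf '%s\n' "$sample" | grep 'colou?r')

# -E (ERE): ? means "zero or one of the preceding item"
ere=$(printf '%s\n' "$sample" | grep -E 'colou?r')

printf 'BRE matched:\n%s\n\nERE matched:\n%s\n' "$bre" "$ere"
```

The BRE run should match only the literal "colou?r" line, while the ERE run should match "color" and "colour".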

Binary file matches error

Symptom: Binary file image.jpg matches

Fixes:

# Search text files only
grep -I "pattern" *

# Restrict file types
grep -r --include="*.txt" --include="*.log" "pattern" .
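
As a quick check of what -I does, the sketch below creates one text file and one file containing a NUL byte in a scratch directory (both names are hypothetical) and lists the matching files with and without the option.

```shell
demo=$(mktemp -d)
printf 'pattern in text\n' > "$demo/notes.txt"
printf 'pattern\0binary payload\n' > "$demo/blob.bin"   # the NUL byte makes grep treat it as binary

# Without -I both files are listed; with -I the binary file is skipped
all_hits=$(cd "$demo" && grep -l 'pattern' * | sort)
text_hits=$(cd "$demo" && grep -lI 'pattern' *)

printf 'grep -l:\n%s\n\ngrep -lI:\n%s\n' "$all_hits" "$text_hits"
rm -rf "$demo"
```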

awk Command Issues

Fields aren't split as expected

Symptom: CSV with commas inside quoted fields, e.g. "Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead".
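
The symptom can be reproduced with the sample record alone. In this sketch awk is asked how many fields it sees when the separator is a plain comma:

```shell
line='"Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead"'

# A plain comma separator also splits inside the quoted fields
naive_fields=$(printf '%s\n' "$line" | awk -F',' '{print NF}')
third_field=$(printf '%s\n' "$line" | awk -F',' '{print $3}')

echo "awk -F',' sees $naive_fields fields (the record has 4); field 3 is: $third_field"
```

The third field comes back as a fragment with an unbalanced quote, which is why a real CSV parser is the safer choice.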

Fixes:

# Use a dedicated tool
csvtool col 1,2 data.csv

# Pipe through Python
python3 -c "
import csv, sys
reader = csv.reader(sys.stdin)
for row in reader: print(row[0], row[1])
" < data.csv

Numeric precision issues

Symptom: Decimal calculations are imprecise (expected 10.50, actual 10.5000000001).

Fixes:

# Use printf with precision
awk '{sum+=$1} END {printf "%.2f\n", sum}' numbers.txt

# Pipe to bc (the trailing "-" makes paste read stdin on BSD/macOS as well as GNU)
awk '{print $1}' numbers.txt | paste -sd+ - | bc
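
The imprecision comes from binary floating point, not from awk itself. A small sketch that adds 0.1 ten times shows both the problem and the printf fix:

```shell
# 0.1 has no exact binary representation, so ten additions do not land exactly on 1
is_exact=$(awk 'BEGIN { for (i = 0; i < 10; i++) s += 0.1; print (s == 1) }')

# Rounding at output time gives the value a reader expects
rounded=$(awk 'BEGIN { for (i = 0; i < 10; i++) s += 0.1; printf "%.2f\n", s }')

echo "sum == 1 ? $is_exact   rounded for display: $rounded"
```

Round only when printing; comparing floating-point sums for exact equality inside the script remains unreliable.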

Debugging Techniques

Check incrementally

Run complex commands piece by piece.

# Final command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l

# Debugging steps
# 1. Run only the find part
find /var/log -name "*.log"

# 2. Run up to grep
find /var/log -name "*.log" | xargs grep -l "ERROR"

# 3. Run the full command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l

Save intermediate results

For long-running pipelines, save intermediate output.

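find /var -name "*.log" > all_logs.txt
# xargs -d '\n' (GNU) passes one filename per line, so names with spaces stay intact; -r skips the command on empty input
xargs -r -d '\n' grep -l "ERROR" < all_logs.txt > error_logs.txt
xargs -r -d '\n' wc -l < error_logs.txt > final_result.txt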
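
An alternative, offered here as a sketch rather than as the article's own method, is to keep a single pipeline and capture each stage with tee. The demo runs in a scratch directory with two made-up log files, and assumes paths without spaces since it uses newline-delimited xargs.

```shell
demo=$(mktemp -d)
printf 'INFO start\nERROR disk full\n' > "$demo/a.log"
printf 'INFO ok\n' > "$demo/b.log"

# Each tee saves that stage's output and still feeds the next command
find "$demo" -name "*.log" | sort | tee "$demo/all_logs.txt" |
  xargs grep -l "ERROR" | tee "$demo/error_logs.txt" |
  xargs wc -l > "$demo/final_result.txt"

all_logs=$(cat "$demo/all_logs.txt")
error_logs=$(cat "$demo/error_logs.txt")
final_result=$(cat "$demo/final_result.txt")
printf 'all:\n%s\nerrors:\n%s\nresult:\n%s\n' "$all_logs" "$error_logs" "$final_result"
rm -rf "$demo"
```

If a later stage fails, the saved files show exactly what the earlier stages produced, without re-running the slow find.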

Further Skills

Now that you've mastered find, grep, and awk, here are the skills to learn next.

Next-Level Commands

sed (stream editor): Fast text substitution, deletion, and insertion.

sed 's/error/ERROR/g' logfile.txt

Priority: Highest.

xargs (argument conversion): Convert pipe output to command-line arguments.

find . -name "*.txt" | xargs -P 4 wc -l

Priority: Highest.

sort/uniq (sort and dedupe): Reorder data and dedupe.

awk '{print $1}' access.log | sort | uniq -c | sort -rn
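
The counting idiom can be tried on inline data first. The four sample lines below are invented and assume the client IP is the first field:

```shell
# uniq -c only collapses adjacent duplicates, hence the sort before it
ranking=$(printf '%s\n' \
  '10.0.0.1 GET /' \
  '10.0.0.2 GET /a' \
  '10.0.0.1 GET /b' \
  '10.0.0.1 GET /c' |
  awk '{print $1}' | sort | uniq -c | sort -rn)

top_ip=$(printf '%s\n' "$ranking" | awk 'NR == 1 {print $2}')
top_count=$(printf '%s\n' "$ranking" | awk 'NR == 1 {print $1}')
echo "$ranking"
echo "busiest client: $top_ip ($top_count requests)"
```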

Priority: High.

join/paste (file join): Merge data from multiple files.

# Both inputs must already be sorted on the join field (field 1 by default)
join -t, file1.csv file2.csv
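
join expects both inputs to be sorted on the join field. A minimal sketch with two hypothetical CSV files that share an ID in the first column:

```shell
demo=$(mktemp -d)
printf '2,Sato\n1,Tanaka\n' > "$demo/names.csv"
printf '1,Tokyo\n2,Osaka\n' > "$demo/cities.csv"

# Sort each file on field 1 before joining
sort -t, -k1,1 "$demo/names.csv"  > "$demo/names.sorted"
sort -t, -k1,1 "$demo/cities.csv" > "$demo/cities.sorted"

joined=$(join -t, "$demo/names.sorted" "$demo/cities.sorted")
echo "$joined"
rm -rf "$demo"
```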

Priority: Medium.

Exercises and Challenges

Conclusion: Hands-on, staged exercises let you verify and cement your skills.

The challenges below run from beginner to master level. Work through them in order to see where you stand.

Beginner Challenges

Challenge 1: File search basics

In /var/log and below, find files with extension .log that are 1MB or larger.

Hint: Combine -name and -size with find.

Solution:
# -size +1M matches files strictly larger than 1 MiB; use -size +1048575c to include a file of exactly 1 MiB
find /var/log -type f -name "*.log" -size +1M
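
The boundary behaviour of -size is easy to get wrong, because find rounds the file size up to whole units before comparing. The sketch below creates a file of exactly 1 MiB in a scratch directory (the file name is made up) and tests it both ways:

```shell
demo=$(mktemp -d)
# Exactly 1 MiB: 1024 blocks of 1024 bytes
dd if=/dev/zero of="$demo/exact.log" bs=1024 count=1024 2>/dev/null

# +1M means "more than one 1 MiB unit", so this file is not matched
strictly_larger=$(find "$demo" -name "*.log" -size +1M)

# Counting in bytes (suffix c) expresses "1 MiB or larger" precisely
at_least=$(find "$demo" -name "*.log" -size +1048575c)

echo "-size +1M matched: ${strictly_larger:-nothing}"
echo "-size +1048575c matched: $at_least"
rm -rf "$demo"
```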

Challenge 2: Text search basics

Search system.log for lines containing "ERROR" and show with line numbers.

Solution:
grep -n "ERROR" system.log

Challenge 3: Data aggregation basics

Compute the sum of column 3 (sales) in sales.csv.

Solution:
awk -F',' '{sum += $3} END {print "Sum:", sum}' sales.csv

Intermediate Challenges

Challenge 4: Log analysis pipeline

Count today's unique IP addresses from the access log.

Hint: Filter by today's date with grep, extract IP with awk, dedupe with sort/uniq.

Solution:
grep "$(date '+%d/%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l
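
To check the pipeline without a real server log, the sketch below builds a small sample file stamped with today's date. The IP addresses and request lines are invented, and the layout assumes the common Apache/Nginx access log format.

```shell
today=$(date '+%d/%b/%Y')
log=$(mktemp)

# Three requests from today (two distinct IPs) and one old entry
printf '%s\n' \
  "10.0.0.1 - - [$today:10:00:00 +0000] \"GET / HTTP/1.1\" 200 512" \
  "10.0.0.1 - - [$today:10:05:00 +0000] \"GET /a HTTP/1.1\" 200 128" \
  "10.0.0.2 - - [$today:11:00:00 +0000] \"GET / HTTP/1.1\" 404 0" \
  "10.0.0.3 - - [01/Jan/2000:00:00:00 +0000] \"GET / HTTP/1.1\" 200 512" \
  > "$log"

unique_today=$(grep "$today" "$log" | awk '{print $1}' | sort -u | wc -l)
echo "unique IPs today: $unique_today"
rm -f "$log"
```

Note that %b is locale-dependent, so the date string only matches logs written with the same month abbreviations.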

Challenge 5: Large file search

Under /home, find files larger than 100MB and display the five largest with their sizes.

Solution:
# -k5,5 restricts the sort key to the size column of ls -l; h compares human-readable sizes
find /home -type f -size +100M -exec ls -lh {} + | sort -k5,5hr | head -5

Challenge 6: Error stats report

From multiple log files, aggregate error categories and show in descending order.

Solution:
find /var/log -name "*.log" | xargs grep -h "ERROR" | awk '{print $4}' | sort | uniq -c | sort -rn

Advanced Challenges

Challenge 7: Website monitoring script

From Apache access logs, find IPs with 10+ 404 errors in the past hour and produce alert messages.

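Hint: Filter by time, extract 404, group by IP, threshold filter.

Solution:
# date -d is a GNU extension. Matching the previous and current hour stamps
# approximates "the past hour" (it can cover up to two clock hours).
hour_ago=$(date -d '1 hour ago' '+%d/%b/%Y:%H')
current_hour=$(date '+%d/%b/%Y:%H')

# $9 is the status code in the common/combined log format; testing it directly
# avoids counting lines where 404 is only the response size
grep -E "\[($hour_ago|$current_hour):" /var/log/apache2/access.log | \
awk '$9 == 404 {print $1}' | \
sort | uniq -c | \
awk '$1 >= 10 {printf "ALERT: IP %s has %d 404 errors in last hour\n", $2, $1}'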
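
The threshold logic can be exercised on generated data. This sketch writes a sample log in the combined format (the addresses come from documentation ranges and are purely illustrative) and compares a text match on " 404 " with a test on the status field, assumed here to be field 9.

```shell
log=$(mktemp)
stamp=$(date '+%d/%b/%Y:%H')

# Twelve real 404 responses from one client
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo "203.0.113.9 - - [$stamp:00:00 +0000] \"GET /missing$i HTTP/1.1\" 404 0 \"-\" \"curl/8.0\""
done > "$log"
# One successful request whose response size happens to be 404 bytes
echo "198.51.100.7 - - [$stamp:00:00 +0000] \"GET / HTTP/1.1\" 200 404 \"-\" \"curl/8.0\"" >> "$log"

# Text matching counts the 200 response too; the field test does not
naive_hits=$(grep -c " 404 " "$log")
alerts=$(grep -F "[$stamp:" "$log" | awk '$9 == 404 {print $1}' | sort | uniq -c |
  awk '$1 >= 10 {printf "ALERT: IP %s has %d 404 errors\n", $2, $1}')

echo "lines containing ' 404 ': $naive_hits"
echo "$alerts"
rm -f "$log"
```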

Challenge 8: Data quality check

Build a script that checks CSV data quality and reports total rows / columns, empty rows, unique values per column, and min/max/avg of numeric columns.

Solution (requires GNU awk 4.0 or later, because it uses arrays of arrays):
gawk -F',' '
NR == 1 {
    num_columns = NF
    for (i = 1; i <= NF; i++) headers[i] = $i
    next
}
NF == 0 { empty_lines++; next }
{
    total_rows++
    for (i = 1; i <= num_columns && i <= NF; i++) {
        field_values[i][$i] = 1
        if ($i ~ /^[0-9]+\.?[0-9]*$/) {
            numeric_count[i]++
            numeric_sum[i] += $i
            if (numeric_min[i] == "" || $i < numeric_min[i]) numeric_min[i] = $i
            if (numeric_max[i] == "" || $i > numeric_max[i]) numeric_max[i] = $i
        }
    }
}
END {
    printf "Rows: %d\n", total_rows
    printf "Cols: %d\n", num_columns
    printf "Empty rows: %d\n", empty_lines + 0
    for (i = 1; i <= num_columns; i++) {
        printf "Col%d (%s): unique=%d", i, headers[i], length(field_values[i])
        if (numeric_count[i] > 0) {
            avg = numeric_sum[i] / numeric_count[i]
            printf ", min=%.2f, max=%.2f, avg=%.2f", numeric_min[i], numeric_max[i], avg
        }
        print ""
    }
}' data.csv

Challenge 9: Automated backup script

Build a backup script for important files. Requirements: back up only files changed since the last backup and smaller than 100MB, log every backup operation, and automatically delete backups more than 7 days old.

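Solution:
#!/bin/bash

SRC_DIR="/home/important"
BACKUP_DIR="/backup/$(date +%Y%m%d_%H%M%S)"
LAST_BACKUP_MARKER="/var/log/last_backup.timestamp"
NEW_MARKER="$LAST_BACKUP_MARKER.new"
LOG_FILE="/var/log/backup.log"

echo "=== Backup started at $(date) ===" >> "$LOG_FILE"
mkdir -p "$BACKUP_DIR"

# First run: no marker exists yet, so create one dated far in the past to back up everything
[ -e "$LAST_BACKUP_MARKER" ] || touch -t 197001020000 "$LAST_BACKUP_MARKER"
# Record the start time now so files changed during this run are picked up next time
touch "$NEW_MARKER"

# -print0 with read -d '' keeps filenames containing spaces or newlines intact
find "$SRC_DIR" -type f -size -100M -newer "$LAST_BACKUP_MARKER" -print0 |
while IFS= read -r -d '' file; do
    rel_path="${file#"$SRC_DIR"/}"
    backup_path="$BACKUP_DIR/$rel_path"
    mkdir -p "$(dirname "$backup_path")"

    if cp -p "$file" "$backup_path"; then
        echo "Backed up: $file" >> "$LOG_FILE"
    else
        echo "FAILED: $file" >> "$LOG_FILE"
    fi
done

# Delete backup directories older than 7 days (top level only, never /backup itself)
find /backup -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

# Update timestamp
mv "$NEW_MARKER" "$LAST_BACKUP_MARKER"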

Master Challenge

Challenge 10: Comprehensive system monitoring dashboard

Build a system monitoring script with:

  • Real-time log monitoring
  • Automatic alerting on errors
  • Resource usage visualization
  • Daily report generation
  • Browser-viewable output (HTML report)

Approach hint:

Use tail -f for real-time monitoring, awk for stats, find for old file management, HTML templates for reports.
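
As one possible starting point for the reporting piece, the sketch below turns log lines into a small HTML table with awk. The make_report function is a hypothetical helper, and it assumes each log line begins with its level (ERROR, INFO, and so on); real logs will usually need a different field.

```shell
# Hypothetical helper: summarize log levels read from stdin as an HTML table
make_report() {
  awk '
    { count[$1]++ }                      # tally lines per level (first field)
    END {
      print "<table>"
      print "<tr><th>Level</th><th>Lines</th></tr>"
      for (level in count)               # iteration order is unspecified
        printf "<tr><td>%s</td><td>%d</td></tr>\n", level, count[level]
      print "</table>"
    }'
}

report=$(printf 'ERROR disk full\nINFO started\nERROR timeout\n' | make_report)
echo "$report"
```

In a full solution the input would come from the monitored log file and the output would be written to a file that a web server can serve.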

Completing this challenge means you can confidently call yourself a Linux power user.

Conclusion: First Step to Linux Power User

This series covered find, grep, and awk in detail, from basics to practical applications. Mastering these gets you to the level of a true Linux power user.

Skills Acquired

  • find: Search files and directories under any condition, fast
  • grep: Advanced text search with regex
  • awk: Data processing, aggregation, report generation
  • Effective combinations of all three
  • Performance optimization and troubleshooting
  • Industry case studies and production skills
  • Exercises and skill validation

Series Recap

  • Basics — Command overview and regex basics
  • Advanced — High-level grep and awk
  • Practical — Combinations and production usage
  • Professional — Exercises and troubleshooting (this article)

Expected Outcomes

  • Productivity: Automate the bulk of manual work
  • Problem solving: Quickly handle log analysis and data investigation
  • Career: Open paths into infrastructure, data, and DevOps

Practice now

What matters most is applying what you learned in your real work. Hands-on practice on Penguin Gym Linux and daily use of these commands will turn knowledge into skills.