find, grep, awk Exercises - Troubleshooting and Practical Skill Building

This final installment of the series covers fixes for common issues, debugging techniques, a roadmap of skills to learn next, and exercises for validating what you have learned. It is the final polish on your way to becoming a Linux power user.

Common Issues and Fixes

If you use these commands in production, you will eventually run into the issues below. Knowing the fixes ahead of time saves debugging effort.

find Command Issues

A flood of "Permission denied" errors

Symptoms:

find: '/root': Permission denied
find: '/proc/1': Permission denied

Fixes:

# Option 1: Suppress error output
find / -name "*.txt" 2>/dev/null

# Option 2: Search only locations you have access to
find /home /var /tmp -name "*.txt"

# Option 3: Run with sudo (use carefully)
sudo find / -name "*.txt"

Errors when filenames contain spaces

Symptom: when find output is piped to xargs, "My Document.txt" is split on the space and passed as two arguments, "My" and "Document.txt".

Fixes:

# Use -print0 with xargs -0
find /home -name "*.txt" -print0 | xargs -0 rm

# Use -exec with +
find /home -name "*.txt" -exec rm {} +
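
To see the difference without risking real files, the sketch below builds a throwaway directory and counts how many arguments xargs hands to a command under each approach. The directory comes from mktemp and the variable names are made up for this illustration; nothing here is from the original series.

```shell
# Demo in a throwaway directory; nothing outside it is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/My Document.txt" "$demo_dir/notes.txt"

# Unsafe: xargs splits on whitespace, so "My Document.txt" becomes two arguments.
unsafe_count=$(find "$demo_dir" -name "*.txt" | xargs sh -c 'echo $#' sh)

# Safe: NUL-delimited names survive intact.
safe_count=$(find "$demo_dir" -name "*.txt" -print0 | xargs -0 sh -c 'echo $#' sh)

echo "unsafe: $unsafe_count arguments, safe: $safe_count arguments"
rm -rf "$demo_dir"
```

With two files on disk, the unsafe pipeline should report three arguments and the safe one two, assuming the temporary directory path itself contains no spaces.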

Searches are too slow

Fixes:

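  • Skip unwanted directories with -path ... -prune
  • Limit depth with -maxdepth
  • Filter files with -type f

# -prune must come before -type f: a directory never matches -type f, so it would never be pruned
find /var -maxdepth 3 -path "*/node_modules" -prune -o -type f -name "*.log" -print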
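
A quick way to confirm that -prune really skips a subtree is to try it on a small throwaway tree. In this sketch the directory layout and variable names are invented for illustration. Note that the -path ... -prune test has to be evaluated before any -type f test, because a directory never satisfies -type f.

```shell
# Build a tiny tree with one log inside node_modules and one outside.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/app/node_modules/pkg" "$demo_dir/app/logs"
touch "$demo_dir/app/node_modules/pkg/debug.log" "$demo_dir/app/logs/app.log"

# Without -prune, find descends into node_modules and reports its log too.
all_logs=$(find "$demo_dir" -type f -name "*.log" | wc -l | tr -d ' ')

# With -prune, the node_modules subtree is never read.
pruned_logs=$(find "$demo_dir" -path "*/node_modules" -prune -o -type f -name "*.log" -print | wc -l | tr -d ' ')

echo "without prune: $all_logs, with prune: $pruned_logs"
rm -rf "$demo_dir"
```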

grep Command Issues

Multibyte characters (e.g., Japanese) not searched correctly

Fixes:

# Check the current locale, then set a UTF-8 one (LC_ALL, if set, overrides LANG)
locale
export LANG=en_US.UTF-8
grep "error" logfile.txt

# Force grep to treat a file it classifies as binary as text
grep -a "error" logfile.txt

Regex doesn't behave as expected

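Common issues: grep uses basic regex by default, where +, ?, {}, |, and () are literal characters, so repetition, alternation, and grouping do not work as written.

Fixes:

# Use -E for extended regex
grep -E "colou?r" file.txt
grep -E "(http|https)://" file.txt

# egrep is an obsolescent alias for grep -E; GNU grep 3.8 and later print a warning, so prefer grep -E
egrep "colou?r" file.txt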
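
The difference between the two regex flavors is easy to check on a three-line sample. The sketch below is only an illustration; the sample file and variable names are assumptions, not part of the original article.

```shell
sample=$(mktemp)
printf 'color\ncolour\ncolou?r\n' > "$sample"

# Basic regex: ? is a literal character, so only the line "colou?r" matches.
basic_matches=$(grep -c "colou?r" "$sample")

# Extended regex: ? makes the preceding "u" optional, so "color" and "colour" match.
extended_matches=$(grep -E -c "colou?r" "$sample")

echo "basic: $basic_matches, extended: $extended_matches"
rm -f "$sample"
```

The same pattern selects different lines depending on the flavor. If a search returns nothing unexpectedly, it is worth checking whether -E is missing.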

"Binary file matches" message

Symptom: Binary file image.jpg matches

Fixes:

# Search text files only
grep -I "pattern" *

# Restrict file types
grep -r --include="*.txt" --include="*.log" "pattern" .
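
The effect of -I can be checked on two small files, one of which contains a NUL byte so that grep classifies it as binary. This is a minimal sketch with invented file names, assuming a grep that supports -I (GNU and BSD grep both do).

```shell
work_dir=$(mktemp -d)
printf 'pattern in text\n' > "$work_dir/notes.txt"
printf 'pattern\000binary\n' > "$work_dir/blob.bin"   # the NUL byte marks it as binary

# Default: both files are reported as containing the pattern.
with_binary=$(grep -l "pattern" "$work_dir"/* | wc -l | tr -d ' ')

# -I: binary files are treated as if they contain no match.
text_only=$(grep -lI "pattern" "$work_dir"/* | wc -l | tr -d ' ')

echo "default: $with_binary file(s), with -I: $text_only file(s)"
rm -rf "$work_dir"
```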

awk Command Issues

Fields aren't split as expected

Symptom: CSV with commas inside quoted fields, e.g. "Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead".
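
To make the symptom concrete, the following sketch feeds that sample record to awk with a plain comma separator and prints the field count. The record has four logical fields, but awk sees more because it also splits on the commas inside the quotes. The variable names are illustrative only.

```shell
line='"Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead"'

# Splitting on every comma ignores the quotes, so the commas inside
# "Tokyo, Shibuya" and "Engineer, Team Lead" create extra fields.
naive_fields=$(printf '%s\n' "$line" | awk -F',' '{print NF}')

echo "awk -F',' sees $naive_fields fields instead of 4"
```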

Fixes:

# Use a dedicated tool
csvtool col 1,2 data.csv

# Pipe through Python
python3 -c "
import csv, sys
reader = csv.reader(sys.stdin)
for row in reader: print(row[0], row[1])
" < data.csv

Numeric precision issues

Symptom: Decimal calculations are imprecise (expected 10.50, actual 10.5000000001).

Fixes:

# Use printf with precision
awk '{sum+=$1} END {printf "%.2f\n", sum}' numbers.txt

# Sum with bc, which uses decimal arithmetic (the trailing - makes paste read stdin portably)
awk '{print $1}' numbers.txt | paste -sd+ - | bc
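
The rounding error is easy to reproduce: adding 0.1 ten times in binary floating point does not give exactly 1. The sketch below, using a temporary file invented for this example, prints the raw sum with 17 significant digits and then the same sum formatted with %.2f.

```shell
numbers=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9 10; do echo 0.1; done > "$numbers"

# %.17g exposes the accumulated floating-point error.
raw_sum=$(awk '{sum += $1} END {printf "%.17g\n", sum}' "$numbers")

# %.2f rounds the result for display.
rounded_sum=$(awk '{sum += $1} END {printf "%.2f\n", sum}' "$numbers")

echo "raw: $raw_sum, rounded: $rounded_sum"
rm -f "$numbers"
```

Formatting only fixes the display. When exact decimal results matter, for example with currency, a decimal tool such as bc is the safer choice.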

Debugging Techniques

Check incrementally

Run complex commands piece by piece.

# Final command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l

# Debugging steps
# 1. Run only the find part
find /var/log -name "*.log"

# 2. Run up to grep
find /var/log -name "*.log" | xargs grep -l "ERROR"

# 3. Run the full command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l

Save intermediate results

For long-running pipelines, save intermediate output.

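find /var -name "*.log" > all_logs.txt
# Feed the saved names through xargs rather than $(cat ...), which splits names on spaces
tr '\n' '\0' < all_logs.txt | xargs -0 grep -l "ERROR" > error_logs.txt
tr '\n' '\0' < error_logs.txt | xargs -0 wc -l > final_result.txt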
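
An alternative worth knowing is tee, which saves a stage's output to a file while still passing it down the pipeline, so the pipeline does not have to be split into separate commands. The sketch below runs against a throwaway directory; the file names are assumptions made for this example.

```shell
work_dir=$(mktemp -d)
printf 'ok\nERROR disk full\n' > "$work_dir/a.log"
printf 'ok\n' > "$work_dir/b.log"

# tee writes the list of matching files to disk and passes it along unchanged.
find "$work_dir" -name "*.log" -print0 |
    xargs -0 grep -l "ERROR" |
    tee "$work_dir/error_logs.txt" |
    wc -l | tr -d ' ' > "$work_dir/final_result.txt"

cat "$work_dir/error_logs.txt"    # inspect the intermediate stage afterwards
error_file_count=$(cat "$work_dir/final_result.txt")
echo "files with errors: $error_file_count"
rm -rf "$work_dir"
```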

Further Skills

Now that you've mastered find, grep, and awk, here are the skills to learn next.

Next-Level Commands

sed (stream editor): Fast text substitution, deletion, and insertion.

sed 's/error/ERROR/g' logfile.txt

Priority: Highest.

xargs (argument conversion): Convert pipe output to command-line arguments.

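# -n caps the arguments per invocation; without it, -P 4 usually has only one batch to run
find . -name "*.txt" -print0 | xargs -0 -n 20 -P 4 wc -l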

Priority: Highest.

sort/uniq (sort and dedupe): Reorder data and dedupe.

awk '{print $1}' access.log | sort | uniq -c | sort -rn
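
The sort before uniq is not optional: uniq only collapses duplicate lines that are adjacent. A small sketch with made-up IP addresses shows the difference.

```shell
ips=$(mktemp)
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' > "$ips"

# The two 10.0.0.1 lines are not adjacent, so uniq alone keeps both.
without_sort=$(uniq "$ips" | wc -l | tr -d ' ')

# Sorting first brings duplicates together so uniq can collapse them.
with_sort=$(sort "$ips" | uniq | wc -l | tr -d ' ')

echo "uniq only: $without_sort lines, sort | uniq: $with_sort lines"
rm -f "$ips"
```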

Priority: High.

join/paste (file join): Merge data from multiple files.

# Both files must already be sorted on the join field (the first column by default)
join -t, file1.csv file2.csv
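
Because join expects both inputs to be sorted on the join field, a typical workflow sorts first and joins afterwards. The sketch below uses two tiny CSV files invented for illustration, keyed on an ID in the first column.

```shell
work_dir=$(mktemp -d)
printf '2,Sato\n1,Tanaka\n' > "$work_dir/names.csv"     # deliberately unsorted
printf '1,Tokyo\n2,Osaka\n' > "$work_dir/cities.csv"

# Sort both inputs on the first comma-separated field before joining.
sort -t, -k1,1 "$work_dir/names.csv" > "$work_dir/names.sorted.csv"
sort -t, -k1,1 "$work_dir/cities.csv" > "$work_dir/cities.sorted.csv"

joined=$(join -t, "$work_dir/names.sorted.csv" "$work_dir/cities.sorted.csv")
printf '%s\n' "$joined"
rm -rf "$work_dir"
```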

Priority: Medium.

Exercises and Challenges

Cement skills with hands-on practice. Tackle these to verify your level.

Beginner Challenges

Challenge 1: File search basics

In /var/log and below, find files with extension .log that are larger than 1MB.

Hint:

Combine -name and -size with find.

Solution:

find /var/log -type f -name "*.log" -size +1M
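
One subtlety is worth checking here. GNU find rounds file sizes up to whole units before comparing, so -size +1M matches only files strictly larger than 1 MiB. The sketch below assumes GNU find (other implementations may round differently) and uses throwaway files to compare it with a byte-exact test.

```shell
# GNU find assumed; other implementations may round differently.
demo_dir=$(mktemp -d)

# exact.log is exactly 1 MiB; bigger.log is 1 MiB plus one byte.
dd if=/dev/zero of="$demo_dir/exact.log" bs=1024 count=1024 2>/dev/null
cp "$demo_dir/exact.log" "$demo_dir/bigger.log"
printf 'x' >> "$demo_dir/bigger.log"

# +1M: sizes are rounded up to whole MiB first, so exact.log counts as 1 (not > 1).
over_1m=$(find "$demo_dir" -type f -name "*.log" -size +1M | wc -l | tr -d ' ')

# +1048575c: a byte-exact test for "1 MiB or larger".
at_least_1m=$(find "$demo_dir" -type f -name "*.log" -size +1048575c | wc -l | tr -d ' ')

echo "-size +1M: $over_1m file(s), -size +1048575c: $at_least_1m file(s)"
rm -rf "$demo_dir"
```

If the requirement is truly "at or above" a threshold, the byte form with the c suffix avoids the rounding surprise.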

Challenge 2: Text search basics

Search system.log for lines containing "ERROR" and show with line numbers.

Solution:

grep -n "ERROR" system.log

Challenge 3: Data aggregation basics

Compute the sum of column 3 (sales) in sales.csv.

Solution:

awk -F',' '{sum += $3} END {print "Sum:", sum}' sales.csv

Intermediate Challenges

Challenge 4: Log analysis pipeline

Count today's unique IP addresses from the access log.

Hint:

Filter by today's date with grep, extract IP with awk, dedupe with sort/uniq.

Solution:

# LC_ALL=C keeps the month abbreviation in English, matching the log's date format
grep "$(LC_ALL=C date '+%d/%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l

Challenge 5: Large file search

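Under /home, find the 5 largest files over 100MB and list them in descending order of size.

Solution:

# -k5,5 limits the sort key to the size column of ls -l; h compares human-readable sizes such as 1.2G
find /home -type f -size +100M -exec ls -lh {} + | sort -k5,5hr | head -n 5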

Challenge 6: Error stats report

From multiple log files, aggregate error categories and show in descending order.

Solution:

# Assumes the error category is the 4th whitespace-separated field; adjust $4 to your log format
find /var/log -name "*.log" -print0 | xargs -0 grep -h "ERROR" | awk '{print $4}' | sort | uniq -c | sort -rn

Advanced Challenges

Challenge 7: Website monitoring script

From Apache access logs, find IPs with 10+ 404 errors in the past hour and produce alert messages.

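Hint:

Filter by time, extract 404, group by IP, threshold filter.

Solution:

# Requires GNU date for -d. Matching on hour prefixes covers the previous and
# current clock hours, so the window is between one and two hours long.
hour_ago=$(LC_ALL=C date -d '1 hour ago' '+%d/%b/%Y:%H')
current_hour=$(LC_ALL=C date '+%d/%b/%Y:%H')

# In the common/combined log format the status code is field 9;
# testing that field avoids matching a response size of 404.
grep -E "($hour_ago|$current_hour)" /var/log/apache2/access.log | \
awk '$9 == 404 {print $1}' | \
sort | uniq -c | \
awk '$1 >= 10 {printf "ALERT: IP %s has %d 404 errors in last hour\n", $2, $1}'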

Challenge 8: Data quality check

Build a script that checks CSV data quality and reports total rows / columns, empty rows, unique values per column, and min/max/avg of numeric columns.

Solution:

# Requires gawk 4.0 or later: field_values[i][$i] is an array of arrays,
# which POSIX awk does not support.
gawk -F',' '
# Header row: remember the column count and names
NR == 1 {
    num_columns = NF
    for (i = 1; i <= NF; i++) headers[i] = $i
    next
}
# A completely empty line has no fields
NF == 0 { empty_lines++; next }
{
    total_rows++
    for (i = 1; i <= num_columns && i <= NF; i++) {
        field_values[i][$i] = 1                  # set of distinct values per column
        if ($i ~ /^-?[0-9]+(\.[0-9]+)?$/) {      # integer or decimal, optional sign
            value = $i + 0                       # force numeric comparison
            if (!(i in numeric_min) || value < numeric_min[i]) numeric_min[i] = value
            if (!(i in numeric_max) || value > numeric_max[i]) numeric_max[i] = value
            numeric_count[i]++
            numeric_sum[i] += value
        }
    }
}
END {
    printf "Rows: %d\n", total_rows
    printf "Cols: %d\n", num_columns
    printf "Empty rows: %d\n", empty_lines + 0
    for (i = 1; i <= num_columns; i++) {
        unique = (i in field_values) ? length(field_values[i]) : 0
        printf "Col%d (%s): unique=%d", i, headers[i], unique
        if (numeric_count[i] > 0) {
            avg = numeric_sum[i] / numeric_count[i]
            printf ", min=%.2f, max=%.2f, avg=%.2f", numeric_min[i], numeric_max[i], avg
        }
        print ""
    }
}' data.csv

Challenge 9: Automated backup script

Build a backup script for important files that meets these requirements:

  • Back up only files changed since the last backup
  • Back up only files smaller than 100MB
  • Log all backup operations
  • Auto-delete old backups (more than 7 days old)

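Solution:

#!/bin/bash
# Incremental backup of /home/important (bash, GNU find, and touch assumed)

SRC_DIR="/home/important"
BACKUP_ROOT="/backup"
BACKUP_DIR="$BACKUP_ROOT/$(date +%Y%m%d_%H%M%S)"
LAST_BACKUP_MARKER="/var/log/last_backup.timestamp"
NEW_MARKER="$LAST_BACKUP_MARKER.new"
LOG_FILE="/var/log/backup.log"

echo "=== Backup started at $(date) ===" >> "$LOG_FILE"
mkdir -p "$BACKUP_DIR"

# First run: no marker exists yet, so create one dated 1970 and back up everything
[ -e "$LAST_BACKUP_MARKER" ] || touch -t 197001020000 "$LAST_BACKUP_MARKER"

# Record the start time now so files changed during this run are picked up next time
touch "$NEW_MARKER"

# -size -104857600c is a byte-exact "smaller than 100 MiB" (-size -100M rounds up and skips 99 to 100 MiB files)
# -print0 with read -d '' keeps filenames containing spaces or newlines intact
find "$SRC_DIR" -type f -size -104857600c -newer "$LAST_BACKUP_MARKER" -print0 2>/dev/null |
while IFS= read -r -d '' file; do
    rel_path="${file#"$SRC_DIR"/}"
    backup_path="$BACKUP_DIR/$rel_path"
    mkdir -p "$(dirname "$backup_path")"

    if cp -p "$file" "$backup_path" 2>/dev/null; then
        echo "Backed up: $file" >> "$LOG_FILE"
    else
        echo "FAILED: $file" >> "$LOG_FILE"
    fi
done

# Delete old backups: top-level backup directories only, never /backup itself
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null

# Update timestamp
mv "$NEW_MARKER" "$LAST_BACKUP_MARKER"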

Master Challenge

Challenge 10: Comprehensive system monitoring dashboard

Build a system monitoring script with:

  • Real-time log monitoring
  • Automatic alerting on errors
  • Resource usage visualization
  • Daily report generation
  • Browser-viewable output (HTML report)

Approach hint:

Use tail -f for real-time monitoring, awk for stats, find for old file management, HTML templates for reports.
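
As one possible starting point for the report piece only, the sketch below counts log levels with awk and wraps the numbers in a minimal HTML page. The function name render_report, the report title, and the ERROR and WARN level names are assumptions made for illustration. Real-time monitoring, alerting, and cleanup are left for you to build.

```shell
# Hypothetical helper: summarize one log file as a minimal HTML report.
render_report() {
    log_file=$1
    awk -v title="Daily report" '
        /ERROR/ { errors++ }
        /WARN/  { warnings++ }
        END {
            printf "<html><body><h1>%s</h1>\n", title
            printf "<p>Lines: %d</p>\n", NR
            printf "<p>Errors: %d</p>\n", errors
            printf "<p>Warnings: %d</p>\n", warnings
            print "</body></html>"
        }
    ' "$log_file"
}

# Try it on a small sample log.
sample_log=$(mktemp)
printf 'INFO start\nERROR disk full\nWARN slow query\nERROR timeout\n' > "$sample_log"
report=$(render_report "$sample_log")
printf '%s\n' "$report"
rm -f "$sample_log"
```

Redirecting the function's output to a file under a web server's document root would make the report viewable in a browser.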

Completing this challenge means you can confidently call yourself a Linux power user.

Conclusion: First Step to Linux Power User

This series covered find, grep, and awk in detail, from basics to practical applications. Mastering these gets you to the level of a true Linux power user.

Skills Acquired

  • find: Search files and directories under any condition, fast
  • grep: Advanced text search with regex
  • awk: Data processing, aggregation, report generation
  • Effective combinations of all three
  • Performance optimization and troubleshooting
  • Industry case studies and production skills
  • Exercises and skill validation

Series Recap

  • Basics — Command overview and regex basics
  • Advanced — High-level grep and awk
  • Practical — Combinations and production usage
  • Professional — Exercises and troubleshooting (this article)

Expected Outcomes

  • Productivity: Automate the bulk of manual work
  • Problem solving: Quickly handle log analysis and data investigation
  • Career: Open paths into infrastructure, data, and DevOps

Practice now

What matters most is applying what you learned in your real work. Hands-on practice on Penguin Gym Linux and daily use of these commands will turn knowledge into skills.