find, grep, awk Exercises - Troubleshooting and Practical Skill Building
The final installment of the series. It covers exercises to validate your skills, fixes for common issues, and a roadmap for further growth: the final polish on your way to becoming a Linux power user.
What You'll Learn
- Typical problems you will hit with find, grep, and awk in production
- Exercises and challenges to validate your skills
- A learning roadmap for further growth
- A final wrap-up on your way to becoming a Linux power user
Common Issues and Fixes
Conclusion: Learn the typical find, grep, and awk pitfalls and their fixes up front.
If you use these commands in production, you will run into the issues below sooner or later, so it pays to know the fixes ahead of time.
find Command Issues
A flood of "Permission denied" errors
Symptoms:
find: '/root': Permission denied
find: '/proc/1': Permission denied
Fixes:
# Option 1: Suppress error output (note: this also hides any other errors)
find / -name "*.txt" 2>/dev/null
# Option 2: Search only locations you have access to
find /home /var /tmp -name "*.txt"
# Option 3: Run with sudo (use carefully)
sudo find / -name "*.txt"
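If you want to silence only the unreadable directories while still seeing genuine errors, GNU find can prune those directories instead of hiding all of stderr. The following is a minimal sketch that assumes GNU findutils, because the -readable test is a GNU extension. The helper name find_quiet is hypothetical.

```shell
#!/usr/bin/env bash
# find_quiet DIR PATTERN: hypothetical helper that searches DIR for PATTERN
# but prunes directories the current user cannot read (GNU find's -readable),
# so "Permission denied" noise disappears while other errors stay visible.
find_quiet() {
  find "$1" \( -type d ! -readable -prune \) -o -name "$2" -print
}

# Usage:
#   find_quiet / "*.txt"
```

When run as root, every directory counts as readable, so nothing is pruned and the function behaves like a plain find.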
Errors when filenames contain spaces
Symptom: When find output is piped to xargs, "My Document.txt" is split into two arguments, "My" and "Document.txt".
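You can reproduce the symptom in a scratch directory. The sketch below counts how many arguments the receiving command actually gets for a single file named "My Document.txt". The three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Each helper feeds the matching files to `sh -c 'echo $#'`, which prints
# the number of arguments it received.

# Plain pipe: xargs splits on whitespace, so one file becomes two arguments
count_args_naive() {
  find "$1" -name "*.txt" | xargs sh -c 'echo $#' sh
}

# NUL-separated names: each file stays exactly one argument
count_args_print0() {
  find "$1" -name "*.txt" -print0 | xargs -0 sh -c 'echo $#' sh
}

# -exec ... {} +: find passes the names directly, so no splitting happens
count_args_exec() {
  find "$1" -name "*.txt" -exec sh -c 'echo $#' sh {} +
}
```

For a directory containing only "My Document.txt", count_args_naive prints 2, while the other two helpers print 1.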
Fixes:
# Use -print0 with xargs -0
find /home -name "*.txt" -print0 | xargs -0 rm
# Use -exec with +
find /home -name "*.txt" -exec rm {} +
Searches are too slow
Fixes:
- Skip unwanted directories with -path and -prune
- Limit depth with -maxdepth
- Restrict matches to regular files with -type f
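# Put -type f on the -name branch: on the -path branch it would never match the node_modules directory, so nothing would be pruned
find /var -maxdepth 3 -path "*/node_modules" -prune -o -type f -name "*.log" -print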
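The same pattern extends to several directories if you group the -path tests before -prune. This sketch assumes GNU find. The helper name find_logs and the choice of node_modules and .git as skipped directories are only illustrative.

```shell
#!/usr/bin/env bash
# find_logs DIR: list *.log files under DIR (at most 3 levels deep),
# never descending into node_modules or .git directories.
find_logs() {
  find "$1" -maxdepth 3 \
    \( -path "*/node_modules" -o -path "*/.git" \) -prune \
    -o -type f -name "*.log" -print
}

# Usage:
#   find_logs /var
```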
grep Command Issues
Multibyte characters (e.g., Japanese) are not matched correctly
Fixes:
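# Check the current locale, then switch to a UTF-8 locale if needed
locale
# LC_ALL, if set, overrides LANG
export LANG=en_US.UTF-8
grep "error" logfile.txt
# Force grep to treat the file as text even if it looks binary
grep -a "error" logfile.txt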
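Before blaming grep, confirm which character encoding the shell is actually using. The following is a minimal sketch built on `locale charmap`, which prints the encoding of the current locale. The helper name check_utf8_locale is hypothetical.

```shell
#!/usr/bin/env bash
# check_utf8_locale: succeed if the current locale's encoding is UTF-8,
# otherwise print a warning to stderr and return 1.
check_utf8_locale() {
  local charmap
  charmap=$(locale charmap 2>/dev/null)
  if [ "$charmap" = "UTF-8" ]; then
    echo "OK: locale encoding is $charmap"
  else
    echo "WARNING: locale encoding is '${charmap:-unknown}'; set LANG (and LC_ALL, if set) to a UTF-8 locale" >&2
    return 1
  fi
}

# Usage:
#   check_utf8_locale && grep "error" logfile.txt
```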
Regex doesn't behave as expected
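Common issue: in grep's default basic regular expression (BRE) syntax, +, ?, {}, and () are treated as literal characters, so quantifiers and grouping don't work unless escaped.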
Fixes:
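# Use -E for extended regular expressions
grep -E "colou?r" file.txt
grep -E "(http|https)://" file.txt
# egrep is equivalent to grep -E, but it is deprecated and recent GNU grep prints a warning when it is used
egrep "colou?r" file.txt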
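A quick way to see the difference between the two regex flavors is to count matches of the same pattern under each one. In this sketch, the `\?` form relies on a GNU grep extension to basic regular expressions. The function name compare_regex is hypothetical.

```shell
#!/usr/bin/env bash
# compare_regex TEXT: print how many times "colou?r" matches in TEXT
# under each regex flavor (one count per line).
compare_regex() {
  printf '%s\n' "$1" | grep -o 'colou?r' | wc -l    # BRE: ? is a literal character
  printf '%s\n' "$1" | grep -oE 'colou?r' | wc -l   # ERE: ? makes the u optional
  printf '%s\n' "$1" | grep -o 'colou\?r' | wc -l   # GNU BRE: \? also works
}
```

For the text "color colour", this prints 0, 2, and 2. The plain BRE pattern looks for the literal string "colou?r", which does not occur in that text.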
Unexpected "Binary file matches" messages
Symptom: Binary file image.jpg matches
Fixes:
# Skip binary files (-I)
grep -I "pattern" *
# Search only specific file types
grep -r --include="*.txt" --include="*.log" "pattern" .
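To check what -I does, build a scratch directory with one text file and one file that grep will classify as binary. GNU grep treats a NUL byte as a binary marker. In this sketch, the file names and both helper names are made up for the demo.

```shell
#!/usr/bin/env bash
# make_demo_dir: create a temp dir holding a text file and a "binary" file,
# both containing the word "pattern"; print the directory path.
make_demo_dir() {
  local d
  d=$(mktemp -d)
  printf 'pattern in text\n' > "$d/notes.txt"
  printf 'pattern\000binary\n' > "$d/image.jpg"   # NUL byte => binary to grep
  echo "$d"
}

# search_text_only PATTERN DIR: recursive search that skips binary files
search_text_only() {
  grep -rlI "$1" "$2"
}

# Usage:
#   d=$(make_demo_dir)
#   grep -rl "pattern" "$d"         # lists both files
#   search_text_only "pattern" "$d" # lists only notes.txt
```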
awk Command Issues
Fields aren't split as expected
Symptom: -F',' splits on every comma, including commas inside quoted fields, so a row like "Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead" produces six fields instead of four.
Fixes:
# Use a dedicated tool
csvtool col 1,2 data.csv
# Pipe through Python's csv module
python3 -c "
import csv, sys
reader = csv.reader(sys.stdin)
for row in reader:
    print(row[0], row[1])
" < data.csv
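If neither csvtool nor Python is available, a small awk function can split quoted fields itself. GNU awk users can also look at its FPAT variable. The sketch below is simplified: it handles "..." quoting and doubled quotes (""), but not line breaks inside quoted fields. The name csv_cols is a hypothetical helper, not a standard tool.

```shell
#!/usr/bin/env bash
# csv_cols FILE COLS: print the column numbers listed in COLS (e.g. "1,3")
# from FILE, tab-separated. Simplified CSV parsing: handles "..." quoting
# and doubled quotes (""), but not newlines inside quoted fields.
csv_cols() {
  awk -v cols="$2" '
  # parse: split line into array f, honoring double quotes; returns field count
  function parse(line, f,    n, i, c, field, inq) {
    n = 0; field = ""; inq = 0
    for (i = 1; i <= length(line); i++) {
      c = substr(line, i, 1)
      if (inq) {
        if (c == "\"") {
          if (substr(line, i + 1, 1) == "\"") { field = field "\""; i++ }
          else inq = 0                 # closing quote
        } else field = field c
      } else if (c == "\"") inq = 1    # opening quote
      else if (c == ",") { f[++n] = field; field = "" }
      else field = field c
    }
    f[++n] = field
    return n
  }
  BEGIN { k = split(cols, want, ",") }
  {
    split("", f)                       # clear fields left over from the previous row
    parse($0, f)
    out = ""
    for (j = 1; j <= k; j++) out = out (j > 1 ? "\t" : "") f[want[j]]
    print out
  }' "$1"
}

# Usage:
#   csv_cols data.csv 1,3
```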
Numeric precision issues
Symptom: awk does arithmetic in binary floating point, so decimal sums can carry tiny rounding errors (expected 10.50, actual 10.5000000001).
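You can reproduce the effect with the classic 0.1 + 0.2 example. Printing the same awk sum at 17 significant digits exposes the binary rounding error, which a two-decimal format hides. This sketch assumes IEEE 754 double arithmetic, which common awk implementations use. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Both helpers sum column 1 of stdin; they differ only in the output format.
awk_sum_raw() {       # 17 significant digits: shows the rounding error
  awk '{ s += $1 } END { printf "%.17g\n", s }'
}
awk_sum_rounded() {   # fixed 2 decimals: rounds the error away on output
  awk '{ s += $1 } END { printf "%.2f\n", s }'
}
```

For example, `printf '0.1\n0.2\n' | awk_sum_raw` prints 0.30000000000000004, while awk_sum_rounded prints 0.30.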
Fixes:
# Use printf with precision
awk '{sum+=$1} END {printf "%.2f\n", sum}' numbers.txt
# Pipe to bc for exact decimal arithmetic
awk '{print $1}' numbers.txt | paste -sd+ - | bc
Debugging Techniques
Check incrementally
Run complex commands piece by piece.
# Final command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l
# Debugging steps
# 1. Run only the find part
find /var/log -name "*.log"
# 2. Run up to grep
find /var/log -name "*.log" | xargs grep -l "ERROR"
# 3. Run the full command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l
Save intermediate results
For long-running pipelines, save intermediate output to files so you can inspect each stage without rerunning everything.
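find /var -name "*.log" 2>/dev/null > all_logs.txt
# xargs -d '\n' (GNU) keeps names with spaces intact and avoids "argument list too long"
xargs -r -d '\n' grep -l "ERROR" < all_logs.txt > error_logs.txt
# -r: don't run the command at all if the input list is empty
xargs -r -d '\n' wc -l < error_logs.txt > final_result.txt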
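Instead of rerunning each stage separately, you can use tee to snapshot every stage while the pipeline runs once. This sketch assumes GNU xargs (for -d and -r) and file names without embedded newlines. The function name and stage file names are hypothetical.

```shell
#!/usr/bin/env bash
# debug_pipeline DIR OUTDIR: count *.log files under DIR that contain "ERROR",
# saving each intermediate stage to OUTDIR with tee along the way.
debug_pipeline() {
  local dir=$1 out=$2
  find "$dir" -name "*.log" |
    tee "$out/stage1_all_logs.txt" |
    xargs -r -d '\n' grep -l "ERROR" |
    tee "$out/stage2_error_logs.txt" |
    wc -l
}

# Usage:
#   debug_pipeline /var/log /tmp/debug_out
```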
Further Skills
Now that you've mastered find, grep, and awk, here are the skills to learn next.
Next-Level Commands
sed (stream editor): Fast text substitution, deletion, and insertion.
sed 's/error/ERROR/g' logfile.txt
Priority: Highest.
xargs (argument conversion): Convert pipe output to command-line arguments.
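# -print0/-0 keep names with spaces intact; -P 4 runs up to 4 wc processes at once, each given at most 10 files (-n 10)
find . -name "*.txt" -print0 | xargs -0 -P 4 -n 10 wc -l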
Priority: Highest.
sort/uniq (sort and dedupe): Sort data, then remove or count duplicate lines. uniq only collapses adjacent duplicates, so sort first.
awk '{print $1}' access.log | sort | uniq -c | sort -rn
Priority: High.
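join/paste (file join): Merge data from multiple files. join matches lines on a common field and expects both inputs to be sorted on that field; paste simply places lines side by side.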
join -t, file1.csv file2.csv
Priority: Medium.
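Because join expects both inputs to be sorted on the join field and can miss matching lines otherwise, it is common to sort the files on the fly. This sketch uses bash process substitution. The name join_csv is a hypothetical helper.

```shell
#!/usr/bin/env bash
# join_csv FILE1 FILE2: join two comma-separated files on their first column.
# Both inputs are sorted on that column first, because join expects sorted input.
join_csv() {
  join -t, <(sort -t, -k1,1 "$1") <(sort -t, -k1,1 "$2")
}

# Usage:
#   join_csv file1.csv file2.csv
```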
Exercises and Challenges
Conclusion: Hands-on, staged exercises let you verify and cement your skills.
Work through the challenges below, from beginner to master level, to check where your skills stand.
Beginner Challenges
Challenge 1: File search basics
Under /var/log, find files with the .log extension that are larger than 1 MB.
Hint:
Combine -name and -size with find.
Solution:
find /var/log -type f -name "*.log" -size +1M
Challenge 2: Text search basics
Search system.log for lines containing "ERROR" and show them with line numbers.
Solution:
grep -n "ERROR" system.log
Challenge 3: Data aggregation basics
Compute the sum of column 3 (sales) in sales.csv.
Solution:
awk -F',' '{sum += $3} END {print "Sum:", sum}' sales.csv
Intermediate Challenges
Challenge 4: Log analysis pipeline
Using the access log, count the unique IP addresses that made requests today.
Hint:
Filter by today's date with grep, extract IP with awk, dedupe with sort/uniq.
Solution:
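# LC_ALL=C makes %b print English month abbreviations (Jan, Feb, ...), matching Apache's log format
grep "$(LC_ALL=C date '+%d/%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l
Challenge 5: Large file search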
Under /home, find the five largest files over 100 MB and list them sorted by size.
Solution:
find /home -type f -size +100M -exec ls -lh {} + | sort -rh -k5 | head -5
Challenge 6: Error stats report
Across multiple log files, count errors by category and list the categories in descending order of frequency.
Solution:
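# Assumes the error category is the 4th whitespace-separated field; adjust $4 to your log format
find /var/log -name "*.log" -print0 2>/dev/null | xargs -0 grep -h "ERROR" | awk '{print $4}' | sort | uniq -c | sort -rn
Advanced Challenges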
Challenge 7: Website monitoring script
From the Apache access log, find IP addresses with 10 or more 404 errors in the past hour and print an alert message for each.
Hint:
Filter by time, keep only 404 responses, count per IP, then apply a threshold.
Solution:
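# Approximates "the past hour" as the previous and current clock hours
# LC_ALL=C keeps %b in English to match Apache's timestamp format
hour_ago=$(LC_ALL=C date -d '1 hour ago' '+%d/%b/%Y:%H')
current_hour=$(LC_ALL=C date '+%d/%b/%Y:%H')
# In the common/combined log format, field 9 is the HTTP status code
grep -E "($hour_ago|$current_hour)" /var/log/apache2/access.log | \
awk '$9 == 404 {print $1}' | \
sort | uniq -c | \
awk '$1 >= 10 {printf "ALERT: IP %s has %d 404 errors in last hour\n", $2, $1}'
Challenge 8: Data quality check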
Build a script that checks CSV data quality and reports the total number of rows and columns, empty rows, unique values per column, and min/max/avg for numeric columns.
Solution:
# Requires GNU awk (gawk) 4.0+: field_values[i][$i] uses arrays of arrays
gawk -F',' '
NR == 1 {
num_columns = NF
for (i = 1; i <= NF; i++) headers[i] = $i
next
}
NF == 0 { empty_lines++; next }
{
total_rows++
for (i = 1; i <= num_columns && i <= NF; i++) {
field_values[i][$i] = 1
if ($i ~ /^[0-9]+\.?[0-9]*$/) {
numeric_count[i]++
numeric_sum[i] += $i
if (numeric_min[i] == "" || $i < numeric_min[i]) numeric_min[i] = $i
if (numeric_max[i] == "" || $i > numeric_max[i]) numeric_max[i] = $i
}
}
}
END {
printf "Rows: %d\n", total_rows
printf "Cols: %d\n", num_columns
printf "Empty rows: %d\n", empty_lines + 0
for (i = 1; i <= num_columns; i++) {
printf "Col%d (%s): unique=%d", i, headers[i], length(field_values[i])
if (numeric_count[i] > 0) {
avg = numeric_sum[i] / numeric_count[i]
printf ", min=%.2f, max=%.2f, avg=%.2f", numeric_min[i], numeric_max[i], avg
}
print ""
}
}' data.csv
Challenge 9: Automated backup script
Build a backup script for important files that:
- Backs up only files changed since the last backup
- Backs up only files smaller than 100 MB
- Logs every backup operation
- Automatically deletes backups more than 7 days old
Solution:
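#!/bin/bash
# Back up files under /home/important that changed since the last run
SRC_DIR="/home/important"
BACKUP_DIR="/backup/$(date +%Y%m%d_%H%M%S)"
LAST_BACKUP_MARKER="/var/log/last_backup.timestamp"
NEW_MARKER="${LAST_BACKUP_MARKER}.new"
LOG_FILE="/var/log/backup.log"
# Record the start time first, so files modified during this run are caught next time
touch "$NEW_MARKER"
echo "=== Backup started at $(date) ===" >> "$LOG_FILE"
mkdir -p "$BACKUP_DIR"
# First run: there is no marker yet, and -newer on a missing file is an error,
# so back up everything in that case
newer_opt=()
[ -f "$LAST_BACKUP_MARKER" ] && newer_opt=(-newer "$LAST_BACKUP_MARKER")
# -print0 with read -d '' keeps file names with spaces or backslashes intact
find "$SRC_DIR" -type f -size -100M "${newer_opt[@]}" -print0 |
while IFS= read -r -d '' file; do
  rel_path="${file#"$SRC_DIR"/}"
  backup_path="$BACKUP_DIR/$rel_path"
  mkdir -p "$(dirname "$backup_path")"
  if cp -p "$file" "$backup_path"; then
    echo "Backed up: $file" >> "$LOG_FILE"
  else
    echo "FAILED: $file" >> "$LOG_FILE"
  fi
done
# Delete top-level backup directories older than 7 days
find /backup -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
# Promote the start-time marker only after the run completes
mv "$NEW_MARKER" "$LAST_BACKUP_MARKER"
echo "=== Backup finished at $(date) ===" >> "$LOG_FILE"
Master Challenge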
Challenge 10: Comprehensive system monitoring dashboard
Build a system monitoring script with:
- Real-time log monitoring
- Automatic alerting on errors
- Resource usage visualization
- Daily report generation
- An HTML report viewable in a browser
Approach hint:
Use tail -f for real-time monitoring, awk for statistics, find to clean up old files, and HTML templates for the report.
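As a starting point for the real-time part, the alerting stage can be a plain filter that reads log lines on stdin. That way you can test it on sample input before attaching it to tail. This is a minimal sketch. The helper name alert_filter and the ERROR/CRITICAL keywords are assumptions about your log format.

```shell
#!/usr/bin/env bash
# alert_filter: read log lines on stdin and print an alert for each line
# containing ERROR or CRITICAL. fflush() pushes each alert out immediately,
# which matters when the input is an endless tail stream.
alert_filter() {
  awk '/ERROR|CRITICAL/ { print "ALERT: " $0; fflush() }'
}

# Usage (tail -F keeps following the file across log rotation):
#   tail -F /var/log/syslog | alert_filter
```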
Completing this challenge means you can confidently call yourself a Linux power user.