find, grep, awk Exercises - Troubleshooting and Practical Skill Building
The final installment in the series. It covers fixes for common issues, exercises to validate your skills, and a roadmap for further growth: the last polish on your way to becoming a Linux power user.
Common Issues and Fixes
If you use these commands in production, you will run into the typical issues below. Know the fixes ahead of time.
find Command Issues
A flood of "Permission denied" errors
Symptoms:
find: '/root': Permission denied
find: '/proc/1': Permission denied
Fixes:
# Option 1: Suppress error output
find / -name "*.txt" 2>/dev/null
# Option 2: Search only locations you have access to
find /home /var /tmp -name "*.txt"
# Option 3: Run with sudo (use carefully)
sudo find / -name "*.txt"
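Discarding stderr hides real problems along with the noise. A minimal sketch of a middle ground is to write the errors to a separate file for later review. The scratch directory, the file name find_errors.log, and the deliberately missing path (standing in for an unreadable one, so the example behaves the same when run as root) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical scratch tree; "missing" does not exist and triggers a find error
demo=$(mktemp -d)
mkdir "$demo/open"
touch "$demo/open/a.txt"

# Results go to stdout as usual; error messages are collected in find_errors.log
results=$(find "$demo/open" "$demo/missing" -name "*.txt" 2>"$demo/find_errors.log")
error_count=$(wc -l < "$demo/find_errors.log" | tr -d ' ')

echo "found: $results"
echo "errors logged: $error_count"
rm -rf "$demo"
```

The search still returns every readable match, and the error file can be inspected afterwards instead of being lost.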
Errors when filenames contain spaces
Symptom: "My Document.txt" is split into two arguments, "My" and "Document.txt".
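The splitting can be reproduced safely before any destructive command is involved. A minimal sketch, assuming a scratch directory created with mktemp and using echo as a harmless stand-in for rm:

```shell
#!/bin/sh
# Hypothetical scratch directory with one file whose name contains a space
demo=$(mktemp -d)
touch "$demo/My Document.txt"

# Whitespace-delimited input: xargs splits the path into two arguments
split_count=$(find "$demo" -name "*.txt" | xargs -n 1 echo | wc -l | tr -d ' ')

# Null-delimited input: the path stays a single argument
safe_count=$(find "$demo" -name "*.txt" -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "plain xargs passed $split_count arguments, xargs -0 passed $safe_count"
rm -rf "$demo"
```

Counting arguments this way is a cheap check to run before swapping echo for a command that modifies files.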
Fixes:
# Use -print0 with xargs -0
find /home -name "*.txt" -print0 | xargs -0 rm
# Use -exec with +
find /home -name "*.txt" -exec rm {} +
Searches are too slow
Fixes:
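- Skip unwanted directories with -path and -prune
- Limit depth with -maxdepth
- Filter files with -type f
# -prune must be tested before -type f: node_modules is a directory, so a leading -type f would keep -prune from ever firing
find /var -maxdepth 3 -path "*/node_modules" -prune -o -type f -name "*.log" -print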
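The order of the tests matters when pruning. The sketch below, built on a throwaway tree with hypothetical paths, compares a -type f placed before -prune with -prune placed first. Because node_modules is a directory, the first form never prunes anything.

```shell
#!/bin/sh
# Hypothetical scratch tree, created only for this demonstration
demo=$(mktemp -d)
mkdir -p "$demo/app/node_modules/pkg" "$demo/app/logs"
touch "$demo/app/node_modules/pkg/debug.log" "$demo/app/logs/app.log"

# -type f in front: the directory test never matches, so nothing is pruned
unpruned=$(find "$demo" -type f -path "*/node_modules" -prune -o -name "*.log" -print)
unpruned_count=$(printf '%s\n' "$unpruned" | wc -l | tr -d ' ')

# -prune first, file tests afterwards: node_modules is skipped entirely
pruned=$(find "$demo" -path "*/node_modules" -prune -o -type f -name "*.log" -print)

echo "without effective prune: $unpruned_count files"
echo "with prune: $pruned"
rm -rf "$demo"
```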
grep Command Issues
Multibyte characters (e.g., Japanese) not searched correctly
Fixes:
# Verify and set locale
export LANG=en_US.UTF-8
grep "error" logfile.txt
# Avoid binary classification
grep -a "error" logfile.txt
Regex doesn't behave as expected
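Common issues: in basic regular expressions (grep's default), +, ?, {}, and () are treated as literal characters, so repetition and grouping don't work unless you switch to extended syntax.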
Fixes:
# Use -E for extended regex
grep -E "colou?r" file.txt
grep -E "(http|https)://" file.txt
# egrep is a deprecated alias for grep -E (GNU grep 3.8 and later print a warning); prefer grep -E
egrep "colou?r" file.txt
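The difference between the two syntaxes is easy to see on a small sample. This sketch assumes a hypothetical three-line word list written to a scratch directory:

```shell
#!/bin/sh
# Hypothetical sample data
demo=$(mktemp -d)
printf 'color\ncolour\ncolr\n' > "$demo/words.txt"

# Basic regex: "?" is a literal question mark, so no line matches
bre_count=$(grep -c "colou?r" "$demo/words.txt")

# Extended regex: "?" makes the preceding "u" optional
ere_count=$(grep -Ec "colou?r" "$demo/words.txt")

echo "basic: $bre_count matches, extended: $ere_count matches"
rm -rf "$demo"
```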
Binary file matches error
Symptom: Binary file image.jpg matches
Fixes:
# Search text files only
grep -I "pattern" *
# Restrict file types
grep -r --include="*.txt" --include="*.log" "pattern" .
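A short sketch of what -I changes, assuming a grep that supports the option (as GNU grep does) and two hypothetical files, one of which contains NUL bytes so that grep classifies it as binary:

```shell
#!/bin/sh
# Hypothetical sample files: one plain text, one with embedded NUL bytes
demo=$(mktemp -d)
printf 'pattern in plain text\n' > "$demo/notes.txt"
printf 'pattern\000\001\002\n' > "$demo/blob.bin"

# Without -I, the binary file counts as a match too
all_count=$(grep -rl "pattern" "$demo" | wc -l | tr -d ' ')

# With -I, files that look binary are skipped
text_only=$(grep -rIl "pattern" "$demo")

echo "all matching files: $all_count"
echo "text files only: $text_only"
rm -rf "$demo"
```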
awk Command Issues
Fields aren't split as expected
Symptom: CSV with commas inside quoted fields, e.g. "Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead".
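To see the problem concretely, the sketch below feeds that sample record to awk with a plain comma separator. awk has no notion of quoting here, so it splits inside the quoted fields as well.

```shell
#!/bin/sh
# The sample record from the symptom description
line='"Tanaka","28","Tokyo, Shibuya","Engineer, Team Lead"'

# A plain comma separator yields more fields than the record really has
field_count=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')

# The third "field" is only the first half of the quoted address
third_field=$(printf '%s\n' "$line" | awk -F',' '{ print $3 }')

echo "awk sees $field_count fields; field 3 is: $third_field"
```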
Fixes:
# Use a dedicated tool
csvtool col 1,2 data.csv
# Pipe through Python
python3 -c "
import csv, sys
reader = csv.reader(sys.stdin)
for row in reader:
    print(row[0], row[1])
" < data.csv
Numeric precision issues
Symptom: Decimal calculations are imprecise (expected 10.50, actual 10.5000000001).
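The cause is that awk stores numbers as binary floating point, which cannot represent most decimal fractions exactly. A minimal sketch of the effect, plus one possible workaround (keeping amounts as integer cents; the sample values are hypothetical):

```shell
#!/bin/sh
# 0.1 + 0.2 is stored as a binary approximation, so an exact comparison fails
comparison=$(awk 'BEGIN { print ((0.1 + 0.2 == 0.3) ? "equal" : "not equal") }')

# Rounding at output time hides the representation error
rounded=$(awk 'BEGIN { printf "%.2f\n", 0.1 + 0.2 }')

# Alternative: sum integer cents and split into units and cents only when printing
cents_total=$(printf '1050\n295\n' | awk '{ sum += $1 } END { printf "%d.%02d\n", sum / 100, sum % 100 }')

echo "0.1 + 0.2 vs 0.3: $comparison"
echo "rounded: $rounded"
echo "cents total: $cents_total"
```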
Fixes:
# Use printf with precision
awk '{sum+=$1} END {printf "%.2f\n", sum}' numbers.txt
# Pipe to bc
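# The trailing "-" makes paste read standard input explicitly, which some non-GNU versions require
awk '{print $1}' numbers.txt | paste -sd+ - | bc
Debugging Techniques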
Check incrementally
Run complex commands piece by piece.
# Final command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l
# Debugging steps
# 1. Run only the find part
find /var/log -name "*.log"
# 2. Run up to grep
find /var/log -name "*.log" | xargs grep -l "ERROR"
# 3. Run the full command
find /var/log -name "*.log" | xargs grep -l "ERROR" | xargs wc -l
Save intermediate results
For long-running pipelines, save intermediate output.
find /var -name "*.log" > all_logs.txt
grep -l "ERROR" $(cat all_logs.txt) > error_logs.txt
wc -l $(cat error_logs.txt) > final_result.txt
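If you would rather keep a single pipeline, tee can write each intermediate stage to a file while passing the data along. A sketch under the assumption of a scratch directory with two hypothetical log files (the output file names mirror the ones used here):

```shell
#!/bin/sh
# Hypothetical scratch tree with two small log files
demo=$(mktemp -d)
mkdir "$demo/logs" "$demo/out"
printf 'ok\nERROR disk full\n' > "$demo/logs/a.log"
printf 'ok\n' > "$demo/logs/b.log"

# tee saves a copy of each stage without interrupting the pipeline
find "$demo/logs" -name "*.log" \
  | tee "$demo/out/all_logs.txt" \
  | xargs grep -l "ERROR" \
  | tee "$demo/out/error_logs.txt" \
  | xargs wc -l > "$demo/out/final_result.txt"

all_count=$(wc -l < "$demo/out/all_logs.txt" | tr -d ' ')
error_count=$(wc -l < "$demo/out/error_logs.txt" | tr -d ' ')
final_lines=$(awk '{ print $1 }' "$demo/out/final_result.txt")

echo "logs found: $all_count, logs with errors: $error_count, lines in those logs: $final_lines"
rm -rf "$demo"
```

When a later stage misbehaves, the saved files show exactly what it received.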
Further Skills
Now that you've mastered find, grep, and awk, here are the skills to learn next.
Next-Level Commands
sed (stream editor): Fast text substitution, deletion, and insertion.
sed 's/error/ERROR/g' logfile.txt
Priority: Highest.
xargs (argument conversion): Convert pipe output to command-line arguments.
find . -name "*.txt" | xargs -P 4 wc -l
Priority: Highest.
sort/uniq (sort and dedupe): Reorder data and dedupe.
awk '{print $1}' access.log | sort | uniq -c | sort -rn
Priority: High.
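The sort before uniq is not optional: uniq only collapses adjacent duplicate lines. A small sketch with three hypothetical access-log lines shows the difference:

```shell
#!/bin/sh
# Hypothetical access-log excerpt; the same IP appears on non-adjacent lines
sample='10.0.0.1 GET /
10.0.0.2 GET /
10.0.0.1 GET /about'

# Without sort, the two 10.0.0.1 lines are not adjacent and stay separate
unsorted_groups=$(printf '%s\n' "$sample" | awk '{print $1}' | uniq -c | wc -l | tr -d ' ')

# With sort, duplicates become adjacent and are counted together
top_ip=$(printf '%s\n' "$sample" | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2 " x" $1}')

echo "groups without sort: $unsorted_groups"
echo "most frequent IP: $top_ip"
```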
join/paste (file join): Merge data from multiple files.
join -t, file1.csv file2.csv
Priority: Medium.
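One thing to keep in mind with join is that both inputs must already be sorted on the join field. A minimal sketch, assuming two hypothetical CSV files keyed on their first column:

```shell
#!/bin/sh
# Hypothetical input files; file1.csv is deliberately out of order
demo=$(mktemp -d)
printf '2,Sato\n1,Tanaka\n' > "$demo/file1.csv"
printf '1,Tokyo\n2,Osaka\n' > "$demo/file2.csv"

# Sort both files on the join field (column 1) first
sort -t, -k1,1 "$demo/file1.csv" > "$demo/file1.sorted.csv"
sort -t, -k1,1 "$demo/file2.csv" > "$demo/file2.sorted.csv"

# Join on column 1, using a comma as the field separator
joined=$(join -t, "$demo/file1.sorted.csv" "$demo/file2.sorted.csv")

echo "$joined"
rm -rf "$demo"
```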
Exercises and Challenges
Cement skills with hands-on practice. Tackle these to verify your level.
Beginner Challenges
Challenge 1: File search basics
In /var/log and below, find files with the .log extension that are larger than 1 MiB.
Hint:
Combine -name and -size with find.
Solution:
find /var/log -name "*.log" -size +1M
Challenge 2: Text search basics
Search system.log for lines containing "ERROR" and show with line numbers.
Solution:
grep -n "ERROR" system.log
Challenge 3: Data aggregation basics
Compute the sum of column 3 (sales) in sales.csv.
Solution:
awk -F',' '{sum += $3} END {print "Sum:", sum}' sales.csv
Intermediate Challenges
Challenge 4: Log analysis pipeline
Count today's unique IP addresses from the access log.
Hint:
Filter by today's date with grep, extract IP with awk, dedupe with sort/uniq.
Solution:
grep "$(date '+%d/%b/%Y')" access.log | awk '{print $1}' | sort -u | wc -l
Challenge 5: Large file search
Under /home, find the 5 largest files over 100MB and list them in descending order of size.
Solution:
find /home -type f -size +100M -exec ls -lh {} \; | sort -rh -k5 | head -5
Challenge 6: Error stats report
From multiple log files, aggregate error categories and show in descending order.
Solution:
find /var/log -name "*.log" | xargs grep -h "ERROR" | awk '{print $4}' | sort | uniq -c | sort -rn
Advanced Challenges
Challenge 7: Website monitoring script
From Apache access logs, find IPs with 10+ 404 errors in the past hour and produce alert messages.
Hint:
Filter by time, extract 404, group by IP, threshold filter.
Solution:
# Approximation: matches every entry from the previous and the current clock hour (date -d requires GNU date)
hour_ago=$(date -d '1 hour ago' '+%d/%b/%Y:%H')
current_hour=$(date '+%d/%b/%Y:%H')
grep -E "($hour_ago|$current_hour)" /var/log/apache2/access.log | \
    grep " 404 " | \
    awk '{print $1}' | \
    sort | uniq -c | \
    awk '$1 >= 10 {printf "ALERT: IP %s has %d 404 errors in last hour\n", $2, $1}'
Challenge 8: Data quality check
Build a script that checks CSV data quality and reports total rows / columns, empty rows, unique values per column, and min/max/avg of numeric columns.
Solution:
# Requires gawk 4.0 or later: field_values[i][$i] uses arrays of arrays
awk -F',' '
NR == 1 {                        # header row: remember the column names
    num_columns = NF
    for (i = 1; i <= NF; i++) headers[i] = $i
    next
}
NF == 0 { empty_lines++; next }  # blank line
{
    total_rows++
    for (i = 1; i <= num_columns && i <= NF; i++) {
        field_values[i][$i] = 1  # track distinct values per column
        if ($i ~ /^[0-9]+\.?[0-9]*$/) {
            numeric_count[i]++
            numeric_sum[i] += $i
            if (numeric_min[i] == "" || $i < numeric_min[i]) numeric_min[i] = $i
            if (numeric_max[i] == "" || $i > numeric_max[i]) numeric_max[i] = $i
        }
    }
}
END {
    printf "Rows: %d\n", total_rows
    printf "Cols: %d\n", num_columns
    printf "Empty rows: %d\n", empty_lines + 0
    for (i = 1; i <= num_columns; i++) {
        printf "Col%d (%s): unique=%d", i, headers[i], length(field_values[i])
        if (numeric_count[i] > 0) {
            avg = numeric_sum[i] / numeric_count[i]
            printf ", min=%.2f, max=%.2f, avg=%.2f", numeric_min[i], numeric_max[i], avg
        }
        print ""
    }
}' data.csv
Challenge 9: Automated backup script
Build a backup script for important files. Only files changed since the last backup; only files smaller than 100MB; log all backup operations; auto-delete old backups (more than 7 days old).
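Solution:
#!/bin/bash
# Incremental backup of /home/important: changed files under 100MB only
BACKUP_DIR="/backup/$(date +%Y%m%d_%H%M%S)"
LAST_BACKUP_MARKER="/var/log/last_backup.timestamp"
LOG_FILE="/var/log/backup.log"
echo "=== Backup started at $(date) ===" >> "$LOG_FILE"
mkdir -p "$BACKUP_DIR"
# First run: no marker exists yet, so create one dated at the epoch to back up everything (GNU touch)
[ -e "$LAST_BACKUP_MARKER" ] || touch -d @0 "$LAST_BACKUP_MARKER"
# -print0 with read -d '' keeps filenames containing spaces or newlines intact
find /home/important -type f -size -100M -newer "$LAST_BACKUP_MARKER" -print0 2>/dev/null | \
while IFS= read -r -d '' file; do
    rel_path="${file#/home/important/}"
    backup_path="$BACKUP_DIR/$rel_path"
    backup_dir=$(dirname "$backup_path")
    mkdir -p "$backup_dir"
    if cp "$file" "$backup_path" 2>/dev/null; then
        echo "Backed up: $file" >> "$LOG_FILE"
    else
        echo "FAILED: $file" >> "$LOG_FILE"
    fi
done
# Delete old backups (top-level backup directories only, never /backup itself)
find /backup -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null
# Update timestamp
date > "$LAST_BACKUP_MARKER"
Master Challenge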
Challenge 10: Comprehensive system monitoring dashboard
Build a system monitoring script with:
- Real-time log monitoring
- Automatic alerting on errors
- Resource usage visualization
- Daily report generation
- Web-viewable output (HTML report)
Approach hint:
Use tail -f for real-time monitoring, awk for stats, find for old file management, HTML templates for reports.
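As a starting point for the reporting piece only, here is a rough sketch that turns per-level log counts into a small HTML table with awk. The log format (severity in the third whitespace-separated field), the file names, and the page layout are all assumptions made for illustration, not part of the challenge specification.

```shell
#!/bin/sh
# Hypothetical log file; the severity level is assumed to be field 3
demo=$(mktemp -d)
cat > "$demo/app.log" <<'EOF'
2024-01-01 10:00:00 INFO service started
2024-01-01 10:00:05 ERROR disk full
2024-01-01 10:00:09 ERROR disk full
EOF

# Count lines per level, then emit one table row per level
awk '
  { count[$3]++ }
  END {
    print "<html><body><h1>Log summary</h1><table>"
    for (level in count)
      printf "<tr><td>%s</td><td>%d</td></tr>\n", level, count[level]
    print "</table></body></html>"
  }' "$demo/app.log" > "$demo/report.html"

report=$(cat "$demo/report.html")
echo "$report"
rm -rf "$demo"
```

The real-time monitoring, alerting, and cleanup parts are left for you to build around it.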
Completing this challenge means you can confidently call yourself a Linux power user.