find/grep/awk Master Series Practical
Combinations & Real-World Applications
Detailed explanation of practical data processing patterns combining find, grep, and awk, along with real-world use cases. Master practical skills for engineers and data analysts.
6. Combination Techniques
True Linux masters use find, grep, and awk in combination: tasks that are awkward with any single command become straightforward once the tools are chained together.
🔗 Basic Pipe Connection Patterns
find + grep Combination
find /var/log -name "*.log" -exec grep -l "ERROR" {} \;
Identify log files containing ERROR
find /home -name "*.txt" | xargs grep -n "password"
Search for "password" in txt files with line numbers
grep + awk Combination
grep "ERROR" /var/log/app.log | awk '{print $1, $2, $NF}'
Extract date, time, and last field from error lines
ps aux | grep "nginx" | awk '{sum+=$4} END {print "Total CPU usage:", sum "%"}'
Sum CPU usage of nginx processes
find + awk Combination
find /var -name "*.log" -printf "%s %p\n" | awk '{size+=$1; count++} END {printf "Total size: %.2f MB Files: %d\n", size/1024/1024, count}'
Calculate total size and count of log files
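As a sketch of the same find + awk pattern, the total can also be broken down by subdirectory (assuming paths of the form /var/&lt;dir&gt;/...):
find /var -name "*.log" -printf "%s %p\n" | awk '{split($2, p, "/"); dir = "/" p[2] "/" p[3]; size[dir] += $1} END {for (d in size) printf "%-20s %.2f MB\n", d, size[d]/1024/1024}'
Log size per directory under /var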
🎯 Production-Level Complex Processing
📊 Scenario 1: Web Server Access Analysis
Goal: Extract top 10 IP addresses with most errors from last week's access logs
Solution:
find /var/log/apache2 -name "access.log*" -mtime -7 | \
xargs grep " 5[0-9][0-9] " | \
awk '{print $1}' | \
sort | uniq -c | \
sort -rn | \
head -10 | \
awk '{printf "%-15s %d times\n", $2, $1}'
Step Explanation:
- find: Search for access log files modified within the last 7 days
- grep: Extract 5xx (server error) responses
- awk: Extract only the IP address (1st column)
- sort | uniq -c: Count occurrences per IP address
- sort -rn: Sort by count in descending order
- head -10: Keep the top 10
- awk: Format the output for readability
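When building a pipeline like this, it helps to verify the intermediate output before adding the aggregation stages, for example:
find /var/log/apache2 -name "access.log*" -mtime -7 | \
xargs grep " 5[0-9][0-9] " | \
awk '{print $1}' | head -5
Spot-check that the extracted field really is the client IP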
🔍 Scenario 2: Safe Bulk Deletion of Old Temp Files
Goal: Safely delete temporary files older than 30 days from entire system
Solution:
# 1. First confirm target files
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -exec ls -la {} +
# 2. Execute deletion after confirming safety
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -size +0 | \
xargs -I {} bash -c 'echo "Deleting: $1"; rm -- "$1"' _ {}
Safe Deletion Procedure:
- First list and verify deletion targets
- Target only files older than 30 days AND larger than 0 bytes
- Display filename before deletion (for logging)
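As an alternative sketch, GNU find can print and delete in a single pass with the same criteria, avoiding the extra xargs step:
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -size +0 -print -delete
-print before -delete leaves a log of every file removed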
📈 Scenario 3: Database Connection Log Analysis
Goal: Analyze MySQL connection count by time period
Solution:
find /var/log/mysql -name "*.log" -mtime -1 | \
xargs grep -h "Connect" | \
awk '{
# Extract the hour (e.g. 2025-01-15T14:30:25.123456Z); 3-argument match() requires gawk
match($0, /[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2})/, time_parts);
hour = time_parts[1] + 0;   # force numeric so "04" matches the loop index 4 below
connections[hour]++;
}
END {
print "MySQL Connections by Hour (Last 24 Hours)";
print "================================";
for (h = 0; h < 24; h++) {
printf "%02d:00-%02d:59 | ", h, h;
count = (h in connections) ? connections[h] : 0;
printf "%5d times ", count;
# Simple graph display
for (i = 0; i < count/10; i++) printf "▓";
printf "\n";
}
}'
Advanced Processing Points:
- Time extraction using regex
- Hourly aggregation using associative arrays
- Visual graph display
- Complete 24-hour display including zero-count hours
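The three-argument match() used above is a gawk extension; with a plain POSIX awk, the hour can be taken from the fixed-width ISO timestamp with substr() instead (a sketch assuming the timestamp is the first field):
find /var/log/mysql -name "*.log" -mtime -1 | xargs grep -h "Connect" | \
awk '{hour = substr($1, 12, 2) + 0; connections[hour]++} END {for (h = 0; h < 24; h++) printf "%02d:00 %d\n", h, connections[h]}'
Same hourly count without gawk-specific features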
⚡ One-Liner Technique Collection
A collection of practical one-liners you can put to work right away.
💾 Disk & File Management
find . -type f -exec du -h {} + | sort -rh | head -20
Top 20 largest files
find /var -name "*.log" -mtime +7 -exec ls -lh {} \; | awk '{size+=$5} END {print "Deletable size:", size/1024/1024 "MB"}'
Calculate total size of old log files
🌐 Network & Access Analysis
grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
Top 10 IP addresses with most access today
find /var/log -name "*.log" | xargs grep -h "Failed password" | awk '{print $11}' | sort | uniq -c | sort -rn
IP addresses with most SSH login failures
📊 System Monitoring
find /proc -maxdepth 2 -name "status" 2>/dev/null | xargs grep -l "VmRSS" | xargs -I {} bash -c 'echo -n "$(basename $(dirname {})): "; grep VmRSS {}'
Memory usage per process
find /var/log -name "syslog*" | xargs grep "$(date '+%b %d')" | grep -i "error\|warn\|fail" | awk '{print $5}' | sort | uniq -c | sort -rn
Today's system error/warning source statistics
🏗️ Pipeline Design Patterns: The Art of Data Flow Design
Professional techniques for designing efficient and maintainable pipelines for complex data processing.
🔄 Error Handling and Recovery Patterns
In production environments, failures are expected; pipelines need deliberate error handling and must keep processing whatever they can.
🛡️ Failure-Proof Pipeline Design
# Detect mid-process failures with pipefail
set -euo pipefail
# Error handling function
handle_error() {
echo "ERROR: Pipeline processing error occurred (line: $1)" >&2
echo "ERROR: Check intermediate files: /tmp/pipeline_*" >&2
exit 1
}
# Set error trap
trap 'handle_error $LINENO' ERR
# Safe pipeline processing
process_logs_safely() {
local input_pattern="$1"
local output_file="$2"
local temp_dir="/tmp/pipeline_$$"
# Create temp directory
mkdir -p "$temp_dir"
# Step 1: File collection (skip on failure)
echo "Step 1: Collecting log files..."
find /var/log -name "$input_pattern" -type f 2>/dev/null > "$temp_dir/file_list" || {
echo "WARNING: Could not access some files" >&2
}
# Handle no files found
if [[ ! -s "$temp_dir/file_list" ]]; then
echo "ERROR: No files to process found" >&2
rm -rf "$temp_dir"
return 1
fi
# Step 2: Data processing (process each file individually)
echo "Step 2: Processing data..."
while IFS= read -r logfile; do
if [[ -r "$logfile" ]]; then
grep -h "ERROR\|WARN" "$logfile" 2>/dev/null >> "$temp_dir/errors.log" || true
else
echo "WARNING: Could not read $logfile" >&2
fi
done < "$temp_dir/file_list"
# Step 3: Aggregation processing
echo "Step 3: Aggregating..."
if [[ -s "$temp_dir/errors.log" ]]; then
awk '
{
# Error pattern extraction and aggregation
if ($0 ~ /ERROR/) error_count++;
if ($0 ~ /WARN/) warn_count++;
# Hourly aggregation
if (match($0, /[0-9]{2}:[0-9]{2}:[0-9]{2}/, time_match)) {
hour = substr(time_match[0], 1, 2) + 0;   # numeric hour so it matches the END loop index
hourly_errors[hour]++;
}
}
END {
printf "Error Statistics Report\n";
printf "==================\n";
printf "ERROR: %d items\n", error_count;
printf "WARN: %d items\n", warn_count;
printf "\nErrors by Hour:\n";
for (h = 0; h < 24; h++) {
printf "%02d hour: %d items\n", h, (h in hourly_errors) ? hourly_errors[h] : 0;
}
}' "$temp_dir/errors.log" > "$output_file"
else
echo "No error logs found" > "$output_file"
fi
# Delete temp files
rm -rf "$temp_dir"
echo "Processing complete: Results output to $output_file"
}
# Execution example
process_logs_safely "*.log" "/tmp/error_report.txt"
Robust pipeline with error handling, file existence checks, and temp file management
⚡ Performance Optimization Patterns
Pipeline design patterns that balance large data volumes with high-speed processing.
🚀 Parallel Processing Pipeline
# Parallelize CPU-intensive processing
parallel_log_analysis() {
local log_pattern="$1"
local output_dir="$2"
local cpu_cores=$(nproc)
local max_parallel=$((cpu_cores - 1)) # Consider system load
echo "Starting parallel processing: Max ${max_parallel} processes"
# Discover log files and distribute to worker processes
find /var/log -name "$log_pattern" -type f | \
xargs -n 1 -P "$max_parallel" -I {} bash -c '
logfile="$1"
output_dir="$2"
worker_id="$$"
echo "Worker $worker_id: Starting processing $logfile"
# Aggregation processing (CPU intensive)
result_file="$output_dir/result_$worker_id.tmp"
# Complex regex processing
grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" "$logfile" | \
awk "
BEGIN {
FS=\" \";
worker_id=\"$worker_id\";
}
{
# IP region determination (simplified)
ip = \$1;
if (match(ip, /^192\.168\./)) region = \"local\";
else if (match(ip, /^10\./)) region = \"internal\";
else if (match(ip, /^172\.(1[6-9]|2[0-9]|3[01])\./)) region = \"internal\";
else region = \"external\";
# Time period analysis
if (match(\$4, /:([0-9]{2}):[0-9]{2}:[0-9]{2}/, time_parts)) {
hour = time_parts[1] + 0;
access_by_region_hour[region, hour]++;
}
total_by_region[region]++;
}
END {
printf \"# Worker %s results\n\", worker_id;
for (region in total_by_region) {
printf \"region:%s total:%d\n\", region, total_by_region[region];
for (hour = 0; hour < 24; hour++) {
count = ((region, hour) in access_by_region_hour) ? access_by_region_hour[region, hour] : 0;
if (count > 0) {
printf \"region:%s hour:%02d count:%d\n\", region, hour, count;
}
}
}
}" > "$result_file"
echo "Worker $worker_id: Processing complete"
' -- {} "$output_dir"
# Merge all worker results
echo "Merging results..."
merge_worker_results "$output_dir"
}
merge_worker_results() {
local output_dir="$1"
local final_report="$output_dir/final_analysis.txt"
# Aggregate all worker results
find "$output_dir" -name "result_*.tmp" | \
xargs cat | \
awk '
/^region:.*total:/ {
split($0, parts, " ");
region = substr(parts[1], 8); # Remove "region:"
total = substr(parts[2], 7); # Remove "total:"
region_totals[region] += total;
}
/^region:.*hour:.*count:/ {
split($0, parts, " ");
region = substr(parts[1], 8);
hour = substr(parts[2], 6) + 0;   # numeric hour so it matches the loop index below
count = substr(parts[3], 7);
region_hour_counts[region, hour] += count;
}
END {
print "=== Regional Access Analysis via Parallel Processing ===";
print "";
for (region in region_totals) {
printf "Region: %s (Total: %d)\n", region, region_totals[region];
printf "Hourly Details:\n";
for (hour = 0; hour < 24; hour++) {
count = ((region, hour) in region_hour_counts) ? region_hour_counts[region, hour] : 0;
if (count > 0) {
# Simple graph display
bar_length = int(count / 10);
if (bar_length > 50) bar_length = 50;
printf " %02d hour: %5d ", hour, count;
for (i = 0; i < bar_length; i++) printf "▓";
printf "\n";
}
}
printf "\n";
}
}' > "$final_report"
# Cleanup temp files
find "$output_dir" -name "result_*.tmp" -delete
echo "Final report generated: $final_report"
}
# Execution example
mkdir -p /tmp/parallel_analysis
parallel_log_analysis "access.log*" "/tmp/parallel_analysis"
Accelerate log analysis with multi-core parallel processing and merge results
7. Real-World Use Cases
Practice over theory! Here is how these tools are used in real work, organized by job role.
💻 Web Engineers
🚨 Emergency Production Issue Response
Situation: "Site is slow" reported. Need to identify cause quickly
Response Procedure:
1. Check Error Logs
find /var/log/apache2 /var/log/nginx -name "*.log" | xargs grep -E "$(date '+%d/%b/%Y')" | grep -E "5[0-9][0-9]|error|timeout" | tail -50
2. Identify Slow Queries
find /var/log/mysql -name "*slow.log" | xargs grep -A 5 "Query_time" | awk '/Query_time: [5-9]/ {getline; print}'
3. Detect Abnormal Access Patterns
grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | awk '$1 > 1000 {print "Abnormal access:", $2, "Count:", $1}'
⏱️ Impact: Tasks that would take 30-60 minutes manually completed in 5 minutes
📊 Monthly Report Generation
Situation: Need to compile last month's access statistics and error rates
# Access statistics report generation script
#!/bin/bash
LAST_MONTH=$(date -d "last month" '+%b/%Y')
echo "=== $LAST_MONTH Access Statistics Report ==="
echo
# Total access count
TOTAL_ACCESS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | wc -l)
echo "Total Access: $TOTAL_ACCESS"
# Unique visitors
UNIQUE_VISITORS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $1}' | sort -u | wc -l)
echo "Unique Visitors: $UNIQUE_VISITORS"
# Error rate
ERROR_COUNT=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | grep -E " [45][0-9][0-9] " | wc -l)
ERROR_RATE=$(echo "scale=2; $ERROR_COUNT * 100 / $TOTAL_ACCESS" | bc)
echo "Error Rate: $ERROR_RATE%"
# Top 10 popular pages
echo -e "\n=== Top 10 Popular Pages ==="
find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $7}' | grep -v "\.css\|\.js\|\.png\|\.jpg" | sort | uniq -c | sort -rn | head -10 | awk '{printf "%-50s %d times\n", $2, $1}'
⏱️ Impact: Excel work that took half a day automated in 3 minutes
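A report like this is usually scheduled via cron; a sketch assuming the script above is saved as /usr/local/bin/monthly_report.sh and /var/reports exists (both hypothetical paths):
# Run at 06:00 on the 1st of every month (% must be escaped inside crontab)
0 6 1 * * /usr/local/bin/monthly_report.sh > /var/reports/access_$(date +\%Y\%m).txt 2>&1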
🛠️ Infrastructure Engineers
🖥️ Server Monitoring & Maintenance
Situation: Regularly check health status of multiple servers
# Server health status check script
#!/bin/bash
echo "=== Server Health Status Report ==="
date
echo
# Disk usage warning
echo "=== Disk Usage (Warning at 80%+) ==="
df -h | awk 'NR>1 {gsub(/%/, "", $5); if ($5+0 > 80) printf "⚠️ %s: %s used (%s%%)\n", $6, $3, $5}'
# Memory usage
echo -e "\n=== Memory Usage ==="
free -m | awk 'NR==2{printf "Memory Usage: %.1f%% (%dMB / %dMB)\n", $3*100/$2, $3, $2}'
# High CPU usage processes
echo -e "\n=== Top 5 CPU Usage ==="
ps aux --no-headers | sort -rn -k3 | head -5 | awk '{printf "%-10s %5.1f%% %s\n", $1, $3, $11}'
# Error log surge check
echo -e "\n=== Last Hour Error Count ==="
find /var/log -name "*.log" -mmin -60 | xargs grep -h -E "$(date '+%b %d %H')|$(date -d '1 hour ago' '+%b %d %H')" | grep -ci error
⏱️ Impact: Manual checks taking 1 hour automated for 24/7 monitoring
📈 Data Analysts
📊 Large Data Preprocessing
Situation: Multi-GB CSV file cannot be opened in Excel. Need preprocessing
# Large CSV analysis & preprocessing script
#!/bin/bash
CSV_FILE="sales_data_2024.csv"
OUTPUT_DIR="processed_data"
mkdir -p $OUTPUT_DIR
echo "=== Large CSV Analysis Started ==="
# File size & line count check
echo "File size: $(du -h "$CSV_FILE" | cut -f1)"
echo "Total lines: $(wc -l < "$CSV_FILE")"
# Data quality check
echo -e "\n=== Data Quality Check ==="
echo "Empty lines: $(grep -c '^$' "$CSV_FILE")"
echo "Invalid lines: $(awk -F',' 'NF != 5 {count++} END {print count+0}' "$CSV_FILE")"
# Split by month
echo -e "\n=== Splitting by month... ==="
awk -F',' 'NR==1 {header=$0; next}
{
month = substr($1,1,7); # Extract YYYY-MM portion
outfile = "'$OUTPUT_DIR'/sales_" month ".csv";
if (!seen[month]) {
print header > outfile;
seen[month] = 1;
}
print $0 > outfile;
}' "$CSV_FILE"
# Create monthly summary
echo -e "\n=== Monthly Summary ==="
find $OUTPUT_DIR -name "sales_*.csv" | sort | while read file; do
month=$(basename "$file" .csv | cut -d'_' -f2)
total_sales=$(awk -F',' 'NR>1 {sum+=$4} END {print sum}' "$file")
record_count=$(( $(wc -l < "$file") - 1 ))
printf "%s: %'d items, Total sales: ¥%'d\n" "$month" "$record_count" "$total_sales"
done
⏱️ Impact: Hours of Excel work completed in minutes, memory shortage resolved
🏭 Industry Case Studies: Professional Practice Examples
Detailed explanation of techniques used in actual projects across industries.
🎮 Game Development: Large-Scale Log Analysis
📋 Challenge
Detect cheating and analyze game balance from 100GB daily player behavior logs in online game
💡 Solution
# Game log analysis pipeline
#!/bin/bash
# Detect abnormal behavior from one day's logs
analyze_game_logs() {
local log_date="$1"
local output_dir="/analysis/$(date +%Y%m%d)"
mkdir -p "$output_dir"
echo "=== Game log analysis started: $log_date ==="
# Step 1: Analyze player behavior patterns
find /game/logs -name "*${log_date}*.log" -type f | \
xargs grep -h "PLAYER_ACTION" | \
awk -F'|' '
{
player_id = $3;
action = $4;
timestamp = $2;
value = $5;
# Detect abnormally frequent actions in short time
if (action == "LEVEL_UP") {
player_levelups[player_id]++;
if (player_levelups[player_id] > 10) {
print "SUSPICIOUS_LEVELUP", player_id, player_levelups[player_id] > "/tmp/cheat_suspects.log";
}
}
# Abnormal currency increase
if (action == "GOLD_CHANGE" && value > 1000000) {
print "SUSPICIOUS_GOLD", player_id, value, timestamp > "/tmp/gold_anomaly.log";
}
# Player statistics
player_actions[player_id]++;
total_actions++;
}
END {
# Players with abnormally high action counts
avg_actions = total_actions / length(player_actions);
for (player in player_actions) {
if (player_actions[player] > avg_actions * 5) {
printf "HIGH_ACTIVITY_PLAYER: %s (%d actions, avg: %.1f)\n",
player, player_actions[player], avg_actions > "/tmp/high_activity.log";
}
}
}'
# Merge results and generate report
{
echo "=== Game Cheat Detection Report $(date) ==="
echo ""
if [[ -s "/tmp/cheat_suspects.log" ]]; then
echo "🚨 Level-up Anomalies:"
sort "/tmp/cheat_suspects.log" | uniq -c | sort -rn | head -10
echo ""
fi
if [[ -s "/tmp/gold_anomaly.log" ]]; then
echo "💰 Gold Anomalies:"
sort -k3 -rn "/tmp/gold_anomaly.log" | head -10
echo ""
fi
if [[ -s "/tmp/high_activity.log" ]]; then
echo "⚡ High Activity Players:"
head -20 "/tmp/high_activity.log"
fi
} > "$output_dir/cheat_detection_report.txt"
# Delete temp files
rm -f /tmp/{cheat_suspects,gold_anomaly,high_activity}.log
echo "Report generated: $output_dir/cheat_detection_report.txt"
}
# Execution example: Analyze yesterday's logs
analyze_game_logs "$(date -d yesterday +%Y%m%d)"
Automatically detect cheating from 100GB game logs
📊 Impact
- Manual investigation: Days → Automated: 30 minutes
- Cheat detection accuracy: 95%+
- Operations effort: 80% reduction
🏪 E-commerce & Retail: Customer Behavior Analysis
📋 Challenge
Analyze customer purchase patterns from e-commerce access logs and measure personalization campaign effectiveness
💡 Solution
# E-commerce customer behavior analysis
#!/bin/bash
analyze_customer_journey() {
local analysis_period="$1" # YYYY-MM-DD
local output_dir="/analytics/customer_journey"
mkdir -p "$output_dir"
echo "=== Customer Journey Analysis: $analysis_period ==="
# Session construction and page transition analysis
find /var/log/nginx -name "access.log*" | \
xargs grep "$analysis_period" | \
awk '
BEGIN {
# Session boundary: 30 minute interval
session_timeout = 1800;
}
{
ip = $1;
url = $7;
# Session management and page transition recording
# Derive the timestamp from the log line itself ($4, e.g. [15/Jan/2025:14:30:25); mktime() requires gawk
split(substr($4, 2), t, /[\/:]/);
m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3;
current_time = mktime(t[3] " " m " " t[1] " " t[4] " " t[5] " " t[6]);
if (current_time - last_access[ip] > session_timeout) {
session_id[ip] = ip "_" current_time;
sessions[session_id[ip]]["start_time"] = current_time;
sessions[session_id[ip]]["pages"] = 0;
}
sessions[session_id[ip]]["pages"]++;
# Purchase page access detection
if (url ~ /\/checkout|\/purchase/) {
purchase_sessions[session_id[ip]] = 1;
purchase_path_length[sessions[session_id[ip]]["pages"]]++;
}
# Product page view patterns
if (url ~ /\/products\/([0-9]+)/) {
match(url, /\/products\/([0-9]+)/, product_match);
product_id = product_match[1];
product_views[product_id]++;
user_product_views[ip][product_id]++;
}
last_access[ip] = current_time;
}
END {
# Output results
print "=== Session Statistics ===";
printf "Total Sessions: %d\n", length(sessions);
printf "Purchase Completed Sessions: %d\n", length(purchase_sessions);
printf "Purchase Rate: %.2f%%\n", (length(purchase_sessions) * 100.0) / length(sessions);
print "\n=== Purchase Path Analysis ===";
for (plen in purchase_path_length) {
printf "Purchases via %d pages: %d cases\n", plen, purchase_path_length[plen];
}
# Generate recommendation data
print "\n=== Product Co-occurrence Analysis (for Recommendations) ===";
for (user in user_product_views) {
products_viewed = "";
for (product in user_product_views[user]) {
products_viewed = products_viewed product " ";
}
if (length(user_product_views[user]) >= 2) {
print "USER_PRODUCTS:", user, products_viewed;
}
}
}' > "$output_dir/customer_journey_${analysis_period}.txt"
echo "Customer journey analysis complete: $output_dir/customer_journey_${analysis_period}.txt"
}
# Execution example: Analyze yesterday's customer behavior
analyze_customer_journey "$(date -d yesterday +%Y-%m-%d)"
Analyze e-commerce customer behavior patterns and generate recommendation data
📊 Impact
- Data analysis effort: 70% reduction
- Personalization accuracy: 30% improvement
- Real-time analysis enabled
8. Performance Optimization
Learn techniques to maximize speed and efficiency in large data processing.
⚡ Basic Speed Optimization Techniques
🌐 Locale Setting Optimization
🐌 Slow Method
grep "pattern" large_file.txt
Overhead from UTF-8 processing
⚡ Optimization
LC_ALL=C grep "pattern" large_file.txt
Up to 10x faster with ASCII processing
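To check the effect on your own data, time both forms directly (assuming a large ASCII log named large_file.txt):
time grep -c "pattern" large_file.txt
time LC_ALL=C grep -c "pattern" large_file.txt
Compare elapsed time with and without the C locale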
📁 File Access Optimization
🐌 Slow Method
find /var -name "*.log" -exec grep "ERROR" {} \;
Spawn new process for each file
⚡ Optimization
find /var -name "*.log" | xargs grep "ERROR"
Significant speedup with batch processing
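GNU find can also batch arguments itself with the + terminator, which behaves like xargs but handles unusual file names safely:
find /var -name "*.log" -exec grep "ERROR" {} +
One grep invocation per batch of files, no xargs needed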
💾 Memory-Efficient Processing
🐌 High Memory Usage
awk '{lines[NR]=$0} END {for(i=1;i<=NR;i++) print lines[i]}' huge.txt
Load all lines into memory
⚡ Streaming Processing
awk '{print $0}' huge.txt
Process line by line to save memory
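If the task genuinely needs the last N lines, a rolling buffer keeps memory bounded to N lines instead of the whole file (a minimal sketch):
awk -v n=10 '{buf[NR % n] = $0} END {for (i = NR - n + 1; i <= NR; i++) if (i > 0) print buf[i % n]}' huge.txt
tail-like output while storing at most 10 lines in memory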
🚀 Advanced Optimization Techniques
🔄 Parallel Processing Utilization
# Parallel search using multiple CPU cores
find /var/log -name "*.log" -type f | \
xargs -n 1 -P "$(nproc)" -I {} \
bash -c 'echo "Processing: $1"; grep -c "ERROR" "$1"' _ {}
Parallel processing with available CPU cores
📊 Efficient Data Pipeline
# Memory-efficient large file processing
stream_process_large_file() {
local input_file="$1"
local chunk_size=10000
# Streaming processing in chunks
split -l "$chunk_size" "$input_file" /tmp/chunk_
# Parallel process each chunk
find /tmp -name "chunk_*" | \
xargs -n 1 -P 4 -I {} \
bash -c '
chunk_file="$1"
# High-speed processing (LC_ALL=C environment)
LC_ALL=C awk "
{
# Execute only necessary processing
if (\$0 ~ /ERROR/) error_count++;
total_lines++;
}
END {
printf \"%s: errors=%d total=%d\n\", FILENAME, error_count, total_lines;
}" "$chunk_file"
# Delete processed chunks immediately
rm "$chunk_file"
' -- {}
}
# Execution example
stream_process_large_file "huge_log_file.txt"
Split large files into chunks for parallel streaming processing
⚡ Regular Expression Optimization
❌ Inefficient Regex
grep -E "(error|ERROR|Error|warning|WARNING|Warning)" logfile
Heavy processing due to many alternation branches
✅ Optimized Regex
grep -iE "(error|warning)" logfile
Concise with case-insensitive option
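When the search term is a fixed string rather than a pattern, grep -F skips the regex engine entirely and is usually faster still:
grep -F "ERROR" logfile
Fixed-string matching, no regular expression overhead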
📈 Performance Measurement and Monitoring
⏱️ Processing Time Measurement
# Detailed performance measurement
benchmark_command() {
local command="$1"
local description="$2"
echo "=== $description ==="
echo "Command: $command"
# Wall-clock time measurement (the bash time keyword writes to stderr)
{ time eval "$command" > /dev/null; } 2>&1 | grep -E "^(real|user|sys)"
# Memory usage measurement (GNU time)
/usr/bin/time -v bash -c "$command" 2>&1 | grep -E "(Maximum resident set size|User time|System time)"
echo "Processing complete"
echo
}
# Usage example
benchmark_command "find /var/log -name '*.log' | xargs grep -c ERROR" "Traditional method"
benchmark_command "find /var/log -name '*.log' -print0 | xargs -0 grep -c ERROR" "NULL delimiter optimization"
Detailed measurement of processing time and memory usage
📊 Resource Usage Monitoring
# Processing with real-time resource monitoring
monitor_and_process() {
local log_pattern="$1"
local output_file="$2"
# Start resource monitoring in background
{
while true; do
timestamp=$(date '+%H:%M:%S')
cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
mem_usage=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}')
echo "$timestamp: CPU ${cpu_usage}%, Memory ${mem_usage}%"
sleep 5
done
} &
monitor_pid=$!
# Execute main processing
echo "Processing started: $(date)"
find /var/log -name "$log_pattern" | \
xargs grep -h "ERROR" | \
awk '{
error_patterns[$0]++;
total_errors++;
}
END {
printf "Total Errors: %d\n", total_errors;
printf "Unique Error Patterns: %d\n", length(error_patterns);
print "\n=== Top 5 Frequent Errors ===";
PROCINFO["sorted_in"] = "@val_num_desc";
count = 0;
for (pattern in error_patterns) {
printf "%s: %d times\n", pattern, error_patterns[pattern];
if (++count >= 5) break;
}
}' > "$output_file"
# Terminate monitoring process
kill $monitor_pid 2>/dev/null
echo "Processing complete: $(date)"
}
# Execution example
monitor_and_process "*.log" "/tmp/error_analysis.txt"
Real-time monitoring of CPU and memory usage during processing