find/grep/awk Master Series: Practical Combinations & Real-World Applications

A detailed explanation of practical data-processing patterns that combine find, grep, and awk, along with real-world use cases, to help engineers and data analysts master practical skills.

📋 Table of Contents

  6. Combination Techniques
  7. Real-World Use Cases
  8. Performance Optimization

6. Combination Techniques

True Linux masters use find, grep, and awk in combination. Tasks that are difficult with a single command turn into powerful solutions when the three are combined.

🔗 Basic Pipe Connection Patterns

find + grep Combination

find /var/log -name "*.log" -exec grep -l "ERROR" {} \;

Identify log files containing ERROR

find /home -name "*.txt" | xargs grep -n "password"

Search for "password" in txt files with line numbers

grep + awk Combination

grep "ERROR" /var/log/app.log | awk '{print $1, $2, $NF}'

Extract date, time, and last field from error lines

ps aux | grep "nginx" | awk '{sum+=$4} END {print "Total CPU usage:", sum "%"}'

Sum CPU usage of nginx processes
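Note that the grep process itself also matches "nginx". A common workaround is a bracketed pattern so that grep does not match its own command line; a hedged variant of the same one-liner:

# The pattern [n]ginx matches "nginx" but not the grep command line itself
ps aux | grep "[n]ginx" | awk '{sum+=$3} END {print "Total CPU usage:", sum "%"}'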

find + awk Combination

find /var -name "*.log" -printf "%s %p\n" | awk '{size+=$1; count++} END {printf "Total size: %.2f MB Files: %d\n", size/1024/1024, count}'

Calculate total size and count of log files

🎯 Production-Level Complex Processing

📊 Scenario 1: Web Server Access Analysis

Goal: Extract top 10 IP addresses with most errors from last week's access logs

Solution:
find /var/log/apache2 -name "access.log*" -mtime -7 | \
  xargs grep " 5[0-9][0-9] " | \
  awk '{print $1}' | \
  sort | uniq -c | \
  sort -rn | \
  head -10 | \
  awk '{printf "%-15s %d times\n", $2, $1}'

Step Explanation:

  1. find: Search for access log files within last 7 days
  2. grep: Extract 5xx errors (server errors)
  3. awk: Extract only IP addresses (1st column)
  4. sort | uniq -c: Count by IP address
  5. sort -rn: Sort by count in descending order
  6. head -10: Get top 10
  7. awk: Format output for readability
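As a sketch, the counting, sorting, and formatting stages can also be collapsed into a single awk aggregation, which avoids the extra sort | uniq -c pass on very large inputs:

find /var/log/apache2 -name "access.log*" -mtime -7 | \
  xargs grep " 5[0-9][0-9] " | \
  awk '{count[$1]++}
       END {for (ip in count) printf "%d %s\n", count[ip], ip}' | \
  sort -rn | head -10 | \
  awk '{printf "%-15s %d times\n", $2, $1}'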

🔍 Scenario 2: Safe Bulk Deletion of Old Temp Files

Goal: Safely delete temporary files older than 30 days from entire system

Solution:
# 1. First confirm target files
#    (the month-name comparison is only approximate; the deletion step below relies on -mtime)
find /tmp /var/tmp /home -name "*.tmp" -o -name "temp*" -o -name "*.temp" | \
  grep -E "\.(tmp|temp)$|^temp" | \
  xargs ls -la | \
  awk -v cutoff="$(date -d "30 days ago" "+%b %d %H:%M")" \
      '$6 " " $7 " " $8 < cutoff {print $NF}'

# 2. Execute deletion after confirming safety
find /tmp /var/tmp /home -name "*.tmp" -mtime +30 -size +0 | \
  xargs -I {} bash -c 'echo "Deleting: {}"; rm "{}"'

Safe Deletion Procedure:

  1. First list and verify deletion targets
  2. Target only files older than 30 days AND larger than 0 bytes
  3. Display filename before deletion (for logging)
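A minimal sketch of the same idea wrapped in a dry-run switch: nothing is deleted unless the script is explicitly told to. The paths and pattern mirror the example above and are assumptions, not a prescribed layout.

#!/bin/bash
# Dry-run by default; pass --delete to actually remove files
MODE="${1:---dry-run}"
if [[ "$MODE" == "--delete" ]]; then
    find /tmp /var/tmp -name "*.tmp" -type f -mtime +30 -size +0 \
        -print -delete
else
    echo "[dry-run] The following files would be deleted:"
    find /tmp /var/tmp -name "*.tmp" -type f -mtime +30 -size +0 -print
fi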

📈 Scenario 3: Database Connection Log Analysis

Goal: Analyze MySQL connection count by time period

Solution:
find /var/log/mysql -name "*.log" -mtime -1 | \
  xargs grep -h "Connect" | \
  awk '{
    # Extract the hour (e.g. 2025-01-15T14:30:25.123456Z)
    match($0, /[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2})/, time_parts);
    hour = time_parts[1] + 0;   # normalize "05" -> 5 so the lookup below matches
    connections[hour]++;
  }
  END {
    print "MySQL Connections by Hour (Last 24 Hours)";
    print "================================";
    for (h = 0; h < 24; h++) {
      printf "%02d:00-%02d:59 | ", h, h;
      count = (h in connections) ? connections[h] : 0;
      printf "%5d times ", count;
      # Simple graph display
      for (i = 0; i < count/10; i++) printf "▓";
      printf "\n";
    }
  }'

Advanced Processing Points:

  • Time extraction using regex
  • Hourly aggregation using associative arrays
  • Visual graph display
  • Complete 24-hour display including zero-count hours
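The associative-array pattern in isolation, as a minimal sketch you can adapt: each distinct key (here, the hour) becomes an array index and the value accumulates a count. The input filename and the assumption that the second field starts with HH:MM:SS are illustrative only.

# Count occurrences per hour from lines whose 2nd field starts with HH:MM:SS
awk '{hour = substr($2, 1, 2) + 0; hits[hour]++}
     END {for (h in hits) printf "%02d:00 %d\n", h, hits[h]}' access_times.log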

⚡ One-Liner Technique Collection

A collection of useful one-liners that you can put to work right away.

💾 Disk & File Management

find . -type f -exec du -h {} + | sort -rh | head -20

Top 20 largest files
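Where GNU find is available, a sketch of the same idea using -printf (as in the earlier size example) avoids running du at all; sizes are printed in bytes:

# Sizes in bytes via -printf, avoiding the du invocations
find . -type f -printf "%s %p\n" | sort -rn | head -20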

find /var -name "*.log" -mtime +7 -exec ls -lh {} \; | awk '{size+=$5} END {print "Deletable size:", size/1024/1024 "MB"}'

Calculate total size of old log files

🌐 Network & Access Analysis

grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

Top 10 IP addresses with most access today

find /var/log -name "*.log" | xargs grep -h "Failed password" | awk '{print $11}' | sort | uniq -c | sort -rn

IP addresses with most SSH login failures
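The field position of the IP address in sshd messages varies between distributions, so relying on $11 is fragile; a more robust variant (a sketch) extracts the IP with grep -oE instead:

find /var/log -name "*.log" | xargs grep -h "Failed password" | \
  grep -oE "([0-9]{1,3}\.){3}[0-9]{1,3}" | sort | uniq -c | sort -rn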

📊 System Monitoring

find /proc -maxdepth 2 -name "status" 2>/dev/null | xargs grep -l "VmRSS" | xargs -I {} bash -c 'echo -n "$(basename $(dirname {})): "; grep VmRSS {}'

Memory usage per process

find /var/log -name "syslog*" | xargs grep "$(date '+%b %d')" | grep -i "error\|warn\|fail" | awk '{print $5}' | sort | uniq -c | sort -rn

Today's system error/warning source statistics

🏗️ Pipeline Design Patterns: The Art of Data Flow Design

Professional techniques for designing efficient and maintainable pipelines for complex data processing.

🔄 Error Handling and Recovery Patterns

In production environments, failures are expected. Designing for proper error handling, and for processing that continues past partial failures, is crucial.

🛡️ Failure-Proof Pipeline Design
# Detect mid-process failures with pipefail
set -euo pipefail

# Error handling function
handle_error() {
    echo "ERROR: Pipeline processing error occurred (line: $1)" >&2
    echo "ERROR: Check intermediate files: /tmp/pipeline_*" >&2
    exit 1
}

# Set error trap
trap 'handle_error $LINENO' ERR

# Safe pipeline processing
process_logs_safely() {
    local input_pattern="$1"
    local output_file="$2"
    local temp_dir="/tmp/pipeline_$$"

    # Create temp directory
    mkdir -p "$temp_dir"

    # Step 1: File collection (skip on failure)
    echo "Step 1: Collecting log files..."
    find /var/log -name "$input_pattern" -type f 2>/dev/null > "$temp_dir/file_list" || {
        echo "WARNING: Could not access some files" >&2
    }

    # Handle no files found
    if [[ ! -s "$temp_dir/file_list" ]]; then
        echo "ERROR: No files to process found" >&2
        rm -rf "$temp_dir"
        return 1
    fi

    # Step 2: Data processing (process each file individually)
    echo "Step 2: Processing data..."
    while IFS= read -r logfile; do
        if [[ -r "$logfile" ]]; then
            grep -h "ERROR\|WARN" "$logfile" 2>/dev/null >> "$temp_dir/errors.log" || true
        else
            echo "WARNING: Could not read $logfile" >&2
        fi
    done < "$temp_dir/file_list"

    # Step 3: Aggregation processing
    echo "Step 3: Aggregating..."
    if [[ -s "$temp_dir/errors.log" ]]; then
        awk '
        {
            # Error pattern extraction and aggregation
            if ($0 ~ /ERROR/) error_count++;
            if ($0 ~ /WARN/) warn_count++;

            # Hourly aggregation
            if (match($0, /[0-9]{2}:[0-9]{2}:[0-9]{2}/, time_match)) {
                hour = substr(time_match[0], 1, 2) + 0;   # normalize "05" -> 5
                hourly_errors[hour]++;
            }
        }
        END {
            printf "Error Statistics Report\n";
            printf "==================\n";
            printf "ERROR: %d items\n", error_count;
            printf "WARN: %d items\n", warn_count;
            printf "\nErrors by Hour:\n";
            for (h = 0; h < 24; h++) {
                printf "%02d hour: %d items\n", h, (h in hourly_errors) ? hourly_errors[h] : 0;
            }
        }' "$temp_dir/errors.log" > "$output_file"
    else
        echo "No error logs found" > "$output_file"
    fi

    # Delete temp files
    rm -rf "$temp_dir"

    echo "Processing complete: Results output to $output_file"
}

# Execution example
process_logs_safely "*.log" "/tmp/error_report.txt"

Robust pipeline with error handling, file existence checks, and temp file management

⚡ Performance Optimization Patterns

Pipeline design patterns balancing large data volumes and high-speed processing.

🚀 Parallel Processing Pipeline
# Parallelize CPU-intensive processing
parallel_log_analysis() {
    local log_pattern="$1"
    local output_dir="$2"
    local cpu_cores=$(nproc)
    local max_parallel=$((cpu_cores - 1))  # Consider system load

    echo "Starting parallel processing: Max ${max_parallel} processes"

    # Discover log files and distribute to worker processes
    find /var/log -name "$log_pattern" -type f | \
        xargs -n 1 -P "$max_parallel" -I {} bash -c '
            logfile="$1"
            output_dir="$2"
            worker_id="$$"

            echo "Worker $worker_id: Starting processing $logfile"

            # Aggregation processing (CPU intensive)
            result_file="$output_dir/result_$worker_id.tmp"

            # Complex regex processing
            grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" "$logfile" | \
                awk "
                BEGIN { FS=\" \"; worker_id=\"$worker_id\"; }
                {
                    # IP region determination (simplified)
                    ip = \$1;
                    if (match(ip, /^192\.168\./)) region = \"local\";
                    else if (match(ip, /^10\./)) region = \"internal\";
                    else if (match(ip, /^172\.(1[6-9]|2[0-9]|3[01])\./)) region = \"internal\";
                    else region = \"external\";

                    # Time period analysis: capture the hour from [dd/Mon/yyyy:HH:MM:SS
                    if (match(\$4, /:([0-9]{2}):[0-9]{2}:[0-9]{2}/, time_parts)) {
                        hour = time_parts[1] + 0;
                        access_by_region_hour[region][hour]++;
                    }
                    total_by_region[region]++;
                }
                END {
                    printf \"# Worker %s results\n\", worker_id;
                    for (region in total_by_region) {
                        printf \"region:%s total:%d\n\", region, total_by_region[region];
                        for (hour = 0; hour < 24; hour++) {
                            count = ((region in access_by_region_hour) && (hour in access_by_region_hour[region])) ? access_by_region_hour[region][hour] : 0;
                            if (count > 0) {
                                printf \"region:%s hour:%02d count:%d\n\", region, hour, count;
                            }
                        }
                    }
                }" > "$result_file"

            echo "Worker $worker_id: Processing complete"
        ' -- {} "$output_dir"

    # Merge all worker results
    echo "Merging results..."
    merge_worker_results "$output_dir"
}

merge_worker_results() {
    local output_dir="$1"
    local final_report="$output_dir/final_analysis.txt"

    # Aggregate all worker results
    find "$output_dir" -name "result_*.tmp" | \
        xargs cat | \
        awk '
        /^region:.*total:/ {
            split($0, parts, " ");
            region = substr(parts[1], 8);   # Remove "region:"
            total = substr(parts[2], 7);    # Remove "total:"
            region_totals[region] += total;
        }
        /^region:.*hour:.*count:/ {
            split($0, parts, " ");
            region = substr(parts[1], 8);
            hour = substr(parts[2], 6) + 0;
            count = substr(parts[3], 7);
            region_hour_counts[region][hour] += count;
        }
        END {
            print "=== Regional Access Analysis via Parallel Processing ===";
            print "";
            for (region in region_totals) {
                printf "Region: %s (Total: %d)\n", region, region_totals[region];
                printf "Hourly Details:\n";
                for (hour = 0; hour < 24; hour++) {
                    count = ((region in region_hour_counts) && (hour in region_hour_counts[region])) ? region_hour_counts[region][hour] : 0;
                    if (count > 0) {
                        # Simple graph display
                        bar_length = int(count / 10);
                        if (bar_length > 50) bar_length = 50;
                        printf "  %02d hour: %5d ", hour, count;
                        for (i = 0; i < bar_length; i++) printf "▓";
                        printf "\n";
                    }
                }
                printf "\n";
            }
        }' > "$final_report"

    # Cleanup temp files
    find "$output_dir" -name "result_*.tmp" -delete

    echo "Final report generated: $final_report"
}

# Execution example
mkdir -p /tmp/parallel_analysis
parallel_log_analysis "access.log*" "/tmp/parallel_analysis"

Accelerate log analysis with multi-core parallel processing and merge results

7. Real-World Use Cases

Practice over theory! Here is how these techniques are used in real work, organized by job role.

💻 Web Engineers

🚨 Emergency Production Issue Response

Situation: "Site is slow" reported. Need to identify cause quickly

Response Procedure:
1. Check Error Logs
find /var/log/apache2 /var/log/nginx -name "*.log" | xargs grep -E "$(date '+%d/%b/%Y')" | grep -E "5[0-9][0-9]|error|timeout" | tail -50
2. Identify Slow Queries
find /var/log/mysql -name "*slow.log" | xargs grep -A 5 "Query_time" | awk '/Query_time: [5-9]/ {getline; print}'
3. Detect Abnormal Access Patterns
grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | awk '$1 > 1000 {print "Abnormal access:", $2, "Count:", $1}'

⏱️ Impact: Tasks that would take 30-60 minutes manually completed in 5 minutes

📊 Monthly Report Generation

Situation: Need to compile last month's access statistics and error rates

#!/bin/bash
# Access statistics report generation script

LAST_MONTH=$(date -d "last month" '+%b/%Y')

echo "=== $LAST_MONTH Access Statistics Report ==="
echo

# Total access count
TOTAL_ACCESS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | wc -l)
echo "Total Access: $TOTAL_ACCESS"

# Unique visitors
UNIQUE_VISITORS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $1}' | sort -u | wc -l)
echo "Unique Visitors: $UNIQUE_VISITORS"

# Error rate
ERROR_COUNT=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | grep -E " [45][0-9][0-9] " | wc -l)
ERROR_RATE=$(echo "scale=2; $ERROR_COUNT * 100 / $TOTAL_ACCESS" | bc)
echo "Error Rate: $ERROR_RATE%"

# Top 10 popular pages
echo -e "\n=== Top 10 Popular Pages ==="
find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $7}' | \
    grep -v "\.css\|\.js\|\.png\|\.jpg" | sort | uniq -c | sort -rn | head -10 | \
    awk '{printf "%-50s %d times\n", $2, $1}'

⏱️ Impact: Excel work that took half a day automated in 3 minutes

🛠️ Infrastructure Engineers

🖥️ Server Monitoring & Maintenance

Situation: Regularly check health status of multiple servers

#!/bin/bash
# Server health status check script

echo "=== Server Health Status Report ==="
date
echo

# Disk usage warning
echo "=== Disk Usage (Warning at 80%+) ==="
df -h | awk 'NR>1 {gsub(/%/, "", $5); if($5 > 80) printf "⚠️ %s: %s used (%s%%)\n", $6, $3, $5}'

# Memory usage
echo -e "\n=== Memory Usage ==="
free -m | awk 'NR==2{printf "Memory Usage: %.1f%% (%dMB / %dMB)\n", $3*100/$2, $3, $2}'

# High CPU usage processes
echo -e "\n=== Top 5 CPU Usage ==="
ps aux --no-headers | sort -rn -k3 | head -5 | awk '{printf "%-10s %5.1f%% %s\n", $1, $3, $11}'

# Error log surge check
echo -e "\n=== Last Hour Error Count ==="
find /var/log -name "*.log" -mmin -60 | \
    xargs grep -h -E "$(date '+%b %d %H')|$(date -d '1 hour ago' '+%b %d %H')" | \
    grep -ci error

⏱️ Impact: Manual checks taking 1 hour automated for 24/7 monitoring

📈 Data Analysts

📊 Large Data Preprocessing

Situation: Multi-GB CSV file cannot be opened in Excel. Need preprocessing

#!/bin/bash
# Large CSV analysis & preprocessing script

CSV_FILE="sales_data_2024.csv"
OUTPUT_DIR="processed_data"
mkdir -p $OUTPUT_DIR

echo "=== Large CSV Analysis Started ==="

# File size & line count check
echo "File size: $(du -h "$CSV_FILE" | cut -f1)"
echo "Total lines: $(wc -l < "$CSV_FILE")"

# Data quality check
echo -e "\n=== Data Quality Check ==="
echo "Empty lines: $(grep -c '^$' "$CSV_FILE")"
echo "Invalid lines: $(awk -F',' 'NF != 5 {count++} END {print count+0}' "$CSV_FILE")"

# Split by month
echo -e "\n=== Splitting by month... ==="
awk -F',' 'NR==1 {header=$0; next}
{
    month=substr($1,1,7);  # Extract YYYY-MM portion
    if(!seen[month]) {
        print header > ("'$OUTPUT_DIR'/sales_" month ".csv");
        seen[month]=1;
    }
    print $0 > ("'$OUTPUT_DIR'/sales_" month ".csv")
}' "$CSV_FILE"

# Create monthly summary
echo -e "\n=== Monthly Summary ==="
find $OUTPUT_DIR -name "sales_*.csv" | sort | while read file; do
    month=$(basename "$file" .csv | cut -d'_' -f2)
    total_sales=$(awk -F',' 'NR>1 {sum+=$4} END {print sum}' "$file")
    record_count=$(expr $(wc -l < "$file") - 1)
    printf "%s: %'d items, Total sales: ¥%'d\n" "$month" "$record_count" "$total_sales"
done

⏱️ Impact: Hours of Excel work completed in minutes, memory shortage resolved

🏭 Industry Case Studies: Professional Practice Examples

Detailed explanation of techniques used in actual projects across industries.

🎮 Game Development: Large-Scale Log Analysis

📋 Challenge

Detect cheating and analyze game balance from 100 GB of daily player-behavior logs in an online game

💡 Solution
#!/bin/bash
# Game log analysis pipeline

# Detect abnormal behavior from one day's logs
analyze_game_logs() {
    local log_date="$1"
    local output_dir="/analysis/$(date +%Y%m%d)"
    mkdir -p "$output_dir"

    echo "=== Game log analysis started: $log_date ==="

    # Step 1: Analyze player behavior patterns
    find /game/logs -name "*${log_date}*.log" -type f | \
        xargs grep -h "PLAYER_ACTION" | \
        awk -F'|' '
        {
            player_id = $3;
            action = $4;
            timestamp = $2;
            value = $5;

            # Detect abnormally frequent actions in a short time
            if (action == "LEVEL_UP") {
                player_levelups[player_id]++;
                if (player_levelups[player_id] > 10) {
                    print "SUSPICIOUS_LEVELUP", player_id, player_levelups[player_id] > "/tmp/cheat_suspects.log";
                }
            }

            # Abnormal currency increase
            if (action == "GOLD_CHANGE" && value > 1000000) {
                print "SUSPICIOUS_GOLD", player_id, value, timestamp > "/tmp/gold_anomaly.log";
            }

            # Player statistics
            player_actions[player_id]++;
            total_actions++;
        }
        END {
            # Players with abnormally high action counts
            avg_actions = total_actions / length(player_actions);
            for (player in player_actions) {
                if (player_actions[player] > avg_actions * 5) {
                    printf "HIGH_ACTIVITY_PLAYER: %s (%d actions, avg: %.1f)\n",
                        player, player_actions[player], avg_actions > "/tmp/high_activity.log";
                }
            }
        }'

    # Merge results and generate report
    {
        echo "=== Game Cheat Detection Report $(date) ==="
        echo ""
        if [[ -s "/tmp/cheat_suspects.log" ]]; then
            echo "🚨 Level-up Anomalies:"
            sort "/tmp/cheat_suspects.log" | uniq -c | sort -rn | head -10
            echo ""
        fi
        if [[ -s "/tmp/gold_anomaly.log" ]]; then
            echo "💰 Gold Anomalies:"
            sort -k3 -rn "/tmp/gold_anomaly.log" | head -10
            echo ""
        fi
        if [[ -s "/tmp/high_activity.log" ]]; then
            echo "⚡ High Activity Players:"
            head -20 "/tmp/high_activity.log"
        fi
    } > "$output_dir/cheat_detection_report.txt"

    # Delete temp files
    rm -f /tmp/{cheat_suspects,gold_anomaly,high_activity}.log

    echo "Report generated: $output_dir/cheat_detection_report.txt"
}

# Execution example: Analyze yesterday's logs
analyze_game_logs "$(date -d yesterday +%Y%m%d)"

Automatically detect cheating from 100GB game logs

📊 Impact
  • Manual investigation: Days → Automated: 30 minutes
  • Cheat detection accuracy: 95%+
  • Operations effort: 80% reduction

🏪 E-commerce & Retail: Customer Behavior Analysis

📋 Challenge

Analyze customer purchase patterns from e-commerce access logs and measure personalization campaign effectiveness

💡 Solution
#!/bin/bash
# E-commerce customer behavior analysis

analyze_customer_journey() {
    local analysis_period="$1"  # YYYY-MM-DD
    local output_dir="/analytics/customer_journey"
    mkdir -p "$output_dir"

    echo "=== Customer Journey Analysis: $analysis_period ==="

    # Session construction and page transition analysis
    find /var/log/nginx -name "access.log*" | \
        xargs grep "$analysis_period" | \
        awk '
        BEGIN {
            # Session boundary: 30 minute interval
            session_timeout = 1800;
        }
        {
            ip = $1;
            url = $7;

            # Session management and page transition recording
            # (simplified: uses processing time rather than the log timestamp)
            current_time = systime();
            if (current_time - last_access[ip] > session_timeout) {
                current_session[ip] = ip "_" current_time;
                sessions[current_session[ip]]["start_time"] = current_time;
                sessions[current_session[ip]]["pages"] = 0;
            }
            session_id = current_session[ip];
            sessions[session_id]["pages"]++;

            # Purchase page access detection
            if (url ~ /\/checkout|\/purchase/) {
                purchase_sessions[session_id] = 1;
                purchase_path_length[sessions[session_id]["pages"]]++;
            }

            # Product page view patterns
            if (url ~ /\/products\/([0-9]+)/) {
                match(url, /\/products\/([0-9]+)/, product_match);
                product_id = product_match[1];
                product_views[product_id]++;
                user_product_views[ip][product_id]++;
            }

            last_access[ip] = current_time;
        }
        END {
            # Output results
            print "=== Session Statistics ===";
            printf "Total Sessions: %d\n", length(sessions);
            printf "Purchase Completed Sessions: %d\n", length(purchase_sessions);
            printf "Purchase Rate: %.2f%%\n", (length(purchase_sessions) * 100.0) / length(sessions);

            print "\n=== Purchase Path Analysis ===";
            for (plen in purchase_path_length) {
                printf "Purchases via %d pages: %d cases\n", plen, purchase_path_length[plen];
            }

            # Generate recommendation data
            print "\n=== Product Co-occurrence Analysis (for Recommendations) ===";
            for (user in user_product_views) {
                products_viewed = "";
                for (product in user_product_views[user]) {
                    products_viewed = products_viewed product " ";
                }
                if (length(user_product_views[user]) >= 2) {
                    print "USER_PRODUCTS:", user, products_viewed;
                }
            }
        }' > "$output_dir/customer_journey_${analysis_period}.txt"

    echo "Customer journey analysis complete: $output_dir/customer_journey_${analysis_period}.txt"
}

# Execution example: Analyze yesterday's customer behavior
analyze_customer_journey "$(date -d yesterday +%Y-%m-%d)"

Analyze e-commerce customer behavior patterns and generate recommendation data

📊 Impact
  • Data analysis effort: 70% reduction
  • Personalization accuracy: 30% improvement
  • Real-time analysis enabled

8. Performance Optimization

Learn techniques to maximize speed and efficiency in large data processing.

⚡ Basic Speed Optimization Techniques

🌐 Locale Setting Optimization

🐌 Slow Method
grep "pattern" large_file.txt

Overhead from UTF-8 processing

⚡ Optimization
LC_ALL=C grep "pattern" large_file.txt

Up to 10x faster with ASCII processing
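The same setting helps other locale-sensitive stages, notably sort. As a sketch, apply it to every stage of a pipeline rather than only to grep:

# Apply the C locale to each locale-sensitive stage of the pipeline
LC_ALL=C grep "ERROR" large_file.txt | LC_ALL=C sort | uniq -c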

📁 File Access Optimization

🐌 Slow Method
find /var -name "*.log" -exec grep "ERROR" {} \;

Spawn new process for each file

⚡ Optimization
find /var -name "*.log" | xargs grep "ERROR"

Significant speedup with batch processing
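find can also batch arguments itself via the + terminator of -exec; a sketch of the equivalent command, which additionally handles unusual filenames safely:

# -exec ... {} + passes many files to a single grep invocation
find /var -name "*.log" -exec grep "ERROR" {} +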

💾 Memory-Efficient Processing

🐌 High Memory Usage
awk '{lines[NR]=$0} END {for(i=1;i<=NR;i++) print lines[i]}' huge.txt

Load all lines into memory

⚡ Streaming Processing
awk '{print $0}' huge.txt

Process line by line to save memory
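The streaming idea matters most when aggregating: keep only running totals instead of every line. A minimal sketch computing an average without storing the input (the column number is an assumption for illustration):

# Keep only a running sum and count; memory use stays constant regardless of file size
awk '{sum += $3; n++} END {if (n) printf "Average of column 3: %.2f over %d lines\n", sum/n, n}' huge.txt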

🚀 Advanced Optimization Techniques

🔄 Parallel Processing Utilization

# Parallel search using multiple CPU cores
find /var/log -name "*.log" -type f | \
    xargs -n 1 -P $(nproc) -I {} \
    bash -c 'echo "Processing: {}"; grep -c "ERROR" "{}"'

Parallel processing with available CPU cores

📊 Efficient Data Pipeline

# Memory-efficient large file processing
stream_process_large_file() {
    local input_file="$1"
    local chunk_size=10000

    # Streaming processing in chunks
    split -l "$chunk_size" "$input_file" /tmp/chunk_

    # Parallel process each chunk
    find /tmp -name "chunk_*" | \
        xargs -n 1 -P 4 -I {} \
        bash -c '
            chunk_file="$1"

            # High-speed processing (LC_ALL=C environment)
            LC_ALL=C awk "
            {
                # Execute only necessary processing
                if (\$0 ~ /ERROR/) error_count++;
                total_lines++;
            }
            END {
                printf \"%s: errors=%d total=%d\n\", FILENAME, error_count, total_lines;
            }" "$chunk_file"

            # Delete processed chunks immediately
            rm "$chunk_file"
        ' -- {}
}

# Execution example
stream_process_large_file "huge_log_file.txt"

Split large files into chunks for parallel streaming processing

⚡ Regular Expression Optimization

❌ Inefficient Regex
grep -E "(error|ERROR|Error|warning|WARNING|Warning)" logfile

Heavy processing with complex alternatives

✅ Optimized Regex
grep -iE "(error|warning)" logfile

Concise with case-insensitive option
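When the search term contains no regex metacharacters at all, grep -F (fixed-string matching) skips the regex engine entirely and is often faster still; a hedged example with an illustrative search string:

# Fixed-string search: no regex compilation or alternation overhead
LC_ALL=C grep -F "Connection timed out" logfile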

📈 Performance Measurement and Monitoring

⏱️ Processing Time Measurement

# Detailed performance measurement
benchmark_command() {
    local command="$1"
    local description="$2"

    echo "=== $description ==="
    echo "Command: $command"

    # Time measurement
    time_result=$(time (eval "$command") 2>&1)
    echo "$time_result"

    # Memory usage measurement (eval is a shell builtin, so run the command through bash -c)
    /usr/bin/time -v bash -c "$command" 2>&1 | \
        grep -E "(Maximum resident set size|User time|System time)"

    echo "Processing complete"
    echo
}

# Usage example
benchmark_command "find /var/log -name '*.log' | xargs grep -c ERROR" "Traditional method"
benchmark_command "find /var/log -name '*.log' -print0 | xargs -0 grep -c ERROR" "NULL delimiter optimization"

Detailed measurement of processing time and memory usage

📊 Resource Usage Monitoring

# Processing with real-time resource monitoring
monitor_and_process() {
    local log_pattern="$1"
    local output_file="$2"

    # Start resource monitoring in background
    {
        while true; do
            timestamp=$(date '+%H:%M:%S')
            cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
            mem_usage=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}')
            echo "$timestamp: CPU ${cpu_usage}%, Memory ${mem_usage}%"
            sleep 5
        done
    } &
    monitor_pid=$!

    # Execute main processing
    echo "Processing started: $(date)"
    find /var/log -name "$log_pattern" | \
        xargs grep -h "ERROR" | \
        awk '{
            error_patterns[$0]++;
            total_errors++;
        }
        END {
            printf "Total Errors: %d\n", total_errors;
            printf "Unique Error Patterns: %d\n", length(error_patterns);
            print "\n=== Top 5 Frequent Errors ===";
            PROCINFO["sorted_in"] = "@val_num_desc";
            count = 0;
            for (pattern in error_patterns) {
                printf "%s: %d times\n", pattern, error_patterns[pattern];
                if (++count >= 5) break;
            }
        }' > "$output_file"

    # Terminate monitoring process
    kill $monitor_pid 2>/dev/null

    echo "Processing complete: $(date)"
}

# Execution example
monitor_and_process "*.log" "/tmp/error_analysis.txt"

Real-time monitoring of CPU and memory usage during processing