find/grep/awk Master Series Practical
Combinations & Real-World Applications
Detailed explanation of practical data processing patterns combining find, grep, and awk, along with real-world use cases. Master practical skills for engineers and data analysts.
6. Combination Techniques
True Linux masters use find, grep, and awk in combination: tasks that are awkward with any single command become straightforward once the tools are chained together.
🔗 Basic Pipe Connection Patterns
find + grep Combination
find /var/log -name "*.log" -exec grep -l "ERROR" {} \;
Identify log files containing ERROR
find /home -name "*.txt" | xargs grep -n "password"
Search for "password" in txt files with line numbers
grep + awk Combination
grep "ERROR" /var/log/app.log | awk '{print $1, $2, $NF}'
Extract date, time, and last field from error lines
ps aux | grep "nginx" | awk '{sum+=$4} END {print "Total CPU usage:", sum "%"}'
Sum CPU usage of nginx processes
find + awk Combination
find /var -name "*.log" -printf "%s %p\n" | awk '{size+=$1; count++} END {printf "Total size: %.2f MB Files: %d\n", size/1024/1024, count}'
Calculate total size and count of log files
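As a sketch of the same find + awk pattern, the total can also be broken down by subdirectory (assuming paths of the form /var/&lt;dir&gt;/...):
find /var -name "*.log" -printf "%s %p\n" | awk '{split($2, p, "/"); dir = "/" p[2] "/" p[3]; size[dir] += $1} END {for (d in size) printf "%-20s %.2f MB\n", d, size[d]/1024/1024}'
Log size per directory under /var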
🎯 Production-Level Complex Processing
📊 Scenario 1: Web Server Access Analysis
Goal: Extract top 10 IP addresses with most errors from last week's access logs
Solution:
find /var/log/apache2 -name "access.log*" -mtime -7 | \
xargs grep " 5[0-9][0-9] " | \
awk '{print $1}' | \
sort | uniq -c | \
sort -rn | \
head -10 | \
awk '{printf "%-15s %d times\n", $2, $1}'
Step Explanation:
- find: Search for access log files modified within the last 7 days
- grep: Extract 5xx (server error) responses
- awk: Extract only the IP address (1st column)
- sort | uniq -c: Count occurrences per IP address
- sort -rn: Sort by count in descending order
- head -10: Keep the top 10
- awk: Format the output for readability
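When building a pipeline like this, it helps to verify the intermediate output before adding the aggregation stages, for example:
find /var/log/apache2 -name "access.log*" -mtime -7 | \
xargs grep " 5[0-9][0-9] " | \
awk '{print $1}' | head -5
Spot-check that the extracted field really is the client IP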
🔍 Scenario 2: Safe Bulk Deletion of Old Temp Files
Goal: Safely delete temporary files older than 30 days from entire system
Solution:
# 1. First confirm target files
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -exec ls -la {} +
# 2. Execute deletion after confirming safety
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -size +0 | \
xargs -I {} bash -c 'echo "Deleting: $1"; rm -- "$1"' _ {}
Safe Deletion Procedure:
- First list and verify deletion targets
- Target only files older than 30 days AND larger than 0 bytes
- Display filename before deletion (for logging)
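As an alternative sketch, GNU find can print and delete in a single pass with the same criteria, avoiding the extra xargs step:
find /tmp /var/tmp /home \( -name "*.tmp" -o -name "temp*" -o -name "*.temp" \) \
-type f -mtime +30 -size +0 -print -delete
-print before -delete leaves a log of every file removed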
📈 Scenario 3: Database Connection Log Analysis
Goal: Analyze MySQL connection count by time period
Solution:
find /var/log/mysql -name "*.log" -mtime -1 | \
xargs grep -h "Connect" | \
awk '{
# Extract the hour (e.g. 2025-01-15T14:30:25.123456Z); 3-argument match() requires gawk
match($0, /[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2})/, time_parts);
hour = time_parts[1] + 0;   # force numeric so "04" matches the loop index 4 below
connections[hour]++;
}
END {
print "MySQL Connections by Hour (Last 24 Hours)";
print "================================";
for (h = 0; h < 24; h++) {
printf "%02d:00-%02d:59 | ", h, h;
count = (h in connections) ? connections[h] : 0;
printf "%5d times ", count;
# Simple graph display
for (i = 0; i < count/10; i++) printf "▓";
printf "\n";
}
}'
Advanced Processing Points:
- Time extraction using regex
- Hourly aggregation using associative arrays
- Visual graph display
- Complete 24-hour display including zero-count hours
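The three-argument match() used above is a gawk extension; with a plain POSIX awk, the hour can be taken from the fixed-width ISO timestamp with substr() instead (a sketch assuming the timestamp is the first field):
find /var/log/mysql -name "*.log" -mtime -1 | xargs grep -h "Connect" | \
awk '{hour = substr($1, 12, 2) + 0; connections[hour]++} END {for (h = 0; h < 24; h++) printf "%02d:00 %d\n", h, connections[h]}'
Same hourly count without gawk-specific features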
⚡ One-Liner Technique Collection
A collection of practical one-liners you can put to work right away.
💾 Disk & File Management
find . -type f -exec du -h {} + | sort -rh | head -20
Top 20 largest files
find /var -name "*.log" -mtime +7 -exec ls -lh {} \; | awk '{size+=$5} END {print "Deletable size:", size/1024/1024 "MB"}'
Calculate total size of old log files
🌐 Network & Access Analysis
grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
Top 10 IP addresses with most access today
find /var/log -name "*.log" | xargs grep -h "Failed password" | awk '{print $11}' | sort | uniq -c | sort -rn
IP addresses with most SSH login failures
📊 System Monitoring
find /proc -maxdepth 2 -name "status" 2>/dev/null | xargs grep -l "VmRSS" | xargs -I {} bash -c 'echo -n "$(basename $(dirname {})): "; grep VmRSS {}'
Memory usage per process
find /var/log -name "syslog*" | xargs grep "$(date '+%b %d')" | grep -i "error\|warn\|fail" | awk '{print $5}' | sort | uniq -c | sort -rn
Today's system error/warning source statistics
🏗️ Pipeline Design Patterns: The Art of Data Flow Design
Professional techniques for designing efficient and maintainable pipelines for complex data processing.
🔄 Error Handling and Recovery Patterns
In production environments, failures are expected; pipelines need deliberate error handling and must keep processing whatever they can.
🛡️ Failure-Proof Pipeline Design
# Detect mid-process failures with pipefail
set -euo pipefail
# Error handling function
handle_error() {
echo "ERROR: Pipeline processing error occurred (line: $1)" >&2
echo "ERROR: Check intermediate files: /tmp/pipeline_*" >&2
exit 1
}
# Set error trap
trap 'handle_error $LINENO' ERR
# Safe pipeline processing
process_logs_safely() {
local input_pattern="$1"
local output_file="$2"
local temp_dir="/tmp/pipeline_$$"
# Create temp directory
mkdir -p "$temp_dir"
# Step 1: File collection (skip on failure)
echo "Step 1: Collecting log files..."
find /var/log -name "$input_pattern" -type f 2>/dev/null > "$temp_dir/file_list" || {
echo "WARNING: Could not access some files" >&2
}
# Handle no files found
if [[ ! -s "$temp_dir/file_list" ]]; then
echo "ERROR: No files to process found" >&2
rm -rf "$temp_dir"
return 1
fi
# Step 2: Data processing (process each file individually)
echo "Step 2: Processing data..."
while IFS= read -r logfile; do
if [[ -r "$logfile" ]]; then
grep -h "ERROR\|WARN" "$logfile" 2>/dev/null >> "$temp_dir/errors.log" || true
else
echo "WARNING: Could not read $logfile" >&2
fi
done < "$temp_dir/file_list"
# Step 3: Aggregation processing
echo "Step 3: Aggregating..."
if [[ -s "$temp_dir/errors.log" ]]; then
awk '
{
# Error pattern extraction and aggregation
if ($0 ~ /ERROR/) error_count++;
if ($0 ~ /WARN/) warn_count++;
# Hourly aggregation
if (match($0, /[0-9]{2}:[0-9]{2}:[0-9]{2}/, time_match)) {
hour = substr(time_match[0], 1, 2) + 0;   # numeric hour so it matches the END loop index
hourly_errors[hour]++;
}
}
END {
printf "Error Statistics Report\n";
printf "==================\n";
printf "ERROR: %d items\n", error_count;
printf "WARN: %d items\n", warn_count;
printf "\nErrors by Hour:\n";
for (h = 0; h < 24; h++) {
printf "%02d hour: %d items\n", h, (h in hourly_errors) ? hourly_errors[h] : 0;
}
}' "$temp_dir/errors.log" > "$output_file"
else
echo "No error logs found" > "$output_file"
fi
# Delete temp files
rm -rf "$temp_dir"
echo "Processing complete: Results output to $output_file"
}
# Execution example
process_logs_safely "*.log" "/tmp/error_report.txt"
Robust pipeline with error handling, file existence checks, and temp file management
⚡ Performance Optimization Patterns
Pipeline design patterns that balance large data volumes with high-speed processing.
🚀 Parallel Processing Pipeline
# Parallelize CPU-intensive processing
parallel_log_analysis() {
local log_pattern="$1"
local output_dir="$2"
local cpu_cores=$(nproc)
local max_parallel=$((cpu_cores - 1)) # Consider system load
echo "Starting parallel processing: Max ${max_parallel} processes"
# Discover log files and distribute to worker processes
find /var/log -name "$log_pattern" -type f | \
xargs -n 1 -P "$max_parallel" -I {} bash -c '
logfile="$1"
output_dir="$2"
worker_id="$$"
echo "Worker $worker_id: Starting processing $logfile"
# Aggregation processing (CPU intensive)
result_file="$output_dir/result_$worker_id.tmp"
# Complex regex processing
grep -E "([0-9]{1,3}\.){3}[0-9]{1,3}" "$logfile" | \
awk "
BEGIN {
FS=\" \";
worker_id=\"$worker_id\";
}
{
# IP region determination (simplified)
ip = \$1;
if (match(ip, /^192\.168\./)) region = \"local\";
else if (match(ip, /^10\./)) region = \"internal\";
else if (match(ip, /^172\.(1[6-9]|2[0-9]|3[01])\./)) region = \"internal\";
else region = \"external\";
# Time period analysis
if (match(\$4, /:([0-9]{2}):[0-9]{2}:[0-9]{2}/, time_parts)) {
hour = time_parts[1] + 0;
access_by_region_hour[region, hour]++;
}
total_by_region[region]++;
}
END {
printf \"# Worker %s results\n\", worker_id;
for (region in total_by_region) {
printf \"region:%s total:%d\n\", region, total_by_region[region];
for (hour = 0; hour < 24; hour++) {
count = ((region, hour) in access_by_region_hour) ? access_by_region_hour[region, hour] : 0;
if (count > 0) {
printf \"region:%s hour:%02d count:%d\n\", region, hour, count;
}
}
}
}" > "$result_file"
echo "Worker $worker_id: Processing complete"
' -- {} "$output_dir"
# Merge all worker results
echo "Merging results..."
merge_worker_results "$output_dir"
}
merge_worker_results() {
local output_dir="$1"
local final_report="$output_dir/final_analysis.txt"
# Aggregate all worker results
find "$output_dir" -name "result_*.tmp" | \
xargs cat | \
awk '
/^region:.*total:/ {
split($0, parts, " ");
region = substr(parts[1], 8); # Remove "region:"
total = substr(parts[2], 7); # Remove "total:"
region_totals[region] += total;
}
/^region:.*hour:.*count:/ {
split($0, parts, " ");
region = substr(parts[1], 8);
hour = substr(parts[2], 6) + 0;   # numeric hour so it matches the loop index below
count = substr(parts[3], 7);
region_hour_counts[region, hour] += count;
}
END {
print "=== Regional Access Analysis via Parallel Processing ===";
print "";
for (region in region_totals) {
printf "Region: %s (Total: %d)\n", region, region_totals[region];
printf "Hourly Details:\n";
for (hour = 0; hour < 24; hour++) {
count = ((region, hour) in region_hour_counts) ? region_hour_counts[region, hour] : 0;
if (count > 0) {
# Simple graph display
bar_length = int(count / 10);
if (bar_length > 50) bar_length = 50;
printf " %02d hour: %5d ", hour, count;
for (i = 0; i < bar_length; i++) printf "▓";
printf "\n";
}
}
printf "\n";
}
}' > "$final_report"
# Cleanup temp files
find "$output_dir" -name "result_*.tmp" -delete
echo "Final report generated: $final_report"
}
# Execution example
mkdir -p /tmp/parallel_analysis
parallel_log_analysis "access.log*" "/tmp/parallel_analysis"
Accelerate log analysis with multi-core parallel processing and merge results
7. Real-World Use Cases
Practice over theory! Here is how these tools are used in real work, organized by job role.
💻 Web Engineers
🚨 Emergency Production Issue Response
Situation: "Site is slow" reported. Need to identify cause quickly
Response Procedure:
1. Check Error Logs
find /var/log/apache2 /var/log/nginx -name "*.log" | xargs grep -E "$(date '+%d/%b/%Y')" | grep -E "5[0-9][0-9]|error|timeout" | tail -50
2. Identify Slow Queries
find /var/log/mysql -name "*slow.log" | xargs grep -A 5 "Query_time" | awk '/Query_time: [5-9]/ {getline; print}'
3. Detect Abnormal Access Patterns
grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log | awk '{print $1}' | sort | uniq -c | awk '$1 > 1000 {print "Abnormal access:", $2, "Count:", $1}'
⏱️ Impact: Tasks that would take 30-60 minutes manually completed in 5 minutes
📊 Monthly Report Generation
Situation: Need to compile last month's access statistics and error rates
# Access statistics report generation script
#!/bin/bash
LAST_MONTH=$(date -d "last month" '+%b/%Y')
echo "=== $LAST_MONTH Access Statistics Report ==="
echo
# Total access count
TOTAL_ACCESS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | wc -l)
echo "Total Access: $TOTAL_ACCESS"
# Unique visitors
UNIQUE_VISITORS=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $1}' | sort -u | wc -l)
echo "Unique Visitors: $UNIQUE_VISITORS"
# Error rate
ERROR_COUNT=$(find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | grep -E " [45][0-9][0-9] " | wc -l)
ERROR_RATE=$(echo "scale=2; $ERROR_COUNT * 100 / $TOTAL_ACCESS" | bc)
echo "Error Rate: $ERROR_RATE%"
# Top 10 popular pages
echo -e "\n=== Top 10 Popular Pages ==="
find /var/log/apache2 -name "access.log*" | xargs grep "$LAST_MONTH" | awk '{print $7}' | grep -v "\.css\|\.js\|\.png\|\.jpg" | sort | uniq -c | sort -rn | head -10 | awk '{printf "%-50s %d times\n", $2, $1}'
⏱️ Impact: Excel work that took half a day automated in 3 minutes
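A report like this is usually scheduled via cron; a sketch assuming the script above is saved as /usr/local/bin/monthly_report.sh and /var/reports exists (both hypothetical paths):
# Run at 06:00 on the 1st of every month (% must be escaped inside crontab)
0 6 1 * * /usr/local/bin/monthly_report.sh > /var/reports/access_$(date +\%Y\%m).txt 2>&1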
🛠️ Infrastructure Engineers
🖥️ Server Monitoring & Maintenance
Situation: Regularly check health status of multiple servers
# Server health status check script
#!/bin/bash
echo "=== Server Health Status Report ==="
date
echo
# Disk usage warning
echo "=== Disk Usage (Warning at 80%+) ==="
df -h | awk 'NR>1 {gsub(/%/, "", $5); if ($5+0 > 80) printf "⚠️ %s: %s used (%s%%)\n", $6, $3, $5}'
# Memory usage
echo -e "\n=== Memory Usage ==="
free -m | awk 'NR==2{printf "Memory Usage: %.1f%% (%dMB / %dMB)\n", $3*100/$2, $3, $2}'
# High CPU usage processes
echo -e "\n=== Top 5 CPU Usage ==="
ps aux --no-headers | sort -rn -k3 | head -5 | awk '{printf "%-10s %5.1f%% %s\n", $1, $3, $11}'
# Error log surge check
echo -e "\n=== Last Hour Error Count ==="
find /var/log -name "*.log" -mmin -60 | xargs grep -h -E "$(date '+%b %d %H')|$(date -d '1 hour ago' '+%b %d %H')" | grep -ci error
⏱️ Impact: Manual checks taking 1 hour automated for 24/7 monitoring
📈 Data Analysts
📊 Large Data Preprocessing
Situation: Multi-GB CSV file cannot be opened in Excel. Need preprocessing
# Large CSV analysis & preprocessing script
#!/bin/bash
CSV_FILE="sales_data_2024.csv"
OUTPUT_DIR="processed_data"
mkdir -p $OUTPUT_DIR
echo "=== Large CSV Analysis Started ==="
# File size & line count check
echo "File size: $(du -h "$CSV_FILE" | cut -f1)"
echo "Total lines: $(wc -l < "$CSV_FILE")"
# Data quality check
echo -e "\n=== Data Quality Check ==="
echo "Empty lines: $(grep -c '^$' "$CSV_FILE")"
echo "Invalid lines: $(awk -F',' 'NF != 5 {count++} END {print count+0}' "$CSV_FILE")"
# Split by month
echo -e "\n=== Splitting by month... ==="
awk -F',' 'NR==1 {header=$0; next}
{
month = substr($1,1,7); # Extract YYYY-MM portion
outfile = "'$OUTPUT_DIR'/sales_" month ".csv";
if (!seen[month]) {
print header > outfile;
seen[month] = 1;
}
print $0 > outfile;
}' "$CSV_FILE"
# Create monthly summary
echo -e "\n=== Monthly Summary ==="
find $OUTPUT_DIR -name "sales_*.csv" | sort | while read file; do
month=$(basename "$file" .csv | cut -d'_' -f2)
total_sales=$(awk -F',' 'NR>1 {sum+=$4} END {print sum}' "$file")
record_count=$(( $(wc -l < "$file") - 1 ))
printf "%s: %'d items, Total sales: ¥%'d\n" "$month" "$record_count" "$total_sales"
done
⏱️ Impact: Hours of Excel work completed in minutes, memory shortage resolved
🏭 Industry Case Studies: Professional Practice Examples
Detailed explanation of techniques used in actual projects across industries.
🎮 Game Development: Large-Scale Log Analysis
📋 Challenge
Detect cheating and analyze game balance from 100GB daily player behavior logs in online game
💡 Solution
# Game log analysis pipeline
#!/bin/bash
# Detect abnormal behavior from one day's logs
analyze_game_logs() {
local log_date="$1"
local output_dir="/analysis/$(date +%Y%m%d)"
mkdir -p "$output_dir"
echo "=== Game log analysis started: $log_date ==="
# Step 1: Analyze player behavior patterns
find /game/logs -name "*${log_date}*.log" -type f | \
xargs grep -h "PLAYER_ACTION" | \
awk -F'|' '
{
player_id = $3;
action = $4;
timestamp = $2;
value = $5;
# Detect abnormally frequent actions in short time
if (action == "LEVEL_UP") {
player_levelups[player_id]++;
if (player_levelups[player_id] > 10) {
print "SUSPICIOUS_LEVELUP", player_id, player_levelups[player_id] > "/tmp/cheat_suspects.log";
}
}
# Abnormal currency increase
if (action == "GOLD_CHANGE" && value > 1000000) {
print "SUSPICIOUS_GOLD", player_id, value, timestamp > "/tmp/gold_anomaly.log";
}
# Player statistics
player_actions[player_id]++;
total_actions++;
}
END {
# Players with abnormally high action counts
avg_actions = total_actions / length(player_actions);
for (player in player_actions) {
if (player_actions[player] > avg_actions * 5) {
printf "HIGH_ACTIVITY_PLAYER: %s (%d actions, avg: %.1f)\n",
player, player_actions[player], avg_actions > "/tmp/high_activity.log";
}
}
}'
# Merge results and generate report
{
echo "=== Game Cheat Detection Report $(date) ==="
echo ""
if [[ -s "/tmp/cheat_suspects.log" ]]; then
echo "🚨 Level-up Anomalies:"
sort "/tmp/cheat_suspects.log" | uniq -c | sort -rn | head -10
echo ""
fi
if [[ -s "/tmp/gold_anomaly.log" ]]; then
echo "💰 Gold Anomalies:"
sort -k3 -rn "/tmp/gold_anomaly.log" | head -10
echo ""
fi
if [[ -s "/tmp/high_activity.log" ]]; then
echo "⚡ High Activity Players:"
head -20 "/tmp/high_activity.log"
fi
} > "$output_dir/cheat_detection_report.txt"
# Delete temp files
rm -f /tmp/{cheat_suspects,gold_anomaly,high_activity}.log
echo "Report generated: $output_dir/cheat_detection_report.txt"
}
# Execution example: Analyze yesterday's logs
analyze_game_logs "$(date -d yesterday +%Y%m%d)"
Automatically detect cheating from 100GB game logs
📊 Impact
- Manual investigation: Days → Automated: 30 minutes
- Cheat detection accuracy: 95%+
- Operations effort: 80% reduction
🏪 E-commerce & Retail: Customer Behavior Analysis
📋 Challenge
Analyze customer purchase patterns from e-commerce access logs and measure personalization campaign effectiveness
💡 Solution
# E-commerce customer behavior analysis
#!/bin/bash
analyze_customer_journey() {
local analysis_period="$1" # YYYY-MM-DD
local output_dir="/analytics/customer_journey"
mkdir -p "$output_dir"
echo "=== Customer Journey Analysis: $analysis_period ==="
# Session construction and page transition analysis
find /var/log/nginx -name "access.log*" | \
xargs grep "$analysis_period" | \
awk '
BEGIN {
# Session boundary: 30 minute interval
session_timeout = 1800;
}
{
ip = $1;
url = $7;
# Session management and page transition recording
# Derive the timestamp from the log line itself ($4, e.g. [15/Jan/2025:14:30:25); mktime() requires gawk
split(substr($4, 2), t, /[\/:]/);
m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3;
current_time = mktime(t[3] " " m " " t[1] " " t[4] " " t[5] " " t[6]);
if (current_time - last_access[ip] > session_timeout) {
session_id[ip] = ip "_" current_time;
sessions[session_id[ip]]["start_time"] = current_time;
sessions[session_id[ip]]["pages"] = 0;
}
sessions[session_id[ip]]["pages"]++;
# Purchase page access detection
if (url ~ /\/checkout|\/purchase/) {
purchase_sessions[session_id[ip]] = 1;
purchase_path_length[sessions[session_id[ip]]["pages"]]++;
}
# Product page view patterns
if (url ~ /\/products\/([0-9]+)/) {
match(url, /\/products\/([0-9]+)/, product_match);
product_id = product_match[1];
product_views[product_id]++;
user_product_views[ip][product_id]++;
}
last_access[ip] = current_time;
}
END {
# Output results
print "=== Session Statistics ===";
printf "Total Sessions: %d\n", length(sessions);
printf "Purchase Completed Sessions: %d\n", length(purchase_sessions);
printf "Purchase Rate: %.2f%%\n", (length(purchase_sessions) * 100.0) / length(sessions);
print "\n=== Purchase Path Analysis ===";
for (plen in purchase_path_length) {
printf "Purchases via %d pages: %d cases\n", plen, purchase_path_length[plen];
}
# Generate recommendation data
print "\n=== Product Co-occurrence Analysis (for Recommendations) ===";
for (user in user_product_views) {
products_viewed = "";
for (product in user_product_views[user]) {
products_viewed = products_viewed product " ";
}
if (length(user_product_views[user]) >= 2) {
print "USER_PRODUCTS:", user, products_viewed;
}
}
}' > "$output_dir/customer_journey_${analysis_period}.txt"
echo "Customer journey analysis complete: $output_dir/customer_journey_${analysis_period}.txt"
}
# Execution example: Analyze yesterday's customer behavior
analyze_customer_journey "$(date -d yesterday +%Y-%m-%d)"
Analyze e-commerce customer behavior patterns and generate recommendation data
📊 Impact
- Data analysis effort: 70% reduction
- Personalization accuracy: 30% improvement
- Real-time analysis enabled
8. Performance Optimization
Learn techniques to maximize speed and efficiency in large data processing.
⚡ Basic Speed Optimization Techniques
🌐 Locale Setting Optimization
🐌 Slow Method
grep "pattern" large_file.txt
Overhead from UTF-8 processing
⚡ Optimization
LC_ALL=C grep "pattern" large_file.txt
Up to 10x faster with ASCII processing
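To check the effect on your own data, time both forms directly (assuming a large ASCII log named large_file.txt):
time grep -c "pattern" large_file.txt
time LC_ALL=C grep -c "pattern" large_file.txt
Compare elapsed time with and without the C locale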
📁 File Access Optimization
🐌 Slow Method
find /var -name "*.log" -exec grep "ERROR" {} \;
Spawn new process for each file
⚡ Optimization
find /var -name "*.log" | xargs grep "ERROR"
Significant speedup with batch processing
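GNU find can also batch arguments itself with the + terminator, which behaves like xargs but handles unusual file names safely:
find /var -name "*.log" -exec grep "ERROR" {} +
One grep invocation per batch of files, no xargs needed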
💾 Memory-Efficient Processing
🐌 High Memory Usage
awk '{lines[NR]=$0} END {for(i=1;i<=NR;i++) print lines[i]}' huge.txt
Load all lines into memory
⚡ Streaming Processing
awk '{print $0}' huge.txt
Process line by line to save memory
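If the task genuinely needs the last N lines, a rolling buffer keeps memory bounded to N lines instead of the whole file (a minimal sketch):
awk -v n=10 '{buf[NR % n] = $0} END {for (i = NR - n + 1; i <= NR; i++) if (i > 0) print buf[i % n]}' huge.txt
tail-like output while storing at most 10 lines in memory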
🚀 Advanced Optimization Techniques
🔄 Parallel Processing Utilization
# Parallel search using multiple CPU cores
find /var/log -name "*.log" -type f | \
xargs -n 1 -P "$(nproc)" -I {} \
bash -c 'echo "Processing: $1"; grep -c "ERROR" "$1"' _ {}
Parallel processing with available CPU cores
📊 Efficient Data Pipeline
# Memory-efficient large file processing
stream_process_large_file() {
local input_file="$1"
local chunk_size=10000
# Streaming processing in chunks
split -l "$chunk_size" "$input_file" /tmp/chunk_
# Parallel process each chunk
find /tmp -name "chunk_*" | \
xargs -n 1 -P 4 -I {} \
bash -c '
chunk_file="$1"
# High-speed processing (LC_ALL=C environment)
LC_ALL=C awk "
{
# Execute only necessary processing
if (\$0 ~ /ERROR/) error_count++;
total_lines++;
}
END {
printf \"%s: errors=%d total=%d\n\", FILENAME, error_count, total_lines;
}" "$chunk_file"
# Delete processed chunks immediately
rm "$chunk_file"
' -- {}
}
# Execution example
stream_process_large_file "huge_log_file.txt"
Split large files into chunks for parallel streaming processing
⚡ Regular Expression Optimization
❌ Inefficient Regex
grep -E "(error|ERROR|Error|warning|WARNING|Warning)" logfile
Heavy processing due to many alternation branches
✅ Optimized Regex
grep -iE "(error|warning)" logfile
Concise with case-insensitive option
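When the search term is a fixed string rather than a pattern, grep -F skips the regex engine entirely and is usually faster still:
grep -F "ERROR" logfile
Fixed-string matching, no regular expression overhead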
📈 Performance Measurement and Monitoring
⏱️ Processing Time Measurement
# Detailed performance measurement
benchmark_command() {
local command="$1"
local description="$2"
echo "=== $description ==="
echo "Command: $command"
# Wall-clock time measurement (the bash time keyword writes to stderr)
{ time eval "$command" > /dev/null; } 2>&1 | grep -E "^(real|user|sys)"
# Memory usage measurement (GNU time)
/usr/bin/time -v bash -c "$command" 2>&1 | grep -E "(Maximum resident set size|User time|System time)"
echo "Processing complete"
echo
}
# Usage example
benchmark_command "find /var/log -name '*.log' | xargs grep -c ERROR" "Traditional method"
benchmark_command "find /var/log -name '*.log' -print0 | xargs -0 grep -c ERROR" "NULL delimiter optimization"
Detailed measurement of processing time and memory usage
📊 Resource Usage Monitoring
# Processing with real-time resource monitoring
monitor_and_process() {
local log_pattern="$1"
local output_file="$2"
# Start resource monitoring in background
{
while true; do
timestamp=$(date '+%H:%M:%S')
cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
mem_usage=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}')
echo "$timestamp: CPU ${cpu_usage}%, Memory ${mem_usage}%"
sleep 5
done
} &
monitor_pid=$!
# Execute main processing
echo "Processing started: $(date)"
find /var/log -name "$log_pattern" | \
xargs grep -h "ERROR" | \
awk '{
error_patterns[$0]++;
total_errors++;
}
END {
printf "Total Errors: %d\n", total_errors;
printf "Unique Error Patterns: %d\n", length(error_patterns);
print "\n=== Top 5 Frequent Errors ===";
PROCINFO["sorted_in"] = "@val_num_desc";
count = 0;
for (pattern in error_patterns) {
printf "%s: %d times\n", pattern, error_patterns[pattern];
if (++count >= 5) break;
}
}' > "$output_file"
# Terminate monitoring process
kill $monitor_pid 2>/dev/null
echo "Processing complete: $(date)"
}
# Execution example
monitor_and_process "*.log" "/tmp/error_analysis.txt"
Real-time monitoring of CPU and memory usage during processing