Process Management Practical: Advanced Control and Monitoring Techniques

Process Management Practical - System Monitoring and Optimization

After mastering the basics of process management, it is time to pick up practical operational techniques. This article covers the skills needed in real-world work: job control, priority management, system monitoring, and troubleshooting.

Table of Contents

  1. Job Control
  2. nice/renice - Priority Management
  3. System Monitoring Tools
  4. Troubleshooting

1. Job Control

Managing background and foreground jobs in the shell.

Background Execution

Start Command in Background

$ long_running_command &
[1] 12345

Adding & at the end of a command runs it in the background.
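
Inside a script, the job number is less useful than the PID of the background process, which the shell stores in `$!`. The following is a minimal sketch, with `sleep` standing in for the long-running command (an assumption for illustration):

```shell
# Start a stand-in for a long-running command in the background.
sleep 1 &
bg_pid=$!            # PID of the most recently started background command

echo "started background job with PID $bg_pid"

# wait blocks until that PID exits and returns its exit status.
wait "$bg_pid"
bg_status=$?
echo "background job exited with status $bg_status"
```

Keeping the PID in a variable lets the script wait for, signal, or inspect exactly that process later.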

Move Running Command to Background

$ long_running_command
^Z                    # Suspend with Ctrl+Z
[1]+  Stopped     long_running_command
$ bg                  # Resume in background
[1]+ long_running_command &

Job Management Commands

List Jobs

$ jobs
[1]-  Running     command1 &
[2]+  Stopped     command2

Bring to Foreground

$ fg %1    # Bring job number 1 to foreground

Resume in Background

$ bg %2    # Resume stopped job 2 in background

Continue After Logout with nohup

$ nohup long_running_command &
$ disown %1    # Detach from current shell

💡 Practical Tips

  • nohup: Makes the command ignore SIGHUP so it keeps running after an SSH disconnection
  • disown: Removes the job from the shell's job table so the shell no longer sends it SIGHUP
  • screen/tmux: Recommended for more advanced session management

2. nice/renice - Priority Management

Adjust process execution priority (nice value).

Nice Value Range

  • -20: Highest priority (requires root privileges)
  • 0: Default priority
  • +19: Lowest priority

Smaller values mean higher priority.
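
Run without arguments, `nice` prints the niceness it inherited, which makes the direction of the scale easy to check. A small sketch (the variable names are illustrative):

```shell
# Niceness of the current shell (0 unless it was started under nice).
base=$(nice)

# nice -n 5 adds 5 to the inherited value; the inner nice prints the result.
child=$(nice -n 5 nice)

echo "shell niceness: $base, child niceness: $child"
```

Note that the argument to `nice -n` is an adjustment relative to the current niceness, and the result is capped at 19.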

Start Process with Low Priority

$ nice -n 10 backup_script.sh

Execute backup script with nice value 10 (low priority)

Change Running Process Priority

$ renice +5 -p 1234

Change nice value of process PID 1234 to 5
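
To confirm that a renice took effect, compare the nice value before and after. The sketch below assumes Linux, reads the value from /proc, and uses `sleep` as a stand-in for the running process; `nice_of` is a hypothetical helper name. Depending on the renice implementation, the value given to `-n` is treated as absolute or as an increment; starting from the default of 0, both give 5.

```shell
# Print the nice value of a PID (field 19 of /proc/PID/stat on Linux).
# This simple parse assumes the command name contains no spaces.
nice_of() {
    awk '{ print $19 }' "/proc/$1/stat"
}

# Stand-in for an already running process.
sleep 5 &
pid=$!

before=$(nice_of "$pid")
# Raising the nice value (lowering priority) needs no root privileges.
renice -n 5 -p "$pid" > /dev/null
after=$(nice_of "$pid")

echo "PID $pid: nice $before -> $after"
kill "$pid"
```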

Execute with High Priority (root privileges)

$ sudo nice -n -10 critical_process

Execute critical process with high priority

3. System Monitoring Tools

Overall System Load Check

uptime - System Uptime and Load

$ uptime
14:30:01 up 5 days, 2:15, 3 users, load average: 0.15, 0.25, 0.20

Load average: average system load over the last 1, 5, and 15 minutes
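
A load average only means something relative to the number of CPU cores. As a rough sketch, assuming Linux (/proc/loadavg) and the coreutils `nproc` command, the 1-minute load can be normalized per core; `load_per_core` is an illustrative name:

```shell
# Print the 1-minute load average divided by the number of CPU cores.
load_per_core() {
    cores=$(nproc)
    # /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
    awk -v cores="$cores" '{ printf "%.2f\n", $1 / cores }' /proc/loadavg
}

ratio=$(load_per_core)
echo "1-minute load per core: $ratio"
```

A value that stays near or above 1.00 suggests that tasks are queuing for CPU or, on Linux, waiting in uninterruptible I/O.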

vmstat - Virtual Memory Statistics

$ vmstat 1 5    # Display 5 times at 1-second intervals

CPU, memory, I/O, and swap statistics

iostat - I/O Statistics

$ iostat -x 1 5    # Detailed I/O statistics

Detailed disk I/O statistics

sar - System Activity Report

$ sar -u 1 5    # CPU usage
$ sar -r 1 5    # Memory usage

Comprehensive system performance data

Process Detailed Information

/proc Filesystem

$ cat /proc/1234/status                  # Process status
$ tr '\0' ' '  < /proc/1234/cmdline      # Command line (arguments are NUL-separated)
$ tr '\0' '\n' < /proc/1234/environ      # Environment variables (NUL-separated)
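
These files can be combined into a short per-process summary. The sketch below assumes Linux; `proc_summary` is a hypothetical helper, and the selected fields are just one reasonable choice:

```shell
# Print selected fields from /proc/PID/status and the full command line.
proc_summary() {
    pid=$1
    grep -E '^(Name|State|PPid|VmRSS):' "/proc/$pid/status"
    # cmdline is NUL-separated, so turn the separators into spaces.
    printf 'Cmdline:\t%s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
}

proc_summary $$    # $$ is the PID of the current shell
```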

Files Opened by Process

$ lsof -p 1234               # Specific process
$ lsof /var/log/syslog       # Specific file

Network Connections

$ netstat -tulpn             # Listening TCP/UDP sockets with owning processes
$ ss -tulpn                  # Same options with ss, the faster alternative

4. Troubleshooting

Case 1: High CPU Usage

Diagnostic Procedure

# 1. Identify high CPU processes
$ top -o %CPU
$ ps aux --sort=-%cpu | head -10

# 2. Investigate process details
$ strace -p PID    # Trace system calls

Case 2: Memory Shortage

Diagnostic Procedure

# 1. Check memory usage
$ free -h
$ ps aux --sort=-%mem | head -10

# 2. Check swap usage
$ swapon --show
$ vmstat 1 5
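
On Linux kernels that expose it (3.14 and later), the MemAvailable field of /proc/meminfo estimates how much memory can still be allocated without swapping, which is often a clearer shortage signal than the raw used figure. A minimal sketch, with `mem_available_percent` as an illustrative name:

```shell
# Print the percentage of RAM that the kernel estimates is still available.
mem_available_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", 100 * avail / total }' /proc/meminfo
}

avail_pct=$(mem_available_percent)
echo "available memory: ${avail_pct}%"
```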

Case 3: Zombie Processes

Resolution Method

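# 1. Check for zombie processes (STAT column starts with Z)
$ ps aux | awk '$8 ~ /^Z/'

# 2. Identify the parent process (PPID column), then signal or restart it
$ ps -eo pid,ppid,state,comm | awk '$3 == "Z"'
$ kill -HUP parent_process_PID    # Many daemons reload on SIGHUP; otherwise restart the parent service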
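
A zombie has already exited, so sending it signals has no effect; it disappears only when its parent calls wait or exits. That makes the parent PID the important piece of information. The sketch below pairs each zombie with its parent; both function names are hypothetical, and the canned input exists only to show the output format:

```shell
# Keep only ps lines whose third column (process state) starts with Z.
filter_zombies() {
    awk '$3 ~ /^Z/ { print "zombie PID " $1 " (" $4 "), parent PID " $2 }'
}

# List zombies on the running system together with their parent PID.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}

# Example with canned ps-style input:
printf '%s\n' '101 1 Ss sshd' '202 101 Z worker' | filter_zombies
```

On a live system, call `list_zombies` and then inspect or restart the reported parent.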

Case 4: Unresponsive Process

Gradual Approach

# 1. Try graceful termination
$ kill -TERM PID

# 2. Wait a few seconds, check status
$ ps -p PID

# 3. Force kill as last resort
$ kill -KILL PID

📊 Simple Monitoring Script Example

#!/bin/bash
# System monitoring script

LOG_FILE="/var/log/system_monitor.log"
THRESHOLD_CPU=80
THRESHOLD_MEM=90

# CPU usage check (user-space CPU share as reported by top)
CPU_USAGE=$(LC_ALL=C top -bn1 | grep "Cpu(s)" | awk '{print $2}' | sed 's/%us,//')
if (( $(echo "$CPU_USAGE > $THRESHOLD_CPU" | bc -l) )); then
    echo "$(date): High CPU usage: $CPU_USAGE%" >> "$LOG_FILE"
fi

# Memory usage check (used / total, in percent)
MEM_USAGE=$(free | grep Mem | awk '{printf("%.1f", $3/$2 * 100.0)}')
if (( $(echo "$MEM_USAGE > $THRESHOLD_MEM" | bc -l) )); then
    echo "$(date): High memory usage: $MEM_USAGE%" >> "$LOG_FILE"
fi
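
Parsing top output can be fragile because the layout of the Cpu(s) line differs between versions. An alternative sketch, assuming Linux, samples the counters in /proc/stat twice and computes the busy share over the interval. This swaps in a different technique from the script above, and the function names are illustrative:

```shell
# Print "<busy> <total>" CPU time in clock ticks from the first line of
# /proc/stat, which aggregates all cores.
read_cpu_ticks() {
    read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
    busy=$((user + nice + system + irq + softirq + ${steal:-0}))
    echo "$busy $((busy + idle + iowait))"
}

# Print the percentage of CPU time spent busy over the given interval.
cpu_busy_percent() {
    interval=${1:-1}
    set -- $(read_cpu_ticks); busy1=$1 total1=$2
    sleep "$interval"
    set -- $(read_cpu_ticks); busy2=$1 total2=$2
    awk -v b="$((busy2 - busy1))" -v t="$((total2 - total1))" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * b / t; else print "0.0" }'
}

cpu_busy_percent 1
```

Unlike the top-based check, this counts system, interrupt, and steal time as busy, and treats idle and I/O wait as not busy.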

⚠️ Common Mistakes in Practice and Pitfalls

Common mistakes in actual operations and professional responses.

🚫 Mistake 1: Misusing nohup

❌ Common Mistake

$ nohup long_command  # Forgetting &
$ nohup long_command &
$ exit  # Exiting without disown

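Without &, the command stays in the foreground and blocks the terminal; without redirection, its output ends up in nohup.out; and exiting without disown leaves the job attached to the shell's job table.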

✅ Correct Usage

# Complete command
$ nohup long_command > output.log 2>&1 &

# Safer method
$ screen -S session_name
$ long_command  # Run within screen session
# Press Ctrl+A, then D, to detach the session

Properly redirect output and manage processes.

🚫 Mistake 2: Job Control Confusion

❌ Confusing Example

$ command1 &
$ command2 &
$ jobs    # Can't tell which is which
$ fg %1   # Select wrong job

Difficult to identify when multiple jobs are running.

✅ Manageable Method

# Add meaningful comments
$ backup_script.sh &    # Backup job
$ jobs -l               # Check with PID
$ pgrep -af backup      # Check by name (does not match the search command itself)

# Named sessions with tmux/screen
$ tmux new-session -d -s backup 'backup_script.sh'
$ tmux list-sessions

Manage jobs with meaningful names.

🚫 Mistake 3: Misunderstanding nice Values

❌ Incorrect Understanding

# Thinking higher nice value = faster
$ nice -n 19 important_process  # Lowest priority!
$ renice -20 $$  # Regular user specifying highest priority

Both commands treat the scale backwards: a higher nice value means lower priority, and negative values require root.

✅ Correct Understanding and Usage

# Background tasks at low priority
$ nice -n 10 backup_script.sh

# Important processes at high priority (requires root)
$ sudo nice -n -5 critical_process

# Change priority of existing process
$ sudo renice -10 -p 1234

Understand that smaller nice values mean higher priority.

🚫 Mistake 4: Excessive or Insufficient Monitoring

❌ Problematic Monitoring

# Excessive monitoring (top every second)
$ while true; do top -n 1; sleep 1; done

# Insufficient monitoring
$ ps aux | grep myprocess  # Check only once

Can increase system load or miss problems.

✅ Appropriate Monitoring Methods

# Monitoring at appropriate intervals
$ watch -n 5 'ps aux --sort=-%cpu | head -10'

# Continuous logging
$ vmstat 5 > /tmp/vmstat.log &
$ iostat -x 5 > /tmp/iostat.log &

# Threshold-based monitoring
$ while true; do
    load=$(cut -d' ' -f1 /proc/loadavg)    # 1-minute load average
    if (( $(echo "$load > 2.0" | bc -l) )); then
        echo "$(date): High load: $load" >> /var/log/load.log
    fi
    sleep 60
done

Monitor with appropriate frequency and methods for the purpose.

🚫 Mistake 5: Panic During Troubleshooting

❌ Hasty Response

# The system feels slow, so immediately:
$ sudo killall -9 httpd     # Force kill everything
$ sudo reboot              # Reboot immediately

Forceful measures without investigation can worsen problems.

✅ Systematic Approach

# 1. Check situation
$ uptime                   # Check load
$ free -h                  # Check memory
$ df -h                    # Check disk

# 2. Identify problem
$ ps aux --sort=-%cpu | head -10  # Top CPU users
$ ps aux --sort=-%mem | head -10  # Top memory users

# 3. Gradual response
$ kill -TERM problematic_pid      # Try graceful termination first
$ sleep 5
$ ps -p problematic_pid           # Check status
# Additional measures as needed

Systematically analyze problems before responding.

🎯 Practical Professional Techniques

📊 Process Monitoring Automation

#!/bin/bash
# Process health monitoring script
PROCESS_NAME="nginx"
RESTART_CMD="sudo systemctl start nginx"

# -x matches the exact process name, so similarly named processes do not count
if ! pgrep -x "$PROCESS_NAME" > /dev/null; then
    echo "$(date): $PROCESS_NAME stopped. Restarting..." | logger
    $RESTART_CMD
fi

⚡ Performance Optimization

# Parallelize CPU-intensive tasks according to core count
cores=$(nproc)
for i in $(seq 1 $cores); do
    heavy_task.sh chunk_$i &
done
wait  # Wait for all processing to complete
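
The loop above starts exactly one job per core. When there are more chunks than cores, a worker pool keeps the number of concurrent jobs bounded. One possible sketch uses the `-P` option of xargs (an extension available in GNU and BSD xargs, not part of POSIX); `run_parallel` is a hypothetical helper and `echo` stands in for the real worker:

```shell
# Run a command once per input line, with at most one job per CPU core.
run_parallel() {
    xargs -n 1 -P "$(nproc)" "$@"
}

# Ten chunks, but never more than $(nproc) running at once.
# echo stands in for the real worker here.
seq 1 10 | sed 's/^/chunk_/' | run_parallel echo processing
```

The output order varies between runs because the jobs finish independently.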

🛡️ Safe Process Management

# Terminate a process gracefully; escalate to SIGKILL only after a timeout
safe_kill() {
    local pid=$1
    local timeout=${2:-10}

    # Check process exists
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "Process $pid does not exist"
        return 1
    fi

    # Try graceful termination
    kill -TERM "$pid"

    # Wait for specified seconds
    for i in $(seq 1 "$timeout"); do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "Process $pid terminated gracefully"
            return 0
        fi
        sleep 1
    done

    # Force kill
    echo "Force killing process $pid"
    kill -KILL "$pid"
}

Best Practices

📊 Regular Monitoring

  • Regular system state checks with cron
  • Regular log file review
  • Track resource usage trends

🔧 Resource Limit Configuration

  • Set user limits with ulimit
  • Service resource limits with systemd
  • Detailed control with cgroups
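
Limits set with ulimit apply to the current shell and every process it starts afterwards. A small sketch that lowers the soft open-file limit inside a subshell only; the value 256 is an arbitrary example:

```shell
# Lower the soft open-file limit inside a subshell so the change does not
# leak into the calling shell, then print the limit a child would see.
limited=$(
    ulimit -S -n 256
    ulimit -S -n
)
echo "soft open-file limit inside subshell: $limited"
```

Running a command inside such a subshell confines the limit to that command, which is a lightweight alternative when a full systemd unit or cgroup is not warranted.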

🚨 Alert Configuration

  • CPU and memory usage thresholds
  • Disk capacity monitoring
  • Critical process health monitoring

Summary

Mastering practical process management skills enables stable system operations.

Key Points

  • Job control for efficient task management
  • nice/renice for resource priority adjustment
  • Monitoring tools for proactive problem discovery
  • Systematic approach to troubleshooting

Next Steps

📚 Related Learning Topics

  • Shell Scripting - Automation and task management
  • System Administration - Service management and maintenance
  • Network Monitoring - Network performance optimization
  • Security - Process-level security measures

📊 Complete Process Management Series

  1. Basics - ps, top, kill fundamental operations
  2. Practical (This Article) - Job control, nice, system monitoring