How to Use find, grep, and awk - Linux Text Search Tutorial

Once you're comfortable with basic Linux commands, the next essential skills to master are find, grep, and awk. Together they cover locating files, searching inside them, and processing the results, and fluency with them will noticeably speed up everyday work on Linux.

This fundamentals guide covers what each command does and when to use it, the regular expression basics they share, and the find command's search capabilities, all explained with practical examples.

Overview and Command Selection

First, understand the characteristics and use cases of each command. Choosing the right command is the first step toward efficient work.

find: File and Directory Search

  • Search files by name
  • Filter by size or date
  • Search by permissions or owner
  • Batch processing on found files

Specialty: "Finding files when you don't know where they are".

find /home -name "*.txt" -size +1M

grep: Text Search Inside Files

  • Search text within files
  • Advanced search using regex
  • Log file analysis
  • Configuration file inspection

Specialty: "Finding specific text inside files".

grep -r "ERROR" /var/log/

awk: Text Processing and Data Manipulation

  • Extract and calculate column data
  • Process CSV files
  • Aggregate log files
  • Format conversion

Specialty: "Processing, aggregating, and transforming data".

awk -F',' '{sum+=$3} END {print sum}' sales.csv
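
As a minimal sketch of the column sum, assuming a hypothetical comma-separated sales.csv with a header row and the amount in the third column (the file contents below are invented for illustration):

```shell
# Build a small sample file (hypothetical data, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' \
  'date,item,amount' \
  '2024-01-01,apple,120' \
  '2024-01-02,pear,80' \
  '2024-01-03,apple,100' > "$workdir/sales.csv"

# -F',' sets the field separator to a comma; NR > 1 skips the header row.
total=$(awk -F',' 'NR > 1 {sum += $3} END {print sum}' "$workdir/sales.csv")
echo "total: $total"

rm -rf "$workdir"
```

Without -F',' awk splits on whitespace, so $3 would be empty for a typical CSV line.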

Decision Flow

Situation                           Command to use
Don't know where files are          find
Want to find text inside files      grep
Want to process or aggregate data   awk
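
The three commands also chain together. The sketch below is one possible combination, using a throwaway directory and hypothetical log files: find selects the files, grep counts matching lines, and awk totals the counts.

```shell
# Sample tree (hypothetical file names and contents).
workdir=$(mktemp -d)
mkdir -p "$workdir/app" "$workdir/db"
printf 'INFO start\nERROR disk full\nERROR timeout\n' > "$workdir/app/app.log"
printf 'INFO ok\nERROR locked\n' > "$workdir/db/db.log"
printf 'ERROR not a log\n' > "$workdir/notes.txt"

# find picks the .log files, grep -c prints "path:count" for each one,
# and awk adds up the last colon-separated field.
error_total=$(find "$workdir" -type f -name "*.log" -exec grep -c "ERROR" {} + \
  | awk -F':' '{sum += $NF} END {print sum}')
echo "ERROR lines in .log files: $error_total"

rm -rf "$workdir"
```

notes.txt is skipped because find filters by name before grep ever runs.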

Regular Expression Masterclass

Regular expressions are essential for getting the most out of find, grep, and awk. This section works from the basics up to patterns you can use directly in production work.

Types of Regular Expressions

Type                   Abbr.  Tools                 Characteristics
Basic Regex            BRE    grep, sed, vi         Metacharacters need escaping
Extended Regex         ERE    egrep, grep -E, awk   More intuitive syntax
Perl-Compatible Regex  PCRE   grep -P, perl         Most powerful (lookahead/lookbehind)
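
To make the differences concrete, here is a small sketch that extracts the same numbers with each flavor. It assumes GNU grep built with PCRE support (for -P); the sample string is invented.

```shell
sample="id=42 port=8080"

# BRE: + must be written as \+ to mean "one or more" (a GNU extension).
bre=$(echo "$sample" | grep -o '[0-9]\+' | paste -sd' ' -)
# ERE: + is a quantifier without escaping.
ere=$(echo "$sample" | grep -oE '[0-9]+' | paste -sd' ' -)
# PCRE: \d is available as a digit shorthand.
pcre=$(echo "$sample" | grep -oP '\d+' | paste -sd' ' -)

echo "BRE:  $bre"
echo "ERE:  $ere"
echo "PCRE: $pcre"
```

All three print the same two numbers; only the pattern syntax changes.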

Position Anchors

# Lines starting with ERROR
grep "^ERROR" logfile.txt

# Lines ending with .log
grep "\.log$" filelist.txt

# The word "port" (excludes "report" etc.)
grep -E "\bport\b" config.txt

Character Classes

# 192.168.1.x IP addresses
grep "192\.168\.1\." access.log

# Time format (HH:MM)
grep "[0-9][0-9]:[0-9][0-9]" log.txt

# Lines containing non-alphanumeric characters
grep "[^a-zA-Z0-9]" data.txt

Quantifiers

# Zero or more of any character (error followed by failed on the same line)
grep "error.*failed" log.txt

# One or more (ERE)
grep -E "[0-9]+" data.txt

# Zero or one (http or https)
grep -E "https?" urls.txt

# Between n and m occurrences (2-4 digit numbers; \b keeps longer digit runs from matching)
grep -E "\b[0-9]{2,4}\b" data.txt
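
A bounded quantifier only limits the match itself, not what surrounds it. The sketch below, on invented sample numbers, shows how an unanchored {2,4} still accepts a six-digit line and how anchoring the pattern changes that.

```shell
numbers=$(printf '7\n42\n2024\n123456\n')

# Unanchored: any line containing a run of at least two digits matches,
# so 123456 is included.
loose=$(echo "$numbers" | grep -E '[0-9]{2,4}' | paste -sd' ' -)

# Anchored to the whole line: only 2- to 4-digit numbers match.
strict=$(echo "$numbers" | grep -E '^[0-9]{2,4}$' | paste -sd' ' -)

echo "loose:  $loose"
echo "strict: $strict"
```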

Advanced Pattern Matching

Grouping and OR:

# Multiple keywords with OR
grep -E "(error|warning|critical)" log.txt

Lookahead and Lookbehind (PCRE):

# Numbers before "yen"
grep -P "\d+(?=yen)" price.txt

# "test" not followed by ".txt"
grep -P "test(?!\.txt)" filelist.txt

# Numbers after $ sign (single quotes keep the shell from stripping the backslash before $)
grep -P '(?<=\$)\d+' invoice.txt
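
Lookarounds are easiest to see with -o, which prints only the matched text. This is a minimal sketch on an invented line, assuming GNU grep with PCRE support; the patterns are single-quoted so the shell leaves the backslash and $ alone.

```shell
line='total: $150 and 300yen'

# Lookbehind: digits preceded by a literal dollar sign.
after_dollar=$(echo "$line" | grep -oP '(?<=\$)\d+')

# Lookahead: digits followed by "yen"; the suffix is not part of the match.
before_yen=$(echo "$line" | grep -oP '\d+(?=yen)')

echo "after dollar sign: $after_dollar"
echo "before yen: $before_yen"
```

Inside double quotes the shell turns \$ into a bare $, which the regex engine then reads as an end-of-line anchor rather than a literal dollar sign.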

Practical Regex Patterns

Log Analysis:

# IPv4 addresses
grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log

# Apache date format
grep -E "\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}\]" access.log

# HTTP status code aggregation (assumes the status is the second-to-last field, as in Common Log Format)
grep -E "\" [1-5][0-9]{2} " access.log | awk '{print $(NF-1)}' | sort | uniq -c

# Log levels
grep -E "\b(DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b" app.log
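
The log patterns above can be tried on a tiny sample. The sketch below assumes Common Log Format with a three-part request line, so the status code is field 9; the three log entries are invented.

```shell
# Three hypothetical requests in Common Log Format.
workdir=$(mktemp -d)
printf '%s\n' \
  '192.168.1.10 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326' \
  '192.168.1.11 - - [10/Oct/2024:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153' \
  '10.0.0.5 - - [10/Oct/2024:13:56:01 +0000] "POST /login HTTP/1.1" 200 512' \
  > "$workdir/access.log"

# Count requests per status code; sort makes the output order stable.
status_counts=$(awk '{count[$9]++} END {for (code in count) print code, count[code]}' \
  "$workdir/access.log" | sort)
echo "$status_counts"

# Requests whose client address starts with 192.168.1.
local_hits=$(grep -c '^192\.168\.1\.' "$workdir/access.log")
echo "192.168.1.x requests: $local_hits"

rm -rf "$workdir"
```

Doing the count inside awk avoids the separate sort | uniq -c step, at the cost of relying on a fixed field position.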

Data Validation:

# Email addresses (simple)
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" data.txt

# URLs (http/https)
grep -E "https?://[^[:space:]\"']+" webdata.txt
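
For extraction rather than line filtering, -o is usually what you want. The sketch below pulls URLs out of an invented sentence; it uses the POSIX class [:space:] because \s is not treated as whitespace inside a bracket expression.

```shell
text='See http://example.com/docs and "https://example.org/a?b=1" for details.'

# -o prints each match on its own line. The bracket expression stops the
# URL at whitespace, a double quote, or a single quote.
urls=$(echo "$text" | grep -oE "https?://[^[:space:]\"']+")
echo "$urls"
```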

Code Analysis:

# Function definitions (JavaScript/Python)
grep -E "^(function|def)\s+[a-zA-Z_][a-zA-Z0-9_]*\s*\(" *.js *.py

# Variable declarations (JavaScript)
grep -E "^(var|let|const)\s+[a-zA-Z_][a-zA-Z0-9_]*" *.js

# TODO/FIXME comments
grep -nE "(TODO|FIXME|XXX|HACK|NOTE):" *.py

Regex Debugging

Build complex regexes incrementally.

# Step 1: Lines with digits
grep "[0-9]" test.txt

# Step 2: One or more digits
grep "[0-9]\+" test.txt

# Step 3: Only digits
grep "^[0-9]\+$" test.txt

Use -o to verify partial matches:

echo "test123abc456" | grep -o "[0-9]\+"
123
456

BRE vs ERE escaping

  • BRE: + needs a backslash → grep "192\.168\.1\.[0-9]\+"
  • ERE: no backslash needed → grep -E "192\.168\.1\.[0-9]+"

Performance Optimization

Three tips for faster regex

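  1. Anchor where you can: grep "^error" huge.log can let grep reject a line early, but note that it only matches lines that begin with error
  2. Remove unnecessary .*: grep "error" log.txt matches the same lines as grep ".*error.*" log.txt with less work for the regex engine
  3. Use -F for fixed strings: grep -F "exact_string" file.txt treats the pattern as a literal string and skips regex matching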
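
Besides speed, -F also changes what matches, which matters for patterns containing dots or other metacharacters. A minimal sketch on invented version strings:

```shell
versions=$(printf 'v1.5\nv1x5\nv105\n')

# As a regex, "." matches any single character, so all three lines match.
regex_hits=$(echo "$versions" | grep -c '1.5')

# With -F the pattern is a literal string, so only v1.5 matches.
fixed_hits=$(echo "$versions" | grep -cF '1.5')

echo "regex: $regex_hits, fixed: $fixed_hits"
```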

find Command: Mastering File Search

find walks a directory tree and selects files by name, type, size, timestamps, permissions, or owner, so you can search the filesystem in almost any way you need.

Basic Syntax

find [search_path] [conditions] [actions]

find walks the search path, tests each file against the conditions, and runs the actions on every match. With no action given, it prints the matching paths (-print).

Search by Name

# Files with .txt extension
find /home -name "*.txt"

# Files starting with "config"
find . -name "config*"

# Case-insensitive .log search
find /var -iname "*.LOG"
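
The difference between -name and -iname is easy to check in a scratch directory. The file names below are hypothetical, and the sketch assumes a case-sensitive filesystem, as is typical on Linux.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
touch "$workdir/config.yml" "$workdir/app.LOG" "$workdir/sub/app.log" "$workdir/notes.txt"

# -name is case-sensitive; the quotes stop the shell from expanding *.
exact=$(find "$workdir" -name "*.log" | wc -l | tr -d ' ')

# -iname ignores case, so app.LOG is found as well.
any_case=$(find "$workdir" -iname "*.log" | wc -l | tr -d ' ')

echo "-name: $exact, -iname: $any_case"
rm -rf "$workdir"
```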

Search by File Type

# Regular files only
find /home -type f

# Directories starting with "log"
find /var -type d -name "log*"

# Symbolic links
find /tmp -type l

Search by Size

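# Larger than 100 MiB (sizes are rounded up to whole units before comparing)
find /var -size +100M

# Smaller than 1 KiB (counted in bytes, because -size -1k matches only empty files)
find /home -size -1024c

# Larger than 1 GiB and up to 9 GiB (use -size -11G to include files up to 10 GiB)
find . -size +1G -size -10G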
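
Size tests round each file up to a whole number of units before comparing, which produces surprising results for small thresholds. The sketch below demonstrates this with GNU find on three throwaway files of known size.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty.dat"                       # 0 bytes
printf 'x' > "$workdir/tiny.dat"               # 1 byte
head -c 2048 /dev/zero > "$workdir/two_k.dat"  # 2048 bytes

# A 1-byte file rounds up to 1k, so "-size -1k" matches only the empty file.
under_1k=$(find "$workdir" -type f -size -1k | wc -l | tr -d ' ')

# The c suffix compares exact byte counts: 0 and 1 byte are both below 1024.
under_1024_bytes=$(find "$workdir" -type f -size -1024c | wc -l | tr -d ' ')

echo "-size -1k: $under_1k, -size -1024c: $under_1024_bytes"
rm -rf "$workdir"
```

When the threshold matters, expressing it in bytes with the c suffix avoids the rounding entirely.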

Search by Date and Time

# Modified within the last 7 days (mtime)
find /home -mtime -7

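# Modified more than 30 days ago (whole days are counted, so at least 31 days old)
find /var/log -mtime +30

# Last accessed at least 2 days ago (-atime +1 counts whole days, so "more than 1" means 2 or more)
find /tmp -atime +1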

# Newer than reference.txt
find /home -newer reference.txt
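
The time tests can be tried safely by backdating a few scratch files. This sketch assumes GNU touch, whose -d option accepts relative dates such as "10 days ago"; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/fresh.log"
touch -d "10 days ago" "$workdir/old.log"
touch -d "40 days ago" "$workdir/ancient.log"

# Modified less than 7 days ago.
recent=$(find "$workdir" -type f -mtime -7 -exec basename {} \;)
# Modified more than 30 whole days ago.
stale=$(find "$workdir" -type f -mtime +30 -exec basename {} \;)
# Modified more recently than old.log.
newer=$(find "$workdir" -type f -newer "$workdir/old.log" -exec basename {} \;)

echo "last 7 days: $recent"
echo "older than 30 days: $stale"
echo "newer than old.log: $newer"
rm -rf "$workdir"
```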

Search by Permission and Owner

# Files and directories whose permissions are exactly 755
find /home -perm 755

# Files with setuid bit (security check)
find / -perm -4000 2>/dev/null

# Files owned by www-data
find /var -user www-data

# Files in the developers group
find /home -group developers
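
The -perm test behaves differently with and without a leading dash. The sketch below illustrates both forms on three hypothetical files in a scratch directory.

```shell
workdir=$(mktemp -d)
touch "$workdir/script.sh" "$workdir/data.txt" "$workdir/tool"
chmod 755 "$workdir/script.sh"
chmod 644 "$workdir/data.txt"
chmod 4755 "$workdir/tool"    # setuid bit plus 755

# No prefix: the mode must be exactly 755, so the setuid file is excluded.
exact_755=$(find "$workdir" -type f -perm 755 -exec basename {} \;)

# Leading "-": all listed bits must be set; extra bits are allowed.
setuid=$(find "$workdir" -type f -perm -4000 -exec basename {} \;)
at_least_755=$(find "$workdir" -type f -perm -755 | wc -l | tr -d ' ')

echo "exactly 755: $exact_755"
echo "setuid: $setuid"
echo "at least 755: $at_least_755"
rm -rf "$workdir"
```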

Execute Actions

The true power of find is being able to automatically run actions on found files.

Delete files:

# Delete temporary files in bulk
find /tmp -name "*.tmp" -delete

# Delete log files older than 30 days
find /var/log -name "*.log" -mtime +30 -delete

Change permissions:

# Set PHP files to 644
find /var/www -name "*.php" -exec chmod 644 {} \;

# Set directories to 755
find /home -type d -exec chmod 755 {} \;

Gather information:

# Show details of .txt files
find /home -name "*.txt" -exec ls -lh {} \;

# Show sizes of files larger than 100MB
find /var -size +100M -exec du -h {} \;
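
Two habits are worth sketching here: ending -exec with + instead of \; to batch files into fewer command invocations, and previewing matches with -print before running -delete. The example uses echo as a stand-in command on hypothetical files.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# "\;" runs the command once per file: three echo calls, three lines.
per_file=$(find "$workdir" -name "*.txt" -exec echo {} \; | wc -l | tr -d ' ')

# "+" appends as many paths as fit to one command line: one echo call.
batched=$(find "$workdir" -name "*.txt" -exec echo {} + | wc -l | tr -d ' ')

# Preview the matches with -print, then swap in -delete.
would_delete=$(find "$workdir" -name "*.txt" -print | wc -l | tr -d ' ')
find "$workdir" -name "*.txt" -delete
remaining=$(find "$workdir" -name "*.txt" | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
echo "previewed: $would_delete, remaining after delete: $remaining"
rm -rf "$workdir"
```

For commands such as chmod or ls that accept many paths at once, the + form does the same work with far fewer processes.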

Best Practices

Limit the search scope

Searching from the root directory (/) is slow. Specify a more concrete starting directory.

  • Good: find /var/log -name "*.log"
  • Bad: find / -name "*.log"

Suppress permission errors

Hide error messages from inaccessible directories with 2>/dev/null.

find / -name "*.txt" 2>/dev/null

Combine conditions efficiently

Stack multiple conditions for precision.

# Logs larger than 1MB modified in the last 7 days
find /home -name "*.log" -size +1M -mtime -7
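
Conditions written side by side are combined with AND. As a sketch of the other combinations, the example below also uses -o (OR) with escaped parentheses and ! (NOT); the files are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/app.log" "$workdir/app.txt" "$workdir/readme.md"
mkdir "$workdir/logs.log"     # a directory whose name ends in .log

# AND: regular files named *.log (the directory is excluded by -type f).
and_match=$(find "$workdir" -type f -name "*.log" -exec basename {} \;)

# OR: the parentheses make -type f apply to both name tests.
or_match=$(find "$workdir" -type f \( -name "*.log" -o -name "*.txt" \) \
  -exec basename {} \; | sort | paste -sd' ' -)

# NOT: regular files that are not *.log.
not_match=$(find "$workdir" -type f ! -name "*.log" -exec basename {} \; \
  | sort | paste -sd' ' -)

echo "and: $and_match"
echo "or:  $or_match"
echo "not: $not_match"
rm -rf "$workdir"
```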

Next Steps

In this fundamentals guide, you learned how to choose between find, grep, and awk, the basics of regular expressions, and the search capabilities of find. The advanced guide goes deeper into grep and awk techniques.