How to Use find, grep, and awk - Linux Text Search Tutorial
Once you're comfortable with basic Linux commands, the next essential skills to master are find, grep, and awk. Between them they cover locating files, searching their contents, and processing the text they contain, so learning them well pays off quickly in day-to-day work on Linux.
This fundamentals guide covers what each of the three commands is for, the regular expression basics they all share, and the search options of the find command. Each topic comes with practical examples.
Overview and Command Selection
Start by learning what each command is good at. Choosing the right one is the first step toward working efficiently.
find: File and Directory Search
- Search files by name
- Filter by size or date
- Search by permissions or owner
- Batch processing on found files
Specialty: "Finding files when you don't know where they are".
find /home -name "*.txt" -size +1M
grep: Text Content Search
- Search text within files
- Advanced search using regex
- Log file analysis
- Configuration file inspection
Specialty: "Finding specific text inside files".
grep -r "ERROR" /var/log/
awk: Text Processing and Data Manipulation
- Extract and calculate column data
- Process CSV files
- Aggregate log files
- Format conversion
Specialty: "Processing, aggregating, and transforming data".
awk -F',' '{sum+=$3} END {print sum}' sales.csv
Decision Flow
| Situation | Command to use |
|---|---|
| Don't know where files are | find |
| Want to find text inside files | grep |
| Want to process or aggregate data | awk |
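In practice the three commands are often chained: find selects files, grep keeps the relevant lines, and awk aggregates them. The sketch below assumes a throwaway directory and two made-up log files whose lines follow a hypothetical "LEVEL component count" format:

```shell
# Throwaway directory with two hypothetical log files ("LEVEL component count")
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ERROR db 3\nINFO web 1\nERROR web 2\n' > "$dir/app1.log"
printf 'ERROR db 4\nWARN db 1\n' > "$dir/app2.log"

# find: pick the .log files
# grep: keep only lines starting with ERROR (-h omits file-name prefixes)
# awk:  add up column 3 per component (column 2), then print the totals
find "$dir" -name "*.log" -exec grep -h "^ERROR" {} + |
  awk '{total[$2] += $3} END {for (c in total) print c, total[c]}' |
  sort
# db 7
# web 2
```

Ending -exec with {} + passes many file names to a single grep call instead of starting one grep per file.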
Regular Expression Masterclass
Regular expressions are what unlock the full power of find, grep, and awk. This section goes from the basics to patterns you can use directly in production work.
Types of Regular Expressions
| Type | Abbr. | Tools | Characteristics |
|---|---|---|---|
| Basic Regex | BRE | grep, sed, vi | Metacharacters need escaping |
| Extended Regex | ERE | egrep, grep -E, awk | More intuitive syntax |
| Perl-Compatible Regex | PCRE | grep -P, perl | Most powerful (lookahead/lookbehind) |
Position Anchors
# Lines starting with ERROR
grep "^ERROR" logfile.txt
# Lines ending with .log
grep "\.log$" filelist.txt
# The word "port" (excludes "report" etc.)
grep -E "\bport\b" config.txt
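A quick way to see what the anchors and \b select is to run them on a small throwaway file. The sample lines are made up, and \b is a GNU grep extension, so this sketch assumes GNU grep:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'ERROR disk full' 'INFO ERROR count=0' 'port=8080' 'report ready' > "$f"

grep "^ERROR" "$f"        # ERROR disk full (ERROR must start the line)
grep -E "\bport\b" "$f"   # port=8080 (report does not match)
grep -c "ERROR" "$f"      # 2 (without ^, both ERROR lines match)
```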
Character Classes
# 192.168.1.x IP addresses
grep "192\.168\.1\." access.log
# Time format (HH:MM)
grep "[0-9][0-9]:[0-9][0-9]" log.txt
# Lines containing non-alphanumeric characters
grep "[^a-zA-Z0-9]" data.txt
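The same approach works for character classes. In this sketch (made-up sample lines), note how an unescaped dot also matches a digit:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '192.168.1.20 ok' '192.168.10.5 ok' 'start 09:45' 'abc123' > "$f"

grep "192\.168\.1\." "$f"          # 192.168.1.20 ok
grep -c "192.168.1." "$f"          # 2: unescaped dots also match the 0 in .10.
grep "[0-9][0-9]:[0-9][0-9]" "$f"  # start 09:45
grep -c "[^a-zA-Z0-9]" "$f"        # 3: every line except abc123
```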
Quantifiers
# Zero or more: .* allows any text between "error" and a later "failed" on the same line
grep "error.*failed" log.txt
# One or more (ERE)
grep -E "[0-9]+" data.txt
# Zero or one (http or https)
grep -E "https?" urls.txt
# Between n and m occurrences (standalone numbers of 2 to 4 digits; \b keeps longer numbers out)
grep -E "\b[0-9]{2,4}\b" data.txt
Advanced Pattern Matching
Grouping and OR:
# Multiple keywords with OR
grep -E "(error|warning|critical)" log.txt
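Alternation pairs well with -c, which counts matching lines, and -o, which prints only the matched text. A sketch on hypothetical log lines:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'error: disk' 'info: ok' 'warning: cpu' 'critical: power' > "$f"

# Count lines containing any of the keywords
grep -cE "(error|warning|critical)" "$f"   # 3

# Print only the matched keyword, then tally each one
grep -oE "(error|warning|critical)" "$f" | sort | uniq -c
```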
Lookahead and Lookbehind (PCRE):
# Numbers followed by "yen"
grep -P '\d+(?=yen)' price.txt
# "test" not followed by ".txt"
grep -P 'test(?!\.txt)' filelist.txt
# Numbers after a $ sign (single quotes stop the shell from turning \$ into $)
grep -P '(?<=\$)\d+' invoice.txt
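grep -P is specific to GNU grep and only works when grep was built with PCRE support, so it may be missing on macOS or minimal systems. When a lookbehind only serves to drop a prefix, one portable-leaning workaround is to match the prefix with -o and strip it afterwards. A sketch with made-up invoice lines:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'item A $120' 'item B $45' 'tax 8' > "$f"

# Portable-leaning equivalent of grep -oP '(?<=\$)\d+':
# match "$" plus digits, then delete the "$"
grep -oE '\$[0-9]+' "$f" | tr -d '$'   # 120 and 45, one per line

# Use the PCRE form only if this grep supports -P
if echo x | grep -qP 'x' 2>/dev/null; then
  grep -oP '(?<=\$)\d+' "$f"
fi
```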
Practical Regex Patterns
Log Analysis:
# IPv4 addresses
grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log
# Apache date format
grep -E "\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}\]"
# HTTP status code aggregation (the status is field 9 in Apache's common and combined log formats)
grep -E "\" [1-5][0-9]{2} " access.log | awk '{print $9}' | sort | uniq -c
# Log levels
grep -E "\b(DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b" app.log
Data Validation:
# Email addresses (simple)
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
# URLs (http/https)
grep -E "https?://[^[:space:]\"']+" webdata.txt
Code Analysis:
# Function definitions (JavaScript/Python)
grep -E "^(function|def)\s+[a-zA-Z_][a-zA-Z0-9_]*\s*\(" *.js *.py
# Variable declarations (JavaScript)
grep -E "^(var|let|const)\s+[a-zA-Z_][a-zA-Z0-9_]*" *.js
# TODO/FIXME comments
grep -nE "(TODO|FIXME|XXX|HACK|NOTE):" *.py
Regex Debugging
Build complex regexes incrementally.
# Step 1: Lines with digits
grep "[0-9]" test.txt
# Step 2: One or more digits
grep "[0-9]\+" test.txt
# Step 3: Only digits
grep "^[0-9]\+$" test.txt
Use -o to verify partial matches:
echo "test123abc456" | grep -o "[0-9]\+"
123
456
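Running the debugging steps against a tiny test file shows how each refinement narrows the result. A sketch with made-up lines, assuming GNU grep (which accepts \+ in BRE):

```shell
# Hypothetical test file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'abc' 'a1b' '123' '42' > "$f"

grep -c "[0-9]" "$f"       # 3: any line containing a digit
grep -c "^[0-9]\+$" "$f"   # 2: lines made only of digits
grep -o "[0-9]\+" "$f"     # 1, 123, 42: each matched run on its own line
```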
BRE vs ERE escaping
- BRE: a bare + is literal, so one-or-more is written \+ → grep "192\.168\.1\..\+"
- ERE: + works without escaping → grep -E "192\.168\.1\..+"
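To check that the BRE and ERE spellings really mean the same thing, run both on the same input and compare. A sketch with made-up addresses, again assuming GNU grep for the BRE \+:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' '192.168.1.7' '192.168.1.' '10.0.0.1' > "$f"

bre=$(grep "192\.168\.1\..\+" "$f")    # BRE: \+ means one or more
ere=$(grep -E "192\.168\.1\..+" "$f")  # ERE: + means one or more
[ "$bre" = "$ere" ] && echo "same matches: $bre"   # same matches: 192.168.1.7

grep -c "192\.168\.1\..+" "$f"   # 0: in BRE a bare + is a literal plus sign
```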
Performance Optimization
Three tips for faster regex
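- Use anchors when the match must start the line: grep "^error" huge.log can be faster than grep "error" huge.log (it also matches fewer lines)
- Drop unnecessary .*: grep "error" log.txt is enough; grep ".*error.*" log.txt selects the same lines but can be slower
- Use -F for fixed strings: grep -F "exact_string" file.txt treats the pattern as a literal string instead of a regex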
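The -F tip matters for correctness as well as speed, because an unescaped dot in a regex matches any character. A sketch with made-up version strings:

```shell
# Hypothetical sample file
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'version 1.5' 'version 105' > "$f"

grep "1.5" "$f"      # both lines: the dot matches the 0 in 105
grep -F "1.5" "$f"   # version 1.5 only: the dot is literal
```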
find Command: Mastering File Search
find walks a directory tree and selects files by name, type, size, timestamps, permissions, or owner.
Basic Syntax
find [search_path] [conditions] [actions]
It walks each search path recursively, selects the files that match the conditions, and runs the actions on them. When no action is given, it prints the matching paths (-print).
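A minimal sketch of the three parts, run against a throwaway directory tree (the file names are hypothetical):

```shell
# Throwaway tree with hypothetical files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/docs/old"
touch "$dir/docs/a.txt" "$dir/docs/old/b.txt" "$dir/docs/c.md"

# search path: "$dir"   condition: -name "*.txt"   action: -print (the default)
find "$dir" -name "*.txt" -print

# Same search with a different action: run basename on each match
find "$dir" -name "*.txt" -exec basename {} \; | sort   # a.txt, b.txt
```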
Search by Name
# Files with .txt extension
find /home -name "*.txt"
# Files starting with "config"
find . -name "config*"
# Case-insensitive .log search
find /var -iname "*.LOG"
Search by File Type
# Regular files only
find /home -type f
# Directories starting with "log"
find /var -type d -name "log*"
# Symbolic links
find /tmp -type l
Search by Size
# Larger than 100MB
find /var -size +100M
# Smaller than 1KB (byte units avoid find's rounding up to whole kilobytes)
find /home -size -1024c
# Larger than 1GB but smaller than 10GB
find . -size +1G -size -10240M
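Size tests compare against the file size rounded up to whole units, so -size -1k only matches empty files, while the c (bytes) suffix gives exact bounds. A sketch on a throwaway directory with hypothetical file names, assuming GNU find:

```shell
# Throwaway directory with hypothetical files of known sizes
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
: > "$dir/empty.txt"                                              # 0 bytes
dd if=/dev/zero of="$dir/small.bin" bs=500 count=1 2>/dev/null    # 500 bytes
dd if=/dev/zero of="$dir/big.bin" bs=2048 count=1 2>/dev/null     # 2048 bytes

find "$dir" -type f -size -1k      # matches empty.txt only: 500 bytes rounds up to 1k
find "$dir" -type f -size -1024c   # matches empty.txt and small.bin
find "$dir" -type f -size +1k      # matches big.bin only: 2048 bytes is 2k
```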
Search by Date and Time
# Modified within the last 7 days (mtime)
find /home -mtime -7
# Modified more than 30 days ago
find /var/log -mtime +30
# Not accessed for 2 or more full days (atime)
find /tmp -atime +1
# Modified more recently than reference.txt
find /home -newer reference.txt
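The day-based tests count whole 24-hour periods and ignore the fraction, so -mtime +30 means at least 31 full days and -atime +1 means at least 2. A sketch that back-dates hypothetical files with touch -t (format YYYYMMDDhhmm):

```shell
# Throwaway directory with hypothetical files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/new.log"                          # modified now
touch -t 202001010000 "$dir/old.log"          # modified 2020-01-01 00:00
touch -t 202001010000 "$dir/reference.txt"    # same timestamp as old.log

find "$dir" -name "*.log" -mtime +30                    # matches old.log
find "$dir" -name "*.log" -mtime -7                     # matches new.log
find "$dir" -name "*.log" -newer "$dir/reference.txt"   # matches new.log (old.log is not strictly newer)
```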
Search by Permission and Owner
# Files with permission exactly 755
find /home -perm 755
# Files with the setuid bit set (security check)
find / -perm -4000 2>/dev/null
# Files owned by www-data
find /var -user www-data
# Files in the developers group
find /home -group developers
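-perm 755 matches the mode exactly, while a leading dash (as in -perm -4000) means all of the listed bits must be set, regardless of the others. A sketch with hypothetical files:

```shell
# Throwaway directory with hypothetical files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/script.sh" "$dir/tool"
chmod 755 "$dir/script.sh"
chmod 775 "$dir/tool"

find "$dir" -type f -perm 755    # matches script.sh only: the mode must be exactly 755
find "$dir" -type f -perm -755   # matches both: 775 also has every bit of 755 set
```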
Execute Actions
The true power of find is being able to automatically run actions on found files.
Delete files:
# Delete temporary files in bulk
find /tmp -name "*.tmp" -delete
# Delete log files older than 30 days
find /var/log -name "*.log" -mtime +30 -delete
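Because -delete cannot be undone, a safe habit is to run the same expression with -print first and only then swap in -delete. Keep -delete at the end: find evaluates the command line as an expression, so -delete placed before -name would run on every file. A sketch on hypothetical temp files:

```shell
# Throwaway directory with hypothetical temp files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/a.tmp" "$dir/b.tmp" "$dir/keep.txt"

# 1. Preview: same conditions, harmless action
find "$dir" -name "*.tmp" -print

# 2. Delete what was previewed (-delete stays last in the expression)
find "$dir" -name "*.tmp" -delete

ls "$dir"   # keep.txt
```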
Change permissions:
# Set PHP files to 644
find /var/www -name "*.php" -exec chmod 644 {} \;
# Set directories to 755
find /home -type d -exec chmod 755 {} \;
Gather information:
# Show details of .txt files
find /home -name "*.txt" -exec ls -lh {} \;
# Show sizes of files larger than 100MB
find /var -size +100M -exec du -h {} \;
Best Practices
Limit the search scope
Searching from the root directory (/) is slow. Specify a more concrete starting directory.
- Good: find /var/log -name "*.log"
- Bad: find / -name "*.log"
Suppress permission errors
Hide error messages from inaccessible directories with 2>/dev/null.
find / -name "*.txt" 2>/dev/null
Combine conditions efficiently
Conditions listed one after another are combined with an implicit AND, so each one narrows the results.
# Logs larger than 1MB modified in the last 7 days
find /home -name "*.log" -size +1M -mtime -7
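Stacked tests are joined by an implicit -a (AND), which binds more tightly than -o (OR). To combine alternatives with OR, group them with escaped parentheses so the remaining tests still apply to every alternative. A sketch on hypothetical files:

```shell
# Throwaway directory with hypothetical files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
dd if=/dev/zero of="$dir/big.log" bs=1024 count=2048 2>/dev/null   # 2 MiB
touch "$dir/small.log" "$dir/notes.txt" "$dir/data.csv"

# Implicit AND: .log files larger than 1MB
find "$dir" -name "*.log" -size +1M                         # matches big.log

# OR inside parentheses: regular files named *.txt or *.csv
find "$dir" -type f \( -name "*.txt" -o -name "*.csv" \) | sort
```

Without the parentheses, -type f -name "*.txt" -o -name "*.csv" would be read as (-type f AND *.txt) OR *.csv, so -type f would not apply to the .csv side.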