find/grep/awk Fundamentals
Command Overview & Regular Expression Masterclass
Once you're comfortable with basic Linux commands, the next essential tools to master are find, grep, and awk. Between them they cover most day-to-day file work on Linux: locating files, searching their contents, and extracting or summarizing the data inside them.
This fundamentals guide explains what each of the three commands is for and when to use it, gives a thorough tour of regular expressions (which all three can use), and covers the search capabilities of find, all with practical examples.
1. Overview and Usage Guide for the Three Commands
First, let's understand the characteristics and purposes of each command. Choosing the right command is the first step toward efficient work.
find
Primary Uses
- Search files by name
- Filter by size and date
- Search by permissions and owner
- Batch processing on found files
Specialty
"Finding files when you don't know their location"
find /home -name "*.txt" -size +1M
Search for .txt files larger than 1MB
grep
Primary Uses
- Search text within files
- Advanced search with regex
- Log file analysis
- Configuration file verification
Specialty
"Finding specific strings inside file contents"
grep -r "ERROR" /var/log/
Search for ERROR in log directory
awk
Primary Uses
- Column data extraction and calculation
- CSV file processing
- Log file aggregation
- Format conversion
Specialty
"Processing, aggregating, and transforming data"
awk -F',' '{sum+=$3} END {print sum}' sales.csv
Sum the 3rd column of a CSV file (-F',' sets the field separator to a comma; by default awk splits on whitespace)
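A minimal sketch of the column sum, assuming a small hypothetical sales.csv with a header row. Because awk splits fields on whitespace by default, -F',' is what makes $3 refer to the third comma-separated column.

```shell
# Throwaway directory with hypothetical sample data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'date,item,amount\n2025-09-01,apple,120\n2025-09-02,banana,80\n' > "$tmpdir/sales.csv"

# -F',' splits fields on commas; NR > 1 skips the header line.
total=$(awk -F',' 'NR > 1 { sum += $3 } END { print sum }' "$tmpdir/sales.csv")
echo "$total"   # prints 200
```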
[Figure: Which command should you use? Decision flowchart]
2. Regular Expression Masterclass
Regular expressions are what unlock the full power of find, grep, and awk. This chapter takes you from the basics to advanced pattern matching, with examples you can apply to real-world tasks right away.
From Basic to Advanced Regular Expressions
Regular Expression Types and Compatible Tools
| Type | Abbreviation | Compatible Tools | Characteristics |
|---|---|---|---|
| Basic Regular Expression | BRE | grep, sed, vi | Grouping and intervals must be escaped as \( \) and \{ \}; GNU tools also accept \+ and \? |
| Extended Regular Expression | ERE | egrep, grep -E, awk | More intuitive notation |
| Perl Compatible Regular Expression | PCRE | grep -P, perl, python | Most feature-rich (lookahead/lookbehind support) |
Advanced Pattern Matching
Grouping and Back References
Basic Grouping
grep -E "(error|warning|critical)" log.txt
Search for multiple keywords with OR condition
Capture Groups
grep -E "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})" access.log
Capture each octet of an IP address in its own group so it can be referenced later
awk usage example (the three-argument form of match() is a GNU awk extension):
awk 'match($0, /([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/, arr) {
    if (arr[1] == 192 && arr[2] == 168) print "Private IP (192.168.x.x): " $0
}' access.log
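The grouping examples above only capture. A back reference reuses what a group captured within the same pattern. A small sketch, assuming GNU grep (back references in -E mode are a GNU extension rather than POSIX ERE) and a hypothetical notes.txt, that finds accidentally doubled words:

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample text: two of the three lines contain a doubled word.
printf 'this is is a typo\nno repeated words here\nsee the the end\n' > "$tmpdir/notes.txt"

# ([a-z]+) captures a whole word; \1 must then match that same text again.
dupes=$(grep -E '\b([a-z]+) \1\b' "$tmpdir/notes.txt")
echo "$dupes"
```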
Greedy vs Non-Greedy Matching
Greedy Match - Default
echo "<tag>content</tag> <tag>more</tag>" | grep -o "<.*>"
Result: <tag>content</tag> <tag>more</tag>
Non-Greedy Match - PCRE
echo "<tag>content</tag> <tag>more</tag>" | grep -oP "<.*?>"
Result (one match per line): <tag>, </tag>, <tag>, </tag>
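When PCRE is not available (for example with grep -E or awk), a negated character class gives the same "stop at the first >" behavior as the lazy .*? quantifier. A minimal sketch:

```shell
html='<tag>content</tag> <tag>more</tag>'

# [^>]* cannot cross a '>', so each match ends at the first '>' it meets,
# which is what the non-greedy <.*?> does in PCRE.
tags=$(echo "$html" | grep -o '<[^>]*>')
echo "$tags"   # <tag>, </tag>, <tag>, </tag> on separate lines
```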
Lookahead and Lookbehind Assertions (PCRE)
Positive Lookahead (?=pattern)
grep -oP '\d+(?=円)' price.txt
Extract numbers followed by "円" (yen)
Negative Lookahead (?!pattern)
grep -P 'test(?!\.txt)' filelist.txt
Lines containing "test" not immediately followed by ".txt" (test.txt is skipped, test.log matches)
Positive Lookbehind (?<=pattern)
grep -oP '(?<=\$)\d+' invoice.txt
Extract numbers that follow a $ sign
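A sketch of the lookbehind on hypothetical invoice data, assuming GNU grep built with PCRE support. The pattern is in single quotes on purpose: inside double quotes the shell turns \$ into a bare $, which PCRE reads as an end-of-line anchor. The -o option prints only the matched digits rather than whole lines.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical invoice lines; only the first two carry a $ amount.
printf 'Subtotal: $120\nTax: $12\nItems: 3\n' > "$tmpdir/invoice.txt"

# (?<=\$) requires a literal $ just before the digits without including it in the match.
amounts=$(grep -oP '(?<=\$)\d+' "$tmpdir/invoice.txt")
echo "$amounts"   # 120 and 12, one per line
```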
Practical Regular Expression Pattern Library
Log Analysis Patterns
IP Address (IPv4)
grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log
Strict version (0-255 range check):
grep -E "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
Date/Time Pattern (Apache Format)
grep -E "\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}\]"
Example: [14/Sep/2025:10:30:45 +0900]
HTTP Status Codes
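grep -E "\" [1-5][0-9]{2} " access.log | awk '{print $9}' | sort | uniq -c
Aggregate by status code ($9 is the status field in both Apache's common and combined log formats; $(NF-1) only works for the common format)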
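A small end-to-end sketch of the aggregation on three made-up log lines: two in Common Log Format and one with the referer and user-agent fields of the combined format appended. Taking $9 picks the status code in both layouts, whereas $(NF-1) would return the quoted referer "-" on the third line.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical access log entries (the last one uses the combined format).
cat > "$tmpdir/access.log" <<'EOF'
127.0.0.1 - - [14/Sep/2025:10:30:45 +0900] "GET /index.html HTTP/1.1" 200 512
127.0.0.1 - - [14/Sep/2025:10:30:46 +0900] "GET /missing HTTP/1.1" 404 128
127.0.0.1 - - [14/Sep/2025:10:30:47 +0900] "GET /about HTTP/1.1" 200 256 "-" "curl/8.0"
EOF

# Keep lines with a 3-digit status after the request's closing quote,
# print field 9 (the status), then count each distinct value.
counts=$(grep -E '" [1-5][0-9]{2} ' "$tmpdir/access.log" | awk '{print $9}' | sort | uniq -c)
echo "$counts"
```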
Error Level Extraction
grep -E "\b(DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b" app.log
Data Validation Patterns
Email Address (Simple)
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
URL (http/https)
grep -Eo "https?://[^[:space:]\"']+" webdata.txt
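A quick check of the URL pattern on a made-up line. In a POSIX bracket expression a backslash is an ordinary character, so [^\s] would exclude a backslash and the letter s rather than whitespace; [:space:] is the portable whitespace class, and -o prints only the URL.

```shell
line='Docs: https://example.com/docs and more'

# [^[:space:]"']+ runs until whitespace or a quote character.
url=$(echo "$line" | grep -Eo "https?://[^[:space:]\"']+")
echo "$url"   # https://example.com/docs
```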
Phone Number (Japan)
grep -E "(0[0-9]{1,4}-?[0-9]{1,4}-?[0-9]{4})" contacts.txt
Example: 03-1234-5678, 090-1234-5678, 0312345678
Credit Card Number Masking
sed -E 's/\b([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})\b/\1-XXXX-XXXX-\4/g'
Mask all except first and last 4 digits
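A quick way to try the masking expression on a made-up card number, assuming GNU sed (\b is a GNU extension):

```shell
# Sample input only; the digits are not a real card number.
masked=$(echo 'Card: 1234 5678 9012 3456' |
  sed -E 's/\b([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})\b/\1-XXXX-XXXX-\4/g')
echo "$masked"   # Card: 1234-XXXX-XXXX-3456
```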
Code Analysis Patterns
Function Definition (JavaScript/Python)
grep -E "^(function|def)\s+[a-zA-Z_][a-zA-Z0-9_]*\s*\(" *.js *.py
Variable Declaration (JavaScript)
grep -E "^(var|let|const)\s+[a-zA-Z_][a-zA-Z0-9_]*" *.js
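Comment Extraction
grep -E "(//.*$|/\*.*\*/)" code.js
// comments and /* ... */ comments that open and close on the same line (grep works line by line, so block comments spanning several lines are not matched, and // inside strings such as URLs also matches)
TODO/FIXME Comments
grep -nE "(TODO|FIXME|XXX|HACK|NOTE):" *.py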
Regular Expression Debugging and Testing
Debugging Techniques
1. Incremental Construction
Build complex regex incrementally:
grep "[0-9]" test.txt
Lines with digits
grep "[0-9]\+" test.txt
One or more digits (\+ is a GNU extension; strict POSIX BRE spells it \{1,\})
grep "^[0-9]\+$" test.txt
Lines with only digits
2. Verify Partial Matches with -o Option
echo "test123abc456" | grep -o "[0-9]\+"
123
456
3. Verify Escaping
Escaping differences between BRE and ERE:
# BRE (Basic Regular Expression)
grep "192\.168\.1\..\+" access.log # + must be escaped as \+ (GNU)
# ERE (Extended Regular Expression)
grep -E "192\.168\.1\..+" access.log # + works unescaped
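To confirm that the BRE and ERE spellings select the same lines, here is a sketch over a hypothetical three-line log, assuming GNU grep (\+ in BRE is a GNU extension).

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '192.168.1.10 GET /index.html\n192.168.2.7 GET /\n10.0.0.1 GET /\n' > "$tmpdir/access.log"

# In BRE, + is literal unless escaped; in ERE it is a quantifier as written.
bre=$(grep '192\.168\.1\..\+' "$tmpdir/access.log")
ere=$(grep -E '192\.168\.1\..+' "$tmpdir/access.log")
echo "$bre"
```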
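Regular Expression Performance Optimization
1. Use Anchors
grep "error" huge.log
grep "^error" huge.log
When a match can only occur at the start of a line, the ^ anchor can let grep reject other lines without scanning them in full (note that the two commands select different lines)
2. Remove Unnecessary .*
grep ".*error.*" log.txt
grep "error" log.txt
Both select the same lines; the leading and trailing .* only add work
3. Use the -F Option for Fixed Strings
grep -F "exact_string" file.txt
The pattern is treated as a literal string, so metacharacters such as . and * need no escaping and grep can use faster fixed-string matching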
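The difference matters as soon as the search string contains a metacharacter. A small sketch with a hypothetical versions.txt: as a regex, the . in 1.2 matches any character, so "102" also matches; with -F it is a literal dot.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'version 1.2\nversion 102\n' > "$tmpdir/versions.txt"

as_regex=$(grep -c '1.2' "$tmpdir/versions.txt")   # 2: '.' matches the 0 in 102
as_fixed=$(grep -cF '1.2' "$tmpdir/versions.txt")  # 1: only the literal "1.2"
echo "$as_regex $as_fixed"
```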
3. find Command: The Art of File Search
find is a powerful command that can search the file system thoroughly. Let's learn from basic syntax to advanced applications step by step.
Basic Syntax
find [search_path] [search_criteria] [action]
Find files matching search criteria in search path and execute action
Basic Search Patterns
Search by Name
find /home -name "*.txt"
Search for files with .txt extension
find . -name "config*"
Files starting with "config" in current directory and below
find /var -iname "*.LOG"
Case-insensitive search for .log files
Search by File Type
find /home -type f
Search for regular files only
find /var -type d -name "log*"
Search for directories starting with "log"
find /tmp -type l
Search for symbolic links
Search by Size
find /var -size +100M
Files larger than 100MB
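find /home -size -1024c
Files smaller than 1KB (1024 bytes); note that -size -1k matches only empty files, because find rounds sizes up to whole units before comparing
find . -size +1G -size -10G
Files larger than 1GB and no larger than 9GB (the same rounding means -10G excludes anything over 9GB)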
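Because find rounds a file's size up to whole units before comparing, small-size tests can behave surprisingly. Here is a sketch assuming GNU find, with two throwaway files.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

: > "$tmpdir/empty.txt"                       # 0 bytes
head -c 500 /dev/zero > "$tmpdir/small.txt"   # 500 bytes

# 500 bytes rounds up to 1k, and 1 is not less than 1, so only the empty file matches.
by_k=$(find "$tmpdir" -type f -size -1k)
# Counting in bytes (c suffix) avoids the rounding.
by_c=$(find "$tmpdir" -type f -size -1024c | sort)
echo "$by_k"
echo "$by_c"
```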
Search by Date and Time
mtime (Modification Time)
find /home -mtime -7
Files modified within the last 7 days
find /var/log -mtime +30
Files last modified more than 30 days ago (find discards fractional days, so +30 effectively means 31 days or more)
atime (Access Time)
find /tmp -atime +1
Files not accessed for at least 2 days (+1 means more than one whole 24-hour period)
newer (Newer Than Reference File)
find /home -newer reference.txt
Files modified more recently than reference.txt
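The time tests can be tried safely on files with forced timestamps. This sketch assumes a POSIX touch (-t takes [[CC]YY]MMDDhhmm) and uses hypothetical file names.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

touch -t 202001010000 "$tmpdir/old.txt"        # modified 2020-01-01 00:00
touch -t 202401010000 "$tmpdir/reference.txt"  # modified 2024-01-01 00:00
touch "$tmpdir/new.txt"                        # modified now

# +30: more than 30 whole days old (fractions of a day are discarded).
old=$(find "$tmpdir" -type f -mtime +30 | sort)
# -newer: modification time strictly later than reference.txt's.
newer=$(find "$tmpdir" -type f -newer "$tmpdir/reference.txt")
echo "$old"
echo "$newer"
```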
Search by Permissions and Owner
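A minimal sketch of permission and owner tests, assuming GNU find; the file names are hypothetical. -perm 644 requires the mode to be exactly 644, -perm -004 only requires the others-read bit to be set, and -user matches by owner name.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

touch "$tmpdir/private.txt" "$tmpdir/shared.txt"
chmod 600 "$tmpdir/private.txt"
chmod 644 "$tmpdir/shared.txt"

exact=$(find "$tmpdir" -type f -perm 644)                 # mode is exactly 644
world=$(find "$tmpdir" -type f -perm -004)                # at least readable by others
mine=$(find "$tmpdir" -type f -user "$(id -un)" | sort)   # owned by the current user
echo "$exact"
echo "$world"
echo "$mine"
```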
Execute Actions
The true power of find lies in running commands automatically on the files it finds.
Delete Files
find /tmp -name "*.tmp" -delete
Batch delete temporary files
find /var/log -name "*.log" -mtime +30 -delete
Delete log files older than 30 days
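Because -delete cannot be undone, a common habit is to run the same expression with -print first and to keep -delete as the last item. GNU find evaluates the expression left to right, so placing -delete before -name would delete everything under the starting point. Here is a sketch on a throwaway directory.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

touch "$tmpdir/a.tmp" "$tmpdir/b.tmp" "$tmpdir/keep.txt"

# Dry run: list what would be removed.
preview=$(find "$tmpdir" -name "*.tmp" -print | sort)
echo "$preview"

# Same tests, with -delete last.
find "$tmpdir" -name "*.tmp" -delete
left=$(find "$tmpdir" -type f)
echo "$left"   # only keep.txt remains
```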
Change Permissions
find /var/www -name "*.php" -exec chmod 644 {} \;
Change PHP file permissions to 644
find /home -type d -exec chmod 755 {} \;
Change directory permissions to 755
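With {} \; find starts one chmod process per file. Ending the command with {} + instead passes many file names to a single invocation, which is usually faster on large trees. Here is a sketch with hypothetical PHP files.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

touch "$tmpdir/index.php" "$tmpdir/lib.php"
chmod 600 "$tmpdir/index.php" "$tmpdir/lib.php"

# One chmod call receives both file names.
find "$tmpdir" -type f -name "*.php" -exec chmod 644 {} +

changed=$(find "$tmpdir" -type f -name "*.php" -perm 644 | grep -c .)
echo "$changed"   # 2
```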
Gather Information
find /home -name "*.txt" -exec ls -lh {} \;
Display detailed information for txt files
find /var -size +100M -exec du -h {} \;
Display size of files larger than 100MB
find Command Best Practices
Limit Search Scope
Searching from the root directory (/) takes time, so start from the most specific directory possible
find /var/log -name "*.log"
Recommended: start from a specific directory
find / -name "*.log"
Avoid: scans the entire file system
Avoid Permission Errors
Hide error messages from directories you lack permission to read
find / -name "*.txt" 2>/dev/null
Efficient Condition Combination
Combine multiple conditions for a precise search
find /home -name "*.log" -size +1M -mtime -7
Log files larger than 1MB, modified within the last 7 days