find/grep/awk Fundamentals
Command Overview & Regular Expression Masterclass
                    
Once you're comfortable with basic Linux commands, the next essential skills to master are find, grep, and awk. These three commands are extremely powerful for file and text operations on Linux, and mastering them will dramatically improve your productivity.
This fundamentals guide covers what each command is for and when to use it, the regular expressions that work across all three, and the find command's search capabilities, all explained with practical examples.
1. Overview and Usage Guide for the Three Commands
First, let's understand the characteristics and purposes of each command. Choosing the right command is the first step toward efficient work.
find
Primary Uses
- Search files by name
- Filter by size and date
- Search by permissions and owner
- Batch processing on found files
Specialty
"Finding files when you don't know their location"
find /home -name "*.txt" -size +1M
                                    Search for .txt files larger than 1MB
grep
Primary Uses
- Search text within files
- Advanced search with regex
- Log file analysis
- Configuration file verification
Specialty
"Finding specific strings inside file contents"
grep -r "ERROR" /var/log/
                                    Search for ERROR in log directory
awk
Primary Uses
- Column data extraction and calculation
- CSV file processing
- Log file aggregation
- Format conversion
Specialty
"Processing, aggregating, and transforming data"
awk -F',' '{sum+=$3} END {print sum}' sales.csv
                                    Sum the 3rd column of a CSV file (-F',' makes the comma the field separator; awk splits on whitespace by default)
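As a minimal sketch of the column-sum idea, the snippet below builds a small sample file and totals its third column. The file contents, the header row, and the `workdir` and `total` names are assumptions made for illustration; fields that contain quoted commas are not handled.

```shell
# Create a small sample CSV (hypothetical data, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/sales.csv" <<'EOF'
date,item,amount
2025-09-01,apple,120
2025-09-02,banana,80
2025-09-03,cherry,300
EOF

# -F',' sets the field separator; NR > 1 skips the header row.
total=$(awk -F',' 'NR > 1 { sum += $3 } END { print sum }' "$workdir/sales.csv")
echo "$total"
```

Skipping the header with `NR > 1` is optional here, since a non-numeric field adds 0 in awk, but it makes the intent explicit.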
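Which Command Should You Use?
- To locate files by name, size, date, or permissions: find
- To search for text inside files: grep
- To extract columns, aggregate, or transform data: awk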
2. Regular Expression Masterclass
Regular expressions are essential for unlocking the full power of find, grep, and awk. This chapter works through regular expressions from the basics to advanced pattern matching, with examples ready for real-world use.
From Basic to Advanced Regular Expressions
Regular Expression Types and Compatible Tools
| Type | Abbreviation | Compatible Tools | Characteristics |
|---|---|---|---|
| Basic Regular Expression | BRE | grep, sed, vi | Grouping and repetition operators such as ( ) and { } need a backslash to be special |
| Extended Regular Expression | ERE | grep -E (formerly egrep), awk | More intuitive notation; + ? ( ) { } work unescaped |
| Perl Compatible Regular Expression | PCRE | grep -P (GNU grep), perl; Python's re module uses a similar syntax | Most feature-rich (lookahead/lookbehind support) |
Advanced Pattern Matching
Grouping and Back References
Basic Grouping
grep -E "(error|warning|critical)" log.txt
                                        Search for multiple keywords with OR condition
Back Reference
grep -E "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})" access.log
                                        Group each octet of IP address
awk usage example (GNU awk only; the array argument to match() is a gawk extension):
gawk 'match($0, /([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/, arr) {
    if (arr[1] == 192 && arr[2] == 168) print "Private IP: " $0
}' access.log
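Grouping captures text; a back reference then requires that same text to appear again. The sketch below uses POSIX BRE syntax, `\(...\)` and `\1`, to flag an accidentally doubled word. The sample lines and variable names are hypothetical.

```shell
# Hypothetical sample text: the first line contains a doubled word.
sample=$(printf '%s\n' 'the the cat sat' 'the cat sat')

# \(...\) captures a run of letters; \1 must match the identical text again.
doubled=$(printf '%s\n' "$sample" | grep '\([a-z][a-z]*\) \1')
echo "$doubled"
```

Without word boundaries this pattern can also fire inside longer words (for example the "is is" in "this is"); GNU grep's `\<` and `\>` anchors can tighten it.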
Greedy vs Non-Greedy Matching
Greedy Match - Default
echo "<tag>content</tag> <tag>more</tag>" | grep -o "<.*>"
                                        Result: <tag>content</tag> <tag>more</tag>
Non-Greedy Match - PCRE
echo "<tag>content</tag> <tag>more</tag>" | grep -oP "<.*?>"
                                        Result (one match per line): <tag>, </tag>, <tag>, </tag>
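Where PCRE support is not available, a negated character class often gives the same per-tag result without a non-greedy quantifier. This is a sketch that assumes a grep with the -o option (GNU and BSD grep both provide it); the `tags` variable is only for illustration.

```shell
# [^>]* cannot run past the first closing '>', so each tag is matched on its own.
tags=$(echo "<tag>content</tag> <tag>more</tag>" | grep -o "<[^>]*>")
echo "$tags"
```

The trade-off is that the pattern has to name the delimiter explicitly, which is usually acceptable for simple tag-like input.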
Lookahead and Lookbehind Assertions (PCRE)
Positive Lookahead (?=pattern)
grep -oP "\d+(?=円)" price.txt
                                        Extract numbers before "円" (yen); -o prints only the matched digits rather than the whole line
Negative Lookahead (?!pattern)
grep -P "test(?!\.txt)" filelist.txt
                                        Files with "test" except test.txt
Positive Lookbehind (?<=pattern)
grep -oP '(?<=\$)\d+' invoice.txt
                                        Extract numbers after a $ sign (single quotes stop the shell from consuming the backslash before $)
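Quoting matters for any pattern that contains `\$`. Inside double quotes the shell removes the backslash, so grep would see a bare `$` (the end-of-line anchor) instead of a literal dollar sign. The sketch below prints what each quoting style actually passes along; the last line assumes GNU grep built with PCRE support, and the sample text is hypothetical.

```shell
# What the command receives under each quoting style:
dq=$(printf '%s' "(?<=\$)\d+")   # double quotes: the shell consumes the backslash before $
sq=$(printf '%s' '(?<=\$)\d+')   # single quotes: passed through unchanged
echo "double-quoted: $dq"
echo "single-quoted: $sq"

# With single quotes the lookbehind sees a literal dollar sign.
echo 'Total: $250' | grep -oP '(?<=\$)\d+'
```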
Practical Regular Expression Pattern Library
Log Analysis Patterns
IP Address (IPv4)
grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log
Strict version (0-255 range check):
grep -E "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" access.log
Date/Time Pattern (Apache Format)
grep -E "\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}\]" access.log
                                        Example: [14/Sep/2025:10:30:45 +0900]
HTTP Status Codes
grep -E "\" [1-5][0-9]{2} " access.log | awk '{print $(NF-1)}' | sort | uniq -c
                                        Aggregate by status code (assumes Common Log Format, where the status is the second-to-last field)
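The same aggregation can be done in awk alone with an associative array. The sketch below assumes Common Log Format, where the status code is the 9th whitespace-separated field; the log lines are made-up sample data.

```shell
# Hypothetical access log in Common Log Format.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
192.168.1.10 - - [14/Sep/2025:10:30:45 +0900] "GET /index.html HTTP/1.1" 200 1024
192.168.1.11 - - [14/Sep/2025:10:30:46 +0900] "GET /missing HTTP/1.1" 404 512
192.168.1.10 - - [14/Sep/2025:10:30:47 +0900] "POST /login HTTP/1.1" 200 2048
EOF

# count[] is keyed by status code; END prints one "code count" pair per key.
counts=$(awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' "$workdir/access.log" | sort)
echo "$counts"
```

awk does not guarantee the iteration order of `for (code in count)`, which is why the output is piped through sort.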
Error Level Extraction
grep -E "\b(DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b" app.log
Data Validation Patterns
Email Address (Simple)
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
URL (http/https)
grep -E "https?://[^[:space:]\"']+" webdata.txt
Phone Number (Japan)
grep -E "(0[0-9]{1,4}-?[0-9]{1,4}-?[0-9]{4})" contacts.txt
                                        Example: 03-1234-5678, 090-1234-5678, 0312345678
Credit Card Number Masking
sed -E 's/\b([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})\b/\1-XXXX-XXXX-\4/g'
                                        Mask all except first and last 4 digits
Code Analysis Patterns
Function Definition (JavaScript/Python)
grep -E "^(function|def)\s+[a-zA-Z_][a-zA-Z0-9_]*\s*\(" *.js *.py
Variable Declaration (JavaScript)
grep -E "^(var|let|const)\s+[a-zA-Z_][a-zA-Z0-9_]*" *.js
Comment Extraction
grep -E "(//.*$|/\*.*\*/)" code.js
                                        Single-line comments and block comments that open and close on the same line
TODO/FIXME Comments
grep -E "(TODO|FIXME|XXX|HACK|NOTE):" -n *.py
Regular Expression Debugging and Testing
Debugging Techniques
1. Incremental Construction
Build complex regex incrementally:
grep "[0-9]" test.txt
                                                Lines with digits
grep "[0-9]\+" test.txt
                                                One or more digits
grep "^[0-9]\+$" test.txt
                                                Lines with only digits
2. Verify Partial Matches with -o Option
echo "test123abc456" | grep -o "[0-9]\+"
123
456
3. Verify Escaping
Escaping differences between BRE and ERE:
# BRE (Basic Regular Expression)
grep "192\.168\.1\..\+" access.log    # Escape + (\+ in BRE is a GNU extension)
# ERE (Extended Regular Expression)
grep -E "192\.168\.1\..+" access.log  # No escape for +
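To make the escaping difference concrete, the sketch below runs one "one or more digits" pattern in three spellings against the same hypothetical sample lines. The `\+` form assumes GNU grep; the `\{1,\}` interval is the portable BRE equivalent.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' '192.168.1.25 ok' '192.168.1. incomplete' '10.0.0.1 other')

bre=$(printf '%s\n' "$sample" | grep '192\.168\.1\.[0-9]\+')         # BRE: + must be escaped (GNU)
ere=$(printf '%s\n' "$sample" | grep -E '192\.168\.1\.[0-9]+')       # ERE: bare +
posix=$(printf '%s\n' "$sample" | grep '192\.168\.1\.[0-9]\{1,\}')   # BRE: portable interval
printf '%s\n' "$bre" "$ere" "$posix"
```

All three select the same line, which is a quick way to confirm that a pattern was translated correctly between dialects.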
Regular Expression Performance Optimization
1. Use Anchors
grep "error" huge.log
grep "^error" huge.log
                                        Anchor to the line start when the keyword always begins the line; note that this also changes which lines match
2. Avoid Unnecessary Wildcards
grep ".*error.*" log.txt
grep "error" log.txt
                                        Remove unnecessary .* (both commands select the same lines)
3. Use -F Option for Fixed Strings
grep -F "exact_string" file.txt
                                    Faster because the pattern is matched as a literal string rather than as a regex
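Besides speed, -F also changes what matches, because metacharacters lose their special meaning. A small sketch with hypothetical sample lines:

```shell
sample=$(printf '%s\n' 'version 1.5 released' 'version 105 released')

# As a regex, "." matches any character, so both lines are counted.
regex_hits=$(printf '%s\n' "$sample" | grep -c '1.5')
# With -F, "." is a literal dot, so only the first line is counted.
fixed_hits=$(printf '%s\n' "$sample" | grep -c -F '1.5')
echo "regex: $regex_hits, fixed: $fixed_hits"
```

This makes -F a reasonable default when searching for IP addresses, file names, or other strings that contain dots.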
3. find Command: The Art of File Search
find is a powerful command that can search the file system thoroughly. Let's learn from basic syntax to advanced applications step by step.
Basic Syntax
find [search_path] [search_criteria] [action]
                        Find files matching search criteria in search path and execute action
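The three parts of the syntax can be seen in a throwaway directory. This is a minimal sketch; the directory layout and file names are invented for illustration.

```shell
# Build a small hypothetical tree to search.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs" "$workdir/logs"
touch "$workdir/docs/readme.txt" "$workdir/docs/notes.txt" "$workdir/logs/app.log"

# search_path     = "$workdir"
# search_criteria = -type f -name "*.txt"
# action          = -print (the default when no action is given)
found=$(find "$workdir" -type f -name "*.txt" -print | sort)
echo "$found"
```

Quoting the `"*.txt"` pattern keeps the shell from expanding it before find sees it.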
Basic Search Patterns
Search by Name
find /home -name "*.txt"
                                    Search for files with .txt extension
find . -name "config*"
                                    Files starting with "config" in current directory and below
find /var -iname "*.LOG"
                                    Case-insensitive search for .log files
Search by File Type
find /home -type f
                                    Search for regular files only
find /var -type d -name "log*"
                                    Search for directories starting with "log"
find /tmp -type l
                                    Search for symbolic links
Search by Size
find /var -size +100M
                                    Files larger than 100MB
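find /home -size -1024c
                                    Files smaller than 1KB (with -size -1k, sizes are rounded up to whole kilobytes first, so only empty files would match)
find . -size +1G -size -10G
                                    Files between 1GB and 10GB (sizes are rounded up to whole GB before comparison)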
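Size tests round each file's size up to the next whole unit before comparing, which can give surprising results for "smaller than" searches. The sketch below shows the effect with two tiny hypothetical files; the behavior described is that of GNU find, and byte units (the `c` suffix) avoid the rounding.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty.dat"                  # 0 bytes
printf 'hello' > "$workdir/small.dat"     # 5 bytes

# -size -1k: 5 bytes rounds up to 1k, which is not "less than 1k".
rounded=$(find "$workdir" -type f -size -1k | wc -l)
# -size -1024c: compares raw byte counts, so both files qualify.
exact=$(find "$workdir" -type f -size -1024c | wc -l)
echo "with -1k: $rounded, with -1024c: $exact"
```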
Search by Date and Time
mtime (Modification Time)
find /home -mtime -7
                                        Files modified within last 7 days
find /var/log -mtime +30
                                        Files modified more than 30 days ago (counted in whole 24-hour periods)
atime (Access Time)
find /tmp -atime +1
                                        Files last accessed at least 2 days ago (find counts whole 24-hour periods and discards the fraction)
newer (Newer Than Reference File)
find /home -newer reference.txt
                                        Files modified more recently than reference.txt
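Time-based tests are easy to try safely by backdating a file with touch. In this sketch the file names are hypothetical, and `touch -t` sets the modification time to a fixed date far in the past.

```shell
workdir=$(mktemp -d)
touch "$workdir/fresh.log"                    # modified just now
touch -t 200001010000 "$workdir/stale.log"    # mtime set to 2000-01-01 00:00

recent=$(find "$workdir" -type f -mtime -7 -exec basename {} \;)
old=$(find "$workdir" -type f -mtime +30 -exec basename {} \;)
echo "recent: $recent"
echo "old: $old"
```

Testing the expression this way before pointing it at /var/log helps confirm that the sign (+ or -) selects the intended side of the cutoff.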
Search by Permissions and Owner
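As a minimal sketch of this kind of search, the snippet below uses the standard -perm and -user tests on two hypothetical files. `-perm 644` matches the mode exactly, while `-perm -100` matches any file that has at least the owner-execute bit set.

```shell
workdir=$(mktemp -d)
touch "$workdir/public.txt" "$workdir/script.sh"
chmod 644 "$workdir/public.txt"
chmod 755 "$workdir/script.sh"

# Exact mode match.
exact_644=$(find "$workdir" -type f -perm 644 -exec basename {} \;)
# "At least these bits": owner-execute must be set.
executable=$(find "$workdir" -type f -perm -100 -exec basename {} \;)
# Files owned by the current user (-user also accepts a numeric user ID).
mine=$(find "$workdir" -type f -user "$(id -u)" | wc -l)
echo "$exact_644 / $executable / $mine"
```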
Execute Actions
The true power of find is its ability to automatically execute processes on found files.
Delete Files
find /tmp -name "*.tmp" -delete
                                    Batch delete temporary files
find /var/log -name "*.log" -mtime +30 -delete
                                    Delete log files older than 30 days
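Because -delete is irreversible, one cautious habit is to run the same expression with -print first and only then swap in -delete. The sketch below does this in a temporary directory with hypothetical file names.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b.tmp" "$workdir/keep.txt"

# Step 1: preview exactly what would be removed.
preview=$(find "$workdir" -type f -name "*.tmp" -print | wc -l)

# Step 2: same expression, with -delete placed last.
find "$workdir" -type f -name "*.tmp" -delete

remaining=$(find "$workdir" -type f | wc -l)
echo "previewed: $preview, remaining: $remaining"
```

Keeping -delete at the end matters: find evaluates the expression left to right, so a -delete placed before the tests would remove files before they are filtered.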
Change Permissions
find /var/www -name "*.php" -exec chmod 644 {} \;
                                    Change PHP file permissions to 644
find /home -type d -exec chmod 755 {} \;
                                    Change directory permissions to 755
Gather Information
find /home -name "*.txt" -exec ls -lh {} \;
                                    Display detailed information for txt files
find /var -size +100M -exec du -h {} \;
                                    Display size of files larger than 100MB
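The `\;` terminator runs the command once per file. Ending -exec with `+` instead passes many file names to a single invocation, which is usually faster on large result sets. The sketch below counts invocations using a tiny `sh -c` helper; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

# \; : one invocation per file, so three lines of output (each reporting 1 argument).
per_file=$(find "$workdir" -name "*.txt" -exec sh -c 'echo "$#"' sh {} \; | wc -l)

# + : one invocation that receives all three file names as arguments.
batched=$(find "$workdir" -name "*.txt" -exec sh -c 'echo "$#"' sh {} +)
echo "invocations with \\;: $per_file, arguments in one batch with +: $batched"
```

For commands such as chmod or ls that accept multiple file arguments, the `+` form is generally a drop-in replacement.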
find Command Best Practices
Limit Search Scope
Searching from the root directory (/) takes time, so specify the most specific directory possible.
find /var/log -name "*.log"
                                    Preferred: searches only the relevant directory
find / -name "*.log"
                                    Avoid: scans the entire file system
Avoid Permission Errors
Hide error messages from directories you do not have permission to read.
find / -name "*.txt" 2>/dev/null
Efficient Condition Combination
Combine multiple conditions for a precise search.
find /home -name "*.log" -size +1M -mtime -7
                                    Log files larger than 1MB, modified within the last 7 days