find/grep/awk Fundamentals
Command Overview & Regular Expression Masterclass

Once you're comfortable with basic Linux commands, the next essential skills to master are find, grep, and awk. These three commands are extremely powerful for file operations on Linux, and mastering them will dramatically improve your productivity.

This fundamentals guide covers what each of the three commands does and when to use it, the regular expression syntax they share, and the search capabilities of find, all explained with practical examples.

📊 Skills You'll Master in This Article

Command Selection Strategy
Regular Expression Mastery
find Command Proficiency

📋 Table of Contents

  1. Overview and Usage Guide for the Three Commands
  2. 🎭 Regular Expression Masterclass
  3. find Command: The Art of File Search

1. Overview and Usage Guide for the Three Commands

First, let's understand the characteristics and purposes of each command. Choosing the right command is the first step toward efficient work.

🔍 find

File & Directory Search

Primary Uses

  • Search files by name
  • Filter by size and date
  • Search by permissions and owner
  • Batch processing on found files

Specialty

"Finding files when you don't know their location"

find /home -name "*.txt" -size +1M

Search for .txt files larger than 1MB

🔎 grep

Text Content Search

Primary Uses

  • Search text within files
  • Advanced search with regex
  • Log file analysis
  • Configuration file verification

Specialty

"Finding specific strings inside file contents"

grep -r "ERROR" /var/log/

Search for ERROR in log directory

⚙️ awk

Text Processing & Data Manipulation

Primary Uses

  • Column data extraction and calculation
  • CSV file processing
  • Log file aggregation
  • Format conversion

Specialty

"Processing, aggregating, and transforming data"

awk -F',' '{sum+=$3} END {print sum}' sales.csv

Sum the 3rd column of a CSV file (-F',' sets the comma as the field separator)

🎯 Which Command Should You Use? Decision Flowchart

📁 Don't know where the file is?
YES → Use find
NO ↓
📝 Want to search for a string inside files?
YES → Use grep
NO ↓
📊 Want to process or aggregate data?
YES → Use awk
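
In practice the three commands are often chained rather than chosen one at a time. The following is a minimal sketch, not a definitive recipe: the directory layout, file names, and log lines are all made up for illustration. find locates the files, grep extracts the matching lines, and awk does the counting.

```shell
# Build a throwaway directory with hypothetical sample logs.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/app" "$demo_dir/web"
printf 'INFO start\nERROR disk full\nERROR timeout\n' > "$demo_dir/app/app.log"
printf 'ERROR bad gateway\nINFO ok\n' > "$demo_dir/web/web.log"
printf 'ERROR not a log file\n' > "$demo_dir/web/notes.txt"

# find picks the *.log files, grep -h prints matching lines without
# file names, and awk reports how many lines it received.
error_total=$(find "$demo_dir" -type f -name "*.log" -exec grep -h "ERROR" {} + |
  awk 'END { print NR }')

echo "ERROR lines in *.log files: $error_total"
rm -rf "$demo_dir"
```

The notes.txt file also contains ERROR, but find never passes it to grep, so it is not counted.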

2. 🎭 Regular Expression Masterclass

Regular expressions unlock the real power of find, grep, and awk. This chapter works from the basics up to advanced pattern matching, with examples you can use right away.

🔤 From Basic to Advanced Regular Expressions

📊 Regular Expression Types and Compatible Tools

Type                                 Abbreviation   Compatible Tools        Characteristics
Basic Regular Expression             BRE            grep, sed, vi           Metacharacters require escaping
Extended Regular Expression          ERE            egrep, grep -E, awk     More intuitive notation
Perl Compatible Regular Expression   PCRE           grep -P, perl, python   Most feature-rich (lookahead/lookbehind support)

🔰 Complete Understanding of Basic Metacharacters

Position Specifiers (Anchors)

^ : Beginning of line
grep "^ERROR" logfile.txt

Lines starting with ERROR

$ : End of line
grep "\.log$" filelist.txt

Lines ending with .log

\b : Word boundary (GNU extension)
grep -E "\bport\b" config.txt

The word "port" (excludes "report", etc.)

Character Classes

. : Any single character (except newline)
grep "192\.168\.1\." access.log

IP addresses like 192.168.1.x (each \. is an escaped dot that matches a literal ".", not any character)

[abc] : Character class (a, b, or c)
grep "[0-9][0-9]:[0-9][0-9]" log.txt

Time format (HH:MM)

[^abc] : Negated character class
grep "[^a-zA-Z0-9]" data.txt

Lines containing at least one non-alphanumeric character

Quantifiers

* : Zero or more repetitions
grep "error.*failed" log.txt

Lines where "error" is followed, anywhere later on the line, by "failed"

+ : One or more repetitions (ERE)
grep -E "[0-9]+" data.txt

One or more digits

? : Zero or one occurrence (ERE)
grep -E "https?" urls.txt

http or https

{n,m} : Between n and m repetitions (ERE)
grep -E "[0-9]{2,4}" data.txt

Runs of 2 to 4 digits (this also matches inside longer runs of digits)

🚀 Advanced Pattern Matching

Grouping and Back References
Basic Grouping
grep -E "(error|warning|critical)" log.txt

Search for multiple keywords with OR condition

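Back Reference
grep -E "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})" access.log

Capture each octet of an IP address in its own group; a captured group can be referred back to later, for example as \1 in a sed replacement

awk usage example (GNU awk only: the three-argument form of match() is a gawk extension):

awk 'match($0, /([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/, arr) { if (arr[1] == 192 && arr[2] == 168) print "Private IP: " $0 }' access.log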
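
On systems where awk is not gawk (mawk or BusyBox awk, for example), the same check can be approximated with split(), which is part of POSIX awk. This is a sketch under simplifying assumptions: the sample addresses are invented, and each input line is assumed to hold just the address in its first field.

```shell
# Hypothetical input: one IPv4 address per line.
private_hits=$(printf '192.168.1.10\n10.0.0.5\n192.168.200.7\n8.8.8.8\n' |
  awk '{
    # "[.]" is a bracket expression, so the dot is matched literally.
    n = split($1, octet, "[.]")
    if (n == 4 && octet[1] == 192 && octet[2] == 168)
      print "Private IP: " $0
  }')

echo "$private_hits"
```

Only the two 192.168.x.x addresses are printed. Unlike the match() version, this does not search for an address in the middle of a longer line.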
Greedy vs Non-Greedy Matching
Greedy Match - Default
echo "<tag>content</tag> <tag>more</tag>" | grep -o "<.*>"

Result: <tag>content</tag> <tag>more</tag>

Non-Greedy Match - PCRE
echo "<tag>content</tag> <tag>more</tag>" | grep -oP "<.*?>"

Result (grep -o prints each match on its own line):

<tag>
</tag>
<tag>
</tag>
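
When grep -P is not available, a negated character class often gives the same effect as a non-greedy match. The sketch below reuses the sample string from above; the variable name tags is just for illustration.

```shell
# "[^>]*" cannot run past the first ">", so each tag is matched
# separately even though "*" itself is still greedy.
tags=$(echo "<tag>content</tag> <tag>more</tag>" | grep -o "<[^>]*>")

echo "$tags"
```

This works in plain BRE, so it also carries over to sed and awk, which have no non-greedy quantifiers.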

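Lookahead and Lookbehind Assertions (PCRE)
Positive Lookahead (?=pattern)
grep -oP "\d+(?=円)" price.txt

Extract numbers that are immediately followed by "円" (yen); -o prints only the matched digits instead of the whole line

Negative Lookahead (?!pattern)
grep -P "test(?!\.txt)" filelist.txt

Lines containing "test" that is not immediately followed by ".txt"

Positive Lookbehind (?<=pattern)
grep -oP '(?<=\$)\d+' invoice.txt

Extract numbers that come right after a $ sign. The single quotes matter: inside double quotes the shell turns \$ into a bare $, which the regex engine then reads as an end-of-line anchor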
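
Where PCRE is unavailable, a capture group in sed can stand in for the lookbehind. This is a rough equivalent rather than an exact one: it returns only one amount per line, and the invoice lines are invented sample data.

```shell
# -n plus the trailing "p" prints only lines where the substitution
# succeeded; \1 is the run of digits captured after the literal "$".
amounts=$(printf 'Total: $120\nShipping: $15\nNo charge\n' |
  sed -n 's/.*\$\([0-9][0-9]*\).*/\1/p')

echo "$amounts"
```

The line without a $ sign produces no output at all, which mirrors what grep -o would do.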

💼 Practical Regular Expression Pattern Library

🔍 Log Analysis Patterns
IP Address (IPv4)
grep -E "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log

Strict version (0-255 range check):

grep -E "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
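
To check a single value rather than search a log, the strict pattern can be wrapped in a small helper. The function name is_ipv4 and the test values are hypothetical; the sketch swaps the \b word boundaries for grep's -x option, which requires the whole line to match.

```shell
# Succeeds (exit status 0) only if the argument is a dotted quad
# with every octet in the range 0-255.
is_ipv4() {
  printf '%s\n' "$1" |
    grep -Eqx '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}

for candidate in 192.168.1.10 256.1.1.1 10.0.0; do
  if is_ipv4 "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

Only 192.168.1.10 is reported as valid: 256 is out of range, and 10.0.0 has too few octets.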
Date/Time Pattern (Apache Format)
grep -E "\[[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}\]"

Example: [14/Sep/2025:10:30:45 +0900]

HTTP Status Codes
grep -E "\" [1-5][0-9]{2} " access.log | awk '{print $(NF-1)}' | sort | uniq -c

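Aggregate by status code. $(NF-1) assumes Common Log Format, where the status is the second-to-last field; in Combined Log Format the status is normally field 9, so use $9 there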
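
The same tally can be done in awk alone by counting in an associative array. This sketch assumes the status code is field 9, which holds for typical Apache log lines whose request consists of three space-separated parts; the three log lines are fabricated samples.

```shell
status_counts=$(printf '%s\n' \
  '127.0.0.1 - - [14/Sep/2025:10:30:45 +0900] "GET / HTTP/1.1" 200 512' \
  '127.0.0.1 - - [14/Sep/2025:10:30:46 +0900] "GET /x HTTP/1.1" 404 128' \
  '127.0.0.1 - - [14/Sep/2025:10:30:47 +0900] "GET / HTTP/1.1" 200 512' |
  awk '$9 ~ /^[1-5][0-9][0-9]$/ { count[$9]++ }       # tally valid codes
       END { for (code in count) print code, count[code] }' |
  sort)   # awk array order is unspecified, so sort for stable output

echo "$status_counts"
```

Each output line is a status code followed by how many times it occurred.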

Error Level Extraction
grep -E "\b(DEBUG|INFO|WARN|ERROR|FATAL|CRITICAL)\b" app.log
📧 Data Validation Patterns
Email Address (Simple)
grep -E "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
URL (http/https)
grep -E "https?://[^[:space:]\"']+" webdata.txt
Phone Number (Japan)
grep -E "(0[0-9]{1,4}-?[0-9]{1,4}-?[0-9]{4})" contacts.txt

Example: 03-1234-5678, 090-1234-5678, 0312345678

Credit Card Number Masking
sed -E 's/\b([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})\b/\1-XXXX-XXXX-\4/g'

Mask all except first and last 4 digits
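
A quick way to confirm what the masking expression does is to feed it one line. The card number below is an obviously fake sample, and the \b word boundary assumes GNU sed.

```shell
masked=$(echo "Card: 1234-5678-9012-3456" |
  sed -E 's/\b([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})[- ]?([0-9]{4})\b/\1-XXXX-XXXX-\4/g')

# Groups 1 and 4 are kept; groups 2 and 3 are replaced by XXXX.
echo "$masked"
```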

💻 Code Analysis Patterns
Function Definition (JavaScript/Python)
grep -E "^(function|def)\s+[a-zA-Z_][a-zA-Z0-9_]*\s*\(" *.js *.py
Variable Declaration (JavaScript)
grep -E "^(var|let|const)\s+[a-zA-Z_][a-zA-Z0-9_]*" *.js
Comment Extraction
grep -E "(//.*$|/\*.*\*/)" code.js

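Single-line comments, plus block comments that open and close on the same line (grep works line by line, so it cannot match a comment that spans several lines)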

TODO/FIXME Comments
grep -E "(TODO|FIXME|XXX|HACK|NOTE):" -n *.py

🔧 Regular Expression Debugging and Testing

Debugging Techniques
1. Incremental Construction

Build complex regex incrementally:

Step 1: Lines with digits
grep "[0-9]" test.txt
Step 2: One or more digits
grep "[0-9]\+" test.txt
Step 3: Lines with only digits
grep "^[0-9]\+$" test.txt
2. Verify Partial Matches with -o Option
echo "test123abc456" | grep -o "[0-9]\+"

123
456

3. Verify Escaping

Escaping differences between BRE and ERE:

# BRE (Basic Regular Expression): + must be escaped (\+ is a GNU extension)
grep "192\.168\.1\..\+" access.log
# ERE (Extended Regular Expression): no escape for +
grep -E "192\.168\.1\..+" access.log
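
One way to check that a BRE and an ERE really are equivalent is to run both over the same input and compare. The sketch below uses made-up sample lines and writes the BRE with the interval \{1,\}, which is the POSIX spelling of "one or more" and does not rely on the GNU \+ extension.

```shell
sample='192.168.1.25 ok
192.168.1.
10.0.0.1 other'

# BRE: the interval braces are escaped.
bre_hits=$(printf '%s\n' "$sample" | grep '192\.168\.1\..\{1,\}')
# ERE: + needs no escape.
ere_hits=$(printf '%s\n' "$sample" | grep -E '192\.168\.1\..+')

echo "BRE: $bre_hits"
echo "ERE: $ere_hits"
```

Both commands print only the first line; the second line has nothing after the final dot, so "one or more characters" fails.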

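⚡ Regular Expression Performance Optimization

1. Use Anchors
❌ Slow: grep "error" huge.log
✅ Fast: grep "^error" huge.log

When the text you want always begins the line, a line-start anchor narrows the search. The two commands are not interchangeable: the anchored one skips lines where "error" appears mid-line

2. Remove Unnecessary Wildcards
❌ Slow: grep ".*error.*" log.txt
✅ Fast: grep "error" log.txt

Both select the same lines, so the leading and trailing .* only add work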

3. Use -F Option for Fixed Strings
grep -F "exact_string" file.txt

Faster because regex engine is not used
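
Besides speed, -F changes what matches, because regex metacharacters lose their special meaning. A small sketch with invented version strings:

```shell
sample_versions=$(printf 'v1.2\nv1x2\nv1.20\n')

# As a regex, "." matches any character, so "v1x2" is counted too.
regex_hits=$(printf '%s\n' "$sample_versions" | grep -c '1.2')
# With -F the dot is a literal dot.
fixed_hits=$(printf '%s\n' "$sample_versions" | grep -cF '1.2')

echo "regex: $regex_hits, fixed: $fixed_hits"
```

This makes -F a safe default when searching for strings such as IP addresses or file names that contain dots.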

3. find Command: The Art of File Search

find is a powerful command that can search the file system thoroughly. Let's learn from basic syntax to advanced applications step by step.

🔧 Basic Syntax

find [search_path] [search_criteria] [action]

Find files matching search criteria in search path and execute action
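
To see how the three parts fit together, the sketch below builds a small throwaway tree (all names are hypothetical) and runs one find command against it, with the path, the criteria, and the action each spelled out.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/docs" "$tree/src"
: > "$tree/docs/readme.txt"
: > "$tree/docs/notes.txt"
: > "$tree/src/main.c"

# search_path     = "$tree"
# search_criteria = -type f -name "*.txt"
# action          = -print (the default when no action is given)
find "$tree" -type f -name "*.txt" -print

txt_count=$(find "$tree" -type f -name "*.txt" -print | wc -l | tr -d ' ')
echo "matched: $txt_count"
rm -rf "$tree"
```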

📁 Basic Search Patterns

Search by Name

find /home -name "*.txt"

Search for files with .txt extension

find . -name "config*"

Files starting with "config" in current directory and below

find /var -iname "*.LOG"

Case-insensitive search for .log files

Search by File Type

find /home -type f

Search for regular files only

find /var -type d -name "log*"

Search for directories starting with "log"

find /tmp -type l

Search for symbolic links

Search by Size

find /var -size +100M

Files larger than 100MB

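find /home -size -1024c

Files smaller than 1KB (1024 bytes). Note that -size -1k does not do this: GNU find rounds sizes up to whole units before comparing, so -1k matches only empty files

find . -size +1G -size -10G

Files larger than 1GB and, because of the same rounding, no larger than 9GB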
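
The rounding behaviour of -size is easy to verify on a few test files. This sketch assumes GNU find; the file names and sizes are arbitrary.

```shell
size_dir=$(mktemp -d)
: > "$size_dir/empty.dat"                                    # 0 bytes
printf 'hello' > "$size_dir/small.dat"                       # 5 bytes
dd if=/dev/zero of="$size_dir/big.dat" bs=1024 count=2 2>/dev/null   # 2048 bytes

# -1k: the 5-byte file rounds up to one 1k unit, and 1 is not less than 1.
rounded=$(find "$size_dir" -type f -size -1k | wc -l | tr -d ' ')
# -1024c compares exact byte counts.
exact=$(find "$size_dir" -type f -size -1024c | wc -l | tr -d ' ')

echo "-size -1k matched $rounded file(s); -size -1024c matched $exact file(s)"
rm -rf "$size_dir"
```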

📅 Search by Date and Time
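
As a minimal illustration of time-based tests, the sketch below uses -mtime (age in days) and -newer (comparison against another file). The directory and file names are hypothetical, and one file is backdated with touch -t so that the results are predictable.

```shell
age_dir=$(mktemp -d)
: > "$age_dir/fresh.log"
: > "$age_dir/stale.log"
touch -t 202001010000 "$age_dir/stale.log"   # backdate to 1 Jan 2020

# -mtime -7  : modified less than 7 days ago
recent=$(find "$age_dir" -type f -mtime -7 -exec basename {} \;)
# -mtime +30 : modified more than 30 days ago
old=$(find "$age_dir" -type f -mtime +30 -exec basename {} \;)
# -newer FILE: modified more recently than FILE
newer=$(find "$age_dir" -type f -newer "$age_dir/stale.log" -exec basename {} \;)

echo "recent: $recent, old: $old, newer than stale.log: $newer"
rm -rf "$age_dir"
```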

🔐 Search by Permissions and Owner
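
A short sketch of permission and owner tests, using hypothetical files in a temporary directory. -perm with a plain octal mode requires an exact match, a leading "-" means "at least these bits are set", and -user accepts either a user name or a numeric ID.

```shell
perm_dir=$(mktemp -d)
: > "$perm_dir/secret.key"
: > "$perm_dir/public.txt"
: > "$perm_dir/run.sh"
chmod 600 "$perm_dir/secret.key"
chmod 644 "$perm_dir/public.txt"
chmod 755 "$perm_dir/run.sh"

# Exactly mode 600.
exact_600=$(find "$perm_dir" -type f -perm 600 -exec basename {} \;)
# Owner-execute bit (octal 100) set, whatever the other bits are.
owner_exec=$(find "$perm_dir" -type f -perm -100 -exec basename {} \;)
# Files owned by the current user, given here as a numeric ID.
mine=$(find "$perm_dir" -type f -user "$(id -u)" | wc -l | tr -d ' ')

echo "600: $exact_600, executable: $owner_exec, owned by me: $mine"
rm -rf "$perm_dir"
```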

⚡ Execute Actions

The true power of find is its ability to automatically execute processes on found files.

🗑️ Delete Files

find /tmp -name "*.tmp" -delete

Batch delete temporary files

find /var/log -name "*.log" -mtime +30 -delete

Delete log files older than 30 days
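
Because -delete is irreversible, a common habit is to run the same expression with -print first and swap in -delete only once the list looks right. A sketch in a scratch directory with made-up file names:

```shell
cleanup_dir=$(mktemp -d)
: > "$cleanup_dir/a.tmp"
: > "$cleanup_dir/b.tmp"
: > "$cleanup_dir/keep.txt"

# Step 1: preview what would be removed.
find "$cleanup_dir" -type f -name "*.tmp" -print
preview=$(find "$cleanup_dir" -type f -name "*.tmp" -print | wc -l | tr -d ' ')

# Step 2: same tests, with -delete in place of -print.
find "$cleanup_dir" -type f -name "*.tmp" -delete

remaining=$(find "$cleanup_dir" -type f -exec basename {} \;)
echo "previewed $preview file(s); left behind: $remaining"
rm -rf "$cleanup_dir"
```

Keeping -delete at the very end of the expression matters: find evaluates the expression left to right, so any test placed after -delete is applied too late to protect a file.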

🔧 Change Permissions

find /var/www -name "*.php" -exec chmod 644 {} \;

Change PHP file permissions to 644

find /home -type d -exec chmod 755 {} \;

Change directory permissions to 755

📋 Gather Information

find /home -name "*.txt" -exec ls -lh {} \;

Display detailed information for txt files

find /var -size +100M -exec du -h {} \;

Display size of files larger than 100MB
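
The examples above end -exec with \;, which starts the command once per file. Ending it with + instead passes many file names to a single invocation, which is usually faster on large result sets. The sketch below makes the difference visible by counting invocations; the inline sh -c script and the file names are purely illustrative.

```shell
batch_dir=$(mktemp -d)
: > "$batch_dir/one.txt"
: > "$batch_dir/two.txt"
: > "$batch_dir/three.txt"

# Each invocation prints one line, so the line count equals the
# number of times the command was started.
per_file=$(find "$batch_dir" -type f -name "*.txt" -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$batch_dir" -type f -name "*.txt" -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "with \\; : $per_file invocation(s), with + : $batched invocation(s)"
rm -rf "$batch_dir"
```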

💡 find Command Best Practices

🎯 Limit Search Scope

Searching from root directory (/) takes time, so specify the most specific directory possible

✅ find /var/log -name "*.log"
❌ find / -name "*.log"

🚫 Avoid Permission Errors

Hide error messages from directories without access permission

✅ find / -name "*.txt" 2>/dev/null

⚡ Efficient Condition Combination

Combine multiple conditions for precise search

✅ find /home -name "*.log" -size +1M -mtime -7

Log files larger than 1MB that were modified within the last 7 days

🎯 Next Steps

In this fundamentals guide, you've learned how to choose between find, grep, and awk, how regular expressions work across them, and how to search the file system with find.

In the next Advanced Guide, you'll go deeper into grep and awk to build more sophisticated text processing and data manipulation skills.