Regular Expressions: Basic, Extended Regex and grep

What You Will Achieve

  • Explain the difference between Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE)
  • Write accurate patterns using anchors, character classes, and quantifiers
  • Use regex appropriately with grep / egrep / sed
  • Extract and exclude lines matching specific patterns from logs
  • Stop making the exam-frequent "BRE backslash" mistake

This is the core of LPIC-1 objective 103.7 "Search text files using regular expressions". Regex is a language for describing string patterns and underlies grep / sed / awk.

Deciding Between BRE and ERE

Regex has dialects. LPIC asks about two POSIX kinds.

Kind                                 Commands                    Treatment of + ? { } ( ) |
Basic Regular Expression (BRE)       grep / sed                  Backslash required (\+ \{ \()
Extended Regular Expression (ERE)    grep -E / egrep / sed -E    Used as-is (+ { ()

The exam frequently tests this difference: an interval is written \{3\} in grep but {3} in grep -E. Always know which dialect you are writing.
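The interval difference can be checked directly. The following is a minimal sketch; the sample lines are hypothetical and exist only for this demonstration.

```shell
# Hypothetical sample data: one line with three digits, one with two.
sample=$(printf '%s\n' 'id 123' 'id 45')

# BRE: braces need backslashes to act as an interval.
bre_hits=$(printf '%s\n' "$sample" | grep '[0-9]\{3\}')

# ERE: the same interval is written without backslashes.
ere_hits=$(printf '%s\n' "$sample" | grep -E '[0-9]{3}')

# BRE without backslashes: {3} is literal text, so no sample line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
literal_hits=$(printf '%s\n' "$sample" | grep -c '[0-9]{3}' || true)

echo "BRE: $bre_hits"
echo "ERE: $ere_hits"
echo "unescaped braces in BRE: $literal_hits matching line(s)"
```

Both the BRE and ERE forms select only the three-digit line, while the unescaped BRE form looks for a digit followed by the literal text {3}.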

Metacharacter Reference

Metachar               Meaning                          Example
.                      Any single character             a.c → abc, axc
*                      0 or more of preceding           ab* → a, ab, abb
^                      Start-of-line anchor             ^root
$                      End-of-line anchor               bash$
[ ]                    Character class                  [0-9] one digit
[^ ]                   Negated class                    [^0-9] non-digit
\+ (ERE +)             1 or more of preceding           a\+
\? (ERE ?)             0 or 1 of preceding              colou\?r
\{n,m\} (ERE {n,m})    Between n and m times            [0-9]\{1,3\}
\| (ERE |)             Alternation (OR): either side    cat\|dog

Steps

Step 1: Pin position with anchors

grep '^#' /etc/ssh/sshd_config
grep 'bash$' /etc/passwd
grep -x 'root' users.txt
#	$OpenBSD: sshd_config
#Port 22
root:x:0:0:root:/root:/bin/bash
root

^ and $ are zero-width anchors: ^ matches at the start of a line and $ at the end. grep -x matches only whole lines (equivalent to writing ^pattern$). These are the basis of comment-line extraction and exact-match search.
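To see how each anchor narrows the match, the sketch below runs the same word against a small user list. The list is hypothetical and stands in for a file such as users.txt.

```shell
# Hypothetical user list used only for this demonstration.
users=$(printf '%s\n' 'root' 'rooter' 'notroot' 'root admin')

# ^root: lines that start with "root" (root, rooter, root admin).
starts=$(printf '%s\n' "$users" | grep -c '^root')

# root$: lines that end with "root" (root, notroot).
ends=$(printf '%s\n' "$users" | grep -c 'root$')

# ^root$ and -x both require the whole line to be exactly "root".
anchored=$(printf '%s\n' "$users" | grep '^root$')
whole=$(printf '%s\n' "$users" | grep -x 'root')

echo "start: $starts, end: $ends, anchored: $anchored, -x: $whole"
```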

Step 2: Specify ranges with character classes

grep '[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log
grep -E '[0-9]{1,3}(\.[0-9]{1,3}){3}' access.log
grep '[[:space:]]' config.txt
192.168.1.10 - - [17/May/2026]
10.0.0.5 - - [17/May/2026]

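[0-9] is one digit and \{1,3\} is 1 to 3 repetitions (BRE requires backslashes). The first command matches two dot-separated digit groups; the second uses an ERE group to match the full four-group shape. [[:space:]] is a POSIX character class matching whitespace.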
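One caveat is worth checking by hand: the digit-group pattern describes the shape of an address, not its validity. The sketch below uses hypothetical log lines to show this, and also shows that [[:space:]] matches a tab as well as a space.

```shell
# Hypothetical log lines; the second is deliberately not a valid address.
lines=$(printf '%s\n' '192.168.1.10 ok' '999.999.999.999 bogus' 'version 1.2 beta')

# Four dot-separated groups of 1 to 3 digits, followed by whitespace.
shape=$(printf '%s\n' "$lines" | grep -Ec '^[0-9]{1,3}(\.[0-9]{1,3}){3}[[:space:]]')

# [[:digit:]] is the POSIX class counterpart of [0-9].
posix=$(printf '%s\n' "$lines" | grep -Ec '^[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}[[:space:]]')

# [[:space:]] also matches a tab, not only a space character.
tab=$(printf 'key\tvalue\n' | grep -c '[[:space:]]')

echo "shape matches: $shape, with [[:digit:]]: $posix, tab line: $tab"
```

Both address-like lines match, including 999.999.999.999, so range checking of each group has to happen elsewhere if it matters.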

Step 3: Use BRE and ERE appropriately

grep 'colou\?r' notes.txt
grep -E 'colou?r' notes.txt
grep -E 'error|warning|fatal' app.log
color theme
favourite colour
[ERROR] disk full
[WARNING] high load

When a pattern uses ? + | {} (), grep -E (ERE) is more readable, because the BRE equivalent needs a backslash before each of these characters.
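Grouping and alternation combine naturally in ERE. The sketch below, using hypothetical log lines, matches only lines that begin with a bracketed severity tag, and then shows that multiple -e patterns give an OR without any alternation syntax.

```shell
# Hypothetical log lines for this demonstration.
log=$(printf '%s\n' '[ERROR] disk full' '[WARNING] high load' '[INFO] started' 'no ERROR tag here')

# ERE: anchor, literal brackets, and a group holding the alternatives.
tagged=$(printf '%s\n' "$log" | grep -Ec '^\[(ERROR|WARNING|FATAL)\]')

# Several -e patterns also act as an OR, and this works in plain BRE.
any=$(printf '%s\n' "$log" | grep -c -e 'ERROR' -e 'WARNING')

echo "lines starting with a severity tag: $tagged"
echo "lines containing ERROR or WARNING anywhere: $any"
```

The first count is 2 and the second is 3, because the unanchored search also picks up the line that merely mentions ERROR.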

Step 4: Combine extraction and exclusion

grep -v '^#' /etc/fstab | grep -v '^$'
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log | sort -u
grep -ci 'timeout' app.log
UUID=xxxx / ext4 defaults 0 1
192.168.1.10
10.0.0.5
7

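-v prints the lines that do not match (here, stripping comments and blank lines), -o prints only the matched part of each line, -c prints the number of matching lines (not the number of individual matches), and -i ignores case. The first pipeline is a common way to extract the effective lines from a config file.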
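The two-stage comment and blank-line filter can also be written as a single ERE. The sketch below compares the two forms on a hypothetical fstab-like snippet, and adds a stricter variant that also drops indented comments and whitespace-only lines.

```shell
# Hypothetical config content; the second entry is an empty line.
conf=$(printf '%s\n' '# comment' '' 'UUID=xxxx / ext4 defaults 0 1' '   # indented comment' 'tmpfs /tmp tmpfs defaults 0 0')

# Two passes, as in the step above: drop "#..." lines, then empty lines.
two_step=$(printf '%s\n' "$conf" | grep -v '^#' | grep -vc '^$')

# One pass in ERE: a line is dropped if it starts with # or is empty.
one_pass=$(printf '%s\n' "$conf" | grep -Evc '^(#|$)')

# Stricter variant: allow leading whitespace before the # or the line end.
strict=$(printf '%s\n' "$conf" | grep -Evc '^[[:space:]]*(#|$)')

echo "two-step: $two_step, one-pass: $one_pass, strict: $strict"
```

The first two forms keep 3 lines because the indented comment does not start with #; the stricter pattern keeps 2.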

Step 5: Replace with regex in sed

echo "2026-05-17" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/'
sed -n '/^ERROR/p' app.log
17/05/2026
ERROR connection refused

sed -E enables ERE mode. \1, \2, and \3 are backreferences: on the replacement side they reuse the text captured by the first, second, and third ( ) group. This is useful for date format conversion and similar rewrites.
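For comparison, the same date swap can be written in BRE, where groups and intervals need backslashes. This sketch also uses # as the s command delimiter, so the slashes in the replacement need no escaping.

```shell
# ERE form: groups and intervals are written without backslashes.
ere=$(echo '2026-05-17' | sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#')

# BRE form: the same pattern with \( \) and \{ \}.
bre=$(echo '2026-05-17' | sed 's#\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)#\3/\2/\1#')

echo "ERE: $ere"
echo "BRE: $bre"
```

Both commands print 17/05/2026; the backreferences themselves are written the same way in either dialect.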

Why BRE Needs Backslashes

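BRE preserves the historical grep / ed syntax, in which + ? { ( | have no special meaning and are matched literally. To use them as quantifiers, groups, or OR, you must mark them as metacharacters with a backslash. Strictly speaking, POSIX BRE defines only \{ \} and \( \); the forms \+, \? and \| are GNU extensions accepted by GNU grep and GNU sed. ERE removes this clutter by treating these characters as metacharacters from the start. grep -E enables ERE; egrep is the legacy spelling, which GNU grep 3.8 and later flag as obsolescent.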

This design difference explains why grep 'a+' does not match "one or more a": it searches for the literal text a+. LPIC asks about exactly this pitfall.
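The pitfall is easy to reproduce. The sketch below uses three hypothetical lines and also shows aa*, which expresses "one or more a" using only standard BRE syntax.

```shell
# Hypothetical sample lines.
data=$(printf '%s\n' 'a+b' 'aaa' 'b')

# BRE: + is literal, so only the line containing the text "a+" matches.
bre_literal=$(printf '%s\n' "$data" | grep 'a+')

# ERE: + is a quantifier, so every line containing at least one a matches.
ere_count=$(printf '%s\n' "$data" | grep -Ec 'a+')

# Standard BRE spelling of "one or more a": one a, then zero or more a.
portable_count=$(printf '%s\n' "$data" | grep -c 'aa*')

echo "BRE 'a+' matched: $bre_literal"
echo "ERE 'a+' lines: $ere_count, BRE 'aa*' lines: $portable_count"
```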

Troubleshooting

Symptom: grep 'a+' does not match one-or-more

Cause: In BRE, + is treated as a literal character

Check:

grep 'a\+' file
grep -E 'a+' file

Fix: Use ERE (grep -E), or escape as \+ in BRE.

Symptom: A dot matches any character and causes false positives

Cause: . means any single character in regex

Check:

grep '192\.168' access.log

Fix: Escape as \. to search for a literal dot. For fixed-string search, grep -F (formerly fgrep) is safe.
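The false positives are easy to demonstrate. In this sketch the three input lines are hypothetical; only the first contains a real "192.168" prefix.

```shell
# Hypothetical input: only the first line contains a literal "192.168".
data=$(printf '%s\n' '192.168.1.10' '192x168y1z10' '1921168')

# Unescaped dot: any character may sit between 192 and 168.
loose=$(printf '%s\n' "$data" | grep -c '192.168')

# Escaped dot: only a literal "." is accepted.
strict=$(printf '%s\n' "$data" | grep -c '192\.168')

# A one-character bracket expression is another way to write a literal dot.
bracket=$(printf '%s\n' "$data" | grep -c '192[.]168')

echo "unescaped: $loose, escaped: $strict, bracket: $bracket"
```

The unescaped pattern matches all 3 lines, while both literal forms match only 1.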

Symptom: Cannot search a string containing special characters

Cause: [ * $ etc. are interpreted as metacharacters

Check:

grep -F '[error]' app.log

Fix: If the pattern needs no regex features, use grep -F for fixed-string search. Otherwise, escape only the characters that would be interpreted as metacharacters with \.
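The bracket case deserves a closer look, because the unescaped pattern does not fail; it silently matches the wrong lines. A minimal sketch with hypothetical log lines:

```shell
# Hypothetical log lines; only the first contains the text "[error]".
data=$(printf '%s\n' '[error] disk full' 'more logs' 'xyz')

# Unescaped: [error] is a bracket expression, i.e. "any one of e, r, o".
as_class=$(printf '%s\n' "$data" | grep -c '[error]')

# -F: the pattern is taken as a fixed string.
fixed=$(printf '%s\n' "$data" | grep -Fc '[error]')

# Regex mode: escaping the opening bracket is enough, because a closing
# bracket outside a bracket expression is an ordinary character.
escaped=$(printf '%s\n' "$data" | grep -c '\[error]')

echo "as class: $as_class, fixed: $fixed, escaped: $escaped"
```

The bracket expression matches 2 lines (the second one only because it contains the letters o, r, and e), while the fixed-string and escaped forms match exactly 1.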

Completion Checklist

  • [ ] Verified start/end-of-line match with ^ $ anchors
  • [ ] Combined character classes like [0-9] with quantifiers
  • [ ] Verified BRE (\{3\}) vs ERE (grep -E '{3}') difference on a real machine
  • [ ] Tried the -v -o -c -i options
  • [ ] Replaced using sed -E backreferences (\1)

Summary

Scenario       Syntax                   Purpose
Start match    grep '^pattern'          Comment/specific-line extraction
Exact match    grep -x / ^...$          Whole-line match
One or more    grep -E 'a+'             Detect repeated chars
Match only     grep -o                  Get only the matched part
Replace        sed -E 's/.../.../'      Reformat with backreferences

Regex underlies grep / sed / awk. Next, move on to other exam areas such as link mechanisms and process priorities to connect the knowledge.
