Regular Expression Matching problem
Understanding how to solve the Regular Expression Matching problem using different algorithmic approaches is crucial.
This problem offers a solid benchmark for evaluating time and space complexity trade-offs, especially for optimizing backend services where pattern recognition is key.
In this article, we dissect four solutions to the Regular Expression Matching problem using Python.
We'll evaluate each by how well it scales (both in time and memory), starting with the slowest and most memory-intensive and ending with the most optimized.
We'll also refine these solutions for clarity and performance.
What is the Problem?
You’re given a string s and a pattern p. The pattern may contain:
- . Matches any single character.
- * matches zero(0) or more of the preceding element.
The task: Determine if the string matches the entire pattern.
Solution 1: Using re.fullmatch() (Regex Library)
import re
def reg_expr_matching(s: str, p: str) -> bool:
"""
Use Python's re.fullmatch to check if pattern matches the entire string.
"""
return re.fullmatch(p, s) is not None
This is the most straightforward approach using Python’s built-in regex engine.
Time Complexity: Exponential in worst-case scenarios (e.g., backtracking-heavy patterns).
Space Complexity: Depends on regex engine internals (usually large overhead)
Drawbacks:
- Not under your control (opaque performance)
- Doesn't teach you underlying mechanics
- Can fail performance tests in an interview or production-scale data
Solution 2: Greedy Pointer Matching (Incorrect in Some Cases)
This approach tries to simulate regex using a greedy strategy.
def reg_expr_matching(s, pattern):
"""
Simulate regex matching using a greedy scan strategy.
Fails on certain patterns like '.*' matching long strings.
"""
i, j = 0, 0
while i < len(s) and j < len(pattern):
if j + 1 < len(pattern) and pattern[j + 1] == '*':
if pattern[j] == '.':
if j + 2 == len(pattern):
return True
while i < len(s):
if reg_expr_matching(s[i:], pattern[j+2:]):
return True
i += 1
return False
else:
while i < len(s) and s[i] == pattern[j]:
i += 1
j += 2
elif pattern[j] == '.' or s[i] == pattern[j]:
i += 1
j += 1
else:
return False
while j + 1 < len(pattern) and pattern[j + 1] == '*':
j += 2
return i == len(s) and j == len(pattern)
Time Complexity: Worst case exponential due to repeated slicing and backtracking.
Space Complexity: High due to recursion and string slicing
Downside: Breaks for large inputs and complex patterns due to uncontrolled recursion depth.
Solution 3: Top-Down Dynamic Programming (Memoized Recursion)
import sys
sys.setrecursionlimit(10**6)
def reg_expr_matching(s: str, p: str) -> bool:
"""
Top-down DP with memoization to avoid redundant computation.
Scales better but hits recursion limits for large inputs.
"""
cache = {}
def dfs(i, j):
if (i, j) in cache:
return cache[(i, j)]
if j == len(p):
return i == len(s)
match = i < len(s) and (s[i] == p[j] or p[j] == '.')
if j + 1 < len(p) and p[j + 1] == '*':
cache[(i, j)] = dfs(i, j + 2) or (match and dfs(i + 1, j))
return cache[(i, j)]
if match:
cache[(i, j)] = dfs(i + 1, j + 1)
return cache[(i, j)]
cache[(i, j)] = False
return False
return dfs(0, 0)
This is a more principled solution that uses memoization to avoid redundant work.
Time Complexity: O(len(s) * len(p))
Space Complexity: O(len(s) * len(p)) for memo + call stack
Limitations: Risk of RecursionError or segmentation fault on large inputs
Solution 4: Bottom-Up Dynamic Programming (Tabulation)
def reg_expr_matching(s: str, p: str) -> bool:
"""
Bottom-up DP: tabulate all subproblem solutions to avoid recursion.
Handles large inputs and complex patterns efficiently.
"""
dp = [[False] * (len(p) + 1) for _ in range(len(s) + 1)]
dp[len(s)][len(p)] = True
for i in range(len(s), -1, -1):
for j in range(len(p) - 1, -1, -1):
match = i < len(s) and (s[i] == p[j] or p[j] == '.')
if j + 1 < len(p) and p[j + 1] == '*':
dp[i][j] = dp[i][j + 2] or (match and dp[i + 1][j])
else:
dp[i][j] = match and dp[i + 1][j + 1]
return dp[0][0]
This is the most robust and scalable solution. No recursion, no slicing, just pure iteration.
Time Complexity: O(len(s) * len(p))
Space Complexity: O(len(s) * len(p))
- Handles massive input (e.g., 50,000+ chars)
- No stack overflow
- Predictable performance
If you're building performance-sensitive applications or prepping for interviews, go with Solution 4 (Bottom-Up DP).
It’s the safest, fastest, and most scalable.
Use Solution 3 only if you want to demonstrate recursion with memoization.
Avoid Solution 2 and Solution 1 unless you're doing quick scripts or one-off checks. They fail under scale or complexity.
To master pattern recognition and dynamic programming, recreate these solutions from scratch, benchmark them on different test cases, and visualize the DP table to build intuition.