Data Structures and Algorithms

Regular Expression Matching in Python

1 day, 16 hours ago ; F(visit_count) + Value(1) views
Share this

Regular Expression Matching in Python: From Brute-Force to Optimal with Memoization

As a software engineer who focuses on Data Structures and Algorithms (DSA) and teaches others through platforms like PythonHaven.com, I’ve spent countless hours optimizing solutions for common yet deceptively tricky problems.

One of the most fundamental and widely asked problems in technical interviews is Regular Expression Matching using `.` and `*`.

In this article, we’ll walk through brute-force, basic handling, and optimized recursive approaches—with complete explanations that even a 2-year-old

could follow.

 

What’s the Problem?

You are given a string `s` and a pattern `p`. You need to return `True` if the pattern matches the entire string. Here are the rules:

 `.` matches any single character.
`*` matches zero or more of the preceding character.

 

Test Cases:

```
("aa", "a") → False  
("aa", "a*") → True  
("mississippi", "mis*is*p*.") → False  
("a"*50000 + "b", "a*b") → True  
("a"*50000 + "b", ".*") → True  
```

 

Understanding the Brute-Force Way

The first step many beginners take is simulating the pattern matching process manually.

Brute-Force Attempt: Simplistic Loop Matching

 

def reg_expr_matching(s, pattern):
    i, j = 0, 0
    while i < len(s) and j < len(pattern):
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            while i < len(s) and s[i] == pattern[j]:
                i += 1
            j += 2
        elif pattern[j] == '.':
            i += 1
            j += 1
        elif s[i] == pattern[j]:
            i += 1
            j += 1
        else:
            return False

    while j + 1 < len(pattern) and pattern[j + 1] == '*':
        j += 2

    return i == len(s) and j == len(pattern)

Why It Fails

  • It only matches literal repeats but doesn't explore different branching possibilities.
  • Completely breaks on patterns like `"a*b"` with long inputs or backtracking needs.

Better: Recursive DFS Approach

Now let’s move to a more correct but not yet optimal approach—recursive depth-first search.

Recursive Backtracking

class Solution:

    def isMatch(self, s: str, p: str) -> bool:

        def dfs(i, j):
            if i >= len(s) and j >= len(p): return True
            if j >= len(p): return False

            match = i < len(s) and (s[i] == p[j] or p[j] == ".")
            if (j + 1) < len(p) and p[j + 1] == "*":
                return dfs(i, j + 2) or (match and dfs(i + 1, j))
            if match:
                return dfs(i + 1, j + 1)

            return False
        return dfs(0, 0)



How This Works

  • Try every possibility: use the `*` or skip it.
  • Match current characters directly or with `.`.
  • Move one step forward when characters match.

Drawback

Repeated subproblems lead to exponential time for long strings (`"a"*50000 + "b"` case fails here).

 

Optimal: Memoized DFS (Top-Down Dynamic Programming)

To fix repeated calculations, we’ll  cache results using memoization.

Final Optimized Version

class Solution:

    def isMatchMemoization(self, s: str, p: str) -> bool:

        memo = {}

        def dfs(i, j):
            if (i, j) in memo:
                return memo[(i, j)]

            if i >= len(s) and j >= len(p):
                return True

            if j >= len(p):
                return False

            match = i < len(s) and (s[i] == p[j] or p[j] == ".")

            if (j + 1) < len(p) and p[j + 1] == "*":
                memo[(i, j)] = dfs(i, j + 2) or (match and dfs(i + 1, j))
                return memo[(i, j)]

            if match:
                memo[(i, j)] = dfs(i + 1, j + 1)
                return memo[(i, j)]

            memo[(i, j)] = False
            return False

        return dfs(0, 0)

Why This is the Best

Memoization avoids recomputing subproblems.
Handles long strings with `O(n * m)` complexity, where `n` and `m` are lengths of `s` and `p`.

 

Real-World Performance on Edge Cases

# Code to Run All Test Cases:

sol = Solution()
reg_expr_matching = sol.isMatchMemoization

print(f"Expected False, Actual", reg_expr_matching("mississippi", "mis*is*p*."))
print(f"Expected False, Actual", reg_expr_matching("aa", "a"))
print(f"Expected True, Actual", reg_expr_matching("aa", "a*"))
print(f"Expected True, Actual", reg_expr_matching("aab", "c*a*b"))
print(f"Expected False, Actual", reg_expr_matching("mississippi", "mis*is*p*."))
print(f"Expected True, Actual", reg_expr_matching("a"*50000 + "b", "a*b"))
print(f"Expected True, Actual", reg_expr_matching("a"*50000 + "b", ".*"))

You’ll see blazing fast results, even for huge input sizes.

 

Where to Go From Here

Now that you understand how to implement and optimize regular expression matching in Python—from basic loops to dynamic programming—you’re in a solid position for acing interview problems.

Here’s what you can do next:

- Try converting this top-down memoized approach to bottom-up DP.
- Apply similar memoization patterns to problems like wildcard matching, edit distance, and pattern parsing.
- Bookmark PythonHaven.com for deeper DSA content.

You now have a crystal-clear understanding of `.` and `*` matching in regex. Not only that, but you also know how to take a brute-force algorithm and make it lightning fast.
 

Become a member
Get the latest news right in your inbox. We never spam!

Read next

Longest substring without repeating characters

Longest Substring without Repeating Characters &nbsp; As an experienc… Read More

1 week, 3 days ago . 32 views