Data Structures and Algorithms

Regular Expression Matching Problem


Understanding how to solve the Regular Expression Matching problem using different algorithmic approaches is crucial.

This problem offers a solid benchmark for evaluating time and space complexity trade-offs, especially for optimizing backend services where pattern recognition is key.

In this article, we dissect four solutions to the Regular Expression Matching problem using Python.

We'll evaluate each by how well it scales (both in time and memory), starting with the slowest and most memory-intensive and ending with the most optimized.

We'll also refine these solutions for clarity and performance.

 

What is the Problem?

 

You’re given a string s and a pattern p. The pattern may contain:

  • . matches any single character.
  • * matches zero or more of the preceding element.

 

The task: Determine whether the pattern matches the entire string, not just a substring of it.
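A few concrete cases help pin down the semantics before diving into implementations. They are verified here with Python's re module, which implements a superset of this grammar:

```python
import re

# Expected behavior for any correct implementation of this problem.
cases = [
    ("aa", "a", False),      # "a" does not cover the whole string
    ("aa", "a*", True),      # "a*" expands to two 'a' characters
    ("ab", ".*", True),      # ".*" matches any sequence, including "ab"
    ("aab", "c*a*b", True),  # "c*" matches zero 'c's, "a*" matches two 'a's
    ("mississippi", "mis*is*p*.", False),
]
for s, p, expected in cases:
    assert (re.fullmatch(p, s) is not None) == expected
print("all semantics checks passed")
```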

 

Solution 1: Using re.fullmatch() (Regex Library)

 

import re


def reg_expr_matching(s: str, p: str) -> bool:
    """
    Use Python's re.fullmatch to check if the pattern matches the entire string.
    """
    return re.fullmatch(p, s) is not None

This is the most straightforward approach: hand the pattern to Python's built-in regex engine. Note that re treats p as a full regular expression, so any characters beyond '.' and '*' are interpreted as regex metacharacters.

 

Time Complexity: Exponential in worst-case scenarios (e.g., backtracking-heavy patterns).

 

Space Complexity: Depends on the regex engine's internals (typically significant overhead).

 

Drawbacks:

  • Not under your control (opaque performance)
  • Doesn't teach you the underlying mechanics
  • Can fail performance tests in interviews or on production-scale data
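The "opaque performance" point is real even within this restricted grammar: Python's re engine backtracks, so stacked 'a*' terms against a non-matching string explore combinatorially many splits. A quick, hedged sketch (the values of k and the input length are arbitrary small choices so this runs fast):

```python
import re
import time

# Stacked "a*" terms force the backtracking engine to try many ways of
# splitting the 'a's between the stars before concluding there is no 'b'.
for k in (4, 6, 8):
    pattern = "a*" * k + "b"
    text = "a" * 15
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    elapsed = time.perf_counter() - start
    print(f"k={k}  matched={matched}  {elapsed:.5f}s")
```

The measured times grow rapidly with k even though the answer is the same trivial False each time.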

 

Solution 2: Greedy Pointer Matching (Incorrect in Some Cases)

 

This approach tries to simulate regex using a greedy strategy.

def reg_expr_matching(s, pattern):
    """
    Simulate regex matching with a greedy scan.
    Incorrect for patterns like 'a*a', where 'a*' greedily consumes
    characters that the trailing literal still needs.
    """
    i, j = 0, 0

    while i < len(s) and j < len(pattern):
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            if pattern[j] == '.':
                # '.*' at the end of the pattern matches everything left.
                if j + 2 == len(pattern):
                    return True
                # Otherwise try every possible split point recursively.
                while i < len(s):
                    if reg_expr_matching(s[i:], pattern[j + 2:]):
                        return True
                    i += 1
                return False
            else:
                # Greedy: consume every matching character, no backtracking.
                while i < len(s) and s[i] == pattern[j]:
                    i += 1
                j += 2
        elif pattern[j] == '.' or s[i] == pattern[j]:
            i += 1
            j += 1
        else:
            return False

    # Skip any trailing 'x*' units, which may match zero characters.
    while j + 1 < len(pattern) and pattern[j + 1] == '*':
        j += 2

    return i == len(s) and j == len(pattern)

Time Complexity: Worst case exponential due to repeated slicing and backtracking.

Space Complexity: High due to recursion and string slicing.

Downside: Returns wrong answers for patterns like 'a*a', and breaks on large inputs due to uncontrolled recursion depth and repeated slicing.
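To make the failure concrete: with s = 'aaa' and p = 'a*a', the greedy loop consumes all three 'a's for 'a*' and leaves nothing for the trailing literal 'a'. The routine is repeated here in condensed form (renamed greedy_match) so the snippet runs standalone:

```python
import re

def greedy_match(s, pattern):
    # Condensed copy of the greedy routine above, kept only to show its flaw.
    i, j = 0, 0
    while i < len(s) and j < len(pattern):
        if j + 1 < len(pattern) and pattern[j + 1] == '*':
            if pattern[j] == '.':
                if j + 2 == len(pattern):
                    return True
                while i < len(s):
                    if greedy_match(s[i:], pattern[j + 2:]):
                        return True
                    i += 1
                return False
            # Greedy: consume every match, with no way to give any back.
            while i < len(s) and s[i] == pattern[j]:
                i += 1
            j += 2
        elif pattern[j] == '.' or s[i] == pattern[j]:
            i += 1
            j += 1
        else:
            return False
    while j + 1 < len(pattern) and pattern[j + 1] == '*':
        j += 2
    return i == len(s) and j == len(pattern)

s, p = "aaa", "a*a"
print("greedy :", greedy_match(s, p))               # False (wrong)
print("correct:", re.fullmatch(p, s) is not None)   # True
```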

 

Solution 3: Top-Down Dynamic Programming (Memoized Recursion)

 

import sys

sys.setrecursionlimit(10**6)


def reg_expr_matching(s: str, p: str) -> bool:
    """
    Top-down DP with memoization to avoid redundant computation.
    Scales better, but deep recursion remains a risk for large inputs.
    """
    cache = {}

    def dfs(i, j):
        if (i, j) in cache:
            return cache[(i, j)]

        # An exhausted pattern matches only an exhausted string.
        if j == len(p):
            return i == len(s)

        match = i < len(s) and (s[i] == p[j] or p[j] == '.')

        if j + 1 < len(p) and p[j + 1] == '*':
            # Either skip the 'x*' unit, or consume one character and stay on it.
            cache[(i, j)] = dfs(i, j + 2) or (match and dfs(i + 1, j))
            return cache[(i, j)]

        if match:
            cache[(i, j)] = dfs(i + 1, j + 1)
            return cache[(i, j)]

        cache[(i, j)] = False
        return False

    return dfs(0, 0)

This is a more principled solution that uses memoization to avoid redundant work.

Time Complexity: O(len(s) * len(p))

Space Complexity: O(len(s) * len(p)) for memo + call stack

Limitations: Risk of RecursionError on deep inputs; raising the recursion limit (as done above) trades that for a possible interpreter crash if the C stack overflows.

 

Solution 4: Bottom-Up Dynamic Programming (Tabulation)

def reg_expr_matching(s: str, p: str) -> bool:
    """
    Bottom-up DP: tabulate all subproblem solutions to avoid recursion.
    Handles large inputs and complex patterns efficiently.
    """
    # dp[i][j] is True when s[i:] matches p[j:].
    dp = [[False] * (len(p) + 1) for _ in range(len(s) + 1)]
    dp[len(s)][len(p)] = True  # empty pattern matches empty string

    for i in range(len(s), -1, -1):
        for j in range(len(p) - 1, -1, -1):
            match = i < len(s) and (s[i] == p[j] or p[j] == '.')

            if j + 1 < len(p) and p[j + 1] == '*':
                # Skip the 'x*' unit, or consume one character and stay on it.
                dp[i][j] = dp[i][j + 2] or (match and dp[i + 1][j])
            else:
                dp[i][j] = match and dp[i + 1][j + 1]

    return dp[0][0]

This is the most robust and scalable solution. No recursion, no slicing, just pure iteration.

 

Time Complexity: O(len(s) * len(p))

Space Complexity: O(len(s) * len(p))

  • Handles massive input (e.g., 50,000+ chars)
  • No stack overflow
  • Predictable performance
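A quick sanity check of the bottom-up version against the classic cases. The function body is repeated so the snippet runs standalone:

```python
def reg_expr_matching(s: str, p: str) -> bool:
    # Bottom-up DP, exactly as defined above: dp[i][j] means s[i:] matches p[j:].
    dp = [[False] * (len(p) + 1) for _ in range(len(s) + 1)]
    dp[len(s)][len(p)] = True
    for i in range(len(s), -1, -1):
        for j in range(len(p) - 1, -1, -1):
            match = i < len(s) and (s[i] == p[j] or p[j] == '.')
            if j + 1 < len(p) and p[j + 1] == '*':
                dp[i][j] = dp[i][j + 2] or (match and dp[i + 1][j])
            else:
                dp[i][j] = match and dp[i + 1][j + 1]
    return dp[0][0]

assert reg_expr_matching("aa", "a") is False
assert reg_expr_matching("aa", "a*") is True
assert reg_expr_matching("ab", ".*") is True
assert reg_expr_matching("aab", "c*a*b") is True
assert reg_expr_matching("mississippi", "mis*is*p*.") is False
assert reg_expr_matching("a" * 50000, ".*") is True  # large input, no recursion
print("all checks passed")
```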

 

If you're building performance-sensitive applications or prepping for interviews, go with Solution 4 (Bottom-Up DP).

 

It’s the safest, fastest, and most scalable.

 

Use Solution 3 only if you want to demonstrate recursion with memoization.

 

Avoid Solution 2 entirely (it is incorrect for some inputs), and reserve Solution 1 for quick scripts or one-off checks. Both fail under scale or complexity.

 

To master pattern recognition and dynamic programming, recreate these solutions from scratch, benchmark them on different test cases, and visualize the DP table to build intuition.
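One way to visualize the DP table, sketched here for a small case (the input, pattern, and the T/. rendering are arbitrary illustrative choices; dp_table is the Solution 4 logic, repeated so the snippet runs standalone):

```python
def dp_table(s, p):
    # Bottom-up DP from Solution 4, returning the whole table for inspection.
    dp = [[False] * (len(p) + 1) for _ in range(len(s) + 1)]
    dp[len(s)][len(p)] = True
    for i in range(len(s), -1, -1):
        for j in range(len(p) - 1, -1, -1):
            match = i < len(s) and (s[i] == p[j] or p[j] == '.')
            if j + 1 < len(p) and p[j + 1] == '*':
                dp[i][j] = dp[i][j + 2] or (match and dp[i + 1][j])
            else:
                dp[i][j] = match and dp[i + 1][j + 1]
    return dp

s, p = "aab", "c*a*b"
table = dp_table(s, p)
# Row i / column j is True exactly when s[i:] matches p[j:];
# '$' marks the exhausted-string / exhausted-pattern positions.
print("   " + "  ".join(p + "$"))
for i, row in enumerate(table):
    label = s[i] if i < len(s) else "$"
    print(label + "  " + "  ".join("T" if cell else "." for cell in row))
```

Reading the grid from the bottom-right corner upward mirrors how the tabulation fills in: each cell depends only on cells below and to its right.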

 
