Algorithms

Underscorify Substring

8 months ago ; 103 views
Share this

Underscorify Substring

In this question, you are provided with two strings. The first one, called the main string is the longer string. The smaller string is also called the substring.

The objective of the question is to write a function that will find every instance of the substring in the main string and return a new string that's technically the main string with underscores wrapped around every instance of the substring.

Conceptual Overview

I will first give a high-level understanding of the problem by explaining what happens in each of the member functions so that when we bring together these functions, the solution will finally make sense.

get_locations Function

Finds all occurrences of a substring in a string and returns a list of start and end indices for each occurrence.

collapse_locations Function

Merges overlapping or adjacent intervals in the list of locations.

underscore_string Function

Uses the list of locations to underscore the original string.

underscore_substring Function

Combines the process of getting locations and underscoring the string into a single function call.

Implementation of the Solution

def underscore_substring(string, substring):
    """
    Underscore specific occurrences of a substring in a string.

    Parameters:
    - string (str): The original string.
    - substring (str): The substring to underscore in the original string.

    Returns:
    - str: The modified string with underscores.
    """
    # Combine the steps to get locations and collapse them into a single call
    locations = collapse_locations(get_locations(string, substring))
    # Use the locations to underscore the original string
    return underscore_string(string, locations)

def get_locations(string, substring):
    """
    Find all occurrences of a substring in a string.

    Parameters:
    - string (str): The original string.
    - substring (str): The substring to find in the original string.

    Returns:
    - list: A list of start and end indices for each occurrence of the substring.
    """
    locations = []
    startIdx = 0
    while startIdx < len(string):
        # Find the next occurrence of the substring in the remaining part of the string
        nextIdx = string.find(substring, startIdx)
        if nextIdx != -1:
            # If found, add the start and end indices of the substring to locations
            locations.append([nextIdx, nextIdx + len(substring)])
            # Move the starting index to the next character after the found substring
            startIdx = nextIdx + 1
        else:
            # If no more occurrences are found, break out of the loop
            break
    return locations

def collapse_locations(locations):
    """
    Merge overlapping or adjacent intervals in a list of locations.

    Parameters:
    - locations (list): A list of start and end indices.

    Returns:
    - list: A list of merged or collapsed locations.
    """
    # If there are no locations, return an empty list
    if not len(locations):
        return locations
    
    newLocations = [locations[0]]
    previous = newLocations[0]
    for i in range(1, len(locations)):
        current = locations[i]
        if current[0] <= previous[1]:
            # If the current location overlaps with the previous one, merge them
            previous[1] = current[1]
        else:
             # If no overlap, add the current location to the newLocations list
            newLocations.append(current)
            previous = current
    return newLocations

def underscore_string(string, locations):
    """
    Underscore specific locations in a string.

    Parameters:
    - string (str): The original string.
    - locations (list): A list of start and end indices to underscore in the original string.

    Returns:
    - str: The modified string with underscores.
    """
    locationsIdx = 0
    stringIdx = 0
    inBetweenUnderscores = False
    finalChars = []
    i = 0

    # Iterate through the string and the locations simultaneously
    while stringIdx < len(string) or locationsIdx < len(locations):
        if locationsIdx < len(locations) and stringIdx == locations[locationsIdx][i]:
            # If the current character is at a location, add an underscore
            finalChars.append("_")
            # Toggle the inBetweenUnderscores flag
            inBetweenUnderscores = not inBetweenUnderscores
            if not inBetweenUnderscores:
                # If the underscore is closed, move to the next location
                locationsIdx += 1
            # Switch between start and end indices of the location
            i = 0 if i == 1 else 1
        elif stringIdx < len(string):
             # If not in a location, append the current character from the original string
            finalChars.append(string[stringIdx])
            stringIdx += 1
    # Join the characters to form the final underscored string
    return "".join(finalChars)

print(underscore_substring("test this is a test", "test"))
print(underscore_substring("testthis is a testtest to see if testestest it works", "test"))

 

Time & Space Complexity

In this section, I will analyze the time and space complexity of each function:

get_locations Function

Time Complexity: O(N*M), where N is the length of the string, and M is the length of the substring. The find method is used in a loop, and in the worst case, it needs to traverse the entire string for each occurrence of the substring.

Space Complexity: O(K), where K is the number of occurrences of the substring. The locations list stores the start and end indices for each occurrence.

collapse_locations Function

Time Complexity: O(K), where K is the number of locations. The function iterates through the locations once.
Space Complexity: O(K), as it creates a new list to store the collapsed locations.

underscore_string Function

Time Complexity: O(N + K), where N is the length of the string, and K is the number of locations. The function iterates through the string and locations once.
Space Complexity: O(N), as the final_chars list stores the characters of the modified string.

Generally, the time complexity is dominated by the iteration through the string in the underscore_string function. On the other hand, the space complexity is primarily determined by the space required to store the locations and the modified string.

 

 

 

Become a member
Get the latest news right in your inbox. We never spam!

Read next

Island Perimeter

&nbsp;This solution seeks to find the perimeter of an island in a grid.&nbsp; The problem considers a grid where each cell represents land (1) or … Read More

Kibsoft 3 months, 3 weeks ago . 76 views

Pacific Atlantic Waterflow

3 months, 3 weeks ago . 82 views