Longest Substring Without Duplicates
In this question, I hope to help you find the longest substring without duplicates by first getting the conceptual overview right. I will then proceed to explore how I arrived at the solution and finally present the outcome in the form of a working code.
Conceptual Overview
The goal of this problem is to find the longest substring in a given string without repeating characters. In other words, we want to identify the maximum length substring where each character appears only once.
Approach to Solving the Problem
Sliding Window Technique
Use a sliding window approach to keep track of the current substring without duplication. Maintain two pointers, startIdx and i, to represent the start and end of the current substring.
Hash Map (lastSeen)
Utilize a hash map (lastSeen) to store the index of the last seen occurrence of each character.
Iterate Through the String
Iterate through each character in the string.
If the character is already in lastSeen, update startIdx to be the maximum of its current value and the next index. Subsequently, check if the length of the current substring is longer than the longest found so far. If yes, update the longest substring.
Return the Result
Return the longest substring found.
Here, I present the actual implementation of this solution.
# O(n) time | O(n, a) space -a is length of alphabet represented by the input string
def longest_substring_without_duplication(string):
"""
Find the longest substring without repeating characters in a given string.
Parameters:
- string (str): The input string.
Returns:
- str: The longest substring without repeating characters.
"""
# Dictionary to store the index of the last seen occurrence of each character
lastSeen = {}
# List to store the start and end indices of the longest substring
longest = [0, 1]
# Variable to track the start index of the current substring
startIdx = 0
# Iterate through each character and its index in the string
for i, char in enumerate(string):
# Update start index if the character is already seen, ensuring no repetition
startIdx = max(startIdx, lastSeen.get(char, -1) + 1)
# Check if the current substring is longer than the longest found so far
if longest[1] - longest[0] < i + 1 - startIdx:
longest = [startIdx, i + 1]
# Update the last seen index of the current character
lastSeen[char] = i
# Return the longest substring found
return string[longest[0]:longest[1]]
print(longest_substring_without_duplication("clementisa cap"))
Code Overview
lastSeen Dictionary
- Keeps track of the last seen index of each character to avoid repetition.
longest List
-
Stores the start and end indices of the longest substring found.
startIdx Variable
- Represents the start index of the current substring without repetition.
Main Loop
- Iterates through each character in the string.
- Updates startIdx to avoid repetition.
- Checks and updates the longest substring if needed.
- Updates lastSeen with the current index of the character.
Result
- Returns the longest substring found in the given string.
This solution efficiently finds the longest substring without repeating characters using a sliding window approach. The hash map helps optimize the process of checking for repeated characters
Time & Space Complexity
Let's analyze the time and space complexity of the solution above.
Time Complexity
The time complexity of the solution is O(N), where N is the length of the input string. Here's the breakdown:
The solution iterates through each character in the string exactly once, and for each character, it performs constant-time operations. Therefore, the time complexity is proportional to the length of the input string, O(N).
Space Complexity
The space complexity is O(min(N, M)), where N is the length of the input string, and M is the size of the character set (number of unique characters). Let me break it down for you:
lastSeen Dictionary
The space required for the lastSeen dictionary is proportional to the size of the character set (M). In the worst case, where all characters are unique, the size of lastSeen is O(M).
Other Variables
The space required for other variables (longest, startIdx, i, char) is constant and does not depend on the size of the input string.
Since M represents the number of unique characters, it is bounded by a constant, and in practice, it is often smaller than N. Therefore, the space complexity is effectively O(min(N, M)).
This solution is efficient and linear in terms of time complexity, making it suitable for handling strings of various lengths. The space complexity is also reasonable, especially when the number of unique characters is limited.