Understanding the Frequency Counter Technique in Algorithm Design
The Frequency Counter technique is a powerful and efficient approach to solving problems requiring counting element occurrences in a dataset.
It is beneficial when dealing with arrays, strings, and other collections where we need to track element frequencies.
Instead of using nested loops (resulting in O(n^2) time complexity), this technique leverages hash maps (dictionaries in Python, objects in JavaScript) to achieve an optimal O(n) complexity.
How the Frequency Counter Works
- Use a hash map (dictionary in Python) to count occurrences of elements.
- Compare frequencies when needed (e.g., checking for anagrams and unique occurrences).
- Avoid nested loops by utilizing the constant-time lookups of hash maps.
Example 1: Checking if Two Arrays Have the Same Frequency of Elements
Given two lists, check if one is a squared version of the other.
Examples:
- [1, 2, 3, 2] and [4, 1, 9, 4] → True
- [1, 2, 3] and [1, 9] → False
Naïve Approach (O(n^2) Complexity)
# Uses nested loops and remove() which makes it inefficient
# Time Complexity: O(n^2)
def same(arr1, arr2):
if len(arr1) != len(arr2):
return False
for num in arr1:
if num ** 2 in arr2:
arr2.remove(num ** 2)
else:
return False
return True
Optimized Approach (O(n) Complexity)
from collections import Counter
def same(arr1, arr2):
if len(arr1) != len(arr2):
return False
freq1 = Counter(arr1)
freq2 = Counter(arr2)
# Check if squared numbers have the same frequency
return all(freq1[num] == freq2[num * num] for num in freq1)
Example 2: Checking for Anagrams
Determine whether two strings are anagrams (contain the same characters in the same frequency).
Examples:
- "listen" and "silent" → True
- "hello" and "ollhe" → True
- "rat" and "car" → False
Optimized Solution (O(n) Complexity)
def is_anagram(str1, str2):
if len(str1) != len(str2):
return False
freq = {}
for char in str1:
freq[char] = freq.get(char, 0) + 1
for char in str2:
if char not in freq or freq[char] == 0:
return False
freq[char] -= 1
return True
Example 3: Finding the First Non-Repeating Character
Find the first character in a string that does not repeat.
Examples:
- "aabbccdee" → 'd'
- "aabb" → None
Optimized Solution (O(n) Complexity)
def first_unique_char(s):
freq = {}
for char in s:
freq[char] = freq.get(char, 0) + 1
for char in s:
if freq[char] == 1:
return char
return None
Example 4: Longest Substring with K Distinct Characters
Consider a string ,'s', and an integer, 'k', find the length of the longest substring having at most k distinct characters.
Optimized Solution (Sliding Window + Frequency Counter)
The optimized solution results in a time complexity of O(n)
def longest_substr_with_k_distinct_chars(s, k):
"""
Finds the length of the longest substring in `s` having at most `k` distinct characters.
Args:
s (str): The input string.
k (int): The maximum number of distinct characters allowed in the substring.
Returns:
int: The length of the longest substring with at most `k` distinct characters.
Example:
>>> longest_substr_with_k_distinct_chars("eceba", 2) 3
>>> longest_substr_with_k_distinct_chars("aabbcc", 2) 4
>>> longest_substr_with_k_distinct_chars("abcabcabc", 3) 9
"""
freq = {} # Dictionary to store the frequency of characters in the current window
left = 0 # Left pointer of the sliding window
max_len = 0 # Variable to store the maximum length of the valid substring
for right in range(len(s)):
# Add the current character to the frequency dictionary
freq[s[right]] = freq.get(s[right], 0) + 1
# If the number of distinct characters exceeds `k`, move the left pointer
while len(freq) > k:
freq[s[left]] -= 1
if freq[s[left]] == 0:
del freq[s[left]] # Remove the character if its frequency drops to 0
left += 1
# Update the maximum length if the current window is valid
max_len = max(max_len, right - left + 1)
return max_len
#Example 1
print(f"Expected: 9->'abcabcabc' , Actual:", longest_substr_with_k_distinct_chars("abcabcabc", 3))
#Example 2
print(f"Expected: 3 ->'ece', Actual:",longest_substr_with_k_distinct_chars("eceba", 2))
#Example 3
print(f"Expected: 2->'aa' , Actual:", longest_substr_with_k_distinct_chars("aa", 1))
#Example 4
print(f"Expected: 4->'aabb' or 'bbcc' , Actual:", longest_substr_with_k_distinct_chars("aabbcc", 2))
Solution Walk-through
We aim to establish the length of the longest substring with at most k characters that are unique in a given string "s."
The method employs a sliding window technique with two pointers (left and right) to preserve the string's character window.
The window dynamically expands and contracts to guarantee that it has no more than k unique characters.
Key Steps
Frequency Dictionary:
- A dictionary freq stores the frequency of characters in the current window.
- This helps in tracking the number of distinct characters in the window.
Expand the Window:
- The right pointer moves forward, adding characters to the window and updating their frequency in the dictionary.
Contract the Window:
- If the number of distinct characters in the window exceeds k, the left pointer moves forward to shrink the window.
- The frequency of the character at the left pointer is decremented, and if it drops to 0, the character is removed from the dictionary.
Update Maximum Length:
- After ensuring the window is valid (contains at most k distinct characters), the length of the current window (right - left + 1) is compared with max_len, and max_len is updated if necessary.
Return the Result:
- The function returns max_len, which holds the length of the longest valid substring.
Thought Process and Evaluation
Why Use a Sliding Window?
- The sliding window technique is ideal for problems involving subarrays or substrings with specific constraints (e.g., maximum distinct characters).
- It ensures the solution is efficient, with an O(n) time complexity. In the latter, n is the length of the string.
Why Use a Frequency Dictionary?
- The frequency dictionary (freq) helps to track efficiently the count of characters that are distinct in the current window.
- We can quickly determine if the window is valid by updating the frequency of characters as the window expands and contracts.
Time Complexity
- The right pointer traverses the string once, and the left pointer moves forward only when necessary.
- Each character is processed at most twice (once by the right pointer and once by the left pointer), resulting in an O(n) time complexity.
Space Complexity
- The frequency dictionary stores k + 1 distinct characters at most, so the space complexity is O(k).
Example 5: Minimum Window Substring
Find the smallest substring in s that contains all characters of t.
Optimized Solution (Sliding Window + Frequency Counter)
This optimized version of the solution results in time complexity of O(n)
from collections import Counter
def min_window(s: str, t: str) -> str:
"""
Finds the minimum window substring in `s` that contains all the characters of `t`.
Uses 'sliding window' in determining the smallest substring in `s` that efficiently
includes all characters from `t` (including duplicates).
Args:
s (str): The input string we search for the substring.
t (str): The target string containing the required characters.
Returns:
str: The minimum window substring that contains all characters of `t`, or an empty string if no such substring exists.
Example:
>>> min_window("ADOBECODEBANC", "ABC")
'BANC'
>>> min_window("a", "a")
'a'
>>> min_window("a", "aa")
''
>>> min_window("abc", "d")
''
>>> min_window("aa", "aa")
'aa'
>>> min_window("ABAACBAB", "ABC")
'ACB'
Edge Cases Considered:
- If `t` contains characters not in `s`, return `""`.
- If `s` or `t` is empty, return `""`.
- If `s` and `t` are identical, return `s`.
- Handles cases where characters in `t` are repeated.
"""
if not s or not t:
return ""
target_freq = Counter(t) # Frequency of characters in t
window_freq = {} # Frequency of characters in the current window
left = 0 # Left boundary of the sliding window
min_len = float('inf') # Track minimum window length
min_window = "" # Store the minimum window substring
required = len(target_freq) # Number of unique characters needed
formed = 0 # Number of characters matched with required frequency
for right, char in enumerate(s):
# Expand the window by including the current character
window_freq[char] = window_freq.get(char, 0) + 1
# If the character is in t and its count in the window matches t's count, increase formed
if char in target_freq and window_freq[char] == target_freq[char]:
formed += 1
# Try to shrink the window when all characters match
while formed == required:
# Update minimum window if a smaller valid substring is found
if (right - left + 1) < min_len:
min_len = right - left + 1
min_window = s[left:right+1]
# Remove the leftmost character and move the window
window_freq[s[left]] -= 1
if s[left] in target_freq and window_freq[s[left]] < target_freq[s[left]]:
formed -= 1 # Reduce matched character count
left += 1 # Move the left boundary to shrink the window
return min_window
assert min_window("ADOBECODEBANC", "ABC") == "BANC" # Standard case
assert min_window("a", "a") == "a" # Single character match
assert min_window("a", "aa") == "" # Not enough 'a' characters in `s`
assert min_window("xyzxyzxyzxyz", "xyz") == "xyz" # Repeated patterns in `s`
assert min_window("aaaaaaaaaaaabbbbbcdd", "abcdd") == "abbbbbcdd" # Large input case
assert min_window("abc", "d") == "" # `t` contains a character not in `s`
assert min_window("aa", "aa") == "aa" # `s` and `t` are identical
assert min_window("ABAACBAB", "ABC") == "ACB" # Smallest valid window
print("All test cases passed!")
Summary Table
Problem Approach Complexity
Longest Substring with K Distinct Chars Sliding Window + Frequency Counter O(n)
Checking for Anagrams Frequency Counter O(n)
Finding First Non-Repeating Character Frequency Counter O(n)
Minimum Window Substring Sliding Window + Frequency Counter O(n)
Conclusion
The Frequency Counter technique is a crucial optimization strategy that transforms brute-force O(n^2) solutions into more efficient O(n) solutions.
We can efficiently count occurrences, compare datasets, and solve complex problems like anagrams, unique elements, substrings, and windowed searches by leveraging hash maps.
Mastering this technique, sliding windows, prefix sums, and in-place hashing will significantly improve your problem-solving skills and optimize your algorithmic approach!