Grouping Anagrams in Python: From Naive to Optimal
If you've ever dealt with strings in Python, chances are you've encountered a classic algorithm problem: **grouping anagrams**.
As a software engineer deeply involved in backend systems and algorithm optimization, I’ve worked through countless implementations of this task—from slow, brute-force methods to highly efficient solutions.
In this article, you'll learn how to group anagrams in Python, understand the reasoning behind each approach, and see how we can scale it efficiently to handle even 100,000+ strings.
Brute Force: Sorting Strings to Group Anagrams
The simplest way to group anagrams is to sort the characters in each word. Since anagrams share the same letters, sorting reveals a common structure.
from typing import List
def group_anagrams(s: List[str]) -> List[List[str]]:
dp = {}
for i in s:
# Sort characters of the string to form the key
sorted_i = ''.join(sorted(i))
if sorted_i not in dp:
dp[sorted_i] = []
dp[sorted_i].append(i)
return list(dp.values())
Example:
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
Output:
[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Why This Works
- Sorting turns "eat" and "tea" into the same key: "aet".
- We use a dictionary to group all words with the same sorted key.
Problem:
- Sorting takes O(k log k) time for each word (k = word length).
- Becomes inefficient for large datasets.
Optimized Approach: Frequency Count as Dictionary Key
Instead of sorting, count the frequency of each character. This avoids expensive sorting operations and creates a unique signature for each anagram group.
from collections import defaultdict
from typing import List
def group_anagrams(strs: List[str]) -> List[List[str]]:
"""
Groups anagrams using frequency count as key.
Each word is mapped to a 26-length character frequency tuple.
"""
anagram_groups = defaultdict(list)
for word in strs:
count = [0] * 26 # 26 lowercase English letters
for char in word:
count[ord(char) - ord('a')] += 1
key = tuple(count) # Tuples are hashable and usable as dict keys
anagram_groups[key].append(word)
return list(anagram_groups.values())
Benefits:
- Time complexity is O(n \ k), where:
- n = number of words
- k = max length of any word
- Space complexity is O(n \ k), but very manageable
- Significantly faster on large inputs
Test Cases:
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(group_anagrams([]))
print(group_anagrams([""]))
print(group_anagrams(["a"*10000, "a"*10000]))
Even on massive datasets:
import random
print(group_anagrams(["".join(random.sample("abcde", 5)) for _ in range(100000)]))
Ready to Scale with Python
The brute-force method of grouping anagrams may work for a handful of words, but it quickly becomes a bottleneck in production-scale systems. By switching to a frequency-based key strategy, we not only make our solution faster, but also more scalable and elegant.
After reading this, you should:
* Be able to clearly distinguish between naive and optimized anagram grouping methods.
* Know how to write scalable, clean, and efficient Python code.
* Understand how to tackle related issues in real-world scenarios
Use the optimized solution in your next project or interview and watch how much more confident and capable you feel in tackling string manipulation problems.