Grouping Anagrams in Python: From Naive to Optimal
If you've ever built a spellchecker, a search suggestion engine, or even just prepared for coding interviews, you've likely encountered the Group Anagrams problem.
It’s one of the most popular string manipulation challenges because it tests not just your coding skills but your ability to optimize for scale.
In this guide, you’ll learn:
-
How to group anagrams in Python
-
The difference between brute-force and optimal solutions
-
How to handle real-world, large-scale datasets
-
Common mistakes and related problems to strengthen your understanding
Whether you're prepping for a LeetCode interview or building backend systems at scale, this guide is for you.
What Is the Group Anagrams Problem?
Problem Definition
Given a list of strings, group the strings that are anagrams of each other.
An anagram is a word formed by rearranging the letters of another word.
Example Input
["eat", "tea", "tan", "ate", "nat", "bat"]
Example Output
[["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Real-World Applications
This isn't just a coding exercise. Grouping anagrams has real uses:
-
Spell-check suggestions (e.g., "tea" → suggest "eat" or "ate")
-
Fraud detection (identifying scrambled text patterns)
-
Natural Language Processing (NLP) preprocessing
-
Plagiarism detection where sentences are reworded or scrambled
Solution 1: Brute Force (Sorting-Based Approach)
Idea
If two words are anagrams, their sorted forms will match.
Example:
Word | Sorted Form |
---|---|
eat | aet |
tea | aet |
ate | aet |
tan | ant |
nat | ant |
bat | abt |
Group words that share the same sorted form.
The simplest way to group anagrams is to sort the characters in each word. Since anagrams share the same letters, sorting reveals a common structure.
from typing import List
def group_anagrams(s: List[str]) -> List[List[str]]:
dp = {}
for i in s:
# Sort characters of the string to form the key
sorted_i = ''.join(sorted(i))
if sorted_i not in dp:
dp[sorted_i] = []
dp[sorted_i].append(i)
return list(dp.values())
Time Complexity
Step | Complexity |
---|---|
Sorting each word | O(k log k) |
Total time | O(n × k log k) |
Space | O(n × k) |
-
n: Number of words
-
k: Average word length
Why This Works?
- Sorting turns "eat" and "tea" into the same key: "aet".
- We use a dictionary to group all words with the same sorted key.
Why Use This?
-
Simple to implement
-
Works well for small datasets
-
Quickly understandable in interviews
What's the Problem?:
- Sorting takes O(k log k) time for each word (k = word length).
- Becomes inefficient for large datasets.
Solution 2: Optimized Approach (Frequency Counter as Key)
Idea
Instead of sorting, count the frequency of each character.
Anagrams will have the same frequency distribution.
Use a 26-element tuple as the key (since we’re dealing with lowercase English letters).
This avoids expensive sorting operations and creates a unique signature for each anagram group.
from collections import defaultdict
from typing import List
def group_anagrams(strs: List[str]) -> List[List[str]]:
"""
Groups anagrams using frequency count as key.
Each word is mapped to a 26-length character frequency tuple.
"""
anagram_groups = defaultdict(list)
for word in strs:
count = [0] * 26 # 26 lowercase English letters
for char in word:
count[ord(char) - ord('a')] += 1
key = tuple(count) # Tuples are hashable and usable as dict keys
anagram_groups[key].append(word)
return list(anagram_groups.values())
Time Complexity
Step | Complexity |
---|---|
Counting characters | O(k) |
Total time | O(n × k) |
Space | O(n × k) |
- n = number of words
- k = max length of any word
Why This Is Better
-
No need to sort → Faster for large datasets
-
Handles 100,000+ strings easily
-
Scales well for production systems
Test Cases: Manual
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(group_anagrams([]))
print(group_anagrams([""]))
print(group_anagrams(["a"*10000, "a"*10000]))
Even on massive datasets:
import random
print(group_anagrams(["".join(random.sample("abcde", 5)) for _ in range(100000)]))
Test Cases: Robust Verification
Here’s a copy-paste ready test suite to validate both solutions:
def run_tests():
test_cases = [
(["eat", "tea", "tan", "ate", "nat", "bat"], 3),
([""], 1),
(["a"], 1),
(["".join(chr(97 + i % 26) for i in range(1000))] * 1000, 1),
([chr(i % 26 + 97) * 5 for i in range(10**5)], 26),
]
for i, (words, expected_groups) in enumerate(test_cases, 1):
result = group_anagrams(words)
actual_groups = len(result)
print(f"Test {i}: {'Pass' if actual_groups == expected_groups else 'Fail'}")
Common Mistakes to Avoid
-
Sorting strings in every iteration when a frequency counter is faster.
-
Using lists as dictionary keys (not hashable)—use tuples instead.
-
Ignoring edge cases like
[""]
or["a"]
.
Related Problems (Expand Your Understanding)
For full mastery, practice these too:
Problem | Link |
---|---|
Valid Anagram Check | Checks if two strings are anagrams |
Anagram Substring Search | Find anagrams of a pattern in a larger text |
Minimum Window Substring | Find smallest substring containing all characters |
Find All Anagrams in a String | Classic sliding window |
Ready to Scale with Python
The brute-force method of grouping anagrams may work for a handful of words, but it quickly becomes a bottleneck in production-scale systems. By switching to a frequency-based key strategy, we not only make our solution faster, but also more scalable and elegant.
After reading this, you should:
* Be able to clearly distinguish between naive and optimized anagram grouping methods.
* Know how to write scalable, clean, and efficient Python code.
* Understand how to tackle related issues in real-world scenarios
Real-World Scaling Tips
When moving from interviews to production:
-
Use defaultdict for clean code
-
If working on huge data pipelines:
-
Use parallelization (multiprocessing or threads)
-
Consider Redis, Memcached, or database sharding for distributed keys
-
Final Thoughts: When to Use Each Method
Scenario | Method |
---|---|
Small Inputs / Interviews | Sorting-based |
Large-scale Systems |
Frequency Counter |
Download Resources
Conclusion
Now you can:
-
Confidently solve Group Anagrams in Python
-
Optimize your solution for scale and production-readiness
-
Tackle related problems to deepen your algorithmic thinking
Join the Discussion
No comments yet. Be the first to share your thoughts!