PythonHaven | Grouping Anagrams in Python

Grouping Anagrams in Python: From Naive to Optimal

If you've ever built a spellchecker, a search suggestion engine, or even just prepared for coding interviews, you've likely encountered the Group Anagrams problem.

It’s one of the most popular string manipulation challenges because it tests not just your coding skills but your ability to optimize for scale.

In this guide, you’ll learn:

How to group anagrams in Python
The difference between brute-force and optimal solutions
How to handle real-world, large-scale datasets
Common mistakes and related problems to strengthen your understanding

Whether you're prepping for a LeetCode interview or building backend systems at scale, this guide is for you.

What Is the Group Anagrams Problem?

Problem Definition

Given a list of strings, group the strings that are anagrams of each other.

An anagram is a word formed by rearranging the letters of another word.

Example Input

["eat", "tea", "tan", "ate", "nat", "bat"]

Example Output

[["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

Real-World Applications

This isn't just a coding exercise. Grouping anagrams has real uses:

Spell-check suggestions (e.g., "tea" → suggest "eat" or "ate")
Fraud detection (identifying scrambled text patterns)
Natural Language Processing (NLP) preprocessing
Plagiarism detection where sentences are reworded or scrambled

Solution 1: Brute Force (Sorting-Based Approach)

Idea

If two words are anagrams, their sorted forms will match.

Example:

Word	Sorted Form
eat	aet
tea	aet
ate	aet
tan	ant
nat	ant
bat	abt

Group words that share the same sorted form.

The simplest way to group anagrams is to sort the characters in each word. Since anagrams share the same letters, sorting reveals a common structure.

from typing import List

def group_anagrams(s: List[str]) -> List[List[str]]:
    dp = {}
    for i in s:
        # Sort characters of the string to form the key
        sorted_i = ''.join(sorted(i))
        if sorted_i not in dp:
            dp[sorted_i] = []
        dp[sorted_i].append(i)
    return list(dp.values())

Time Complexity

Step	Complexity
Sorting each word	O(k log k)
Total time	O(n × k log k)
Space	O(n × k)

n: Number of words
k: Average word length

Why This Works?

Sorting turns "eat" and "tea" into the same key: "aet".
We use a dictionary to group all words with the same sorted key.

Why Use This?

Simple to implement
Works well for small datasets
Quickly understandable in interviews

What's the Problem?:

Sorting takes O(k log k) time for each word (k = word length).
Becomes inefficient for large datasets.

Solution 2: Optimized Approach (Frequency Counter as Key)

Idea

Instead of sorting, count the frequency of each character.
Anagrams will have the same frequency distribution.

Use a 26-element tuple as the key (since we’re dealing with lowercase English letters).

This avoids expensive sorting operations and creates a unique signature for each anagram group.

from collections import defaultdict
from typing import List

def group_anagrams(strs: List[str]) -> List[List[str]]:
    """
    Groups anagrams using frequency count as key.
    Each word is mapped to a 26-length character frequency tuple.
    """
    anagram_groups = defaultdict(list)

    for word in strs:
        count = [0] * 26  # 26 lowercase English letters
        for char in word:
            count[ord(char) - ord('a')] += 1
        key = tuple(count)  # Tuples are hashable and usable as dict keys
        anagram_groups[key].append(word)

    return list(anagram_groups.values())

Time Complexity

Step	Complexity
Counting characters	O(k)
Total time	O(n × k)
Space	O(n × k)

n = number of words
k = max length of any word

Why This Is Better

No need to sort → Faster for large datasets
Handles 100,000+ strings easily
Scales well for production systems

Test Cases: Manual

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(group_anagrams([]))
print(group_anagrams([""]))
print(group_anagrams(["a"*10000, "a"*10000]))

Even on massive datasets:

import random

print(group_anagrams(["".join(random.sample("abcde", 5)) for _ in range(100000)]))

Test Cases: Robust Verification

Here’s a copy-paste ready test suite to validate both solutions:

def run_tests():
    test_cases = [
        (["eat", "tea", "tan", "ate", "nat", "bat"], 3),
        ([""], 1),
        (["a"], 1),
        (["".join(chr(97 + i % 26) for i in range(1000))] * 1000, 1),
        ([chr(i % 26 + 97) * 5 for i in range(10**5)], 26),
    ]

    for i, (words, expected_groups) in enumerate(test_cases, 1):
        result = group_anagrams(words)
        actual_groups = len(result)
        print(f"Test {i}: {'Pass' if actual_groups == expected_groups else 'Fail'}")

Common Mistakes to Avoid

Sorting strings in every iteration when a frequency counter is faster.
Using lists as dictionary keys (not hashable)—use tuples instead.
Ignoring edge cases like [""] or ["a"].

Problem	Link
Valid Anagram Check	Checks if two strings are anagrams
Anagram Substring Search	Find anagrams of a pattern in a larger text
Minimum Window Substring	Find smallest substring containing all characters
Find All Anagrams in a String	Classic sliding window

Ready to Scale with Python

The brute-force method of grouping anagrams may work for a handful of words, but it quickly becomes a bottleneck in production-scale systems. By switching to a frequency-based key strategy, we not only make our solution faster, but also more scalable and elegant.

After reading this, you should:

* Be able to clearly distinguish between naive and optimized anagram grouping methods.
* Know how to write scalable, clean, and efficient Python code.
* Understand how to tackle related issues in real-world scenarios

Real-World Scaling Tips

When moving from interviews to production:

Use defaultdict for clean code
If working on huge data pipelines:
- Use parallelization (multiprocessing or threads)
- Consider Redis, Memcached, or database sharding for distributed keys

Final Thoughts: When to Use Each Method

Scenario	Method
Small Inputs / Interviews	Sorting-based
Large-scale Systems	Frequency Counter

Grouping Anagrams in Python

Grouping Anagrams in Python: From Naive to Optimal

What Is the Group Anagrams Problem?

Problem Definition

Example Input

Example Output

Real-World Applications

Solution 1: Brute Force (Sorting-Based Approach)

Idea

Why This Works?

Why Use This?

What's the Problem?:

Solution 2: Optimized Approach (Frequency Counter as Key)

Idea

Time Complexity

Why This Is Better

Test Cases: Manual

Test Cases: Robust Verification

Common Mistakes to Avoid

Related Problems (Expand Your Understanding)

Ready to Scale with Python

Real-World Scaling Tips

Final Thoughts: When to Use Each Method

Download Resources

Conclusion

Join the Discussion

Grouping Anagrams in Python: From Naive to Optimal

What Is the Group Anagrams Problem?

Problem Definition

Example Input

Example Output

Real-World Applications

Solution 1: Brute Force (Sorting-Based Approach)

Idea

Why This Works?

Why Use This?

What's the Problem?:

Solution 2: Optimized Approach (Frequency Counter as Key)

Idea

Time Complexity

Why This Is Better

Test Cases: Manual

Test Cases: Robust Verification

Common Mistakes to Avoid

Related Problems (Expand Your Understanding)

Ready to Scale with Python

Real-World Scaling Tips

Final Thoughts: When to Use Each Method

Download Resources

Conclusion

Become a PythonHaven Member

Join the Discussion

Read Next