Data Structures and Algorithms

Grouping Anagrams in Python

1 month ago ; F(visit_count) + Value(1) views
Share this

Grouping Anagrams in Python: From Naive to Optimal

 

If you've ever dealt with strings in Python, chances are you've encountered a classic algorithm problem: **grouping anagrams**.

As a software engineer deeply involved in backend systems and algorithm optimization, I’ve worked through countless implementations of this task—from slow, brute-force methods to highly efficient solutions.

In this article, you'll learn how to group anagrams in Python, understand the reasoning behind each approach, and see how we can scale it efficiently to handle even 100,000+ strings.

Brute Force: Sorting Strings to Group Anagrams

The simplest way to group anagrams is to sort the characters in each word. Since anagrams share the same letters, sorting reveals a common structure.

from typing import List

def group_anagrams(s: List[str]) -> List[List[str]]:
    dp = {}
    for i in s:
        # Sort characters of the string to form the key
        sorted_i = ''.join(sorted(i))
        if sorted_i not in dp:
            dp[sorted_i] = []
        dp[sorted_i].append(i)
    return list(dp.values())


Example:

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))

Output:

[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Why This Works

  • Sorting turns "eat" and "tea" into the same key: "aet".
  • We use a dictionary to group all words with the same sorted key.

Problem:

  • Sorting takes O(k log k) time for each word (k = word length).
  • Becomes inefficient for large datasets.

Optimized Approach: Frequency Count as Dictionary Key

Instead of sorting, count the frequency of each character. This avoids expensive sorting operations and creates a unique signature for each anagram group.
 

from collections import defaultdict
from typing import List

def group_anagrams(strs: List[str]) -> List[List[str]]:
    """
    Groups anagrams using frequency count as key.
    Each word is mapped to a 26-length character frequency tuple.
    """
    anagram_groups = defaultdict(list)

    for word in strs:
        count = [0] * 26  # 26 lowercase English letters
        for char in word:
            count[ord(char) - ord('a')] += 1
        key = tuple(count)  # Tuples are hashable and usable as dict keys
        anagram_groups[key].append(word)

    return list(anagram_groups.values())

Benefits:

  • Time complexity is O(n \ k), where:
    • n = number of words
    • k = max length of any word
  • Space complexity is O(n \ k), but very manageable
  • Significantly faster on large inputs

Test Cases:

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(group_anagrams([]))
print(group_anagrams([""]))
print(group_anagrams(["a"*10000, "a"*10000]))


Even on massive datasets:

import random

print(group_anagrams(["".join(random.sample("abcde", 5)) for _ in range(100000)]))


Ready to Scale with Python

The brute-force method of grouping anagrams may work for a handful of words, but it quickly becomes a bottleneck in production-scale systems. By switching to a frequency-based key strategy, we not only make our solution faster, but also more scalable and elegant.

After reading this, you should:

* Be able to clearly distinguish between naive and optimized anagram grouping methods.
* Know how to write scalable, clean, and efficient Python code.
* Understand how to tackle related issues in real-world scenarios

Use the optimized solution in your next project or interview and watch how much more confident and capable you feel in tackling string manipulation problems.
 

Become a member
Get the latest news right in your inbox. We never spam!

Read next

Count Distinct Elements in Every Window of Size K

Mastering the Sliding Window: Count Distinct Elements in Every Window of Size K &nbsp; When it comes to <st… Read More

4 hours ago . 4 views

Word Break1

1 day, 4 hours ago . 8 views