Data Structures and Algorithms

Find the Duplicate Number in a List: From Simple to Smart

2 days, 10 hours ago ; F(visit_count) + Value(1) views
Share this

Find the Duplicate Number in a List: From Simple to Smart


As a seasoned software engineer with hands-on experience in building scalable systems and optimizing performance-critical code, I often work on algorithm optimization and space/time tradeoffs.

In this article, we tackle a common algorithm problem: how to find a duplicate number in a list.

It is a frequent question in technical interviews and high-performance applications. We'll start with the most straightforward solution and work toward the most efficient one, explaining each step clearly.

The Problem

You are given a list of numbers. Some numbers might be repeated. Your job is to find any one of the duplicate numbers.

Let's look at two types of approaches:

  • The simple approach uses extra memory.
  • The optimized approach uses constant memory.
  • Simple but Effective: Using a Set to Track Seen Numbers

 

from typing import List

def find_duplicate_number(nums: List[int]) -> int:
    """
    Find a duplicate number using a set to track previously seen values.

    Args:
        nums (List[int]): List of integers.

    Returns:
        int: A duplicate number found in the list.
    """
    seen = set()
    for num in nums:
        if num in seen:
            return num
        seen.add(num)

print("Expected: 2, Actual:", find_duplicate_number([1,3,4,2,2] ))
print("Expected: 3, Actual:", find_duplicate_number([3,1,3,4,2]))
print("Expected: 9999, Actual:", find_duplicate_number([i for i in range(1, 10**5)] + [9999]))
print("Expected: 10**6, Actual:", find_duplicate_number([10**6] + list(range(1,10**6))))
print("Expected: 1, Actual:", find_duplicate_number([1]*10**6 ))


How it Works

  • We create an empty set called seen.
  • We check every number in the list if it's already in seen.
  • If it is, we return it immediately as a duplicate.
  • Otherwise, we add the number to the set.

Performance

  • Time complexity: O(n)
  • Space complexity: O(n)

This solution is simple and works for any input. However, it uses extra memory.

More innovative Approach: Floyd's Tortoise and Hare (Cycle Detection)

We can do better if the input list follows a specific rule — that the numbers are from 1 to n and only one duplicate exists. We can use a famous algorithm: Floyd's Tortoise and Hare.

Assumptions:

  • The list has n + 1 integers.
  • Each integer is between 1 and n.
  • Exactly one duplicate exists.

 


from typing import List

def find_duplicate_number(nums: List[int]) -> int:
    """
    Find the duplicate number using Floyd's Tortoise and Hare cycle detection algorithm.
    
    This algorithm assumes that the list has exactly one duplicate and numbers range from 1 to n.

    Args:
        nums (List[int]): List of integers.

    Returns:
        int: The duplicate number.
    """
    # Phase 1: Detect the cycle (intersection point)
    slow = fast = nums[0]
    while True:
        slow = nums[slow]  # move one step
        fast = nums[nums[fast]]  # move two steps
        if slow == fast:
            break

    # Phase 2: Find entrance to the cycle (duplicate)
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]

    return slow

print("Expected: 2, Actual:", find_duplicate_number([1,3,4,2,2] ))
print("Expected: 3, Actual:", find_duplicate_number([3,1,3,4,2]))
print("Expected: 9999, Actual:", find_duplicate_number([i for i in range(1, 10**5)] + [9999]))
print("Expected: 1, Actual:", find_duplicate_number([1]*10**6 ))


How it Works

Imagine the numbers as links in a linked list.

A duplicate creates a cycle.

We use two pointers moving at different speeds to find the start of the cycle.

Performance

  • Time complexity: O(n)
  • Space complexity: O(1) — no extra memory!

Visual Aid

Imagine a race track where one racer runs at 1x speed and another at 2x speed. They will eventually meet inside a cycle; from there, we can trace back to the duplicate.

Which Should You Use?

Approach                                         Time                                                           Space                                Works For

Set-Based                                         O(n)                                                              O(n)                                  Any list with possible multiple dups    
Floyd’s Cycle Detection                  O(n)                                                              O(1)                                  One duplicate, values from 1 to n only    

If you don't have constraints on value ranges and possible duplicates, stick to the set-based approach. If memory is tight and you meet the conditions, use Floyd's algorithm.

When Performance Matters: Apply the Right Tool

Choosing the right solution depends on your context. Floyd's algorithm is your friend if you're working on data-heavy applications where space matters (like embedded systems or mobile apps). 

The set-based approach is safer and easier to implement for general-purpose and simpler scenarios.

If you're preparing for coding interviews or optimizing real-world systems, practice both methods and understand when to use each. Save this reference and revisit whenever you need a refresh.

Become a member
Get the latest news right in your inbox. We never spam!

Read next

How to Calculate Product of Array Except Self

How to Calculate Product of Array Except Self — From Basic to Blazing Fast As a softw… Read More

2 days, 9 hours ago . 150 views