Detecting Duplicates in an Array
Detecting duplicates is key and finds application in:~
1) Inventory items
2) Any dataset
3) Validating a list of user IDS
This article will break down a Python solution to detect duplicates, explain its time and space complexity, and optimise it further for clarity and performance.
If you're working on algorithms or preparing for coding interviews, this explanation will give you a solid understanding of the logic behind duplicate detection.
Understanding the Initial Code
Below is the initial implementation of a function to check for duplicates in an array:
class Solution:
def hasDuplicate(self, nums: List[int]) -> bool:
dp = {}
for i in nums:
if i not in dp:
dp[i] = i
else:
return True
return False
Code Explanation
Initialize a Dictionary (dp)
The dp dictionary is used to keep track of the elements we've seen in the array.
Iterate Through the Array
For each element in nums, check if it already exists in dp.
Check for Duplicates
If the element is not in dp, add it as a key.
If the element already exists in dp, return True immediately, as a duplicate is found.
Return False if No Duplicates Are Found
If the loop completes without finding any duplicates, return False.
Time and Space Complexity of the Initial Code
Time Complexity
Dictionary Lookup (if i not in dp): Checking the existence of a key in a dictionary is O(1) on average. However, it can degrade to O(n) in rare cases due to hash collisions.
Iterating Through the Array: The loop runs for n elements where n is the length of nums.
Total Time Complexity:
On average: O(n) times O(1)=O(n) on average,
In rare cases: O(n^2) due to hash collisions.
Space Complexity
Dictionary Storage: The dictionary dp can store at most n keys -> A single one for each array's unique number.
This leads to a linear complexity O(n).
Optimized Solution
Let's eliminate redundant operations and streamline the logic.
class Solution:
def hasDuplicate(self, nums: List[int]) -> bool:
"""
Checks if the given list contains any duplicates.
Args:
nums (List[int]): A list of integers.
Returns:
bool: True if duplicates exist, False otherwise.
"""
seen = set() # Use a set to track seen numbers.
for num in nums:
if num in seen:
return True # Duplicate found.
seen.add(num) # Add the number to the set.
return False # No duplicates found.
Breaking Down the Optimized Code
1. Using a set for Efficiency
A set stores unique values.
In comparison to a dict, checking for membership in a set is faster (O(1).
2. Iterate and Check Membership
While iterating over the list:
- If a number is already in the set, it's a duplicate, and we return True.
- If it does not exist ->add the number to the set for future checks.
3. Return False at the End
If no duplicates are found during the iteration, the function returns False.
Key Advantages of the Optimised Solution
Improved Readability: Using a set makes the logic clearer and reduces unnecessary complexity.
Better Performance: Sets are more efficient for membership checks than dictionaries, especially for larger lists.
Fewer Lines of Code: The optimized version achieves the same functionality with fewer lines, making it easier to maintain.
Step-by-Step Example
Let's walk through an example:
# Input:
nums = [1, 2, 3, 4, 5, 2]
Process:
Start with an empty set: seen = set()
Check each number in the list:
1: Not in seen, add it (seen = {1})
2: Not in seen, add it (seen = {1, 2})
3: Not in seen, add it (seen = {1, 2, 3})
4: Not in seen, add it (seen = {1, 2, 3, 4})
5: Not in seen, add it (seen = {1, 2, 3, 4, 5})
2: Already in seen. Return True.
# Output:True
Why does this optimisation work?
Time Complexity: O(n) on average
- Set Lookup and Insertion: Checking membership (num in seen) and adding an element (seen.add(num)) are both O(1) on average.
- Loop Through the Array: The loop runs for n elements.
Space Complexity: O(n)
- The seen set stores at most n unique elements.
Key Differences Between the Initial and Optimised Code
Data Structure:
- The optimised solution uses a set instead of a dictionary.
- A set is more appropriate for checking membership and does not require storing values
- -> save space &
- -> improving readability.
Simplified Logic:
- Cleaner
- Concise code &
- No if-else statements
Applications
It is useful in large dataset processing -> checking for duplicates efficiently.
These include:~
a) Data validation
b) Managing unique identifiers in databases
c) Detecting anomalies
Conclusion
Try implementing these solutions with your datasets to detect duplicates.
Whether you're validating user input or ensuring data accuracy, this solution is versatile and efficient.
Try applying this approach to solve other list-related challenges.
Experiment with larger datasets to see how the solution performs, and explore variations like finding all duplicates instead of just detecting one.
For more coding tips and optimised solutions, stay tuned and keep refining your skills.
If you're preparing for technical interviews, practice writing clean, efficient code like the optimised example above!