42. Find the Duplicate Number
View on NeetCode
View on LeetCode
Problem
Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.
There is only one repeated number in nums, return this repeated number.
You must solve the problem without modifying the array nums and uses only constant extra space.
Example 1:
Input: nums = [1,3,4,2,2]
Output: 2
Example 2:
Input: nums = [3,1,3,4,2]
Output: 3
Constraints:
1 <= n <= 10^5nums.length == n + 11 <= nums[i] <= n- All the integers in
numsappear only once except for precisely one integer which appears two or more times.
Follow up:
- How can we prove that at least one duplicate number must exist in
nums? - Can you solve the problem in linear runtime complexity?
Solution
Approach: Floyd’s Cycle Detection
The key insight is to treat the array as a linked list where nums[i] points to index nums[i]. Since there’s a duplicate, there must be a cycle. We use Floyd’s algorithm to find the entrance to the cycle, which is the duplicate number.
Algorithm:
- Phase 1: Find intersection point in cycle (slow and fast pointers)
- Phase 2: Find entrance to cycle (start from head and intersection)
- The entrance is the duplicate number
Implementation
class Solution:
def findDuplicate(self, nums: List[int]) -> int:
# Phase 1: Find intersection point in the cycle
slow = fast = nums[0]
while True:
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast:
break
# Phase 2: Find entrance to the cycle
slow = nums[0]
while slow != fast:
slow = nums[slow]
fast = nums[fast]
return slow
Alternative (Binary Search on Count):
class Solution:
def findDuplicate(self, nums: List[int]) -> int:
left, right = 1, len(nums) - 1
while left < right:
mid = (left + right) // 2
# Count numbers <= mid
count = sum(num <= mid for num in nums)
# If count > mid, duplicate is in [left, mid]
if count > mid:
right = mid
else:
left = mid + 1
return left
Complexity Analysis
Floyd’s Algorithm:
- Time Complexity: O(n), where n is the length of the array. Both phases take linear time.
- Space Complexity: O(1), we only use two pointers.
Binary Search:
- Time Complexity: O(n log n), we do O(log n) iterations and count elements in O(n) each iteration.
- Space Complexity: O(1).
Key Insights
-
Array as Linked List: We can treat the array as an implicit linked list: position i points to position nums[i]. Since values are in range [1, n], index 0 is never pointed to (it’s the starting point).
-
Cycle Exists: With n+1 numbers in range [1, n], by pigeonhole principle, at least one number repeats. This creates a cycle in our implicit linked list.
-
Why Floyd’s Works: The duplicate number is the entrance to the cycle. Numbers before the duplicate form the “tail,” and all numbers that can reach the duplicate form the cycle.
-
Two-Phase Process: Phase 1 finds any point in the cycle. Phase 2 uses the mathematical property that the distance from start to cycle entrance equals the distance from intersection to cycle entrance.
-
Binary Search Intuition: For a number x, if there are more than x numbers ≤ x, then the duplicate must be ≤ x (pigeonhole principle).