70. Find Median from Data Stream
View on NeetCode
View on LeetCode
Problem
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
Implement the MedianFinder class:
MedianFinder()initializes theMedianFinderobject.void addNum(int num)adds the integernumfrom the data stream to the data structure.double findMedian()returns the median of all elements so far.
Example 1:
Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output:
[null, null, null, 1.5, null, 2.0]
Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
-10^5 <= num <= 10^5- There will be at least one element in the data structure before calling
findMedian. - At most
5 * 10^4calls will be made toaddNumandfindMedian.
Follow up:
- If all integer numbers from the stream are in the range
[0, 100], how would you optimize your solution? - If
99%of all integer numbers from the stream are in the range[0, 100], how would you optimize your solution?
Solution
Approach: Two Heaps
The key insight is to use two heaps: a max heap for the smaller half and a min heap for the larger half. The median is always at the top of one or both heaps.
Implementation
import heapq
class MedianFinder:
def __init__(self):
self.small = [] # Max heap (negate values)
self.large = [] # Min heap
def addNum(self, num: int) -> None:
# Add to small heap first
heapq.heappush(self.small, -num)
# Ensure all elements in small <= all elements in large
if self.small and self.large and (-self.small[0] > self.large[0]):
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
# Balance sizes (small can have at most 1 more element)
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
if len(self.large) > len(self.small):
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self) -> float:
if len(self.small) > len(self.large):
return -self.small[0]
return (-self.small[0] + self.large[0]) / 2
Complexity Analysis
- Time Complexity:
addNum: O(log n), for heap operationsfindMedian: O(1), just accessing heap tops
- Space Complexity: O(n) for storing all elements in heaps.
Key Insights
-
Two Heap Strategy: Max heap stores smaller half, min heap stores larger half. Median is at the “meeting point” of these heaps.
-
Size Balance: We maintain
|size(small) - size(large)| <= 1. This ensures we can find median from heap tops. -
Order Property: All elements in small heap <= all elements in large heap. This is crucial for correctness.
- Median Cases:
- Odd total: median is top of larger heap (small in our case)
- Even total: median is average of both heap tops
- Follow-up Optimizations:
- Range [0,100]: Use counting sort array of size 101
- 99% in [0,100]: Use counting array + heap for outliers