View on NeetCode
View on LeetCode

Problem

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far.

Example 1:

Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output:
[null, null, null, 1.5, null, 2.0]

Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints:

  • -10^5 <= num <= 10^5
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 10^4 calls will be made to addNum and findMedian.

Follow up:

  • If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
  • If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?

Solution

Approach: Two Heaps

The key insight is to use two heaps: a max heap for the smaller half and a min heap for the larger half. The median is always at the top of one or both heaps.

Implementation

import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # Max heap (negate values)
        self.large = []  # Min heap

    def addNum(self, num: int) -> None:
        # Add to small heap first
        heapq.heappush(self.small, -num)

        # Ensure all elements in small <= all elements in large
        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)

        # Balance sizes (small can have at most 1 more element)
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        if len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2

Complexity Analysis

  • Time Complexity:
    • addNum: O(log n), for heap operations
    • findMedian: O(1), just accessing heap tops
  • Space Complexity: O(n) for storing all elements in heaps.

Key Insights

  1. Two Heap Strategy: Max heap stores smaller half, min heap stores larger half. Median is at the “meeting point” of these heaps.

  2. Size Balance: We maintain |size(small) - size(large)| <= 1. This ensures we can find median from heap tops.

  3. Order Property: All elements in small heap <= all elements in large heap. This is crucial for correctness.

  4. Median Cases:
    • Odd total: median is top of larger heap (small in our case)
    • Even total: median is average of both heap tops
  5. Follow-up Optimizations:
    • Range [0,100]: Use counting sort array of size 101
    • 99% in [0,100]: Use counting array + heap for outliers