TIC
The Interns Company

Find Median from Data Stream

HardAcceptance: 49.8%

Problem Statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values. Design a data structure that supports the following two operations: addNum(num) - Add an integer to the data structure; findMedian() - Return the median of all elements so far.

Constraints:

  • -10^5 <= num <= 10^5
  • There will be at least one element in the data structure before calling findMedian
  • At most 5 * 10^4 calls will be made to addNum and findMedian

Input Format:

  • Commands to either add a number to the data structure or find the median

Output Format:

  • For findMedian() calls, return the median of all elements added so far

Examples:

Example 1:

Input:

MedianFinder medianFinder = new MedianFinder();\nmedianFinder.addNum(1);\nmedianFinder.addNum(2);\nmedianFinder.findMedian(); // return 1.5\nmedianFinder.addNum(3);\nmedianFinder.findMedian(); // return 2.0

Output:

[null, null, null, 1.5, null, 2.0]

Explanation:

First we add 1 to the data structure, then we add 2. The median of [1,2] is 1.5. Then we add 3, making the array [1,2,3]. The median of [1,2,3] is 2.0.

Example 2:

Input:

MedianFinder medianFinder = new MedianFinder();\nmedianFinder.addNum(4);\nmedianFinder.findMedian(); // return 4.0\nmedianFinder.addNum(8);\nmedianFinder.findMedian(); // return 6.0\nmedianFinder.addNum(2);\nmedianFinder.findMedian(); // return 4.0

Output:

[null, null, 4.0, null, 6.0, null, 4.0]

Explanation:

The median of [4] is 4.0. The median of [4,8] is (4+8)/2 = 6.0. The median of [2,4,8] is 4.0.

Solutions

Two Heaps Approach

Time: O(log n) for addNum, O(1) for findMedian
Space: O(n) for storing all numbers

Maintain two heaps: a max heap for the lower half of the numbers and a min heap for the upper half. The median is either the top of the max heap (odd number of elements) or the average of the tops of both heaps (even number of elements).

Ordered Set / Balanced Binary Search Tree Approach

Time: O(log n) for addNum, O(1) for findMedian
Space: O(n) for storing all numbers

Use a balanced binary search tree (BST) or an ordered set to maintain the elements in sorted order. Find the median by accessing the middle element(s) directly.

Insertion Sort Approach

Time: O(n) for addNum, O(1) for findMedian
Space: O(n) for storing all numbers

Maintain a sorted array and insert new elements at the correct position. Find the median by accessing the middle element(s) directly.

Algorithm Walkthrough

Example input: nums = []

Step-by-step execution:

  1. Initialize empty maxHeap and minHeap
  2. Add 1: 1 goes to maxHeap (lower half), maxHeap=[1], minHeap=[]
  3. Add 2: 2 goes to minHeap (upper half), maxHeap=[1], minHeap=[2]
  4. findMedian(): Both heaps have 1 element, median = (1+2)/2 = 1.5
  5. Add 3: 3 goes to minHeap (upper half), maxHeap=[1], minHeap=[2,3]
  6. Balance heaps: Move 2 from minHeap to maxHeap, maxHeap=[1,2], minHeap=[3]
  7. findMedian(): Both heaps have equal elements, median = (2+3)/2 = 2.0

Hints

Hint 1
The naive approach of sorting the array every time you need to find the median is inefficient. Can you maintain a sorted structure?
Hint 2
Consider using heaps/priority queues to keep track of the lower half and upper half of the numbers.
Hint 3
If you use a max heap for the lower half and a min heap for the upper half, you can efficiently find the median.
Hint 4
Make sure both heaps remain balanced, with at most a difference of 1 in size.
Hint 5
If the heaps are balanced, the median is the average of the tops of both heaps. If not, it's the top of the larger heap.

Video Tutorial

Video tutorials can be a great way to understand algorithms visually

Visualization

Visualize the structure of the two heaps and how elements are distributed between them as numbers are added.

Key visualization elements:

  • max heap
  • min heap
  • balance operation
  • median calculation

Implementation Notes

This problem is a classic example of using heaps to maintain a collection of numbers with efficient access to specific statistics. The two-heap approach is elegant and efficient for finding the median in a dynamic set of numbers.