<Data Structures/Algorithms> A Complete Guide to Understanding Heaps 我一定要学会【堆】-第k大

<Data Structures/Algorithms> A Complete Guide to Understanding Heaps 我一定要学会【堆】-第k大

A Complete Guide to Understanding Heaps

Every time I write an algorithm involving heaps, I need to review the basics again. Today I am determined to fully understand it.

What is a Heap

A heap is a data structure in which elements are organized logically as a complete binary tree, but the underlying storage is implemented using an array. In other words, a tree structure is represented with an array. There are two types of heaps:

  • Max-Heap: Every node is greater than or equal to its two children.
  • Min-Heap: The opposite of a max-heap.

Here we will use the max-heap as an example.

The order of elements in the array corresponds to a level-order traversal of the heap. Since a heap is a complete binary tree, the relationship between a node and its children in the array can be determined as follows:

  • Parent index: k
  • Left child index: 2k + 1
  • Right child index: 2k + 2

Heap Adjustment

Due to the special structure of heaps, we must first understand the adjustment strategies. Heap adjustment transforms a non-heap structure into a valid heap. There are two types of adjustment: upward adjustment and downward adjustment.

Upward Adjustment

Also called “swim” or “bubble up.” It happens when a new element is inserted at the bottom of the heap (the end of the array). The new element needs to be moved to its correct position. This is done by comparing it with its parent and swapping if it is larger. The process repeats until the parent is larger, and the element stays in place.

1
2
3
4
5
6
7
8
9
10
void adjustUp(vector<int>& heap, int child){
int parent = (child - 1) / 2;
while(parent >= 0 && heap[child] > heap[parent]){
int tmp = heap[parent];
heap[parent] = heap[child];
heap[child] = tmp;
child = parent;
parent = (child - 1) / 2;
}
}

Downward Adjustment

Also called “sink.” It happens when the root element changes. The root is compared with its two children. If the parent is not the largest, the largest child replaces it as the new parent. The process continues until the parent is greater than both children.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void adjustDown(vector<int>& heap, int parent){
int lchild = 2 * parent + 1;
int rchild = lchild + 1;
while(lchild < heap.size()){
int largest = lchild;

if(rchild >= heap.size() || heap[lchild] > heap[rchild])
largest = rchild;

if(heap[parent] > heap[largest])
break;

int tmp = heap[parent];
heap[parent] = heap[largest];
heap[largest] = tmp;

parent = largest;
lchild = 2 * parent + 1;
rchild = 2 * parent + 2;
}
}

Inserting Elements

With both adjustment strategies, inserting an element becomes simple: place it at the end of the heap and perform upward adjustment.

1
2
3
4
void heapInsert(vector<int>& heap, int val){
heap.push_back(val);
adjustUp(heap, heap.size() - 1);
}

Building a Heap

Isn’t building a heap just inserting all elements one by one?

1
2
3
4
5
void buildHeap(vector<int>& heap, vector<int>& nums){
for(int i = 0; i < nums.size(); i++){
heapInsert(heap, nums[i]);
}
}

However, this approach performs many unnecessary operations. In fact, we only need to adjust downward starting from the last non-leaf node up to the root. The number of adjustments is logarithmic (O(log n)).

1
2
3
4
5
void buildHeap(vector<int>& nums){
for(int i = (nums.size() - 1) / 2; i >= 0; i--){
adjustDown(nums, i);
}
}

This is amazing—it builds the heap directly in place. Starting from the second-to-last level, each subtree is adjusted to satisfy the max-heap property, then moving upward, ensuring the whole heap eventually satisfies the max-heap property.

Deleting Elements

Deleting any element works the same way: swap it with the last element, shrink the heap size, and that element is effectively removed. The swapped element (originally at the end) now sits in the heap but may violate the heap property. Depending on its relation to its parent and children, it needs either upward or downward adjustment.

In practice, we usually delete the root (maximum element). Since the new root is likely smaller than its children, we perform downward adjustment.

Note: sometimes we only need to conceptually remove the root and place it at the end of the array without actually deleting it. In this case, we can use an index to represent the effective end of the heap.

1
2
3
4
5
6
void heapPop(vector<int>& heap, int end){
int tmp = heap[end];
heap[end] = heap[0];
heap[0] = tmp;
adjustDown(heap, 0);
}

Heap Application – k-th Largest Element

Heaps are often used for finding the k-th largest element. The basic idea is to repeatedly remove the maximum. After k removals, we obtain the k-th largest.

1
2
3
4
5
6
7
8
9
int findKth(vector<int>& nums, int k){
buildHeap(nums);
int end = nums.size() - 1;
for(int i = 0; i < k; i++){
heapPop(nums, end);
end--;
}
return end + 1;
}

However, this is somewhat inefficient since it requires building the heap and multiple adjustments. A better approach is to maintain a min-heap of size k. The root of this heap is the k-th largest element. For each new number, compare it with the root:

  • If it is smaller, ignore it.
  • If it is larger, replace the root and adjust downward.

After processing all numbers, the heap root will be the k-th largest.

1
2
3
4
5
6
7
8
9
10
11
12
13
int findKth(vector<int>& nums, int k){
vector<int> heap;
for(int i = 0; i < k; i++){
heapInsert(heap, nums[k]); // Note: heapInsert here should be written for a min-heap
}
for(int i = k; i < nums.size(); i++){
if(nums[i] > heap[0]){
heap[0] = nums[i]; // Replace root
adjustDown(heap, 0); // Adjust downward
}
}
return heap[0];
}
中文原文 每次写关于堆的算法都要重新复习一遍,今天一定要搞清楚。 ### 什么是堆 堆是一种数据结构,其中的数据以完全二叉树的逻辑组织,但是存储数据的结构使用的是数组。也就是说,要用数组表示一个树的结构。堆分为两种:
  • 大根堆:树中任何一个节点大于或等于他的两个孩子节点。
  • 小根堆:与大根堆相反。

我们这里以大根堆为例。

堆在数组中存放的顺序是层序,也就是说先存第一层,再存第二层,……由于堆是完全二叉树,所以在数组中存放时,每个节点与其左右孩子节点的位置关系都是可以确定的。

  • 父节点位置:k
  • 左孩子位置:2k+1
  • 右孩子位置:2k+2

堆的调整

由于堆的结构特殊性,所以先要了解堆的调整策略。堆的调整是将不符合堆的结构调整成符合堆的数据结构,分为向上调整和向下调整。

向上调整

向上调整又叫上浮,发生在堆底(也就是末尾)插入新元素的时候,这时要把插入的元素调整到正确的位置,具体做法是确定父节点的位置,与之比较,如果比父节点大就和父节点交换,直到比父节点小,被管住了,就老实呆在那里了。代码如下

1
2
3
4
5
6
7
8
9
10
void adjustUp(vector<int>& heap, int child){
int parent = (child - 1) / 2;
while(parent >= 0 && heap[child] > heap[parent]){
int tmp = heap[parent];
heap[parent] = heap[child];
heap[child] = tmp;
child = parent;
parent = (child - 1) / 2;
}
}

向下调整

向下调整又叫下沉,发生在堆顶元素改变时。此时需要堆顶元素与两个孩子比较,假如父节点元素不是最大的,说明父节点管不住孩子了,需要将其中最大的孩子扶持为新的堆顶,如此循环直到父节点比他的两个孩子都大。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void adjustDown(vector<int>& heap, int parent){
int lchild = 2 * parent + 1;
int rchild = lchild + 1;
while(lchild < heap.size()){
int largest = lchild;

if(rchild >= heap.size() || heap[lchild] > heap[rchild])
largest = rchild;

if(heap[parent] > heap[largest])
break;

int tmp = heap[parent];
heap[parent] = heap[largest];
heap[largest] = tmp;

parent = largest;
lchild = 2 * parent + 1;
rchild = 2 * parent + 2;
}
}

插入元素

现在有了两种调整堆的方法,很容易就能插入元素。把元素放到堆底并向上调整就行。

1
2
3
4
void heapInsert(vector<int>& heap, int val){
heap.push_back(val);
adjustUp(heap, heap.size() - 1);
}

堆的创建

那创建堆不就是一个一个元素往堆里面插入吗?

1
2
3
4
5
void buildHeap(vector<int>& heap, vector<int>& nums){
for(int i = 0; i < nums.size(); i++){
heapInsert(heap, nums[i]);
}
}

但是这样其实多做了很多无意义的操作,事实上,只要从孩子一步一步往父节点调整就够了,调整次数为logn,即:

1
2
3
4
5
void buildHeap(vector<int>& nums){
for(int i = (nums.size() - 1) / 2; i >= 0; i--){
adjustDown(nums, i);
}
}

太神奇了,这甚至是在原数组上就进行了,相当于对于倒数第二层的节点,首先调整以他们为堆顶的子堆,满足大根堆的性质(使用向下调整),然后调整他们父母为顶的子堆,满足大根堆的性质,直到调整到整个堆的顶部,就能保证整个堆满足了大根堆的性质。

删除元素

删除任何一个元素都是一样的,只需要把这个元素放到最后,然后把堆的大小一缩,诶,那个元素就被挡在外面了。那么原本末尾的元素呢,就放到删掉的元素的原本的位置,也就是位置互换了。互换之后末尾的元素还在堆内,但是不满足堆的性质了,所以需要根据和父节点、两个子节点的大小关系判断该选择上浮还是下沉。但是一般我们常用的就是删除堆顶,那堆顶都删没了,新进来的元素肯定没有第二层的大呀,所以需要下沉调整。

这里注意,有的时候我们只是需要形式上把堆顶拿出来,放在后面,并不需要真的把它从这个数组中删掉,只是不在堆中了。这时我们可以使用一个元素来代表堆的末尾指针。

1
2
3
4
5
6
void heapPop(vector<int>& heap, int end){
int tmp = heap[end];
heap[end] = heap[0];
heap[0] = tmp;
adjustDown(heap, 0);
}

堆的应用-第k大元素

堆在找第k大这种问题上应用比较多。具体就是使用我们的删除操作,每轮删掉一个最大的,这样删k轮就找到第k大了。

1
2
3
4
5
6
7
8
9
int findKth(vector<int>& nums, int k){
buildHeap(nums);
int end = nums.size() - 1;
for(int i = 0; i < k; i++){
heapPop(nums, end);
end--;
}
return end + 1;
}

但是其实这样是绕了弯路的,因为需要首先建堆,然后一次一次地调整。如果我们换一种思路呢?我们可以用大根堆维护所有的元素,那我们也可以用一个小根堆维护前k大的元素,这个时候小根堆的堆顶就是第k大的元素,每来一个数就和堆顶比较一下,如果比堆顶小,那么说明数组中已经发现了k个比堆顶元素大的数,那么堆顶显然就不可能是前k大的元素之一了,应该被踢掉。如此遍历整个数组,就能得到整个数组前k大的数构成的小根堆,堆顶就是第k大的元素。

1
2
3
4
5
6
7
8
9
10
11
12
13
int findKth(vector<int>& nums, int k){
vector<int> heap;
for(int i = 0; i < k; i++){
heapInsert(heap, nums[k]); //注意这里的heapInsert应该根据小根堆来写
}
for(int i = k; i < nums.size(); i++){
if(nums[i] > heap[0]){
heap[0] = nums[i]; //替换掉堆顶
adjustDown(heap, 0); //向下调整
}
}
return heap[0];
}

<Data Structures/Algorithms> A Complete Guide to Understanding Heaps 我一定要学会【堆】-第k大

http://example.com/2024/06/19/我一定要学会【堆】/

Author

John Doe

Posted on

2024-06-19

Updated on

2025-09-06

Licensed under

You need to set install_url to use ShareThis. Please set it in _config.yml.
You forgot to set the business or currency_code for Paypal. Please set it in _config.yml.

Comments

You forgot to set the shortname for Disqus. Please set it in _config.yml.