Back to index

Heap data structures in CS

In this post, I am going to tell you about different heap data structures that are common in Computer Science. I will start with the basic heap structure, i.e. the binary heap, and then discuss binomial and Fibonacci heaps. The code for each data structure is in Python. For each heap structure, only the most important methods are given for the sake of brevity - for other methods/functions consult the references given at the end. Note that the term "heap", as used here, has nothing to do with the memory heap that is a part of the runtime environment of languages like Python and C.

Binary heap

A binary heap is a binary tree that satisfies 2 properties -

  1. Heap order property There are 2 kinds of heaps - max-heap and min-heap. For a max-heap, any element (or node) other than the root is at most equal to its parent. Similarly, in a min-heap, a non-root element is at least equal to its parent.

  2. Complete binary tree property - The heap tree is a complete binary tree - in which all levels except the last are necessarily full and the nodes in the last level occupy leftmost positions. In other words, each level i of the tree, has 2^i nodes except the last level which may have fewer nodes (occupying leftmost positions). In an array A representing a heap with n nodes, the nodes are the successive elements from A[0] to A[n-1].

A diagram of a max-heap is shown below. It satisfies the above properties.

Max heap diagram
A max heap

 

The code that builds a heap takes as input an array (list) of numbers. It makes a copy of that list and builds a heap out of it.

class BinaryHeap:
    def __init__(self, input):
        '''Build a binary heap from iterable of keys.'''

        self.heap = list(input)
        self.build_max_heap()

    def build_max_heap(self):
        '''Bottom-up heap construction.'''

        self.last_index = len(self.heap) - 1
        parent_of_last = self.parent(self.last_index)
        for i in range(parent_of_last, -1, -1):
            self.max_heapify(i)

    def parent(self, i):
        '''Return index of parent of element with index i.'''

        return (i-1)//2

    def left(self, i):
        '''Return index of left-child of element with index i.'''

        return 2*i + 1

    def right(self, i):
        '''Return index of right-child of element with index i.'''

        return 2*i + 2

    def max_heapify(self, i):
        '''Downheap bubbling of an element.'''

        l = self.left(i)
        r = self.right(i)

        if l <= self.last_index and self.heap[l] > self.heap[i]:
            largest = l
        else:
            largest = i

        if r <= self.last_index and self.heap[r] > self.heap[largest]:
            largest = r

        if largest != i:
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            self.max_heapify(largest)

We construct the heap with downheap-bubbling on all elements from the parent of the last element upto the root. The elements that we don't consider are already max-heaps of one element each. max_heapify is a simple function that that takes an index and compares the value at that index with the values in the left and right children of the node (with index i). If the largest of the three is not at the root of the subtree, the function recurs with largest.

max_heapify has a running time of O(h), where h is the height of the binary tree. A complete binary tree of n nodes has height floor(log n). So the efficiency of the algorithm is O(log n). Since max_heapify is repeated O(n) times, so the heap is built in time O(nlog n). But this is not a tight bound. It can be shown that the running time of build_max_heap function is linear, i.e. O(n). This method of building the heap is a bottom-up construction.

A binary heap is used in heapsort and to build efficient priority queues.

Binomial heap

A binomial heap is a collection of binomial trees. It is essential to understanding Fibonacci heaps which are discussed later. Binomial heaps and Fibonacci heaps are known as mergeable heaps as they define an efficient union operation. Otherwise, performance-wise they are similar to binary heaps.

A binomial tree \(B_k\) has the following properties:

  1. There are \(2^k\) nodes

  2. The height of the tree is k.

  3. There are exactly \({k \choose i}\) nodes at depth i for i = 0,1,...k where \({k \choose i} = \frac {k!}{i!(k-i)!}\)

  4. The root has the highest degree, k. The k children of the root are subtrees of degree k - 1, k - 2, ..., 1, 0, from left to right.

"Binomial trees" are so named because of property 3 above which is the formula for binomial coefficient. The binomial tree \(B_0\) consists of a single node. The tree \(B_k\) is formed by linking 2 \(B_{k-1}\) trees - the root of one such tree has the other tree as its leftmost child. The diagram below shows the trees \(B_0\), \(B_1\), \(B_2\) and \(B_3\).

Binomial trees diagram
Binomial trees of degree 0, 1, 2 and 3

 

A binomial heap H is a set of binomial trees with the following properties -

  1. Nodes of a tree are ordered - in min-heap ordering, any node has a key that is equal to or greater than that of its parent. (Similarly, there can be max heap ordering.)

  2. The degrees of the roots of the constituent trees are unique.

The following diagram shows a binomial heap with trees \(B_0\), \(B_2\) and \(B_3\) -

Binomial heap diagram
Binomial heap with 3 binomial trees
From: CLRS. See Ref. 1

 

Binomial heaps are represented with the left-child, right-sibling representation. Each node consists of a parent node pointer p, a key field, a degree field, a left-child node pointer child and a right-sibling node pointer sibling. In this way, we avoid having to store pointers to all children in a parent node. The representation is shown below:

Binomial heap representation diagram
Representation of the binomial heap shown above
From: CLRS. See Ref. 1

 

The code shown below is for the union of 2 binomial heaps as it is the operation that sets such heaps apart from binary heaps. The union method uses 2 other methods - link and merge. link links 2 binomial trees with the same root degree. merge merges the root lists of 2 heaps giving a heap with possibly repeated root degrees.

class Node:
    '''A node of a binomial tree in left-child, right-sibling representation.'''
    def __init__(self, key):
        self.p = None
        self.key = key
        self.degree = 0
        self.child = None
        self.sibling = None

class BinomialHeap:
    def __init__(self):
        '''Empty binomial heap.'''
        self.head = None

    @staticmethod
    def link(r1, r2):  # 2 root nodes. r2 becomes the resulting root
        '''Make binomial tree with root r1 the left-child of binomial tree
        with root r2.'''
        r1.p = r2
        r1.sibling = r2.child
        r2.child = r1
        r2.degree += 1

    @staticmethod
    def merge(h1, h2):  # parameters are 2 binomial heaps
        '''Merge the root lists of 2 binomial heaps.'''
        x = h1.head
        y = h2.head
        start = z = Node(0)  # dummy node
        while x is not None and y is not None:
            if x.degree <= y.degree:
                z.sibling = x
                x = x.sibling
            else:
                z.sibling = y
                y = y.sibling
            z = z.sibling
        if x is None:
            z.sibling = y
        if y is None:
            z.sibling = x
        return start.sibling  # The actual start of the merged root list

    @staticmethod
    def union(h1, h2):  # parameters are 2 binomial heaps
        '''Merge 2 binomial heaps.'''
        h = BinomialHeap()
        h.head = merge(h1, h2) # h1 and h2 objects can be deleted afterwards
        if h.head is None:
            return h
        prev_x = None
        x = h.head
        next_x = x.sibling

        while next_x is not None:
            if ((x.degree != next_x.degree) or  # 1
                ((next_x.sibling is not None) and
                 (next_x.sibling.degree == x.degree)))  # 2
                     prev_x = x
                     x = next_x
            elif x.key <= next_x.key:
                x.sibling = next_x.sibling
                BinomialHeap.link(next_x, x)  # 3
            else:
                if prev_x is None:
                    h.head = next_x
                else:
                    prev_x.sibling = next_x
                BinomialHeap.link(x, next_x)  # 4
                x = next_x
            next_x = x.sibling

        return h

The union method works as follows. First an empty binomial heap is created. The 2 input heaps h1 and h2 are merged with the merge method which simply merges 2 linked lists. 3 pointers prev_x, x and next_x are set-up to point to the previous, current and next root nodes. In this algorithm, at most 3 successive root nodes can have the same degree (though right after the merge method, there can be only pairs of successive roots with same degree). The if statements at 1 and 2 check if 2 consecutive nodes are of different degrees or if 3 consecutive nodes have the same degree. In these 2 cases, the pointers are simply moved ahead by a step. If the conditions 1 and 2 are false, then either of 2 cases arise. If next_x has a greater key, then the tree at next_x is made a child of the tree at x by the link method at 3. But if x has a greater key, then it is made a child of the next_x tree by the link method at 4. Before 4, there is an if to adjust the pointers taking into account the starting scenario.

The running time of the union method is O(log n), based on the fact that the number of roots in a heap of size n nodes is at most \(\lfloor log n\rfloor + 1\). This is an improvement over the O(n) time of merging 2 binary heaps.

Fibonacci heap

Finally, we consider Fibonacci heaps. A Fibonacci heap is a collection of ordered (either min-heap or max-heap) trees. It is loosely based on the binomial heap. It is used in fast algorithms for finding minimum spanning trees and for the single-source shortest path problem. However because of the complexity of programming of such heaps, it does not find wide usage. The term "Fibonacci" in the name comes from the analysis of the running time, as will be explained later.

When only mergeable heap operations are involved (namely, make-heap, insert, minimum, extract-min and union), the trees in a Fibonacci heap are unordered binomial trees. An unordered binomial tree is a binomial tree whose root node of degree k has child nodes of degrees k-1, k-2, ... 1, 0 in no specific order. When decrease-key or delete operations are involved, we can get trees that violate the unordered binomial tree properties. Such a heap is shown in the diagram below.

Performance-wise Fibonacci heaps are an improvement over binomial heaps. The amortized cost of most of the operations, including union, is O(1). (Loosely speaking, amortization is a technique to compute the average cost spread over all the operations.) The reason behind this is that in all operations except extract-min, the "consolidation" of the heap is postponed until later. It is performed during the extract-min operation. As this operation is the most important one for this heap, we will examine the code for it.

The diagram below shows an example of a Fibonacci heap.

Fibonacci heap diagram
A Fibonacci heap
From: CLRS. See Ref. 1

 

A Fibonacci heap is represented in code with the following structure. A node within the heap contains a number of pointers apart from the key key, the degree degree and a boolean field mark described below. p is the parent pointer, child is the pointer to any one of the node's children. The children of a node are in a circular doubly-linked list with left and right being pointers to the siblings of a node.

mark is set to True when a node loses a child node after it is made a child of another node. It is set to False for newly created nodes and nodes that are made a child of another. The Fibonacci heap shown above has 3 marked nodes. In this article, we will not consider operations which remove child nodes, so the mark field will always be False.

The heap object will have 2 fields. A min field will point to the node in the root list with the minimum key and n will be the number of nodes currently in the heap. The code to perform the extract-min operation is shown below.

from math import floor, log

class FibonacciHeap:
    def __init__(self):
        '''Empty Fibonacci heap.'''

        self.min = None
        self.n = 0

    def extract_min(self):
        '''Remove and return the node with the minimum key from the heap.'''

        z = self.min
        if z is not None:
            x = z.child
            if x is not None:
                while x.right is not x:
                    x.right.p = None
                    x = x.right
                x.p = None
                self.concatenate_lists(x)
            self.delete_from_root_list(z)
            if z is z.right:
                self.min = None
            else:
                self.min = z.right
                self.consolidate()
            self.n -= 1
        return z

    def concatenate_lists(self, x):
        '''Concatenate the child list of the min node with the root list.'''

        x.left.right = self.min
        y = x.left
        x.left = self.min.left
        self.min.left.right = x
        self.min.left = y

    def delete_from_root_list(self, node):
        '''Delete node from the heap. The node will retain its pointers.'''

        node.left.right = node.right
        node.right.left = node.left

    def consolidate(self):
        '''Consolidate the heap by successively linking nodes of the same
        degree in the root list.'''

        a = [None] * (floor(log(self.n, 1.618))+1)  # 1.618 is the golden ratio
        w = self.min
        while w not in a:
            x = w
            d = x.degree
            while a[d] is not None:
                y = a[d]
                if x.key > y.key:
                    x, y = y, x
                self.link(y, x)
                a[d] = None
                d += 1
            a[d] = x
            w = x.right
        self.min = None
        z = None
        for i in range(len(a)):
            if a[i] is not None:
                z = self.add_to_list(z, a[i])
                if self.min is None or a[i].key < self.min.key:
                    self.min = a[i]

    @staticmethod
    def add_to_list(z, node):
        '''Add node to the beginning of the list pointed to by z.'''

        if z is None:
            node.left = node.right = node
            return node
        node.right = z
        n = z.left
        z.left = node
        n.right = node
        node.left = n
        return node

    def link(self, y, x):
        '''Make root node y a child of root node x.'''

        self.delete_from_root_list(y)
        x.child = self.add_to_list(x.child, y)
        y.p = x
        y.mark = False

The extract_min operation removes and returns the min node. It adds the child nodes of the min node to the root list. This is what concatenate_lists method does. It is a simple procedure to concatenate 2 circular doubly-linked lists. If the min node is the only node in the heap, then set min to None and end the extract_min method by returning the extracted node. If not, then set min to a temporary node value and consolidate the heap. Decrease the count of nodes by 1 and return the extracted node.

The consolidate method starts off by setting up an auxiliary list of size equal to the maximum degree possible of any node in the heap. The formula uses the golden ratio as the logarithm base (1.618). Then for each node in the root list, it checks the list a, with its degree as the index, if there is a node listed. If found, it links the 2 nodes x and y according to the keys in the link method. Then the a list entry is made None and higher indexes are checked. On exit from the inner while loop, the list index on exit is set to the consolidated root node. In this way, list a is built up. After the outer while loop, min is set to None and a new root list is built up from the nodes in a. min is set to the node with the least key.

If k is the degree of any node in a heap of size n, then it can be shown that \(k <= \lfloor log_\phi n \rfloor\) where \(\phi\) is the golden ratio i.e. \(\phi = (1 + \sqrt {5})/2\). The \(n^{th}\) Fibonacci number is the integer closest to \(\phi^n\), and hence the name of this heap structure.


Now you have a good idea about binary, binomial and Fibonacci heaps. Binary heaps are used more often than binomial or Fibonacci heaps. There are other advanced heap structures that I haven't gone into, like the relaxed heap. It offers better performance than the Fibonacci heap. You can read about it here.

References

  1. Introduction to algorithms - second edition. By Cormen, Leiserson, Rivest and Stein

  2. Data structures and algorithms in Python - Wiley student edition. By Goodrich, Tamassia and Goldwasser

Back to index