CSCI H343 Data Structures Fall 2024

Review for Final Exam

Topics for Final Exam

Binary Search Trees
AVL Trees
Hash tables
Heaps, Priority Queues
Binomial Queues
Graphs
- Adjacency List, Adjacency Matrix
- BFS, DFS
- Shortest Paths: Dijkstra’s
- Connected Components
- Minimum Spanning Trees: Kruskal’s
Union-find (aka. Disjoint Sets)
Dynamic Programming
- Rod cutting
- DNA Sequence Alignment
- 0-1 Knapsack

Binary Search Trees

Can implement the Set and Map interfaces.

The Binary-Search-Tree Property: For every node x in the tree,

if node y is in the left subtree of x, then y.data < x.data (or y.data <= x.data for MultiSet, that is, allow duplicates), and
if node z is in the right subtree of x, then x.data < z.data.

protected Node<K> search(K key, Node<K> curr) {
	if (curr == null)
		return null;
	else if (lessThan.test(key, curr.data))
		return search(key, curr.left);
	else if (lessThan.test(curr.data, key))
		return search(key, curr.right);
	else
		return curr;
}

`remove` method of `BinarySearchTree`

Search for the node x with the key to remove. Then consider the following cases.

Case 1: no left child

          |              |
          x              A
           \       ==>
            A

Case 2: no right child

            |            |
            x            A
           /       ==>
          A

Case 3: two children

Replace x with the node after x wrt. in-order traversal, which is the first node y of the subtree B. We then remove y from B.

AVL Trees

Goal: maintain balance in a binary search tree.

Definition The AVL Invariant is that the height of two child subtrees may only differ by 1, for every node in the tree.

After inserting a node, move up the tree rebalancing as needed using tree rotations.

Tree Rotation

                y                         x
               / \    right_rotate(y)    / \
              x   C  --------------->   A   y
             / \     <-------------        / \
            A   B     left_rotate(x)      B   C

Rebalancing Algorithm

If node x is the lowest node that is not AVL, do the following to rebalance it.

1. if height(x.right.left) ≤ height(x.right.right)

    let k = height(x.right.right)

                x k+2                                y ≤k+2
               / \          left_rotate(x)          / \
           ≤k A   y k+1     ===============>  ≤k+1 x   C k
                 / \                              / \
             ≤k B   C k                       ≤k A   B ≤k

2. if height(x.right.left) > height(x.right.right)

    let k = height(x.right.left)

          k+2 x                               y k+1
             / \                            /   \
        k-1 A   z k+1    R(z), L(x)      k x     z k
               / \      =============>    / \   / \
            k y   D k-1              k-1 A   B C   D k-1
             / \
            B   C <k

if height(x.left) > height(x.right)

if height(x.left.left) < height(x.left.right) (note: strictly less!)

let k = height(x.left.right)

         x k+2                                 z k+1
        / \                                  /   \
   k+1 y   D k-1      L(y), R(x)          k y     x k
      / \            =============>        / \   / \
 k-1 A   z k                              A   B C   D    <k
        / \
       B   C <k

if height(x.left.left) ≥ height(x.left.right) (note: greater-equal!)

let k = height(l.left.left)

       x k+2                                  y k+1
      / \         right_rotate(x)            / \
 k+1 y   C k-1    ===============>        k A   x k+1
    / \                                        / \
 k A   B ≤k                                ≤k B   C k-1

Hash tables

Dictionary/Map ADT: get, put, remove, containsKey
Direct-address tables
Prehashing: e.g. mapping strings to integers
Hash functions
- division method (modulo)
  
  h(k) = k mod m
  
  where m is the table size
- multiply-add-and-divide method
```
  h(k) = ((a * k + b) mod p) mod m
```
  where p is a prime number larger than m and a,b are randomly chosen integers between 1 and p-1.
Collision handling by separate chaining

Heaps

           ___16___
          /        \
         14          10
        /  \        /  \
       /    \      /    \
      8      7    9      3
     / \    /
    2   4  1

Def. A max heap is a heap in which for every node other than the root, A[i] ≤ A[parent(i)]

Operation: max_heapify(H, i) The tree rooted at i is not a max heap, but the subtrees left(i) and right(i) are max heaps. The resulting tree is a max heap. Percolate the 4 down to the right spot.

             ___16___
            /        \
          *4*         10
         /  \        /  \
        /    \      /    \
       14      7   9      3
      / \    /
     2   8  1

becomes

             ___16___
            /        \
          14          10
         /  \        /  \
        /    \      /    \
       8      7    9      3
      / \    /
     2  *4* 1

code:

void max_heapify(int i, int length) {
    int l = left(i); int r = right(i);
    int largest = i;
    if (l < length && data[i] < data[l])
        largest = l;
    if (r < length && data[largest] < data[r])
        largest = r;
    if (largest != i) {
        swap(data, i, largest);
        max_heapify(largest, length);
    }
}

build_max_heap(A): turns an arbitrary array into a heap

         ___1____
        /        \
       2          3
     /  \        / \
    /    \      /   \
   4      7    8     9
  / \    /
10   14 16

Max-Heapify the last parent: 7

         ___1____
        /        \
       2          3
     /  \        / \
    /    \      /   \
   4      16    8     9
  / \    /
10   14 7

Max-Heapify the next parent: 4

         ___1____
        /        \
       2          3
     /  \        / \
    /    \      /   \
   14     16   8     9
  / \    /
10   4  7

Max-Heapify the next parent: 3

         ___1____
        /        \
       2          9
     /  \        / \
    /    \      /   \
   14     16   8     3
  / \    /
10   4  7

Max-Heapify the next parent: 2

         ___1____
        /        \
      16          9
     /  \        / \
    /    \      /   \
   14     7    8     3
  / \    /
10   4  2

Max-Heapify the next parent: 1

         ___16___
        /        \
      14          9
     /  \        / \
    /    \      /   \
   10     7    8     3
  / \    /
 1   4  2

code:

void build_max_heap() {
    int last_parent = length / 2 - 1;
    for (int i = last_parent; i != -1; --i) {
        max_heapify(i);
    }
}

Depth-first search

Depth-first search is a graph search algorithm.
From a given starting node, it determines which other nodes are reachable from it.

Example:

  A--->B--->C
  |    ^    |
  V    |    |
  D--->E<---|

Depth-first trees:

  A----B----C
  |         |
  |         |
  D    E----|

  A    B----C
  |    |
  |    |
  D----E

Code:

  static <V> void DFS_visit(Graph<V> G, V u, Map<V,V> parent,
                Map<V,Boolean> visited) {
  visited.put(u, true);
  for (V v : G.adjacent(u))
      if (! visited.get(v)) {
      parent.put(v, u);
      DFS_visit(G, v, parent, visited);
      }
  }

Example with discover/finish times

  1/10  2/7  3/6
  A----B----C
  |         |
  |         |
  D    E----|
  8/9  4/5

  1/10  4/7  5/6
  A    B----C
  |    |
  |    |
  D----E
  2/9  3/8

Breadth-first Search

Solves the single-source shortest-hops problem What is the shortest path from A to E, A to B? DFS doesn’t guarantee that it’s paths are the shortest.
Algorithm idea: expand tree one level at a time.

Example:

  A--->B--->C
  |    ^    |
  V    |    |
  D--->E<---|

Breadth-first trees:

  A----B----C
  |
  |
  D----E

Show the queue at each step

Code:

    static <V> void BFS(Graph<V> G, 
                        V start, 
                        Map<V,Boolean> visited,
                        Map<V,V> parent) {
        for (V v : G.vertices())
            visited.put(v, false);
        LinkedList<V> Q = new LinkedList<V>();
        Q.add(start);
        parent.put(start, start);
        visited.put(start, true);
        while (Q.size() != 0) {
            V u = Q.remove();
            for (V v : G.adjacent(u))
                if (! visited.get(v)) {
                    parent.put(v, u);
                    Q.add(v);
                    visited.put(v, true);
                }
        }
    }

Disjoint-sets (aka Union-Find)

Example problem:

Which people live in the same state?

You’re given assertions that two people live in the same state.
You’re asked, based on the assertions so far, whether two people definitely live in the same state.

example:

      Universe: Mary, Katie, Jeremy, Luke, John

                            {^Mary} {^Katie} {^Jeremy} {^Luke} {^John}
      Mary-Katie !       {^Mary, Katie} {^Jeremy} {^Luke} {^John}
      Jeremy-Luke !      {^Mary, Katie} {Jeremy, ^Luke} {^John}
      Katie-John !       {^Mary, Katie, John} {^Jeremy, Luke}
      Mary-John ?    Yes
      Jeremy-John ?  No
      Luke-John !           {^Mary, Katie, John,Jeremy, Luke}
      Jeremy-John ?  Yes

Basic implementation

idea: the set is represented as a forest, with one tree per partition. The root of the tree is the representative.

  public class UnionFind<N> implements DisjointSets<N>
  {
      Map<N,N> parent; // key: an entity, value: the entity's parent

      public UnionFind(Map<N,N> p) {
          parent = p; 
      }
      public void make_set(N x) {
          parent.put(x, x);
      }
      public N find(N x) {
          if (x == parent.get(x))
              return x;
          else 
              return find(parent.get(x));
      }
      public N union(N x, N y) {
          N rx = find(x); N ry = find(y);
          parent.put(ry, rx);
          return rx;
      }
  }

path compression

  public N find(N x) {
      if (x == parent.get(x))
          return x;
      else {
          N rep = find(parent.get(x));
          parent.put(x, rep);
          return rep;
      }
  }

union-by-rank

  public void make_set(N x) {
      parent.put(x, x);
      rank.put(x, 0);
  }

  public N union(N x, N y) {
      N rx = find(x);
      N ry = find(y);
      if (rank.get(rx) > rank.get(ry)) {
          parent.put(ry, rx);
          return rx;
      } else {
          parent.put(rx, ry);
          if (rank.get(ry) == rank.get(rx))
              rank.put(ry, rank.get(ry) + 1);
          return ry;
      }
  }

Minimum Spanning Trees

Kruskal’s algorithm
1. Order all the edges by their weight
2. initialize a union-find data structure with every vertex by itself
3. For each edge in this ordering:
  
  If the two vertices connected by the edge have different representatives, then add this edge to the tree and union the sets associated with the two vertices.

Student group work: apply Kruskal’s to the following graph

  A--2--B--3--C--1--D
  |     |     |     |
  3     1     2     5
  |     |     |     |
  E--4--F--3--G--3--H
  |     |     |     |
  4     2     4     3
  |     |     |     |
  I--3--J--3--K--1--L

Solution: sorted edges:

  C-D,B-F,K-L,  A-B,C-G,F-J, B-C,A-E,F-G,G-H,H-L,I-J,J-K, E-F,E-I,G-K, D-H

An MST of weight 24:

   A--2--B--3--C--1--D
   |     |     |
   3     1     2     5
   |     |     |
   E  4  F  3  G--3--H
         |           |
   4     2     4     3
         |           |
   I--3--J  3  K--1--L

Shortest Paths versus Minimum Spanning Trees

A--1--B--1--C
|           |
4           1
|           |
D----2------E

What’s the shortest path from

A to D? A-D (4)
A to C? A-B-C (2)
A to E? A-B-C-E (3)

What’s the MST?

A-B-C-E-D (5)

DNA Sequence Alignment

Try to insert gaps (_) to cause the two sequences to match up the best.

Example:

CTGA    CTGA_  score = -2 +2 -1 +2 -1 = 0
ATAC    AT_AC

        C_TGA_ sore = -1 -1 +2 -1 +2 -1 = 0
        _AT_AC

Working back-to-front, what choices can be made to characterize all possible alignments? For the last column you can either

take the last character of both strings to be aligned, or
add a gap to s1 and take the last character of s2. This can be thought of as an “insertion”.
add a gap to s2 and take the last character of s1. This can be thought of as an “deletion”.

The recursive brute-force solution, shown below, tries all three of the above options and picks the best one.

Recursive function for computing a best alignment:

    int align(String s1, String s2, int i, int j) {
       if (i == 0 && j == 0) {
          return 0;
       } else if (i == 0) {
          return j * -1;
       } else if (j == 0) {
          return i * -1;
       } else {
          int M = align(s1, s2, i-1, j-1) + score(s1[i-1],s2[j-1]);
          int I = align(s1, s2, i, j-1) - 1;
          int D = align(s1, s2, i-1, j) - 1;
          return max(M, I, D);
       }
    }

We can then turn this into a dynamic-programming solution with a table to record the results of subproblems and two loops to proceed in a bottom-up fashion.

    T[0][0] = 0
    for (int i = 1; i != m + 1; ++i) {
        T[i][0] = i * -1
    }
    for (int j = 1; j != n+1; ++j) {
        T[0][j] = j * -1;
    }
    for (int i = 1; i != m+1; ++i) {
        for (int j = 1; j != n+1; ++j) {
            int M = T[i-1][j-1] + s(s1[i-1], s2[j-1]);
            int D = T[i-1][j] + -1;
            int I = T[i][j-1] + -1;
            T[(i, j)] = max(M, D, I);
        }
    }

Now let’s try this on an example: align ACGT and ATAG

                    j
                A    T     A     G
         0    I:-1  I:-2  I:-3  I:-4
      A D:-1  M:2   I:1   I:0   I:-1
    i C D:-2  D:1   M:0   I:-1  D:-2
      G D:-3  D:0   I:-1  M:-2  M:1
      T D:-4  D:-1  M:2   D:1   D:0

So an answer is

    A C _ G T
    A T A G _
    2-2-1+2-1 = 0

CSCI H343 Data Structures Fall 2024

Review for Final Exam

Topics for Final Exam

Binary Search Trees

remove method of BinarySearchTree

AVL Trees

Tree Rotation

Rebalancing Algorithm

Hash tables

Heaps

Depth-first search

Breadth-first Search

Disjoint-sets (aka Union-Find)

Minimum Spanning Trees

Shortest Paths versus Minimum Spanning Trees

DNA Sequence Alignment

`remove` method of `BinarySearchTree`