CSCI H343 Data Structures Fall 2024

Review for Final Exam

Topics for Final Exam

Binary Search Trees

Can implement the Set and Map interfaces.

The Binary-Search-Tree Property: For every node x in the tree,

  1. if node y is in the left subtree of x, then y.data < x.data (or y.data <= x.data for MultiSet, that is, allow duplicates), and
  2. if node z is in the right subtree of x, then x.data < z.data.
protected Node<K> search(K key, Node<K> curr) {
	if (curr == null)
		return null;
	else if (lessThan.test(key, curr.data))
		return search(key, curr.left);
	else if (lessThan.test(curr.data, key))
		return search(key, curr.right);
	else
		return curr;
}

remove method of BinarySearchTree

Search for the node x with the key to remove. Then consider the following cases.

          |              |
          x              A
           \       ==>
            A
            |            |
            x            A
           /       ==>
          A
             |
             x
            / \
           A   B

Replace x with the node after x wrt. in-order traversal, which is the first node y of the subtree B. We then remove y from B.

AVL Trees

Goal: maintain balance in a binary search tree.

Definition The AVL Invariant is that the height of two child subtrees may only differ by 1, for every node in the tree.

After inserting a node, move up the tree rebalancing as needed using tree rotations.

Tree Rotation

                y                         x
               / \    right_rotate(y)    / \
              x   C  --------------->   A   y
             / \     <-------------        / \
            A   B     left_rotate(x)      B   C

Rebalancing Algorithm

If node x is the lowest node that is not AVL, do the following to rebalance it.

1. if height(x.right.left) ≤ height(x.right.right)

    let k = height(x.right.right)

                x k+2                                y ≤k+2
               / \          left_rotate(x)          / \
           ≤k A   y k+1     ===============>  ≤k+1 x   C k
                 / \                              / \
             ≤k B   C k                       ≤k A   B ≤k

2. if height(x.right.left) > height(x.right.right)

    let k = height(x.right.left)

          k+2 x                               y k+1
             / \                            /   \
        k-1 A   z k+1    R(z), L(x)      k x     z k
               / \      =============>    / \   / \
            k y   D k-1              k-1 A   B C   D k-1
             / \
            B   C <k

Hash tables

Heaps

           ___16___
          /        \
         14          10
        /  \        /  \
       /    \      /    \
      8      7    9      3
     / \    /
    2   4  1

Def. A max heap is a heap in which for every node other than the root, A[i] ≤ A[parent(i)]

Disjoint-sets (aka Union-Find)

Minimum Spanning Trees

Shortest Paths versus Minimum Spanning Trees

A--1--B--1--C
|           |
4           1
|           |
D----2------E

What’s the shortest path from

A to D? A-D (4)
A to C? A-B-C (2)
A to E? A-B-C-E (3)

What’s the MST?

A-B-C-E-D (5)

DNA Sequence Alignment

Try to insert gaps (_) to cause the two sequences to match up the best.

Example:

CTGA    CTGA_  score = -2 +2 -1 +2 -1 = 0
ATAC    AT_AC

        C_TGA_ sore = -1 -1 +2 -1 +2 -1 = 0
        _AT_AC

Working back-to-front, what choices can be made to characterize all possible alignments? For the last column you can either

  1. take the last character of both strings to be aligned, or
  2. add a gap to s1 and take the last character of s2. This can be thought of as an “insertion”.
  3. add a gap to s2 and take the last character of s1. This can be thought of as an “deletion”.

The recursive brute-force solution, shown below, tries all three of the above options and picks the best one.

Recursive function for computing a best alignment:

    int align(String s1, String s2, int i, int j) {
       if (i == 0 && j == 0) {
          return 0;
       } else if (i == 0) {
          return j * -1;
       } else if (j == 0) {
          return i * -1;
       } else {
          int M = align(s1, s2, i-1, j-1) + score(s1[i-1],s2[j-1]);
          int I = align(s1, s2, i, j-1) - 1;
          int D = align(s1, s2, i-1, j) - 1;
          return max(M, I, D);
       }
    }

We can then turn this into a dynamic-programming solution with a table to record the results of subproblems and two loops to proceed in a bottom-up fashion.

    T[0][0] = 0
    for (int i = 1; i != m + 1; ++i) {
        T[i][0] = i * -1
    }
    for (int j = 1; j != n+1; ++j) {
        T[0][j] = j * -1;
    }
    for (int i = 1; i != m+1; ++i) {
        for (int j = 1; j != n+1; ++j) {
            int M = T[i-1][j-1] + s(s1[i-1], s2[j-1]);
            int D = T[i-1][j] + -1;
            int I = T[i][j-1] + -1;
            T[(i, j)] = max(M, D, I);
        }
    }

Now let’s try this on an example: align ACGT and ATAG

                    j
                A    T     A     G
         0    I:-1  I:-2  I:-3  I:-4
      A D:-1  M:2   I:1   I:0   I:-1
    i C D:-2  D:1   M:0   I:-1  D:-2
      G D:-3  D:0   I:-1  M:-2  M:1
      T D:-4  D:-1  M:2   D:1   D:0

So an answer is

    A C _ G T
    A T A G _
    2-2-1+2-1 = 0