CSCI H343 Data Structures Fall 2023

Review for Final Exam

Topics for Final Exam

Hash tables

Heaps

           ___16___
          /        \
         14          10
        /  \        /  \
       /    \      /    \
      8      7    9      3
     / \    /
    2   4  1

Def. A max heap is a heap in which for every node other than the root, A[i] ≤ A[parent(i)]

Disjoint-sets (aka Union-Find)

Minimum Spanning Trees

Shortest Paths versus Minimum Spanning Trees

A--1--B--1--C
|           |
4           1
|           |
D----2------E

What’s the shortest path from

A to D? A-D (4)
A to C? A-B-C (2)
A to E? A-B-C-E (3)

What’s the MST?

A-B-C-E-D (5)

DNA Sequence Alignment

Try to insert gaps (_) to cause the two sequences to match up the best.

Example:

CTGA    CTGA_  score = -2 +2 -1 +2 -1 = 0
ATAC    AT_AC

        C_TGA_ sore = -1 -1 +2 -1 +2 -1 = 0
        _AT_AC

Working back-to-front, what choices can be made to characterize all possible alignments? For the last column you can either

  1. take the last character of both strings to be aligned, or
  2. add a gap to s1 and take the last character of s2. This can be thought of as an “insertion”.
  3. add a gap to s2 and take the last character of s1. This can be thought of as an “deletion”.

The recursive brute-force solution, shown below, tries all three of the above options and picks the best one.

Recursive function for computing a best alignment:

    int align(String s1, String s2, int i, int j) {
       if (i == 0 && j == 0) {
          return 0;
       } else if (i == 0) {
          return j * -1;
       } else if (j == 0) {
          return i * -1;
       } else {
          int M = align(s1, s2, i-1, j-1) + score(s1[i-1],s2[j-1]);
          int I = align(s1, s2, i, j-1) - 1;
          int D = align(s1, s2, i-1, j) - 1;
          return max(M, I, D);
       }
    }

We can then turn this into a dynamic-programming solution with a table to record the results of subproblems and two loops to proceed in a bottom-up fashion.

    T[0][0] = 0
    for (int i = 1; i != m + 1; ++i) {
        T[i][0] = i * -1
    }
    for (int j = 1; j != n+1; ++j) {
        T[0][j] = j * -1;
    }
    for (int i = 1; i != m+1; ++i) {
        for (int j = 1; j != n+1; ++j) {
            int M = T[i-1][j-1] + s(s1[i-1], s2[j-1]);
            int D = T[i-1][j] + -1;
            int I = T[i][j-1] + -1;
            T[(i, j)] = max(M, D, I);
        }
    }

Now let’s try this on an example: align ACGT and ATAG

                    j
                A    T     A     G
         0    I:-1  I:-2  I:-3  I:-4
      A D:-1  M:2   I:1   I:0   I:-1
    i C D:-2  D:1   M:0   I:-1  D:-2
      G D:-3  D:0   I:-1  M:-2  M:1
      T D:-4  D:-1  M:2   D:1   D:0

So an answer is

    A C _ G T
    A T A G _
    2-2-1+2-1 = 0