Can implement the Set and Map interfaces.
The Binary-Search-Tree Property: For every node x in the tree,
y.data < x.data
(or y.data <= x.data
for MultiSet, that is, allow duplicates), andx.data < z.data
.protected Node<K> search(K key, Node<K> curr) {
if (curr == null)
return null;
else if (lessThan.test(key, curr.data))
return search(key, curr.left);
else if (lessThan.test(curr.data, key))
return search(key, curr.right);
else
return curr;
}
remove
method of BinarySearchTree
Search for the node x with the key to remove. Then consider the following cases.
| |
x A
\ ==>
A
| |
x A
/ ==>
A
|
x
/ \
A B
Replace x with the node after x wrt. in-order traversal, which is the
first
node y of the subtree B. We then remove y from B.
Goal: maintain balance in a binary search tree.
Definition The AVL Invariant is that the height of two child subtrees may only differ by 1, for every node in the tree.
After inserting a node, move up the tree rebalancing as needed using tree rotations.
y x
/ \ right_rotate(y) / \
x C ---------------> A y
/ \ <------------- / \
A B left_rotate(x) B C
If node x is the lowest node that is not AVL, do the following to rebalance it.
1. if height(x.right.left) ≤ height(x.right.right)
let k = height(x.right.right)
x k+2 y ≤k+2
/ \ left_rotate(x) / \
≤k A y k+1 ===============> ≤k+1 x C k
/ \ / \
≤k B C k ≤k A B ≤k
2. if height(x.right.left) > height(x.right.right)
let k = height(x.right.left)
k+2 x y k+1
/ \ / \
k-1 A z k+1 R(z), L(x) k x z k
/ \ =============> / \ / \
k y D k-1 k-1 A B C D k-1
/ \
B C <k
if height(x.left) > height(x.right)
if height(x.left.left) < height(x.left.right) (note: strictly less!)
let k = height(x.left.right)
x k+2 z k+1
/ \ / \
k+1 y D k-1 L(y), R(x) k y x k
/ \ =============> / \ / \
k-1 A z k A B C D <k
/ \
B C <k
if height(x.left.left) ≥ height(x.left.right) (note: greater-equal!)
let k = height(l.left.left)
x k+2 y k+1
/ \ right_rotate(x) / \
k+1 y C k-1 ===============> k A x k+1
/ \ / \
k A B ≤k ≤k B C k-1
get
, put
, remove
, containsKey
division method (modulo)
h(k) = k mod m
where m is the table size
multiply-add-and-divide method
h(k) = ((a * k + b) mod p) mod m
where p is a prime number larger than m and a,b are randomly chosen integers between 1 and p-1.
___16___
/ \
14 10
/ \ / \
/ \ / \
8 7 9 3
/ \ /
2 4 1
Def. A max heap is a heap in which for every node other than the root, A[i] ≤ A[parent(i)]
Operation: max_heapify(H, i) The tree rooted at i is not a max heap, but the subtrees left(i) and right(i) are max heaps. The resulting tree is a max heap. Percolate the 4 down to the right spot.
___16___
/ \
*4* 10
/ \ / \
/ \ / \
14 7 9 3
/ \ /
2 8 1
becomes
___16___
/ \
14 10
/ \ / \
/ \ / \
8 7 9 3
/ \ /
2 *4* 1
code:
void max_heapify(int i, int length) {
int l = left(i); int r = right(i);
int largest = i;
if (l < length && data[i] < data[l])
largest = l;
if (r < length && data[largest] < data[r])
largest = r;
if (largest != i) {
swap(data, i, largest);
max_heapify(largest, length);
}
}
build_max_heap(A): turns an arbitrary array into a heap
___1____
/ \
2 3
/ \ / \
/ \ / \
4 7 8 9
/ \ /
10 14 16
Max-Heapify the last parent: 7
___1____
/ \
2 3
/ \ / \
/ \ / \
4 16 8 9
/ \ /
10 14 7
Max-Heapify the next parent: 4
___1____
/ \
2 3
/ \ / \
/ \ / \
14 16 8 9
/ \ /
10 4 7
Max-Heapify the next parent: 3
___1____
/ \
2 9
/ \ / \
/ \ / \
14 16 8 3
/ \ /
10 4 7
Max-Heapify the next parent: 2
___1____
/ \
16 9
/ \ / \
/ \ / \
14 7 8 3
/ \ /
10 4 2
Max-Heapify the next parent: 1
___16___
/ \
14 9
/ \ / \
/ \ / \
10 7 8 3
/ \ /
1 4 2
code:
void build_max_heap() {
int last_parent = length / 2 - 1;
for (int i = last_parent; i != -1; --i) {
max_heapify(i);
}
}
Example:
A--->B--->C
| ^ |
V | |
D--->E<---|
Depth-first trees:
A----B----C
| |
| |
D E----|
A B----C
| |
| |
D----E
Code:
static <V> void DFS_visit(Graph<V> G, V u, Map<V,V> parent,
Map<V,Boolean> visited) {
visited.put(u, true);
for (V v : G.adjacent(u))
if (! visited.get(v)) {
parent.put(v, u);
DFS_visit(G, v, parent, visited);
}
}
Example with discover/finish times
1/10 2/7 3/6
A----B----C
| |
| |
D E----|
8/9 4/5
1/10 4/7 5/6
A B----C
| |
| |
D----E
2/9 3/8
Solves the single-source shortest-hops problem What is the shortest path from A to E, A to B? DFS doesn’t guarantee that it’s paths are the shortest.
Algorithm idea: expand tree one level at a time.
Example:
A--->B--->C
| ^ |
V | |
D--->E<---|
Breadth-first trees:
A----B----C
|
|
D----E
Show the queue at each step
Code:
static <V> void BFS(Graph<V> G,
V start,
Map<V,Boolean> visited,
Map<V,V> parent) {
for (V v : G.vertices())
visited.put(v, false);
LinkedList<V> Q = new LinkedList<V>();
Q.add(start);
parent.put(start, start);
visited.put(start, true);
while (Q.size() != 0) {
V u = Q.remove();
for (V v : G.adjacent(u))
if (! visited.get(v)) {
parent.put(v, u);
Q.add(v);
visited.put(v, true);
}
}
}
Example problem:
Which people live in the same state?
example:
Universe: Mary, Katie, Jeremy, Luke, John
{^Mary} {^Katie} {^Jeremy} {^Luke} {^John}
Mary-Katie ! {^Mary, Katie} {^Jeremy} {^Luke} {^John}
Jeremy-Luke ! {^Mary, Katie} {Jeremy, ^Luke} {^John}
Katie-John ! {^Mary, Katie, John} {^Jeremy, Luke}
Mary-John ? Yes
Jeremy-John ? No
Luke-John ! {^Mary, Katie, John,Jeremy, Luke}
Jeremy-John ? Yes
Basic implementation
idea: the set is represented as a forest, with one tree per partition. The root of the tree is the representative.
public class UnionFind<N> implements DisjointSets<N>
{
Map<N,N> parent; // key: an entity, value: the entity's parent
public UnionFind(Map<N,N> p) {
parent = p;
}
public void make_set(N x) {
parent.put(x, x);
}
public N find(N x) {
if (x == parent.get(x))
return x;
else
return find(parent.get(x));
}
public N union(N x, N y) {
N rx = find(x); N ry = find(y);
parent.put(ry, rx);
return rx;
}
}
path compression
public N find(N x) {
if (x == parent.get(x))
return x;
else {
N rep = find(parent.get(x));
parent.put(x, rep);
return rep;
}
}
union-by-rank
public void make_set(N x) {
parent.put(x, x);
rank.put(x, 0);
}
public N union(N x, N y) {
N rx = find(x);
N ry = find(y);
if (rank.get(rx) > rank.get(ry)) {
parent.put(ry, rx);
return rx;
} else {
parent.put(rx, ry);
if (rank.get(ry) == rank.get(rx))
rank.put(ry, rank.get(ry) + 1);
return ry;
}
}
Kruskal’s algorithm
Order all the edges by their weight
initialize a union-find data structure with every vertex by itself
For each edge in this ordering:
If the two vertices connected by the edge have different representatives, then add this edge to the tree and union the sets associated with the two vertices.
Student group work: apply Kruskal’s to the following graph
A--2--B--3--C--1--D
| | | |
3 1 2 5
| | | |
E--4--F--3--G--3--H
| | | |
4 2 4 3
| | | |
I--3--J--3--K--1--L
Solution: sorted edges:
C-D,B-F,K-L, A-B,C-G,F-J, B-C,A-E,F-G,G-H,H-L,I-J,J-K, E-F,E-I,G-K, D-H
An MST of weight 24:
A--2--B--3--C--1--D
| | |
3 1 2 5
| | |
E 4 F 3 G--3--H
| |
4 2 4 3
| |
I--3--J 3 K--1--L
A--1--B--1--C
| |
4 1
| |
D----2------E
What’s the shortest path from
A to D? A-D (4)
A to C? A-B-C (2)
A to E? A-B-C-E (3)
What’s the MST?
A-B-C-E-D (5)
Try to insert gaps (_) to cause the two sequences to match up the best.
Example:
CTGA CTGA_ score = -2 +2 -1 +2 -1 = 0
ATAC AT_AC
C_TGA_ sore = -1 -1 +2 -1 +2 -1 = 0
_AT_AC
Working back-to-front, what choices can be made to characterize all possible alignments? For the last column you can either
The recursive brute-force solution, shown below, tries all three of the above options and picks the best one.
Recursive function for computing a best alignment:
int align(String s1, String s2, int i, int j) {
if (i == 0 && j == 0) {
return 0;
} else if (i == 0) {
return j * -1;
} else if (j == 0) {
return i * -1;
} else {
int M = align(s1, s2, i-1, j-1) + score(s1[i-1],s2[j-1]);
int I = align(s1, s2, i, j-1) - 1;
int D = align(s1, s2, i-1, j) - 1;
return max(M, I, D);
}
}
We can then turn this into a dynamic-programming solution with a table to record the results of subproblems and two loops to proceed in a bottom-up fashion.
T[0][0] = 0
for (int i = 1; i != m + 1; ++i) {
T[i][0] = i * -1
}
for (int j = 1; j != n+1; ++j) {
T[0][j] = j * -1;
}
for (int i = 1; i != m+1; ++i) {
for (int j = 1; j != n+1; ++j) {
int M = T[i-1][j-1] + s(s1[i-1], s2[j-1]);
int D = T[i-1][j] + -1;
int I = T[i][j-1] + -1;
T[(i, j)] = max(M, D, I);
}
}
Now let’s try this on an example: align ACGT and ATAG
j
A T A G
0 I:-1 I:-2 I:-3 I:-4
A D:-1 M:2 I:1 I:0 I:-1
i C D:-2 D:1 M:0 I:-1 D:-2
G D:-3 D:0 I:-1 M:-2 M:1
T D:-4 D:-1 M:2 D:1 D:0
So an answer is
A C _ G T
A T A G _
2-2-1+2-1 = 0