Course web page for Data Structures H343 Fall 2022
View the Project on GitHub IUDataStructuresCourse/course-web-page-fall-2022
Create helper functions to test the data structure invariants:
Test the BST property
Make sure the tree contains the correct keys
Test the parent pointers
Test the AVL property
Test that the height of the tree is O(log n).
Create a helper function to test all the operations for reading the tree:
Test next(), previous()
Test keys()
Test search() and contains()
Test isEmpty() and size()
Create helper functions for testing the tree mutating operations:
Test insert() Test all the read operations after each insert.
Test remove() Test all the read operations after each remove.
Test on lots of different trees.
Special cases: removing a node with two children removing the root node removing an internal node four cases of AVL fixups (LL, LR, RL, RR)
Randomized trees
Java’s HashMap
and HashSet
classes are implemented with hash tables
Most modern languages have hash tables built-in or in the standard library.
The Map Abstract Data Type (aka. “dictionary”)
interface Map<K,V> {
V get(K key);
V put(K key, V value);
V remove(K key);
boolean containsKey(K key);
}
Motivation: maps are everywhere!
We could implement the Map ADT with AVL Trees, then get()
is O(log(n)).
Map
implementationIf keys are integers, we can use an array.
Store items in array indexed by key (draw picture) use None to indicate absense of key.
What’s good?
get()
is O(1)
What’s bad?
Prehashing fixes problem 1 by mapping everything to integers. (Textbook calls this the creation of a hash code.)
In Java, o.hashCode()
computes the prehash of object o
.
Ideally: x.hashCode() == y.hashCode()
iff x
and y
are the same
object (but sometimes different objects have the same hash code)
User-definable: a class can override the hashCode
method, and should
do so if you are overriding the equals
method.
Algorithm for prehashing a string (aka. polynomial hash code) Map each character to one digit in a number. But there are 256 different characters, not 10. So we use a different base.
prehash_string('ab') == 97 * 256 + 98
prehash_string('abc') == 97 * (256^2) + 98 * (256^1) + 99
Hashing fixes problem 2 (reduce memory consumption).
The word “hash” is from cooking: “a finely chopped mixture”.
Towards proving that the average case time is O(1).
Simple Uniform Hashing assumption (mostly true but not completely true)
each key is equally likely to be hashed to each slot of the table independent of where other keys land. (uniformity and idependence)
what’s the expected length of a chain?
n keys in m slots: n/m = alpha (load factor)
Search:
total for search: O(1 + alpha)
Takeaway: need to grow table size m as n increases so that alpha stays small.
need to be careful about choice of table size m
if not, may not use all of the table
table size 4 (slots 0..3)
suppose the keys are all even: 0,2,..
0 -> 0 (0 mod 4 = 0)
2 -> 2 (2 mod 4 = 2)
4 -> 0 (4 mod 4 = 0)
6 -> 2 (6 mod 4 = 2)
8 -> 0 (8 mod 4 = 0)
...
Never use slot 1 and 3.
Good to choose a prime number for m, not close to
a power of 2 or 10.
h(k) = ((a * k + b) mod p) mod m
where
Using the division method and chaining, insert the keys 4, 1, 3, 2, 0 into a hash table with table size 3 (m=3).
Solution:
0 -> [0,3]
1 -> [1,4]
2 -> [2]