hash table

the ideas behind hashtable is: 1) input the key to hash function, output the bucket index; 2) adding the element to the bucket[idx]. usually design the bucket[idx] as a linked list, allowing mulitply keys in the same bucket, which is called “collesion chaining”. 3) hashtable has three basic operators: insert/put, search/get, and remove.

the memory usage of hashtable is continuous bucket array, then each bucket is a linked list. when first thought about the “continue array”, really is a vector, then why need hash function. That’s the point, hashtable allows the benefits of both vector and linked list.

a few interesting topics about hash.

1)why prime numbers? the fundamental mathematical operations with prime numbers generally result in numbers who’s bit biases are close to random. in other word, when multiply or add a set of randome numbers by a prime number, the resulting numbers when as a group, statistically analyzed at their bit levels should show no bias towards being one state or another. in comupter science, pseudo random number generator has bit bias.

2) string hash/scattering,
basically the hash functions should deal with each chracter in the input string, so no information about this string will be missed, and the information entropy after hashing operator suppose increase.

3) Blizzard hash function. not all hash function deal collision conflict with linked list, here is the example.

4) hash map benchmark performance review

one feeling at this moment, so many brilliant and deep-focusing guys there, stay foolish and stay humble.

std::unordered_map

there is map vs unordered_map :

map      |     unordered_map

Ordering        | increasing  order   | no ordering
                | (by default)        |

Implementation  | Self balancing BST  | Hash Table

search time     | log(n)    | O(1) ->Average 
                |           | O(n) -> Worst 

Insertion time  | log(n) + Rebalance  | Same as search

Deletion time   | log(n) + Rebalance  | Same as search

basically map works for traversal, pre/post element access; unordered_map is quick in once search,insertion, deletion. here is a dictionary demo

example

word ladder, hash table is used to find a special word from dictionary. the benefit of unorder_set is O(1) find. so design a unorder_set to store the dict.

in the following example, each character position in the word requires 25 times search of the dict. also every marked word from the dict, should not be searched second time.

another variable is used to track the list of one_character_diff from current word, it’s like BFS. so design as a queue, which only need take care the neighbor two layers.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
int (string start, string end, vector<string>& wordList)
{
int length = 2 ;
unordered_set<string> dict ;
foreach word in wordList:
dict.insert(word)
queue<string> one_diff_list;
one_diff_list.push(start);
while(!one_diff_list.empty()){
int size = one_diff_list.size();
for(int i=0; i<size; i++){
string word=one_diff_list.front();
one_diff_list.pop();
for(int i=0; i<start.length(); i++){
char oldChar = word[i];
for(char c='a'; c<='z'; c++){
if(c == oldChar) continue ;
word[i] = c ;
if(dict.find(word) != dict.end()){
if(word == end)
{ //find the match
return length;
}
one_diff_list.push(word);
dict.erase(word);
}
}
word[i] = oldChar;
}
}
length++;
}
return 0;
}