
the ideas behind hashtable is: 1) input the key to hash function, output the bucket index; 2) adding the element
the memory usage of hashtable is continuous bucket array, then each bucket is a linked list. when first thought about the “continue array”, really is a vector, then why need hash function. That’s the point, hashtable allows the benefits of both vector and linked list.
a few interesting topics about hash.
1)why prime numbers? the fundamental mathematical operations with prime numbers generally result in numbers who’s bit biases are close to random. in other word, when multiply or add a set of randome numbers by a prime number, the resulting numbers when as a group, statistically analyzed at their bit levels should show no bias towards being one state or another. in comupter science, pseudo random number generator has bit bias.
2) string hash/scattering,
basically the hash functions should deal with each chracter in the input string, so no information about this string will be missed, and the information entropy after hashing operator suppose increase.
3) Blizzard hash function. not all hash function deal collision conflict with linked list, here is the example.
4) hash map benchmark performance review
one feeling at this moment, so many brilliant and deep-focusing guys there, stay foolish and stay humble.
std::unordered_map
there is map vs unordered_map :
map | unordered_map
Ordering | increasing order | no ordering
| (by default) |
Implementation | Self balancing BST | Hash Table
search time | log(n) | O(1) ->Average
| | O(n) -> Worst
Insertion time | log(n) + Rebalance | Same as search
Deletion time | log(n) + Rebalance | Same as search
basically map works for traversal, pre/post element access; unordered_map is quick in once search,insertion, deletion. here is a dictionary demo
example
word ladder, hash table is used to find a special word from dictionary. the benefit of unorder_set is O(1) find. so design a unorder_set to store the dict.
in the following example, each character position in the word requires 25 times search of the dict. also every marked word from the dict, should not be searched second time.
another variable is used to track the list of one_character_diff from current word, it’s like BFS. so design as a queue, which only need take care the neighbor two layers.
|
|




近期评论