HashMap#

1. Basic Concepts#

In JDK 8 and later versions, HashMap introduced a red-black tree, changing its underlying data structure from array + linked list to array + linked list + red-black tree.

Overall structure of HashMap:

Red-black tree structure of HashMap after JDK1.8

Conversion to Red-Black Tree#

Reference: https://juejin.im/entry/5839ad0661ff4b007ec7cc7a
Several parameters:

// The threshold for treeification of a bucket; when the number of elements in the bucket exceeds this value, red-black tree nodes will replace linked list nodes
static final int TREEIFY_THRESHOLD = 8;
// The threshold for un-treeifying a tree; when expanding, if the number of elements in the bucket is less than this value, the tree will be restored (split) into a linked list structure
static final int UNTREEIFY_THRESHOLD = 6;
// The minimum capacity for treeification of the hash table; when the capacity of the hash table exceeds this value, the buckets can be treeified; otherwise, if there are too many elements in the bucket, it will expand instead of treeifying
// To avoid conflicts between expansion and treeification, this value cannot be less than 4 * TREEIFY_THRESHOLD
static final int MIN_TREEIFY_CAPACITY = 64;

When adding an element, if the traversal counter binCount reaches TREEIFY_THRESHOLD - 1 (i.e., the bucket's linked list has grown to 8 nodes), treeifyBin is called to attempt conversion to a red-black tree:


// If the capacity of HashMap is less than 64, it will choose to expand instead of directly converting to a red-black tree
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) {
    resize();
}
// TREEIFY_THRESHOLD defaults to 8
if (binCount >= TREEIFY_THRESHOLD - 1) {
    treeifyBin(tab, hash);
}

When the resize() method is called to expand, if the current bucket holds a red-black tree and the number of elements in one of its split halves is no greater than the un-treeification threshold UNTREEIFY_THRESHOLD (default 6), the split() method restores that portion to a linked list structure:

if (lc <= UNTREEIFY_THRESHOLD) {
    tab[index] = loHead.untreeify(map);
}

2. Understanding Default Parameters of HashMap#

Why is the default length of HashMap's Entry array 16? Why must the array length be a power of 2?#

First, let's look at how HashMap calculates the hashcode to get the storage position:

static int indexFor(int h, int length) {
    return h & (length-1);
}

When length is 16 or another power of 2, the value of length - 1 has all of its low-order binary bits set to 1, so h & (length - 1) keeps exactly the low bits of the hash code — which is equivalent to h % length, but faster. As long as the input hash codes themselves are evenly distributed, the resulting indices will be uniform.
Thus: the default length of 16 and the requirement that the array length be a power of 2 exist to spread keys evenly across buckets and reduce the probability of hash collisions.
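The masking trick above can be verified directly. A minimal runnable sketch (the class and method names here are mine, mirroring the `indexFor` shown above):

```java
public class IndexForDemo {
    // Same masking trick as HashMap's indexFor: valid only when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // length - 1 = 0b1111, so only the low 4 bits of the hash survive
        for (int h : new int[]{17, 33, 49}) {
            // For a power-of-two length, h & (length - 1) equals h % length
            System.out.println(h + " -> " + indexFor(h, length)
                    + " (h % length = " + (h % length) + ")");
        }
    }
}
```

All three hashes land in bucket 1, since they share the same low 4 bits.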

Why is the load factor for HashMap expansion limited to 0.75?#

The way HashMap expands is by creating a new array twice the length of the old one and transferring all elements from the current Entry array into it, which makes expansion relatively expensive.
The trigger condition for expansion is: threshold = array capacity × load factor

  • If the load factor were 0.5 or lower, the threshold would be reached while most of the table is still empty, wasting allocated memory and triggering expansion more frequently.
  • If the load factor were 1, the Entry array would fill up completely before expanding, producing a large number of hash collisions and long linked lists that slow down get queries.
  • Therefore, 0.75 is chosen as a compromise between these two extremes.

3. Basic Operations of HashMap#

get Operation#

/**
 * Calculates the hash value of the key using the hash(key) method
 * Retrieves the node using the getNode() method
 * If the node is null, returns null; otherwise, returns node.value
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * getNode can be divided into the following steps:
 * 1. If the hash table is empty or the bucket corresponding to the key is empty, return null
 * 2. If not empty, if the first node in the bucket matches the specified hash and key, return this node directly
 * 3. If the first node does not match, continue searching:
 *  3.1 If the current bucket uses a red-black tree, call the red-black tree's getTreeNode to get the node value
 *  3.2 If it is a linked list structure, traverse the linked list directly to search
 * 4. Return the value if the node is found; otherwise, return null
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // If the hash table and the bucket corresponding to the key are not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Optimization point 1: Reduce unnecessary queries
        // If the first node in the bucket matches the specified hash and key, return this node directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // If the first node does not match, continue searching
        if ((e = first.next) != null) {
            // If the current bucket uses a red-black tree, call the red-black tree's getTreeNode to get the node value
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // If it is a linked list structure, traverse the linked list directly to search
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

put Operation#

image.png

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * The putVal method can be divided into the following steps
 * 1. If the hash table is empty, call the resize method to create it
 * 2. If the specified key does not conflict, i.e., there is no hash collision, directly insert the key-value pair
 * 3. If there is a conflict, find the conflicting node
 *  3.1 If the first node matches, record it; otherwise, continue searching
 *  3.2 If the current structure is a red-black tree, call the putTreeVal method to insert
 *  3.3 If it is a linked list structure, loop through the linked list to find it; if not found at the end, insert the node using the tail insertion method; if found, record this node and exit the loop
 * 4. If a conflicting node is found, check if onlyIfAbsent is true, which indicates whether to allow modification of the old value
 * 5. Increment the modification count modCount and the capacity size size by 1; if size > threshold, i.e., the critical value is reached, perform expansion
 *
 * @param hash The hash value of the specified key
 * @param key The specified key
 * @param value The value to be inserted
 * @param onlyIfAbsent If true, do not replace an existing value
 * @param evict Indicates whether HashMap is in creation mode
 * @return If the specified key exists and the old value is replaced, return this value; otherwise, return null; of course, the value may also be null
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the hash table is empty, call the resize method to create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the specified key does not conflict, i.e., there is no hash collision, directly insert the key-value pair
    // (n - 1) & hash is equivalent to hash % n, but bitwise operations are faster
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // If there is a conflict, find the conflicting node
    else {
        Node<K,V> e; K k;
        // If the first node matches, record it; otherwise, continue searching
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the current structure is a red-black tree, call the putTreeVal method to insert
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // If it is a linked list structure, loop through the linked list to find it; if not found at the end, insert the node using the tail insertion method
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Check if the length of the linked list has reached the critical value; if so, convert it to a red-black tree structure
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If found, record this node and exit the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // If the value is not null, it indicates that a conflicting node was found
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // Check if onlyIfAbsent is true, which indicates whether to allow modification of the old value
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Check if size has reached the critical value and needs to be expanded
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
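The onlyIfAbsent branch above can be observed through the public API: put calls putVal with onlyIfAbsent = false, while putIfAbsent passes true. A small runnable sketch:

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        // First insert: no existing mapping, so putVal returns null
        System.out.println(map.put("a", 1)); // null
        // Same key again: the old value is replaced and returned
        System.out.println(map.put("a", 2)); // 1
        // putIfAbsent uses onlyIfAbsent = true, so the old value is kept
        System.out.println(map.putIfAbsent("a", 3)); // 2
        System.out.println(map.get("a")); // 2
    }
}
```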

Expansion Operation#

/**
 * Initializes or expands the table; during expansion, the capacity is doubled each time, and after binary calculation, the original nodes will either remain in their original positions or be allocated to "original position + old capacity"
 * Summary of the resize steps:
 * 1. Calculate the capacity and critical value after expansion
 * 2. Update the critical value of the HashMap
 * 3. Create a new array based on the calculated expansion capacity, then point the HashMap's table reference to the new array
 * 4. Copy the elements from the old array to the table
 */
final Node<K,V>[] resize() {
    // Create a new oldtab array to save the array table before expansion
    Node<K,V>[] oldTab = table;
    // Get the length of the old array
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Critical value of the old array
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the old array capacity is greater than 0
    if (oldCap > 0) {
        // If the old array capacity exceeds the maximum value, set its critical value to the maximum integer value
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the old capacity is at least the default capacity (16) and doubling it stays below the maximum, double the threshold as well
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // If old capacity <= 0, and critical value > 0
    else if (oldThr > 0) // initial capacity was placed in threshold
        // Set the new array capacity to the old array's critical value
        newCap = oldThr;
    // If old capacity <= 0, and critical value <= 0
    else {
        // Set the new capacity to the default value (16)
        newCap = DEFAULT_INITIAL_CAPACITY;
        // Set the new critical value to default capacity * default load factor (16 * 0.75) = 12
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // In the above condition checks, only when oldThr > 0, newThr == 0
    if (newThr == 0) {
        // Perform legal checks and set the new critical value
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Perform the operation of copying the contents of the old array to the new array
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Traverse each node of the old hash table to copy to the new array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Record the value of the old node with e
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If the structure of the old bucket's storage conflict hash value is a red-black tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // If the structure of the old bucket's storage conflict hash value is a linked list structure
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // In Java 7, nodes were re-inserted one by one with head insertion inside the loop, which could form a circular linked list under concurrent resizing
                    // In Java 8, the lo/hi lists are built first and assigned to the array only after the loop ends, so that cycle cannot form

                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Nodes remain in their original positions
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Nodes are allocated to "original position + old capacity"
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
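The lo/hi split above — "stay at index j" versus "move to j + oldCap" — comes down to whether the bit of the hash at position oldCap is 0 or 1. A standalone sketch with made-up hash values (not HashMap source):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Two hashes that share bucket 5 at capacity 16 but differ in the oldCap bit
        int h1 = 5;            // binary ...0_0101: oldCap bit is 0
        int h2 = 5 + oldCap;   // binary ...1_0101: oldCap bit is 1
        System.out.println((h1 & (oldCap - 1)) + " and " + (h2 & (oldCap - 1))); // both 5
        // (hash & oldCap) == 0 -> node stays at index j; otherwise it moves to j + oldCap
        System.out.println(h1 & (newCap - 1)); // 5  (stays at the original position)
        System.out.println(h2 & (newCap - 1)); // 21 (original position + old capacity)
    }
}
```

This is why resize never needs to recompute hashes: one extra bit of the existing hash decides each node's new bucket.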

4. What Conditions Must a Class Meet if Its Key Value in HashMap is an Object?#

Generally, wrapper classes like String and Integer are used as keys because:

  • These wrapper types are final and therefore immutable, which guarantees that a key's hash value never changes while it is stored in the map.
  • They already override the equals() and hashCode() methods.

Therefore, if the key in HashMap is an object of a custom class, that class must override both hashCode() and equals().
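A minimal sketch of such a key class (Point is a hypothetical example, not from the original). Both methods must be overridden together: equal objects must produce equal hash codes, or lookups will miss:

```java
import java.util.HashMap;
import java.util.Objects;

public class KeyDemo {
    // Immutable key class overriding both equals() and hashCode()
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        HashMap<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        // Works only because equal Points also produce equal hash codes
        System.out.println(map.get(new Point(1, 2))); // found
    }
}
```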

5. How Does HashMap Solve Hash Collisions?#

Methods to Resolve Hash Collisions#

Analysis of Common Methods to Resolve Hash Collisions

Open Addressing Method#

Starting from the cell where the collision occurred, find an empty cell in the hash table in a certain order. This method requires the table length to be greater than or equal to the number of elements to be stored.
Includes the following methods:

  • Linear Probing
    Starting from the cell where the collision occurred, sequentially check if the next cell is empty; when reaching the last cell, check from the beginning until an empty cell is found or all cells have been checked.
  • Quadratic Probing
    When a collision occurs, add 1², 2², etc. to the cell d[i] where the collision occurred, i.e., d[i] + 1², d[i] + 2², d[i] + 3²... until an empty cell is found.
  • Double Hashing Method
    When a collision occurs, a second hash function computes the probe step, so probing continues at d[i] + k·h2(key) until an empty cell is found.
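Note that HashMap itself uses chaining, not open addressing; the following is only a minimal illustration of linear probing as described above (all names and the tiny table size are mine):

```java
// A minimal open-addressing (linear probing) sketch, not HashMap's actual strategy
public class LinearProbingDemo {
    static final int SIZE = 8;
    static final Integer[] table = new Integer[SIZE];

    static int insert(int value) {
        int home = value % SIZE;
        // Starting from the collision cell, scan forward (wrapping around) for an empty slot
        for (int probes = 0; probes < SIZE; probes++) {
            int idx = (home + probes) % SIZE;
            if (table[idx] == null) {
                table[idx] = value;
                return idx;
            }
        }
        return -1; // table full: open addressing requires capacity >= element count
    }

    public static void main(String[] args) {
        System.out.println(insert(1));  // 1
        System.out.println(insert(9));  // collides at 1, probes to 2
        System.out.println(insert(17)); // collides at 1 and 2, probes to 3
    }
}
```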

Chaining Method (Separate Chaining)#

The idea of the chaining method is to link all elements with the same hash value into a singly linked list and store the list's head pointer in cell i of the hash table. Searching, inserting, and deleting then operate within that bucket's linked list.

Rehashing Method#

This method constructs multiple different hash functions:
Hi = RHi(key), i = 1, 2, ..., k
When H1 = RH1(key) collides, H2 = RH2(key) is computed instead, and so on until no collision occurs. This method makes clustering less likely but increases computation time.

6. What Are the Differences Between HashMap in JDK1.7 and JDK1.8?#

  • JDK1.8 introduced red-black trees, reducing the worst-case lookup within a bucket from O(n) with a linked list to O(log n).

  • In data insertion, JDK1.7 used head insertion, which could cause reverse order and circular linked list issues, while JDK1.8 used tail insertion to avoid this problem. See: Reasons for Circular Linked Lists in JDK1.7 HashMap and Solutions in JDK1.8.

  • The hash (perturbation) function used to compute storage positions is different:

    1. JDK 1.7 perturbed the hash code with 4 shifts and 5 XOR operations (9 perturbations):

    final int hash(Object k) {
        int h = hashSeed; // 0 by default
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
    2. JDK 1.8 simplified the perturbation function to 1 shift + 1 XOR (2 perturbations):

    static final int hash(Object key) {
        int h;
        // a. When key == null, the hash is 0, so HashMap allows a null key.
        //    (Hashtable calls the key's hashCode() directly, so its key cannot be null.)
        // b. Otherwise, the key's hashCode() (denoted h) is XORed with itself shifted
        //    right by 16 bits, mixing the high bits into the low bits.
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
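The effect of the JDK 1.8 perturbation can be seen with two hash codes that differ only in their high bits — without mixing, both would land in the same bucket of a small table. A runnable sketch (class name and sample values are mine):

```java
public class HashPerturbDemo {
    // JDK 1.8 perturbation: XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int a = 0x10000; // differs from b only above bit 16
        int b = 0x20000;
        // Without perturbation, a 16-bucket table sees identical low bits
        System.out.println((a & 15) + " vs " + (b & 15));             // 0 vs 0: collision
        // With perturbation, the high bits now influence the bucket index
        System.out.println((hash(a) & 15) + " vs " + (hash(b) & 15)); // 1 vs 2: spread out
    }
}
```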

7. Why Doesn't HashMap Directly Use the Hash Value Processed by hashCode() as the Index of the Table?#

HashMap does not use hashCode() directly; it applies a perturbation function so that the final hash value is more evenly distributed, reducing collisions in the low bits that determine the bucket index.

8. Differences Between HashMap, LinkedHashMap, and TreeMap#

  • When traversing HashMap, its output is unordered.
  • LinkedHashMap inherits from HashMap, maintaining efficiency while adding an internal linked list to store the order of elements.
  • TreeMap implements the SortedMap interface, allowing elements to be sorted.
  • LinkedHashMap sorts based on the order of elements entering the collection or the order of access, while TreeMap sorts based on the inherent order of elements (determined by Comparator or Comparable).
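The three iteration behaviors above can be compared side by side (a small standalone sketch; the sample keys are mine):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry"};

        Map<String, Integer> plain = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            plain.put(keys[i], i);
            linked.put(keys[i], i);
            tree.put(keys[i], i);
        }
        // HashMap: iteration order is unspecified
        System.out.println(plain.keySet());
        // LinkedHashMap: preserves insertion order
        System.out.println(linked.keySet()); // [banana, apple, cherry]
        // TreeMap: sorts by the keys' natural (Comparable) order
        System.out.println(tree.keySet());   // [apple, banana, cherry]
    }
}
```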

9. HashMap Data Loss Issues#

Reference: HashMap Source Code and Data Loss Issues in Multithreaded Situations
