Skip to content

HashMap in Java

In the Java Collections Framework, HashMap is one of the most commonly used containers for storing key-value pairs. It uses a hash table as its underlying data structure, enabling near-constant-time lookup and insertion through hashing algorithms.

This article focuses on the underlying implementation principles of HashMap prior to Java 8 (mainly Java 7 and earlier), including hash computation, resizing mechanisms, and collision handling strategies, and compares them with improvements introduced in Java 8 (red-black trees).


Basic Usage Layer

API Usage (CRUD)

  • Create — Add an element You can add a key-value pair to a HashMap using the put() method. For example, adding names and ages as key-value pairs:
java
final var map = new HashMap<String, Integer>();
map.put("Zhang San", 20);
map.put("Li Si", 25);
  • Delete — Remove an element You can remove a key-value pair using the remove() method. For example, remove the key "Zhang San":
java
map.remove("Zhang San");
  • Update — Modify an element You can modify an existing key-value pair by calling put() again. For example, change "Zhang San"’s age to 30:
java
map.put("Zhang San", 30);
  • Read — Retrieve an element You can retrieve a value by its key using get():
java
final var age = map.get("Zhang San");

In practical scenarios, HashMap can be used for caching or indexing. For instance, you might store user IDs as keys and user info as values to cache user data for quick lookup. Alternatively, you can store keywords as keys and lists of document IDs as values to support fast keyword-based searches.

The underlying implementation of HashMap is based on a hash table — essentially an array where each position can hold a single key-value pair, a linked list, or a red-black tree (explained later). When a new key-value pair is added, the HashMap computes the array index using the key’s hash value and inserts it into the corresponding bucket.

When retrieving a value, the same hash computation determines where to look.


Implementation Layer

1. Basic Structure

In Java 7, HashMap is implemented as an array (Entry[] table), where each element of the array is called a bucket. Each bucket holds a singly linked list — a chain of elements that share the same hash index.

Diagram:

table[0] -> Entry1 -> Entry2 -> Entry3
table[1] -> null
table[2] -> Entry4 -> Entry5
...

Each Entry node in Java 7 is defined roughly as follows:

java
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}

2. Hash Computation and Indexing

HashMap determines where to store a key by computing its hashCode(). To ensure better hash distribution, Java 7 applies a hash perturbation function (also called “hash mixing”).

java
int hash = hash(key);
int index = (table.length - 1) & hash;

1. hash() Function

In Java 7:

java
final int hash(Object k) {
    int h = k.hashCode();
    return h ^ (h >>> 16);  // High-bit perturbation
}

This ensures that high bits also influence low bits, reducing the likelihood of hash collisions.

2. Index Calculation

& (length - 1) is used instead of % length to compute the index, because the table size is always a power of two — making bit operations much faster than modulo operations.


3. Hash Collisions and Linked Lists

When two different keys map to the same array index, a hash collision occurs.

The solution: Separate chaining — each bucket stores a linked list of entries.

Example insertion:

java
Entry<K,V> e = table[index];
table[index] = new Entry<>(hash, key, value, e);

In Java 7, new entries are added to the head of the list (“head insertion”), meaning:

  • The newly added node becomes the new head, and the old nodes are linked behind it.
  • This insertion order can cause problems in multithreaded environments (e.g., circular linked lists leading to infinite loops during rehashing).

4. Resizing Mechanism (Rehashing)

1. Trigger Condition

HashMap resizes when the number of elements exceeds its threshold: threshold = capacity * loadFactor

Default values:

java
initialCapacity = 16
loadFactor = 0.75

→ Threshold = 16 × 0.75 = 12 So when more than 12 elements are stored, resizing occurs.

2. Resizing Process

When resizing, HashMap creates a new array (double the previous capacity) and rehashes all existing elements into their new positions:

java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    Entry[] newTable = new Entry[newCapacity];
    for (Entry<K,V> e : oldTable) {
        while (e != null) {
            Entry<K,V> next = e.next;
            int i = e.hash & (newCapacity - 1);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
    table = newTable;
}

This process is known as Rehashing because each key’s hash index must be recomputed. Resizing is expensive — frequent resizing can severely impact performance.


5. Java 8 Improvements: Red-Black Trees (Treeify)

In Java 8, to avoid performance degradation from long linked lists, red-black trees were introduced for buckets with excessive collisions.

1. Trigger Condition

When the length of a linked list in a bucket exceeds 8 and the array capacity is greater than 64, the linked list is converted into a red-black tree.

java
if (binCount >= TREEIFY_THRESHOLD - 1)
    treeifyBin(tab, hash);

2. Advantages

  • Linked list lookup complexity: O(n)
  • Red-black tree lookup complexity: O(log n)

Thus, performance improves significantly in high-collision scenarios.

3. Node Type Changes

Java 8 replaced Entry with Node, and added a TreeNode structure:

java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

When a bucket becomes treeified, Node instances are converted to TreeNodes, forming a self-balancing binary search tree (red-black tree).


6. Thread Safety Issues

Important notes:

  • HashMap is not thread-safe.
  • In Java 7, head insertion during resizing can cause circular linked lists, leading to infinite loops.
  • In Java 8, tail insertion is used instead, and the resizing logic was improved to prevent loops.

For concurrent environments, use:

  • ConcurrentHashMap, or
  • Collections.synchronizedMap().

7. Summary: Java 7 vs Java 8

FeatureJava 7Java 8
Underlying StructureArray + Linked ListArray + Linked List + Red-Black Tree
Insertion OrderHead insertionTail insertion
Resizing TriggerWhen exceeding thresholdSame, but logic optimized
Collision HandlingLinked ListLinked List + Red-Black Tree
Hash MixingPresentSimplified and optimized
Thread SafetyNot safeNot safe (use ConcurrentHashMap)

8. Conclusion

Java 7’s HashMap is a classic hash table implementation that combines arrays and linked lists for efficient key-value storage. Java 8 builds upon this by introducing red-black trees, which solve the performance issues caused by long linked lists.

Understanding the underlying principles of HashMap not only helps in writing more efficient code but also provides valuable insight when diagnosing complex issues such as rehashing-related infinite loops and hash collisions.

Just something casual. Hope you like it. Built with VitePress