HashMap in Java
In the Java Collections Framework, HashMap
is one of the most commonly used containers for storing key-value pairs. It uses a hash table as its underlying data structure, enabling near-constant-time lookup and insertion through hashing algorithms.
This article focuses on the underlying implementation principles of HashMap
prior to Java 8 (mainly Java 7 and earlier), including hash computation, resizing mechanisms, and collision handling strategies, and compares them with improvements introduced in Java 8 (red-black trees).
Basic Usage Layer
API Usage (CRUD)
- Create — Add an element You can add a key-value pair to a
HashMap
using theput()
method. For example, adding names and ages as key-value pairs:
final var map = new HashMap<String, Integer>();
map.put("Zhang San", 20);
map.put("Li Si", 25);
- Delete — Remove an element You can remove a key-value pair using the
remove()
method. For example, remove the key"Zhang San"
:
map.remove("Zhang San");
- Update — Modify an element You can modify an existing key-value pair by calling
put()
again. For example, change"Zhang San"
’s age to 30:
map.put("Zhang San", 30);
- Read — Retrieve an element You can retrieve a value by its key using
get()
:
final var age = map.get("Zhang San");
In practical scenarios, HashMap
can be used for caching or indexing. For instance, you might store user IDs as keys and user info as values to cache user data for quick lookup. Alternatively, you can store keywords as keys and lists of document IDs as values to support fast keyword-based searches.
The underlying implementation of HashMap
is based on a hash table — essentially an array where each position can hold a single key-value pair, a linked list, or a red-black tree (explained later). When a new key-value pair is added, the HashMap
computes the array index using the key’s hash value and inserts it into the corresponding bucket.
When retrieving a value, the same hash computation determines where to look.
Implementation Layer
1. Basic Structure
In Java 7, HashMap
is implemented as an array (Entry[] table
), where each element of the array is called a bucket. Each bucket holds a singly linked list — a chain of elements that share the same hash index.
Diagram:
table[0] -> Entry1 -> Entry2 -> Entry3
table[1] -> null
table[2] -> Entry4 -> Entry5
...
Each Entry
node in Java 7 is defined roughly as follows:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
Entry(int h, K k, V v, Entry<K,V> n) {
value = v;
next = n;
key = k;
hash = h;
}
}
2. Hash Computation and Indexing
HashMap
determines where to store a key by computing its hashCode()
. To ensure better hash distribution, Java 7 applies a hash perturbation function (also called “hash mixing”).
int hash = hash(key);
int index = (table.length - 1) & hash;
1. hash()
Function
In Java 7:
final int hash(Object k) {
int h = k.hashCode();
return h ^ (h >>> 16); // High-bit perturbation
}
This ensures that high bits also influence low bits, reducing the likelihood of hash collisions.
2. Index Calculation
& (length - 1)
is used instead of % length
to compute the index, because the table size is always a power of two — making bit operations much faster than modulo operations.
3. Hash Collisions and Linked Lists
When two different keys map to the same array index, a hash collision occurs.
The solution: Separate chaining — each bucket stores a linked list of entries.
Example insertion:
Entry<K,V> e = table[index];
table[index] = new Entry<>(hash, key, value, e);
In Java 7, new entries are added to the head of the list (“head insertion”), meaning:
- The newly added node becomes the new head, and the old nodes are linked behind it.
- This insertion order can cause problems in multithreaded environments (e.g., circular linked lists leading to infinite loops during rehashing).
4. Resizing Mechanism (Rehashing)
1. Trigger Condition
HashMap
resizes when the number of elements exceeds its threshold: threshold = capacity * loadFactor
Default values:
initialCapacity = 16
loadFactor = 0.75
→ Threshold = 16 × 0.75 = 12 So when more than 12 elements are stored, resizing occurs.
2. Resizing Process
When resizing, HashMap
creates a new array (double the previous capacity) and rehashes all existing elements into their new positions:
void resize(int newCapacity) {
Entry[] oldTable = table;
Entry[] newTable = new Entry[newCapacity];
for (Entry<K,V> e : oldTable) {
while (e != null) {
Entry<K,V> next = e.next;
int i = e.hash & (newCapacity - 1);
e.next = newTable[i];
newTable[i] = e;
e = next;
}
}
table = newTable;
}
This process is known as Rehashing because each key’s hash index must be recomputed. Resizing is expensive — frequent resizing can severely impact performance.
5. Java 8 Improvements: Red-Black Trees (Treeify)
In Java 8, to avoid performance degradation from long linked lists, red-black trees were introduced for buckets with excessive collisions.
1. Trigger Condition
When the length of a linked list in a bucket exceeds 8 and the array capacity is greater than 64, the linked list is converted into a red-black tree.
if (binCount >= TREEIFY_THRESHOLD - 1)
treeifyBin(tab, hash);
2. Advantages
- Linked list lookup complexity: O(n)
- Red-black tree lookup complexity: O(log n)
Thus, performance improves significantly in high-collision scenarios.
3. Node Type Changes
Java 8 replaced Entry
with Node
, and added a TreeNode
structure:
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}
When a bucket becomes treeified, Node
instances are converted to TreeNode
s, forming a self-balancing binary search tree (red-black tree).
6. Thread Safety Issues
Important notes:
HashMap
is not thread-safe.- In Java 7, head insertion during resizing can cause circular linked lists, leading to infinite loops.
- In Java 8, tail insertion is used instead, and the resizing logic was improved to prevent loops.
For concurrent environments, use:
ConcurrentHashMap
, orCollections.synchronizedMap()
.
7. Summary: Java 7 vs Java 8
Feature | Java 7 | Java 8 |
---|---|---|
Underlying Structure | Array + Linked List | Array + Linked List + Red-Black Tree |
Insertion Order | Head insertion | Tail insertion |
Resizing Trigger | When exceeding threshold | Same, but logic optimized |
Collision Handling | Linked List | Linked List + Red-Black Tree |
Hash Mixing | Present | Simplified and optimized |
Thread Safety | Not safe | Not safe (use ConcurrentHashMap) |
8. Conclusion
Java 7’s HashMap
is a classic hash table implementation that combines arrays and linked lists for efficient key-value storage. Java 8 builds upon this by introducing red-black trees, which solve the performance issues caused by long linked lists.
Understanding the underlying principles of HashMap
not only helps in writing more efficient code but also provides valuable insight when diagnosing complex issues such as rehashing-related infinite loops and hash collisions.