
Chapter 13. LRU Cache Data Structure

  • Book Excerpt from "Advanced C++ Memory Techniques: Efficiency and Safety"
  • by David Spuler, Ph.D.

What is an LRU Cache?

Least-Recently-Used (LRU) caches are a common requirement in low-latency programming. There are several important applications of an LRU cache:

  • Operating system paging algorithms
  • Memory access caches (low-level)
  • Order book updates in trading

The idea of an LRU cache is to maintain a cache of recently used data, such as memory we’ve just accessed, or a piece of data we’ve just updated. But we don’t want an unlimited size data structure, so when it gets full, we evict the data that was “least recently used” (i.e., the oldest data).

Note that an LRU cache is a more specific type of cache than one that just maps keys to the values they were set to. The operations we need to support include the following (a minimal interface sketch appears after the list):

  • Add a new key to the cache (with its corresponding value).
  • Update a key when it gets re-used again (more recently).
  • Remove the least-recently-used item in the cache (to make room for insertions).
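
As a concrete starting point, here is a minimal sketch of the interface such a cache might expose. The class name, the method names, and the string-key/integer-value types are illustrative assumptions, not a fixed design:

    #include <cstddef>
    #include <optional>
    #include <string>

    // Hypothetical LRU cache interface (names and types are illustrative only).
    class LRUCache {
    public:
        explicit LRUCache(std::size_t capacity);

        // Look up a key; a hit also marks that key as most-recently used.
        std::optional<int> get(const std::string& key);

        // Insert a new key, or update an existing one (making it most-recent);
        // evicts the least-recently-used entry if the cache is already full.
        void put(const std::string& key, int value);
    };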

Sounds like a queue? No, it’s not!

Not a Queue or Deque

An LRU cache has features that sound like a queue with FIFO ordering. We want to evict the oldest items from the cache, which sounds exactly like maintaining a queue of elements, where deleting from the front of the queue removes the oldest element. These features are very queue-like and maintain a FIFO-like order-of-insertion:

  • Add a new item to the end of the queue (the newest item).
  • Remove from the front (to evict the oldest item).

The feature that’s not like a queue is the “update” of a key that’s already in the cache, which happens whenever a cached item is accessed a second time. This requires two problematic operations:

  • Search — find the item already in our LRU cache, and
  • Deletion — remove the item from the middle of the queue.

It’s starting to sound less and less like a queue. There’s no fast searching in std::queue or std::deque, so we’d have to use a linear scan.
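
To make that concrete: std::queue doesn’t even expose iterators for scanning, and for std::deque the best we can do is a linear std::find, as in this small illustration (assuming integer keys):

    #include <algorithm>
    #include <deque>

    // Membership test on a std::deque: O(n) per lookup, since there is no
    // hashing or ordering to exploit.
    bool contains(const std::deque<int>& keys, int key) {
        return std::find(keys.begin(), keys.end(), key) != keys.end();
    }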

Deletion is also a problem. We need to move an item from the middle of the queue back to the head of the queue. This is not like a standard queue, which only allows removal from the front. A standard deque container also allows removal at the back, but neither lets us delete from the middle. Hence, we can’t just use a queue or a deque; we need something fancier as our implementation of an LRU cache.

Overall, an LRU cache has similar requirements to the general case earlier: fast searches, insertions, and deletions. We also need to maintain order-of-insertion for cache evictions, but we need to remove arbitrary nodes from that sequence, so a standard queue or deque won’t work. Note that, unlike the general case, we don’t actually need to traverse the sequence in order, but only use it for evictions.

Nevertheless, the basic idea of an LRU cache implementation is similar to the general case of a data structure that maintains ordering by insertion sequence:

  • Hash table for fast searches, insertions, and deletions.
  • Maintain order-of-insertion sorting via an array, vector, or linked list.

Adding a new node into the cache is simply an insertion into the hash table, and adding it to the head of the array or list. This item is the “most recently used” so it will now be the last to be evicted from the cache.

If our cache is full, adding a new node means removing the oldest. It’s easy to remove the “least recently used” by removing it from the hash table, and removing the end element from the list (effectively, like a queue). We could seemingly implement this queue-like functionality with two possible approaches:

  • Statically with a fixed-sized array (i.e., a ring buffer wraparound), or
  • Dynamically via a linked list.

Only one of these ideas will work!

Array Implementation Fails

Let’s consider a contiguous array implementation first, which would be desirable for cache locality efficiency. In other words, we use a hash table for searching, insertion and deletion, but also maintain a separate array or vector data structure to track insertion order. In practice, we’d need to use a wrap-around of elements in a ring buffer structure, implemented via an array or vector container.

This is workable for many of the LRU cache requirements. Search and insertion are very fast in the hash table. We don’t actually search the array, which is fortunate, and inserting into an array in order-of-insertion is just adding the new item to the end (fast!).

However, deletion is a problem. A significant efficiency problem arises when we need to update a cache item that’s already in the cache from a prior access: every update of a value already in the cache needs to do two things to the array:

    (a) delete the node in its previous place in the array, and

    (b) re-insert the node at the head (it’s now the most-recently used item).

The key point is that the “previous place” for an item could be anywhere in the array or ring buffer. So, we need arbitrary deletions at any location. For the reasons discussed in the general case, an array or vector that implements a ring buffer or a fixed-size array will fail in this situation.

Removing an item from the middle of the array is problematic: it needs an inefficient shuffle to fill the gap, followed by updating any pointers or indices that referred to the array elements moved by the shuffle. Alternatively, moving the array’s end element down to cover the gap fails because it completely messes up the order of elements in the array.

A ring buffer implemented in an array or vector is no better at handling random deletions. Removing from the middle of a wraparound sequence in a ring buffer is actually the exact same situation, except rotated, and has the same problems.
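
To make the cost concrete, here is a small illustration (not part of a real cache design) of deleting from the middle of a std::vector: the erase call shifts every later element down one slot, which is O(n), and any saved positions for those shifted elements become stale:

    #include <cstddef>
    #include <vector>

    // Erase element i from the insertion-order vector: every element after
    // position i is shifted down one slot, so the cost is linear in the
    // number of elements that follow.
    void erase_at(std::vector<int>& order, std::size_t i) {
        order.erase(order.begin() + static_cast<std::ptrdiff_t>(i));
    }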

One solution is to not allow cache updates. If an item is already in the cache, we could simply not update its position in the sequence. However, this is no longer an LRU cache, but more like a Least-Recently-Loaded (LRL) cache, or really a FIFO queue version of a cache.

The behavior of a FIFO queue is somewhat different from an LRU cache. For example, frequently-used items will get evicted from the cache in a fixed order, gaining no benefit over infrequently accessed items, so the efficiency of the cache does not adapt to access patterns. Overall, it seems that a contiguous data structure is not effective for an LRU cache.

Linked lists to the rescue!

Doubly-Linked List LRU Cache

Fortunately, an LRU cache can be implemented efficiently with a hash table and a doubly-linked list. Note that a singly-linked list fails to provide efficient deletion, so we have to double up. Hence, the basic idea is:

  • Hash table — efficient search, insertion and deletion (but without ordering).
  • Doubly-linked list — maintains data according to order-of-insertion.

There are two ways to implement our doubly-linked list:

  • Second container — using the std::list container separately (it’s doubly-linked).
  • Threaded intrusively — use a doubly-linked list that is threaded through the hash table nodes.

The first solution is workable if we maintain a pointer or iterator into the linked list from our hash table nodes. We could make our list contain copies of the keys (if they’re small), or pointers to the hash table nodes if the keys are complex objects (i.e., don’t copy them). But overall, the two-container approach is inefficient because we’re doubling the number of allocated nodes, with one memory allocation in the hash table and another in the std::list container.
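
For reference, here is a sketch of how this two-container approach might look, assuming string keys and integer values, with the hash table storing each value together with an iterator into the recency list (the class and member names are assumptions for illustration, not the book’s final design):

    #include <cstddef>
    #include <list>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <utility>

    // Two-container LRU sketch: the std::list holds keys in recency order
    // (most-recent at the front), and the hash table maps each key to its
    // value plus an iterator to that key's node in the list.
    // Assumes capacity > 0.
    class LRUCacheTwoContainer {
    public:
        explicit LRUCacheTwoContainer(std::size_t capacity) : capacity_(capacity) {}

        std::optional<int> get(const std::string& key) {
            auto it = table_.find(key);
            if (it == table_.end()) return std::nullopt;
            // Move this key's list node to the front (most-recently used).
            order_.splice(order_.begin(), order_, it->second.second);
            return it->second.first;
        }

        void put(const std::string& key, int value) {
            auto it = table_.find(key);
            if (it != table_.end()) {
                it->second.first = value;
                order_.splice(order_.begin(), order_, it->second.second);
                return;
            }
            if (table_.size() >= capacity_) {
                // Evict the least-recently-used key (the tail of the list).
                table_.erase(order_.back());
                order_.pop_back();
            }
            order_.push_front(key);
            table_.emplace(key, std::make_pair(value, order_.begin()));
        }

    private:
        std::size_t capacity_;
        std::list<std::string> order_;
        std::unordered_map<std::string,
                           std::pair<int, std::list<std::string>::iterator>> table_;
    };

The std::list::splice call is what makes the “move to most-recent” step O(1) without invalidating the iterators stored in the hash table. Even so, every entry still costs two allocations (one hash table node, one list node), which motivates the intrusive alternative.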

A better solution is to intrusively thread our own hand-coded doubly-linked list through our hash table nodes. This requires extra space for “next” and “previous” pointers in our hash table nodes, but doesn’t require a second memory allocation, and also maintains only one copy of the keys.
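
The node layout for this intrusive version might look something like the following sketch, which assumes a hand-coded chained hash table; the field names are illustrative:

    #include <string>

    // Hypothetical node for a hand-coded chained hash table with an intrusive
    // recency list threaded through the same nodes: one allocation per entry,
    // and only one copy of the key.
    struct CacheNode {
        std::string key;
        int         value = 0;

        CacheNode*  hash_next = nullptr;  // next node in this hash bucket's chain

        CacheNode*  lru_prev = nullptr;   // towards the most-recently-used end
        CacheNode*  lru_next = nullptr;   // towards the least-recently-used end
    };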

Let’s run with that idea and examine the efficiency of the operations:

  • Search — use the hash table to get O(1) average search cost (we don’t search the linked list).
  • Insertion — fast O(1) insertion into the hash table, and also O(1) insertion at the end of the doubly-linked list.
  • Deletion — fast O(1) deletion from the hash table, and also O(1) deletion from the middle of a doubly-linked list (hooray!).
  • Traversal (insertion-ordered) — linear scan of the linked list (easy).

The linked list needs to be doubly-linked because deletion from the middle of a singly-linked list is problematic. Deleting a node requires updating the previous node’s pointer, but with one-way pointers there’s no way to step backwards to find that previous node (short of re-scanning the list from the head).

Deletion from the middle of a doubly-linked list is easy: we just reset two pointers, one in the node prior to the deleted node and one in the node after it. This is fiddly but has only O(1) complexity, with just a few pointer operations. Unlike the array version, there’s no “shuffling” or other hidden costs, so deletion is also fast, and the order-of-insertion requirement is maintained.

The deletion algorithm for doubly-linked lists is fiddly with some edge cases, but not that difficult. Once the list node to remove is found, we need to update the pointers in both the previous and the next node on the list. We also need to handle special cases, such as when the list is empty, has only one element, or when the deletion is at the head or tail of the list.
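
Here is a sketch of that unlink step, reusing the hypothetical CacheNode fields from the earlier sketch and assuming the cache keeps explicit head (most-recently-used) and tail (least-recently-used) pointers:

    // Unlink a node from the intrusive recency list in O(1), handling the
    // head, tail, and single-element cases.
    void unlink(CacheNode*& head, CacheNode*& tail, CacheNode* node) {
        if (node->lru_prev != nullptr)
            node->lru_prev->lru_next = node->lru_next;
        else
            head = node->lru_next;        // node was the head
        if (node->lru_next != nullptr)
            node->lru_next->lru_prev = node->lru_prev;
        else
            tail = node->lru_prev;        // node was the tail
        node->lru_prev = node->lru_next = nullptr;
    }

Moving an updated item back to the most-recently-used position is then just this unlink followed by a push onto the head of the list, both O(1).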

References

  1. Geeks for Geeks, 27 Dec, 2024, LRU Cache - Complete Tutorial, https://www.geeksforgeeks.org/lru-cache-implementation/
  2. Shaila Nasrin, Jan 18, 2025, LRU Cache Implementation in C++, https://medium.com/learn-coding-concepts-with-shaila/lru-cache-implementation-in-c-8a52f259206f
  3. CPP Scripts, May 2025 (accessed), C++ LRU Cache: Mastering Efficiency with Ease, https://cppscripts.com/cpp-lru-cache
  4. Peter Goldsborough, May 2025 (accessed), lru-cache: A feature complete LRU cache implementation in C++, https://github.com/goldsborough/lru-cache
  5. Tim Day, 2012, LRU cache implementation in C++, https://timday.bitbucket.io/lru.html

 
