Step by Step design a Least Recently Used cache storage

Introduction

Today, I implement a Least Recently Used (LRU) cache. In this blog I will demonstrate the concepts, first approach without concern for time complexity, second approach the optimizes time complexity, and the final approach which fixes the shortcoming of earlier approaches. The data structures are implemented without using library. Finally, I demonstrate using standard container provided by C++.

Concept Least Recently Used (LRU)

Least Recently Used simply means a strategy or cache policy to store data into a data structure such that:

the data structure has limited capacity. The capacity is defined.
the item that is least queried, retrieved, or used are first to be deleted when a new item is added and when the cache has reached capacity.

In this program, I will focus on implementing the cache such that it has 2 basic operations:

addData: add the data to the cache
queryData: returns true if data is found

Implementation with Linked List

We could use an array as a cache, but the array has short coming: we can't know which item was queried least or not queried at all. Perhaps we could attach a count value for each item, but it just requires extra logic to increment count and search for least value of count.

Queue is another candidate. Queue is FIFO, First In - First Out or Last In - Last Out. The first item going into the queue is the least use; the last item going into the queue is the latest used. However, queue does support accessing item from anywhere within the queue but the top. What if there is a query for the first item, which makes it the least recently used? Stack doesn't help for similar reason.

Linked List is the preferred choice:

We could add item at the tail (end). The item is most recently used.
The head (top) item is the least recent used and will be replaced when the cache reaches capacity.
If any item, including the top item is query, it should be moved to the end.

I make the following program skeleton to illustrate the main work to be done:

typedef struct Data{
    int val;
    Data(int x) : val(x), next(nullptr), capacity(5) {} ;
    Data *next;
} Data;

class Linked_List {
    public:
    // O(1)
    void addData (Data *newData){
        // always add to end
        // if capacity is reached, add at the head(front)
    }

    //O(n)
    bool queryData(Data* &data) {
        Data* temp = head;
        if (found)
        // detach the item and place at the end/tail
        // return true
        return false; // Not found
    }

    private:
    Data * head;
    Data * tail;
    int size;
    const int capacity;
};

Time complexity analysis:

function addData has time complexity of O(1) because we know the exact location to add item for any scenario.
function queryData has to reiterate from head to tail of the linked list to find the item, then it has to perform operations to move the item to the tail. Therefore, the combine time complexity is O(n) where n is the size of the cache.

We could improve the queryData function with help of hashmap.

Optimization with hashmap

Hashmap has {key, value} where

the key is hashed and indexed for fast access O(1).
hash function that performs hashing.
Value should store the address of the data node in the linked list. Therefore, using hashmap's search function could return address of the data node in O(1) time. Including the hashmap in the queryData function consequently improves time complexity to O(1)

In order to improve the query time complexity, we need to incorporate the hashmap into the Linked_List class such that:

addData updates the hashmap where the {key, value} has value points to newly added data.
queryData also updates the hashmap where the {key, value} has value points to the tail

In C++, hashmap is implemented by unordered_list. For brevity, particularly not implementing hashmap's search function, and hash function, I will use this STL container.

The following program skeleton with comments explains the main work to be done.

typedef struct Data{
    int val;
    Data(int x) : val(x), next(nullptr), capacity(5) {} ;
    Data *pre;
    Data *next;
} Data;

class Linked_List {
    public:
    // O(1)
    void addData (Data *newData){
        // always add to end  
        // if capacity is reached, add at the head(front)
        // ***  update hashmap {key,val} where val = newData ***
    }

    //O(1)
    bool queryData(Data* &data) {
        Data* temp = head;
        if (hashmap.find(data))
        // detach the item and place at the end/tail
        // ***  update hashmap {key,val} where val = found data***
        // return true
        return false; // Not found
    }

    private:
    Data * head;
    Data * tail;
    int size;
    const int capacity;
    unordered_map<int, Data *> hashmap;
};

Other concerns - Mem leak Issue

When we remove the item of the linked list, the item should be deallocated, instead of simple "detach" because the linked list manages pointer to Data items.

We need to modify the addData function to delete the "old head".
For function queryData, we don't worry about mem leak, because we "move" the found item to the tail by changing the next and previous pointers of its neighbor.

Implementation of singly linked list

In the final program, I added other functions to print linked list, for debugging and function emptyList to deallocate the entire list properly at the end of the program.

#include <iostream>
#include <unordered_map>
using namespace std;

typedef struct Data {
    int val;
    Data(int x) : val(x), next(nullptr) {}
    Data* next;
} Data;

class Linked_List {
public:
    Linked_List() : head(nullptr), tail(nullptr), size(0), capacity(5) {}
    ~Linked_List() {
        freeList();
    }

    void addData(Data* newData) {
        if (size < capacity) {
            // Add to the end (tail)
            if (tail == nullptr) {
                head = tail = newData;
            } else {
                tail->next = newData;
                tail = newData;
            }
            size++;
        } else {
            // Add to the front (head)
            if (head != nullptr) {
                // Remove the original head
                Data* oldHead = head;
                head = head->next;
                if (oldHead == tail) {
                    tail = nullptr;
                }
                hashMap.erase(oldHead->val); // Remove from hash map
                delete oldHead;              // delete the old head to avoid mem leak. 
            }
            newData->next = head;
            head = newData;
            if (tail == nullptr) {
                tail = newData;
            }       
        }
        // Update hash map
        hashMap[newData->val] = newData; 
    }

    bool queryData(Data*& data) {
        if (hashMap.find(data->val) != hashMap.end()) {
            // update hashmap.
            Data* temp = hashMap[data->val];
            // Move it to the end
            if (temp != tail) {
                // Detach the node
                if (temp == head) {
                    head = temp->next;
                } else {
                    Data* prev = head;
                    while (prev->next != temp) {
                        prev = prev->next;
                    }
                    prev->next = temp->next;
                }
                // Move to end
                tail->next = temp;
                temp->next = nullptr;
                tail = temp;
            }
            return true;
        }
        return false; // Not found
    }

    // Print for debug/check result
    void printLinkedList() {
        if (head == nullptr)
            cout << "empty linkedlist" << endl;
        else {
            Data* tmp = head;
            while (tmp != nullptr) {
                cout << tmp->val << " ";
                tmp = tmp->next;
            }
            cout << "NULL" << endl;
        }
    }

private:
    Data* head;
    Data* tail;
    int size;
    const int capacity;
    unordered_map<int, Data*> hashMap; // Hash map for O(1) access

    void freeList() {
        Data* tmp = head;
        while (tmp != nullptr) {
            Data* next = tmp->next;
            delete tmp;
            tmp = next;
        }
    }
};

int main() {
    Data* data1 = new Data(1);
    Data* data2 = new Data(2);
    Data* data3 = new Data(3);
    Data* data4 = new Data(4);
    Data* data5 = new Data(5);
    Data* data6 = new Data(6);

    Linked_List* myList = new Linked_List();
    myList->addData(data1);
    myList->addData(data2);
    myList->addData(data3);
    myList->addData(data4);
    myList->addData(data5);
    cout << "print linked list before query" << endl; 
    myList->printLinkedList();

    myList->queryData(data1);
    cout << "print linked list after query data1, data 1 is moved to end" << endl;
    myList->printLinkedList();

    myList->addData(data6);
    cout << "print linked list after adding data6, data at front is removed" << endl;
    myList->printLinkedList();

    delete myList; 

    return 0;
}

Notes: The queryData is not O(1) but O(n)!!!!
In the above problem, the following codes requires O(n). Basically, we have to update the list from head to where the data is found.

Data* prev = head;
while (prev->next != temp) {
    prev = prev->next;
}
prev->next = temp->next;

The following figure explains the reasons. Let say Data_3 is the item that we need to update. We can update forward link next in O(1) time. However, there is no way to access Data_2, and thus we need to loop from Head to Data_2 to find it.

Final code, self implementation with double linked-list

Double linked-list requires more complex operation when adding new item to the end: we have to update the previous node's link. However, when removing item, and put it to the end/tail, we have the previous node link, and just need a few steps modification. Operation is O(1)

The following illustrates how item Data_3 is removed and put at the tail:

#include <iostream>
#include <unordered_map>
using namespace std;

class Data {
public:
    int val;
    Data* next;
    Data* prev;
    Data(int v) : val(v), next(nullptr), prev(nullptr) {}
};

class Linked_List {
public:
    Linked_List() : head(nullptr), tail(nullptr), size(0), capacity(5) {}
    ~Linked_List() {
        freeList();
    }

    void addData(Data* newData) {
        if (size < capacity) {
            // Add to the end (tail)
            if (tail == nullptr) {
                head = tail = newData;
            } else {
                tail->next = newData;
                newData->prev = tail;
                tail = newData;
            }
            size++;
        } else {
            // Add to the front (head)
            if (head != nullptr) {
                // Remove the original head
                Data* oldHead = head;
                head = head->next;
                if (head != nullptr) {
                    head->prev = nullptr;
                }
                if (oldHead == tail) {
                    tail = nullptr;
                }
                hashMap.erase(oldHead->val); // Remove from hash map
                delete oldHead;              // delete the old head to avoid mem leak. 
            }
            newData->next = head;
            if (head != nullptr) {
                head->prev = newData;
            }
            head = newData;
            if (tail == nullptr) {
                tail = newData;
            }       
        }
        // Update hash map
        hashMap[newData->val] = newData; 
    }

    bool queryData(Data*& data) {
        if (hashMap.find(data->val) != hashMap.end()) {
            // update hashmap.
            Data* temp = hashMap[data->val];
            // Move it to the end
            if (temp != tail) {
                // Detach the node
                if (temp == head) {
                    head = temp->next;
                    if (head != nullptr) {
                        head->prev = nullptr;
                    }
                } else {
                    temp->prev->next = temp->next;
                    if (temp->next != nullptr) {
                        temp->next->prev = temp->prev;
                    }
                }
                // Move to end
                tail->next = temp;
                temp->prev = tail;
                temp->next = nullptr;
                tail = temp;
            }
            return true;
        }
        return false; // Not found
    }

    // Print for debug/check result
    void printLinkedList() {
        if (head == nullptr)
            cout << "empty linkedlist" << endl;
        else {
            Data* tmp = head;
            while (tmp != nullptr) {
                cout << tmp->val << " ";
                tmp = tmp->next;
            }
            cout << "NULL" << endl;
        }
    }

private:
    Data* head;
    Data* tail;
    int size;
    const int capacity;
    unordered_map<int, Data*> hashMap; // Hash map for O(1) access

    void freeList() {
        Data* tmp = head;
        while (tmp != nullptr) {
            Data* next = tmp->next;
            delete tmp;
            tmp = next;
        }
    }
};

Implementation with STL list

C++ provide STL container list as a double linked list. From the above program, instead of implementing logic to add nodes and find nodes, I will used the provided APIs by std::list.

addData used std::list::push_front(data) if cache is not at capacity, and std::list::push_back() otherwise.
Note that we only need to remove item the oldHead of the hashmap. std::list deletes its own oldHead of the list.
and queryData are now updated to use std::list::remove to delete the item off the list. and use std::list::push_back to push newly query item to the end/tail.

#include <iostream>
#include <unordered_map>
#include <list>
using namespace std;

typedef struct Data {
    int val;
    Data(int x) : val(x) {}
} Data;

class DataList {
public:
    DataList() : capacity(5) {}
    ~DataList() {
        freeList();
    }

    void addData(Data* newData) {
        if (dataList.size() < capacity) {
            // Add to the end
            dataList.push_back(newData);
        } else {
            // Remove the front element
            Data* oldHead = dataList.front();
            dataList.pop_front();
            hashMap.erase(oldHead->val); // Remove from hash map
            delete oldHead;              // delete the old head to avoid mem leak. 
            // Add new data to the front
            dataList.push_front(newData);
        }
        // Update hash map
        hashMap[newData->val] = newData; 
    }

    bool queryData(Data*& data) {
        if (hashMap.find(data->val) != hashMap.end()) {
            // update hashmap.
            Data* temp = hashMap[data->val];
            // Move it to the end
            dataList.remove(temp);
            dataList.push_back(temp);
            return true;
        }
        return false; // Not found
    }

    // Print for debug/check result
    void printDataList() {
        if (dataList.empty())
            cout << "empty list" << endl;
        else {
            for (Data* data : dataList) {
                cout << data->val << " ";
            }
            cout << "NULL" << endl;
        }
    }

private:
    list<Data*> dataList;
    const int capacity;
    unordered_map<int, Data*> hashMap; // Hash map for O(1) access

    void freeList() {
        for (Data* data : dataList) {
            delete data;
        }
        dataList.clear();
    }
};

int main() {
    Data* data1 = new Data(1);
    Data* data2 = new Data(2);
    Data* data3 = new Data(3);
    Data* data4 = new Data(4);
    Data* data5 = new Data(5);
    Data* data6 = new Data(6);

    DataList* myList = new DataList();
    myList->addData(data1);
    myList->addData(data2);
    myList->addData(data3);
    myList->addData(data4);
    myList->addData(data5);
    cout << "print list before query" << endl; 
    myList->printDataList();

    myList->queryData(data1);
    cout << "print list after query data1, data 1 is moved to end" << endl;
    myList->printDataList();

    myList->addData(data6);
    cout << "print list after adding data6, data at front is removed" << endl;
    myList->printDataList();

    delete myList; 

    return 0;
}

Conclusion and Follow up

For LRU cache, it is important to remember that:

use combination of hashmap and linkedlist.
hashmap uses key to make addition and search O(1) in time complexity.
double linked list allows removal of any item in O(1) time complexity.
we can assign rules that linked list stores the most recent used item at the front and least recent used item at the back or vice versa.

In the next blog, I will demonstrate implementing Least Frequently Used cache.