Hash Tables


A hash table (also known as a hash map) is a data structure that offers a way to store and retrieve values efficiently using a hash function. It is one of the most commonly used data structures for implementing associative arrays or maps, where you associate a key with a value.

Hash tables are popular because of their ability to provide constant-time average complexity for search, insert, and delete operations, making them crucial in various applications, such as databases, caches, and indexing systems.


What is a Hash Table?

A hash table is a collection of key-value pairs, where each key is unique. The basic idea is to use a hash function to convert the key into an index, where the corresponding value is stored in an array or list.

The key steps in how a hash table works are:

  1. Hashing: A hash function takes the key and returns an index (hash code) where the value is stored.
  2. Indexing: The index derived from the hash function points to a position in the underlying array or list, where the value is placed.
  3. Handling Collisions: If two keys generate the same index (hash collision), the hash table needs a way to resolve it.

Hash Table Structure

A hash table typically consists of:

  • An array: The underlying data structure where values are stored.
  • A hash function: A function that maps keys to indices in the array.
  • Collision handling method: Techniques to manage hash collisions when two keys have the same hash value.

Operations on Hash Tables

There are several key operations that can be performed on a hash table:

  1. Insertion: Add a key-value pair to the hash table.
  2. Search: Retrieve the value associated with a given key.
  3. Deletion: Remove a key-value pair from the hash table.
  4. Update: Modify the value associated with an existing key.
  5. Collision Resolution: Handle situations where multiple keys map to the same index.

1. Insertion in Hash Table

The insertion operation involves:

  • Calculating the hash code for the key using a hash function.
  • Storing the value at the calculated index in the array.
  • Handling any collisions that might occur.

2. Searching in Hash Table

To search for a value in a hash table:

  • Calculate the hash code for the key.
  • Check if the value is stored at the corresponding index.
  • If there’s a collision, check the next node in the collision resolution list (e.g., linked list, open addressing).

3. Deletion in Hash Table

To delete a key-value pair:

  • Calculate the hash code for the key.
  • Remove the key-value pair from the array or linked list.
  • Handle any rehashing or repositioning if needed.

4. Collision Resolution Techniques

Collisions occur when two different keys hash to the same index. There are several methods to handle collisions:

  • Chaining: Store a linked list of all keys that hash to the same index.
  • Open Addressing: Find another open slot within the array to store the colliding key.

Types of Hashing Methods

  • Division Method: A simple approach that divides the key by a prime number and returns the remainder.

  • Multiplication Method: Multiplies the key by a constant, then takes the fractional part and multiplies it by the size of the array.


Python Code Implementation of Hash Table

Here’s a simple implementation of a hash table using chaining for collision resolution:

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]  # Create an empty table with 'size' lists

    def hash_function(self, key):
        return hash(key) % self.size  # Hash the key and mod by table size

    # Insert a key-value pair
    def insert(self, key, value):
        index = self.hash_function(key)
        # Check if the key already exists in the chain, and update its value if it does
        for item in self.table[index]:
            if item[0] == key:
                item[1] = value
                return
        # If the key doesn't exist, append a new (key, value) pair
        self.table[index].append([key, value])

    # Search for a value by key
    def search(self, key):
        index = self.hash_function(key)
        for item in self.table[index]:
            if item[0] == key:
                return item[1]  # Return the value if the key is found
        return None  # Return None if key is not found

    # Delete a key-value pair
    def delete(self, key):
        index = self.hash_function(key)
        for i, item in enumerate(self.table[index]):
            if item[0] == key:
                del self.table[index][i]  # Remove the (key, value) pair
                return True
        return False  # Return False if the key wasn't found

    # Display the hash table
    def display(self):
        for i in range(self.size):
            if self.table[i]:
                print(f"Index {i}: {self.table[i]}")

# Example usage
ht = HashTable(10)
ht.insert("apple", 100)
ht.insert("banana", 200)
ht.insert("orange", 300)

# Display the hash table
ht.display()

# Search for a key
print("Value for 'apple':", ht.search("apple"))  # Output: 100

# Delete a key
ht.delete("banana")
ht.display()  # 'banana' should be removed

# Search for a deleted key
print("Value for 'banana':", ht.search("banana"))  # Output: None

Explanation of the Code:

  • HashTable Class: This class defines the hash table.
  • hash_function: A function that generates the hash code of the key by using Python's built-in hash() function and applying modulo operation to fit within the table size.
  • insert: Adds a key-value pair to the table. If the key already exists, its value is updated.
  • search: Searches for the value associated with a given key.
  • delete: Removes a key-value pair from the table.
  • display: Displays the current state of the hash table.

Collision Handling in Hash Tables

1. Chaining

In the chaining technique, when two keys hash to the same index, a linked list (or another list-like structure) is created at that index to store multiple key-value pairs. This method allows multiple keys to share the same index, but it requires extra space for the linked list.

Example:

  • Index 0: [(apple, 100), (banana, 200)]
  • Index 1: [(orange, 300)]

2. Open Addressing

Open addressing involves finding another open slot in the table if a collision occurs. It uses techniques like linear probing, quadratic probing, or double hashing to find the next available slot.


Real-World Applications of Hash Tables

  • Databases: Hash tables are used for indexing and fast data retrieval in databases.
  • Caching: Hash tables are commonly used to implement caches, such as in-memory caches, to speed up data retrieval.
  • Associative Arrays: Many programming languages implement their dictionary or map-like structures using hash tables (e.g., Python's dict).
  • Sets: Hash tables are also used to implement sets, ensuring fast lookup of elements.