Rabin-Karp Algorithm
The Rabin-Karp algorithm is a well-known string matching algorithm that uses hashing to find a pattern (a substring) within a text. Unlike the naive string matching algorithm, which compares the pattern against the text character by character at every position, Rabin-Karp computes a hash of the pattern and compares it with the hash of each window of the text that has the same length as the pattern. Characters are compared directly only when the two hashes match, so most non-matching positions are rejected in constant time.
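The main advantage of the Rabin-Karp algorithm is its ability to handle multiple pattern searches efficiently, because one hash per text window can be checked against the hashes of many patterns. However, in the worst case, when many windows hash to the same value as the pattern, the time complexity degrades to that of the naive algorithm, which is O(n * m), where n is the length of the text and m is the length of the pattern.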
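The Rabin-Karp algorithm follows these basic steps:
1. Compute the hash of the pattern and the hash of the first window of the text (its first m characters).
2. Compare the two hashes. If they match, compare the window with the pattern character by character to rule out a hash collision.
3. Slide the window one position to the right and update its hash from the previous hash.
4. Repeat steps 2 and 3 until the window reaches the end of the text.
The algorithm uses a rolling hash to compute the hash of the next window in constant time.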
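For a window of length m starting at position i of the text T, the hash is calculated as:

$$H(i) = \left( \sum_{j=0}^{m-1} T[i+j] \cdot p^{\,m-1-j} \right) \bmod q$$

Where:
p is the base of the polynomial, a prime number (usually small, like 31).
q is a large prime number used as the modulus to reduce collisions.
m is the length of the pattern.
T[k] is the numeric code of the character at position k of the text.

For each subsequent window, the hash can be updated using:

$$H(i+1) = \left( p \cdot \left( H(i) - T[i] \cdot p^{\,m-1} \right) + T[i+m] \right) \bmod q$$

The code in this article follows the same scheme with d = 256 as the base in place of p and q = 101 as the modulus.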
This allows the algorithm to update the hash in O(1) time for each shift of the window.
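As a quick check of the update rule, the sketch below hashes every window twice, once from scratch and once with the rolling update, and confirms that the two agree. The function names `direct_hash` and `rolling_hashes` and the values p = 31 and q = 1_000_000_007 are illustrative assumptions, not something the text prescribes.

```python
def direct_hash(s, p, q):
    """Polynomial hash of s: sum of ord(s[j]) * p^(len(s)-1-j), taken mod q."""
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % q
    return h


def rolling_hashes(text, m, p=31, q=1_000_000_007):
    """Return the hash of every length-m window of text, using O(1) updates."""
    n = len(text)
    if m == 0 or m > n:
        return []
    high = pow(p, m - 1, q)           # p^(m-1) mod q, weight of the leading character
    h = direct_hash(text[:m], p, q)   # the first window is hashed from scratch
    hashes = [h]
    for i in range(n - m):
        # Remove text[i], shift the rest up one power of p, append text[i + m].
        h = (p * (h - ord(text[i]) * high) + ord(text[i + m])) % q
        hashes.append(h)
    return hashes


text = "ABC ABCDAB ABCDABCDABDE"
m = 7
rolled = rolling_hashes(text, m)
direct = [direct_hash(text[i:i + m], 31, 1_000_000_007)
          for i in range(len(text) - m + 1)]
print(rolled == direct)  # True
```

Python's `%` never returns a negative value for a positive modulus, so the subtraction inside the update needs no extra correction here.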
Consider the text "ABC ABCDAB ABCDABCDABDE" and the pattern "ABCDABD". The goal is to find the pattern in the text using the Rabin-Karp algorithm. The pattern occurs once in this text, starting at index 15 (counting from 0).
def rabin_karp(text, pattern):
    d = 256  # Base: number of characters in the input alphabet
    q = 101  # A prime number for the modulus (larger primes give fewer collisions)

    # Length of the pattern and text
    m = len(pattern)
    n = len(text)

    # Nothing to search for, or the pattern cannot fit in the text
    if m == 0 or m > n:
        return

    # h = d^(m-1) mod q, the weight of the leading character in a window
    h = 1
    for i in range(m - 1):
        h = (h * d) % q

    # Calculate the hash of the pattern and of the first window of the text
    p_hash = 0  # Hash value for pattern
    t_hash = 0  # Hash value for the current text window
    for i in range(m):
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q

    # Slide the window over the text one position at a time
    for i in range(n - m + 1):
        # If the hash values match, compare the actual strings to rule out a collision
        if p_hash == t_hash:
            if text[i:i + m] == pattern:
                print(f"Pattern found at index {i}")

        # Calculate the hash value for the next window
        if i < n - m:
            # Python's % is never negative for a positive modulus, so no sign fix is needed
            t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q


# Example usage
text = "ABC ABCDAB ABCDABCDABDE"
pattern = "ABCDABD"
rabin_karp(text, pattern)  # prints: Pattern found at index 15
#include <iostream>
#include <string>
using namespace std;

void rabin_karp(const string& text, const string& pattern) {
    const int d = 256;  // Base: number of characters in the input alphabet
    const int q = 101;  // A prime number for the modulus
    const int m = static_cast<int>(pattern.length());
    const int n = static_cast<int>(text.length());

    // Nothing to search for, or the pattern cannot fit in the text
    if (m == 0 || m > n) {
        return;
    }

    // h = d^(m-1) mod q, the weight of the leading character in a window
    int h = 1;
    for (int i = 0; i < m - 1; i++) {
        h = (h * d) % q;
    }

    // Compute the hash of the pattern and of the first window of the text.
    // Characters are read as unsigned char so bytes >= 128 do not give negative values.
    int p_hash = 0, t_hash = 0;
    for (int i = 0; i < m; i++) {
        p_hash = (d * p_hash + static_cast<unsigned char>(pattern[i])) % q;
        t_hash = (d * t_hash + static_cast<unsigned char>(text[i])) % q;
    }

    // Slide the window over the text one position at a time
    for (int i = 0; i <= n - m; i++) {
        // If the hash values match, compare the actual strings to rule out a collision
        if (p_hash == t_hash) {
            if (text.compare(i, m, pattern) == 0) {
                cout << "Pattern found at index " << i << '\n';
            }
        }
        // Calculate the hash value for the next window
        if (i < n - m) {
            t_hash = (d * (t_hash - static_cast<unsigned char>(text[i]) * h)
                      + static_cast<unsigned char>(text[i + m])) % q;
            if (t_hash < 0) {
                t_hash += q;  // In C++, % can return a negative value
            }
        }
    }
}

int main() {
    string text = "ABC ABCDAB ABCDABCDABDE";
    string pattern = "ABCDABD";
    rabin_karp(text, pattern);
    return 0;
}
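Time Complexity:
Average and best case: O(n + m), where n is the length of the text and m is the length of the pattern. Hashing the pattern and the first window takes O(m), and each of the n - m window shifts takes O(1), provided hash matches that need a full comparison are rare.
Worst case: O(n * m), when most windows produce the same hash as the pattern (for example, a text and a pattern made of one repeated character), so that most windows need a full character-by-character comparison.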
Space Complexity:
The space complexity is O(1), since the algorithm only uses a constant amount of extra space to store the hash values and other variables.
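The worst case can be made visible by counting how often a hash match forces a full comparison. The sketch below is a variant of the Python implementation above that returns the match indices and that count instead of printing. The name `count_verifications` is an assumption introduced here for illustration.

```python
def count_verifications(text, pattern, d=256, q=101):
    """Run Rabin-Karp and return (match_indices, number_of_full_comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)  # d^(m-1) mod q
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q
    matches, verifications = [], 0
    for i in range(n - m + 1):
        if p_hash == t_hash:
            verifications += 1  # each verification costs up to m character comparisons
            if text[i:i + m] == pattern:
                matches.append(i)
        if i < n - m:
            t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q
    return matches, verifications


# Worst case: every window equals the pattern, so every window is fully compared.
matches, verifications = count_verifications("a" * 1000, "a" * 10)
print(len(matches), verifications)  # 991 991
```

With 991 windows each compared over 10 characters, the work grows with n * m. On ordinary text the number of verifications is usually far smaller, although a small modulus such as 101 makes false hash matches more likely than a large one would.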
String Matching:
The primary use of the Rabin-Karp algorithm is in string matching, where you need to find one or more occurrences of a substring in a given text. It's particularly efficient when there are multiple patterns to search for in the same text.
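A minimal sketch of the multi-pattern case, assuming all patterns have the same length so that one rolling hash serves all of them. The function name `rabin_karp_multi` and the larger modulus are illustrative choices, not part of the implementations above.

```python
def rabin_karp_multi(text, patterns, d=256, q=1_000_000_007):
    """Find several equal-length patterns in one pass over text.

    Returns a dict mapping each pattern to the list of its start indices.
    """
    patterns = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    results = {p: [] for p in patterns}
    if not patterns:
        return results
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes all patterns have the same length")
    n = len(text)
    if m == 0 or m > n:
        return results

    def poly_hash(s):
        value = 0
        for ch in s:
            value = (d * value + ord(ch)) % q
        return value

    # Group patterns by hash so each window needs only one dictionary lookup.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(poly_hash(p), []).append(p)

    h = pow(d, m - 1, q)
    t_hash = poly_hash(text[:m])
    for i in range(n - m + 1):
        for p in by_hash.get(t_hash, ()):
            if text[i:i + m] == p:  # verify to rule out a collision
                results[p].append(i)
        if i < n - m:
            t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q
    return results


print(rabin_karp_multi("ABC ABCDAB ABCDABCDABDE", ["ABCD", "DABD", "XYZW"]))
# {'ABCD': [4, 11, 15], 'DABD': [18], 'XYZW': []}
```

Patterns of different lengths would need one pass per distinct length, since the window size is fixed by the rolling hash.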
Plagiarism Detection:
In plagiarism detection systems, Rabin-Karp can be used to identify similar sections of text by comparing substrings using hashing.
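One simple way to apply this idea is to hash every k-character window (k-gram) of two documents and report the k-grams they share. The sketch below is an illustration under that assumption, not a description of how any particular plagiarism detection system works. The names `window_hashes` and `shared_kgrams` are hypothetical.

```python
def window_hashes(text, k, d=256, q=1_000_000_007):
    """Yield (start_index, hash) for every length-k window of text."""
    n = len(text)
    if k <= 0 or k > n:
        return
    h = pow(d, k - 1, q)
    cur = 0
    for ch in text[:k]:
        cur = (d * cur + ord(ch)) % q
    yield 0, cur
    for i in range(n - k):
        cur = (d * (cur - ord(text[i]) * h) + ord(text[i + k])) % q
        yield i + 1, cur


def shared_kgrams(doc_a, doc_b, k):
    """Return the set of length-k substrings that occur in both documents."""
    # Index doc_a: hash -> start positions of windows with that hash.
    index = {}
    for i, hv in window_hashes(doc_a, k):
        index.setdefault(hv, []).append(i)
    shared = set()
    for j, hv in window_hashes(doc_b, k):
        gram = doc_b[j:j + k]
        # Verify candidates character by character to rule out collisions.
        if any(doc_a[i:i + k] == gram for i in index.get(hv, ())):
            shared.add(gram)
    return shared


a = "the quick brown fox"
b = "a quick brown dog"
print(sorted(shared_kgrams(a, b, 11)))
# [' quick brow', 'quick brown', 'uick brown ']
```

A larger k reports only longer shared passages, while a smaller k is more sensitive but also flags common phrases.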
Search Engines:
Keyword matching in search systems can use string matching algorithms such as Rabin-Karp to locate query terms within documents.
Bioinformatics:
In DNA sequence analysis, Rabin-Karp is used to find specific subsequences within a large DNA sequence.
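Because DNA uses only four letters, each base fits in two bits, and a window can be encoded as an exact base-4 number. The sketch below uses that variation: it drops the modulus q, so equal codes always mean equal strings and no verification step is needed. This departs from the hashed version described above. The names `ENCODE` and `find_dna` are assumptions for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def find_dna(sequence, motif):
    """Return all start indices of motif in sequence using an exact base-4 rolling code."""
    n, m = len(sequence), len(motif)
    if m == 0 or m > n:
        return []
    high = 4 ** (m - 1)  # weight of the leading base in a window
    target = 0
    cur = 0
    for i in range(m):
        target = 4 * target + ENCODE[motif[i]]
        cur = 4 * cur + ENCODE[sequence[i]]
    hits = []
    for i in range(n - m + 1):
        if cur == target:  # codes are exact, so equal codes mean equal strings
            hits.append(i)
        if i < n - m:
            # Remove the leading base, shift, and append the next base.
            cur = 4 * (cur - ENCODE[sequence[i]] * high) + ENCODE[sequence[i + m]]
    return hits


print(find_dna("ACGTACGTTACG", "ACG"))  # [0, 4, 9]
```

For motifs of up to 32 bases the code fits in 64 bits. For longer motifs the numbers keep growing in Python, and a modulus with a verification step, as in the main algorithm, becomes the more practical choice.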