Rabin Karp Hashing

Rabin-Karp hashing is a method of using linear time precomputation to allow constant time hashing of arbitrary substrings with a polynomial hash.

# Polynomial Hash

For a string $a_0,a_1,a_2,\ldots,a_n$, a polynomial hash of it is $(a_0+a_1\cdot p + a_2\cdot p^2 + \cdots + a_n\cdot p^n) % m$ for some modulus $m$ and $p$ is generally prime. The modulus is generally ignored in implementations and is taken care of by integer overflow.

# Rabin-Karp

We create an array of the hashes for the prefixes. For example, in the array $[3,1,4,1,5]$, we would get the array of hashes $[3,\ 3+1\cdot p,\ 3+1\cdot p+4\cdot p^2,\ 3+1\cdot p+4\cdot p^2+1\cdot p^3,\ 3+1\cdot p + 4\cdot p^2 + 1\cdot p^3 + 5\cdot p^4]$.

Now suppose we wanted the hash of the substring [1,4,1]. We notice that if we take the hash of [3,1,4,1] and subtract the hash of [3], we get the hash of [1,4,1] but multiplied by a factor of $p$. To take care of this, when comparing hashes, we multiply by appropriate powers of $p$ such that both sides start with the same power.

This gives linear time precomputation for constant time hashes.

Please add implementation

page revision: 0, last edited: 05 Jan 2008 03:26
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License