rabin karp algorithm

implementation of usage of Rabin Karp Algorithm

input
a is a list a string
b is also list of string, where each string of a can replace with string of same index in b
a and b is same size
given a string s, replace all substring of s which in a to string in b

only need to replace one (ignore replaced part)
if there is multiple choose in a, repleace with the longest

Algorithm design:
key point:
reduce time complexity of comparing char by char in s and each string in a
solution:
use the Rabin Karp Algorithm

1. for each string in a, calculate its hash balue
2. for each prefix in s, calculate its hash value
  assume s is "abcde", prefixs are ["a","ab","abc","abcde"]
3. loop through s
  for each substring of s start at position l:
    hash(s[l,r]) = hash([1,r]) - hash([1,l-1])*base(r-l+1)
      multiple base^(r-l+1) to string before we need, mnius it so the remind hash value is hash value between s[l,r]
    so, for index l in range (0,len(s))
      for String s in a:
        r = l + len(s)
        calculate hash(s[l,r])
        compare with hash(s)
        if same, means there is replacement
        record len(s) and index j of s in a
      for the max len(s)
        replace s [l+len(s)] with b[j]

time complexiity: O(len(a)*len(s))

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56

class :
"""
@param a: The A array
@param b: The B array
@param s: The S string
@return: The answer
"""
def stringReplace(self, a, b, s):

seed = 33
mod = 1000000007
ans = 0
mxLen = -1
aHash = []
sHash = []
base = []
for i in a:
ans = 1
mxLen = max(mxLen, len(i))
for j in i:
ans = (ans * seed + ord(j) - ord('a')) % mod
aHash.append(ans)
ans = 1
sHash.append(ans)
mxLen = max(mxLen, len(s))
for i in s:
ans = (ans * seed + ord(i) - ord('a')) % mod
sHash.append(ans)
ans = 1
base.append(ans)
for i in range(mxLen):
ans = ans * seed % mod
base.append(ans)
ret = [i for i in s]
i = 0
while i < len(s):
maxLen = -1
index = 0
for j in range(len(a)):
lenaj = len(a[j])
l = i + 1
r = i + lenaj
if r > len(s):
continue
sHashValue = (sHash[r] - base[r - l + 1] * sHash[l - 1] % mod + mod) % mod
aHashValue = (aHash[j] - base[lenaj] + mod) % mod
if sHashValue == aHashValue and lenaj > maxLen:
maxLen = lenaj
index = j
if maxLen != -1:
for j in range(maxLen):
ret[i + j] = b[index][j]
i = i + maxLen - 1
i = i + 1
return "".join(ret)