
Not difficult to understand. I learned it from Coursera.
Prefix Function
- A “Border” of a string, is a substring that both acts as a prefix and a suffix of that string
- The “Prefix Function” of a string records the length of the longest border of every prefix of that string
Symbol
sis a strings[:i]is a prefix ofsmade up bys[0], s[1], ..., s[i-1]L[i]is the length of the longest border ofs[:i]- In other words,
Lis the “Prefix Function” ofs
- In other words,
Conclusion
L[i+1] <= L[i]+1, and the equality holds if and only ifs[L] = s[i]- The longest border of the longest border of a string, is the second longest border of that string.
- if some border of
s[:i]has a length oflenands[len] = s[i], then there must be a border ins[:i+1]with a length oflen+1
Method
L[0] = 0L[i+1] = len + 1, wherelenis the length of the longest border ofs[:i]that satisfiess[len] = s[i]- To find
len, try every border ofs[:i]from the longest to the shortest - To find the next longest border, see Conclusion 2
- If no border is found, simply compare
s[0]withs[i]
- If
s[0] = s[i], thenL[i+1] = 1 - Else,
L[i+1] = 0
- To find
Implementation
int *pf(string s) {
int len=s.size(),b=0;
int *p=new int[len]; p[0]=0;
for(int i=1;i<len;i++) {
while(b>0 && s[i]!=s[b]) b=p[b-1];
if(s[i]==s[b]) b++;
p[i] = b;
}
return p;
}
KMP
Symbol
pis the pattern string,tis the text stringLis the “Prefix Function” ofp
Conclusion
- If
pmatchestpartially with a longest common prefixp[:i], then no matches will be found beforepis shiftedlen(p) - L[i]positions rightwards.- There are two “submatches” in this partial match, both of which are the longest border of
p[:i] - The left one acts as a prefix, while the right one acts as a postfix
pis shifted rightwards, so doesp[:i]- No matches are possible before the left submatch arrives at the previous position of the right submatch
- The length of this “vacuum area” is
len(p) - L[i]
- There are two “submatches” in this partial match, both of which are the longest border of
Method
- Make
s = p + '$' + t, where ‘$’ is absent from bothpandt - Calculate the “Prefix Function” of
s, marked asL - A match is found at position
i - 2 * len(p)whenL[i] = len(p)- Under such circumstances, the longest border of
s[:i]happens to bepitself, aspis a prefix ofs - This border is also a substring of
s - Thanks to the ‘$’ sign, we can guarantee that this border never crosses the real “border” between
pandt, so it is also a substring oft - Thus the starting position of the match is
i - (len(p) - 1)ins, andi - (len(p) - 1) - (len(p) + 1)int
- Under such circumstances, the longest border of
Implementation
int main() {
string pattern, text, s;
int *pf;
while(cin >> pattern >> text && pattern != "") {
if(pattern.size() > text.size()) {
cout << "-1" << endl;
continue;
}
s = pattern + "$" + text;
pf = prefix_func(s);
for(int i = pattern.size()+1; i < s.size(); i++) {
if(pf[i] == pattern.size()) {
cout << i - 2 * pattern.size() << " ";
}
}
cout << endl;
delete pf;
}
return 0;
}




近期评论