
- given a text string of length n and a pattern string of length m ≤ n, find the index/indexes at which the pattern begins.
- The pattern-matching problem is inherent to many behaviors of Java’s String class
Brute Force
- enumerate all possible configurations
public class BruteForceMatching { public static int findBrute(char[] text, char[] patten) { int n = text.length; int m = patten.length; for (int i = 0; i <= n - m; i++) { int k = 0; while (k < m && patten[k] == text[i + k]) k++; if (k == m) return i; } return -1; } }Boyer-Moore pattern-matching algorithm
- 2 Rules: bad character; good suffix
- avoid comparisons with whole groups of characters in the text
- when a mismatch is found at the right end of the pattern, realign the pattern beyond the mismatch
- when a match is found for a character, try to extend the match with the predecessor character
- If the mismatched character occurs elsewhere in the pattern, two possible subcases depending on whether its last occurrence is before or after the mismatched character (here we don’t search for another occurrence; for the efficiency, we only define a function to quick determine the last occurrence, i.e. a hash table)
- O(nm) time in the worst case(unlikely), the same as the brute-force algorithm.
- The original algorithm achieves worst-case running time O(n + m + |Σ|) by using an alternative shift heuristic
public class BoyerMoore { public static int findBoyerMoore(char[] text, char[] pattern) { int n = text.length; int m = pattern.length; if (m == 0) return 0; Map<Character, Integer> last = new HashMap<>(); for (int i = 0; i < n; i++) last.put(text[i], -1); for (int i = 0; i < m; i++) last.put(pattern[i], i); int i = m - 1; int j = m - 1; while (i < n) { if (text[i] == pattern[j]) { if (j == 0) return i; i--; j--; } else { i += m - Math.min(j, 1 + last.get(text[i])); j = m - 1; } } return -1; } }The Knuth-Morris-Pratt Algorithm
- achieves a running time of O(n + m)
- precompute self-overlaps between portions of the pattern so that when a mismatch occurs at one location, we immediately know the maximum amount to shift
- precompute a failure function, indicating the proper shift of the pattern upon a failed comparison.
- f(k) is defined as the length of the longest prefix of the pattern that is a suffix of the substring pattern[1..k]
- use f(k) when a mismatch upon character pattern[k+1]
public class KnuthMorrisPratt {
public static int findKMP(char[] text, char[] pattern) {
int n = text.length;
int m = pattern.length;
if (m == 0) return 0;
int[] fail = computeFailKMP(pattern);
int j = 0;
int k = 0;
while (j < n) {
if (text[j] == pattern[k]) {
if (k == m - 1) return j - m + 1;
k++;
j++;
} else if (k > 0) k = fail[k - 1];
else j++;
}
return -1;
}
private static int[] computeFailKMP(char[] pattern) {
int m = pattern.length;
int[] fail = new int[m];
int j = 1;
int k = 0;
while (j < m) {
if (pattern[j] == pattern[k]) {
fail[j] = k + 1;
j++;
k++;
} else if (k > 0) k = fail[k - 1];
else j++;
}
return fail;
}
}




近期评论