An intuitive solution would be traverse through the whole string, create HashMap using key of every 10-letter-long substring. But comparing 2 strings is slow, there are better ways.
Consider ACGT, there are only 4, therefore we can use 2 bit binary numbers to represent them. That is to say: 00 for A, 01 for C, 10 for G, 11 for T. For ACGT, it will be 00011011, for example. Using this method, we are able to store the 10-letter-long substring using only 1 integer.
class : """ @param s: a string @return: return List[str] """ deffindRepeatedDnaSequences(self, s): from collections import defaultdict table = {'A':0,'C':1,'G':2,'T':3} d = defaultdict(int) m = {} for i in range(len(s)-9): pattern = s[i:i+10] key = 0 for p in pattern: key += table[p] key <<= 2 d[key] += 1 m[key] = pattern ans = [] for e in d: if d[e] > 1: ans.append(m[e]) return ans
近期评论