word break

These are three problem about Word Break. Dynamic Programming or other fancy tricks can applied to solve and optimize these problems.

Description

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example

1
2
3
Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Solution

Actually, DFS or BFS can be applied to solve the problem. So I wrote a DFS version recursively.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class :
def wordBreak(self, s, wordDict):
"""
:type s: str
:type wordDict: List[str]
:rtype: bool
"""
if len(s)==0:
return True
for item in wordDict:
if s.startswith(item):
if self.wordBreak(s[len(item):], wordDict):
return True
return False

However, there is a obvious loophole. That is you can prune your search tree by memorizing the specific position which has been visited.

So, I got the following solution:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class :
def wordBreak(self, s, wordDict):
start = 0
stack = [start]
visited = set()
while stack:
start = stack.pop()
if start in visited:
continue
visited.add(start)
for word in wordDict:
if s[start:].startswith(word):
x = len(word)
if x == len(s[start:]):
return True
stack.append(start + x)
return False

Furthermore, how to return all the possible sentences?

Just like:

1
2
3
4
5
6
7
8
Input:
s = "catsanddog"
wordDict = ["cat", "cats", "and", "sand", "dog"]
Output:
[
"cats and dog",
"cat sand dog"
]

It is just a extension about the former problem. However, It is not that easy to implement.

The following solution refers a post.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
class :
def wordBreak(self, s, wordDict):
"""
:type s: str
:type wordDict: List[str]
:rtype: List[str]
"""

# maxlen is the max word length used to minimize the window to scan
wordd, maxlen = {}, 0
for i, word in enumerate(wordDict):
wordd[word] = i
maxlen = max(maxlen, len(word))
n = len(s)

# dp[i] stores the indices of the last words that s[:i] ends with
dp = {i: [] for i in range(n+1)}

# scan through s to get dp[i]
for i in range(n+1):
for j in range(max(i-maxlen, 0), i):
if s[j:i] in wordd:
dp[i].append(wordd[s[j:i]])

memo = {}
# Recursively built the result from the tail of s, we can save a lot of space&time comparing to bottom-up
# by not needing to calculate the temp results that are not in the paths of the final result
def buildResult(i):
if i == 0:
return ['']
else:
if i in memo:
return memo[i]
ret = []
for idx in dp[i]:
ret += [tmp + ' '*int(bool(tmp)) + wordDict[idx] for tmp in buildResult(i - len(wordDict[idx]))]
memo[i] = ret
return ret

return buildResult(n)

Okey. What’s next?

Think about that Given a list of words (without duplicates), please write a program that returns all concatenated words in the given list of words.
A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:

1
2
3
4
5
6
7
Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats";
"dogcatsdog" can be concatenated by "dog", "cats" and "dog";
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

If you just use the aforementioned algorithms, it will be very time-consuming. Because the word dict could be very large and Only a little fraction of them can bu concatenated.

I saw some posts, and some brillion ideas are very provoking.

Here is one. Mark it, I will try other methods.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class :
def findAllConcatenatedWordsInADict(self, words):
"""
:type words: List[str]
:rtype: List[str]
"""
def brk(arr, st, wordDict, m):
if st in m:
return m[st]
if st == len(arr):
m[st] = True
return True
for en in range(st+1, len(arr) + 1):
if arr[st:en] in wordDict and brk(arr, en, wordDict, m):
m[en] = True
return True
m[st] = False
return False


wordDict = set(words)
res = []
for word in words:
wordDict.remove(word)
if brk(word, 0, wordDict, {}) and len(word):
res.append(word)
wordDict.add(word)
return res