using sets to find out all of the consecutive sequences

About this series of posts

This series of posts are to introduce some simple and elegant problem solving skills of algorithm related problems. The contents would be updated regularly.

Problem 1: Return the number of consecutive sequences

Given an array of unique integers nums, please find out the number of consecutive sequences in the array. For example, for num={1,2,3,4,33,11,9,8,7}, there are 4 consecutive sequences {1,2,3,4} , {33} , {11} and {7,8,9}. So, the returned value should be 4.

We will start by jotting down some facts of the problem.

Facts

  1. In a consecutive sequence, there must exist one smallest number called minNum. For example, in the consecutive sequence {1,2,3,4}, the smallest number is 1.
  • If minNum is known to us, we can claim that minNum-1 is not in the array nums. This fact can be proved by contradiction.
  • If we can find out the smallest number of a consecutive sequence, will can easily find the other numbers within the same consecutive sequence by using set.(explanation: we do this by maintaining a set)

With the 3 facts above, we can write down our pseudocode

1
2
3
4
5
6
7
consecutiveSequenceNum(nums)
nums_set = make set from nums
ret = 0
for n in nums
if n-1 is not in nums_set
ret++
return ret

In our algorithm, we have used fact.1 and fact.2. What it does is just scan through the nums array, if the predecessor of n is not in the set, we know that n is the smallest number of the corresponding consecutive sequence. So, we increment ret by 1. Note that, the input array contains only unique integers, so we don’t need to worry about duplicate numbers.

Additionally, we can get the same result by finding the maximum number. The only thing we need to change is the condition of the if clause.

1
2
3
4
5
6
7
consecutiveSequenceNum(nums)
nums_set = make set from nums
ret = 0
for n in nums
if n+1 is not in nums_set
ret++
return ret

Problem 2: Return the number of elements in each consecutive sequences

Given an array of unique integers nums, please find out the number of elements in each consecutive sequence. The order of the returned result is not important. For example, for num={1,2,3,4,33,11,9,8,7}, there are 4 consecutive sequences {1,2,3,4} , {33} , {11} and {7,8,9}. The returned value should be {4,1,1,3}.(you can return the result in any order)

In this case, we need to use fact.1 , fact.2 and fact.3. If we have found a minNum of a consecutive sequence, we can find out the other members by testing whether {minNum+1, minNum+2, ... } are in nums_set. The psudocode are shown below

1
2
3
4
5
6
7
8
9
10
11
(nums)
nums_set = make set from nums
for n in nums
if n-1 is not in nums_set
elementNum = 0
i = n
while i in nums_set
elementNum++
i++
ret.push(elementNum)
return ret

The psudocode above is self explained. We can extend the problem to return the longest consecutive sequences or return all of members of longest consecutive sequences. The solution to these problems need only some minor changes from the above code.

Conclusion

Set is an effective data structure to check the existence of an element in O(1) time. By using set, we can find the minimum number of a consecutive sequence. With the minimum number, we can easily find the other elements within the same consecutive sequence. The same logic can be applied to similar problems of sequences. For example, the problems above can be generalized as sequences with steps equal to k. We will explorer more characteristics and applications of sets. If you are interested in it, please see the other posts of kelvin.ink