[LeetCode] 187. Repeated DNA Sequences

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

1
2
3
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Explanation

  1. If the string has length less than or equal to 10, we return empty result.

  2. We can loop through the string, put each 10 characters long string into a set. In the future iteration, check if the set already contains the new 10 character slong string. If contains, then we can add this string into the result set.

  3. Finally return a list of the result set.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        if (s.length() < 9) return new ArrayList<>();
        Set<String> set = new HashSet<>();
        Set<String> res = new HashSet<>();

        for (int i = 0; i + 9 < s.length(); i++) {
            String curStr = s.substring(i, i+10);
            if (set.contains(curStr)) {
                res.add(curStr);
            } else {
                set.add(curStr);
            }
        }

        return new ArrayList<String>(res);
    }
}