2015年6月16日星期二

[LeetCode] Repeated DNA Sequences

Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Java Code:

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        if (s.length() < 10) {
            return result;
        }
        
        HashMap<Integer, Integer> hs = new HashMap<Integer, Integer>();
        for (int i = 0; i <= s.length() - 10; i++) {
            int current = seqToInt(s.substring(i, i + 10));
            if (!hs.containsKey(current)) {
                hs.put(current, 1);
            } else if (hs.get(current) == 1){
                result.add(intToSeq(current));
                hs.put(current, 2);
            }
        }
        
        return result;
    }
    
    private int seqToInt(String seq) {
        //A-00 C-01 G-10 T-11
        int res = 0;
        for (int i = 0; i < seq.length(); i++) {
            char a = seq.charAt(i);
            if (a == 'A') {
                res += 0;
            } else if (a == 'C') {
                res += 1;
            } else if (a == 'G') {
                res += 2;
            } else {
                res += 3;
            }
            res = res << 2;
        }
        res = res >> 2;
        return res;
    }
    
    private String intToSeq(int seq) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            int cur = seq & 0x0003;
            if (cur == 0) {
                sb.insert(0, 'A');
            } else if (cur == 1) {
                sb.insert(0, 'C');
            } else if (cur == 2) {
                sb.insert(0, 'G');
            } else {
                sb.insert(0, 'T');
            }
            seq = seq >> 2;
        }
        return new String(sb);
    }
}

没有评论:

发表评论