Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Java Code:
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>();
if (s.length() < 10) {
return result;
}
HashMap<Integer, Integer> hs = new HashMap<Integer, Integer>();
for (int i = 0; i <= s.length() - 10; i++) {
int current = seqToInt(s.substring(i, i + 10));
if (!hs.containsKey(current)) {
hs.put(current, 1);
} else if (hs.get(current) == 1){
result.add(intToSeq(current));
hs.put(current, 2);
}
}
return result;
}
private int seqToInt(String seq) {
//A-00 C-01 G-10 T-11
int res = 0;
for (int i = 0; i < seq.length(); i++) {
char a = seq.charAt(i);
if (a == 'A') {
res += 0;
} else if (a == 'C') {
res += 1;
} else if (a == 'G') {
res += 2;
} else {
res += 3;
}
res = res << 2;
}
res = res >> 2;
return res;
}
private String intToSeq(int seq) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) {
int cur = seq & 0x0003;
if (cur == 0) {
sb.insert(0, 'A');
} else if (cur == 1) {
sb.insert(0, 'C');
} else if (cur == 2) {
sb.insert(0, 'G');
} else {
sb.insert(0, 'T');
}
seq = seq >> 2;
}
return new String(sb);
}
}
没有评论:
发表评论