A sequence is an ordered collection of objects (usually numbers), which are allowed to repeat. Sequences can be finite or infinite. Two examples are the finite sequence (π,−2–√,0,π)(π,−2,0,π) and the infinite sequence of odd numbers (1,3,5,7,9,…)(1,3,5,7,9,…). We use the notation anan to represent the nn-th term of a sequence.
A recurrence relation is a way of defining the terms of a sequence with respect to the values of previous terms. In the case of Fibonacci's rabbits from the introduction, any given month will contain the rabbits that were alive the previous month, plus any new offspring. A key observation is that the number of offspring in any month is equal to the number of rabbits that were alive two months prior. As a result, if FnFn represents the number of rabbit pairs alive after the nn-th month, then we obtain the Fibonacci sequence having terms FnFn that are defined by the recurrence relation Fn=Fn−1+Fn−2Fn=Fn−1+Fn−2 (with F1=F2=1F1=F2=1 to initiate the sequence). Although the sequence bears Fibonacci's name, it was known to Indian mathematicians over two millennia ago.
When finding the nn-th term of a sequence defined by a recurrence relation, we can simply use the recurrence relation to generate terms for progressively larger values of nn. This problem introduces us to the computational technique of dynamic programming, which successively builds up solutions by using the answers to smaller cases.
Given: Positive integers n≤40n≤40 and k≤5k≤5.
Return: The total number of rabbit pairs that will be present after nn months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of kk rabbit pairs (instead of only 1 pair).
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.
In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
with open("Computing_GC_Contentrosalind_gc.fasta", 'r') as fasta: for line in fasta.readlines(): if line.startswith(">"): id = line.split()[0].strip(">") fastaDict[id] = "" else: fastaDict[id] += line.strip()
for id, seq in fastaDict.items(): resultID.append(id) resultGC.append(GCcontent(seq))
index = resultGC.index(max(resultGC)) print(f"文件中GC含量最高的序列信息") print(resultID[index]) print(resultGC[index])
Figure 2. The Hamming distance between these two strings is 7. Mismatched symbols are colored red.
Given two strings ss and tt of equal length, the Hamming distance between ss and tt, denoted dH(s,t)dH(s,t), is the number of corresponding symbols that differ in ss and tt. See Figure 2.
Given: Two DNA strings ss and tt of equal length (not exceeding 1 kbp).
Figure 2. The probability of any outcome (leaf) in a probability tree diagram is given by the product of probabilities from the start of the tree to the outcome. For example, the probability that X is blue and Y is blue is equal to (2/5)(1/4), or 1/10.
Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process.
For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let XX represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by
Random variables can be combined to yield new random variables. Returning to the ball example, let YY model the color of a second ball drawn from the bag (without replacing the first ball). The probability of YY being red depends on whether the first ball was red or blue. To represent all outcomes of XX and YY, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for XX and YY, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree; see Figure 2 for an illustrative example.
An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let AA be the event "YY is blue." Pr(A)Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X=blue and Y=blue)+Pr(X=red and Y=blue)Pr(X=blue and Y=blue)+Pr(X=red and Y=blue), or 310+110=25310+110=25 (see Figure 2 above).
Given: Three positive integers kk, mm, and nn, representing a population containing k+m+nk+m+n organisms: kk individuals are homozygous dominant for a factor, mm are heterozygous, and nn are homozygous recessive.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.
using the complementary probability of getting no dominant alleles. The terms in the sum correspond to the choice of the mating partners: from the subpopulation with recessive alleles only, one from the mixed and one from recessive only (counted twice, but only 50% of the offspring are homozygous), both from mixed (only 25% of the offspring are homozygous).
The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.
The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.
Given: An RNA string ss corresponding to a strand of mRNA (of length at most 10 kbp).
seq = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA" nucleics = "" import re # 用正则表达式,re 中的 findall方法分割字符串 seq_codons = re.findall('.{3}', seq.upper()) for codon in seq_codons: for nucleic, codons in CodonDict.items(): if codon notin codons: continue nucleics += nucleic
print(nucleics.strip("Stop"))
绝对有更好的办法吧?
别人的解法
string = """UUU F CUU L AUU I GUU V UUC F CUC L AUC I GUC V UUA L CUA L AUA I GUA V UUG L CUG L AUG M GUG V UCU S CCU P ACU T GCU A UCC S CCC P ACC T GCC A UCA S CCA P ACA T GCA A UCG S CCG P ACG T GCG A UAU Y CAU H AAU N GAU D UAC Y CAC H AAC N GAC D UAA Stop CAA Q AAA K GAA E UAG Stop CAG Q AAG K GAG E UGU C CGU R AGU S GGU G UGC C CGC R AGC S GGC G UGA Stop CGA R AGA R GGA G UGG W CGG R AGG R GGG G"""
for i in range(0, len(coded)-3, 3): decoded += traDict[coded[i:i+3]]
引用一个 这个段代码下面的评论:
“
I arrive on these threads a proud papa, thrilled that I have cracked the problem. And then I'm floored by these answers that get to the same result in half the lines (and elegantly, to boot) :)