I have a text file named input.txt that looks like this:
A C H E C Q D S S C H H C R Q K L E D T S C H L E D V G K M
N T Y H C G E G I N N G P N A S C K F M L P C V V A E F E N H T
E T D W R C K L E A E H C D C K D A A V N H H F Y S L C K D V T E E W
Note that the input above contains 3 lines of amino acid sequences, with the residues separated by spaces.
I want to convert it to the following format:
<|endoftext|>
ACHECQDSSCHHCRQKLEDTSCHLEDVGKM
<|endoftext|>
NTYHCGEGINNGPNASCKFMLPCVVAEFEN
HT
<|endoftext|>
ETDWRCKLEAEHCDCKDAAVNHHFYSLCKD
VTEEW
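Each amino acid sequence should be preceded by the string "<|endoftext|>" on its own line, and each output line of amino acids should contain no more than 30 residues.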
I have this code, but it doesn't do the job:
def process_amino_acids(file_name):
    with open(file_name, "r") as file:
        data = file.read().replace("\n", "").replace(" ", "")
    output = "<|endoftext|>"
    for i, amino_acid in enumerate(data):
        if i % 30 == 0 and i != 0:
            output += "\n"
        output += amino_acid
    return output


def main():
    input_file = "data/input.txt"
    processed_amino_acids = process_amino_acids(input_file)
    with open("data/output.txt", "w") as output_file:
        output_file.write(processed_amino_acids)
    print("Formatted amino acid sequences are written to output.txt")


if __name__ == "__main__":
    main()
The output it gives is:
<|endoftext|>ACHECQDSSCHHCRQKLEDTSCHLEDVGKM
NTYHCGEGINNGPNASCKFMLPCVVAEFEN
HTETDWRCKLEAEHCDCKDAAVNHHFYSLC
KDVTEEW
How can I do this correctly in Python?
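
For reference, here is a minimal sketch of one direction that might work. The code above joins the whole file into one string before wrapping, so the boundaries between sequences are lost. Handling each input line on its own should keep them. The function name `format_sequences` and its `width` and `marker` parameters are hypothetical names chosen for illustration, and the sketch assumes each non-blank input line holds exactly one sequence.

```python
def format_sequences(lines, width=30, marker="<|endoftext|>"):
    """Format an iterable of space-separated sequences, one sequence per line.

    Assumption: every non-blank line is a single amino acid sequence.
    """
    out = []
    for line in lines:
        # split() with no arguments drops spaces and the trailing newline
        seq = "".join(line.split())
        if not seq:
            continue  # skip blank lines
        out.append(marker)
        # slice the sequence into chunks of at most `width` residues
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Since iterating over an open file yields its lines, the file object returned by `open("data/input.txt")` could be passed directly as `lines`, and the returned string written to `data/output.txt` as in `main()` above.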