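(Cross-posted to Biostar: https://www.biostars.org/p/9562110/)
I have a FASTA file, and in each header I want to insert the word that follows 'Similar to' (Chid1 in the example below) right after the >.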
>_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG
I would like the output to look like this:
>Chid1_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG
How can I do this? I have already tried:
sed -E 's/(Similar to )(\w+)/>CHIA_\2\1\2/' file.txt > new_file_2.txt
I stored the result in a new file and then tried to paste it into the headers, but it did not work. Do you have any ideas?
I also tried a Python script:
def extract_similar_to_word(line):
    """Return the word that follows 'Similar to' in a header line, or None."""
    words = line.split()
    for i, word in enumerate(words):
        # The token is 'Name:"Similar', not 'Similar', so compare the ending.
        if word.endswith("Similar") and i + 2 < len(words) and words[i + 1] == "to":
            return words[i + 2].strip('"')
    return None


def modify_fasta_headers(input_file, output_file):
    with open(input_file, "r") as in_file, open(output_file, "w") as out_file:
        for line in in_file:
            if line.startswith(">"):
                similar_to_word = extract_similar_to_word(line)
                if similar_to_word:
                    # Put the word right after ">"; drop the ID's own leading "_"
                    # so the result is ">Chid1_Anoura...", not ">Chid1__Anoura...".
                    line = ">" + similar_to_word + "_" + line[1:].lstrip("_")
            # Write every line, including unchanged headers and sequence lines.
            out_file.write(line)


input_file = "all_chias.fasta"
output_file = "modified_output_fasta_v1.fasta"
modify_fasta_headers(input_file, output_file)
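
As far as I can tell, the sed attempt rewrites the text at the place where 'Similar to' occurs rather than at the start of the line, which would explain why the header ID never changes. Below is a minimal sketch of the substitution I am after, written with Python's `re` module. The helper name `prefix_gene_name` and the pattern are my own assumptions: the pattern assumes every annotated header looks like the example above (an ID that may start with `_`, followed somewhere by `Similar to <word>`), and leaves any other line untouched.

```python
import re

# Assumed header layout: ">" + optional "_" + ID, then '... Similar to <word> ...'.
# Group 1 = ID without leading underscores, group 2 = text up to and including
# the word after 'Similar to', group 3 = that word on its own.
HEADER_RE = re.compile(r'^>_*(\S+)(.*?Similar to (\w+))')


def prefix_gene_name(line):
    """Return the line with the word after 'Similar to' moved in front of the ID.

    Lines that do not match (sequence lines, headers without 'Similar to')
    are returned unchanged.
    """
    return HEADER_RE.sub(r'>\3_\1\2', line, count=1)


if __name__ == "__main__":
    header = ('>_Anouracaudifer_00017283-RA transcript '
              'Name:"Similar to Chid1 Chitinase domain-containing protein 1" offset:0')
    print(prefix_gene_name(header))
```

Applied line by line to the input file, this should give headers starting with `>Chid1_Anouracaudifer_00017283-RA`, matching the desired output above, but I have only checked it against the single example header.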