I have two dataframes:
df1, such as:
COL1 COL2 Taxon
A B Canis_lupus
C D Felis_catus
E F Mus_musculus
G H Canidae
I J Felidae
K L Muridae
M N Canis_lupus_familiaris
df2, such as:
COL3 Number Taxonomy
1 120 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus
2 129 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus_familiaris
3 134 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Felidae;g__Felis;s__Felis_catus
4 234 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Mus;s__Mus_musculus
5 12 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Rattus;s__Rattus_norgevigus
6 289 d__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Rattus;s__Rattus_Rattus
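I would like to add a new column to df1, say Sum, which for each df1["Taxon"] should be the sum of df2["Number"] over the df2 rows whose Taxonomy contains that taxon at any rank (species, family, and so on).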
The resulting df1 would be, for example:
COL1 COL2 Taxon Sum
A B Canis_lupus 120 # which is the sum of df2["Number"] for df2["Taxonomy"] == Canis_lupus
C D Felis_catus 134 # which is the sum of df2["Number"] for df2["Taxonomy"] == Felis_catus
E F Mus_musculus 234 # which is the sum of df2["Number"] for df2["Taxonomy"] == Mus_musculus
G H Canidae 249 # which is the sum of df2["Number"] for df2["Taxonomy"] == Canidae
I J Felidae 134 # which is the sum of df2["Number"] for df2["Taxonomy"] == Felidae
K L Muridae 535 # which is the sum of df2["Number"] for df2["Taxonomy"] == Muridae
M N Canis_lupus_familiaris 129 # which is the sum of df2["Number"] for df2["Taxonomy"] == Canis_lupus_familiaris
Does anyone have an idea?
Here is the data in Python format:
import pandas as pd
from io import StringIO
data = """COL1\tCOL2\tTaxon
A\tB\tCanis_lupus
C\tD\tFelis_catus
E\tF\tMus_musculus
G\tH\tCanidae
I\tJ\tFelidae
K\tL\tMuridae
M\tN\tCanis_lupus_familiaris"""
# Read the tab-separated data into a DataFrame
df1 = pd.read_csv(StringIO(data), sep='\t')
df2 = pd.DataFrame({
    'COL3': [1, 2, 3, 4, 5, 6],
    'Number': [120, 129, 134, 234, 12, 289],
    'Taxonomy': [
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus',
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus_familiaris',
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Carnivora;f__Felidae;g__Felis;s__Felis_catus',
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Mus;s__Mus_musculus',
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Rattus;s__Rattus_norgevigus',
        'd__Eukaryota;p__Metazoera;c__tetrapoda;o__Rodentia;f__Muridae;g__Rattus;s__Rattus_Rattus',
    ],
})
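For reference, here is a rough sketch of one way the Sum column might be computed; it is only an illustration of what I am after, not necessarily the best approach. It assumes every rank in Taxonomy is separated by ";" and carries a one-letter prefix such as "f__" or "s__", and that a taxon name appears at most once per lineage. The names prefix, long and totals are made up for this example, and the data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Self-contained copy of the example data from the question
df1 = pd.DataFrame({
    "COL1": list("ACEGIKM"),
    "COL2": list("BDFHJLN"),
    "Taxon": ["Canis_lupus", "Felis_catus", "Mus_musculus", "Canidae",
              "Felidae", "Muridae", "Canis_lupus_familiaris"],
})
prefix = "d__Eukaryota;p__Metazoera;c__tetrapoda;"  # shared start of every lineage
df2 = pd.DataFrame({
    "COL3": [1, 2, 3, 4, 5, 6],
    "Number": [120, 129, 134, 234, 12, 289],
    "Taxonomy": [
        prefix + "o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus",
        prefix + "o__Carnivora;f__Canidae;g__Canis;s__Canis_lupus_familiaris",
        prefix + "o__Carnivora;f__Felidae;g__Felis;s__Felis_catus",
        prefix + "o__Rodentia;f__Muridae;g__Mus;s__Mus_musculus",
        prefix + "o__Rodentia;f__Muridae;g__Rattus;s__Rattus_norgevigus",
        prefix + "o__Rodentia;f__Muridae;g__Rattus;s__Rattus_Rattus",
    ],
})

# One row per (df2 row, rank): split the lineage on ";" and explode the list
long = (df2.assign(Taxon=df2["Taxonomy"].str.split(";"))
           .explode("Taxon", ignore_index=True))

# Drop the assumed one-letter rank prefix, e.g. "s__Canis_lupus" -> "Canis_lupus"
long["Taxon"] = long["Taxon"].str.replace(r"^[a-z]__", "", regex=True)

# Total Number per taxon name, whatever rank it sits at
totals = long.groupby("Taxon")["Number"].sum()

# Look up each df1 taxon; taxa absent from df2 get 0
df1["Sum"] = df1["Taxon"].map(totals).fillna(0).astype(int)
print(df1)
```

Matching on whole rank names rather than substrings is what keeps Canis_lupus at 120 instead of also picking up Canis_lupus_familiaris.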