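I have hierarchical data that I want to visualize with a nested pie chart in Python. The data consists of phylum, genus, and species levels, and I want to create a nested pie chart in which each level is one ring of the chart.

I have tried to do this with Matplotlib, but I am struggling to filter and display only specific parts of the nested pie chart based on the abundance of certain categories. Specifically, I want to:

1. Initially display all phyla.
2. Filter and display only the genera belonging to a specific phylum (e.g., Firmicutes).
3. Filter and display only the species belonging to a specific genus (e.g., Bacillus).

I tried modifying my code based on suggestions found online, but did not get the desired output.

Could someone provide guidance or a code example showing how to build this visualization with Python and Matplotlib?

Any help would be greatly appreciated. Thank you!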

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Read the Excel file
TissueS35_Analysis_Report = pd.read_excel("TissueS35_Analysis_Report.xlsx", sheet_name="Species")

# Select the 'Phylum', 'Genus', and 'Species' columns plus the 'Absolute Count' column
selected_columns = TissueS35_Analysis_Report[['Phylum', 'Genus', 'Species', 'Absolute Count']]

# Group by Phylum, Genus, and Species and sum the counts
grouped_data = selected_columns.groupby(['Phylum', 'Genus', 'Species']).sum().reset_index()

# Function to generate nested pie chart data
def nested_pie(df):
    outd = {}
    for level in range(3):
        if level == 0:
            gb = df.groupby('Phylum', sort=False).sum()
        elif level == 1:
            gb = df.groupby(['Phylum', 'Genus'], sort=False).sum()
        else:
            gb = df.groupby(['Phylum', 'Genus', 'Species'], sort=False).sum()
        outd[level] = {'names': gb.index.get_level_values(level).tolist(), 'values': gb['Absolute Count'].values}
    return outd

# Generate nested pie chart data
outd = nested_pie(grouped_data)

# Plot nested donut pie chart
fig, ax = plt.subplots()

# Plot Species level (Outermost ring)
sizes = outd[2]['values']
species_colors = plt.cm.tab20c.colors
species_labels = outd[2]['names']
ax.pie(sizes, radius=1, colors=species_colors, labels=species_labels, wedgeprops=dict(width=0.3, edgecolor='w'))

# Plot Genus level (Middle ring)
sizes = outd[1]['values']
genus_colors = plt.cm.tab20b.colors
genus_labels = outd[1]['names']
ax.pie(sizes, radius=0.7, colors=genus_colors, wedgeprops=dict(width=0.3, edgecolor='w'))

# Plot Phylum level (Innermost ring)
sizes = outd[0]['values']
phylum_colors = plt.cm.tab20.colors
phylum_labels = outd[0]['names']
ax.pie(sizes, radius=0.4, colors=phylum_colors, wedgeprops=dict(width=0.3, edgecolor='w'))

# Create legend for Phylum level
legend_handles = [Patch(color=color, label=label) for color, label in zip(phylum_colors, phylum_labels)]
ax.legend(handles=legend_handles, loc='center left', bbox_to_anchor=(1, 0.5), title='Phylum')

ax.set(aspect="equal")
plt.show()

[image]

A small sample of the data is as follows:
             Phylum             Genus         Species  Absolute Count
168  Proteobacteria       Pseudomonas    Unclassified           73745
152  Proteobacteria        Klebsiella    Unclassified           10777
190  Proteobacteria      Unclassified    Unclassified            4932
132  Proteobacteria   Chromobacterium    Unclassified            1840
84       Firmicutes    Lysinibacillus  boronitolerans            1780
104      Firmicutes         Weissella       ghanensis            1101
10   Actinobacteria   Corynebacterium    Unclassified             703
138  Proteobacteria       Cupriavidus        gilardii             586
93       Firmicutes    Staphylococcus    Unclassified             568
183  Proteobacteria  Stenotrophomonas      geniculata             542

If possible, how can I produce an overlay like the image below? I would be thankful for any help. Regards.

[image]

Recommended answer

One approach is to define a function that creates the nested pie chart:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Phylum': ['Proteobacteria', 'Proteobacteria', 'Proteobacteria', 'Proteobacteria',
               'Firmicutes', 'Firmicutes', 'Actinobacteria', 'Proteobacteria',
               'Firmicutes', 'Proteobacteria'],
    'Genus': ['Pseudomonas', 'Klebsiella', 'Unclassified', 'Chromobacterium',
              'Lysinibacillus', 'Weissella', 'Corynebacterium', 'Cupriavidus',
              'Staphylococcus', 'Stenotrophomonas'],
    'Species': ['Unclassified', 'Unclassified', 'Unclassified', 'Unclassified',
                'boronitolerans', 'ghanensis', 'Unclassified', 'gilardii',
                'Unclassified', 'geniculata'],
    'Absolute Count': [73745, 10777, 4932, 1840, 1780, 1101, 703, 586, 568, 542]
}

df = pd.DataFrame(data)


def create_nested_pie(df):
    fig, ax = plt.subplots()
    size = 0.3
    phylum_counts = df.groupby('Phylum')['Absolute Count'].sum()
    phylum_labels = phylum_counts.index.tolist()
    ax.pie(phylum_counts, labels=phylum_labels, radius=1, wedgeprops=dict(width=size, edgecolor='w'))

    firmicutes_genus_counts = df[df['Phylum'] == 'Firmicutes'].groupby('Genus')['Absolute Count'].sum()
    firmicutes_genus_labels = firmicutes_genus_counts.index.tolist()
    ax.pie(firmicutes_genus_counts, labels=firmicutes_genus_labels, radius=1-size, wedgeprops=dict(width=size, edgecolor='w'),
           labeldistance=0.7)

    lysinibacillus_species_counts = df[(df['Phylum'] == 'Firmicutes') & (df['Genus'] == 'Lysinibacillus')].groupby('Species')['Absolute Count'].sum()
    lysinibacillus_species_labels = lysinibacillus_species_counts.index.tolist()
    ax.pie(lysinibacillus_species_counts, labels=lysinibacillus_species_labels, radius=1-2*size, wedgeprops=dict(width=size, edgecolor='w'),
           labeldistance=0.4)

    plt.show()

create_nested_pie(df)

This gives:

[image]

Update: filtering

If your data is large, you can filter it to show specific labels by slightly modifying the function:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Phylum': ['Proteobacteria', 'Proteobacteria', 'Proteobacteria', 'Proteobacteria',
               'Firmicutes', 'Firmicutes', 'Actinobacteria', 'Proteobacteria',
               'Firmicutes', 'Proteobacteria'],
    'Genus': ['Pseudomonas', 'Klebsiella', 'Unclassified', 'Chromobacterium',
              'Lysinibacillus', 'Weissella', 'Corynebacterium', 'Cupriavidus',
              'Staphylococcus', 'Stenotrophomonas'],
    'Species': ['Unclassified', 'Unclassified', 'Unclassified', 'Unclassified',
                'boronitolerans', 'ghanensis', 'Unclassified', 'gilardii',
                'Unclassified', 'geniculata'],
    'Absolute Count': [73745, 10777, 4932, 1840, 1780, 1101, 703, 586, 568, 542]
}

df = pd.DataFrame(data)

def create_filtered_nested_pie(df, phylum_filter=None, genus_filter=None, species_filter=None):
    fig, ax = plt.subplots()
    size = 0.3
    
    if phylum_filter is not None:
        df = df[df['Phylum'].isin(phylum_filter)]
    phylum_counts = df.groupby('Phylum')['Absolute Count'].sum()
    ax.pie(phylum_counts, labels=phylum_counts.index.tolist(), radius=1, 
           wedgeprops=dict(width=size, edgecolor='w'))

    if genus_filter is not None:
        df_genus = df[df['Genus'].isin(genus_filter)]
    else:
        df_genus = df
    genus_counts = df_genus.groupby('Genus')['Absolute Count'].sum()
    ax.pie(genus_counts, labels=genus_counts.index.tolist(), radius=1-size, 
           wedgeprops=dict(width=size, edgecolor='w'), labeldistance=0.7)

    if species_filter is not None:
        df_species = df_genus[df_genus['Species'].isin(species_filter)]
    else:
        df_species = df_genus
    species_counts = df_species.groupby('Species')['Absolute Count'].sum()
    ax.pie(species_counts, labels=species_counts.index.tolist(), radius=1-2*size, 
           wedgeprops=dict(width=size, edgecolor='w'), labeldistance=0.4)

    plt.show()

create_filtered_nested_pie(df, 
                           phylum_filter=['Proteobacteria', 'Firmicutes'],
                           genus_filter=['Pseudomonas', 'Lysinibacillus'],
                           species_filter=['boronitolerans', 'Unclassified'])

Which gives:

[image]
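One thing worth noting about the filtered version: each `ax.pie` call normalizes its own input, so every ring spans the full 360 degrees no matter how much of the parent level survived the filter. The sketch below works through that arithmetic on the sample data; the helper `wedge_degrees` and the variable names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Sample data from the answer (only the columns needed here)
df = pd.DataFrame({
    'Phylum': ['Proteobacteria', 'Proteobacteria', 'Proteobacteria', 'Proteobacteria',
               'Firmicutes', 'Firmicutes', 'Actinobacteria', 'Proteobacteria',
               'Firmicutes', 'Proteobacteria'],
    'Genus': ['Pseudomonas', 'Klebsiella', 'Unclassified', 'Chromobacterium',
              'Lysinibacillus', 'Weissella', 'Corynebacterium', 'Cupriavidus',
              'Staphylococcus', 'Stenotrophomonas'],
    'Absolute Count': [73745, 10777, 4932, 1840, 1780, 1101, 703, 586, 568, 542],
})


def wedge_degrees(counts):
    """Angle in degrees each wedge receives when the ring is drawn on its own."""
    return counts / counts.sum() * 360


# Ring 1: all phyla
phylum_deg = wedge_degrees(df.groupby('Phylum')['Absolute Count'].sum())

# Ring 2: only the genera kept by the filter, normalized among themselves
kept = df[df['Genus'].isin(['Pseudomonas', 'Lysinibacillus'])]
genus_deg = wedge_degrees(kept.groupby('Genus')['Absolute Count'].sum())

# For comparison: the angle each genus would get relative to the full data set
true_deg = df.groupby('Genus')['Absolute Count'].sum() / df['Absolute Count'].sum() * 360

print(phylum_deg.round(1))
print(genus_deg.round(1))
print(true_deg[['Pseudomonas', 'Lysinibacillus']].round(1))
```

Because the filtered genus ring is rescaled to a full circle, its wedges are wider than their true share of the total and no longer line up with the phylum wedges around them.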

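To get a proportional pie chart and remove everything that was not selected (which then becomes transparent), I modified my function slightly so that it computes the proportions over the full data and "hides" whatever the filters exclude: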

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'Phylum': ['Proteobacteria', 'Proteobacteria', 'Proteobacteria', 'Proteobacteria',
               'Firmicutes', 'Firmicutes', 'Actinobacteria', 'Proteobacteria',
               'Firmicutes', 'Proteobacteria'],
    'Genus': ['Pseudomonas', 'Klebsiella', 'Unclassified', 'Chromobacterium',
              'Lysinibacillus', 'Weissella', 'Corynebacterium', 'Cupriavidus',
              'Staphylococcus', 'Stenotrophomonas'],
    'Species': ['Unclassified', 'Unclassified', 'Unclassified', 'Unclassified',
                'boronitolerans', 'ghanensis', 'Unclassified', 'gilardii',
                'Unclassified', 'geniculata'],
    'Absolute Count': [3745, 10777, 4932, 1840, 1780, 1101, 703, 586, 568, 542]
}
df = pd.DataFrame(data)

def create_selective_label_pie(df, phylum_filter=None, genus_filter=None, species_filter=None):
    fig, ax = plt.subplots()
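    size = 0.3

    # A filter left as None keeps every value at that level
    # (otherwise the `in` checks below raise a TypeError on None)
    if phylum_filter is None:
        phylum_filter = set(df['Phylum'])
    if genus_filter is None:
        genus_filter = set(df['Genus'])
    if species_filter is None:
        species_filter = set(df['Species'])
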
    total_phylum_counts = df.groupby('Phylum')['Absolute Count'].sum()
    phylum_colors = ['blue' if phylum in phylum_filter else 'none' for phylum in total_phylum_counts.index]
    phylum_labels = [phylum if phylum in phylum_filter else "" for phylum in total_phylum_counts.index]
    ax.pie(total_phylum_counts, labels=phylum_labels, colors=phylum_colors, radius=1,
           wedgeprops=dict(width=size, edgecolor='w'))

    total_genus_counts = df.groupby(['Phylum', 'Genus'])['Absolute Count'].sum()
    genus_colors = ['green' if (phylum in phylum_filter and genus in genus_filter) else 'none' 
                    for (phylum, genus) in total_genus_counts.index]
    genus_labels = [genus if (phylum in phylum_filter and genus in genus_filter) else ""
                    for (phylum, genus) in total_genus_counts.index]
    ax.pie(total_genus_counts, labels=genus_labels, colors=genus_colors, radius=1-size,
           wedgeprops=dict(width=size, edgecolor='w'), labeldistance=0.7)

    total_species_counts = df.groupby(['Phylum', 'Genus', 'Species'])['Absolute Count'].sum()
    species_colors = ['red' if (phylum in phylum_filter and genus in genus_filter and species in species_filter) else 'none'
                      for (phylum, genus, species) in total_species_counts.index]
    species_labels = [species if (phylum in phylum_filter and genus in genus_filter and species in species_filter) else ""
                      for (phylum, genus, species) in total_species_counts.index]
    ax.pie(total_species_counts, labels=species_labels, colors=species_colors, radius=1-2*size,
           wedgeprops=dict(width=size, edgecolor='w'), labeldistance=0.4)

    plt.title('Pie Chart with Selective Labels and Transparency')
    plt.show()

create_selective_label_pie(df, 
                           phylum_filter=['Firmicutes'],
                           genus_filter=['Weissella'],
                           species_filter=['boronitolerans'])

This gives (a nonsensical example, but an illustrative one):

[image]


create_selective_label_pie(df, 
                           phylum_filter=['Proteobacteria', 'Firmicutes'],
                           genus_filter=['Lysinibacillus'],
                           species_filter=['boronitolerans'])

would give:

[image]

For varying the colors, you will have to work something out yourself.
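As one possible starting point for the color variation, the sketch below gives each phylum its own sequential colormap and draws its genera in lighter shades of the same map. The `PHYLUM_CMAPS` mapping, the `shaded_colors` helper, the shade range, and the output file name are all assumptions made for illustration and are not part of the answer above. It also assumes a non-interactive backend and saves the figure instead of showing it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Phylum': ['Proteobacteria', 'Proteobacteria', 'Proteobacteria', 'Proteobacteria',
               'Firmicutes', 'Firmicutes', 'Actinobacteria', 'Proteobacteria',
               'Firmicutes', 'Proteobacteria'],
    'Genus': ['Pseudomonas', 'Klebsiella', 'Unclassified', 'Chromobacterium',
              'Lysinibacillus', 'Weissella', 'Corynebacterium', 'Cupriavidus',
              'Staphylococcus', 'Stenotrophomonas'],
    'Absolute Count': [73745, 10777, 4932, 1840, 1780, 1101, 703, 586, 568, 542],
})

# Hypothetical choice: one sequential Matplotlib colormap per phylum
PHYLUM_CMAPS = {
    'Actinobacteria': 'Oranges',
    'Firmicutes': 'Greens',
    'Proteobacteria': 'Blues',
}


def shaded_colors(df, phylum_cmaps=PHYLUM_CMAPS):
    """Return counts and matching colors for the phylum ring and the genus ring."""
    # groupby sorts its keys, so both rings follow the same phylum order
    phylum_counts = df.groupby('Phylum')['Absolute Count'].sum()
    genus_counts = df.groupby(['Phylum', 'Genus'])['Absolute Count'].sum()

    # Parent wedges use a dark shade of their colormap
    phylum_colors = [plt.get_cmap(phylum_cmaps[p])(0.8) for p in phylum_counts.index]

    genus_colors = []
    for phylum in phylum_counts.index:
        cmap = plt.get_cmap(phylum_cmaps[phylum])
        n_genera = len(genus_counts.loc[phylum])
        # Lighter shades (0.3 to 0.7) so the children stay distinguishable from the parent
        genus_colors.extend(cmap(s) for s in np.linspace(0.3, 0.7, n_genera))
    return phylum_counts, phylum_colors, genus_counts, genus_colors


phylum_counts, phylum_colors, genus_counts, genus_colors = shaded_colors(df)

fig, ax = plt.subplots()
size = 0.3
ax.pie(phylum_counts, labels=phylum_counts.index.tolist(), colors=phylum_colors,
       radius=1, wedgeprops=dict(width=size, edgecolor='w'))
ax.pie(genus_counts, colors=genus_colors, radius=1 - size,
       wedgeprops=dict(width=size, edgecolor='w'))
fig.savefig('nested_pie_colors.png')
```

Both rings here are computed from the unfiltered data, so the genus wedges sit directly under their parent phylum; the same color lists could be combined with the `'none'` trick above to hide wedges that a filter excludes.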
