I have a problem similar to this question; I am trying to combine three Seaborn plots, but the labels on my y-axis do not align with the bars.

My code (now a working copy-paste example):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

### Generate example data
np.random.seed(123)
year = [2018, 2019, 2020, 2021]
task = [x + 2 for x in range(18)]
student = [x for x in range(200)]
amount = [x + 10 for x in range(90)]
violation = [letter for letter in "thisisjustsampletextforlabels"] # one letter labels

df_example = pd.DataFrame({

    # some ways to create random data
    'year':np.random.choice(year,500),
    'task':np.random.choice(task,500),
    'violation':np.random.choice(violation, 500),
    'amount':np.random.choice(amount, 500),
    'student':np.random.choice(student, 500)
})

### My code
temp = df_example.groupby(["violation"])["amount"].sum().sort_values(ascending = False).reset_index()
total_violations = temp["amount"].sum()
sns.set(font_scale = 1.2)


f, axs = plt.subplots(1,3,
                      figsize=(5,5),
                      sharey="row",
                      gridspec_kw=dict(width_ratios=[3,1.5,5]))

# Plot frequency
df1 = df_example.groupby(["year","violation"])["amount"].sum().sort_values(ascending = False).reset_index()
frequency = sns.barplot(data = df1, y = "violation", x = "amount", log = True, ax=axs[0])


# Plot percent
df2 = df_example.groupby(["violation"])["amount"].sum().sort_values(ascending = False).reset_index()
total_violations = df2["amount"].sum()
percent = sns.barplot(x='amount', y='violation', estimator=lambda x: sum(x) / total_violations * 100, data=df2, ax=axs[1])

# Pivot table and plot heatmap 
df_heatmap = df_example.groupby(["violation", "task"])["amount"].sum().sort_values(ascending = False).reset_index()
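# keyword arguments: recent pandas versions no longer accept these positionally
df_heatmap_pivot = df_heatmap.pivot(index="violation", columns="task", values="amount")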
df_heatmap_pivot = df_heatmap_pivot.reindex(index=df_heatmap["violation"].unique())
heatmap = sns.heatmap(df_heatmap_pivot, fmt = "d", cmap="Greys", norm=LogNorm(), ax=axs[2])
plt.subplots_adjust(top=1)


axs[2].set_facecolor('xkcd:white')
axs[2].set(ylabel="",xlabel="Task")

axs[0].set_xlabel('Total amount of violations per year')
axs[1].set_xlabel('Percent (%)')

axs[1].set_ylabel('')
axs[0].set_ylabel('Violation')

The result can be seen here:

Barplot and Heatmap

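The y labels are aligned according to my last plot, the heatmap. However, the bars in the bar plots are clipped at the top and do not line up with the labels. I only need to nudge the bars within the bar plots, but how? I have been looking through the documentation, but so far I feel quite clueless.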

Recommended answer

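  • See here: none of the yticklabels are aligned, because multiple dataframes are used for plotting. It is better to create a single violations dataframe holding the aggregated data to be plotted. Start with the sum of amounts by violation, then add a new percent column. This ensures the two bar plots have the same y-axis.
  • Instead of using .groupby and then .pivot to create df_heatmap_pivot, use .pivot_table, and then reindex it with the violation column of that aggregated dataframe.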
  • Tested in 100, 101, 102, 103
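
The first point can be illustrated without plotting. The sketch below rebuilds the example data and lists the order in which each of the question's three aggregations first mentions every violation. As far as I understand, seaborn draws string categories in order of first appearance, so three different orders would mean three different y-axes. The helper name first_seen_order is hypothetical, and no particular output is claimed.

```python
import numpy as np
import pandas as pd

# Rebuild the example data with the same seed and call order as the question
np.random.seed(123)
year = [2018, 2019, 2020, 2021]
task = [x + 2 for x in range(18)]
student = list(range(200))
amount = [x + 10 for x in range(90)]
violation = list("thisisjustsampletextforlabels")

df_example = pd.DataFrame({
    "year": np.random.choice(year, 500),
    "task": np.random.choice(task, 500),
    "violation": np.random.choice(violation, 500),
    "amount": np.random.choice(amount, 500),
    "student": np.random.choice(student, 500),
})


def first_seen_order(keys):
    """Hypothetical helper: order of violations after grouping by `keys`
    and sorting the summed amounts in descending order."""
    grouped = (
        df_example.groupby(keys)["amount"]
        .sum()
        .sort_values(ascending=False)
        .reset_index()
    )
    return list(grouped["violation"].unique())


order_frequency = first_seen_order(["year", "violation"])  # question's df1
order_percent = first_seen_order(["violation"])            # question's df2
order_heatmap = first_seen_order(["violation", "task"])    # question's df_heatmap

print(order_frequency)
print(order_percent)
print(order_heatmap)
print(order_frequency == order_percent == order_heatmap)
```

Each list contains the same violations, but nothing forces them into the same order, which is why a single aggregated dataframe is the safer source for all three axes.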

Imports and DataFrame

import pandas as pd
import numpy as np
import seaborn as sns  # needed for the sns.* plotting calls below
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Generate example data
year = [2018, 2019, 2020, 2021]
task = [x + 2 for x in range(18)]
student = [x for x in range(200)]
amount = [x + 10 for x in range(90)]
violation = list("thisisjustsampletextforlabels")  # one letter labels

np.random.seed(123)
df_example = pd.DataFrame({name: np.random.choice(group, 500) for name, group in
                           zip(['year', 'task', 'violation', 'amount', 'student'],
                               [year, task, violation, amount, student])})

# organize all of the data
# violations frequency
violations = df_example.groupby(["violation"])["amount"].sum().sort_values(ascending=False).reset_index()
total_violations = violations["amount"].sum()

# add percent
violations['percent'] = violations.amount.div(total_violations).mul(100).round(2)

# Use .pivot_table to create the pivot table
df_heatmap_pivot = df_example.pivot_table(index='violation', columns='task', values='amount', aggfunc='sum')
# Set the index to match the plot order of the 'violation' column
df_heatmap_pivot = df_heatmap_pivot.reindex(index=violations.violation)

Plotting

  • Using sharey='row' causes the alignment problem. Use sharey=False, and remove the yticklabels from axs[1] and axs[2] with axs[1 or 2].set_yticks([]).
  • For additional details and examples of using .bar_label, see How to add value labels on a bar chart.

# set seaborn plot format
sns.set(font_scale=1.2)

# create the figure and set sharey=False
f, axs = plt.subplots(1, 3, figsize=(12, 12), sharey=False, gridspec_kw=dict(width_ratios=[3, 1.5, 5]))

# Plot frequency
sns.barplot(data=violations, x="amount", y="violation", log=True, ax=axs[0])

# Plot percent
sns.barplot(data=violations, x='percent', y='violation', ax=axs[1])

# add the bar labels
axs[1].bar_label(axs[1].containers[0], fmt='%.2f%%', label_type='edge', padding=3)
# add extra space for the annotation
axs[1].margins(x=1.3)

# plot the heatmap
heatmap = sns.heatmap(df_heatmap_pivot, fmt="d", cmap="Greys", norm=LogNorm(), ax=axs[2])

# additional formatting
axs[2].set_facecolor('xkcd:white')
axs[2].set(ylabel="", xlabel="Task")

axs[0].set_xlabel('Total amount of violations per year')
axs[1].set_xlabel('Percent (%)')

axs[1].set_ylabel('')
axs[0].set_ylabel('Violation')

# remove yticks / labels
axs[1].set_yticks([])
_ = axs[2].set_yticks([])
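
One plausible reason a shared y-axis misplaces the bars is that the two plot types put their rows at different coordinates. The sketch below uses plain matplotlib as a stand-in, on the assumption that seaborn's bar plot and heatmap behave like barh and pcolormesh in this respect: categorical bars are centred on integer positions, while mesh cells span whole units and are therefore centred on half-integers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

labels = list("abcd")
fig, (ax_bar, ax_mesh) = plt.subplots(1, 2, sharey=False)

# Horizontal bars on string categories: matplotlib maps them to 0, 1, 2, ...
ax_bar.barh(labels, [4, 3, 2, 1])
bar_centers = [p.get_y() + p.get_height() / 2 for p in ax_bar.patches]

# A 4-row mesh: row i covers the interval [i, i + 1] on the y-axis
ax_mesh.pcolormesh(np.arange(8).reshape(4, 2))
mesh_centers = [i + 0.5 for i in range(len(labels))]

print(bar_centers)         # bar centres, one per category
print(ax_mesh.get_ylim())  # y-range covered by the mesh
print(mesh_centers)        # cell centres, half a unit above the bar centres
plt.close(fig)
```

With a shared axis both panels are forced onto one set of limits, so the half-unit difference shows up as bars sitting beside their labels and a clipped bar at one end. Independent axes let each plot keep its own limits.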


  • Comment out the last two lines to verify that the yticklabels of each of the axs are aligned.
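
The same check can be approximated on the data itself, without reading tick labels off a figure. This is a minimal sketch that rebuilds a comparable dataset (not necessarily identical to the one above) and compares the row order of the pivot table with the row order of the aggregated dataframe. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    "task": np.random.choice(np.arange(2, 20), 500),
    "violation": np.random.choice(list("thisisjustsampletextforlabels"), 500),
    "amount": np.random.choice(np.arange(10, 100), 500),
})

# Aggregated totals, largest first: this order drives both bar plots
violations = (
    df.groupby("violation")["amount"].sum()
    .sort_values(ascending=False)
    .reset_index()
)

# pivot_table sorts its index alphabetically by default
unsorted_pivot = df.pivot_table(index="violation", columns="task",
                                values="amount", aggfunc="sum")
# Reindexing puts the heatmap rows in the same order as the bars
aligned_pivot = unsorted_pivot.reindex(index=violations["violation"])

rows_match = list(aligned_pivot.index) == list(violations["violation"])
print(list(unsorted_pivot.index))
print(list(aligned_pivot.index))
print(rows_match)
```

If rows_match is False after a change to the aggregation, the heatmap rows and the bars no longer describe the same violation in the same position.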


DataFrame Views

df_example.head()

   year  task violation  amount  student
0  2020     2         i      84       59
1  2019     2         u      12      182
2  2020     5         s      20        9
3  2020    11         u      56      163
4  2018    17         t      59      125

violations

   violation  amount  percent
0          s    4869    17.86
1          l    3103    11.38
2          t    3044    11.17
3          e    2634     9.66
4          a    2177     7.99
5          i    2099     7.70
6          h    1275     4.68
7          f    1232     4.52
8          b    1191     4.37
9          m    1155     4.24
10         o    1075     3.94
11         p     763     2.80
12         r     762     2.80
13         j     707     2.59
14         u     595     2.18
15         x     578     2.12
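
The percent column can be recomputed from the amounts in the table above as a quick sanity check. The sketch copies the printed amounts by hand and applies the same div, mul and round chain as the answer.

```python
import pandas as pd

# Amounts copied from the table above, indexed by violation letter
amounts = pd.Series(
    [4869, 3103, 3044, 2634, 2177, 2099, 1275, 1232,
     1191, 1155, 1075, 763, 762, 707, 595, 578],
    index=list("slteaihfbmoprjux"),
    name="amount",
)

total = amounts.sum()                            # 27259
percent = amounts.div(total).mul(100).round(2)   # share of the total, in percent

print(total)
print(percent.head(3))
```

For example, the first row works out as 4869 / 27259 * 100, which rounds to 17.86.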

df_heatmap_pivot

task          2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17     18     19
violation                                                                                                                              
s           62.0   36.0  263.0  273.0  191.0  250.0  556.0  239.0  230.0  188.0  185.0  516.0  249.0  331.0  212.0  219.0  458.0  411.0
l           83.0  245.0  264.0  451.0  155.0  314.0   98.0  125.0  310.0  117.0   21.0   99.0   98.0   50.0   40.0  268.0  192.0  173.0
t          212.0  255.0   45.0  141.0   74.0  135.0   52.0  202.0  107.0  128.0  158.0    NaN  261.0  137.0  339.0  207.0  362.0  229.0
e          215.0  315.0    NaN  116.0  213.0  165.0  130.0  194.0   56.0  355.0   75.0    NaN  118.0  189.0  160.0  177.0   79.0   77.0
a          135.0    NaN  165.0  156.0  204.0  115.0   77.0   65.0   80.0  143.0   83.0  146.0   21.0   29.0  285.0   72.0  116.0  285.0
i          209.0    NaN   20.0  187.0   83.0  136.0   24.0  132.0  257.0   56.0  201.0   52.0  136.0  226.0  104.0  145.0   91.0   40.0
h           27.0    NaN  255.0    NaN   99.0    NaN   71.0   53.0  100.0   89.0    NaN  106.0    NaN  170.0   86.0   79.0  140.0    NaN
f           75.0   23.0   99.0    NaN   26.0  103.0    NaN  185.0   99.0  145.0    NaN   63.0   64.0   29.0  114.0  141.0   38.0   28.0
b           44.0   70.0   56.0   12.0   55.0   14.0  158.0  130.0    NaN   11.0   21.0    NaN   52.0  137.0  162.0    NaN  231.0   38.0
m           86.0    NaN    NaN  147.0   74.0  131.0   49.0  180.0   94.0   16.0    NaN   88.0    NaN    NaN    NaN   51.0  161.0   78.0
o          109.0    NaN   51.0    NaN    NaN    NaN   20.0  139.0  149.0    NaN  101.0   60.0    NaN  143.0   39.0   73.0   10.0  181.0
p           16.0    NaN  197.0   50.0   87.0    NaN   88.0    NaN   11.0  162.0    NaN   14.0    NaN   78.0   45.0    NaN    NaN   15.0
r            NaN   85.0   73.0   40.0    NaN    NaN   68.0   77.0    NaN   26.0  122.0  105.0    NaN   98.0    NaN    NaN    NaN   68.0
j            NaN   70.0    NaN    NaN   73.0   76.0    NaN  150.0    NaN    NaN    NaN   81.0    NaN   97.0   97.0   63.0    NaN    NaN
u          174.0   45.0    NaN    NaN   32.0    NaN    NaN   86.0   30.0   56.0   13.0    NaN   24.0    NaN    NaN   69.0   54.0   12.0
x           69.0   29.0    NaN  106.0    NaN   43.0    NaN    NaN    NaN   97.0   56.0   29.0  149.0    NaN    NaN    NaN    NaN    NaN
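
The NaN cells mark violation/task combinations that never occur in the data. A small sketch on made-up toy data suggests that the one-step .pivot_table call gives the same table as the question's two-step .groupby followed by .pivot, including the NaN for a missing combination.

```python
import pandas as pd

# Toy data (made up): violation "a" never occurs with task 3
toy = pd.DataFrame({
    "violation": ["a", "a", "b", "b", "b"],
    "task": [2, 2, 2, 3, 3],
    "amount": [10, 5, 7, 1, 2],
})

# Two steps, as in the question: aggregate, then reshape
two_step = (
    toy.groupby(["violation", "task"])["amount"].sum()
    .reset_index()
    .pivot(index="violation", columns="task", values="amount")
)

# One step, as in the answer: aggregate while reshaping
one_step = toy.pivot_table(index="violation", columns="task",
                           values="amount", aggfunc="sum")

print(one_step)
print(two_step.equals(one_step))
```

The one-step form is shorter and leaves only the reindex to control row order.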
