Here is a snippet of my data frame:

     species  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  predicted_species
0     Adelie              18             18                181         3750          Chinstrap
1     Adelie              17             17                186         3800             Adelie
2     Adelie              18             18                195         3250             Gentoo
3     Adelie               0              0                  0            0             Adelie
4  Chinstrap              19             19                193         3450          Chinstrap
5  Chinstrap              20             20                190         3650             Gentoo
6  Chinstrap              17             17                181         3625             Adelie
7     Gentoo              19             19                195         4675          Chinstrap
8     Gentoo              18             18                193         3475             Gentoo
9     Gentoo              20             20                190         4250             Gentoo

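I want to make a biplot of my data, something like this: [image]

But I want a biplot for each cell of the species vs. predicted_species matrix, so 9 subplots, each like the one above. I am not sure how to achieve this. One way might be to split the data frame into several data frames and make a biplot for each of them, but that is not very efficient and makes the plots hard to compare.

Could anyone give some advice on how to do this?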

Recommended answer

Combining the answer by Qiyun Zhu on how to plot a biplot with my answer on how to split the plot into true vs. predicted subsets, you could do it like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load iris data.
iris = sns.load_dataset('iris')
X = iris.iloc[:, :4].values
y = iris.iloc[:, 4].values
features = iris.columns[:4]
targets = ['setosa', 'versicolor', 'virginica']

# Mock up some predictions.
iris['species_pred'] = (40 * ['setosa'] + 5 * ['versicolor'] + 5 * ['virginica']
                        + 40 * ['versicolor'] + 5 * ['setosa'] + 5 * ['virginica']
                        + 40 * ['virginica'] + 5 * ['versicolor'] + 5 * ['setosa'])

# Reduce features to two dimensions.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_reduced = pca.transform(X_scaled)
iris[['pc1', 'pc2']] = X_reduced


def biplot(x, y, data=None, **kwargs):
    # Plot data points.
    sns.scatterplot(data=data, x=x, y=y, **kwargs)
    
    # Calculate arrow parameters.
    loadings = pca.components_[:2].T
    pvars = pca.explained_variance_ratio_[:2] * 100
    arrows = loadings * np.ptp(X_reduced, axis=0)
    width = -0.0075 * np.min([np.subtract(*plt.xlim()), np.subtract(*plt.ylim())])

    # Plot arrows.
    horizontal_alignment = ['right', 'left', 'right', 'right']
    vertical_alignment = ['bottom', 'top', 'top', 'bottom']
    for (i, arrow), ha, va in zip(enumerate(arrows), 
                                  horizontal_alignment, vertical_alignment):
        plt.arrow(0, 0, *arrow, color='k', alpha=0.5, width=width, ec='none',
                  length_includes_head=True)
        plt.text(*(arrow * 1.05), features[i], ha=ha, va=va, 
                 fontsize='small', color='gray')

    
# Plot small multiples, corresponding to confusion matrix.
sns.set()
g = sns.FacetGrid(iris, row='species', col='species_pred', 
                  hue='species', margin_titles=True)
g.map(biplot, 'pc1', 'pc2')
plt.show()

[image: biplot split into nine parts]
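Since the grid of small multiples mirrors a confusion matrix, it can help to tabulate how many points each panel should contain before plotting. The following is only a minimal sketch, assuming the column names `species` and `predicted_species` from the question's data frame and using just the ten rows shown there; the variable names `df` and `counts` are illustrative.

```python
import pandas as pd

# True and predicted labels copied from the ten rows in the question.
df = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 3 + ["Gentoo"] * 3,
    "predicted_species": [
        "Chinstrap", "Adelie", "Gentoo", "Adelie",
        "Chinstrap", "Gentoo", "Adelie",
        "Chinstrap", "Gentoo", "Gentoo",
    ],
})

# Rows are the true species and columns the predicted species, which is
# the same row/col layout that the FacetGrid call above uses.
counts = pd.crosstab(df["species"], df["predicted_species"])
print(counts)
```

A zero in this table (here, Gentoo predicted as Adelie) corresponds to a facet with no data points, so that panel would likely stay empty in the grid.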
