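I am plotting two histograms with seaborn's histplot function. The first histogram represents my entire dataset, and the second is a subset of the first. However, the second histogram does not overlap the first as I expected. Here is the code I am using: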
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
data = np.sin(np.arange(0, 6*np.pi, 0.1)) * 100
sns.scatterplot(x=[np.mean(data)], y=[0])
sns.lineplot(data)
population_size = 10000
sample_size = 100
total_means = []
for _ in range(population_size):
    total_means.append(np.mean(np.random.choice(data, sample_size)))
total_means = np.array(total_means)
sns.histplot(total_means, kde=True)
# Q. Find the interval in which 68% of the data will lie
z1 = norm.ppf(.50 - .68/2)
se = np.array(data).std() / sample_size ** .5
x1 = z1 * se + np.array(data).mean()
z2 = norm.ppf(.50 + .68/2)
x2 = z2 * se + np.array(data).mean()
print(x1, x2)
plt.xticks(np.arange(total_means.min(), total_means.max(), 10))
plt.xticks(np.arange(0, 500, 100))
sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')
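I know that posting complete code is discouraged on Stack Overflow. However, I have some data that can be used to troubleshoot the problem quickly without having to generate new data.
In my code, the last two lines draw the two histograms. However, as the resulting plot shows, these histograms do not overlap as expected.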
sns.histplot(total_means, kde=True)
sns.histplot(total_means[(total_means >= x1) & (total_means <= x2)], kde=True, color='r')
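One thing I suspect, but have not confirmed, is that each histplot call chooses its own bin edges from the data it receives, so the subset ends up on a different bin grid than the full data. The sketch below illustrates that idea with NumPy only. The fixed seed, the smaller number of resamples, the one-standard-deviation interval standing in for x1 and x2, and the use of NumPy's "auto" rule are all assumptions made for illustration; seaborn's internal bin rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
data = np.sin(np.arange(0, 6 * np.pi, 0.1)) * 100
total_means = np.array([rng.choice(data, 100).mean() for _ in range(2000)])

# Stand-in for the x1/x2 interval: one standard deviation around the mean.
lo = total_means.mean() - total_means.std()
hi = total_means.mean() + total_means.std()
subset = total_means[(total_means >= lo) & (total_means <= hi)]

# Bin edges chosen independently for each array, as when no bins are specified.
full_edges = np.histogram_bin_edges(total_means, bins="auto")
subset_edges = np.histogram_bin_edges(subset, bins="auto")

# Reusing the full data's edges puts both histograms on the same grid.
full_counts, _ = np.histogram(total_means, bins=full_edges)
subset_counts, _ = np.histogram(subset, bins=full_edges)

print("full bin width:  ", full_edges[1] - full_edges[0])
print("subset bin width:", subset_edges[1] - subset_edges[0])
```

If this is the cause, passing the same edges to both calls, for example `bins=full_edges` in each `sns.histplot(...)`, should make the red bars line up with the bins of the full histogram.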