I have a DataFrame that looks like this:
    col1
0     10
1      5
2      8
3     12
4     13
5      6
6      9
7     11
8     10
9      3
10    21
11    18
12    14
13    16
14    30
15    45
16    31
17    40
18    38
For each cell in 'col1', I compute a range of values:
df['df_min'] = df.col1 - df.col1 * 0.2
df['df_max'] = df.col1 + df.col1 * 0.2
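Each cell therefore has its own range, and I want to count how many of the previous xx cells (3 in this example) fall within that range, but without a loop, because a loop takes a very long time with my actual model.
I'm trying to achieve a result like this: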
    col1  df_min  df_max  counter
0     10     8.0    12.0       -1
1      5     4.0     6.0       -1
2      8     6.4     9.6       -1
3     12     9.6    14.4        1
4     13    10.4    15.6        1
5      6     4.8     7.2        0
6      9     7.2    10.8        0
7     11     8.8    13.2        2
8     10     8.0    12.0        2
9      3     2.4     3.6        0
10    21    16.8    25.2        0
11    18    14.4    21.6        1
12    14    11.2    16.8        0
13    16    12.8    19.2        2
14    30    24.0    36.0        0
15    45    36.0    54.0        0
16    31    24.8    37.2        1
17    40    32.0    48.0        1
18    38    30.4    45.6        3
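To make the counter column concrete, here is a small sketch that works out a single row by hand (row 7, value 11). The variable names `i`, `previous`, and `count_row_7` are only illustrative and are not part of my actual code.

```python
import pandas as pd

col1 = pd.Series([10, 5, 8, 12, 13, 6, 9, 11, 10, 3,
                  21, 18, 14, 16, 30, 45, 31, 40, 38])
back = 3  # number of cells to look back
i = 7     # row being checked (value 11)

value = col1.iloc[i]
low = value - value * 0.2    # lower bound of the range, about 8.8
high = value + value * 0.2   # upper bound of the range, about 13.2

# The three cells before row 7 are 13, 6 and 9.
previous = col1.iloc[i - back:i]

# between() is inclusive on both ends by default; 13 and 9 are in range.
count_row_7 = int(previous.between(low, high).sum())
print(count_row_7)  # 2
```

Rows 0 to 2 do not have three previous cells, which is why they show -1 in the table.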
Below is the (messy) code I came up with, but I would really like a faster solution if possible. Any help would be appreciated.
import pandas as pd

df = pd.DataFrame({"col1": [10, 5, 8, 12, 13, 6, 9, 11, 10, 3, 21, 18, 14, 16, 30, 45, 31, 40, 38]})
back = 3  # number of cells to look back
df['df_min'] = df.col1 - df.col1 * 0.2
df['df_max'] = df.col1 + df.col1 * 0.2
l = []
for window in df.col1.rolling(window=back+1, center=False, closed='right'):
    if window.empty:
        pass
    else:
        a = window.iloc[-1]  # the value being tested is the last one in the window
        range_min = a - a * 0.2
        range_max = a + a * 0.2
        c = 0
        if len(window) == back+1:  # only count when the full look-back is available
            for b in window:
                if (b >= range_min and b <= range_max):
                    c += 1
        c = c-1  # subtract 1 because the window includes the tested value, which is always in range
        l.append(c)
df1 = pd.DataFrame(l, columns=['counter'])
df = df.join(df1)
print(df)
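For comparison, one possible loop-free direction is sketched below. It assumes NumPy 1.20 or later (for `sliding_window_view`), and the function name `count_previous_in_range` and its `tol` parameter are hypothetical names made up for this sketch. It builds every window of `back + 1` values at once and compares the previous cells against the range of the last cell using broadcasting. I have not benchmarked it on the real data.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def count_previous_in_range(values, back=3, tol=0.2):
    """For each element v, count how many of the `back` preceding elements
    lie in [v - v*tol, v + v*tol]. Rows without a full history get -1."""
    values = np.asarray(values, dtype=float)
    counter = np.full(len(values), -1, dtype=int)
    if len(values) <= back:
        return counter  # no row has a full look-back window

    # Each row of `windows` holds `back` previous values plus the current one.
    windows = sliding_window_view(values, back + 1)
    previous = windows[:, :-1]   # shape (n - back, back)
    current = windows[:, -1:]    # shape (n - back, 1), broadcasts over columns

    low = current - current * tol
    high = current + current * tol
    counter[back:] = ((previous >= low) & (previous <= high)).sum(axis=1)
    return counter


df = pd.DataFrame({"col1": [10, 5, 8, 12, 13, 6, 9, 11, 10, 3,
                            21, 18, 14, 16, 30, 45, 31, 40, 38]})
df["df_min"] = df.col1 - df.col1 * 0.2
df["df_max"] = df.col1 + df.col1 * 0.2
df["counter"] = count_previous_in_range(df["col1"], back=3)
print(df)
```

The window array itself is a view and copies no data, but the boolean comparison allocates an array of shape (n - back, back), so memory use grows with the look-back length.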