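I wrote a for loop that takes a value from a column and looks ahead to check whether the later data crosses that value once, or two or more times. The code works, but because the dataset it runs on is very large, it becomes very slow. I suspect the main cause is that every iteration counts how many times the value is exceeded (over roughly 500,000 rows). Is there a way to speed this up?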
import pandas as pd
df1 = pd.DataFrame({'index': [0,1,2,3,4], 'Time': ['2022-01-01','2022-01-02','2022-01-03','2022-01-04','2022-01-05'], 'A':[234,456,323,576,234], 'B': [0,1,0,1,0], 'B.v': [0,234,0,323,0], 'in' : [0,0,0,0,0], 'out':[0,0,0,0,0]})
def calc(df1):
    # Only rows flagged with B == 1 are checked
    df2 = pd.DataFrame(df1[df1['B'] == 1])
    for x in range(len(df2)):
        index = df2.iloc[x, df2.columns.get_loc('index')]
        tvalue = df2.iloc[x, df2.columns.get_loc('A')]
        pointvalue = df2.iloc[x, df2.columns.get_loc('B.v')]
        # All values of A from the current row to the end
        postrates = df1['A'].values[range(index,len(df1))]
        # 'in': how often A falls below B.v (1 = once, 2 = twice or more)
        if sum(pointvalue > postrates) == 1:
            df1.iloc[index, df1.columns.get_loc('in')] = 1
        if sum(pointvalue > postrates) >= 2:
            df1.iloc[index, df1.columns.get_loc('in')] = 2
        # 'out': how often A rises above the current A (1 = once, 2 = twice or more)
        if sum(tvalue < postrates) == 1:
            df1.iloc[index, df1.columns.get_loc('out')] = 1
        if sum(tvalue < postrates) >= 2:
            df1.iloc[index, df1.columns.get_loc('out')] = 2
    return df1

if __name__ == "__main__":
    print(calc(df1))
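
One possible direction, offered as a sketch rather than a definitive answer: because the result only distinguishes "never", "once", and "twice or more", it may be enough to know the two smallest and the two largest values of A from each row to the end, instead of recounting the whole tail on every iteration. The function name calc_fast is hypothetical, and the sketch assumes that the 'index' column equals the row position (as in the sample data). Unlike calc, it returns a copy instead of modifying df1 in place.

```python
import numpy as np
import pandas as pd


def calc_fast(df):
    """Hypothetical single-pass variant of calc().

    Assumes the 'index' column equals the row position.
    """
    a = df['A'].to_numpy(dtype=float)
    vals = a.tolist()  # plain Python floats iterate faster in the loop below
    n = len(vals)

    # For each position i: the two smallest and the two largest values of
    # a[i:] (duplicates count separately), filled in by one backward pass.
    min1 = np.full(n, np.inf)
    min2 = np.full(n, np.inf)
    max1 = np.full(n, -np.inf)
    max2 = np.full(n, -np.inf)
    lo1 = lo2 = float('inf')
    hi1 = hi2 = float('-inf')
    for i in range(n - 1, -1, -1):
        v = vals[i]
        if v < lo1:
            lo1, lo2 = v, lo1
        elif v < lo2:
            lo2 = v
        if v > hi1:
            hi1, hi2 = v, hi1
        elif v > hi2:
            hi2 = v
        min1[i], min2[i] = lo1, lo2
        max1[i], max2[i] = hi1, hi2

    pv = df['B.v'].to_numpy(dtype=float)
    is_b = df['B'].to_numpy() == 1

    # Capped counts (0, 1 or 2): at least two tail values lie below B.v
    # exactly when the second-smallest tail value does, and likewise above A.
    in_count = (min1 < pv).astype(int) + (min2 < pv).astype(int)
    out_count = (max1 > a).astype(int) + (max2 > a).astype(int)

    out = df.copy()
    # Like calc(), only overwrite rows with B == 1 and a non-zero count.
    out['in'] = np.where(is_b & (in_count > 0), in_count, out['in'])
    out['out'] = np.where(is_b & (out_count > 0), out_count, out['out'])
    return out


df1 = pd.DataFrame({'index': [0,1,2,3,4], 'Time': ['2022-01-01','2022-01-02','2022-01-03','2022-01-04','2022-01-05'], 'A':[234,456,323,576,234], 'B': [0,1,0,1,0], 'B.v': [0,234,0,323,0], 'in' : [0,0,0,0,0], 'out':[0,0,0,0,0]})
result = calc_fast(df1)
print(result['in'].tolist())   # [0, 0, 0, 1, 0]
print(result['out'].tolist())  # [0, 1, 0, 0, 0]
```

If a smaller change is preferred, computing each count once per row with np.count_nonzero instead of the built-in sum would likely already help, since the built-in sum walks a NumPy array element by element in Python. That keeps the work proportional to the tail length for every flagged row, whereas the sketch above does one pass over the data regardless of how many rows have B == 1.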