I have two DataFrames:
import pandas as pd
df1 = pd.DataFrame(
    {
        'date': ['2024-01-01','2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08', '2024-01-09', '2024-01-10', '2024-01-11', '2024-01-12', '2024-01-13'],
        'price': list(range(13))
    }
)
df2 = pd.DataFrame(
    {
        'start': ['2024-01-01', '2024-01-03', '2024-01-10'],
        'end': ['2024-01-03', '2024-01-08', '2024-01-12'],
        'id': ['a', 'b', 'c']
    }
)
This is the expected output. I want to add id to df1:
date price id
0 2024-01-01 0 NaN
1 2024-01-02 1 a
2 2024-01-03 2 a
3 2024-01-04 3 b
4 2024-01-05 4 b
5 2024-01-06 5 b
6 2024-01-07 6 b
7 2024-01-08 7 b
8 2024-01-09 8 NaN
9 2024-01-10 9 NaN
10 2024-01-11 10 c
11 2024-01-12 11 c
12 2024-01-13 12 NaN
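The process is as follows. Let me walk through row 1 of the output as an example:
a) The date is 2024-01-02. Look at df2. Each row of df2 defines a range. This date falls within the range in the first row of df2. Note that start is exclusive and end is inclusive.
b) Take the id from the matching row of df2 and put it in the output.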
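To make the rule in steps a) and b) concrete, here is a minimal sketch of the lookup for a single date. The helper name lookup_id is hypothetical and only serves as an illustration; it assumes at most one range matches and returns None when no range does.

```python
import pandas as pd

df2 = pd.DataFrame({
    'start': pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-10']),
    'end': pd.to_datetime(['2024-01-03', '2024-01-08', '2024-01-12']),
    'id': ['a', 'b', 'c'],
})

def lookup_id(date, ranges):
    """Hypothetical helper: return the id whose range contains `date`.

    start is exclusive and end is inclusive, i.e. start < date <= end.
    Returns None when no range matches.
    """
    date = pd.Timestamp(date)
    # Step a): find the rows of df2 whose range contains the date.
    mask = (ranges['start'] < date) & (date <= ranges['end'])
    matches = ranges.loc[mask, 'id']
    # Step b): take the id of the matching row, if there is one.
    return matches.iloc[0] if not matches.empty else None

row1_id = lookup_id('2024-01-02', df2)
```

With this rule, 2024-01-02 falls in the first range, while 2024-01-01 matches nothing because start is exclusive.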
Since the two DataFrames have no column in common, I used a loop to get the output. It works, but I am not sure whether it is the best approach:
df1['date'] = pd.to_datetime(df1.date)
df2[['start', 'end']] = df2[['start', 'end']].apply(pd.to_datetime)
for idx, row in df2.iterrows():
    start = row['start']
    end = row['end']
    id = row['id']
    df1.loc[df1.date.between(start, end, inclusive='right'), 'id'] = id
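For comparison, the same start-exclusive, end-inclusive matching can probably be done without an explicit loop by using pd.IntervalIndex. This is only a sketch, and it assumes the ranges in df2 never overlap (get_indexer raises an error on overlapping intervals). The variable names intervals, pos, and ids are illustrative.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the sketch is self-contained.
df1 = pd.DataFrame({
    'date': pd.to_datetime([f'2024-01-{day:02d}' for day in range(1, 14)]),
    'price': list(range(13)),
})
df2 = pd.DataFrame({
    'start': pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-10']),
    'end': pd.to_datetime(['2024-01-03', '2024-01-08', '2024-01-12']),
    'id': ['a', 'b', 'c'],
})

# closed='right' means start is exclusive and end is inclusive.
intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['end'], closed='right')

# Position of the containing interval for each date, or -1 when none matches.
pos = intervals.get_indexer(df1['date'])

# Map positions to ids; positions of -1 are masked to NaN afterwards.
ids = df2['id'].to_numpy()
df1['id'] = pd.Series(ids[pos], index=df1.index).where(pos >= 0)
```

Whether this is actually faster than the loop depends on the sizes of df1 and df2, so it would need to be timed on the real data.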
Any suggestions?