I have a Pandas DataFrame `df`:
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 0 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 0 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 0 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 0 | 8 |
Mercedes Class A | 1 | 9 |
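I want to enlarge the active parts of the binary sequence `Open` by `n` units, but only after grouping the DataFrame by `Car`.
An active part is a run of consecutive 1s that is either surrounded by 0s, or only preceded by a 0, or only followed by a 0. The case where the series contains only 1s is ignored.
For `n = 1`, I would like to obtain the following DataFrame: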
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 1 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 1 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 1 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 1 | 8 |
Mercedes Class A | 1 | 9 |
I can get the indices of all the active parts with the following code:
    import pandas as pd

    df = pd.DataFrame(
        {
            "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
            "Time": list(range(10)) + list(range(10)),
            "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
        }
    )


    def enlarge(dataframe: pd.DataFrame, sensor: str, n: int = 1) -> pd.DataFrame:
        # First index of each run of 1s, or None for a run of length 1
        get_group_indexes = (
            lambda x: x.index[0]
            if x.index[-1] - x.index[0] >= 1
            else None
        )
        # Label each run of 1s by the number of 0s seen before it
        groups = (
            dataframe[sensor]
            .eq(0)
            .cumsum()[dataframe[sensor].ne(0)]
            .to_frame()
            .groupby(sensor)
            .apply(get_group_indexes)
            .dropna()
        )
        if groups.empty:
            return dataframe
        # Set the n rows before each run start to 1 (modifies dataframe in place)
        for index in groups:
            dataframe.loc[index - n:index, sensor] = 1
        return dataframe
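This works when I do not need to group by `Car`, but I want to group by that column before applying the transformation. Does anyone know how to do this efficiently with Pandas? Thanks.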
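
For reference, here is a minimal sketch of one direction that might work. It is not a tested solution. It assumes the rows of each car are already sorted by `Time`, and that "enlarging" only extends a run backwards, as in the expected output above. The names `enlarge_by_group` and `group_col` are hypothetical. The idea is to mark each 1 whose previous value within the same car is 0, then shift those marks back by up to `n` rows inside each group so nothing leaks across cars.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)


def enlarge_by_group(
    dataframe: pd.DataFrame, sensor: str, group_col: str, n: int = 1
) -> pd.DataFrame:
    """Hypothetical helper: extend each run of 1s backwards by n rows per group."""
    out = dataframe.copy()  # leave the caller's frame untouched
    keys = out[group_col]
    values = out[sensor]

    # Previous value within the same group (NaN on the first row of each group)
    previous = values.groupby(keys).shift(1)

    # A run start is a 1 directly preceded by a 0 in the same group.
    # A run at the very beginning of a group has nothing before it to extend into.
    starts = (values.eq(1) & previous.eq(0)).astype(int)

    # Pull each start marker back by 1..n rows, never crossing a group boundary
    mask = pd.Series(False, index=out.index)
    for k in range(1, n + 1):
        mask |= starts.groupby(keys).shift(-k).eq(1)

    out.loc[mask, sensor] = 1
    return out


result = enlarge_by_group(df, "Open", "Car", n=1)
print(result)
```

On the sample data with `n = 1`, this should reproduce the expected `Open` column shown above. The loop runs `n` times rather than once per run, which may matter if there are many active parts.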