Python: find the list of indices preceding a specific value in a column, for each unique ID

Posted on August 25

I have a DataFrame like this:

id |    date   | status
________________________
...     ...      ...
1  |2020-01-01 | reserve
1  |2020-01-02 | sold
2  |2020-01-01 | free
3  |2020-01-03 | reserve
3  |2020-01-25 | signed
3  |2020-01-30 | sold
...     ...      ...
10 |2020-01-02 | signed
10 |2020-02-15 | sold 
...     ...      ....

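For each id, I want to find the index of the row whose status is sold, then assign 1 to every row dated within the 29 days before that row (including the sold row itself) and 0 to all other rows.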

The desired DataFrame looks like this:

id |    date   | status  | label
_________________________________
...     ...      ...        ...
1  |2020-01-01 | reserve | 1 
1  |2020-01-02 | sold    | 1
2  |2020-01-01 | free    | 0    # no sold status for 2
3  |2020-01-03 | reserve | 1
3  |2020-01-25 | signed  | 1
3  |2020-01-30 | sold    | 1
...     ...      ...        ...
10 |2020-01-02 | signed  | 0    # more than 29 days before 2020-02-15
10 |2020-02-15 | sold    | 1
...     ...      ....       ...

I tried using apply(), but found that I can't call the function that way:

def make_labels(df):    
    
    def get_indices(df):
        return list(df[df['date'] >= df.iloc[-1]['date'] - timedelta(days=29)].index)

    df.sort_values(['id', 'date'], inplace=True)
    zero_labels = pd.Series(0, index = df.index, name='sold_labels')    
    one_lables = df.groupby('id')['status'].apply(lambda s: get_indices if s.iloc[-1] == 'sold').sum()
    zero_labels.loc[one_lables] = 1
    
    return zero_labels

df['label'] = make_labels(df)
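
For reference, here is a minimal sketch of how the attempt above could be made to run. It is not the code from the question: it swaps `groupby(...).apply(...)` for a plain loop over the groups, because the lambda above has an `if` without an `else` and never actually calls `get_indices`. The `window_days` parameter and the variable names `sold_dates`, `last_sold`, and `in_window` are hypothetical, and the sketch assumes the window should end at the last sold row of each id.

```python
import pandas as pd

d = {'id': [1, 1, 2, 3, 3, 3, 10, 10],
     'date': ['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-03',
              '2020-01-25', '2020-01-30', '2020-01-02', '2020-02-15'],
     'status': ['reserve', 'sold', 'free', 'reserve',
                'signed', 'sold', 'signed', 'sold']}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])


def make_labels(df, window_days=29):
    """Return a 0/1 Series aligned to df.index (window_days is a hypothetical parameter)."""
    labels = pd.Series(0, index=df.index, name='label')

    for _, group in df.groupby('id'):
        # Dates of the rows in this id whose status is 'sold'.
        sold_dates = group.loc[group['status'].eq('sold'), 'date']
        if sold_dates.empty:
            continue  # no sale for this id: every row keeps label 0

        # Assumption: the window ends at the latest sold date of the id.
        last_sold = sold_dates.max()
        start = last_sold - pd.Timedelta(days=window_days)

        # Mark rows dated inside [start, last_sold], both ends inclusive.
        in_window = group['date'].between(start, last_sold)
        labels.loc[group.index[in_window]] = 1

    return labels


df['label'] = make_labels(df)
```

Because the labels are written through the original index, the result lines up with `df` without sorting it in place.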

Constructor for the input DataFrame:

import pandas as pd

d = {'id': [1, 1, 2, 3, 3, 3, 10, 10],
     'date': ['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-03', '2020-01-25', '2020-01-30', '2020-01-02', '2020-02-15'],
     'status': ['reserve', 'sold', 'free', 'reserve', 'signed', 'sold', 'signed', 'sold']
    }
df = pd.DataFrame(data=d)

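Recommended answer

# Parse the date strings so that date arithmetic works.
df['date'] = pd.to_datetime(df['date'])

# Date of the 'sold' row for each id, broadcast to every row of that id
# (NaT when the id has no 'sold' row).
ref = (df['date']
       .where(df['status'].eq('sold'))
       .groupby(df['id'])
       .transform('first')
      )

# ref - date is the time remaining until the sale; label 1 if it is at most 29 days.
df['label'] = (df['date'].rsub(ref)
               .le('29days')
               .astype(int)
              )

Output:

   id       date   status  label
0   1 2020-01-01  reserve      1
1   1 2020-01-02     sold      1
2   2 2020-01-01     free      0
3   3 2020-01-03  reserve      1
4   3 2020-01-25   signed      1
5   3 2020-01-30     sold      1
6  10 2020-01-02   signed      0
7  10 2020-02-15     sold      1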
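
The answer's where / groupby / transform chain can also be unpacked into named steps. The sketch below is a variation, not the answer's exact code: it keeps the intermediate `delta` series (a hypothetical name) and uses `Series.between` with explicit `pd.Timedelta` bounds. It assumes that rows dated after the sale should get 0, which the original snippet does not enforce, since a negative difference also satisfies "at most 29 days".

```python
import pandas as pd

d = {'id': [1, 1, 2, 3, 3, 3, 10, 10],
     'date': ['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-03',
              '2020-01-25', '2020-01-30', '2020-01-02', '2020-02-15'],
     'status': ['reserve', 'sold', 'free', 'reserve',
                'signed', 'sold', 'signed', 'sold']}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])

# Keep the date only on 'sold' rows (NaT elsewhere), then broadcast the first
# non-null value per id to every row of that id. Ids without a sale stay NaT.
ref = (df['date']
       .where(df['status'].eq('sold'))
       .groupby(df['id'])
       .transform('first'))

# Time from each row to its id's sold date; NaT when the id was never sold.
delta = ref - df['date']

# Assumption: only rows from 0 to 29 days before the sale count.
# Comparisons against NaT are False, so unsold ids get 0.
df['label'] = delta.between(pd.Timedelta(0), pd.Timedelta(days=29)).astype(int)
```

On the sample data both versions give the same labels; they differ only if an id has rows dated after its sold row.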
