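I have a database of the changes made to a given issue ID on any given date. The value in changed_parameter is the parameter that was changed.

The old and new values of the changed parameter are recorded in the old_value and new_value columns.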

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/31/2023 closed 40 1/10/2023 status Defined Accepted
101 1/31/2023 closed 40 1/15/2023 estimation_hour 0 20
101 1/31/2023 closed 40 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed
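To reproduce the problem, the table above can be loaded roughly as follows. This is only a sketch: it assumes the data is available as whitespace-separated text, and the names `raw` and `df` are illustrative. The old_value and new_value columns are read as strings because they mix statuses, numbers, and dates.

```python
import io

import pandas as pd

# The change log exactly as shown in the table above
raw = """Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/31/2023 closed 40 1/10/2023 status Defined Accepted
101 1/31/2023 closed 40 1/15/2023 estimation_hour 0 20
101 1/31/2023 closed 40 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed
"""

# Keep old_value/new_value as strings: they hold statuses, hours, and dates
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    dtype={"old_value": str, "new_value": str},
)
print(df.head())
```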

Now I have to build, from the data above, a snapshot of each issue ID as of every change date.

So my final table should look like this:

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/20/2023 Accepted 0 1/10/2023 status Defined Accepted
101 1/20/2023 Accepted 20 1/15/2023 estimation_hour 0 20
101 1/20/2023 Accepted 30 1/16/2023 estimation_hour 20 30
101 1/31/2023 Accepted 30 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 InProgress 30 1/20/2023 status Accepted InProgress
101 1/31/2023 InProgress 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 Accepted 0 1/10/2023 status Defined Accepted
102 2/28/2023 Accepted 30 1/15/2023 estimation_hour 0 30
102 2/28/2023 InProgress 30 1/20/2023 status Accepted InProgress
102 2/28/2023 InProgress 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

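I loaded the data above into a pandas DataFrame and iterated over each row.

In each iteration, I write the old value of the changed parameter back into the earlier rows of the same issue ID, up to the point where the value already matches, and I write the new value into the current row, as shown below.

Iterating over every row is time-consuming.

First iteration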

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/31/2023 Accepted 40 1/10/2023 status Defined Accepted
101 1/31/2023 closed 40 1/15/2023 estimation_hour 0 20
101 1/31/2023 closed 40 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

Second iteration

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/31/2023 Accepted 0 1/10/2023 status Defined Accepted
101 1/31/2023 closed 20 1/15/2023 estimation_hour 0 20
101 1/31/2023 closed 40 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

Third iteration

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/31/2023 Accepted 0 1/10/2023 status Defined Accepted
101 1/31/2023 closed 20 1/15/2023 estimation_hour 0 20
101 1/31/2023 closed 30 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

Fourth iteration

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/20/2023 Accepted 0 1/10/2023 status Defined Accepted
101 1/20/2023 closed 20 1/15/2023 estimation_hour 0 20
101 1/20/2023 closed 30 1/16/2023 estimation_hour 20 30
101 1/31/2023 closed 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 closed 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

Fifth iteration

Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
101 1/20/2023 Accepted 0 1/10/2023 status Defined Accepted
101 1/20/2023 Accepted 20 1/15/2023 estimation_hour 0 20
101 1/20/2023 Accepted 30 1/16/2023 estimation_hour 20 30
101 1/31/2023 Accepted 40 1/16/2023 Due_Date 1/20/2023 1/31/2023
101 1/31/2023 InProgress 40 1/20/2023 status Accepted InProgress
101 1/31/2023 closed 40 1/25/2023 estimation_hour 30 40
101 1/31/2023 closed 40 1/30/2023 status InProgress Closed
102 2/28/2023 closed 50 1/10/2023 status Defined Accepted
102 2/28/2023 closed 50 1/15/2023 estimation_hour 0 30
102 2/28/2023 closed 50 1/20/2023 status Accepted InProgress
102 2/28/2023 closed 50 1/25/2023 estimation_hour 30 50
102 2/28/2023 closed 50 1/30/2023 status InProgress Closed

And so on.
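For reference, a row-by-row baseline can be sketched as below. This is not the exact loop described above; it is a hypothetical forward replay, where `replay_snapshots`, `params`, and `demo` are illustrative names. It starts each issue from the old value of every parameter's first change, then applies the changes in order. It gives the desired snapshots but visits every row in Python, which is why it becomes slow on large frames.

```python
import pandas as pd

def replay_snapshots(df, params=("Due_Date", "status", "estimation_hour")):
    """Row-by-row baseline: replay each issue's changes in order."""
    out = df.copy()
    for p in params:
        # Snapshot columns will hold the string values from old_value/new_value
        out[p] = out[p].astype(object)
    for _, group in df.groupby("Issue_Id", sort=False):
        # Starting state: the old value of each parameter's first change,
        # or the stored value if that parameter never changes for this issue
        state = {}
        for p in params:
            first = group.loc[group["changed_parameter"] == p, "old_value"]
            state[p] = first.iloc[0] if len(first) else group[p].iloc[0]
        for idx, row in group.iterrows():
            # Apply this row's change, then record the state as of this row
            state[row["changed_parameter"]] = row["new_value"]
            for p in params:
                out.at[idx, p] = state[p]
    return out

# A three-row excerpt for issue 101 (assumed to be sorted by change date)
demo = pd.DataFrame({
    "Issue_Id": [101, 101, 101],
    "Due_Date": ["1/31/2023"] * 3,
    "status": ["closed"] * 3,
    "estimation_hour": [40] * 3,
    "changed_parameter": ["status", "estimation_hour", "Due_Date"],
    "old_value": ["Defined", "0", "1/20/2023"],
    "new_value": ["Accepted", "20", "1/31/2023"],
})
result = replay_snapshots(demo)
print(result[["Due_Date", "status", "estimation_hour"]])
```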

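Recommended answer

Because your case is not that simple, you need to reshape your DataFrame, then group by Issue_Id, and then update the values. Since your DataFrame is sorted by changed_date, the idea is to forward-fill the new values and back-fill the old values. If a parameter never changes for an issue, simply fill it with the existing value: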

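def update_values(df):
    # Forward-fill the new values; rows before a parameter's first change
    # fall back to the next old value (back-filled)
    return df['new_value'].ffill().fillna(df['old_value'].bfill())

# Spread old/new values into one column per changed parameter, one row per change
upd_values = (df.pivot_table(index=df.index, columns='changed_parameter',
                             values=['old_value', 'new_value'], aggfunc='first')
                .groupby(df['Issue_Id']).apply(update_values)
                .droplevel('Issue_Id').fillna(df))

# Overwrite the snapshot columns with the reconstructed values
df[upd_values.columns] = upd_values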

Output:

>>> df
    Issue_Id   Due_Date      status  estimation_hour changed_date changed_parameter   old_value   new_value
0        101  1/20/2023    Accepted                0    1/10/2023            status     Defined    Accepted
1        101  1/20/2023    Accepted               20    1/15/2023   estimation_hour           0          20
2        101  1/20/2023    Accepted               30    1/16/2023   estimation_hour          20          30
3        101  1/31/2023    Accepted               30    1/16/2023          Due_Date   1/20/2023   1/31/2023
4        101  1/31/2023  InProgress               30    1/20/2023            status    Accepted  InProgress
5        101  1/31/2023  InProgress               40    1/25/2023   estimation_hour          30          40
6        101  1/31/2023      Closed               40    1/30/2023            status  InProgress      Closed
7        102  2/28/2023    Accepted                0    1/10/2023            status     Defined    Accepted
8        102  2/28/2023    Accepted               30    1/15/2023   estimation_hour           0          30
9        102  2/28/2023  InProgress               30    1/20/2023            status    Accepted  InProgress
10       102  2/28/2023  InProgress               50    1/25/2023   estimation_hour          30          50
11       102  2/28/2023      Closed               50    1/30/2023            status  InProgress      Closed
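To see why forward-filling the new values and back-filling the old values works, it may help to look at one parameter in isolation. The sketch below uses the estimation_hour changes of issue 101, with None on the rows where a different parameter changed. The names `old`, `new`, and `snapshot` are illustrative only.

```python
import pandas as pd

# estimation_hour changes for issue 101, one entry per change row;
# None marks rows where another parameter (status or Due_Date) changed
old = pd.Series([None, "0", "20", None, None, "30", None], dtype=object)
new = pd.Series([None, "20", "30", None, None, "40", None], dtype=object)

# After a change, the new value stays in force until the next change (ffill).
# Before the first change, the value is the old value of that first change (bfill).
snapshot = new.ffill().fillna(old.bfill())
print(snapshot.tolist())
# ['0', '20', '30', '30', '30', '40', '40']
```

The values come from old_value and new_value, so they are strings. If numeric hours are needed afterwards, the column can be converted with pd.to_numeric.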

Old answer

old_value = pd.to_numeric(df['old_value'], errors='coerce').shift(-1)
new_value = pd.to_numeric(df['new_value'], errors='coerce')

df['estimation_hour'] = old_value.fillna(new_value).ffill().convert_dtypes()
df['status'] = df['new_value'].mask(new_value.notna()).ffill()

Output:

>>> df
   Issue_Id      status  estimation_hour changed_date changed_parameter   old_value   new_value
0       101    Accepted                0    1/10/2023            status     Defined    Accepted
1       101    Accepted               20    1/15/2023   estimation_hour           0          20
2       101  InProgress               20    1/20/2023            status    Accepted  InProgress
3       101  InProgress               40    1/25/2023   estimation_hour          20          40
4       101      Closed               40    1/30/2023            status  InProgress      Closed
5       102    Accepted                0    1/10/2023            status     Defined    Accepted
6       102    Accepted               30    1/15/2023   estimation_hour           0          30
7       102  InProgress               30    1/20/2023            status    Accepted  InProgress
8       102  InProgress               50    1/25/2023   estimation_hour          30          50
9       102      Closed               50    1/30/2023            status  InProgress      Closed
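The old answer separates the two parameters by checking whether a value parses as a number. A small sketch of that step, on made-up values with the illustrative names `numeric` and `status`:

```python
import pandas as pd

new_value = pd.Series(["Accepted", "20", "InProgress", "40"])

# Values that are not numbers become NaN
numeric = pd.to_numeric(new_value, errors="coerce")

# Hide the numeric entries, then carry the last status forward
status = new_value.mask(numeric.notna()).ffill()
print(status.tolist())
# ['Accepted', 'Accepted', 'InProgress', 'InProgress']
```

This only works while every non-numeric value is a status. A date such as 1/31/2023 does not parse as a number either, so it would be treated as a status, and the fill is not done per Issue_Id.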
