我在Python Pandas中有如下数据框:
df = pd.DataFrame({
'id' : [999, 999, 999, 185, 185, 185, 999, 999, 999],
'target' : [1, 1, 1, 0, 0, 0, 1, 1, 1],
'event': ['2023-01-01', '2023-01-02', '2023-02-03', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-01', '2023-01-02', '2023-01-03'],
'survey': ['2023-02-02', '2023-02-02', '2023-02-02', '2023-03-10', '2023-03-10', '2023-03-10', '2023-04-22', '2023-04-22', '2023-04-22'],
'event1': [1, 6, 11, 16, np.nan, 22, 74, 109, 52],
'event2': [2, 7, np.nan, 17, 22, np.nan, np.nan, 10, 5],
'event3': [3, 8, 13, 18, 23, np.nan, 2, np.nan, 99],
'event4': [4, 9, np.nan, np.nan, np.nan, 11, 8, np.nan, np.nan],
'event5': [5, np.nan, 15, 20, 25, 1, 1, 3, np.nan]
})
df = df.fillna(0)
df
Requirements:个
我需要在我的数据框中保留列"id"中没有重复的值,同时也保留具有来自列"Survey"的该id的没有重复的值的所有行.
例如,如您在id=999的示例中所看到的,我们在"Survey"列中有值2023-02-02或2023-04-22,所以我需要保留id=999、2023-02-02或2023-04-022的所有行.
Example of needed result:个
因此,我需要如下内容:
df = pd.DataFrame({
'id' : [999, 999, 999, 185, 185, 185],
'target' : [1, 1, 1, 0, 0, 0],
'event': ['2023-01-01', '2023-01-02', '2023-02-03', '2023-01-01', '2023-01-02', '2023-01-03'],
'survey': ['2023-02-02', '2023-02-02', '2023-02-02', '2023-03-10', '2023-03-10', '2023-03-10'],
'event1': [1, 6, 11, 16, np.nan, 22],
'event2': [2, 7, np.nan, 17, 22, np.nan],
'event3': [3, 8, 13, 18, 23, np.nan],
'event4': [4, 9, np.nan, np.nan, np.nan, 11],
'event5': [5, np.nan, 15, 20, 25, 1]
})
df = df.fillna(0)
df
我如何在Python Pandas中做到这一点?