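I'm trying to group messages that were sent in quick succession. A parameter defines the maximum time between two messages for them to be considered part of the same block. When a message is added to a block, the time window is extended, so that further messages can still become part of that block.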
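The chaining rule above can be expressed without any window at all: a new block starts whenever the gap to the *previous* message exceeds the threshold, so each message only has to be compared with its immediate predecessor. A minimal sketch, assuming the timestamps are already sorted; `block_ids` is a hypothetical helper name, not an existing pandas function:

```python
import pandas as pd


def block_ids(times: pd.Series, max_gap: pd.Timedelta) -> pd.Series:
    """Label consecutive messages with a block number (hypothetical helper).

    Assumes `times` is sorted ascending. A message joins the current block
    when it arrives at most `max_gap` after the previous message; otherwise
    it starts a new block.
    """
    gaps = times.diff()                # NaT for the very first message
    starts_new_block = gaps > max_gap  # comparisons with NaT evaluate to False
    return starts_new_block.cumsum()   # running count of block starts


times = pd.Series(pd.to_datetime([
    "2023-01-01 12:00:00", "2023-01-01 12:20:00", "2023-01-01 12:30:00",
    "2023-01-01 12:30:55", "2023-01-01 12:31:20", "2023-01-01 15:00:00",
    "2023-01-01 15:30:30", "2023-01-01 15:30:55",
]))
print(block_ids(times, pd.Timedelta("1min")).tolist())
# [0, 1, 2, 2, 2, 3, 4, 4]
```

Because the comparison is always against the previous message, the window effectively "slides along" with each added message, which is exactly the extension behaviour described above.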
Example input
| | datetime | message |
|---|---|---|
| 0 | 2023-01-01 12:00:00 | A |
| 1 | 2023-01-01 12:20:00 | B |
| 2 | 2023-01-01 12:30:00 | C |
| 3 | 2023-01-01 12:30:55 | D |
| 4 | 2023-01-01 12:31:20 | E |
| 5 | 2023-01-01 15:00:00 | F |
| 6 | 2023-01-01 15:30:30 | G |
| 7 | 2023-01-01 15:30:55 | H |
Expected output with the parameter set to 1 min
| | datetime | message | datetime_last | n_block |
|---|---|---|---|---|
| 0 | 2023-01-01 12:00:00 | A | 2023-01-01 12:00:00 | 1 |
| 1 | 2023-01-01 12:20:00 | B | 2023-01-01 12:20:00 | 1 |
| 2 | 2023-01-01 12:30:00 | C\nD\nE | 2023-01-01 12:31:20 | 3 |
| 3 | 2023-01-01 15:00:00 | F | 2023-01-01 15:00:00 | 1 |
| 4 | 2023-01-01 15:30:30 | G\nH | 2023-01-01 15:30:55 | 2 |
My failing attempt
I hoped to solve this with a rolling window that keeps appending the message rows.
def join_messages(x):
    return '\n'.join(x)
df.rolling(window='1min', on='datetime').agg({
    'datetime': ['first', 'last'],
    'message': [join_messages, "count"]})  # Somehow overwrite datetime with the aggregated datetime.first.
Both aggregations fail with ValueError: invalid on specified as datetime, must be a column (of DataFrame), an Index or None.
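I don't see a clean way to make datetime "accessible" inside the window. Also, rolling does not work well with strings. My impression is that this is a dead end and that there is a cleaner way to solve the problem.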
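One way to see why rolling is a poor fit: an offset-based window is a fixed-length lookback anchored at each row. It produces one result per input row, and it never stretches when a message is added. The sketch below counts how many messages fall into each row's 1-minute window, using a dummy column of ones as a stand-in for the messages, since rolling aggregations expect numeric data rather than strings:

```python
import pandas as pd

times = pd.to_datetime([
    "2023-01-01 12:00:00", "2023-01-01 12:20:00", "2023-01-01 12:30:00",
    "2023-01-01 12:30:55", "2023-01-01 12:31:20", "2023-01-01 15:00:00",
    "2023-01-01 15:30:30", "2023-01-01 15:30:55",
])
# Dummy numeric series indexed by time (stands in for the message strings).
ones = pd.Series(1, index=times)

# For offset windows the default is closed="right", i.e. each row sees (t - 1min, t].
counts = ones.rolling("1min").count()
print(counts.tolist())
# counts per row: 1, 1, 1, 2, 2, 1, 1, 2
```

At E (12:31:20) the window only reaches back to 12:30:20, so C drops out even though it belongs to the same chained block, and the result still has eight rows rather than one row per block.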
Snippets of the input and expected data
import pandas as pd

df = pd.DataFrame({
    'datetime': [pd.Timestamp('2023-01-01 12:00'),
                 pd.Timestamp('2023-01-01 12:20'),
                 pd.Timestamp('2023-01-01 12:30:00'),
                 pd.Timestamp('2023-01-01 12:30:55'),
                 pd.Timestamp('2023-01-01 12:31:20'),
                 pd.Timestamp('2023-01-01 15:00'),
                 pd.Timestamp('2023-01-01 15:30:30'),
                 pd.Timestamp('2023-01-01 15:30:55')],
    'message': list('ABCDEFGH')})

df_expected = pd.DataFrame({
    'datetime': [pd.Timestamp('2023-01-01 12:00'),
                 pd.Timestamp('2023-01-01 12:20'),
                 pd.Timestamp('2023-01-01 12:30:00'),
                 pd.Timestamp('2023-01-01 15:00'),
                 pd.Timestamp('2023-01-01 15:30:30')],
    'message': ['A', 'B', 'C\nD\nE', 'F', 'G\nH'],
    'datetime_last': [pd.Timestamp('2023-01-01 12:00'),
                      pd.Timestamp('2023-01-01 12:20'),
                      pd.Timestamp('2023-01-01 12:31:20'),
                      pd.Timestamp('2023-01-01 15:00'),
                      pd.Timestamp('2023-01-01 15:30:55')],
    'n_block': [1, 1, 3, 1, 2]})
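Since a block only depends on the gap to the previous message, one alternative to rolling is a plain groupby: label the blocks with diff/cumsum, then collapse each block with named aggregation. A minimal sketch, assuming the rows are sorted by datetime; `max_gap`, `block` and `result` are illustrative names, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-01-01 12:00:00", "2023-01-01 12:20:00", "2023-01-01 12:30:00",
        "2023-01-01 12:30:55", "2023-01-01 12:31:20", "2023-01-01 15:00:00",
        "2023-01-01 15:30:30", "2023-01-01 15:30:55",
    ]),
    "message": list("ABCDEFGH"),
})

max_gap = pd.Timedelta("1min")  # the block parameter from the question

# A new block starts whenever the gap to the previous message exceeds max_gap;
# renamed so the grouping key does not share the name of the "datetime" column.
block = (df["datetime"].diff() > max_gap).cumsum().rename("block")

result = (
    df.groupby(block)
    .agg(
        datetime=("datetime", "first"),       # start of the block
        message=("message", "\n".join),       # messages joined with newlines
        datetime_last=("datetime", "last"),   # last message in the block
        n_block=("message", "count"),         # number of messages in the block
    )
    .reset_index(drop=True)
)
print(result)
```

With the question's df the result is intended to reproduce df_expected; pd.testing.assert_frame_equal(result, df_expected) is one way to check, although datetime resolution may differ between pandas versions. Grouping sidesteps both problems with rolling: the strings are joined directly, and the output has one row per block.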