I would like to group by ID and find the min and max counts for each month.

Data

DATE        ID  name    
4/30/2023   AA  hi  
4/5/2023    AA  hi  
4/1/2023    AA  hi  
4/1/2023    AA  hello   
4/30/2023   AA  hello   
4/5/2023    AA  hello   
4/5/2023    AA  hey 
4/30/2023   AA  hey 
4/5/2023    AA  ok  
4/30/2023   AA  ok  
4/30/2023   AA  ok  
5/1/2023    AA  ok
5/1/2023    AA  hey
5/25/2023   AA  hi
4/1/2023    BB  hey 
4/2/2023    BB  hi  
4/2/2023    BB  hello   
        
        

Desired

ID  DATE        stat    count
AA  4/1/2023    min     2
AA  4/30/2023   max     5
AA  5/25/2023   min     1
AA  5/1/2023    max     2
BB  4/1/2023    min     1
BB  4/2/2023    max     2

Doing

result = df.groupby(['ID', 'DATE', 'name']).size().reset_index(name='count')
result['stat'] = result.groupby(['ID', 'DATE'])['count'].transform(lambda x: 'min' if x.idxmin() == x.idxmax() else 'max')

However, this does not account for the date. Any suggestions are appreciated.

Recommended answer

This should work (pun intended ;])

import pandas as pd
import numpy as np

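# Rows copied from the question's sample data
data = [
    ["4/30/2023", "AA", "hi"], ["4/5/2023", "AA", "hi"], ["4/1/2023", "AA", "hi"],
    ["4/1/2023", "AA", "hello"], ["4/30/2023", "AA", "hello"], ["4/5/2023", "AA", "hello"],
    ["4/5/2023", "AA", "hey"], ["4/30/2023", "AA", "hey"], ["4/5/2023", "AA", "ok"],
    ["4/30/2023", "AA", "ok"], ["4/30/2023", "AA", "ok"], ["5/1/2023", "AA", "ok"],
    ["5/1/2023", "AA", "hey"], ["5/25/2023", "AA", "hi"], ["4/1/2023", "BB", "hey"],
    ["4/2/2023", "BB", "hi"], ["4/2/2023", "BB", "hello"],
]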
df = pd.DataFrame(data, columns=['DATE', 'ID', 'name'])

# Convert 'DATE' column to datetime format and extract month and year
df['DATE'] = pd.to_datetime(df['DATE'])
df['month'] = df['DATE'].dt.month
df['year'] = df['DATE'].dt.year

# Group by 'ID', 'year', 'month', and 'DATE' and count the rows for each date
result = df.groupby(['ID', 'year', 'month', 'DATE'])['name'].size().reset_index(name='count')

# Find the min and max counts for each ID and month combination
result_min = result.groupby(['ID', 'year', 'month'])['count'].min().reset_index(name='min_count')
result_max = result.groupby(['ID', 'year', 'month'])['count'].max().reset_index(name='max_count')

# Merge the min and max counts with the original result DataFrame
result = result.merge(result_min, on=['ID', 'year', 'month']).merge(result_max, on=['ID', 'year', 'month'])

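# Keep only the dates whose count equals the monthly min or max;
# without this filter, in-between dates (e.g. 4/5/2023 for AA) would be labelled 'max'
is_extreme = (result['count'] == result['min_count']) | (result['count'] == result['max_count'])
result = result[is_extreme].copy()

# Create a 'stat' column based on the min and max counts
# (if a month has a single date, min == max and the row is labelled 'min')
result['stat'] = np.where(result['count'] == result['min_count'], 'min', 'max')

# Drop unnecessary columns and reset index
result = result.drop(columns=['min_count', 'max_count']).reset_index(drop=True)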

# We use .to_string() so we can remove the pandas indexing on the left of the print
print(result[['ID', 'DATE', 'stat', 'count']].to_string(index=False))
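
As a possible alternative, the same table could probably be built without the merges by asking pandas for the row labels of the smallest and largest daily count in each (ID, month) group via `idxmin` and `idxmax`. This is only a sketch under assumptions: the names `daily`, `by_month`, `lo`, `hi`, and `summary` are made up for illustration, the dates are assumed to be in month/day/year form, and the rows are the sample data from the question.

```python
import pandas as pd

# Sample rows copied from the question's data
rows = [
    ("4/30/2023", "AA", "hi"), ("4/5/2023", "AA", "hi"), ("4/1/2023", "AA", "hi"),
    ("4/1/2023", "AA", "hello"), ("4/30/2023", "AA", "hello"), ("4/5/2023", "AA", "hello"),
    ("4/5/2023", "AA", "hey"), ("4/30/2023", "AA", "hey"), ("4/5/2023", "AA", "ok"),
    ("4/30/2023", "AA", "ok"), ("4/30/2023", "AA", "ok"), ("5/1/2023", "AA", "ok"),
    ("5/1/2023", "AA", "hey"), ("5/25/2023", "AA", "hi"), ("4/1/2023", "BB", "hey"),
    ("4/2/2023", "BB", "hi"), ("4/2/2023", "BB", "hello"),
]
df = pd.DataFrame(rows, columns=["DATE", "ID", "name"])
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")  # assumed m/d/Y input

# Number of rows per (ID, DATE), plus a monthly period to group on
daily = df.groupby(["ID", "DATE"]).size().reset_index(name="count")
daily["month"] = daily["DATE"].dt.to_period("M")

# Row labels of the lowest and highest daily count within each (ID, month)
by_month = daily.groupby(["ID", "month"])["count"]
lo = daily.loc[by_month.idxmin()].assign(stat="min")
hi = daily.loc[by_month.idxmax()].assign(stat="max")

# Stack both sets; sorting 'stat' in descending order puts "min" before "max"
summary = (
    pd.concat([lo, hi])
    .sort_values(["ID", "month", "stat"], ascending=[True, True, False])
    .reset_index(drop=True)
)

# Rebuild the unpadded m/d/Y text used in the question's desired output
d = summary["DATE"]
summary["DATE"] = (
    d.dt.month.astype(str) + "/" + d.dt.day.astype(str) + "/" + d.dt.year.astype(str)
)

print(summary[["ID", "DATE", "stat", "count"]].to_string(index=False))
```

Two caveats with this sketch: `idxmin` and `idxmax` return the first matching row, so when several dates tie within a month only the earliest one is reported, and a month with a single date appears twice, once as min and once as max.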
