我有一个datetime64[ns]格式的date\U time列数据框:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 filename 235 non-null object
1 date_time 235 non-null datetime64[ns]
2 r 235 non-null float64
df:
filename date_time r
0 01_ 2022-05-24 12:07:06 3.2E+05
1 01_ 2022-05-24 12:08:15 3.1E+05
2 01_ 2022-05-24 12:09:23 2.9E+05
3 02_ 2022-05-24 12:10:43 5.0E+06
4 04_ 2022-05-24 12:38:26 5.6E+06
.. ... ... ...
230 91_ 2022-05-26 09:57:50 8.9E+06
231 91_ 2022-05-26 09:59:06 8.3E+06
232 91_ 2022-05-26 10:00:23 8.5E+06
233 91_ 2022-05-26 10:01:40 9.0E+06
234 91_ 2022-05-26 10:02:57 9.1E+06
计算按文件名分组的date\u时间的平均值时,只能使用:
df_.groupby(["filename"]).agg(["mean"])
而不是:
df_.groupby(["filename"]).mean()
df_.groupby(["filename"]).agg("mean")
为什么它只适用于df\ux.groupby(["filename"]).agg(["平均"])?
下面是代码和示例:
print("works with:")
print(df_.groupby(["filename"]).agg(["mean"]))
print ("doesn't work with: (no date_time column showing)")
print(df_.groupby(["filename"]).mean())
print(df_.groupby(["filename"]).agg("mean"))
OUT:
works with:
date_time r
mean mean
filename
01_ 2022-05-24 12:08:14.666666752 3.1E+05
02_ 2022-05-24 12:10:43.000000000 5.0E+06
04_ 2022-05-24 12:39:34.999999744 5.2E+06
05_ 2022-05-24 12:42:54.000000000 7.5E+04
06_ 2022-05-24 12:47:06.000000000 3.4E+05
... ... ...
87_ 2022-05-25 16:44:56.000000000 9.5E+06
88_ 2022-05-26 09:15:00.875000064 1.1E+05
89_ 2022-05-26 09:29:22.357143040 8.3E+06
90_ 2022-05-26 09:45:32.500000000 1.1E+05
91_ 2022-05-26 09:55:16.384615424 8.9E+06
[75 rows x 2 columns]
doesn't work with: (no date_time column showing)
r
filename
01_ 3.1E+05
02_ 5.0E+06
04_ 5.2E+06
05_ 7.5E+04
06_ 3.4E+05
... ...
87_ 9.5E+06
88_ 1.1E+05
89_ 8.3E+06
90_ 1.1E+05
91_ 8.9E+06
[75 rows x 1 columns]
r
filename
01_ 3.1E+05
02_ 5.0E+06
04_ 5.2E+06
05_ 7.5E+04
06_ 3.4E+05
... ...
87_ 9.5E+06
88_ 1.1E+05
89_ 8.3E+06
90_ 1.1E+05
91_ 8.9E+06
[75 rows x 1 columns]