Python - Data Aggregation

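Python offers several ways to aggregate data, most commonly through the pandas and NumPy libraries. The data must already be in a DataFrame, or be converted to one, before the aggregation functions can be applied.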
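As a minimal sketch of that conversion step, the example below wraps a plain NumPy array in a DataFrame and then aggregates each column. The array values and the column names `x` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a 3x2 NumPy array (values chosen for illustration).
raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

# Convert the array to a DataFrame so pandas aggregation methods are available.
frame = pd.DataFrame(raw, columns=["x", "y"])

# Aggregate every column: one sum per column, returned as a Series.
totals = frame.aggregate("sum")
print(totals)
```

Here `totals` holds one value per column, indexed by the column names.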

Aggregation on a DataFrame

Let us create a DataFrame and apply aggregation to it.

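import pandas as pd
import numpy as np

# 10 rows of random values, indexed by consecutive days starting 2000-01-01
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])

print(df)

# Rolling window over 3 rows; min_periods=1 allows partial windows at the start
r = df.rolling(window=3, min_periods=1)
print(r)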

Source: LearnFk, https://www.learnfk.com/python-data-science/python-data-aggregation.html

The output is as follows (the data is random, so the numbers differ from run to run):

                    A           B           C           D
2000-01-01   1.088512   -0.650942   -2.547450   -0.566858
2000-01-02   0.790670   -0.387854   -0.668132    0.267283
2000-01-03  -0.575523   -0.965025    0.060427   -2.179780
2000-01-04   1.669653    1.211759   -0.254695    1.429166
2000-01-05   0.100568   -0.236184    0.491646   -0.466081
2000-01-06   0.155172    0.992975   -1.205134    0.320958
2000-01-07   0.309468   -0.724053   -1.412446    0.627919
2000-01-08   0.099489   -1.028040    0.163206   -1.274331
2000-01-09   1.639500   -0.068443    0.714008   -0.565969
2000-01-10   0.326761    1.479841    0.664282   -1.361169

Rolling [window=3,min_periods=1,center=False,axis=0]                
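To make the `window` and `min_periods` arguments concrete, here is a small sketch on a hand-written Series (the values 1 to 5 are illustrative and not taken from the output above). With `window=3`, each result covers the current row and the two rows before it. With `min_periods=1`, the first rows are computed from fewer than three values instead of producing NaN.

```python
import pandas as pd

# Illustrative data: five consecutive days with the values 1..5.
s = pd.Series([1, 2, 3, 4, 5],
              index=pd.date_range("2000-01-01", periods=5))

# Partial windows allowed: 1, 1+2, 1+2+3, 2+3+4, 3+4+5
with_min = s.rolling(window=3, min_periods=1).sum()

# Default min_periods equals the window size, so the first two results are NaN.
default_min = s.rolling(window=3).sum()

print(with_min.tolist())   # [1.0, 3.0, 6.0, 9.0, 12.0]
print(default_min.tolist())
```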

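You can aggregate by passing a function to the rolling object for the whole DataFrame, or first select a column with the standard getitem syntax (square brackets) and aggregate only that column.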

Aggregating the Whole DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Sum every column over each rolling window
print(r.aggregate(np.sum))

The output is as follows. The first table is the original DataFrame and the second is its rolling sum:

                    A           B           C           D
2000-01-01   1.088512   -0.650942   -2.547450   -0.566858
2000-01-02   0.790670   -0.387854   -0.668132    0.267283
2000-01-03  -0.575523   -0.965025    0.060427   -2.179780
2000-01-04   1.669653    1.211759   -0.254695    1.429166
2000-01-05   0.100568   -0.236184    0.491646   -0.466081
2000-01-06   0.155172    0.992975   -1.205134    0.320958
2000-01-07   0.309468   -0.724053   -1.412446    0.627919
2000-01-08   0.099489   -1.028040    0.163206   -1.274331
2000-01-09   1.639500   -0.068443    0.714008   -0.565969
2000-01-10   0.326761    1.479841    0.664282   -1.361169

                    A           B           C           D
2000-01-01   1.088512   -0.650942   -2.547450   -0.566858
2000-01-02   1.879182   -1.038796   -3.215581   -0.299575
2000-01-03   1.303660   -2.003821   -3.155154   -2.479355
2000-01-04   1.884801   -0.141119   -0.862400   -0.483331
2000-01-05   1.194699    0.010551    0.297378   -1.216695
2000-01-06   1.925393    1.968551   -0.968183    1.284044
2000-01-07   0.565208    0.032738   -2.125934    0.482797
2000-01-08   0.564129   -0.759118   -2.454374   -0.325454
2000-01-09   2.048458   -1.820537   -0.535232   -1.212381
2000-01-10   2.065750    0.383357    1.541496   -3.201469
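A quick way to confirm what the rolling sum contains is to rebuild one row by hand. The sketch below uses a seeded generator so the run is repeatable (the seed 0 is an arbitrary choice, not part of the original example). It also uses the string alias `"sum"` and the `sum()` method, which should give the same result as passing `np.sum`.

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is reproducible (seed chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=pd.date_range("2000-01-01", periods=10),
                  columns=["A", "B", "C", "D"])
r = df.rolling(window=3, min_periods=1)

by_alias = r.aggregate("sum")   # aggregation named by string
by_method = r.sum()             # dedicated method on the rolling object

# The row at position 3 should equal the sum of the rows at positions 1, 2 and 3.
manual = df.iloc[1:4].sum()
row_matches = np.allclose(by_alias.iloc[3], manual)
same_result = np.allclose(by_alias, by_method)
print(row_matches, same_result)
```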

Aggregating a Single Column of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Select column A from the rolling object, then sum each window
print(r['A'].aggregate(np.sum))

The output is as follows. The full DataFrame is printed first, followed by the rolling sum of column A:

                    A           B           C           D
2000-01-01   1.088512   -0.650942   -2.547450   -0.566858
2000-01-02   0.790670   -0.387854   -0.668132    0.267283
2000-01-03  -0.575523   -0.965025    0.060427   -2.179780
2000-01-04   1.669653    1.211759   -0.254695    1.429166
2000-01-05   0.100568   -0.236184    0.491646   -0.466081
2000-01-06   0.155172    0.992975   -1.205134    0.320958
2000-01-07   0.309468   -0.724053   -1.412446    0.627919
2000-01-08   0.099489   -1.028040    0.163206   -1.274331
2000-01-09   1.639500   -0.068443    0.714008   -0.565969
2000-01-10   0.326761    1.479841    0.664282   -1.361169
2000-01-01   1.088512
2000-01-02   1.879182
2000-01-03   1.303660
2000-01-04   1.884801
2000-01-05   1.194699
2000-01-06   1.925393
2000-01-07   0.565208
2000-01-08   0.564129
2000-01-09   2.048458
2000-01-10   2.065750
Freq: D, Name: A, dtype: float64

Aggregating Multiple Columns of a DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Select columns A and B with a list of names, then sum each window
print(r[['A', 'B']].aggregate(np.sum))

The output is as follows. The full DataFrame is printed first, followed by the rolling sums of columns A and B:

                    A           B           C           D
2000-01-01   1.088512   -0.650942   -2.547450   -0.566858
2000-01-02   0.790670   -0.387854   -0.668132    0.267283
2000-01-03  -0.575523   -0.965025    0.060427   -2.179780
2000-01-04   1.669653    1.211759   -0.254695    1.429166
2000-01-05   0.100568   -0.236184    0.491646   -0.466081
2000-01-06   0.155172    0.992975   -1.205134    0.320958
2000-01-07   0.309468   -0.724053   -1.412446    0.627919
2000-01-08   0.099489   -1.028040    0.163206   -1.274331
2000-01-09   1.639500   -0.068443    0.714008   -0.565969
2000-01-10   0.326761    1.479841    0.664282   -1.361169
                    A           B
2000-01-01   1.088512   -0.650942
2000-01-02   1.879182   -1.038796
2000-01-03   1.303660   -2.003821
2000-01-04   1.884801   -0.141119
2000-01-05   1.194699    0.010551
2000-01-06   1.925393    1.968551
2000-01-07   0.565208    0.032738
2000-01-08   0.564129   -0.759118
2000-01-09   2.048458   -1.820537
2000-01-10   2.065750    0.383357
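The two selection styles differ in what they return: a single name gives a Series, while a list of names gives a DataFrame restricted to those columns. The following sketch checks this on seeded data (the seed 1 is an arbitrary choice made here for repeatability).

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is reproducible (seed chosen arbitrarily).
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=pd.date_range("2000-01-01", periods=10),
                  columns=["A", "B", "C", "D"])
r = df.rolling(window=3, min_periods=1)

one = r["A"].sum()          # single name -> Series named "A"
two = r[["A", "B"]].sum()   # list of names -> DataFrame with columns A and B

print(type(one).__name__, one.name)
print(type(two).__name__, list(two.columns))

# Column A is the same whichever way it was selected.
print(np.allclose(two["A"], one))
```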
