我有一个CSV文件,其 struct 如下:网格单元、日期、站点温度

我的目标是只保留那些月满的年份,而go 掉那些少于12个月的年份.

我已经弄清楚了如何识别哪些年份的月份是完整的,但我不确定如何将其应用于原始数据集来子集我的数据:

import numpy as np
import pandas as pd


Grid_Cells = [4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719
              ,4719,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935]

Dates = ['01-09-2008','01-10-2008','01-11-2008','01-12-2008','01-01-2009','01-02-2009','01-03-2009',
         '01-04-2009','01-05-2009','01-06-2009','01-07-2009','01-08-2009','01-09-2009','01-10-2009',
         '01-11-2009','01-12-2009','01-01-2013','01-02-2013','01-03-2013','01-04-2013','01-05-2013',
         '01-06-2013','01-07-2013','01-08-2013','01-09-2013','01-10-2013','01-11-2013','01-12-2013']

Temps = [1.97479861111111,-0.391396505,-1.784091667,-6.509092742,-11.81903226,-14.34362798,-12.40221774,
         -9.133213889,-1.039681452,0.907477778,3.54647043,3.893416667,2.161473611,0.015456989,-1.567216667,
         -4.373807796,-3.63483871,-7.023452381,-6.49688172,-5.683111111,-1.053548387,7.404777778,9.015913978,
         8.415376344,2.605666667,0.597096774,-2.949,-4.595483871]



dframe = pd.DataFrame(data=Grid_Cells,columns=['Grid Cells'])
dframe['Date'] = Dates
dframe['Station Temperature'] = Temps
dframe['DateTime'] = pd.to_datetime(dframe['Date'])
dt_vals = dframe['DateTime']
dframe['Year'] = [i.year for i in dt_vals]
dframe['Month'] = [i.month for i in dt_vals]
    
months_in_year = dframe.groupby(['Grid Cells','Year'])['Month'].count() #count number of months in year
    
subset_months = months_in_year.drop(months_in_year[months_in_year < 12].index) #drop years with less than 12 months

但我一直有问题,不知道如何使用SUBSET_MONTS来子集我的dFrame,以便只 Select 那些少于12个月的年份(因为它是多索引,同时使用网格单元格和年份作为索引).有没有人有建议?

推荐答案

你们很接近了.在['Grid Cells', 'Year']上使用subset_monthsdframe进行内部连接,您将只有您想要的值.

dframe = dframe.merge(subset_months.reset_index()[['Grid Cells', 'Year']],
                      how = 'inner', 
                      on  = ['Grid Cells', 'Year']
                     )
    Grid Cells        Date  Station Temperature   DateTime  Year  Month
0         4719  01-01-2009           -11.819032 2009-01-01  2009      1
1         4719  01-02-2009           -14.343628 2009-01-02  2009      1
2         4719  01-03-2009           -12.402218 2009-01-03  2009      1
3         4719  01-04-2009            -9.133214 2009-01-04  2009      1
4         4719  01-05-2009            -1.039681 2009-01-05  2009      1
5         4719  01-06-2009             0.907478 2009-01-06  2009      1
6         4719  01-07-2009             3.546470 2009-01-07  2009      1
7         4719  01-08-2009             3.893417 2009-01-08  2009      1
8         4719  01-09-2009             2.161474 2009-01-09  2009      1
9         4719  01-10-2009             0.015457 2009-01-10  2009      1
10        4719  01-11-2009            -1.567217 2009-01-11  2009      1
11        4719  01-12-2009            -4.373808 2009-01-12  2009      1
12        4935  01-01-2013            -3.634839 2013-01-01  2013      1
13        4935  01-02-2013            -7.023452 2013-01-02  2013      1
14        4935  01-03-2013            -6.496882 2013-01-03  2013      1
15        4935  01-04-2013            -5.683111 2013-01-04  2013      1
16        4935  01-05-2013            -1.053548 2013-01-05  2013      1
17        4935  01-06-2013             7.404778 2013-01-06  2013      1
18        4935  01-07-2013             9.015914 2013-01-07  2013      1
19        4935  01-08-2013             8.415376 2013-01-08  2013      1
20        4935  01-09-2013             2.605667 2013-01-09  2013      1
21        4935  01-10-2013             0.597097 2013-01-10  2013      1
22        4935  01-11-2013            -2.949000 2013-01-11  2013      1
23        4935  01-12-2013            -4.595484 2013-01-12  2013      1

Python相关问答推荐

将C struct 的指针传递给Python中的ioctel

解析讨论论坛只给我第一个用户 comments ,但没有给我其他用户回复

使用from_pandas将GeDataFrame转换为polars失败,ArrowType错误:未传递numpy. dype对象

Polars:使用列值引用when / then表达中的其他列

如何在Python中使用io.BytesIO写入现有缓冲区?

GL pygame无法让缓冲区与vertextPointer和colorPointer一起可靠地工作

如何根据另一列值用字典中的值替换列值

提取两行之间的标题的常规表达

try 在树叶 map 上应用覆盖磁贴

删除最后一个pip安装的包

如何在Python中将returns.context. DeliverresContext与Deliverc函数一起使用?

如何删除索引过go 的lexsort深度可能会影响性能?' &>

未删除映射表的行

如何使用html从excel中提取条件格式规则列表?

删除所有列值,但判断是否存在任何二元组

scikit-learn导入无法导入名称METRIC_MAPPING64'

如何在Raspberry Pi上检测USB并使用Python访问它?

python中的解释会在后台调用函数吗?

如何使用Numpy. stracards重新编写滚动和?

在不同的帧B中判断帧A中的子字符串,每个帧的大小不同