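I have a CSV file with the following structure: grid cell, date, station temperature.
My goal is to keep only the years that have all 12 months of data and drop the years that have fewer than 12 months.
I have figured out how to identify which years have a complete set of months, but I'm not sure how to apply that to the original dataset to subset my data: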
import numpy as np
import pandas as pd
Grid_Cells = [4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719
,4719,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935]
Dates = ['01-09-2008','01-10-2008','01-11-2008','01-12-2008','01-01-2009','01-02-2009','01-03-2009',
'01-04-2009','01-05-2009','01-06-2009','01-07-2009','01-08-2009','01-09-2009','01-10-2009',
'01-11-2009','01-12-2009','01-01-2013','01-02-2013','01-03-2013','01-04-2013','01-05-2013',
'01-06-2013','01-07-2013','01-08-2013','01-09-2013','01-10-2013','01-11-2013','01-12-2013']
Temps = [1.97479861111111,-0.391396505,-1.784091667,-6.509092742,-11.81903226,-14.34362798,-12.40221774,
-9.133213889,-1.039681452,0.907477778,3.54647043,3.893416667,2.161473611,0.015456989,-1.567216667,
-4.373807796,-3.63483871,-7.023452381,-6.49688172,-5.683111111,-1.053548387,7.404777778,9.015913978,
8.415376344,2.605666667,0.597096774,-2.949,-4.595483871]
dframe = pd.DataFrame(data=Grid_Cells,columns=['Grid Cells'])
dframe['Date'] = Dates
dframe['Station Temperature'] = Temps
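dframe['DateTime'] = pd.to_datetime(dframe['Date'], format='%d-%m-%Y') #dates are day-month-year; the default parser would read them month-first
dt_vals = dframe['DateTime']
dframe['Year'] = dt_vals.dt.year #extract year with the vectorized .dt accessor
dframe['Month'] = dt_vals.dt.month #extract month with the vectorized .dt accessor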
months_in_year = dframe.groupby(['Grid Cells','Year'])['Month'].count() #count number of months in year
subset_months = months_in_year.drop(months_in_year[months_in_year < 12].index) #drop years with less than 12 months
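But I keep running into trouble working out how to use subset_months to subset my dframe so that the years with fewer than 12 months are excluded (subset_months has a MultiIndex, with both grid cell and year as index levels). Does anyone have a suggestion?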
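
For reference, here is a minimal sketch of two ways this kind of subsetting could be done, assuming each row is one month for one grid cell. The shortened stand-in data (ISO-formatted dates, no temperatures) and the names `n_months`, `complete`, `row_keys`, and `complete_alt` are illustrative and not part of the code above.

```python
import pandas as pd

# Stand-in data: grid cell 4719 has a partial 2008 (4 months) and a full 2009;
# grid cell 4935 has a full 2013.
dates = pd.to_datetime(
    ['2008-09-01', '2008-10-01', '2008-11-01', '2008-12-01']
    + [f'2009-{m:02d}-01' for m in range(1, 13)]
    + [f'2013-{m:02d}-01' for m in range(1, 13)]
)
dframe = pd.DataFrame({
    'Grid Cells': [4719] * 16 + [4935] * 12,
    'DateTime': dates,
})
dframe['Year'] = dframe['DateTime'].dt.year
dframe['Month'] = dframe['DateTime'].dt.month

# Option 1: broadcast the number of distinct months in each
# (grid cell, year) group back onto every row, then filter on it.
n_months = dframe.groupby(['Grid Cells', 'Year'])['Month'].transform('nunique')
complete = dframe[n_months == 12]

# Option 2: build the Series of complete (grid cell, year) pairs, then test
# each row's pair for membership in its MultiIndex.
months_in_year = dframe.groupby(['Grid Cells', 'Year'])['Month'].count()
subset_months = months_in_year[months_in_year == 12]
row_keys = pd.MultiIndex.from_frame(dframe[['Grid Cells', 'Year']])
complete_alt = dframe[row_keys.isin(subset_months.index)]
```

On this stand-in data both options keep the 24 rows for 2009 and 2013 and drop the four rows from 2008. Option 1 avoids the MultiIndex entirely; Option 2 may be the closer fit if subset_months has already been computed.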