I couldn't find a question like this, so I wrote the following code:
import pandas as pd
import pyodbc
from collections import Counter
import itertools
server = ""
cnxn = ""
query = ("")
try:
    df = pd.read_sql(query, cnxn).astype('string')
except:
    print("query failed")
else:
    cnxn.close()
This returns the following dataframe:
partNo  planStatus  planRev  operation
------  ----------  -------  ---------
110068  Released    2A       0100-00-0
110068  Released    2A       0200-00-0
110383  Released    3B       0100-00-0
110383  Released    3B       0200-00-0
110384  In Dev      1C       0100-00-0
110384  In Dev      1C       0200-00-0
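The connection string and query are blank above, so one way to try the steps below without a database is to build the same frame by hand. This is a minimal sketch: the values are copied from the table above, and `.astype('string')` mirrors the `read_sql` call.

```python
import pandas as pd

# Stand-in for the read_sql result, using the values from the table above.
df = pd.DataFrame({
    'partNo':     ['110068', '110068', '110383', '110383', '110384', '110384'],
    'planStatus': ['Released', 'Released', 'Released', 'Released', 'In Dev', 'In Dev'],
    'planRev':    ['2A', '2A', '3B', '3B', '1C', '1C'],
    'operation':  ['0100-00-0', '0200-00-0'] * 3,
}).astype('string')  # same nullable string dtype as .astype('string') in the question

print(df)
```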
Now I want to add a row with operation "000" for each part number:
dfNums= list(df['partNo'].drop_duplicates())
temp = pd.DataFrame({'partNo':dfNums, 'operation':['000']*len(dfNums)})
df = pd.concat([df, temp]).sort_values(by=['partNo', 'operation'])
This returns the following dataframe:
partNo  planStatus  planRev  operation
------  ----------  -------  ---------
110068                       000
110068  Released    2A       0100-00-0
110068  Released    2A       0200-00-0
110383                       000
110383  Released    3B       0100-00-0
110383  Released    3B       0200-00-0
110384                       000
110384  In Dev      1C       0100-00-0
110384  In Dev      1C       0200-00-0
Now, to get planStatus and planRev filled in on the "000" operation rows, the best approach I could come up with is:
for num in dfNums:
    getNumRevs = list(df.loc[df['partNo'] == num]['planRev'])
    getNumStatus = list(df.loc[df['partNo'] == num]['planStatus'])
    data = Counter(getNumRevs)
    data1 = Counter(getNumStatus)
    mostCommonRev = max(getNumRevs, key=data.get)
    mostCommonStatus = max(getNumStatus, key=data1.get)
    df.loc[df['partNo'] == num, 'planRev'] = mostCommonRev
    df.loc[df['partNo'] == num, 'planStatus'] = ""
    df.loc[(df['partNo'] == num) & (df['operation'] == '000'), 'planStatus'] = mostCommonStatus
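On the groupby question: the per-part Counter/max computation in the loop can be sketched as a single groupby that yields one row per partNo. This assumes "most common value per part" is what the loop is meant to compute. One difference to be aware of: `Series.mode()` returns its values sorted, so a tie picks the alphabetically smallest value, whereas `max(..., key=data.get)` picks the first one encountered.

```python
import pandas as pd

# Sample data from the question's first table.
df = pd.DataFrame({
    'partNo':     ['110068', '110068', '110383', '110383', '110384', '110384'],
    'planStatus': ['Released', 'Released', 'Released', 'Released', 'In Dev', 'In Dev'],
    'planRev':    ['2A', '2A', '3B', '3B', '1C', '1C'],
    'operation':  ['0100-00-0', '0200-00-0'] * 3,
}).astype('string')

# One row per partNo with the most frequent planRev and planStatus.
# mode() is sorted, so .iat[0] breaks ties by taking the smallest value.
summary = (
    df.groupby('partNo')[['planRev', 'planStatus']]
      .agg(lambda s: s.mode().iat[0])
      .reset_index()
)
print(summary)
```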
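I can't imagine this is the most efficient approach. Is there a better way to do this with groupby, or a better approach altogether? Iterating over the dataframe like this feels wrong, but it's the only way I've managed to get the desired output, shown below: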
partNo  planStatus  planRev  operation
------  ----------  -------  ---------
110068  Released    2A       000
110068              2A       0100-00-0
110068              2A       0200-00-0
110383  Released    3B       000
110383              3B       0100-00-0
110383              3B       0200-00-0
110384  In Dev      1C       000
110384              1C       0100-00-0
110384              1C       0200-00-0
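A loop-free sketch of the whole pipeline, assuming the target is exactly the layout above: the most common status on the "000" row, a blank status elsewhere, and the most common revision on every row, as the loop produces. The helper `most_common` and the names `ops000`, `rest` and `out` are made up for illustration.

```python
import pandas as pd

# Sample data from the question's first table.
df = pd.DataFrame({
    'partNo':     ['110068', '110068', '110383', '110383', '110384', '110384'],
    'planStatus': ['Released', 'Released', 'Released', 'Released', 'In Dev', 'In Dev'],
    'planRev':    ['2A', '2A', '3B', '3B', '1C', '1C'],
    'operation':  ['0100-00-0', '0200-00-0'] * 3,
}).astype('string')


def most_common(s: pd.Series):
    """Hypothetical helper: most frequent value of s (ties -> smallest, since mode() is sorted)."""
    return s.mode().iat[0]


# One "000" row per partNo carrying the most common status and revision.
ops000 = (
    df.groupby('partNo')[['planStatus', 'planRev']]
      .agg(most_common)
      .reset_index()
      .assign(operation='000')
)

# Existing rows: most common revision broadcast within each partNo, status blanked.
rest = df.assign(
    planRev=df.groupby('partNo')['planRev'].transform(most_common),
    planStatus='',
)

# Stack and sort; compared as strings, '000' sorts before '0100-00-0'.
out = (
    pd.concat([ops000, rest], ignore_index=True)
      .sort_values(['partNo', 'operation'], ignore_index=True)
      [['partNo', 'planStatus', 'planRev', 'operation']]
)
print(out)
```

Building the "000" rows already populated avoids the concat-then-fill round trip, and every step works on whole columns rather than one part number at a time.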
Edit, following @rayad's comments:
# create a temp df of all part nums and operation 000
routingNums = list(df['partNo'].drop_duplicates())
temp = pd.DataFrame({'partNo':routingNums, 'operation':['000']*len(routingNums)})
# add the temp df of '000' ops to the main df and sort
df = pd.concat([df, temp]).sort_values(by=['partNo', 'operation']).reset_index(drop=True)
# make all cimxDatabase values WLCAPP
df.loc[df['operation'] == '000', 'cimxDatabase'] = 'WLCAPP'
# make all '000' op rows have the same planRev and planStatus as the rest of the partNo's associated
check = df[['partNo','planRev','planStatus']].drop_duplicates(subset='partNo', keep='last')
df_to_merge = df[['partNo']].merge(check, on='partNo', how='left')
df.update(df_to_merge, overwrite=True)
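The drop_duplicates/merge/update step in the edit could likely be replaced by a groupby `transform`. A sketch, assuming the frame has already been concatenated and sorted as in the edit, so each "000" row has missing planRev/planStatus and the last row of each partNo holds the values to copy. The `cimxDatabase` column is left out because it is not in the sample data.

```python
import pandas as pd

# Sample frame as it looks after the concat + sort in the edit:
# the '000' rows have missing planStatus / planRev.
df = pd.DataFrame({
    'partNo':     ['110068'] * 3 + ['110383'] * 3 + ['110384'] * 3,
    'planStatus': [None, 'Released', 'Released', None, 'Released', 'Released',
                   None, 'In Dev', 'In Dev'],
    'planRev':    [None, '2A', '2A', None, '3B', '3B', None, '1C', '1C'],
    'operation':  ['000', '0100-00-0', '0200-00-0'] * 3,
}).astype('string')

cols = ['planRev', 'planStatus']
# GroupBy.last() skips missing values, so every row of a partNo (including its
# '000' row) gets the last non-missing value per column, much like
# drop_duplicates(keep='last') + merge + update.
df[cols] = df.groupby('partNo')[cols].transform('last')
print(df)
```

As with the update-based version, this writes planStatus onto every row; if the blank-status layout from the question is wanted, something like `df.loc[df['operation'] != '000', 'planStatus'] = ''` afterwards would clear the other rows.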