I couldn't find a question like this, so I wrote some code:

import pandas as pd
import pyodbc
from collections import Counter
import itertools

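server = ""  # connection details redacted in the question
cnxn = ""    # redacted; in practice a connection object, e.g. from pyodbc.connect()
query = ("")
try:
    df = pd.read_sql(query, cnxn).astype('string')
except Exception as exc:  # catch real errors only and show what went wrong
    print("query failed:", exc)
finally:
    cnxn.close()  # close the connection whether or not the query succeeded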
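Because the connection details are redacted, here is a minimal self-contained sketch of the same load step. It assumes an in-memory SQLite database (`sqlite3`) as a stand-in for the SQL Server connection made with `pyodbc`; the table name `routing` is hypothetical. Closing the connection in `finally` releases it whether or not the query succeeds.

```python
import sqlite3

import pandas as pd

# Stand-in for the real pyodbc connection (assumption: a DB-API connection
# that pandas accepts directly, as it does for sqlite3)
cnxn = sqlite3.connect(":memory:")
# Hypothetical table holding the sample rows from the question
cnxn.execute(
    "CREATE TABLE routing (partNo TEXT, planStatus TEXT, planRev TEXT, operation TEXT)"
)
cnxn.executemany(
    "INSERT INTO routing VALUES (?, ?, ?, ?)",
    [
        ("110068", "Released", "2A", "0100-00-0"),
        ("110068", "Released", "2A", "0200-00-0"),
        ("110383", "Released", "3B", "0100-00-0"),
        ("110383", "Released", "3B", "0200-00-0"),
        ("110384", "In Dev", "1C", "0100-00-0"),
        ("110384", "In Dev", "1C", "0200-00-0"),
    ],
)

query = "SELECT partNo, planStatus, planRev, operation FROM routing"
df = None
try:
    df = pd.read_sql(query, cnxn).astype("string")
except Exception as exc:  # report the failure instead of silently swallowing it
    print("query failed:", exc)
finally:
    cnxn.close()  # runs on success and on failure

print(df)
```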

This returns the following DataFrame:

partNo    planStatus    planRev    operation
------    ----------    -------    ---------
110068    Released      2A         0100-00-0
110068    Released      2A         0200-00-0
110383    Released      3B         0100-00-0
110383    Released      3B         0200-00-0
110384    In Dev        1C         0100-00-0
110384    In Dev        1C         0200-00-0

Now I want to add a row with operation "000" for each part number:

dfNums = list(df['partNo'].drop_duplicates())
temp = pd.DataFrame({'partNo':dfNums, 'operation':['000']*len(dfNums)})
df = pd.concat([df, temp]).sort_values(by=['partNo', 'operation'])
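The `'000'` rows can also be built without an intermediate Python list: `unique()` returns the distinct part numbers, and a scalar passed to the `DataFrame` constructor alongside an array is broadcast to every row. A sketch, using a small hand-made frame in place of the query result:

```python
import pandas as pd

# Small stand-in for the query result (values copied from the table above)
df = pd.DataFrame({
    "partNo": ["110068", "110068", "110383", "110383", "110384", "110384"],
    "planStatus": ["Released", "Released", "Released", "Released", "In Dev", "In Dev"],
    "planRev": ["2A", "2A", "3B", "3B", "1C", "1C"],
    "operation": ["0100-00-0", "0200-00-0"] * 3,
})

# One '000' row per distinct part number; the scalar '000' is broadcast
temp = pd.DataFrame({"partNo": df["partNo"].unique(), "operation": "000"})

# planStatus/planRev are missing on the new rows until they are filled in;
# '000' sorts before '0100-00-0', so it lands first within each part
df = (
    pd.concat([df, temp], ignore_index=True)
    .sort_values(["partNo", "operation"])
    .reset_index(drop=True)
)
print(df)
```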

This returns the following DataFrame:

partNo    planStatus    planRev    operation
------    ----------    -------    ---------
110068                             000
110068    Released      2A         0100-00-0
110068    Released      2A         0200-00-0
110383                             000
110383    Released      3B         0100-00-0
110383    Released      3B         0200-00-0
110384                             000
110384    In Dev        1C         0100-00-0
110384    In Dev        1C         0200-00-0

So, to get planStatus and planRev filled in on the "000" operation rows, the best approach I could come up with is:

for num in dfNums:
    # all planRev / planStatus values for this part (including the new '000' row)
    getNumRevs = list(df.loc[df['partNo'] == num]['planRev'])
    getNumStatus = list(df.loc[df['partNo'] == num]['planStatus'])
    # count occurrences so the most common value can be picked
    data = Counter(getNumRevs)
    data1 = Counter(getNumStatus)
    mostCommonRev = max(getNumRevs, key=data.get)
    mostCommonStatus = max(getNumStatus, key=data1.get)
    # every row of the part gets the rev; only the '000' row keeps the status
    df.loc[df['partNo'] == num, 'planRev'] = mostCommonRev
    df.loc[df['partNo'] == num, 'planStatus'] = ""
    df.loc[(df['partNo'] == num) & (df['operation'] == '000'), 'planStatus'] = mostCommonStatus
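A groupby version of this loop is possible with `groupby(...).transform`, which broadcasts a per-group scalar back to every row of the group. The sketch below assumes "most common" means the most frequent non-missing value; the helper `most_common` is hypothetical. One difference from the loop: on a tie, `Series.mode()` returns values sorted, so the first sorted value wins rather than the first one encountered.

```python
import pandas as pd

# Frame as it looks after the '000' rows were concatenated and sorted
df = pd.DataFrame({
    "partNo": ["110068"] * 3 + ["110383"] * 3 + ["110384"] * 3,
    "planStatus": [pd.NA, "Released", "Released", pd.NA, "Released", "Released",
                   pd.NA, "In Dev", "In Dev"],
    "planRev": [pd.NA, "2A", "2A", pd.NA, "3B", "3B", pd.NA, "1C", "1C"],
    "operation": ["000", "0100-00-0", "0200-00-0"] * 3,
})

def most_common(s: pd.Series):
    """Hypothetical helper: most frequent non-missing value in one group."""
    modes = s.mode()  # mode() drops missing values by default
    return modes.iloc[0] if not modes.empty else pd.NA

grouped = df.groupby("partNo")
# transform broadcasts the scalar returned per group to all of its rows
df["planRev"] = grouped["planRev"].transform(most_common)
status = grouped["planStatus"].transform(most_common)

# Keep the status only on the '000' row of each part, blank elsewhere
df["planStatus"] = status.where(df["operation"] == "000", "")
print(df)
```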

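I can't imagine this is the most efficient way to do it. Is there a better way using groupby, or some other better approach altogether? This feels like iterating over the DataFrame, but it is the only way I have found to get the desired output, shown below: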

partNo    planStatus    planRev    operation
------    ----------    -------    ---------
110068    Released      2A         000
110068                  2A         0100-00-0
110068                  2A         0200-00-0
110383    Released      3B         000
110383                  3B         0100-00-0
110383                  3B         0200-00-0
110384    In Dev        1C         000
110384                  1C         0100-00-0
110384                  1C         0200-00-0

Edit, based on @rayad's comments:

# create a temp df of all part nums and operation 000
routingNums = list(df['partNo'].drop_duplicates())
temp = pd.DataFrame({'partNo':routingNums, 'operation':['000']*len(routingNums)})
# add the temp df of '000' ops to the main df and sort
df = pd.concat([df, temp]).sort_values(by=['partNo', 'operation']).reset_index(drop=True)
# make all cimxDatabase values WLCAPP
df.loc[df['operation'] == '000', 'cimxDatabase'] = 'WLCAPP'
# make all '000' op rows have the same planRev and planStatus as the rest of the partNo's associated
check = df[['partNo','planRev','planStatus']].drop_duplicates(subset='partNo', keep='last')
df_to_merge = df[['partNo']].merge(check, on='partNo', how='left')
df.update(df_to_merge, overwrite=True)
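The merge-then-`update` step can also be written as a lookup: take the last row per `partNo` (the same rows `drop_duplicates(keep='last')` keeps, since the `'000'` rows sort first) and `map` its values onto the `'000'` rows. This is a sketch on the sample data only; it leaves out the `cimxDatabase` column, which does not appear in the sample.

```python
import pandas as pd

df = pd.DataFrame({
    "partNo": ["110068"] * 3 + ["110383"] * 3,
    "planStatus": [pd.NA, "Released", "Released", pd.NA, "Released", "Released"],
    "planRev": [pd.NA, "2A", "2A", pd.NA, "3B", "3B"],
    "operation": ["000", "0100-00-0", "0200-00-0"] * 2,
})

# Last row per part number carries the real planRev/planStatus
lookup = df.drop_duplicates(subset="partNo", keep="last").set_index("partNo")

is_000 = df["operation"] == "000"
for col in ["planRev", "planStatus"]:
    # Fill only the '000' rows; assignment aligns on the row index
    df.loc[is_000, col] = df.loc[is_000, "partNo"].map(lookup[col])
print(df)
```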

Recommended answer

For the first DataFrame you provided:

import pandas as pd

df = pd.DataFrame([
    {"partNo": 110068, "planStatus": "Released", "planRev": "2A", "operation": "0100-00-0",},
    {"partNo": 110068, "planStatus": "Released", "planRev": "2A", "operation": "0200-00-0",},
    {"partNo": 110383, "planStatus": "Released", "planRev": "3B", "operation": "0100-00-0",},
    {"partNo": 110383, "planStatus": "Released", "planRev": "3B", "operation": "0200-00-0",},
    {"partNo": 110384, "planStatus": "In Dev", "planRev": "1C", "operation": "0100-00-0",},
    {"partNo": 110384, "planStatus": "In Dev", "planRev": "1C", "operation": "0200-00-0",},
    ])

Here is another way to do it:

# Create and add new rows
new_rows = [
    pd.DataFrame(
        {
            "partNo": [partno],
            "planStatus": [pd.NA],
            "planRev": [pd.NA],
            "operation": ["000"],
        }
    )
    for partno in df["partNo"].unique()
]

df = (
    pd.concat([df, *new_rows])
    .sort_values(by=["partNo", "operation"])
    .bfill()  # fillna(method="bfill") is deprecated since pandas 2.1
    .reset_index(drop=True)
)

# Remove duplicated values in `planStatus` column
df.loc[df["operation"] != "000", "planStatus"] = ""

Then:

print(df)
# Output
   partNo planStatus planRev  operation
0  110068   Released      2A        000
1  110068                 2A  0100-00-0
2  110068                 2A  0200-00-0
3  110383   Released      3B        000
4  110383                 3B  0100-00-0
5  110383                 3B  0200-00-0
6  110384     In Dev      1C        000
7  110384                 1C  0100-00-0
8  110384                 1C  0200-00-0
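One caveat with a plain `.bfill()` (or the older `fillna(method="bfill")`): it fills across part-number boundaries, so a gap at the end of one part's rows would be filled from the next part's first row. It works here because the gaps are always on the first (`'000'`) row of each part. Restricting the fill to each group removes that dependency. A sketch, starting from the concatenated and sorted frame:

```python
import pandas as pd

df = pd.DataFrame({
    "partNo": [110068, 110068, 110068, 110383, 110383, 110383],
    "planStatus": [pd.NA, "Released", "Released", pd.NA, "Released", "Released"],
    "planRev": [pd.NA, "2A", "2A", pd.NA, "3B", "3B"],
    "operation": ["000", "0100-00-0", "0200-00-0"] * 2,
})

cols = ["planStatus", "planRev"]
# Backward-fill within each partNo only, never from the next part's rows
df[cols] = df.groupby("partNo")[cols].bfill()
# Show the status only on the '000' row, as in the desired output
df.loc[df["operation"] != "000", "planStatus"] = ""
print(df)
```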
