I have a DataFrame and a list of dictionaries. The DataFrame has 84k rows, and each row is the account of a specific client.

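Each dict in the list belongs to a specific client. A dict can have at most 50 keys and at least 2. The dicts also need to be applied in the order they are listed. The first key/value in each dict gives the name of the client the dict belongs to, and the second key/value is the name of the rule (the Billing Code).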
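Because the first two entries of each dict are the client and the billing code while everything else is a column condition, it may help to split each dict up front. This is a minimal sketch, assuming the keys are literally named "client" and "Billing Code" as in the examples below; `split_rule` is a hypothetical helper name. It looks those keys up by name rather than by position, so a dict whose keys were written in a different order is still split correctly.

```python
def split_rule(rule):
    """Split one rule dict into (client, billing_code, conditions)."""
    # look keys up by name instead of position, so key order does not matter
    conditions = {k: v for k, v in rule.items() if k not in ("client", "Billing Code")}
    return rule["client"], rule["Billing Code"], conditions


client, code, conditions = split_rule(
    {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
)
print(client, code, conditions)  # client 1 MF {'User': 'BP', 'Flag': 'S'}
```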

Example of the list of dicts:


0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*'}

Length: 427, dtype: object
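The example values '>99999' and '*test*' look like a comparison and a wildcard rather than literal values, so a plain equality test would never match them. Assuming '>' means a numeric greater-than and '*' means a shell-style wildcard compared case-insensitively (both are guesses about the rule syntax, which may include other operators), one way to turn a single rule value into a boolean mask over a column is sketched below; `condition_mask` is a hypothetical helper.

```python
import fnmatch

import pandas as pd


def condition_mask(series: pd.Series, value) -> pd.Series:
    """Boolean mask for one rule value (hypothetical syntax, see lead-in)."""
    text = str(value)
    if text.startswith(">"):
        # '>99999' -> numeric greater-than; non-numeric cells become NaN and compare False
        return pd.to_numeric(series, errors="coerce") > float(text[1:])
    if "*" in text:
        # '*test*' -> shell-style wildcard, matched case-insensitively (assumption)
        return series.astype(str).str.match(fnmatch.translate(text), case=False)
    # anything else -> exact match on the string form, so 0 and '0' compare equal
    return series.astype(str) == text


accounts = pd.DataFrame({
    "Account Number": [1, 100000, 123123],
    "Account Description": ["Test Account", "INDEX", "my test fund"],
    "Valuations": [0, 1, 0],
})
# mirrors the first example dict: 'Valuations': '0', 'Account Number': '>99999'
rule_1 = (condition_mask(accounts["Valuations"], "0")
          & condition_mask(accounts["Account Number"], ">99999"))
desc = condition_mask(accounts["Account Description"], "*test*")
print(rule_1.tolist())  # [False, False, True]
print(desc.tolist())    # [True, False, True]
```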

The DataFrame has these column names:

df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
              'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
              'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
              '# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
              'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
              'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
              'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
              'Net Assets Base','Total Assets Base','Dual OEIC']

Input DataFrame

Client Short Name  Source.Name  User Bank  Bank  Account Number  Account Description
Client #1          C#1.txt      AA         01    1               Test Account
Client #1          C#1.txt      AC         01    2               MY ACCOUNT
Client #1          C#1.txt      AC         01    3               SUPER FUND
Client #1          C#1.txt      AY         01    4               S&P INDEX
Client #1          C#1.txt      AY         01    5               Test Account
Client #1          C#1.txt      AA         01    6               INDEX
Client #1          C#1.txt      AA         01    7               Test Account
Client #1          C#1.txt      AA         01    8               RYAN'S Account
Client #2          C#2.txt      BA         01    1               Test Account
Client #2          C#2.txt      BB         01    33              INDEX
Client #2          C#2.txt      BB         01    92              Test Account
Client #2          C#2.txt      BZ         01    123123          INDEX
Client #3          C#3.txt      BB         01    1657            Test Account
Client #3          C#3.txt      BP         01    15454           Test Account
Client #4          C#4.txt      GH         01    100             Test Account
Client #4          C#4.txt      GH         01    19875           INDEX
Client #4          C#4.txt      GY         01    13579           Test Account
Client #4          C#4.txt      GE         01    2               INDEX
Client #4          C#4.txt      GE         01    72              Test Account
Client #4          C#4.txt      GP         01    96              GREEN Account

Desired Output

Client Short Name  Source.Name  User Bank  Bank  Account Number  Account Description  Billing Code
Client #1          C#1.txt      AA         01    1               Test Account         TNL
Client #1          C#1.txt      AC         01    2               MY ACCOUNT           MF
Client #1          C#1.txt      AC         01    3               SUPER FUND           HF
Client #1          C#1.txt      AY         01    4               S&P INDEX            Index
Client #1          C#1.txt      AY         01    5               Test Account         TNL
Client #1          C#1.txt      AA         01    6               INDEX                Index
Client #1          C#1.txt      AA         01    7               Test Account         TNL
Client #1          C#1.txt      AA         01    8               RYAN'S Account       HF
Client #2          C#2.txt      BA         01    1               Test Account         TNL
Client #2          C#2.txt      BB         01    33              INDEX                Index
Client #2          C#2.txt      BB         01    92              Test Account         TNL
Client #2          C#2.txt      BZ         01    123123          INDEX                Index
Client #3          C#3.txt      BB         01    1657            Test Account         TNL
Client #3          C#3.txt      BP         01    15454           Test Account         TNL
Client #4          C#4.txt      GH         01    100             Test Account         TNL
Client #4          C#4.txt      GH         01    19875           INDEX                Index
Client #4          C#4.txt      GY         01    13579           Test Account         TNL
Client #4          C#4.txt      GE         01    2               INDEX                Index
Client #4          C#4.txt      GE         01    72              Test Account         TNL
Client #4          C#4.txt      GP         01    96              GREEN Account        MF

The column names match the dict keys.

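I basically need to go through each row of data and check whether it meets the criteria in the first dict. If it does, set df['Billing Code'] to that dict's 'Billing Code', if that makes sense. If it doesn't, move on to the next dict (the next billing code).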

Iterating like this over all 84k rows could take a long time, hence "Not Iterate" in the title. I'm not sure whether a list comprehension could do this.
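One vectorized reading of "first matching dict wins": instead of looping over the 84k rows, loop over the rules (427 in the example) and evaluate each one as a whole-column boolean mask, assigning a code only to rows that no earlier rule has already claimed. This sketch assumes equality-only conditions, that the client column is "Client Short Name" and holds the same strings as each dict's "client" value, and that every other key is a DataFrame column; `assign_billing_codes` is a hypothetical name.

```python
import pandas as pd


def assign_billing_codes(df, rules, client_col="Client Short Name", default="Unclassified"):
    """Return a Series holding the Billing Code of the first rule each row matches.

    The Python loop runs once per rule rather than once per row; each rule is
    evaluated as a whole-column boolean mask. Equality-only conditions.
    """
    codes = pd.Series(None, index=df.index, dtype="object")
    for rule in rules:
        mask = df[client_col] == rule["client"]
        for key, value in rule.items():
            if key not in ("client", "Billing Code"):
                mask &= df[key] == value
        # rows already claimed by an earlier rule keep their code
        mask &= codes.isna()
        codes[mask] = rule["Billing Code"]
    return codes.fillna(default)


df = pd.DataFrame({
    "Client Short Name": ["Client #1", "Client #1", "Client #1", "Client #2", "Client #2"],
    "User Bank": ["AA", "AY", "AC", "BB", "BZ"],
    "Account Number": [1, 4, 2, 33, 123123],
})
rules = [
    {"client": "Client #1", "Billing Code": "TNL", "Account Number": 1},
    {"client": "Client #1", "Billing Code": "MF", "User Bank": "AY"},
    {"client": "Client #2", "Billing Code": "TNL", "User Bank": "BB"},
]
df["Billing Code"] = assign_billing_codes(df, rules)
print(df)
```

Values such as '>99999' or '*test*' would need a pattern-aware comparison in place of `df[key] == value`.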

Thanks for any help!

Recommended Answer

Edit:

Based on your comments, I would first create a mapping ClientID -> List of Dictionaries:

lst = [
    {
        "client": "Client #1",
        "Billing Code": "TNL",
        "Bank": 1,
        "Account Number": 1,
    },
    {
        "client": "Client #1",
        "Billing Code": "MF",
        "User Bank": "AY",
        "Bank": 1,
    },
    {
        "Billing Code": "TNL",
        "client": "Client #2",
        "User Bank": "BB",
    },
]

# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
    m.setdefault(d["client"], []).append(d)
    d.pop("client")
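The loop above calls `d.pop("client")`, which deletes the key from the dicts inside `lst` itself. If `lst` is needed again later (for example, for a rule-by-rule approach that still reads each dict's "client"), a non-mutating variant that copies each rule without its "client" entry could look like this, using the same sample rules.

```python
from collections import defaultdict

lst = [
    {"client": "Client #1", "Billing Code": "TNL", "Bank": 1, "Account Number": 1},
    {"client": "Client #1", "Billing Code": "MF", "User Bank": "AY", "Bank": 1},
    {"Billing Code": "TNL", "client": "Client #2", "User Bank": "BB"},
]

grouped = defaultdict(list)
for d in lst:
    # copy every key except "client"; the dict in lst is left untouched
    grouped[d["client"]].append({k: v for k, v in d.items() if k != "client"})
# plain dict, so indexing an unknown client cannot silently add an empty entry
m = dict(grouped)

print(m["Client #1"])
```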

Then I would group the DataFrame by "Client Short Name" with df.groupby and apply a custom function:

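def fn(x):
    # rules for this client, in their original order (x.name is the group key)
    dictionaries = m.get(x.name, [])

    out = []
    for _, row in x.iterrows():
        for d in dictionaries:
            # every key except "Billing Code" is a column condition
            if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
                out.append(d["Billing Code"])
                break  # first matching rule wins
        else:
            # no rule matched this row
            out.append("Unclassified")

    x["Billing Code"] = out
    return x


# group_keys=False keeps the flat index; pandas >= 2.0 would otherwise add the group key as an index level
df = df.groupby("Client Short Name", group_keys=False).apply(fn)
print(df)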
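The answer still calls iterrows inside each group, so it remains a row-by-row loop. As an alternative (not the answerer's method), numpy.select takes a list of boolean conditions and a list of choices and, for each element, returns the choice belonging to the first True condition, which matches "apply the dicts in the order they are listed". A sketch, assuming equality-only rules that still carry their "client" key (that is, taken from the list before any pop) and a client column named "Client Short Name"; `billing_codes_select` is a hypothetical name. Holding 427 boolean masks of 84k rows takes roughly 427 × 84,000 bytes, about 36 MB.

```python
import numpy as np
import pandas as pd


def billing_codes_select(df, rules, client_col="Client Short Name", default="Unclassified"):
    """Pick, per row, the Billing Code of the first rule whose conditions all hold."""
    if not rules:
        # np.select rejects an empty condition list
        return pd.Series(default, index=df.index)
    conditions, choices = [], []
    for rule in rules:
        mask = df[client_col] == rule["client"]
        for key, value in rule.items():
            if key not in ("client", "Billing Code"):
                mask &= df[key] == value
        conditions.append(mask.to_numpy())
        choices.append(rule["Billing Code"])
    # np.select returns, for each row, the choice of the FIRST True condition
    return pd.Series(np.select(conditions, choices, default=default), index=df.index)


df = pd.DataFrame({
    "Client Short Name": ["Client #1", "Client #1", "Client #1", "Client #2", "Client #2"],
    "User Bank": ["AA", "AY", "AC", "BB", "BZ"],
    "Account Number": [1, 4, 2, 33, 123123],
})
rules = [
    {"client": "Client #1", "Billing Code": "TNL", "Account Number": 1},
    {"client": "Client #1", "Billing Code": "MF", "User Bank": "AY"},
    {"client": "Client #1", "Billing Code": "HF", "User Bank": "AY"},  # never wins: MF comes first
    {"client": "Client #2", "Billing Code": "TNL", "User Bank": "BB"},
]
codes = billing_codes_select(df, rules)
print(codes.tolist())
```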

The result is:

   Client Short Name Source.Name User Bank  Bank  Account Number Account Description  Billing Code
0          Client #1     C#1.txt        AA     1               1        Test Account           TNL
1          Client #1     C#1.txt        AC     1               2          MY ACCOUNT  Unclassified
2          Client #1     C#1.txt        AC     1               3          SUPER FUND  Unclassified
3          Client #1     C#1.txt        AY     1               4           S&P INDEX            MF
4          Client #1     C#1.txt        AY     1               5        Test Account            MF
5          Client #1     C#1.txt        AA     1               6               INDEX  Unclassified
6          Client #1     C#1.txt        AA     1               7        Test Account  Unclassified
7          Client #1     C#1.txt        AA     1               8      RYAN'S Account  Unclassified
8          Client #2     C#2.txt        BA     1               1        Test Account  Unclassified
9          Client #2     C#2.txt        BB     1              33               INDEX           TNL
10         Client #2     C#2.txt        BB     1              92        Test Account           TNL
11         Client #2     C#2.txt        BZ     1          123123               INDEX  Unclassified
12         Client #3     C#3.txt        BB     1            1657        Test Account  Unclassified
13         Client #3     C#3.txt        BP     1           15454        Test Account  Unclassified
14         Client #4     C#4.txt        GH     1             100        Test Account  Unclassified
15         Client #4     C#4.txt        GH     1           19875               INDEX  Unclassified
16         Client #4     C#4.txt        GY     1           13579        Test Account  Unclassified
17         Client #4     C#4.txt        GE     1               2               INDEX  Unclassified
18         Client #4     C#4.txt        GE     1              72        Test Account  Unclassified
19         Client #4     C#4.txt        GP     1              96       GREEN Account  Unclassified
