I have a DataFrame and a list of dicts. The DataFrame has 84k rows, and each row is an account for a specific client.
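Each dict in the list belongs to a specific client. A dict can have as many as 50 keys and as few as 2. The dicts also need to be applied in the order they are listed. The first key/value in each dict gives the name of the client the dict belongs to, and the second key/value is the name of the rule (the billing code).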
Example list of dicts:
0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*'}
Length: 427, dtype: object
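The question doesn't say how values such as `'>99999'` and `'*test*'` are meant to be read. A minimal sketch, assuming that `>`, `<`, `>=` or `<=` followed by a number means a numeric comparison, that `*` or `?` means a case-insensitive wildcard, and that anything else means exact string equality. `condition_mask` is a hypothetical helper name. Each rule value becomes a boolean mask over a whole column at once, which is the building block for avoiding a row-by-row loop.

```python
import fnmatch
import re

import pandas as pd

# Assumed syntax for numeric criteria such as '>99999'.
_NUMERIC = re.compile(r"(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)")


def condition_mask(series: pd.Series, pattern: str) -> pd.Series:
    """Boolean mask of the values in `series` that satisfy `pattern`.

    Hypothetical rule syntax (an assumption, not stated in the question):
      '>N', '<N', '>=N', '<=N' -> numeric comparison
      contains '*' or '?'      -> case-insensitive shell-style wildcard
      anything else            -> exact string equality
    """
    pattern = pattern.strip()
    m = _NUMERIC.fullmatch(pattern)
    if m:
        op, number = m.group(1), float(m.group(2))
        # Non-numeric cells become NaN, and any comparison with NaN is False.
        values = pd.to_numeric(series, errors="coerce")
        compare = {">": values.gt, "<": values.lt, ">=": values.ge, "<=": values.le}
        return compare[op](number)
    text = series.astype(str).str.strip()
    if "*" in pattern or "?" in pattern:
        wanted = pattern.lower()
        # fnmatchcase on lower-cased text gives a case-insensitive wildcard match.
        return text.str.lower().map(lambda s: fnmatch.fnmatchcase(s, wanted)).astype(bool)
    return text == pattern
```

For example, `condition_mask(df['Account Number'], '>99999')` would be True only for account numbers above 99999. If a column is stored as floats, `astype(str)` yields strings like `'0.0'`, so an equality pattern such as `'0'` would not match; the dtypes may need normalising first.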
The DataFrame has these column names:
df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
'# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
'Net Assets Base','Total Assets Base','Dual OEIC']
Input DataFrame
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description |
---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account |
Desired Output
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description | Billing Code |
---|---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account | TNL |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT | MF |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND | HF |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX | Index |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX | Index |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account | HF |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account | TNL |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX | Index |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account | TNL |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX | Index |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account | TNL |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX | Index |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account | TNL |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX | Index |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account | TNL |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account | MF |
The column names match the keys.
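Because every key other than the client and billing code is used directly as a column name, a key that isn't a column would raise a `KeyError` in any vectorized approach. A small sketch for checking that up front; `unknown_keys` is a hypothetical helper, and treating `'client'` and `'Billing Code'` as the only non-column keys is an assumption based on the examples.

```python
from typing import Dict, Iterable, List

META_KEYS = ("client", "Billing Code")  # assumed: the only keys that are not column names


def unknown_keys(rules: Iterable[dict], columns: Iterable[str]) -> Dict[int, List[str]]:
    """Map each rule's position to its criteria keys that are not DataFrame columns."""
    known = set(columns)
    report = {}
    for position, rule in enumerate(rules):
        missing = [key for key in rule if key not in META_KEYS and key not in known]
        if missing:
            report[position] = missing
    return report
```

Run against the three example dicts and `df.columns` as listed, this would flag `User`, `Flag` and `Acct Desc`, since none of them appears verbatim in that column list; `Acct Desc` presumably corresponds to `Account Description`.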
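I basically need to go through each row of data and determine whether it meets the criteria in the first dict. If it does, then df['Billing Code'] = that dict's ['Billing Code'], if that makes sense. If not, move on to the next billing code.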
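What's described here is "the first matching rule wins". `numpy.select` has exactly that behaviour: given a list of boolean conditions and a parallel list of choices, it uses, for each element, the choice of the first condition that is true. A sketch, assuming a `Client Short Name` column whose values equal each dict's `'client'` value, rule keys that are all real column names, and a hypothetical pattern syntax (`>N` numeric, `*`/`?` wildcard, otherwise equality). The only Python loop is over the 427 rules, never over the 84k rows.

```python
import fnmatch
import re

import numpy as np
import pandas as pd

META_KEYS = ("client", "Billing Code")  # keys that are not column criteria
_NUMERIC = re.compile(r"(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)")


def condition_mask(series, pattern):
    """Hypothetical syntax: '>N'-style numeric, '*'/'?' wildcard, else equality."""
    pattern = pattern.strip()
    m = _NUMERIC.fullmatch(pattern)
    if m:
        values = pd.to_numeric(series, errors="coerce")
        compare = {">": values.gt, "<": values.lt, ">=": values.ge, "<=": values.le}
        return compare[m.group(1)](float(m.group(2)))
    text = series.astype(str).str.strip()
    if "*" in pattern or "?" in pattern:
        wanted = pattern.lower()
        return text.str.lower().map(lambda s: fnmatch.fnmatchcase(s, wanted)).astype(bool)
    return text == pattern


def rule_mask(df, rule, client_col):
    """Rows that belong to the rule's client and satisfy every criterion."""
    mask = df[client_col].eq(rule["client"])
    for key, pattern in rule.items():
        if key not in META_KEYS:
            mask &= condition_mask(df[key], str(pattern))
    return mask


def assign_billing_codes(df, rules, client_col="Client Short Name"):
    """Billing code of the first matching rule for each row, None if no rule matches."""
    conditions = [rule_mask(df, rule, client_col).to_numpy() for rule in rules]
    choices = [rule["Billing Code"] for rule in rules]
    # np.select picks the first true condition, so the list order of the rules is kept.
    return pd.Series(np.select(conditions, choices, default=None), index=df.index, dtype=object)
```

Usage would be `df['Billing Code'] = assign_billing_codes(df, rules)`. Note that `np.select` needs all the masks at once; with 427 rules and 84k rows that is roughly 36 MB of booleans.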
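Iterating through all of this would probably take a very long time, hence the "Not Iterate" in the title. I'm not sure whether a list comprehension can do this.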
Thanks, everyone, for the help!