Background: I am trying to normalize a JSON file and save it into a pandas dataframe, but I am having trouble navigating the JSON structure and my code isn't working as expected.

Expected dataframe output: given the sample JSON file below (random data, but in exactly the same format as the real data), this is the output I am trying to produce:

| New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
|---|---|---|---|---|---|---|---|
| Portfolio_1 | | $260,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A |
| The FW Irrev Family Tr | 9552252 | $260,786 | 0.00% | 0.00% | 0.00% * | Jan 11, 2022 | N/A |
| Portfolio_2 | | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| FW DAF | 10946585 | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| Portfolio_3 | | $60,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - |
| The FW Family Trust | 13014080 | $475,356 | (6.10%) | (6.10%) | (3.97%) * | Apr 9, 2021 | Aggressive |
| FW Liquid Fund LP | 13396796 | $52,899,527 | (4.15%) | (4.15%) | (4.15%) * | Dec 30, 2021 | Aggressive |
| FW Holdings No. 2 LLC | 8413655 | $6,768,937 | (0.77%) | (0.77%) | 11.84% * | Mar 5, 2021 | N/A |
| FW and FR Joint | 9957007 | ($1) | - | - | - * | Dec 21, 2021 | N/A |

Actual dataframe output: despite my best efforts, I have only been able to get the top-level portfolio rows (Portfolio_1, Portfolio_2 and Portfolio_3) to map into the dataframe:

| New Entity Group | Entity ID | Adjusted Value (1/31/2022, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Inception Date | Risk Target |
|---|---|---|---|---|---|---|---|
| Portfolio_1 | | $260,786 | (44.55%) | (44.55%) | (44.55%) * | Apr 7, 2021 | N/A |
| Portfolio_2 | | $18,396,664 | (5.78%) | (5.78%) | (5.47%) * | Sep 3, 2021 | Growth |
| Portfolio_3 | | $60,143,818 | (4.42%) | (4.42%) | 7.75% * | Dec 17, 2020 | - |

JSON file: this is the file I am trying to normalize and map into a dataframe:

{
    "meta": {
        "columns": [
            {
                "key": "node_id",
                "display_name": "Entity ID",
                "output_type": "Word"
            },
            {
                "key": "value",
                "display_name": "Adjusted Value (1/31/2022, No Div, USD)",
                "output_type": "Number",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return",
                "display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_2",
                "display_name": "Adjusted TWR (YTD, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_3",
                "display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "inception_event_date",
                "display_name": "Inception Date",
                "output_type": "Date"
            },
            {
                "key": "_custom_portfolio_target_347209",
                "display_name": "Risk Target",
                "output_type": "Word"
            }
        ],
        "groupings": [
            {
                "key": "_custom_new_entity_group_453577",
                "display_name": "NEW Entity Group"
            },
            {
                "key": "top_level_legal_entity",
                "display_name": "Top Level Legal Entity"
            }
        ]
    },
    "data": {
        "type": "portfolio_views",
        "attributes": {
            "total": {
                "name": "Total",
                "columns": {
                    "time_weighted_return": -0.05001974888806926,
                    "inception_event_date": "2020-12-17",
                    "_custom_portfolio_target_347209": null,
                    "time_weighted_return_3": 0.0678647066340392,
                    "time_weighted_return_2": -0.05001974888806926,
                    "value": 7.880126780581851E7,
                    "node_id": null
                },
                "children": [
                    {
                        "name": "Portfolio_3",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.04420061615233983,
                            "inception_event_date": "2020-12-17",
                            "_custom_portfolio_target_347209": null,
                            "time_weighted_return_3": 0.07748325432684622,
                            "time_weighted_return_2": -0.04420061615233983,
                            "value": 6.014381761929752E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9957007,
                                "name": "FW and FR Joint",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": null,
                                    "inception_event_date": "2021-12-21",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": null,
                                    "time_weighted_return_2": null,
                                    "value": -1.44,
                                    "node_id": "9957007"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 8413655,
                                "name": "FW Holdings No. 2 LLC",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.0077309266066708515,
                                    "inception_event_date": "2021-03-05",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.11844843557716445,
                                    "time_weighted_return_2": -0.0077309266066708515,
                                    "value": 6768936.74,
                                    "node_id": "8413655"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13396796,
                                "name": "FW Liquid Fund LP",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.04149769229150746,
                                    "inception_event_date": "2021-12-30",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.041497430478377395,
                                    "time_weighted_return_2": -0.04149769229150746,
                                    "value": 5.289952672686747E7,
                                    "node_id": "13396796"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13014080,
                                "name": "The FW Family Trust",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.06102013456998856,
                                    "inception_event_date": "2021-04-09",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.039685671858585514,
                                    "time_weighted_return_2": -0.06102013456998856,
                                    "value": 475355.59242999996,
                                    "node_id": "13014080"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_1",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.44554958179309,
                            "inception_event_date": "2021-04-07",
                            "_custom_portfolio_target_347209": "N/A",
                            "time_weighted_return_3": -0.44554958179309,
                            "time_weighted_return_2": -0.44554958179309,
                            "value": 260786.03,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9552252,
                                "name": "The FW Irrev Family Tr",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": 0.0,
                                    "inception_event_date": "2022-01-11",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.0,
                                    "time_weighted_return_2": 0.0,
                                    "value": 260786.03,
                                    "node_id": "9552252"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_2",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.05780354507057972,
                            "inception_event_date": "2021-09-03",
                            "_custom_portfolio_target_347209": "Growth",
                            "time_weighted_return_3": -0.05470214863844658,
                            "time_weighted_return_2": -0.05780354507057972,
                            "value": 1.8396664156520825E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 10946585,
                                "name": "FW DAF",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.05780354507057972,
                                    "inception_event_date": "2021-09-03",
                                    "_custom_portfolio_target_347209": "Growth",
                                    "time_weighted_return_3": -0.05470214863844658,
                                    "time_weighted_return_2": -0.05780354507057972,
                                    "value": 1.8396664156520832E7,
                                    "node_id": "10946585"
                                },
                                "children": []
                            }
                        ]
                    }
                ]
            }
        }
    },
    "included": []
}

My code: this is the function I built to try to normalize the JSON response and save it in a pandas dataframe:

def unpack_response():
    while True:
        try:    
            api_response = response_writer()
            df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
            df.columns = df.columns.str.replace(r'columns.', '', regex=False)
            column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
            df.rename(columns=column_name_mapper, inplace=True)
            break
        except KeyError:
            print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
    
    df.rename(columns={'name': 'New Entity Group'}, inplace=True)

    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
    df = df.reindex(columns=column_names)
    
    return df
unpack_response()

Comment about my code:

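  • Portfolio_1, Portfolio_2, Portfolio_3: these rows are the first-level children under data, and they seem to be the only rows saved to df. I think this is because my code calls df = pd.json_normalize(api_response['data']['attributes']['total']['children']), so it only looks at that list. I tried appending ['children']['children'] to the end of that snippet (assuming there are three levels of children), but received TypeError: list indices must be integers or slices, not str.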

I would be grateful for any suggestions on how I can improve or extend my function so that it also picks up the key:value pairs in the lower levels of children.

Recommended answer

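Personally, I wouldn't use pd.json_normalize in this case. Your JSON is quite complex, and unless you are really familiar with json_normalize, the plain-Python approach below will probably take the average developer less time to understand. In fact, you don't even need to look at the JSON to understand exactly what this code does (though it certainly helps).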
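For reference, the TypeError in the question arises because each children value is a list, so it cannot be indexed with the string 'children'; it has to be iterated. json_normalize can do that iteration through its record_path and meta parameters. The sketch below uses a cut-down, made-up sample (the portfolios variable and its values are illustrative, not the real data), and it returns only the child rows, so the portfolio rows would still need to be interleaved separately:

```python
import pandas as pd

# Cut-down, made-up sample with the same shape as
# api_response['data']['attributes']['total']['children'] in the question.
portfolios = [
    {
        "name": "Portfolio_1",
        "columns": {"value": 260786.03, "node_id": None},
        "children": [
            {"name": "The FW Irrev Family Tr",
             "columns": {"value": 260786.03, "node_id": "9552252"},
             "children": []},
        ],
    },
    {
        "name": "Portfolio_2",
        "columns": {"value": 18396664.16, "node_id": None},
        "children": [
            {"name": "FW DAF",
             "columns": {"value": 18396664.16, "node_id": "10946585"},
             "children": []},
        ],
    },
]

# portfolios is a list, so portfolios['children'] would raise the TypeError
# from the question. record_path makes json_normalize iterate over each
# portfolio's 'children' list instead; meta carries the parent's name along,
# and meta_prefix avoids a clash with the child's own 'name' key.
children_df = pd.json_normalize(
    portfolios,
    record_path="children",
    meta=["name"],
    meta_prefix="portfolio_",
)
print(children_df[["portfolio_name", "name", "columns.node_id", "columns.value"]])
```

The nested columns dict is flattened into columns.value, columns.node_id and so on, which is the same prefix the question's code strips.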

First, we can extract the objects (portfolios and their children) from the JSON into a list, and use a series of steps to get them in the right form and order:

def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    # Note: the dict union operator `|` requires Python 3.9 or later.
    return {
        'New Entity Group': o['name'],
    } | o['columns']


# api_response is the parsed JSON shown in the question.
# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]

# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))

# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]

# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
#   objects = []
#   for group in groups:
#       for obj in group:
#           objects.append(obj)
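The list comprehension above assumes exactly two levels (portfolios and their children). If the nesting depth can vary, one option is a small recursive generator. This is only a sketch: walk, the depth field and the sample data are made up for illustration, and it keeps the order in which nodes appear in the JSON rather than applying the sorting and reversing steps above:

```python
def walk(node, depth=0):
    """Yield one flat row per node, visiting each node before its children."""
    yield {"New Entity Group": node["name"], "depth": depth, **node["columns"]}
    # 'children' is a list, so it is iterated rather than indexed by key.
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# Made-up sample shaped like api_response['data']['attributes']['total'].
total = {
    "name": "Total",
    "columns": {"value": 3.0},
    "children": [
        {
            "name": "Portfolio_1",
            "columns": {"value": 1.0},
            "children": [
                {"name": "Entity A", "columns": {"value": 1.0}, "children": []},
            ],
        },
        {"name": "Portfolio_2", "columns": {"value": 2.0}, "children": []},
    ],
}

# Start from the root's children so the "Total" row itself is skipped.
rows = [row for child in total["children"] for row in walk(child)]
for row in rows:
    print(row)
```

The resulting rows list can be passed to pd.DataFrame in the same way as objects.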

Next, create the dataframe:

import pandas as pd

# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}

# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)

# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]

And formatting:

def format_twr_col(col):
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        .fillna('-')
    )

def format_value_col(col):
    """Formats values as whole dollars, with negative values in parentheses."""
    def fmt(v):
        amount = '${:,}'.format(abs(int(round(v))))
        return amount if v >= 0 else '(' + amount + ')'

    # Build the strings with map and return a new column,
    # rather than assigning strings into a float column.
    return col.map(fmt)

df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])

df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())

df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')

df['Entity ID'] = df['Entity ID'].fillna('')

And... voilà:

>>> pd.options.display.max_columns = None
>>> df
         New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date  Risk Target
0             Portfolio_1                                          $260,786                                     (44.55%)                        (44.55%)                                            (44.55%)       Apr 07, 2021          N/A
1  The FW Irrev Family Tr   9552252                                $260,786                                         0.0%                            0.0%                                                0.0%       Jan 11, 2022          N/A
2             Portfolio_2                                       $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
3                  FW DAF  10946585                             $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
4             Portfolio_3                                       $60,143,818                                      (4.42%)                         (4.42%)                                             (7.75%)       Dec 17, 2020          NaN
5     The FW Family Trust  13014080                                $475,356                                       (6.1%)                          (6.1%)                                             (3.97%)       Apr 09, 2021   Aggressive
6       FW Liquid Fund LP  13396796                             $52,899,527                                      (4.15%)                         (4.15%)                                             (4.15%)       Dec 30, 2021   Aggressive
7   FW Holdings No. 2 LLC   8413655                              $6,768,937                                      (0.77%)                         (0.77%)                                            (11.84%)       Mar 05, 2021          N/A
8         FW and FR Joint   9957007                                    ($1)                                            -                               -                                                   -       Dec 21, 2021          N/A
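
Note that format_twr_col takes the absolute value before formatting, so the two positive returns in the output above (7.75% and 11.84%) appear in parentheses, and values such as 6.10% and 0.00% lose their trailing zero. If the goal is to match the expected output in the question exactly, a sign-aware formatter along these lines may be closer. It is a minimal sketch; format_twr_signed is a made-up name, and the sample values are copied from the JSON in the question:

```python
import pandas as pd


def format_twr_signed(col):
    """Format returns as percentages: negatives in parentheses, missing as '-'."""
    def fmt(x):
        if pd.isna(x):
            return '-'
        pct = f'{abs(x) * 100:.2f}%'  # always two decimals, e.g. 6.10%
        return f'({pct})' if x < 0 else pct

    return col.map(fmt)


sample = pd.Series([-0.44554958179309, 0.07748325432684622, 0.0, None, -0.06102013456998856])
print(format_twr_signed(sample).tolist())
# ['(44.55%)', '7.75%', '0.00%', '-', '(6.10%)']
```

It can be applied to the three TWR columns in place of format_twr_col.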
