Python: assign a label based on two columns holding dictionary-type values

Posted on June 17

I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

data = {
    'api_spec_id': [117455, 117455, 117455, 117456, 117456],
    'commit_date': ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-01', '2023-06-02'],
    'diff_2': [
        [{
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '400'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        },
        {
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '406'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        },
        {
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '501'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        }],
        [],
        [],
        [{
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/xml' for the response with the status '200'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        }],
        []
    ],
    'nonBreakingChanges': [
        {
            "api title added": 0,
            "api title modified": 0,
            "api description added": 0,
            "api description modified": 0,
            "api version added": 0,
            "api version modified": 0,
            "api contact deleted": 0,
            "api contact added": 0,
            "api contact modified": 0,
            "api license deleted": 0,
            "api license added": 0,
            "api license modified": 0,
            "server deleted": 0,
            "server added": 0,
            "server modified": 0,
            "path added": 0,
            "path parameter added": 0,
            "path parameter deleted": 0,
            "path parameter modified": 0,
            "desc schema property of resp": 57
        },
        {},
        {},
        {},
        {
            "api title added": 0,
            "api title modified": 0,
            "api description added": 0,
            "api description modified": 1,
            "api version added": 0,
            "api version modified": 0,
            "api contact deleted": 0,
            "api contact added": 0,
            "api contact modified": 0,
            "api license deleted": 0,
            "api license added": 0,
            "api license modified": 0,
            "server deleted": 0,
            "server added": 0,
            "server modified": 0,
            "path added": 0,
            "path parameter added": 0,
            "path parameter deleted": 0,
            "path parameter modified": 0,
            "desc schema property of resp": 0
        }
    ]
}

df = pd.DataFrame(data)

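Each id has multiple commit dates, so I want to iterate over all of the commit dates, and this is what I am actually doing:

I have two columns, diff_2 and nonBreakingChanges, and I want to use them to create a new column, type, based on four conditions:

First, if diff_2 and nonBreakingChanges both have values and are not empty, assign Both.

If only diff_2 has a value, assign B (in this case the other column can be {}, [], NaN, or an empty string).

If nonBreakingChanges has a value, assign NB (in this case the other column can be {}, [], NaN, or an empty string).

I'm not sure how this can be done with dictionary-type data; any suggestions or ideas would be much appreciated.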
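The conditions above depend on what counts as "empty". As a quick check (a minimal sketch; the `placeholders` dictionary is made up for illustration), plain Python truthiness already treats {}, [] and the empty string as false, but NaN is a non-zero float and therefore counts as true:

```python
# Hypothetical placeholder values matching the cases named in the question
placeholders = {
    "empty dict": {},
    "empty list": [],
    "empty string": "",
    "NaN": float("nan"),
}

# bool() applies Python's standard truth-value test to each placeholder
truthiness = {name: bool(value) for name, value in placeholders.items()}
print(truthiness)
```

So a bare `if value:` test is enough for the empty containers and the empty string, while NaN needs an explicit check.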

import pandas as pd

def has_value(value):
    # NaN is a float and is truthy, so rule it out explicitly
    if isinstance(value, float) and pd.isna(value):
        return False
    # {}, [], '' and None are all falsy
    return bool(value)

def assign_type(row):
    has_breaking = has_value(row['diff_2'])
    has_non_breaking = has_value(row['nonBreakingChanges'])
    if has_breaking and has_non_breaking:
        return 'Both'
    elif has_breaking:
        return 'B'
    elif has_non_breaking:
        return 'NB'
    return None  # neither column has a value

df['type'] = df.apply(assign_type, axis=1)
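As an alternative to the row-wise apply, the same labels can be built from two boolean masks. This is only a sketch under stated assumptions: the `demo` frame and the `is_filled` helper are hypothetical, and the frame is a cut-down stand-in for the real data with one row per case.

```python
import numpy as np
import pandas as pd

def is_filled(value):
    """Return True unless value is None, NaN, or an empty container/string."""
    if value is None or (isinstance(value, float) and np.isnan(value)):
        return False
    return bool(value)

# Hypothetical frame: both filled, only NB, only B, neither
demo = pd.DataFrame({
    "diff_2": [[{"id": "x"}], [], [{"id": "y"}], np.nan],
    "nonBreakingChanges": [{"path added": 1}, {"path added": 1}, {}, ""],
})

# One boolean mask per column
has_b = demo["diff_2"].map(is_filled)
has_nb = demo["nonBreakingChanges"].map(is_filled)

# Start with None everywhere, then overwrite from least to most specific
demo["type"] = None
demo.loc[has_nb, "type"] = "NB"
demo.loc[has_b, "type"] = "B"
demo.loc[has_b & has_nb, "type"] = "Both"

print(demo["type"].tolist())
```

The order of the assignments matters: "Both" is written last so that it overrides the single-column labels on rows where both masks are true.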
