I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

data = {
    'api_spec_id': [117455, 117455, 117455, 117456, 117456],
    'commit_date': ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-01', '2023-06-02'],
    'diff_2': [
        [{
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '400'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        },
        {
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '406'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        },
        {
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/json' for the response with the status '501'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        }],
        [],
        [],
        [{
            "id": "response-media-type-removed",
            "text": "removed the media type 'application/xml' for the response with the status '200'",
            "level": 0,
            "operation": "GET",
            "path": "/continuous/filters"
        }],
        []
    ],
    'nonBreakingChanges': [
        {
            "api title added": 0,
            "api title modified": 0,
            "api description added": 0,
            "api description modified": 0,
            "api version added": 0,
            "api version modified": 0,
            "api contact deleted": 0,
            "api contact added": 0,
            "api contact modified": 0,
            "api license deleted": 0,
            "api license added": 0,
            "api license modified": 0,
            "server deleted": 0,
            "server added": 0,
            "server modified": 0,
            "path added": 0,
            "path parameter added": 0,
            "path parameter deleted": 0,
            "path parameter modified": 0,
            "desc schema property of resp": 57
        },
        {},
        {},
        {},
        {
            "api title added": 0,
            "api title modified": 0,
            "api description added": 0,
            "api description modified": 1,
            "api version added": 0,
            "api version modified": 0,
            "api contact deleted": 0,
            "api contact added": 0,
            "api contact modified": 0,
            "api license deleted": 0,
            "api license added": 0,
            "api license modified": 0,
            "server deleted": 0,
            "server added": 0,
            "server modified": 0,
            "path added": 0,
            "path parameter added": 0,
            "path parameter deleted": 0,
            "path parameter modified": 0,
            "desc schema property of resp": 0
        }
    ]
}

df = pd.DataFrame(data)

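Each id has multiple commit dates, so I want to go through all the commit dates. What I actually want to do is this:

I have two columns, diff_2 and nonBreakingChanges, and I want to use them to create a new column, type, based on four conditions:

First, if both diff_2 and nonBreakingChanges have values and are not empty, assign Both.

If only diff_2 has a value, assign B (in this case the other column may be {}, [], NaN, or an empty string).

If only nonBreakingChanges has a value, assign NB (in this case the other column may be {}, [], NaN, or an empty string).

I'm not sure how to do this with dictionary-type data. Any suggestions or ideas would be greatly appreciated.

Recommended answer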

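import pandas as pd

def has_value(value):
    # Only a non-empty list, dict, or string counts as "having a value".
    # NaN needs this explicit check: bool(float('nan')) is True, so a plain
    # truthiness test would wrongly treat NaN as a value.
    return isinstance(value, (list, dict, str)) and len(value) > 0

def assign_type(row):
    has_breaking = has_value(row['diff_2'])
    has_non_breaking = has_value(row['nonBreakingChanges'])

    if has_breaking and has_non_breaking:
        return 'Both'
    elif has_breaking:
        return 'B'
    elif has_non_breaking:
        return 'NB'
    else:
        # Neither column holds anything.
        return None

df['type'] = df.apply(assign_type, axis=1)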

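This code defines a function, assign_type, that takes one row of the DataFrame as input. It checks the values of diff_2 and nonBreakingChanges against the conditions above and returns the matching type. The function is then applied to every row of the DataFrame by calling apply with axis=1.

The resulting DataFrame df has a new column named 'type' containing the type assigned under the conditions you described. For the sample data in the question, the rows come out as Both, None, None, B, NB.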
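To see how these rules behave on the empty cases listed in the question ({}, [], NaN, empty string), here is a minimal sketch on a small made-up frame. The names toy, has_value, and labels are illustrative and not part of the original post. It computes one boolean mask per column and looks the label up from the pair of flags, which is only one possible alternative to a row-wise apply.

```python
import numpy as np
import pandas as pd

def has_value(value):
    """Return True only for a non-empty list, dict, or string."""
    return isinstance(value, (list, dict, str)) and len(value) > 0

# Hypothetical miniature frame covering the empty cases from the question.
toy = pd.DataFrame({
    'diff_2': [[{'id': 'x'}], [], [{'id': 'y'}], np.nan, ''],
    'nonBreakingChanges': [{'path added': 1}, {'path added': 0}, {}, np.nan, {}],
})

# One boolean flag per column: does the cell hold anything?
has_b = toy['diff_2'].map(has_value)
has_nb = toy['nonBreakingChanges'].map(has_value)

# Map each (breaking, non-breaking) pair of flags to its label.
labels = {
    (True, True): 'Both',
    (True, False): 'B',
    (False, True): 'NB',
    (False, False): None,
}
toy['type'] = pd.Series(
    [labels[(bool(b), bool(nb))] for b, nb in zip(has_b, has_nb)],
    index=toy.index,
    dtype=object,  # keep None as None instead of converting it to NaN
)

print(toy['type'].tolist())
```

Note that under this definition a dict whose counts are all zero (like the second row here) still counts as non-empty and is labelled NB. If such rows should be treated as empty, has_value would need an extra check on the dict's values.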
