I have a DataFrame that looks like this:
import pandas as pd
import numpy as np
data = {
'api_spec_id': [117455, 117455, 117455, 117456, 117456],
'commit_date': ['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-01', '2023-06-02'],
'diff_2': [
[{
"id": "response-media-type-removed",
"text": "removed the media type 'application/json' for the response with the status '400'",
"level": 0,
"operation": "GET",
"path": "/continuous/filters"
},
{
"id": "response-media-type-removed",
"text": "removed the media type 'application/json' for the response with the status '406'",
"level": 0,
"operation": "GET",
"path": "/continuous/filters"
},
{
"id": "response-media-type-removed",
"text": "removed the media type 'application/json' for the response with the status '501'",
"level": 0,
"operation": "GET",
"path": "/continuous/filters"
}],
[],
[],
[{
"id": "response-media-type-removed",
"text": "removed the media type 'application/xml' for the response with the status '200'",
"level": 0,
"operation": "GET",
"path": "/continuous/filters"
}],
[]
],
'nonBreakingChanges': [
{
"api title added": 0,
"api title modified": 0,
"api description added": 0,
"api description modified": 0,
"api version added": 0,
"api version modified": 0,
"api contact deleted": 0,
"api contact added": 0,
"api contact modified": 0,
"api license deleted": 0,
"api license added": 0,
"api license modified": 0,
"server deleted": 0,
"server added": 0,
"server modified": 0,
"path added": 0,
"path parameter added": 0,
"path parameter deleted": 0,
"path parameter modified": 0,
"desc schema property of resp": 57
},
{},
{},
{},
{
"api title added": 0,
"api title modified": 0,
"api description added": 0,
"api description modified": 1,
"api version added": 0,
"api version modified": 0,
"api contact deleted": 0,
"api contact added": 0,
"api contact modified": 0,
"api license deleted": 0,
"api license added": 0,
"api license modified": 0,
"server deleted": 0,
"server added": 0,
"server modified": 0,
"path added": 0,
"path parameter added": 0,
"path parameter deleted": 0,
"path parameter modified": 0,
"desc schema property of resp": 0
}
]
}
df = pd.DataFrame(data)
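Each id has multiple commit dates, so I want to go through all the commit dates. What I actually want to do is this:
I have two columns, diff_2 and nonBreakingChanges, and I want to create a new column, type, from them based on four conditions:
First, if both diff_2 and nonBreakingChanges have values and are not empty, assign Both.
If only diff_2 has a value, assign B (in this case the other column can be {}, [], NaN, or an empty string).
If only nonBreakingChanges has a value, assign NB (in this case the other column can be {}, [], NaN, or an empty string).
I'm not sure how this can be done with dictionary-typed data. Any suggestions or ideas would be much appreciated.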
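
One possible way to express the conditions above is a row-wise check, sketched below. The helper names has_value and classify are hypothetical, and the sketch makes two assumptions that the question does not settle: a non-empty dict counts as "having a value" even when all of its counts are 0, and rows where both columns are empty get the placeholder label Neither. Because the check runs row by row, no explicit loop over ids or commit dates is needed.

```python
import numpy as np
import pandas as pd


def has_value(x):
    """Hypothetical helper: True if x is a non-empty list/dict/string."""
    if x is None:
        return False
    if isinstance(x, float) and np.isnan(x):
        return False
    if isinstance(x, (list, dict, str, tuple, set)):
        return len(x) > 0
    # Assumption: any other scalar is treated as a value.
    return True


def classify(row):
    """Hypothetical helper: map one row to Both / B / NB / Neither."""
    breaking = has_value(row["diff_2"])
    non_breaking = has_value(row["nonBreakingChanges"])
    if breaking and non_breaking:
        return "Both"
    if breaking:
        return "B"
    if non_breaking:
        return "NB"
    # Assumption: the question does not name a label for this case.
    return "Neither"


# Small stand-in frame covering each case, including NaN and "".
demo = pd.DataFrame({
    "diff_2": [[{"id": "x"}], [], [{"id": "y"}], np.nan],
    "nonBreakingChanges": [{"path added": 0}, {}, "", {"path added": 1}],
})
demo["type"] = demo.apply(classify, axis=1)
print(demo["type"].tolist())
```

The same call, df["type"] = df.apply(classify, axis=1), should work on the DataFrame from the question. If a dict whose counts are all 0 should be treated as empty, has_value could check any(x.values()) for dicts instead of the length.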