I have a DataFrame with many columns, one of which looks like this:
import pandas as pd

data = {'Product': ['Product A', 'Product B', 'Product C (discontinued in March 2021)', 'Product D', 'Product E (discontinued on 30 April 2004)']}
df = pd.DataFrame(data)
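I tried to write some code that goes through each row of the column, identifies the year inside the brackets (where applicable), and replaces the text in the brackets with 'discont. ' + the year identified. So for 'Product C', the value should change to Product C (discont. 2021).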
import re

def amend_vals(value):
    pattern = r'\((\d{4})\)'  # Regex pattern to capture the year inside brackets
    match = re.search(pattern, value)
    if match:
        year = match.group(1)
        return re.sub(pattern, '(discont. ' + year + ')', value)
    else:
        return value

df['Product'] = df['Product'].apply(amend_vals)
However, this doesn't seem to work. Does anyone know how to fix it?
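For reference, one likely cause: the pattern \((\d{4})\) only matches brackets that contain exactly four digits and nothing else, so a value like '(discontinued in March 2021)' never matches and is returned unchanged. A minimal sketch of one possible fix follows, assuming the year is always the last thing before the closing bracket, as it is in the sample data. The name `samples` is only for illustration, and the sketch uses plain strings so that it runs without pandas.

```python
import re

# Assumption: the four-digit year is the last thing inside the brackets.
# \(        opening bracket
# [^()]*?   any text inside the brackets (as little as possible)
# (\d{4})   the four-digit year, captured as group 1
# \)        closing bracket
PATTERN = re.compile(r'\([^()]*?(\d{4})\)')

def amend_vals(value):
    # Replace the whole bracketed text with '(discont. <year>)';
    # values without a matching bracket are returned unchanged.
    return PATTERN.sub(r'(discont. \1)', value)

# Illustrative input taken from the sample column above
samples = [
    'Product A',
    'Product B',
    'Product C (discontinued in March 2021)',
    'Product D',
    'Product E (discontinued on 30 April 2004)',
]
results = [amend_vals(s) for s in samples]
print(results)
```

With this version of amend_vals, the existing line df['Product'] = df['Product'].apply(amend_vals) should work as is. The separate re.search check is no longer needed, because re.sub leaves a string untouched when nothing matches.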