Use Series.explode together with Series.str.extractall, convert the matches to integers, and aggregate them back into lists:
df["Sprint Number"] = (df["sprintlist"].explode()
                                       .str.extractall(r"(\d+)$")[0]
                                       .astype(int)
                                       .groupby(level=0)
                                       .agg(list))
print(df)
   Key     Sprint    sprintlist  Sprint Number
0  567  Max1;Max2  [Max1, Max2]         [1, 2]
1  568       Max2        [Max2]            [2]
2  569    DI001 2     [DI001 2]            [2]
3  570   DI001 25    [DI001 25]           [25]
4  571    DAS 100     [DAS 100]          [100]
5  572  DI001 101   [DI001 101]          [101]
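To see what each step of that chain does, here is a minimal sketch on a shortened sample frame. The sample rows and the intermediate names `exploded`, `matches`, and `numbers` are only for illustration and are not part of the original data:

```python
import pandas as pd

# Shortened, illustrative sample (assumption: same column layout as above)
df = pd.DataFrame({
    "Key": [567, 568, 569],
    "Sprint": ["Max1;Max2", "Max2", "DI001 25"],
})
df["sprintlist"] = df["Sprint"].str.split(";")

# 1. One row per list element; the original row label is repeated
exploded = df["sprintlist"].explode()

# 2. Capture the trailing digits; the result is a DataFrame whose
#    MultiIndex is (original row label, match number), with column 0
matches = exploded.str.extractall(r"(\d+)$")

# 3. Take the single capture group and convert the strings to integers
numbers = matches[0].astype(int)

# 4. Collect the integers back into one list per original row
df["Sprint Number"] = numbers.groupby(level=0).agg(list)
print(df)
```

Keep in mind that extractall only returns rows for strings that actually match, so a row in which no string ends with digits would come out as NaN after the assignment rather than as an empty list.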
Or use a list comprehension with a regex:
import re

df["Sprint Number"] = [[int(re.search(r'(\d+)$', y).group(0)) for y in x]
                       for x in df["sprintlist"]]
print(df)
   Key     Sprint    sprintlist  Sprint Number
0  567  Max1;Max2  [Max1, Max2]         [1, 2]
1  568       Max2        [Max2]            [2]
2  569    DI001 2     [DI001 2]            [2]
3  570   DI001 25    [DI001 25]           [25]
4  571    DAS 100     [DAS 100]          [100]
5  572  DI001 101   [DI001 101]          [101]
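The list comprehension relies on every string ending with digits. A small sketch with plain `re` (the compiled name `pattern` and the test strings are only illustrative) shows what happens when that assumption does not hold:

```python
import re

pattern = re.compile(r"(\d+)$")  # one or more digits at the end of the string

with_digits = pattern.search("DI001 25")
without_digits = pattern.search("Max")

print(with_digits.group(0))   # the trailing digits, still a string
print(without_digits)         # no trailing digits, so search returns None

# Calling .group() on None is what breaks the plain list comprehension
try:
    int(pattern.search("Max").group(0))
    failed = False
except AttributeError as exc:
    failed = True
    print("no match:", exc)
```

So if the data can contain strings such as "Max", the match object should be checked for None before calling group.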
If some strings might not end with a number, add the assignment operator := (Python 3.8+) and test for None:
import re
import pandas as pd

mydata = {"Key": [567, 568, 569, 570, 571, 572],
          "Sprint": ["Max1;Max", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]}
df = pd.DataFrame(mydata)
df["sprintlist"] = df["Sprint"].str.split(";")

# Keep only the elements where the regex finds trailing digits
df["Sprint Number"] = [[int(m.group(0))
                        for y in x if (m := re.search(r'(\d+)$', y)) is not None]
                       for x in df["sprintlist"]]
print(df)
   Key     Sprint   sprintlist  Sprint Number
0  567   Max1;Max  [Max1, Max]            [1]
1  568       Max2       [Max2]            [2]
2  569    DI001 2    [DI001 2]            [2]
3  570   DI001 25   [DI001 25]           [25]
4  571    DAS 100    [DAS 100]          [100]
5  572  DI001 101  [DI001 101]          [101]
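If the assignment operator is not available (Python older than 3.8) or the nested comprehension feels hard to read, the same filtering could be written as a small helper. This is only a sketch; `trailing_numbers`, `TRAILING_DIGITS`, and the sample `sprintlists` are hypothetical names introduced here:

```python
import re

TRAILING_DIGITS = re.compile(r"(\d+)$")

def trailing_numbers(items):
    """Return the trailing integer of each string, skipping strings without one."""
    numbers = []
    for item in items:
        match = TRAILING_DIGITS.search(item)
        if match is not None:          # same None test as in the comprehension
            numbers.append(int(match.group(0)))
    return numbers

# Illustrative input shaped like the sprintlist column
sprintlists = [["Max1", "Max"], ["Max2"], ["DI001 101"]]
result = [trailing_numbers(x) for x in sprintlists]
print(result)
```

With the DataFrame from above, this should be usable as `df["Sprint Number"] = df["sprintlist"].apply(trailing_numbers)`.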