How to detect the first word and include it in a string replacement in Python

Posted on July 27

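I want to read a column in which each row starts with the survey's quarter and year, followed by the survey name. At first I renamed the survey names while keeping the quarter and year unchanged across the whole column, but if I run this script on a file from a different quarter, the full row contents are no longer matched and the script stops working.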

My example:

        Survey Name
0       Q321 Your Voice - Information Tech
1       Q321 Your Voice - Information Tech
2       Q321 Your Voice - Information Tech
3       Q321 Your Voice - Information Tech
4       Q321 Your Voice - Information Tech
                
9630    Q321 Your Voice - Business Group
9631    Q321 Your Voice - Business Group

(Q321 = Q3 of 2021)
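To make the prefix format concrete, here is a minimal sketch of how such a code could be parsed. The function name `parse_quarter_code` is hypothetical, and the mapping of the two-digit year onto the 2000s is an assumption that the post does not state.

```python
import re

def parse_quarter_code(code):
    """Split a code such as 'Q321' into (quarter, year).

    Assumes the format 'Q' + one quarter digit (1-4) + a two-digit year,
    and assumes the year belongs to the 2000s.
    """
    match = re.fullmatch(r"Q([1-4])(\d{2})", code)
    if match is None:
        raise ValueError(f"not a quarter code: {code!r}")
    quarter = int(match.group(1))
    year = 2000 + int(match.group(2))  # assumption: 21st-century years
    return quarter, year

print(parse_quarter_code("Q321"))
```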

My code converts it to:

Survey Name
0       Q321 YV - IT
1       Q321 YV - IT
2       Q321 YV - IT
3       Q321 YV - IT
4       Q321 YV - IT
                
9630    Q321 YV - BG
9631    Q321 YV - BG

The code I use:

print(df.loc[:, "Survey.Name"])

# Isolate the column of interest and replace each long survey name with its abbreviation

df.loc[df['Survey.Name'].str.contains('Q321 Your Voice - Information Tech'), 'Survey.Name'] = \
    'Q321 YV - IT'

df.loc[df['Survey.Name'].str.contains('Q321 Your Voice - Business Group'), 'Survey.Name'] = \
    'Q321 YV - BG'

df.loc[df['Survey.Name'].str.contains('Q321 Your Voice - Study Group'), 'Survey.Name'] = \
    'Q321 YV - SG'
        
    
print(df.loc[:, "Survey.Name"])

But suppose I run this script on a file from another quarter (say, Q4 of 2021):

Survey Name
0       Q421 Your Voice - Information Tech
1       Q421 Your Voice - Information Tech
2       Q421 Your Voice - Information Tech
3       Q421 Your Voice - Information Tech
4       Q421 Your Voice - Information Tech

9630    Q421 Your Voice - Business Group
9631    Q421 Your Voice - Business Group

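I have to change the script every time a new quarter comes in. Is there a way to "detect" the first word (which, fortunately, happens to be the survey's quarter and year) and include it in the converted version, while still replacing the strings in that column that need to change?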

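def repl(x):
    # Split "Q321 Your Voice - Information Tech" at the first hyphen
    head, tail = x.split("-", 1)
    # The first word of the head is the quarter; the rest is the survey family
    quarter, *chunk = head.split()
    head_initials = "".join(c[0] for c in chunk)
    tail_initials = "".join(c[0] for c in tail.split())
    return f"{quarter} {head_initials} - {tail_initials}"

res = df["Survey Name"].apply(repl)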

replacements = {
    "Your Voice - Information Tech": "YV - IT Group",
    "Your Voice - Business Group": "YV - BG",
    "Your Voice - Human Resources": "YV - LRECS",
}

def repl(match, repls=replacements):
    # Group 1 is the quarter prefix, group 2 is the rest of the survey name
    quarter = match.group(1)
    # Normalize whitespace before looking the name up
    key = " ".join(match.group(2).strip().split())
    return f"{quarter} {repls.get(key, '')}"

res = df["Survey Name"].str.replace(r"(Q\d+)\s+(.+)", repl, regex=True)
print(res)

0    Q321 YV - IT Group
1    Q321 YV - IT Group
2    Q321 YV - IT Group
3    Q321 YV - IT Group
4    Q321 YV - IT Group
5          Q321 YV - BG
6       Q321 YV - LRECS
Name: Survey Name, dtype: object

{'Survey Name': {0: 'Q321 Your Voice - Information Tech',
                 1: 'Q321 Your Voice - Information Tech',
                 2: 'Q321 Your Voice - Information Tech',
                 3: 'Q321 Your Voice - Information Tech',
                 4: 'Q321 Your Voice - Information Tech',
                 5: 'Q321 Your Voice - Business Group',
                 6: 'Q321 Your Voice - Human Resources'}}
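As a variation without regular expressions, the following is a minimal sketch that splits off the first word with `str.partition` and looks up the remainder in a dictionary. The names `ABBREVIATIONS` and `abbreviate` are hypothetical, the mapping is copied from the question's examples, and keeping unknown names unchanged is an assumption about the desired behavior.

```python
# Hypothetical mapping, taken from the abbreviations used in the question.
ABBREVIATIONS = {
    "Your Voice - Information Tech": "YV - IT",
    "Your Voice - Business Group": "YV - BG",
    "Your Voice - Study Group": "YV - SG",
}

def abbreviate(name, mapping=ABBREVIATIONS):
    """Keep the leading quarter token and abbreviate the rest of the name."""
    # Everything before the first space is treated as the quarter prefix.
    quarter, _, rest = name.strip().partition(" ")
    # Collapse repeated whitespace so the dictionary lookup is robust.
    key = " ".join(rest.split())
    if not key:
        return quarter
    # Assumption: names missing from the mapping are left unchanged.
    return f"{quarter} {mapping.get(key, key)}"

names = [
    "Q321 Your Voice - Information Tech",
    "Q421 Your Voice - Business Group",
]
print([abbreviate(n) for n in names])
```

With pandas, the same function could be applied to the column through `Series.map`, for example `df["Survey Name"].map(abbreviate)`. Because the quarter is read from each row, the script should not need editing when a new quarter's file arrives.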
