python-pptx: how to replace a keyword that spans multiple runs

Posted on August 6

I have two PowerPoint files (File1.pptx and File2.pptx), each containing the following two lines:

XX NOV 2021, Time: xx:xx – xx:xx hrs (90mins)
FY21/22 / FY22/23

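I want to make the following replacements:

(A) NOV 2021 to NOV 2022.

(B) FY21/22 / FY22/23 to FY21/22 or FY22/23.

The problem is that my replacement works in File1.pptx but not in File2.pptx.

When I print the run text, I can see that the same lines are broken into runs differently in the two files.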

def replace_text(replacements:dict,shapes:list):
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            print(cur_text)
                            print("---")
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text

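In File1.pptx, the printed cur_text for the first keyword contains the whole keyword within a single run, so my replacement works (the run contains the keyword I am searching for).

In File2.pptx, however, the printed cur_text for the first keyword shows it broken across runs, so the replacement does not work (no single cur_text matches my search term).

The same problem occurs with the second keyword, FY21/22 / FY22/23.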
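To make the failure concrete, here is a minimal sketch that does not need python-pptx at all. `FakeRun` is a hypothetical stand-in for a run object (it only has a writable `.text`), and the split point inside "NOV 2021" is an assumption chosen for illustration, not the actual split in File2.pptx.

```python
class FakeRun:
    """Hypothetical stand-in for a python-pptx run: only a writable .text."""

    def __init__(self, text):
        self.text = text


def replace_per_run(runs, match, replacement):
    """Mimic the loop in the question: replace inside each run separately."""
    for run in runs:
        run.text = run.text.replace(match, replacement)


# File1-like case: the keyword sits inside a single run.
intact = [FakeRun("XX NOV 2021, Time: xx:xx")]
# File2-like case (assumed split point): the keyword straddles two runs.
split = [FakeRun("XX NOV 20"), FakeRun("21, Time: xx:xx")]

replace_per_run(intact, "NOV 2021", "NOV 2022")
replace_per_run(split, "NOV 2021", "NOV 2022")

intact_text = "".join(run.text for run in intact)
split_text = "".join(run.text for run in split)
print(intact_text)  # keyword replaced
print(split_text)   # keyword untouched, because no single run contains it
```

The shape-level check `shape.text.find(match)` succeeds in both cases, which is why the code enters the loop for File2.pptx and still changes nothing.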

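The problem is that a keyword can be split, with part of it in the run before or after the current one (there is no pattern). So we need to be able to compare the search term against the neighbouring runs as well as the current one. Only then can a match (such as NOV 2021) be found and replaced.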
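One way to compare the search term against several runs at once is to search the concatenated run text and then map the match back to per-run slices. The sketch below assumes run-like objects with a `.text` attribute; `FakeRun` and `find_across_runs` are hypothetical names introduced here, not part of python-pptx.

```python
class FakeRun:
    """Hypothetical stand-in for a python-pptx run: only a writable .text."""

    def __init__(self, text):
        self.text = text


def find_across_runs(runs, match):
    """Locate the first occurrence of `match` in the joined run text.

    Returns a list of (run_index, start, end) slices, one per run that
    holds part of the match, or an empty list if there is no match.
    """
    texts = [run.text for run in runs]
    pos = "".join(texts).find(match) if match else -1
    if pos < 0:
        return []
    end = pos + len(match)
    spans = []
    offset = 0  # position of the current run within the joined text
    for index, text in enumerate(texts):
        run_start, run_end = offset, offset + len(text)
        lo, hi = max(pos, run_start), min(end, run_end)
        if lo < hi:  # this run overlaps the match
            spans.append((index, lo - run_start, hi - run_start))
        offset = run_end
    return spans


runs = [FakeRun("XX N"), FakeRun("OV 20"), FakeRun("21, Time")]
spans = find_across_runs(runs, "NOV 2021")
print(spans)
```

Each tuple says which characters of which run belong to the match, so joining `runs[i].text[start:end]` over the result reproduces the search term.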

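This affects only about 10% of my search terms (not all of them), but it is worrying: if that percentage grows, we could end up doing a lot of manual work. How can we avoid this and code it correctly?

How can I find a word that is spread across multiple runs (if it is), the way CTRL+F would, and replace it with the desired keyword?

Any help would be appreciated.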

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE, XL_LABEL_POSITION
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor
from pptx.enum.dml import MSO_THEME_COLOR

# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])

textbox_shape = slide.shapes.add_textbox(Pt(200), Pt(200), Pt(30), Pt(240))
text_frame = textbox_shape.text_frame
p = text_frame.paragraphs[0]
font = p.font
font.name = 'Arial'
font.size = Pt(10)
font.bold = False
font.italic = False
font.color.rgb = RGBColor(0, 0, 0)

run = p.add_run()
run.text = 'Hello there! '

run = p.add_run()
run.text = 'How '
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text = 'are'
font = run.font
font.italic = True
font.bold = True
font.size = Pt(16)

run = p.add_run()
run.text = ' you?'
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text = ' What is your name?'
run.font.italic = True

prs.save('text-01.pptx')

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.shapes.graphfrm import GraphicFrame
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches


def replace_text(replacements, shapes):
    for shape in shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for (match, replacement) in replacements.items():
                if text_frame.text.find(match) >= 0:
                    for paragraph in text_frame.paragraphs:
                        pos = paragraph.text.find(match)
                        while pos >= 0:
                            replace_runs_text(paragraph.runs, pos, len(match), replacement)
                            # resume after the inserted text, so a replacement that
                            # itself contains the match cannot loop forever
                            pos = paragraph.text.find(match, pos + len(replacement))


def replace_runs_text(runs, pos, match_len, replacement):
    cnt = len(runs)
    i = 0
    while i < cnt:
        olen = len(runs[i].text)
        if pos < olen:
            # we found the run, where the match starts!
            to_replace = replacement
            repl_len = len(to_replace)

            while i < cnt:
                run = runs[i]
                otext = run.text
                olen = len(otext)
                if pos + match_len < olen:
                    # our match ends before the end of the text of this run therefore
                    # we put the rest of our replacement string here and we are done!
                    run.text = otext[0:pos] + to_replace + otext[pos + match_len:]
                    return
                if pos + match_len == olen:
                    # our match ends together with the text of this run therefore
                    # we put the rest of our replacement string here and we are done!
                    run.text = otext[0:pos] + to_replace
                    return
                # we still haven't found all of our original match string
                # so we process what we have here and go on to the next run
                part_match_len = olen - pos
                ntext = otext[0:pos]
                if repl_len <= part_match_len:
                    # we now found at least as many characters for our match string
                    # as we have replacement characters for it. Thus we use up
                    # the rest of our replacement string here and will replace the
                    # remainder of the match with an empty string (which happens
                    # to happen in this exact same spot for the next run ;-))
                    ntext += to_replace
                    repl_len = 0
                    to_replace = ''
                else:
                    # we have got some more match characters but still more
                    # replacement characters than match characters found
                    ntext += to_replace[0:part_match_len]
                    to_replace = to_replace[part_match_len:]
                    repl_len -= part_match_len
                run.text = ntext            # save the new text to the run
                match_len -= part_match_len  # this is what is left to match
                pos = 0                      # in the next run, we start at pos 0 with our match
                i += 1                       # and off to the next run
        else:
            pos -= olen  # the relative position of our match in the next run's text
            i += 1       # and off to the next run


# open the presentation created above ------
prs = Presentation('text-01.pptx')

# what is to be replaced
replacements = {'How are you?': "I'm fine!"}

# loop through all slides and replace text in all their shapes
for slide in prs.slides:
    replace_text(replacements, slide.shapes)

# save changed presentation
prs.save('text-02.pptx')
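For comparison, here is a more compact variant of the same idea, offered as a sketch rather than a drop-in replacement. It differs from the code above in one deliberate way: instead of spreading the replacement characters over the runs the match touched, it puts the whole replacement into the first of those runs (so it takes that run's formatting) and deletes the matched characters from the later ones. `FakeRun` and `replace_across_runs` are hypothetical names; the function only assumes run-like objects with a writable `.text`.

```python
class FakeRun:
    """Hypothetical stand-in for a python-pptx run: only a writable .text."""

    def __init__(self, text):
        self.text = text


def replace_across_runs(runs, match, replacement):
    """Replace every occurrence of `match` in the joined text of `runs`.

    The replacement goes into the first run touched by each match; the
    matched characters are removed from any later runs. Returns the
    number of replacements made.
    """
    if not match:
        return 0
    count = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        pos = "".join(texts).find(match, search_from)
        if pos < 0:
            return count
        end = pos + len(match)
        offset = 0   # position of the current run within the joined text
        first = True
        for run, text in zip(runs, texts):
            run_start, run_end = offset, offset + len(text)
            lo, hi = max(pos, run_start), min(end, run_end)
            if lo < hi:  # this run holds part of the match
                inserted = replacement if first else ""
                run.text = text[:lo - run_start] + inserted + text[hi - run_start:]
                first = False
            offset = run_end
        count += 1
        # Continue after the inserted text so a replacement that contains
        # the match itself is not matched again.
        search_from = pos + len(replacement)


runs = [FakeRun("Hello there! "), FakeRun("How "), FakeRun("are"),
        FakeRun(" you?"), FakeRun(" What is your name?")]
replaced = replace_across_runs(runs, "How are you?", "I'm fine!")
print(replaced, [run.text for run in runs])
```

With python-pptx this would be called once per paragraph, for example `replace_across_runs(paragraph.runs, match, replacement)`. The sketch concatenates the run texts itself instead of relying on `paragraph.text`, so the offsets always line up with the runs; a keyword that crosses anything other than a run boundary would still not be found.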

