python-pptx: how to replace a keyword that spans multiple runs

Posted on August 6

I have two presentations (File1.pptx and File2.pptx), each of which contains the following two lines:

XX NOV 2021, Time: xx:xx – xx:xx hrs (90mins)
FY21/22 / FY22/23

I want to make the following replacements:

a) NOV 2021 becomes NOV 2022.

b) FY21/22 / FY22/23 becomes either FY21/22 or FY22/23.

The problem is that my replacement works in File1.pptx but not in File2.pptx.

When I printed the text of each run, I saw that the same line is split into runs differently in the two files.

def replace_text(replacements:dict,shapes:list):
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            print(cur_text)
                            print("---")
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text

In File1.pptx, cur_text contains the whole keyword I am looking for (for the 1st keyword), so my replace works.

In File2.pptx, however, no single cur_text matches my search term (for the 1st keyword), so the replace does nothing.

The same issue occurs for the 2nd keyword, FY21/22 / FY22/23.
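The difference between the two files can be reproduced without python-pptx. The sketch below uses plain strings in place of run texts; the particular split shown for File2 is hypothetical (the real split may differ), and the two helper names are made up for illustration. It only shows that a per-run search can miss a term that a search over the joined run texts finds.

```python
# Hypothetical run texts: the actual split inside File2.pptx may differ.
runs_file1 = ["XX NOV 2021, Time"]
runs_file2 = ["XX ", "NOV", " 20", "21, Time"]


def found_in_single_run(run_texts, match):
    """True if at least one run contains the whole search term."""
    return any(match in text for text in run_texts)


def found_in_joined_runs(run_texts, match):
    """True if the concatenated run texts contain the search term."""
    return match in "".join(run_texts)


for name, run_texts in (("File1", runs_file1), ("File2", runs_file2)):
    print(
        name,
        found_in_single_run(run_texts, "NOV 2021"),
        found_in_joined_runs(run_texts, "NOV 2021"),
    )
```

With these inputs the per-run check succeeds only for the File1-style split, while the joined check succeeds for both.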

The problem is that a keyword can be split across the previous, current and next runs, with no predictable pattern. So the search term has to be compared against the text of neighbouring runs combined with the current run. Only then can a match (like NOV 2021) be found and replaced.
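One way to picture "comparing across neighbouring runs" is to search the joined text once and then map the match position back onto the individual runs. The following is a minimal sketch of that mapping on plain strings; `runs_overlapping` is a hypothetical helper name, not part of python-pptx, and the run split is again invented for illustration.

```python
def runs_overlapping(run_texts, match):
    """Locate the first occurrence of `match` in the joined run texts.

    Returns a list of (run_index, start, end) tuples, one per run that
    holds part of the match; start/end are offsets inside that run.
    Returns an empty list when the term is not present.
    """
    full = "".join(run_texts)
    pos = full.find(match)
    if pos < 0:
        return []
    end = pos + len(match)

    spans = []
    offset = 0  # index in `full` at which the current run starts
    for index, text in enumerate(run_texts):
        run_start, run_end = offset, offset + len(text)
        lo, hi = max(pos, run_start), min(end, run_end)
        if lo < hi:  # this run holds at least one matched character
            spans.append((index, lo - run_start, hi - run_start))
        offset = run_end
    return spans


print(runs_overlapping(["XX ", "NOV", " 20", "21, Time"], "NOV 2021"))
# prints [(1, 0, 3), (2, 0, 3), (3, 0, 2)]
```

The result says which characters of which runs need to change, which is the information a cross-run replace needs.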

This affects only about 10% of my search terms, not all of them, but it is worrying to live with: if that percentage grows, we may end up doing a lot of manual work. How do we avoid this and code the replacement correctly?

How do we find a term that is present in the text but spread across multiple runs (the way CTRL+F would find it) and replace it with the desired keyword?

Any help please?

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE,XL_LABEL_POSITION
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor
from pptx.enum.dml import MSO_THEME_COLOR

# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
textbox_shape = slide.shapes.add_textbox(Pt(200),Pt(200),Pt(30),Pt(240))
text_frame = textbox_shape.text_frame
p = text_frame.paragraphs[0]
font = p.font
font.name = 'Arial'
font.size = Pt(10)
font.bold = False
font.italic = False
font.color.rgb = RGBColor(0,0,0)

run = p.add_run()
run.text = 'Hello there! '

run = p.add_run()
run.text = 'How '
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text = 'are'
font = run.font
font.italic = True
font.bold = True
font.size = Pt(16)

run = p.add_run()
run.text = ' you?'
font = run.font
font.italic = True
font.bold = True

run = p.add_run()
run.text = ' What is your name?'
run.font.italic = True

prs.save('text-01.pptx')

from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.shapes.graphfrm import GraphicFrame
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches

def replace_text(replacements, shapes):
    for shape in shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for (match, replacement) in replacements.items():
                if text_frame.text.find(match)>=0:
                    for paragraph in text_frame.paragraphs:
                        pos = paragraph.text.find(match)
                        while pos>=0:
                            replace_runs_text(paragraph.runs, pos, len(match), replacement)
                            pos = paragraph.text.find(match)

def replace_runs_text(runs, pos, match_len, replacement):
    cnt = len(runs)
    i = 0
    while i<cnt:
        olen = len(runs[i].text)
        if pos<olen:
            # we found the run, where the match starts!
            to_replace = replacement
            repl_len = len(to_replace)
            while i<cnt:
                run = runs[i]
                otext = run.text
                olen = len(otext)
                if pos+match_len < olen:
                    # our match ends before the end of the text of this run therefore
                    # we put the rest of our replacement string here and we are done!
                    run.text = otext[0:pos]+to_replace+otext[pos+match_len:]
                    return
                if pos+match_len == olen:
                    # our match ends together with the text of this run therefore
                    # we put the rest of our replacement string here and we are done!
                    run.text = otext[0:pos]+to_replace
                    return
                # we still haven't found all of our original match string
                # so we process what we have here and go on to the next run
                part_match_len = olen-pos
                ntext = otext[0:pos]
                if repl_len <= part_match_len:
                    # we now found at least as many characters for our match string
                    # as we have replacement characters for it. Thus we use up
                    # the rest of our replacement string here and will replace the
                    # remainder of the match with an empty string (which happens
                    # to happen in this exact same spot for the next run ;-))
                    ntext += to_replace
                    repl_len = 0
                    to_replace = ''
                else:
                    # we have got some more match characters but still more
                    # replacement characters than match characters found
                    ntext += to_replace[0:part_match_len]
                    to_replace = to_replace[part_match_len:]
                    repl_len -= part_match_len
                run.text = ntext             # save the new text to the run
                match_len -= part_match_len  # this is what is left to match
                pos = 0                      # in the next run, we start at pos 0 with our match
                i += 1                       # and off to the next run
        else:
            pos -= olen  # the relative position of our match in the next run's text
            i += 1       # and off to the next run

# create presentation with 1 slide ------
prs = Presentation('text-01.pptx')

# what is to be replaced
replacements = { 'How are you?': "I'm fine!" }

# loop through all slides and replace text in all their shapes
for slide in prs.slides:
    replace_text(replacements, slide.shapes)

# save changed presentation
prs.save('text-02.pptx')
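One caveat about the `while pos>=0` loop in `replace_text`: it searches again from the start of the paragraph after every replacement, so a replacement that itself contains the search term would be matched again and the loop would never end. Below is a simplified alternative sketch, not the code above, that continues searching after the inserted text. It also takes a different formatting trade-off: the whole replacement is written into the run where the match starts, and the matched characters are simply removed from later runs. `FakeRun` and `replace_across_runs` are hypothetical names; `FakeRun` only stands in for a python-pptx run, assuming nothing more than a readable and writable `.text` attribute.

```python
class FakeRun:
    """Stand-in for a python-pptx run; only the .text attribute is used."""

    def __init__(self, text):
        self.text = text


def replace_across_runs(runs, match, replacement):
    """Replace every occurrence of `match` in the joined run texts.

    The replacement goes into the run where the match starts; matched
    characters in following runs are deleted. Returns the number of
    replacements made.
    """
    if not match:
        return 0
    count = 0
    search_from = 0
    while True:
        full = "".join(run.text for run in runs)
        pos = full.find(match, search_from)
        if pos < 0:
            return count
        end = pos + len(match)

        offset = 0  # index in `full` at which the current run starts
        first = True
        for run in runs:
            text = run.text
            run_start, run_end = offset, offset + len(text)
            lo, hi = max(pos, run_start), min(end, run_end)
            if lo < hi:  # this run holds part of the match
                insert = replacement if first else ""
                first = False
                run.text = text[:lo - run_start] + insert + text[hi - run_start:]
            offset = run_end  # based on the run's length before the edit

        count += 1
        # Continue after the inserted text so it is never matched again.
        search_from = pos + len(replacement)


runs = [FakeRun("XX "), FakeRun("NOV"), FakeRun(" 20"), FakeRun("21, Time")]
replace_across_runs(runs, "NOV 2021", "NOV 2022")
print([run.text for run in runs])
# prints ['XX ', 'NOV 2022', '', ', Time']
```

With a real presentation the same function could presumably be called as `replace_across_runs(paragraph.runs, match, replacement)` for each paragraph. Because the replacement inherits the formatting of the first matched run, this is less faithful to mixed formatting than distributing the characters across runs as the code above does.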
