How to automatically search for text in PowerPoint files with Python?

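We sometimes build up a large number of PowerPoint decks, with the key points scattered across many files. When we want to reuse that material later, finding it again often takes a lot of time. Here is a sample program found on Stack Overflow that uses the python-pptx package: given a keyword, it searches every .pptx file in a folder, recursing into grouped shapes, and prints the names of the files that contain the keyword, which saves us the time of hunting through files by hand.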

Install the package

pip install python-pptx

Main program

# REF https://stackoverflow.com/questions/55497789/find-a-word-in-multiple-powerpoint-files-python/55763992#55763992
import os

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

path = "./"
keyword = "what_ever_you_want_to_find"  # search term, matched case-insensitively
files = [x for x in os.listdir(path) if x.endswith(".pptx")]


def CheckRecursivelyForText(shapes, keyword):
    # Return True if any shape, including shapes nested in groups, contains the keyword.
    for shape in shapes:
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            if CheckRecursivelyForText(shape.shapes, keyword):
                return True
        elif hasattr(shape, "text") and keyword.lower() in shape.text.lower():
            return True
    return False


found_any = False
for eachfile in files:
    prs = Presentation(os.path.join(path, eachfile))
    # any() stops at the first slide that contains the keyword
    if any(CheckRecursivelyForText(slide.shapes, keyword) for slide in prs.slides):
        print(eachfile)
        print("----------------------")
        found_any = True

# Report a miss only once, after every file has been checked
if not found_any:
    print("No text found in these PPTs")
    print("----------------------")
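
The program above only looks at .pptx files sitting directly in `path`, and only at shapes that expose a `text` attribute. As a rough sketch (not part of the original Stack Overflow answer), the variant below also descends into subfolders, reads table cells, and reports the slide number of each hit. The helper names `find_pptx_files`, `iter_shape_text`, and `search_presentations` are made up for illustration, and skipping files that start with `~$` assumes those are PowerPoint's temporary lock files.

```python
from pathlib import Path


def find_pptx_files(root):
    """Return every .pptx file under root, including subfolders, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*.pptx")
        # Assumption: names starting with "~$" are temporary lock files, so skip them
        if p.is_file() and not p.name.startswith("~$")
    )


def iter_shape_text(shapes):
    """Yield the text of each shape, descending into groups and table cells."""
    for shape in shapes:
        if hasattr(shape, "shapes"):  # group shape: recurse into its members
            yield from iter_shape_text(shape.shapes)
        elif getattr(shape, "has_table", False):  # table inside a graphic frame
            for row in shape.table.rows:
                for cell in row.cells:
                    yield cell.text
        elif getattr(shape, "has_text_frame", False):  # ordinary text-bearing shape
            yield shape.text_frame.text


def search_presentations(root, keyword):
    """Return (path, slide_number) pairs for slides whose text contains keyword."""
    from pptx import Presentation  # imported here; requires python-pptx

    needle = keyword.lower()
    hits = []
    for path in find_pptx_files(root):
        prs = Presentation(str(path))
        for number, slide in enumerate(prs.slides, start=1):
            if any(needle in text.lower() for text in iter_shape_text(slide.shapes)):
                hits.append((path, number))
    return hits
```

Calling `search_presentations("./", "what_ever_you_want_to_find")` would return a list of (file path, slide number) pairs that can be printed one per line. Returning the hits instead of printing inside the recursion keeps the "nothing found" message a decision for the caller.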
