How to replace a word in a PDF using Python

Question

33 views · May 21, 2023

Anonymous · June 23, 2022


This question already has answers here:

How to replace text in a PDF using Python?

Search and replace text in a PDF in Python [duplicate]

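I want to replace a word in a PDF file, but whenever I try, I get back the same, unchanged PDF. Here is my code. I am currently using PyPDF2, but I am open to switching libraries if anyone has a suggestion. What is my code missing?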

    with open(file_path, 'rb') as file:
        pdf_reader = PdfFileReader(file)
        # Encrypt the word in the PDF content
        encrypted_word = self.cipher.encrypt(word_to_encrypt_bytes)
        encrypted_word_b64 = base64.b64encode(encrypted_word)
        # Write the encrypted PDF content to a new PDF file
        pdf_writer = PdfFileWriter()
        for i in range(pdf_reader.getNumPages()):
            page = pdf_reader.getPage(i)
            page_content = page.extractText()
            page_content_b = page_content.encode('utf-8')
            page_content_b = page_content_b.replace(word_to_encrypt.encode(), encrypted_word_b64)
            page_content = page_content_b.decode('utf-8')
            pdf_writer.addPage(page)
        output_path = os.path.join(file_dir, file_name_without_ext + '_encryptedm' + ext)
        with open(output_path, 'wb') as output_file:
            pdf_writer.write(output_file)

I want to place a word in my PDF.
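Why the output is unchanged: `extractText()` returns an ordinary Python string built from the page, and `encode`, `replace`, and `decode` each return new objects. Rebinding `page_content` therefore never touches `page`, and `addPage(page)` adds the original, unmodified page. The stdlib-only sketch below reproduces the same pattern with a hypothetical `FakePage` class standing in for a PyPDF2 page (it is not a PyPDF2 API):

```python
import base64


class FakePage:
    """Hypothetical stand-in for a PDF page (not a PyPDF2 class)."""

    def __init__(self, content: str) -> None:
        self._content = content  # plays the role of the text drawn on the page

    def extract_text(self) -> str:
        # Like PyPDF2's extractText(), this hands back a plain str, not a live view.
        return self._content


page = FakePage("my secret word")
token = base64.b64encode(b"secret")  # bytes, as in the question's code

page_content = page.extract_text()
page_content_b = page_content.encode("utf-8")
page_content_b = page_content_b.replace(b"secret", token)  # creates a new bytes object
page_content = page_content_b.decode("utf-8")             # rebinds a local name only

print(page_content)         # my c2VjcmV0 word
print(page.extract_text())  # my secret word  (the "page" itself is unchanged)
```

Changing what is actually drawn requires modifying the page's content stream, or redacting the word and writing new text over it, rather than editing the extracted string.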

admin changed the status to published on May 21, 2023


1 Answer

Anonymous · Answer 1 · 2022-06-23T20:57:58+00:00

Please forgive me for offering a solution that uses PyMuPDF. Take this page text as an example: [image: sample page text]

Suppose we want to correct the misspelling. We can use the following code:

In [1]: import fitz  # PyMuPDF
...
In [9]: doc=fitz.open("test.pdf")
In [10]: page=doc[0]
In [11]: words=page.get_text("words")  # extract the text as a list of single words
...
In [13]: for word in words:
    ...:     if word[4] == "Currentyle":
    ...:         page.add_redact_annot(word[:4],text="Currently")
    ...:
In [14]: page.apply_redactions()
Out[14]: True
In [15]: doc.ez_save("text-replaced.pdf")

This gives us the following result: [image: the page with the word replaced]
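The session above handles a single page. Below is a minimal sketch of the same approach wrapped in a reusable function that walks every page. It assumes `doc` is an open PyMuPDF `Document` and relies only on the calls the answer already uses (`Page.get_text("words")`, `Page.add_redact_annot`, `Page.apply_redactions`) plus iterating over the document's pages. `replace_word` is a hypothetical helper name, not part of PyMuPDF, and the function does not import `fitz` itself.

```python
def replace_word(doc, old: str, new: str) -> int:
    """Replace every exact, whole-token occurrence of `old` with `new`.

    `doc` is assumed to be an open PyMuPDF Document (or any iterable of pages).
    PyMuPDF splits words on whitespace, so a token like "word," (with attached
    punctuation) does not equal "word" and is skipped.
    Returns the number of words replaced.
    """
    total = 0
    for page in doc:
        hits = 0
        # Each entry is (x0, y0, x1, y1, word, block_no, line_no, word_no).
        for x0, y0, x1, y1, word, *_ in page.get_text("words"):
            if word == old:
                # Mark the word's bounding box for removal and ask PyMuPDF to
                # write `new` into that box when the redaction is applied.
                page.add_redact_annot((x0, y0, x1, y1), text=new)
                hits += 1
        if hits:
            # Removes the covered text and draws the replacement text.
            page.apply_redactions()
        total += hits
    return total
```

For the answer's example, this would be `doc = fitz.open("test.pdf")`, then `replace_word(doc, "Currentyle", "Currently")`, then `doc.ez_save("text-replaced.pdf")`. Matching whole tokens from `get_text("words")` avoids rewriting a short word that happens to appear inside a longer one, at the cost of missing words with punctuation attached. The replacement is drawn with the redaction's default font settings, so it may be cut off or not shown if it is wider than the original word's box. A smaller `fontsize` argument to `add_redact_annot` can help in that case.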