Merging PDF files with Python

Posted on August 10

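Is it possible to merge separate PDF files using Python?

Assuming it is, I need to take this a step further: I would like to loop through the folders in a directory and repeat the process for each one.

I may be pushing my luck, but is it also possible to exclude a page that appears in each PDF? (My report generator always creates an extra blank page.)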

Recommended answer

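Use pyPdf or its successor, PyPDF2:

A pure-Python library built as a PDF toolkit. It is capable of:

splitting documents page by page,

merging documents page by page,

(and much more)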
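Because the library copies documents one page at a time, the extra blank page from the question can be left out simply by choosing which page indices to copy. The following is a minimal sketch, not part of the library: `pages_to_keep` and `copy_pages` are hypothetical helper names, and it assumes the blank page is always the last page of each report (the question does not say where it appears). It only relies on the `getNumPages`, `getPage` and `addPage` methods of the legacy `PdfFileReader` / `PdfFileWriter` classes.

```python
def pages_to_keep(num_pages, skip_last=1):
    """Return the page indices to copy, dropping the last `skip_last` pages.

    Assumption: the unwanted blank page is at the end of each document.
    """
    return list(range(max(num_pages - skip_last, 0)))


def copy_pages(reader, writer, skip_last=1):
    """Copy all but the trailing pages from a reader into a writer.

    `reader` is expected to behave like PdfFileReader and `writer` like
    PdfFileWriter (legacy pyPdf / PyPDF2 method names).
    """
    for n in pages_to_keep(reader.getNumPages(), skip_last):
        writer.addPage(reader.getPage(n))
```

Note that with `skip_last=1` a single-page document contributes no pages at all, so if some reports have no blank page this needs a different rule.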

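Here is a sample program that works with both versions. It concatenates the input files named on the command line and writes the result to standard output.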

#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()
        output_stream.close()

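if __name__ == '__main__':
    if sys.platform == "win32":
        # Put stdout into binary mode so Windows does not translate newlines.
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    # On Python 3, sys.stdout is a text stream, so write the PDF bytes to its
    # underlying binary buffer (Python 2 has no .buffer attribute).
    pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))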
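
For the "loop through the folders in a directory" part of the question, one possible approach is to collect the PDFs folder by folder with `os.walk` and call a merge function once per folder. This is only a sketch under a few assumptions: `pdfs_by_folder` and `merge_each_folder` are hypothetical names, the merged file is written into each folder as `merged.pdf` (an arbitrary choice), and files are merged in alphabetical order. The `merge` argument is any callable with the same `(input_files, output_stream)` signature as `pdf_cat` above.

```python
import os


def pdfs_by_folder(root, skip_names=('merged.pdf',)):
    """Map each folder under `root` to the sorted list of PDF paths in it.

    Files whose names are in `skip_names` are ignored, so that a previously
    written output file is not merged into itself on a second run.
    """
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        pdfs = sorted(
            os.path.join(dirpath, name)
            for name in filenames
            if name.lower().endswith('.pdf') and name not in skip_names
        )
        if pdfs:
            groups[dirpath] = pdfs
    return groups


def merge_each_folder(root, merge, output_name='merged.pdf'):
    """Call merge(input_files, output_stream) once for every folder with PDFs."""
    groups = pdfs_by_folder(root, skip_names=(output_name,))
    for folder, pdfs in sorted(groups.items()):
        # The with block closes the file even if `merge` does not; closing
        # an already closed file (as pdf_cat does) is harmless.
        with open(os.path.join(folder, output_name), 'wb') as out:
            merge(pdfs, out)
```

With the program above, usage would look like `merge_each_folder('reports', pdf_cat)`, where `'reports'` is a placeholder for the top-level directory.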