Python 3.x: How to parse CSV or FDF data into a Python dictionary and inject it into a template PDF form

Posted on March 10

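I have customer data in a customer relationship management (CRM) system, and I need to use that data to fill in PDF forms automatically.

I have worked out the CRM export to CSV, the post-export CSV manipulation, the parsing of the cleaned CSV into FDF format, and the creation and saving of a new PDF file that should be populated with the data from the FDF. The code should open a template form whose field names match those in the CSV and FDF files, inject the field values, and save the result as a new file.

The problem I seem to have is that I cannot get the FDF data into the template. When I open the newly saved PDF, the fields are blank. I have tried several different approaches, but I am now stuck, and I think I have lost the ability to stay objective.

I am using the pypdf 4.1.0 library, because I see that the developers have merged PyPDF2 back into it.

Below is the code I am currently working on. I have tried many different approaches, but this is the cleanest version that represents the current state: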

import csv
import os
from pypdf import PdfWriter, PdfReader

# Read the CSV file
with open('acrobat_import.csv', 'r') as file:
    reader = csv.DictReader(file)
    data = list(reader)

# Function to create the FDF file content
def create_fdf_content(row):
    fdf_content = """
    
%FDF-1.2
%âãÏÓ
1 0 obj
<</FDF<</F(template_form.pdf)/Fields[
    <</T(courseType)/V/{courseType}>>
    <</T(gender)/V/{gender}>>
    <</T(dateofbirth)/V({dateofbirth})>>
    <</T(preferredLanguage)/V/{preferredLanguage}>>
    <</T(LastName)/V({lastName})>>
    <</T(firstName)/V({firstName})>>
    <</T(middleName)/V({middleName})>>
    <</T(addressStreet)/V({addressStreet})>>
    <</T(addressCity)/V({addressCity})>>
    <</T(addressState)/V({addressState})>>
    <</T(addressCountry)/V({addressCountry})>>
    <</T(addressPostalCode)/V({addressPostalCode})>>
    <</T(phoneNumber)/V({phoneNumber})>>
    <</T(emailAddress)/V({emailAddress})>>
    <</T(studentSigDate)/V({studentSigDate})>>
    <</T(courseStartDate)/V({courseStartDate})>>
    <</T(courseEndDate)/V({courseEndDate})>>
    <</T(classHours)/V({classHours})>>
    <</T(numberOfStudents)/V({numberOfStudents})>>
    <</T(courseState)/V({courseState})>>
    <</T(courseCity)/V({courseCity})>>
    <</T(courseLanguage)/V/{courseLanguage}>>
    <</T(instructorNumber)/V({instructorNumber})>>
    <</T(instructorLastName)/V({instructorLastName})>>
    <</T(instructorFirstName)/V({instructorFirstName})>>
    <</T(instructorSigDate)/V({instructorSigDate})>>
    <</T(courseFee)/V({courseFee})>>
    <</T(examinerNumber)/V({examinerNumber})>>
    <</T(examinerLastName)/V({examinerLastName})>>
    <</T(examinerFirstName)/V({examinerFirstName})>>
    <</T(examinerSigDate)/V({examinerSigDate})>>]
    >> >>
endobj
trailer
<</Root 1 0 R>>
%%EOF
""".format(**row)
    return fdf_content    
    
# Loop through each row, create an FDF file
for row in data:
    if row['lastName'] and row['firstName']:
        # Create the FDF file
        fdf_filename = f"{row['lastName']}_{row['firstName']} - {row['courseType']}.fdf"
        with open(fdf_filename, 'w', encoding='utf-8') as file:
            file.write(create_fdf_content(row))
        print(f"Created FDF file: {fdf_filename}")

# Open the template PDF
template_pdf = PdfReader("template_form.pdf", "rb")

# Create a dictionary from the fdf file
def parse_fdf_file(fdf_filename):
    fields = []
    with open(fdf_filename, 'r', encoding='utf-8') as fdf_file:
        lines = fdf_file.readlines()
        for line in lines:
            if line.startswith('<</T(') and '/V(' in line: # Does not account for radio button fields which have /V/ not /V()
                field_name = line.split('<</T(')[1].split(')')[0]
                field_value = line.split('/V(')[1].split(')')[0]
                fields.append({'field_name': field_name, 'field_value': field_value})
    return fields

# Import the FDF data into the template PDF
pdf_writer = PdfWriter()
page = template_pdf.pages[0]
fields = template_pdf.get_fields()
pdf_writer.add_page(page)
for field in fields:
    pdf_writer.update_page_form_field_values(0, field['field_name'], field['field_value'])

# Save the resulting PDF with the same name as the FDF file
pdf_filename = f"{row['lastName']}_{row['firstName']} - {row['courseType']}.pdf"
with open(pdf_filename, "wb") as pdf_file:
    pdf_writer.write(pdf_file)
print(f"Created PDF file: {pdf_filename}")

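The validation I do have is that this code does create an FDF file in the correct format, and I can manually import that FDF into the template form. That tells me the FDF is created correctly, and therefore the data, field names, and so on are correct and valid.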
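If the FDF route is kept, one way to turn the file's field entries into a Python dictionary is a regular expression that accepts both value forms. The sketch below is a minimal illustration, not a full FDF parser: `FIELD_RE` and `parse_fdf_text` are hypothetical names, it assumes values contain no escaped parentheses, and it keeps a leading slash on name values so they can be told apart from text. It searches the whole text rather than testing how each line starts, so entries that are indented inside the template string still match.

```python
import re

# One FDF field entry: <</T(name)/V(text)>> or <</T(name)/V/Choice>>.
# Group 1 is the field name, group 2 a text value, group 3 a name value.
# Assumption: values never contain escaped parentheses such as "\)".
FIELD_RE = re.compile(r"<</T\(([^)]*)\)/V(?:\(([^)]*)\)|/([^\s>/]*))>>")


def parse_fdf_text(fdf_text):
    """Return {field_name: value} for every /T ... /V entry in an FDF string."""
    fields = {}
    for match in FIELD_RE.finditer(fdf_text):
        name, text_value, name_value = match.groups()
        if name_value is not None:
            # Name values (radio button states) keep their leading slash.
            fields[name] = "/" + name_value
        else:
            fields[name] = text_value
    return fields
```

Reading a saved file then comes down to `parse_fdf_text(open(fdf_filename, encoding="utf-8").read())`.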

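However, I suspect I may still be on the wrong track. I do not need an FDF file; I only need the data from the CSV file to go into the template form and be saved. The CSV has multiple rows, each holding one client's data. While reading the pypdf documentation it dawned on me that the FDF step is wasteful and likely to cause problems, and that an FDF form is not the same kind of entity as a Python dictionary.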

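So I am looking for input and help here. What is the most efficient, or at least the most effective, way to parse the CSV and inject it into the template form?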
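For reference, a direct CSV-to-form route could look roughly like the sketch below. It is a hedged outline, not a verified fix for this template: the helper names (`read_rows`, `row_to_values`, `output_name`, `fill_template`, `fill_all`) are hypothetical, it assumes the CSV headers match the PDF field names exactly, and it treats every value as text (radio button groups may need their values as PDF names such as `/Male`, which is not handled here). It relies on `PdfWriter.append()` and on passing a dictionary to `update_page_form_field_values()`, as shown in the pypdf forms documentation.

```python
import csv


def read_rows(csv_path):
    """Load every CSV row as a dict keyed by the header names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def row_to_values(row):
    """Keep only named columns that actually hold a value."""
    return {name: value for name, value in row.items() if name and value}


def output_name(row):
    """Build the output file name the same way the question's code does."""
    return f"{row['lastName']}_{row['firstName']} - {row['courseType']}.pdf"


def fill_template(template_path, row, out_path):
    """Write a copy of the template with the row's values in matching fields."""
    # Imported here so the CSV helpers above can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(template_path)
    writer = PdfWriter()
    # append() brings the pages over together with the form definition;
    # as far as I can tell, add_page() copies only the page itself.
    writer.append(reader)
    values = row_to_values(row)
    for page in writer.pages:
        # The second argument is a {field name: value} dictionary.
        writer.update_page_form_field_values(page, values)
    with open(out_path, "wb") as handle:
        writer.write(handle)


def fill_all(csv_path, template_path):
    """Create one filled PDF per CSV row that has the naming columns."""
    for row in read_rows(csv_path):
        if row.get("lastName") and row.get("firstName") and row.get("courseType"):
            fill_template(template_path, row, output_name(row))
```

With the file names from the question, the call would be `fill_all("acrobat_import.csv", "template_form.pdf")`. Building a fresh writer per row keeps each output file independent of the previous one.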

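I should add that the template form contains mostly text form fields, but also some radio button groups. As far as I can tell, the only difference in how FDF treats them is that text field values are preceded by /V( and radio button values by /V/. I have not yet modified lines 75:77, because I wanted to get feedback here first.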
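The /V( versus /V/ distinction corresponds to PDF text strings versus PDF names, and the same split can be applied to CSV values before they are handed to a form-filling call. The helper below is only a sketch: `RADIO_FIELDS` and `to_form_value` are hypothetical names, the set of radio fields is read off the /V/ entries in the FDF template above, and the assumption that the filling library wants button values as names with a leading slash should be checked against the actual template.

```python
# Fields written with /V/ (a PDF name) in the FDF template above.
RADIO_FIELDS = {"courseType", "gender", "preferredLanguage", "courseLanguage"}


def to_form_value(field_name, raw_value):
    """Return raw_value as a name ('/Male') for radio fields, unchanged otherwise."""
    if field_name in RADIO_FIELDS:
        # Button states are PDF names, which are written with a leading slash.
        return raw_value if raw_value.startswith("/") else "/" + raw_value
    return raw_value
```

Applied to a whole CSV row this becomes `{name: to_form_value(name, value) for name, value in row.items()}`. The name after the slash has to match one of the export values defined for the radio group in the template.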

I did read this post and many others, but I do not want to create a complete PDF from scratch; I need to use a template form whose fields must be filled in.

Edit/Update: I have switched to parsing the CSV directly, but I now get 'Error: key must be PdfObject':

import csv
from pypdf import PdfReader, PdfWriter

# Define CSV and PDF template file names
CSV_FILE = "acrobat_import.csv"
PDF_TEMPLATE = "pdf_import_template.pdf"

def process_csv_pdf(csv_file, pdf_template):
    try:
        # 1. Read the CSV data into a dictionary
        with open(csv_file, 'r', newline='') as file:
            reader = csv.DictReader(file)
            csv_data = list(reader)           # Read the data as dictionaries

            print("CSV Data:", csv_data)

    except FileNotFoundError as e:
        print(f"Error: CSV file '{csv_file}' not found.")
        return
    except Exception as e:
        print(f"Error reading CSV file: {e}")
        return

    # 2. Process each row in the CSV
    for row in csv_data:
        try:
            # 3. Prepare variables for file naming
            last_name = row.get('lastName', '')
            first_name = row.get('firstName', '')
            course_type = row.get('courseType', '')

            if not all([last_name, first_name, course_type]):
                print(f"Warning: Skipping row due to missing data: {row}")
                continue  # Skip rows with missing fields

            # 4. Create output file name
            output_filename = f"{last_name}_{first_name} - {course_type}.pdf"
            print(f"Processing: {output_filename}")

            # 5. Fill the PDF template
            reader = PdfReader(pdf_template)
            writer = PdfWriter()
            page = reader.pages[0]
            
            # Get form fields
            fields = reader.get_fields(0)
            if fields:
                field_map = {field.get("/T"): field for field in fields.values()} # Create a mapping

                for field_name in row:  # Iterate over your CSV keys directly
                    if field_name in field_map:
                        pdf_field = field_map[field_name]
                        # Debugging:
                        print(type(pdf_field))  # Print the type of the PdfObject
                        print(pdf_field)        # Print the representation of the field object

                        pdf_field["/V"] = row[field_name]  # Direct update attempt
                        writer.update_page_form_field_values(writer.add_page(page), pdf_field.get("/T"), row[field_name]) # Note: Using .get("/T") for the key



            # 6. Save the output PDF
            with open(output_filename, 'wb') as output_file:
                writer.write(output_file)

            print(f"PDF created: {output_filename}")

        except Exception as e:
            print(f"Error processing row: {row}, Error: {e}")

if __name__ == "__main__":
    process_csv_pdf(CSV_FILE, PDF_TEMPLATE)
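A likely source of the 'key must be PdfObject' message is the direct assignment `pdf_field["/V"] = row[field_name]`: pypdf dictionary objects accept only pypdf object types as keys and values, and a plain Python `str` is neither. The sketch below shows the typed form of that assignment; `set_field_value` is a hypothetical helper. Note that this only addresses the exception. The objects returned by `get_fields()` may be copies rather than the objects a writer saves, so the value may still not reach the output file, and `update_page_form_field_values()` expects a page plus a `{name: value}` dictionary rather than a name and a value as separate arguments.

```python
def set_field_value(field, value):
    """Store value under /V using pypdf object types for both key and value."""
    # Imported inside the function so the module loads even without pypdf.
    from pypdf.generic import NameObject, TextStringObject

    # NameObject wraps the key and TextStringObject wraps the text value.
    field[NameObject("/V")] = TextStringObject(value)
```

Calling `set_field_value(pdf_field, row[field_name])` in place of the direct assignment should avoid the error for text fields.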
