我在客户关系管理系统中有客户数据,我需要使用这些数据来自动完成PDF表单.
我已经搞清楚了CRM提取到CSV,后提取CSV操作,正确的CSV解析为FDF格式,并创建和保存新的PDF文件,should填充来自FDF的数据.代码应该打开一个模板表单,其中的字段名称与CSV和FDF文件匹配,注入字段的值并将其另存为新文件.
我似乎遇到的问题是,我无法将FDF数据放入模板中.当我打开新保存的PDF时,FIED是空白的.我try 过几种不同的方法,但现在我已经陷入困境,我想我已经失go 了保持客观的能力.
我使用的是pypdf 4.1.0库,因为我看到开发人员已经将PyPDF2重新放入其中.
以下是我目前正在编写的代码.我try 了许多不同的方法,但这是代表当前状态的最干净的版本:
import csv
import os
from pypdf import PdfWriter, PdfReader
# Read the CSV file
with open('acrobat_import.csv', 'r') as file:
reader = csv.DictReader(file)
data = list(reader)
# Function to create the FDF file content
def create_fdf_content(row):
fdf_content = """
%FDF-1.2
%����
1 0 obj
<</FDF<</F(template_form.pdf)/Fields[
<</T(courseType)/V/{courseType}>>
<</T(gender)/V/{gender}>>
<</T(dateofbirth)/V({dateofbirth})>>
<</T(preferredLanguage)/V/{preferredLanguage}>>
<</T(LastName)/V({lastName})>>
<</T(firstName)/V({firstName})>>
<</T(middleName)/V({middleName})>>
<</T(addressStreet)/V({addressStreet})>>
<</T(addressCity)/V({addressCity})>>
<</T(addressState)/V({addressState})>>
<</T(addressCountry)/V({addressCountry})>>
<</T(addressPostalCode)/V({addressPostalCode})>>
<</T(phoneNumber)/V({phoneNumber})>>
<</T(emailAddress)/V({emailAddress})>>
<</T(studentSigDate)/V({studentSigDate})>>
<</T(courseStartDate)/V({courseStartDate})>>
<</T(courseEndDate)/V({courseEndDate})>>
<</T(classHours)/V({classHours})>>
<</T(numberOfStudents)/V({numberOfStudents})>>
<</T(courseState)/V({courseState})>>
<</T(courseCity)/V({courseCity})>>
<</T(courseLanguage)/V/{courseLanguage}>>
<</T(instructorNumber)/V({instructorNumber})>>
<</T(instructorLastName)/V({instructorLastName})>>
<</T(instructorFirstName)/V({instructorFirstName})>>
<</T(instructorSigDate)/V({instructorSigDate})>>
<</T(courseFee)/V({courseFee})>>
<</T(examinerNumber)/V({examinerNumber})>>
<</T(examinerLastName)/V({examinerLastName})>>
<</T(examinerFirstName)/V({examinerFirstName})>>
<</T(examinerSigDate)/V({examinerSigDate})>>]
>> >>
endobj
trailer
<</Root 1 0 R>>
%%EOF
""".format(**row)
return fdf_content
# Loop through each row, create an FDF file
for row in data:
if row['lastName'] and row['firstName']:
# Create the FDF file
fdf_filename = f"{row['lastName']}_{row['firstName']} - {row['courseType']}.fdf"
with open(fdf_filename, 'w', encoding='utf-8') as file:
file.write(create_fdf_content(row))
print(f"Created FDF file: {fdf_filename}")
# Open the template PDF
template_pdf = PdfReader("template_form.pdf", "rb")
# Create a dictionary from the fdf file
def parse_fdf_file(fdf_filename):
fields = []
with open(fdf_filename, 'r', encoding='utf-8') as fdf_file:
lines = fdf_file.readlines()
for line in lines:
if line.startswith('<</T(') and '/V(' in line: # Does not account for radio button fields which have /V/ not /V()
field_name = line.split('<</T(')[1].split(')')[0]
field_value = line.split('/V(')[1].split(')')[0]
fields.append({'field_name': field_name, 'field_value': field_value})
return fields
# Import the FDF data into the template PDF
pdf_writer = PdfWriter()
page = template_pdf.pages[0]
fields = template_pdf.get_fields()
pdf_writer.add_page(page)
for field in fields:
pdf_writer.update_page_form_field_values(0, field['field_name'], field['field_value'])
# Save the resulting PDF with the same name as the FDF file
pdf_filename = f"{row['lastName']}_{row['firstName']} - {row['courseType']}.pdf"
with open(pdf_filename, "wb") as pdf_file:
pdf_writer.write(pdf_file)
print(f"Created PDF file: {pdf_filename}")
我在这里得到的验证是,这段代码确实以正确的格式创建了一个FDF文件,我可以手动将该FDF导入到模板表单中.这告诉我,FDF是正确创建的,因此,数据、字段名称等都是正确的和有效的.
不过,我在想,我在这件事上可能仍然走在一条无效的轨道上.我没有need和FDF文件,我只需要CSV文件中的数据进入模板表单并保存.CSV有多行,每行有1个客户端数据.在阅读pypdf文档时,我突然意识到FDF步骤是浪费的,而且很可能会出现问题,并且FDF表单不是与Python词典相同的实体.
所以,我在这里寻求意见和帮助.解析CSV并将其注入模板表单的最有效或至少最有效的方法是什么?
我应该补充的是,模板表单包含大部分文本表单域,但也有一些单选按钮组.据我所知,FDF只是在文本字段的值前面加了/V(单选按钮值前面加了/V/)的情况下才对它们进行了不同的处理.我还没有修改第75:77行,因为我想先从这里得到反馈.
我确实阅读了this post个和很多其他的,但我不希望创建一个完整的PDF,我需要使用一个模板表单的字段必须填写.
编辑/更新: 已切换到直接解析CSV,但收到‘Error:Key Must is PdfObject’:
import csv
from pypdf import PdfReader, PdfWriter
# Define CSV and PDF template file names
CSV_FILE = "acrobat_import.csv"
PDF_TEMPLATE = "pdf_import_template.pdf"
def process_csv_pdf(csv_file, pdf_template):
try:
# 1. Read the CSV data into a dictionary
with open(csv_file, 'r', newline='') as file:
reader = csv.DictReader(file)
csv_data = list(reader) # Read the data as dictionaries
print("CSV Data:", csv_data)
except FileNotFoundError as e:
print(f"Error: CSV file '{csv_file}' not found.")
return
except Exception as e:
print(f"Error reading CSV file: {e}")
return
# 2. Process each row in the CSV
for row in csv_data:
try:
# 3. Prepare variables for file naming
last_name = row.get('lastName', '')
first_name = row.get('firstName', '')
course_type = row.get('courseType', '')
if not all([last_name, first_name, course_type]):
print(f"Warning: Skipping row due to missing data: {row}")
continue # Skip rows with missing fields
# 4. Create output file name
output_filename = f"{last_name}_{first_name} - {course_type}.pdf"
print(f"Processing: {output_filename}")
# 5. Fill the PDF template
reader = PdfReader(pdf_template)
writer = PdfWriter()
page = reader.pages[0]
# Get form fields
fields = reader.get_fields(0)
if fields:
field_map = {field.get("/T"): field for field in fields.values()} # Create a mapping
for field_name in row: # Iterate over your CSV keys directly
if field_name in field_map:
pdf_field = field_map[field_name]
# Debugging:
print(type(pdf_field)) # Print the type of the PdfObject
print(pdf_field) # Print the representation of the field object
pdf_field["/V"] = row[field_name] # Direct update attempt
writer.update_page_form_field_values(writer.add_page(page), pdf_field.get("/T"), row[field_name]) # Note: Using .get("/T") for the key
# 6. Save the output PDF
with open(output_filename, 'wb') as output_file:
writer.write(output_file)
print(f"PDF created: {output_filename}")
except Exception as e:
print(f"Error processing row: {row}, Error: {e}")
if __name__ == "__main__":
process_csv_pdf(CSV_FILE, PDF_TEMPLATE)