Can Python split data while moving it from one Excel file to another (Pandas or OpenPyxl)?

Posted on January 20

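Essentially, I need to translate one Excel document into another.

The two worksheets are laid out differently but contain mostly the same information; some of the data in Sheet 1 is simply formatted differently.

For example, "Name" in Sheet 1 corresponds to "First Name" and "Last Name" in Sheet 2.

Is it possible to have my script do this for me, looking for delimiters such as commas to split "Address" into "Street", "City", "State" and "Zip"? Or is it better to do the translation afterwards with Excel tools?

I have been able to read the rows across directly with Openpyxl, using the following code: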

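    # ws1 (source) and ws2 (destination) are worksheets already opened with openpyxl
    step = 2
    read_start_row = 4
    write_start_row = 3
    amount_of_rows = 30

    for i in range(0, amount_of_rows, step):
        # copy from ws1
        c = ws1.cell(row=read_start_row + i, column=4)
        # paste into ws2; use integer division, since row numbers must be ints
        ws2.cell(row=write_start_row + (i // step), column=4, value=c.value)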

But I am not sure where to start when it comes to changing the data at the same time.

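Recommended answer

Here is a quick example, assuming the data only needs to be copied across and split.

The example sheet has two columns, 'Name' and 'Address', where:

Name contains the "First" and "Last" name separated by a space
Address contains "Street", "City", "State" and "Zip" separated by commas

The example code reads the source Excel sheet, splits the two columns into their component parts, and writes the result to the destination sheet.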

    import pandas as pd

    # Read data from the source sheet
    df = pd.read_excel('source.xlsx', sheet_name='Sheet1')

    # Split the necessary columns on their delimiters
    df[['First', 'Last']] = df['Name'].str.split(' ', n=1, expand=True)  # Delimiter is a space
    df[['Street', 'City', 'State', 'Zip']] = df['Address'].str.split(', ', n=3, expand=True)  # Delimiter is a comma followed by a space

    # Drop the now unnecessary columns
    df = df.drop(['Name', 'Address'], axis=1)

    # Reorder columns; probably not needed in this case, but ensures the columns are in the correct order
    df = df[['First', 'Last', 'Street', 'City', 'State', 'Zip']]

    # Write to the destination sheet starting at Excel row 2 (startrow=1), without the index or headers
    with pd.ExcelWriter('dest.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
        df.to_excel(writer, sheet_name="Sheet1", startrow=1, index=False, header=False)

Resulting DataFrame

   First   Last          Street     City State         Zip
0  Mavis   West  421 E DRACHMAN   TUCSON    AZ  85705-7598
1   John  Spurs     100 MAIN ST  SEATTLE    WA       98104
2   Jack   East   105 KROME AVE    MIAMI    FL  33185 3700
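
The split above relies on every row containing all of its parts. If some rows are missing a part (a name with no space, an address with fewer commas), a slightly more tolerant variant can pad the missing pieces instead. The following is only a sketch under assumptions: the sample rows are made up to stand in for source.xlsx, and `split_fixed` is a hypothetical helper, not something from pandas.

```python
import pandas as pd

# Hypothetical sample rows standing in for the contents of source.xlsx
df = pd.DataFrame({
    "Name": ["Mavis West", "John Spurs", "Cher"],
    "Address": [
        "421 E DRACHMAN, TUCSON, AZ, 85705-7598",
        "100 MAIN ST,SEATTLE,WA,98104",   # no spaces after the commas
        "105 KROME AVE, MIAMI",           # state and zip missing
    ],
})

def split_fixed(series, pattern, names):
    """Split a string column into exactly len(names) columns.

    `pattern` is a regular expression; parts that are missing in a row
    (or in every row) are left empty rather than raising an error.
    """
    parts = series.str.strip().str.split(pattern, n=len(names) - 1, expand=True)
    # Guarantee the expected number of columns even if no row had that many parts
    parts = parts.reindex(columns=range(len(names)))
    parts.columns = names
    return parts

name_parts = split_fixed(df["Name"], r"\s+", ["First", "Last"])
addr_parts = split_fixed(df["Address"], r"\s*,\s*", ["Street", "City", "State", "Zip"])

result = pd.concat([name_parts, addr_parts], axis=1)
print(result)
```

Splitting on the pattern `\s*,\s*` rather than the literal ', ' also absorbs inconsistent spacing around the commas, so no separate strip step is needed.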

In this example the destination sheet already contains the headers in row 1 (note this is row 0 for to_excel), so the data is written starting at startrow=1. Because we are writing to an existing workbook that already contains data (the headers), we use mode='a' (append), which requires Openpyxl as the engine; if_sheet_exists='overlay' writes into the existing sheet instead of replacing it.
The to_excel call also excludes the DataFrame's headers (header=False). If preferred, the destination sheet could instead start out empty and the headers be written along with the column data.
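
Since the question started from Openpyxl, the same split can also be done without pandas while copying row by row. This is a minimal sketch, not the answer's method: the workbooks are built in memory with made-up rows, and the column positions (Name in column 1, Address in column 2) are assumptions. With real files you would open them with `openpyxl.load_workbook` instead.

```python
from openpyxl import Workbook

# Hypothetical source sheet: header row, then Name in column 1 and Address in column 2
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["Name", "Address"])
ws1.append(["Mavis West", "421 E DRACHMAN, TUCSON, AZ, 85705-7598"])
ws1.append(["John Spurs", "100 MAIN ST, SEATTLE, WA, 98104"])

# Hypothetical destination sheet that already holds its headers in row 1
wb2 = Workbook()
ws2 = wb2.active
ws2.append(["First", "Last", "Street", "City", "State", "Zip"])

# Skip the source header row and read only the two columns of interest
for name, address in ws1.iter_rows(min_row=2, max_col=2, values_only=True):
    if name is None and address is None:
        continue  # ignore completely empty rows
    # Split the name on the first space only
    first, _, last = (name or "").partition(" ")
    # Split the address into at most four comma-separated parts
    parts = [p.strip() for p in (address or "").split(",", 3)]
    parts += [None] * (4 - len(parts))  # pad if some parts are missing
    ws2.append([first, last, *parts])

# wb2.save("dest.xlsx")  # uncomment to write the result to disk
```

Using `append` writes each new row directly below the existing headers, so no row arithmetic is needed on the destination side.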
