Python to_parquet gives error: __arrow_array__() got an unexpected keyword argument 'type'

Posted on August 11

I'm reading a ROOT file using uproot and converting parts of it into a DataFrame using the arrays method.
This works fine until I try to save to Parquet using the DataFrame's to_parquet method. Sample code is given below.

import numpy as np
import pandas as pd
import uproot

# dictFile, file_list_loc, file_path and save_path are defined elsewhere.
# The next three lines rename the columns and choose what data to keep
data = pd.read_csv(dictFile, header=None, delim_whitespace=True)
dataFile, dataKey = data[0], data[1]
content_ele = {dataKey[i]: dataFile[i] for i in np.arange(len(dataKey))}

# We run over the different files to save a simplified version of them.
file_list = pd.read_csv(file_list_loc, names=["Loc"])

for file_loc in file_list.Loc:

    tree = uproot.open(f"{file_path}/{file_loc}:CollectionTree")

    arrays = tree.arrays(dataKey, library="pd").rename(columns=content_ele)

    save_loc = f"{save_path}/{file_loc[:-6]}reduced.parquet"
    arrays.to_parquet(path=save_loc)

Doing so results in the following error: __arrow_array__() got an unexpected keyword argument 'type'
It seems to originate from pa.array, if that helps.

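It is worth noting that the simplest DataFrame I have hit this error with has 2 columns, where each row holds awkward arrays (awkward.highlevel.Array) whose length differs from row to row but is the same across both columns. An example is given below.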

           A                      B
0   [31, 26, 17, 23]    [-2.1, 1.3, 0.5, -0.4]
1   [75, 15, 49]        [2.4, -1.8, 0.8] 
2   [58, 45, 64, 47]    [-1.9, -0.4, -2.5, 1.3]
3   [26]                [-1.1]

I've tried reducing what I run on, such as keeping only integer elements, and reducing the number of columns as above.
However, running the exact same method with to_json gives no errors. The problem with that route is that when I read the file back, what were previously awkward arrays are now plain lists, which makes them much less practical to work with whenever I want to calculate something like array.A/2. Yes, I could convert them back, but it seems wiser to keep the original format, and it is easier since I wouldn't have to convert each time.

>>> import awkward as ak
>>> import pandas as pd
>>> import awkward_pandas
>>> ragged_array = ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]])
>>> ak_ext_array = awkward_pandas.AwkwardExtensionArray(ragged_array)
>>> df = pd.DataFrame({"column": ak_ext_array})
>>> df
         column
0     [0, 1, 2]
1            []
2        [3, 4]
3           [5]
4  [6, 7, 8, 9]
>>> df.to_parquet("/tmp/file.parquet")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pandas/core/frame.py", line 2889, in to_parquet
    return to_parquet(
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pandas/io/parquet.py", line 411, in to_parquet
    impl.write(
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pandas/io/parquet.py", line 159, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3480, in pyarrow.lib.Table.from_pandas
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 609, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 609, in <listcomp>
    arrays = [convert_column(c, f)
  File "/home/jpivarski/mambaforge/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 590, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 263, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
TypeError: __arrow_array__() got an unexpected keyword argument 'type'
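The last frames of the traceback suggest where the mismatch lies: pa.array hands the column to the object's __arrow_array__ hook and passes type as a keyword argument, while the hook on the extension array evidently does not accept one. Below is a minimal pure-Python sketch of that mismatch, with no pyarrow involved. NoTypeArgument, AcceptsType, and convert_like_pyarrow are hypothetical stand-ins for illustration, not names from pyarrow or awkward_pandas.

```python
class NoTypeArgument:
    """Hypothetical stand-in for an extension array whose hook takes no arguments."""

    def __arrow_array__(self):
        return [[0, 1, 2], [], [3, 4]]


class AcceptsType:
    """Same hook, but with the optional `type` keyword in its signature."""

    def __arrow_array__(self, type=None):
        return [[0, 1, 2], [], [3, 4]]


def convert_like_pyarrow(obj, type=None):
    """Hypothetical stand-in for the dispatch step: always passes `type` by keyword."""
    return obj.__arrow_array__(type=type)


def try_convert(obj):
    """Return "ok" on success, or the TypeError message on failure."""
    try:
        convert_like_pyarrow(obj)
        return "ok"
    except TypeError as exc:
        return str(exc)


broken_message = try_convert(NoTypeArgument())
fixed_message = try_convert(AcceptsType())
print(broken_message)  # the message ends with: got an unexpected keyword argument 'type'
print(fixed_message)
```

Under this reading, the failure does not depend on the data at all: any column backed by a hook without the type parameter would raise the same TypeError, which matches the error appearing even for the smallest DataFrame.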

>>> ak_ext_array.__arrow_array__()
<pyarrow.lib.ChunkedArray object at 0x7ff422d0b9f0>
[
  [
    [
      0,
      1,
      2
    ],
    [],
    ...
    [
      5
    ],
    [
      6,
      7,
      8,
      9
    ]
  ]
]

