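I have a function that runs in a loop and performs calculations on a list of arrays.
At some point during the first iteration, a Polars LazyFrame is initialized.
In each following iteration, a new DataFrame is created with the same schema, the two frames are concatenated row-wise with vstack, and the result is converted back to a LazyFrame.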
import numpy as np
import polars as pl

def my_func():
    array_list = [np.zeros((1,19))]*2 #this is just for example and not representative of shape of real array.
    for i, _ in enumerate(array_list):
        #calculations are done here
        result = np.zeros((1,19)) #result of calculations (correct shape of real result)
        if i < 1:
            result_df = pl.DataFrame(data = result,
                                     schema = {
                                         'MDL',
                                         'MVL',
                                         'MWVL',
                                         'RR',
                                         'DET',
                                         'ADL',
                                         'LDL',
                                         'DIV',
                                         'EDL',
                                         'LAM',
                                         'TT',
                                         'LVL',
                                         'EVL',
                                         'AWVL',
                                         'LWVL',
                                         'LWVLI',
                                         'EWVL',
                                         'Ratio_DRR',
                                         'Ratio_LD'},
                                     orient='row').lazy()
        else:
            new_df = pl.DataFrame(data=result,
                                  schema = {
                                      'MDL',
                                      'MVL',
                                      'MWVL',
                                      'RR',
                                      'DET',
                                      'ADL',
                                      'LDL',
                                      'DIV',
                                      'EDL',
                                      'LAM',
                                      'TT',
                                      'LVL',
                                      'EVL',
                                      'AWVL',
                                      'LWVL',
                                      'LWVLI',
                                      'EWVL',
                                      'Ratio_DRR',
                                      'Ratio_LD'
                                  },
                                  orient='row')
            #append new dataframe to results
            result_df = result_df.collect().vstack(new_df, in_place=True).lazy()
    return result_df
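When the frame is returned from the function, the column names are no longer in the order I specified, although the data is still in order.
For example: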
result.schema
OrderedDict([('LAM', Float64),
             ('LDL', Float64),
             ('ADL', Float64),
             ('DIV', Float64),
             ('MDL', Float64),
             ('MWVL', Float64),
             ('LWVL', Float64),
             ('MVL', Float64),
             ('TT', Float64),
             ('DET', Float64),
             ('RR', Float64),
             ('EDL', Float64),
             ('Ratio_LD', Float64),
             ('Ratio_DRR', Float64),
             ('LVL', Float64),
             ('LWVLI', Float64),
             ('EWVL', Float64),
             ('EVL', Float64),
             ('AWVL', Float64)])
I imagine this is due to my naivety about how LazyFrames work, but is there a way to enforce the column order without renaming the columns?
Thanks.