I frequently need to pull large amounts of data from a Microsoft SQL Server to work on with Polars in Rust. Per corporate security policy I am more or less forced to use ODBC for these connections, and that ODBC requirement rules out mature, full-featured libraries like ConnectorX. I can use arrow-odbc to run the query and read the results efficiently into Arrow RecordBatch objects, but I cannot convert those RecordBatch objects into a Polars DataFrame.
Because the actual data components of a RecordBatch and a Series have the same underlying representation, I figured I could create a DataFrame from a RecordBatch with zero copies.
However, at columns.push(Series::from_arrow(&schema.fields().get(i).unwrap().name(), *column)?); I get the error:
mismatched types
expected struct `std::boxed::Box<(dyn polars::export::polars_arrow::array::Array + 'static)>`
found struct `Arc<dyn arrow::array::Array>`
My impression is that the Arc<dyn Array> is an ArrayRef, and that the real problem is that I have an arrow-rs Arc<dyn arrow::array::Array> while Series::from_arrow() expects a Polars Box<dyn Array>? If so, how do I resolve this?
My full code is below for reference.
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use arrow::record_batch::RecordBatch;
use polars::prelude::*;
use anyhow::Result;

const CONNECTION_STRING: &str = "...";

pub fn test() -> Result<()> {
    let odbc_environment = Environment::new()?;
    let connection = odbc_environment.connect_with_connection_string(
        CONNECTION_STRING,
        ConnectionOptions::default(),
    )?;

    let cursor = connection.execute("SELECT * FROM Backcast_Power_Plant_Map", ())?.unwrap();
    let arrow_record_batches = OdbcReaderBuilder::new().build(cursor)?;

    fn record_batch_to_dataframe(batch: &RecordBatch) -> Result<DataFrame, PolarsError> {
        let schema = batch.schema();
        let mut columns = Vec::with_capacity(batch.num_columns());
        for (i, column) in batch.columns().iter().enumerate() {
            columns.push(Series::from_arrow(&schema.fields().get(i).unwrap().name(), *column)?);
        }
        Ok(DataFrame::from_iter(columns))
    }

    for batch in arrow_record_batches {
        dbg!(record_batch_to_dataframe(&batch?));
    }

    Ok(())
}