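I want to load an entire database table into a Pandas DataFrame using the SQLAlchemy ORM. I have already managed to query the number of rows in the table, as follows: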

from local_modules import RemoteConnector
from sqlalchemy import Integer, Column
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.automap import automap_base
import pandas as pd

Base = automap_base()


class Calculations(Base):
    __tablename__ = "calculations"
    id = Column("ID", Integer, primary_key=True)


Base.prepare()

connection = RemoteConnector('server', 'calculations_database')
connection.connect()

Session = sessionmaker(bind=connection.engine)
session = Session()

result = session.query(Calculations).count()
print('Record count:', result)

Output:

Record count: 13915

Process finished with exit code 0

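If possible, and it seems to be, I would like to use automap_base from sqlalchemy.ext.automap to define the table without declaring every column by hand. I declared 'id' only because I got an error asking me to set a primary key (is there a better way to do this?).

To get any results at all, I can do the following: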

results = session.query(Calculations).all()

Output:

[<__main__.Calculations object at 0x000001AF2324F510>, <__main__.Calculations object at 0x000001AF2324F6D0>, <__main__.Calculations object at 0x000001AF2324F810>, <__main__.Calculations object at 0x000001AF2324F910>, <__main__.Calculations object at 0x000001AF2324FA50>, <__main__.Calculations object at 0x000001AF2324FB90>, <__main__.Calculations object at 0x000001AF2324FCD0>, <__main__.Calculations object at 0x000001AF2324FE10>, <__main__.Calculations object at 0x000001AF2324FF50>, <__main__.Calculations object at 0x000001AF22CD40D0>, <__main__.Calculations object at 0x000001AF22CD4210>, <__main__.Calculations object at 0x000001AF22CD4350>, <__main__.Calculations object at 0x000001AF22CD4490>, <__main__.Calculations object at 0x000001AF22CD45D0>, <__main__.Calculations object at 0x000001AF22CD4710>, <__main__.Calculations object at 0x000001AF22CD4850>, <__main__.Calculations object at 0x000001AF22CD4990>, <__main__.Calculations object at 0x000001AF22CD4AD0>, <__main__.Calculations object at 0x000001AF22CD4C10>, <__main__.Calculations object at 0x000001AF22CD4D50>, <__main__.Calculations object at 0x000001AF22CD4E90>, <__main__.Calculations object at 0x000001AF22CD4FD0>, <__main__.Calculations object at 0x000001AF22CD5110>, <__main__.Calculations object at 0x000001AF22CD5250>, <__main__.Calculations object at 0x000001AF22CD53D0>, <__main__.Calculations object at 0x000001AF22CD5510>, <__main__.Calculations object at 0x000001AF22CD5650>, <__main__.Calculations object at 0x000001AF22CD5790>, <__main__.Calculations object at 0x000001AF22CD58D0>, <__main__.Calculations object at 0x000001AF22CD5A10>, <__main__.Calculations object at 0x000001AF22CD5B50>, <__main__.Calculations object at 0x000001AF22CD5C90>, <__main__.Calculations object at 0x000001AF22CD5DD0>, <__main__.Calculations object at 0x000001AF22CD5F10>, <__main__.Calculations object at 0x000001AF22CD6050>, <__main__.Calculations object at 0x000001AF22CD6190>, <__main__.Calculations object at 0x000001AF22CD62D0>, 
<__main__.Calculations object at 0x000001AF22CD6410>, <__main__.Calculations object at 0x000001AF22CD6550>, <__main__.Calculations object at 0x000001AF22CD6690>, <__main__.Calculations object at 0x000001AF22CD67D0>, <__main__.Calculations object at 0x000001AF22CD6910>, <__main__.Calculations object at 0x000001AF22CD6A50>, <__main__.Calculations object at 0x000001AF22CD6B90>, <__main__.Calculations object at 0x000001AF22CD6CD0>, <__main__.Calculations object at 0x000001AF22CD6E10>, <__main__.Calculations object at 0x000001AF22CD6F50>, <__main__.Calculations object at 0x000001AF22CD7090>]

This shows every row in the table as an object. My best attempt at extracting the values is:

for result in results:
    print(result.__dict__)

Output:

{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A91730>, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A90E90>, 'id': 2.0} ... and so on

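Not only do I not get the values, it also does not print the columns, only the ID that I defined in the class. I thought the columns would be carried over automatically when I used automap_base. They do appear when I define them myself, like this: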

class Calculations(Base):
    __tablename__ = "Calculations"
    id = Column("Trade ID", Integer, primary_key=True)
    Amount = Column("Amount", Integer)
    Yield = Column("Yield", Integer)

Output:

{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2092090>, 'Amount': 34303.0, 'Yield': 0.01141, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2091010>, 'Amount': 10000.0, 'Yield': 0.01214, 'id': 2.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2090FB0>, 'Amount': 43515.0, 'Yield': 0.01206, 'id': 3.0}
... and so on

What I ultimately want to do is what is suggested in "SQLAlchemy ORM conversion to pandas DataFrame":

df = pd.read_sql_query(sql=session.query(Calculation).all(), con=connection.engine)

But I get the following error:

 raise exc.ObjectNotExecutableError(statement) from err
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: [<__main__.CALC_TFSB_INVESTMENTS object at 0x000001FF42966E50>, ... and so on

I also tried:

df = pd.read_sql_query(sql=select(Calculations), con=connection.engine)
print(df.head())

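How do I load the DataFrame? How should I use automap_base to automate schema detection? How can I improve my code, and is there anything else I could add, perhaps dunder fields, to make things better?

Recommended answer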

If you want to load all records from one or more tables, you can use read_sql_table instead of read_sql_query.

If you want to load every table from the database:

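import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine('sqlite:///data.db', echo=False)
Base = automap_base()
# Reflect the schema from the database (keyword form, SQLAlchemy 1.4+).
Base.prepare(autoload_with=engine)

# Map each table name to a DataFrame holding all of its rows.
data = {}
with engine.connect() as con:
    for tbl in Base.classes:
        # By default automap names each class after its table.
        name = tbl.__name__
        data[name] = pd.read_sql_table(name, con)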
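
One caveat with the loop above: automap only generates classes for tables that have a primary key, so a table without one would not appear in Base.classes. A minimal sketch of an alternative, assuming you only need the table names and not the mapped classes, is to ask the inspector for them. The in-memory SQLite database, the audit_log table, and the sample rows are made up for illustration and are not from the question.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database, used here only for illustration.
engine = create_engine("sqlite://")

with engine.begin() as con:
    con.execute(text(
        'CREATE TABLE calculations '
        '("Trade ID" INTEGER PRIMARY KEY, "Amount" REAL, "Yield" REAL)'
    ))
    # Hypothetical table without a primary key; automap would skip it.
    con.execute(text("CREATE TABLE audit_log (message TEXT)"))
    con.execute(text(
        "INSERT INTO calculations VALUES "
        "(1, 34303.0, 0.01141), (2, 10000.0, 0.01214)"
    ))
    con.execute(text("INSERT INTO audit_log VALUES ('loaded')"))

# The inspector lists every table, with or without a primary key.
table_names = inspect(engine).get_table_names()

# Load each table in full into its own DataFrame.
data = {}
with engine.connect() as con:
    for name in table_names:
        data[name] = pd.read_sql_table(name, con)

print({name: df.shape for name, df in data.items()})
```

Since read_sql_table only needs a table name and a connection, no ORM class (and therefore no hand-declared primary key) is required for a plain full-table load.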

If you only want to load a single table from the database:

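import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine('sqlite:///data.db', echo=False)
Base = automap_base()
# Reflect the schema from the database (keyword form, SQLAlchemy 1.4+).
Base.prepare(autoload_with=engine)

# Read every row of the 'Calculations' table into a DataFrame.
with engine.connect() as con:
    df = pd.read_sql_table('Calculations', con)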
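
If you need a filtered query rather than the whole table, the select() attempt from the question can work once the class has actually been reflected. In the question, Base.prepare() was called without an engine, so no columns were reflected and only the hand-declared id was mapped. The following is a rough sketch, assuming SQLAlchemy 1.4 or later and a pandas version compatible with it. The in-memory SQLite database stands in for the real server, the three rows are copied from the output shown in the question, and the filter threshold is arbitrary.

```python
import pandas as pd
from sqlalchemy import create_engine, select, text
from sqlalchemy.ext.automap import automap_base

# In-memory SQLite database standing in for the real server (assumption).
engine = create_engine("sqlite://")
with engine.begin() as con:
    con.execute(text(
        'CREATE TABLE "Calculations" '
        '("Trade ID" INTEGER PRIMARY KEY, "Amount" REAL, "Yield" REAL)'
    ))
    con.execute(text(
        'INSERT INTO "Calculations" VALUES '
        "(1, 34303.0, 0.01141), (2, 10000.0, 0.01214), (3, 43515.0, 0.01206)"
    ))

# Reflect the schema; no columns have to be declared by hand.
Base = automap_base()
Base.prepare(autoload_with=engine)
Calculations = Base.classes.Calculations

# Every reflected column is now part of the mapped table.
print(Calculations.__table__.columns.keys())

# Pass the statement itself to pandas, not the list returned by .all().
stmt = select(Calculations).where(Calculations.Amount > 20000)
with engine.connect() as con:
    df = pd.read_sql_query(stmt, con)

print(df)
```

The ObjectNotExecutableError in the question came from handing pandas the result of .all(), which is a list of already-loaded objects and not a statement that can be executed.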
