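I have some data that came from a Mongo database. The table contains several columns, and some of those columns are in a very odd format.

An example of one row of the column/Series: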

'[{idEvento.$oid=63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]'

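This is not JSON, at least as far as my limited knowledge goes. I am struggling with how to convert each "event" (one {} item) into a list.

After that, how can I query/filter the data based on what each event contains? Should I explode the events into new rows and query them as strings?

Recommended answer

You can try to "convert" the string to proper JSON (using re) and then use the standard json.loads (regex demo):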

import re
import json
import pandas as pd


s = "[{idEvento.$oid=63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]"

s = re.sub(r"([^ =,\[\]\{\}]+)=([^ =,\[\]\{\}]+)", r'"\g<1>":"\g<2>"', s)
data = json.loads(s)

df = pd.DataFrame(data)
print(df)

Prints:

              idEvento.$oid dataHoraEvento.$date codigoTipoEvento mesAnoReferenciaContabilizacao                                           _class
0  63ffaec3cdc01e6352729bad        1677690003377                1                         032023                                              NaN
1  63ffb5c8cdc01e6352729bae        1677691800676                3                         032023                                              NaN
2  6405cc8711c78c20369b4033        1678090851560                8                         032023                                              NaN
3  6422b4c97e45dd75abb4f831        1679985307560                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
4  6422b4c97e45dd75abb4f832        1679985309584                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil

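Note: this approach works for this example, but the pattern may need adjusting for your real data. Because the replacement wraps every value in quotes, all resulting columns hold strings.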
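One case where the pattern above would need adjusting is a value that contains a space or a comma: the regex stops at the first space, so the result is no longer valid JSON and json.loads raises an error. The following is a minimal sketch of an alternative that skips the JSON step and splits the pairs directly. It assumes the events are never nested and that keys contain no spaces, commas, or "=" signs; the function name parse_events_loose and the fields nome and obs are made up for illustration and do not come from the original data.

```python
import re


def parse_events_loose(text):
    """Parse '[{k=v, k=v}, {...}]' into a list of dicts (all values stay strings).

    Assumption: events are flat (no nested braces) and keys never contain
    spaces, commas, or '='.
    """
    events = []
    # Each event is the text between one pair of braces.
    for body in re.findall(r"\{([^{}]*)\}", text):
        event = {}
        # Split only on commas that are followed by another 'key=' token,
        # so commas and spaces inside a value are left alone.
        for pair in re.split(r",\s*(?=[^\s=,]+=)", body):
            if not pair.strip():
                continue  # empty event: '{}'
            key, _, value = pair.partition("=")
            event[key.strip()] = value.strip()
        events.append(event)
    return events


# Hypothetical input with a space and a comma inside values.
sample = "[{nome=Maria Silva, codigoTipoEvento=1}, {nome=Ana, obs=a, b}]"
print(parse_events_loose(sample))
```

The same function can be passed to df["col1"].apply(...) in place of a regex-plus-json.loads helper if your real data contains such values.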


EDIT: To apply this to a DataFrame:

Consider the following DataFrame:

df = pd.DataFrame(
    {
        "col1": [
            "[{idEvento.$oid=01_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]",
            "[{idEvento.$oid=02_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]",
            "[{idEvento.$oid=03_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]",
        ]
    }
)

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               col1
0  [{idEvento.$oid=01_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]
1  [{idEvento.$oid=02_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]
2  [{idEvento.$oid=03_63ffaec3cdc01e6352729bad, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=63ffb5c8cdc01e6352729bae, dataHoraEvento.$date=1677691800676, codigoTipoEvento=3, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6405cc8711c78c20369b4033, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8, mesAnoReferenciaContabilizacao=032023}, {idEvento.$oid=6422b4c97e45dd75abb4f831, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}, {idEvento.$oid=6422b4c97e45dd75abb4f832, dataHoraEvento.$date=1679985309584, codigoTipoEvento=6, mesAnoReferenciaContabilizacao=032023, _class=br.com.bb.rcp.model.vantagens.HistoricoContabil}]

Then:

def fn(x):
    x = re.sub(r"([^ =,\[\]\{\}]+)=([^ =,\[\]\{\}]+)", r'"\g<1>":"\g<2>"', x)
    return json.loads(x)

out = df["col1"].apply(fn).explode().apply(pd.Series)
print(out)

Prints:

                 idEvento.$oid dataHoraEvento.$date codigoTipoEvento mesAnoReferenciaContabilizacao                                           _class
0  01_63ffaec3cdc01e6352729bad        1677690003377                1                         032023                                              NaN
0     63ffb5c8cdc01e6352729bae        1677691800676                3                         032023                                              NaN
0     6405cc8711c78c20369b4033        1678090851560                8                         032023                                              NaN
0     6422b4c97e45dd75abb4f831        1679985307560                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
0     6422b4c97e45dd75abb4f832        1679985309584                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
1  02_63ffaec3cdc01e6352729bad        1677690003377                1                         032023                                              NaN
1     63ffb5c8cdc01e6352729bae        1677691800676                3                         032023                                              NaN
1     6405cc8711c78c20369b4033        1678090851560                8                         032023                                              NaN
1     6422b4c97e45dd75abb4f831        1679985307560                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
1     6422b4c97e45dd75abb4f832        1679985309584                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
2  03_63ffaec3cdc01e6352729bad        1677690003377                1                         032023                                              NaN
2     63ffb5c8cdc01e6352729bae        1677691800676                3                         032023                                              NaN
2     6405cc8711c78c20369b4033        1678090851560                8                         032023                                              NaN
2     6422b4c97e45dd75abb4f831        1679985307560                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
2     6422b4c97e45dd75abb4f832        1679985309584                6                         032023  br.com.bb.rcp.model.vantagens.HistoricoContabil
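For the second part of the question (querying/filtering by what each event contains), one option is to keep the exploded frame, convert the string columns to real types, and filter with ordinary boolean masks. The sketch below uses shortened, made-up idEvento.$oid values (a1, a2, b1) to stay readable, and it assumes dataHoraEvento.$date is a Unix timestamp in milliseconds, which matches the sample values but is not confirmed by the source. The helper name parse_events is also just illustrative.

```python
import json
import re

import pandas as pd

PAIR = re.compile(r"([^ =,\[\]\{\}]+)=([^ =,\[\]\{\}]+)")


def parse_events(text):
    # Same idea as above: quote every key=value pair, then parse as JSON.
    return json.loads(PAIR.sub(r'"\g<1>":"\g<2>"', text))


# Shortened, hypothetical ids; the real column holds 24-character ObjectIds.
df = pd.DataFrame(
    {
        "col1": [
            "[{idEvento.$oid=a1, dataHoraEvento.$date=1677690003377, codigoTipoEvento=1}, "
            "{idEvento.$oid=a2, dataHoraEvento.$date=1679985307560, codigoTipoEvento=6}]",
            "[{idEvento.$oid=b1, dataHoraEvento.$date=1678090851560, codigoTipoEvento=8}]",
        ]
    }
)

# One row per event; the index still points back to the original row.
exploded = df["col1"].apply(parse_events).explode()
events = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Every value is a string after parsing, so convert before comparing.
events["codigoTipoEvento"] = events["codigoTipoEvento"].astype(int)
events["dataHoraEvento.$date"] = pd.to_datetime(
    events["dataHoraEvento.$date"].astype("int64"), unit="ms"  # assumed milliseconds
)

# Events of type 6, and the original rows that contain at least one of them.
type6 = events[events["codigoTipoEvento"] == 6]
rows_with_type6 = df.loc[type6.index.unique()]

print(type6)
print(rows_with_type6.index.tolist())
```

Because the exploded frame keeps the original index, you can move between the event level and the row level without querying the raw strings.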
