Python: Qdrant filtering using nested object fields

Posted on March 17

I have a data structure in Qdrant whose payload looks like this:


{
    "attributes": [
        {
            "attribute_value_id": 22003,
            "id": 1252,
            "key": "Environment",
            "value": "Casual/Daily",
        },
        {
            "attribute_value_id": 98763,
            "id": 1254,
            "key": "Color",
            "value": "Multicolored",
        },
        {
            "attribute_value_id": 22040,
            "id": 1255,
            "key": "Material",
            "value": "Polyester",
        },
    ],
    "brand": {
        "id": 114326,
        "logo": None,
        "slug": "happiness-istanbul-114326",
        "title": "Happiness Istanbul",
    },
}

Following the Qdrant documentation, I implemented filtering on the brand like this:

from qdrant_client import models

filters_list = []
if param_filters:
    brands = param_filters.get("brand_params")
    if brands:
        # Keep points whose brand.id is any of the requested brand ids
        brand_filter = models.FieldCondition(
            key="brand.id",
            match=models.MatchAny(any=[int(brand) for brand in brands]),
        )
        filters_list.append(brand_filter)
    search_results = qd_client.search(
        query_filter=models.Filter(must=filters_list),
        collection_name=f"lang{lang}_products",
        query_vector=query_vector,
        search_params=models.SearchParams(hnsw_ef=128, exact=False),
        limit=limit,
    )

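So far this works. But things get complicated when I try to filter on the "attributes" field. As you can see, it is a list of dictionaries, each of which looks like this: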

{
    "attribute_value_id": 22040,
    "id": 1255,
    "key": "Material",
    "value": "Polyester",
}

The attrs filter sent from the front end has this structure:

attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
>>> example: {'1237': ['21727', '21759'], '1254': ['52776']}

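How can I filter so that each attr_id given in the query filter parameters (here 1237 or 1254) exists in the attributes field and has one of the attr_value_ids provided in its list (for example ['21727', '21759'] here)?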
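To make the intended rule concrete, here is a minimal pure-Python sketch of the matching logic applied to a single payload. It is only a local reference check, not Qdrant code. The function name payload_matches is hypothetical, and it assumes that every requested attr_id must match (AND across attributes) while any one of its value ids is enough (OR within an attribute).

```python
from typing import Any, Dict, List


def payload_matches(payload: Dict[str, Any], attrs: Dict[str, List[str]]) -> bool:
    """Return True if, for every requested attr_id, the payload holds an
    attribute with that id whose attribute_value_id is in the allowed list."""
    attributes = payload.get("attributes", [])
    for attr_id, attr_value_ids in attrs.items():
        wanted_id = int(attr_id)
        allowed = {int(v) for v in attr_value_ids}
        # Both conditions must hold on the SAME element of the list.
        if not any(
            a.get("id") == wanted_id and a.get("attribute_value_id") in allowed
            for a in attributes
        ):
            return False
    return True


payload = {
    "attributes": [
        {"attribute_value_id": 22003, "id": 1252, "key": "Environment", "value": "Casual/Daily"},
        {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
    ]
}
print(payload_matches(payload, {"1254": ["98763", "52776"]}))  # True
print(payload_matches(payload, {"1254": ["22003"]}))  # False: 22003 belongs to id 1252
```

The second call shows why the id and the value id have to be checked on the same list element: 22003 is present in the payload, but under a different attribute id.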

This is what I have tried so far:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    print("attrs from search function:", attrs)
    for attr_id, attr_value_ids in attrs.items():
        # Convert attribute value IDs to integers
        attr_value_ids = [
            int(attr_value_id) for attr_value_id in attr_value_ids
        ]
        # Add a filter for each attribute ID and its values
        filter = models.FieldCondition(
            key=f"attributes.{attr_id}.attr_value_id",
            match=models.MatchAny(any=attr_value_ids),
        )
        filters_list.append(filter)

The problem is that key=f"attributes.{attr_id}.attr_value_id" is wrong, and I don't know how to express this correctly.
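One possible direction, as far as I understand the Qdrant filtering docs, is a nested condition, which applies an inner filter to each element of an array of objects so that several conditions must hold on the same element. The sketch below only builds the filter as plain dictionaries mirroring the JSON form of the REST API. The helper name build_nested_attr_conditions is hypothetical, and whether your Qdrant version supports nested filters is an assumption to verify. In the Python client the corresponding models should be models.NestedCondition and models.Nested.

```python
import json
from typing import Any, Dict, List


def build_nested_attr_conditions(attrs: Dict[str, List[str]]) -> List[Dict[str, Any]]:
    """Build one nested condition per attr_id, in REST-style JSON form.

    Each condition requires a single element of "attributes" to have both
    the requested id and one of the allowed attribute_value_id values.
    """
    conditions = []
    for attr_id, attr_value_ids in attrs.items():
        conditions.append(
            {
                "nested": {
                    "key": "attributes",
                    "filter": {
                        "must": [
                            {"key": "id", "match": {"value": int(attr_id)}},
                            {
                                "key": "attribute_value_id",
                                "match": {"any": [int(v) for v in attr_value_ids]},
                            },
                        ]
                    },
                }
            }
        )
    return conditions


attrs = {"1237": ["21727", "21759"], "1254": ["52776"]}
query_filter = {"must": build_nested_attr_conditions(attrs)}
print(json.dumps(query_filter, indent=2))
```

Note that the inner keys are "id" and "attribute_value_id", the names actually stored in the payload, rather than the attr_id itself or "attr_value_id".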

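Update: maybe one step closer.

I decided to flatten the data in the database to make this easier. First, I created a new payload field named flattened_attributes, as follows: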

[
  {
    "1237": 21720
  },
  {
    "1254": 52791
  },
  {
    "1255": 22044
  },
]

In addition, before filtering, I apply the same approach to the attr filter sent from the front end:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    flattened_attr = []
    for attr_id, attr_value_ids in attrs.items():
        for attr_value_id in attr_value_ids:
            flattened_attr.append({attr_id: int(attr_value_id)})

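Now I have two similar lists of dicts, and I want to keep the points that contain at least one of the dicts received from the front end (flattened_attr).

There is a type of filter that matches when a key's value is in a list of values, as mentioned in the docs. But I don't know how to check whether a dict exists in the flattened_attributes field in the DB.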
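One way to sidestep the "does this dict exist" problem might be to store each (id, value id) pair as a single string token, so that the check becomes an ordinary "value is in a list" match on a list of strings. The sketch below shows only the encoding on both sides plus a local stand-in for the match. The field name attribute_pairs, the "<id>:<attribute_value_id>" token format, and both helper names are hypothetical choices, not something from the Qdrant docs.

```python
from typing import Any, Dict, List


def encode_payload_pairs(attributes: List[Dict[str, Any]]) -> List[str]:
    """Encode each stored attribute as one "<id>:<attribute_value_id>" token."""
    return [f'{a["id"]}:{a["attribute_value_id"]}' for a in attributes]


def encode_requested_pairs(attrs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Encode the front-end filter the same way, grouped by attr_id."""
    return {
        attr_id: [f"{int(attr_id)}:{int(v)}" for v in values]
        for attr_id, values in attrs.items()
    }


# What would be stored in the hypothetical attribute_pairs payload field.
stored = encode_payload_pairs(
    [
        {"attribute_value_id": 98763, "id": 1254},
        {"attribute_value_id": 22040, "id": 1255},
    ]
)
requested = encode_requested_pairs({"1254": ["52776", "98763"]})

# Local stand-in for one "match any" condition per attr_id on attribute_pairs.
matches = all(any(tok in stored for tok in toks) for toks in requested.values())
print(stored, requested, matches)
```

With this layout, each attr_id would map to one match-any condition over its token list, at the cost of maintaining an extra payload field.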

attrs = param_filters.get("attr_params")
if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    for attr_id, attr_value_ids in attrs.items():
        flattened_attr = []
        for attr_value_id in attr_value_ids:
            flattened_attr.append(int(attr_value_id))
        filter = models.FieldCondition(
            key="attributes[].attribute_value_id",
            match=models.MatchAny(any=flattened_attr),
        )
        filters_list.append(filter)
search_results = qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)

