My data has the following format:

[
  {
    "level_1": "A",
    "cols": [
      "A",
      "B"
    ],
    "arno": "DC",
    "table": [
      {
        country: "NO",
        population: 400,
        color: "red"
      },
      {
        country: "AE",
        population: 100,
        color: "red"
      },
      {
        country: "OT",
        population: 200,
        color: "blue"
      },
      {
        country: "AU",
        population: 200,
        color: "red",
        alo: "n"
      }
    ]
  },
  {
    "level_1": "A",
    "cols": [
      "A",
      "B"
    ],
    "arno": "CD",
    "table": [
      {
        country: "NO",
        population: 200,
        color: "blue",
        "Supplier Manager": "['Arnold Khan']"
      },
      {
        country: "AE",
        population: 200,
        color: "red",
        "Supplier Manager": "[]"
      },
      {
        country: "AE",
        population: 200,
        color: "green",
        "Supplier Manager": "['Arnold Khan']"
      },
      {
        country: "OT",
        population: 200,
        color: "blue",
        "Supplier Manager": "['Adam Nor', 'Jim Brown']"
      }
    ]
  },
  {
    "level_1": "B",
    "cols": [
      "A",
      "B"
    ],
    "arno": "CD",
    "table": [
      {
        country: "AL",
        population: 400,
        color: "red",
        alo: "y"
      },
      {
        country: "AR",
        population: 100,
        color: "green",
        alo: "y"
      },
      {
        country: "YU",
        population: 200,
        color: "red",
        alo: "y"
      },
      {
        country: "AX",
        population: 200,
        color: "red",
        alo: "n"
      }
    ]
  }
]

I am running the following query to retrieve values from the nested table array of every matching document in the collection:

db.collection.aggregate([
  {
    $match: {
      "$and": [
        {
          "level_1": "A"
        },
        {
          "arno": "CD"
        }
      ]
    }
  },
  {
    "$addFields": {
      "table": {
        "$filter": {
          "input": "$table",
          "as": "t",
          "cond": {
            "$and": [
              {
                "$or": [
                  {
                    "$eq": [
                      "$$t.color",
                      "blue"
                    ]
                  },
                  {
                    "$eq": [
                      "$$t.color",
                      "red"
                    ]
                  }
                ]
              },
              {
                "$eq": [
                  "$$t.population",
                  200
                ]
              },
              {
                "$or": [
                  {
                    "$regexMatch": {
                      "input": "$$t.Supplier Manager",
                      "regex": "Jim Brown",
                      "options": "i"
                    }
                  },
                  {
                    "$regexMatch": {
                      "input": "$$t.Supplier Manager",
                      "regex": "Arnold Khan",
                      "options": "i"
                    }
                  }
                ]
              }
            ]
          }
        }
      }
    }
  }
])

The result I currently get is:

[
  {
    "_id": ObjectId("5a934e000102030405000001"),
    "arno": "CD",
    "cols": [
      "A",
      "B"
    ],
    "level_1": "A",
    "table": [
      {
        "Supplier Manager": "['Arnold Khan']",
        "color": "blue",
        "country": "NO",
        "population": 200
      },
      {
        "Supplier Manager": "['Adam Nor', 'Jim Brown']",
        "color": "blue",
        "country": "OT",
        "population": 200
      }
    ]
  }
]

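This is correct, but I now also want to list the distinct values of each field across the objects in the table array, computed over the filtered table output. For example, get DISTINCT: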

'color' : ["blue"]
'country' : ["OT", "NO"]
'population' : [200]
...

Can this be done in a MongoDB aggregate query, or would it be better to load the result into a Pandas DataFrame and retrieve the values from there?

Working playground example: https://mongoplayground.net/p/VSOZ2YVQa4w

Recommended answer

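Use $setUnion to get the distinct values in an array. Given a single array such as $table.color, it returns that array's elements with duplicates removed; the order of the elements in the result is unspecified. Add the following stage after the $addFields stage, so that it runs on the filtered table: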

{
  $set: {
    color: {
      $setUnion: "$table.color"
    },
    country: {
      $setUnion: "$table.country"
    },
    population: {
      $setUnion: "$table.population"
    }
  }
}

Demo @ Mongo Playground
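
For comparison with the client-side option raised in the question, the following is a minimal sketch in plain JavaScript of the same de-duplication done outside the database, generalized to every field instead of a fixed list. It is not part of the original answer. distinctByField is a hypothetical helper name, not a MongoDB or driver API, and the sample input is copied from the filtered table shown in the question. It assumes field values are primitives (strings, numbers), since a Set compares objects by reference.

```javascript
// Hypothetical helper (not a MongoDB API): collects the distinct values of
// every field that appears in an array of sub-documents.
// Assumption: values are primitives; objects would be compared by reference.
function distinctByField(table) {
  const sets = new Map();
  for (const row of table) {
    for (const [field, value] of Object.entries(row)) {
      if (!sets.has(field)) {
        sets.set(field, new Set());
      }
      sets.get(field).add(value); // a Set silently ignores duplicates
    }
  }
  // Convert each Set to a plain array so the result is JSON-friendly.
  const result = {};
  for (const [field, values] of sets) {
    result[field] = [...values];
  }
  return result;
}

// The filtered table array from the question's current output.
const table = [
  {
    "Supplier Manager": "['Arnold Khan']",
    color: "blue",
    country: "NO",
    population: 200
  },
  {
    "Supplier Manager": "['Adam Nor', 'Jim Brown']",
    color: "blue",
    country: "OT",
    population: 200
  }
];

const distinct = distinctByField(table);
console.log(distinct);
```

Unlike $setUnion, this sketch keeps values in first-seen order. Doing the work in the pipeline avoids transferring the full table array to the client, so the $set stage above is usually the simpler choice when the field names are known in advance.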
