My data is in the following format:
[
{
"level_1": "A",
"cols": [
"A",
"B"
],
"arno": "DC",
"table": [
{
country: "NO",
population: 400,
color: "red"
},
{
country: "AE",
population: 100,
color: "red"
},
{
country: "OT",
population: 200,
color: "blue"
},
{
country: "AU",
population: 200,
color: "red",
alo: "n"
},
]
},
{
"level_1": "A",
"cols": [
"A",
"B"
],
"arno": "CD",
"table": [
{
country: "NO",
population: 200,
color: "blue",
"Supplier Manager": "['Arnold Khan']"
},
{
country: "AE",
population: 200,
color: "red",
"Supplier Manager": "[]"
},
{
country: "AE",
population: 200,
color: "green",
"Supplier Manager": "['Arnold Khan']"
},
{
country: "OT",
population: 200,
color: "blue",
"Supplier Manager": "['Adam Nor', 'Jim Brown']"
},
]
},
{
"level_1": "B",
"cols": [
"A",
"B"
],
"arno": "CD",
"table": [
{
country: "AL",
population: 400,
color: "red",
alo: "y"
},
{
country: "AR",
population: 100,
color: "green",
alo: "y"
},
{
country: "YU",
population: 200,
color: "red",
alo: "y"
},
{
country: "AX",
population: 200,
color: "red",
alo: "n"
},
]
}
]
I am running the following query to retrieve values from the nested table array of every matching document in the collection:
db.collection.aggregate([
{
$match: {
"$and": [
{
"level_1": "A"
},
{
"arno": "CD"
}
]
}
},
{
"$addFields": {
"table": {
"$filter": {
"input": "$table",
"as": "t",
"cond": {
"$and": [
{
"$or": [
{
"$eq": [
"$$t.color",
"blue"
]
},
{
"$eq": [
"$$t.color",
"red"
]
}
]
},
{
"$eq": [
"$$t.population",
200
]
},
{
"$or": [
{
"$regexMatch": {
"input": "$$t.Supplier Manager",
"regex": "Jim Brown",
"options": "i"
}
},
{
"$regexMatch": {
"input": "$$t.Supplier Manager",
"regex": "Arnold Khan",
"options": "i"
}
},
]
}
]
}
}
}
}
}
])
The result I currently get is:
[
{
"_id": ObjectId("5a934e000102030405000001"),
"arno": "CD",
"cols": [
"A",
"B"
],
"level_1": "A",
"table": [
{
"Supplier Manager": "['Arnold Khan']",
"color": "blue",
"country": "NO",
"population": 200
},
{
"Supplier Manager": "['Adam Nor', 'Jim Brown']",
"color": "blue",
"country": "OT",
"population": 200
}
]
}
]
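This is correct, but I now want to list the distinct values of each field across the objects in the table array, computed from the filtered table array in the output above.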
For example, get the distinct values:
'color' : ["blue"]
'country' : ["OT", "NO"]
'population' : [200]
...
Can this be done in a MongoDB aggregate query, or would it be better to load the data into a Pandas DataFrame and retrieve the values from there?
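For reference, here is a minimal sketch of one way the distinct values might be collected. It is an assumption, not a confirmed answer: the distinctStages array is a hypothetical set of extra stages that could be appended after the $match and $addFields stages of the query above (it relies on $objectToArray, available from MongoDB 3.4.4), and distinctPerField is a hypothetical helper that does the same thing client-side on the filtered table array. The names distinctStages, distinctPerField and filteredTable are mine, not part of the original query.

```javascript
// Hypothetical extra stages, to be appended after the existing
// $match and $addFields stages. Each output document has the field
// name in _id and its distinct values in "values".
const distinctStages = [
  { $unwind: "$table" },                                        // one document per table row
  { $project: { _id: 0, kv: { $objectToArray: "$table" } } },   // row -> [{k, v}, ...]
  { $unwind: "$kv" },                                           // one document per field of a row
  { $group: { _id: "$kv.k", values: { $addToSet: "$kv.v" } } }  // distinct values per field name
];

// Client-side equivalent: collect the distinct values of every field
// found in an array of row objects.
function distinctPerField(rows) {
  const sets = new Map();
  for (const row of rows) {
    for (const [key, value] of Object.entries(row)) {
      if (!sets.has(key)) sets.set(key, new Set());
      sets.get(key).add(value);
    }
  }
  const result = {};
  for (const [key, values] of sets) result[key] = Array.from(values);
  return result;
}

// The filtered table array from the result shown above.
const filteredTable = [
  { "Supplier Manager": "['Arnold Khan']", color: "blue", country: "NO", population: 200 },
  { "Supplier Manager": "['Adam Nor', 'Jim Brown']", color: "blue", country: "OT", population: 200 }
];

const distinct = distinctPerField(filteredTable);
console.log(distinct);
```

Two caveats: $addToSet does not guarantee the order of the values it returns, and because "Supplier Manager" is stored as a string, both approaches treat each whole string as one value rather than splitting out individual names.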