在与另一个集合进行看似复杂的匹配时,我在理解聚合管道时遇到了一些问题.目标是获得特定用户在分析集合中没有video_impression个条目的视频的列表.

我的数据看起来像这样:

db={
  "videos": [
    {
      "_id": "1",
      "name": "1's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "2",
      "name": "2's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "3",
      "name": "3's Video",
      "status": "complete",
      "privacy": "public"
    },
    {
      "_id": "4",
      "name": "4's Video",
      "status": "complete",
      "privacy": "private"
    },
    {
      "_id": "5",
      "name": "5's Video",
      "status": "flagged",
      "privacy": "public"
    }
  ],
  "analytics": [
    {
      "_id": "1",
      "user": "1",
      "event": "video_impression",
      "data": {
        "video": "1"
      }
    },
    {
      "_id": "2",
      "user": "2",
      "event": "video_impression",
      "data": {
        "video": "2"
      }
    }
  ]
}

我设法让一个匹配器工作,但它工作了"globally" ie.它没有考虑用户id,所以它返回的文档与任何人都不匹配.

db.videos.aggregate([
  {
    $match: {
      "status": "complete",
      "privacy": "public"
    }
  },
  {
    $lookup: {
      from: "analytics",
      localField: "_id",
      foreignField: "data.video",
      as: "matched_docs"
    }
  },
  {
    $match: {
      "matched_docs": {
        $eq: []
      }
    }
  }
])

我try 向管道添加另一个$lookup阶段来查找user字段,但似乎也不起作用,因为数据总是空的.我遇到的Here's a Mongo Playground个问题可能有助于进一步解释它.

推荐答案

1.首先,从analytics集合运行此聚合比从videos集合运行要好.或者更好的是,如果你有的话,可以使用users系列.

2.根据jQueeny's comment,‘analytics’集合示例有点不完整.我假设每个事件只存在一次,因此如果用户观看两个视频,analytics中将有两个条目,而不是只有一个包含视频数组的条目.PS.我建议您将其更改 for each 事件类型data中的对象ID数组,每个用户有单独的记录,甚至将它们合并为一个,这取决于您以后计划如何使用它.

anaylytics个Collection :

[
  { "_id": "1", "user": "1", "event": "video_impression", "data": { "video": "1" } },
  { "_id": "2", "user": "2", "event": "video_impression", "data": { "video": "2" } },
  { "_id": "3", "user": "2", "event": "video_impression", "data": { "video": "3" } },
  { "_id": "4", "user": "2", "event": "liked_video", "data": { "video": "2" } }
]

这里的3.方法是使用Uncorrelated subquery来使用pipeline syntax of $lookup来获取所有的视频ID.它的优势是only being run once and then using the cache:

一百零二

但是,如果视频集合太大,并且每个阶段的文档变为100MB,则此管道将失败,您将需要使用相关子查询.

4.这里使用的方法是:

A)使用analytics个集合,只筛选出video_impressions个事件,按user_id分组,然后创建他们观看/印象深刻的set个视频(唯一数组).

b)使用查找来获得all个视频ID为"公共+完整"视频到一个数组中

C)区分所有视频和有印象的视频.

5.顺便说一句,如果你想一次只为一个用户这样做,比如一个网页/FE,那么在第一个匹配阶段添加user: <user_id>event.

db.analytics.aggregate([
  {
    // select only the video_impression events
    $match: { event: "video_impression" }
  },
  {
    // first uniquify your users but you should
    // probably run this from the users collection
    $group: {
      _id: "$user",
      impressioned_vids: { "$addToSet": "$data.video" },
      // remove this if you want the user as _id
      user: { "$first": "$user" }
    }
  },
  { $project: { _id: 0 } },
  {
    // uncorrelated subquery which should only run once
    // and then is cached
    $lookup: {
      from: "videos",
      pipeline: [
        {
          $match: {
            status: "complete",
            privacy: "public"
          }
        },
        {
          $group: {
            _id: null,
            video_ids: { $push: "$_id" }
          }
        }
      ],
      as: "all_vids"
    }
  },
  {
    // put it conveniently into a single list
    $set: { all_vids: { $first: "$all_vids.video_ids" } }
  },
  {
    // these are the public-complete videos which that user has not seen
    $set: {
      unimpressed: {
        $setDifference: [ "$all_vids", "$impressioned_vids" ]
      }
    }
  },
  {
    // get rid of the other fields, uncomment to debug
    $project: {
      user: 1,
      unimpressed: 1
    }
  }
])

对于我修改后的analytics集合和您的原始videos集合,结果如下:

[
  {
    "unimpressed": ["1"],
    "user": "2"
  },
  {
    "unimpressed": ["2", "3"],
    "user": "1"
  }
]

Mongo Playground


Option 2

如果all_vids的列表对于uncorrelated 101来说太大,那么它将需要是correlated $lookup,这将在每个文档中执行一次.主要的变化是在$lookup阶段,在那里我判断视频不在印象/观看视频列表中(或者在本例中,交叉点是空的).这需要将"已看到的视频"数组赋给let中的一个变量,然后在查找管道中使用该变量.

  {
    // correlated subquery which executes per record
    $lookup: {
      from: "videos",
      let: {
        seen_vid_ids: "$impressioned_vids"
      },
      pipeline: [
        {
          $match: {
            status: "complete",
            privacy: "public",
            // I wanted to do:
            // _id: { $nin: "$$seen_vid_ids" }  // syntax error
            // or even
            // _id: { $nin: ["$$seen_vid_ids"] }  // doesn't work
            //
            // but instead had to do this
            $expr: {
              $eq: [
                {
                  "$setIntersection": [ ["$_id"], "$$seen_vid_ids" ]
                },
                []
              ]
            }
          }
        }
      ],
      as: "unseen_vids"
    }
  },

Mongo Playground with the full aggregation

Mongodb相关问答推荐

Mongodb如果数组中有外部键,则将排序应用到查找结果

如何限制/筛选子文档中的条目?

如何在MongoDB中对两个数组进行分组?

MongoDB 投影按数组中的字符串长度排序

Mongo,通过 Lookup 计算多个文档中数组中的项目

使用特定关键字和邻近度进行查询和过滤

我可以在 MongoDB 中将字段值设置为对象键吗?

MongoDB:从开始日期和结束日期数组中匹配特定日期的聚合查询

如何使用指南针连接到 mongodb replicaset (k8s)

MongoDB:嵌套数组计数+原始文档

有没有办法从另一条记录中插入一条记录

程序可以运行,但我不断收到发送到客户端后无法设置标题,我应该忽略它吗?

MongoDB 更改流副本集限制

使用 Flask-pymongo 扩展通过 _id 在 MongoDB 中搜索文档

如何使用python将csv数据推送到mongodb

Node.js 和 Passport 对象没有方法 validPassword

如何使用原子操作在一个文档中切换布尔字段?

MongoDb:如何将附加对象插入对象集合?

没有参考选项的mongoose填充字段

对于社交网站(使用 Ruby on Rails 开发)来说,MongoDB 会是一个好主意吗?