We know the initial data size (120GB) and we know the default maximum chunk size in MongoDB is 64MB. If we divide 120GB by 64MB we get 1920 chunks - that is the minimum number of chunks we should consider to begin with. As it happens, 2048 is a power of 16 divided by 2, and given that the GUID (our shard key) is hex based, that is a much easier number to deal with than 1920 (see below).
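As a quick sanity check on that arithmetic, here is the same calculation as a throwaway shell snippet:
// minimum chunk count = total data size / default max chunk size
var totalMB = 120 * 1024;          // 120GB expressed in MB
var chunkMB = 64;                  // default maximum chunk size
var minChunks = totalMB / chunkMB; // 1920
// 2048 = (16^3) / 2, i.e. three hex digits with the last one stepping by 2 (see the loop below)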
NOTE: the pre-splitting must be done before any data is added to the collection. If sharding is enabled on a collection that already contains data (via shardCollection()), MongoDB will split the data itself, and you would then be running the split commands while chunks already exist - that can lead to a very odd chunk distribution, so beware.
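A quick way to confirm the collection is still empty before splitting (assuming the users.userInfo names used below):
// should return 0 - if it does not, the pre-split approach below does not apply
db.getSiblingDB("users").userInfo.count();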
For the purposes of this answer, let's assume that the database will be called users and the collection will be called userInfo. Let's also assume that the GUID will be written into the _id field. With those parameters, we connect to a mongos and run the following commands:
// first switch to the users DB
use users;
// now enable sharding for the users DB
sh.enableSharding("users");
// enable sharding on the relevant collection
sh.shardCollection("users.userInfo", {"_id" : 1});
// finally, disable the balancer (see below for options on a per-collection basis)
// this prevents migrations from kicking off and interfering with the splits by competing for meta data locks
sh.stopBalancer();
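Optionally, confirm the balancer really is off before proceeding; sh.getBalancerState() should report false at this point:
// returns false once the balancer has been disabled
sh.getBalancerState();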
Now, per the calculations above, we need to split the GUID range into 2048 chunks. To do that we need at least 3 hex digits (16^3 = 4096), and we will put them in the most significant positions of the range (i.e. the 3 leftmost characters). Again, this should be run from a mongos shell:
// Simply use a for loop for each digit
for ( var x=0; x < 16; x++ ){
    for ( var y=0; y<16; y++ ) {
        // for the innermost loop we will increment by 2 to get 2048 total iterations
        // make this z++ for 4096 - that would give ~30MB chunks based on the original figures
        for ( var z=0; z<16; z+=2 ) {
            // now construct the GUID with zeroes for padding - handily the toString method takes an argument to specify the base
            var prefix = "" + x.toString(16) + y.toString(16) + z.toString(16) + "00000000000000000000000000000";
            // finally, use the split command to create the appropriate chunk
            db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
        }
    }
}
Once that is complete, let's check on the state of play using the sh.status() helper:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("527056b8f6985e1bcce4c4cb")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
    { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
      users.userInfo
        shard key: { "_id" : 1 }
        chunks:
          shard0001  2049
        too many chunks to print, use verbose if you want to force print
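Rather than force-printing every chunk, you can also verify the per-shard counts straight from the config database; a quick sketch (assuming MongoDB versions like those shown here, where config.chunks stores the namespace in an ns field):
// count chunks per shard for the collection, run via the same mongos
db.getSiblingDB("config").chunks.aggregate([
    { $match : { ns : "users.userInfo" } },
    { $group : { _id : "$shard", count : { $sum : 1 } } }
]);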
There we have our 2048 chunks (plus one extra thanks to the min/max chunks), but they are all still on the original shard because the balancer is off. So, let's re-enable the balancer:
sh.startBalancer();
This will immediately begin balancing, and it will be relatively quick because all the chunks are empty, but it will still take a while (and much longer if it is competing with migrations from other collections). Once some time has elapsed, run sh.status() again and there you (should) have it - the 2048 chunks all nicely split across the 4 shards and ready for the initial data load:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("527056b8f6985e1bcce4c4cb")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
    { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
      users.userInfo
        shard key: { "_id" : 1 }
        chunks:
          shard0000  512
          shard0002  512
          shard0003  512
          shard0001  513
        too many chunks to print, use verbose if you want to force print
    { "_id" : "test", "partitioned" : false, "primary" : "shard0002" }
You are now ready to start loading data. But to absolutely guarantee that no splits or migrations happen until your data load is complete, there is one more thing to do: turn off the balancer and auto-splitting for the duration of the import:
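The steps themselves are implied by the reversal described in the next paragraph; they amount to the following (note that the auto-split setting is a mongos startup option, not a shell command):
// disable all balancing in the cluster
sh.stopBalancer();
// or, to leave other collections balancing, disable it for just this collection
sh.disableBalancing("users.userInfo");
// then restart each mongos with the --noAutoSplit flag to turn off auto-splitting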
Once the import is complete, reverse the steps as needed (sh.startBalancer(), sh.enableBalancing("users.userInfo"), restart the mongos without --noAutoSplit) to return everything to its default settings.
**Update: Optimizing for Speed**
The approach above is fine if you are not in a hurry. As things stand, and as you will discover if you test this, the balancer is not very fast - even with empty chunks. Hence, as you increase the number of chunks you create, the longer balancing will take. I have seen it take more than 30 minutes to finish balancing 2048 chunks, though this will vary depending on the deployment.
That might be OK for testing, or for a relatively quiet cluster, but having the balancer off and requiring that no other updates interfere will be much harder to ensure on a busy cluster. So, how do we speed things up?
The answer is to do some manual moves early, then split the chunks once they are on their respective shards. Note that this is only desirable with certain shard keys (like a randomly distributed UUID) or certain data access patterns, so be careful that you do not end up with a poor data distribution as a result.
Using the example above, we have 4 shards, so rather than doing all the splits and then balancing, we split into 4 instead. We then put one chunk on each shard by moving them manually, and finally we split those chunks into the required number.
The ranges from that example would end up looking like this:
$min --> "40000000000000000000000000000000"
"40000000000000000000000000000000" --> "80000000000000000000000000000000"
"80000000000000000000000000000000" --> "c0000000000000000000000000000000"
"c0000000000000000000000000000000" --> $max
It is only 4 commands to create these, but since we have it, why not re-use the loop above in a simplified/modified form:
for ( var x=4; x < 16; x+=4){
    var prefix = "" + x.toString(16) + "0000000000000000000000000000000";
    db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
}
Here is how things look now - we have our 4 chunks, all on shard0001:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
    { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
    { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
      users.userInfo
        shard key: { "_id" : 1 }
        chunks:
          shard0001  4
        { "_id" : { "$minKey" : 1 } } -->> { "_id" : "40000000000000000000000000000000" } on : shard0001 Timestamp(1, 1)
        { "_id" : "40000000000000000000000000000000" } -->> { "_id" : "80000000000000000000000000000000" } on : shard0001 Timestamp(1, 3)
        { "_id" : "80000000000000000000000000000000" } -->> { "_id" : "c0000000000000000000000000000000" } on : shard0001 Timestamp(1, 5)
        { "_id" : "c0000000000000000000000000000000" } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 6)
We will leave the $min chunk where it is and move the other three. You could do this programmatically, but it does depend on where the chunks initially reside, how you have named your shards, and so on, so I will leave it manual for now - it is not too onerous, just 3 moveChunk commands:
mongos> sh.moveChunk("users.userInfo", {"_id" : "40000000000000000000000000000000"}, "shard0000")
{ "millis" : 1091, "ok" : 1 }
mongos> sh.moveChunk("users.userInfo", {"_id" : "80000000000000000000000000000000"}, "shard0002")
{ "millis" : 1078, "ok" : 1 }
mongos> sh.moveChunk("users.userInfo", {"_id" : "c0000000000000000000000000000000"}, "shard0003")
{ "millis" : 1083, "ok" : 1 }
Let's check again and make sure the chunks are where we expect them to be:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
    { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
    { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
      users.userInfo
        shard key: { "_id" : 1 }
        chunks:
          shard0001  1
          shard0000  1
          shard0002  1
          shard0003  1
        { "_id" : { "$minKey" : 1 } } -->> { "_id" : "40000000000000000000000000000000" } on : shard0001 Timestamp(4, 1)
        { "_id" : "40000000000000000000000000000000" } -->> { "_id" : "80000000000000000000000000000000" } on : shard0000 Timestamp(2, 0)
        { "_id" : "80000000000000000000000000000000" } -->> { "_id" : "c0000000000000000000000000000000" } on : shard0002 Timestamp(3, 0)
        { "_id" : "c0000000000000000000000000000000" } -->> { "_id" : { "$maxKey" : 1 } } on : shard0003 Timestamp(4, 0)
That matches the ranges we proposed above, so it all looks good. Now run the original loop from above to split the chunks "in place" on each shard, and we should end up with a balanced distribution once the loop finishes. One more sh.status() should confirm things:
mongos> for ( var x=0; x < 16; x++ ){
...     for ( var y=0; y<16; y++ ) {
...         // for the innermost loop we will increment by 2 to get 2048 total iterations
...         // make this z++ for 4096 - that would give ~30MB chunks based on the original figures
...         for ( var z=0; z<16; z+=2 ) {
...             // now construct the GUID with zeroes for padding - handily the toString method takes an argument to specify the base
...             var prefix = "" + x.toString(16) + y.toString(16) + z.toString(16) + "00000000000000000000000000000";
...             // finally, use the split command to create the appropriate chunk
...             db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
...         }
...     }
... }
{ "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
    { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
    { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
      users.userInfo
        shard key: { "_id" : 1 }
        chunks:
          shard0001  513
          shard0000  512
          shard0002  512
          shard0003  512
        too many chunks to print, use verbose if you want to force print
And there you have it - no waiting for the balancer, the distribution is already even.