MongoDB分片故障测试和解决方案

Wesley13
• 阅读 602

一、环境

$ cat /etc/redhat-release 
CentOS Linux release 7.0.1406 (Core) 
$ uname -a
Linux zhaopin-2-201 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("561728b4030ea038bcb57fa0")
}
  shards:
        {  "_id" : "sh0",  "host" : "sh0/172.30.2.201:27010" }
        {  "_id" : "sh1",  "host" : "sh1/172.30.2.201:27011" }
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "sh0" }

二、准备

插入测试数据

$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> use mydb
switched to db mydb
mongos> sh.shardCollection("mydb.test",{"Uid":1});
{ "collectionsharded" : "mydb.test", "ok" : 1 }
mongos> for (var i = 1; i <= 100000; i++) db.test.save({"Uid":i,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos> use config;
switched to db config
mongos> db.chunks.find().limit(3);
{ "_id" : "mydb.test-Uid_MinKey", "lastmod" : Timestamp(10, 1), "lastmodEpoch" : ObjectId("56178dad030ea038bcb592c2"), "ns" : "mydb.test", "min" : { "Uid" : { "$minKey" : 1 } }, "max" : { "Uid" : 2 }, "shard" : "sh0" }
{ "_id" : "mydb.test-Uid_2.0", "lastmod" : Timestamp(1, 2), "lastmodEpoch" : ObjectId("56178dad030ea038bcb592c2"), "ns" : "mydb.test", "min" : { "Uid" : 2 }, "max" : { "Uid" : 10 }, "shard" : "sh0" }
{ "_id" : "mydb.test-Uid_10.0", "lastmod" : Timestamp(11, 1), "lastmodEpoch" : ObjectId("56178dad030ea038bcb592c2"), "ns" : "mydb.test", "min" : { "Uid" : 10 }, "max" : { "Uid" : 4691 }, "shard" : "sh1" }

三、Shard Server节点故障

1.分片中副本集的节点故障

参考MongoDB副本集故障测试和解决方案,Query Routers会自动更新副本集的配置,只需要在副本集里面解决故障

2.分片故障

分片故障是指分片中的节点全部不可用了(如果分片中配置了副本集的话,基本是不会出现这种情况的)

1)将sh0停机

$ ps aux | grep mongo
wenhang+  1493  0.0  0.0 112640   976 pts/0    S+   10:42   0:00 grep --color=auto mongo
root     41492  0.5  0.1 1045696 118336 ?      Sl   Oct09   8:12 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108720 ?      Sl   Oct09   7:30 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41855  0.6  0.0 566248 55208 ?        Sl   Oct09   9:14 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root     41870  0.5  0.0 565220 56180 ?        Sl   Oct09   8:29 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 56872 ?        Sl   Oct09   8:25 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 260180 13200 ?        Sl   Oct09   5:40 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11564 ?        Sl   Oct09   4:31 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ ps aux | grep mongo | grep -v grep
root     41492  0.5  0.1 1045696 118336 ?      Sl   Oct09   8:12 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108720 ?      Sl   Oct09   7:31 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41855  0.6  0.0 566248 54988 ?        Sl   Oct09   9:14 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root     41870  0.5  0.1 565220 72392 ?        Sl   Oct09   8:29 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 56776 ?        Sl   Oct09   8:25 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 260180 13200 ?        Sl   Oct09   5:40 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11564 ?        Sl   Oct09   4:31 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ sudo kill 41492
$ ps aux | grep mongo | grep -v grep
root     41509  0.5  0.1 1028196 108736 ?      Sl   Oct09   7:31 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41855  0.6  0.1 566248 87832 ?        Sl   Oct09   9:14 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root     41870  0.5  0.1 565220 89116 ?        Sl   Oct09   8:29 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 56852 ?        Sl   Oct09   8:26 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 260180 13200 ?        Sl   Oct09   5:40 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11564 ?        Sl   Oct09   4:31 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf

2)查看集群状态

$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("561728b4030ea038bcb57fa0")
}
  shards:
        {  "_id" : "sh0",  "host" : "sh0/172.30.2.201:27010" }
        {  "_id" : "sh1",  "host" : "sh1/172.30.2.201:27011" }
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  5
        Last reported error:  socket exception [CONNECT_ERROR] for sh0/172.30.2.201:27010
        Time of Reported error:  Sat Oct 10 2015 10:44:56 GMT+0800 (CST)
        Migration Results for the last 24 hours:
                19 : Success
                2 : Failed with error 'migration already in progress', from sh0 to sh1
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "sh0" }
                mydb.test
                        shard key: { "Uid" : 1 }
                        chunks:
                                sh0     11
                                sh1     10
                        too many chunks to print, use verbose if you want to force print

3)测试可用性

mongos> db.test.find({Uid:12});
{ "_id" : ObjectId("56178e473af16d338338d3bf"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:07.205Z") }
mongos> db.test.find({Uid:8});
Error: error: {
        "$err" : "socket exception [CONNECT_ERROR] for sh0/172.30.2.201:27010",
        "code" : 11002,
        "shard" : "sh0"
}
mongos> db.test.insert({"Uid":12,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos> db.test.insert({"Uid":8,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 7,
                "errmsg" : "could not contact primary for replica set sh0"
        }
})
mongos>
bye

设计到sh0的操作都会报错,其他分片的可以正常执行

重新启动sh0

$ sudo /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
about to fork child process, waiting until server is ready for connections.
forked process: 2534
child process started successfully, parent exiting
$ ps aux | grep mongo | grep -v grep
root      2534  1.1  0.0 971800 64060 ?        Sl   10:53   0:00 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108804 ?      Sl   Oct09   7:33 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41855  0.6  0.0 566248 55572 ?        Sl   Oct09   9:17 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root     41870  0.5  0.0 565220 56480 ?        Sl   Oct09   8:33 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 57240 ?        Sl   Oct09   8:29 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 260180 13388 ?        Sl   Oct09   5:42 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11572 ?        Sl   Oct09   4:32 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> use mydb;
switched to db mydb
mongos> db.test.find({Uid:8});
{ "_id" : ObjectId("56178e453af16d338338d3bb"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:05.539Z") }
mongos> db.test.find({Uid:12});
{ "_id" : ObjectId("56178e473af16d338338d3bf"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:07.205Z") }
{ "_id" : ObjectId("56187d6743c2e9badb0ca53b"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:52:23.497Z") }
mongos> db.test.insert({"Uid":8,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })

两个分片都可读写

四、Config Server故障

1.单节点故障

1)停掉cf0

$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 67608 ?        Sl   10:53   0:03 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108808 ?      Sl   Oct09   7:37 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41855  0.6  0.0 566248 55444 ?        Sl   Oct09   9:23 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root     41870  0.5  0.0 565220 56456 ?        Sl   Oct09   8:38 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 56820 ?        Sl   Oct09   8:34 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13408 ?        Sl   Oct09   5:45 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11596 ?        Sl   Oct09   4:35 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
[wenhang.pan@zhaopin-2-201 ~]$ sudo kill 41855
[wenhang.pan@zhaopin-2-201 ~]$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 67636 ?        Sl   10:53   0:03 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108808 ?      Sl   Oct09   7:37 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41870  0.5  0.0 565220 56324 ?        Sl   Oct09   8:38 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.0 565216 56600 ?        Sl   Oct09   8:34 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13408 ?        Sl   Oct09   5:45 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 259148 11596 ?        Sl   Oct09   4:35 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf

2)查看状态

$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("561728b4030ea038bcb57fa0")
}
  shards:
        {  "_id" : "sh0",  "host" : "sh0/172.30.2.201:27010" }
        {  "_id" : "sh1",  "host" : "sh1/172.30.2.201:27011" }
  balancer:
        Currently enabled:  yes
        Currently running:  yes
                Balancer lock taken at Sat Oct 10 2015 11:08:14 GMT+0800 (CST) by zhaopin-2-201:30000:1444358323:1804289383:Balancer:1681692777
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                19 : Success
                2 : Failed with error 'migration already in progress', from sh0 to sh1
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "sh0" }
                mydb.test
                        shard key: { "Uid" : 1 }
                        chunks:
                                sh0     11
                                sh1     10
                        too many chunks to print, use verbose if you want to force print

3)测试可用性

mongos> use mydb;
switched to db mydb
mongos> db.test.find({Uid:12});
{ "_id" : ObjectId("56178e473af16d338338d3bf"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:07.205Z") }
{ "_id" : ObjectId("56187d6743c2e9badb0ca53b"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:52:23.497Z") }
mongos> db.test.find({Uid:8});
{ "_id" : ObjectId("56178e453af16d338338d3bb"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:05.539Z") }
{ "_id" : ObjectId("56187e117ec37f9bb7244496"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:55:13.529Z") }
mongos> db.test.insert({"Uid":8,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos> db.test.insert({"Uid":12,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos>
bye

集群可正常使用

2.双节点故障

1)停掉cf1

$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68128 ?        Sl   10:53   0:05 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108892 ?      Sl   Oct09   7:39 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     41870  0.5  0.1 565220 92328 ?        Sl   Oct09   8:40 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     42170  0.5  0.1 565216 90392 ?        Sl   Oct09   8:36 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13508 ?        Sl   Oct09   5:46 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 260172 11716 ?        Sl   Oct09   4:37 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ sudo kill 41870
$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68128 ?        Sl   10:53   0:05 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 108892 ?      Sl   Oct09   7:39 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     42170  0.5  0.1 565216 90408 ?        Sl   Oct09   8:36 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13508 ?        Sl   Oct09   5:46 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 260172 11716 ?        Sl   Oct09   4:37 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf

2)查看集群状态

$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("561728b4030ea038bcb57fa0")
}
  shards:
        {  "_id" : "sh0",  "host" : "sh0/172.30.2.201:27010" }
        {  "_id" : "sh1",  "host" : "sh1/172.30.2.201:27011" }
  balancer:
        Currently enabled:  yes
        Currently running:  yes
                Balancer lock taken at Sat Oct 10 2015 11:08:14 GMT+0800 (CST) by zhaopin-2-201:30000:1444358323:1804289383:Balancer:1681692777
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                19 : Success
                2 : Failed with error 'migration already in progress', from sh0 to sh1
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "sh0" }
                mydb.test
                        shard key: { "Uid" : 1 }
                        chunks:
                                sh0     11
                                sh1     10
                        too many chunks to print, use verbose if you want to force print

3)测试可用性

mongos> use mydb;
switched to db mydb
mongos> db.test.find({Uid:8});
{ "_id" : ObjectId("56178e453af16d338338d3bb"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:05.539Z") }
{ "_id" : ObjectId("56187e117ec37f9bb7244496"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:55:13.529Z") }
{ "_id" : ObjectId("5618824d7bd66d65a90fb01c"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:13:17.564Z") }
mongos> db.test.find({Uid:12});
{ "_id" : ObjectId("56178e473af16d338338d3bf"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:07.205Z") }
{ "_id" : ObjectId("56187d6743c2e9badb0ca53b"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:52:23.497Z") }
{ "_id" : ObjectId("561882527bd66d65a90fb01d"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:13:22.594Z") }
mongos> db.test.insert({"Uid":12,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos> db.test.insert({"Uid":8,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos>
bye

集群可以正常使用

4)重新启动cf0和cf1

$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68228 ?        Sl   10:53   0:05 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root     41509  0.5  0.1 1028196 107976 ?      Sl   Oct09   7:39 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     42170  0.5  0.1 565216 93140 ?        Sl   Oct09   8:37 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13540 ?        Sl   Oct09   5:47 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 260172 11752 ?        Sl   Oct09   4:37 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ sudo /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
about to fork child process, waiting until server is ready for connections.
forked process: 7309
child process started successfully, parent exiting
$ sudo /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
about to fork child process, waiting until server is ready for connections.
forked process: 7355
child process started successfully, parent exiting
$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68228 ?        Sl   10:53   0:06 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root      7309  0.6  0.0 545696 37504 ?        Sl   11:17   0:00 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root      7355  0.5  0.0 542616 33748 ?        Sl   11:17   0:00 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     41509  0.5  0.1 1028196 108188 ?      Sl   Oct09   7:40 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     42170  0.5  0.0 565216 57992 ?        Sl   Oct09   8:37 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13540 ?        Sl   Oct09   5:47 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 260172 11768 ?        Sl   Oct09   4:38 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf

5)解决方案

参考:

http://docs.mongodb.org/manual/tutorial/replace-config-server/

http://docs.mongodb.org/manual/tutorial/migrate-config-servers-with-different-hostnames/

五、Query Routers故障

1)将ms0停机

$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68228 ?        Sl   10:53   0:06 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root      7309  0.6  0.0 545696 37504 ?        Sl   11:17   0:00 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root      7355  0.5  0.0 542616 33748 ?        Sl   11:17   0:00 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     41509  0.5  0.1 1028196 108188 ?      Sl   Oct09   7:40 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     42170  0.5  0.0 565216 57992 ?        Sl   Oct09   8:37 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42233  0.3  0.0 261204 13540 ?        Sl   Oct09   5:47 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms0/mongos.conf
root     42286  0.3  0.0 260172 11768 ?        Sl   Oct09   4:38 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf
$ sudo kill 42233
$ ps axu | grep mongo | grep -v grep
root      2534  0.4  0.1 982052 68408 ?        Sl   10:53   0:06 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh0/mongodb.conf
root      7309  0.6  0.0 547752 39096 ?        Sl   11:17   0:01 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf0/config.conf
root      7355  0.6  0.0 544672 34256 ?        Sl   11:17   0:01 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf1/config.conf
root     41509  0.5  0.1 1028196 108164 ?      Sl   Oct09   7:40 /opt/mongodb/bin/mongod --config /data/mongodb/conf/sh1/mongodb.conf
root     42170  0.5  0.0 565216 57796 ?        Sl   Oct09   8:38 /opt/mongodb/bin/mongod --config /data/mongodb/conf/cf2/config.conf
root     42286  0.3  0.0 260172 11768 ?        Sl   Oct09   4:38 /opt/mongodb/bin/mongos --config /data/mongodb/conf/ms1/mongos.conf

2)查看集群状态

$ mongo --port 30000
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30000/test
2015-10-10T11:21:04.681+0800 W NETWORK  Failed to connect to 127.0.0.1:30000, reason: errno:111 Connection refused
2015-10-10T11:21:04.683+0800 E QUERY    Error: couldn't connect to server 127.0.0.1:30000 (127.0.0.1), connection attempt failed
    at connect (src/mongo/shell/mongo.js:181:14)
    at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed
$ mongo --port 30001
MongoDB shell version: 3.0.6
connecting to: 127.0.0.1:30001/test
mongos> sh.status();
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("561728b4030ea038bcb57fa0")
}
  shards:
        {  "_id" : "sh0",  "host" : "sh0/172.30.2.201:27010" }
        {  "_id" : "sh1",  "host" : "sh1/172.30.2.201:27011" }
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                19 : Success
                2 : Failed with error 'migration already in progress', from sh0 to sh1
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "sh0" }
                mydb.test
                        shard key: { "Uid" : 1 }
                        chunks:
                                sh0     11
                                sh1     10
                        too many chunks to print, use verbose if you want to force print
mongos>
bye

可见30000端口的mongos已经宕机,不能提供服务了,但30001端口的mongos可以使用

3)测试可用性

mongos> use mydb;
switched to db mydb
mongos> db.test.find({Uid:12});
{ "_id" : ObjectId("56178e473af16d338338d3bf"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:07.205Z") }
{ "_id" : ObjectId("56187d6743c2e9badb0ca53b"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:52:23.497Z") }
{ "_id" : ObjectId("561882527bd66d65a90fb01d"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:13:22.594Z") }
{ "_id" : ObjectId("561882f9ff80495f50bec62f"), "Uid" : 12, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:16:09.930Z") }
mongos> db.test.find({Uid:8});
{ "_id" : ObjectId("56178e453af16d338338d3bb"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-09T09:52:05.539Z") }
{ "_id" : ObjectId("56187e117ec37f9bb7244496"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T02:55:13.529Z") }
{ "_id" : ObjectId("5618824d7bd66d65a90fb01c"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:13:17.564Z") }
{ "_id" : ObjectId("561882fdff80495f50bec630"), "Uid" : 8, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2015-10-10T03:16:13.788Z") }
mongos> db.test.insert({"Uid":8,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })
mongos> db.test.insert({"Uid":12,"Name":"zhanjindong","Age":13,"Date":new Date()});
WriteResult({ "nInserted" : 1 })

可见集群可以正常使用

4)解决方案

可以在代码中配置连接多台mongo实现failover,java代码如下:

List<ServerAddress> mongoHostList = new ArrayList<ServerAddress>();
mongoHostList.add(new ServerAddress("ip",port));
mongoHostList.add(new ServerAddress("ip",port));
mongoHostList.add(new ServerAddress("ip",port));
MongoClient mg = new MongoClient(mongoHostList);

六、结论

1.Shard Server故障

1)分片节点故障,可以按照MongoDB副本集故障测试和解决方案中解决,仍可以实现可用性

2)整个分片故障,如果可以重新启动的话,仍然可以使用,否则只能从备份中恢复,必然会丢数据

2.Config Server故障

只要有可用的配置服务器,整个集群都可以正常使用

3.Query Routers故障

只要有可用的mongos服务器,整个集群就可以正常使用,要客户端也实现failover,需要将代码写成连接多台mongo的方式

点赞
收藏
评论区
推荐文章
blmius blmius
3年前
MySQL:[Err] 1292 - Incorrect datetime value: ‘0000-00-00 00:00:00‘ for column ‘CREATE_TIME‘ at row 1
文章目录问题用navicat导入数据时,报错:原因这是因为当前的MySQL不支持datetime为0的情况。解决修改sql\mode:sql\mode:SQLMode定义了MySQL应支持的SQL语法、数据校验等,这样可以更容易地在不同的环境中使用MySQL。全局s
皕杰报表之UUID
​在我们用皕杰报表工具设计填报报表时,如何在新增行里自动增加id呢?能新增整数排序id吗?目前可以在新增行里自动增加id,但只能用uuid函数增加UUID编码,不能新增整数排序id。uuid函数说明:获取一个UUID,可以在填报表中用来创建数据ID语法:uuid()或uuid(sep)参数说明:sep布尔值,生成的uuid中是否包含分隔符'',缺省为
待兔 待兔
3个月前
手写Java HashMap源码
HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程HashMap的使用教程22
Jacquelyn38 Jacquelyn38
3年前
2020年前端实用代码段,为你的工作保驾护航
有空的时候,自己总结了几个代码段,在开发中也经常使用,谢谢。1、使用解构获取json数据let jsonData  id: 1,status: "OK",data: 'a', 'b';let  id, status, data: number   jsonData;console.log(id, status, number )
Wesley13 Wesley13
3年前
Java获得今日零时零分零秒的时间(Date型)
publicDatezeroTime()throwsParseException{    DatetimenewDate();    SimpleDateFormatsimpnewSimpleDateFormat("yyyyMMdd00:00:00");    SimpleDateFormatsimp2newS
Stella981 Stella981
3年前
KVM调整cpu和内存
一.修改kvm虚拟机的配置1、virsheditcentos7找到“memory”和“vcpu”标签,将<namecentos7</name<uuid2220a6d1a36a4fbb8523e078b3dfe795</uuid
Wesley13 Wesley13
3年前
mysql设置时区
mysql设置时区mysql\_query("SETtime\_zone'8:00'")ordie('时区设置失败,请联系管理员!');中国在东8区所以加8方法二:selectcount(user\_id)asdevice,CONVERT\_TZ(FROM\_UNIXTIME(reg\_time),'08:00','0
Wesley13 Wesley13
3年前
00:Java简单了解
浅谈Java之概述Java是SUN(StanfordUniversityNetwork),斯坦福大学网络公司)1995年推出的一门高级编程语言。Java是一种面向Internet的编程语言。随着Java技术在web方面的不断成熟,已经成为Web应用程序的首选开发语言。Java是简单易学,完全面向对象,安全可靠,与平台无关的编程语言。
Stella981 Stella981
3年前
Django中Admin中的一些参数配置
设置在列表中显示的字段,id为django模型默认的主键list_display('id','name','sex','profession','email','qq','phone','status','create_time')设置在列表可编辑字段list_editable
Wesley13 Wesley13
3年前
MySQL部分从库上面因为大量的临时表tmp_table造成慢查询
背景描述Time:20190124T00:08:14.70572408:00User@Host:@Id:Schema:sentrymetaLast_errno:0Killed:0Query_time:0.315758Lock_
Python进阶者 Python进阶者
9个月前
Excel中这日期老是出来00:00:00,怎么用Pandas把这个去除
大家好,我是皮皮。一、前言前几天在Python白银交流群【上海新年人】问了一个Pandas数据筛选的问题。问题如下:这日期老是出来00:00:00,怎么把这个去除。二、实现过程后来【论草莓如何成为冻干莓】给了一个思路和代码如下:pd.toexcel之前把这