MongoDB Sharded ReplicaSet with GridFS

Note: if you are in a hurry, please read the synopsis below and refer to the source code on GitHub at https://github.com/yazad3/mongodb-sharded-replSet-GridFS/ (start with startMongoDBShardedReplSet.sh).

Synopsis

This blog post describes the steps for setting up a MongoDB sharded replica set for GridFS. The steps to shard or create a replica set for GridFS are no different from those for a regular MongoDB database or collection. For this post I used an Ubuntu Desktop virtual machine, though one could set up this environment on other operating systems such as Windows or other UNIX variants.
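Concretely, GridFS stores file contents in a chunks collection (fs.chunks for the default bucket), so sharding GridFS means sharding that collection like any other. The following is a minimal sketch, assuming a database named filestore (hypothetical) and a mongos on localhost:27017, MongoDB's default port, which the mongos script later in this post does not override. The shard key { files_id: 1, n: 1 } is one of the keys MongoDB's documentation suggests for the chunks collection; the small fs.files metadata collection is left unsharded here. The function names and the DRY_RUN switch are illustrative only and are not part of the original repository.

```shell
#!/bin/bash
# Sketch: shard a GridFS bucket's chunks collection through mongos.
# Assumptions (not from the original post): mongos at MONGOS_HOST:MONGOS_PORT,
# defaulting to localhost:27017, and the default "fs" bucket prefix.

# Print the mongo shell JavaScript that enables sharding on <db> and shards
# <db>.<prefix>.chunks on { files_id: 1, n: 1 }.
gridfs_shard_js() {
  local db=$1 prefix=${2:-fs}
  printf 'sh.enableSharding("%s"); sh.shardCollection("%s.%s.chunks", { files_id: 1, n: 1 });\n' \
    "$db" "$db" "$prefix"
}

# Run that JavaScript against mongos; with DRY_RUN=1 only print it.
shard_gridfs() {
  local db=$1 prefix=${2:-fs}
  local js
  js=$(gridfs_shard_js "$db" "$prefix")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$js"
    return 0
  fi
  mongo --host "${MONGOS_HOST:-localhost}" --port "${MONGOS_PORT:-27017}" --eval "$js"
}
```

For example, shard_gridfs filestore followed by mongofiles --host localhost --port 27017 -d filestore put ./large-file.iso would store a file through mongos, so its chunks are distributed across the shards (the file name is a placeholder).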

I will start by setting up an environment with two shards (shard1 and shard2):

Illustration 1: Two Shard Replica Set (initial setup)

Later, I will add a third shard (shard3) to the setup:

Illustration 2: Add an additional shard

I wanted to demonstrate that multiple hosts could be used here to create a sort of distributed file system using GridFS, hence every node of each sharded replica set is on a different host (in reality they all run on the same machine; I used host-mapping entries to simulate multiple hosts). Add the following entries to the /etc/hosts file. Note that all the IPs point to the local host (on Linux the whole 127.0.0.0/8 range is loopback): I didn't want to set up that many VMs, yet wanted to demonstrate a multi-host setup.

# shard1 and shard2 replica nodes, config servers and mongos (initial setup)
127.1.1.1 shard1_repl1.mongo-server.com
127.1.1.2 shard1_repl2.mongo-server.com
127.1.1.3 shard1_repl3.mongo-server.com
127.1.1.4 shard2_repl1.mongo-server.com
127.1.1.5 shard2_repl2.mongo-server.com
127.1.1.6 shard2_repl3.mongo-server.com
127.1.1.7 config-1.mongo-server.com
127.1.1.8 config-2.mongo-server.com
127.1.1.9 config-3.mongo-server.com
127.1.1.10 mongos.mongo-server.com

# shard3 replica nodes (added later)
127.1.1.11 shard3_repl1.mongo-server.com
127.1.1.12 shard3_repl2.mongo-server.com
127.1.1.13 shard3_repl3.mongo-server.com

Table 1: /etc/hosts snippet
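Typing these entries by hand is error-prone, and appending them twice leaves duplicates. Here is a small sketch that regenerates the Table 1 lines and appends only those not already present. gen_mongo_hosts and append_missing_lines are hypothetical helpers (not from the repository), and the host list simply mirrors Table 1.

```shell
#!/bin/bash
# Sketch: regenerate the Table 1 entries and append only the missing ones,
# so the step can be re-run safely. Helper names are hypothetical.

# Print "IP hostname" lines from 127.1.1.1 upwards, in Table 1 order.
gen_mongo_hosts() {
  local domain=${1:-mongo-server.com}
  local octet=1 name
  for name in shard1_repl1 shard1_repl2 shard1_repl3 \
              shard2_repl1 shard2_repl2 shard2_repl3 \
              config-1 config-2 config-3 mongos \
              shard3_repl1 shard3_repl2 shard3_repl3; do
    printf '127.1.1.%d %s.%s\n' "$octet" "$name" "$domain"
    octet=$((octet + 1))
  done
}

# Append each line read from stdin to <file> unless an identical line exists.
append_missing_lines() {
  local file=$1 line
  while IFS= read -r line; do
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
  done
}
```

Run it as root, e.g. sudo bash -c 'source ./mongo-hosts.sh && gen_mongo_hosts | append_missing_lines /etc/hosts' (mongo-hosts.sh being wherever you saved the functions), then check a name with getent hosts shard1_repl1.mongo-server.com.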

I also added extra disks to my virtual machine so that I could allocate one disk to each mapped host above (for simplicity, a single disk holds all the logs; /media/app is where MongoDB is installed). See the output of the “df -h” command below:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdc    7.8G  247M  7.2G    4%   /media/app
/dev/sdf    2.9G  176M  2.6G    7%   /media/config-1
/dev/sdb    2.0G  175M  1.7G   10%   /media/config-2
/dev/sdd    2.0G  175M  1.7G   10%   /media/config-3
/dev/sdh1   4.8G  363M  4.2G    8%   /media/shard1_repl1
/dev/sdg    4.8G  363M  4.2G    8%   /media/shard1_repl2
/dev/sdj    4.8G  363M  4.2G    8%   /media/shard1_repl3
/dev/sdk    4.8G  363M  4.2G    8%   /media/shard2_repl1
/dev/sdl    4.8G  363M  4.2G    8%   /media/shard2_repl2
/dev/sde    4.8G  363M  4.2G    8%   /media/shard2_repl3
/dev/sdn    2.4G  356M  2.0G   16%   /media/shard3_repl1
/dev/sdo    2.4G  356M  2.0G   16%   /media/shard3_repl2
/dev/sdp    2.4G  356M  2.0G   16%   /media/shard3_repl3
/dev/sdi    7.8G   23M  7.4G    1%   /media/log

Table 2: df -h output snippet

Setup

Setting up sharded replica set

Now let’s get started with the initial two-shard setup (see Illustration 1: Two Shard Replica Set). The following script starts one mongod node of a shard's replica set:

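#!/bin/bash
# Start one mongod node of a shard's replica set.

if [ $# -ne 3 ]; then
  echo "Usage: $0 <shard> <replSetNode> <port>" >&2
  exit 1
fi

shard=$1      # shard name, also used as the replica set name (e.g. shard1)
replSet=$2    # node name matching an /etc/hosts entry (e.g. shard1_repl1)
port=$3

basePath=/media
dbPathDir=$basePath/$replSet/data/
logPathDir=$basePath/log/$shard/$replSet

#Create directories for DB and Log
mkdir -p "$dbPathDir" "$logPathDir"

logPath=$logPathDir/$replSet.log

# Bind to the address mapped to <replSetNode>.mongo-server.com in /etc/hosts
bind_ip=$replSet.mongo-server.com

mongod --replSet "$shard" --logpath "$logPath" --dbpath "$dbPathDir" --bind_ip "$bind_ip" --port "$port" --shardsvr --fork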

Table 3: _createMongoDBShardedReplSet.sh

Invoking the script creates one replica set node in a shard, for example:

./_createMongoDBShardedReplSet.sh shard1 shard1_repl1 3001

Table 4: Invoke – _createMongoDBShardedReplSet.sh

Repeat this step for all the nodes in the sharded replica set.
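Starting the six nodes of shard1 and shard2 can be scripted as a loop around the Table 3 script. This is a sketch under an assumption: only shard1_repl1 on port 3001 appears in the post, so the sequential port numbering (3001 to 3006) is hypothetical. With DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch: start all three replica nodes of each given shard using the
# Table 3 script. Assumption: ports are allocated sequentially from <base_port>.
# Set DRY_RUN=1 to print the commands instead of running them.
CREATE_NODE=${CREATE_NODE:-./_createMongoDBShardedReplSet.sh}

# Usage: start_shard_nodes <base_port> <shard>...
start_shard_nodes() {
  local port=$1
  shift
  local shard repl
  for shard in "$@"; do
    for repl in 1 2 3; do
      if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "$CREATE_NODE $shard ${shard}_repl${repl} $port"
      else
        # Stop at the first node that fails to start.
        "$CREATE_NODE" "$shard" "${shard}_repl${repl}" "$port" || return 1
      fi
      port=$((port + 1))
    done
  done
}
```

For example, start_shard_nodes 3001 shard1 shard2. Because each node binds to its own mapped address, reusing one port on every host would also work; distinct ports just make the nodes easier to tell apart in ps output on a single machine.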

Setting up Config Server

Note that, at the time of writing, MongoDB sharding required exactly three config server instances (since MongoDB 3.4, config servers must instead be deployed as a replica set). The script to create a config server is:

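#!/bin/bash
# Start one config server, e.g. config-1 on port 8001.

if [ $# -ne 2 ]; then
  echo "Usage: $0 <configSvrName> <port>" >&2
  exit 1
fi

configSvrName=$1
port=$2

basePath=/media
dbPathDir=$basePath/$configSvrName/data/
logPathDir=$basePath/log/$configSvrName

mkdir -p "$dbPathDir" "$logPathDir"

logPath=$logPathDir/$configSvrName.log

# Bind to the address mapped to <configSvrName>.mongo-server.com in /etc/hosts
bind_ip=${configSvrName}.mongo-server.com

mongod --logpath "${logPath}" --dbpath "${dbPathDir}" --bind_ip "${bind_ip}" --port "${port}" --configsvr --fork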

Table 5: _createMongoDBConfigSvr.sh

This script can be invoked as shown in the example below:

./_createMongoDBConfigSvr.sh config-1 8001

Table 6: Invoke _createMongoDBConfigSvr.sh
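The three config server invocations can be wrapped the same way. This sketch assumes ports 8001 to 8003, the ports later passed to the mongos script, and stops at the first failure because mongos needs all three config servers. The function name is hypothetical; DRY_RUN=1 prints the commands only.

```shell
#!/bin/bash
# Sketch: start config-1..config-3 on consecutive ports (default 8001-8003)
# with the Table 5 script. Set DRY_RUN=1 to print the commands only.
CREATE_CONFIG=${CREATE_CONFIG:-./_createMongoDBConfigSvr.sh}

# Usage: start_config_servers [base_port]
start_config_servers() {
  local base_port=${1:-8001} i
  for i in 1 2 3; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "$CREATE_CONFIG config-$i $((base_port + i - 1))"
    else
      # mongos requires all three config servers, so give up on any failure.
      "$CREATE_CONFIG" "config-$i" "$((base_port + i - 1))" || return 1
    fi
  done
}
```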

Setting up the mongos process

The script below starts mongos and takes the three config server ports as arguments:

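#!/bin/bash
# Start the mongos query router against the three config servers.

if [ $# -ne 3 ]; then
  echo "Usage: $0 <configPort1> <configPort2> <configPort3>" >&2
  exit 1
fi

configServerPort1=$1
configServerPort2=$2
configServerPort3=$3

configHost1=config-1.mongo-server.com
configHost2=config-2.mongo-server.com
configHost3=config-3.mongo-server.com

# Comma-separated host:port list of all three config servers
configdb=${configHost1}:${configServerPort1},${configHost2}:${configServerPort2},${configHost3}:${configServerPort3}

basePath=/media
logPathDir=$basePath/log/mongos

mkdir -p "$logPathDir"

logPath=$logPathDir/mongos.log

# No --port is given, so mongos listens on MongoDB's default port (27017)
mongos --logpath "$logPath" --configdb "$configdb" --fork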

Table 7: _configMongoDBShard.sh

Invoke _configMongoDBShard.sh to start the mongos process.

./_configMongoDBShard.sh 8001 8002 8003

Table 8: Invoke _configMongoDBShard.sh

(Note: the above script could be improved by taking the host names as parameters as well.)
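Here is a sketch of that improvement: a mongos start-up that takes full host:port pairs instead of bare ports. build_configdb and start_mongos are hypothetical names. The log location mirrors Table 7, and the three-argument check reflects the three-config-server requirement described earlier.

```shell
#!/bin/bash
# Sketch: start mongos from three "host:port" arguments instead of bare ports.

# Join exactly three host:port arguments into the comma-separated --configdb value.
build_configdb() {
  if [[ $# -ne 3 ]]; then
    echo "usage: build_configdb host1:port1 host2:port2 host3:port3" >&2
    return 1
  fi
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  echo "$*"
}

start_mongos() {
  local configdb logPathDir=/media/log/mongos
  configdb=$(build_configdb "$@") || return 1
  mkdir -p "$logPathDir"
  mongos --logpath "$logPathDir/mongos.log" --configdb "$configdb" --fork
}
```

For example: start_mongos config-1.mongo-server.com:8001 config-2.mongo-server.com:8002 config-3.mongo-server.com:8003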

Putting it all together

View the start-up script, startMongoDBShardedReplSet.sh: https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/startMongoDBShardedReplSet.sh

For the first run (after the hosts and mounts are set up):

./startMongoDBShardedReplSet.sh TRUE

Table 9: Invoking startMongoDBShardedReplSet.sh for the first time

Note that the TRUE parameter is needed only for the initial setup; omit it for subsequent start-ups:

./startMongoDBShardedReplSet.sh

Table 10: Invoking startMongoDBShardedReplSet.sh subsequently
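The post does not show what the one-time initial setup does. As a hedged sketch (not copied from startMongoDBShardedReplSet.sh), initialising a shard typically means running rs.initiate() on one member of its replica set and then registering that replica set with mongos via sh.addShard(). The sketch assumes three members per shard on consecutive ports starting at the port given (for shard1, 3001 to 3003; only 3001 appears in the post) and a mongos on localhost:27017. All function names are hypothetical.

```shell
#!/bin/bash
# Sketch of one-time shard initialisation. Assumptions: members <shard>_repl1..3
# listen on <port>..<port>+2, mongos is on MONGOS_HOST:MONGOS_PORT (default
# localhost:27017). Set DRY_RUN=1 to print the JavaScript instead of running it.
DOMAIN=${DOMAIN:-mongo-server.com}

# rs.initiate() call for <shard> whose members listen on <port>..<port>+2.
rs_initiate_js() {
  local shard=$1 port=$2 i members=""
  for i in 0 1 2; do
    # ${members:+, } adds a separator only after the first member.
    members+="${members:+, }{ _id: $i, host: \"${shard}_repl$((i + 1)).$DOMAIN:$((port + i))\" }"
  done
  printf 'rs.initiate({ _id: "%s", members: [ %s ] });\n' "$shard" "$members"
}

# sh.addShard() call using the "replSetName/seedHost:port" form.
add_shard_js() {
  local shard=$1 port=$2
  printf 'sh.addShard("%s/%s_repl1.%s:%d");\n' "$shard" "$shard" "$DOMAIN" "$port"
}

# Usage: init_shard <shard> <base_port>
init_shard() {
  local shard=$1 port=$2
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    rs_initiate_js "$shard" "$port"
    add_shard_js "$shard" "$port"
    return 0
  fi
  mongo --host "${shard}_repl1.$DOMAIN" --port "$port" \
    --eval "$(rs_initiate_js "$shard" "$port")" || return 1
  # Crude wait for the new replica set to elect a primary before adding it.
  sleep 15
  mongo --host "${MONGOS_HOST:-localhost}" --port "${MONGOS_PORT:-27017}" \
    --eval "$(add_shard_js "$shard" "$port")"
}
```

DRY_RUN=1 init_shard shard1 3001 prints the two statements for review. The fixed sleep is a stand-in for polling until the replica set reports a primary.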

Adding an additional shard

To add an additional shard, you can use the same script as in the “Setting up sharded replica set” section. The existing shards can keep running while you do this. See https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/add3rdShard.sh for an example of adding a shard to an existing environment of sharded replica sets.
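Once shard3 has been added, the balancer should migrate chunks to it over time. The sketch below shows one way to watch that from the shell. It assumes the hypothetical filestore database with the default fs bucket and a mongos on localhost:27017. getShardDistribution() is a mongo shell helper that prints per-shard data size and chunk counts for a sharded collection.

```shell
#!/bin/bash
# Sketch: periodically print how a GridFS chunks collection is spread across
# shards. Assumptions: mongos at MONGOS_HOST:MONGOS_PORT (default
# localhost:27017); database and bucket names are placeholders.

# JavaScript that prints the shard distribution of <db>.<prefix>.chunks.
distribution_js() {
  local db=$1 prefix=${2:-fs}
  printf 'db.getSiblingDB("%s").getCollection("%s.chunks").getShardDistribution();\n' "$db" "$prefix"
}

# Usage: watch_distribution <db> [rounds] [interval_seconds]
watch_distribution() {
  local db=$1 rounds=${2:-5} interval=${3:-60} i
  for (( i = 1; i <= rounds; i++ )); do
    mongo --quiet --host "${MONGOS_HOST:-localhost}" --port "${MONGOS_PORT:-27017}" \
      --eval "$(distribution_js "$db")"
    # Do not sleep after the last round.
    if (( i < rounds )); then
      sleep "$interval"
    fi
  done
}
```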

Stop and cleanup

The stop script, which stops all the nodes in the correct sequence (including the replica set nodes on the third shard), is available here: https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/stopMongoDBShardedReplSet.sh
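For reference, MongoDB's documented order for shutting down a sharded cluster is: disable the balancer, stop the mongos routers, stop the shard members, then stop the config servers. Below is a sketch of that order for this layout; it does not reproduce stopMongoDBShardedReplSet.sh. It assumes the /media/<name>/data/ dbpaths used by the creation scripts and a mongos on the default port 27017. Note that mongod --shutdown works only on Linux. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of a clean shutdown order: balancer off, mongos, shard members,
# then config servers. Set DRY_RUN=1 to print the commands instead.

run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Usage: stop_cluster <shard node>...  (e.g. shard1_repl1 ... shard3_repl3)
stop_cluster() {
  local basePath=/media node cfg
  run mongo --port 27017 --eval 'sh.stopBalancer()'
  run mongo --port 27017 --eval 'db.getSiblingDB("admin").shutdownServer()'
  # mongod --shutdown stops the mongod that owns the given dbpath.
  for node in "$@"; do
    run mongod --shutdown --dbpath "$basePath/$node/data/"
  done
  for cfg in config-1 config-2 config-3; do
    run mongod --shutdown --dbpath "$basePath/$cfg/data/"
  done
}
```

Members are stopped in the order given, so listing each shard's primary last avoids needless elections. The balancer setting persists, so re-enable it with sh.startBalancer() after the next start-up.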

Finally, if you have messed something up, you can run the cleanup script (WARNING: cleanup will delete all your data!): https://github.com/yazad3/mongodb-sharded-replSet-GridFS/blob/master/cleanup.sh


